How to Use Natural Language Processing Analytics for Ecommerce
March 4, 2021
Whether it’s product reviews, satisfaction surveys, or email/SMS responses, collecting text data from your customers is an invaluable source of information.
One way we like to collect customer feedback is directly after a purchase, as this helps us gain insight into the purchasing experience (versus a review of the product). The typical question we ask post-purchase is:
“On a scale of 1 – 10, how likely are you to recommend shopping on domain.com to a friend?”
(1 being “Very Unlikely” and 10 being “Highly Likely”)
This is a slight variation of the typical Net Promoter question for assigning a customer feedback score, but both variations are ways to gauge customer satisfaction and brand loyalty.
There’s also the option to leave a comment along with the rating, which many customers do, helping provide additional insight into the selected rating. This text data is now a real-time information source, highlighting the voice of the customer and their personal experience. Utilizing the comment on an individual basis is helpful for things like addressing solvable customer service issues and thanking customers for their purchase.
However, the real power comes in surfacing trends that may be causing a high volume of positive or poor experiences–and then taking action to either perpetuate or stop them from occurring in the future. Clearly, we need a tactic that will help us surface those insights.
In situations like this, we utilize natural language processing (NLP) to mine the text responses. Technically defined, it’s the automatic computational processing of human language. It might sound complicated, but it’s actually fairly straightforward–and knowing how to do NLP properly will provide you with some amazing discoveries about your customers’ pain points and preferences.
There are a few ways that machine learning and natural language processing can create surprising value for your ecommerce business. Below, using NLP modeling project examples to illustrate how you might go about your own analysis, we'll break down the basics of NLP and its applications.
The first and most obvious starting point is the words that most commonly occur in the comments.
The word cloud below highlights examples of what some high-frequency options might look like:
The words “service,” “shipping,” “customer,” “price,” and “website” are the highest volume, giving us a starting point.
We can filter down to the comments containing those words to get more context around what each customer was actually referring to. It’s likely that some themes will emerge around shipping: for example, people might like that it’s free or fast.
Utilizing bigrams and trigrams (sequences of two and three consecutive words, such as “free shipping”), as well as individual word frequencies, is an easy way to gain insight and context into the trends without having to read every individual comment.
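As a rough illustration of the counting described above, here is a minimal Python sketch that tallies single words, bigrams, and trigrams across a handful of comments. The sample comments and the function names (tokenize, ngrams, ngram_counts) are made up for this example and are not taken from our actual data or tooling.

```python
import re
from collections import Counter

# Hypothetical sample comments, invented for illustration only.
comments = [
    "Free shipping was great and the price was fair",
    "Fast shipping and great customer service",
    "Customer service was slow but free shipping helped",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return each run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(comments, n):
    """Count n-grams across all comments (n=1 words, n=2 bigrams, n=3 trigrams)."""
    counts = Counter()
    for comment in comments:
        counts.update(ngrams(tokenize(comment), n))
    return counts

word_counts = ngram_counts(comments, 1)
bigram_counts = ngram_counts(comments, 2)
trigram_counts = ngram_counts(comments, 3)

print(word_counts.most_common(5))
print(bigram_counts.most_common(3))
```

In this toy data, “shipping” on its own says little, while the bigram “free shipping” already hints at why customers mention it. On real comments you would likely also remove filler words such as “the” and “was” before counting.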
Another useful tactic is to categorize the customer feedback as positive or negative (or neutral) utilizing customer sentiment analysis. “Sentiment analysis” essentially utilizes a library of words with positive or negative values attached to them to algorithmically classify the text in the comments.
Let’s look at this sentence to see how it works in practice: “I was happy to take my dog on a walk, but bummed when it started to rain.”
The algorithm would add scores for each word in the sentence that is also in the sentiment library to designate an overall sentiment score for the sentence.
For our example above: “Happy” = +5, “Bummed” = -3, “it” = 0, etc.
Although both positive and negative words appeared, the positive score was higher, giving the sentence an overall score of 2 (5 - 3). While it can be useful to score overall comments as positive or negative, we typically utilize sentiment analysis to assess the most common feelings being expressed in the comments.
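The scoring in the example sentence can be sketched in a few lines of Python. This is only a sketch: the two-word lexicon below contains just the values quoted above, whereas a real sentiment library holds thousands of scored words, and the function name sentiment_score is our own.

```python
import re

# Tiny hypothetical lexicon holding only the two scores from the example.
# Any word not listed (such as "it") counts as 0.
LEXICON = {"happy": 5, "bummed": -3}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the lexicon value of every word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

sentence = "I was happy to take my dog on a walk, but bummed when it started to rain."
score = sentiment_score(sentence)
print(score)  # 2
```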
Below is a graph of results from a sample use case of sentiment analysis. It shows the highest frequency words classified as negative that occurred in the post-purchase rating comments:
By summarizing the comments into overarching trends, we are able to dig deeper into the most common sentiments expressed. However, we still can’t determine exactly why the sentiment is being expressed. For instance, why was the word “hard” used over 100 times?
Once again, the analysis provides a great starting point from which to dig deeper into the comments themselves in order to learn more.
Because sentiment analysis often utilizes pre-existing libraries of scored words, there is some bias built into the process, and nuances in language — such as sarcasm or negation — can be misleading. For example, the phrase “I was not happy” could be scored as positive, because the word “happy” appeared and the negation was not taken into consideration.
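To make the negation problem concrete, the sketch below scores “I was not happy” two ways. The first function is the plain word-sum approach; the second flips the sign of a scored word when the word right before it is a negator. That flip rule, the NEGATORS set, and the function names are illustrative assumptions, not how any particular sentiment library handles negation.

```python
import re

# Hypothetical lexicon and negator list, for illustration only.
LEXICON = {"happy": 5, "bummed": -3}
NEGATORS = {"not", "no", "never"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def naive_score(text):
    """Plain lexicon sum: ignores word order entirely."""
    return sum(LEXICON.get(token, 0) for token in tokenize(text))

def negation_aware_score(text):
    """Flip a word's score when the immediately preceding word is a negator."""
    tokens = tokenize(text)
    total = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total

print(naive_score("I was not happy"))           # 5, misread as positive
print(negation_aware_score("I was not happy"))  # -5
```

Even this simple rule misses cases such as “not at all happy” or sarcasm, which is why it helps to spot-check the underlying comments.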
Additionally, there may be important words that aren’t surfaced, simply because they don’t have a sentiment score associated with them (for example, “library” or “road”).
A Naive Bayes classification algorithm is one technique that can be utilized to determine a word’s probability of being in a lower or higher rated comment. One advantage of this method is that it does not rely on a pre-existing library of scored words, and therefore has less inherent bias.
The idea is that some words are about equally likely to appear in any review (“the,” “are,” “of,” and so on), while other words may appear only in comments with lower ratings and are therefore more predictive of a lower likelihood to recommend the company.
By labeling comments as either “good” or “bad” based on their rating and then running them through the algorithm, you can surface words with a higher probability of being in a good or bad review.
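Here is a minimal from-scratch sketch of that idea in Python. It labels comments by rating, estimates how often each word appears in bad versus good comments (with add-one smoothing), and reports the ratio of the two, which is the per-word quantity a Naive Bayes classifier is built on. The sample feedback, the rating cutoff of 7, and the function names are assumptions made for this example, and it is not necessarily the exact implementation behind our own results.

```python
import re
from collections import Counter

# Hypothetical (rating, comment) pairs, invented for illustration only.
feedback = [
    (10, "Fast shipping and great price"),
    (9, "Great service and the site was easy to use"),
    (8, "The price was fair and shipping was fast"),
    (3, "Unexpected handling fee at checkout"),
    (2, "The handling fee was not shown until checkout"),
    (4, "Coupon email took an hour and the handling fee was high"),
]

GOOD_THRESHOLD = 7  # Assumption: ratings of 7 or higher count as "good".

def tokenize(text):
    """Return the set of distinct lowercase words in a comment."""
    return set(re.findall(r"[a-z']+", text.lower()))

def likelihood_ratios(feedback, alpha=1.0):
    """For each word, estimate P(word | bad) / P(word | good) with smoothing."""
    doc_counts = {"good": Counter(), "bad": Counter()}
    n_docs = Counter()
    for rating, comment in feedback:
        label = "good" if rating >= GOOD_THRESHOLD else "bad"
        n_docs[label] += 1
        doc_counts[label].update(tokenize(comment))
    vocab = set(doc_counts["good"]) | set(doc_counts["bad"])
    ratios = {}
    for word in vocab:
        # alpha keeps words unseen in one class from producing a zero probability.
        p_bad = (doc_counts["bad"][word] + alpha) / (n_docs["bad"] + 2 * alpha)
        p_good = (doc_counts["good"][word] + alpha) / (n_docs["good"] + 2 * alpha)
        ratios[word] = p_bad / p_good
    return ratios

ratios = likelihood_ratios(feedback)
for word, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:5]:
    print(f"{word}: {ratio:.1f} to 1 (bad : good)")
```

In this toy data, “handling” and “fee” rise to the top, “the” sits at roughly 1 to 1, and “shipping” leans toward good comments. A ratio near 1 means the word tells you little about the rating either way.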
The following table outlines the results of the Naive Bayes analysis we ran on some of our post-purchase text data:
The word “handling” occurred in 33 comments and has an 81.4 to 1 likelihood of appearing in a bad comment rather than a good one. Reading those comments in context shows that customers were unhappy about having to pay a handling fee they weren’t expecting when they got to checkout, leading them to say they were less likely to recommend the site to a friend.
The word “hour” has a lower probability and fewer occurrences, but is enlightening nonetheless. It tells us that the coupon code email is not being delivered in a timely fashion, leading to a lower rating.
These words are unlikely to show up either in the high-frequency list or the sentiment list, but highlight highly actionable information. For example, a prominent label may need to be added to products with handling fees to inform customers upfront what to expect. This simple action could improve the customer experience and decrease the likelihood that they leave a poor rating.
Not only can the insights gained from the analyses described above be used to improve the customer shopping experience in the future, but they can also be used to:
And there’s still more that we can analyze from customer feedback tools.
Once you get into the data, there’s a seemingly unlimited number of applications that could arise. If you have text data available from customers in the form of reviews, surveys, or email responses, and you haven’t yet run some aggregate analysis on it, you are likely missing out on key insights. These insights should not be overlooked; they are essential to making decisions that improve customer experience and to honing your marketing strategy. With all this in mind, you’re all set up to gain a treasure trove of valuable insights into your business.
By utilizing natural language processing, finding the trends within your company becomes a much simpler (and more automated) process. Tadpull's data science team can aggregate your data and bring clear insights on customer satisfaction so you don’t have to spend hours on all this text data analysis. Our world-class digital marketing team will drive traffic to your business, help you spot meaningful trends in your customer satisfaction, and inform you of the best ways to act on those trends.
Shoot us a message today if you’re interested in learning more.