How We Use NLP for Market Research with User Generated Content

Apr 22, 2022

Just as Natural Language Processing (NLP) has become ubiquitous in our everyday lives, it’s become a cornerstone of market research and insights.

The use of machines in data processing helps transform large data sets into something digestible for companies and brands. Natural Language Processing is particularly relevant for understanding user-generated content, such as social media posts and reviews, where there are thousands of data points, far too many to analyze by hand. That’s where NLP has proven to be a valuable research tool, and often a better fit than more traditional methodologies.

NLP has proved valuable in many ways, especially in comparison to manual human interpretation: it reduces human error and can process large data sets in a fraction of the time. NLP also tends to be less expensive than manual interpretation and provides a less biased analysis. And it facilitates connections to analytical techniques beyond human capabilities.

For many use cases, Natural Language Processing of User-Generated Content (UGC) yields more robust and authentic sets of insights than surveys or focus groups. UGC provides unfiltered consumer insights and uninfluenced data (no screener, discussion guide, or survey questions), once again removing much of the human bias, and when combined with NLP extraction of insights there’s an added measure of objectivity. Additionally, UGC provides data sets large enough for quantitative measurement while still allowing for qualitative insights.

In an earlier blog post, we noted that text analytics was a top emerging research method with widespread implementation. User-generated content is ripe for this sort of analysis.

Data scientists may use one or more of the following analyses to uncover key insights.

NLP Analyses

  • Sentiment Analysis: Describes the overall positivity or negativity around a product, feature or topic.
  • Thematic Analysis: Explores the emergent themes in a structured collection of data, sometimes tied to star ratings (in the case of review data). When tied to star ratings, we can derive insight into underlying drivers of satisfaction.
  • Emotional Lexicon Analysis: Explores the emotional undertones and verbiage attributed to a topic, product or brand. The lexicon is built on 8 basic emotions: anger, disgust, fear, sadness, anticipation, surprise, joy, trust. It assigns scores for these 8 emotions to nearly every word in English and other languages, based on the ways the word is used and the emotions it may evoke. As such, strings of text, phrases, sentences, paragraphs and other bodies of text may be summarized and described by numerical scores on the 8 basic emotions.
  • Text Correlations Analysis: Measures the strength of associations between words. Think of this as the relatedness of two words. Certain word pairs may occur together at high frequencies and provide additional contextual information. Conversely, certain word pairs may occur together very rarely, less often than their individual frequencies would suggest. This low-frequency co-occurrence also provides contextual information.
  • Text Cluster Analysis: Separates pieces of text into groups/clusters according to similar features. The emergent clusters reveal insights related to prevailing themes/topics and more.
  • Large Scale Text Summarization: Rapidly summarizes large bodies of text quantitatively, by converting text features to numerical scores and applying mathematical/statistical computations. The benefit is that computers can read in and summarize more text in a minute than a person could read in a lifetime. Also, the computer will interpret the text consistently, within the algorithmic rules assigned, reducing concerns about human bias/error.
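
To make two of the analyses above more concrete, here is a minimal Python sketch of emotion lexicon scoring and word-pair correlation. It is an illustration under stated assumptions, not the method used in any particular study: the tiny lexicon, the sample reviews, and the function names are all hypothetical, and the phi coefficient is just one common way to measure how strongly two words co-occur.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (word -> associated emotions). A real emotion
# lexicon covers far more words; these entries are for illustration only.
TOY_LEXICON = {
    "love": {"joy", "trust"},
    "great": {"joy"},
    "broke": {"anger", "sadness"},
    "late": {"anger"},
    "worried": {"fear"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def emotion_scores(text, lexicon=TOY_LEXICON):
    """Count how many tokens in the text are associated with each emotion."""
    scores = Counter()
    for token in tokenize(text):
        for emotion in lexicon.get(token, ()):
            scores[emotion] += 1
    return dict(scores)

def phi_correlation(docs, word_a, word_b):
    """Phi coefficient between the presence of two words across documents.

    Positive values mean the words appear together more often than their
    individual frequencies suggest; negative values mean less often.
    """
    n11 = n10 = n01 = n00 = 0
    for doc in docs:
        tokens = set(tokenize(doc))
        has_a, has_b = word_a in tokens, word_b in tokens
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one of the words appears in every document or in none
    return (n11 * n00 - n10 * n01) / denom

# Made-up sample reviews, not real data.
reviews = [
    "Love the battery, great screen",
    "Battery broke after a week",
    "Great screen but shipping was late",
    "Love it, great battery life",
]

print(emotion_scores(reviews[0]))
print(round(phi_correlation(reviews, "great", "screen"), 2))
print(round(phi_correlation(reviews, "love", "late"), 2))
```

In this toy data, "great" and "screen" get a positive score because they tend to appear in the same reviews, while "love" and "late" get a negative score because they never appear together.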

There are myriad outcomes that brands can achieve when leveraging this discipline, all of which focus on growth.

A brand may want to understand: What are the key topics of conversation? By measuring prevalence and using a thematic analysis, NLP can readily answer this question. A brand might be curious as to how key topics are related; a Cluster Analysis will help provide this answer. A deeper understanding of the context surrounding particular topics can be gained via Sentiment Analysis, Correlations Analysis, N-Gram Analysis, and an Emotional Lexicon Analysis. Perhaps a brand wants to know: What are some of the macro themes in the category, and how does my brand measure up? This would be best addressed with a thematic analysis, then comparing and contrasting the output summary across relevant brands.
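
As a rough illustration of measuring prevalence and counting N-grams, the Python sketch below counts two-word phrases and compares how often a topic is mentioned across brands. The brand names, reviews, keyword sets, and function names are hypothetical, and simple keyword matching stands in for a fuller thematic analysis.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(docs, n=2):
    """Count every run of n consecutive words across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

def prevalence(docs, keywords):
    """Share of documents that mention at least one of the keywords."""
    if not docs:
        return 0.0
    hits = sum(1 for doc in docs if set(tokenize(doc)) & set(keywords))
    return hits / len(docs)

# Made-up reviews for two hypothetical brands, not real data.
reviews_by_brand = {
    "Brand A": [
        "Battery life is great",
        "Battery life could be better",
        "Love the screen",
    ],
    "Brand B": [
        "Screen cracked fast",
        "Great price",
    ],
}

# Most frequent two-word phrases for one brand.
print(ngram_counts(reviews_by_brand["Brand A"]).most_common(3))

# Compare how prevalent a topic is across brands.
for brand, docs in reviews_by_brand.items():
    print(brand, "battery:", round(prevalence(docs, {"battery"}), 2))
```

Running the same prevalence check per brand gives a simple side-by-side view of which topics dominate each brand's conversation.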

By leveraging NLP, brands can uncover timely and rich insights that grow both the top and bottom line.