Sentiment Analysis Accuracy Explained by a Data Scientist: Part One
Kimberly Surico | 03/25/22 | 8 min read

In this two-part series, we interviewed NetBase Quid® Data Scientist Michael Dukes to help us break down precisely what sentiment analysis is, how it works, and the technological processes that differentiate “accurate” from “okay” analyses. This first piece lays the foundation: what sentiment analysis is and why accuracy is a differentiator among the tools available today. In the second part of the series, we’ll elaborate further, showing what deep learning tools offer brands and which back-end capabilities to watch for, as they make a huge difference when it comes to deriving actionable, accurate insight.

During the past few years, we’ve seen the conversation about sentiment analysis shift as users learn more about the depth of consumer insight available. They’re seeing firsthand how categories are disrupted by data-driven discoveries based on real-time intelligence from AI-powered analytics.

Q: What is sentiment analysis?

Sentiment analysis is trying to understand people’s thoughts and feelings based on what they write or say. Typically, we’re interested in people’s thoughts and feelings about some particular thing (e.g. products, people, companies and organizations), and then more generally, it covers abstract emotional states. At NetBase Quid®, we have developed tools for analyzing both of these domains.

The term itself, “sentiment analysis,” is somewhat misleading, because our analysis is not limited to understanding feelings or opinions (e.g. “this phone sucks”). We’re also interested in objective facts (e.g. “the battery never charges properly”). In fact, any data that reports on the benefits or downsides of a product is also highly relevant, so we’ll want to capture and understand that too.

Many applications of sentiment analysis also rely on content that has nothing to do with anyone’s opinion or description.

For example, if you’re interested in the behavior of stocks on the NYSE, you may interpret a statement such as “ACME stock fell today” as carrying negative sentiment about ACME. The drop in the stock price is negative news for the company, and we can infer that people will have negative sentiment about it.

Q: What are the key elements of an analysis? How is data analyzed?

In NetBase Quid®, the key elements of sentiment analysis are morphosyntactic structure and semantic features, along with lexical knowledge of a language.

As an example, consider a sentence, like the following, which can be found in numerous tweets:

               I want to go to IKEA so bad.

In each of the languages we support with Natural Language Processing (NLP), we identify word tokens (e.g. “delicious”, “pies”, etc.), parts of speech (noun, adjective, verb, etc.) and lemmas (e.g. ‘pie’ is the lemma, or basic form, for the word “pies”). We also identify terms that are names, even when they consist of multiple words (e.g. ‘New York Stock Exchange’).

Based on this information, we use grammatical analysis to combine these words into syntactic constituents and then we assign semantic features to these grammatical units.

In the example above, we identify positive sentiment on the part of the subject towards IKEA, namely the desire to visit the place and presumably buy their products. We also recognize the emotional state of ‘desire’ in this sentence. The expression “so bad” is recognized as an intensifying phrase which conveys no negative sentiment.

All of these semantic roles in the sentence, Agent (“I”), Action (“go”), Emotion (“Desire”) and Object (“IKEA”), are extracted into our index, where NetBase Quid® allows users to capture an in-depth understanding of what people are saying about brands, products, politicians, movies, TV shows, or what have you.
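
To make the idea of role extraction concrete, here is a minimal Python sketch of what such a record might look like for the IKEA sentence. It is a toy illustration only, not NetBase Quid®’s implementation: the `SentimentFrame` class, the `analyze_desire` function, the word lists and the single hard-coded sentence pattern are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical intensifiers: phrases that look negative but only add emphasis.
INTENSIFIERS = ("so bad", "so badly")

# Hypothetical mapping from desire verbs to an emotion label.
DESIRE_VERBS = {"want": "Desire", "wish": "Desire", "crave": "Desire"}


@dataclass
class SentimentFrame:
    agent: str
    action: str
    emotion: str
    obj: str
    polarity: str
    intensified: bool


def analyze_desire(sentence: str) -> Optional[SentimentFrame]:
    """Match the toy pattern '<agent> <desire verb> to <action> to <object> [intensifier]'."""
    text = sentence.strip().rstrip(".!")
    intensified = False
    for phrase in INTENSIFIERS:
        if text.lower().endswith(" " + phrase):
            # Drop the intensifier so it is never scored as negative sentiment.
            text = text[: -len(phrase)].rstrip()
            intensified = True
            break
    words = text.split()
    # Expect at least: agent, verb, "to", action, "to", object
    if len(words) < 6 or words[1].lower() not in DESIRE_VERBS:
        return None
    if words[2].lower() != "to" or words[4].lower() != "to":
        return None
    return SentimentFrame(
        agent=words[0],
        action=words[3],
        emotion=DESIRE_VERBS[words[1].lower()],
        obj=" ".join(words[5:]),
        polarity="positive",
        intensified=intensified,
    )


frame = analyze_desire("I want to go to IKEA so bad.")
print(frame)
```

The point of the sketch is the shape of the output: a structured record naming who feels what about which object, with “so bad” treated as emphasis rather than as a negative word. A real system would get there through full grammatical analysis rather than a fixed word pattern.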

And here’s the distinction: Many of our competitors, as well as many sentiment analysis tools developed in academia, simply do not provide this in-depth level of analysis. For each sentence, or even for entire documents, they simply output a label: ‘positive’, ‘negative’ or perhaps ‘neutral’. Some also present a score alongside or instead of a label, on a range from strongly negative to strongly positive (e.g. -100 to +100).

Also, and critically, they often provide little detail about the ‘source’ of these positive or negative sentiments. Some products don’t even allow you to see sentiment results for individual posts. Instead, they offer a summary score across an entire group of results.

None of this is very helpful for understanding the details that give rise to the opinions found in social media content. And transparency around this intel is crucial, as it is what validates an analysis as accurate. Without it, the results are unsubstantiated and their reliability is questionable.

Q: Walk us through a sentiment analysis minute – a topic is framed out in NetBase Quid® and I hit “enter” – then what?

Actually, ‘framing out the topic’ is very important because it strongly determines what kind of results you’ll get. So, before submitting your topic, it’s important to double check that it includes all of the terms you need to investigate.

After your topic is submitted in the NetBase Quid® application, content containing your search terms is looked up in the index and filtered based on the language(s) you requested, as well as the time ranges, location filters, etc., and any terms you wish to include or exclude from the search.
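
The lookup-and-filter step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Post` and `Topic` classes, their field names and the simple substring matching are hypothetical, and a production system would query a search index rather than scan posts in memory.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Post:
    text: str
    language: str
    posted: date


@dataclass
class Topic:
    include_terms: list[str]
    exclude_terms: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=lambda: ["en"])
    start: date = date.min
    end: date = date.max


def matches(topic: Topic, post: Post) -> bool:
    """Return True if the post satisfies every filter in the topic."""
    text = post.text.lower()
    if post.language not in topic.languages:
        return False
    if not (topic.start <= post.posted <= topic.end):
        return False
    # Any excluded term disqualifies the post outright.
    if any(term.lower() in text for term in topic.exclude_terms):
        return False
    # At least one search term must be present.
    return any(term.lower() in text for term in topic.include_terms)


posts = [
    Post("Love my new ACME phone", "en", date(2022, 3, 1)),
    Post("ACME phone giveaway, click here", "en", date(2022, 3, 2)),
    Post("J'adore mon ACME phone", "fr", date(2022, 3, 2)),
    Post("ACME phone battery died", "en", date(2021, 1, 1)),
]
topic = Topic(
    include_terms=["ACME phone"],
    exclude_terms=["giveaway"],
    start=date(2022, 1, 1),
    end=date(2022, 12, 31),
)
print([p.text for p in posts if matches(topic, p)])
```

Only the first post survives: the others are dropped by the exclusion term, the language filter and the date range respectively, which is why it pays to double check the topic definition before running it.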

You can then open the ‘Analyze’ widget and view any aspect of the content, depending on the display options you’ve selected. The overview tab allows you to see a timeline of the changing sentiment results for your brand or product over the period you’ve selected as well as the ‘Net Sentiment’ summary for that period.
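
The interview does not spell out how the ‘Net Sentiment’ summary is calculated. As a rough sketch only, the Python below assumes one common convention: the share of positive mentions minus the share of negative mentions, on a scale from -100 to +100, with neutral posts ignored. The `net_sentiment` and `sentiment_timeline` functions are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def net_sentiment(positive: int, negative: int) -> float:
    """Assumed formula: 100 * (positive - negative) / (positive + negative)."""
    total = positive + negative
    if total == 0:
        return 0.0
    return 100.0 * (positive - negative) / total


def sentiment_timeline(labelled_posts):
    """Turn (day, label) pairs into a per-day net sentiment score."""
    counts = defaultdict(lambda: [0, 0])  # day -> [positive, negative]
    for day, label in labelled_posts:
        if label == "positive":
            counts[day][0] += 1
        elif label == "negative":
            counts[day][1] += 1
    return {day: net_sentiment(p, n) for day, (p, n) in sorted(counts.items())}


print(net_sentiment(75, 25))  # 50.0
print(sentiment_timeline([
    ("2022-03-01", "positive"),
    ("2022-03-01", "negative"),
    ("2022-03-02", "positive"),
    ("2022-03-02", "neutral"),
]))  # {'2022-03-01': 0.0, '2022-03-02': 100.0}
```

Whatever the exact formula, a single summary number like this is only trustworthy if you can drill down to the individual posts behind it.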

You can also view the Word Cloud of sentiment drivers (e.g. emotions, positive/negative attributes, etc).

This data is compiled and displayed based on the options you chose in your topic.

From that initial set of results, you can focus more precisely on areas of interest by adding filters (e.g. geographic, time or demographic filters) or changing the displayed metrics in the Analyze widget (e.g. view a stream of individual posts or the geographical distribution of posts).

You can filter down to the granularity of specific sentiment words and see exactly how people are talking about your product.

You can set the widget so that sentiment results are focused very precisely on your search terms, or you can set the results to provide a broader picture of the sentiment occurring in content around your search terms.

At NetBase Quid®, we distinguish between what we call ‘high precision’ sentiment and ‘any sentiment found.’

  • If you want to identify and understand comments and opinions about a specific product, say a newly released smartphone, you would use a high precision topic. This lets you gather and group positive and negative comments about different aspects of the device (e.g. the screen, the battery, etc.).
  • If you’re more interested in what people are saying at an event (e.g. the Olympics, the Super Bowl, etc.), you’re more likely to use an ‘any sentiment found’ topic, because people may not be talking specifically about the Olympics but about things of interest going on around the main event.

You’re also more likely to use an ‘any sentiment found’ topic if you’re more interested in people’s emotions rather than their descriptions of products or people. Emotional content is more likely to be conveyed with emoticons, emojis and hashtags – and these are often located quite far away from the search terms that you used in your topic. The connection between them is often less direct and not expressed in actual sentences. NetBase Quid® sentiment analysis includes special rules to help link these terms to one another.

Q: How has sentiment analysis changed over the years?

The most obvious change in sentiment analysis over the last several years is the advent of deep learning. Discussions about deep learning and AI have become ubiquitous in the technology space, but there are still a lot of misconceptions about what these technologies can actually do.

There’s a widespread assumption that all you need to do is hook up a deep learning system to solve all your remaining NLP problems in a flash. But nothing could be further from the truth.

Machine learning has made great strides in handling low-level tasks, such as POS-tagging and lemmatization, and there is also considerable progress on syntactic analysis when enough good data is available. But when it comes to understanding the meaning of human language, there are still many problems with the current state of the art, and there are many ways in which it is inferior to rule-based systems.

And, contrary to the latest statistics widely reported in the media, it is not easy to adapt these new technologies to a particular use case. We have a common example to clarify.

Consider the post below, which you might see in a restaurant review or a tweet:

We had a meal at Steakhouse X yesterday. The fries were delicious but the steak was awful.

Running this example through the DistilBERT classifier available on the web at Hugging Face produces disappointing results. Please note: this is a widely used and very advanced transformer-based sentiment analysis tool derived from Google’s BERT. The classifier is publicly available, so you can try the system out for yourself.

So – is this post “positive” or “negative?” Well, quite clearly it’s both: the reviewer loved the fries and hated the steak. If your system can’t analyze sentiment below sentence-level or post-level it can’t identify what sentiment is about or even summarize it correctly.

  • The post is positive if you’re interested in people’s opinion of the fries, but negative if you’re focused on the quality of the steak.
  • From the perspective of the Steakhouse X brand as a whole, it would be accurate to say that the report card is mixed: they’re doing well with their fries, but more work is needed on the steak.

The NetBase Quid® application extracts both positive and negative sentiment for examples of this type, identifying the positive object (“fries”) and the negative object (“steak”) and allowing you to extract whichever of these labels is relevant to your ongoing research questions.
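
To illustrate what sentiment “below sentence level” means, here is a deliberately tiny Python sketch that attaches a polarity to each object rather than to the whole post. It is not how NetBase Quid® works internally: the `aspect_sentiment` function, the two word lists and the “the X was/were Y” pattern are hypothetical simplifications chosen to fit this one example.

```python
import re

# Hypothetical mini-lexicons, just large enough for the example.
POSITIVE = {"delicious", "great", "tasty"}
NEGATIVE = {"awful", "terrible", "bland"}


def aspect_sentiment(text: str) -> dict[str, str]:
    """Label each 'the <aspect> was/were <adjective>' clause separately."""
    results = {}
    # Split into clauses on sentence punctuation and on the contrastive "but".
    clauses = re.split(r"[.!?]|\bbut\b", text.lower())
    for clause in clauses:
        match = re.search(r"\bthe (\w+) (?:was|were|is|are) (\w+)", clause)
        if not match:
            continue
        aspect, word = match.groups()
        if word in POSITIVE:
            results[aspect] = "positive"
        elif word in NEGATIVE:
            results[aspect] = "negative"
    return results


review = "We had a meal at Steakhouse X yesterday. The fries were delicious but the steak was awful."
print(aspect_sentiment(review))  # {'fries': 'positive', 'steak': 'negative'}
```

Because each clause is scored on its own, swapping the order of the two clauses yields the same pair of labels. A single post-level label cannot preserve that distinction.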

But according to DistilBERT, this post is 97.2% negative. The delicious fries don’t count for much in this case.

Weirdly, if we rephrase exactly the same content in a slightly different order, we get a very different sentiment result:

We had a meal at Steakhouse X yesterday. The steak was awful but the fries were delicious.

DistilBERT analyzes this version as just shy of 100% positive. Simply rearranging the sentence, with no change in meaning, leads to a complete reversal in sentiment! This makes no sense. And it highlights how oversimplified assumptions about how to classify data lead to incorrect analyses.

Important: This is not an uncommon error. Machine learning sentiment systems are still very far from being able to handle even simple ‘document-level’ sentiment scoring, let alone to extract and identify the roles and emotions that humans can understand with ease.

Machine learning systems are only as good as the data they are trained on, and unfortunately, these systems are often trained on very questionable datasets. For the most part, these datasets contain little to no social media data. Using them as the basis for social media sentiment analysis tools is not recommended without a great deal of human oversight and additional review.

Any company claiming to have ‘upgraded’ or ‘retrained’ its sentiment systems based on machine learning models should be viewed with deep skepticism, and even more so if the language involved is not English. Unfortunately, NLP resources available for languages other than English and a handful of other major European languages are still quite lacking.

And we’ll leave you here, as we’ve offered a lot to digest – and we have so much more to detail for you! Be sure to watch this space for Part Two of this interview, where we’ll dig into how deep learning systems work – and how sentiment analysis tools typically struggle. Until then, please reach out for a demo and we can show you all of this in action as well!
