7 Tips for Text Analysis Success

Text analysis is the process of sorting, analyzing and gaining insight from text sources. The digital age produces enormous amounts of text data that are overwhelming yet rich in insight. Thanks to data science and artificial intelligence, machine learning can mine this text data and break it down into its components to identify themes, sentiment, trends and many other indicators found within.

Your approach to text analysis allows you to quickly gain a broad understanding of overarching themes in your data sources or drill down further for fine detail. However, the world of text analysis can be a bit intimidating. To help you get the most from your data analytics tools, we’ll share seven tips for your text analysis success.


Specifically, we’ll explore the following themes to offer clarity and best practices to keep in mind when you’re staring down your next big project:

  • Keyword spotting
  • Manual rules
  • Text categorization
  • Topic modeling
  • Thematic analysis
  • Disambiguation
  • Clustering

The value of text analysis to your business intelligence cannot be overstated. It’s how brands know how to position themselves, react to pressing concerns and make swift decisions. Let’s look at a few facts and figures before jumping in:

  • The ability to process and extract insight from vast data sets quickly and accurately is paramount to your brand’s success in today’s market. That’s because roughly 80% of data available to brands comes in the form of unstructured text from various sources.
  • Artificial intelligence breaks down text-based data sets for analysis using a process known as natural language processing (NLP).
  • Brands today have a voracious appetite for data. The need for artificial intelligence to analyze data on an enterprise scale is propelling the NLP market towards a 27% CAGR through 2023.

Modern text analysis depends on machine learning to provide human beings with insights that would otherwise be impossible at speed. However, no matter how sophisticated your AI, it can only work with what you give it. And that’s a critical aspect to keep in mind – especially when building search queries across social and media datasets. The methods you use to parse and cut across your data for insight matter too.

With that in mind, let’s look at a few tips, so you get the most from your next text analysis.

Keyword Spotting

Since text analysis aims to draw upon as many insights from your data sets as possible, keywords are a fantastic way to dig deep. If you use visualized data within your data analytics tools, then sorting for keywords within your data set goes beyond the topic clusters and main themes for a richer understanding.

That’s because words that are mentioned frequently can cut across these themes. In other words, if you are looking at a network map of your data set labeled by clusters, the underlying conversational drivers are not always apparent.

Therefore, dialing into word frequency can tell a story in itself, especially if it helps answer the questions that you are asking of the data set.

When you’re starting a new analysis, filtering your data set by top keywords gives you a broad picture of the underlying conversation. And the use cases transcend most data sets that you’ll encounter. You can use word frequency to target negative mentions in customer service logs, uncover hidden language cues in earnings call transcripts or mine social media topics for key talking points surrounding a theme. The opportunities are endless, and so are the insights they contain.
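To make the idea of word frequency concrete, here is a minimal Python sketch of keyword spotting. It is an illustration only, not how any particular analytics tool works: the `top_keywords` function, the tiny stop-word list and the sample posts are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools use far larger ones.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of",
              "i", "was", "but", "my", "for", "this", "in"}

def top_keywords(posts, n=5):
    """Return the n most frequent non-stop-word tokens across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase the text and split it into simple word tokens.
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample posts for illustration.
posts = [
    "The vegan patty was great",
    "Vegan burger, but the patty texture is off",
    "I love this vegan option",
]

print(top_keywords(posts, n=2))  # [('vegan', 3), ('patty', 2)]
```

Even on three posts, the most frequent terms surface a conversational driver that a broad theme label might hide.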

For instance, if your restaurant chain has recently placed a meatless option on the menu or is thinking about doing so, keyword identification within social media conversations will show you what works and what doesn’t. And these cut deeper into the heart of the conversation as opposed to broad themes of the meatless food industry or your quick service restaurant (QSR) segment.

Here is a bar chart of the top keywords found in a social media listening analysis of a leading meat-substitute brand, broken out by sentiment.

[Figure: top keywords for a leading meat-substitute brand, by sentiment]

In this way, you can quickly see that topics around veganism are heavily featured, a theme that wasn’t immediately noticeable in the broader network map. Additionally, we can gain insight into taste and texture conversations in keywords such as ground or patty. Dissecting these keywords by sentiment helps us to quickly dive into the areas that could be a concern.

In a nutshell, keywords are one of the quickest routes to a better understanding of your data set. The ideas they represent are hard to attain from traditional text analysis methods and represent the substantial insight that machine learning brings to the table.

Manual Rules

Sometimes getting dialed into the insight that you need or even finding a place to start can be difficult, depending on the topic. Language is not only nuanced but can also carry a lot of overlap. This is where knowing how to implement manual rules can help you make progress.

Unleashing artificial intelligence on a data set with vague search terms is like letting a hyper dog off its leash. You have no idea what it’s going to come back with in its mouth. When you find yourself in that position, Boolean search operators are your friend. This is especially true if you are trying to tease answers from social or media datasets.

For instance, say your client is looking to further align with the sustainability trend and look for alternative packaging ideas to reduce plastic waste. As such, they want to know what’s trending in the media regarding the subject. That’s not an easy topic to take on as there will be excessive amounts of noise if we simply put “plastic alternatives” into our query.

That’s where manual rules help you dial in and get focused results. Not only would “plastic alternatives” return muddied results, but there are also many different ways to express that idea. If we leave them out, then we’re missing insight.

And to really get the most from a topic such as this, your data analytics tools must offer robust support of search indicators and filters. It takes time to build searches such as these, but it’s well worth it to answer the tough questions.

Using our query builder in NetBase Quid, we’re able to craft a search that will scrape news and blog articles for every conceivable way of stating “plastic alternatives.” As you can see below, we’ve also included the “near operator” (~) with the number 10. That indicates that the words inside each quoted grouping must occur within ten words of each other. This helps ensure we pick up every instance, no matter the word order.

[Figure: Boolean operators in the NetBase Quid query builder]
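If the near operator feels abstract, the proximity rule can be approximated in a few lines of Python. This is a rough sketch under our own assumptions, not NetBase Quid’s implementation: the `near` function name, the tokenization and the exact distance semantics are hypothetical.

```python
import re
from itertools import product

def near(text, terms, max_distance=10):
    """True if every term occurs and some set of occurrences,
    in any order, spans at most max_distance word positions."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # For each term, collect every position where it appears.
    positions = [
        [i for i, tok in enumerate(tokens) if tok == term.lower()]
        for term in terms
    ]
    if any(not found for found in positions):
        return False  # at least one term is missing entirely
    # Try every combination of occurrences and check how far apart they are.
    return any(
        max(combo) - min(combo) <= max_distance
        for combo in product(*positions)
    )

# Made-up examples for illustration.
close = "Startups are testing alternatives to single-use plastic packaging"
far = "plastic " + "filler " * 15 + "alternatives"

print(near(close, ["plastic", "alternatives"]))  # True
print(near(far, ["plastic", "alternatives"]))    # False
```

Because only the distance between the words is checked, “alternatives to plastic” and “plastic alternatives” both match, which is the word-order flexibility described above.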

Every data analyst eventually runs into a topic that makes them want to pull their hair out. Manual rules such as these make life easier and are worth their weight in gold once you familiarize yourself with them.

Text Categorization

If you’ve ever seen a word cloud colored by sentiment, then you’ve seen text categorization in action. Text categorization, also commonly known as text classification, is the process of tagging unstructured text into categories to extract meaning from the data and aid in problem-solving.

While text categorization might not sound all that exciting, it’s the natural language processing within your data analytics tools that parses text data, classifies it and makes it usable for things such as spam detection and sentiment analysis.

Here’s a real-life example of a sentiment analysis that several top brands are paying very close attention to in the ongoing chicken wars. All of these emotions are expressing favorability towards one brand or another. If this were your category, you would need to know who’s who. And the intel has to be on point – your competitive intelligence depends on it.

[Figure: sentiment analysis of brands in the chicken wars]

Sentiment is determined by categorizing your data based on polarity. In other words, positive sentiment is ascribed a value while negative sentiment is given another. The thing to ask yourself, though, is how deep do your analytics tools dig? Artificial intelligence can categorize sentiment not only from opinions found in the text but also from sarcasm, context, and even emojis. Make sure your tools are grabbing insight from everything. Not all AI is created equal.
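As a simplified illustration of polarity-based categorization, the Python sketch below assigns +1 to positive words and -1 to negative words and tags each text by the sign of the total. The mini-lexicon and the `classify_sentiment` function are hypothetical; production tools rely on trained language models rather than a hand-written word list.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
POLARITY = {
    "love": 1, "great": 1, "crispy": 1, "best": 1,
    "soggy": -1, "bland": -1, "worst": -1, "hate": -1,
}

def classify_sentiment(text):
    """Tag a text as positive, negative or neutral from summed word polarity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(POLARITY.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("Best chicken sandwich, love the crispy breading"))  # positive
print(classify_sentiment("Soggy bun and bland sauce"))                        # negative
print(classify_sentiment("I ordered a sandwich"))                             # neutral
```

A word list like this has no way of catching sarcasm, context or emojis, which is exactly why it pays to ask how deep your tools dig.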

That said, the more that your tools are capable of categorizing in this way, the more insight you’ll be rewarded with from your data sources. Sentiment classification alone is huge for brands. It can inform your market intelligence, safeguard brand health and give you a deep dive into the voice of the consumer (VoC). And accuracy is absolutely critical here. Sentiment is a lot more than pretty word clouds. If it’s based on sub-par results, then your decision-making strategies could experience a misfire.

Topic Modeling

Topic modeling is how artificial intelligence uses unsupervised learning to detect patterns and cluster word groupings and expressions. It is the methodology of text classification that gives rise to clustering and thematic analysis, which we’ll discuss in more detail later on.

The important thing to remember here is that topic modeling gives structure to large volumes of text. And that means one thing – speed. The point is that you should use topic modeling within your data analytics tools to put a framework across all the unstructured data you can throw at it. That means internal emails and messages, documents, reports, etc.

The thing is, the more data you get comfortable throwing at your AI, the more informed you’ll be. Once you discover the power of analyzing patterns in any text – it’s game over. For example, here are the comparisons you can draw from feeding employee reviews into your AI. You can see the implications at a glance, and it’s a wealth of insight for just a few minutes’ time.

[Figure: positive and negative Glassdoor reviews]

Additionally, topic modeling eliminates the need to manually evaluate customer service interactions. Whether you use chatbots or a call center, if you’ve recorded the interactions for quality assurance, transcribe them to text, format them in a .csv file and upload them into your dashboard. You’ll find insight far faster and free up your staff for more pressing concerns.
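The formatting step can be sketched with Python’s standard `csv` module. The column names (`id`, `channel`, `text`) and the sample transcripts here are assumptions made for illustration; check your own dashboard’s upload requirements for the layout it expects.

```python
import csv
import io

def transcripts_to_csv(transcripts, out):
    """Write one row per interaction to an open file-like object."""
    writer = csv.writer(out)
    writer.writerow(["id", "channel", "text"])  # hypothetical column names
    for record in transcripts:
        # Collapse line breaks and extra spaces so each interaction stays on one row.
        text = " ".join(record["text"].split())
        writer.writerow([record["id"], record["channel"], text])

# Made-up transcripts for illustration.
transcripts = [
    {"id": 1, "channel": "call", "text": "Hi, my order\nnever arrived."},
    {"id": 2, "channel": "chatbot", "text": 'Customer said "thanks", issue resolved.'},
]

# StringIO stands in for a real file such as open("transcripts.csv", "w", newline="").
buffer = io.StringIO()
transcripts_to_csv(transcripts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Letting the `csv` module handle commas and quotation marks inside the transcripts avoids the broken rows that tend to appear when such files are assembled by hand.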

How topic modeling works is highly technical but suffice it to say that once you get hooked on the insights available from any text, you’ll hunt high and low for new data sources to unlock.

Thematic Analysis  

Thematic analysis is the approach you take when you want to discover viewpoints, feelings, attributes or other psychographic insight from your qualitative data sets. And these types of data pools could be call transcripts, customer service chatbot logs, product reviews, surveys, etc.

Thematic analysis can be used to inform consumer, competitive and market intelligence for an understanding of the voice of the consumer (VoC), measuring baseline metrics for brand health or gaining perceptions for product development. The use cases of thematic analysis for brands are a game-changer for positioning and timing. It helps you be as informed as possible on who is thinking what and why they feel that way.

Example questions that thematic analysis can answer include:

  • How do patients feel about your new telemedicine platform?
  • What are the differences in opinion between men and women on grocery delivery?
  • What professions index highly with vegans?

The answers and insight you can derive from a thematic analysis are exceptionally useful. In particular, brands use it extensively to cut through social conversations to get to the heart of their target audience. As such, extensive use of filtering tools can coax out the voices that you need to hear from the most.

[Figure: filters available in NetBase]
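To show what filtering by audience segment can look like in practice, here is a small Python sketch that compares the share of positive posts across segments. It assumes each post already carries a segment label and a sentiment tag; the `gender` and `sentiment` field names and the sample records are made up for illustration.

```python
from collections import defaultdict

def positive_share_by_segment(posts, segment_key):
    """Return the fraction of positive posts for each value of segment_key."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for post in posts:
        segment = post[segment_key]
        totals[segment] += 1
        if post["sentiment"] == "positive":
            positives[segment] += 1
    return {segment: positives[segment] / totals[segment] for segment in totals}

# Made-up, pre-tagged posts about grocery delivery.
posts = [
    {"gender": "female", "sentiment": "positive"},
    {"gender": "female", "sentiment": "negative"},
    {"gender": "male", "sentiment": "positive"},
    {"gender": "male", "sentiment": "positive"},
]

print(positive_share_by_segment(posts, "gender"))  # {'female': 0.5, 'male': 1.0}
```

The same function works for any attribute you can filter on, such as profession or location, by passing a different `segment_key`.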

Additionally, suppose you’re just beginning your exploration phase around a topic. In that case, a tool like the theme discovery feature in NetBase Quid is perfect for doing the heavy lifting for you. Once you’ve created a topic, it will train the AI language models on the subject matter and auto-discover conversational themes. Here’s a network of themes related to the topic of grocery delivery. It’s perfect for gaining quick insight and a bird’s eye view of an unfamiliar topic or category.

[Figure: conversation themes for grocery delivery]

To summarize, thematic analysis is one of the hallmark capabilities of text analysis, especially when combined with data visualization. It’s all about cutting through the data to find the answers to precise questions. Make sure you’re comfortable with your data analytics tools’ capabilities to really drill in, and you’ll up the quality of your analytics game for sure.

Disambiguation

Given the nature of language, disambiguation is crucial to your text analysis success. The ability to achieve quality results depends on two factors:

  • The ability of your artificial intelligence to differentiate between words and phrases with alternate meanings such as light, stick and root.
  • The user’s ability to manually filter, tag or employ advanced search operators to clear away noise and fine-tune results.

Depending on the data analytics platform that you use, lexical ambiguity can be an issue. How big an issue depends on how well your artificial intelligence’s machine learning can differentiate between parts of speech used for alternate meanings. On the flip side of this issue is your tool’s ability to include common misspellings of your target.

Natural language processing is light years ahead of where it was just a couple of years ago, but language is a moving target. New words, idioms, slang and meanings come and go. That said, your journey into text analysis will be an ongoing effort to reduce irrelevant noise. This is especially true with rapidly shifting data sets such as social media applications or contextual media analyses. The power of your data analytics tool plays a monumental role in your starting point.

However, as the human operator, the number one rule of thumb is that ambiguity is the enemy of clarity. Suppose we need competitive intelligence on Spirit Airlines. In that case, we don’t need posts or articles referencing spirit animals, high school pep rallies, liquor or the Spirit of ’76. And from a brand health perspective, if your reservation process is hitting turbulence, you need to act fast and clear away clutter to dial into the problem. Ambiguity is a nightmare during a brand crisis.

[Figure: Spirit Airlines reservations cluster]

Boolean search operators are incredibly effective on the user end to fight against known similarities. Conversely, data visualization helps identify topics carrying similar words that you may not have accounted for. This is the case most of the time when parsing data sets pulled from social media. If there’s a Norwegian metal band named Spirit of Doom, you don’t want their affiliated social commentary or hashtags cluttering your market intelligence. Seeing them show up in a topic cluster lets you know to filter them out.
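The Spirit Airlines example boils down to a Boolean rule: spirit AND (airline context) AND NOT (known noise). Here is a minimal Python sketch of that rule. The term lists and the `is_relevant` function are hypothetical and would need tuning against real data.

```python
import re

# Hypothetical context terms that signal the airline, and noise terms to exclude.
INCLUDE_ANY = {"airline", "airlines", "flight", "flights", "reservation", "boarding"}
EXCLUDE_ANY = {"animal", "liquor", "pep", "doom"}

def is_relevant(post):
    """spirit AND (any airline term) AND NOT (any noise term)."""
    tokens = set(re.findall(r"[a-z0-9']+", post.lower()))
    return (
        "spirit" in tokens
        and bool(tokens & INCLUDE_ANY)
        and not (tokens & EXCLUDE_ANY)
    )

print(is_relevant("Spirit Airlines lost my reservation again"))         # True
print(is_relevant("My spirit animal is a sloth"))                       # False
print(is_relevant("New Spirit of Doom album drops on my flight home"))  # False
```

The third post shows why the exclusion list matters: it mentions a flight, so the context terms alone would have let it through.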

Ambiguity is rife within text analysis, and your primary weapons are awareness and top-shelf artificial intelligence. After that, train your AI to filter it out. Clarity is king. 

Clustering

In text analysis data visualization, clustering represents the framework which artificial intelligence gives to unstructured data. This process is an example of unsupervised learning as the AI groups the data by similarity and determines outliers.

Essentially, machine learning takes the data you’ve fed it and separates it into vectors. For example, if you were to upload text files of several books, the AI would begin vectorizing the data into parts – such as author name, titles, chapter headings, bios, keywords, etc. Without going too far down the rabbit hole, machine learning then groups similar vectors into clusters.
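For the curious, here is a toy Python sketch of grouping similar vectors into clusters. It assumes a simple bag-of-words vector (word counts), cosine similarity and a greedy single-pass grouping with an arbitrary threshold of 0.3. These are our own simplifications for illustration, not the method any particular product uses.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: a mapping from each word to its count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 when nothing is shared)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(texts, threshold=0.3):
    """Greedy grouping: join the first cluster whose seed text is similar
    enough, otherwise start a new cluster. Returns lists of text indices."""
    vectors = [vectorize(t) for t in texts]
    clusters = []  # the first index in each list is that cluster's seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Made-up headlines for illustration.
docs = [
    "grocery delivery fees are rising",
    "grocery delivery apps add new fees",
    "airline reservation system outage",
    "reservation outage hits airline customers",
    "e-commerce ad spend forecast",
]

print(cluster(docs))  # [[0, 1], [2, 3], [4]]
```

The last headline shares no words with the others, so it ends up alone in its own cluster, which is the outlier behavior described above.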

Visualized data explains it more easily, which is why it’s such a valuable tool in understanding large data sets. For example, here’s a visualized network map of media articles about market research grouped by thematic topic clusters in NetBase Quid.

[Figure: network map showing visualized data]

Each article that matches our search indicators is represented by a node. Nodes with similar language and themes are grouped together in colorized clusters. As such, relative volume between clusters is visually evident. Centrality in the network means the articles towards the center most tightly reflect our query. Interconnectivity and proximity between clusters indicate thematically similar articles contained within. Outliers are distanced from other groupings, such as the e-commerce ad spend cluster seen above.

Giving structure to your text data in this way provides a platform from which to take a deeper dive into your text analysis. Additionally, it allows you to see what is unnecessary to your objectives.

Suppose you’re just approaching a topic for the first time. In that case, a few simple search terms can help you visualize what type of similar content you are likely to encounter that you need to watch out for. Visualizing a data set from social media can show you themes, niches, hashtags and similarly named brands you may be unaware of. As such, you can then craft a well-built search query that will filter out that noise, resulting in clusters that offer clarity to your text analysis. After all, clean data sets lead to well-informed decision-makers.

We often talk about speed to insight with data analytics and the benefits it gives to brands. Understanding your tools’ capabilities will heighten your skill and help you find what you’re looking for fast. If your data analytics tools are slowing you down, reach out for a demo, and we’ll get you back on track for text analysis success!
