3 ways to experiment with text analytics

Sift through your unstructured text with cloud-native products, machine learning tools, or specialized text analytics programs.


Text analytics, sometimes called text data mining, is the process of uncovering insightful and actionable information, trends, or patterns from text. The extracted, structured data is far easier to work with than the original text, making it simpler to assess the information’s quality and usefulness. Developers and data scientists can then use the mined data in downstream data visualizations, analytics, machine learning, and applications.

Text analytics aims to identify facts, relationships, sentiments, or other contextual information. The types of information extracted often start with tagging entities such as people’s names, places, and products. It can advance to assigning topics, determining categories, and discovering sentiments. When measures such as currencies, dates, or quantities are extracted, establishing their relationship to other entities (and any qualifiers) is a key text analytics capability.
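As a minimal illustration of entity tagging, here is a sketch using the open source spaCy library; the sample sentence is made up, and spaCy is just one of many NLP toolkits that can do this.

```python
# Minimal entity-tagging sketch with the open source spaCy library.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical sentence for illustration.
text = "Acme Corp. opened a new office in Austin on March 3 and hired 120 people."
doc = nlp(text)

# Each entity carries the matched text and a label such as ORG, GPE, DATE, or CARDINAL.
for ent in doc.ents:
    print(ent.text, ent.label_)
```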

Extracting data from documents versus form fields

The hardest challenges in text analytics are processing enterprise repositories and large documents such as aggregated news from websites, corporate SEC filings, electronic health records, and other unstructured or semistructured documents. Parsing documents poses unique challenges because a document’s size and structure often dictate the choice of domain-specific preprocessing rules and natural language processing (NLP) algorithms. For example, categorizing a 1,000-word blog post is far easier than ranking all of the topics found in a collection of books. Larger documents also often require validating the extracted information against its context; for instance, a patient’s own medical conditions should be categorized separately from the conditions listed in their family history.
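To illustrate why context matters, here is a hypothetical sketch that splits a clinical note into named sections before any extraction runs, so that conditions under a family history heading are not attributed to the patient. The headings and regular expression are assumptions for illustration, not a standard.

```python
# Hypothetical sketch: split a note into sections before extraction so that
# text under "Family History" is processed separately from the patient's own history.
import re

note = """History of Present Illness: Patient reports chest pain and hypertension.
Family History: Father had diabetes; mother had breast cancer."""

# Assumed section headings; real documents need domain-specific rules.
sections = re.split(r"\n(?=(?:History of Present Illness|Family History):)", note.strip())

for section in sections:
    heading, _, body = section.partition(":")
    # Downstream extraction (entities, conditions) would run per section so the
    # context of each finding is preserved.
    print(heading.strip(), "->", body.strip())
```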

But what if you want to perform the potentially simpler task of extracting information from a form field or other short text snippet? Consider these possible scenarios:

  • Quantify feedback from an employee survey’s open-ended responses
  • Process social media posts for their sentiments about brands or products
  • Categorize different types of chatbot interactions
  • Assign topics to user stories on an agile backlog
  • Route service desk requests based on the problem details
  • Parse information submitted to marketing on your website

These problems call for simpler algorithms than document parsing because the text fields are identifiable, short, and often carry a specific type of information.
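For fields like these, a small supervised model is often enough. The sketch below routes hypothetical service desk requests with scikit-learn; the queues and sample texts are invented for illustration, and a real system would need a much larger labeled sample.

```python
# Sketch: route short service desk requests to a queue with a simple
# scikit-learn pipeline. The labels and sample texts are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

requests = [
    "Cannot connect to the VPN from home",
    "Need access to the finance shared drive",
    "Laptop screen is flickering",
    "Password reset for my email account",
]
queues = ["network", "access", "hardware", "access"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(requests, queues)

# Predict the queue for a new, unseen request.
print(model.predict(["I forgot my password and can't log in"]))
```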

Let’s say you need to use unstructured field data in an application or are asked to include insights extracted from text in a data visualization. Text analytics is an important first step, and agile data science teams often use spikes to conduct this discovery work. The team needs tools, skills, and methodologies to perform text analytics. Here are three different approaches.

1. Use a public cloud’s NLP and cognitive services

The major public clouds offer natural language processing and other cognitive services, so teams that already work in these environments and are comfortable calling their APIs should research those options first.
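As one illustrative example (not a recommendation of a particular provider), AWS Comprehend exposes entity and sentiment detection through boto3; Azure and Google Cloud offer comparable natural language APIs.

```python
# Sketch: calling a public cloud NLP service (AWS Comprehend via boto3).
# Assumes AWS credentials are already configured; the sample text is made up.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "The checkout page keeps timing out and support has been unhelpful."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
entities = comprehend.detect_entities(Text=text, LanguageCode="en")

print(sentiment["Sentiment"])                     # e.g., NEGATIVE
print([e["Text"] for e in entities["Entities"]])  # detected entities
```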

2. Use text analytics tools in data integration and machine learning platforms

If your organization has invested in data integration, machine learning, or analytics platforms, it’s likely that at least one of them has text analytics and NLP capabilities. Using these platforms may be an easier and faster way to perform lightweight text analytics than coding against APIs or in data science notebooks. Here are some examples:

Data science platforms such as RapidMiner, Knime, and Dataiku offer text mining functions natively, through plug-ins, or via integrations with public cloud services.
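If your team does end up coding in a notebook instead, a lightweight topic pass can still be just a few lines. The sketch below uses scikit-learn’s TF-IDF features factored with NMF on invented sample texts; it illustrates the notebook alternative, not what any particular platform does internally.

```python
# Sketch: lightweight topic discovery in a notebook with scikit-learn
# (TF-IDF features factored with NMF). Sample texts are hypothetical.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Invoice was charged twice and billing support has not responded",
    "The mobile app crashes whenever I upload a photo",
    "Refund for duplicate billing charge still not processed",
    "App keeps crashing on the photo upload screen after the update",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
nmf.fit(tfidf)

# Print the top terms for each discovered topic.
terms = vectorizer.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[-4:][::-1]]
    print(f"Topic {i}: {', '.join(top_terms)}")
```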

3. Use specialized text analytics tools

If coding on public cloud platforms is too complex, and your organization does not already have an analytics, data science, or machine learning platform with text mining capabilities, then you’re probably seeking a third option: specialized text analytics tools. Take a look at Keatext, Lexalytics, MeaningCloud, MonkeyLearn, NetOwl, Provalis Research, Rosette Text Analytics, and other platforms that offer text analytics capabilities.

Text analytics is also common in customer experience, marketing automation, market research, social listening, chatbot, and other platforms that capture qualitative information around customers and sales prospects.

It’s no surprise that many tools have text analytics capabilities. Some offer simple on-ramps with prebuilt models based on standardized entities, categories, and topics, whereas others enable robust model building. The platforms also differ in focus, with some targeting specific industries, document types, integration requirements, or technology use cases.

If you’re just getting started with text analytics, a few best practices apply. Begin any data and analytics discovery exercise by defining the questions and target outcomes that could deliver business value. From there, consider the overall complexity of the documents, content, and text fields that require processing, and examine the details around the target entities, topics, and semantics. Understanding the problem’s complexity helps determine whether an agile spike using a lightweight approach is viable or whether a more extensive agile proof of concept, developed with text mining experts, is needed.

Most importantly, recognize that text analytics and natural language processing are forms of machine learning. Arriving at robust solutions requires experimenting with different algorithms, improving models, adding new data sources, and validating the quality of the results. For organizations trying to improve customer experiences, text analytics is an important capability to develop.
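Validating quality deserves the same rigor as any machine learning work. A simple way to start is comparing candidate models with cross-validation on a labeled sample, as in this sketch (the data and model choices are assumptions for illustration):

```python
# Sketch: validate and compare candidate text models with cross-validation.
# Labeled examples are hypothetical; in practice use a representative sample.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible support", "love the new release",
         "refund took weeks", "works as expected", "checkout keeps failing"]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

# Compare two candidate algorithms on the same labeled sample.
for name, clf in [("logreg", LogisticRegression(max_iter=1000)), ("nb", MultinomialNB())]:
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, texts, labels, cv=3)
    print(name, scores.mean())
```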

Copyright © 2021 IDG Communications, Inc.