Text analytics, taxonomy and auto-classification terms

Authority list

A set of preferred or authorized terms. For example, there may be an authority list for sports team names to encourage the use of formal names instead of team nicknames.
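As a minimal sketch of how an authority list might be applied, the lookup below maps nicknames to a preferred form. The team entries, the AUTHORITY_LIST name and the to_preferred helper are illustrative assumptions, not part of any standard.

```python
from typing import Optional

# Hypothetical authority list: each preferred (formal) name maps to
# the nicknames it should replace. The entries are illustrative only.
AUTHORITY_LIST = {
    "New York Yankees": ["Yankees", "Yanks", "Bronx Bombers"],
    "Manchester United": ["Man United", "Man Utd", "Red Devils"],
}

# Invert the list so any variant can be looked up case-insensitively.
_VARIANT_TO_PREFERRED = {
    variant.lower(): preferred
    for preferred, variants in AUTHORITY_LIST.items()
    for variant in [preferred, *variants]
}


def to_preferred(name: str) -> Optional[str]:
    """Return the authorized form of a name, or None if it is not listed."""
    return _VARIANT_TO_PREFERRED.get(name.strip().lower())


print(to_preferred("Yanks"))    # New York Yankees
print(to_preferred("Red Sox"))  # None
```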

Controlled vocabulary

Some people use controlled vocabulary as a generic term for classification models. Others define it more narrowly as a predefined set of terms used to classify content, where content can be labeled only with terms from that set.
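Under the narrower definition, enforcing a controlled vocabulary amounts to rejecting any label that is not in the predefined set. The sketch below assumes a made-up vocabulary for a news archive; the names are hypothetical.

```python
# Hypothetical controlled vocabulary for a news archive.
CONTROLLED_VOCABULARY = {"politics", "sports", "technology", "health"}


def validate_labels(labels):
    """Split proposed labels into accepted and rejected lists."""
    accepted = [label for label in labels if label in CONTROLLED_VOCABULARY]
    rejected = [label for label in labels if label not in CONTROLLED_VOCABULARY]
    return accepted, rejected


accepted, rejected = validate_labels(["sports", "football", "health"])
print(accepted)  # ['sports', 'health']
print(rejected)  # ['football']
```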

Content analytics

Content analytics moves beyond text alone to include video, audio and images.

Data analytics

Examining raw numerical data to find useful information. The process often includes cleaning and normalizing the data to get it into a usable format.
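The cleaning and normalizing step can take many forms. One simple possibility, shown here as an assumption rather than a prescribed method, is to drop entries that cannot be parsed as numbers and rescale the rest to the range 0 to 1 (min-max normalization).

```python
def clean_and_normalize(raw_values):
    """Drop unparseable entries, then rescale the rest to the range 0..1."""
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # skip None and non-numeric strings
    if not cleaned:
        return []
    low, high = min(cleaned), max(cleaned)
    if high == low:
        # All values are identical, so there is no range to scale by.
        return [0.0 for _ in cleaned]
    return [(v - low) / (high - low) for v in cleaned]


print(clean_and_normalize(["10", " 20 ", None, "n/a", 30]))  # [0.0, 0.5, 1.0]
```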


Lemmatization

Lemmatization is similar to word stemming but uses a more rigorous procedure, typically a vocabulary lookup combined with morphological analysis, to reduce each word to its dictionary form (its lemma) with greater accuracy.
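To illustrate the lookup-based idea behind lemmatization, here is a toy sketch. The LEMMAS table is a tiny hypothetical dictionary; a real lemmatizer would draw on a full lexicon and usually on part-of-speech information as well.

```python
# Tiny hypothetical lemma dictionary, for illustration only.
LEMMAS = {
    "flew": "fly",
    "flies": "fly",
    "flying": "fly",
    "better": "good",
    "mice": "mouse",
    "was": "be",
}


def lemmatize(word):
    """Return the dictionary form of a word, or the word itself if unknown."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


print([lemmatize(w) for w in ["Mice", "flew", "better", "cake"]])
# ['mouse', 'fly', 'good', 'cake']
```

Irregular forms such as "flew" and "better" are the cases where a dictionary lookup does better than suffix-stripping heuristics.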


Model

A model is a generic term for any group of terms used to define a domain, no matter how it is arranged. People often combine it with the word classification to form “classification model.”

Named entity extraction

Named entity extraction finds and labels things such as people, places, organizations, companies, dates and amounts of money in text.
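Production systems usually rely on trained models for this task. As a rough sketch of the idea only, the example below combines two regular expressions (for money and ISO-style dates) with a small lookup list. The patterns, the GAZETTEER entries and the labels are all assumptions made for illustration.

```python
import re

# Illustrative patterns only; they cover a narrow set of formats.
PATTERNS = {
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}
# Hypothetical list of known names and their entity types.
GAZETTEER = {"London": "PLACE", "Acme Corp": "ORGANIZATION"}


def extract_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]


text = "Acme Corp paid $1,250.50 for its London office on 2024-03-15."
for entity, label in extract_entities(text):
    print(label, entity)
```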


Ontology

At first glance, an ontology is similar to a taxonomy because it has a hierarchy of terms. However, in an ontology, those terms are related to one another and the relationships have specific names and definitions.
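One common way to picture named relationships is as subject, relation, object triples. The sketch below is a simplified illustration; the relation names and facts are invented for the example, and real ontologies also define what each relation means.

```python
# Hypothetical triples: (subject, named relation, object).
TRIPLES = [
    ("lion", "is_a", "feline"),
    ("feline", "is_a", "mammal"),
    ("lion", "eats", "zebra"),
    ("lion", "lives_in", "savanna"),
]


def related(subject, relation):
    """Return every object linked to the subject by the named relation."""
    return [obj for s, r, obj in TRIPLES if s == subject and r == relation]


print(related("lion", "is_a"))  # ['feline']
print(related("lion", "eats"))  # ['zebra']
```

The "is_a" triples alone would form a taxonomy-like hierarchy; the other named relations are what the hierarchy by itself cannot express.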

Part-of-speech tagging

Using software to label each word with its part of speech, such as noun, verb, adjective or adverb.

Predictive analytics

A branch of data analytics concerned with predicting future conditions from current or past conditions.
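Predictive methods range widely in sophistication. As one of the simplest possible illustrations, the sketch below fits a least-squares straight line to past values and extends it one step forward. The monthly_sales figures are made up, and a linear trend is an assumption that will not suit most real data.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(ys, steps_ahead=1):
    """Extend the fitted line past the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


monthly_sales = [100, 110, 120, 130]  # invented figures
print(forecast(monthly_sales))  # 140.0
```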

Relationship extraction

Relationship extraction identifies how things are related. Two examples would be “Bakers make cakes” and “John lives in London.”
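The two example sentences can be handled by a very small pattern-based sketch. The patterns and relation names below are assumptions chosen to fit these examples; real systems cope with far more varied phrasing.

```python
import re

# Each hypothetical pattern captures a subject and an object around a verb phrase.
RELATION_PATTERNS = {
    "makes": re.compile(r"^(\w+) makes? (\w+)\.?$"),
    "lives_in": re.compile(r"^(\w+) lives? in (\w+)\.?$"),
}


def extract_relation(sentence):
    """Return (subject, relation, object), or None if no pattern matches."""
    for relation, pattern in RELATION_PATTERNS.items():
        match = pattern.match(sentence.strip())
        if match:
            return (match.group(1), relation, match.group(2))
    return None


print(extract_relation("Bakers make cakes"))      # ('Bakers', 'makes', 'cakes')
print(extract_relation("John lives in London."))  # ('John', 'lives_in', 'London')
```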


Schema

A schema is a structured framework or plan, so “classification schema” is a generic term for any set of terms used to classify content.

Sentiment analysis

Text analytics used to determine the mood of text as positive, negative or neutral.
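A simple way to see the idea is a word-list approach: count positive and negative words and compare. The word lists below are small hypothetical samples, and this sketch ignores negation, sarcasm and context, which practical tools have to handle.

```python
import re

# Small hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "poor", "hate", "sad"}


def sentiment(text):
    """Classify text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this great phone"))         # positive
print(sentiment("The service was terrible"))        # negative
print(sentiment("The package arrived on Tuesday"))  # neutral
```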

Synonym ring

A synonym ring is a list of alternate ways of referring to the same thing, with each term treated as equivalent. For example, the term Affordable Care Act will also have a listing for Obamacare, since the law is known by both names.
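In search, a synonym ring is often used to expand a query so that any one term retrieves content labeled with the others. The sketch below shows that idea with hypothetical rings; the SYNONYM_RINGS data and the expand_query name are assumptions.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"affordable care act", "obamacare", "aca"},
    {"car", "automobile", "auto"},
]


def expand_query(term):
    """Return every equivalent term, sorted, including the term itself."""
    key = term.lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return sorted(ring)
    return [key]


print(expand_query("Obamacare"))  # ['aca', 'affordable care act', 'obamacare']
print(expand_query("bicycle"))    # ['bicycle']
```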


Taxonomy

Taxonomy is a way of describing a particular area of knowledge in a simple hierarchy. Most people were probably first exposed to taxonomies as schoolchildren, when they learned the Dewey Decimal System for organizing library books or how animals and plants are sorted in the scientific world: Kingdom, Phylum, Class, Order, Family, Genus, Species, where a lion is a feline (Family), which is a mammal (Class), which is an animal (Kingdom). In such a hierarchical classification, each level helps define the one above it, as well as the one below it.
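The lion example can be sketched as a simple parent lookup, where each entry records its rank and the level above it. The TAXONOMY structure and the lineage helper are illustrative names; the scientific names are the standard ones for the lion.

```python
# Each entry maps a name to (rank, parent name); the top level has no parent.
TAXONOMY = {
    "Panthera leo": ("Species", "Panthera"),
    "Panthera": ("Genus", "Felidae"),
    "Felidae": ("Family", "Carnivora"),
    "Carnivora": ("Order", "Mammalia"),
    "Mammalia": ("Class", "Chordata"),
    "Chordata": ("Phylum", "Animalia"),
    "Animalia": ("Kingdom", None),
}


def lineage(name):
    """Walk up the hierarchy from a name to the top level."""
    path = []
    while name is not None:
        rank, parent = TAXONOMY[name]
        path.append((rank, name))
        name = parent
    return path


for rank, name in lineage("Panthera leo"):
    print(f"{rank}: {name}")
```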

Text analysis

Seth Grimes makes a distinction between text analytics and text analysis, saying, “So we have text analytics on the one hand — text as data, fueling quantitative methods communicate business-required insights — and text analysis on the other, techniques that characterize and describe a text itself.”

Text analytics

Text analytics extracts high-quality information from unstructured text to distill its meaning. It goes beyond counting words to extract meaning and give some context to that meaning. It can often tell you why something has occurred, whereas data analytics can tell you what has occurred.

Text mining

Text mining and text analytics are sometimes used interchangeably, but some experts draw an important distinction. In that view, text mining treats text as words and counts the number of words within documents and the number of kinds of words within kinds of documents.
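The counting described here can be sketched with the standard library. The two sample documents are invented, and the tokenizer is a deliberately crude assumption (lowercase runs of letters).

```python
import re
from collections import Counter


def word_counts(document):
    """Count how often each word occurs in one document."""
    return Counter(re.findall(r"[a-z]+", document.lower()))


# Invented sample documents.
docs = {"a": "The cat sat. The cat ran.", "b": "The dog sat."}
counts = {name: word_counts(text) for name, text in docs.items()}

print(counts["a"]["cat"])          # 2 occurrences of "cat" in document a
print(sum(counts["a"].values()))   # 6 words in total
print(len(counts["a"]))            # 4 distinct words
```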

Word sense disambiguation

Word sense disambiguation is the ability to determine which meaning of a word with multiple definitions is intended, usually from context. An example is the word “well,” which can mean a source of water or being in good health.
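One classic family of approaches compares the words around the ambiguous term with the words in each candidate definition and picks the sense with the most overlap. The sketch below is a much simplified version of that idea; the two sense definitions and the stopword list are written for this example and are not taken from any dictionary.

```python
import re

# Hypothetical sense definitions for the word "well".
SENSES = {
    "water source": "a deep hole dug in the ground to reach water or oil",
    "good health": "in good health and not sick or ill",
}
STOPWORDS = {"a", "an", "the", "in", "to", "or", "and", "not", "of", "is", "was"}


def tokens(text):
    """Lowercase content words of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(sentence):
    """Pick the sense whose definition shares the most words with the sentence.

    Ties go to the sense listed first in SENSES.
    """
    context = tokens(sentence)
    return max(SENSES, key=lambda sense: len(context & tokens(SENSES[sense])))


print(disambiguate("They dug a well to reach water"))
# water source
print(disambiguate("She was sick last week but is well and in good health now"))
# good health
```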

Word stemming

Word stemming reduces a word to its root form by applying heuristics; for example, “flying” becomes “fly.” A search that uses a word stemmer will then match both forms in text.
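As a minimal sketch of the heuristic approach, the function below strips a few common English suffixes. The suffix list and the minimum stem length are assumptions chosen for illustration; established stemmers use many more rules.

```python
# Suffixes are tried in this order; longer ones come first.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(stem("flying"))                 # fly
print(stem("flying") == stem("fly"))  # True, so both forms match
print(stem("running"))                # runn
```

The last line shows the cost of heuristics: "running" becomes "runn" rather than "run", which is the kind of error lemmatization aims to avoid.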