Controlled vocabulary

Some people use the term controlled vocabulary as a generic term for classification models. Others use it to mean a predefined set of terms for classifying content, where content can be labeled only with terms from that vocabulary.
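A minimal sketch of the second meaning, assuming a controlled vocabulary can be modeled as a plain set of allowed terms. The vocabulary contents and the `validate_labels` helper are hypothetical; the point is that labels outside the set are rejected instead of being added.

```python
# Hypothetical controlled vocabulary; the terms are illustrative only.
CONTROLLED_VOCABULARY = {"sports", "politics", "business", "technology"}


def validate_labels(labels):
    """Split proposed labels into accepted and rejected terms.

    Content may only be labeled with terms from the controlled vocabulary,
    so anything outside the set is rejected rather than silently accepted.
    """
    accepted = [label for label in labels if label in CONTROLLED_VOCABULARY]
    rejected = [label for label in labels if label not in CONTROLLED_VOCABULARY]
    return accepted, rejected


if __name__ == "__main__":
    print(validate_labels(["sports", "football", "business"]))
```

Here "football" is rejected because it is not one of the predefined terms, even though it is a reasonable description of sports content.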

Authority list


A set of preferred or authorized terms. For example, there may be an authority list for sports team names to encourage the use of formal names instead of team nicknames.
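A minimal sketch of an authority list, assuming it is stored as a mapping from each known variant (including nicknames) to its single authorized form. The team names chosen and the `authorized_name` helper are hypothetical examples.

```python
# Hypothetical authority list: each known variant maps to the authorized name.
AUTHORITY_LIST = {
    "lions": "Detroit Lions",
    "detroit lions": "Detroit Lions",
    "niners": "San Francisco 49ers",
    "49ers": "San Francisco 49ers",
    "san francisco 49ers": "San Francisco 49ers",
}


def authorized_name(name):
    """Return the authorized form of a team name, or None if it is not listed."""
    # Normalize case and surrounding whitespace before the lookup.
    return AUTHORITY_LIST.get(name.strip().lower())


if __name__ == "__main__":
    print(authorized_name("Niners"))
```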

Content analytics

Analytics that moves beyond text alone to examine video, audio, and images as well.

Data analytics

Examining raw numerical data to find useful information in it. The process often includes cleaning and normalizing the data to get it into a digestible format.
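One way the cleaning and normalizing step might look, as a minimal sketch: it assumes "cleaning" means dropping missing or unparseable entries and "normalizing" means min-max scaling to the range 0 to 1. Both choices, and the `clean_and_normalize` name, are assumptions; real pipelines vary widely.

```python
def clean_and_normalize(raw_values):
    """Drop missing or non-numeric entries, then min-max scale the rest to [0, 1]."""
    cleaned = []
    for value in raw_values:
        if value is None:
            continue  # skip missing values
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # skip unparseable entries such as "n/a"
    if not cleaned:
        return []
    low, high = min(cleaned), max(cleaned)
    if high == low:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in cleaned]
    return [(v - low) / (high - low) for v in cleaned]


if __name__ == "__main__":
    print(clean_and_normalize(["10", None, "n/a", 20, 30.0]))
```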

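Lemmatization

Lemmatization is similar to word stemming but uses a more rigorous procedure, typically a dictionary lookup combined with analysis of the word's form, to reduce each word to its dictionary form (its lemma). It therefore finds base forms with greater accuracy; for example, it can map better to good, which no suffix-stripping rule could do.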
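A minimal sketch of the dictionary side of lemmatization, assuming a tiny hand-written lemma table stands in for the full lexicon and morphological analysis a real lemmatizer would use. The table contents and the `lemmatize` helper are hypothetical.

```python
# Hypothetical, tiny lemma dictionary. Real lemmatizers use a full lexicon
# plus morphological analysis and usually part-of-speech information.
LEMMAS = {
    "better": "good",
    "was": "be",
    "were": "be",
    "mice": "mouse",
    "flying": "fly",
    "studies": "study",
}


def lemmatize(word):
    """Return the dictionary form (lemma) of a word, or the word itself if unknown."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


if __name__ == "__main__":
    print([lemmatize(w) for w in ["Better", "mice", "table"]])
```

The irregular pairs (better to good, mice to mouse) are exactly the cases a lookup handles and a suffix-stripping stemmer cannot.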

Model

Model is a generic term for any group of terms used to define a domain, no matter how it is arranged. People often combine it with the word classification to form “classification model.”

Named entity extraction

Named entity extraction finds and labels mentions of things such as people, places, organizations (including companies), dates, and monetary amounts in text.
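A minimal rule-based sketch, assuming a small gazetteer (a list of known names) plus regular expressions for ISO-style dates and dollar amounts. The gazetteer entries, the patterns, and `extract_entities` are hypothetical; production systems more often use trained statistical models.

```python
import re

# Hypothetical gazetteer of known names and their entity types.
GAZETTEER = {
    "John": "PERSON",
    "London": "PLACE",
    "Acme Corp": "ORGANIZATION",
}
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")          # e.g. 2023-05-01
MONEY_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")   # e.g. $1,200.50


def extract_entities(text):
    """Return (entity_text, label) pairs found by gazetteer lookup and regexes."""
    entities = []
    for name, label in GAZETTEER.items():
        # Word boundaries stop "John" from matching inside "Johnson".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            entities.append((name, label))
    entities.extend((m.group(), "DATE") for m in DATE_PATTERN.finditer(text))
    entities.extend((m.group(), "MONEY") for m in MONEY_PATTERN.finditer(text))
    return entities


if __name__ == "__main__":
    print(extract_entities("John paid $1,200.50 to Acme Corp in London on 2023-05-01."))
```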

Ontology

At first glance, an ontology is similar to a taxonomy because it has a hierarchy of terms. In an ontology, however, the terms are related to one another through relationships that have specific names and definitions.
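A minimal sketch of what distinguishes an ontology, assuming it is stored as (subject, relation, object) triples in which every link carries a named, defined relationship. The triples, relation names, and the `related` helper are all hypothetical.

```python
# Hypothetical mini-ontology stored as (subject, relation, object) triples.
# Unlike a plain taxonomy, every link carries a named relationship.
TRIPLES = [
    ("baker", "is_a", "person"),
    ("baker", "makes", "cake"),
    ("cake", "is_a", "food"),
    ("John", "lives_in", "London"),
    ("London", "is_a", "city"),
]

# Each relationship name has its own definition.
RELATION_DEFINITIONS = {
    "is_a": "subject is a narrower kind or an instance of object",
    "makes": "subject produces object",
    "lives_in": "subject resides in object",
}


def related(subject, relation):
    """Return all objects linked to `subject` by the named `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


if __name__ == "__main__":
    print(related("baker", "makes"), RELATION_DEFINITIONS["makes"])
```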

Part-of-speech tagging

Using software to label each word in a text with its part of speech, such as noun, verb, adjective, or adverb.
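A deliberately simple sketch, assuming a lexicon lookup with suffix heuristics as a fallback and NOUN as the default tag. The lexicon, rules, and `tag` function are hypothetical; real taggers are usually trained models that also consider the surrounding words.

```python
# Hypothetical lexicon and suffix rules for this sketch only.
LEXICON = {
    "the": "DETERMINER",
    "a": "DETERMINER",
    "dog": "NOUN",
    "runs": "VERB",
    "quickly": "ADVERB",
    "happy": "ADJECTIVE",
}
SUFFIX_RULES = [("ly", "ADVERB"), ("ing", "VERB"), ("ed", "VERB"), ("ous", "ADJECTIVE")]


def tag(sentence):
    """Label each whitespace-separated word with a part of speech."""
    tagged = []
    for word in sentence.lower().split():
        pos = LEXICON.get(word)
        if pos is None:
            # Fall back to suffix heuristics, then default to NOUN.
            pos = next((p for suffix, p in SUFFIX_RULES if word.endswith(suffix)), "NOUN")
        tagged.append((word, pos))
    return tagged


if __name__ == "__main__":
    print(tag("The happy dog runs quickly"))
```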

Predictive analytics

A branch of data analytics concerned with predicting future conditions from current or past conditions.
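Predictive analytics covers many models; as one of the simplest, here is a sketch that fits a straight-line trend to past observations by ordinary least squares and extrapolates it forward. It assumes evenly spaced periods numbered 0, 1, 2, and so on; `fit_linear_trend` and `forecast` are hypothetical names.

```python
def fit_linear_trend(values):
    """Fit y = slope * t + intercept by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend `steps_ahead` periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return slope * (len(values) - 1 + steps_ahead) + intercept


if __name__ == "__main__":
    print(forecast([10, 12, 14, 16]))
```

For the series 10, 12, 14, 16 the fitted slope is 2 and the intercept 10, so the next period is predicted at 18.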

Relationship extraction

Relationship extraction identifies how things mentioned in text are related to one another. Two examples are “Bakers make cakes” and “John lives in London.”
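A minimal pattern-based sketch that recovers (subject, relation, object) triples from the two example sentences. It assumes one-word subjects and objects and a fixed regular expression per relation; the patterns and `extract_relation` are hypothetical, and real systems typically rely on parsing or trained models.

```python
import re

# Hypothetical relation patterns, one per relationship type.
PATTERNS = {
    "makes": re.compile(r"^(?P<subject>[A-Z]\w*) makes? (?P<object>\w+)\.?$"),
    "lives_in": re.compile(r"^(?P<subject>[A-Z]\w*) lives in (?P<object>[A-Z]\w*)\.?$"),
}


def extract_relation(sentence):
    """Return (subject, relation, object) for the first matching pattern, else None."""
    for relation, pattern in PATTERNS.items():
        match = pattern.match(sentence.strip())
        if match:
            return match.group("subject"), relation, match.group("object")
    return None


if __name__ == "__main__":
    print(extract_relation("Bakers make cakes"))
    print(extract_relation("John lives in London."))
```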

Schema

A structured framework or plan. “Classification schema” is therefore a generic term that refers to any set of terms used to classify content.

Sentiment analysis

The use of text analytics to determine whether the mood of a text is positive, negative, or neutral.
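A minimal lexicon-based sketch, assuming sentiment can be approximated by counting words from hand-picked positive and negative lists. The word lists and `sentiment` function are hypothetical; this approach ignores negation ("not good") and sarcasm, which trained models handle better.

```python
# Hypothetical word lists; real systems use much larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def sentiment(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment("I love this great phone!"))
```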

Synonym ring

A synonym ring links a term to a list of alternate ways of referring to it, all of which are treated as equivalent. For example, the term Affordable Care Act would also list Obamacare, since the law is known by both names.
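A minimal sketch of how a synonym ring can be used in search, assuming each ring is a set of equivalent terms and a query is expanded to every member of its ring. The rings and `expand_query` are hypothetical examples.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"Affordable Care Act", "Obamacare", "ACA"},
    {"car", "automobile", "motor vehicle"},
]


def expand_query(term):
    """Return every term in the same ring as `term`, or just `term` if it has no ring."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


if __name__ == "__main__":
    print(sorted(expand_query("Obamacare")))
```

Because the ring has no preferred term, a search for any member retrieves content labeled with any other member; an authority list, by contrast, maps variants to one authorized form.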

Taxonomy

Taxonomy is a way of describing a particular area of knowledge in a simple hierarchy. Most people were probably first exposed to taxonomies as schoolchildren, when they learned the Dewey Decimal System for organizing library books or learned how animals and plants are sorted in the scientific world: Kingdom, Phylum, Class, Order, Family, Genus, Species. For example, a lion is a feline (Family), which is a mammal (Class), which is an animal (Kingdom). In such a hierarchical classification, each level helps define both the level above it and the level below it.

Some people describe a taxonomy as metadata because, when a taxonomy is used to auto-classify content, the terms assigned to an item serve as a brief description of that item.

Taxonomies are only one type of classification system. Ontologies, thesauri, and authority lists are also classification systems, and people may use these terms interchangeably. All provide a consistent language to describe the concepts in a domain and the relationships among them. The other entries here define related classification terms.
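A minimal sketch of a taxonomy as a strict hierarchy, assuming each term stores exactly one parent (a child-to-parent map) so the path to the root can be walked upward. The lineage below follows the biological ranks for the lion; the `PARENT` map and `ancestors` helper are hypothetical names.

```python
# Child -> parent links for one branch of the biological taxonomy.
PARENT = {
    "lion": "Panthera",        # Genus
    "Panthera": "Felidae",     # Family
    "Felidae": "Carnivora",    # Order
    "Carnivora": "Mammalia",   # Class
    "Mammalia": "Chordata",    # Phylum
    "Chordata": "Animalia",    # Kingdom
}


def ancestors(term):
    """Walk up the hierarchy from `term` to the root, returning each broader term."""
    path = []
    while term in PARENT:
        term = PARENT[term]
        path.append(term)
    return path


if __name__ == "__main__":
    print(ancestors("lion"))
```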

Text analysis

Seth Grimes makes a distinction between text analytics and text analysis, saying, “So we have text analytics on the one hand — text as data, fueling quantitative methods communicate business-required insights — and text analysis on the other, techniques that characterize and describe a text itself.”

Text analytics

Extracts high-quality information from unstructured text to distill its meaning. It goes beyond counting words to extract meaning and put that meaning in context. It can often tell you why something happened, whereas data analytics can tell you what happened.

Text mining

Text mining and text analytics are sometimes used interchangeably, but some experts draw an important distinction. In their view, text mining treats text as words: it extracts how many words occur within documents and how many kinds of words occur within kinds of documents.
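A minimal sketch of the counting view of text mining, assuming a "kind of document" is simply a category label supplied alongside each document. Tokenizing with a lowercase letters-and-apostrophes regex is a simplifying assumption; `word_counts` and `counts_by_kind` are hypothetical names.

```python
import re
from collections import Counter


def word_counts(documents):
    """Count how often each word occurs in each document (a bag-of-words view)."""
    return [Counter(re.findall(r"[a-z']+", doc.lower())) for doc in documents]


def counts_by_kind(documents, kinds):
    """Total the word counts for each kind of document, e.g. per category label."""
    totals = {}
    for doc_counts, kind in zip(word_counts(documents), kinds):
        totals.setdefault(kind, Counter()).update(doc_counts)
    return totals


if __name__ == "__main__":
    docs = ["The cat sat.", "The dog sat on the mat.", "Stocks rose."]
    print(counts_by_kind(docs, ["pets", "pets", "finance"]))
```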

Thesaurus

A thesaurus, like a taxonomy, is arranged in a hierarchy, but its terms can have more than one parent. For example, the term “airline” might have both “travel and tourism” and “transportation” as parent terms. Thesauri also include related terms; for example, the Detroit Lions might be related to Detroit, the team's home city.
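A minimal sketch of the two features that set a thesaurus apart from a taxonomy, assuming each entry stores a list of broader (parent) terms and a list of related terms. The "football teams" parent and the `broader_terms` and `related_terms` helpers are hypothetical.

```python
# Hypothetical thesaurus entries: several broader terms are allowed per term,
# and related terms are recorded separately from the hierarchy.
THESAURUS = {
    "airline": {
        "broader": ["travel and tourism", "transportation"],
        "related": [],
    },
    "Detroit Lions": {
        "broader": ["football teams"],
        "related": ["Detroit"],
    },
}


def broader_terms(term):
    """Return the parent terms of `term` (possibly more than one)."""
    return THESAURUS.get(term, {}).get("broader", [])


def related_terms(term):
    """Return non-hierarchical related terms for `term`."""
    return THESAURUS.get(term, {}).get("related", [])


if __name__ == "__main__":
    print(broader_terms("airline"), related_terms("Detroit Lions"))
```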

Word sense disambiguation

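Word sense disambiguation is the ability to determine which meaning of a word with multiple definitions is intended, usually from its context. For example, well can refer to a source of water, to something done in a good way, or to being in good health.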
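One classic approach is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. This sketch assumes a hand-written sense inventory for "well" and a small stopword list; the glosses, `SENSES`, and `disambiguate` are hypothetical.

```python
# Hypothetical sense inventory for "well"; the glosses are written for this sketch.
SENSES = {
    "well (water source)": "a deep hole dug in the ground to draw water or oil",
    "well (in a good way)": "in a good or satisfactory manner performed skillfully",
    "well (healthy)": "in good health not sick or ill",
}
STOPWORDS = {"a", "an", "the", "in", "to", "or", "of", "and", "not", "was", "she", "he", "from", "is"}


def content_words(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS


def disambiguate(context, senses=SENSES):
    """Simplified Lesk: pick the sense whose gloss shares the most words with the context."""
    context_set = content_words(context)
    return max(senses, key=lambda sense: len(content_words(senses[sense]) & context_set))


if __name__ == "__main__":
    print(disambiguate("They drew water from the well."))
```

Ties (including no overlap at all) fall back to the first sense listed, which is one reason real systems use richer glosses or trained models.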

Word stemming

Word stemming reduces a word to its stem by applying heuristics such as stripping suffixes; for example, flying becomes fly. Because both forms share a stem, a search that uses a stemmer will find either form in text.
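A much-simplified suffix-stripping sketch, assuming a short ordered suffix list and a minimum stem length of three letters. The rules, `stem`, and `matches` are hypothetical; real stemmers such as the Porter stemmer apply ordered rule sets with conditions on the remaining stem.

```python
# Hypothetical suffix rules, checked in order.
SUFFIXES = ["ing", "ed", "es", "s"]


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, text):
    """True if any word in `text` has the same stem as `query`."""
    target = stem(query)
    return any(stem(w.strip(".,!?")) == target for w in text.split())


if __name__ == "__main__":
    print(stem("flying"), matches("fly", "Birds are flying south."))
```

Heuristic rules can over-strip: with these rules, flies becomes fli, a stem that is not a real word. That is the gap lemmatization's dictionary lookup is meant to close.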