The human benefit of natural language processing

Natural language processing is a core ability of cognitive computing systems and is often defined as helping computers process and understand human language. On a very basic level, NLP does that by putting words together in meaningful phrases, assigning meaning to those phrases, and drawing inferences from them. Some of the best-known components of NLP are part-of-speech tagging, named entity resolution, word sense disambiguation, and coreference resolution, each of which plays a vital role in identifying and characterizing the core text that carries the primary meaning of a phrase or sentence.
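To make two of those components concrete, here is a deliberately minimal sketch of part-of-speech tagging and named-entity spotting. Real systems use trained statistical models; the lexicon and the capitalization rule below are hypothetical stand-ins that only show the shape of the output.

```python
# Toy illustration of two NLP components: part-of-speech tagging and
# named-entity spotting. The lexicon is a hypothetical stand-in for a
# trained tagging model.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "protest": "NOUN", "crowd": "NOUN", "city": "NOUN",
    "marched": "VERB", "gathered": "VERB",
    "in": "ADP", "through": "ADP",
}

def pos_tag(tokens):
    """Assign a part-of-speech tag to each token (NOUN as fallback)."""
    return [(tok, POS_LEXICON.get(tok.lower(), "NOUN")) for tok in tokens]

def spot_entities(tokens):
    """Flag capitalized non-initial tokens as candidate named entities."""
    return [tok for i, tok in enumerate(tokens) if i > 0 and tok[0].isupper()]

sentence = "The crowd marched through Bujumbura".split()
print(pos_tag(sentence))
print(spot_entities(sentence))  # ['Bujumbura']
```

A production pipeline would replace both the lexicon and the capitalization heuristic with models trained on annotated corpora, but the input-to-labeled-output flow is the same.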

Behind NLP lie deeper technical processes: machine learning techniques, computational linguistics, and statistical analysis of training corpora. The ability to process language naturally allows us to summarize documents, auto-classify text, conduct sentiment analysis, and provide search results with relevance ranking. In turn, those capabilities drive real solutions to problems. Take, for example, the speakers at Basis Technology's HLTCon on March 31, who discussed how they are using NLP as part of their solutions to gather intelligence, address terrorist threats, conduct research into social issues, tackle communication issues in refugee camps, and identify victims of human trafficking in the sex trade.

Parsing data for clues

Patrick Butler, senior research associate at Virginia Tech, discussed his work on EMBERS, a project that uses publicly available content to predict social events. The project is funded by IARPA and aims to create an automated system that parses open source data for clues about what is happening in a specific society. Butler and his team are using tweets to determine not only what a protest is about but to predict when the next one might occur. They are also tracking flu cases through cancelled OpenTable reservations and the number of cars parked outside emergency rooms. They do all of this work in the language the content is written in, and some of their processing includes turning relative phrases into actual dates: "next week" becomes the date of the content plus 7 days, for example.
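That last step, resolving a relative phrase against the publication date of the content, can be sketched in a few lines. The phrase table here is a hypothetical minimal example, not the EMBERS implementation.

```python
from datetime import date, timedelta

# Sketch of temporal normalization: resolve a relative phrase against
# the date the content was published. The offsets are illustrative.
RELATIVE_OFFSETS = {
    "today": timedelta(days=0),
    "tomorrow": timedelta(days=1),
    "next week": timedelta(days=7),
    "next month": timedelta(days=30),  # rough approximation
}

def resolve_relative(phrase, published):
    """Turn a relative phrase into an absolute date, given the content's date."""
    offset = RELATIVE_OFFSETS.get(phrase.lower())
    return published + offset if offset is not None else None

print(resolve_relative("next week", date(2016, 3, 31)))  # 2016-04-07
```

A real system would also handle weekday names, holidays, and language-specific expressions, but the anchor-date-plus-offset idea is the core of it.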

Multi-lingual natural language processing is important to many of the cases presented at HLTCon, but not all languages are easily parsed through NLP. That is to say, the less written content there is in a language, the less developed NLP will be in that language: NLP in French is excellent, but NLP in Swahili is still difficult.

This is a barrier for Gregor Stewart's and Danielle Forsyth's projects, both of which deal with refugee crises. Stewart, vice president of product management at Basis Technology, and Forsyth, co-founder of Thetus Corporation, discussed how predicting political upheaval can help prepare for refugee movement to other areas. Stewart said that the current refugee crisis in Europe is not as new as it may seem: about 6 million people have been outside of their home countries for more than 5 years, and some of them have only recently been processed. The sheer volume of people moving into Europe has overwhelmed the governments there, and language differences are the biggest barrier to getting people to safety and creating mitigation policies. He speculated that this process would be greatly aided by better interpretation and translation tools, which machine learning and natural language processing can help create.

Predicting crises

Forsyth discussed anticipating refugee crises by parsing language for overt and hidden meaning. Her work currently focuses on Africa, and she recently found 5 phrases used by Burundian politicians that incite violence against minority groups, including the innocuous-seeming "get to work." Monitoring this type of language and using sentiment analysis to determine its meaning helps indicate whether a political crisis is likely to instigate a refugee crisis. If aid groups can successfully predict a humanitarian crisis, they can mitigate some of its effects and perhaps keep refugees in safe areas inside their home countries. Multi-lingual NLP is essential here: Forsyth's approach only works if the system understands the local language well enough.
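The monitoring step Forsyth described can be sketched as a watchlist scan over a document stream. The watchlist entries and risk scores below are hypothetical; a real system would pair the scan with trained sentiment and context models rather than fixed scores.

```python
# Sketch of watchlist-based phrase monitoring. Phrases and scores are
# hypothetical examples, not Forsyth's actual findings.
WATCHLIST = {
    "get to work": 0.9,    # innocuous on its face, flagged from context
    "clear them out": 0.8,
}

def flag_documents(docs, threshold=0.5):
    """Return (doc_index, phrase, score) for each watchlist hit above threshold."""
    hits = []
    for i, doc in enumerate(docs):
        text = doc.lower()
        for phrase, score in WATCHLIST.items():
            if phrase in text and score >= threshold:
                hits.append((i, phrase, score))
    return hits

docs = ["Time to get to work on the hillside.", "The harvest begins soon."]
print(flag_documents(docs))  # [(0, 'get to work', 0.9)]
```

The hard part, which this sketch omits, is deciding which phrases belong on the watchlist in the first place; that is where sentiment analysis and local-language expertise come in.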

Giant Oak is using a combination of technologies that includes NLP to identify sex trade workers who are victims of human trafficking. To do so, they have to determine the behavior of sex workers who are in the trade willingly, and then identify deviations from that behavior. They have mined 85 million online ads and more than 2 million reviews of sex workers, extracting locations, phone numbers, and other rich data. They are also looking for sentiment in these ads to determine whether the ad writer was unhappy or drugged, a very difficult task since there may be little difference between the behavior of someone taking drugs voluntarily and someone who has been drugged. Giant Oak's work is still in its early stages, but they are using machine learning and NLP to try to solve social issues and save lives.
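The structured-field mining step, pulling phone numbers out of free-text ads so that records can be linked across postings, can be sketched with a regular expression. The pattern below is a deliberately simple US-style example, not Giant Oak's production matcher.

```python
import re

# Sketch of extracting phone numbers from free-text ads so postings can
# be linked by shared contact details. Simplified US-style pattern.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")

def extract_phones(ad_text):
    """Return normalized (digits-only) phone numbers found in the text."""
    return [re.sub(r"\D", "", m) for m in PHONE_RE.findall(ad_text)]

ad = "Call 555-867-5309 or (555) 123 4567 anytime."
print(extract_phones(ad))  # ['5558675309', '5551234567']
```

Normalizing to digits matters because the same number appears in many formats across ads; once normalized, it becomes a join key for linking postings to the same advertiser.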

So is Karthik Dinakar, Reid Hoffman Fellow at MIT. Dinakar uses models to understand and predict adolescent distress, self-harm, and heart disease, and to support crisis counseling. In his heart disease research, he found that a model combining a patient's history, the words the patient uses to describe symptoms, and an angiogram predicts heart attacks in women better than doctors do. Dinakar also found that women often use different language than men to describe their symptoms. For the past few decades, doctors have taken this to mean that men and women have different heart issues, but Dinakar's research indicates that the issues are the same; it is how the genders talk about them that differs. The overwhelming majority of male cardiologists simply do not understand what their female patients are saying. Mapping language differences may help more female heart attack victims survive.

The conversation about cognitive computing and big data often is enterprise focused—how we can make better business decisions, discover new business opportunities and the like—but the projects at HLTCon highlighted a real ability to turn big data into information that can help people in need, both in the collective sense and in the individual sense. It is this kind of creative use of NLP technologies that can literally make cognitive computing smart enough to do some good.
