Peering into the black box of machine learning
Confidence: Probability vs Trust
As a computer scientist working with data classification, I often get the question "What is your confidence, as a percentage, in this classification result from the software?" It has taken me some time, and a number of false starts, to work out what the word "confidence" really means in the context of that question.
It’s tempting – but mistaken – to think the user’s confidence question is about a confidence level in inferential statistics. Confidence levels in statistics are a quantitative way to summarize data that has already been collected: they attach a probability-like statement to an estimate drawn from that data. For example, "based on this sample, we are 95 percent confident that the true mean lies in this range of values."
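To make the statistical usage concrete, here is a minimal sketch (made-up numbers, NumPy, and a plain normal approximation – nothing from any real experiment) of the kind of statement a confidence level makes about already-collected data:

    import numpy as np

    # A hypothetical, already-collected sample of measurements (made-up numbers).
    data = np.array([4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.4, 3.9, 4.0])

    mean = data.mean()
    sem = data.std(ddof=1) / np.sqrt(len(data))   # standard error of the mean

    # 95 percent confidence interval, using the normal approximation (z = 1.96).
    low, high = mean - 1.96 * sem, mean + 1.96 * sem
    print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")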
But I’ve found most often the question "what’s your confidence, as a percentage?" is not trying to describe a picture of the past, but of the future. People use computer classifiers to help work out "Should I do X or Y?" or even just "Should I do X?" and users want to know if they can trust the answer coming from the machine. When the machine is a black box to you, it’s often unclear how much you should trust the answer the machine produces.
So the question a decision maker is really asking is "Should I trust the machine’s answer?" Trust, in this sense, requires some level of understanding of what the machine is doing, and why it’s doing it.
Some classification techniques have a fairly straightforward logic path to follow, to establish the trust the user is looking for. For these methods, there is a clear connection between the logic the technique is using, and the resulting bucket or category a given blob of data falls in. For rule-based systems and decision trees, you can consult the rule base or the decision tree, and see the path of reasoning the classifier is using.
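Here is a minimal sketch of what that transparency looks like in practice, using scikit-learn and a toy dataset of my own choosing (not anything from a production system): a shallow decision tree whose learned rules can be printed and read as if/then statements.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Train a deliberately shallow tree on a toy dataset.
    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(iris.data, iris.target)

    # Print the learned rules as human-readable if/then statements, so a user
    # can trace exactly why a sample lands in a given category.
    print(export_text(tree, feature_names=list(iris.feature_names)))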
The BlackBoxitude of Machine Learning
Technologists tend to think of confidence in statistical terms, while laymen often use the term to ask about how trustworthy a machine learning classifier is, in an operational sense.
For most machine learning techniques, unfortunately, the inner workings of the classification algorithm are anything but transparent – even to techies who work with them on a daily basis. The complexity of these methods stems from the great number of input variables in play (dozens to thousands), and the subtlety of the interactions between variables, taken as an ensemble.
Some data is easy to classify. Consider the following data set:
The red line divides the data neatly into two categories. A classifier that can draw this boundary is a linear classifier: the shape that separates the classes is a line (or, with more than two variables, a plane or hyperplane).
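As a rough stand-in for that picture (synthetic data and scikit-learn, purely illustrative), a linear classifier handles two well-separated clusters almost perfectly:

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression

    # Two well-separated clusters of points: the "easy" case.
    X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

    # A linear classifier learns a single straight-line boundary between them.
    clf = LogisticRegression().fit(X, y)
    print("training accuracy:", clf.score(X, y))  # essentially 1.0 for cleanly separable data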
But in the most common case, your data points have a messy relationship with each other that can’t be accurately subdivided by a linear classifier, as in the following picture:
In this case, you need a nonlinear classifier. When the number of input variables is high, you will likely have a hard time making a "classification map" like the previous picture that will help you visualize and understand why the classifier is giving the answers it’s giving - and whether you should trust these answers.
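Continuing the same illustrative sketch (synthetic scikit-learn data standing in for the picture), a linear model struggles on interleaved classes while a nonlinear classifier copes:

    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    # Two interleaving half-moons: a messy relationship no straight line separates well.
    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

    linear = LogisticRegression().fit(X, y)
    nonlinear = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # learns a curved boundary

    print("linear classifier accuracy:   ", round(linear.score(X, y), 2))
    print("nonlinear classifier accuracy:", round(nonlinear.score(X, y), 2))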
All is not lost. Researchers have been making progress on the problem of how to make the results of machine learning more interpretable and usable from a trust perspective.
Peering into the machine
One of the long-running problems with neural nets (and many other machine learning techniques) is that even when the magic box works, the box is still a black box. You really don't know why it's making the choices it makes.
There are two reasons you can't understand the workings of a black box, even when you have its complete internal diagrams: either (a) the diagrams are too complex, or (b) the interactions between the parts don’t follow a predictable pattern. In machine learning methods, problem (a) arises when the dimensionality of the problem is too high to visualize – a very common situation. Problem (b) arises when the system the machine learning method is modeling is mathematically nonlinear.
Over the past several years, researchers have been working at making "black box" machine learning results interpretable by people. An exciting result was written up by Marco Ribeiro et al at the University of Washington (https://arxiv.org/abs/1602.04938).
Ribeiro et al have come up with a scheme for producing locally linear explanations that let you understand "in the small" why a nonlinear machine learning classifier is making the choices it’s making "in the large".
Here is an example. We have a nonlinear SVM – a type of machine learning classifier – that is trained on data and creates a classification boundary: the curve that separates the pink and blue areas of a two-variable space:
In this example, we want to understand why the big red + sign is in the pink region. (The pink island is the classification region that a particular SVM came up with based on a set of training data.)
The paper describes a method to come up with a "locally linear" explanation for why a point was classified as it was. In the picture above, the dashed line is the "line of locally linear explanation" that you use to understand why the big red plus sign is in the pink region.
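The core idea can be sketched in a few lines of Python. This is my own simplified rendering of the local-surrogate idea, not the algorithm from the paper: perturb the point you want explained, ask the black box to score the perturbations, weight them by how close they sit to the original point, and fit a simple linear model that is only trusted "in the small."

    import numpy as np
    from sklearn.linear_model import Ridge

    def local_linear_explanation(score_fn, x, scale=0.5, n_samples=500, seed=0):
        """Fit a weighted linear surrogate around the point x.

        score_fn: any scalar output of the black-box classifier for a batch of
                  points (e.g. the probability of one class, or a decision function).
        Returns one weight per input variable: the locally linear "reasons".
        A simplified sketch of the local-surrogate idea, not the paper's algorithm.
        """
        rng = np.random.default_rng(seed)
        # 1. Perturb the instance we want explained.
        Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
        # 2. Ask the black box to score the perturbed points.
        y = score_fn(Z)
        # 3. Weight each perturbation by its proximity to x (exponential kernel).
        dists = np.linalg.norm(Z - x, axis=1)
        weights = np.exp(-(dists ** 2) / (2 * scale ** 2))
        # 4. Fit an interpretable (linear) model that is only valid near x.
        surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
        return surrogate.coef_

With the nonlinear SVM from the earlier sketch, calling local_linear_explanation(nonlinear.decision_function, X[0]) would return one weight per input variable, indicating which variables push that particular point toward one side of the curved boundary.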
You can use Ribeiro's technique for two useful things: (1) give a collection of local explanations to a technician, who uses them to better tune the classifier, and (2) give the local explanations to an end user who wants some sense of why the classifier’s black box behaves the way it does.
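The authors also released an open-source implementation, the lime Python package. A minimal sketch of using it on tabular data might look like the following; I am assuming the package's LimeTabularExplainer interface here, which can differ between versions, so treat this as illustrative rather than definitive:

    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    # Train an opaque model on a binary toy dataset.
    data = load_breast_cancer()
    clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

    explainer = LimeTabularExplainer(
        data.data,
        feature_names=list(data.feature_names),
        class_names=list(data.target_names),
        mode="classification",
    )

    # Explain one prediction in terms of the probability of class 1 ("benign" here).
    explanation = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=5)
    print(explanation.as_list())  # [(human-readable feature rule, local weight), ...]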
Here’s the abstract from Ribeiro’s paper:
Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust in a model. Trust is fundamental if one plans to take action based on a prediction, or when choosing whether or not to deploy a new model. Such understanding further provides insights into the model, which can be used to turn an untrustworthy model or prediction into a trustworthy one.
In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We further propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). The usefulness of explanations is shown via novel experiments, both simulated and with human subjects. Our explanations empower users in various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and detecting why a classifier should not be trusted.
Ribeiro’s work raises some questions, the greatest of which is: Does this technique produce something that an end user could even begin to understand? The feature space that Ribeiro’s method works in is very different than the semantic space or mental model an end user would use to think about the problem.
Even so, Ribeiro’s line of investigation may some day lead to neural nets and other nonlinear machine learning methods that are more understandable to end users than they are today. In the shorter term, systems such as LIME show promise as tools that help machine learning technicians better understand what is going on inside the current and coming wave of neural net applications gaining popularity for advanced computer vision and machine understanding tasks.