Rulebases and machine learning

Providers primarily use two methods to metatag and categorize content. The first is rule-based systems that use classification schemes such as controlled vocabularies, taxonomies and ontologies. The second is machine learning systems.

 

Rule-based systems tend to be rigid and need a lot of maintenance, but their results are more predictable. They also tend to run very fast and to be unsophisticated, which makes them easier to adjust and fix. They are usually simple to use and fairly accurate: if the rules reflect reality, the system will classify content properly. The catch is that there has to be a mechanism or rule that relegates copy matching no rules to an “unknown” bucket. Perhaps the biggest caveat is that users must already know what they need to classify in order to design the system. If you need to discover unknowns in your content, you may be limited to examining what did not classify. But if you want control, a rule-based system is the way to go.
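As a rough illustration of the idea, here is a minimal Python sketch of a rule-based classifier with an “unknown” bucket. The category names, the vocabulary terms and the function name classify are all hypothetical, chosen only for this example; a real system would draw its rules from a maintained taxonomy or controlled vocabulary.

```python
import re

# Hypothetical rules: each category lists terms from a controlled vocabulary.
RULES = {
    "finance": ["invoice", "dividend", "interest rate"],
    "health": ["vaccine", "diagnosis", "clinical trial"],
}


def classify(text, rules=RULES):
    """Return every category whose terms appear in the text, or ["unknown"]."""
    lowered = text.lower()
    matches = []
    for category, terms in rules.items():
        # A category matches if any of its terms occurs as a whole word or phrase.
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms):
            matches.append(category)
    # Copy that matches no rule falls into the "unknown" bucket.
    return matches or ["unknown"]


print(classify("The clinical trial tested a new vaccine."))  # ['health']
print(classify("The team won the match."))                   # ['unknown']
```

Reviewing what lands in the “unknown” bucket is, in this sketch, the only way to notice topics that the rules did not anticipate.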

Machine learning systems allow for more human-like results and tend to be more flexible, but users have less control over the system, and when things go wrong, it can be difficult to tell why. In machine learning, the system works toward a goal, and the process is adaptive. It assesses how well something is defined, using a pre-set measure, then takes data from a trusted, verified source and uses it to “train” itself to work better. This is very successful in fields where there is a lot of data, because the feedback loop lets the system keep adapting.
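To make the training and feedback loop concrete, here is a small Python sketch. It assumes one particular technique, a naive Bayes text classifier with add-one smoothing, which the discussion above does not name; it stands in for whatever model a real product uses. The training examples and the names train, predict and TRAINING are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples standing in for a trusted, verified source.
TRAINING = [
    ("invoice payment overdue", "finance"),
    ("dividend and interest payment", "finance"),
    ("vaccine trial results", "health"),
    ("patient diagnosis and treatment", "health"),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count words per label; the counts are the whole 'model' in this sketch."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Pick the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict(model, "overdue invoice"))  # finance

# Feedback loop: add a corrected example and retrain.
retrained = train(TRAINING + [("insurance claim for patient treatment", "health")])
print(predict(retrained, "insurance claim"))  # health
```

Before retraining, the words “insurance” and “claim” are unknown to the model, so its answer for that text is essentially arbitrary. The system only handles the new material once it has been fed a verified example.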

Potential issues with machine learning are:

  • It responds to exactly the parameters its designers intended, but only to those they anticipated. In other words, it works until something unanticipated changes, and then the machine needs new input to start learning again.

  • When it fails, it is not always easy to discover why it failed.

If you have very little time to build a rule-based system, a machine learning based system may be the direction for you. However, some emerging systems combine the two approaches and hold out hope for the best of both worlds.
