Limitations

Why AI, data science and machine learning are not perfect.

Artificial intelligence (AI), data science and machine learning (ML) are great but not perfect.

That there are limitations might not be obvious, and with all the hype it is easy to misjudge the capabilities of these techniques and over-estimate what is possible. Where they have been applied successfully, it is often in a more limited sense than the reporting suggests. So when reading accounts of how these areas have aided the response to humanitarian disasters, prevented crime, diagnosed illness and a host of other things, bear in mind that the achievements are often very specific to a particular set of circumstances.

We know from personal experience that things do not always work as we’d like: how often have Siri, Alexa or Google misunderstood you?

How often do Amazon and Netflix recommend something that does not match your taste at all?

Have you ended up in an unexpected traffic jam after following your SatNav?

Have you ever ended up calling your bank’s customer service because the chatbot could not answer your questions?

Thus an appreciation of the limitations is as important as an understanding of the strengths.

This is because even the most advanced methods share one clear limitation: they are all data dependent. Most ML models and data science methods are trained on large, annotated data sets, and these annotations, more often than not, are generated by people. The quality, completeness and correctness of the data and its annotations will directly affect the quality of your results.

Think about it this way: when you do not know how to do something, you look for examples of how it is done. An ML model needs the same: if it does not see enough examples of the classes you want it to classify or predict, it cannot learn to do so successfully.
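
To see this effect in practice, here is a minimal sketch, assuming scikit-learn is available, that trains a classifier on a synthetic data set in which one class makes up only 2% of the examples. The data set and model are placeholders for illustration, not a real application.

```python
# A minimal sketch showing how a model trained on badly imbalanced data
# fails on the class it has rarely seen.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic two-class problem where only 2% of examples are the rare class.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Overall accuracy looks impressive, but the per-class breakdown shows
# how poorly the model recognises the class it barely saw in training.
print(classification_report(y_test, model.predict(X_test)))
```

Overall accuracy can look excellent here simply because the model predicts the majority class almost every time; the per-class report exposes how little it has learned about the minority class.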

Examples of the harm done by unbalanced, biased or incomplete data are as numerous as they are concerning. In 2014, a company created an ML-based system to automatically filter CVs, aiming to streamline its hiring procedures and eliminate bias. However, the system was trained on ten years of the company’s own hiring data, and because hiring practices during that period were biased, the resulting system reproduced and reinforced those biases. Investigations revealed that it penalised CVs containing the word ‘women’s’ (as in ‘women’s chess club captain’) and rejected graduates of two all-women’s colleges.

The big problem with ML

Put bluntly, the big problem with ML is that nobody knows what it’s doing!

The ML phase is very much a black box: we understand the general principles of artificial neural networks (ANNs), but we do not understand in any detail what is actually going on when one is applied. For all intents and purposes, it’s magic!

You may ask why this is a problem: after all, it’s producing good results, so why worry?

The problem is that, because we don’t know what is going on, we can’t fix things when they go wrong, and it is difficult to interpret the answers we’re given. This has serious implications for where we can apply ML. If all we’re interested in is scanning the web for pictures of kittens, this is not really a problem. But if we want to apply ML to control an aircraft, then we really do need to understand what is going on.

This is one area where symbolic AI and conventional programming have a massive advantage because, in those situations, we do know how they do what they do. Such approaches are thus transparent. There is a lot of work currently underway on transparency in ML, a field known as ‘explainable AI’, but we are still far from a solution. At best we can determine which pixels the machine used to classify a picture of a dog as a dog, but its choice of pixels will look a bit weird to us and is certainly not very informative as to how the model is working.
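
To make the pixel-attribution idea concrete, here is a minimal sketch of one common explainable-AI technique, a vanilla gradient saliency map. It assumes PyTorch is available; the tiny network and random image are placeholders for illustration, not a real dog classifier.

```python
# A minimal sketch of a gradient saliency map: highlight the input pixels
# the network's output score was most sensitive to.
import torch
import torch.nn as nn

model = nn.Sequential(            # stand-in "dog classifier", untrained
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 2),    # two classes: dog / not-dog
)
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)  # placeholder image
score = model(image)[0, 0]        # score for the "dog" class
score.backward()                  # gradient of the score w.r.t. every pixel

# Large absolute gradients mark the pixels that most influenced the score.
saliency = image.grad.abs().max(dim=1).values  # collapse colour channels
print(saliency.shape)             # torch.Size([1, 32, 32])
```

Even with the map in hand, all it tells us is where the model was looking, not why those pixels led to the answer, which is exactly the limitation described above.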