Machine learning methods
A look into some of the many different methods used in machine learning (ML) - part of a growing list.
To give you some idea of the sheer number of methods that are employed in machine learning (ML) here is an incomplete list:
- Auto Encoder (AE)
- Boltzmann Machine (BM)
- Convolutional Neural Network (CNN)
- Decision Trees (DT)
- Deconvolutional Network (DN)
- Deep Belief Network (DBN)
- Deep Convolutional Network (DCN)
- Deep Convolutional Inverse Graphics Network (DCIGN)
- Deep Feed Forward (DFF)
- Deep Q-Network (DQN)
- Deep Residual Network (DRN)
- Denoising Auto Encoder (DAE)
- Echo State Network (ESN)
- Extreme Learning Machine (ELM)
- Feed Forward (FF)
- Gated Recurrent Unit (GRU)
- Generative Adversarial Network (GAN)
- Hopfield Network (HN)
- Kohonen Network (KN)
- Liquid State Machine (LSM)
- Long Short-Term Memory (LSTM)
- Markov Chain (MC)
- Neural Turing Machine (NTM)
- Perceptron
- Radial Basis Function Network (RBF)
- Random Forest (RF)
- Recurrent Neural Network (RNN)
- Restricted Boltzmann Machine (RBM)
- Sparse Auto Encoder (SAE)
- Support Vector Machine (SVM)
- Variational Auto Encoder (VAE)
… and so on.
And, of course, this list is growing all the time. These are just the methods. The number of tools, whether commercial or open source, that use them is much greater.
Many of the listed methods fall into a class known as Artificial Neural Networks (ANNs), so called because they are loosely based on the way our own brains are wired. ANNs are nowadays the most common form of ML method and the one where most progress has been made.
Structurally, an ANN comprises a number of layers of nodes (the neurons), each connected to nodes in the next layer. The first layer is known as the input layer, the last is the output layer and the intermediate layers are hidden layers. Crudely, the way neural networks work is that each connection between neurons carries a weight, and these weights are adjusted during training until the network produces the desired output for each input. Each output neuron then represents a probability that the input belongs to a particular class.
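As a sketch of this structure, here is a minimal forward pass through a small fully connected network. The layer sizes, random weights and sigmoid activation are illustrative assumptions, not any specific published network:

```python
import numpy as np

def sigmoid(x):
    # Squash each neuron's weighted input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights):
    # Each layer multiplies its input by a weight matrix, then applies
    # the activation; training would adjust these weight matrices.
    for w in weights:
        x = sigmoid(w @ x)
    return x

rng = np.random.default_rng(0)
# 4 input neurons -> 3 hidden neurons -> 2 output neurons
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
output = forward(rng.normal(size=4), weights)
print(output.shape)  # (2,) - one value per output neuron
```

Because the final activation is a sigmoid, each output lies between 0 and 1, which is how the "probability per output neuron" idea appears in practice.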
In the diagram the ANN is trained by showing it a series of happy, sad and neutral faces, with no two faces the same. Each face adjusts the weights slightly until we reach the point where presenting a happy face results in the output registering it as happy.
Deep learning methods are a family of ANNs which have consistently obtained state-of-the-art results on most ML tasks since their development. Although deep learning networks can be used in unsupervised and reinforcement learning, they are most frequently applied to supervised learning problems.
They work by using a succession of multiple layers, where each set of layers effectively acts as a neural network in its own right. For example, in an image processing problem the first set of layers might take the raw pixel data as input and output basic features such as edges and lines. The next set may take these and output specific shapes such as circles and rectangles, and so on, until the final output distinguishes different types of vehicle.
Depending on your type of input data and the type of problem you want to solve, you might find some types of deep networks better than others.
Tabular datasets
Not that many deep networks can be used for tabular datasets, because most popular networks rely on spatial information (for example, which attributes are next to each other) to extract features automatically. If you encounter tabular data, basic neural networks will be your best initial choice.
Audio, video, temporal datasets
If you have temporal information (for example, data that varies through time), recurrent neural networks (RNNs) and long short-term memory (LSTM) networks are two of the best candidates.
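The recurrent idea can be sketched in a few lines: a hidden state is carried from one time step to the next, which is what lets the network use temporal context. This is a plain (vanilla) RNN step with arbitrary, made-up sizes, not any particular library's implementation:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The hidden state h carries information forward through time;
    # the same weights are reused at every time step.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(1)
W_x = rng.normal(size=(8, 3))   # input-to-hidden weights
W_h = rng.normal(size=(8, 8))   # hidden-to-hidden (recurrent) weights
b = np.zeros(8)

h = np.zeros(8)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h.shape)  # (8,) - a summary of the whole sequence so far
```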
Image datasets
Most of the advances in deep learning have been done for image datasets. If you have images that you want to classify, segment or track, good deep learning candidates include convolutional neural networks (CNNs).
In order to appreciate just how complex things are getting, here are explanations for two popular methods in more detail than we have used elsewhere. The point is not to understand the method as such - that is a bonus. The point is to understand just how complex and specialised the detail actually is.
Before the detail, the long story short
CNNs are ANNs on steroids.
LSTMs are ANNs on steroids with a time machine.
CNNs
Mostly applied to visual datasets and in natural language processing, CNNs are one of the most popular deep networks. CNNs consist of an input layer, multiple hidden layers, and an output layer.
The input layer is generally the image (or images) in your training set. The hidden layers are mostly convolutional layers, rectified linear unit (ReLU) layers, pooling layers, and fully connected layers (gasp!).
Convolutional layers, the core building block of a CNN, learn filters that activate when specific features are recognised somewhere in the image. Pooling layers are sub-sampling layers that partition the input into non-overlapping windows and output the maximum of each window. ReLU layers remove negative values from an activation map by setting them to zero. Fully connected layers are where the classification is actually done: they learn which activations relate to which classes.
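As an illustration, the ReLU and max-pooling operations described above can be sketched in a few lines of NumPy. The 4×4 feature map here is made up for the example:

```python
import numpy as np

def relu(a):
    # ReLU layer: negative activations are set to zero
    return np.maximum(a, 0)

def max_pool(feature_map, window=2):
    # Pooling layer: partition the map into non-overlapping windows
    # and keep only the maximum of each window (sub-sampling)
    h, w = feature_map.shape
    h, w = h - h % window, w - w % window  # trim to whole windows
    blocks = feature_map[:h, :w].reshape(h // window, window, w // window, window)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[ 1., -2.,  3.,  0.],
                        [-1.,  5., -3.,  2.],
                        [ 4.,  0., -1., -2.],
                        [ 0.,  1.,  2.,  6.]])
pooled = max_pool(relu(feature_map))
print(pooled)  # [[5. 3.]
               #  [4. 6.]]
```

Note how the 4×4 map shrinks to 2×2: pooling discards exact positions while keeping the strongest responses, which is what makes the learned features somewhat translation-tolerant.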
Finally, you will have an output layer which will return your prediction, and a loss layer, which specifies how training penalises the deviation between the prediction and the actual true output. Depending on your problem, you can choose different loss layers. Softmax loss, for example, is used for predicting a single class out of N mutually exclusive classes, while Euclidean loss is used for predicting (regressing) continuous values.
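The two loss layers mentioned above can be sketched as plain functions. These are the standard textbook formulations, not any particular library's implementation:

```python
import numpy as np

def softmax_loss(scores, true_class):
    # Cross-entropy on softmax probabilities: penalises low probability
    # assigned to the one correct class out of N mutually exclusive classes.
    shifted = scores - scores.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[true_class]

def euclidean_loss(prediction, target):
    # Squared error: penalises deviation when regressing continuous values.
    return 0.5 * np.sum((prediction - target) ** 2)

print(softmax_loss(np.array([2.0, 1.0, 0.1]), true_class=0))
print(euclidean_loss(np.array([1.5, 2.0]), np.array([1.0, 2.5])))  # 0.25
```

Softmax loss shrinks towards zero as the network grows more confident in the correct class; Euclidean loss simply grows with the squared distance between prediction and target.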
CNNs are the basis of many other methods, such as deep Q-networks (used in reinforcement learning), Fast R-CNN and fully convolutional networks, amongst others.
We did say it was complicated!
LSTM
LSTMs are a variation of RNNs, a type of deep network suited not only to single data points (such as images or tabular data) but also to sequences of data (such as video or speech). Although they were introduced in 1997, making them relatively old in ML terms, they still maintain state-of-the-art results on very popular ML problems involving human action recognition and speech recognition. They have also been used in some unusual problems, such as automatic music composition and caption generation for film and TV.
LSTMs are composed of:
- input gates
- output gates
- cells
- forget gates
A cell is able to remember values over different time intervals, while the gates control the flow of information into and out of the cell (for example, what needs to be remembered and when it needs to be applied).
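A single LSTM time step can be sketched as follows. Stacking all four gates into one weight matrix is a common implementation convention, and the sizes here are arbitrary assumptions for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM time step. W projects [input, previous hidden state]
    # onto the four gates; b is the bias for each gate.
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell values
    c = f * c_prev + i * g   # forget gate decides what the cell keeps
    h = o * np.tanh(c)       # output gate decides what the cell reveals
    return h, c

rng = np.random.default_rng(2)
n_in, n_hidden = 3, 4
W = rng.normal(size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(6, n_in)):  # a 6-step input sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The key line is `c = f * c_prev + i * g`: the cell state is an additive memory, so information can survive across many time steps unless the forget gate actively discards it.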
Low-shot learning
Because deep networks must first learn the features of your data, they have traditionally needed vast quantities of it. Current state-of-the-art results in image recognition and audio classification use millions of labelled data points, which is not always achievable.
In response to these demanding data requirements, a new area of deep learning, called low-shot (or few-shot) learning, has been gaining attention in the last few years. Low-shot learning aims to learn features automatically from datasets in which each class has very few samples - generally 20, 10, 5 or even just 1. The results are not yet able to beat those of networks trained on millions of samples, but they are still very competitive.