Supervised and Unsupervised Learning

In the post on machine learning it was explained how supervised learning includes extra information, called labels, passed into the model to classify the input data. For example, human gesture data might include labels saying whether the data represents running, walking or jumping. This helps the learning stage home in on the neuron weights that best achieve the required output.

It’s usually difficult to obtain labelled data, especially when dealing with sensor time series data. Labelling often has to be performed manually or added to the data afterwards. For example, to create a model that infers human gestures from accelerometer data we first need to record the x, y, z values together with the gesture being performed at the time. This can be tedious, error prone and open to human bias.

So how does unsupervised learning know what to concentrate on to create the best model? It doesn’t. Instead, unsupervised methods concentrate on finding features in the data. Each feature has a numerical value signifying its strength.

[Figure: Heatmap of 256 features from SensorCognition™ edge device unsupervised model]

Going back to the human gesture example, the model might see a common up then down sequence in the x axis data and output this as a feature. The model outputs lots of features, usually hundreds, that might be sub-features, features of interest (e.g. sitting, running) or a mix of features (jumping while running).

There’s usually a further step, whether simple traditional code, data science processing or more machine learning, to turn the output features into detection of the feature of interest (e.g. sitting), detection of anomalies (e.g. falling), classification of the gesture or prediction of the next gesture (e.g. walking after running). A manual way of finding features for detection and classification is to feed known gestures into inference and see which features fire. This obviously involves human effort but is much less effort than labelling all the supervised input.
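This manual mapping step can be sketched in code. The sketch below is illustrative only: `fake_model` is a hypothetical stand-in for a real unsupervised feature extractor, and classification is done by nearest activation profile. None of the names reflect SensorCognition™'s actual implementation.

```python
# Sketch: feed windows of known gestures through an unsupervised model,
# record which features fire, then label new windows by nearest profile.
import numpy as np

rng = np.random.default_rng(0)

def fake_model(window):
    # Stand-in for inference: returns a vector of feature strengths.
    # A real model would be an autoencoder, SOM, etc.
    return np.abs(np.fft.rfft(window, n=16).real)

# Step 1: run known gestures through inference and average their activations.
known = {
    "walking": [rng.normal(0.0, 1.0, 64) for _ in range(5)],
    "running": [rng.normal(0.0, 3.0, 64) for _ in range(5)],
}
profiles = {g: np.mean([fake_model(w) for w in ws], axis=0)
            for g, ws in known.items()}

# Step 2: classify a new window by its closest activation profile.
def classify(window):
    feats = fake_model(window)
    return min(profiles, key=lambda g: np.linalg.norm(profiles[g] - feats))

print(classify(rng.normal(0.0, 3.0, 64)))
```

In practice the activation profiles would be built from many more example windows, but the principle is the same: human effort goes into a handful of known examples rather than into labelling every sample.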


Advantages of Unsupervised Learning

In the post on supervised and unsupervised learning it was mentioned that unsupervised learning doesn’t need labelled data and results in features being detected. Features are patterns in the data.

The main advantage of unsupervised learning is that labelled data isn’t required. As labelling usually has to be performed manually, this saves a significant amount of time. In some situations, the quantity of data means it’s not physically possible to classify it manually.

As the input data isn’t labelled there’s no extra human influence on the input and hence no human error or human bias.

The model detects features in the data that can be sub-features, features of interest or a mix of features. For example, human gesture recognition produces features of interest (e.g. walking, running, jumping), sub-features (e.g. movement upwards or downwards) and combinations of features (e.g. jumping while running).

As sub-features are being detected, the same model can sometimes be used to detect features of interest it wasn’t trained on. For example, a human gesture model trained on running, sitting down and walking might detect enough parts of the movement for a combination of features to signify lying down.

As the model isn’t directed (supervised) to find specific things, it can also find hidden features in the data. For example, a model trained to find features of interest in vehicle driving (turning left, turning right, stopped, accelerating, slowing) might inadvertently also detect potholes in the road.

More usefully, unsupervised models can be purposely used to find hidden features in the data that a human can’t correlate and hence detect. For example, it might be used to find:

  • a pattern in a vulnerable person’s movement that indicates they are about to fall,
  • a pattern in complex sensing of an industrial motor that indicates it is about to fail,
  • a hidden pattern in share price data that indicates you should sell.

The above presuppose there’s enough information in the data to detect such things. The share selling example is a case in point, as there’s often insufficient detail in financial data to make such a determination.
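The anomaly examples above can be sketched as follows. This is a minimal illustration, not a real pipeline: the `features` function is a hypothetical stand-in (just mean and standard deviation of a window) for an unsupervised model's feature output, and the threshold rule is a simple assumed heuristic.

```python
# Sketch: flag windows whose feature vector sits far from the centre of
# "normal" activity, learnt from unlabelled normal-behaviour data.
import numpy as np

rng = np.random.default_rng(1)

def features(window):
    # Stand-in for an unsupervised model's feature vector.
    return np.array([window.mean(), window.std()])

# Learn what "normal" looks like from unlabelled windows of normal activity.
normal = np.array([features(rng.normal(0.0, 1.0, 50)) for _ in range(200)])
centre = normal.mean(axis=0)
# Threshold: 3x the typical distance of a normal window from the centre.
threshold = 3.0 * np.linalg.norm(normal - centre, axis=1).mean()

def is_anomaly(window):
    return np.linalg.norm(features(window) - centre) > threshold

print(is_anomaly(rng.normal(0.0, 1.0, 50)))   # a typical window
print(is_anomaly(rng.normal(5.0, 4.0, 50)))   # a very different window
```

The same structure applies whether "anomaly" means an unusual movement pattern, a motor behaving strangely or an outlier in financial data; what changes is the richness of the features.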

This is where domain experts are helpful, as they can direct what might be possible and advise on extra data that might be required. For example, the share selling detection is more likely to work if weather data from Colombia were added to a coffee company share price buy/sell model.

As unsupervised learning looks for features rather than the specific patterns sought by supervised learning, it’s more likely an existing pre-learnt model can be re-used in a new domain. For example, a model taught with lots of human gestures (running, walking, jumping) might become expert in accelerometer movements generally and be used to detect the movements of a car (left, right, slowing, speeding up). Re-using a model can save the considerable time required for the learning part of machine learning.

Unsupervised Machine Learning Methods

Unsupervised learning extracts features from data without any human-provided classification hints. Most methods involve cluster analysis, grouping data based on Euclidean or probabilistic distance. Techniques include hierarchical clustering, k-means clustering, Gaussian mixture models and self-organising maps. Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs) and combinations thereof are particularly useful for detecting historical patterns in data.
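As a concrete illustration of the clustering idea, here is a minimal k-means implementation in plain NumPy, grouping synthetic sensor windows by two simple features. Everything here is illustrative; a real system would use a library implementation (e.g. scikit-learn) and far richer features.

```python
# Sketch: k-means clustering of windows of synthetic accelerometer-like
# data, using mean and variance of each window as the feature vector.
import numpy as np

rng = np.random.default_rng(2)

# Two simulated activities: low-energy ("still") and high-energy ("moving").
windows = [rng.normal(0.0, 0.5, 200) for _ in range(30)] + \
          [rng.normal(0.0, 3.0, 200) for _ in range(30)]
X = np.array([[w.mean(), w.var()] for w in windows])

def kmeans(X, k, iters=50):
    # Start from k randomly chosen data points as centres.
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre (Euclidean distance).
        labels = np.argmin(np.linalg.norm(X[:, None] - centres, axis=2), axis=1)
        # Move each centre to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels, centres

labels, centres = kmeans(X, k=2)
print(labels[:30].tolist())   # labels for the "still" windows
print(labels[30:].tolist())   # labels for the "moving" windows
```

No labels were supplied, yet the two activities end up in different clusters because their feature vectors sit far apart; this grouping-by-distance is the essence of most of the techniques listed above.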

The large choice of techniques and of configurable parameters, called hyperparameters, means that without some experience it takes a long time to stumble upon the optimum ways of working with time series data. Given that the learning part of machine learning can, in some instances, take days or months just to try one method with one set of hyperparameters, starting from scratch usually isn’t viable: the number of research permutations is too large.
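A quick back-of-envelope calculation shows why exhaustive search is impractical. The parameter names and value counts below are invented purely for illustration; real grids are typically much larger.

```python
# Even a modest hypothetical hyperparameter grid multiplies out quickly.
from itertools import product

grid = {
    "technique":     ["k-means", "SOM", "LSTM-autoencoder", "CNN-autoencoder"],
    "window_length": [32, 64, 128, 256],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "hidden_units":  [16, 32, 64, 128],
    "batch_size":    [16, 32, 64],
}

combos = list(product(*grid.values()))
print(len(combos))  # 4 * 4 * 3 * 4 * 3 = 576 runs

# At one day of training per run, the full grid would take
# well over a year of compute time.
print(len(combos) / 365, "years at one run per day")
```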

To complicate matters, techniques also have to be assessed not just on accuracy but on how easily they can be implemented in real rather than ‘desk’ situations. We cover this more in Focus on Practical Matters.

SensorCognition™ bypasses these problems as it encapsulates our knowledge of the techniques most suitable for use with time series data. Our Edge device provides a ready-made system for data capture, suitable both for prototyping and real-world use.