Machine Learning

Traditional programming uses a step-by-step program, effectively implementing a set of rules, to produce a particular output:

Traditional programming vs supervised and unsupervised machine learning

Machine learning differs in that it takes in only data to produce a particular output. In the ‘supervised’ case, it also takes in extra data in the form of answers, usually provided by a human, to guide the learning towards the right output.

Traditional programming has just one stage: you run the program and the input data is transformed into output data. Machine learning has two stages. The inside of a machine learning system can be thought of as a large, complex filter that selectively processes data from input to output. The filter, called a model, starts dumb and needs to be taught how to configure itself to achieve the desired output.

The learning stage, which takes in historical data, can take a very long time for large amounts of data and/or complex models. Once we have a learnt model, we can feed in new data and it produces (hopefully) the correct output. Seen this way, machine learning is really a complex pattern matcher.

If we look inside the model, there’s a very large number of interconnected nodes, called neurons. The nodes are separated into layers: one for input, one for output and one or more hidden layers that aren’t accessed externally.
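
As a rough sketch of that structure (illustrative only: made-up layer sizes, random starting weights and plain NumPy rather than the internals of any particular machine learning library), a tiny three-layer model might look like this:

    import numpy as np

    rng = np.random.default_rng(42)

    # Weights connecting the layers, initialised randomly (the "dumb" state)
    w_hidden = rng.normal(size=(3, 4))  # 3 input neurons -> 4 hidden neurons
    w_output = rng.normal(size=(4, 2))  # 4 hidden neurons -> 2 output neurons

    def forward(x):
        """Pass input data through the layers to produce an output."""
        hidden = np.tanh(x @ w_hidden)     # hidden layer, not accessed externally
        return np.tanh(hidden @ w_output)  # output layer

    print(forward(np.array([0.1, 0.5, -0.3])))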

Deep learning is the name for when we have multiple hidden layers. The more layers there are, the more complex the kind of processing that can be achieved.

Inside a machine learning model

The degree of interconnection between the neurons, and hence the implicit functioning of the model, is controlled by numerical values called weights. Learning is the trying of various weight values to achieve the output, driven by some clever maths, called gradient descent, that zooms in on the correct weights while quickly disregarding ranges of values that are unsuitable.
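
As a worked toy example of the idea (a single weight and made-up numbers, not a real model), gradient descent repeatedly nudges the weight ‘downhill’ on the error until it settles on a suitable value:

    import numpy as np

    # Toy data: outputs are roughly 3x the inputs, so the "correct" weight is ~3
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.1, 5.9, 9.2, 11.8])

    w = 0.0              # start with a dumb weight
    learning_rate = 0.01

    for step in range(200):
        error = x * w - y                  # how wrong the current weight is
        gradient = 2 * np.mean(error * x)  # slope of the error w.r.t. the weight
        w -= learning_rate * gradient      # step downhill, towards a better weight

    print(f"learnt weight: {w:.2f}")       # settles at roughly 3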

The very large number of combinations can cause learning to take a very long time. Since the same calculation is being performed across nodes in the model, it’s possible to use special hardware with thousands of cores, such as graphics processors (GPUs), to perform learning in parallel and significantly improve performance.

Once the weights are known, the model is saved with the weights, usually to a file. This file can be loaded to set up the model when a computer/process starts.
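
Continuing the toy sketch, saving and then reloading the weights might look like this, using NumPy’s file format purely for illustration:

    import numpy as np

    # Stand-ins for the learnt weight values
    w_hidden = np.ones((3, 4))
    w_output = np.ones((4, 2))

    # After learning: save the weights to a file
    np.savez("model_weights.npz", w_hidden=w_hidden, w_output=w_output)

    # Later, when the computer/process starts: load the file to set up the model
    weights = np.load("model_weights.npz")
    w_hidden, w_output = weights["w_hidden"], weights["w_output"]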

The second stage of machine learning is feeding new, current data into the model. This is called inference because it is inferring output from the input. The weights are known and the data is quickly processed through the model, usually in tens or hundreds of milliseconds even on low-end computers.
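
A minimal inference sketch, assuming the model_weights.npz file saved above, is just a quick pass of the new data through the model, which is why it is so much faster than learning:

    import time
    import numpy as np

    # The weights are already known; just load them
    weights = np.load("model_weights.npz")
    w_hidden, w_output = weights["w_hidden"], weights["w_output"]

    new_data = np.array([0.2, -0.1, 0.7])  # new, current data

    start = time.perf_counter()
    output = np.tanh(np.tanh(new_data @ w_hidden) @ w_output)  # forward pass
    elapsed_ms = (time.perf_counter() - start) * 1000

    print(output, f"inferred in {elapsed_ms:.3f} ms")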

Advantages of Unsupervised Learning

In the post on supervised and unsupervised learning it was mentioned that unsupervised learning doesn’t need labelled data and results in features being detected. Features are patterns in the data.

The main advantage of unsupervised learning is that labelled data isn’t required. As labelling usually has to be performed manually, this saves a significant amount of time. In some situations, the sheer quantity of data means it’s not practically possible to classify it manually.

As the input data isn’t labelled, there’s no extra human influence on the input and hence no human error or human bias.

The model detects features in the data that can be sub-features, features of interest or a mix of features. For example, human gesture recognition produces features of interest (e.g. walking, running, jumping), sub-features (e.g. movement upwards or downwards) and combinations of features (e.g. jumping while running).

As sub-features are being detected, the same model can sometimes be used to detect features of interest it wasn’t trained on. For example, a human gesture model trained on running, sitting down and walking might detect enough parts of the movement for a combination of features to signify lying down.

As the model isn’t directed (supervised) to find specific things, it can also find hidden features in the data. For example, a model trained to find features of interest in vehicle driving (turning left, turning right, stopped, accelerating, slowing) might inadvertently also detect potholes in the road.

More usefully, unsupervised models can be purposely used to find hidden features in the data that a human can’t correlate and hence can’t detect. For example, a model might be used to find:

  • a pattern in a vulnerable person’s movement that indicates they are about to fall,
  • a pattern in complex sensing of an industrial motor that indicates it is about to fail,
  • a hidden pattern in share price data that indicates you should sell.

The above examples presuppose there’s enough information in the data to detect such things. The last, share selling, is a case in point, as there’s often insufficient detail in financial data to make such a determination.

This is where domain experts are helpful, as they can help direct what might be possible and advise on extra data that might be required. For example, the share selling detection is more likely to work if you added weather data for Colombia to a coffee company share price buy/sell model.

As unsupervised learning looks for features rather than the specific patterns sought by supervised learning, it’s more likely that an existing pre-learnt model can be re-used in a new domain. For example, a model taught with lots of human gestures (running, walking, jumping) might become expert in the movements of an accelerometer and be used for detecting the movements of a car (left, right, slowing, speeding up). Re-use of a model can save the considerable time required for the learning part of machine learning.

Unsupervised Machine Learning Methods

Unsupervised learning extracts features from data without any human-provided classification hints. Most methods involve cluster analysis, grouping data based on Euclidean or probabilistic distance. Techniques include hierarchical clustering, k-means clustering, Gaussian mixture models and self-organising maps. Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs) and combinations thereof are particularly useful for detecting patterns over time in data.
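
To give a flavour of the simplest of these techniques, here’s k-means clustering on made-up sensor readings, assuming scikit-learn is available; real time series work would use the more involved methods above:

    import numpy as np
    from sklearn.cluster import KMeans  # assumes scikit-learn is installed

    # Toy readings that fall into two obvious bands
    readings = np.array([[0.1], [0.2], [0.15], [5.0], [5.2], [4.9]])

    # Group the readings by distance, with no labels provided by a human
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(readings)
    print(labels)  # e.g. [0 0 0 1 1 1] -- each reading assigned to a cluster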

The large choice of techniques, and of configurable parameters (called hyperparameters) for each technique, means that without some experience it takes a long time to stumble upon the optimum ways of working with time series data. Given that the learning part of machine learning can, in some instances, take days or months just to try out one method with one set of hyperparameters, starting from scratch usually isn’t viable: the number of research permutations is too large.

To complicate matters, techniques also have to be assessed not just on accuracy but also on how easily they can be implemented in real rather than ‘desk’ situations. We cover this more in Focus on Practical Matters.

SensorCognition™ bypasses these problems as it encapsulates our knowledge of the set of techniques most suitable for use with time series data. Our Edge device provides a ready-made system for data capture suitable for both prototyping and real-world use.

Detect, Classify and Predict

Machine learning finds and uses insights in data that are difficult or impossible for humans to identify and that are tricky to extract using conventional algorithmic programming. The problems that can be solved are wide-ranging and might be, for example, scheduling difficulties, maintenance issues, process problems, product quality issues or even worker-related matters.

There are broadly four main ways machine learning can help. The first is detection, which is taking in (usually complex) input data and determining whether something has just happened. For example, you might detect that a motor has failed in a predictable way by processing its vibration.

The second is classification, where you want to know the type of something that has just happened. For the motor, you might want to know whether it’s running at high, medium or low speed based on a non-contact rotational proximity sensor.
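
A toy sketch of the classification case (this particular example is supervised, with made-up speed readings and human-provided labels, and assumes scikit-learn):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # assumes scikit-learn

    # Made-up rotational speeds (RPM) from the proximity sensor, with labels
    rpm = np.array([[500], [650], [1500], [1600], [2900], [3100]])
    speed = ["low", "low", "medium", "medium", "high", "high"]

    # Learn the mapping from speed reading to class, then classify new readings
    model = DecisionTreeClassifier().fit(rpm, speed)
    print(model.predict([[1550], [3000]]))  # -> ['medium' 'high']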

Related to both detection and classification is anomaly detection. This is knowing that something isn’t behaving normally. For a motor, this might be detecting that it has failed in an unpredictable way by processing its vibration.
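
A deliberately simple sketch of the idea, using a statistical threshold on made-up vibration readings; a real system would work on learnt features rather than raw values:

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up vibration readings: mostly normal behaviour, then one odd spike
    normal = rng.normal(loc=1.0, scale=0.1, size=100)
    readings = np.append(normal, 2.5)  # the motor misbehaves on the last sample

    # Flag readings far from learnt normal behaviour (> 4 standard deviations)
    mean, std = normal.mean(), normal.std()
    anomalies = np.abs(readings - mean) > 4 * std
    print(np.where(anomalies)[0])      # -> [100], the unpredicted failure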

The final capability is prediction, where a pattern in the data means something is about to happen. In the case of the motor, a subtle sound or vibration change might signify that the motor is about to fail.
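
And a toy sketch of prediction: fit a trend to recent (made-up) vibration levels and warn when they creep upwards, before the failure actually happens:

    import numpy as np

    # Made-up vibration levels slowly creeping upwards ahead of a failure
    levels = np.array([1.00, 1.01, 1.00, 1.03, 1.05, 1.08, 1.12, 1.17])

    # Fit a straight line; a rising slope is the subtle change that warns of trouble
    slope = np.polyfit(np.arange(len(levels)), levels, 1)[0]
    if slope > 0.01:  # threshold chosen purely for illustration
        print(f"warning: vibration trending upwards ({slope:.3f} per sample)")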

In terms of your business, the previously mentioned business problems need to be mapped onto detection, classification, anomaly detection or prediction. We help you with this as part of the machine learning development process.

Shape Classification Demonstration

We have created a demonstration of using SensorCognition™ to classify shapes drawn with an accelerometer sensor.

The demonstration video is best viewed full screen.

This example is deliberately simplistic, so as to demonstrate the concepts and workflow. In practice, the output features require more sophisticated analysis. The features can also be analysed to detect anomalies and make predictions.