Getting Started

We are entering a new age where many organisations will need to incorporate AI in order to remain competitive. The tricky part for many owners and managers is knowing where to start.

It turns out the starting place has nothing to do with AI or machine learning and instead involves what should be familiar territory. Start by looking at your current processes.

Most organisations have significant investment in legacy hardware, software and processes that can’t be replaced overnight. AI and machine learning can significantly improve key aspects of these legacy systems. Once other stakeholders see the gains in efficiency, reduced cost and increased competitiveness, you will be able to propose more far-reaching changes.

So how do you improve key aspects of your legacy systems? The secret is to start thinking about what costs your organisation the most. These could be physical things, processes or even people’s time.

Some simple examples. While cargo ships are expensive, the largest running cost is fuel, and the largest financial ‘losses’ are caused by downtime during preventative maintenance. In health services, we spend a considerable amount on treating symptoms rather than preventing illness. In the finance industry, many people use primitive ‘gut feeling’ approaches to investing money that can prove costly.

The next step is to create some goals. Continuing the examples, we need to reduce the use of cargo ship fuel. We might lengthen the time between preventative maintenance if we can better predict when things are likely to fail. In some cases we might even replace preventative maintenance with prognostics (condition-based maintenance). In health, we need to concentrate on early detection and illness prevention. In finance, we need to invest using approaches we understand better and hence carry better-understood risk.

Now think about what data describes the factors that affect the outcomes of these scenarios. For the cargo ship case, the use of fuel might be affected by routes and speeds. Sensors might detect vibration to aid ship machinery prognostics. In health we might have medical instrument data. In finance, we might have weather data that affects, for example, investments in (grown) commodities.

The key thing is that, in the past, it has been very difficult for humans to use this data to derive insights. The combinations of data and possible methods are huge. This is where machine learning excels.

In very simple terms, we pass this (past) data into a neural network during a process called learning. This creates a model which, when fed current data, might for example report ship efficiency, predict machinery failure, assess health or tell you when to buy or sell shares.
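
To make this concrete, here is a minimal sketch in Python. It uses scikit-learn’s MLPRegressor as a stand-in neural network, and the voyage and fuel figures are invented purely for illustration (in practice the inputs would also be scaled first):

    # Learning from past data, then inferring from current data.
    from sklearn.neural_network import MLPRegressor

    # Past data: [route length km, average speed knots] -> fuel used (tonnes).
    past_inputs = [[5000, 18], [5000, 14], [8000, 20], [8000, 15]]
    past_fuel = [410, 320, 730, 540]

    # Learning stage: the network adjusts itself to fit the past data.
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    model.fit(past_inputs, past_fuel)

    # Inference stage: feed in current data to estimate fuel for a planned voyage.
    print(model.predict([[6000, 16]]))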

In summary, AI and machine learning don’t require a big bang approach to change in your organisation. Concentrate on the costly problems in your organisation rather than letting the technology lead the innovation.

Machine Learning

Traditional programming uses a step by step program, effectively implementing a set of rules, to produce a particular output:

Traditional programming vs supervised and unsupervised machine learning

Machine learning differs in that it only takes in data to produce a particular output. In the ‘supervised’ case, it also takes in extra data in the form of answers, usually provided by a human, to guide the learning towards the right output.

Traditional programming has just one stage: you run the program and the input data is transformed into output data. Machine learning has two stages. The inside of machine learning can be thought of as a large, complex filter that selectively processes data from input to output. The filter, called a model, starts dumb and needs to be taught how to configure itself to achieve the output.
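
As a simple illustration of the difference, the sketch below contrasts a hand-written rule with a rule learned from labelled examples. It assumes scikit-learn is available, and the temperatures and labels are invented:

    # Traditional programming: a human writes the rule explicitly.
    from sklearn.tree import DecisionTreeClassifier

    def traditional_rule(temperature):
        return "overheating" if temperature > 90 else "normal"

    # Supervised machine learning: data plus answers in, the rule is learned.
    temperatures = [[60], [70], [85], [92], [95], [101]]
    answers = ["normal", "normal", "normal", "overheating", "overheating", "overheating"]
    model = DecisionTreeClassifier().fit(temperatures, answers)

    print(traditional_rule(93))       # output from the hand-written rule
    print(model.predict([[93]])[0])   # output from the learned model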

The learning stage, which takes in old data, can take a very long time for large amounts of data and/or complex models. Once we have a learnt model, we can feed in new data and it (hopefully) produces the correct output. It can be seen that machine learning is really a complex pattern matcher.

If we look inside the model, there’s a very large number of interconnected nodes, called neurons. The nodes are separated into layers, one for input, one for output and one or more hidden layers that aren’t accessed externally.

Deep learning is the name for when we have multiple hidden layers. The more layers there are, the more complex the kind of processing that can be achieved.
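
Here is a sketch of that layered structure, using PyTorch as one possible toolkit; the layer sizes are arbitrary and chosen only to show the shape of input, hidden and output layers:

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(3, 16),   # 3 input values feeding the first hidden layer of 16 neurons
        nn.ReLU(),
        nn.Linear(16, 16),  # a second hidden layer (multiple hidden layers = deep learning)
        nn.ReLU(),
        nn.Linear(16, 2),   # final layer producing 2 output values
    )
    print(model)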

Inside a machine learning model

The degree of interconnection between the neurons, and hence the implicit functioning of the model, is controlled by numerical values called weights. Learning is the trying of various weight values to achieve the output, driven by some clever maths, called gradient descent, to zoom in on the correct weights while quickly disregarding ranges of values that are unsuitable.
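
As a toy illustration of the idea, the following pure-Python sketch uses gradient descent to find a single weight that fits y = 3x; a real model applies the same idea to millions of weights at once:

    data = [(1, 3), (2, 6), (3, 9)]   # (input, desired output) pairs
    weight = 0.0                       # the model starts "dumb"
    learning_rate = 0.05

    for step in range(200):
        # Slope of the average squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient   # move the weight towards lower error

    print(round(weight, 3))   # converges towards 3.0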

The very large number of combinations can cause learning to take a very long time. Since the same calculation is being done across nodes in the model, it’s possible to use special hardware with thousands of cores, such as graphics processors (GPUs), to perform learning in parallel and significantly improve performance.

Once the weights are known, the model is saved with the weights, usually to a file. This file can be loaded to set up the model when a computer/process starts.
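
Sketched below with PyTorch, assuming that is the toolkit in use; the model shape and file name are arbitrary:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 2))

    # After learning, persist the weights so learning doesn't have to be repeated.
    torch.save(model.state_dict(), "model_weights.pt")

    # Later, perhaps on another machine, rebuild the model and load the saved weights.
    restored = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 2))
    restored.load_state_dict(torch.load("model_weights.pt"))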

The second stage of machine learning is feeding new, current data into the model. This is called inference because it is inferring output from the input. The weights are already known and the data is processed through the model quickly, usually in tens or hundreds of milliseconds even on low-end computers.
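
A small sketch of inference, again using PyTorch; the input values are invented and the model here is untrained, so only the mechanics and the timing are meaningful:

    import time
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 2))
    model.eval()                                  # inference mode, no learning

    new_data = torch.tensor([[0.2, 1.4, 0.7]])    # one new reading with 3 input values

    start = time.perf_counter()
    with torch.no_grad():                         # just a forward pass through the model
        output = model(new_data)
    print(output, f"{(time.perf_counter() - start) * 1000:.2f} ms")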

Supervised and Unsupervised Learning

In the post on machine learning it was explained how supervised learning includes extra information passed into the model, called labels, that classifies the input data. For example, human gesture data might include labels saying whether the data represents running, walking or jumping. This helps the learning stage narrow in on the best neuron weights to achieve the required output.

It’s usually difficult to obtain labelled data, especially in situations dealing with sensor time series data. Labelling often has to be performed manually or added to the data afterwards. For example, to create a model to infer human gestures from accelerometer data we need to initially record the x, y, z values and the gesture at that time. This can be tedious, error prone and open to human bias.
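
For illustration, labelled accelerometer data might look like the following, with every reading manually tagged with the gesture; the numbers and labels are invented:

    labelled_samples = [
        {"x": 0.02, "y": 0.98, "z": 0.11, "label": "walking"},
        {"x": 0.65, "y": 1.90, "z": 0.40, "label": "running"},
        {"x": 0.01, "y": 0.05, "z": 0.99, "label": "jumping"},
    ]

    # Supervised learning consumes the readings as inputs and the labels as the answers.
    inputs = [[s["x"], s["y"], s["z"]] for s in labelled_samples]
    answers = [s["label"] for s in labelled_samples]
    print(inputs, answers)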

So how does learning on unsupervised data know what to concentrate on to create the best model? It doesn’t. Instead, unsupervised methods concentrate on finding features in the data. Each feature has a numerical value signifying its strength.

Heatmap of 256 features from SensorCognition™ edge device unsupervised model

Going back to the human gesture example, the model might see a common up-then-down sequence in the x axis and output this as a feature. The model outputs lots, usually hundreds, of features that might be sub-features, features of interest (e.g. sitting, running) or a mix of features (e.g. jumping while running).

There needs to be something, usually traditional code, data science processing or more machine learning, to turn the output features into detection of the feature of interest (e.g. sitting), classification of the gesture or prediction of the next gesture. A manual way of finding features for detection and classification is to feed known gestures into inference and see which features fire, as sketched below. This obviously involves human effort but is much less effort than labelling all the supervised input.
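
A sketch of that manual mapping step in Python; the feature vectors below stand in for the real unsupervised model’s output and are invented for illustration:

    import numpy as np

    # Hypothetical feature strengths produced by inference for known gestures.
    example_outputs = {
        "sitting": np.array([0.1, 0.9, 0.2, 0.1]),
        "running": np.array([0.8, 0.1, 0.7, 0.2]),
        "jumping": np.array([0.2, 0.1, 0.8, 0.9]),
    }

    for gesture, features in example_outputs.items():
        firing = np.where(features > 0.5)[0]   # features that respond strongly
        print(f"{gesture}: features {firing.tolist()} fire")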


Advantages of Unsupervised Learning

In the post on supervised and unsupervised learning it was mentioned that unsupervised learning doesn’t need labelled data and results in features being detected. Features are patterns in the data.

The main advantage of unsupervised learning is labelled data isn’t required. As labelling usually has to be performed manually this saves a significant amount of time. In some situations, the quantity of the data means it’s not physically possible to manually classify the data.

As the input data isn’t labelled there’s no extra human influence on the input and hence no human error or human bias.

The model detects features in the data that can be sub-features, features of interest or a mix of features. For example, human gesture recognition produces features of interest (e.g. walking, running, jumping), sub-features (movement upwards, downwards) and combinations of features (jumping while running).

As sub-features are being detected, the same model can sometimes be used to detect features of interest that it wasn’t trained on. For example, a human gesture model trained on running, sitting down and walking might detect enough parts of the movement to also allow a combination of features to signify lying down.

As the model isn’t directed (supervised) to find specific things, it can also find hidden features in the data. For example, a model trained to find features of interest in vehicle driving (turning left, turning right, stopped, accelerating, slowing) might inadvertently also detect potholes in the road.

More usefully, unsupervised models can be purposely used to find hidden features in the data that a human can’t correlate and hence detect. For example, it might be used to find:

  • a pattern in a vulnerable person’s movement that indicates they are about to fall,
  • a pattern in complex sensing of an industrial motor that indicates it is about to fail,
  • a hidden pattern in share price data that indicates you should sell.

The above presuppose there’s enough information in the data to detect such things. The latter share selling detection is a case in point as there’s often insufficient detail in financial data for such determination.

This is where domain experts are helpful as they can help direct what might be possible and advise on extra data that might be required. For example, the share selling detection is more likely to work if you added weather in Colombia to a coffee company share price buy/sell model.

As unsupervised learning looks for features rather than, as with supervised learning, specific patterns in the data, it’s more likely an existing pre-learnt model can be re-used in a new domain. For example, a model taught with lots of human gestures (running, walking, jumping) might become expert in the movements of an accelerometer and be used for detecting movements of a car (left, right, slowing, speeding up). Re-use of a model can save the considerable time required for the learning part of machine learning.

Unsupervised Machine Learning Methods

Unsupervised learning extracts features from data without any human-provided classification hints. Most methods involve cluster analysis, grouping based on Euclidean or probabilistic distance. Techniques include hierarchical clustering, k-Means clustering, Gaussian mixture models and self-organising maps. Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), and combinations thereof, are particularly useful for detecting historical patterns in data.
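
As a small example of one of these techniques, the sketch below uses scikit-learn’s k-Means to group unlabelled readings; the two-dimensional points are invented for illustration:

    from sklearn.cluster import KMeans

    readings = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.85],
                [0.88, 0.90], [0.50, 0.05], [0.52, 0.10]]

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(readings)
    print(kmeans.labels_)            # which group each reading was assigned to
    print(kmeans.cluster_centers_)   # the centre of each discovered group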

The large choice of techniques, and of configurable parameters (called hyperparameters) for each technique, means that without some experience it takes a long time to stumble upon the optimum ways of working with time series data. Given that the learning part of machine learning can, in some instances, take days or months just to try out one method with one set of hyperparameters, starting from scratch usually isn’t viable. The number of research permutations is too large.
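
To see why the permutations explode, consider even a modest, hypothetical grid of hyperparameters:

    from itertools import product

    hidden_layers = [(32,), (64, 32), (128, 64, 32)]
    learning_rates = [0.001, 0.01, 0.1]
    window_lengths = [50, 100, 200]   # how much time series history each sample covers

    combinations = list(product(hidden_layers, learning_rates, window_lengths))
    print(len(combinations), "training runs for just this small grid")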

To complicate matters, techniques also have to be assessed not just on accuracy but also on how easily they can be implemented in real rather than ‘desk’ situations. We cover this more in Focus on Practical Matters.

SensorCognition™ bypasses these problems as it encapsulates our knowledge of the techniques most suitable for use with time series data. Our Edge device provides a ready-made system for data capture, suitable both for prototyping and for real-world use.