Getting Started

We are entering a new age where many organisations will need to incorporate AI in order to remain competitive. The tricky part for many owners and managers is knowing where to start.

It turns out the starting place has nothing to do with AI or machine learning and instead involves what should be familiar areas: start by looking at your current processes.

Most organisations have significant investment in legacy hardware, software and processes that can’t be replaced overnight. AI machine learning can significantly improve key aspects of your legacy systems. Once other stakeholders see the gains in efficiency, reduced cost and increased competitiveness, you will be able to propose more far-reaching changes.

So how do you improve key aspects of your legacy systems? The secret is to start thinking about what costs your organisation the most. These could be physical assets, processes or even people’s time.

Some simple examples. While cargo ships are expensive, the largest running cost is fuel, and the largest financial ‘losses’ are caused by downtime for preventative maintenance. In health services, we spend a considerable amount on treating symptoms rather than preventing illness. In the finance industry, many people use primitive ‘gut feeling’ approaches to investing money that can prove costly.

Consider what you might do to reduce costs. For example, you might lengthen the time between preventative maintenance if you can better predict when things are likely to fail. In some cases you might even replace preventative maintenance with prognostics (condition-based maintenance). In health, you need to concentrate on early detection and illness prevention. In finance, you need to invest using approaches you understand better and hence carry better-understood risk.

Think about what data describes the factors that affect the outcomes of your high-cost scenarios. For the cargo ship case, the use of fuel might be affected by routes and speeds. Sensors might detect vibration to aid ship machinery prognostics. In health, you might have medical instrument data. In finance, you might have weather data that might, for example, affect investments in (grown) commodities.

The key thing is that, in the past, it has been very difficult for humans to use this data to derive insights. The combinations of data and possible methods are huge. This is where machine learning excels.

In very simple terms, we pass this (past) data into a neural network during a process called learning. This creates a model which, when fed current data, might for example estimate ship efficiency, predict machinery failure, assess health or tell you when to buy or sell shares.
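
As a rough, hedged sketch of those two steps, the snippet below trains a small neural network on made-up historical sensor readings and then asks it about a current reading. The data, column meanings and model choice are illustrative assumptions rather than a recommended setup.

    # A minimal sketch: learn from historical (past) data, then infer on current data.
    # The sensor readings and failure labels here are randomly generated stand-ins.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    past_readings = rng.normal(size=(1000, 4))   # e.g. vibration, temperature, speed, load
    past_failures = (past_readings[:, 0] > 1.0).astype(int)  # 1 = failed soon after, 0 = ran fine

    model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500)
    model.fit(past_readings, past_failures)      # the 'learning' stage

    current_reading = rng.normal(size=(1, 4))
    print(model.predict(current_reading))        # the model infers: likely failure or not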

In summary, AI machine learning doesn’t require a big bang approach to change in your organisation. Concentrate on the costly problems in your organisation rather than letting the technology lead the innovation.

The Business Case

Physical sensors allow you to collect historical and current information on systems, sub-systems, components and even people. The aggregate of this information provides state information on processes.

This can be in industry, health, hospitality, utilities, education or transportation. Whatever the domain, the goal is to provide actionable alerts that enable intelligent decision-making for improved performance, safety, reliability or maintainability.

Alerts are either diagnostic or prognostic in nature: they tell you either the current status or an anticipated impending situation. They allow you to:

  • Prevent something happening that might be costly or dangerous. For example, in manufacturing, significant damage to manufacturing equipment, the products being fabricated or costly downtime. In healthcare, someone is about to fall.
  • Reduce the need for costly preventative manual checking or over-zealous regular replacement. For example, in manufacturing, reducing the time and costs for maintenance of products or processes. In healthcare, reducing the need for wasted human effort monitoring patients who are ok the majority of the time.

The overall aim is to save human effort while also avoiding failure and significant disruptions. Achieving this using traditional algorithmic programming is difficult if not impossible due to:

  • Noise in gathered data and the variance in environmental and operating conditions
  • The possibility of false alarms due to the difficulty with dealing with uncertainties
  • The scarce nature of intermittent events making them difficult to measure and hence predict
  • The complexity of some processes having many process factors
  • The closed nature of some existing systems that already take measurements but don’t make the data accessible
  • The varying nature of scenarios and end-user requirements preventing standard solutions

AI machine learning with auxiliary sensors is ideal for making sense of such complexity.

Machine Learning

Traditional programming uses a step by step program, effectively implementing a set of rules, to produce a particular output:

Traditional programming vs supervised and unsupervised machine learning

Machine learning differs in that it only takes in data to produce a particular output. In the ‘supervised’ case, it also takes in extra data in the form of answers, usually provided by a human, to guide the learning towards the right output.
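
To make the contrast concrete, here is a minimal sketch using a made-up temperature-alert example: a hand-written rule versus a model that learns the rule from example answers (labels). The threshold, data and labels are invented for illustration, and any simple supervised learner would do.

    # Traditional programming: a human writes the rule.
    def too_hot(temperature_c):
        return temperature_c > 75.0   # the 'rule' is fixed in code

    # Supervised machine learning: the rule is learnt from example answers (labels).
    from sklearn.tree import DecisionTreeClassifier

    temperatures = [[60.0], [70.0], [76.0], [90.0]]   # input data
    answers      = [0, 0, 1, 1]                       # human-provided labels: 1 = too hot
    model = DecisionTreeClassifier().fit(temperatures, answers)
    print(model.predict([[80.0]]))                    # the learnt 'rule' applied to new data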

Traditional programming has just one stage: you run the program and the input data is transformed into output data. Machine learning has two stages. The inside of machine learning can be thought of as a large, complex filter that selectively processes data from input to output. The filter, called a model, starts dumb and needs to be taught how to be configured to achieve the required output.

The learning stage, which takes in old data, can take a very long time for large amounts of data and/or complex models. Once we have a learnt model, we can feed in new data and it produces (hopefully) the correct output. Seen this way, machine learning is really a complex pattern matcher.

If we look inside the model, there’s a very large number of interconnected nodes, called neurons. The nodes are separated into layers, one for input, one for output and one or more hidden layers that aren’t accessed externally.

Deep learning is the name for when we have multiple hidden layers. The more layers there are, the more complex the kind of processing that can be achieved.

Inside a machine learning model
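
As a minimal sketch of what these layers look like in practice, assuming the Keras library and arbitrary layer sizes, the snippet below defines a small deep model with an input layer, two hidden layers and an output layer.

    # A small 'deep' model: input layer, two hidden layers, output layer.
    # Layer sizes here are arbitrary; real models are shaped by the problem and data.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),             # input: e.g. four sensor readings
        tf.keras.layers.Dense(32, activation="relu"),  # hidden layer 1
        tf.keras.layers.Dense(16, activation="relu"),  # hidden layer 2
        tf.keras.layers.Dense(1, activation="sigmoid") # output: e.g. probability of failure
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()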

The degree of interconnection between the neurons, and hence the implicit functioning of the model, is controlled by numerical values called weights. Learning is the trying of various weight values to achieve the output, driven by some clever maths called gradient descent, which zooms in on the correct weights while quickly disregarding ranges of values that are unsuitable.
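
The snippet below is a deliberately tiny, hedged illustration of the idea: a single weight is repeatedly nudged in the direction that reduces the error on made-up data, which is gradient descent in miniature.

    # Gradient descent on a single weight, fitting y = 3x from made-up data.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 3.0 * x                      # the 'right answer' the weight should converge to
    w = 0.0                          # the weight starts 'dumb'
    learning_rate = 0.01

    for step in range(200):
        error = w * x - y                   # how wrong the current weight is
        gradient = 2 * np.mean(error * x)   # the direction that increases the error
        w -= learning_rate * gradient       # step the opposite way

    print(w)                         # close to 3.0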

The very large number of combinations can cause learning to take a very long time. Since the same computation is being performed across nodes in the model, it’s possible to use special hardware with thousands of processing cores, such as graphics processors (GPUs), to perform learning in parallel and significantly improve performance.

Once the weights are known, the model is saved with the weights, usually to a file. This file can be loaded to set up the model when a computer/process starts.

The second stage of machine learning is feeding new, current data into the model. This is called inference because it is inferring output from the input. The weights are known and the data is quickly processed through the model, usually in tens or hundreds of milliseconds even on low-end computers.
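
A minimal sketch of saving a model to a file and later loading it for inference, assuming Keras; the file name, layer sizes and data are illustrative assumptions.

    # Save a model (including its weights) to a file, then load it back for inference.
    # In practice you would train it first with model.fit(...) on your historical data.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.save("machinery_model.keras")              # illustrative file name

    # Later, e.g. when the monitoring process starts up:
    loaded = tf.keras.models.load_model("machinery_model.keras")
    new_reading = np.random.default_rng(0).normal(size=(1, 4))
    print(loaded.predict(new_reading))               # inference, typically milliseconds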

Supervised and Unsupervised Learning

In the post on machine learning it was explained how supervised learning includes extra information passed into the model, called labels, that classify the input data. For example, human gesture data might include labels saying whether the data is running, walking or jumping. This helps the learning stage narrow in on the best neuron weights to achieve the required output.

It’s usually difficult to obtain labelled data, especially in situations dealing with sensor time series data. Labelling often has to be performed manually or added to the data later. For example, to create a model to infer human gestures from accelerometer data we need to initially record the x, y, z values and the gesture at that time. This can be tedious, error prone and open to human bias.
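
A hedged sketch of what such labelled data might look like: accelerometer samples paired with a manually recorded gesture label. The values and gesture names are invented for illustration.

    # Labelled accelerometer data: each row is one sample plus a human-provided gesture label.
    import pandas as pd

    labelled = pd.DataFrame({
        "x":       [0.02, 0.15, 0.90, 0.88, 0.05],
        "y":       [0.98, 1.02, 0.40, 0.35, 0.99],
        "z":       [0.01, 0.03, 0.20, 0.22, 0.02],
        "gesture": ["walking", "walking", "jumping", "jumping", "walking"],  # the labels
    })

    features = labelled[["x", "y", "z"]]   # what the model sees
    labels   = labelled["gesture"]         # the 'answers' that supervise the learning
    print(labelled)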

So how does unsupervised learning know what to concentrate on to create the best model? It doesn’t. Instead, unsupervised methods concentrate on finding features in the data. Each feature has a numerical value signifying its strength.

Heatmap of 256 features from SensorCognition™ edge device unsupervised model

Going back to the human gesture example, the model might see a common up-then-down sequence in the x axis and output this as a feature. The model outputs lots, usually hundreds, of features that might be sub-features, features of interest (e.g. sitting, running) or a mix of features (jumping while running).
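
One common way to obtain such features, sketched below under the assumption of windowed accelerometer data and the Keras library, is an autoencoder: it is trained only on the input data, with no labels, and its middle layer becomes the feature vector. The window size and feature count are arbitrary choices for illustration.

    # Unsupervised feature extraction with a small autoencoder (no labels involved).
    import numpy as np
    import tensorflow as tf

    # Pretend data: 500 windows of 50 accelerometer samples x 3 axes, flattened.
    windows = np.random.default_rng(0).normal(size=(500, 150))

    inputs   = tf.keras.layers.Input(shape=(150,))
    features = tf.keras.layers.Dense(16, activation="relu", name="features")(inputs)
    outputs  = tf.keras.layers.Dense(150)(features)      # tries to reconstruct the input

    autoencoder = tf.keras.Model(inputs, outputs)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(windows, windows, epochs=5, verbose=0)  # the input is also the target

    encoder = tf.keras.Model(inputs, features)            # keep just the feature extractor
    print(encoder.predict(windows[:1]))                   # one 16-value feature vector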

There’s then usually a further step, whether simple traditional code, data science processing or more machine learning, to turn the output features into detection of the feature of interest (e.g. sitting), detection of anomalies (e.g. falling), classification of the gesture or prediction of the next gesture (e.g. walking after running). A manual way of finding features for detection and classification is to feed known gestures into inference and see which features fire. This obviously involves human effort but is much less effort than labelling all the supervised input.
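
A hedged sketch of that manual approach: run a few known gestures through a feature extractor and note which features fire most strongly for each. The extractor and gesture windows below are random stand-ins for a real learnt encoder and real recordings.

    # Feed known gestures through a feature extractor and see which features 'fire'.
    import numpy as np
    import tensorflow as tf

    # Stand-in feature extractor; in practice this would be the encoder learnt earlier.
    encoder = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(150,)),
        tf.keras.layers.Dense(16, activation="relu"),
    ])

    rng = np.random.default_rng(0)
    known_gestures = {                      # stand-ins for recorded windows of known gestures
        "sitting": rng.normal(size=(20, 150)),
        "running": rng.normal(size=(20, 150)),
        "jumping": rng.normal(size=(20, 150)),
    }

    for name, windows in known_gestures.items():
        feature_strengths = encoder.predict(windows, verbose=0).mean(axis=0)
        strongest = np.argsort(feature_strengths)[-3:]   # indices of the top three features
        print(name, "fires features", strongest)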


Advantages of Unsupervised Learning

In the post on supervised and unsupervised learning it was mentioned that unsupervised learning doesn’t need labelled data and results in features being detected. Features are patterns in the data.

The main advantage of unsupervised learning is that labelled data isn’t required. As labelling usually has to be performed manually, this saves a significant amount of time. In some situations, the quantity of data means it’s not physically possible to classify it manually.

As the input data isn’t labelled there’s no extra human influence on the input and hence no human error or human bias.

The model detects features in the data that can be sub-features, features of interest or a mix of features. For example, human gesture recognition produces features of interest (e.g. walking, running, jumping), sub-features (movement upwards, downwards) and combinations of features (jumping while running).

As sub-features are being detected, the same model can sometimes be used to detect features of interest it wasn’t trained on. For example, a human gesture model trained on running, sitting down and walking might detect enough parts of the movement to also allow a combination of features to signify lying down.

As the model isn’t directed (supervised) to find specific things, it can also find hidden features in the data. For example, a model trained to find features of interest in vehicle driving (turning left, turning right, stopped, accelerating, slowing) might inadvertently also detect potholes in the road.

More usefully, unsupervised models can be purposely used to find hidden features in the data that a human can’t correlate and hence detect. For example, it might be used to find:

  • a pattern in a vulnerable person’s movement that indicates they are about to fall,
  • a pattern in complex sensing of an industrial motor that indicates it is about to fail,
  • a hidden pattern in share price data that indicates you should sell.

The above presupposes there’s enough information in the data to detect such things. The share selling example in particular is a case in point, as there’s often insufficient detail in financial data to make such a determination.

This is where domain experts are helpful, as they can help direct what might be possible and advise on extra data that might be required. For example, the share selling detection is more likely to work if you added Colombian weather data to a coffee company share price buy/sell model.

As unsupervised learning looks for features rather than, as in supervised learning, specific patterns in the data, it’s more likely an existing pre-learnt model can be re-used in a new domain. For example, a model taught with lots of human gestures (running, walking, jumping) might become expert in the movements of an accelerometer and be used for detecting the movements of a car (left, right, slowing, speeding up). Re-use of a model can save the considerable time required for the learning part of machine learning.
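
A minimal sketch of such re-use, assuming Keras: take a previously learnt feature extractor (here a random stand-in), freeze its weights, and train only a small new output layer on the new domain’s data. The layer sizes and data are assumptions for illustration.

    # Re-use a pre-learnt feature extractor in a new domain (vehicle movements).
    import numpy as np
    import tensorflow as tf

    # Stand-in for an encoder previously learnt on human gesture accelerometer data.
    encoder = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(150,)),
        tf.keras.layers.Dense(16, activation="relu"),
    ])
    encoder.trainable = False                              # keep the learnt weights as they are

    reused = tf.keras.Sequential([
        encoder,
        tf.keras.layers.Dense(4, activation="softmax"),    # new output: left, right, slowing, speeding up
    ])
    reused.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    rng = np.random.default_rng(0)
    car_windows = rng.normal(size=(200, 150))              # stand-in car accelerometer windows
    car_labels  = rng.integers(0, 4, size=200)             # 0=left, 1=right, 2=slowing, 3=speeding up
    reused.fit(car_windows, car_labels, epochs=3, verbose=0)  # only the new layer is trained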