Traditional programming uses a step-by-step program, effectively implementing a set of rules, to produce a particular output:
Machine learning differs in that it takes in only data to produce a particular output. In the ‘supervised’ case, it also takes in extra data in the form of answers, usually provided by a human, to guide the learning towards the right output.
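The contrast can be sketched in a few lines of Python. The temperature-conversion task, the example data, and the line fit standing in for learning are all invented for illustration:

```python
# Traditional programming: the rule is written by a programmer.
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32

# Supervised machine learning: the "rule" (here, a slope and offset)
# is recovered from example inputs paired with human-provided answers.
examples = [(0, 32.0), (10, 50.0), (100, 212.0)]  # (input, answer) pairs

# A least-squares line fit stands in for the learning step.
n = len(examples)
sx = sum(x for x, _ in examples)
sy = sum(y for _, y in examples)
sxx = sum(x * x for x, _ in examples)
sxy = sum(x * y for x, y in examples)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
offset = (sy - slope * sx) / n

def learned_rule(celsius):
    return slope * celsius + offset
```

With only three labelled examples, the fit recovers the same multiply-and-add rule the programmer would have written by hand.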
Traditional programming has just one stage: you run the program and the input data is transformed into output data. Machine learning has two stages. The inside of a machine learning system can be thought of as a large, complex filter that selectively passes data from input to output. The filter, called a model, starts out dumb and needs to be taught how to configure itself to achieve the desired output.
The learning stage, which takes in old data, can take a very long time for large amounts of data and/or complex models. Once we have a learnt model, we can feed in new data and it produces (hopefully) the correct output. Seen this way, machine learning is really a complex pattern matcher.
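As a rough sketch of the two stages, here is a deliberately tiny "model" whose whole configuration is one threshold value; the study-hours data and the exhaustive search standing in for learning are made up:

```python
# Historical data: (hours of study, passed exam?) pairs, human-labelled.
old_data = [(1, False), (2, False), (3, False), (6, True), (7, True), (9, True)]

def learn(examples):
    # Stage 1 (slow): try every candidate threshold, keep the one
    # that makes the fewest mistakes on the old data.
    best_threshold, best_errors = None, len(examples) + 1
    for candidate, _ in examples:
        errors = sum((hours >= candidate) != passed for hours, passed in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

threshold = learn(old_data)   # learning happens once, up front

def predict(hours):
    # Stage 2 (fast): apply the learnt configuration to new data.
    return hours >= threshold
```

All the time goes into `learn`; `predict` is then just a single comparison, which is why the second stage is so much cheaper than the first.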
If we look inside the model, there’s a very large number of interconnected nodes, called neurons. The nodes are separated into layers, one for input, one for output and one or more hidden layers that aren’t accessed externally.
Deep learning is the name for when we have multiple hidden layers. The more layers there are, the more complex the kind of processing that can be achieved.
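A layered model like this can be sketched directly, with the weight values invented purely for illustration: each neuron sums its weighted inputs and applies a simple non-linearity, and each layer feeds the next.

```python
def relu(x):
    # A common, simple non-linearity: negative values become zero.
    return max(0.0, x)

def layer(inputs, weights):
    # weights: one row of input-weights per neuron in this layer.
    return [relu(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Invented weights: 2 inputs -> 3 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
output_weights = [[1.0, -0.5, 0.25]]

def model(inputs):
    hidden = layer(inputs, hidden_weights)   # hidden layer, not seen externally
    return layer(hidden, output_weights)     # output layer
```

A deeper model would simply insert more `layer` calls between input and output.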
The degree of interconnection between the neurons, and hence the implicit functioning of the model, is controlled by numerical values called weights. Learning is the trying of various weight values to achieve the desired output, driven by some clever maths, called gradient descent, that zooms in on the correct weights while quickly disregarding ranges of values that are unsuitable.
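A minimal gradient-descent sketch, with a one-weight model, a squared-error measure, and data that follow y = 3x, all chosen here for illustration:

```python
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # inputs paired with answers

w = 0.0                  # the single weight starts "dumb"
learning_rate = 0.05
for _ in range(200):
    # Gradient of the squared error (w*x - y)**2, summed over examples.
    # Its sign says which way to adjust w; its size says roughly how far.
    grad = sum(2 * (w * x - y) * x for x, y in examples)
    w -= learning_rate * grad
```

Rather than trying weight values blindly, each step moves `w` against the gradient, so the search homes in on the value 3 without ever visiting most of the unsuitable range.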
The very large number of combinations can cause learning to take a very long time. Since the same operation is performed across nodes in the model, it’s possible to use special hardware with thousands of cores, such as graphics processors (GPUs), to perform learning in parallel and significantly improve performance.
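The idea can be roughly sketched with a thread pool standing in for the thousands of GPU cores: the same weighted-sum operation is applied across many nodes' inputs at once. The weights and inputs are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

weights = [0.5, -0.2, 0.8]
many_inputs = [[float(i), float(i + 1), float(i + 2)] for i in range(1000)]

def neuron_output(inputs):
    # The same operation, repeated for every node: a weighted sum.
    return sum(w * x for w, x in zip(weights, inputs))

# Apply the identical operation across all inputs in parallel.
with ThreadPoolExecutor() as pool:
    outputs = list(pool.map(neuron_output, many_inputs))
```

Because every node runs the identical computation on different data, the work splits cleanly across workers, which is exactly the pattern GPUs accelerate.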
Once the weights are known, the model is saved with the weights, usually to a file. This file can be loaded to set up the model when a computer/process starts.
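A sketch of saving and restoring learnt weights, using JSON as a stand-in for whatever format a real framework would use; the weight values here are invented.

```python
import json
import os
import tempfile

weights = {"hidden": [[0.5, -0.2], [0.1, 0.4]], "output": [[1.0, -0.5]]}

path = os.path.join(tempfile.gettempdir(), "model-weights.json")

# After learning: save the configuration to a file.
with open(path, "w") as f:
    json.dump(weights, f)

# At computer/process start-up: load the file to set up the model again.
with open(path) as f:
    restored = json.load(f)
```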
The second stage of machine learning is feeding new, current data into the model. This is called inference because it infers output from the input. The weights are known, and the data is quickly processed through the model, usually in tens or hundreds of milliseconds even on low-end computers.
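Inference itself can be sketched as a single pass of fixed weights over the new data; the weights and the timing code are illustrative only.

```python
import time

weights = [0.5, -0.2, 0.8]   # fixed after learning

def infer(inputs):
    # With the weights known, inference is just multiplies and adds.
    return sum(w * x for w, x in zip(weights, inputs))

start = time.perf_counter()
result = infer([1.0, 2.0, 3.0])
elapsed_ms = (time.perf_counter() - start) * 1000
```

Even for real models with millions of weights, this fixed arithmetic is why inference completes in milliseconds while learning can take hours or days.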