Firemark Singapore Thought Leadership Blog – Shen Lu
Ever since the idea of mimicking the way neurons in animal brains connect and interact using mathematical models, models we now call Artificial Neural Networks (ANNs), was first implemented in software in the 1960s, great efforts have gone into extending this idea in both theory and application.
At the theoretical level, it has been a journey from shallow to deep.
The earliest ANNs contained a single layer of neurons between input and output, while modern ANNs consist of hundreds of layers with millions of neurons.
Parallel to this theoretical development, at the application level it has been a bottom-up journey: from successfully approximating basic non-linear functions all the way to approaching or exceeding human-level accuracy at tasks such as recognising natural images and playing the game of Go, to name just a few.
In recent years, the growing number of ANN-based Computer Vision (CV) applications is a direct result of the boom in Graphics Processing Unit (GPU) computing, which makes it feasible to train deep ANNs containing millions of neurons.
This has motivated the development of ever-deeper ANNs. The depth of ANNs, however, is not the sole factor behind the success of these applications.
A novel operation used to connect layers inside modern ANNs, the so-called convolution operation, is another key contributor. This is the main reason why modern ANNs for vision are usually dubbed CNNs – Convolutional Neural Networks.
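To make the convolution operation concrete, here is a minimal sketch in plain NumPy (the function name `conv2d` and the toy image are illustrative, not from any particular library): a small kernel slides across the image, and each output value is the weighted sum of the patch it covers. With an edge-detecting kernel, the output lights up wherever intensity changes.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each output pixel is the
    weighted sum of the patch it covers (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic vertical-edge detector: responds where intensity
# changes from left to right.
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])

# Toy image: dark left half, bright right half.
img = np.zeros((5, 5))
img[:, 3:] = 1.0

response = conv2d(img, edge_kernel)
print(response)  # non-zero only in the columns spanning the edge
```

A CNN layer works the same way, except that the kernel weights are not hand-designed but learned from data, and each layer applies many kernels in parallel.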
CNNs extract so-called hierarchical features from images: lower CNN layers capture the fine details of the objects in an image, while higher CNN layers capture high-level representations of those objects. These hierarchical features closely resemble the way the human visual system works when we look at an image and, at the same time, recognise the objects in it.
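One way to see where this hierarchy comes from is the growing receptive field: stacking convolutions means each value in a deeper layer depends on an ever-larger patch of the original image. A minimal NumPy sketch (the averaging kernel is an arbitrary stand-in for learned weights):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution, stride 1, no padding."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0      # stand-in for a learned 3x3 filter
img = np.random.rand(9, 9)

level1 = conv2d(img, kernel)        # 7x7: each value summarises a 3x3 patch
level2 = conv2d(level1, kernel)     # 5x5: each value now depends on a 5x5 patch
print(level1.shape, level2.shape)
```

Two stacked 3x3 convolutions already give each deep-layer value a 5x5 view of the input; after dozens of layers, a single activation can summarise most of the image, which is why higher layers can represent whole objects.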