September 21, 2020

ML misconceptions (2): NNs are not a weak form of statistics

by Sam Sandqvist

There are a lot of misconceptions regarding neural networks. This article, and subsequent ones on the topic, will present the major ones one needs to take into account.(1)

Blog 9-1

Neural networks consist of layers of interconnected nodes called perceptrons, each of which resembles a multiple linear regression. The difference between a multiple linear regression and a perceptron is that a perceptron feeds the signal generated by a multiple linear regression into an activation function, which may be linear or non-linear.
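The relationship between a perceptron and a multiple linear regression can be sketched in a few lines of Python. This is only an illustration: the function names, weights and inputs are hypothetical and are not taken from any particular library or from the original article.

```python
import math

def linear_combination(weights, bias, inputs):
    """Multiple-linear-regression-style signal: bias + sum of w_i * x_i."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def sigmoid(z):
    """A non-linear activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(weights, bias, inputs, activation=sigmoid):
    """Feed the regression-style signal through an activation function."""
    return activation(linear_combination(weights, bias, inputs))

# Hypothetical weights and inputs, chosen only for illustration.
weights, bias = [0.5, -0.25], 0.1
inputs = [2.0, 4.0]

signal = linear_combination(weights, bias, inputs)
output = perceptron(weights, bias, inputs)

# With a linear (identity) activation the perceptron is exactly the regression.
identity_output = perceptron(weights, bias, inputs, activation=lambda z: z)

print(signal, output, identity_output)
```

With the identity activation the perceptron returns the regression signal unchanged; the sigmoid version maps the same signal into the range between 0 and 1.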

In a multilayer perceptron (MLP), perceptrons are arranged into layers and the layers are connected with one another. In the MLP there are three types of layers:

  • the input layer receives input patterns
  • the hidden layers apply weightings to those inputs; training adjusts the weightings until the error of the neural network is minimised
  • the output layer could contain a list of classifications or output signals to which those input patterns may map.

One interpretation of this is that the hidden layers extract salient features in the input data which have predictive power with respect to the outputs.
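A minimal sketch of how an input pattern flows through the layers of an MLP is given here. The layer sizes and weights are hypothetical and untrained, and the sigmoid activation is an assumption made only to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, inputs):
    """Each row of `weights` holds one node's weights; returns one output per node."""
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def mlp_forward(layers, inputs):
    """Pass the input pattern through every layer in turn."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(weights, biases, signal)
    return signal

# Hypothetical, untrained weights: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden = ([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1])
output = ([[1.5, -1.0]], [0.05])

input_pattern = [1.0, 0.5, -1.0]
hidden_features = layer_forward(hidden[0], hidden[1], input_pattern)
prediction = mlp_forward([hidden, output], input_pattern)

print(hidden_features, prediction)
```

The two values in `hidden_features` are the intermediate "features" that the hidden layer passes on to the output layer.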

Blog 9-2

One of the crucial aspects of perceptrons, and hence of neural networks, is the activation function. It is used in every node to determine the output of the node from its inputs.

Blog 9-3

The activation function may be linear or non-linear. Some of the common ones are shown in the figure.

One of the most common is the sigmoid function (d).
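For readers without the figure to hand, a few widely used activation functions can be written out directly. The selection here is an assumption and may not match the panels of the figure exactly.

```python
import math

def identity(z):
    """Linear activation: the signal passes through unchanged."""
    return z

def step(z):
    """Threshold activation: 1 if the signal is non-negative, else 0."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth S-shaped curve with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """S-shaped curve with outputs in (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative signals, linear otherwise."""
    return max(0.0, z)

# Evaluate each function on a few sample signals.
for f in (identity, step, sigmoid, tanh, relu):
    print(f.__name__, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)])
```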

As shown in the image below, perceptrons are organised into layers.

The first layer of perceptrons, called the input layer, receives the patterns in the input set. The last layer maps to the expected outputs for those patterns. For example, the patterns may be a list of values of different technical indicators for a security, and the potential outputs may be the categories into which that security is classified.

A hidden layer is one which receives as inputs the outputs from another layer, and whose outputs form the inputs into yet another layer. So what do these hidden layers do? One interpretation is that they extract salient features in the input data which have predictive power with respect to the outputs. This is called feature extraction, and in a way it performs a similar function to statistical techniques such as principal component analysis.

Deep neural networks have a large number of hidden layers and are able to extract much deeper features from the data. Recently, deep neural networks have performed particularly well for image recognition problems.

Learning rules

As mentioned previously, the objective of the neural network is to minimise some measure of error. The most common measure of error is sum-squared-error (SSE), although this metric is sensitive to outliers and may be less appropriate than tracking error (e.g., in the context of financial markets).
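The sensitivity of SSE to outliers is easy to see with a small, made-up example. The target and prediction values are hypothetical and chosen only to make the arithmetic obvious.

```python
def sum_squared_error(targets, predictions):
    """Sum of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))

targets = [1.0, 2.0, 3.0, 4.0]
close_predictions = [1.1, 1.9, 3.1, 3.9]   # every prediction is off by 0.1
one_outlier = [1.1, 1.9, 3.1, 9.0]         # the last prediction is off by 5.0

sse_close = sum_squared_error(targets, close_predictions)
sse_outlier = sum_squared_error(targets, one_outlier)

print(sse_close, sse_outlier)
```

Because the errors are squared, the single bad prediction contributes 25 to the total and dominates the three small errors.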

We can use an optimisation algorithm to adjust the weights in the neural network. The most common learning algorithm for neural networks is the gradient descent algorithm, although other, potentially better, optimisation algorithms can be used. Gradient descent works by calculating the partial derivative of the error with respect to the weights for each layer in the neural network and then moving in the opposite direction to the gradient (because we want to minimise the error of the neural network). By minimising the error we maximise the performance of the neural network in-sample.
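As a rough sketch of the idea, gradient descent can be applied to the simplest possible "network": a single node with a linear activation, trained to minimise SSE. The data, function names, learning rate and step count are all assumptions made for illustration; a real network would repeat the same update for the weights of every layer.

```python
def gradients(w, b, xs, ys):
    """Partial derivatives of the SSE with respect to the weight w and the bias b."""
    dw = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys))
    db = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys))
    return dw, db

def gradient_descent(xs, ys, learning_rate=0.01, steps=2000):
    """Repeatedly move the parameters in the opposite direction to the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(w, b)
```

On this noise-free data the fitted weight and bias approach 2 and 1, the values used to generate the targets.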

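The learning rate sets the size of each step and so controls how quickly or slowly the neural network converges; if it is too large, the updates can overshoot the minimum and fail to converge at all.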
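The effect of the learning rate can be illustrated on a deliberately simple, hypothetical error surface with a single weight, E(w) = (w - 3)^2, whose minimum is at w = 3. The three learning rates are arbitrary choices for the sake of the comparison.

```python
def distance_from_minimum(learning_rate, steps=20, start=0.0, minimum=3.0):
    """Run gradient descent on E(w) = (w - minimum)^2 and report |w - minimum|."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - minimum)   # dE/dw
        w -= learning_rate * gradient
    return abs(w - minimum)

too_small = distance_from_minimum(0.01)   # converges, but very slowly
reasonable = distance_from_minimum(0.4)   # converges quickly
too_large = distance_from_minimum(1.1)    # overshoots further on every step

print(too_small, reasonable, too_large)
```

After the same number of steps, the small rate is still far from the minimum, the moderate rate is essentially on it, and the large rate has moved further away than where it started.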

Despite what some statisticians believe, neural networks are not just a "weak form of statistics for lazy analysts", deployed as a black box that does everything.

Instead, neural networks represent an abstraction of solid statistical techniques which date back hundreds of years.

Some practitioners like to treat neural networks as a "black box" which can be thrown at any problem without first taking the time to understand the nature of the problem and whether or not neural networks are an appropriate choice. This is something that we will return to later; NNs are not a panacea.

An example of this is the use of neural networks for trading: markets are dynamic, yet neural networks assume that the distribution of input patterns remains stationary over time.

(1) The inspiration for the misconceptions is adapted from an article by Stuart Reid from 8 May 2014 available at

Sam Sandqvist

Dr Sam Sandqvist is our in-house Artificial Intelligence Guru. He holds a Dr. Sc. in Artificial Intelligence and is a published author. He is specialized in AI Theory, AI Models and Simulations. He also has industry experience in FinServ, Sales and Marketing.