October 24, 2020

ML misconceptions (7): neural networks cannot be trained on any data

by Sam Sandqvist

There are a lot of misconceptions regarding neural networks. This article, and subsequent ones on the topic, will present the major ones one needs to take into account.(1)

[Figure: Blog 14-1]
One of the biggest reasons why neural networks may not work is that people do not properly pre-process the data being fed into the network. Raw data is called raw for a reason: it needs to be cooked in order to be palatable to your neural network.

As can be seen in the picture above, preprocessing your data prior to applying a learning algorithm is crucial. Data normalisation, removal of redundant information, and outlier removal should all be performed to improve the probability that the neural network learns well and, in the end, performs well. Preprocessing also includes checks on data consistency: for example, string data where numeric data is expected, or missing or null values where data is expected.
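
As a small, hedged illustration of such consistency checks, the sketch below uses pandas; the column names and values are invented for the example. It coerces a string-typed numeric column and counts the missing values that result:

    import pandas as pd

    # Hypothetical raw data: the 'price' column arrives as strings,
    # and some values are missing or malformed.
    raw = pd.DataFrame({
        "ticker": ["AAA", "BBB", "CCC", "DDD"],
        "price":  ["7.25", "12.10", None, "n/a"],
    })

    # Coerce to numeric: anything unparsable ('n/a') becomes NaN
    # instead of silently remaining a string.
    raw["price"] = pd.to_numeric(raw["price"], errors="coerce")

    # Report how many values per column are missing after coercion,
    # so they can be imputed or dropped before training.
    print(raw.isna().sum())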

An important point is that whatever preprocessing is executed on the training data must be applied, in exactly the same way, to the live data. This is not a training issue, but a data-cleaning one.
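
A minimal numpy sketch of this principle, assuming simple min-max scaling (the prices are illustrative): the scaling parameters are fitted on the training data only and then reused, unchanged, on the live data.

    import numpy as np

    # Per-share prices seen during training.
    train_prices = np.array([5.0, 7.5, 10.0, 12.5, 15.0])

    # Fit the scaling parameters on the *training* data only.
    lo, hi = train_prices.min(), train_prices.max()

    def scale(x, lo, hi):
        """Min-max scale into [0, 1] using the training-set range."""
        return (x - lo) / (hi - lo)

    # Apply the same parameters to live data; do not re-fit them.
    live_prices = np.array([6.0, 14.0])
    print(scale(train_prices, lo, hi))
    print(scale(live_prices, lo, hi))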

Let’s treat these in order.

Data normalisation

Neural networks consist of layers of perceptrons linked together by weighted connections. Each perceptron applies an activation function, and each of these functions has an 'active range' (radial basis functions being an exception).

Inputs into the neural network need to be scaled within this range so that the neural network is able to differentiate between different input patterns.

For example, consider a neural network trading system which receives indicators about a set of securities as inputs and outputs whether each security should be bought or sold. One of the inputs is the price of the security, and we are using the sigmoid activation function. In this particular case, most of the securities cost between €5 and €15 per share, and for inputs of that magnitude the sigmoid saturates: its output approaches 1.0 for every security, all of the perceptrons 'fire', and the neural network cannot learn. So this raw data is inappropriate and needs to be normalised: typically we normalise to the range [0, 1] or [-1, 1].
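
To see why the raw prices saturate the sigmoid, consider this small numpy sketch (the prices are made up for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    prices = np.array([5.0, 8.0, 11.0, 15.0])

    # Raw prices: every output is essentially 1.0, so the
    # network cannot tell the securities apart.
    print(sigmoid(prices))      # all values between 0.993 and ~1.0

    # Normalised to [-1, 1]: outputs now spread across the
    # sigmoid's active range and remain distinguishable.
    norm = 2 * (prices - prices.min()) / (prices.max() - prices.min()) - 1
    print(sigmoid(norm))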

Outlier removal

An outlier is a value that is much smaller or larger than most of the other values in a set of data. Outliers can cause problems with statistical techniques like regression analysis and curve fitting because, when the model tries to accommodate the outlier, its performance across all the other data deteriorates.

The image below shows that trying to accommodate an outlier in the linear regression model results in a poor fit of the data set. The regression coefficient with the outlier is 0.4, but without it 0.7: a huge difference. The effect of outliers on non-linear regression models, including neural networks, is similar. Therefore it is good practice to remove outliers from the training data set.

That said, identifying outliers is a challenge in and of itself; one simple rule of thumb is sketched after the figure below.

[Figure: Blog 14-2]
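
One common, if crude, rule of thumb is the interquartile-range (IQR) rule, sketched below in numpy with made-up values; the 1.5 multiplier is the conventional choice, not anything mandated:

    import numpy as np

    values = np.array([9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 47.0])

    # Interquartile-range rule: flag points far outside the middle 50%.
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    mask = (values >= lower) & (values <= upper)
    print(values[mask])   # 47.0 is dropped as an outlier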

Redundancy removal

When two or more of the independent variables being fed into the neural network are highly correlated (a condition known as multicollinearity), this can negatively affect the neural network's learning ability.

Highly correlated inputs also mean that the amount of unique information presented by each variable is small, so the less significant input can be removed.

Another benefit of removing redundant variables is faster training times. Adaptive neural networks can be used to prune redundant connections and perceptrons.
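
A hedged numpy sketch of one common approach to redundancy removal: compute the correlation matrix of the inputs and drop one variable from each highly correlated pair. The 0.95 threshold and the synthetic data are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 * 0.99 + rng.normal(scale=0.05, size=200)  # nearly a copy of x1
    x3 = rng.normal(size=200)                          # independent
    X = np.column_stack([x1, x2, x3])

    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not highly correlated
        # with a column we have already kept.
        if all(corr[j, k] < 0.95 for k in keep):
            keep.append(j)

    print(keep)           # [0, 2]: x2 is dropped as redundant
    X_reduced = X[:, keep]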

The steps below describe how a network’s redundancy is removed, i.e., how the network is pruned; the figure that follows illustrates the process, and a minimal code sketch appears after it.

  1. Randomly initialise a neural network
  2. Train the network until it converges
  3. Prune a fraction of the network
  4. Reset the weights of the remaining network nodes to initial values
  5. Train the pruned, untrained network and observe convergence and accuracy
[Figure: Blog 14-3]
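
As a rough, hedged sketch of the five steps on a toy weight matrix (the training routine is stubbed out, since any optimiser would do, and the 20% pruning fraction is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(42)

    # 1. Randomly initialise a (single-layer, toy) weight matrix.
    w_init = rng.normal(size=(8, 4))
    w = w_init.copy()

    def train(w, mask):
        """Stub for 'train until convergence'; a real optimiser
        would update only the unmasked weights."""
        return w * mask  # placeholder: keeps pruned weights at zero

    # 2. Train the dense network (mask of all ones).
    mask = np.ones_like(w)
    w = train(w, mask)

    # 3. Prune a fraction of the network: zero out the 20% of
    #    weights with the smallest magnitudes.
    threshold = np.quantile(np.abs(w), 0.20)
    mask = (np.abs(w) > threshold).astype(w.dtype)

    # 4. Reset the surviving weights to their initial values.
    w = w_init * mask

    # 5. Retrain the pruned network and observe convergence and accuracy.
    w = train(w, mask)
    print(f"{int(mask.sum())} of {mask.size} connections remain")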

So the conclusion is clear: a lot of work has to be done before we can even start training the neural network.

(1) The inspiration for these misconceptions is adapted from an article by Stuart Reid dated 8 May 2014, available at http://www.turingfinance.com/misconceptions-about-neural-networks/.

AUTHOR

Sam Sandqvist

Dr Sam Sandqvist is our in-house Artificial Intelligence guru. He holds a Dr.Sc. in Artificial Intelligence and is a published author. He specialises in AI theory, AI models, and simulations, and also has industry experience in FinServ, sales, and marketing.