There are a lot of misconceptions regarding neural networks. This article, and subsequent ones on the topic, will present the major ones one needs to take into account.(1)

### Neural networks consist of layers of interconnected nodes called perceptrons

A single perceptron resembles **multiple linear regression**. The difference between a multiple linear regression and a perceptron is that a perceptron feeds the signal generated by the multiple linear regression into an activation function, which may or may not be non-linear.
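
As a rough sketch, a single perceptron can be written as a weighted sum of the inputs (the multiple-linear-regression part) passed through an activation function. A sigmoid activation is assumed here, and the weights and inputs are purely illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs (as in multiple linear regression) fed into an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # the linear-regression-like part
    return sigmoid(z)                                       # the activation function

# Purely illustrative weights and inputs
print(perceptron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2))
```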

In a multi-layer perceptron (MLP), perceptrons are arranged into layers, and the layers are connected with one another. An MLP contains three types of layers:

- the input layer receives the input patterns
- the output layer contains a list of classifications or output signals to which those input patterns may map
- the hidden layers adjust the weightings on those inputs until the error of the neural network is minimised

One of the crucial aspects of neural networks and perceptrons is the *activation
function*. It is used in every node to determine the node's output from its
inputs.

The activation function may be linear or non-linear. Some of the common ones are shown in the figure.

The most common one is the sigmoid function (d).
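
As a minimal sketch, a few commonly used activation functions can be written as follows (the exact set and panel labels in the figure may differ):

```python
import math

def linear(z):      # identity: the output is the weighted sum itself
    return z

def step(z):        # binary threshold
    return 1.0 if z >= 0 else 0.0

def relu(z):        # rectified linear unit
    return max(0.0, z)

def tanh(z):        # hyperbolic tangent, squashes to (-1, 1)
    return math.tanh(z)

def sigmoid(z):     # logistic sigmoid, squashes to (0, 1)
    return 1.0 / (1.0 + math.exp(-z))
```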

As shown in the image below, perceptrons are organised into layers.

The first layer of perceptrons, called the input layer, receives the patterns in the input set. The last layer maps to the expected outputs for those patterns. For example, the patterns may be a list of quantities for different technical indicators for a security, and the potential outputs may be the categories into which that security is classified.

A hidden layer is one which receives as inputs the outputs from another layer, and whose
outputs form the inputs into yet another layer. So what do these hidden
layers do? One interpretation is that they extract **salient** features in the
input data which have predictive power with respect to the outputs. This is called feature
extraction, and in a way it performs a similar function to statistical
techniques such as principal component analysis.
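
To make the layer structure concrete, here is a minimal forward-pass sketch through one hidden layer. The sigmoid activation, layer sizes and weights are illustrative assumptions, not trained values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each row of `weights` holds the incoming weights of one node in the layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -1.2, 3.0]                                          # input layer: one pattern
hidden = layer_forward(x, [[0.8, 0.1, -0.4],
                           [0.3, -0.7, 0.2]], [0.1, -0.2])    # hidden layer: 2 nodes
output = layer_forward(hidden, [[1.0, -1.5]], [0.05])         # output layer: 1 node
print(output)
```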

**Deep** neural networks have a large number of hidden layers and are able
to extract much deeper features from the data. Recently, deep neural networks have performed
particularly well for image recognition problems.

**Learning rules**

As mentioned previously, the objective of the neural network is to **minimise some
measure of error**. The most common measure of error is the
**sum-squared error** (SSE), although this metric is sensitive to outliers and
may be less appropriate than tracking error in some contexts (e.g., the financial
markets).
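
For n patterns with target outputs y_i and network outputs ŷ_i, the SSE can be written as

```latex
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Squaring each residual is what makes SSE sensitive to outliers: a single large error can dominate the sum.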

We can use an optimisation algorithm to adjust the weights in the neural network. The most
common learning algorithm for neural networks is the **gradient descent**
algorithm, although other, potentially better, optimisation algorithms can be
used. Gradient descent works by calculating the partial derivative of the error with
respect to the weights for each layer in the neural network and then moving in the opposite
direction to the gradient (because we want to minimise the error of the neural network). By
minimising the error we maximise the performance of the neural network in-sample.
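
As an illustrative sketch only (a single weight, squared error on one pattern, and the gradient derived by hand), one gradient descent step looks like this:

```python
def gradient_descent_step(w, x, y, learning_rate=0.1):
    """One update for a single linear node with squared error E = (y - w*x)**2."""
    y_hat = w * x
    dE_dw = -2.0 * (y - y_hat) * x      # partial derivative of the error w.r.t. the weight
    return w - learning_rate * dE_dw    # step opposite to the gradient to reduce the error

w = 0.0
for _ in range(50):
    w = gradient_descent_step(w, x=2.0, y=4.0)
print(w)  # approaches 2.0, the weight that minimises the error on this pattern
```

The `learning_rate` argument here is the learning rate discussed next.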

The **learning rate** controls how quickly or slowly the neural network
converges.

Despite what some statisticians believe, neural networks are not just a "weak form of
statistics for lazy analysts", deployed as a **black box** that does
everything.

Instead, neural networks represent an abstraction of solid **statistical**
techniques which date back hundreds of years.

Some practitioners like to treat neural networks as a "black box" which can be thrown at any
problem without first taking the time to understand the nature of the problem and whether or
not neural networks are an **appropriate** choice. This is something that we
will return to later; NNs are not a panacea.

An example of this is the use of neural networks for trading; markets are dynamic yet neural networks assume the distribution of input patterns remains stationary over time.

(1) The inspiration for the misconceptions is adapted from an article by Stuart Reid from 8 May 2014 available at http://www.turingfinance.com/misconceptions-about-neural-networks/.