Multilayer perceptrons


Artificial Intelligence Techniques
Multilayer Perceptrons
Overview




The multi-layered perceptron
Back-propagation
Introduction to training
Uses
Pattern space - linearly separable
[Figure: pattern space plot with axes X1 and X2, showing a linearly separable problem]
Non-linearly separable problems



If a problem is not linearly separable, then a single neuron cannot divide the pattern space into the two required regions
A network of neurons is needed
Until fairly recently, it was not known how to train a multi-layered network
Pattern space - non-linearly separable
[Figure: pattern space plot with axes X1 and X2, showing a curved decision surface separating the two classes]
The multi-layered perceptron (MLP)
[Figure: network diagram showing an input layer, a hidden layer and an output layer]
Complex decision surface



The MLP can approximate any continuous function using one hidden layer of sigmoid neurons and a linear output layer
A 3-layered network can therefore produce any complex decision surface
However, the number of neurons needed in the hidden layer cannot be calculated in advance
The multi-layered perceptron (MLP)
[Figure: network diagram with 3 input neurons, 4 hidden neurons and 2 output neurons]
Network architecture




All neurons in one layer are connected to all
neurons in the next layer
The network is a feedforward network, so all
data flows from the input to the output
The architecture of the network shown is
described as 3:4:2
All neurons in the hidden and output layers
have a bias connection
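As a rough illustration of the 3:4:2 feedforward arrangement just described, here is a minimal Python sketch (using numpy; the weight values are random placeholders, not taken from the lecture). It passes an input through a fully connected sigmoid hidden layer and a linear output layer, with a bias connection on every hidden and output neuron.

```python
import numpy as np

def sigmoid(net):
    """Smooth squashing function used by the hidden neurons."""
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)

# 3:4:2 architecture - 3 inputs, 4 hidden neurons, 2 outputs.
# Each weight matrix has an extra column for the bias connection.
W_hidden = rng.normal(size=(4, 3 + 1))   # 4 hidden neurons, 3 inputs + bias
W_output = rng.normal(size=(2, 4 + 1))   # 2 output neurons, 4 hidden + bias

def feedforward(x):
    """Propagate a 3-element input vector forward to the 2 outputs."""
    x = np.append(1.0, x)              # prepend the bias input (always 1)
    hidden = sigmoid(W_hidden @ x)     # hidden layer uses the sigmoid
    hidden = np.append(1.0, hidden)    # bias input for the output layer
    return W_output @ hidden           # linear output layer

print(feedforward(np.array([0.5, -1.0, 2.0])))
```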
Input layer




Receives all of the inputs
Number of neurons equals the number
of inputs
Does no processing
Connects to all the neurons in the
hidden layer
Hidden layer





Could be more than one layer, but theory
says that only one layer is necessary
The number of neurons is found by
experiment
Processes the inputs
Connects to all neurons in the output layer
The output is a sigmoid function
Output layer




Produces the final outputs
Processes the outputs from the hidden
layer
The number of neurons equals the
number of outputs
The output could be linear or sigmoid
Problems with networks


Originally the neurons had a hard-limiter on the output
Although an error could be found
between the desired output and the
actual output, which could be used to
adjust the weights in the output layer,
there was no way of knowing how to
adjust the weights in the hidden layer
The invention of backpropagation

By introducing a smoothly changing
output function, it was possible to
calculate an error that could be used to
adjust the weights in the hidden
layer(s)
Output function
[Figure: the sigmoid function, y plotted against net for net from -5 to 5, rising smoothly from 0 to 1]
Sigmoid function




The sigmoid function goes smoothly
from 0 to 1 as net increases
The value of y when net=0 is 0.5
When net is negative, y is between 0
and 0.5
When net is positive, y is between 0.5
and 1.0
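A quick numerical check of those properties (a minimal sketch; the sample values of net are arbitrary):

```python
import numpy as np

def sigmoid(net):
    # y = 1 / (1 + e^(-net)) rises smoothly from 0 towards 1 as net increases
    return 1.0 / (1.0 + np.exp(-net))

for net in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(net, round(sigmoid(net), 4))
# net < 0 gives y between 0 and 0.5, net = 0 gives exactly 0.5,
# net > 0 gives y between 0.5 and 1.0
```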
Back-propagation


The method of training is called the
back-propagation of errors
The algorithm is an extension of the
delta rule, called the generalised delta
rule
Generalised delta rule




The equation for the generalised delta rule is ΔWi = ηXiδ
δ is defined according to which layer is being considered.
For the output layer, δ is y(1-y)(d-y).
For the hidden layer, δ is a more complex expression.
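In code, the rule might be sketched as follows (the function names are illustrative assumptions; η=0.5 is the value used in the worked example later). The weight change is always ηXδ, and only the definition of δ differs between the output and hidden layers.

```python
import numpy as np

eta = 0.5  # learning rate (the value used in the worked example later)

def delta_output(y, d):
    """delta for an output neuron: y(1-y)(d-y)."""
    return y * (1.0 - y) * (d - y)

def delta_hidden(y_hidden, weights_to_next_layer, deltas_next_layer):
    """delta for a hidden neuron: y(1-y) times the weighted sum of the
    deltas of the neurons it feeds in the next layer."""
    return y_hidden * (1.0 - y_hidden) * np.dot(weights_to_next_layer,
                                                deltas_next_layer)

def weight_change(x, delta):
    """Generalised delta rule: change in weight = eta * input * delta."""
    return eta * x * delta

# Example: an output neuron with y = 0.8 and desired output d = 1.0,
# fed by an input of 0.6
print(weight_change(0.6, delta_output(0.8, 1.0)))
```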
Pattern recognition


Many problems can be described as
pattern recognition
For example, voice recognition, face
recognition, optical character
recognition
Pattern classification




A more precise definition is pattern
classification
In pattern classification a system is
shown examples of a number of objects
Each object is given a label or class
The task of the system is to correctly
classify objects that it hasn’t seen
before
Example of 2-input data

X1     X2     Class
1      1.5    0
2      1.8    0
2      3.5    0
4      0.52   0
5      1.5    0
4      1      0
1      3      0
1.5    2      0
5      2      0
4.5    1.44   0
4.5    2.5    1
5.5    3.5    1
4.5    4      1
3      5      1
3.5    4      1
3.5    2      1
3      3      1
4.5    4      1
4      3.5    1
5.5    4.5    1
Pattern space
[Figure: scatter plot of the data in pattern space, X1 and X2 both from 0 to 6, with Class 0 and Class 1 plotted as two series]
Training a network




The problem could not be solved by a single-layer network, because it is non-linearly separable
A 3-layer MLP was tried with 4 neurons in the hidden layer, and it trained successfully
The number of neurons in the hidden layer was reduced to 2 and it still trained
With 1 neuron in the hidden layer it failed to train
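For orientation, here is a rough reproduction of that experiment using scikit-learn's MLPClassifier, with the data taken from the table as reconstructed above. This is only a sketch: the lecture does not say which training algorithm or initial weights were used, so the trained weights will not match the ones quoted on the next slide.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# 2-input data from the earlier table (as reconstructed above)
X = np.array([[1, 1.5], [2, 1.8], [2, 3.5], [4, 0.52], [5, 1.5],
              [4, 1],   [1, 3],   [1.5, 2], [5, 2],    [4.5, 1.44],
              [4.5, 2.5], [5.5, 3.5], [4.5, 4], [3, 5], [3.5, 4],
              [3.5, 2],  [3, 3],  [4.5, 4], [4, 3.5], [5.5, 4.5]])
y = np.array([0] * 10 + [1] * 10)

# 2:2:1 network: two inputs, two sigmoid hidden neurons, one output
net = MLPClassifier(hidden_layer_sizes=(2,), activation='logistic',
                    solver='lbfgs', max_iter=5000, random_state=1)
net.fit(X, y)

print(net.score(X, y))               # fraction classified correctly
print(net.coefs_, net.intercepts_)   # learned weights and biases
```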
The weights



The weights for the 2 neurons in the hidden layer are -9, 3.6 and 0.1 for one neuron, and 6.1, 2.2 and -7.8 for the other
These weights can be shown in the pattern space as two lines
The lines divide the space into 4 regions
The hidden neurons
[Figure: the same scatter plot with the two lines defined by the hidden neurons dividing the pattern space into four regions]
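To see how those two lines carve up the pattern space, the sketch below checks which side of each hidden neuron's line a few sample points fall on. It assumes the weight triples above are ordered (bias, weight on X1, weight on X2); the slides do not state the ordering, so treat that as an assumption.

```python
import numpy as np

# Hidden-neuron weights from the slide, assumed order (bias, weight on X1, weight on X2)
hidden_weights = np.array([[-9.0, 3.6, 0.1],
                           [6.1, 2.2, -7.8]])

def region(x1, x2):
    """Return the signs of the two hidden nets; each combination of signs
    corresponds to one of the four regions of the pattern space."""
    x = np.array([1.0, x1, x2])      # bias input plus the two inputs
    nets = hidden_weights @ x
    return tuple(np.sign(nets).astype(int))

for point in [(1, 1.5), (2, 3.5), (5, 1.5), (4.5, 4)]:
    print(point, region(*point))
```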
Training and Testing



Starting with a data set, the first step is
to divide the data into a training set and
a test set
Use the training set to adjust the
weights until the error is acceptably low
Test the network using the test set, and
see how many it gets right
A better approach


Critics of this standard approach have
pointed out that training to a low error
can sometimes cause “overfitting”,
where the network performs well on the
training data but poorly on the test data
The alternative is to divide the data into
three sets, the extra one being the
validation set
Validation set



During training, the training data is used to adjust the weights
At each iteration, the validation data is also passed through the network and the error recorded, but the weights are not adjusted
The training stops when the error for the validation set starts to increase
Stopping criteria
[Figure: error plotted against time for the training set and the validation set; the stopping point is marked where the validation error starts to rise]
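Using scikit-learn again (a sketch, with made-up data standing in for a real problem), the three-way split and the validation-based stopping rule look roughly like this: early_stopping=True holds back part of the training data as a validation set and stops once the validation score stops improving.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Made-up data standing in for a real classification problem
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# First split off a test set that is never used during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# early_stopping holds back a fraction of the training data as a
# validation set and stops once the validation score stops improving
net = MLPClassifier(hidden_layer_sizes=(4,), activation='logistic',
                    solver='adam', max_iter=2000,
                    early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=10, random_state=1)
net.fit(X_train, y_train)

print('test accuracy:', net.score(X_test, y_test))
```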
Architecture
[Figure: the network diagram again, showing the input, hidden and output layers]
Back-propagation


The method of training is called the
back-propagation of errors
The algorithm is an extension of the
delta rule, called the generalised delta
rule
Generalised delta rule




The equation for the generalised delta rule is ΔWi = ηXiδ
δ is defined according to which layer is being considered.
For the output layer, δ is y(1-y)(d-y).
For the hidden layer, δ is a more complex expression.
Hidden Layer



We have to deal with the error from the output layer being fed back to the hidden layer.
Let's look at an example: the weight w2(1,2), which is the weight connecting neuron 1 in the input layer to neuron 2 in the hidden layer.


Δw2(1,2)=ηX1(1)δ2(2)
Where



X1(1) is the output of neuron 1 in the input layer.
δ2(2) is the error on the output of neuron 2 in the hidden layer.
δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)

δ3(1) = y(1-y)(d-y) = x3(1)[1-x3(1)][d-x3(1)]
So we start with the error at the output
and use this result to ripple backwards
altering the weights.
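Putting the last few slides together (this is just the definitions above substituted into one line, written here in LaTeX for readability):

```latex
\Delta w_2(1,2)
  = \eta \, X_1(1)\,\delta_2(2)
  = \eta \, X_1(1)\,
    \underbrace{X_2(2)\,[1-X_2(2)]\; w_3(2,1)\;
      \underbrace{x_3(1)\,[1-x_3(1)]\,[d-x_3(1)]}_{\delta_3(1)}}_{\delta_2(2)}
```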
Example





Exclusive OR using the network shown
earlier: 2:2:1 network
Initial weights
W2(0,1)=0.862518  W2(1,1)=-0.155797  W2(2,1)=0.282885
W2(0,2)=0.834986  W2(1,2)=-0.505997  W2(2,2)=-0.864449
W3(0,1)=0.036498  W3(1,1)=-0.430437  W3(2,1)=0.48121
Feedforward – hidden layer
(neuron 1)

So if





X1(0)=1 (the bias)
X1(1)=0
X1(2)=0
The weighted sum inside neuron 1 in the hidden layer = 0.862518 (only the bias weight contributes, since both inputs are 0)
Then, using the sigmoid function:

X2(1)=0.7031864
Feedforward – hidden layer
(neuron 2)

So if





X1(0)=1 (the bias)
X1(1)=0
X1(2)=0
The weighted sum inside neuron 2 in the hidden layer = 0.834986 (again, only the bias weight contributes)
Then, using the sigmoid function:

X2(2)=0.6974081
Feedforward – output layer

So if





X2(0)=1 (the bias)
X2(1)=0.7031864
X2(2)=0.6974081
The weighted sum inside neuron 1 in the output layer = 0.0694203
Then, using the sigmoid function:


X3(1)=0.5173481
Desired output=0





δ3(1)=x3(1)[1-x3(1)][d-x3(1)] =-0.1291812
δ2(1)=X2(1)[1-X2(1)]w3(1,1) δ3(1)=0.0116054
δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)=-0.0131183
Now we can use the delta rule to
calculate the change in the weights
ΔWi = ηXiδ
Examples


If we set η=0.5
ΔW2(0,1) = ηX1(0)δ2(1) = 0.5 x 1 x 0.0116054 = 0.0058027
ΔW3(1,1) = ηX2(1)δ3(1) = 0.5 x 0.7031864 x -0.1291812 = -0.0454192



What would be the results of the
following?
ΔW2(2,1) = ηX1(2)δ2(1)
ΔW2(2,2) = ηX1(2)δ2(2)

ΔW2(2,1) = ηX1(2)δ2(1) = 0.5 x 0 x 0.0116054 = 0
ΔW2(2,2) = ηX1(2)δ2(2) = 0.5 x 0 x -0.0131183 = 0




New weights
W2(0,1)=0.868321  W2(1,1)=-0.155797  W2(2,1)=0.282885
W2(0,2)=0.828427  W2(1,2)=-0.505997  W2(2,2)=-0.864449
W3(0,1)=-0.028093  W3(1,1)=-0.475856  W3(2,1)=0.436164
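The whole worked example can be checked with a few lines of numpy. This sketch reproduces the forward pass, the three δ values and the weight changes for the single pattern (X1(1)=0, X1(2)=0, desired output 0) used above; the layout of the weight vectors is just one convenient choice.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

eta = 0.5

# Initial weights from the Example slide, ordered (bias, from neuron 1, from neuron 2)
w2_1 = np.array([0.862518, -0.155797, 0.282885])   # hidden neuron 1
w2_2 = np.array([0.834986, -0.505997, -0.864449])  # hidden neuron 2
w3_1 = np.array([0.036498, -0.430437, 0.48121])    # output neuron 1

# Input pattern (0, 0) with desired output 0
x1 = np.array([1.0, 0.0, 0.0])   # bias input plus the two inputs
d = 0.0

# Feedforward
x2 = np.array([1.0, sigmoid(w2_1 @ x1), sigmoid(w2_2 @ x1)])  # bias + hidden outputs
x3 = sigmoid(w3_1 @ x2)                                       # network output
print(x2[1], x2[2], x3)   # approx 0.7031864, 0.6974081, 0.5173481

# Back-propagate the error
d3_1 = x3 * (1 - x3) * (d - x3)               # approx -0.1291812
d2_1 = x2[1] * (1 - x2[1]) * w3_1[1] * d3_1   # approx  0.0116054
d2_2 = x2[2] * (1 - x2[2]) * w3_1[2] * d3_1   # approx -0.0131183

# Generalised delta rule: new weight = old weight + eta * input * delta
w2_1 += eta * x1 * d2_1
w2_2 += eta * x1 * d2_2
w3_1 += eta * x2 * d3_1
print(w2_1, w2_2, w3_1)   # should match the New weights slide, to rounding
```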
Conclusions




Train using training, test and validation
sets
An MLP can be used to recognise
(classify) complex data
It uses supervised learning with backpropagation to adjust the weights
It divides up the pattern space using the neurons in the hidden layer
Conclusions





Extending the delta rule to do back
propagation
Need to calculate the error at the
outputs of neurones in the hidden and
output layers
δ3(1)=x3(1)[1-x3(1)][d-x3(1)]
δ2(1)=X2(1)[1-X2(1)]w3(1,1) δ3(1)
δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)


Once you have the error values (δ’s) for
the neurones you then use the delta rule
to calculate the actual change in the
weights.
ΔWi = ηXiδ