Hopfield Network Calculations (Cont.)


Artificial Neural Network
Supervised Learning
Dr. Mohsen Kahani
http://www.um.ac.ir/~kahani/
Back-propagation neural network
 Learning in a multilayer network proceeds the
same way as for a perceptron.
 A training set of input patterns is presented to
the network.
 The network computes its output pattern, and
if there is an error - or in other words a
difference between actual and desired output
patterns - the weights are adjusted to reduce
this error.
BP Training Phases

In a back-propagation neural network, the learning
algorithm has two phases.
1. A training input pattern is presented to the network
input layer. The network propagates the input
pattern from layer to layer until the output pattern is
generated by the output layer.
2. If this pattern is different from the desired output,
an error is calculated and then propagated
backwards through the network from the output
layer to the input layer. The weights are modified as
the error is propagated.
Three-layer BP network
The back-propagation
training algorithm
 Step 1: Initialisation
 Set all the weights and threshold levels of the
network to random numbers uniformly distributed
inside a small range:
$$\left(-\frac{2.4}{F_i},\; +\frac{2.4}{F_i}\right)$$
where $F_i$ is the total number of inputs of neuron i in the network.
 The weight initialization is done on a neuron-by-neuron basis.
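A minimal NumPy sketch of this initialisation rule (the init_layer helper, the 2-2-1 layer sizes and the random seed are illustrative assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative seed

def init_layer(n_inputs, n_neurons):
    # F_i = total number of inputs of neuron i; weights and the threshold
    # are drawn uniformly from (-2.4/F_i, +2.4/F_i), neuron by neuron.
    limit = 2.4 / n_inputs
    weights = rng.uniform(-limit, limit, size=(n_inputs, n_neurons))
    thresholds = rng.uniform(-limit, limit, size=n_neurons)
    return weights, thresholds

# Example: a 2-2-1 network such as the XOR example later in the slides
w_hidden, theta_hidden = init_layer(2, 2)   # input layer -> hidden layer
w_output, theta_output = init_layer(2, 1)   # hidden layer -> output layer
```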
Step 2: Activation
 Activate the BP network by applying:
 inputs x1(p), x2(p),…, xn(p)
 desired outputs yd,1(p), yd,2(p),…, yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
$$y_j(p) = \text{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]$$
where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
Step 2: Activation (cont.)
(b) Calculate the actual outputs of the neurons in the output layer:
$$y_k(p) = \text{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]$$
where m is the number of inputs of neuron k in the output layer, and the inputs x_jk(p) to the output layer are the hidden-layer outputs y_j(p).
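The two activation formulas translate directly into a forward pass. A hedged sketch, reusing the w_hidden/theta_hidden and w_output/theta_output arrays from the initialisation sketch above:

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, theta_hidden, w_output, theta_output):
    # Step 2(a): hidden-layer outputs y_j = sigmoid(sum_i x_i * w_ij - theta_j)
    y_hidden = sigmoid(x @ w_hidden - theta_hidden)
    # Step 2(b): output-layer outputs y_k = sigmoid(sum_j y_j * w_jk - theta_k)
    y_output = sigmoid(y_hidden @ w_output - theta_output)
    return y_hidden, y_output
```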
Step 3: Weight training
 Update the weights in the BP network propagating
backward the errors associated with output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
$$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p), \qquad \text{where } e_k(p) = y_{d,k}(p) - y_k(p)$$
Calculate the weight corrections:
$$\Delta w_{jk}(p) = \alpha\, y_j(p)\, \delta_k(p)$$
Update the weights at the output neurons:
$$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$$
Step 3: Weight training (cont.)
(b) Calculate the error gradient for the neurons in the hidden layer:
$$\delta_j(p) = y_j(p)\,[1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$$
Calculate the weight corrections:
$$\Delta w_{ij}(p) = \alpha\, x_i(p)\, \delta_j(p)$$
Update the weights at the hidden neurons:
$$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$$
Step 4: Iteration
 Increase iteration p by one, go back to Step 2 and
repeat the process until the selected error criterion is
satisfied.
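Putting Steps 2 to 4 together for the XOR training set from the example that follows gives a loop of roughly this shape, building on the sketches above (the epoch cap is an illustrative safeguard, and how quickly the criterion is met depends on the random initialisation):

```python
xor_inputs  = np.array([[1, 1], [0, 1], [1, 0], [0, 0]], dtype=float)
xor_targets = np.array([[0], [1], [1], [0]], dtype=float)

for epoch in range(100_000):          # illustrative upper bound on iterations
    sse = 0.0
    for x, y_d in zip(xor_inputs, xor_targets):
        y_h, y_o = forward(x, w_hidden, theta_hidden, w_output, theta_output)
        e = backward_update(x, y_h, y_o, y_d,
                            w_hidden, theta_hidden, w_output, theta_output)
        sse += float(np.sum(e ** 2))
    if sse < 0.001:                    # error criterion used in the XOR example
        break
```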
Example
Three-layer network for solving the Exclusive-OR operation
[Figure: input neurons 1 and 2 take x1 and x2 and feed hidden neurons 3 and 4 through weights w13, w23, w14, w24; hidden neurons 3 and 4 feed output neuron 5 (output y5) through weights w35, w45; neurons 3, 4 and 5 each also receive a fixed input of -1 for their thresholds.]
Example (cont.)
 The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.
 The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1,
θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.
Example (cont.)
 We consider a training set where inputs x1 and x2 are
equal to 1 and desired output yd,5 is 0. The actual
outputs of neurons 3 and 4 in the hidden layer are
calculated as
$$y_3 = \text{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3) = 1 / \left[1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}\right] = 0.5250$$
$$y_4 = \text{sigmoid}(x_1 w_{14} + x_2 w_{24} - \theta_4) = 1 / \left[1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)}\right] = 0.8808$$
 Now the actual output of neuron 5 in the output layer is determined as:
$$y_5 = \text{sigmoid}(y_3 w_{35} + y_4 w_{45} - \theta_5) = 1 / \left[1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)}\right] = 0.5097$$
 Thus, the following error is obtained:
$$e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097$$
Example (cont.)
 The next step is weight training. To update the weights
and threshold levels in our network, we propagate the
error, e, from the output layer backward to the input
layer.
 First, we calculate the error gradient for neuron 5 in the
output layer:
$$\delta_5 = y_5 (1 - y_5)\, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274$$
 Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:
$$\Delta w_{35} = \alpha\, y_3\, \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067$$
$$\Delta w_{45} = \alpha\, y_4\, \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112$$
$$\Delta\theta_5 = \alpha \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127$$
Example (cont.)
 Next we calculate the error gradients for neurons 3 and
4 in the hidden layer:
$$\delta_3 = y_3 (1 - y_3)\, \delta_5\, w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381$$
$$\delta_4 = y_4 (1 - y_4)\, \delta_5\, w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147$$
 We then determine the weight corrections:
$$\Delta w_{13} = \alpha\, x_1\, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$$
$$\Delta w_{23} = \alpha\, x_2\, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$$
$$\Delta\theta_3 = \alpha \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038$$
$$\Delta w_{14} = \alpha\, x_1\, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$$
$$\Delta w_{24} = \alpha\, x_2\, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$$
$$\Delta\theta_4 = \alpha \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015$$
Example (cont.)
 At last, we update all weights and thresholds:
$$w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038$$
$$w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985$$
$$w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038$$
$$w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985$$
$$w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067$$
$$w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888$$
$$\theta_3 = \theta_3 + \Delta\theta_3 = 0.8 - 0.0038 = 0.7962$$
$$\theta_4 = \theta_4 + \Delta\theta_4 = -0.1 + 0.0015 = -0.0985$$
$$\theta_5 = \theta_5 + \Delta\theta_5 = 0.3 + 0.0127 = 0.3127$$
 The training process is repeated until the sum of squared
errors is less than 0.001.
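A short script that reproduces this whole iteration; it simply restates the slide's arithmetic:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Initial weights, thresholds, inputs and learning rate from the slides
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
th3, th4, th5 = 0.8, -0.1, 0.3
x1, x2, yd5, alpha = 1.0, 1.0, 0.0, 0.1

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - th3)   # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - th4)   # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - th5)   # 0.5097
e = yd5 - y5                              # -0.5097

# Error gradients
d5 = y5 * (1 - y5) * e                    # -0.1274
d3 = y3 * (1 - y3) * d5 * w35             #  0.0381
d4 = y4 * (1 - y4) * d5 * w45             # -0.0147

# Weight and threshold updates
w35 += alpha * y3 * d5; w45 += alpha * y4 * d5; th5 += alpha * (-1) * d5
w13 += alpha * x1 * d3; w23 += alpha * x2 * d3; th3 += alpha * (-1) * d3
w14 += alpha * x1 * d4; w24 += alpha * x2 * d4; th4 += alpha * (-1) * d4

print(round(w13, 4), round(w45, 4), round(th5, 4))  # 0.5038 1.0888 0.3127
```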
Learning curve for Exclusive-OR
[Figure: Sum-Squared Network Error for 224 Epochs; sum-squared error on a log scale (10^1 down to 10^-4) versus epoch (0 to 200).]
Final results of learning
Inputs (x1, x2)   Desired output yd   Actual output y5   Error e
(1, 1)            0                   0.0155             -0.0155
(0, 1)            1                   0.9849              0.0151
(1, 0)            1                   0.9849              0.0151
(0, 0)            0                   0.0175             -0.0175
Sum of squared errors: 0.0010
McCulloch-Pitts model for
solving the Exclusive-OR
Decision boundaries
 (a) Decision boundary constructed by hidden neuron 3;
 (b) Decision boundary constructed by hidden neuron 4;
 (c) Decision boundaries constructed by the complete three-layer network
Accelerated learning in
multilayer neural networks
 A multilayer network learns much faster when the
sigmoidal activation function is represented by a
hyperbolic tangent:
$$Y^{\tanh} = \frac{2a}{1 + e^{-bX}} - a$$
where a and b are constants.
 Suitable values for a and b are:
a = 1.716 and b = 0.667
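A one-function sketch of this activation, with a and b defaulting to the values suggested above:

```python
import numpy as np

def tanh_activation(x, a=1.716, b=0.667):
    # Y_tanh = 2a / (1 + exp(-b * X)) - a
    return 2.0 * a / (1.0 + np.exp(-b * x)) - a
```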
Accelerated learning (cont.)
 We also can accelerate training by including a
momentum term in the delta rule:
$$\Delta w_{jk}(p) = \beta\, \Delta w_{jk}(p-1) + \alpha\, y_j(p)\, \delta_k(p)$$
 where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
 This equation is called the generalised delta rule.
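A minimal sketch of the generalised delta rule; dw_prev is the weight correction from the previous iteration, and beta defaults to the typical value of 0.95:

```python
def momentum_update(dw_prev, y_j, delta_k, alpha=0.1, beta=0.95):
    # Generalised delta rule:
    # dw_jk(p) = beta * dw_jk(p-1) + alpha * y_j(p) * delta_k(p)
    return beta * dw_prev + alpha * y_j * delta_k
```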
Learning with momentum
for operation Exclusive-OR
Learning with adaptive learning rate
 To accelerate the convergence and yet avoid the danger
of instability, we can apply two heuristics:
Heuristic 1
 If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased.
Heuristic 2
 If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.
Learning with adaptive learning rate (cont.)
 Adapting the learning rate requires some changes in the back-propagation algorithm.
 If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying it by 0.7) and new weights and thresholds are calculated.
 If the error is less than the previous one, the learning rate is increased (typically by multiplying it by 1.05).
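A hedged sketch of these two adjustments; the helper name and the idea that the caller discards and recalculates the weights when the rate is reduced are assumptions of the sketch:

```python
def adapt_learning_rate(alpha, sse, sse_prev, ratio=1.04, down=0.7, up=1.05):
    # Error grew by more than the predefined ratio: shrink the learning rate
    # (the caller would also discard the new weights and recalculate them).
    if sse > ratio * sse_prev:
        return alpha * down
    # Error decreased: grow the learning rate slightly.
    if sse < sse_prev:
        return alpha * up
    return alpha
```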
Learning with adaptive learning rate (cont.)
[Figure: Training for 103 Epochs. Top panel: sum-squared error on a log scale (10^2 down to 10^-4) versus epoch (0 to 100). Bottom panel: learning rate (0 to 1) versus epoch (0 to 120).]
Learning with momentum
and adaptive learning rate
[Figure: Training for 85 Epochs. Top panel: sum-squared error on a log scale (10^2 down to 10^-4) versus epoch (0 to 80). Bottom panel: learning rate (0 to 2.5) versus epoch (0 to 90).]
The Hopfield Network
 The brain's memory works by association: for example, we can recognize a familiar face even in an unfamiliar environment within 100-200 ms.
 A BPN is well suited to pattern recognition problems, but to emulate this associative behaviour we need a recurrent neural network.
 A recurrent neural network has feedback loops from its outputs to its inputs.
 The stability of recurrent networks was solved only
in 1982, when John Hopfield formulated the physical
principle of storing information in a dynamically
stable network.
[Figure: Single-layer n-neuron Hopfield network. Input signals x1, x2, ..., xi, ..., xn feed neurons 1, 2, ..., i, ..., n, whose output signals y1, y2, ..., yi, ..., yn are fed back to the inputs.]
MultiLayer Hopfield Network
[Figure: multilayer Hopfield network with input, hidden and output layers, and z^-1 unit-delay elements feeding the outputs back to the inputs.]
Hopfield Network Attributes
 The Hopfield network uses McCulloch and Pitts
neurons with the sign activation function as its
computing element:
$$Y^{\text{sign}} = \begin{cases} +1, & \text{if } X > 0 \\ -1, & \text{if } X < 0 \\ \;\,Y, & \text{if } X = 0 \end{cases}$$
Hopfield Network Attributes
 The current state of the Hopfield network is
determined by the current outputs of all neurons, y1,
y2, . . ., yn.
Thus, for a single-layer n-neuron network, the state can be defined by the state vector as:
$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Hopfield Network Attributes
 In the Hopfield network, synaptic weights between
neurons are usually represented in matrix form as
follows:
$$\mathbf{W} = \sum_{m=1}^{M} \mathbf{Y}_m \mathbf{Y}_m^{T} - M\,\mathbf{I}$$
where
M is the number of states to be memorized by the network,
Y_m is the n-dimensional binary vector,
I is the n × n identity matrix,
and superscript T denotes matrix transposition.
Possible states for the three-neuron
Hopfield network
[Figure: the eight possible states of the three-neuron Hopfield network shown as the vertices of a cube in (y1, y2, y3) space: (1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1) and (-1, -1, -1).]
Hopfield Network Calculation
 The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ.
 If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations.
Hopfield Network Calculations (Cont.)
 Suppose, for instance, that our network is required to memorize two
opposite states, (1, 1, 1) and (-1, -1, -1). Thus,
1
Y1  1
1
- 1
Y2  - 1
- 1
or
Y1T  [1 1 1] Y2T  [- 1 -1 -1]
where Y1 and Y2 are the three-dimensional vectors.
 The 3 x 3 identity matrix I is
1 0 0
I  0 1 0
0 0 1
Hopfield Network Calculations (Cont.)
 Thus, we can now determine the weight matrix as
follows:
1
- 1
1 0 0 0
W  1[1 1 1]  - 1[- 1 - 1 - 1] - 2 0 1 0  2
1
- 1
0 0 1 2
2
0
2
2
2
0
 Next, the network is tested by the sequence of input
vectors, X1 and X2, which are equal to the output (or
target) vectors Y1 and Y2, respectively.
Hopfield Network Calculations (Cont.)
 First, we activate the Hopfield network by applying the input vector X. Then, we calculate the actual output vector Y, and finally, we compare the result with the initial input vector X.
$$\mathbf{Y}_1 = \text{sign}\left\{ \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
$$\mathbf{Y}_2 = \text{sign}\left\{ \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$
Hopfield Network Calculations (Cont.)
 The remaining six states are all unstable. However,
stable states (also called fundamental memories) are
capable of attracting states that are close to them.
 The fundamental memory (1, 1, 1) attracts unstable
states (-1, 1, 1), (1, -1, 1) and (1, 1, -1). Each of these
unstable states represents a single error, compared to
the fundamental memory (1, 1, 1).
 The fundamental memory (-1, -1, -1) attracts unstable
states (-1, -1, 1), (-1, 1, -1) and (1, -1, -1).
 Thus, the Hopfield network can act as an error
correction network.
Storage capacity of the
Hopfield network
 Storage capacity is the largest number of
fundamental memories that can be stored and
retrieved correctly.
 The maximum number of fundamental memories
Mmax that can be stored in the n-neuron recurrent
network is limited by
$$M_{\max} = 0.15\, n$$
Bidirectional associative memory
(BAM)
 The Hopfield network: autoassociative memory
 It can retrieve a corrupted or incomplete memory
 but cannot associate this memory with another different
memory.
 Human memory is essentially associative.
 Associative memory needs a recurrent neural network
capable of accepting an input pattern on one set of
neurons and producing a related, but different, output
pattern on another set of neurons.
Bidirectional associative
memory (BAM)
 Bidirectional associative memory (BAM), first
proposed by Bart Kosko, is a heteroassociative
network.
 It associates patterns from one set, set A, to patterns
from another set, set B, and vice versa.
 Like a Hopfield network, the BAM can generalize
and also produce correct outputs despite corrupted or
incomplete inputs.
BAM operation
[Figure: BAM operation. (a) Forward direction: input-layer neurons 1..n hold x1(p)..xn(p) and drive output-layer neurons 1..m, producing y1(p)..ym(p). (b) Backward direction: the output pattern y1(p)..ym(p) is fed back through the network to produce the updated input pattern x1(p+1)..xn(p+1).]
BAM Concept
 The basic idea behind the BAM is to store pattern pairs so that when the n-dimensional vector X from set A is presented as input, the BAM recalls the m-dimensional vector Y from set B, but when Y is presented as input, the BAM recalls X.
 To develop the BAM, we need to create a correlation matrix for each pattern pair we want to store.
 BAM weight matrix:
$$\mathbf{W} = \sum_{m=1}^{M} \mathbf{X}_m \mathbf{Y}_m^{T}$$
 where M is the number of pattern pairs to be stored in the BAM.
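A small NumPy sketch of BAM storage and recall; the two bipolar pattern pairs are illustrative assumptions, not patterns from the slides:

```python
import numpy as np

# Set A patterns (n = 4) and their associated set B patterns (m = 3)
X = [np.array([ 1, -1,  1, -1]), np.array([ 1,  1, -1, -1])]
Y = [np.array([ 1, -1,  1]),     np.array([-1,  1,  1])]

# W = sum over the M pattern pairs of X_m Y_m^T  (an n-by-m correlation matrix)
W = sum(np.outer(x, y) for x, y in zip(X, Y))

# Forward direction: present an X pattern and recall its Y partner;
# backward direction: present a Y pattern and recall its X partner.
# (np.sign is adequate here because no activation happens to be exactly zero.)
print(np.sign(W.T @ X[0]))   # -> [ 1 -1  1]     (recalls Y[0])
print(np.sign(W @ Y[1]))     # -> [ 1  1 -1 -1]  (recalls X[1])
```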
Stability and storage capacity of the BAM
 The BAM is unconditionally stable. This means that
any set of associations can be learned without risk of
instability.
Max. no. of associations < No. of neurons in the smaller layer.
 The more serious problem with the BAM is incorrect
convergence. The BAM may not always produce the
closest association. In fact, a stable association may be
only slightly related to the initial input vector.