Neural Networks

Chapter 9
Artificial Neural Networks
Introduction to the Back Propagation Neural Network (BPNN)
By KH Wong
Introduction
• Neural network research is very popular.
• A high-performance multi-class classifier.
• Successful in handwritten optical character recognition (OCR), speech recognition, image noise removal, etc.
• Easy to implement
– Slow in learning
– Fast in classification
http://www.ninds.nih.gov/disorders/brain_basics/ninds_neuron.htm
http://yann.lecun.com/exdb/mnist/
Motivation
• Biological findings inspire the development of neural nets.
– A neuron is modeled as a logic function: inputs and weights feed a logic function, which produces an output.
• Biological analogy:
– Input: X = inputs
– Dendrites: W = weights
– Output: the neuron's output
– The human brain computes using a network of such neurons.
Applications
• Microsoft: XiaoIce (AI chatbot)
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015: http://imagenet.org/challenges/LSVRC/2015/
– 200 categories: accordion, airplane, ant, antelope, ..., dishwasher, dog, domestic cat, dragonfly, drum, dumbbell, etc.
• TensorFlow

ILSVRC 2015 dataset (200 object classes):
Split         Num. images   Num. objects
Training      456,567       478,807
Validation    20,121        55,502
Testing       40,152        ---
Different types of artificial neural networks
• Autoencoder
• DNN (Deep neural network) and deep learning
• MLP (Multilayer perceptron)
• RNN (Recurrent neural network)
• RBM (Restricted Boltzmann machine)
• SOM (Self-organizing map)
• CNN (Convolutional neural network)
• From https://en.wikipedia.org/wiki/Artificial_neural_network
• The method discussed in these slides can be applied to many of the above nets.
Theory of the Back Propagation Neural Net (BPNN)
• Many samples are used to train the weights (W) and biases (b), so that the trained network can classify an unknown input into one of several classes.
• We will explain:
– How to use the network after training: the forward pass (classification / recognition of the input)
– How to train the network: how to train the weights and biases (using forward and backward passes)
Back propagation is an essential step in many artificial neural network designs
• It is used to train an artificial neural network.
• For each training sample xi, a supervised (teacher) output ti is given.
• For the i-th training sample xi:
1) Feed-forward propagation: feed xi to the neural net and obtain output yi. The error is ei = |ti - yi|^2.
2) Back propagation: feed ei into the net from the output side and adjust the weights w (by finding Δw) to minimize the error.
• Repeat 1) and 2) for all samples until the overall error E is 0 or very small.
Example: Optical character recognition (OCR)
• Training: first train the system by presenting many samples with known classes to the network. Training determines the network's weights (W) and biases (b).
• Recognition: when an image is input to the trained network, it tells which character it is, e.g. Output3 = '1' while all other outputs = '0'.
Overview of this document
• Back Propagation Neural Networks (BPNN)
– Part 1: Feed-forward processing (classification or recognition)
– Part 2: Back propagation (training the network), which also includes forward processing, backward processing and weight updates
• Appendix:
– A MATLAB example is explained
– Source: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
Part 1: Classification in action (the recognition process)
Forward pass of the Back Propagation Neural Net (BPNN)
Assume the weights (W) and biases (b) have already been found by training (to be discussed in Part 2).
Recognition: assume the weights (W) and biases (b) were found earlier
• Each pixel of the input image is X(u,v).
• Output of a correct recognition (here of class 3): Output0 = 0, Output1 = 0, Output2 = 0, Output3 = 1, ..., Outputn = 0.
A neural network
• Figure: the input layer X^{l=1} feeds, through the weights W^{l=1}, into the hidden layers X^{l=2}, X^{l=3}, ..., X^{l=N} (with weights W^{l=2}, ..., W^{l=N}), which feed the output layer.
Exercise 1
(Figure: a network with 4 input neurons X^{l=1}, hidden layers X^{l=2}, X^{l=3}, X^{l=4} and weights W^{l=1}, ..., W^{l=N=4}.)
• How many input and output neurons are there?
– Ans: 4 input and 2 output neurons.
• How many hidden layers does this network have?
– Ans: 3.
• How many weights are there in total?
– Ans: The first hidden layer has 4x4, the second hidden layer has 3x4, the third hidden layer has 3x3, and the last hidden layer to the output layer has 2x3 weights. Total = 16 + 12 + 9 + 6 = 43.
• What is the last layer of hidden neurons X called?
– Ans: X^{l=4}.
Multi-layer structure of a BP neural network
• The network has an input layer, hidden layers (layer l and other hidden layers), and an output layer.
• A layer has multiple neurons. Each neuron has weights w1, w2, w3, ..., one bias b, and a transfer function f().
• Y = set of outputs, X = set of inputs, W = set of weights, b = set of biases, such that each neuron in layer l has inputs x ∈ X_l, weights w ∈ W_l and output y ∈ Y_l.
Inside each neuron there is a bias (b)
• Between any two neighboring neuron layers, a set of weights is found.
• Figure: the inputs x(i=1), x(i=2), ..., x(i=I) are multiplied by the weights w(i=1), w(i=2), ..., w(I), summed into the internal signal u, and passed through f(u) to give the output y.
Inside each neuron: x = input, y = output
• The neuron computes
  y = f(u), with u = Σ_{i=1..I} w(i)·x(i) + b,
  where b = bias, x = input, w = weight and u = internal signal.
• Typically f() is a logistic (sigmoid) function, i.e.
  f(u) = 1/(1 + e^{-βu}); assume β = 1 for simplicity,
  therefore y = f(u) = 1/(1 + e^{-(Σ_{i=1..I} w(i)·x(i) + b)}).
• (A small MATLAB sketch of this computation is given below.)
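
A minimal MATLAB sketch (not from the original slides) of one sigmoid neuron, reusing the numbers of hidden neuron A1 from Exercise 3 later in this deck; the names x, w, b and sigmoid are illustrative only.

x = [1; 3.1; 0.5];                 % inputs x(1)..x(I)
w = [0.1; 0.35; 0.4];              % weights w(1)..w(I)
b = 0.5;                           % bias
sigmoid = @(u) 1./(1 + exp(-u));   % logistic transfer function f(), beta = 1
u = w'*x + b;                      % internal signal u = sum_i w(i)x(i) + b
y = sigmoid(u)                     % neuron output y = f(u), approx. 0.8682 here
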
BPNN forward pass
• The forward pass finds the output when an input is given. For example:
• Assume we have used N = 60,000 images (the MNIST database) to train a network to recognize c = 10 numerals.
• When an unknown image is given to the input, the output neuron corresponding to the correct answer will give the highest output level.
• Figure: an input image feeds the network, which has 10 output neurons for the digits 0, 1, 2, ..., 9; here the output pattern is 0 0 0 1 0 ... 0, i.e. the neuron for digit 3 is active. (A hedged MATLAB sketch of such a forward pass follows.)
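
A hedged MATLAB sketch of the forward pass for a small 9-input, 5-hidden, 3-output network like the demo program later in this deck; the random weights below stand in for trained ones and all names are illustrative.

sigmoid = @(u) 1./(1 + exp(-u));
W1 = randn(5, 9); b1 = randn(5, 1);   % placeholder trained weights: input -> hidden
W2 = randn(3, 5); b2 = randn(3, 1);   % placeholder trained weights: hidden -> output
x  = rand(9, 1);                      % one input image (9 pixels) as a column vector
a1 = sigmoid(W1*x + b1);              % hidden-layer outputs
y  = sigmoid(W2*a1 + b2);             % output-layer outputs
[~, predicted_class] = max(y)         % the neuron with the highest output gives the class
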
Our simple demo program
• Training patterns:
– 3 classes (one class per row of the pattern figure)
– Each class has 3 training samples (the items in each row)
• After training, an unknown input (assume it is test image #2, which belongs to class 2) is presented to the network; the network should tell you it is class 2.
Numerical example: architecture of our example
• Input layer: 9x1 pixels; hidden layer: 5 neurons; output layer: 3x1.
• W^l = 5 neurons x 9 inputs for each hidden neuron (a 5x9 weight matrix)
• b^l = 5 x 1 (one bias for each hidden neuron)
• W^l = weights, b^l = biases, with a transfer function f() for each neuron.
The input x
• P2=[50 30 25 215 225 231 31 22 34; ...  %class1: 1st training sample. Gray level 0->255
• The 9 pixel values feed the 9 neurons of the input layer: P1=50, P2=30, P3=25, P4=215, P5=225, P6=231, P7=31, P8=22, P9=34.
• There are 9 neurons in the input layer, 5 neurons in the hidden layer and 3 neurons in the output layer.
Exercise 2: Feed forward
• Inputs = P1, ..., P9; outputs = Y1, Y2, Y3; teacher (target) = T1, T2, T3.
• For class 1 the target is T1, T2, T3 = 1, 0, 0.
• Example forward-pass result: Y1 = 0.5101 (T1 = 1), Y2 = 0.4322 (T2 = 0), Y3 = 0.3241 (T3 = 0).
• Architecture: input layer (9 neurons P(i=1), ..., P(i=9)); hidden layer A1 at layer l=1 (5 neurons A1(j=1), ..., A1(j=5), indexed by j, with weights ω(i,j), W^{l=1} = 9x5 and b^{l=1} = 5x1); output layer at layer l=2 (indexed by k, with weights ω^{l=2}(j,k)).
• Exercise 2: What is the target code for T1, T2, T3 if the sample is for class 3?
• Ans: 0, 0, 1.
Exercise 3: find Y1
• Each neuron computes y = f(u) = 1/(1 + e^{-(Σ_i ω(i)·x(i) + b)}).
• The network: 3 inputs (layer l=1), 2 hidden neurons A1 and A2 (layer l=2), 2 output neurons Y1 and Y2 (layer l=3).
• Inputs: x1 = 1, x2 = 3.1, x3 = 0.5.
• Hidden neuron A1 (l=2, i=1): weights 0.1, 0.35, 0.4 from the three inputs, bias b = 0.5.
• Hidden neuron A2 (l=2, i=2): weights 0.27, 0.73, 0.15 from the three inputs, bias b = 0.3.
• Output neuron Y1 (l=3, i=1): weights 0.6 (from A1) and 0.35 (from A2), bias b = 0.7.
• Output neuron Y2 (l=3, i=2): weights 0.25 (from A1) and 0.8 (from A2), bias b = 0.6.
• Find Y1 = ?
Answer 3
%demo_bpnn_note1 khw ver15
u1=1*0.1+3.1*0.35+0.5*0.4+0.5
A1=1/(1+exp(-1*u1))
u2=1*0.27+3.1*0.73+0.5*0.15+0.3
A2=1/(1+exp(-1*u2))
u_Y1=A1*0.6+A2*0.35+0.7
Y1=1/(1+exp(-1*u_Y1))
%%%%%% result %%%%%%
%>>demo_bpnn_note1
u1 = 1.8850
A1 = 0.8682
u2 = 2.9080
A2 = 0.9482
u_Y1 = 1.5528
Y1 = 0.8253
Part 2: Back propagation processing (training the network)
Back Propagation Neural Net (BPNN) training
Ref: http://en.wikipedia.org/wiki/Backpropagation
Back propagation stage
• Part 1: Feed forward (studied before). At layer l the signal propagates forward as
  x^{l+1} = f(ω^l x^l + b^l).
• Part 2: Back propagation. The sensitivity δ^{l+1} of layer l+1 is propagated backward to give δ^l of layer l.
• For training we need to find ∂E/∂ω. Why? We will explain why and prove the necessary equations in the following slides.
The criteria to train a network
• Training is based on the overall error function; there are N samples and c classes to be learned (assume N = 60,000 in the MNIST dataset).
• Overall error over all samples and all outputs (|·|^2 is the squared 2-norm):
  E_N = (1/2) Σ_{n=1..N} Σ_{k=1..c} (t_k^n - y_k^n)^2
• Error of the n-th training sample over all outputs (k = 1, ..., c):
  E^n = (1/2) Σ_{k=1..c} (t_k^n - y_k^n)^2
• t_k^n = the given true class of the n-th training sample (the teacher); for example, for the k-th class training sample the teacher says t_k^n = 1.
• y_k^n = the output at the k-th output of the feed-forward network for the n-th training sample.
• (A small MATLAB sketch of these error formulas follows.)
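
A minimal MATLAB sketch (illustrative values, my own variable names) of the error formulas above; the columns of T and Y hold the teacher codes t^n and the network outputs y^n for each sample n.

T = eye(3);                                    % teacher codes for 3 samples, 3 classes
Y = [0.9 0.1 0.2; 0.2 0.8 0.1; 0.1 0.2 0.7];   % placeholder network outputs (3x3)
E_per_sample = 0.5 * sum((T - Y).^2, 1);       % E^n = (1/2) sum_k (t_k^n - y_k^n)^2
E_total = sum(E_per_sample)                    % overall error over all samples
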
Before we back propagate data, we have to find the feed-forward error signal e(n) for training sample x(n).
• Recall the feed-forward processing: inputs = P1, ..., P9; outputs = Y1, Y2, Y3; teacher = T1, T2, T3.
• For the first output of this sample, e(n) = (1/2)|Y1 - T1|^2 = 0.5*(0.5101 - 1)^2 = 0.12, since Y1 = 0.5101 and T1 = 1 (the other outputs are Y2 = 0.4322 with T2 = 0 and Y3 = 0.3241 with T3 = 0).
• The architecture is as before: input layer (9 neurons), hidden layer A1 (layer l=1, 5 neurons indexed by j, W^{l=1} = 9x5, b^{l=1} = 5x1) and output layer (layer l=2).
Exercise 3: The training idea
• Assume this is the n-th training sample and it belongs to class C.
• In the previous exercise we calculated that in this network Y1 = 0.8059.
• During training, the teacher says t = 1 for this input.
a) What is the error value e?
b) How do we use this e?
• Answer a: e = (1/2)|Y1 - t|^2 = 0.5*(1 - 0.8059)^2 = 0.0188
• Answer b: We feed this e back into the network to find the Δw that minimizes the overall error E (E = the sum of e(n) over all samples n). We know that w_new = w_old + Δw gives a new w that decreases E, so by applying this formula recursively we can reach a set of weights W that minimizes E.
How to back propagate?
• Consider a neuron j with I inputs x_i (i = 1, 2, ..., I), weights w_{i=1,j}, ..., w_{i=I,j}, bias b_j and output y_j.
• E = (1/2)(t - y)^2 is the squared error at the output, where t = target (teacher) and y = actual output.
• By definition, u_j = Σ_{i=1..I} x_i·w_{i,j} + b_j and y_j = f(u_j) = f(Σ_{i=1..I} x_i·w_{i,j} + b_j).
• We want to find ∂E/∂w_{i,j}, so by the chain rule
  ∂E/∂w_{i,j} = (∂E/∂y_j)·(∂y_j/∂u_j)·(∂u_j/∂w_{i,j})   - - - (1)
• But why do we need to find ∂E/∂w_{i,j}?
Because ∂E/∂w_{i,j} tells you how to change w to minimize E.
The method is called learning by gradient descent.
• In each learning cycle (epoch) a new w is calculated using w_new = w_old + Δw.
• If we want E to decrease in every learning cycle (e is an element of E), make
  Δw = -η·(∂E/∂w)   (learning by gradient descent).
• To do it slowly, use a small positive η (learning factor, e.g. η = 0.1).
• (The theory of gradient descent will be explained on the next slide.) Hence
  w_new = w_old + Δw = w_old - η·(∂E/∂w).
• By the same argument, b_new = b_old - η·(∂E/∂b).
• (A toy MATLAB sketch of this update rule follows.)
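
A toy MATLAB sketch (my own example, not the slides' program) of learning by gradient descent: it minimizes E(w) = (w - 3)^2 as a stand-in for the network error surface.

eta   = 0.1;                      % small positive learning factor
w     = 0;                        % initial weight
dE_dw = @(w) 2*(w - 3);           % gradient dE/dw of the toy error function
for epoch = 1:50
    dw = -eta * dE_dw(w);         % delta_w = -eta * dE/dw
    w  = w + dw;                  % w_new = w_old + delta_w
end
w                                 % approaches 3, the minimizer of E
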
We need to find ∂E/∂ω. Why? Use the Taylor series.
http://www.fepress.org/files/math_primer_fe_taylor.pdf
http://en.wikipedia.org/wiki/Taylor's_theorem
• Ans: by the Taylor series (to first order),
  E(ω_new) ≈ E(ω_old) + E'·(ω_new - ω_old) + ...
• Here Δω = ω_new - ω_old and E' = ∂E/∂ω, so
  E(ω_new) ≈ E(ω_old) + (∂E/∂ω)·Δω   - - - (*)
• Set Δω = -η·(∂E/∂ω)   - - - (**), where η is a small positive term that sets the learning rate.
• Putting (**) into (*) gives
  E(ω_new) ≈ E(ω_old) - η·(∂E/∂ω)·(∂E/∂ω).
• Hence E(ω_new) ≤ E(ω_old), since η·(∂E/∂ω)·(∂E/∂ω) is always ≥ 0.
• Conclusion: setting Δω = -η·(∂E/∂ω) will decrease E.
Back propagation idea
• Inputs = P1, ..., P9; outputs = Y(k=1), Y(k=2), Y(k=3); teachers = T(k=1), T(k=2), T(k=3).
• Example: Y(k=1) = 0.5101 with T(k=1) = 1, Y(k=2) = 0.4322 with T(k=2) = 0, Y(k=3) = 0.3241 with T(k=3) = 0, so e = (1/2)|Y1 - T1|^2 = 0.5*(0.5101 - 1)^2 = 0.12.
• Back propagate this error to find better weights ω that reduce E.
• Architecture as before: input layer; hidden layer A1 (layer l=1, 5 neurons indexed by j, W^{l=1} = 9x5, b^{l=1} = 5x1); output layer (layer l=2, weights ω^{l=2}(j,k)).
The training algorithm
• Loop over many epochs until E is very small or W is stable
• { For n = 1 : N_all_training_samples
•   { feed forward x(n) to the network to get y(n)
•     e(n) = 0.5*[y(n)-t(n)]^2   // t(n) = teacher of sample x(n)
•     back propagate e(n) through the network
•     // shown earlier: if Δw = -η*∂E/∂w and w_new = w_old + Δw,
•     // the output y(n) will be closer to t(n), hence e(n) will decrease
•     find Δw = -η*∂E/∂w   // E will decrease; η = 0.1 = learning rate
•     update w_new = w_old + Δw = w_old - η*∂E/∂w   // for the weights
•     similarly update b_new = b_old + Δb = b_old - η*∂E/∂b   // for the biases
•   }
•   E = sum_all_n(e(n))
• }
• (A hedged MATLAB sketch of this loop is given below.)
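
A hedged MATLAB sketch of the training loop above for a tiny 1-hidden-layer network (2 inputs, 2 hidden neurons, 1 output). The data, sizes and names are my own illustration, not the demo program of the appendix.

sigmoid = @(u) 1./(1 + exp(-u));
X = [0 0 1 1; 0 1 0 1];  T = [0 1 1 0];      % toy training samples (columns) and teachers
W1 = randn(2,2); b1 = randn(2,1);            % input -> hidden
W2 = randn(1,2); b2 = randn(1,1);            % hidden -> output
eta = 0.5;                                   % learning rate (the slides use 0.1)
for epoch = 1:5000
    E = 0;
    for n = 1:size(X,2)
        x = X(:,n); t = T(n);
        a1 = sigmoid(W1*x + b1);             % feed forward
        y  = sigmoid(W2*a1 + b2);
        E  = E + 0.5*(t - y)^2;              % e(n)
        d2 = (y - t) .* y .* (1 - y);        % output-layer sensitivity (Case 1 below)
        d1 = (W2' * d2) .* a1 .* (1 - a1);   % hidden-layer sensitivity (Case 2 below)
        W2 = W2 - eta * d2 * a1';  b2 = b2 - eta * d2;   % w_new = w_old - eta*dE/dw
        W1 = W1 - eta * d1 * x';   b1 = b1 - eta * d1;
    end
    if E < 1e-3, break; end                  % stop when E is very small
end
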
Theory of how to find ∂E/∂w
• An input j is connected to output neuron k through the weight w_{j,k}; the figure shows inputs x_{j=1}, ..., x_{j=J} feeding output neuron k, which produces u_k and y_k:
  u_k = Σ_{j=1..J} x_j·w_{j,k} + b_k,
  y_k = f(u_k) = f(Σ_{j=1..J} x_j·w_{j,k} + b_k).
• We want to see how w_{j,k} affects E, so from (1), by the chain rule,
  ∂E/∂w_{j,k} = (∂E/∂y_k)·(∂y_k/∂u_k)·(∂u_k/∂w_{j,k}) = term1 · term2 · term3.
Case 1: the neuron is at the output layer. We want to see how E will change if we change the weight w_{j,k}.
• From (1), ∂E/∂w_{j,k} = (∂E/∂y_k)·(∂y_k/∂u_k)·(∂u_k/∂w_{j,k}) = term1 · term2 · term3.
• term1: since the error measured at output neuron k is e_k = E = 0.5(t_k - y_k)^2, where t_k is the teacher (target) class,
  ∂E/∂y_k = ∂[0.5(y_k - t_k)^2]/∂y_k = (y_k - t_k).
• term2: ∂y_k/∂u_k = ∂f(u_k)/∂u_k = f'(u_k) = f(u_k)·(1 - f(u_k)), see the appendix.
• term3: ∂u_k/∂w_{j,k} = ∂(w_{j,k}·x_j + b_j)/∂w_{j,k} = x_j, since b_j is a constant.
• Therefore, for neuron k as an output neuron,
  ∂E/∂w_{j,k} = (y_k - t_k)·f(u_k)·(1 - f(u_k))·x_j.
• Note: define the sensitivity
  δ_k = ∂E/∂u_k = (∂E/∂y_k)·(∂y_k/∂u_k) = term1 · term2 = (y_k - t_k)·f(u_k)·(1 - f(u_k))   - - - (2)
• (A small MATLAB sketch of equation (2) follows.)
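
A minimal MATLAB sketch (illustrative names and numbers, reusing the Exercise 3 values) of equation (2) and the resulting gradient for one output neuron.

f  = @(u) 1./(1 + exp(-u));          % sigmoid
xj = [0.8682; 0.9482];               % inputs to the output neuron (e.g. hidden outputs)
w  = [0.6; 0.35];  b = 0.7;          % its weights and bias
tk = 1;                              % teacher (target)
uk = w'*xj + b;  yk = f(uk);         % forward pass
delta_k = (yk - tk) * yk * (1 - yk); % sensitivity, equation (2)
dE_dw   = delta_k * xj               % gradient dE/dw_{j,k} = delta_k * x_j
dE_db   = delta_k                    % gradient with respect to the bias
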
Case 2: neuron j is at a hidden layer. We want to see how E will change if we change the weight w_{i,j}. Note: the output y_j affects all the output neurons connected to it in the next layer, indexed by k = 1, ..., K, through the weights w_{j,k} (W2 in the program; w_{i,j} is W1 in the program).
• From (1), ∂E/∂w_{i,j} = (∂E/∂y_j)·(∂y_j/∂u_j)·(∂u_j/∂w_{i,j}) = term1 · term2 · term3.
• term1: since y_j affects all u_k in the next layer,
  ∂E/∂y_j = Σ_{k=1..K} (∂E/∂u_k)·(∂u_k/∂y_j) = Σ_{k=1..K} (part1a · part1b).
• part1a: ∂E/∂u_k = (∂E/∂y_k)·(∂y_k/∂u_k) = δ_k (see eq. (2) of the last slide).
• part1b: ∂u_k/∂y_j = w_{j,k}, because u_k = w_{j,k}·y_j + ... for each k.
Case 2: continued
• So term1 = ∂E/∂y_j = Σ_{k=1..K} (part1a · part1b) = Σ_{k=1..K} δ_k·w_{j,k}.
• Hence ∂E/∂w_{i,j} = term1 · term2 · term3 = [Σ_{k=1..K} δ_k·w_{j,k}] · term2 · term3, where term2 and term3 are similar to those on the previous slide.
• Therefore
  ∂E/∂w_{i,j} = [Σ_{k=1..K} δ_k·w_{j,k}] · f(u_j)·(1 - f(u_j)) · x_i,
  where x_i is the input to hidden neuron j (P(:,i) in the program) and f(u_j)·(1 - f(u_j)) for this hidden neuron is df1 in the program.
• (A small MATLAB sketch of this hidden-layer gradient follows.)
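
A minimal MATLAB sketch (illustrative names; the weights reuse Exercise 3 and the delta_k values are placeholders) of the Case 2 gradient for one hidden neuron.

f   = @(u) 1./(1 + exp(-u));
xi  = [1; 3.1; 0.5];                  % inputs x_i to hidden neuron j
wj  = [0.1; 0.35; 0.4];  bj = 0.5;    % its weights and bias
uj  = wj'*xi + bj;  yj = f(uj);       % forward pass for this hidden neuron
w_jk    = [0.6; 0.25];                % weights from neuron j to the K output neurons
delta_k = [0.02; -0.01];              % placeholder output-layer sensitivities (Case 1)
delta_j = (w_jk' * delta_k) * yj * (1 - yj);  % [sum_k delta_k*w_{j,k}] * f'(u_j)
dE_dw   = delta_j * xi                % gradient dE/dw_{i,j} = delta_j * x_i
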
After all the ∂E/∂w are found (after you have solved Case 1 and Case 2)
• We can use this step to update all the weights w so that E is minimized, using the gradient descent method (learning rate η ≈ 0.1):
  w_new = w_old + Δw, with Δw = -η·(∂E/∂w),
  i.e. w_new = w_old - η·(∂E/∂w).
Revisit the training algorithm
• For itr = 1 : all_epochs (or break when E is very small)
• { For n = 1 : N_all_training_samples
•   { feed forward x(n) to the network to get y(n)
•     e(n) = 0.5*[y(n)-t(n)]^2;   // t(n) = teacher of sample x(n)
•     back propagate e(n) through the network
•     // shown earlier: if Δw = -η*∂E/∂w and w_new = w_old + Δw,
•     // the output y(n) will be closer to t(n), hence e(n) will decrease
•     find Δw = -η*∂E/∂w   // E will decrease; η = 0.1 = learning rate
•     update w_new = w_old + Δw = w_old - η*∂E/∂w;   // for the weights
•     similarly update b_new = b_old + Δb = b_old - η*∂E/∂b;   // for the biases
•   }
•   E = sum_all_n(e(n))
• }
Summary
• Learned what a Back Propagation Neural Network (BPNN) is
• Learned the forward pass
• Learned how to back propagate the error during training of the BPNN
References
• Wiki
– http://en.wikipedia.org/wiki/Backpropagation
– http://en.wikipedia.org/wiki/Convolutional_neural_network
• MATLAB programs
– Neural Network for pattern recognition - Tutorial: http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
– CNN MATLAB example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
• Open source library
– TensorFlow: http://www.geekwire.com/2015/google-open-sources-tensorflow-machine-learning-system-offering-its-neural-network-to-outside-developers/
Appendices
Appendix 1: Sigmoid function f(u) and its derivative f'(u)
• The sigmoid function is
  f(u) = 1/(1 + e^{-βu}); for simplicity set β = 1, so f(u) = 1/(1 + e^{-u}).
  (See http://mathworld.wolfram.com/SigmoidFunction.html)
• Hence, using the chain rule,
  f'(u) = df(u)/du = [d(1 + e^{-u})^{-1} / d(1 + e^{-u})] · [d(1 + e^{-u}) / du]
        = [-(1 + e^{-u})^{-2}] · (-e^{-u})
        = e^{-u} / (1 + e^{-u})^2
        = [1/(1 + e^{-u})] · [e^{-u}/(1 + e^{-u})]
        = [1/(1 + e^{-u})] · [(1 + e^{-u}) - 1]/(1 + e^{-u})
        = [1/(1 + e^{-u})] · [1 - 1/(1 + e^{-u})]
        = f(u)·(1 - f(u)).
• Thus df(u)/du = f'(u) = f(u)·(1 - f(u)).
• Reference: http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1
• (A quick numerical check of this identity follows.)
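
A quick MATLAB check (my own addition) that f'(u) = f(u)(1 - f(u)), by comparing it with a finite-difference estimate of the derivative.

f = @(u) 1./(1 + exp(-u));
u = -3:0.5:3;
analytic = f(u) .* (1 - f(u));                    % f(u)(1 - f(u))
numeric  = (f(u + 1e-6) - f(u - 1e-6)) / 2e-6;    % central finite difference
max(abs(analytic - numeric))                      % should be very small (about 1e-9 or less)
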
Alternative derivation (for the output layer, in each neuron)
• Since u^l = ω^l·x^{l-1} + b^l, we have
  ∂u/∂b = 1   - - - (i)
• Define the sensitivity
  δ = ∂E/∂b = (∂E/∂u)·(∂u/∂b) = ∂E/∂u   - - - (ii)
• For the n-th sample,
  E^n = (1/2)(t^n - y^n)^2 = (1/2)(t^n - f(u))^2   - - - (iii)
  because y^n = f(u) is the current output and t^n is the truth or target (teacher).
• From (ii) and (iii),
  δ = ∂E^n/∂b = (t^n - y^n)·∂(t^n - y^n)/∂b = -(t^n - y^n)·∂f(u)/∂b = -(t^n - y^n)·f'(u)·∂u/∂b,
  and since ∂u/∂b = 1 from (i),
  δ^l = ∂E^n/∂b = (y^n - t^n)·f'(u)   - - - (iv)
• At the output layer l = L (the last layer), with t = target (teacher) and y = output,
  δ^{l=L} = f'(u^{l=L})·(y^n - t^n),
  and this error is back propagated to the previous layer.
Derivation (continued)
• Also from (iii), E^n = (1/2)(t^n - y^n)^2, and u = ωx + b, so
  ∂E^n/∂ω^l = (t^n - y^n)·∂(t^n - y^n)/∂ω^l = -(t^n - y^n)·∂f(u)/∂ω^l = (y^n - t^n)·f'(u)·∂(ωx + b)/∂ω^l.
• Since from (iv) δ^l = (y^n - t^n)·f'(u), this gives
  ∂E^n/∂ω^l = δ^l·x   - - - (iv') (for each input x and weight ω)
• For each learning phase a new ω is calculated: ω_new = ω_old + Δω^l   - - - (v)
• If we want E to decrease for every learning cycle, make
  Δω^l = -η·∂E/∂ω^l   - - - (vi)
• To do it slowly, use a small positive η (learning factor). (This is the gradient descent method, described earlier.) Hence, using eqs. (iv'), (v) and (vi),
  ω_new = ω_old + Δω^l = ω_old - η·∂E/∂ω^l = ω_old - η·x^{l-1}·δ^l.
• By the same argument, b_new = b_old + Δb^l = b_old - η·∂E^n/∂b = b_old - η·δ^l, see eq. (ii).
BPNN example in MATLAB
Based on "Neural Network for pattern recognition - Tutorial":
http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
Example: a simple BPNN
• Number of classes (number of output neurons) = 3
• Input: 9 pixels (each input is a 3x3 image)
• Training samples = 3 for each class
• Number of hidden layers = 1
• Number of neurons in the hidden layer = 5
Display of testing patterns
Architecture
• Hidden-layer neuron j=1 (bias b1(j=1)) computes
  A1(j=1) = 1 / (1 + e^{-[ω^{l=1}(i=1,j=1)·P(i=1) + ω^{l=1}(i=2,j=1)·P(i=2) + ... + b1(j=1)]}).
• Output-layer neuron k=1 (bias b2(k=1)) computes
  A2(k=1) = 1 / (1 + e^{-[ω^{l=2}(j=1,k=1)·A1(j=1) + ω^{l=2}(j=2,k=1)·A1(j=2) + ... + b2(k=1)]}).
• Layer l=1: the input P = 9x1 (P(i=1), ..., P(i=9)) feeds the hidden layer A1 of 5 neurons (A1(j=1), ..., A1(j=5), indexed by j) through W^{l=1} = 9x5 and b^{l=1} = 5x1; the sensitivities S1 are generated here.
• Layer l=2: the hidden outputs feed the output layer A2 of 3 neurons (indexed by k) through W^{l=2} = 5x3 and b^{l=2} = 3x1; the sensitivities S2 are generated here.
%source : http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
clear memory %comments added by kh wong
clear all
clc
nump=3;   % number of classes
n=3;      % number of images per class
% training images reshaped into columns in P
% image size (3x3) reshaped to (1x9)
% training images
P=[196 35 234 232 59 244 243 57 226; ...
   188 15 236 244 44 228 251 48 230; ... % class 1
   246 48 222 225 40 226 208 35 234; ...
   255 223 224 255 0 255 249 255 235; ...
   234 255 205 251 0 251 238 253 240; ... % class 2
   232 255 231 247 38 246 190 236 250; ...
   25 53 224 255 15 25 249 55 235; ...
   24 25 205 251 10 25 238 53 240; ... % class 3
   22 35 231 247 38 24 190 36 250]';
% testing images
N=[208 16 235 255 44 229 236 34 247; ...
   245 21 213 254 55 252 215 51 249; ... % class 1
   248 22 225 252 30 240 242 27 244; ...
   255 241 208 255 28 255 194 234 188; ...
   237 243 237 237 19 251 227 225 237; ... % class 2
   224 251 215 245 31 222 233 255 254; ...
   25 21 208 255 28 25 194 34 188; ...
   27 23 237 237 19 21 227 25 237; ... % class 3
   24 49 215 245 31 22 233 55 254]';
% Normalization
P=P/256;
N=N/256;
% display the training images
figure(1),
for i=1:n*nump
    im=reshape(P(:,i), [3 3]);
    % remove the line below to reflect the true data input
    % im=imresize(im,20);  % resize the image to make it clearer
    subplot(nump,n,i),imshow(im); title(strcat('Train image/Class #', int2str(ceil(i/n))))
end
% display the testing images
figure,
for i=1:n*nump
    im=reshape(N(:,i), [3 3]);
    % remove the line below to reflect the true data input
    % im=imresize(im,20);  % resize the image to make it clearer
    subplot(nump,n,i),imshow(im); title(strcat('test image #', int2str(i)))
end
% targets
T=[ 1 1 1 0 0 0 0 0 0
    0 0 0 1 1 1 0 0 0
    0 0 0 0 0 0 1 1 1 ];
S1=5;   % number of hidden-layer neurons
S2=3;   % number of output-layer neurons (= number of classes)
[R,Q]=size(P);
epochs = 10000;     % number of iterations
goal_err = 10e-5;   % goal error
a=0.3;   % define the range of the random initial values
b=-0.3;
W1=a + (b-a) *rand(S1,R);   % weights between input and hidden neurons
W2=a + (b-a) *rand(S2,S1);  % weights between hidden and output neurons
b1=a + (b-a) *rand(S1,1);   % biases of the hidden neurons
b2=a + (b-a) *rand(S2,1);   % biases of the output neurons
n1=W1*P;
A1=logsig(n1);   % feedforward the first time
n2=W2*A1;
A2=logsig(n2);   % feedforward the first time
e=A2-T;          % actually e=T-A2 in the main loop
error =0.5* mean(mean(e.*e));  % better to say e=T-A2, but no harm to 'error' here
nntwarn off
for itr =1:epochs
    if error <= goal_err
        break
    else
        for i=1:Q   % i is the index to a column in P (9x9); each column P(:,i)
                    % is a training sample image: 9 training samples, 3 for each class
            % A1=5x9: outputs of the hidden layer and inputs to the output layer
            % A2=3x9: outputs of the output layer
            % T=true class; each column in T is for 1 training sample
            % hidden_layer = 1, output_layer = 2
            df1=dlogsig(n1,A1(:,i));   % df1 is 5x1 for the 5 neurons in the hidden layer
            df2=dlogsig(n2,A2(:,i));   % df2 is 3x1 for the output neurons
            % s2 is sigma2 = sensitivity2 from the output layer, equation (2)
            s2 = -1*diag(df2) * e(:,i);   % e=T-A2; df2=f'=f(1-f) of layer 2
            % s1=5x1
            s1 = diag(df1)* W2'* s2;   % eq(3), feedback from s2 to s1
            % dW = -eta*s2*x in the slides; eta=0.1, s2 is found, x is A1
            % W2 is 3x5: each output neuron receives
            % 5 inputs from the 5 hidden neurons in the hidden layer
            % sigma2 = s2 = -1*diag(df2)*e(:,i); e=T-A2; df2=f'=f(1-f) of layer 2
            % delta_W2 = -learning_rate*sigma2*input_to_output_layer
            % delta_W2 = -0.1*sigma2*A1
            W2 = W2-0.1*s2*A1(:,i)';   % learning rate=0.1, eq(2) output case
            % 3x5 = 3x5 - (3x1*1x5)
            % A1 = 5 hidden neuron outputs (5 hidden neurons)
            % A1(:,i)' = 1x5 = outputs of the hidden layer
            b2 = b2-0.1*s2;   % bias (threshold) update
            % 3x1 = 3x1 - 3x1
            % P(:,i)' = 1x9 = input to the hidden layer
            % s1=5x1 because each hidden node has 1 sensitivity (sigma)
            W1 = W1-0.1*s1*P(:,i)';   % update W1 in layer 1, see eq(3) hidden case
            % 5x9 = 5x9 - (5x1*1x9), since P is 9x9 and for an i, P(:,i)' = 1x9
            b1 = b1-0.1*s1;   % bias (threshold) update
            % 5x1 = 5x1 - 5x1
            A1(:,i)=logsig(W1*P(:,i)+b1);   % forward
            % 5x1 = 5x1
            A2(:,i)=logsig(W2*A1(:,i)+b2);  % forward
            % 3x1 = 3x1
        end
        e = T - A2;   % for this e, put a -ve sign when finding s2
        error =0.5*mean(mean(e.*e));
        disp(sprintf('Iteration :%5d mse :%12.6f',itr,error));
        mse(itr)=error;
    end
end
threshold=0.9;   % threshold of the system (higher threshold = more accuracy)
% training images result
%TrnOutput=real(A2)
TrnOutput=real(A2>threshold)
% applying the test images to the NN, TESTING BEGINS HERE
n1=W1*N;
A1=logsig(n1);
n2=W2*A1;
A2test=logsig(n2);
% testing images result
%TstOutput=real(A2test)
TstOutput=real(A2test>threshold)
% recognition rate
wrong=size(find(TstOutput-T),1);
recognition_rate=100*(size(N,2)-wrong)/size(N,2)
% end of code
Result of the program
(Figure: mse error vs. itr (epoch iteration).)
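
One possible way to reproduce the plot above (my own addition; the tutorial script stores the errors in mse(itr) but does not plot them):

figure, plot(mse);
xlabel('epoch iteration (itr)'); ylabel('mse error');
title('mse error vs. itr');
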
Appendix: Architecture of our demo program (exercise 3): write the formulas for A1(j=4) and A2(k=3). How many inputs, hidden neurons, outputs and weights are there in each layer?
• Hidden-layer neuron j=1 (bias b1(j=1)) computes
  A1(j=1) = 1 / (1 + e^{-[ω^{l=1}(i=1,j=1)·P(i=1) + ω^{l=1}(i=2,j=1)·P(i=2) + ... + b1(j=1)]}).
• Output-layer neuron k=1 (bias b2(k=1)) computes
  A2(k=1) = 1 / (1 + e^{-[ω^{l=2}(j=1,k=1)·A1(j=1) + ω^{l=2}(j=2,k=1)·A1(j=2) + ... + b2(k=1)]}).
• Layer l=1: the input P = 9x1 feeds the hidden layer A1 of 5 neurons (indexed by j) through W^{l=1} = 9x5 and b^{l=1} = 5x1; the sensitivities S1 are generated here.
• Layer l=2: the hidden outputs feed the output layer A2 of 3 neurons (indexed by k) through W^{l=2} = 5x3 and b^{l=2} = 3x1; the sensitivities S2 are generated here.
Answer (exercise 3): write values for A1(j=4) and A2(k=3)
• P = [0.7656 0.7344 0.9609 0.9961 0.9141 0.9063 0.0977 0.0938 0.0859]  % each entry is P(i), i = 1, 2, 3, ...
• W^{l=1} = [0.2112 0.1540 -0.0687 -0.0289 0.0720 -0.1666 0.2938 -0.0169 -0.1127]  % each entry is ω^{l=1}(i, j=4), i = 1, 2, 3, ...
• b^{l=1} = 0.1441  % for this neuron
• Find A1(j=4):
  A1(j=4) = 1 / (1 + e^{-(ω^{l=1}·P + b^{l=1})}) = 0.49
• How many inputs, hidden neurons, outputs, weights and biases are there in each layer?
• Answer: inputs = 9, hidden neurons = 5, outputs = 3; weights in the hidden layer (layer 1) = 9x5, weights in the output layer (layer 2) = 5x3; 5 biases in the hidden layer (layer 1), 3 biases in the output layer (layer 2).
• The 4th hidden neuron is
  A1(j=4) = 1 / (1 + e^{-[ω^{l=1}(i=1,j=4)·P(i=1) + ω^{l=1}(i=2,j=4)·P(i=2) + ... + b1(j=4)]}).