Lecture transcript
Chapter 3
Neural Networks
Xiu-jun GONG (Ph. D)
School of Computer Science and Technology, Tianjin
University
[email protected]
http://cs.tju.edu.cn/faculties/gongxj/course/ai/
Outline
Introduction
Training a single TLU
Network of TLUs—Artificial Neural Network
Pros & Cons of ANN
Summary
Biological /Artificial Neural Network
[Figure: the structure of a typical biological neuron (SMI32-stained pyramidal neurons in cerebral cortex) alongside an artificial neuron with inputs x1, ..., xn, weights w1, ..., wn, and output function f(s). ANNs sit at the intersection of Artificial Intelligence, recognition modeling, and Neuroscience.]
Definition of ANN
Simulated Neural Network: SNN, NN
An ANN is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing, based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.
Applications of ANN
Function approximation, or regression
analysis, including time series prediction
and modeling.
Classification, including pattern and
sequence recognition, novelty detection
and sequential decision making.
Data processing, including filtering,
clustering, blind signal separation and
compression.
Extension of a TLU
Threshold Logic Unit -> Perceptron (Neuron)
Inputs are not limited to Boolean values
Outputs are not limited to binary functions
Output functions of a perceptron
Threshold function:
$f(s) = \begin{cases} 1 & s \ge \theta \\ 0 & s < \theta \end{cases}$
Sigmoid function:
$f(s) = \dfrac{1}{1 + e^{-s}}$
Characteristics of the sigmoid function
Smooth, continuous, and monotonically increasing (derivative is always positive)
Bounded range, but never reaches max or min
The logistic function is often used:
$f(s) = \dfrac{1}{1 + e^{-s}}, \qquad f' = f(1 - f)$
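The identity $f' = f(1-f)$ can be checked numerically against a finite-difference approximation. This is an illustrative sketch, not part of the lecture:

```python
# Numerically verify f'(s) = f(s) * (1 - f(s)) for the logistic function.
import math

def f(s):
    return 1.0 / (1.0 + math.exp(-s))

for s in [-2.0, 0.0, 1.5]:
    h = 1e-6
    numeric = (f(s + h) - f(s - h)) / (2 * h)   # central difference
    analytic = f(s) * (1 - f(s))                # the identity f' = f(1 - f)
    assert abs(numeric - analytic) < 1e-6
```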
Linearly separable function realized by a TLU
A single TLU can realize a linearly separable function such as
$f(x_1, x_2, x_3) = x_1 \bar{x}_2 x_3$
The XOR function is not linearly separable and cannot be realized by a single TLU:
$f(x_1, x_2) = x_1 \bar{x}_2 + \bar{x}_1 x_2$
A network of TLUs
[Figure: a two-layer network of TLUs computing XOR. Two hidden TLUs y1 and y2, with input weights ±1 and thresholds 0.5, feed a final TLU with weights 1 and threshold 0.5.]
$f(x_1, x_2) = x_1 \bar{x}_2 + \bar{x}_1 x_2$ (XOR)
Even-Parity Function
$f = x_1 x_2 + \bar{x}_1 \bar{x}_2$
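A network like the one in the figure can be sketched directly in code. The weights and thresholds below are one illustrative choice that computes XOR, not necessarily the exact values from the lecture figure:

```python
# A small network of TLUs computing XOR, which a single TLU cannot represent.

def tlu(weights, threshold, inputs):
    """Threshold logic unit: output 1 iff the weighted sum reaches threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def xor(x1, x2):
    y1 = tlu([1, -1], 0.5, [x1, x2])   # fires for x1 AND NOT x2
    y2 = tlu([-1, 1], 0.5, [x1, x2])   # fires for NOT x1 AND x2
    return tlu([1, 1], 0.5, [y1, y2])  # OR of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        assert xor(a, b) == (a ^ b)
```

Each hidden unit realizes one linearly separable term of $x_1 \bar{x}_2 + \bar{x}_1 x_2$, and the output unit ORs them together.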
Training a single neuron
What is learning/training?
The methods:
The Delta Procedure
The Generalized Delta Procedure
The Error-Correction Procedure
Reform the representation of a perceptron
[Figure: inputs x1, ..., xn with weights W1, ..., Wn enter a summing junction computing s = W·X, followed by an activation function f = f(s) that produces the output. A constant input x_{n+1} ≡ 1 with weight w_{n+1} absorbs the threshold.]
$s = \sum_{i=1}^{n} w_i x_i + w_{n+1} = \sum_{i=1}^{n+1} w_i x_i$
In vector form:
$s = (x_1, x_2, \ldots, x_n, x_{n+1}) \cdot (w_1, w_2, \ldots, w_n, w_{n+1})^{T}$
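The augmented representation is simple to express in code. Appending a constant input $x_{n+1} = 1$ folds the threshold into the weight vector, so the weighted sum is a plain dot product. The weight values here are illustrative:

```python
# Sketch of the augmented perceptron sum s = sum_{i=1}^{n+1} w_i x_i.

def weighted_sum(weights, inputs):
    x = list(inputs) + [1.0]          # augment with the constant x_{n+1} = 1
    return sum(w * xi for w, xi in zip(weights, x))

# w_{n+1} plays the role of the (negated) threshold
w = [0.4, -0.2, 0.1]                  # w1, w2, w_{n+1}
s = weighted_sum(w, [1.0, 1.0])       # 0.4 - 0.2 + 0.1 = 0.3 (up to rounding)
```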
Gradient Descent Methods
Minimize the squared error between the desired response and the neuron output
Squared error function: $\varepsilon = (d - f)^2$
$W \overset{\text{def}}{=} (w_1, w_2, \ldots, w_n, w_{n+1})^{T}, \quad s = W \cdot X, \quad f = f(s)$
$\dfrac{\partial \varepsilon}{\partial W} = -2(d - f)\dfrac{\partial f}{\partial s} X$
The Delta Procedure
Using the linear output function f = s
Weight update: W ← W + c (d – f ) X
Delta rule (Widrow-Hoff rule)
The Generalized Delta Procedure
Using the sigmoid function f (s) = 1 / (1 + e^{-s})
Weight update: W ← W + c (d – f ) f (1 – f ) X
Generalized delta procedure:
f (1 – f ) → 0 as f → 0 or f → 1
Weight change can occur only within the ‘fuzzy’ region surrounding the hyperplane, near the point f = 0.5
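The generalized delta procedure can be sketched as a training loop for a single sigmoid neuron. The target function (AND), learning rate, and iteration count below are illustrative choices, not from the lecture:

```python
# Minimal sketch of the generalized delta procedure:
# W <- W + c (d - f) f (1 - f) X, for one sigmoid neuron learning AND.
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(3)]  # w1, w2, bias weight
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND function
c = 1.0  # learning rate (illustrative)

for _ in range(5000):
    for inputs, d in data:
        x = inputs + [1]                              # augmented input
        f = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        delta = c * (d - f) * f * (1 - f)             # generalized delta rule
        w = [wi + delta * xi for wi, xi in zip(w, x)]

for inputs, d in data:
    f = sigmoid(sum(wi * xi for wi, xi in zip(w, inputs + [1])))
    assert round(f) == d
```

Note how the factor f (1 – f ) makes the update vanish when the output saturates near 0 or 1, exactly as described above.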
The Error-Correction Procedure
Using the threshold function (output: 0, 1)
The weight change rule:
W ← W + c (d – f ) X, i.e. W ← W ± c X
In the linearly separable case, W converges to a solution after finitely many iterations.
In the nonlinearly separable case, W never converges.
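The error-correction procedure can be sketched on a linearly separable function such as OR, where it converges after a few passes. The learning rate and pass count are illustrative:

```python
# Error-correction sketch: threshold output (0/1) and W <- W + c (d - f) X,
# trained on the linearly separable OR function.

def step(s):
    return 1 if s >= 0 else 0

w = [0.0, 0.0, 0.0]                                          # w1, w2, bias
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # OR function
c = 0.1

for _ in range(20):                        # finitely many passes suffice here
    for inputs, d in data:
        x = inputs + [1]
        f = step(sum(wi * xi for wi, xi in zip(w, x)))
        w = [wi + c * (d - f) * xi for wi, xi in zip(w, x)]

assert all(step(sum(wi * xi for wi, xi in zip(w, i + [1]))) == d
           for i, d in data)
```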
An example
[Figure: a single TLU classifying sensor readings as facing “east”. Inputs are formed from sums of sensor values, x1 = S2 + S3, x2 = S4 + S5, x3 = S6 + S7, x4 = S8 + S9, plus a constant input 1, with weights W11, W21, W31, W41, W51.]
ANN: Its topologies
[Figure: two topologies. Feedforward: inputs flow through the layers to the outputs with no cycles. Recurrent: feedback connections route activations back through a context layer toward the inputs.]
Training Neural Networks
Supervised method
Trained by matching input and output patterns
Input-output pairs can be provided by an external teacher, or by the system
Unsupervised method (Self-organization)
An (output) unit is trained to respond to clusters of patterns within the input.
There is no a priori set of categories
Reinforcement learning
An intermediate form of the above two types of learning.
The learning machine does some action on the environment and gets a feedback response from the environment.
The learning system grades its action as good (rewarding) or bad (punishable) based on the environmental response, and adjusts its parameters accordingly.
Supervised training
Back-propagation—Notations
[Figure: a fully connected feedforward network with layers j = 0, 1, ..., M; inputs x_{p1}, ..., x_{pN_0}; outputs O_{p1}, ..., O_{pN_M}; targets T_{p1}, ..., T_{pN_M}.]
$j$ : layer index, $j = 0, 1, \ldots, M$
$N_0$ : number of inputs; $N_j$ : number of neurons in layer $j$; $N_M$ : number of outputs
$p$ : the $p$th pattern of $n$ patterns
$Y_{ji}$ : output of the $i$th neuron in layer $j$
$\delta_{ji}$ : the error value associated with the $i$th neuron in layer $j$
$W_{jik}$ : the connection weight from the $k$th neuron in layer $j-1$ to the $i$th neuron in layer $j$
Back-propagation: The method
1. Initialize connection weights to small random values.
2. Present the pth sample input vector and the corresponding output target to the network:
$X_p = (x_{p1}, x_{p2}, \ldots, x_{pN_0})$
$T_p = (T_{p1}, T_{p2}, \ldots, T_{pN_M})$
3. Pass the input values to the first layer. For every input node i in layer 0, perform:
$Y_{0i} = x_{pi}$
4. For every neuron i in every layer j = 1, 2, ..., M, from input to output layer, find the output from the neuron:
$Y_{ji} = f\!\left(\sum_{k=1}^{N_{j-1}} Y_{(j-1)k} W_{jik}\right)$
5. Obtain output values. For every output node i in layer M, perform:
$O_{pi} = Y_{Mi}$
6. Calculate the error value for every neuron i in every layer in backward order j = M, M-1, ..., 2, 1.
The method cont.
6.1 For the output layer, the error value is:
$\delta_{Mi} = Y_{Mi}(1 - Y_{Mi})(T_{pi} - Y_{Mi})$
6.2 For the hidden layers, the error value is:
$\delta_{ji} = Y_{ji}(1 - Y_{ji}) \sum_{k=1}^{N_{j+1}} \delta_{(j+1)k} W_{(j+1)ki}$
6.3 The weight adjustment can be done for every connection from neuron k in layer j-1 to every neuron i in layer j:
$W_{jik} \leftarrow W_{jik} + \eta\, \delta_{ji} Y_{(j-1)k}$
The actions in steps 2 through 6 are repeated for every training sample pattern p, and over the training set, until the root mean square (RMS) of the output errors is minimized:
$E_p = \sum_{j=1}^{N_M} (T_{pj} - O_{pj})^2$
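The steps above can be sketched as a small one-hidden-layer network trained on XOR. The layer sizes, learning rate, and epoch count are illustrative choices, not from the lecture:

```python
# Minimal back-propagation sketch following steps 2-6 above.
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, W1, W2):
    """Forward pass (steps 3-5); a trailing 1 on each layer is the bias input."""
    y0 = x + [1]
    y1 = [sigmoid(sum(w * v for w, v in zip(row, y0))) for row in W1] + [1]
    y2 = [sigmoid(sum(w * v for w, v in zip(row, y1))) for row in W2]
    return y0, y1, y2

def total_error(data, W1, W2):
    """E = sum_p sum_j (T_pj - O_pj)^2, as in the lecture."""
    return sum(sum((t - o) ** 2 for t, o in zip(tp, forward(x, W1, W2)[2]))
               for x, tp in data)

random.seed(1)
n_in, n_hid, n_out = 2, 3, 1
# W[i][k]: weight from neuron k in the previous layer to neuron i
W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
eta = 0.5
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]  # XOR

e_before = total_error(data, W1, W2)
for _ in range(10000):
    for x, tp in data:
        y0, y1, y2 = forward(x, W1, W2)
        # 6.1: output-layer error values
        d2 = [y * (1 - y) * (t - y) for y, t in zip(y2, tp)]
        # 6.2: hidden-layer error values
        d1 = [y1[i] * (1 - y1[i]) * sum(d2[k] * W2[k][i] for k in range(n_out))
              for i in range(n_hid)]
        # 6.3: W_jik <- W_jik + eta * delta_ji * Y_(j-1)k
        for i in range(n_out):
            for k in range(n_hid + 1):
                W2[i][k] += eta * d2[i] * y1[k]
        for i in range(n_hid):
            for k in range(n_in + 1):
                W1[i][k] += eta * d1[i] * y0[k]
e_after = total_error(data, W1, W2)
assert e_after < e_before   # training reduces the squared output error
```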
Generalization vs. specialization
Optimal number of hidden neurons
Too many hidden neurons: you get an overfit; the training set is memorized, making the network useless on new data sets
Not enough hidden neurons: the network is unable to learn the problem concept
Overtraining
Too many examples: the ANN memorizes the examples instead of the general idea
Generalization vs. specialization trade-off
K-fold cross validation is often used
Unsupervised method
No help from the outside
No training data, no information available
on the desired output
Learning by doing
Used to pick out structure in the input:
Clustering
Reduction of dimensionality compression
Kohonen’s Learning Law (Self-Organizing Map)
Winner takes all (only update the weights of the winning neuron)
SOM algorithm
An example: Kohonen Network
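The winner-takes-all idea can be sketched in a few lines: only the unit whose weight vector lies closest to the input is updated, pulled toward that input. The unit count, data, and learning rate below are illustrative, and the neighborhood function of a full SOM is omitted:

```python
# Winner-takes-all sketch in the spirit of Kohonen's rule.
import random

random.seed(0)
units = [[random.random(), random.random()] for _ in range(2)]
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.85], [0.8, 0.95]]  # two clusters
alpha = 0.3  # learning rate

def nearest(units, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))

for _ in range(50):
    for x in data:
        w = nearest(units, x)                  # the winning neuron
        # pull only the winner toward the input
        units[w] = [u + alpha * (v - u) for u, v in zip(units[w], x)]

# after training, each cluster should be owned by a different unit
assert nearest(units, [0.1, 0.1]) != nearest(units, [0.9, 0.9])
```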
Reinforcement learning
Teacher: training data
The teacher scores the performance of the training examples
The performance score is used to shuffle the weights ‘randomly’
Relatively slow learning due to the ‘randomness’
Anatomy of ANN learning algorithms
ANN learning
Supervised
Logic inputs: Hopfield
Continuous inputs: Back-propagation
Unsupervised
Logic inputs: ART
Continuous inputs: SOM, Hebb
Reinforcement learning
Pros & Cons of ANN
Pros:
A neural network can perform tasks that a linear program cannot.
When an element of the neural network fails, the network can continue because of its parallel nature.
A neural network learns and does not need to be reprogrammed.
It can be applied to a wide range of applications.
Cons:
The neural network needs training to operate.
The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
Large neural networks require high processing time.
Summary
The representational capability of ANNs
Training a single perceptron
Training neural networks
The generalization vs. specialization trade-off should be kept in mind