Neural network


Machine Learning
Neural Network
Bai Xiao
Neural Network –Roadmap
• What is Neural Network
• Artificial Neuron Model – Perceptron
• Multi-layer networks and the Backpropagation algorithm
• Conclusion
Neural Network
• Analogy to biological neural systems, the most robust
learning systems we know.
• Attempt to understand natural biological systems
through computational modeling.
• Massive parallelism allows for computational
efficiency.
• Help understand “distributed” nature of neural
representations (rather than “localist” representation)
that allow robustness and graceful degradation.
• Intelligent behavior as an “emergent” property of large
number of simple units rather than from explicitly
encoded symbolic rules and algorithms.
Biological inspiration
Animals are able to react adaptively to changes in their
external and internal environment, and they use their nervous
system to perform these behaviours.
An appropriate model/simulation of the nervous system
should be able to produce similar responses and behaviours in
artificial systems.
The nervous system is built from relatively simple units, the
neurons, so copying their behavior and functionality is a natural
starting point.
Biological inspiration
[Figure: a biological neuron, labelling the dendrites, the soma (cell body), and the axon]
Neural Network
• Cell structures
– Cell body (细胞核)
– Dendrites (树突)
– Axon (轴突)
– Synaptic terminals (轴突末梢)
Neural Network
• Electrical potential across cell membrane(细胞膜) exhibits
spikes called action potentials.
• Spike originates in cell body, travels down
axon(轴突), and causes synaptic
terminals to release neurotransmitters.
• Chemical diffuses across synapse to
dendrites of other neurons.
• Neurotransmitters can be excitatory or
inhibitory.
• If the net input of neurotransmitters to a neuron from other
neurons is excitatory and exceeds some threshold, it fires an
action potential.
How the Human Brain learns
• In the human brain, a typical neuron collects signals from others through a
host of fine structures called dendrites.
• The neuron sends out spikes of electrical activity through a long, thin strand
known as an axon, which splits into thousands of branches.
• At the end of each branch, a structure called a synapse converts the activity
from the axon into electrical effects that inhibit or excite activity in the
connected neurons.
A Neuron Model
• When a neuron receives excitatory input that is sufficiently large compared
with its inhibitory input, it sends a spike of electrical activity down its axon.
Learning occurs by changing the effectiveness of the synapses so that the
influence of one neuron on another changes.
• We construct these neural networks by first trying to deduce the essential
features of neurons and their interconnections.
• We then typically program a computer to simulate these features.
Biological inspiration
The spikes travelling along the axon of the pre-synaptic
neuron trigger the release of neurotransmitter
substances at the synapse.
The neurotransmitters cause excitation or inhibition in
the dendrite of the post-synaptic neuron.
The integration of the excitatory and inhibitory signals
may produce spikes in the post-synaptic neuron.
The contribution of the signals depends on the strength
of the synaptic connection.
Artificial neurons
Neurons work by processing information. They receive and provide
information in form of spikes.
The McCulloch-Pitts model: inputs x1, x2, …, xn arrive with
synaptic weights w1, w2, …, wn, and the single output y is a
thresholded weighted sum of the inputs:

z = Σ_{i=1}^{n} w_i x_i ;  y = H(z)

where H is a threshold (step) function.
A Simple Neuron
• An artificial neuron is a device with many inputs and one
output.
• The neuron has two modes of operation:
– the training mode, and
– the using mode.
A Simple Neuron (Cont.)
• In the training mode, the neuron can be trained to fire (or not), for
particular input patterns.
• In the using mode, when a taught input pattern is detected at the input,
its associated output becomes the current output. If the input pattern
does not belong in the taught list of input patterns, the firing rule is
used to determine whether to fire or not.
• The firing rule is an important concept in neural networks and accounts
for their high flexibility. A firing rule determines how one calculates
whether a neuron should fire for any input pattern. It relates to all the
input patterns, not only the ones on which the node was trained
previously.
Pattern Recognition
• An important application of neural networks is pattern recognition.
Pattern recognition can be implemented by using a feed-forward neural
network that has been trained accordingly.
• During training, the network is trained to associate outputs with input
patterns.
• When the network is used, it identifies the input pattern and tries to
output the associated output pattern.
• The power of neural networks comes to life when a pattern that has no
associated output is given as an input.
• In this case, the network gives the output that corresponds to the taught
input pattern that is least different from the given pattern.
Pattern Recognition (cont.)
• Suppose a network is trained to recognize
the patterns T and H. The associated
output patterns are all black and all white
respectively.
[Figure: the T and H input grids with their associated all-black and all-white outputs]
Pattern Recognition (cont.)
Since the input pattern looks more like a ‘T’, when the
network classifies it, it sees the input closely resembling
‘T’ and outputs the pattern that represents a ‘T’.
Pattern Recognition (cont.)
The input pattern here closely resembles ‘H’ with a slight
difference. The network in this case classifies it as an ‘H’ and
outputs the pattern representing an ‘H’.
Pattern Recognition (cont.)
• Here the top row is 2 errors away from a ‘T’ and 3 errors away
from an ‘H’, so the top output is black.
• The middle row is 1 error away from both T and H, so the output is
random.
• The bottom row is 1 error away from T and 2 away from H, so
the output is black.
• Since the input resembles a ‘T’ more than an ‘H’, the output of
the network is in favor of a ‘T’.
Neural Network
• Learning approach based on modeling adaptation in
biological neural systems.
• Perceptron (感知器): Initial algorithm for learning
simple neural networks (single layer) developed in the
1950’s.
• Backpropagation(反向传播): More complex
algorithm for learning multi-layer neural networks
developed in the 1980’s.
Artificial Neuron Model
• Model the network as a graph with cells as nodes and synaptic
connections as weighted edges; wji is the weight on the edge
from node i to node j.
[Figure: a unit 1 receiving weighted edges w12, w13, w14, w15, w16 from units 2–6]
• Model the net input to cell j as

net_j = Σ_i w_ji o_i

• Cell output is a step function of the net input:

o_j = 0 if net_j < T_j
o_j = 1 if net_j ≥ T_j

(T_j is the threshold for unit j)
[Figure: o_j as a step function of net_j, jumping from 0 to 1 at T_j]
Artificial Neuron Model
• McCulloch and Pitts (1943) showed how such model
neurons could compute logical functions and be used to
construct finite-state machines.
• Can be used to simulate logic gates:
– AND: Let all wji be Tj/n, where n is the number of inputs.
– OR: Let all wji be Tj
– NOT: Let threshold be 0, single input with a negative weight.
• Can build arbitrary logic circuits, sequential machines, and
computers with such gates.
• Given negated inputs, a two-layer network can compute any
Boolean function using a two-level AND-OR network.
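These gate constructions are easy to check concretely. A minimal Python sketch (the unit fires when the net input reaches the threshold; the function names and the choice T = 1 are illustrative, not from the slides):

```python
# A McCulloch-Pitts threshold unit: fires (1) when the weighted net input
# reaches the threshold, otherwise outputs 0.
def mp_unit(inputs, weights, threshold):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

T = 1.0  # illustrative threshold T_j

def AND(x1, x2):
    # all weights T/n, with n = 2 inputs
    return mp_unit([x1, x2], [T / 2, T / 2], T)

def OR(x1, x2):
    # all weights equal to T
    return mp_unit([x1, x2], [T, T], T)

def NOT(x):
    # threshold 0, single input with a negative weight
    return mp_unit([x], [-T], 0.0)
```

With these settings AND fires only when both inputs are 1, OR when at least one is, and NOT inverts its single input.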
Artificial Neuron Model
Perceptron Training
• Assume supervised training examples
giving the desired output for a unit given a
set of known input activations.
• Learn synaptic weights (w) so that unit
produces the correct output for each
example.
• Perceptron uses iterative update algorithm
to learn a correct set of weights.
Artificial Neuron Model
Perceptron Training
• Update weights by:

w_ji ← w_ji + η (t_j − o_j) o_i

where η is the “learning rate” and
t_j is the teacher-specified output for unit j.
• Equivalent to the rules:
– If output is correct, do nothing.
– If output is high, lower weights on active inputs.
– If output is low, increase weights on active inputs.
• Also adjust the threshold to compensate:

T_j ← T_j − η (t_j − o_j)
Artificial Neuron Model
Perceptron Training
• Iteratively update weights until convergence.

Initialize weights to random values
Until outputs of all training examples are correct:
  For each training pair E do:
    Compute current output o_j for E given its inputs
    Compare current output to target value t_j for E
    Update synaptic weights and threshold using the learning rule

• Each execution of the outer loop is typically
called an epoch.
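The loop above can be sketched in Python. This is an illustrative implementation learning the AND function (the learning rate, initial weight range, and epoch limit are arbitrary choices), with the threshold adjusted alongside the weights as described:

```python
import random

def train_perceptron(examples, n_inputs, eta=0.1, max_epochs=100):
    random.seed(0)                              # deterministic toy run
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    T = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                 # each pass is one epoch
        all_correct = True
        for x, t in examples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            o = 1 if net >= T else 0            # current output
            if o != t:
                all_correct = False
                for i in range(n_inputs):       # w_ji <- w_ji + eta*(t-o)*o_i
                    w[i] += eta * (t - o) * x[i]
                T -= eta * (t - o)              # T_j <- T_j - eta*(t-o)
        if all_correct:
            break
    return w, T

# AND is linearly separable, so the algorithm converges
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, T = train_perceptron(examples, 2)
```

After training, every example in the taught list produces the correct output.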
Perceptron as a Linear Separator
• Since the perceptron uses a linear threshold function, it is
searching for a linear separator that discriminates the
classes. In two dimensions the decision boundary is the line

w12 o2 + w13 o3 = T1,  i.e.  o3 = −(w12 / w13) o2 + T1 / w13

or a hyperplane in n-dimensional space.
[Figure: the separating line in the (o2, o3) plane, with the two classes on either side]
Perceptron Limits
• Cannot learn exclusive-or, or the parity
function in general.
[Figure: the XOR points in the (o2, o3) unit square: (0,0) and (1,1) negative, (0,1) and (1,0) positive; no line separates the classes]
Perceptron Limits
• System obviously cannot learn concepts it
cannot represent.
• Minsky and Papert (1969) wrote a book
analyzing the perceptron and demonstrating
many functions it could not learn.
• These results discouraged further research
on neural nets.
Perceptron Convergence
and Cycling Theorems
• Perceptron convergence theorem: If the data is
linearly separable and therefore a set of weights
exist that are consistent with the data, then the
Perceptron algorithm will eventually converge to a
consistent set of weights.
• Perceptron cycling theorem: If the data is not
linearly separable, the Perceptron algorithm will
eventually repeat a set of weights and threshold at
the end of some epoch and therefore enter an
infinite loop.
Perceptron as Hill Climbing
• The hypothesis space being searched is a set of weights and a
threshold.
• Objective is to minimize classification error on the training set.
• Perceptron effectively does hill-climbing (gradient descent) in
this space, changing the weights a small amount at each point
to decrease training set error.
• For a single model neuron, the space is well behaved with a
single minimum.
[Figure: training error as a bowl-shaped function of the weights, with one global minimum]
Backpropagation
• Multi-layer networks can represent arbitrary functions, but
an effective learning algorithm for such networks was
thought to be difficult.
• A typical multi-layer network consists of an input, a hidden,
and an output layer, each fully connected to the next, with
activation feeding forward.
[Figure: input, hidden, and output layers, with activation flowing from input to output]
• The weights determine the function computed. Given an
arbitrary number of hidden units, any Boolean function can
be computed with a single hidden layer.
Backpropagation Learning Rule
• Each weight is changed by:

Δw_ji = η δ_j o_i

δ_j = o_j (1 − o_j)(t_j − o_j)        if j is an output unit
δ_j = o_j (1 − o_j) Σ_k δ_k w_kj      if j is a hidden unit

where η is a constant called the learning rate,
t_j is the correct teacher output for unit j, and
δ_j is the error measure for unit j.
Error Backpropagation
• First calculate the error of the output units and use this to
change the top layer of weights.

Current output: o_j = 0.2
Correct output: t_j = 1.0
Error: δ_j = o_j(1 − o_j)(t_j − o_j) = 0.2(1 − 0.2)(1 − 0.2) = 0.128

Update the weights into j:  Δw_ji = η δ_j o_i
Error Backpropagation
• Next calculate the error for the hidden units based on the
errors of the output units they feed into:

δ_j = o_j (1 − o_j) Σ_k δ_k w_kj
Error Backpropagation
• Finally update the bottom layer of weights based on the
errors calculated for the hidden units:

δ_j = o_j (1 − o_j) Σ_k δ_k w_kj

Update the weights into j:  Δw_ji = η δ_j o_i
Backpropagation Training Algorithm
Create the 3-layer network with H hidden units and full connectivity
between layers. Set weights to small random real values.
Until all training examples produce the correct value (within ε), or
mean squared error ceases to decrease, or other termination criteria:
  Begin epoch
  For each training example d do:
    Calculate network output for d’s input values
    Compute error between current output and correct output for d
    Update weights by backpropagating error and using the learning rule
  End epoch
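A compact sketch of this algorithm for a tiny 2-2-1 sigmoid network, using the delta rules from the previous slides. Bias/threshold terms are omitted for brevity, and the toy data, seed, and learning rate are illustrative choices:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    # hidden activations, then a single sigmoid output unit
    h = [sigmoid(sum(wji * xi for wji, xi in zip(row, x))) for row in W1]
    o = sigmoid(sum(w * hj for w, hj in zip(W2, h)))
    return h, o

def mse(data, W1, W2):
    return sum((forward(x, W1, W2)[1] - t) ** 2 for x, t in data) / len(data)

def init_weights(seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2)]
    W2 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    return W1, W2

def train(data, W1, W2, eta=0.5, epochs=1000):
    for _ in range(epochs):                       # one epoch per pass
        for x, t in data:
            h, o = forward(x, W1, W2)
            d_out = o * (1 - o) * (t - o)         # output-unit delta
            d_hid = [hj * (1 - hj) * d_out * w for hj, w in zip(h, W2)]
            for j in range(2):
                W2[j] += eta * d_out * h[j]       # top layer: dw = eta*delta*o_i
                for i in range(2):
                    W1[j][i] += eta * d_hid[j] * x[i]
    return W1, W2

# toy data: the network should at least reduce its error on it
data = [([0.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
W1, W2 = init_weights()
error_before = mse(data, W1, W2)
train(data, W1, W2)
error_after = mse(data, W1, W2)
```

Training is not guaranteed to reach zero error (see the conclusions below), but on this toy data the mean squared error decreases from its random-initialization value.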
Conclusion on Backpropagation
• Not guaranteed to converge to zero training error,
may converge to local optima or oscillate
indefinitely.
• However, in practice, does converge to low error
for many large networks on real data.
• Many epochs (thousands) may be required, hours
or days of training for large networks.
• To avoid local-minima problems, run several trials
starting with different random weights (random
restarts).
– Take results of trial with lowest training set error.
– Build a committee of results from multiple trials
(possibly weighting votes by training set accuracy).
Artificial neural networks
[Figure: many interconnected artificial neurons transforming inputs into an output]
An artificial neural network is composed of many artificial
neurons that are linked together according to a specific network
architecture. The objective of the neural network is to transform
the inputs into meaningful outputs.
Artificial neural networks
Tasks to be solved by artificial neural networks:
• controlling the movements of a robot based on self-perception
and other information (e.g., visual information);
• deciding the category of potential food items (e.g., edible or
non-edible) in an artificial world;
• recognizing a visual object (e.g., a familiar face);
• predicting where a moving object will go, when a robot wants to
catch it.
Learning in biological systems
Learning = learning by adaptation
The young animal learns that the green fruits are sour, while the
yellowish/reddish ones are sweet. The learning happens by
adapting the fruit picking behavior.
At the neural level, the learning happens through changes of the
synaptic strengths, the elimination of some synapses, and the
building of new ones.
Learning as optimisation
The objective of adapting the responses on the basis of the
information received from the environment is to achieve a better
state. E.g., the animal likes to eat many energy rich, juicy fruits
that make its stomach full, and makes it feel happy.
In other words, the objective of learning in biological organisms is
to optimise the amount of available resources, or happiness, or in
general to achieve a state closer to optimal.
Learning in biological neural
networks
The learning rules of Hebb:
• synchronous activation increases the synaptic
strength;
• asynchronous activation decreases the synaptic
strength.
These rules fit with energy minimization principles:
maintaining synaptic strength costs energy, so strength should
be maintained at those places where it is needed, and it
shouldn’t be maintained at places where it’s not
needed.
Learning principle for
artificial neural networks
ENERGY MINIMIZATION
We need an appropriate definition of energy for
artificial neural networks, and having that we can use
mathematical optimisation techniques to find how to
change the weights of the synaptic connections
between neurons.
ENERGY = measure of task performance error
Neural network mathematics
[Figure: a feed-forward network with inputs x1…x4, a first layer of four neurons, a second layer of three neurons, and a single output neuron]

First layer:

y^1_i = f(x_i, w^1_i),  i = 1, …, 4;   y^1 = (y^1_1, y^1_2, y^1_3, y^1_4)^T

Second layer:

y^2_k = f(y^1, w^2_k),  k = 1, 2, 3;   y^2 = (y^2_1, y^2_2, y^2_3)^T

Output:

y_out = f(y^2, w_Out)
Neural network mathematics
Neural network: input / output transformation
yout  F ( x,W )
W is the matrix of all weight vectors.
MLP neural networks
MLP = multi-layer perceptron
Perceptron:

y_out = w^T x

[Figure: a single unit mapping input x to y_out]

MLP neural network:

y^1_k = 1 / (1 + e^{−w^{1,k T} x − a^1_k}),  k = 1, 2, 3;   y^1 = (y^1_1, y^1_2, y^1_3)^T

y^2_k = 1 / (1 + e^{−w^{2,k T} y^1 − a^2_k}),  k = 1, 2;   y^2 = (y^2_1, y^2_2)^T

y_out = Σ_{k=1}^{2} w^3_k y^2_k = w^{3T} y^2

[Figure: the three-layer MLP mapping input x to y_out]
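The MLP equations above translate directly into code. A sketch of the forward pass (the weight matrices and offsets passed in are placeholders, not trained values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W1, a1, W2, a2, w3):
    # y^1_k = sigmoid(w^{1,k}.x + a^1_k)
    y1 = [sigmoid(sum(w * xi for w, xi in zip(wk, x)) + ak)
          for wk, ak in zip(W1, a1)]
    # y^2_k = sigmoid(w^{2,k}.y^1 + a^2_k)
    y2 = [sigmoid(sum(w * yi for w, yi in zip(wk, y1)) + ak)
          for wk, ak in zip(W2, a2)]
    # y_out = w^3 . y^2 (linear output layer)
    return sum(w * yi for w, yi in zip(w3, y2))
```

As a sanity check, with all weights and offsets zero every sigmoid outputs 0.5, so with output weights [1, 1] the network returns 1.0 for any input.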
RBF neural networks
RBF = radial basis function
r(x) = r(‖x − c‖)

Gaussian RBF:

f(x) = e^{−‖x − w‖² / (2a²)}

Example:

y_out = Σ_{k=1}^{4} w^2_k · e^{−‖x − w^{1,k}‖² / (2(a_k)²)}

[Figure: an RBF network mapping input x to y_out]
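A sketch of this Gaussian RBF output in Python (the centres correspond to the slides’ w^{1,k} vectors; all numeric values are illustrative):

```python
import math

def rbf_forward(x, centres, widths, w2):
    # y_out = sum_k w2_k * exp(-||x - c_k||^2 / (2 a_k^2))
    out = 0.0
    for c, a, w in zip(centres, widths, w2):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += w * math.exp(-sq_dist / (2.0 * a ** 2))
    return out
```

When the input coincides with a unit’s centre the exponential is 1, so a single-unit network returns exactly its output weight; the contribution decays as the input moves away from the centre.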
Neural network tasks
• control
• classification
• prediction
• approximation
These can be reformulated in
general as
FUNCTION
APPROXIMATION
tasks.
Approximation: given a set of values of a function g(x) build a
neural network that approximates the g(x) values for any input x.
Neural network approximation
Task specification:
Data: set of value pairs: (xt, yt), yt=g(xt) + zt; zt is
random measurement noise.
Objective: find a neural network that represents the
input / output transformation (a function) F(x,W) such
that
F(x,W) approximates g(x) for every x
Learning to approximate
Error measure:

E = (1/N) Σ_{t=1}^{N} (F(x^t; W) − y_t)²

Rule for changing the synaptic weights:

Δw^j_i = −c · ∂E(W) / ∂w^j_i

w^{j,new}_i = w^j_i + Δw^j_i

c is the learning parameter (usually a constant).
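The weight-change rule can be demonstrated on a one-dimensional case where the gradient is known in closed form, say E(w) = (w − 3)² (an illustrative choice, not from the slides):

```python
# One step of the rule: w_new = w + Delta_w, with Delta_w = -c * dE/dw
def gradient_step(w, grad, c=0.1):
    return w + (-c * grad(w))

grad = lambda w: 2 * (w - 3)   # dE/dw for E(w) = (w - 3)^2
w = 0.0
for _ in range(100):
    w = gradient_step(w, grad)
# w is now very close to the minimiser w* = 3
```

Each step obeys w ← 0.8 w + 0.6, so w converges geometrically to the fixed point 3, the minimum of E.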
Learning with a perceptron
Perceptron:

y_out = w^T x

Data: (x^1, y_1), (x^2, y_2), …, (x^N, y_N)

Error:

E(t) = (y(t)_out − y_t)² = (w(t)^T x^t − y_t)²

Learning:

w_i(t+1) = w_i(t) − c · ∂E(t)/∂w_i = w_i(t) − c · ∂/∂w_i (w(t)^T x^t − y_t)²

w_i(t+1) = w_i(t) − c · (w(t)^T x^t − y_t) · x^t_i

where w(t)^T x^t = Σ_{j=1}^{m} w_j(t) x^t_j.

A perceptron is able to learn a linear function.
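The final update rule can be sketched as follows; the factor 2 from differentiating the squared error is absorbed into the learning parameter c, as on the slide, and the toy data come from a known linear function y = 2·x1 − x2 (an illustrative choice):

```python
def delta_rule(data, n, c=0.05, epochs=200):
    """w_i(t+1) = w_i(t) - c*(w(t).x^t - y_t)*x_i^t, cycling over the data."""
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y   # w(t)^T x^t - y_t
            w = [wi - c * err * xi for wi, xi in zip(w, x)]
    return w

# toy data sampled from y = 2*x1 - x2
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = delta_rule(data, 2)   # w approaches [2, -1]
```

Because the data are noiseless and generated by a linear function, the iteration converges to the exact coefficients.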
Learning with RBF neural
networks
RBF neural network:

y_out = F(x, W) = Σ_{k=1}^{M} w^2_k · e^{−‖x − w^{1,k}‖² / (2(a_k)²)}

Data: (x^1, y_1), (x^2, y_2), …, (x^N, y_N)

Error:

E(t) = (y(t)_out − y_t)² = ( Σ_{k=1}^{M} w^2_k(t) · e^{−‖x^t − w^{1,k}‖² / (2(a_k)²)} − y_t )²

Learning:

w^2_i(t+1) = w^2_i(t) − c · ∂E(t)/∂w^2_i

∂E(t)/∂w^2_i = 2 · (F(x^t, W(t)) − y_t) · e^{−‖x^t − w^{1,i}‖² / (2(a_i)²)}

Only the synaptic weights of the output neuron are
modified.
An RBF neural network learns a nonlinear function.
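A sketch of this scheme: only the output weights w²_k are adapted, while the centres and widths stay fixed. The toy data are generated from a target the network can represent exactly (all values illustrative), and the factor 2 is again absorbed into c:

```python
import math

def phi(x, c, a):
    """Gaussian basis: exp(-||x - c||^2 / (2 a^2))."""
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * a ** 2))

def rbf_out(x, centres, widths, w2):
    return sum(w * phi(x, c, a) for w, c, a in zip(w2, centres, widths))

def train_rbf_weights(data, centres, widths, c=0.1, epochs=500):
    w2 = [0.0] * len(centres)
    for _ in range(epochs):
        for x, y in data:
            err = rbf_out(x, centres, widths, w2) - y   # F(x^t, W(t)) - y_t
            # dE/dw2_i is proportional to err * phi_i
            w2 = [w - c * err * phi(x, ci, ai)
                  for w, ci, ai in zip(w2, centres, widths)]
    return w2
```

With two centres at 0 and 1 and targets produced by output weights (1, −1), gradient descent recovers those weights.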
Learning with MLP neural
networks
MLP neural network with p layers:

y^1_k = 1 / (1 + e^{−w^{1,k T} x − a^1_k}),  k = 1, …, M_1;   y^1 = (y^1_1, …, y^1_{M_1})^T

y^2_k = 1 / (1 + e^{−w^{2,k T} y^1 − a^2_k}),  k = 1, …, M_2;   y^2 = (y^2_1, …, y^2_{M_2})^T

…

y_out = F(x; W) = w^{pT} y^{p−1}

[Figure: an MLP with layers 1, 2, …, p−1, p mapping input x to y_out]

Data: (x^1, y_1), (x^2, y_2), …, (x^N, y_N)

Error: E(t) = (y(t)_out − y_t)² = (F(x^t; W) − y_t)²

It is very complicated to calculate the weight changes.
Learning with backpropagation
Solution of the complicated learning:
• calculate first the changes for the synaptic
weights of the output neuron;
• calculate the changes backward starting from
layer p-1, and propagate backward the local error
terms.
The method is still relatively complicated but
it is much simpler than the original
optimisation problem.
Learning with general optimisation
In general it is enough to have a single layer of
nonlinear neurons in a neural network in order to learn
to approximate a nonlinear function.
In such a case, general optimisation may be applied
without too much difficulty.
Example: an MLP neural network with a
single hidden layer:

y_out = F(x; W) = Σ_{k=1}^{M} w^2_k · 1 / (1 + e^{−w^{1,k T} x − a_k})
Learning with general optimisation
Synaptic weight change rules for the output neuron:

w^2_i(t+1) = w^2_i(t) − c · ∂E(t)/∂w^2_i

∂E(t)/∂w^2_i = 2 · (F(x^t, W(t)) − y_t) · 1 / (1 + e^{−w^{1,i T} x^t − a_i})

Synaptic weight change rules for the neurons of the
hidden layer:

w^{1,i}_j(t+1) = w^{1,i}_j(t) − c · ∂E(t)/∂w^{1,i}_j

∂E(t)/∂w^{1,i}_j = 2 · (F(x^t, W(t)) − y_t) · w^2_i · ∂/∂w^{1,i}_j [ 1 / (1 + e^{−w^{1,i T} x^t − a_i}) ]

∂/∂w^{1,i}_j [ 1 / (1 + e^{−w^{1,i T} x^t − a_i}) ] = ( e^{−w^{1,i T} x^t − a_i} / (1 + e^{−w^{1,i T} x^t − a_i})² ) · ∂/∂w^{1,i}_j ( w^{1,i T} x^t + a_i )

∂/∂w^{1,i}_j ( w^{1,i T} x^t + a_i ) = x^t_j

Putting these together:

w^{1,i}_j(t+1) = w^{1,i}_j(t) − c · 2 · (F(x^t, W(t)) − y_t) · w^2_i · ( e^{−w^{1,i T} x^t − a_i} / (1 + e^{−w^{1,i T} x^t − a_i})² ) · x^t_j
Conclusion on Neural Network
• Many applications: speech recognition, automated
vehicles, game playing, handwriting recognition.
• More efficient training methods:
– Quickprop
– Conjugate gradient
• More biologically plausible learning: Hebbian
theory.
• Much more to learn: unsupervised learning,
cognitive science, biological science, etc.