Neural Networks


Advanced Computing Seminar
Data Mining and Its Industrial
Applications
— Chapter 6 —
Neural Networks
Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr
Knowledge and Software Engineering Lab
Advanced Computing Research Centre
School of Computer and Information Science
University of South Australia
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary
Introduction
• (Artificial) Neural networks are
– computational models which mimic the brain's learning processes.
– They have the essential features of neurons and their interconnections
as found in the brain.
– Typically, a computer is programmed to simulate these features.
• Other definitions …
A neural network is a massively parallel distributed processor
made up of simple processing units, which has a natural
propensity for storing experiential knowledge and making it
available for use. It resembles the brain in two respects:
• Knowledge is acquired by the network from its environment through a
learning process
• Interneuron connection strengths, known as synaptic weights, are used to
store the acquired knowledge.
Neural Networks
• A NN is a machine learning approach inspired by the
way in which the brain performs a particular learning
task:
– Knowledge about the learning task is given in the form of
examples.
– Interneuron connection strengths (weights) are used to
store the acquired information (the training examples).
– During the learning process the weights are modified in
order to model the particular learning task correctly on the
training examples.
Biological Neural Systems
• The brain is composed of approximately 100 billion
(10^11) neurons
A typical neuron collects signals from other neurons
through a host of fine structures called dendrites.
The neuron sends out spikes of electrical activity
through a long, thin strand known as an axon, which
splits into thousands of branches.
[Figure: schematic drawing of two biological neurons connected by synapses, labeling the dendrites, axon, and synapse]
At the end of the branch, a structure called a synapse
converts the activity from the axon into electrical effects
that inhibit or excite activity in the connected neurons.
When a neuron receives excitatory input that is
sufficiently large compared with its inhibitory input, it
sends a spike of electrical activity down its axon.
Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron
on the other changes
Dimensions of a Neural Network
• Various types of neurons
• Various network architectures
• Various learning algorithms
• Various applications
History
• McCulloch and Pitts' "A Logical Calculus of the Ideas Immanent in
Nervous Activity" (1943) came a decade before Hodgkin and Huxley.
• Created logical gates with simple threshold
activations.
• Hand-designed connections with locked
weight assignments.
McCulloch-Pitts Neuron
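The original slide presents the McCulloch-Pitts neuron as a figure, which is not reproduced here. As a rough illustration (our own sketch, not taken from the slides), a threshold unit of this kind, with hand-designed connections and locked weights, can implement simple logic gates:

```python
# A minimal McCulloch-Pitts threshold unit (illustrative sketch).
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (output 1) when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logic gates built from hand-set, locked weight assignments:
AND = lambda x1, x2: mcculloch_pitts([x1, x2], [1, 1], threshold=2)
OR  = lambda x1, x2: mcculloch_pitts([x1, x2], [1, 1], threshold=1)
NOT = lambda x:      mcculloch_pitts([x], [-1], threshold=0)

print(AND(1, 1), OR(0, 1), NOT(1))  # -> 1 1 0
```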
History
• Rosenblatt (1958) wires McCulloch-Pitts
neurons with a training procedure.
• Rosenblatt's Perceptron (Rosenblatt, F., Principles of
Neurodynamics, New York: Spartan Books, 1962).
• 1969 Minsky: failure on problems that are not linearly separable
– XOR [(X1=T & X2=F) or (X1=F & X2=T)]
– Weakness repaired with hidden layers
(Minsky, M. and Papert, S., Perceptrons, MIT Press,
Cambridge, 1969).
History
• 1982: Hopfield network model
• Late 1980s: NNs re-emerge with Rumelhart and
McClelland (Rumelhart, D., McClelland, J., Parallel
Distributed Processing, MIT Press,
Cambridge, 1988).
The Neuron
• The neuron is the basic information processing unit of
a NN. It consists of:
1. A set of synapses or connecting links, each link
characterized by a weight: w1, w2, …, wm
2. An adder function (linear combiner) which
computes the weighted sum of the inputs:
   u = Σ_{j=1..m} w_j x_j
3. An activation function (squashing function) φ for
limiting the amplitude of the output of the neuron:
   y = φ(u + b)
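As a concrete illustration of this model (a minimal sketch of our own; the slide does not fix a particular activation function at this point, so a logistic sigmoid is assumed):

```python
# Sketch of the generic neuron: weighted sum of the inputs plus bias,
# passed through an activation (squashing) function.
import math

def neuron_output(x, w, b):
    """y = phi(u + b), with u = sum_j w_j * x_j and phi assumed to be the logistic sigmoid."""
    u = sum(wj * xj for wj, xj in zip(w, x))   # adder / linear combiner
    v = u + b                                  # local field
    return 1.0 / (1.0 + math.exp(-v))          # activation (squashing) function

print(neuron_output(x=[0.5, -1.0, 2.0], w=[0.4, 0.1, 0.3], b=-0.2))
```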
The Neuron
[Figure: nonlinear model of a neuron — input signals x1, x2, …, xm weighted by synaptic weights w1, w2, …, wm, a summing function producing the local field v together with the bias b, an activation function φ(·), and the output y]
Bias of a Neuron
• Bias b has the effect of applying an affine
transformation to u:
   v = u + b
• v is the induced field of the neuron, where
   u = Σ_{j=1..m} w_j x_j
Bias as extra input
• Bias is an external parameter of the neuron. It can be
modeled by adding an extra input x0 = +1 with weight w0 = b, so that
   v = Σ_{j=0..m} w_j x_j

[Figure: the neuron model redrawn with the bias as the extra input x0 = +1 and weight w0 = b, feeding the summing function together with the input signals x1, …, xm and their synaptic weights w1, …, wm, followed by the activation function φ(·) and the output y]
Hebbian Learning
• Simultaneous activation causes
increased synaptic strength
• Asynchronous activation causes
weakened synaptic connection
• Pruning
• Hebbian, anti-Hebbian, and non-Hebbian connections
Common Rules
• Simplify the complexity
• Task specific
Biological Firing Rate
• Average Firing Rate depends on biological
effects such as leakage, saturation, and
noise
Training Methods
• Supervised training
• Unsupervised training
– Self-organization
• Back-Propagation
• Simulated annealing
• Credit assignment
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary
Perceptron : Single Layer
Feed-forward
Rosenblatt’s Perceptron: a network of processing
elements (PE):
[Figure: single-layer feed-forward network — an input layer of source nodes fully connected to an output layer of neurons]
Perceptron : Multi layer feed-forward
3-4-2 Network
[Figure: 3-4-2 network — an input layer of 3 nodes, a hidden layer of 4 nodes, and an output layer of 2 nodes]
Perceptron: Learning Rule
• Err = T – O
– O is the predicted output
– T is the correct output
• Wj ← Wj + α * Ij * Err
– Ij is the activation of a unit j in the input layer
– α is a constant called the learning rate
Perceptron
Example: inputs 2 and 1, weights 0.5 and 0.3, and a bias term of –1:
   2(0.5) + 1(0.3) + (–1) = 0.3 > 0, so O = 1

Learning Procedure:
Randomly assign weights (between 0 and 1)
Present inputs from training data
Get output O, nudge weights to move the results toward our desired output T
Repeat; stop when no errors, or enough epochs completed
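A minimal sketch of this procedure (our own illustration; the data set, learning rate, and epoch limit below are made up for the example):

```python
# Perceptron learning: Wj <- Wj + alpha * Ij * Err, with Err = T - O.
import random

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    w = [random.random() for _ in range(n_inputs)]    # weights assigned randomly in (0, 1)
    b = random.random()                               # bias treated as an extra weight
    for _ in range(epochs):
        errors = 0
        for inputs, target in examples:
            o = 1 if sum(wi * xi for wi, xi in zip(w, inputs)) + b > 0 else 0
            err = target - o
            if err != 0:
                errors += 1
                w = [wi + alpha * xi * err for wi, xi in zip(w, inputs)]
                b += alpha * err
        if errors == 0:                               # stop when no errors are made
            break
    return w, b

# Usage: learn the (linearly separable) AND function.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data, n_inputs=2))
```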
Least Mean Square learning
LMS = Least Mean Square learning systems, more general than the
previous perceptron learning rule. The concept is to minimize the total
error, as measured over all training examples P. O is the raw output, as
calculated by
   O = Σ_i w_i I_i + θ

   Distance(LMS) = (1/2) Σ_P (T_P – O_P)²

E.g. if we have two patterns and
T1=1, O1=0.8, T2=0, O2=0.5 then D = (0.5)[(1–0.8)² + (0–0.5)²] = 0.145

We want to minimize the LMS:

[Figure: the error E plotted against a weight W — a gradient step moves the weight from W(old) to W(new) down the error curve, with the step size governed by the learning rate c]
Activation Function
• To apply the LMS learning rule, also known as the
delta rule, we need a differentiable activation
function.
   Δw_k = c · I_k · (T_j – O_j) · f′(ActivationFunction)

Old (threshold):
   O = 1 if Σ_i w_i I_i + θ > 0, and 0 otherwise

New (sigmoid):
   O = 1 / (1 + e^–(Σ_i w_i I_i + θ))
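A sketch of a single delta-rule update with the sigmoid unit above (our own illustration; variable names follow the slide, with c the learning rate, I the inputs, T the target, O the output, and theta the bias/threshold term):

```python
# One gradient step of the delta (LMS) rule for a single sigmoid unit.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def delta_rule_step(w, theta, I, T, c=0.5):
    """Update w and theta by one step on E = 1/2 * (T - O)^2."""
    O = sigmoid(sum(wi * Ii for wi, Ii in zip(w, I)) + theta)
    f_prime = O * (1.0 - O)                             # derivative of the sigmoid at the output
    delta = (T - O) * f_prime
    w = [wi + c * Ii * delta for wi, Ii in zip(w, I)]   # dw_i = c * I_i * (T - O) * f'
    theta = theta + c * delta                           # bias updated like a weight on a +1 input
    return w, theta

print(delta_rule_step(w=[0.2, -0.4], theta=0.1, I=[1.0, 0.5], T=1.0))
```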
The activities within a processing unit
Representation of a processing unit
A neural network with two
different programs
Uppercase C and uppercase T
Various orientations of the
letters C and T
The structure of the character
recognition system
The letter C in the field of view
The letter T in the field of view
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary
Back-Propagated Delta Rule
Networks (BP)
• Inputs are put through a 'Hidden Layer' before the output layer
• All nodes connected between layers
[Figure: fully connected feed-forward network — inputs Input 0 … Input n, hidden units H0 … Hm, and output units O0 … Oo producing Output 0 … Output o]
Learning Rule
• Measure the error
• Reduce that error by appropriately adjusting each
of the weights in the network
BP Network Details
• Forward Pass:
– Error is calculated from outputs
– Used to update output weights
• Backward Pass:
– Error at hidden nodes is calculated by back
propagating the error at the outputs
through the new weights
– Hidden weights updated
Learning Rule –
Back-propagation Network
• Erri = Ti – Oi
• Wj,i ← Wj,i + α * aj * Δi
– Δi = Erri * g'(ini)
– g' is the derivative of the activation function g
– aj is the activation of the hidden unit
• Wk,j ← Wk,j + α * Ik * Δj
– Δj = g'(inj) * Σi Wj,i * Δi
Backpropagation Networks
• To get past the restriction to linearly separable
problems, we can construct multilayer networks. Typically we have fully
connected, feedforward networks.
[Figure: fully connected feed-forward network — input layer I1, I2, I3 plus a constant 1 (bias), hidden layer H1, H2 plus a bias unit, and output layer O1, O2, with weights Wi,j from input to hidden units and Wj,k from hidden to output units]

   H_j(x) = 1 / (1 + e^–Σ_i w_{i,j} I_i)
   O_k(x) = 1 / (1 + e^–Σ_j w_{j,k} H_j)
Back Propagation
We had computed:
   Δw_k = c · I_k · (T_j – O_j) · f′(ActivationFunction)
   Δw_k = c · I_k · (T_j – O_j) · f(sum)(1 – f(sum)),   where f(sum) = 1 / (1 + e^–sum)

For the output unit k, f(sum) = O_k. For the output units, this is:
   Δw_{j,k} = c · H_j · (T_k – O_k) · O_k(1 – O_k)

For the hidden units (skipping some math), this is:
   Δw_{i,j} = c · H_j(1 – H_j) · I_i · Σ_k (T_k – O_k) · O_k(1 – O_k) · w_{j,k}

[Figure: layers I → H → O, with weights Wi,j between input and hidden units and Wj,k between hidden and output units]
Learning Rule –
Back-propagation Network
• E = (1/2) Σi (Ti – Oi)²
• ∂E/∂Wk,j = – Ik * Δj
BP Algorithm
Learning Procedure:
1. Randomly assign weights (between 0-1)
2. Present inputs from training data, propagate to outputs
3. Compute outputs O, adjust weights according to the delta
rule, backpropagating the errors. The weights will be
nudged closer so that the network learns to give the desired
output.
4. Repeat; stop when no errors, or enough epochs
completed
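A compact sketch of this procedure for one hidden layer of sigmoid units (our own illustration, following the update formulas on the surrounding slides; the network size, learning rate c, and training data are made up):

```python
# Backpropagation for a small fully connected network: input -> hidden -> output.
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bp(data, n_in, n_hid, n_out, c=0.5, epochs=5000):
    # 1. Randomly assign weights; biases are handled as extra weights on a constant +1 input.
    W_ij = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
    W_jk = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid + 1)]
    for _ in range(epochs):
        for I, T in data:
            I = I + [1.0]                                            # append bias input
            # 2. Propagate the inputs to the outputs.
            H = [sigmoid(sum(I[i] * W_ij[i][j] for i in range(n_in + 1)))
                 for j in range(n_hid)] + [1.0]                      # bias unit for the hidden layer
            O = [sigmoid(sum(H[j] * W_jk[j][k] for j in range(n_hid + 1)))
                 for k in range(n_out)]
            # 3. Deltas for the output units, then back-propagated to the hidden units.
            d_out = [(T[k] - O[k]) * O[k] * (1 - O[k]) for k in range(n_out)]
            d_hid = [H[j] * (1 - H[j]) * sum(d_out[k] * W_jk[j][k] for k in range(n_out))
                     for j in range(n_hid)]
            # Nudge the weights according to the delta rule.
            for j in range(n_hid + 1):
                for k in range(n_out):
                    W_jk[j][k] += c * H[j] * d_out[k]
            for i in range(n_in + 1):
                for j in range(n_hid):
                    W_ij[i][j] += c * I[i] * d_hid[j]
    return W_ij, W_jk

# 4. Example: learn XOR, the problem a single-layer perceptron cannot solve.
xor = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
train_bp(xor, n_in=2, n_hid=2, n_out=1)
```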
Backpropagation
• Very powerful - can learn any function, given enough
hidden units! With enough hidden units, we can
generate any function.
• Have the same problems of Generalization vs.
Memorization. With too many units, we will tend to
memorize the input and not generalize well. Some
schemes exist to “prune” the neural network.
• Networks require extensive training, many parameters
to fiddle with. Can be extremely slow to train. May
also fall into local minima.
• Inherently parallel algorithm, ideal for multiprocessor
hardware.
• Despite the cons, a very powerful algorithm that has
seen widespread successful deployment.
Prediction by BP
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary
Recurrent Connections
• A sequence is a succession of patterns
that relate to the same object.
• For example, letters that make up a
word or words that make up a sentence.
• Sequences can vary in length. This is a
challenge.
• How many inputs should there be for
varying length inputs ?
The simple recurrent network
• The Jordan network has connections that feed back
from the output to the input layer and also some
input layer units feed back to themselves.
• Useful for tasks that are dependent on a
sequence of successive states.
• The network can be trained by backpropagation.
• The network has a form of short-term memory.
• The simple recurrent network (SRN) has a similar
form of short-term memory.
Jordan Network
Elman Network (SRN)
The number of context units is the same as the number
of hidden units
Short-term memory in SRN
• The context units remember the
previous internal state.
• Thus, the hidden units have the
task of mapping both an external
input and also the previous internal
state to some desired output.
Recurrent network
Recurrent network with hidden neuron(s): the unit-delay operator z⁻¹ implies a dynamic system.

[Figure: recurrent network — input, hidden, and output units, with feedback connections passing through unit-delay (z⁻¹) elements]
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary
Hopfield Network
• John Hopfield (1982)
– Associative Memory via artificial neural
networks
– Solution for optimization problems
– Statistical mechanics
Associative memory
• Nature of associative
memory
– part of information given
– the rest of the pattern is
recalled
Physical Analogy with Memory
• The location of the bottom of
the bowl (X0) represents
the stored pattern
• Ball’s initial position
represents the partial
knowledge
• In a corrugated surface, we can
store {X1, X2, …, Xn}
memories, and recall the one
which is closest to the
initial state
Key Elements of Associative Net
• It is completely described by a state vector
v = (v1, v2, …, vm)
• There is a set of stable states v1, v2, …, vn.
These correspond to the stored patterns
• System evolves from arbitrary starting state v
to one of the stable states by decreasing its
energy E
Hopfield Network
• Every node is connected
to every other node
• Weights are symmetric
• Recurrent network
• State of the net is given by
the vector of the node
outputs (x1, x2, x3)
Neurons in Hopfield Network
• The neurons are binary units
– They are either active (1) or passive (0)
– Alternatively + or –
• The network contains N neurons
• The state of the network is described as a
vector of 0s and 1s:
   U = (u1, u2, …, uN) = (0, 1, 0, 1, …, 0, 0, 1)
State Transition
• Choose one node randomly and fire it.
• Calculate its activation and transit the state
– the net transits to itself, or to another state at Hamming
distance 1
State Transition Diagram
• Transitions tend to take
place downward in the diagram
• Drawn to reflect the
way the system
decreases its energy
• No matter where we
start in the diagram, the
final state will be
either 3 or 6
Defining Energy for the Net
• If two nodes i and j are connected by a positive
weight
– if i=1, j=0, and j fires, the input from i through the positive
weight makes j become 1
– if both nodes are 1, they reinforce each other's
current output
• Define the energy of a pair: e_ij = –w_ij x_i x_j
• The energy of the net is the sum of these over all pairs:
   E = Σ_pairs e_ij = – Σ w_ij x_i x_j      (2)
The architecture of Hopfield
Network
• The network is fully interconnected
– All the neurons are connected to each other
– The connections are bidirectional and
symmetric
   W_{i,j} = W_{j,i}
– The setting of weights depends on the
application
Defining Energy for the Net
• Since the weights are symmetric,
   E = –(1/2) Σ_{i,j} w_ij x_i x_j      (3)
• If node k is selected, separate out the terms involving k:
   E = –(1/2) Σ_{i≠k, j≠k} w_ij x_i x_j – (1/2) Σ_i w_ki x_k x_i – (1/2) Σ_i w_ik x_i x_k   (4)
     = –(1/2) Σ_{i≠k, j≠k} w_ij x_i x_j – Σ_i w_ki x_k x_i    (5)
     = S – x_k Σ_i w_ki x_i    (6)
     = S – x_k a_k    (7)
   where S collects the terms not involving node k, and a_k = Σ_i w_ki x_i is the activation of node k.
Defining Energy for the Net
• Energy after node k is updated: E'
   E' = S – x'_k a_k      (8)
   ΔE = –Δx_k a_k      (9)
– if a_k ≥ 0, the output goes from 0 to 1 or stays 1. In
either case Δx_k ≥ 0, and ΔE ≤ 0
– if a_k < 0, the output goes from 1 to 0 or stays 0. In
either case Δx_k ≤ 0, and ΔE ≤ 0
– Whichever node is selected, the energy of the net
decreases or stays the same
– After at most N changes of state, a stable state
has been reached
Asynchronous vs.
Synchronous update
• Asynchronous update: only one node
is updated at any time step
• Synchronous update: all nodes are
updated at the same time
– need to store both the current state vector
and the next state vector
Finding the Weight
• Stable state
– nodes do not change their values
– weights reinforce the node values
– same valued pairs are reinforced by positive
weight
– different valued pairs are reinforced by negative
weight
• Storage prescription of the Hopfield net is
– v_i ∈ {–1, 1} : spin representation
– w_ij = Σ_p v_i^p v_j^p      (1)
Hebb Rule
• Training algorithm
For each training pattern
   present the components of the pattern at the output nodes
   if two nodes have the same value
   then make a small positive increment to the internode weight
   else make a small negative decrement to the internode weight
   endif
endfor
Hebb Rule
• The learning rule (Hebb Rule) may be written
   Δw_ij = v_i^p v_j^p      (2)
• Original rule proposed by Hebb (1949),
The Organization of Behavior:
When an axon of cell A is near enough to excite a cell B
and repeatedly or persistently takes part in firing it,
some growth process or metabolic change takes place
in one or both cells such that A's efficiency,
as one of the cells firing B, is increased.
That is, the correlation of activity between two cells is
reinforced by increasing the synaptic strength between them.
Updating the Hopfield Network
• The state of the network changes at each time step.
There are four updating modes:
– Serial – Random:
• The state of a randomly chosen single neuron will be updated
at each time step
– Serial-Sequential :
• The state of a single neuron will be updated at each time step,
in a fixed sequence
– Parallel-Synchronous:
• All the neurons will be updated at each time step synchronously
– Parallel Asynchronous:
• The neurons that are not in refractoriness will be updated at the
same time
The updating Rule
• Here we assume that updating is serial-random
• Updating will be continued until a stable state
is reached.
– Each neuron receives a weighted sum of the
inputs from the other neurons:
   h_j = Σ_{i=1, i≠j}^{N} u_i · w_{j,i}
– If the input h_j is positive the state of the neuron
will be 1, otherwise 0:
   u_j = 1 if h_j > 0,  u_j = 0 if h_j ≤ 0
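A small sketch of serial-random updating together with the energy function introduced on the following slides (our own illustration; the tiny symmetric weight matrix and starting state are made up):

```python
# Serial-random updating of a Hopfield net with binary (0/1) units.
import random

def local_field(u, W, j):
    """h_j = sum over i != j of w_{j,i} * u_i"""
    return sum(W[j][i] * u[i] for i in range(len(u)) if i != j)

def energy(u, W):
    """E = -(1/2) * sum_j sum_{i != j} w_{j,i} * u_i * u_j"""
    N = len(u)
    return -0.5 * sum(W[j][i] * u[i] * u[j]
                      for j in range(N) for i in range(N) if i != j)

def serial_random_updates(u, W, steps=100):
    """Pick one neuron at random each step and apply the updating rule."""
    for _ in range(steps):
        j = random.randrange(len(u))
        u[j] = 1 if local_field(u, W, j) > 0 else 0
    return u

W = [[0, 1, -2], [1, 0, 1], [-2, 1, 0]]   # symmetric weights, no self-connections
u = [1, 0, 1]
print(energy(u, W))
print(serial_random_updates(u, W), energy(u, W))
```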
Convergence of the Hopfield Network
• Does the network eventually reach a stable
state (convergence)?
• To evaluate this, an 'energy' value is
associated with the network:
   E = –(1/2) Σ_j Σ_{i=1, i≠j}^{N} w_{j,i} u_i u_j
• The system will have converged when the energy
is minimized
Convergence of the Hopfield Network
• Why energy?
– An analogy with spin-glass models of ferromagnetism (the Ising model):
   w_{i,j} = 1 / d_{i,j}²,   where d_{i,j} is the distance between units i and j
   u_j : the spin of unit j
   h_j = Σ_{i=1, i≠j}^{N} w_{j,i} u_i : the local field exerted upon unit j
   e_j = –(1/2) h_j u_j : the potential energy of unit j
   E = Σ_j e_j = –(1/2) Σ_j Σ_{i≠j} w_{j,i} u_i u_j : the overall potential energy of the system

   [Figure: a lattice of binary (0/1) spins illustrating the Ising-model analogy]

– The system is stable if the energy is minimized
Convergence of the Hopfield Network
• Why convergence?
   h_j = Σ_{i=1, i≠j}^{N} u_i · w_{j,i},   u_j = 1 if h_j > 0, u_j = 0 if h_j ≤ 0

   E = –(1/2) Σ_j Σ_{i≠j} w_{j,i} u_i u_j = –(1/2) Σ_j u_j Σ_{i≠j} w_{j,i} u_i = –(1/2) Σ_j u_j h_j

– if h_j > 0 and u_j = 1, then u_j will not change → u_j h_j = h_j > 0
– if h_j > 0 and u_j = 0, then u_j will change → u_j h_j = 0
– if h_j ≤ 0 and u_j = 0, then u_j will not change → u_j h_j = 0
– if h_j ≤ 0 and u_j = 1, then u_j will change → u_j h_j = h_j ≤ 0

In each case u_j h_j is maximum when u_j does not change, so
E = –(1/2) Σ_j u_j h_j is minimum if the u_j values do not change.
Convergence of the Hopfield
Network (4)
• The changes of E with updating:
   h_j = Σ_{i=1, i≠j}^{N} u_i · w_{j,i},   u_j = 1 if h_j > 0, u_j = 0 if h_j ≤ 0

   E = –(1/2) Σ_j Σ_{i≠j} w_{j,i} u_i u_j = –(1/2) Σ_j u_j h_j

When a single unit k is updated,
   ΔE = E_new – E_old
      = (–(1/2) Σ_{j≠k} u_j h_j – (1/2) u_k,new h_k) – (–(1/2) Σ_{j≠k} u_j h_j – (1/2) u_k,old h_k)
      = –(1/2)(u_k,new – u_k,old) h_k = –(1/2) Δu_k · h_k

– if u_k,old = 1 and h_k > 0 → u_k,new = 1, Δu_k = 0 → –(1/2) Δu_k h_k = 0
– if u_k,old = 1 and h_k ≤ 0 → u_k,new = 0, Δu_k = –1 → –(1/2) Δu_k h_k ≤ 0
– if u_k,old = 0 and h_k > 0 → u_k,new = 1, Δu_k = 1 → –(1/2) Δu_k h_k < 0
– if u_k,old = 0 and h_k ≤ 0 → u_k,new = 0, Δu_k = 0 → –(1/2) Δu_k h_k = 0

In each case the energy will decrease or remain constant, thus the system tends to
stabilize.
The Energy Function:
• The energy function is similar to a
multidimensional (N) terrain
[Figure: an energy landscape with two local minima and a global minimum]
Hopfield network as a model for
associative memory
• Associative memory
– Associates different features with each other
• Karen → green
• George → red
• Paul → blue
– Recall with partial cues
Neural Network Model of
associative memory
• Neurons are arranged like a grid:
Setting the weights
• Each pattern can be denoted by a vector
of –1s and 1s:
   S^p = (s_1^p, s_2^p, s_3^p, …, s_N^p),   s_i^p ∈ {–1, 1}
• If the number of patterns is m then:
   w_{i,j} = Σ_{p=1}^{m} s_i^p s_j^p
• Hebbian Learning:
– The neurons that fire together, wire together
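A sketch of this storage prescription together with recall from a partial cue (our own illustration; the two stored patterns and the corrupted cue are made up, and recall uses serial-random updates with states in {-1, +1}):

```python
# Hebbian storage (w_{i,j} = sum_p s_i^p * s_j^p) and associative recall.
import random

def store(patterns):
    """Build the weight matrix from the stored patterns, with zero self-connections."""
    N = len(patterns[0])
    return [[0 if i == j else sum(p[i] * p[j] for p in patterns)
             for j in range(N)] for i in range(N)]

def recall(W, cue, steps=200):
    """Serial-random updates until (typically) the nearest stored pattern is reached."""
    s = list(cue)
    for _ in range(steps):
        j = random.randrange(len(s))
        h = sum(W[j][i] * s[i] for i in range(len(s)))
        s[j] = 1 if h >= 0 else -1
    return s

patterns = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
W = store(patterns)
cue = [1, -1, 1, 1, 1, -1]      # the first pattern with one bit corrupted
print(recall(W, cue))           # usually settles back onto the stored pattern
```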
Learning in Hopfield net
Storage Capacity
• As the number of patterns (m) increases, the
chances of accurate storage must decrease
• Hopfield’s empirical work in 1982
– About half of the memories were stored
accurately in a net of N nodes if m = 0.15N
• McEliece's analysis in 1987
– If we require almost all the required memories to
be stored accurately, then the maximum number
of patterns m is N / (2 ln N)
– For N = 100, m = 11
Limitations of Hopfield Net
• The number of patterns that can be
stored and accurately recalled is
severely limited
– If too many patterns are stored, net may
converge to a novel spurious pattern : no
match output
• Exemplar pattern will be unstable if it
shares many bits in common with
another exemplar pattern
Outline
• Introduction
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
• Summary
Self-Organization Maps
• Kohonen (1982, 1984)
• In biological systems
– cells tuned to similar orientations tend to be
physically located in proximity with one another
– microelectrode studies with cats
• Orientation tuning over the surface forms a kind of
map with similar tunings being found close to each
other
– topographic feature map
– Train a network using competitive learning to
create feature maps automatically
SOM Clustering
• Self-organizing map (SOM)
– An unsupervised artificial neural network
– Mapping high-dimensional data into a two-dimensional representation space
– Similar documents may be found in neighboring
regions
• Disadvantages
– Fixed size in terms of the number of units and their
particular arrangement
– Hierarchical relations between the input data are
not mirrored in a straightforward manner
Features
• Kohonen's algorithm creates a vector quantizer by
adjusting the weights from common input nodes to M
output nodes
• Continuous-valued input vectors are presented
without specifying the desired output
• After learning, the weights will be organized such that
topologically close nodes are sensitive to inputs that
are physically similar
• Output nodes will be ordered in a natural manner
Initial setup of SOM
• Consists of a set of
units i in a two-dimensional grid
• Each unit i is
assigned a weight
vector mi of the
same dimension
as the input data
• The initial weight
vector is assigned
random values
Winner Selection
• Initially, pick a random input vector
x(t)
• Compute the unit c with the highest
activity level (the winner c(t)) using the
Euclidean distance formula
Learning Process (Adaptation)
• Guide the adaptation by a learning rate α
(tune the weight vectors from their random
initialization values towards the actual input
space)
• Adapt a decreasing neighborhood around the winner
towards the currently presented input pattern
(mapping inputs onto regions close to each other in
the grid of output units; this can be viewed as a neural
network version of k-means clustering)
Learning Process (Adaptation)
Neighborhood Strategy
• Neighborhood kernel hci
• A Gaussian is used to define the neighborhood
kernel
• ||rc – ri||² denotes the distance between the
winner node c and unit i on the map grid
• A time-varying parameter σ enables the formation
of large clusters in the beginning and fine-grained
input discrimination towards the end
of the learning process
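One possible form of this kernel (a sketch under the assumption that h_ci is the usual Gaussian of the grid distance; the exact schedule for σ(t) is not given on the slide):

```python
# Gaussian neighborhood kernel h_ci = exp(-||r_c - r_i||^2 / (2 * sigma(t)^2)).
import math

def neighborhood_kernel(r_c, r_i, sigma_t):
    """r_c, r_i are grid positions of the winner c and unit i; sigma_t shrinks over time."""
    dist_sq = (r_c[0] - r_i[0]) ** 2 + (r_c[1] - r_i[1]) ** 2
    return math.exp(-dist_sq / (2.0 * sigma_t ** 2))

# Large sigma early in training -> broad clusters; small sigma late -> fine-grained tuning.
print(neighborhood_kernel((2, 3), (2, 4), sigma_t=2.0))
print(neighborhood_kernel((2, 3), (2, 4), sigma_t=0.5))
```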
Within-Cluster Distance v.s.
Between-Clusters Distance
SOM Algorithm
Initialise Network
Get Input
Find Focus
Update Focus
Update Neighbourhood
Adjust neighbourhood size
Learning
• Decreasing the neighborhood ensures that
progressively finer features are encoded
• Gradual lowering of the learning rate
ensures stability
Basic Algorithm
– Initialize map (randomly assign weights)
– Loop over training examples
• Assign input unit values according to the values in the
current example
• Find the "winner", i.e. the output unit that most closely
matches the input units, using some distance metric, e.g.
for all output units j = 1 to m and input units i = 1 to n,
find the j that minimizes:
   Σ_{i=1}^{n} (W_ij – I_i)²
• Modify the weights on the winner to more closely match the
input:
   W^{t+1} = W^t + c (X_i^t – W^t)
where c is a small positive learning constant
that usually decreases as the learning proceeds
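A minimal sketch of this winner-take-all version of the algorithm (our own illustration; the data and constants are made up, and no neighborhood update is applied yet — that refinement is added in the modified algorithm below):

```python
# Basic SOM-style competitive learning: find the winner, move it toward the input.
import random

def train_som_basic(examples, n_outputs, n_inputs, c=0.5, epochs=20):
    # Initialize the map: randomly assign weights.
    W = [[random.random() for _ in range(n_inputs)] for _ in range(n_outputs)]
    for _ in range(epochs):
        for I in examples:
            # Find the winner: the output unit minimizing sum_i (W_ij - I_i)^2.
            winner = min(range(n_outputs),
                         key=lambda j: sum((W[j][i] - I[i]) ** 2 for i in range(n_inputs)))
            # Move the winner's weights closer to the input: W <- W + c * (X - W).
            W[winner] = [w + c * (x - w) for w, x in zip(W[winner], I)]
        c *= 0.9          # the learning constant decreases as learning proceeds
    return W

data = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.85, 0.95]]
print(train_som_basic(data, n_outputs=2, n_inputs=2))
```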
Result of Algorithm
• Initially, some output nodes will randomly be a little
closer to some particular type of input
• These nodes become “winners” and the weights move
them even closer to the inputs
• Over time nodes in the output become representative
prototypes for examples in the input
• Note there is no supervised training here
• Classification:
– Given new input, the class is the output node that is
the winner
Typical Usage: 2D Feature
Map
• In typical usage the output nodes form a 2D “map”
organized in a grid-like fashion and we update
weights in a neighborhood around the winner
[Figure: a 5×5 grid of output nodes O11 … O55 forming the 2D map, fully connected to the input layer I1, I2, I3, …]
Modified Algorithm
– Initialize Map (randomly assign weights)
– Loop over training examples
• Assign input unit values according to the values in the
current example
• Find the “winner”, i.e. the output unit that most closely
matches the input units, using some distance metric, e.g.
• Modify weights on the winner to more closely match the
input
• Modify weights in a neighborhood around the winner
so the neighbors on the 2D map also become closer
to the input
– Over time this will tend to cluster similar items closer on
the map
Updating the Neighborhood
• Node O44 is the winner
– Color indicates scaling to update neighbors
   W^{t+1} = W^t + c (X_i^t – W^t)

[Figure: the 5×5 output grid with O44 as the winner — the update constant is scaled with distance from the winner (c = 1 at O44, c = 0.75 for its immediate neighbors, c = 0.5 further out). If O42 is the winner for some other input, the two winners "fight" over claiming the units between them, such as O43 and O33.]
Selecting the Neighborhood
• Typically, a “Sombrero Function” or Gaussian
function is used
• Neighborhood size usually decreases over
time to allow initial “jockeying for position”
and then “fine-tuning” as algorithm proceeds
Hierarchical and Partitive
Approaches
• Partitive algorithm
1. Determine the number of clusters.
2. Initialize the cluster centers.
3. Compute the partitioning for the data.
4. Compute (update) the cluster centers.
5. If the partitioning is unchanged (or the algorithm
has converged), stop; otherwise, return to step 3.
• k-means error function
– The goal is to minimize the error function
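A short k-means sketch of this partitive procedure (our own illustration; the error function assumed here is the usual sum of squared distances of each vector to its cluster center, and the data and k are made up):

```python
# Partitive (k-means style) clustering following the five steps above.
import random

def kmeans(data, k, max_iters=100):
    centers = random.sample(data, k)                       # step 2: initialize the cluster centers
    assignment = None
    for _ in range(max_iters):
        # Step 3: compute the partitioning (assign each vector to its nearest center).
        new_assignment = [min(range(k),
                              key=lambda c: sum((x - m) ** 2 for x, m in zip(v, centers[c])))
                          for v in data]
        if new_assignment == assignment:                   # step 5: stop if unchanged
            break
        assignment = new_assignment
        # Step 4: update each center to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, a in zip(data, assignment) if a == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, assignment

data = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
print(kmeans(data, k=2))
```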
Hierarchical and Partitive
Approaches
• Hierarchical clustering algorithm (Dendrogram)
1. Initialize: assign each vector to its own cluster.
2. Compute distances between all clusters.
3. Merge the two clusters that are closest to each other.
4. Return to step 2 until there is only one cluster left.
• Partition strategy
– Cut at different levels
Hierarchical SOM
• GHSOM – Growing
Hierarchical Self-Organizing Map
– grows in size in order
to represent a
collection of data at a
particular level of
detail
Plastic Self Organising Maps
• Family of similar
networks
• Adds neurons using
error threshold
• Uses link lengths for
pruning
• Unconnected
neurons removed
• Converges quickly
PSOM Architecture
[Figure: PSOM architecture — each neuron carries a weight vector (e.g. 0.1, 0.6, 0.2, 0.4) and neurons are joined by weighted links]
PSOM Algorithm

[Flowchart: initialise; accept input; find the focus; if the Euclidean distance between the input and the focus is larger than a_n, create neurons, otherwise update the focus and its neighbourhood; age links, remove long links, and remove unconnected neurons; repeat for the next input]
Important Parameters
• An is the Node Building Parameter and
controls the ease of adding neurons
• Ba is the link aging parameter and
controls the speed of aging
Letter and Word recognition
Rumelhart & Zipser (1986)
• Training set {AA, AB, BA, BB}
– 2 units learn to detect either A or B in a particular
serial position
– 4 units learn the pairs: word detectors
• Training set {AA, AB, AC, AD, BA, BB, BC,
BD}
– 2 units learn to respond to pairs starting with A or B
– 4 units learn to recognize pairs ending with A, B, C,
or D
– 8 units learn the pairs
SOM-Graphics
A well-trained net should have the same
topology as that of the physical space,
and will reflect the properties of the training set
Face Recognition
90% accurate learning head pose, and recognizing 1-of-20 faces
Handwritten digit recognition
Summary
Basics of neural network theory and
practice for supervised and
unsupervised learning.
Most popular Neural Network models:
• Perceptron
• Back Propagation
• Recurrent network
• Hopfield Networks
• Self-Organization Maps
References
[1] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2nd Edition, Prentice Hall, 1998.
[2] Neural Networks Software. URL: http://cortex.snowseed.com/cortex.htm
[3] Sun Hwa Hahn. Neural Network (IV): Hopfield Net & Kohonen's Self-Organization Net. Information & Communication University. http://ai.kaist.ac.kr/~jkim/cs570-2000/Lectures/Dr.SHHahn's/NeuralNetwork4.ppt
[4] Harry R. Erwin. Neural Network Toolbox. http://www.cet.sunderland.ac.uk/
www.intsci.ac.cn/shizz
Thank you !!!