Transcript SOM

Self-Organizing Map (SOM) Neural Network

Self-Organizing Map (SOM)
• The self-organizing map (SOM) is a method for
unsupervised learning, based on a grid of artificial
neurons whose weights are adapted to match input
vectors in a training set.
• It was first described by the Finnish professor Teuvo
Kohonen and is thus sometimes referred to as a Kohonen
map.
• SOM is one of the most popular neural computation
methods in use, and several thousand scientific articles
have been written about it. SOM is especially good at
producing visualizations of high-dimensional data.
Self-Organizing Maps (SOM)
• SOM is an unsupervised neural network
technique that approximates an unlimited number
of input data by a finite set of models arranged in
a grid, where neighbor nodes correspond to more
similar models.
• The models are produced by a learning algorithm
that automatically orders them on the two-dimensional
grid according to their mutual similarity.
Brain’s self-organization
The brain maps the external multidimensional
representation of the world into a similar one- or
two-dimensional internal representation.
That is, the brain processes the external signals
in a topology-preserving way.
Mimicking the way the brain learns, our system
should be able to do the same thing.
Why SOM?
• Unsupervised learning
• Clustering
• Classification
• Monitoring
• Data visualization
• Potential for combination between SOM and other neural
networks (MLP, RBF)
Self-Organizing Networks
• Discover significant patterns or features in the input data
• Discovery is done without a teacher
• Synaptic weights are changed according to local rules
• The changes affect a neuron’s immediate environment
until a final configuration develops
Concept of the SOM
[Figure: the input space (input layer) is mapped to a reduced
feature space (map layer). Cluster centers (code vectors) are
found in the input space; the place of these code vectors in the
reduced space gives a clustering and ordering of the cluster
centers in a two-dimensional grid.]
Network Architecture
• Two layers of units
– Input: n units (length of training vectors)
– Output: m units (number of categories)
• Input units fully connected with weights to output
units
• Intralayer (lateral) connections
– Within output layer
– Defined according to some topology
– Not weights, but used in algorithm for updating weights
SOM - Architecture
• Lattice of neurons (‘nodes’) accepts and responds to a set of input
signals
• Responses are compared; a ‘winning’ neuron is selected from the lattice
• The selected neuron is activated together with its ‘neighbourhood’ neurons
• An adaptive process changes the weights to more closely match the inputs
[Figure: a 2-d array of neurons; each neuron j receives the full set of
input signals x1, x2, x3, ..., xn through weights wj1, wj2, wj3, ..., wjn.]
Measuring distances between nodes
Distances between output neurons will be used in the
learning process. The lattice may be:
a) Rectangular
b) Hexagonal
Let d(i,j) be the distance between the output nodes i and j:
• d(i,j) = 1 if node j is in the first outer rectangle/hexagon of node i
• d(i,j) = 2 if node j is in the second outer rectangle/hexagon of node i
• And so on.
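On a rectangular lattice the “k-th outer rectangle” rule is exactly the Chebyshev (maximum-coordinate) distance between grid positions. A minimal sketch (rectangular grids only; a hexagonal lattice needs a different metric, and the function name is ours):

```python
def lattice_distance(pos_i, pos_j):
    """Ring distance d(i, j) on a rectangular lattice: d = k when node j
    lies in the k-th outer rectangle around node i. This equals the
    Chebyshev (maximum-coordinate) distance between grid positions."""
    (ri, ci), (rj, cj) = pos_i, pos_j
    return max(abs(ri - rj), abs(ci - cj))

# The 8 cells surrounding a node form its first outer rectangle (d = 1):
assert lattice_distance((2, 2), (1, 1)) == 1
# Two rows/columns away lies the second outer rectangle (d = 2):
assert lattice_distance((2, 2), (0, 4)) == 2
```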
• Each neuron is a node containing a template against
which input patterns are matched.
• All nodes are presented with the same input pattern in
parallel and compute the distance between their template
and the input in parallel.
• Only the node with the closest match between the input
and its template produces an active output.
• Each node therefore acts like a separate decoder (or
pattern detector, feature detector) for the same input, and
the interpretation of the input derives from the presence
or absence of an active response at each location
(rather than from the magnitude of response or an input-output
transformation as in feedforward or feedback networks).
SOM: interpretation
• Each SOM neuron can be seen as representing
a cluster containing all the input examples
which are mapped to that neuron.
• For a given input, the output of SOM is the
neuron with weight vector most similar (with
respect to Euclidean distance) to that input.
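The output rule just described — the winner is the neuron whose weight vector is most similar to the input in Euclidean distance — fits in a few lines of Python (the function name and the toy weight vectors are illustrative, not from the slides):

```python
def som_output(x, weights):
    """Return the index of the neuron whose weight vector is most similar
    to x with respect to Euclidean distance -- the SOM's output for x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - xi) ** 2 for w, xi in zip(weights[j], x)))

# Two neurons representing two clusters; each input maps to the nearer one.
weights = [[0.0, 0.0], [1.0, 1.0]]
assert som_output([0.1, 0.2], weights) == 0
assert som_output([0.9, 0.8], weights) == 1
```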
Self-Organizing Networks
• Kohonen maps (SOM)
• Learning Vector Quantization (LVQ)
• Principal Components Networks (PCA)
• Adaptive Resonance Theory (ART)
Types of Mapping
• Familiarity – the net learns how similar a given
new input is to the typical (average) pattern it has
seen before
• The net finds Principal Components in the data
• Clustering – the net finds the appropriate
categories based on correlations in the data
• Encoding – the output represents the input, using
a smaller amount of bits
• Feature Mapping – the net forms a topographic
map of the input
Possible Applications
• Familiarity and PCA can be used to analyze
unknown data
• PCA is used for dimension reduction
• Encoding is used for vector quantization
• Clustering is applied on any types of data
• Feature mapping is important for dimension
reduction and for functionality (as in the
brain)
Simple Models
• Network has inputs and outputs
• There is no feedback from the environment
⇒ no supervision
• The network updates the weights following
some learning rule, and finds patterns,
features or categories within the inputs
presented to the network
Unsupervised Learning
In unsupervised competitive learning the neurons
take part in some competition for each input. The
winner of the competition, and sometimes some
other neurons, are allowed to change their weights.
• In simple competitive learning only the winner is
allowed to learn (change its weight).
• In self-organizing maps other neurons in the
neighborhood of the winner may also learn.
Simple Competitive Learning
N input units, P output neurons, P × N weights W_ij.
Each output neuron i computes its field

    h_i = Σ_{j=1}^{N} W_ij x_j,   i = 1, 2, ..., P

and produces a binary output Y_i = 1 or 0.
[Figure: inputs x1, x2, ..., xN fully connected through weights
W11, W12, ..., WPN to outputs Y1, Y2, ..., YP.]
Network Activation
• The unit with the highest field h_i fires
• i* is the winner unit
• Geometrically, the weight vector W_i* is closest to the current
input vector
• The winning unit’s weight vector is updated
to be even closer to the current input vector
Learning
Starting with small random weights, at each
step:
1. a new input vector is presented to the network
2. all fields are calculated to find a winner
3. W_i* is updated to be closer to the input
using the standard competitive learning equation:

    ΔW_i*j = η (x_j − W_i*j)
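Steps 1–3 can be sketched as follows; here the winner is chosen by minimum distance (equivalent to the largest field h_i when the weight vectors are normalised), and the function name and toy data are illustrative:

```python
def competitive_step(x, weights, eta=0.5):
    """One step of simple competitive learning: pick the winner i* (the
    unit whose weight vector is closest to x) and apply the update
    Delta W_{i*j} = eta * (x_j - W_{i*j})."""
    winner = min(range(len(weights)),
                 key=lambda i: sum((w - xj) ** 2 for w, xj in zip(weights[i], x)))
    weights[winner] = [w + eta * (xj - w) for w, xj in zip(weights[winner], x)]
    return winner

W = [[0.2, 0.8], [0.9, 0.1]]
# [0.9, 0.1] is closer to the input [1, 0]: unit 1 wins and moves halfway toward it.
assert competitive_step([1.0, 0.0], W, eta=0.5) == 1
assert all(abs(a - b) < 1e-9 for a, b in zip(W[1], [0.95, 0.05]))
```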
Result
• Each output unit moves to the center of
mass of a cluster of input vectors
⇒ clustering
Competitive Learning, Cont’d
• It is important to break the symmetry in the
initial random weights
• Final configuration depends on initialization
– A winning unit has more chances of winning
the next time a similar input is seen
– Some outputs may never fire
– This can be compensated for by updating the
non-winning units with a smaller update
More about SOM learning
• Upon repeated presentations of the training
examples, the weight vectors of the neurons
tend to follow the distribution of the
examples.
• This results in a topological ordering of the
neurons, where neurons adjacent to each other
tend to have similar weight vectors.
• The input space of patterns is mapped onto a
discrete output space of neurons.
SOM – Learning Algorithm
1. Randomly initialise all weights
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set
3. Compare x with the weights wj of each neuron j:

       d_j = Σ_i (w_ij − x_i)²

4. Determine the winner: find the unit j with the minimum distance
5. Update the winner so that it becomes more like x, together with
the winner’s neighbours for units within the radius, according to

       w_ij(n+1) = w_ij(n) + η(n)[x_i − w_ij(n)]

6. Adjust parameters: learning rate & ‘neighbourhood function’
7. Repeat from (2) until ... ?
Note that the learning rate generally decreases with time:

       0 ≤ η(n) ≤ η(n−1) ≤ 1
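Steps 1–7 can be sketched as a small training loop. The linear decay schedules, the grid helper, and the stopping rule (a fixed number of epochs) are illustrative choices — the slides leave the exact schedules and termination condition open:

```python
import random

def train_som(data, rows, cols, dim, epochs=100, eta0=0.5, radius0=None, seed=0):
    """Minimal SOM training loop: random initial weights, minimum-distance
    winner, update of the winner and of all units within the current
    radius, then decay of the learning rate and of the radius."""
    rng = random.Random(seed)
    if radius0 is None:
        radius0 = max(rows, cols) // 2
    # one weight vector per grid node
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        eta = eta0 * (1 - frac)                # learning rate decreases with time
        radius = round(radius0 * (1 - frac))   # neighbourhood shrinks with time
        for x in data:
            # steps 3-4: winner = unit with minimum d_j = sum_i (w_ij - x_i)^2
            winner = min(weights, key=lambda n: sum((w - xi) ** 2
                                                    for w, xi in zip(weights[n], x)))
            # step 5: update the winner and its lattice neighbours within the radius
            for n, wn in weights.items():
                if max(abs(n[0] - winner[0]), abs(n[1] - winner[1])) <= radius:
                    weights[n] = [w + eta * (xi - w) for w, xi in zip(wn, x)]
    return weights
```

After training on a handful of 2-d points from two clusters, each point’s winning node ends up with a weight vector close to that point’s cluster.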
Example
An SOFM network with three inputs and two cluster units is to be
trained using the four training vectors

    [0.8 0.7 0.4], [0.6 0.9 0.9], [0.3 0.4 0.1], [0.1 0.1 0.2]

and initial weight matrix

    [0.5  0.4]
    [0.6  0.2]
    [0.8  0.5]

where the first column holds the weights to the first cluster unit.
The initial radius is 0 and the learning rate η is 0.5. Calculate the
weight changes during the first cycle through the data, taking the
training vectors in the given order.
Solution
The squared Euclidean distance of input vector 1 to cluster unit 1 is:

    d1 = (0.5 − 0.8)² + (0.6 − 0.7)² + (0.8 − 0.4)² = 0.26

The squared Euclidean distance of input vector 1 to cluster unit 2 is:

    d2 = (0.4 − 0.8)² + (0.2 − 0.7)² + (0.5 − 0.4)² = 0.42

Input vector 1 is closest to cluster unit 1, so update the weights to cluster unit 1:

    w_ij(n+1) = w_ij(n) + 0.5 [x_i − w_ij(n)]
    0.65 = 0.5 + 0.5(0.8 − 0.5)
    0.65 = 0.6 + 0.5(0.7 − 0.6)
    0.60 = 0.8 + 0.5(0.4 − 0.8)

giving the weight matrix

    [0.65  0.4]
    [0.65  0.2]
    [0.60  0.5]
Solution
The squared Euclidean distance of input vector 2 to cluster unit 1 is:

    d1 = (0.65 − 0.6)² + (0.65 − 0.9)² + (0.60 − 0.9)² = 0.155

The squared Euclidean distance of input vector 2 to cluster unit 2 is:

    d2 = (0.4 − 0.6)² + (0.2 − 0.9)² + (0.5 − 0.9)² = 0.69

Input vector 2 is closest to cluster unit 1, so update the weights to cluster unit 1 again:

    w_ij(n+1) = w_ij(n) + 0.5 [x_i − w_ij(n)]
    0.625 = 0.65 + 0.5(0.6 − 0.65)
    0.775 = 0.65 + 0.5(0.9 − 0.65)
    0.750 = 0.60 + 0.5(0.9 − 0.60)

giving the weight matrix

    [0.625  0.4]
    [0.775  0.2]
    [0.750  0.5]

Repeat the same update procedure for input vectors 3 and 4.
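A short Python check of the whole first cycle (function names are ours; the rows of `units` are the two cluster units’ weight vectors, i.e. the columns of the matrix above). Running it confirms the hand computations and completes the exercise: vectors 3 and 4 turn out to win, and therefore move, cluster unit 2:

```python
def sq_dist(w, x):
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

units = [[0.5, 0.6, 0.8], [0.4, 0.2, 0.5]]   # cluster units 1 and 2
data = [[0.8, 0.7, 0.4], [0.6, 0.9, 0.9],
        [0.3, 0.4, 0.1], [0.1, 0.1, 0.2]]

for x in data:   # radius 0: only the winner is updated, eta = 0.5
    j = min((0, 1), key=lambda u: sq_dist(units[u], x))
    units[j] = [w + 0.5 * (xi - w) for w, xi in zip(units[j], x)]

# vectors 1 and 2 update unit 1 (as computed above); vectors 3 and 4 update unit 2
assert all(abs(a - b) < 1e-9 for a, b in zip(units[0], [0.625, 0.775, 0.750]))
assert all(abs(a - b) < 1e-9 for a, b in zip(units[1], [0.225, 0.200, 0.250]))
```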
Neighborhood Function
– Gaussian neighborhood function:

    h_{j,i}(d_{ji}) = exp( − d²_{ji} / (2σ²) )

– d_{ji}: lateral distance of neurons i and j
  • in a 1-dimensional lattice: |j − i|
  • in a 2-dimensional lattice: || r_j − r_i ||,
    where r_j is the position of neuron j in the lattice.
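The Gaussian neighborhood function is a direct transcription into code (the function name is ours):

```python
import math

def neighbourhood(d, sigma):
    """Gaussian neighborhood h(d) = exp(-d^2 / (2 sigma^2)),
    where d is the lateral distance between neuron j and the winner i."""
    return math.exp(-d ** 2 / (2 * sigma ** 2))

# The winner itself (d = 0) gets the full update...
assert neighbourhood(0, sigma=1.0) == 1.0
# ...and the influence falls off monotonically with lattice distance.
assert neighbourhood(1, 1.0) > neighbourhood(2, 1.0) > neighbourhood(3, 1.0)

# In a 2-d lattice, d is the Euclidean distance between grid positions:
d = math.dist((0, 0), (3, 4))
assert d == 5.0
```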
[Figure: the neighbourhood of neuron 13 at successive time steps,
from N13(1) to the smaller N13(2) — the neighbourhood shrinks over time.]
Neighborhood Function
– σ measures the degree to which excited
neurons in the vicinity of the winning
neuron cooperate in the learning process.
– In the learning algorithm σ is updated at
each iteration during the ordering phase
using the following exponential decay
update rule, with parameters σ0 and T1:

    σ(n) = σ0 exp( − n / T1 )
Neighbourhood function
[Figure: degree of neighbourhood (0 to 1) versus distance from the
winner, plotted at two times: early in training the curve is wide,
later it narrows around the winner.]
UPDATE RULE

    w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) [x − w_j(n)]

Exponential decay update of the learning rate:

    η(n) = η0 exp( − n / T2 )
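Putting the update rule and both exponential decays together (the values of η0, σ0, T1 and T2 here are illustrative defaults; the ordering-phase slide suggests typical magnitudes):

```python
import math

def eta(n, eta0=0.1, t2=1000):
    """Learning-rate decay: eta(n) = eta0 * exp(-n / T2)."""
    return eta0 * math.exp(-n / t2)

def sigma(n, sigma0=5.0, t1=1000):
    """Neighbourhood-width decay: sigma(n) = sigma0 * exp(-n / T1)."""
    return sigma0 * math.exp(-n / t1)

def update(w_j, x, n, d_to_winner):
    """w_j(n+1) = w_j(n) + eta(n) * h_{j,i(x)}(n) * [x - w_j(n)]."""
    h = math.exp(-d_to_winner ** 2 / (2 * sigma(n) ** 2))
    return [w + eta(n) * h * (xi - w) for w, xi in zip(w_j, x)]

# The learning rate shrinks over time, and a node far from the winner
# moves less than the winner itself:
assert eta(0) > eta(500) > eta(5000)
near = update([0.0], [1.0], n=0, d_to_winner=0)
far = update([0.0], [1.0], n=0, d_to_winner=10)
assert near[0] > far[0]
```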
Illustration of learning for Kohonen maps
Inputs: coordinates (x, y) of points drawn from a square.
Display neuron j at the position (x_j, y_j) where its output s_j is maximum.
[Figure: starting from random initial positions, the map unfolds over
the square after 100, 200, and 1000 inputs.]
Two-phase learning approach
– Self-organizing or ordering phase. The learning rate
and spread of the Gaussian neighborhood function
are adapted during the execution of SOM, using for
instance the exponential decay update rule.
– Convergence phase. The learning rate and Gaussian
spread have small fixed values during the execution
of SOM.
Ordering Phase
• Self-organizing or ordering phase:
  – Topological ordering of the weight vectors.
  – May take 1000 or more iterations of the SOM algorithm.
• The choice of parameter values is important. For instance:
  – η(n): η0 = 0.1, T2 = 1000; η(n) decreases gradually to ≈ 0.01
  – h_{j,i(x)}(n): σ0 big enough, T1 = 1000 / log(σ0)
• With this parameter setting, initially the neighborhood of
the winning neuron includes almost all neurons in the
network; then it shrinks slowly with time.
Convergence Phase
• Convergence phase:
  – Fine-tune the weight vectors.
  – Must run for at least 500 times the number of neurons in
    the network ⇒ thousands or tens of thousands of iterations.
• Choice of parameter values:
  – η(n) maintained on the order of 0.01.
  – Neighborhood function such that the neighborhood of the
    winning neuron contains only the nearest neighbors.
    It eventually reduces to one or zero neighboring neurons.
Another Self-Organizing Map (SOM) Example
• From Fausett (1994)
• n = 4, m = 2
  – More typical of SOM application
  – Smaller number of units in output than in input;
    dimensionality reduction
• Training samples
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
Network architecture: the 4 input units are fully connected to
output units 1 and 2. What should we expect as outputs?
What are the Euclidean Distances Between the Data Samples?
• Training samples
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
(Fill in the table of pairwise distances; the diagonal entries are 0.)

       i1   i2   i3   i4
  i1   0
  i2        0
  i3             0
  i4                  0
Euclidean Distances Between Data Samples
• Training samples
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
Squared Euclidean distances:

       i1   i2   i3   i4
  i1   0
  i2   3    0
  i3   1    2    0
  i4   4    1    3    0

What might we expect from the SOM?
Example Details
• Training samples
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
• With only 2 outputs, neighborhood = 0
  – Only update weights associated with the winning output unit
    (cluster) at each iteration
• Learning rate
  α(t) = 0.6,       1 <= t <= 4
  α(t) = 0.5 α(1),  5 <= t <= 8
  α(t) = 0.5 α(5),  9 <= t <= 12
  etc.
• Initial weight matrix (random values between 0 and 1)
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.8 .4 .7 .3]
• d² = (Euclidean distance)² = Σ_{k=1}^{n} (i_{l,k} − w_{j,k}(t))²
• Weight update: w_j(t+1) = w_j(t) + α(t)(i_l − w_j(t))
Problem: Calculate the weight updates for the first four steps
First Weight Update
• Training sample: i1 = (1, 1, 0, 0)
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.8 .4 .7 .3]
– Unit 1 weights
  • d² = (.2−1)² + (.6−1)² + (.5−0)² + (.9−0)² = 1.86
– Unit 2 weights
  • d² = (.8−1)² + (.4−1)² + (.7−0)² + (.3−0)² = .98
– Unit 2 wins
– Weights on the winning unit are updated:
  new unit 2 weights = [.8 .4 .7 .3] + 0.6([1 1 0 0] − [.8 .4 .7 .3])
                     = [.92 .76 .28 .12]
– Giving an updated weight matrix:
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.92 .76 .28 .12]
Second Weight Update
• Training sample: i2 = (0, 0, 0, 1)
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.92 .76 .28 .12]
– Unit 1 weights
  • d² = (.2−0)² + (.6−0)² + (.5−0)² + (.9−1)² = .66
– Unit 2 weights
  • d² = (.92−0)² + (.76−0)² + (.28−0)² + (.12−1)² = 2.28
– Unit 1 wins
– Weights on the winning unit are updated:
  new unit 1 weights = [.2 .6 .5 .9] + 0.6([0 0 0 1] − [.2 .6 .5 .9])
                     = [.08 .24 .20 .96]
– Giving an updated weight matrix:
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.92 .76 .28 .12]
Third Weight Update
• Training sample: i3 = (1, 0, 0, 0)
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.92 .76 .28 .12]
– Unit 1 weights
  • d² = (.08−1)² + (.24−0)² + (.2−0)² + (.96−0)² = 1.87
– Unit 2 weights
  • d² = (.92−1)² + (.76−0)² + (.28−0)² + (.12−0)² = 0.68
– Unit 2 wins
– Weights on the winning unit are updated:
  new unit 2 weights = [.92 .76 .28 .12] + 0.6([1 0 0 0] − [.92 .76 .28 .12])
                     = [.97 .30 .11 .05]
– Giving an updated weight matrix:
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.97 .30 .11 .05]
Fourth Weight Update
• Training sample: i4 = (0, 0, 1, 1)
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.97 .30 .11 .05]
– Unit 1 weights
  • d² = (.08−0)² + (.24−0)² + (.2−1)² + (.96−1)² = .71
– Unit 2 weights
  • d² = (.97−0)² + (.30−0)² + (.11−1)² + (.05−1)² = 2.74
– Unit 1 wins
– Weights on the winning unit are updated:
  new unit 1 weights = [.08 .24 .20 .96] + 0.6([0 0 1 1] − [.08 .24 .20 .96])
                     = [.03 .10 .68 .98]
– Giving an updated weight matrix:
  Unit 1: [.03 .10 .68 .98]
  Unit 2: [.97 .30 .11 .05]
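The four hand-computed steps can be replayed in a few lines of Python (names are ours); rounding the resulting weights to two decimals reproduces the matrices computed by hand:

```python
def sq_dist(w, x):
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

samples = [(1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 1, 1)]
units = [[.2, .6, .5, .9], [.8, .4, .7, .3]]
alpha = 0.6

winners = []
for x in samples:
    j = min((0, 1), key=lambda u: sq_dist(units[u], x))
    winners.append(j + 1)                       # report as unit 1 / unit 2
    units[j] = [w + alpha * (xi - w) for w, xi in zip(units[j], x)]

assert winners == [2, 1, 2, 1]                  # i1, i3 -> unit 2; i2, i4 -> unit 1
assert [round(w, 2) for w in units[0]] == [0.03, 0.1, 0.68, 0.98]
assert [round(w, 2) for w in units[1]] == [0.97, 0.3, 0.11, 0.05]
```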
Applying the SOM Algorithm

  time (t)   data sample   ‘winning’ output unit   D(t)   α(t)
  1          i1            Unit 2                  0      0.6
  2          i2            Unit 1                  0      0.6
  3          i3            Unit 2                  0      0.6
  4          i4            Unit 1                  0      0.6

After many iterations (epochs) through the data set:
  Unit 1: [0 0 .5 1.0]
  Unit 2: [1.0 .5 0 0]
Did we get the clustering that we expected?
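Continuing the same loop for many epochs, halving α after each pass through the data as in the schedule above, the two units settle near the cluster centroids and the clustering can be checked directly. This is a sketch: the epoch count stands in for “many iterations”, and the names are ours:

```python
def sq_dist(w, x):
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

samples = [(1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 1, 1)]
units = [[.2, .6, .5, .9], [.8, .4, .7, .3]]
alpha = 0.6
for epoch in range(50):                         # "many iterations (epochs)"
    for x in samples:
        j = min((0, 1), key=lambda u: sq_dist(units[u], x))
        units[j] = [w + alpha * (xi - w) for w, xi in zip(units[j], x)]
    alpha *= 0.5                                # halve alpha after each pass

# Unit 1 ends up near [0 0 .5 1.0] and unit 2 near [1.0 .5 0 0]...
assert all(abs(a - b) < 0.2 for a, b in zip(units[0], [0, 0, 0.5, 1.0]))
assert all(abs(a - b) < 0.2 for a, b in zip(units[1], [1.0, 0.5, 0, 0]))
# ...so i2 and i4 cluster on unit 1, while i1 and i3 cluster on unit 2.
assignments = [min((0, 1), key=lambda u: sq_dist(units[u], x)) for x in samples]
assert assignments == [1, 0, 1, 0]
```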
Training samples
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
Weights
  Unit 1: [0 0 .5 1.0]
  Unit 2: [1.0 .5 0 0]
What clusters do the data samples fall into?
Solution
Weights
  Unit 1: [0 0 .5 1.0]
  Unit 2: [1.0 .5 0 0]
d² = (Euclidean distance)² = Σ_{k=1}^{n} (i_{l,k} − w_{j,k}(t))²
• Sample: i1 = (1, 1, 0, 0)
  – Distance from unit 1 weights
    • (1−0)² + (1−0)² + (0−.5)² + (0−1.0)² = 1 + 1 + .25 + 1 = 3.25
  – Distance from unit 2 weights
    • (1−1)² + (1−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)
• Sample: i2 = (0, 0, 0, 1)
  – Distance from unit 1 weights
    • (0−0)² + (0−0)² + (0−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
  – Distance from unit 2 weights
    • (0−1)² + (0−.5)² + (0−0)² + (1−0)² = 1 + .25 + 0 + 1 = 2.25
Solution
Weights
  Unit 1: [0 0 .5 1.0]
  Unit 2: [1.0 .5 0 0]
• Sample: i3 = (1, 0, 0, 0)
  – Distance from unit 1 weights
    • (1−0)² + (0−0)² + (0−.5)² + (0−1.0)² = 1 + 0 + .25 + 1 = 2.25
  – Distance from unit 2 weights
    • (1−1)² + (0−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)
• Sample: i4 = (0, 0, 1, 1)
  – Distance from unit 1 weights
    • (0−0)² + (0−0)² + (1−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
  – Distance from unit 2 weights
    • (0−1)² + (0−.5)² + (1−0)² + (1−0)² = 1 + .25 + 1 + 1 = 3.25
Word categories
[Figure: a self-organized semantic map in which words cluster by category.]
Examples of Applications
• Kohonen (1984). Speech recognition – a map
of phonemes in the Finnish language
• Optical character recognition – clustering of
letters of different fonts
• Angéniol et al. (1988) – travelling salesman
problem (an optimization problem)
• Kohonen (1990) – learning vector quantization
(pattern classification problem)
• Ritter & Kohonen (1989) – semantic maps
Summary
• Unsupervised learning is very common
• Unsupervised learning requires redundancy in the stimuli
• Self-organization is a basic property of the brain’s
computational structure
• SOMs are based on
  – competition (winner-take-all units)
  – cooperation
  – synaptic adaptation
• SOMs conserve topological relationships between
the stimuli
• Artificial SOMs have many applications in
computational neuroscience
End of slides