Lecture 22 clustering (3)


Intro. ANN & Fuzzy Systems
Lecture 22
Clustering (3)
Outline
• Self-Organizing Map (SOM):
  – Structure
  – Feature Map
  – Algorithms
  – Examples
(C) 2001 by Yu Hen Hu
Introduction to SOM
• Both SOM and LVQ were proposed by T. Kohonen.
• Biological motivation: different regions of the brain
(cerebral cortex) seem to be tuned to different tasks.
The location of a neural response on the "map" often
corresponds directly to a specific modality and quality
of the sensory signal.
• SOM is an unsupervised clustering algorithm that
creates a spatially organized "internal representation" of
various features of the input signals and their abstractions.
• LVQ is a supervised classification algorithm obtained
by fine-tuning an initial result produced by SOM.
SOM Structure
• Neurons are spatially organized (indexed) in
a 1-D or 2-D area.
• Every input connects to every neuron.
[Figure: array of neurons, each connected to the inputs In 1 and In 2.]
Neighborhood Structure
• Based on the topological arrangement, a
"neighborhood" can be defined for each neuron.
[Figure legend: current neuron, nearest neighbor, secondary neighbor.]
• Linear and higher-dimensional neighborhoods
can be defined similarly.
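A minimal Python sketch of one way to define such a neighborhood on a 2-D lattice. The function name `grid_neighborhood` and the square (Chebyshev) shape are assumptions made for illustration; the slide does not fix the shape of the neighborhood.

```python
def grid_neighborhood(center, radius, shape):
    """Return the lattice indices within `radius` of `center` on a 2-D grid.

    Assumption: a square (Chebyshev-distance) neighborhood, clipped at the
    grid border. The center itself is included in the result.
    """
    rows, cols = shape
    r0, c0 = center
    return [
        (r, c)
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))
    ]

# Nearest and secondary neighborhoods of the middle neuron of a 5 x 5 grid,
# and a neighborhood that is clipped at a corner.
nearest = grid_neighborhood((2, 2), 1, (5, 5))
secondary = grid_neighborhood((2, 2), 2, (5, 5))
corner = grid_neighborhood((0, 0), 1, (5, 5))
```

For a 1-D arrangement the same idea reduces to a range of indices around the current neuron.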
Feature Map
• The neuron outputs form a low-dimensional map
of the high-dimensional feature space!
• Neighboring features in the feature space are
to be mapped to neighboring neurons in the
feature map.
• Because training is a gradient-type search, the
initial assignment is important.
SOM Training Algorithm
Initialization: Choose weight vectors $\{w_m(0);\ 1 \le m \le M\}$
randomly. Set iteration count t = 0.
While Not_Converged
    Choose the next x and compute $d(x, w_m(t))$; $1 \le m \le M$.
    Select $m^* = \arg\min_m d(x, w_m(t))$
    Update node m* and its neighborhood nodes:
    $$w_m(t+1) = \begin{cases} w_m(t) + \eta\,(x - w_m(t)), & m \in N(m^*, t) \\ w_m(t), & m \notin N(m^*, t) \end{cases}$$
    If Not_Converged, then t = t+1
End % while loop
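The loop above can be sketched in Python as follows. This is an illustrative sketch, not the lecture's implementation: the 1-D chain of nodes, the fixed iteration budget used in place of the Not_Converged test, the constant step size `eta`, the cyclic choice of the next x, and the linearly shrinking neighborhood radius are all assumptions.

```python
import random

def train_som(data, n_nodes, n_iters=500, eta=0.1, radius0=2, seed=0):
    """Train a 1-D chain of SOM nodes and return the list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialization: random weight vectors w_m(0).
    w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(n_iters):
        x = data[t % len(data)]  # choose the next x (cyclic order, assumed)
        # Squared Euclidean distance to every node; same winner as the plain distance.
        dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, wm)) for wm in w]
        m_star = dists.index(min(dists))
        # Neighborhood N(m*, t): index window whose radius shrinks to 0 (assumed schedule).
        radius = round(radius0 * (1 - t / n_iters))
        lo, hi = max(0, m_star - radius), min(n_nodes, m_star + radius + 1)
        for m in range(lo, hi):
            # Pull the winner and its neighbors toward the sample.
            w[m] = [wi + eta * (xi - wi) for xi, wi in zip(x, w[m])]
    return w

# Usage: 200 points drawn uniformly from the square [-2, 2] x [-2, 2].
sample_rng = random.Random(1)
data = [[sample_rng.uniform(-2, 2), sample_rng.uniform(-2, 2)] for _ in range(200)]
weights = train_som(data, n_nodes=10)
```

Nodes outside the window are left unchanged, which matches the second case of the update rule.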
Example
[Figure: two panels, "initial" and "at end of 500 iterations"; both axes run from -2 to 2.]
• Initially, the code words are not ordered, as shown by the tangled
blue lines.
• At the end, the lines are stretched and the wiring is untangled.
SOM Algorithm Analysis
• Competitive learning – neurons compete to represent
the input data. Winner takes all!
• Neighborhood updating: the weights of neurons that fall
within the winning neighborhood are updated by
pulling them toward the data sample. Let x(t) be
the data vector at time t. If node m has been in the
winning neighborhood for every x(t) so far, then
$$w_m(t+1) = w_m(t) + \eta\,(x(t) - w_m(t)) = (1-\eta)\,w_m(t) + \eta\,x(t) = (1-\eta)^{t+1}\,w_m(0) + \eta \sum_{k=0}^{t} (1-\eta)^k\,x(t-k)$$
where the sum runs over the x(t)'s for which m is in the winner's neighborhood.
• The size of the neighborhood is reduced as t increases.
Eventually, $N(m^*, t) = \{m^*\}$.
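A quick numerical check of the closed form above for a scalar weight, assuming the node is updated at every step with a constant step size. The sample values, the value of `eta`, and the function names are made up for illustration.

```python
def recursive_update(w0, xs, eta):
    """Apply w <- w + eta * (x - w) once for each sample in order."""
    w = w0
    for x in xs:
        w = w + eta * (x - w)
    return w

def closed_form(w0, xs, eta):
    """(1 - eta)^(t+1) * w0 + eta * sum_k (1 - eta)^k * x(t - k), with t = len(xs) - 1."""
    t = len(xs) - 1  # index of the most recent sample x(t)
    history = sum((1 - eta) ** k * xs[t - k] for k in range(t + 1))
    return (1 - eta) ** (t + 1) * w0 + eta * history

xs = [1.0, 2.0, 0.5, -1.0, 3.0]
by_recursion = recursive_update(0.3, xs, eta=0.2)
by_formula = closed_form(0.3, xs, eta=0.2)
```

The two values agree up to floating-point rounding, and older samples are discounted by powers of (1 - eta).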
More SOM Algorithm Analysis
• A more elaborate update formulation:
$$w_m(t+1) = w_m(t) + \eta(m^*, t)\,(x - w_m(t))$$
where $\eta(m^*, t) = 0$ if $m \notin N(m^*, t)$, and, for
example,
$$\eta(m^*, t) = h_0 \exp\!\left(-|m - m^*|^2 / s^2(t)\right) \quad \text{if } m \in N(m^*, t).$$
• Distance measure: $d(x, w_m(t)) = \|x - w_m(t)\|$.
Other definitions of distance may also be used.
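A sketch of the example gain function above for a 1-D index lattice. The function name `neighborhood_gain`, the default `h0 = 1`, and the optional hard cutoff `radius` standing in for N(m*, t) are assumptions made for illustration.

```python
import math

def neighborhood_gain(m, m_star, s, h0=1.0, radius=None):
    """h0 * exp(-|m - m*|^2 / s^2) inside the neighborhood, 0 outside it."""
    dist = abs(m - m_star)  # distance between lattice indices (1-D case)
    if radius is not None and dist > radius:
        return 0.0  # m is outside N(m*, t)
    return h0 * math.exp(-(dist ** 2) / s ** 2)

# Gains for a chain of 7 nodes when node 3 wins, with width s = 1.5 and cutoff 2.
gains = [neighborhood_gain(m, 3, s=1.5, radius=2) for m in range(7)]
```

The winner gets the full gain `h0`, its neighbors get progressively less, and nodes beyond the cutoff are not updated. Shrinking `s` over time narrows the effective neighborhood.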
SAMMON Mapping
• Visualization of high-dimensional data structure!
• The map developed in SOM can serve such a purpose.
• In general, this is a multidimensional scaling problem:
distances $d_{ij}$ between low-dimensional points $\{y_i\}$ in the map
should correspond to the dissimilarities $\delta_{ij}$ between
points $\{x_i\}$ in the original space.
• Optimization problem: given $\{x_i\}$, find $\{y_i\}$ to minimize
$$J_{ef} = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(d_{ij} - \delta_{ij})^2}{\delta_{ij}}, \qquad d_{ij} = \|y_i - y_j\|$$
• Optimize with gradient search from some initial
selection of $\{y_i\}$:
$$\nabla_{y_k} J_{ef} = \frac{2}{\sum_{i<j} \delta_{ij}} \sum_{j \ne k} \frac{d_{kj} - \delta_{kj}}{\delta_{kj}} \cdot \frac{y_k - y_j}{d_{kj}}$$
• Other criteria may also be used.
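A small pure-Python sketch of the criterion and its gradient. It assumes that `delta` holds the original-space dissimilarities (taken here to be Euclidean distances between the x's) and that map distances are Euclidean distances between the y's, which is the convention under which the gradient formula holds. All function names, the fixed step size, and the sample points are hypothetical.

```python
import math

def euclid(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pairwise(points):
    """Full matrix of pairwise Euclidean distances."""
    return [[euclid(p, q) for q in points] for p in points]

def stress(delta, ys):
    """J_ef: normalized squared mismatch over pairs i < j, weighted by 1/delta_ij."""
    n = len(ys)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(delta[i][j] for i, j in pairs)
    return sum((euclid(ys[i], ys[j]) - delta[i][j]) ** 2 / delta[i][j]
               for i, j in pairs) / total

def gradient(delta, ys, k):
    """Gradient of J_ef with respect to the map point y_k."""
    n = len(ys)
    total = sum(delta[i][j] for i in range(n) for j in range(i + 1, n))
    grad = [0.0] * len(ys[k])
    for j in range(n):
        if j == k:
            continue
        d_kj = euclid(ys[k], ys[j])
        coef = 2.0 / total * (d_kj - delta[k][j]) / (delta[k][j] * d_kj)
        for a in range(len(grad)):
            grad[a] += coef * (ys[k][a] - ys[j][a])
    return grad

def descend(delta, ys, lr=0.1, steps=50):
    """Plain gradient descent on all map points at once (fixed step size, assumed)."""
    ys = [list(y) for y in ys]
    for _ in range(steps):
        grads = [gradient(delta, ys, k) for k in range(len(ys))]
        ys = [[v - lr * g for v, g in zip(y, gk)] for y, gk in zip(ys, grads)]
    return ys

# Four points in 3-D mapped to 2-D, starting from an arbitrary layout.
xs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
delta = pairwise(xs)
y0 = [[0.0, 0.0], [1.0, 0.2], [0.3, 1.1], [-0.5, 0.4]]
y_final = descend(delta, y0)
print(stress(delta, y0), stress(delta, y_final))
```

The sketch requires distinct points in both spaces, since the formulas divide by the pairwise distances. As with SOM training, the result depends on the initial selection of the y's.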