
2806 Neural Computation
Self-Organizing Maps
Lecture 9
2005 Ari Visa
Agenda
Some historical notes
 Some theory
 Self-Organizing Map
 Learning Vector Quantization
 Conclusions

Some Historical Notes
Local ordering (von der Malsburg, 1973)
(Amari, 1980): a mathematical analysis elucidates the
dynamic stability of a cortical map
Self-organizing feature map (SOM) (Kohonen,
1982)
(Erwin, 1992): a convex neighbourhood function
should be used (~ Gaussian)
The relationship between the SOM and principal
curves is discussed (Ritter, 1992; Cherkassky and
Mulier, 1995)
Some Historical Notes




Vector quantization: the Lloyd algorithm, 1957, for
scalar quantization (= Max quantizer)
The generalized Lloyd algorithm for vector
quantization (= k-means algorithm, MacQueen 1967
= LBG algorithm, Linde et al. 1980)
The idea of learning vector quantization (LVQ)
(Kohonen, 1986)
Convergence properties of the LVQ algorithm
studied via the ordinary differential equation (ODE)
approach (Baras and LaVigna, 1990)
Some Theory


The spatial location of an
output neuron in a topographic
map corresponds to a particular
domain or feature of data drawn
from the input space. (Kohonen
1990)
Two approaches: Willshaw-von
der Malsburg (explains
neurobiological details) and
Kohonen (more general than
the first model in the sense that
it is capable of performing data
reduction).
Some Theory

The principal goal of the
self-organizing map is to
transform an incoming
signal pattern of arbitrary
dimension into a one- or
two-dimensional discrete
map, and to perform this
transformation adaptively
in a topologically ordered
fashion.
SOM
The formation of the self-organizing
map
1. Competition
2. Cooperation
3. Synaptic Adaptation







Competitive Process
x = [x1, x2, ..., xm]T
wj = [wj1, wj2, ..., wjm]T,
where j = 1, 2, ..., l (= total number of
neurons in the network)
Select the neuron with the largest wjTx; for weight
vectors of equal norm this is equivalent to selecting
the neuron closest to x in Euclidean distance:
i(x) = arg minj ||x − wj||
A continuous input space of
activation patterns is mapped onto
a discrete output space of neurons
by a process of competition among
the neurons in the network.
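A minimal sketch of this competitive step in Python/NumPy (the names x, W and the array shapes are illustrative assumptions, not from the lecture):

```python
import numpy as np

def winner(x, W):
    """Competitive step: return the index i(x) of the neuron whose weight
    vector w_j is closest to the input vector x (W has shape l x m)."""
    distances = np.linalg.norm(W - x, axis=1)   # ||x - w_j|| for every neuron j
    return int(np.argmin(distances))            # i(x) = arg min_j ||x - w_j||
```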
SOM
Cooperation
 How to define a topological
neighborhood that is neurobiologically
correct?
 Let hj,i denote the topological
neighborhood centered on the winning
neuron i and encompassing a set of
excited neurons denoted by j.
 The topological neighborhood is
symmetric about the maximum point.
 The amplitude of the topological
neighborhood decreases monotonically
with increasing lateral distance.
 hj,i(x)(n) = exp(−d²j,i / (2σ²(n)))
 σ(n) = σ₀ exp(−n/τ₁), n = 0, 1, 2, …
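A sketch of this Gaussian neighborhood on a 2-D lattice, assuming the neurons' lattice coordinates are stored in an (l, 2) array; the default values of sigma0 and tau1 are illustrative:

```python
import numpy as np

def neighborhood(grid, i, n, sigma0=2.0, tau1=1000.0):
    """Gaussian topological neighborhood h_{j,i(x)}(n) around winner i.

    grid : (l, 2) array of lattice coordinates of the l neurons
    i    : index of the winning neuron
    n    : iteration number
    """
    sigma = sigma0 * np.exp(-n / tau1)           # sigma(n) = sigma0 * exp(-n / tau1)
    d2 = np.sum((grid - grid[i]) ** 2, axis=1)   # squared lateral distances d^2_{j,i}
    return np.exp(-d2 / (2.0 * sigma ** 2))      # h_{j,i(x)}(n)
```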
SOM
Adaptation
 wj(n+1) = wj(n) + η(n) hj,i(x)(n) [x − wj(n)]; note
the Hebbian-like form of the learning rule.
 The synaptic weight vector wj of the winning neuron
i moves toward the input vector x. Upon repeated
presentations of the training data, the synaptic
weight vectors tend to follow the distribution of
the input vectors due to the neighborhood
updating → topological ordering.
 The learning-rate parameter η(n) should be time-varying:
η(n) = η₀ exp(−n/τ₂), n = 0, 1, 2, …
SOM
Ordering and Convergence
 Self-organizing or ordering phase (at most
about 1000 iterations):
 η(n) decreases from 0.1 to 0.01, τ₂ = 1000
 hj,i(n) shrinks from a "radius" covering the
lattice down to the winning neuron and a couple of
neighboring neurons, τ₁ = 1000 / log σ₀
 Convergence phase (fine-tunes the
feature map; about 500 × the number of neurons
in the network iterations):
η(n) ≈ 0.01; hj,i(n) contains the winning neuron
and one or zero neighboring neurons.
(summary of SOM)
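Putting the three processes together, a compact online SOM training loop reusing the winner, neighborhood and update sketches above; the grid layout, iteration count and initialization are illustrative assumptions, and the convergence phase is only hinted at in a comment:

```python
import numpy as np

def train_som(X, grid, n_iter=2000, eta0=0.1, sigma0=2.0):
    """Minimal online SOM loop: competition, cooperation, adaptation."""
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((len(grid), X.shape[1]))  # small random weights
    tau1 = 1000.0 / np.log(sigma0)                           # tau1 = 1000 / log(sigma0)
    tau2 = 1000.0
    for n in range(n_iter):
        x = X[rng.integers(len(X))]                  # draw a training vector at random
        i = winner(x, W)                             # 1. competition
        h = neighborhood(grid, i, n, sigma0, tau1)   # 2. cooperation
        W = update(W, x, h, n, eta0, tau2)           # 3. adaptation
        # A convergence phase would follow: hold eta near 0.01 and shrink the
        # neighborhood to the winner and at most one neighbor.
    return W
```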

Some Theory


Property 1.
Approximation of the
Input Space. The feature
map Φ, represented by the
set of synaptic weight
vectors {wj} in the output
space A, provides a good
approximation to the input
space H.
The theoretical basis of
the idea is rooted in vector
quantization theory.
Some Theory

c(x) acts as an encoder of the
input vector x and x'(c) acts as
a decoder of c(x). The vector x
is selected at random from the
training sample, subject to an
underlying probability density
function fX(x). The optimum
encoding-decoding scheme is
determined by varying the
functions c(x) and x'(c) so as
to minimize the expected
distortion, i.e. the average of the
distortion measure d(x, x'(c(x)))
over fX(x).
Some Theory
d(x, x') = ||x − x'||² = (x − x')T(x − x')
Generalized Lloyd algorithm:
Condition 1. Given the input vector
x, choose the code c = c(x) to
minimize the squared error
distortion ||x − x'(c)||².
Condition 2. Given the code c,
compute the reconstruction
vector x'(c) as the centroid
of the input vectors x that
satisfy Condition 1.
The generalized Lloyd algorithm
is closely related to the SOM.

Some Theory
Condition 1. Given the input vector
x, choose the code c = c(x) to
minimize the distortion measure
D₂.
Condition 2. Given the code c,
compute the reconstruction
vector x'(c) to satisfy the
conditions.
x'new(c) ← x'old(c) + η π(c − c(x)) [x − x'old(c)],
where π(·) is the noise (neighborhood) model and
η is the learning-rate parameter.
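A sketch of one batch iteration of the generalized Lloyd (k-means) algorithm as stated by Conditions 1 and 2; replacing the hard centroid step with the neighborhood-weighted online update above turns it into the SOM-like rule. Function and variable names are illustrative:

```python
import numpy as np

def lloyd_step(X, codebook):
    """One generalized Lloyd (k-means) iteration over the data X.

    Condition 1: encode each x by its nearest code vector c(x).
    Condition 2: replace each reconstruction vector x'(c) by the centroid
    of the input vectors assigned to it.
    """
    # Condition 1: nearest-neighbor encoding
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    codes = np.argmin(dists, axis=1)
    # Condition 2: centroid update
    new_codebook = codebook.copy()
    for c in range(len(codebook)):
        assigned = X[codes == c]
        if len(assigned) > 0:
            new_codebook[c] = assigned.mean(axis=0)
    return new_codebook, codes
```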
Some Theory

Property 2. Topological Ordering. The
feature map Φ computed by the SOM
algorithm is topologically ordered in the
sense that the spatial location of a neuron in
the lattice corresponds to a particular
domain or feature of input patterns.
Some Theory



Property 3. Density Matching. The feature map Φ reflects variations
in the statistics of the input distribution: regions in the input space H
from which the sample vectors x are drawn with a high probability of
occurrence are mapped onto larger domains of the output space A, and
therefore with better resolution, than regions in H from which sample
vectors x are drawn with a low probability of occurrence. Here m(x)
denotes the magnification factor of the map.
Minimum-distortion encoding, according to which the curvature terms
and all higher-order terms in the distortion measure due to the noise model
π(·) are retained: m(x) ∝ fX^(1/3)(x).
Nearest-neighbor encoding, which emerges if the curvature terms are
ignored, as in the standard form of the SOM algorithm: m(x) ∝ fX^(2/3)(x).
Some Theory

Property 4. Feature
Selection. Given data
from an input space with
a nonlinear distribution,
the self-organizing map is
able to select a set of best
features for approximating
the underlying
distribution.
Learning Vector Quantizer



Vector quantization: an input
space is divided into a number
of distinct regions, and for each
region a reconstruction vector is
defined.
A vector quantizer with
minimum encoding distortion is
called a Voronoi or nearest-neighbor quantizer.
The collection of possible
reproduction vectors is called
the code book of the quantizer,
and its members are called code
vectors.
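A sketch of nearest-neighbor (Voronoi) encoding with a given codebook, reporting the average squared distortion; the function and variable names are illustrative assumptions:

```python
import numpy as np

def quantize(X, codebook):
    """Nearest-neighbor (Voronoi) quantizer: map every input vector to the
    index of its closest code vector and report the mean squared distortion."""
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    codes = np.argmin(dists, axis=1)                  # Voronoi region membership
    distortion = np.mean(np.min(dists, axis=1) ** 2)  # average ||x - x'(c)||^2
    return codes, distortion
```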
Learning Vector Quantizer
The SOM algorithm provides
an approximate method
for computing the Voronoi
vectors in an unsupervised
manner.
Learning vector quantization
(LVQ) is a supervised
learning technique that
uses class information to
move the Voronoi vectors
slightly, so as to improve
the quality of the classifier
decision regions.
Learning Vector Quantizer


An input vector x is picked at
random from the input space. If the
class labels of the input vector x
and a Voronoi vector w agree, the
Voronoi vector w is moved in the
direction of the input vector x. If
the class labels of the input vector x
and the Voronoi vector w disagree,
the Voronoi vector w is moved
away from the input vector x.
Let {wi}, i = 1, ..., l, denote the set of
Voronoi vectors, and {xi}, i = 1, ..., N,
denote the set of input vectors.
LVQ:
I.
Suppose that the Voronoi vector
wc is the closest to the input
vector xi. Let Lwc denote the
class associated with the Voronoi
vector wc and Lxi denote the class
label of the input vector xi. The
Voronoi vector wc is adjusted as
follows:
If Lwc = Lxi, then wc(n+1) = wc(n) +
αn[xi − wc(n)], where 0 < αn < 1.
If Lwc ≠ Lxi, then wc(n+1) = wc(n) −
αn[xi − wc(n)], where 0 < αn < 1.
II. The other Voronoi vectors are not
modified.
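A sketch of a single LVQ update as described above; the array layout and names are assumptions:

```python
import numpy as np

def lvq_step(W, labels, x, x_label, alpha):
    """One LVQ update: move the closest Voronoi vector toward x when the
    class labels agree, and away from x when they disagree (0 < alpha < 1)."""
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # closest Voronoi vector w_c
    if labels[c] == x_label:
        W[c] += alpha * (x - W[c])    # L_wc = L_xi: move toward x
    else:
        W[c] -= alpha * (x - W[c])    # L_wc != L_xi: move away from x
    return W                          # all other Voronoi vectors stay unchanged
```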
Learning Vector Quantizer
Vector quantization is a form of lossy data compression.
Rate distortion theory (Gray 1984): Better data compression performance
can always be achieved by coding vectors instead of scalars, even if
the source of data is memoryless, or if the data compression system has
memory.
A multistage hierarchical vector quantizer (Luttrell 1989)
Learning Vector Quantizer

First-order autoregressive
(AR) model: x(n+1) =
ρ x(n) + ε(n), where ρ is the
AR coefficient and the
ε(n) are independent and
identically distributed (iid)
Gaussian random
variables of zero mean and
unit variance.
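A small sketch that generates such an AR(1) series, with rho standing in for the AR coefficient whose symbol was lost from the slide:

```python
import numpy as np

def ar1_series(rho, n_samples, seed=0):
    """Generate x(n+1) = rho * x(n) + eps(n) with eps(n) i.i.d. standard Gaussian."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_samples)
    eps = rng.standard_normal(n_samples)
    for n in range(n_samples - 1):
        x[n + 1] = rho * x[n] + eps[n]
    return x
```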
Some Theory




Attribute code xa
Symbol code xs
x = [xs, 0]T + [0, xa]T
Contextual map or
semantic map
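Since the two padded vectors occupy disjoint coordinates, the sum amounts to concatenating the symbol code and the attribute code; a tiny illustrative sketch:

```python
import numpy as np

def contextual_input(x_s, x_a):
    """Augmented input x = [x_s, 0]^T + [0, x_a]^T, i.e. the symbol code
    concatenated with the attribute code."""
    return np.concatenate([x_s, x_a])
```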
Summary



The SOM algorithm is neurobiologically inspired,
incorporating all the mechanisms that are basic to
self-organization: competition, cooperation, and
self-amplification.
Kohonen's SOM algorithm is simple to
implement, yet mathematically difficult to
analyze in a general setting.
The self-organizing map may be viewed as a
vector quantizer.