Unsupervised Learning

G. Anuradha
Contents
• Introduction
• Competitive learning networks
• Kohonen self-organizing networks
• Learning vector quantization
• Hebbian learning
Introduction
• When no external teacher or critic information is available, only the input vectors are used for learning
• Learning without supervision is called unsupervised learning
• Learning evolves so as to extract features or regularities from the presented patterns, without being told about the output classes
• Frequently employed in data clustering, feature extraction, and similarity detection
Introduction
• Unsupervised learning NNs learn to respond to different input patterns with different parts of the network
• They are trained so that their firing is strengthened in response to frequently occurring patterns
• Based on this concept we have competitive learning and Kohonen self-organizing feature maps
Competitive learning networks
• Weights in this case are updated based on the input pattern alone.
• The number of inputs is the input dimension, while the number of outputs equals the number of clusters that the data are to be divided into.
• Each cluster center's position is the weight vector connected to the corresponding output unit.
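A minimal data-layout sketch in Python (names and sizes are mine, not from the slides): the competitive layer is nothing more than a weight matrix whose rows serve as the cluster centers.

import numpy as np

rng = np.random.default_rng(0)
input_dim = 2      # number of inputs  = dimension of each input pattern
num_clusters = 4   # number of outputs = number of clusters

# Row j is both the weight vector of output unit j and the center of cluster j.
weights = rng.normal(size=(num_clusters, input_dim))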
Learning algorithm
• The learning algorithm used in most of these nets is known as Kohonen learning
• The units update their weights by forming a new weight vector that is a linear combination of the old weight vector and the new input vector
• The weight update formula for the winning output cluster unit k is given by
  wk(t+1) = wk(t) + η(x(t) − wk(t))
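A hedged one-function sketch of this update (function and parameter names are mine):

import numpy as np

def kohonen_update(weights, k, x, eta=0.1):
    # w_k(t+1) = w_k(t) + eta * (x(t) - w_k(t)),
    # i.e. the linear combination (1 - eta) * w_k(t) + eta * x(t)
    weights[k] += eta * (x - weights[k])
    return weights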
Learning algorithm
• There are two methods to determine the winner of the network during competition:
  – Euclidean distance method
  – Dot product method
• Euclidean distance method:
  – The square of the Euclidean distance between the input vector and each weight vector is computed
  – The unit whose weight vector is at the smallest Euclidean distance from the input vector is chosen as the winner
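A short sketch of the Euclidean distance method (assuming NumPy arrays, with one weight vector per row):

import numpy as np

def euclidean_winner(weights, x):
    # Squared Euclidean distance from x to every weight vector (row).
    d = np.sum((weights - x) ** 2, axis=1)
    return int(np.argmin(d))  # index of the winning unit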
Learning algorithm
• The dot product method:
  – The dot product between the input vector and each weight vector is computed:
    aj = Σ(i=1 to n) xi wij
  – The output unit with the highest activation is selected for further processing (competitive)
  – The weights of this unit alone are updated (winner-take-all)
  – In this case the weight update is given by
    wk(t+1) = (wk(t) + η(x(t) − wk(t))) / ||wk(t) + η(x(t) − wk(t))||
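A sketch of the dot product method with the winner-take-all, normalized update (function name is mine; unit-length vectors assumed, as on the next slide):

import numpy as np

def dot_product_step(weights, x, eta=0.1):
    a = weights @ x                    # a_j = sum_i x_i * w_ij
    k = int(np.argmax(a))              # highest activation wins
    w_new = weights[k] + eta * (x - weights[k])
    weights[k] = w_new / np.linalg.norm(w_new)  # keep unit length
    return weights, k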
What happens in the dot product method?
• With unit-length vectors, finding the minimum of ||x − wi|| is nothing but finding the maximum among the m scalar products x · wi
• The competitive learning network performs an on-line clustering process on the input patterns
• When the process is complete, the input data are divided into disjoint clusters
• With the Euclidean distance, the update formula is actually an on-line gradient descent that minimizes the objective function (the squared distance between each input and its winning weight vector)
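To see why (assuming the standard per-pattern squared-error objective): for the winning unit k, E = (1/2)||x − wk||² has gradient ∂E/∂wk = −(x − wk), so a gradient step Δwk = −η ∂E/∂wk = η(x − wk), which is exactly the Kohonen update.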
Competitive learning with unit-length vectors
(Figure) The dots represent the input vectors and the crosses denote the weight vectors of the four output units. As learning continues, the four weight vectors rotate toward the centers of the four input clusters.
Limitations of competitive learning
• Weights are initialized to random values that may be far from any input vector, so a unit might never win and its weights would never get updated
  – This can be prevented by initializing the weights to samples from the input data itself, thereby ensuring that all weights get updated when all the input patterns are presented
  – Alternatively, the weights of the losing neurons as well as the winner can be updated, using a significantly smaller learning rate for the losers. This is called leaky learning
  – Note: changing η over time is generally desired. A large initial value of η explores the data space widely; later, progressively smaller values refine the weights. This is similar to the cooling schedule in simulated annealing
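A hedged sketch of leaky learning plus a decaying learning rate (the slides name both ideas but fix no constants; leak and decay below are illustrative):

import numpy as np

def leaky_step(weights, x, eta, leak=0.01):
    k = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))  # winner
    for j in range(len(weights)):
        rate = eta if j == k else leak * eta  # losers learn far more slowly
        weights[j] += rate * (x - weights[j])
    return weights

def eta_schedule(eta0, epoch, decay=0.95):
    # Annealing-style schedule: eta shrinks each epoch, like a cooling schedule.
    return eta0 * decay ** epoch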
Limitations of competitive learning
• Lacks the capability to add new clusters when deemed necessary
• If η is constant, there is no stability of the clusters
• If η decreases with time, it may become too small to update the cluster centers
• This is called the stability-plasticity dilemma (solved using adaptive resonance theory (ART))
• If the output units are arranged in the form of a vector or matrix, then the weights of the winner as well as of its neighbouring losers can be updated (Kohonen feature maps)
• After learning, the input space is divided into a number of disjoint clusters. These cluster centers are known as templates, or a code book
• For any input pattern presented, we can use the appropriate code book vector in its place (vector quantization)
• This vector quantization is used for data compression in image processing and communication systems.
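A minimal sketch of vector quantization with a learned code book (function names are mine): the encoder stores only an index, and the decoder looks the vector back up.

import numpy as np

def encode(codebook, x):
    # Index of the nearest code book vector (cluster center) to x.
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def decode(codebook, index):
    # Compression: only `index` is stored/transmitted; the receiver reconstructs.
    return codebook[index]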
Kohonen Self-Organizing Networks
• Also known as Kohonen feature maps or topology-preserving maps
• The learning procedure of Kohonen feature maps is similar to that of competitive learning networks.
• A similarity (dissimilarity) measure is selected, and the winning unit is considered to be the one with the largest (smallest) activation
• The weights of the winning neuron as well as those of the neighborhood around the winning unit are adjusted.
• The neighborhood size decreases slowly with every iteration.
Training of the Kohonen self-organizing network
1. Select the winning output unit as the one with the largest similarity measure between all weight vectors wi and the input vector x. The winning unit c satisfies
   ||x − wc|| = min_i ||x − wi||
   where the index c refers to the winning unit (Euclidean distance)
2. Let NBc denote the set of indices corresponding to a neighborhood around winner c. The weights of the winner and its neighboring units are then updated by
   Δwi = η(x − wi),  i ∈ NBc
• A neighborhood function around the winning unit can be used instead of defining the neighborhood of a winning unit explicitly.
• A Gaussian function can be used as the neighborhood function:
  Ωc(i) = exp(−||pi − pc||² / (2σ²))
  where pi and pc are the positions of the output units i and c respectively, and σ reflects the scope of the neighborhood.
• The update formula using the neighborhood function is given by
  Δwi = η Ωc(i)(x − wi)
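A sketch of one update with the Gaussian neighborhood (assuming `positions` holds the grid coordinates pi of the output units, e.g. np.arange(m)[:, None] for a 1-D map):

import numpy as np

def som_step(weights, positions, x, eta=0.1, sigma=1.0):
    c = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))  # winner
    d2 = np.sum((positions - positions[c]) ** 2, axis=1)    # ||p_i - p_c||^2
    omega = np.exp(-d2 / (2 * sigma ** 2))                  # Omega_c(i)
    weights += eta * omega[:, None] * (x - weights)         # dw_i = eta * Omega_c(i) * (x - w_i)
    return weights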
(Flowchart) Training flow of the Kohonen self-organizing network:
1. Start: initialize the weights, the learning rate, and the topological neighborhood parameters.
2. For each input x: for j = 1 to m compute D(j) = Σ(i=1 to n) (xi − wij)²; the winning unit index J is the one with D(J) minimum; update the weights of the winning unit (and its neighbors).
3. Reduce the radius of the topological neighborhood and reduce the learning rate.
4. Test whether the learning rate η(t+1) has been reduced to a negligible value; if so, stop, otherwise continue with the next epoch.
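An end-to-end sketch of this flowchart (the linear decay schedules for η and σ are assumptions; the slides only say both are reduced):

import numpy as np

def train_som(data, weights, positions, epochs=50, eta0=0.5, sigma0=2.0):
    for t in range(epochs):
        eta = eta0 * (1.0 - t / epochs)                # reduce learning rate
        sigma = max(sigma0 * (1.0 - t / epochs), 0.1)  # reduce neighborhood radius
        for x in data:
            # D(j) = sum_i (x_i - w_ij)^2; winner J has minimum D(J)
            J = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))
            d2 = np.sum((positions - positions[J]) ** 2, axis=1)
            omega = np.exp(-d2 / (2 * sigma ** 2))     # Gaussian neighborhood
            weights += eta * omega[:, None] * (x - weights)
    return weights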
Learning Vector Quantization (LVQ)
• It is an adaptive data classification method based on training data with desired class information
• It is actually a supervised training method, but it employs unsupervised data-clustering techniques to preprocess the data set and obtain the cluster centers
• It resembles a competitive learning network, except that each output unit is associated with a class.
(Figure) Network representation of LVQ
(Figure) Possible data distributions and decision boundaries
LVQ learning algorithm
• Step 1: Initialize the cluster centers by a clustering method
• Step 2: Label each cluster by the voting method
• Step 3: Randomly select a training input vector x and find k such that ||x − wk|| is a minimum
• Step 4: If x and wk belong to the same class, update wk by
  Δwk = η(x − wk)
  else
  Δwk = −η(x − wk)
• The parameters used for the training process of LVQ include the following:
  – x = training vector (x1, x2, …, xn)
  – T = category or class of the training vector x
  – wj = weight vector of the jth output unit (w1j, …, wij, …, wnj)
  – cj = cluster or class or category associated with the jth output unit
  – The Euclidean distance of the jth output unit is D(j) = Σ(i) (xi − wij)²
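A sketch of one LVQ training step (steps 3 and 4 above; `classes[j]` plays the role of cj):

import numpy as np

def lvq_step(weights, classes, x, T, eta=0.1):
    # D(j) = sum_i (x_i - w_ij)^2; unit k with minimum D(k) wins
    k = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))
    if classes[k] == T:
        weights[k] += eta * (x - weights[k])  # same class: pull toward x
    else:
        weights[k] -= eta * (x - weights[k])  # different class: push away
    return weights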
(Flowchart) Training flow of LVQ:
1. Start: initialize the weights and the learning rate.
2. For each input vector x, input its target class T and calculate the winner as the unit with minimum D(j).
3. If T = cj, update wj(new) = wj(old) + η[x − wj(old)]; otherwise update wj(new) = wj(old) − η[x − wj(old)].
4. Reduce the learning rate, η(t+1) = 0.5 η(t); if η has been reduced to a negligible value, stop, otherwise continue.
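An outer-loop sketch matching this flowchart (the stopping threshold eta_min is an assumption standing in for "negligible"):

import numpy as np

def train_lvq(data, targets, weights, classes, eta=0.5, eta_min=1e-4):
    while eta > eta_min:                      # stop once eta is negligible
        for x, T in zip(data, targets):
            k = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))
            if classes[k] == T:               # same class: move closer
                weights[k] += eta * (x - weights[k])
            else:                             # different class: move away
                weights[k] -= eta * (x - weights[k])
        eta *= 0.5                            # eta(t+1) = 0.5 * eta(t)
    return weights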