Unsupervised Learning

Machine Learning
Neural Networks (3)
Understanding Supervised and Unsupervised Learning
[Figure: a set of unlabelled data points marked A and B, with the caption "Two possible Solutions…" showing two different ways of grouping them]
Supervised Learning
• It is based on a labeled training set.
• The class of each piece of data in the training set is known.
• Class labels are pre-determined and provided in the training phase.
[Figure: training points annotated with their known class labels, Class A or Class B]
Unsupervised Learning
• In supervised learning, an external teacher improves network performance by comparing desired and actual outputs and modifying the synaptic weights accordingly.
• However, most of the learning that takes place in our brains is completely unsupervised.
• This type of learning aims at achieving the most efficient representation of the input space, regardless of any output space.
Supervised Vs Unsupervised
Supervised:
• Task performed: Classification, Pattern Recognition
• NN model: Perceptron, Feed-forward NN
• "What is the class of this data point?"
Unsupervised:
• Task performed: Clustering
• NN model: Self Organizing Maps
• "What groupings exist in this data?"
• "How is each data point related to the data set as a whole?"
Unsupervised Learning
• Input: a set of patterns P from an n-dimensional space S, but little/no information about their classification, evaluation, interesting features, etc. The network must learn these by itself! :)
• Tasks:
  – Clustering: group patterns based on similarity.
  – Vector Quantization: fully divide S into a small set of regions (defined by codebook vectors) that also helps cluster P.
  – Feature Extraction: reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P).
Unsupervised learning
• The network must discover for itself patterns, features, regularities, correlations or categories in the input data and code them for the output.
• The units and connections must self-organize themselves based on the stimuli they receive.
• Note that unsupervised learning is useful when there is redundancy in the input data: redundancy provides knowledge.
Unsupervised Neural Networks – Kohonen Learning
• Also known as a Self Organizing Map.
• Learns a categorization of the input space.
• Neurons are connected into a 1-D or 2-D lattice.
• Each neuron represents a point in N-dimensional pattern space, defined by N weights.
• During training, the neurons move around to try and fit to the data.
• Changing the position of one neuron in data space influences the positions of its neighbors via the lattice connections.
Unsupervised Learning
Applications of unsupervised learning include:
• Clustering
• Vector quantization
• Data compression
• Feature extraction
Self Organizing Map – Network Structure
• All inputs are connected by weights to each neuron.
• The size of the neighbourhood changes as the net learns.
• The aim is to map similar inputs (sets of values) to similar neuron positions.
• Data is clustered because it is mapped to the same node or group of nodes.
Self-organising maps (SOMs)
• Inspiration from biology: in the auditory pathway, nerve cells are arranged in relation to frequency response (tonotopic organisation).
• Kohonen took inspiration from this to produce self-organising maps (SOMs).
• In a SOM, units located physically next to one another respond to input vectors that are 'similar'.
SOMs
• Useful, as it is difficult for humans to visualise data with more than 3 dimensions.
• Large-dimensional input vectors are 'projected down' onto a 2-D map in a way that maintains the natural ordering by similarity.
• A SOM is a 2-D array of neurons, with all inputs arriving at all neurons.
SOMs
• Initially each neuron has its own set of (random) weights.
• When an input arrives, the neuron whose pattern of weights is most similar to the input gives the largest response.
SOMs
• There is positive excitatory feedback between a SOM unit and its nearest neighbours.
• This causes all the units in the 'neighbourhood' of the winning unit to learn.
• As the distance from the winning unit increases, the degree of excitation falls until it becomes inhibition.
• A bubble of activity (the neighbourhood) forms around the unit with the largest net input (Mexican-Hat function).
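As a concrete illustration (not from the slides), the Mexican-Hat profile can be modelled with the Ricker wavelet, which is excitatory near the winning unit and inhibitory further out; the sketch below uses an illustrative width sigma = 2:

import numpy as np

def mexican_hat(d, sigma=2.0):
    # Ricker ("Mexican hat") profile: positive (excitatory) close to the
    # winning unit, negative (inhibitory) further away on the lattice.
    r2 = (d / sigma) ** 2
    return (1.0 - r2) * np.exp(-r2 / 2.0)

# Excitation felt at lattice distances 0..5 from the winning unit
for d in range(6):
    print(d, round(float(mexican_hat(d)), 3))

With sigma = 2, units at distance 0 and 1 are excited, distance 2 is neutral, and units further away are inhibited.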
SOMs
• Initially each weight is set to a random number.
• The Euclidean distance D is used to find the difference between an input vector and the weights of a SOM unit (the square root of the sum of the squared differences):

  $D = \sqrt{\sum_{i=1}^{n} (x_i - w_{ij})^2}$
SOMs
• For a 2-dimensional problem, the (squared) distance calculated in each neuron is:

  $\sum_{i=1}^{2} (x_i - w_{ij})^2 = (x_1 - w_{1j})^2 + (x_2 - w_{2j})^2$
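A minimal sketch of this distance computation and winner search, assuming the unit weights are stored row-wise in a NumPy array (the function and variable names are illustrative, not from the slides):

import numpy as np

def find_winner(x, weights):
    # D for every unit: Euclidean distance between the input vector x
    # and each unit's weight vector; the unit with the lowest D wins.
    distances = np.sqrt(((weights - x) ** 2).sum(axis=1))
    return int(np.argmin(distances))

# Toy 2-dimensional example: three units, one input vector
weights = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
x = np.array([0.8, 0.2])
print(find_winner(x, weights))  # -> 2, the unit closest to x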
SOM
• The input vector is simultaneously compared to all elements in the network; the one with the lowest D is the winner.
• The weights of all units in the neighbourhood around the winning unit are updated.
• If the winner is 'c', the neighbourhood is defined by a Mexican Hat function around 'c'.
SOMs
• The weights of the units are adjusted using:

  $\Delta w_{ij} = k \,(x_i - w_{ij})\, Y_j$

  where $Y_j$ comes from the Mexican Hat function.
• k is a value which changes over time (high at the start of training, low later on).
SOM - Algorithm
1. Initialization: weights are set to unique random values.
2. Sampling: draw an input sample x and present it to the network.
3. Similarity Matching: the winning neuron i is the one whose weight vector best matches the input vector:

   $i = \arg\min_j \| x - w_j \|$
SOM - Algorithm
4. Updating: adjust the weights of the winning neuron so that they better match the input, and also adjust the weights of the neighbouring neurons:

   $\Delta w_j = \eta \, h_{ij} \,(x - w_j)$

   where $h_{ij}$ is the neighbourhood function; over time the neighbourhood function gets smaller.
Result: the neurons provide a good approximation of the input space, and their ordering on the lattice corresponds to the structure of the input data.
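Putting steps 1-4 together, a compact training sketch for a 1-D lattice might look as follows; a Gaussian neighbourhood is used here in place of the Mexican Hat, and all parameter values and names are illustrative assumptions rather than the original implementation:

import numpy as np

def train_som(data, n_units=10, epochs=100, eta0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.random((n_units, data.shape[1]))            # 1. random initialization
    for t in range(epochs):
        eta = eta0 * (1.0 - t / epochs)                 # learning rate decays over time
        sigma = max(sigma0 * (1.0 - t / epochs), 0.5)   # neighbourhood shrinks over time
        for x in rng.permutation(data):                 # 2. draw input samples
            i = int(np.argmin(np.linalg.norm(w - x, axis=1)))     # 3. winning neuron
            lattice_dist = np.abs(np.arange(n_units) - i)         # distance along the 1-D lattice
            h = np.exp(-(lattice_dist ** 2) / (2 * sigma ** 2))   # neighbourhood function h
            w += eta * h[:, None] * (x - w)             # 4. update winner and neighbours
    return w

For example, train_som(np.random.rand(200, 2)) maps a 2-D point cloud onto a chain of 10 units, with neighbouring units ending up in neighbouring regions of the data.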
Two distinct phases in training
• Initial ordering phase: units find the correct topological order (might take 1000 iterations, where k decreases from 0.9 to 0.01 and Nc decreases from ½ the diameter of the network to 1 unit).
• Final convergence phase: the accuracy of the weights improves (k may decrease from 0.01 to 0 while Nc stays at 1 unit; this phase could be 10 to 100 times longer, depending on the desired accuracy).
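These two phases can be written as a simple parameter schedule; the sketch below just restates the figures given above (the function name and the half-diameter value are illustrative assumptions):

def schedule(iteration, ordering_iters=1000, convergence_iters=10000, half_diameter=10):
    # Returns (k, Nc) for the current iteration.
    if iteration < ordering_iters:                      # initial ordering phase
        frac = iteration / ordering_iters
        k = 0.9 + (0.01 - 0.9) * frac                   # k: 0.9 -> 0.01
        nc = max(round(half_diameter * (1 - frac)), 1)  # Nc: half the diameter -> 1 unit
    else:                                               # final convergence phase
        frac = min((iteration - ordering_iters) / convergence_iters, 1.0)
        k = 0.01 * (1 - frac)                           # k: 0.01 -> 0
        nc = 1                                          # Nc stays at 1 unit
    return k, nc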
WEBSOM
• All words of a document are mapped into the word-category map, and a histogram of "hits" on it is formed.
• The word-category map is a self-organizing semantic map (15x21 neurons): interrelated words that have similar contexts appear close to each other on the map.
• The document map is a self-organizing map of document collections.
• The largest experiments have used:
  – a word-category map of 315 neurons with 270 inputs each
  – a document map of 104,040 neurons with 315 inputs each
Clustering Data
K-Means Clustering
K-Means(k, data)
• Randomly choose k cluster center locations (centroids).
• Loop until convergence:
  – Assign each point to the cluster of the closest centroid.
  – Re-estimate the cluster centroids based on the data assigned to each.
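A minimal NumPy sketch of this loop, assuming convergence is declared when the centroids stop moving (the function and variable names are my own, not from the slides):

import numpy as np

def k_means(k, data, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly choose k data points as the initial centroids
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each centroid from the points assigned to it
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):   # convergence: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels

Usage: centroids, labels = k_means(3, np.random.rand(100, 2)) clusters 100 random 2-D points into 3 groups.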
K-Means Animation
Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system: Dan Pelleg and Andrew Moore, "Accelerating Exact k-means Algorithms with Geometric Reasoning," Proc. Conference on Knowledge Discovery in Databases, 1999.
K-means Clustering
– Initialize the K weight vectors, e.g. to randomly chosen examples. Each weight vector represents a cluster.
– Assign each input example x to the cluster c(x) with the nearest corresponding weight vector:

  $c(x) = \arg\min_j \| x - w_j(n) \|$

– Update the weights:

  $w_j(n+1) = \frac{1}{n_j} \sum_{x \,:\, c(x) = j} x$

  with $n_j$ the number of examples assigned to cluster j.
– Increment n by 1 and repeat until no noticeable changes of the weight vectors occur.
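As a small worked example of these update rules (the numbers are invented for illustration): with K = 2 and one-dimensional inputs x = 1, 2, 9, 10, suppose w_1(n) = 0 and w_2(n) = 5. The assignment step gives c(1) = c(2) = 1 and c(9) = c(10) = 2, so the update yields w_1(n+1) = (1 + 2)/2 = 1.5 and w_2(n+1) = (9 + 10)/2 = 9.5, after which the assignments no longer change and the algorithm stops.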
Simple competitive learning
Issues
• How many clusters?
  – User-given parameter K
  – Use model selection criteria (e.g. the Bayesian Information Criterion) with a penalization term that considers model complexity. See e.g. X-means: http://www2.cs.cmu.edu/~dpelleg/kmeans.html
• What similarity measure?
  – Euclidean distance
  – Correlation coefficient
  – Ad-hoc similarity measure
• How to assess the quality of a clustering?
  – Compact and well-separated clusters are better … many different quality measures have been introduced. See e.g. C. H. Chou, M. C. Su and E. Lai, "A New Cluster Validity Measure and Its Application to Image Compression," Pattern Analysis and Applications, vol. 7, no. 2, pp. 205-220, 2004.