Unsupervised models and clustering
Table of contents
Introduction
The unsupervised learning framework
Self-Organizing Maps (SOMs)
Introduction 1
In order to efficiently mimic the nervous system, it is necessary to get an idea of the nature of the biological processes that actually take place in the brain
The only reasonable assumption is that they are driven by mechanisms aimed at optimizing the goal they have to pursue
Each cognitive process evolves from an initial situation, associated with a stimulus, up to a terminal state in which a response, which is the result of the process itself, is produced
It is intuitive that, in this evolution, a sort of information transfer takes place
Introduction 2
In fact, it is just the initial stimulus that provides
the information necessary to obtain the desired
response
Such information must be transmitted and
preserved until the conclusion of the process
A reasonable way of interpreting the processes that take place in the nervous system is therefore to consider them as transfers of information that follow a “maximum conservation” criterion
The unsupervised learning framework 1
Sometimes, in real-world applications, we do not have sufficient knowledge of the problem to be solved to entrust it to the supervised learning paradigm, i.e. we do not know the solution to the problem, not even sampled on a set of examples, having at our disposal only data to be analyzed and interpreted, without an a priori “guideline” of specific information on them
A classic problem of this type is to highlight some groups of data, with common characteristics, within a “disordered and miscellaneous” set
This problem can be addressed by using a self-organizing neural network, capable of learning from data without a supervisor that provides solutions at specific points of the weight space
The unsupervised learning framework 2
Learning without supervision means aggregating
“similar” examples in topologically neighboring regions
While in the supervised learning protocol the variation of the synaptic connections is made in an attempt to minimize the error with respect to the information provided by the supervisor, in this case the learning procedure is guided by “similarity” criteria inherently present within the data
The unsupervised learning framework 3
An unsupervised learning example: the Hebb rule
Hebb (1949): if two connected neurons are active at the same time, the strength (or weight) of their connection is increased
The connection weight changes by an amount proportional to the product of the activation values of the two neurons with the weight itself, thus tending to “strengthen” the strongest links and to “weaken” the weakest ones (see the sketch below)
More realistic, from the biological point of view, than Backpropagation
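As an illustration only (not part of the original slides), a minimal sketch of this update rule in Python; the learning-rate constant eta and its value are assumptions:

```python
def hebb_update(w, pre, post, eta=0.01):
    """One Hebbian step: the weight change is proportional to the product of
    the activations of the two connected neurons and to the weight itself,
    so strong links get stronger while weak links stay weak."""
    return w + eta * pre * post * w

# Example: two strongly co-active neurons progressively reinforce their link.
w = 0.5
for _ in range(10):
    w = hebb_update(w, pre=1.0, post=0.8)
```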
The unsupervised learning framework 4
Summing up…
Given a set of examples we want to find something
interesting
The experience is given by the collected “examples”
The problem is that of “finding something
interesting…”
Without any information other than the data itself…
The performance depends on how “interesting” the obtained result is
The unsupervised learning framework 5
Self-organizing networks 1
In the central nervous system, the ganglion cells, which constitute the output stage of the retina, are organized according to receptive fields, sensitive to particular stimuli
In the auditory cortex, neurons and fibers are anatomically arranged in an orderly manner with respect to the acoustic frequencies
Studies on brain maps during the prenatal stage, i.e. in the absence of sensory stimuli, have demonstrated that, at the lower levels, the neural structure is genetically predetermined, while at higher levels the organization matures and emerges as a result of the process of learning from examples
Self-organizing networks 2
Trivially, it can be observed that children learning to read mark the letters one by one and “think” while reading; adults, instead, recognize whole words and sentences at a glance
This shows that there is a kind of cerebral self-organization, which develops the brain’s ability to classify and easily recognize some “common” patterns; this is also confirmed by the difficulty of reading a text upside down or with its words run together
Self-organizing networks 3
Studied by Grossberg, Von Malsburg, Rumelhart, Zipser and Kohonen since the mid ‘80s, self-organizing neural networks are particularly suitable for those applications where there is no a priori knowledge on how to process certain information, so that the network is expected to automatically “discover” interesting concepts within the “data population”
Both the stimulus and the network response are characterized by a given number of components, describing points in appropriate spaces
Self-organizing networks 4
The learning process can be interpreted as a geometric transformation from the input to the output space
The output space has a smaller dimension than the input space, since the stimulus normally contains the information required to activate many simultaneous processes: compared to any single one of them, it is redundant
This means that, during self-organization, a redundancy reduction operation is carried out
In both the input and the output space, typical regions can be identified, to which information is associated
Self-organizing networks 5
The natural mechanism that controls the information transfer must then identify those regions that are important for the cognitive process, and make sure that they correspond under the transformation
Therefore, in the cognitive process, a data clustering operation is carried out, realized through the acquisition of experience
Both operations, redundancy reduction and clustering, are supported by biological evidence based on the functioning of the nervous system
Self-organizing networks 6
Mapping from the input space to the space defined by the neurons of the self-organizing network
Self-organizing networks 7
The main use of self-organizing networks is to perform data analysis, in order to highlight regularities among data or to classify them (they are particularly used for shape classification, in image and signal recognition)
Self-organizing neural networks are suitable for solving different problems with respect to supervised learning approaches:
They can be used for classification tasks, such as handwritten character recognition or the identification of radar signals
They are used to implement clustering, i.e. partitioning data sets into collections of classes, or clusters, containing similar patterns
They can perform data compression
Self-Organizing Maps 1
Biological motivation
The Kohonen network is modeled on the basis of a characteristic behaviour of neurons in laminar nervous tissues: under the action of a stimulus, such neurons are activated in groups characterized by the presence or the absence of activity, where “activity” is defined as the emission of a number of pulses (per unit time) that exceeds a certain threshold
The spatial demarcation between the two groups is clear, so that we talk about the formation of “activation bubbles”, obtained starting from a situation of undifferentiated low activity of all the neurons in the network
Self-Organizing Maps 2
In the case of biological neurons, the neurons that are physically closer to active neurons have strong, excitatory connections, while the peripheral ones have inhibitory connections
In the cerebral cortex, there are projections of sensory stimuli onto specific networks of cortical neurons
The sensorimotor neurons constitute a distorted map of the body surface (the extent of each region is proportional to the sensitivity of the corresponding area, not to its size)
However, adjacent parts of the cortex correspond to adjacent parts of the body surface
Self-Organizing Maps 3
Self-Organizing Maps 4
Kohonen modelled this feature by restricting the variation of the weights only to those neurons that are close to a selected neuron
SOM (T. Kohonen, 1981): “sensorial” maps, constituted by a single layer of neurons, where the units specialize themselves to respond to different stimuli, in such a way that:
different inputs activate different (far away) units
topologically neighboring units are activated by similar inputs
Self-Organizing Maps 5
Architecture
A single layer of neurons n_i, i = 1, …, w·h (where w is the width and h the height of the map, in the two-dimensional case)
Each input X = [x_0, x_1, …, x_{N-1}] is connected with all the neurons
To each connection, a weight w_ij is associated
Activation function f_i = 1/d(W_i, X), where d is a distance
Lateral untrainable connections
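A small sketch of how these activations could be computed; Python/NumPy and the Euclidean distance as d are assumptions, since the slides do not prescribe an implementation:

```python
import numpy as np

def activations(W, x, eps=1e-12):
    """W: (n_units, N) matrix with one weight vector W_i per neuron (row);
    x: input vector of size N. Returns f_i = 1 / d(W_i, x) for every unit."""
    d = np.linalg.norm(W - x, axis=1)   # Euclidean distance assumed for d
    return 1.0 / (d + eps)              # eps avoids division by zero

# The most active unit, np.argmax(activations(W, x)), is the one whose
# weight vector is closest to the input x.
```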
Self-Organizing Maps 6
The weights of each neuron are modified:
in an excitatory sense, in proportion to the value of its activation and to those of the neurons belonging to its neighborhood, and also proportionally to the distance from them
in an inhibitory sense, in proportion to the value of the activation of the neurons outside the neighborhood, and also proportionally to the distance from them
Therefore, when the same input is proposed to the network again:
neurons that originally had a high activation value, and their neighbors, will show an even greater activation
neurons that responded little will react even less
Self-Organizing Maps 7
If “well-distributed” data are presented to the network, in an iterative fashion, each neuron specializes itself in responding to inputs of a certain type
Moreover, neighboring neurons respond to “similar” stimuli, in practice projecting the input space onto the layer of neurons
Data dimensionality reduction, from N (the input size) to m (the map size, usually 2 or 3)
Each data item is represented by the coordinates of the unit onto which it is projected, that is, the one with the maximum activation or, in other words, the one whose weight vector is the most similar (closest) to the data item itself
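A sketch of this projection, which reduces each N-dimensional pattern to the 2-D map coordinates of its winner unit (NumPy and a rectangular grid layout are assumptions):

```python
import numpy as np

def project(W, grid, X):
    """W: (n_units, N) trained weight vectors; grid: (n_units, 2) map
    coordinates of each unit; X: (n_samples, N) data.
    Each pattern is replaced by the coordinates of its closest unit."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return grid[np.argmin(dists, axis=1)]   # shape (n_samples, 2)
```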
Self-Organizing Maps 8
Lateral interaction
The output layer neurons are connected with a “neighborhood” of neurons, according to a system of lateral interaction known as the “Mexican hat”
These connection weights are not subjected to learning: they are fixed, excitatory close to each neuron and inhibitory in its periphery
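The slides do not give the exact lateral-interaction profile; a common way to sketch a “Mexican hat” is a difference of two Gaussians, positive close to the neuron and negative in its periphery (this functional form and its parameters are assumptions):

```python
import numpy as np

def mexican_hat(dist, sigma_exc=1.0, sigma_inh=3.0, a_exc=1.0, a_inh=0.5):
    """Fixed (non-trainable) lateral weight between two map units as a
    function of their distance on the map: positive for nearby units,
    negative at intermediate distances, fading to zero far away."""
    return (a_exc * np.exp(-dist**2 / (2 * sigma_exc**2))
            - a_inh * np.exp(-dist**2 / (2 * sigma_inh**2)))
```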
Self-Organizing Maps 9
In the output layer, a unique neuron must be the “winner” (showing the maximum activation value) for each pattern presented to the network
The winner neuron identifies the class to which the input belongs
The “Mexican hat” connections tend to favour the formation of activation bubbles, which identify similar inputs
Self-Organizing Maps 10
In practice:
the input space is partitioned into n subspaces, where n is the number of neurons in the layer
each subspace s_i of S = {X_k} is defined by:
s_i = {X_j such that d(X_j, W_i) = min_t d(X_j, W_t)}
i.e., the partition is a Voronoi tessellation of the input space
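In code, this partition amounts to assigning every pattern X_j to the unit with the closest weight vector; a minimal sketch (NumPy and the Euclidean distance are assumptions):

```python
import numpy as np

def voronoi_partition(W, X):
    """W: (n_units, N) weight vectors W_t; X: (n_samples, N) data set S = {X_k}.
    Returns, for each X_j, the index i such that d(X_j, W_i) = min_t d(X_j, W_t),
    i.e. the Voronoi cell (subspace s_i) the pattern falls into."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```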
Self-Organizing Maps 11
Simplifications of the model to implement the training algorithm:
weights are changed only in the neighborhood of the neuron that has the maximum activation (this type of training is also known as competitive learning)
we consider only the excitatory lateral interactions within a limited neighborhood of the winner neuron
Note: changing the weights in the excitatory sense means making them closer and closer to the inputs; changing them in the inhibitory sense causes the opposite behaviour
Self-Organizing Maps 12
α = C (α: learning rate, C a small positive constant, C ≪ 1)
Repeat:
For each input x_i belonging to the training set:
1. find the winner neuron n_j
2. update the weights of the winner neuron and of the neurons belonging to its neighborhood, as:
w_j(t+1) = w_j(t) + α [x_i(t+1) − w_j(t)]
α(k+1) = α(k)(1 − γ) (γ a small positive constant, γ ≪ 1)
until the network comes to a stable equilibrium point
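A compact sketch of this training loop for a two-dimensional map (Python/NumPy; the random initialization, the Gaussian neighborhood factor, its shrinking schedule and the stopping test are assumptions, since the slides only require updating the winner’s neighborhood and decaying α until the weights stabilize):

```python
import numpy as np

def train_som(X, w, h, alpha=0.1, gamma=0.01, sigma=None, tol=1e-5, max_epochs=500):
    """X: (n_samples, N) training set. The map has w*h units on a 2-D grid.
    Weights move toward the inputs; alpha decays as alpha <- alpha * (1 - gamma)."""
    rng = np.random.default_rng(0)
    n_units, N = w * h, X.shape[1]
    W = rng.uniform(X.min(), X.max(), size=(n_units, N))        # weight vectors
    grid = np.array([(i % w, i // w) for i in range(n_units)])  # unit coordinates on the map
    sigma = sigma or max(w, h) / 2                              # initial neighborhood radius

    for epoch in range(max_epochs):
        W_old = W.copy()
        for x in X:
            j = np.argmin(np.linalg.norm(W - x, axis=1))        # 1. winner neuron
            d_grid = np.linalg.norm(grid - grid[j], axis=1)     # distance on the map
            nbh = np.exp(-d_grid**2 / (2 * sigma**2))           # neighborhood factor
            W += alpha * nbh[:, None] * (x - W)                 # 2. move weights toward x
        alpha *= (1 - gamma)                                    # learning-rate decay
        sigma = max(sigma * (1 - gamma), 0.5)                   # slowly shrink the neighborhood
        if np.linalg.norm(W - W_old) < tol:                     # stable equilibrium reached
            break
    return W, grid
```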
Self-Organizing Maps 13
The role of learning is to rotate the synaptic weight vector to align it with the input vector; in this way, the winner neuron becomes even more sensitive to the recognition of that particular input pattern
Self-Organizing Maps 14
The SOM learning algorithm is very simple (it does not require computing derivatives):
The neuron j*, with the weight vector closest to the input pattern, is selected; that neuron “recalls” the input vector and changes its weights in order to align them with the input
The weight vectors of the neurons neighboring j* are also updated:
the network tries to create regions consisting of a large set of values around the inputs on which it learns
vectors that are spatially close to the training values will be correctly classified even if the network has never seen them before (generalization capability)
Self-Organizing Maps 15
The purpose of a SOM is to have, for similar inputs, neighboring winner neurons, so that each activation bubble represents a class containing all the input data with common characteristics
The neighboring region can be chosen as a square, a circle or a hexagon around the winner neuron
The neighborhood must be chosen large at the beginning, and should then slowly shrink during learning
Self-Organizing Maps 16
Summing up…
SOMs are able to perform data clustering, i.e. they can identify, in the input space, a set of partitions induced by the similarities/differences among the data
Each partition is represented by a prototype (centroid) defined by the weight vector of the corresponding neuron
SOMs produce an unsupervised clustering: they do not exploit any a priori information about the classes (number and type)
After training, it is possible to label (classify) data, based on the partition of the input space to which they belong
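As a purely illustrative end-to-end usage (the data is synthetic, and the train_som and voronoi_partition helpers are the hypothetical sketches given earlier):

```python
import numpy as np

# Three synthetic 2-D blobs standing in for a “disordered and miscellaneous” set.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])

W, grid = train_som(X, w=5, h=5)     # unsupervised training (earlier sketch)
labels = voronoi_partition(W, X)     # partition index (cluster) of each pattern
# Each row of W is the prototype (centroid) of the corresponding partition;
# new data can be labeled afterwards by assigning it to the closest prototype.
```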
SOMs and Bioinformatics
Gene clustering from microarray data
Prediction of microRNA targets (miRNAs are small non-coding RNAs that regulate gene expression by binding the target gene mRNA)
Human tissue classification
DNA motif identification, for extracting binding sites in DNA sequences
Prediction of the behavior of synthetic peptides in inhibiting the adhesion of human tumor cells to extracellular proteins
…
SOMs and structured data
Since additional structural information is often available in possible applications of self-organizing maps (SOMs), a transfer of standard unsupervised learning methods to sequences and more complex graph structures would be valuable
Nevertheless, the SOM constitutes a metric-based approach, and therefore it can be applied directly to structured data, provided that a data comparison measure is defined and a notion of adaptation within the data space can be found