Transcript Talk 6
Ferromagnetic Clustering
Data clustering using a model granular magnet
Introduction
In recent years there has been significant
interest in adapting numerical as well as
analytic techniques from statistical physics to
provide algorithms and estimates for good
approximate solutions to hard optimization
problems.
Cluster analysis is an important technique in
exploratory data analysis.
Introduction
Partitional clustering methods
Partitional clustering methods fall into two categories:
parametric
non-parametric
Potts Model
The Potts model was introduced as a generalization of the Ising
model. The idea came from the representation of the Ising
model as interacting spins which can be either parallel or
antiparallel. An obvious generalization was to extend the
number of directions of the spins. Such a model was proposed
by C. Domb as a PhD thesis topic for his student R. Potts in 1952.
The original model considered interacting spins that point in one of $s$ equally spaced directions in a plane, specified by the angles $\theta_n = 2\pi n/s$, $n = 0, 1, \dots, s-1$, of the spin at each site.
This model, now known as the vector Potts model or the clock model, was solved in two dimensions by Potts for $s = 2, 3, 4$.
Visualizations of Spin Models of
Magnetism
Spin models provide simple models of magnetism, and in
particular of the transition between magnetic and nonmagnetic phases as a magnetic material is heated past its
critical temperature. In a spin model of magnetism, variables
representing magnetic spins are associated with sites in a
regular crystalline lattice.
Different spin models are specified by different types of spin
variables, interactions between the spins, and lattice
geometry.
The Ising model is the simplest spin model, where the spins
have only two states (+1 and -1), representing magnetic
spins that are up or down.
Ferromagnetic Ising model
A configuration of the standard ferromagnetic Ising
model with two spin states (represented by black and
white) at the critical temperature, where there is a phase
transition between the low temperature ordered or
magnetic phase (where the spins align) and the high
temperature disordered or non-magnetic phase (where
the spins are more randomized).
As the temperature increases from zero (where all the
spins are the same), "domains" of opposite spin begin to
appear and start to disorder the system.
At the phase transition, domains or clusters of spins
appear in all shapes and sizes.
As the temperature increases further, the domains start
to break up into random individual spins.
Introduction
We present a new approach to clustering,
based on the physical properties of an
inhomogeneous ferromagnet.
No assumption is made regarding the
underlying distribution of the data.
Introduction
We assign a Potts spin to each data point and introduce an interaction
between neighboring points, whose strength is a decreasing function of
the distance between the neighbors.
This magnetic system exhibits three phases.
At very low temperatures it is completely ordered; i.e. all spins are
aligned.
At very high temperatures the system does not exhibit any ordering.
In an intermediate regime clusters of relatively strongly coupled spins
become ordered, whereas different clusters remain uncorrelated.
The spin-spin correlation function (measured by Monte Carlo) is used
to partition the spins and the corresponding data points into clusters.
Some Physics Background
Potts Model
The energy of a spin configuration $\mathcal{S} = \{s_i\}_{i=1}^{N}$, with $s_i = 1, \dots, q$, is given by the Hamiltonian

$$\mathcal{H}(\mathcal{S}) = \sum_{\langle i,j \rangle} J_{ij}\left(1 - \delta_{s_i, s_j}\right), \qquad (1)$$

where the sum runs over neighboring pairs. The interactions $J_{ij}$ are a decreasing function of the distance $d_{ij}$: the closer two points are to each other, the more they “like” to be in the same state.
Some Physics Background
Potts Model
In order to calculate the thermodynamic average of a physical quantity $A$ at a fixed temperature $T$, one has to calculate the sum

$$\langle A \rangle = \sum_{\mathcal{S}} A(\mathcal{S})\, P(\mathcal{S}), \qquad (2)$$

where the Boltzmann factor

$$P(\mathcal{S}) = \frac{1}{Z} \exp\!\left(-\frac{\mathcal{H}(\mathcal{S})}{T}\right) \qquad (3)$$

plays the role of the probability density giving the statistical weight of each spin configuration $\mathcal{S}$ in thermal equilibrium, and $Z = \sum_{\mathcal{S}} \exp\left(-\mathcal{H}(\mathcal{S})/T\right)$ is a normalization constant (the partition function).
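To make equations (2) and (3) concrete, the following sketch evaluates the thermal average of an arbitrary observable by brute-force enumeration. The function name and interface are our own; the approach is feasible only for a handful of spins.

```python
import itertools
import numpy as np

def thermal_average(A, couplings, N, q, T):
    """Exact thermal average <A> = sum_S P(S) A(S), eqs. (2)-(3).

    couplings: dict mapping spin pairs (i, j) to J_ij; the Hamiltonian is
    H(S) = sum_{(i,j)} J_ij * (1 - delta(s_i, s_j)), eq. (1).  Feasible
    only for tiny systems: the number of configurations grows as q**N.
    """
    weights, values = [], []
    for S in itertools.product(range(q), repeat=N):
        H = sum(J for (i, j), J in couplings.items() if S[i] != S[j])
        weights.append(np.exp(-H / T))
        values.append(A(S))
    Z = sum(weights)  # the partition function (normalization constant)
    return sum(w * a for w, a in zip(weights, values)) / Z
```

For instance, `thermal_average(lambda S: float(S[0] == S[1]), {(0, 1): 1.0, (1, 2): 1.0}, N=3, q=3, T=0.5)` gives the alignment probability of two spins in a three-spin chain, i.e. the correlation function defined in (5) below.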
Potts Model
The order parameter of the system is $\langle m \rangle$, where the magnetization $m(\mathcal{S})$ associated with a spin configuration $\mathcal{S}$ is defined (Chen, Ferrenberg and Landau 1992) as

$$m(\mathcal{S}) = \frac{q\, N_{\max}(\mathcal{S}) - N}{(q - 1)\, N}, \qquad (4)$$

with $N_{\max}(\mathcal{S}) = \max\{N_1(\mathcal{S}), \dots, N_q(\mathcal{S})\}$, where $N_\mu(\mathcal{S})$ is the number of spins with the value $\mu$.
The thermal average of $\delta_{s_i, s_j}$ is called the spin–spin correlation function,

$$G_{ij} = \langle \delta_{s_i, s_j} \rangle, \qquad (5)$$

which is the probability of the two spins $s_i$ and $s_j$ being aligned.
Potts Model
A Potts system is homogeneous when the spins are on a lattice and all nearest-neighbor couplings are equal.
At high temperatures the system is paramagnetic or disordered; $\langle m \rangle \approx 0$, indicating that $N_{\max}(\mathcal{S}) \approx N/q$ for all statistically significant configurations.
As the temperature is lowered, the system undergoes a sharp transition to an ordered, ferromagnetic phase; the magnetization jumps to $\langle m \rangle \neq 0$.
At very low temperatures $\langle m \rangle \approx 1$ and $G_{ij} \approx 1$.
The variance of the magnetization is related to a relevant thermal quantity, the susceptibility,

$$\chi = \frac{N}{T}\left(\langle m^2 \rangle - \langle m \rangle^2\right). \qquad (6)$$
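A small sketch of how (4) and (6) could be estimated from a sample of configurations. The helper names are our own, and spins are encoded as 0, ..., q-1 rather than 1, ..., q.

```python
import numpy as np

def magnetization(S, q):
    """m(S) = (q * N_max(S) - N) / ((q - 1) * N), eq. (4).

    S: array of N spin values encoded 0, ..., q-1.
    """
    N = len(S)
    N_max = np.bincount(S, minlength=q).max()  # largest N_mu(S)
    return (q * N_max - N) / ((q - 1) * N)

def susceptibility(samples, q, T):
    """chi = (N / T) * (<m^2> - <m>^2), eq. (6), from MC samples."""
    N = len(samples[0])
    m = np.array([magnetization(S, q) for S in samples])
    return (N / T) * m.var()  # var(m) = <m^2> - <m>^2
```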
Potts Model
Strongly inhomogeneous Potts models - spins form magnetic
“grains”, with very strong couplings between neighbors that
belong to the same grain, and very weak interactions between
all other pairs.
At low temperatures such a system is also ferromagnetic.
At intermediate temperatures the system may exhibit a super-paramagnetic phase, in which strongly coupled grains are aligned, while there is no relative ordering of different grains.
Potts Model
There can be a sequence of several transitions within the super-paramagnetic phase: as the temperature is raised, the system may first break into two clusters, each of which breaks into further sub-clusters, and so on.
Such a hierarchical structure of the magnetic clusters reflects a hierarchical organization of the data into categories and sub-categories.
Working in the super-paramagnetic phase of the model, we use the values of the pair correlation function of the Potts spins to decide whether or not a pair of spins belongs to the same grain, and we identify these grains as the clusters of our data.
Monte Carlo simulation of Potts models:
the Swendsen-Wang method
The aim is to evaluate sums such as (2) for models with N >> 1
spins.
Direct evaluation of sums like (2) is impractical, since the
number of configurations S increases exponentially with the
system size N.
Monte Carlo simulation methods overcome this problem by generating a characteristic subset of configurations, which is used as a statistical sample.
A set of spin configurations $\{\mathcal{S}_1, \dots, \mathcal{S}_M\}$ is generated according to the Boltzmann probability distribution (3). Then expression (2) reduces to a simple arithmetic average,

$$\langle A \rangle \approx \frac{1}{M} \sum_{n=1}^{M} A(\mathcal{S}_n), \qquad (7)$$

where the number of configurations in the sample, $M$, is much smaller than $q^N$, the total number of configurations.
Monte Carlo simulation of Potts models:
the Swendsen-Wang method
1. First “visit” all pairs of spins $\langle i, j \rangle$ that interact, i.e. have $J_{ij} > 0$. If in the current configuration $\mathcal{S}_n$ the two spins are in the same state, $s_i = s_j$, then sites $i$ and $j$ are “frozen” together with probability

$$p_{ij}^{f} = 1 - \exp\!\left(-\frac{J_{ij}}{T}\right). \qquad (8)$$

2. Identify the SW-clusters of spins: an SW-cluster contains all spins which have a path of frozen bonds connecting them. According to (8), only spins of the same value can be frozen into the same SW-cluster.
Monte Carlo simulation of Potts models:
the Swendsen-Wang method
Final step of the procedure: generate the new spin configuration $\mathcal{S}_{n+1}$ by drawing, independently for each SW-cluster, a random value $s = 1, \dots, q$, which is assigned to all its spins.
The physical quantities that we are interested in are the magnetization (4) and its square, for the calculation of the susceptibility (6), and the spin–spin correlation function (5).
At temperatures where large regions of correlated spins occur,
local methods (such as Metropolis), which flip one spin at a
time, become very slow. The SW procedure overcomes this
difficulty by flipping large clusters of aligned spins
simultaneously.
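The SW update described above (freeze bonds, identify clusters, redraw values) could be sketched as follows. This is an illustrative implementation, not the authors' code; the union-find bookkeeping is one of several possible ways to identify the SW-clusters.

```python
import numpy as np

def sw_update(spins, neighbors, J, T, q, rng):
    """One Swendsen-Wang update of a Potts configuration.

    spins: array of N spin values in {0, ..., q-1}.
    neighbors: list of interacting pairs (i, j); J: matching couplings.
    Bonds between aligned spins are frozen with p = 1 - exp(-J_ij / T),
    eq. (8); each SW-cluster then receives a fresh random spin value.
    """
    N = len(spins)
    parent = np.arange(N)  # union-find forest encoding the SW-clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 1: freeze bonds between aligned neighbors with probability p_f.
    for (i, j), Jij in zip(neighbors, J):
        if spins[i] == spins[j] and rng.random() < 1.0 - np.exp(-Jij / T):
            parent[find(i)] = find(j)

    # Steps 2-3: identify SW-clusters and draw one new value per cluster.
    new_value = {}
    new_spins = np.empty(N, dtype=int)
    for i in range(N):
        root = find(i)
        if root not in new_value:
            new_value[root] = rng.integers(q)
        new_spins[i] = new_value[root]
    return new_spins
```

Because whole clusters of aligned spins are flipped at once, this update decorrelates much faster than single-spin Metropolis moves near the transitions.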
Clustering of Data
Description of the Algorithm
Clustering of Data – Description of the
Algorithm
Assume that our data consist of $N$ patterns or measurements $v_i$, specified by $N$ corresponding vectors $\vec{x}_i$, embedded in a $D$-dimensional metric space.
The method consists of three stages.
Clustering of Data – Description of the
Algorithm
1. Construct the physical analog Potts-spin problem:
(a) Associate a Potts spin variable $s_i = 1, \dots, q$ to each point $v_i$.
(b) Identify the neighbors of each point $v_i$ according to a selected criterion.
(c) Calculate the interaction $J_{ij}$ between neighboring points $v_i$ and $v_j$.
Clustering of Data – Description of the
Algorithm
2. Locate the super-paramagnetic phase:
(a) Estimate the (thermal) average magnetization, $\langle m \rangle$, for different temperatures.
(b) Use the susceptibility $\chi$ to identify the super-paramagnetic phase.
Clustering of Data – Description of the
Algorithm
3. In the super-paramagnetic regime:
(a) Measure the spin–spin correlation, $G_{ij}$, for all pairs of neighboring points $v_i$, $v_j$.
(b) Construct the data clusters.
Description of the Algorithm
Construction of the physical analog Potts-spin problem
1.a The value of q does not imply any assumption about the number of clusters present in the data; q determines mainly the sharpness of the transitions and the temperatures at which they occur.
1.b Since the data do not form a regular lattice, one has to supply some reasonable definition of “neighbors”. We use the Delaunay triangulation, rather than other graph structures, when the patterns are embedded in a low-dimensional (D ≤ 3) space.
When D > 3, $v_i$ and $v_j$ are declared neighbors, with mutual neighborhood value K, if and only if $v_i$ is one of the K nearest neighbors of $v_j$ and $v_j$ is one of the K nearest neighbors of $v_i$.
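A possible sketch of this mutual K-nearest-neighbor criterion (our own helper, using a brute-force distance matrix for clarity rather than efficiency):

```python
import numpy as np

def mutual_knn_pairs(X, K):
    """Return pairs (i, j) that are mutual K-nearest neighbors.

    v_i and v_j are neighbors iff each is among the K nearest
    neighbors of the other (the criterion used when D > 3).
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    # knn[i] holds the indices of the K nearest neighbors of point i.
    knn = np.argsort(D, axis=1)[:, :K]
    knn_sets = [set(row) for row in knn]
    pairs = [(i, j) for i in range(len(X)) for j in knn_sets[i]
             if i < j and i in knn_sets[j]]
    return pairs
```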
Description of the Algorithm
Construction of the physical analog Potts-spin problem
1.c We need strong interactions between spins that correspond to data from a high-density region, and weak interactions between neighbors that are in low-density regions. This is achieved by the coupling

$$J_{ij} = \frac{1}{\hat{K}} \exp\!\left(-\frac{d_{ij}^2}{2 a^2}\right)$$

for neighboring points, where $a$ is the average of all distances $d_{ij}$ between neighboring pairs $v_i$ and $v_j$, and $\hat{K}$ is the average number of neighbors per site.
This normalization of the interaction strength enables us to estimate the temperature corresponding to the highest super-paramagnetic transition.
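This coupling could be computed as follows. A sketch assuming the pair list from the previous snippet; K̂ is taken as 2·|pairs|/N, since each mutual pair gives each of its two points one neighbor.

```python
import numpy as np

def couplings(X, pairs):
    """J_ij = (1 / K_hat) * exp(-d_ij**2 / (2 * a**2)) for neighbor pairs.

    a: average distance between neighboring pairs;
    K_hat: average number of neighbors per point.
    """
    d = np.array([np.linalg.norm(X[i] - X[j]) for i, j in pairs])
    a = d.mean()                       # average neighbor distance
    K_hat = 2.0 * len(pairs) / len(X)  # each pair contributes two neighbor slots
    return (1.0 / K_hat) * np.exp(-d ** 2 / (2.0 * a ** 2))
```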
Description of the Algorithm
Locating super-paramagnetic regions
The various temperature intervals in which the system self-organizes into different partitions of the data are identified by measuring the susceptibility χ as a function of temperature.
Starting from the estimate of the highest transition temperature, one can take increasingly refined temperature scans and calculate the function χ(T) by Monte Carlo simulation.
Description of the Algorithm
Locating super-paramagnetic regions
1. Choose the number of iterations M to be performed.
2. Generate the initial configuration by assigning a random value to
each spin.
3. Assign a frozen bond between neighboring points $v_i$ and $v_j$ with probability $p_{ij}^{f} = \left(1 - e^{-J_{ij}/T}\right)\delta_{s_i, s_j}$.
4. Find the connected subgraphs, the SW–clusters.
5. Assign new random values to the spins (spins that belong to the
same SW–cluster are assigned the same value). This is the new
configuration of the system.
Description of the Algorithm
Locating super-paramagnetic regions
6. Calculate the value assumed by the physical
quantities of interest in the new spin configuration.
7. Go to step 3, unless the maximal number of iterations
M, was reached.
8. Calculate the averages (7).
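Steps 1 through 8 above could be wired together as in this sketch of a χ(T) scan. It relies on the hypothetical sw_update and magnetization helpers sketched earlier; the burn-in length is an illustrative choice.

```python
import numpy as np

def chi_curve(N, neighbors, J, q, temperatures, M=500, burn_in=100):
    """Sketch of steps 1-8: scan chi(T) = (N / T) * (<m^2> - <m>^2).

    M retained configurations per temperature (step 1), measured
    after burn_in equilibration sweeps.
    """
    rng = np.random.default_rng()
    chi = []
    for T in temperatures:
        spins = rng.integers(q, size=N)  # step 2: random initial configuration
        m_samples = []
        for sweep in range(burn_in + M):
            # Steps 3-5: freeze bonds, find SW-clusters, redraw spin values.
            spins = sw_update(spins, neighbors, J, T, q, rng)
            if sweep >= burn_in:
                # Step 6: record the physical quantity of interest.
                m_samples.append(magnetization(spins, q))
        # Step 8: averages (7); var(m) = <m^2> - <m>^2, so chi = N var(m) / T.
        chi.append((N / T) * np.var(m_samples))
    return np.array(chi)
```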
Description of the Algorithm
Locating super-paramagnetic regions
We measure the susceptibility χ
at different temperatures in order to locate
these different regimes.
The aim is to identify the temperatures at
which the system changes its structure.
Description of the Algorithm
Identifying data clusters
Select one temperature in each region of
interest.
Each sub-phase characterizes a particular
type of partition of the data, with new
clusters merging or breaking.
Description of the Algorithm
Identifying data clusters
The Swendsen–Wang method provides an improved estimator of the spin–spin correlation function.
It calculates the two-point connectedness, the probability that sites $v_i$ and $v_j$ belong to the same SW-cluster, which is estimated by the average (7) of the indicator function

$$c_{ij} = \begin{cases} 1 & \text{if } v_i \text{ and } v_j \text{ belong to the same SW-cluster,} \\ 0 & \text{otherwise.} \end{cases}$$

$C_{ij} = \langle c_{ij} \rangle$ is the probability of finding sites $v_i$ and $v_j$ in the same SW-cluster. The spin–spin correlation function is then

$$G_{ij} = \frac{(q - 1)\, C_{ij} + 1}{q}.$$
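A sketch of this estimator, assuming one array of SW-cluster labels has been recorded per retained Monte Carlo sweep (e.g. the union-find roots from the earlier sw_update sketch, an assumed extension of that helper):

```python
import numpy as np

def correlation_estimates(label_samples, pairs, q):
    """Improved SW estimator of the spin-spin correlation G_ij.

    label_samples: one array of SW-cluster labels per retained sweep
    (points with equal labels lie in the same SW-cluster).  C_ij = <c_ij>
    is the two-point connectedness; then G_ij = ((q - 1) * C_ij + 1) / q.
    """
    C = np.zeros(len(pairs))
    for labels in label_samples:
        for k, (i, j) in enumerate(pairs):
            C[k] += labels[i] == labels[j]  # indicator c_ij for this sweep
    C /= len(label_samples)                 # the average (7)
    return ((q - 1.0) * C + 1.0) / q
```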
Description of the Algorithm
Identifying data clusters
1. Build the clusters’ “core”: if $G_{ij} > 0.5$, a link is set between the neighboring data points $v_i$ and $v_j$.
2. Capture points lying on the periphery of the clusters by linking each point $v_i$ to its neighbor $v_j$ of maximal correlation $G_{ij}$.
3. Data clusters are identified as the linked components of the graph obtained in steps 1 and 2.
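The three steps above might look as follows; a self-contained, illustrative sketch with the 0.5 threshold of step 1.

```python
def build_data_clusters(N, pairs, G, threshold=0.5):
    """Steps 1-3: core links where G_ij > threshold, attach each point to
    its maximal-correlation neighbor, then take connected components."""
    adj = [[] for _ in range(N)]
    best = [(-1.0, -1)] * N  # (max correlation, neighbor) for each point

    def link(i, j):
        adj[i].append(j)
        adj[j].append(i)

    for (i, j), g in zip(pairs, G):
        if g > threshold:            # step 1: the clusters' "core"
            link(i, j)
        if g > best[i][0]:
            best[i] = (g, j)
        if g > best[j][0]:
            best[j] = (g, i)

    for i, (g, j) in enumerate(best):  # step 2: capture peripheral points
        if j >= 0:
            link(i, j)

    # Step 3: clusters are the connected components of the linked graph.
    cluster = [-1] * N
    c = 0
    for s in range(N):
        if cluster[s] == -1:
            stack, cluster[s] = [s], c
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if cluster[v] == -1:
                        cluster[v] = c
                        stack.append(v)
            c += 1
    return cluster
```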
Applications
The Iris Data
The Iris Data
It consists of measurements of four quantities performed on each of 150 flowers. The specimens were chosen from three species of Iris. The data constitute 150 points in four-dimensional space.
The Iris Data
We determined neighbors in the D=4
dimensional space according to the
mutual K (K=5) nearest neighbors
definition.
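For the Iris case this stage could look as follows, assuming the hypothetical mutual_knn_pairs and couplings helpers sketched earlier; load_iris is from scikit-learn.

```python
from sklearn.datasets import load_iris

# The 150 Iris samples as points in a D = 4 dimensional space.
X = load_iris().data

# Mutual K-nearest-neighbor graph with K = 5, then the Gaussian
# couplings J_ij, using the helpers sketched earlier.
pairs = mutual_knn_pairs(X, K=5)
J = couplings(X, pairs)
```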
We observe that there is one well-separated cluster (corresponding to the Iris Setosa species), while the clusters corresponding to Iris Virginica and Iris Versicolor overlap.
We applied the SPC method and obtained the susceptibility curve.
[Figure: projection of the Iris data on the plane spanned by its first two principal components.]
The Iris Data
The susceptibility curve of Fig. (a) clearly shows two peaks.
When heated, the system first breaks into two clusters at T ≈ 0.01 (Fig. b).
At T = 0.02 we obtain two clusters, of sizes 80 and 40.
At T ≈ 0.06 another transition occurs, where the larger cluster splits in two.
At T = 0.07 we identified clusters of sizes 45, 40 and 38, corresponding to the species Iris Versicolor, Virginica and Setosa respectively.
The Iris Data
The Iris data break into clusters in two stages.
This reflects the fact that two of the three species are “closer” to each other than to the third one.
The SPC method clearly handles such hierarchical organization of the data very well.
The Iris Data
125 samples were classified correctly (as
compared with manual classification).
25 were left unclassified.
No further breaking of clusters was observed.
All three clusters disorder at T ≈ 0.08.
The Iris Data
Among all the clustering algorithms used in this example, the minimal spanning tree procedure obtained the most accurate result, followed by the SPC method, while the remaining clustering techniques failed to provide satisfactory results.
Advantages of the SPC Algorithm vs.
Other Methods
Other methods and their
disadvantages
These methods employ a local criterion, against
which some attribute of the local structure of the
data is tested, to construct the clusters.
Typical examples are hierarchical techniques such as
the agglomerative and divisive methods.
All these algorithms tend to create clusters even
when no natural clusters exist in the data.
Other methods and their
disadvantages
(a) high sensitivity to initialization;
(b) poor performance when the data contains
overlapping clusters;
(c) inability to handle variability in cluster shapes,
cluster densities and cluster sizes;
(d) lack of an index that could be used to determine the most significant partitions among those obtained in the entire hierarchy.
Advantages of the Method
provides information about the different self-organizing regimes of the data;
the number of “macroscopic” clusters is an output of
the algorithm;
hierarchical organization of the data is reflected in
the manner the clusters merge or split when a
control parameter (the physical temperature) is
varied;
Advantages of the Method
the results are completely insensitive to the initial
conditions;
the algorithm is robust against the presence of noise;
the algorithm is computationally efficient: the equilibration time of the spin system scales with N, the number of data points, and is independent of the embedding dimension D.