
DATA MINING
from data to information
Ronald Westra
Dep. Mathematics
Maastricht University
PROBABILISTIC MODEL-BASED
CLUSTERING USING MIXTURE
MODELS
Data Mining Lecture VI
[4.5, 8.4, 9.2, 9.6, Hand, Mannila, Smyth]
Probabilistic Model-Based
Clustering using Mixture Models
A probability mixture model

A mixture model is a formalism for modeling a probability density function as a sum of parameterized functions. In mathematical terms:

p_X(x) = \sum_{k=1}^{K} a_k \, h(x \mid \lambda_k)

where p_X(x) is the modeled probability distribution function, K is the number of components in the mixture model, and a_k is the mixture proportion of component k. By definition, 0 < a_k < 1 for all k = 1…K and:

\sum_{k=1}^{K} a_k = 1

h(x | λ_k) is a probability distribution parameterized by λ_k.

Mixture models are often used when we know h(x) and we can sample from p_X(x), but we would like to determine the a_k and λ_k values. Such situations can arise in studies in which we sample from a population that is composed of several distinct subpopulations.
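As an illustration (not part of the lecture), here is a minimal Python sketch of such a mixture for two Gaussian components; the proportions a_k and parameters λ_k = (μ_k, σ_k) are made-up values.

# Minimal sketch of a two-component Gaussian mixture density
# p_X(x) = sum_k a_k * h(x | lambda_k); all parameter values are illustrative only.
import numpy as np
from scipy.stats import norm

a = np.array([0.3, 0.7])        # mixture proportions a_k, summing to 1
mu = np.array([-2.0, 1.5])      # component means (part of lambda_k)
sigma = np.array([0.5, 1.0])    # component standard deviations (part of lambda_k)

def mixture_pdf(x):
    """Evaluate p_X(x) = sum_k a_k * N(x | mu_k, sigma_k)."""
    return sum(a_k * norm.pdf(x, m_k, s_k) for a_k, m_k, s_k in zip(a, mu, sigma))

def sample(n, rng=np.random.default_rng(0)):
    """Draw n points: pick component k with probability a_k, then sample h(x | lambda_k)."""
    k = rng.choice(len(a), size=n, p=a)
    return rng.normal(mu[k], sigma[k])

print(mixture_pdf(0.0))   # mixture density at x = 0
print(sample(5))          # five draws from the mixture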
A common approach for ‘decomposing’ a
mixture model
It is common to think of mixture modeling as a
missing data problem. One way to understand this is
to assume that the data points under consideration
have "membership" in one of the distributions we
are using to model the data. When we start, this
membership is unknown, or missing. The job of
estimation is to devise appropriate parameters for
the model functions we choose, with the connection
to the data points being represented as their
membership in the individual model distributions.
Probabilistic Model-Based
Clustering using Mixture Models
The EM-algorithm [book section 8.4]
Mixture Decomposition:
The ‘Expectation-Maximization’ Algorithm
The Expectation-maximization algorithm
computes the missing memberships of data points
in our chosen distribution model.
It is an iterative procedure, where we start with initial
parameters for our model distribution (the ak's and
λk's of the model listed above).
The estimation process proceeds iteratively in two
steps, the Expectation Step, and the Maximization
Step.
The ‘Expectation-Maximization’ Algorithm
The expectation step
With initial guesses for the parameters in our
mixture model, we compute "partial membership"
of each data point in each constituent distribution.
This is done by calculating expectation values for
the membership variables of each data point.
The ‘Expectation-Maximization’ Algorithm
The maximization step
With the expectation values in hand for group
membership, we can recompute plug-in estimates
of our distribution parameters.
For the mixing coefficient a_k, this is simply the average fractional membership of the data points in component k.
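A minimal sketch of one such iteration for a one-dimensional Gaussian mixture (the Gaussian component form, and all names below, are assumptions for illustration, not the lecture's code):

# One EM iteration for a 1-D Gaussian mixture: E-step (partial memberships)
# followed by M-step (plug-in re-estimates). Sketch only.
import numpy as np
from scipy.stats import norm

def em_step(x, a, mu, sigma):
    """x: data points, shape (n,); a, mu, sigma: current mixture parameters, shape (K,)."""
    # E-step: expected ("partial") membership of point i in component k
    w = a * norm.pdf(x[:, None], mu, sigma)            # shape (n, K)
    w /= w.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the membership-weighted data
    nk = w.sum(axis=0)                                 # effective number of points per component
    a_new = nk / len(x)                                # mixing coefficients = average membership
    mu_new = (w * x[:, None]).sum(axis=0) / nk         # weighted means
    sigma_new = np.sqrt((w * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk)
    return a_new, mu_new, sigma_new

Iterating em_step until the parameters stop changing gives the usual EM fit.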
EM-algorithm for Clustering
Suppose we have data D, a model with parameters θ, and hidden variables H.
Interpretation: H = the class label.
Log-likelihood of the observed data:

l(\theta) = \log p(D \mid \theta) = \log \sum_{H} p(D, H \mid \theta)
EM-algorithm for Clustering
where p is the probability model over the data D.
Let Q be an (unknown) distribution over the hidden variables H.
Then the log-likelihood is:

l(\theta) = \log \sum_{H} p(D, H \mid \theta)

         = \log \sum_{H} Q(H) \, \frac{p(D, H \mid \theta)}{Q(H)}

         \ge \sum_{H} Q(H) \log \frac{p(D, H \mid \theta)}{Q(H)}          [* Jensen's inequality]

         = \sum_{H} Q(H) \log p(D, H \mid \theta) + \sum_{H} Q(H) \log \frac{1}{Q(H)} \;\equiv\; F(Q, \theta)
Jensen's inequality

[Figure] For a concave function, the expected value of the function is less than the function of the expected value. The gray rectangle along the horizontal axis represents the probability distribution of x; it is uniform for simplicity, but the general idea applies to any distribution.
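For reference, the form of Jensen's inequality used in the derivation above (log is concave) is:

\log \Big( \sum_{H} Q(H) \, g(H) \Big) \;\ge\; \sum_{H} Q(H) \log g(H), \qquad \text{here } g(H) = \frac{p(D, H \mid \theta)}{Q(H)}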
EM-algorithm
So: F(Q, θ) is a lower bound on the log-likelihood function l(θ).

EM alternates between:
E-step: maximising F with respect to Q, with θ fixed, and:
M-step: maximising F with respect to θ, with Q fixed.
EM-algorithm
E-step:   Q^{(k+1)} = \arg\max_{Q} F(Q, \theta^{(k)})

M-step:   \theta^{(k+1)} = \arg\max_{\theta} F(Q^{(k+1)}, \theta)
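A standard completion of the E-step (a known EM fact, stated here to close the argument): the maximum over Q is attained at the posterior over the hidden variables, which makes the bound tight:

Q^{(k+1)}(H) = p(H \mid D, \theta^{(k)}), \qquad \text{so that} \quad F\big(Q^{(k+1)}, \theta^{(k)}\big) = l\big(\theta^{(k)}\big)

Consequently, each full EM iteration cannot decrease the observed-data log-likelihood.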
Probabilistic Model-Based
Clustering using Gaussian Mixtures
Gaussian Mixture Decomposition
Gaussian mixture decomposition is a good classifier. It allows supervised as well as unsupervised learning (finding how many classes are optimal, how they should be defined, ...). Training, however, is iterative and time-consuming.
The idea is to set the position and width of the Gaussian distribution(s) so as to optimize the coverage of the learning samples, as in the sketch below.
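A minimal unsupervised sketch, assuming scikit-learn's GaussianMixture and synthetic data (neither is part of the lecture): fit mixtures with different numbers of components and use BIC to judge how many classes are optimal.

# Gaussian mixture decomposition with scikit-learn; data and component counts are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 0.5, (300, 1)),
                    rng.normal(1.5, 1.0, (700, 1))])    # two synthetic subpopulations

# Fit candidate models with K = 1..5 components and keep the lowest-BIC one.
models = [GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 6)]
best = min(models, key=lambda m: m.bic(X))

print("chosen K:", best.n_components)
print("mixing proportions:", best.weights_)
print("component means:", best.means_.ravel())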
The End