A VIEW OF THE EM ALGORITHM THAT
JUSTIFIES INCREMENTAL, SPARSE,
AND OTHER VARIANTS
RADFORD M. NEAL
GEOFFREY E. HINTON
Presenter: 황규백
Abstract
First, the concept of the “negative free energy” F is introduced.
The E step maximizes F with respect to the distribution over the unobserved variables.
The M step maximizes F with respect to the model parameters.
From this view, it is easy to justify an incremental variant of the EM algorithm.
The same view also justifies a sparse variant and other variants.
Introduction
The EM algorithm finds maximum likelihood parameter estimates in problems where some variables are unobserved.
It can be shown that each iteration improves the true likelihood, or
leaves it unchanged.
The M step can be partially implemented.
Not maximizing, but improving
Generalized EM algorithm, ECM
The E step can also be partially implemented.
Incremental EM algorithm
• The unobserved variables are commonly independent.
Introduction (cont’d)
The view of the EM algorithm taken here:
Both steps maximize a joint function of the parameters and of the distribution over the unobserved variables.
This function is analogous to the “free energy” function used in statistical physics.
It can also be expressed in terms of the Kullback-Leibler divergence.
E step maximizes this function with respect to the distribution over
unobserved variables.
M step also maximizes this function with respect to the model
parameters.
General Theory
Notation
Z: observed variable
Y: unobserved variable
P(y, z | θ)
• The joint probability of Y and Z.
• θ is the parameter.
Given the observed data z, we wish to find the value of θ that maximizes the log likelihood, L(θ) = log P(z | θ).
EM Algorithm
The EM algorithm starts with an initial guess θ(0) at the parameter and then iterates the following two steps:
E step: Compute the distribution P(y | z, θ(t−1)) over the unobserved variables.
M step: Set θ(t) to the θ that maximizes the expected value of log P(y, z | θ) under that distribution.
Each iteration improves the true likelihood or leaves it unchanged.
The algorithm converges to a local maximum of L(θ).
The GEM algorithm, in which the M step merely improves this expectation rather than maximizing it, is also guaranteed to converge.
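To make the loop concrete, here is a minimal Python sketch of standard EM for a hypothetical mixture of two biased coins; the model, data, and variable names are my own illustration, not from the paper.

```python
# Minimal sketch (assumed toy model): standard EM for a mixture of two
# biased coins. z[i] is the number of heads in m flips; y[i] (unobserved)
# indicates which coin produced trial i.
import numpy as np

rng = np.random.default_rng(0)
m = 20                                   # flips per trial
true_p = np.array([0.3, 0.8])            # true head probabilities
y_true = rng.integers(0, 2, size=200)    # hidden coin choices
z = rng.binomial(m, true_p[y_true])      # observed head counts

alpha, p = 0.5, np.array([0.4, 0.6])     # initial guess theta(0)
for t in range(50):
    # E step: responsibilities P(y_i = k | z_i, theta(t-1))
    log_lik = z[:, None] * np.log(p) + (m - z[:, None]) * np.log1p(-p)
    log_post = np.log(np.array([1 - alpha, alpha])) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)

    # M step: maximize the expected complete-data log likelihood
    alpha = resp[:, 1].mean()
    p = (resp * z[:, None]).sum(axis=0) / (resp.sum(axis=0) * m)

print("mixing proportion:", alpha)
print("head probabilities:", p)
```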
A View of Increasing One Function
The function: F(P̃, θ) = E_P̃[log P(y, z | θ)] + H(P̃), where H(P̃) is the entropy of P̃.
Variational free energy: −F(P̃, θ) is analogous to the free energy of statistical physics.
Kullback-Leibler divergence: F(P̃, θ) = L(θ) − D(P̃ ‖ P_θ), where P_θ(y) = P(y | z, θ).
Lemma 1: For a fixed θ, F(P̃, θ) is maximized with respect to P̃ by P̃(y) = P(y | z, θ), and at this maximum F(P̃, θ) = L(θ).
Lemma 2: If F has a local (or global) maximum at (P̃*, θ*), then L has a local (or global) maximum at θ*.
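As a sanity check of this relationship (my own illustration with an assumed three-value discrete model, not from the slides), the snippet below verifies numerically that F(P̃, θ) = L(θ) − D(P̃ ‖ P(y | z, θ)), so F attains L(θ) exactly when P̃ is the posterior.

```python
# Illustration (assumed toy model): verify that
# F(P~, theta) = E_P~[log P(y, z | theta)] + H(P~)
#              = L(theta) - KL(P~ || P(y | z, theta)).
import numpy as np

# Joint P(y, z | theta) for a single observation, with y in {0, 1, 2} and z fixed.
joint = np.array([0.10, 0.25, 0.05])       # P(y, z = observed | theta)
L = np.log(joint.sum())                    # L(theta) = log P(z | theta)
posterior = joint / joint.sum()            # P(y | z, theta)

def F(p_tilde):
    """Negative free energy for a distribution p_tilde over y."""
    return np.sum(p_tilde * np.log(joint)) - np.sum(p_tilde * np.log(p_tilde))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p_tilde = np.array([0.5, 0.3, 0.2])        # an arbitrary distribution over y
print(F(p_tilde), L - kl(p_tilde, posterior))   # equal
print(F(posterior), L)                          # equal: F reaches L at the posterior
```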
The EM Algorithm from the New Point of View
E step: Set P̃(t) to the P̃ that maximizes F(P̃, θ(t−1)), i.e., P̃(t)(y) = P(y | z, θ(t−1)).
M step: Set θ(t) to the θ that maximizes F(P̃(t), θ).
The Function F and Likelihood L
Incremental Algorithms
In typical applications
Z = (Z1, ..., Zn)
Y = (Y1, ..., Yn)
P(y, z | θ) = ∏i P(yi, zi | θ)
Then, for P̃(y) = ∏i P̃i(yi), the function F decomposes as F(P̃, θ) = Σi Fi(P̃i, θ), where Fi(P̃i, θ) = E_P̃i[log P(yi, zi | θ)] + H(P̃i).
Incremental Algorithms (cont’d)
So, the incremental algorithm is:
E step: Choose a data item i to update. Set P̃j(t) = P̃j(t−1) for j ≠ i, and set P̃i(t)(yi) = P(yi | zi, θ(t−1)).
M step: Set θ(t) to the θ that maximizes F(P̃(t), θ).
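A minimal Python sketch of this loop, assuming a hypothetical two-component Gaussian mixture with unit variances (the data and names are mine, not the paper's): one item's distribution P̃i is refreshed per iteration, and the M step then uses all current P̃j.

```python
# Illustration (assumed model): naive incremental EM for a two-component
# Gaussian mixture with known unit variances. One item's responsibility is
# refreshed per iteration; the M step then uses all current responsibilities.
import numpy as np

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
n = len(z)

alpha, mu = 0.5, np.array([-1.0, 1.0])       # initial guess
resp = np.full((n, 2), 0.5)                  # current P~_i for every item

for t in range(5 * n):                       # several passes over the data
    i = t % n                                # E step: update item i only
    log_p = np.log([1 - alpha, alpha]) - 0.5 * (z[i] - mu) ** 2
    p = np.exp(log_p - log_p.max())
    resp[i] = p / p.sum()

    # M step: maximize F over theta given all current responsibilities
    alpha = resp[:, 1].mean()
    mu = (resp * z[:, None]).sum(axis=0) / resp.sum(axis=0)

print("mixing proportion:", alpha, " means:", mu)
```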
Sufficient Statistics
Vector of sufficient statistics
s(y, z) = Σi si(yi, zi)
Standard EM Algorithm using sufficient statistics
Sufficient Statistics (cont’d)
Incremental EM using sufficient statistics
Fast convergence
An intermediate variant is
• as fast as the incremental algorithm.
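The following sketch (my own illustration, same assumed two-Gaussian mixture with unit variances) shows the sufficient-statistics bookkeeping: when item i's distribution changes, its old contribution is subtracted from the running totals and its new contribution added, so the M step no longer requires a pass over all n items.

```python
# Illustration (assumed model): incremental EM via running sufficient
# statistics for a two-component Gaussian mixture with unit variances.
# Only item i's contribution to the totals changes in each iteration.
import numpy as np

rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
n = len(z)

resp = np.full((n, 2), 0.5)                  # current P~_i for every item
S_r = resp.sum(axis=0)                       # running total of responsibilities
S_rz = (resp * z[:, None]).sum(axis=0)       # running total of weighted observations
alpha, mu = 0.5, np.array([-1.0, 1.0])

for t in range(5 * n):
    i = t % n
    # E step for item i only
    log_p = np.log([1 - alpha, alpha]) - 0.5 * (z[i] - mu) ** 2
    new_r = np.exp(log_p - log_p.max())
    new_r /= new_r.sum()

    # Correct the totals: subtract the old contribution, add the new one
    S_r += new_r - resp[i]
    S_rz += (new_r - resp[i]) * z[i]
    resp[i] = new_r

    # M step from the sufficient statistics alone (cost independent of n)
    alpha = S_r[1] / n
    mu = S_rz / S_r

print("mixing proportion:", alpha, " means:", mu)
```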
An Incremental Variant of EM by Nowlan (1991)
Uses exponentially decayed sums of the sufficient statistics rather than exact totals.
Converges only to the vicinity of the correct answer, but approaches it more rapidly.
Demonstration for a Mixture Model
A simple two-Gaussian mixture
Zi: observed variable, real-valued
Yi: unobserved, binary variable
• indicates from which of two Gaussian distributions the corresponding
observed variable was generated.
The parameter is
• θ = (α, μ0, σ0, μ1, σ1)
Gaussian Mixture Model
Sufficient statistics
Standard EM vs. Incremental EM
However, one pass of incremental EM takes twice the computation time of one pass of standard EM.
Incremental EM vs. Incremental Variant
The variant and the incremental algorithm may also be combined.
A Sparse Algorithm
A “sparse” variant of the EM algorithm may be
advantageous
when the unobserved variable, Y, can take on many possible values,
but only a small set of “plausible” values have non-negligible
probability.
Update only “plausible” values.
At infrequent intervals, all probabilities are recomputed, and the
“plausible” set is changed.
In detail, the iterations are as follows.
Iterations in Sparse Algorithm
A normal iteration: for the selected items, update only the probabilities of the values in the “plausible” set (rescaled so that the set’s total probability is unchanged), then perform the M step.
A full iteration: recompute the probabilities of all values, select a new “plausible” set, then perform the M step.
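A rough sketch of the normal iteration's E step for one data item (my own illustration; the frozen-mass renormalization is an assumption about how the plausible set is handled): values outside the plausible set keep their old probabilities, and the plausible values are refreshed and rescaled so that the set's total probability is unchanged.

```python
# Illustration (assumed setup): sparse E step for one data item.
# Entries outside the 'plausible' set keep their frozen probabilities;
# entries inside are refreshed and rescaled so the set's total mass is unchanged.
import numpy as np

def sparse_e_step(old_resp, plausible, new_posterior):
    """old_resp: current P~_i over all values; plausible: boolean mask;
    new_posterior: freshly computed P(y_i | z_i, theta) over all values."""
    resp = old_resp.copy()
    frozen_mass = old_resp[plausible].sum()        # mass reserved for the plausible set
    inside = new_posterior[plausible]
    resp[plausible] = frozen_mass * inside / inside.sum()
    return resp

old_resp = np.array([0.70, 0.25, 0.03, 0.02])      # from the last full iteration
plausible = np.array([True, True, False, False])   # only the first two values are plausible
new_post = np.array([0.55, 0.40, 0.04, 0.01])      # posterior under the current parameters
print(sparse_e_step(old_resp, plausible, new_post))
```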
Other Variants
The “winner-take-all” variant of the EM algorithm
The E step puts all of the probability on the single most likely value of the unobserved variables.
This can be seen as appropriate in the early stages of maximizing F.
It is used, for example, in estimating Hidden Markov Models for speech recognition.
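As a rough illustration (mine, not from the slides), a winner-take-all E step for an assumed Gaussian mixture replaces the responsibilities with a hard assignment to the most probable component, which makes the subsequent M step resemble a k-means update.

```python
# Illustration (assumed model): winner-take-all E step for a Gaussian
# mixture -- each point is assigned entirely to its most probable component,
# instead of keeping a full distribution over components.
import numpy as np

def winner_take_all_e_step(z, means, sds, weights):
    """Return one-hot 'responsibilities': a point mass on the best component."""
    log_dens = (-0.5 * ((z[:, None] - means) / sds) ** 2
                - np.log(sds) + np.log(weights))
    best = log_dens.argmax(axis=1)                 # most likely component per point
    resp = np.zeros((len(z), len(means)))
    resp[np.arange(len(z)), best] = 1.0            # hard assignment
    return resp

z = np.array([-2.1, -1.9, 0.1, 1.8, 2.2])
resp = winner_take_all_e_step(z, means=np.array([-2.0, 2.0]),
                              sds=np.array([1.0, 1.0]),
                              weights=np.array([0.5, 0.5]))
print(resp)
```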