
A VIEW OF THE EM ALGORITHM THAT
JUSTIFIES INCREMENTAL, SPARSE,
AND OTHER VARIANTS
RADFORD M. NEAL
GEOFFREY E. HINTON
Presented by: 황규백
Abstract

First, the concept of “the negative free energy” is introduced.
- The E step maximizes this quantity with respect to the distribution over the unobserved variables.
- The M step maximizes it with respect to the model parameters.

This view makes it easy to justify an incremental variant of the EM algorithm.
- The same view also justifies sparse and other variants.
Introduction

EM algorithms find maximum likelihood parameter estimates in problems where some variables are unobserved.
- It can be shown that each iteration improves the true likelihood or leaves it unchanged.

The M step can be implemented partially.
- Not maximizing, but merely improving
- Generalized EM (GEM) algorithm, ECM

The E step can also be implemented partially.
- Incremental EM algorithm
  • The unobserved variables are commonly independent across data items.
Introduction (cont’d)

The view of the EM algorithm taken here:
- Both steps maximize a joint function of the parameters and of the distribution over the unobserved variables.
- This function is analogous to the “free energy” used in statistical physics.
- It can also be viewed in terms of a Kullback-Leibler divergence.
- The E step maximizes this function with respect to the distribution over the unobserved variables.
- The M step maximizes it with respect to the model parameters.
General Theory

Simple notation
- Z: observed variable
- Y: unobserved variable
- P(y, z | θ)
  • The joint probability of Y and Z.
  • θ is the parameter.

Given observed data z, we wish to find the value of θ that maximizes the log likelihood, L(θ) = log P(z | θ).
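
For reference, the log likelihood marginalizes over the unobserved variable:

  L(\theta) \;=\; \log P(z \mid \theta) \;=\; \log \sum_{y} P(y, z \mid \theta)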
EM Algorithm

EM algorithms start with an initial guess at the parameter, θ^(0), and then proceed with the following steps.
- Each iteration improves the true likelihood or leaves it unchanged.
- The algorithm converges to a local maximum of L(θ).

The GEM algorithm is also guaranteed to converge.
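
In the usual notation, iteration t of the standard algorithm consists of:

  E step:  Q(\theta \mid \theta^{(t)}) \;=\; E_{P(y \mid z,\, \theta^{(t)})}\!\left[\log P(y, z \mid \theta)\right]
  M step:  \theta^{(t+1)} \;=\; \arg\max_{\theta} Q(\theta \mid \theta^{(t)})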
A View of Increasing One Function

The function F
- The negative of the variational free energy used in statistical physics.
- Can be rewritten using the Kullback-Leibler divergence between P̃(y) and P(y) = P(y | z, θ).
Lemma 1
- For a fixed value of θ, F(P̃, θ) is maximized as a function of P̃ by the unique choice P̃(y) = P(y | z, θ).
Lemma 2
- If F(P̃, θ) has a local (or global) maximum at P̃* and θ*, then L(θ) has a local (or global) maximum at θ*.
The EM Algorithm from the New Point of View
The Function F and the Likelihood L
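
In this view, again following the paper, EM is alternating maximization of F, and F lower-bounds the log likelihood:

  E step:  \tilde{P}^{(t)} \;=\; \arg\max_{\tilde{P}} F(\tilde{P}, \theta^{(t-1)}), \quad \text{i.e. } \tilde{P}^{(t)}(y) = P(y \mid z, \theta^{(t-1)})
  M step:  \theta^{(t)} \;=\; \arg\max_{\theta} F(\tilde{P}^{(t)}, \theta)

  F(\tilde{P}, \theta) \;\le\; L(\theta), \quad \text{with equality exactly when } \tilde{P}(y) = P(y \mid z, \theta)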
Incremental Algorithms

In typical applications, the data items are independent:
- Z = (Z1, ..., Zn)
- Y = (Y1, ..., Yn)
- P(y, z | θ) = Π_i P(y_i, z_i | θ)

Then, for distributions that factor as P̃(y) = Π_i P̃_i(y_i), F decomposes into a sum of per-item terms (shown below).
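
The decomposition, written out under the factorization assumptions above:

  F(\tilde{P}, \theta) \;=\; \sum_{i} F_i(\tilde{P}_i, \theta), \qquad
  F_i(\tilde{P}_i, \theta) \;=\; E_{\tilde{P}_i}\!\left[\log P(y_i, z_i \mid \theta)\right] + H(\tilde{P}_i)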
Incremental Algorithms (cont’d)

So, the algorithm updates one data item per iteration, as written out below.
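
Following the paper, iteration t of the incremental algorithm, with i the data item selected for that iteration, is:

  E step:  \tilde{P}_i^{(t)}(y_i) \;=\; P(y_i \mid z_i, \theta^{(t-1)}), \qquad \tilde{P}_j^{(t)} = \tilde{P}_j^{(t-1)} \text{ for } j \ne i
  M step:  \theta^{(t)} \;=\; \arg\max_{\theta} F(\tilde{P}^{(t)}, \theta)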
Sufficient Statistics

Vector of sufficient statistics
 s(y, z) =i si(yi, zi)

Standard EM Algorithm using sufficient statistics
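
For exponential-family models, standard EM can be expressed through the expected sufficient statistics:

  E step:  \tilde{s}^{(t)} \;=\; E_{P(y \mid z,\, \theta^{(t-1)})}\!\left[s(y, z)\right]
  M step:  set \theta^{(t)} to the \theta with maximum likelihood given the expected sufficient statistics \tilde{s}^{(t)}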
Sufficient Statistics (cont’d)

Incremental EM using sufficient statistics
- Fast convergence: the total statistics are updated item by item instead of being recomputed in a full pass.
- An intermediate variant is about as fast as the incremental algorithm.
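
A minimal Python sketch of this bookkeeping, assuming two model-specific callables that are not from the paper: expected_stats(z_i, theta), returning E[s_i(y_i, z_i)] under P(y_i | z_i, θ), and ml_from_stats(s_total), returning the maximum-likelihood θ for the given total statistics.

def incremental_em(data, theta, expected_stats, ml_from_stats, n_passes=10):
    """Incremental EM: revisit one item per iteration, maintaining a running
    total of expected sufficient statistics instead of redoing a full E step."""
    s_i = [expected_stats(z, theta) for z in data]   # per-item expected statistics
    s_total = sum(s_i)
    for _ in range(n_passes):
        for i, z in enumerate(data):
            new_si = expected_stats(z, theta)        # partial E step: item i only
            s_total = s_total - s_i[i] + new_si      # swap old contribution for new
            s_i[i] = new_si
            theta = ml_from_stats(s_total)           # M step on the current totals
    return theta

Only the selected item’s statistics are recomputed, yet every M step still sees a total over all n items.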
An Incremental Variant of EM by Nowlan (1991)
- Replaces the exact total of sufficient statistics with an exponentially decaying sum of recently computed per-item statistics.
- Converges only to the vicinity of the correct answer, but may get there more rapidly.
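
A hypothetical sketch of the decayed update, reusing the same placeholder callables; the decay rate gamma and the rescaling by n are illustrative choices, not taken from Nowlan (1991).

def decayed_em(data, theta, expected_stats, ml_from_stats, gamma=0.99, n_passes=10):
    """Nowlan-style variant: old contributions fade geometrically instead of
    being subtracted exactly when an item's statistics are recomputed."""
    n = len(data)
    s_decayed = sum(expected_stats(z, theta) for z in data)  # start from a full E step
    for _ in range(n_passes):
        for z in data:
            s_new = expected_stats(z, theta)
            # Rescaling by n keeps the decayed sum comparable to a total over n items.
            s_decayed = gamma * s_decayed + (1.0 - gamma) * n * s_new
            theta = ml_from_stats(s_decayed)
    return theta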
Demonstration for a Mixture Model

A simple mixture of two Gaussians
- Z_i: observed, real-valued variable
- Y_i: unobserved, binary variable
  • indicates which of the two Gaussian distributions generated the corresponding observed variable.
- The parameter vector θ is
  • (α, μ0, σ0, μ1, σ1): the mixing proportion and the means and standard deviations of the two components.
Gaussian Mixture Model

Sufficient statistics
Standard EM vs. Incremental EM
- But the computation time for one pass of incremental EM is about twice that of standard EM.
Incremental EM vs. Incremental Variant
- The variant and the incremental algorithm might also be combined.
A Sparse Algorithm

A “sparse” variant of the EM algorithm may be advantageous
- when the unobserved variable Y can take on many possible values, but only a small set of “plausible” values has non-negligible probability.

Update only the probabilities of the “plausible” values.
- At infrequent intervals, all probabilities are recomputed and the “plausible” set is revised.

In detail, the passes mix cheap “normal” iterations with occasional “full” iterations, described next.
Iterations in Sparse Algorithm

A normal iteration
- Updates only the probabilities of values in the plausible set, renormalizing them so that the total probability of the set stays at its frozen value; probabilities outside the set are left untouched.

A full iteration
- Recomputes all probabilities exactly and selects a new plausible set.
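
A hypothetical Python sketch of the two kinds of E-step update for a single data item with K discrete latent values; posterior_fn(k, theta) is assumed to return the unnormalized posterior P(y = k, z | θ) and is not from the paper.

import numpy as np

def full_e_step(posterior_fn, K, theta, keep=5):
    """Full iteration: recompute all K probabilities and choose a new plausible set."""
    p = np.array([posterior_fn(k, theta) for k in range(K)])
    p /= p.sum()                              # exact posterior over all values
    plausible = np.argsort(p)[-keep:]         # indices of the most probable values
    return p, plausible

def normal_e_step(p, plausible, posterior_fn, theta):
    """Normal iteration: update probabilities only inside the plausible set,
    keeping its total mass (and all outside probabilities) frozen."""
    frozen_mass = p[plausible].sum()
    q = np.array([posterior_fn(k, theta) for k in plausible])
    updated = p.copy()
    updated[plausible] = frozen_mass * q / q.sum()   # renormalize within the set
    return updated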
Other Variants

“Winner-take-all” variant of the EM algorithm
- The E step puts all of P̃’s mass on the single most probable value of y.
- May be useful in the early stages of maximizing F.
- Used when estimating Hidden Markov Models for speech recognition (Viterbi-style training).
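
For contrast, a tiny illustrative sketch of the soft versus winner-take-all E step for one item with K discrete latent values, reusing the assumed posterior_fn helper from the sparse sketch above.

import numpy as np

def soft_e_step(posterior_fn, K, theta):
    """Standard E step: the full posterior distribution over the K latent values."""
    p = np.array([posterior_fn(k, theta) for k in range(K)])
    return p / p.sum()

def winner_take_all_e_step(posterior_fn, K, theta):
    """Winner-take-all E step: all probability mass on the most probable value."""
    p = np.array([posterior_fn(k, theta) for k in range(K)])
    hard = np.zeros(K)
    hard[np.argmax(p)] = 1.0
    return hard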