MLE – Part II
5. Maximum Likelihood – II
Prof. Yuille.
Stat 231. Fall 2004.
Topics
• Exponential Distributions, Sufficient
Statistics, and MLE.
• Maximum Entropy Principle.
• Model Selection.
Exponential Distributions.
• Gaussians are members of the class of exponential distributions, which have the form
$$P(x|\lambda) = \frac{1}{Z[\lambda]} \exp\Big\{\sum_a \lambda_a \phi_a(x)\Big\}.$$
• Parameters: $\lambda = (\lambda_1, \dots, \lambda_M)$.
• Statistics: $\phi(x) = (\phi_1(x), \dots, \phi_M(x))$.
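As a concrete illustration (not from the slides), a minimal Python sketch of an exponential distribution on a small discrete domain; the statistics and parameter values below are made up:

```python
import numpy as np

# Assumed toy domain x in {0, 1, 2, 3} with two statistics phi_a(x).
x = np.arange(4)
phi = np.stack([x, x ** 2], axis=1).astype(float)  # phi[x, a]
lam = np.array([0.5, -0.2])                        # parameters lambda_a (made up)

unnorm = np.exp(phi @ lam)   # exp{ sum_a lambda_a phi_a(x) }
Z = unnorm.sum()             # partition function Z[lambda]
P = unnorm / Z               # P(x | lambda), sums to 1
```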
Sufficient Statistics.
• The $\phi_a(x)$ are the sufficient statistics of the distribution.
• Knowledge of the statistics $\frac{1}{N}\sum_{i=1}^N \phi_a(x_i)$ is all we need to know about the data $\{x_i\}$. The rest is irrelevant.
• Most standard distributions can be expressed as exponentials – Gaussian, Poisson, etc.
Sufficient Statistics of Gaussian
• One-dimensional Gaussian $P(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\big\{-\frac{(x-\mu)^2}{2\sigma^2}\big\}$ and samples $x_1, \dots, x_N$.
• Sufficient statistics are $\frac{1}{N}\sum_{i=1}^N x_i$
• and $\frac{1}{N}\sum_{i=1}^N x_i^2$.
• These are sufficient to learn the parameters of the distribution from data.
MLE for Gaussian
• To estimate the parameters, maximize $\prod_{i=1}^N P(x_i|\mu,\sigma)$.
• Or equivalently, maximize the log-likelihood $\sum_{i=1}^N \log P(x_i|\mu,\sigma)$.
• The sufficient statistics are chosen so that the log-likelihood depends on the data only through them.
Sufficient Statistics for Gaussian
• The likelihood is of the form:
$$\prod_{i=1}^N P(x_i|\mu,\sigma) = \frac{1}{(\sqrt{2\pi}\,\sigma)^N}\exp\Big\{-\frac{N}{2\sigma^2}\Big(\frac{1}{N}\sum_i x_i^2 - \frac{2\mu}{N}\sum_i x_i + \mu^2\Big)\Big\},$$
which depends on the data only through the two sufficient statistics.
• Maximizing gives a Gaussian with mean $\hat\mu = \frac{1}{N}\sum_i x_i$
• and variance $\hat\sigma^2 = \frac{1}{N}\sum_i x_i^2 - \hat\mu^2$.
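A sketch, again with assumed toy data, showing that the MLE follows from the two sufficient statistics alone, and that the log-likelihood can be evaluated without touching the raw samples again:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1000)   # assumed toy samples
N = len(x)
s1, s2 = x.mean(), (x ** 2).mean()              # sufficient statistics

mu_hat = s1                   # MLE of the mean
sigma2_hat = s2 - s1 ** 2     # MLE of the variance

def loglik(mu, var):
    # Log-likelihood written purely in terms of (s1, s2).
    return -0.5 * N * (np.log(2 * np.pi * var) + (s2 - 2 * mu * s1 + mu ** 2) / var)

# The MLE beats nearby parameter values:
assert loglik(mu_hat, sigma2_hat) >= loglik(mu_hat + 0.1, sigma2_hat)
```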
Exponential Models and MLE.
• MLE corresponds to maximizing $\prod_{i=1}^N P(x_i|\lambda)$.
Equivalent to minimizing $F(\lambda) = \log Z[\lambda] - \lambda \cdot \hat\psi$,
where $\hat\psi = \frac{1}{N}\sum_{i=1}^N \phi(x_i)$ are the observed statistics.
Exponential Models and MLE.
• This minimization is a convex optimization
problem and hence has a unique solution.
But finding this solution may be difficult.
• Algorithms such as Generalized Iterative
Scaling are guaranteed to converge.
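A minimal sketch of Generalized Iterative Scaling on an assumed toy domain. GIS requires the statistics to sum to a constant C for every x, so a slack statistic is added; the target statistics are made-up feasible values:

```python
import numpy as np

phi = np.array([[1., 0.],
                [0., 1.],
                [1., 1.],
                [0., 0.]])           # phi[x, a] on x in {0,1,2,3} (assumed)
target = np.array([0.4, 0.5])        # observed statistics psi_a (assumed)

C = phi.sum(axis=1).max()            # add a slack statistic so rows sum to C
phi = np.column_stack([phi, C - phi.sum(axis=1)])
target = np.append(target, C - target.sum())

lam = np.zeros(phi.shape[1])
for _ in range(500):
    p = np.exp(phi @ lam)
    p /= p.sum()                             # current model P(x | lambda)
    lam += np.log(target / (p @ phi)) / C    # GIS update (Darroch-Ratcliff)

print(p @ phi[:, :2])                        # -> approximately [0.4, 0.5]
```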
Maximum Entropy Principle.
• An alternative way to think of Exponential
Distributions and MLE.
• Start with the statistics, and then estimate the form and the parameters of the probability distribution, using the Maximum Entropy principle.
Entropy
• The entropy of a distribution is $H[P] = -\sum_x P(x)\log P(x)$.
• Defined by Shannon as a measure of the information obtained by observing a sample from $P(x)$.
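A one-function sketch of this definition:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[P] = -sum_x P(x) log P(x), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                     # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

print(entropy([0.25] * 4))           # uniform: log 4 ~ 1.386
print(entropy([1.0, 0.0, 0.0]))      # deterministic: 0
```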
Maximum Entropy Principle
• Maximum Entropy Principle: select the distribution $P(x)$ which maximizes the entropy subject to constraints $\sum_x P(x)\phi_a(x) = \psi_a$.
• Lagrange multipliers $\{\lambda_a\}$ enforce the constraints; extremize
$$L[P,\lambda] = -\sum_x P(x)\log P(x) + \sum_a \lambda_a\Big(\sum_x P(x)\phi_a(x) - \psi_a\Big).$$
• The observed values of the statistics are $\psi_a = \frac{1}{N}\sum_{i=1}^N \phi_a(x_i)$.
Maximum Entropy
• Extremizing $L$ with respect to $P(x)$ gives the (exponential) form of the distribution:
$$P(x) = \frac{1}{Z[\lambda]}\exp\Big\{\sum_a \lambda_a \phi_a(x)\Big\}.$$
• Solving for the Lagrange parameters $\{\lambda_a\}$ ensures that the constraints are satisfied:
$$\sum_x P(x)\phi_a(x) = \psi_a.$$
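A sketch of this two-step recipe on an assumed toy problem: the max-ent distribution on a die x in {1,...,6} whose mean is constrained to 4.5, with the single Lagrange parameter fitted by gradient steps:

```python
import numpy as np

x = np.arange(1, 7, dtype=float)   # die faces
psi = 4.5                          # constraint on E[x] (assumed value)

lam = 0.0
for _ in range(1000):
    p = np.exp(lam * x)
    p /= p.sum()                   # exponential form P(x) = exp(lam*x)/Z[lam]
    lam += 0.1 * (psi - p @ x)     # adjust lam until the constraint holds

print(p, p @ x)                    # E_P[x] converges to 4.5
```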
Maximum Entropy.
• This gives the same result as MLE for Exponential
Distributions.
• Maximum Entropy + Constraints = Exponential Distribution + MLE parameters.
• The Max-Ent distribution which has the observed
sufficient statistics is the exponential distribution
with those statistics.
• Example: we can obtain a Gaussian by performing Max-Ent on the statistics $\phi_1(x) = x$ and $\phi_2(x) = x^2$.
Minimax Principle.
• Construct a distribution incrementally by increasing the number of statistics $\phi_1, \dots, \phi_M$.
• The entropy of the Max-Ent distribution with $M$ statistics is given by:
$$H(M) = \log Z[\lambda] - \sum_{a=1}^M \lambda_a \psi_a.$$
• Minimax Principle: select the statistics to
minimize the entropy of the maximum entropy
distribution. This relates to model selection.
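Continuing the dice sketch above, a quick numerical check that imposing a statistic lowers the entropy of the max-ent distribution, which is the quantity the Minimax Principle minimizes:

```python
import numpy as np

x = np.arange(1, 7, dtype=float)
p0 = np.full(6, 1 / 6)             # M = 0 statistics: max-ent is uniform

lam = 0.0                          # M = 1 statistic: mean constrained to 4.5
for _ in range(1000):
    p1 = np.exp(lam * x)
    p1 /= p1.sum()
    lam += 0.1 * (4.5 - p1 @ x)

H = lambda p: -np.sum(p * np.log(p))
print(H(p0), H(p1))                # entropy drops once the constraint is added
```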
Model Selection.
• Suppose we do not know which model
generates the data.
• Two models $M_1$ and $M_2$.
• Priors $P(M_1)$ and $P(M_2)$.
• Model selection enables us to estimate which model is most likely to have generated the data.
Model Selection.
• Calculate $P(D|M_1) = \int P(D|\theta_1, M_1)\,P(\theta_1|M_1)\,d\theta_1$.
• Compare with $P(D|M_2) = \int P(D|\theta_2, M_2)\,P(\theta_2|M_2)\,d\theta_2$.
• Observe that we must sum over all possible values of the model parameters $\theta_1$ and $\theta_2$.
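A sketch of these sums for an assumed toy setting: the data D are coin flips, M1 is a fair coin with no free parameters, and M2 has an unknown bias theta with a uniform prior that is summed out on a grid:

```python
import numpy as np

heads, tails = 9, 3                 # assumed data D: 12 flips

# M1: fair coin, no parameters to sum over.
ev1 = 0.5 ** (heads + tails)

# M2: unknown bias theta with uniform prior; sum over a grid of values.
theta = np.linspace(0.001, 0.999, 999)
lik = theta ** heads * (1 - theta) ** tails
ev2 = lik.mean()                    # approximates the integral of P(D|theta) P(theta)

print(ev1, ev2)                     # compare the two model evidences
```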
Model Selection & Minimax.
• The entropy of the Max-Ent distribution, $H(M) = \log Z[\hat\lambda] - \sum_a \hat\lambda_a \psi_a$,
• is minus the log-probability of the data (per sample) at the MLE parameters: $H(M) = -\frac{1}{N}\sum_i \log P(x_i|\hat\lambda)$.
• So the Minimax Principle is a form of model selection, but it estimates the parameters instead of summing them out.
Model Selection.
• Important Issue:
Suppose the model $M_2$ has more parameters than $M_1$. Then $M_2$ is more flexible and can fit a larger range of datasets.
• But summing over the parameters $\theta_1$ and $\theta_2$ penalizes this flexibility: a flexible model must spread its probability over many possible datasets, so it assigns less probability to any particular one.
• Gives “Occam’s Razor”, favoring the simpler model.
Model Selection.
• More advanced modeling requires
performing model selection – where the
models are complex.
• Beyond the scope of this course.