Lecture 1: Introduction - City University of New York

Lecture 23: Statistical
Estimation with Sampling
Machine Learning
Iain Murray’s MLSS lecture on videolectures.net:
http://videolectures.net/mlss09uk_murray_mcmc/
Today
• In service of EM in graphical models
• Sampling
– Technique to approximate the expected value
of a distribution
• Gibbs Sampling
– Sampling of latent variables in a Graphical
Model
1
What is the average height of
professors of CS at Queens
College?
• What’s the size of C?
2
What is the average height of
students at Queens College?
• What’s the size of C?
3
What is the average height of
people in Queens?
4
So we’re comfortable
approximating statistical
parameters…
• Why don’t we use this to do inference in
complicated Graphical Models?
• or where it is difficult to count everything?
5
Statistical sampling
• Make a prediction about variable, x, based
on data D.
6
Expected Values
• Want to know the expected
value of a distribution.
– E[p(t | x)] is a classification
problem
• We can calculate p(x), but
integration is difficult.
• Given a graphical model
describing the relationship
between variables, we’d like
to generate E[p(x)] where x
is only partially observed.
7
Sampling
• We have a representation of p(x) and f(x), but
integration is intractable
• E[f] is difficult as an integral, but easy as a sum.
• Randomly select points from distribution p(x) and
use these as representative of the distribution of
f(x).
• It turns out that if correctly sampled, only 10-20
points can be sufficient to estimate the mean and
variance of a distribution.
– Samples must be independently drawn
– Expectation may be dominated by regions of high
probability, or high function values
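The "easy as a sum" idea can be sketched as follows. This is a minimal illustration, not the lecture's own code: the names `mc_expectation` and `standard_normal`, and the choice of a standard normal for p(x) with f(x) = x^2, are assumptions made for the example.

```python
import random


def mc_expectation(f, sampler, n_samples, seed=0):
    """Approximate E[f(x)] by averaging f over independent draws from p(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sampler(rng)  # one independent draw from p(x)
        total += f(x)
    return total / n_samples


def standard_normal(rng):
    """Assumed example distribution: p(x) = N(0, 1)."""
    return rng.gauss(0.0, 1.0)


# With f(x) = x^2 the exact expectation under N(0, 1) is 1.
# Twenty samples give a rough estimate; more samples tighten it.
rough = mc_expectation(lambda x: x * x, standard_normal, 20)
better = mc_expectation(lambda x: x * x, standard_normal, 100_000)
```

The integral is never evaluated; only the ability to draw from p(x) and evaluate f(x) is needed.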
8
Monte Carlo Example
• Sampling techniques to solve difficult
integration problems.
• What is the area of a circle with radius 1?
– What if you don’t know trigonometry?
9
Monte Carlo Estimation
• How can we approximate the
area of a circle if we have no
trigonometry?
• Take a random x and a random y
between -1 and 1
– Sample from x and sample from y.
• Determine if x^2 + y^2 ≤ 1
• Repeat many times.
• Count the number of times that
the inequality is true.
• Divide the count by the number of samples, then
multiply by the area of the square (4)
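The procedure above can be sketched in a few lines. The function name `estimate_circle_area` is an assumption for this example. The fraction of points that land inside the circle is scaled by the area of the enclosing square, which is 4 for x and y in [-1, 1].

```python
import random


def estimate_circle_area(n_samples, seed=0):
    """Estimate the area of the unit circle by sampling the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # the point falls inside the circle
            inside += 1
    square_area = 4.0
    # fraction of hits times the area of the sampled region
    return square_area * inside / n_samples


area = estimate_circle_area(200_000)
```

The estimate approaches pi as the number of samples grows, with error shrinking roughly as one over the square root of the sample count.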
10
How is sampling used in EM?
• E-Step
– what are the responsibilities in GMM?
– p(x_hidden | x_observed)
• M-Step
– Reestimate parameters based on a convex
optimization.
– Get new parameters
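One way sampling could enter the E-step is to draw the hidden component assignment from the responsibilities instead of carrying the full posterior forward. The sketch below assumes a one-dimensional GMM; the function names and the parameter values in the usage lines are hypothetical and not taken from the lecture.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """p(component k | x) for each k: the GMM E-step posterior over the hidden variable."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def sample_assignment(rng, x, weights, means, stds):
    """Draw a hidden assignment from p(x_hidden | x_observed)."""
    u = rng.random()
    cumulative = 0.0
    resp = responsibilities(x, weights, means, stds)
    for k, r_k in enumerate(resp):
        cumulative += r_k
        if u < cumulative:
            return k
    return len(resp) - 1  # guard against floating-point round-off


# Hypothetical two-component mixture.
rng = random.Random(0)
z = sample_assignment(rng, 0.3, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The M-step would then re-estimate the parameters from the sampled assignments.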
11
Sampling in a Graphical Model
• Sample each variable without parents from its marginal
• Sample children after parents
[Figure: graphical model over the variables A, B, C, D, E]
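Sampling parents before children can be sketched as below. The edges of the slide's graph are not recoverable from the transcript, so the structure A -> C <- B, C -> D, C -> E and every probability in the tables are assumptions chosen only to make the example concrete.

```python
import random

# Hypothetical structure and conditional probability tables (binary variables).
P_A = 0.3                       # p(A = 1)
P_B = 0.6                       # p(B = 1)
P_C_GIVEN_AB = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # p(C = 1 | A, B)
P_D_GIVEN_C = {0: 0.2, 1: 0.7}  # p(D = 1 | C)
P_E_GIVEN_C = {0: 0.5, 1: 0.1}  # p(E = 1 | C)


def bernoulli(rng, p):
    return 1 if rng.random() < p else 0


def ancestral_sample(rng):
    """Draw one joint sample, visiting each variable only after its parents."""
    a = bernoulli(rng, P_A)                    # roots: sample from the marginal
    b = bernoulli(rng, P_B)
    c = bernoulli(rng, P_C_GIVEN_AB[(a, b)])   # child of A and B
    d = bernoulli(rng, P_D_GIVEN_C[c])         # children of C
    e = bernoulli(rng, P_E_GIVEN_C[c])
    return {"A": a, "B": b, "C": c, "D": d, "E": e}


rng = random.Random(0)
samples = [ancestral_sample(rng) for _ in range(50_000)]
```

Each draw is an independent sample from the joint distribution, so marginals can be estimated by simple counting.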
12
How do you sample from a
distribution???
• Known algorithms
• Use this book:
http://luc.devroye.org/rnbookindex.html
13
Basic Algorithm
[Figure: values x1, x2, x3, x4]
• Sample u uniformly from [0, 1].
• The probability mass to the left of x, h(x), is
uniformly distributed on [0, 1].
14
Basic Algorithm
[Figure: values x1, x2, x3, x4, with a vertical axis up to 1]
• y(u) = h^{-1}(u)
• h is not always easy to calculate or invert
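A case where h is easy to invert is the exponential distribution, used here as an assumed example: h(y) = 1 - exp(-rate * y), so h^{-1}(u) = -ln(1 - u) / rate. The function name `sample_exponential` is hypothetical.

```python
import math
import random


def sample_exponential(rng, rate):
    """Inverse-transform sample: push a uniform draw through the inverse CDF."""
    u = rng.random()                   # u is uniform on [0, 1)
    return -math.log(1.0 - u) / rate   # y = h^{-1}(u); 1 - u is never zero


rng = random.Random(0)
draws = [sample_exponential(rng, 2.0) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # the exact mean is 1 / rate = 0.5
```

When no closed form for the inverse exists, the methods on the following slides avoid the inversion entirely.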
15
Rejection Sampling
• The distribution p(x) is easy to evaluate
– As in a graphical model representation
• But difficult to integrate.
• Identify a simpler distribution q(x) and a constant k such
that kq(x) ≥ p(x) for all x, and draw a sample, x0, from q(x).
– q(x) is called the proposal distribution.
• Generate another sample u from a uniform
distribution between 0 and kq(x0).
– If u ≤ p(x0) accept the sample
• E.g. use it in the calculation of an expectation of f
– Otherwise reject the sample
• E.g. omit from the calculation of an expectation of f
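The accept/reject steps can be sketched as follows. The target p(x) = 6x(1 - x) on [0, 1], the uniform proposal q, and the constant k = 1.5 (the maximum of p) are assumptions chosen for the example.

```python
import random


def p(x):
    """Assumed target density on [0, 1] (a Beta(2, 2) density)."""
    return 6.0 * x * (1.0 - x)


def q_pdf(x):
    """Assumed proposal density: uniform on [0, 1]."""
    return 1.0


K = 1.5  # smallest constant with K * q(x) >= p(x) everywhere on [0, 1]


def rejection_sample(rng):
    """Return one sample from p by proposing from q until a proposal is accepted."""
    while True:
        x0 = rng.random()                       # sample x0 from q
        u = rng.uniform(0.0, K * q_pdf(x0))     # uniform height under K * q(x0)
        if u <= p(x0):                          # the point lies under p: accept
            return x0
        # otherwise reject and try again


rng = random.Random(0)
accepted = [rejection_sample(rng) for _ in range(50_000)]
```

On average a fraction 1/K of proposals is accepted, so a loose bound wastes more samples.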
16
Rejection Sampling Example
17
Importance Sampling
• One problem with rejection
sampling is that you lose
information when throwing out
samples.
• If we are only looking for the
expected value of f(x), we can
incorporate unlikely samples of x
in the calculation.
• Again use a proposal
distribution to approximate the
expected value.
– Weight each sample x drawn from q by
the ratio p(x)/q(x), i.e. how much more
likely it is under p than under q.
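A minimal sketch of the weighting, assuming a target p = N(0, 1) and a wider proposal q = N(0, 2^2); both choices and the function names are illustrative. Every sample is kept, and each contributes f(x) scaled by its weight p(x)/q(x).

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(f, n_samples, seed=0):
    """Estimate E_p[f(x)] using draws from the proposal q instead of p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 2.0)                                     # draw from q
        weight = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # p(x) / q(x)
        total += weight * f(x)
    return total / n_samples


# E_p[x^2] is exactly 1 for p = N(0, 1).
estimate = importance_estimate(lambda x: x * x, 200_000)
```

This form assumes p is normalized. The proposal should be at least as wide as the target, otherwise a few large weights dominate the estimate.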
18
Graphical Example of
Importance Sampling
19
Markov Chain Monte Carlo
• Markov Chain:
– p(x1|x2,x3,x4,x5,…) = p(x1|x2)
• For MCMC sampling start in a state z(0).
• At each step, draw a candidate sample z* based on the
previous state z(m)
• Accept this step with some probability based on a
proposal distribution.
– If the step is accepted: z(m+1) = z*
– Else: z(m+1) = z(m)
• Or only accept if the sample is consistent with an
observed value
20
Markov Chain Monte Carlo
• Goal: p(z(m)) = p*(z) as m →∞
– MCMCs that have this property are called ergodic.
– Implies that the sampled distribution converges to the
true distribution
• Need to define a transition function to move from
one state to the next.
– How do we draw a sample at state m+1 given state
m?
– Often, z(m+1) is drawn from a Gaussian with mean z(m)
and a constant variance.
21
Markov Chain Monte Carlo
• Goal: p(z(m)) = p*(z) as m →∞
– MCMCs that have this property are ergodic.
• Transitions that satisfy detailed balance leave
p*(z) invariant, so an ergodic chain built from
them converges to p*(z).
– Such chains are also called reversible.
22
Metropolis-Hastings Algorithm
• Assume the current state is z(m).
• Draw a sample z* from q(z|z(m))
• Accept z* with probability
A(z*, z(m)) = min(1, [p(z*) q(z(m)|z*)] / [p(z(m)) q(z*|z(m))])
• Often use a normal distribution for q
– Tradeoff between convergence and
acceptance rate based on variance.
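The algorithm can be sketched with a Gaussian random-walk proposal. Because that proposal is symmetric, the q terms in the acceptance ratio cancel and only the ratio of target densities remains (the Metropolis special case). The unnormalized target exp(-z^2 / 2) and the function names are assumptions for the example.

```python
import math
import random


def log_target(z):
    """Assumed unnormalized log density: a standard normal up to a constant."""
    return -0.5 * z * z


def metropolis_hastings(n_steps, step_std=1.0, z0=0.0, seed=0):
    """Random-walk sampler; returns the chain z(1), ..., z(n_steps)."""
    rng = random.Random(seed)
    z = z0
    chain = []
    for _ in range(n_steps):
        z_star = rng.gauss(z, step_std)  # candidate drawn from q(z | z(m))
        # log of min(1, p(z*) / p(z(m))); the symmetric q cancels
        log_accept = min(0.0, log_target(z_star) - log_target(z))
        if rng.random() < math.exp(log_accept):
            z = z_star                   # accept: move to the candidate
        chain.append(z)                  # on rejection the old state is repeated
    return chain


chain = metropolis_hastings(100_000)
kept = chain[1_000:]                     # discard an initial burn-in period
```

A small `step_std` accepts almost every candidate but explores slowly; a large one proposes big moves that are mostly rejected.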
23
Gibbs Sampling
• We’ve been treating z as a vector to be
sampled as a whole
• However, in high dimensions, the acceptance
probability becomes vanishingly small.
• Gibbs sampling allows us to sample one
variable at a time, based on the other
variables in z.
24
Gibbs sampling
• Assume a distribution over 3 variables.
• Generate a new sample for each variable
conditioned on all of the other variables.
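One sweep over three variables can be sketched as below. The joint table over three binary variables is hypothetical; each variable is resampled in turn from its conditional given the current values of the other two.

```python
import random

# Hypothetical unnormalized joint over three binary variables (a, b, c).
JOINT = {
    (0, 0, 0): 4.0, (0, 0, 1): 1.0, (0, 1, 0): 1.0, (0, 1, 1): 2.0,
    (1, 0, 0): 1.0, (1, 0, 1): 2.0, (1, 1, 0): 2.0, (1, 1, 1): 7.0,
}


def sample_conditional(rng, state, index):
    """Resample state[index] given the current values of the other two variables."""
    as_zero = list(state)
    as_zero[index] = 0
    as_one = list(state)
    as_one[index] = 1
    w0 = JOINT[tuple(as_zero)]
    w1 = JOINT[tuple(as_one)]
    # the conditional is the joint restricted to the two candidate values
    state[index] = 1 if rng.random() < w1 / (w0 + w1) else 0


def gibbs(n_sweeps, seed=0):
    """Run n_sweeps full sweeps and record the state after each sweep."""
    rng = random.Random(seed)
    state = [0, 0, 0]  # arbitrary initial state
    samples = []
    for _ in range(n_sweeps):
        for index in range(3):
            sample_conditional(rng, state, index)
        samples.append(tuple(state))
    return samples


samples = gibbs(100_000)
```

Every update is accepted, which is why Gibbs sampling sidesteps the shrinking acceptance probability of whole-vector proposals.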
25
Gibbs Sampling in a Graphical
Model
• The appeal of Gibbs sampling in a graphical
model is that the conditional distribution of a
variable depends only on its Markov blanket: its
parents, its children, and its children's other parents.
• Gibbs sampling fixes n-1 variables, and
generates a sample for the nth.
• If each of the variables is assumed to have an
easily sampled conditional distribution, we can just
sample from the conditionals given by the
graphical model, starting from some initial state.
26
Gibbs Sampling
[Figure: graphical model over the variables A, B, C, D, E]
• Fix 4 variables, sample 5th
• repeat until convergence
27
Next Time
• Perceptrons
• Neural Networks
28