• Motivation comes from EM, where in the E-step we need
to do inference which is often intractable.
• Approximate techniques include: Laplace approx.
(Gaussian around the mode), mean field (factorized
ansatz), loopy belief propagation – all are biased but are
• Sampling is asymptotically unbiased but is typically slow.
• Ancestral sampling: top-down sampling from Bayes’ net.
• Sampling by transformation: try to find a transformation
f(.) such that x = f(y) and y is simple to sample from.
• Rejection sampling: find a curve f(x) >= p(x) and sample
from f(x). Then accept a fraction p(x)/f(x).
• Many more methods that we don’t have time for. Most
methods however break down in high dimensions: curse
MCMC & Gibbs
• We give up that all samples have to be independent.
• Sample a new sample given the previous one according to
some proposal distribution S(x’|x).
• We only accept the new sample x’ with a certain probability
• T(x,x’) = S(x’|x) X A(x,x’) is chosen such that the samples
eventually come from the desired distribution p(x).
• p(x) should be the invariant distribution of the kernel T, which is
unique of the chain is ergodic (aperiodic & irreducible).
• Detailed balance is a condition that is sufficient and is easier to
MH & Gibbs
• Metropis Hastings Markov Chain Monte Carlo sampler
always accepts of the prob. of x’ is larger and reject with
a fraction p(x’)/p(x) (in case of symmetric S).
• Gibbs sampling is MH method that always accepts and
uses the conditional distributions to sample:
X, Y ~P(X,Y) iterate X~P(X|Y) & Y~P(Y|X).
• MCMC has problems if the random variables are highly
dependent and if there are multiple modes. In both cases
it takes very long to explore all of the space that has
significant probability. This is called slow mixing.