Perceptual and Sensory Augmented Computing, Machine Learning, Summer'10
Machine Learning – Lecture 16
Approximate Inference
07.07.2010
Bastian Leibe
RWTH Aachen
http://www.mmp.rwth-aachen.de
[email protected]
Announcements
• Exercise 5 is available
Junction tree algorithm
st-mincut
Graph cuts
Sampling
MCMC
B. Leibe
2
Course Outline
• Fundamentals (2 weeks)
Bayes Decision Theory
Probability Density Estimation
• Discriminative Approaches (4 weeks)
Lin. Discriminants, SVMs, Boosting
Dec. Trees, Random Forests, Model Sel.
• Generative Models (5 weeks)
Bayesian Networks + Applications
Markov Random Fields + Applications
Exact Inference
Approximate Inference
• Unifying Perspective (1 week)
Recap: MRF Structure for Images
• Basic structure
Noisy observations
“True” image content
• Two components
Observation model
– How likely is it that node xi has label Li given observation yi?
– This relationship is usually learned from training data.
Neighborhood relations
– Simplest case: 4-neighborhood
– Serve as smoothing terms.
Discourage neighboring pixels from taking different labels.
– This can either be learned or be set to fixed “penalties”.
Recap: How to Set the Potentials?
• Unary potentials
E.g. color model, modeled with a Mixture of Gaussians
\phi(x_i, y_i; \theta_\phi) = \log \sum_k \theta_\phi(x_i, k)\, p(k \mid x_i)\, \mathcal{N}(y_i; \bar{y}_k, \Sigma_k)

Learn color distributions for each label.
[Figure: the learned unary potentials \phi(x_p = 1, y_p) and \phi(x_p = 0, y_p) plotted over the observation value y_p]
Recap: How to Set the Potentials?
• Pairwise potentials
Potts Model
\psi(x_i, x_j; \theta_\psi) = \theta_\psi\, \delta(x_i \neq x_j)
– Simplest discontinuity preserving model.
– Discontinuities between any pair of labels are penalized equally.
– Useful when labels are unordered or number of labels is small.
Extension: “contrast sensitive Potts model”
\psi(x_i, x_j, g_{ij}(y); \theta_\psi) = \theta_\psi\, g_{ij}(y)\, \delta(x_i \neq x_j)

where

g_{ij}(y) = \exp\left(-\frac{\|y_i - y_j\|^2}{2\, \mathrm{avg}\|y_i - y_j\|^2}\right)
– Discourages label changes except in places where there is also a
large change in the observations.
Recap: Graph Cuts for Binary Problems
[Figure: s-t graph over the image pixels; t-links carry the unary costs D_p(s) and D_p(t), n-links carry the pairwise weights w_{pq}; a cut separates the source s from the sink t]

"Expected" intensities of object and background, I^s and I^t, can be re-estimated:

D_p(s) \propto \exp\left(-\|I_p - I^s\|^2 / 2\sigma^2\right)
D_p(t) \propto \exp\left(-\|I_p - I^t\|^2 / 2\sigma^2\right)

EM-style optimization
Slide credit: Yuri Boykov
[Boykov & Jolly, ICCV’01]
Recap: s-t-Mincut Equivalent to Maxflow
Augmenting-path based algorithms (initially, flow = 0):
[Figure: example flow network from the source over nodes v_1 and v_2 to the sink, with edge capacities 2, 5, 9, 1, 2, 4]
1. Find a path from source to sink with positive capacity.
2. Push the maximum possible flow through this path.
3. Repeat until no such path can be found.
The algorithms assume non-negative capacities.
Slide credit: Pushmeet Kohli
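The three augmenting-path steps above can be sketched as a small BFS-based (Edmonds-Karp style) max-flow routine. The node names and capacities in the example graph below are illustrative, as is the dict-of-dicts graph representation; this is not the specialized solver used in practice for vision problems.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly find a shortest augmenting path (BFS)
    with positive residual capacity and push the bottleneck flow."""
    # residual[u][v] holds the remaining capacity on edge (u, v)
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        # 1. Find a path from source to sink with positive capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:          # 3. no augmenting path left -> done
            return flow
        # 2. Push the maximum possible flow through this path
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Small illustrative network (capacities chosen for the example only)
graph = {"s": {"v1": 9, "v2": 5}, "v1": {"v2": 2, "t": 2}, "v2": {"t": 4}}
print(max_flow(graph, "s", "t"))
```

For real MRF instances, specialized max-flow solvers (e.g. the Boykov-Kolmogorov algorithm) are used instead; this sketch only illustrates the three steps.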
Recap: When Can s-t Graph Cuts Be Applied?
Regional term E_p(L_p) (t-links), boundary term E(L_p, L_q) (n-links):

E(L) = \sum_{p} E_p(L_p) + \sum_{pq \in N} E(L_p, L_q), \qquad L_p \in \{s, t\}

• s-t graph cuts can only globally minimize binary energies that are submodular. [Boros & Hammer, 2002; Kolmogorov & Zabih, 2004]
E(L) can be minimized by s-t graph cuts iff the submodularity ("convexity") condition holds:

E(s, s) + E(t, t) \leq E(s, t) + E(t, s)

• Submodularity is the discrete equivalent to convexity.
It implies that every local energy minimum is a global minimum.
The solution will be globally optimal.
Recap: Simple Binary Image Denoising Model
• MRF Structure
Noisy observations, an observation process, and the "true" image content, with x_i, y_i \in \{-1, 1\} and "smoothness constraints" between neighboring pixels.

Example: simple energy function ("Potts model")

E(x, y) = h \sum_i x_i + \beta \sum_{\{i,j\}} \delta(x_i \neq x_j) - \eta \sum_i x_i y_i

(prior, smoothness, and observation terms)
– Smoothness term: fixed penalty \beta if neighboring labels disagree.
– Observation term: fixed penalty \eta if label and observation disagree.
Image source: C. Bishop, 2006
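As a minimal sketch, the energy E(x, y) above can be evaluated directly on a small binary image with a 4-neighborhood; the function name and the parameter values h, β, η below are illustrative choices, not values from the lecture.

```python
import numpy as np

def ising_energy(x, y, h=0.0, beta=1.0, eta=2.1):
    """E(x,y) = h*sum_i x_i + beta*sum_{i,j} delta(x_i != x_j) - eta*sum_i x_i*y_i
    with labels x_i, y_i in {-1, +1} and a 4-neighborhood."""
    prior = h * x.sum()
    # count label disagreements between horizontal and vertical neighbors
    smooth = beta * ((x[:, 1:] != x[:, :-1]).sum() + (x[1:, :] != x[:-1, :]).sum())
    obs = -eta * (x * y).sum()
    return prior + smooth + obs

# A noisy observation y and two candidate labelings
y = np.array([[1, 1, -1], [1, 1, 1]])
x_smooth = np.ones_like(y)   # all +1: pays eta once, no smoothness cost
x_copy = y.copy()            # copy of y: pays beta at each label change
print(ising_energy(x_smooth, y), ising_energy(x_copy, y))
```

Comparing the two energies shows the trade-off the MRF encodes: a larger β favors the smooth labeling, a larger η favors copying the observations.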
Converting an MRF into an s-t Graph
• Conversion:
[Figure: s-t graph; each pixel node x_i is connected to the source (label L_i = 1) with weight h - \eta y_i and to the sink (label L_i = -1) with weight -h + \eta y_i]

• Energy:

E(x, y) = h \sum_i x_i + \beta \sum_{\{i,j\}} \delta(x_i \neq x_j) - \eta \sum_i x_i y_i

Unary potentials are straightforward to set.
– How?
– Just insert x_i = 1 and x_i = -1 into the unary terms above...
Converting an MRF into an s-t Graph
• Conversion:
[Figure: the same s-t graph, now with an n-link of weight \beta between neighboring pixel nodes x_i and x_j]

• Energy:

E(x, y) = h \sum_i x_i + \beta \sum_{\{i,j\}} \delta(x_i \neq x_j) - \eta \sum_i x_i y_i

Unary potentials are straightforward to set. How?
Pairwise potentials are more tricky, since we don't know x_i!
– Trick: the pairwise energy only has an influence if x_i \neq x_j.
– (Only!) in this case, the cut will go through the edge \{x_i, x_j\}.
Dealing with Non-Binary Cases
• Limitation to binary energies is often a nuisance.
E.g. binary segmentation only…
• We would also like to solve multi-label problems.
The bad news: Problem is NP-hard with 3 or more labels!
• There exist some approximation algorithms which extend graph cuts to the multi-label case:
α-Expansion
αβ-Swap
• They are no longer guaranteed to return the globally optimal result.
But α-Expansion has a guaranteed approximation quality (2-approximation) and converges in a few iterations.
α-Expansion Move
• Basic idea:
Break the multi-way cut computation into a sequence of binary s-t cuts, in which a label α competes against all other labels.
No longer a globally optimal result, but guaranteed approximation quality, and it typically converges in few iterations.
Slide credit: Yuri Boykov
α-Expansion Algorithm
1. Start with any initial solution.
2. For each label "α" in any (e.g. random) order:
   1. Compute the optimal α-expansion move (s-t graph cuts).
   2. Decline the move if there is no energy decrease.
3. Stop when no expansion move would decrease the energy.
Slide credit: Yuri Boykov
Example: Stereo Vision
[Figure: original pair of "stereo" images, the computed depth map, and the ground truth]
Slide credit: Yuri Boykov
α-Expansion Moves
• In each α-expansion a given label "α" grabs space from other labels.
[Figure: initial solution followed by a sequence of α-expansion moves for different labels]
For each move, we choose the expansion that gives the largest decrease in the energy: a binary optimization problem.
Slide credit: Yuri Boykov
GraphCut Applications: “GrabCut”
• Interactive Image Segmentation [Boykov & Jolly, ICCV’01]
Rough region cues sufficient
Segmentation boundary can be extracted from edges
• Procedure
User marks foreground and background regions with a brush.
This is used to create an initial segmentation
which can then be corrected by additional brush strokes.
[Figure: user segmentation cues (brush strokes) and additional segmentation cues for refinement]
Slide credit: Matthieu Bray
Iterated Graph Cuts
[Figure: foreground and background color models in R-G color space, modeled as Mixtures of Gaussians; the segmentation result; and a plot of the energy after each of iterations 1-4]
Slide credit: Carsten Rother
GrabCut: Example Results
Applications: Interactive 3D Segmentation
Slide credit: Yuri Boykov
[Y. Boykov, V. Kolmogorov, ICCV’03]
Topics of This Lecture
• Approximate Inference
Variational methods
Sampling approaches
• Sampling approaches
Sampling from a distribution
Ancestral Sampling
Rejection Sampling
Importance Sampling
• Markov Chain Monte Carlo
Markov Chains
Metropolis Algorithm
Metropolis-Hastings Algorithm
Gibbs Sampling
Approximate Inference
• Exact Bayesian inference is often intractable.
Often infeasible to evaluate the posterior distribution or to compute expectations w.r.t. it.
– E.g. because the dimensionality of the latent space is too high.
– Or because the posterior distribution has too complex a form.
Problems with continuous variables
– Required integrations may not have closed-form solutions.
Problems with discrete variables
– Marginalization involves summing over all possible configurations of
the hidden variables.
– There may be exponentially many such states.
We need to resort to approximation schemes.
Two Classes of Approximation Schemes
• Deterministic approximations (Variational methods)
Based on analytical approximations to the posterior distribution
– E.g. by assuming that it factorizes in a certain form
– Or that it has a certain parametric form (e.g. a Gaussian).
Can never generate exact results, but are often scalable to large
applications.
• Stochastic approximations (Sampling methods)
Given infinite computational resources, they can generate exact results.
Approximation arises from the use of a finite amount of
processor time.
Enable the use of Bayesian techniques across many domains.
But: computationally demanding, often limited to small-scale
problems.
Topics of This Lecture
• Approximate Inference
Variational methods
Sampling approaches
• Sampling approaches
Sampling from a distribution
Ancestral Sampling
Rejection Sampling
Importance Sampling
• Markov Chain Monte Carlo
Markov Chains
Metropolis Algorithm
Metropolis-Hastings Algorithm
Gibbs Sampling
Sampling Idea
• Objective:
Evaluate expectation of a function f(z)
w.r.t. a probability distribution p(z).
• Sampling idea
Draw L independent samples z(l) with l = 1,…,L from p(z).
This allows the expectation to be approximated by a finite sum:

\hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})

As long as the samples z^{(l)} are drawn independently from p(z), then \mathbb{E}[\hat{f}] = \mathbb{E}[f].
Unbiased estimate, independent of the dimensionality of z!
Slide adapted from Bernt Schiele
Sampling – Challenges
• Problem 1: Samples might not be independent
Effective sample size might be much smaller than apparent
sample size.
• Problem 2:
If f(z) is small in regions where p(z) is large and vice versa, the
expectation may be dominated by regions of small probability.
Large sample sizes necessary to achieve sufficient accuracy.
Parametric Density Model
• Example:
A simple multivariate (D-dimensional) Gaussian model:

p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\}
This is a “generative” model
in the sense that we can generate
samples x according to the
distribution.
Slide adapted from Bernt Schiele
Sampling from a Gaussian
• Given: a 1-dim. Gaussian pdf (probability density function) p(x \mid \mu, \sigma^2) and the corresponding cumulative distribution:

F_{\mu,\sigma^2}(x) = \int_{-\infty}^{x} p(\hat{x} \mid \mu, \sigma^2)\, d\hat{x}

• To draw samples from a Gaussian, we can invert the cumulative distribution function:

u \sim \mathrm{Uniform}(0,1) \;\Rightarrow\; F_{\mu,\sigma^2}^{-1}(u) \sim p(x \mid \mu, \sigma^2)

[Figure: the cdf F_{\mu,\sigma^2}(x) and the pdf p(x \mid \mu, \sigma^2)]
Slide credit: Bernt Schiele
Sampling from a pdf (Transformation method)
• In general, assume we are given the pdf p(x) and the corresponding cumulative distribution:

F(x) = \int_{-\infty}^{x} p(z)\, dz

• To draw samples from this pdf, we can invert the cumulative distribution function:

u \sim \mathrm{Uniform}(0,1) \;\Rightarrow\; F^{-1}(u) \sim p(x)
Slide credit: Bernt Schiele
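A minimal sketch of the transformation method, using the exponential distribution as the target because its cdf inverts in closed form; the choice of distribution and rate λ are illustrative.

```python
import math
import random

def sample_exponential(lam, rng=random.random):
    """Transformation method: draw u ~ Uniform(0,1), return F^{-1}(u) ~ p(x).
    For p(x) = lam*exp(-lam*x), the cdf is F(x) = 1 - exp(-lam*x),
    so F^{-1}(u) = -ln(1 - u) / lam."""
    u = rng()
    return -math.log(1.0 - u) / lam

random.seed(0)
lam = 2.0
samples = [sample_exponential(lam) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # the sample mean should be close to 1/lam = 0.5
```

The same recipe works for any pdf whose cdf has a tractable inverse, which is exactly the limitation discussed on the next slide.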
Ancestral Sampling
• Generalization of this idea to directed graphical models.
Joint probability factorizes into conditional probabilities: p(x) = \prod_n p(x_n \mid \mathrm{pa}_n)
• Ancestral sampling
Assume the variables are ordered such that there are no links
from any node to a lower-numbered node.
Start with the lowest-numbered node and draw a sample from its distribution:
\hat{x}_1 \sim p(x_1)
Cycle through each of the nodes in order and draw samples from the conditional distribution, where each parent variable is set to its sampled value:
\hat{x}_n \sim p(x_n \mid \mathrm{pa}_n)
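Ancestral sampling can be sketched on a toy two-node network x_1 → x_2 with binary variables; the conditional probability tables below are made-up illustrative numbers, not from the lecture.

```python
import random

def ancestral_sample(rng=random):
    """Ancestral sampling in a toy two-node network x1 -> x2:
    first draw x1 from p(x1), then x2 from p(x2 | x1)."""
    # p(x1 = 1) = 0.3                        (illustrative number)
    x1 = 1 if rng.random() < 0.3 else 0
    # p(x2 = 1 | x1) = 0.9 if x1 = 1 else 0.2 (illustrative numbers)
    p2 = 0.9 if x1 == 1 else 0.2
    x2 = 1 if rng.random() < p2 else 0
    return x1, x2

random.seed(1)
draws = [ancestral_sample() for _ in range(100_000)]
# Check against the analytic marginal p(x2 = 1) = 0.3*0.9 + 0.7*0.2 = 0.41
frac_x2 = sum(x2 for _, x2 in draws) / len(draws)
print(frac_x2)
```

Because parents are always sampled before their children, the ordering requirement in the slide guarantees that each conditional can be evaluated when it is needed.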
Discussion
• Transformation method
Limited applicability, as we need to invert the indefinite integral
of the required distribution p(z).
• More general
Rejection Sampling
Importance Sampling
Slide credit: Bernt Schiele
Rejection Sampling
• Assumptions
Sampling directly from p(z) is difficult.
But we can easily evaluate p(z) up to some normalization factor Z_p:

p(z) = \frac{1}{Z_p} \tilde{p}(z)

• Idea
We need some simpler distribution q(z) (called the proposal distribution) from which we can draw samples.
Choose a constant k such that: \forall z: kq(z) \geq \tilde{p}(z)
Slide credit: Bernt Schiele
Rejection Sampling
• Sampling procedure
Generate a number z0 from q(z).
Generate a number u0 from the
uniform distribution over [0,kq(z0)].
If u_0 > \tilde{p}(z_0), reject the sample; otherwise accept it.
– A sample is rejected if it lies in the grey shaded area.
– The remaining pairs (u_0, z_0) have a uniform distribution under the curve \tilde{p}(z).

• Discussion
Original values of z are generated from the distribution q(z).
Samples are accepted with probability \tilde{p}(z) / kq(z):

p(\mathrm{accept}) = \int \frac{\tilde{p}(z)}{kq(z)}\, q(z)\, dz = \frac{1}{k} \int \tilde{p}(z)\, dz

k should be as small as possible!
k should be as small as possible!
Slide credit: Bernt Schiele
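The procedure above, as a sketch. The target (an unnormalized standard Gaussian) and the proposal (Uniform(-5, 5), which satisfies kq(z) ≥ p̃(z) for k = 10 up to a negligible truncation of the tails) are illustrative choices.

```python
import math
import random

def rejection_sample(p_tilde, q_sample, q_pdf, k, rng=random):
    """Draw one sample: z0 ~ q, u0 ~ Uniform(0, k*q(z0));
    accept z0 iff u0 <= p_tilde(z0), else try again."""
    while True:
        z0 = q_sample(rng)
        u0 = rng.uniform(0.0, k * q_pdf(z0))
        if u0 <= p_tilde(z0):
            return z0

# Target: unnormalized standard Gaussian; proposal: Uniform(-5, 5)
p_tilde = lambda z: math.exp(-0.5 * z * z)
q_sample = lambda rng: rng.uniform(-5.0, 5.0)
q_pdf = lambda z: 0.1            # density of Uniform(-5, 5)
k = 10.0                         # k*q(z) = 1 >= p_tilde(z), whose max is 1

random.seed(0)
samples = [rejection_sample(p_tilde, q_sample, q_pdf, k) for _ in range(20_000)]
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean ** 2
print(mean, var)
```

The accepted samples have (approximately) zero mean and unit variance, as expected for the standard Gaussian; note that most proposals are rejected here, illustrating why a small k matters.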
Rejection Sampling – Discussion
• Limitation: high-dimensional spaces
For rejection sampling to be of practical value, we require that
kq(z) be close to the required distribution, so that the rate of
rejection is minimal.
• Artificial example
Assume that p(z) is Gaussian with covariance matrix \sigma_p^2 I.
Assume that q(z) is Gaussian with covariance matrix \sigma_q^2 I.
Obviously: \sigma_q^2 \geq \sigma_p^2.
In D dimensions: k = (\sigma_q / \sigma_p)^D.
– Assume \sigma_q is just 1% larger than \sigma_p.
– For D = 1000: k = 1.01^{1000} \geq 20{,}000.
– And p(\mathrm{accept}) \leq 1/20{,}000.
Often impractical to find good proposal distributions for high
dimensions!
Slide credit: Bernt Schiele
Importance Sampling
• Approach
Approximate expectations directly (but this does not enable drawing samples from p(z) directly).
Goal: evaluate

\mathbb{E}[f] = \int f(z)\, p(z)\, dz

• Simplistic strategy: Grid sampling
Discretize z-space into a uniform grid.
Evaluate the integrand as a sum of the form

\mathbb{E}[f] \simeq \sum_{l=1}^{L} p(z^{(l)})\, f(z^{(l)})

But: the number of terms grows exponentially with the number of dimensions!
Slide credit: Bernt Schiele
Importance Sampling
• Idea
Use a proposal distribution q(z) from which it is easy to draw
samples.
Express expectations in the form of a finite sum over samples {z^{(l)}} drawn from q(z):

\mathbb{E}[f] = \int f(z)\, p(z)\, dz = \int f(z)\, \frac{p(z)}{q(z)}\, q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} r_l\, f(z^{(l)})

with importance weights

r_l = \frac{p(z^{(l)})}{q(z^{(l)})}
Slide credit: Bernt Schiele
Importance Sampling
• Typical setting:
p(z) can only be evaluated up to an unknown normalization constant:

p(z) = \tilde{p}(z) / Z_p

q(z) can be treated in a similar fashion:

q(z) = \tilde{q}(z) / Z_q

Then:

\mathbb{E}[f] \simeq \frac{Z_q}{Z_p} \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l\, f(z^{(l)}), \quad \text{with } \tilde{r}_l = \frac{\tilde{p}(z^{(l)})}{\tilde{q}(z^{(l)})}
Slide credit: Bernt Schiele
Importance Sampling
• The ratio of normalization constants can itself be evaluated:

\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(z)\, dz = \int \frac{\tilde{p}(z)}{\tilde{q}(z)}\, q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l

• and therefore

\mathbb{E}[f] \simeq \sum_{l=1}^{L} w_l\, f(z^{(l)})

• with

w_l = \frac{\tilde{r}_l}{\sum_m \tilde{r}_m} = \frac{\tilde{p}(z^{(l)}) / \tilde{q}(z^{(l)})}{\sum_m \tilde{p}(z^{(m)}) / \tilde{q}(z^{(m)})}

Slide credit: Bernt Schiele
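The self-normalized estimator above can be sketched as follows; the target (an unnormalized standard Gaussian), the proposal (a Gaussian with σ = 2), and the test function f(z) = z² are illustrative choices.

```python
import math
import random

def snis_expectation(f, p_tilde, q_tilde, q_sample, L, rng=random):
    """Self-normalized importance sampling:
    E[f] ~= sum_l w_l f(z_l), w_l = r_l / sum_m r_m, r_l = p_tilde/q_tilde."""
    zs = [q_sample(rng) for _ in range(L)]
    rs = [p_tilde(z) / q_tilde(z) for z in zs]
    total = sum(rs)
    return sum(r / total * f(z) for r, z in zip(rs, zs))

# Target: standard Gaussian (unnormalized); proposal: Gaussian with sigma = 2,
# which is heavier-tailed than the target, as the next slide recommends.
p_tilde = lambda z: math.exp(-0.5 * z * z)
q_tilde = lambda z: math.exp(-0.5 * (z / 2.0) ** 2)   # also unnormalized
q_sample = lambda rng: rng.gauss(0.0, 2.0)

random.seed(0)
est = snis_expectation(lambda z: z * z, p_tilde, q_tilde, q_sample, 50_000)
print(est)  # E_p[z^2] = 1 for the standard Gaussian
```

Note that neither p nor q needs to be normalized: the weights are normalized by their own sum, which is exactly the Z_p/Z_q estimate from the slide.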
Importance Sampling – Discussion
• Observations
Success of importance sampling depends crucially on how well
the sampling distribution q(z) matches the desired distribution
p(z).
Often, p(z)f(z) is strongly varying and has a significant proportion of its mass concentrated over small regions of z-space.
Weights rl may be dominated by a few weights having large
values.
Practical issue: if none of the samples falls in the regions where
p(z)f(z) is large…
– The results may be arbitrarily in error.
– And there will be no diagnostic indication (no large variance in rl)!
Key requirement for sampling distribution q(z):
– Should not be small or zero in regions where p(z) is significant!
Slide credit: Bernt Schiele
Topics of This Lecture
• Approximate Inference
Variational methods
Sampling approaches
• Sampling approaches
Sampling from a distribution
Ancestral Sampling
Rejection Sampling
Importance Sampling
• Markov Chain Monte Carlo
Markov Chains
Metropolis Algorithm
Metropolis-Hastings Algorithm
Gibbs Sampling
Independent Sampling vs. Markov Chains
• So far
We’ve considered two methods, Rejection Sampling and
Importance Sampling, which were both based on independent
samples from q(z).
However, for many problems of practical interest, it is difficult
or impossible to find q(z) with the necessary properties.
• Different approach
We abandon the idea of independent sampling.
Instead, rely on a Markov Chain to generate dependent samples
from the target distribution.
Independence would be a nice thing, but it is not necessary for
the Monte Carlo estimate to be valid.
Slide credit: Zoubin Ghahramani
MCMC – Markov Chain Monte Carlo
• Overview
Allows us to sample from a large class of distributions.
Scales well with the dimensionality of the sample space.
• Idea
We maintain a record of the current state z^{(\tau)}.
The proposal distribution depends on the current state: q(z \mid z^{(\tau)}).
The sequence of samples forms a Markov chain z^{(1)}, z^{(2)}, …

• Setting
We can evaluate p(z) up to some normalizing factor Z_p:

p(z) = \frac{\tilde{p}(z)}{Z_p}

At each time step, we generate a candidate sample from the proposal distribution and accept the sample according to a criterion.
Slide credit: Bernt Schiele
MCMC – Metropolis Algorithm
• Metropolis algorithm
[Metropolis et al., 1953]
The proposal distribution is symmetric: q(z_A \mid z_B) = q(z_B \mid z_A).
The new candidate sample z^* is accepted with probability

A(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)}{\tilde{p}(z^{(\tau)})}\right)
• Implementation
Choose a random number u uniformly from the unit interval (0,1).
Accept the sample if A(z^*, z^{(\tau)}) > u.
• Note
New candidate samples are always accepted if \tilde{p}(z^*) \geq \tilde{p}(z^{(\tau)}).
– I.e. when new sample has higher probability than the previous one.
The algorithm sometimes accepts a state with lower probability.
Slide credit: Bernt Schiele
MCMC – Metropolis Algorithm
• Two cases
If the new sample is accepted: z^{(\tau+1)} = z^*
Otherwise: z^{(\tau+1)} = z^{(\tau)}
This is in contrast to rejection sampling, where rejected samples are simply discarded.
It leads to multiple copies of the same sample!
Slide credit: Bernt Schiele
MCMC – Metropolis Algorithm
• Property
When q(z_A \mid z_B) > 0 for all z, the distribution of z^{(\tau)} tends to p(z) as \tau \to \infty.
• Note
Sequence z(1), z(2),… is not a set of independent samples from
p(z), as successive samples are highly correlated.
We can obtain (largely) independent samples by just retaining
every Mth sample.
• Example: Sampling from a Gaussian
Proposal: Gaussian with \sigma = 0.2.
[Figure: green points are accepted samples, red points are rejected samples]
Slide credit: Bernt Schiele
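A sketch of the Metropolis algorithm for the Gaussian example above, with the slide's symmetric Gaussian proposal (σ = 0.2); the chain length, burn-in, and thinning interval are illustrative choices.

```python
import math
import random

def metropolis(p_tilde, z0, n_steps, proposal_sigma=0.2, rng=random):
    """Metropolis algorithm with a symmetric Gaussian proposal:
    accept z* with probability min(1, p_tilde(z*) / p_tilde(z_tau));
    on rejection the current state is repeated, not discarded."""
    z = z0
    chain = []
    for _ in range(n_steps):
        z_star = z + rng.gauss(0.0, proposal_sigma)
        a = min(1.0, p_tilde(z_star) / p_tilde(z))
        if rng.random() < a:
            z = z_star            # accept the candidate
        chain.append(z)           # rejected steps duplicate the old state
    return chain

# Target: unnormalized standard Gaussian
p_tilde = lambda z: math.exp(-0.5 * z * z)

random.seed(0)
chain = metropolis(p_tilde, z0=0.0, n_steps=200_000)
burned = chain[10_000::10]        # discard burn-in, keep every 10th sample
mean = sum(burned) / len(burned)
var = sum(z * z for z in burned) / len(burned) - mean ** 2
print(mean, var)
```

With such a small proposal scale the chain moves as a slow random walk, which is why the thinning (retaining every Mth sample) mentioned on the slide is needed to obtain roughly independent samples.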
Markov Chains
• Question
How can we show that z^{(\tau)} tends to p(z) as \tau \to \infty?
• Markov chains
First-order Markov chain:

p(z^{(m+1)} \mid z^{(1)}, \dots, z^{(m)}) = p(z^{(m+1)} \mid z^{(m)})

Marginal probability:

p(z^{(m+1)}) = \sum_{z^{(m)}} p(z^{(m+1)} \mid z^{(m)})\, p(z^{(m)})
Slide credit: Bernt Schiele
Markov Chains – Properties
• Invariant distribution
A distribution is said to be invariant (or stationary) w.r.t. a
Markov chain if each step in the chain leaves that distribution
invariant.
Transition probabilities:

T(z^{(m)}, z^{(m+1)}) = p(z^{(m+1)} \mid z^{(m)})

The distribution p^*(z) is invariant if:

p^*(z) = \sum_{z'} T(z', z)\, p^*(z')
• Detailed balance
Sufficient (but not necessary) condition to ensure that a
distribution is invariant:
p^*(z)\, T(z, z') = p^*(z')\, T(z', z)
A Markov chain which respects detailed balance is reversible.
Slide credit: Bernt Schiele
MCMC – Metropolis-Hastings Algorithm
• Metropolis-Hastings Algorithm
Generalization: Proposal distribution not required to be
symmetric.
The new candidate sample z^* is accepted with probability

A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)\, q_k(z^{(\tau)} \mid z^*)}{\tilde{p}(z^{(\tau)})\, q_k(z^* \mid z^{(\tau)})}\right)
where k labels the members of the set of possible transitions
considered.
• Note
When the proposal distributions are symmetric, Metropolis-Hastings reduces to the standard Metropolis algorithm.
Slide credit: Bernt Schiele
MCMC – Metropolis-Hastings Algorithm
• Properties
We can show that p(z) is an invariant distribution of the Markov
chain defined by the Metropolis-Hastings algorithm.
We show detailed balance:

p(z)\, q_k(z' \mid z)\, A_k(z', z) = \min\left(p(z)\, q_k(z' \mid z),\; p(z')\, q_k(z \mid z')\right)
= \min\left(p(z')\, q_k(z \mid z'),\; p(z)\, q_k(z' \mid z)\right)
= p(z')\, q_k(z \mid z')\, A_k(z, z')
Slide credit: Bernt Schiele
MCMC – Metropolis-Hastings Algorithm
• Schematic illustration
For continuous state spaces, a common
choice of proposal distribution is a
Gaussian centered on the current state.
What should be the variance of the
proposal distribution?
– Large variance: the rejection rate will be high for complex problems.
– The scale \rho of the proposal distribution should be as large as possible without incurring high rejection rates.
\rho should be of the same order as the smallest length scale \sigma_{\min}.
This causes the system to explore the distribution by means of a random walk.
– Undesired behavior: the number of steps needed to arrive at a state that is independent of the original state is of order (\sigma_{\max}/\sigma_{\min})^2.
– Strong correlations can slow down the Metropolis algorithm!
Gibbs Sampling
• Approach
MCMC-algorithm that is simple and widely applicable.
May be seen as a special case of Metropolis-Hastings.
• Idea
Sample variable-wise: replace z_i by a value drawn from the conditional distribution p(z_i \mid z_{\setminus i}).
– This means we update one coordinate at a time.
Repeat the procedure either by cycling through all variables or by choosing the variable to update at random at each step.
Slide credit: Bernt Schiele
Gibbs Sampling
• Example
Assume a distribution p(z_1, z_2, z_3).
Replace z_1^{(\tau)} with a new value drawn from z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)})
Replace z_2^{(\tau)} with a new value drawn from z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)})
Replace z_3^{(\tau)} with a new value drawn from z_3^{(\tau+1)} \sim p(z_3 \mid z_1^{(\tau+1)}, z_2^{(\tau+1)})
And so on…
Slide credit: Bernt Schiele
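The same cyclic updates can be sketched for a bivariate Gaussian, whose conditionals are themselves Gaussian: for zero mean, unit variances, and correlation ρ, z_1 | z_2 ~ N(ρz_2, 1 - ρ²) and symmetrically for z_2. The value of ρ and the chain length below are illustrative.

```python
import math
import random

def gibbs_bivariate_gaussian(rho, n_steps, rng=random):
    """Gibbs sampling for a zero-mean bivariate Gaussian with unit
    variances and correlation rho, updating one coordinate at a time."""
    sd = math.sqrt(1.0 - rho * rho)
    z1, z2 = 0.0, 0.0
    chain = []
    for _ in range(n_steps):
        z1 = rng.gauss(rho * z2, sd)   # z1^(t+1) ~ p(z1 | z2^(t))
        z2 = rng.gauss(rho * z1, sd)   # z2^(t+1) ~ p(z2 | z1^(t+1))
        chain.append((z1, z2))
    return chain

random.seed(0)
chain = gibbs_bivariate_gaussian(rho=0.8, n_steps=100_000)
burned = chain[1000:]
corr = sum(a * b for a, b in burned) / len(burned)
print(corr)   # the sample E[z1*z2] should approach rho = 0.8
```

Every proposed update is accepted, as the acceptance derivation on the next slide shows; the stronger ρ is, the smaller each conditional step becomes, which is the slowdown under strong correlations noted later.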
Gibbs Sampling
• Properties
The factor that determines the acceptance probability in Metropolis-Hastings is

A(z^*, z) = \frac{p(z^*)\, q_k(z \mid z^*)}{p(z)\, q_k(z^* \mid z)} = \frac{p(z_k^* \mid z_{\setminus k}^*)\, p(z_{\setminus k}^*)\, p(z_k \mid z_{\setminus k}^*)}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^* \mid z_{\setminus k})} = 1

since a Gibbs step leaves z_{\setminus k}^* = z_{\setminus k}.
I.e. we get an algorithm which always accepts!
If you can compute (and sample from) the conditionals, you can
apply Gibbs sampling.
The algorithm is completely parameter free.
Can also be applied to subsets of variables.
Slide credit: Zoubin Ghahramani
Gibbs Sampling
• Example
20 iterations of Gibbs sampling on a bivariate Gaussian.
Note: strong correlations can slow down Gibbs sampling.
Slide credit: Zoubin Ghahramani
Summary: Approximate Inference
• Exact Bayesian Inference often intractable.
• Rejection and Importance Sampling
Generate independent samples.
Impractical in high-dimensional state spaces.
• Markov Chain Monte Carlo (MCMC)
Simple & effective (even though typically computationally
expensive).
Scales well with the dimensionality of the state space.
Issues of convergence have to be considered carefully.
• Gibbs Sampling
Used extensively in practice.
Parameter-free.
Requires sampling from the conditional distributions.
References and Further Reading
• Sampling methods for approximate inference are
described in detail in Chapter 11 of Bishop’s book.
Christopher M. Bishop
Pattern Recognition and Machine Learning
Springer, 2006
David MacKay
Information Theory, Inference, and Learning Algorithms
Cambridge University Press, 2003
• Another good introduction to Monte Carlo methods can
be found in Chapter 29 of MacKay’s book (also available
online: http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html)