
Bayesian Belief Propagation for Image
Understanding
• David Rosenberg
Markov Random Fields
• Let G be an undirected graph
– nodes: {1, …, n}
• Associate a random variable X_t to each node t in
G.
• (X_1, …, X_n) is a Markov random field on G if
– Every r.v. is independent of its non-neighbors conditioned on its neighbors:
– P(X_t = x_t | X_s = x_s for all s \neq t) = P(X_t = x_t | X_s = x_s for all s \in N(t)),
where N(s) denotes the set of neighbors of node s.
Specifying a Markov Random Field
• It would be nice if we could just specify P(X | N(X)) for all r.v.’s X (as with Bayesian networks)
• Unfortunately, this will overspecify the joint PDF.
– E.g., the two-node graph X_1 -- X_2 with binary variables:
• The joint PDF has 3 degrees of freedom
• The conditional PDFs X_1|X_2 and X_2|X_1 have 2 degrees of freedom each, so the two conditionals specify 4 numbers for a 3-parameter object
• The Hammersley-Clifford Theorem helps to specify
MRFs
The Gibbs Distribution
• A Gibbs distribution w.r.t. graph G is a probability
mass function that can be expressed in the form
– P(x_1, …, x_n) = \prod_{cliques C} V_C(x_1, …, x_n),
– where V_C(x_1, …, x_n) depends only on those x_i with i \in C.
• We can combine potential functions into products over maximal cliques, so
– P(x_1, …, x_n) = \prod_{maximal cliques C} V_C(x_1, …, x_n)
– This may be better in certain circumstances because we
don’t have to specify as many potential functions
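As an illustration (not part of the original slides), here is a minimal NumPy sketch of a Gibbs distribution on the chain x_1 -- x_2 -- x_3, whose maximal cliques are the edges; the potential tables are made-up numbers, and the normalizing constant is computed explicitly rather than absorbed into the potentials:

import numpy as np
from itertools import product

# Gibbs distribution on the chain x1 -- x2 -- x3 (maximal cliques are the
# edges {1,2} and {2,3}); binary variables, made-up potential tables.
V_12 = np.array([[2.0, 1.0],
                 [1.0, 3.0]])        # V_{12}(x1, x2)
V_23 = np.array([[1.0, 2.0],
                 [4.0, 1.0]])        # V_{23}(x2, x3)

# Joint PMF as a product of clique potentials; the normalization is
# done explicitly here instead of being folded into the V's.
P = np.zeros((2, 2, 2))
for x1, x2, x3 in product(range(2), repeat=3):
    P[x1, x2, x3] = V_12[x1, x2] * V_23[x2, x3]
P /= P.sum()
print(P)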
Hammersley-Clifford Theorem
• Let the r.v’s {X_j} have a positive joint probability
mass function.
• Then the Hammersley-Clifford Theorem says that {X_j} is a Markov random field on graph G iff it has a Gibbs distribution w.r.t. G.
– Side Note: Hammersley and Clifford discovered this theorem in 1971, but they didn’t publish it because they kept thinking they should be able to remove or relax the positivity assumption. They couldn’t. Clifford published the result in 1990.
• Specifying the potential functions is equivalent to
specifying the joint probability distribution of all
variables.
• Now it’s easy to specify a valid MRF
– still not easy to determine the degrees of freedom in the resulting joint distribution
A Typical MRF Vision Problem
• We have
– hidden “scene” variables: X_j
– observed “image” variables: Y_j
• Given X_j, Y_j is independent of everything else
• Show Picture
• The Problems
– Given: some instantiations of the Y_j’s
– Find:
• The a posteriori distribution over the X_j’s
• The MAP estimate for each X_j
• The least squares estimate of each X_j
Straightforward Exact Inference
• Given the joint PDF
– typically specified using potential functions
• We can just marginalize out the other variables to
– get the a posteriori distribution for each X_j
• We can immediately extract the
– MAP estimate -- just the mode of the a posteriori distribution
– Least squares estimate -- just the expected value of the a posteriori distribution
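A hedged sketch of this brute-force route, assuming a model small enough that the joint table (with the observed Y_j’s already conditioned on) fits in memory; the table here is random stand-in data:

import numpy as np

# Brute-force exact inference over a small joint table p(x1, x2, x3);
# the random table stands in for the posterior after conditioning on
# the observed Y_j's.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()

posterior_x1 = p.sum(axis=(1, 2))     # marginalize out x2 and x3

map_x1 = posterior_x1.argmax()        # MAP estimate: mode of the marginal
values = np.arange(2)                 # numeric values the states stand for
mmse_x1 = values @ posterior_x1       # least squares estimate: posterior mean
print(posterior_x1, map_x1, mmse_x1)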
Inference by Message Passing
• The resulting a posteriori distributions are exact for graphs without loops (Pearl, 1988)
• Weiss and Freeman show that for Gaussian MRFs of arbitrary graph topology, when belief propagation converges, it gives the correct least squares estimate (i.e., the posterior mean)
• More results?
Derivation of belief propagation
[Figure: chain MRF x_1 -- x_2 -- x_3 with observations y_1, y_2, y_3; node potentials \Phi(x_i, y_i) on the x_i -- y_i edges and pairwise potentials \Psi(x_1, x_2), \Psi(x_2, x_3) on the chain edges]
• x_1^{MMSE} = mean_{x_1} \sum_{x_2} \sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
The posterior factorizes
• x_1^{MMSE} = mean_{x_1} \sum_{x_2} \sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)
= mean_{x_1} \sum_{x_2} \sum_{x_3} \Phi(x_1, y_1) \Psi(x_1, x_2) \Phi(x_2, y_2) \Psi(x_2, x_3) \Phi(x_3, y_3)
• Pushing each sum past the factors that don’t involve its variable:
x_1^{MMSE} = mean_{x_1} \Phi(x_1, y_1) \sum_{x_2} \Psi(x_1, x_2) \Phi(x_2, y_2) \sum_{x_3} \Psi(x_2, x_3) \Phi(x_3, y_3)
Propagation rules
• Each inner sum depends only on the next variable out, so it can be computed once and passed along as a message:
x_1^{MMSE} = mean_{x_1} \Phi(x_1, y_1) \sum_{x_2} \Psi(x_1, x_2) \Phi(x_2, y_2) \sum_{x_3} \Psi(x_2, x_3) \Phi(x_3, y_3)
• M^{2 \to 1}(x_1) = \sum_{x_2} \Psi(x_1, x_2) \Phi(x_2, y_2) M^{3 \to 2}(x_2)
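The pushed-in sums above can be checked numerically. Below is a minimal NumPy sketch (my own, not from the slides) on the binary 3-node chain with made-up potential tables; phi[i] plays the role of \Phi(x_i, y_i) with the observed y_i already folded in, and the message-based belief for x_1 is verified against brute-force marginalization of the full joint:

import numpy as np

phi = [np.array([0.7, 0.3]), np.array([0.4, 0.6]), np.array([0.8, 0.2])]
psi_12 = np.array([[0.9, 0.1], [0.1, 0.9]])   # Psi(x1, x2)
psi_23 = np.array([[0.8, 0.2], [0.2, 0.8]])   # Psi(x2, x3)

# Innermost sum first: M^{3->2}(x2) = sum_{x3} Psi(x2,x3) Phi(x3,y3)
m_32 = psi_23 @ phi[2]
# M^{2->1}(x1) = sum_{x2} Psi(x1,x2) Phi(x2,y2) M^{3->2}(x2)
m_21 = psi_12 @ (phi[1] * m_32)

belief_x1 = phi[0] * m_21
belief_x1 /= belief_x1.sum()
x1_mmse = np.arange(2) @ belief_x1            # posterior mean of x1

# Check against brute-force marginalization of the full joint:
joint = np.einsum('a,b,c,ab,bc->abc', phi[0], phi[1], phi[2], psi_12, psi_23)
brute = joint.sum(axis=(1, 2))
brute /= brute.sum()
assert np.allclose(belief_x1, brute)
print(belief_x1, x1_mmse)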
Belief, and message updates
• Belief at node j: b_j(x_j) \propto \Phi(x_j, y_j) \prod_{k \in N(j)} M^{k \to j}(x_j)
• Message from node j to node i: M^{j \to i}(x_i) = \sum_{x_j} \Psi_{ij}(x_i, x_j) \Phi(x_j, y_j) \prod_{k \in N(j) \setminus i} M^{k \to j}(x_j)
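A minimal sketch of these two updates for a discrete pairwise MRF (my own illustration, not the Weiss code): phi[j] stands for \Phi(x_j, y_j), psi[(i, j)] for the table \Psi_ij(x_i, x_j) stored in both orderings, and messages are updated synchronously:

import numpy as np

def bp_iteration(msgs, nbrs, psi, phi):
    """Recompute every message M^{j->i} from the current messages."""
    new = {}
    for (j, i) in msgs:
        prod = phi[j].copy()                 # Phi(x_j, y_j)
        for k in nbrs[j]:
            if k != i:
                prod = prod * msgs[(k, j)]   # incoming messages, except from i
        m = psi[(i, j)] @ prod               # sum_{x_j} Psi_ij(x_i, x_j) * prod(x_j)
        new[(j, i)] = m / m.sum()            # normalize for numerical stability
    return new

def beliefs(msgs, nbrs, phi):
    """b_j(x_j) proportional to Phi(x_j, y_j) times all incoming messages."""
    out = {}
    for j in nbrs:
        b = phi[j].copy()
        for k in nbrs[j]:
            b = b * msgs[(k, j)]
        out[j] = b / b.sum()
    return out

# Tiny chain 0 -- 1 -- 2 with made-up tables; on a tree this settles to
# the exact marginals after a few sweeps.
nbrs = {0: [1], 1: [0, 2], 2: [1]}
phi = {0: np.array([0.7, 0.3]), 1: np.array([0.4, 0.6]), 2: np.array([0.8, 0.2])}
edge = np.array([[0.9, 0.1], [0.1, 0.9]])
psi = {(0, 1): edge, (1, 0): edge.T, (1, 2): edge, (2, 1): edge.T}
msgs = {(j, i): np.ones(2) for j in nbrs for i in nbrs[j]}
for _ in range(5):
    msgs = bp_iteration(msgs, nbrs, psi, phi)
print(beliefs(msgs, nbrs, phi))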
Optimal solution in a chain or tree:
Belief Propagation
• “Do the right thing” Bayesian algorithm.
• For Gaussian random variables over time: the Kalman filter.
• For hidden Markov models: the forward/backward algorithm (and the MAP variant is Viterbi).
No factorization with loops!
• Add an edge x_1 -- x_3 with potential \Psi(x_1, x_3) and the sums no longer push through:
x_1^{MMSE} = mean_{x_1} \Phi(x_1, y_1) \sum_{x_2} \Psi(x_1, x_2) \Phi(x_2, y_2) \sum_{x_3} \Psi(x_2, x_3) \Psi(x_1, x_3) \Phi(x_3, y_3)
• The inner sum over x_3 now depends on x_1 as well as x_2, so it cannot be summarized as a message along a single edge.
[Figure: x_1, x_2, x_3 connected in a triangle, each with its observation y_i]
First Toy Examples
• Show messages passed and beliefs at each stage
• Show convergence in x steps.
Where does Evidence Fit In?
The Cost Functional Approach
• We can state the solution to many problems in
terms of minimizing a cost functional.
• How can we put this in our MRF framework?
Slide on Weiss’s Interior/exterior
Example
• Show graphs of convergence speed
Slide on Weiss’s Motion Detection
My own computer example taking the
cost functional approach
Discussion of complexity issues with
message passing
• How long are the messages?
• How many messages do we have to pass per iteration?
• How many iterations until convergence?
• The problem quickly becomes intractable.
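A back-of-envelope count (an assumption-laden sketch: discrete nodes with k states each, pairwise potentials, a graph with |E| edges):

message length = k, messages per iteration = 2|E|, cost per message = O(k^2)
\Rightarrow cost per iteration = O(|E| k^2)

The number of iterations until convergence is the part without a clean bound.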
Mention some approximate inference
approaches
Slides on message passing with jointly
Gaussian distributions???
EXTRA SLIDES
Incorporating Evidence nodes into
MRFs
• We would like to have nodes that don’t change their
beliefs -- they are just observations.
• Can we do this via the potential functions on the
non-maximal clique containing just that node?
• I think this is what they do in the Yair Weiss implementation
• What if we don’t want to specify a potential
function? Make it identically one, since it’s in a
product.
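A minimal sketch of that trick, assuming a discrete node with k states; observed_state is a hypothetical observed value:

import numpy as np

# Evidence via single-node potentials (illustrative, k = 4 states).
k = 4
observed_state = 2                    # hypothetical observed value

phi_evidence = np.zeros(k)
phi_evidence[observed_state] = 1.0    # indicator potential: the node's
                                      # belief stays clamped to the observation

phi_unobserved = np.ones(k)           # "identically one": a no-op factor
                                      # inside the product of potentials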
From cost functional to transition
matrix
From cost functional to update rule
From update rule to transition matrix
The factorization into pairwise
potentials -- good for general Markov
networks
Other Stuff
• For shorthand, we will write x = (x_1, …, x_n).