
Variational Inference and
Variational Message Passing
John Winn
Microsoft Research, Cambridge
12th November 2004
Robotics Research Group, University of Oxford
Overview
• Probabilistic models & Bayesian inference
• Variational Inference
• Variational Message Passing
• Vision example
Bayesian networks
• Directed graph
• Nodes represent variables
• Links show dependencies
• Conditional distributions at each node
• Defines a joint distribution:
  P(C, L, S, I) = P(L) P(C) P(S|C) P(I|L,S)
[Figure: network with nodes Object class (C), Lighting colour (L), Surface colour (S) and Image colour (I), with conditionals P(C), P(L), P(S|C) and P(I|L,S)]
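As a quick illustration of how the graph defines the joint distribution, here is a minimal Python sketch using a hypothetical binary version of the variables above, with made-up conditional probability tables (none of these numbers come from the slides):

```python
# Hypothetical binary CPTs for C (object class), L (lighting colour),
# S (surface colour) and I (image colour); values are invented for illustration.
P_C = {0: 0.7, 1: 0.3}                          # P(C)
P_L = {0: 0.6, 1: 0.4}                          # P(L)
P_S_given_C = {0: {0: 0.9, 1: 0.1},             # P(S | C)
               1: {0: 0.2, 1: 0.8}}
P_I_given_LS = {(0, 0): {0: 0.95, 1: 0.05},     # P(I | L, S)
                (0, 1): {0: 0.30, 1: 0.70},
                (1, 0): {0: 0.60, 1: 0.40},
                (1, 1): {0: 0.10, 1: 0.90}}

def joint(c, l, s, i):
    """P(C, L, S, I) = P(L) P(C) P(S|C) P(I|L,S)."""
    return P_L[l] * P_C[c] * P_S_given_C[c][s] * P_I_given_LS[(l, s)][i]

# The factorised joint sums to 1 over all configurations.
total = sum(joint(c, l, s, i)
            for c in (0, 1) for l in (0, 1) for s in (0, 1) for i in (0, 1))
assert abs(total - 1.0) < 1e-12
```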
Bayesian inference
Observed variables V and hidden variables H.
Hidden variables include parameters and latent variables.
Learning/inference involves finding:
  P(H1, H2, … | V)
[Figure: the same network, with object class C, lighting colour L and surface colour S hidden, and image colour I observed]
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ:
  $P(\theta \mid V) = \dfrac{P(V \mid \theta)\,P(\theta)}{P(V)} \propto P(V \mid \theta)\,P(\theta)$
• How should we represent this posterior distribution?
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
[Plots of P(V|θ)P(θ) against θ: (1) the MAP estimate θ_MAP at the maximum; (2) θ_MAP lies in a region of high probability density while most of the probability mass is elsewhere; (3) the posterior represented by samples, compared with θ_ML; (4) the posterior represented by a variational approximation Q(θ)]
Overview
• Probabilistic models & Bayesian inference
• Variational Inference
• Variational Message Passing
• Vision example
Variational Inference
(in three easy steps…)
1. Choose a family of variational distributions Q(H).
2. Use Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|V) and Q(H).
3. Find Q which minimises the divergence.
KL Divergence
Minimising KL(Q||P) (‘exclusive’):
  $\mathrm{KL}(Q \,\|\, P) = \sum_H Q(H) \ln \dfrac{Q(H)}{P(H \mid V)}$
Minimising KL(P||Q) (‘inclusive’):
  $\mathrm{KL}(P \,\|\, Q) = \sum_H P(H \mid V) \ln \dfrac{P(H \mid V)}{Q(H)}$
[Figure: sketches of Q against P for each case]
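The exclusive/inclusive behaviour can be seen numerically. The sketch below (my own illustration, not from the slides) fits a single Gaussian Q to a discretised bimodal P by brute-force grid search under each divergence; minimising KL(Q||P) locks Q onto one mode, while minimising KL(P||Q) spreads Q to cover both:

```python
import numpy as np

# Discretised toy problem: P is a two-component mixture, Q a single Gaussian.
theta = np.linspace(-10, 10, 2001)

def normal(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

P = 0.5 * normal(theta, -3, 0.8) + 0.5 * normal(theta, 3, 0.8)
P /= P.sum()

def kl(a, b):
    return np.sum(a * np.log((a + 1e-300) / (b + 1e-300)))

best_excl = best_incl = None
for m in np.linspace(-5, 5, 101):
    for s in np.linspace(0.3, 6, 58):
        Q = normal(theta, m, s)
        Q /= Q.sum()
        if best_excl is None or kl(Q, P) < best_excl[0]:
            best_excl = (kl(Q, P), m, s)      # exclusive: KL(Q||P)
        if best_incl is None or kl(P, Q) < best_incl[0]:
            best_incl = (kl(P, Q), m, s)      # inclusive: KL(P||Q)

print("min KL(Q||P):", best_excl)   # Q hugs one of the two modes
print("min KL(P||Q):", best_incl)   # Q broadens to cover both modes
```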
Minimising the KL divergence
• For arbitrary Q(H):
  $\ln P(V) = L(Q) + \mathrm{KL}(Q \,\|\, P)$
  The left-hand side is fixed, so maximising L(Q) minimises KL(Q||P), where
  $L(Q) = \sum_H Q(H) \ln \dfrac{P(H, V)}{Q(H)}$
• We choose a family of Q distributions where L(Q) is tractable to compute.
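This decomposition follows in one line from $P(H, V) = P(H \mid V)\,P(V)$:

$$
L(Q) + \mathrm{KL}(Q \,\|\, P)
= \sum_H Q(H) \ln \frac{P(H, V)}{Q(H)} + \sum_H Q(H) \ln \frac{Q(H)}{P(H \mid V)}
= \sum_H Q(H) \ln \frac{P(H, V)}{P(H \mid V)}
= \ln P(V).
$$

Since the left-hand side does not depend on Q, any increase in L(Q) is matched exactly by a decrease in KL(Q||P).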
Minimising the KL divergence
[Animation: bar diagram of ln P(V) split into L(Q) and KL(Q||P); ln P(V) stays fixed while L(Q) is maximised, so KL(Q||P) shrinks]
Factorised Approximation
• Assume Q factorises:
  $Q(H) = \prod_i Q_i(H_i)$
  No further assumptions are required!
• Optimal solution for one factor given by
  $\ln Q_i^{*}(H_i) = \big\langle \ln P(H, V) \big\rangle_{j \neq i} + \text{const.}$
  (the expectation is taken with respect to all the other factors Q_j, j ≠ i)
Example: Univariate Gaussian
• Likelihood function:
  $P(X \mid \mu, \gamma) = \left(\dfrac{\gamma}{2\pi}\right)^{N/2} \exp\!\left(-\dfrac{\gamma}{2} \sum_{n=1}^{N} (x_n - \mu)^2\right)$
• Conjugate prior P(μ, γ)
• Factorised variational distribution:
  $Q(\mu, \gamma) = Q(\mu)\,Q(\gamma)$
[Figures: Q(μ)Q(γ) plotted over the (μ, γ) plane: initial configuration, after updating Q(μ), after updating Q(γ), converged solution]
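A minimal Python sketch of these alternating updates, assuming independent conjugate priors μ ~ N(m0, 1/p0) and γ ~ Gamma(a0, b0) with made-up hyperparameters and data (the slides specify neither); the closed-form updates below are the standard ones for this model:

```python
import numpy as np

# Hypothetical data and hyperparameters
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=100)
N = len(x)
m0, p0 = 0.0, 1e-3        # prior on mu:    N(mu | m0, 1/p0)
a0, b0 = 1e-3, 1e-3       # prior on gamma: Gamma(gamma | a0, b0)

E_gamma = 1.0             # initial Q(gamma), summarised by its mean

for _ in range(20):
    # Update Q(mu) = N(mu_N, 1/lam_N), holding Q(gamma) fixed
    lam_N = p0 + N * E_gamma
    mu_N = (p0 * m0 + E_gamma * x.sum()) / lam_N

    # Update Q(gamma) = Gamma(a_N, b_N), holding Q(mu) fixed
    a_N = a0 + N / 2
    b_N = b0 + 0.5 * np.sum((x - mu_N) ** 2 + 1.0 / lam_N)
    E_gamma = a_N / b_N

print("E[mu] =", mu_N, " E[gamma] =", E_gamma)   # data generated with mu=2.0, gamma=4.0
```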
Lower Bound for GMM
Variational Equations for GMM
Overview
• Probabilistic models & Bayesian inference
• Variational Inference
• Variational Message Passing
• Vision example
Variational Message Passing
• VMP makes it easier and quicker to apply factorised variational inference.
• VMP carries out variational inference using local computations and message passing on the graphical model.
• Modular algorithm allows modifying, extending or combining models.
Local Updates
For a factorised Q, the update for each factor depends only on variables in its Markov blanket, so updates can be carried out locally at each node.
VMP I: The Exponential Family
• Conditional distributions are expressed in exponential family form:
  $\ln P(X \mid \boldsymbol\theta) = \boldsymbol\theta^{\mathsf T} \mathbf u(X) + g(\boldsymbol\theta) + f(X)$
  where θ is the ‘natural’ parameter vector and u(X) is the sufficient statistics vector.
• For example, the Gaussian distribution:
  $\ln P(X \mid m, g) = \begin{pmatrix} mg \\ -g/2 \end{pmatrix}^{\mathsf T} \begin{pmatrix} X \\ X^2 \end{pmatrix} + \dfrac{1}{2}\ln\dfrac{g}{2\pi} - \dfrac{g m^2}{2}$
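A small sanity-check sketch (the function names are my own, not from any library) showing that the natural parameter vector and sufficient statistics above reassemble into the usual Gaussian log density:

```python
import numpy as np

def gaussian_natural_params(m, g):
    """Natural parameter vector of N(X | m, 1/g): [m*g, -g/2]."""
    return np.array([m * g, -g / 2.0])

def gaussian_sufficient_stats(x):
    """Sufficient statistics vector u(X) = [X, X^2]."""
    return np.array([x, x * x])

def gaussian_log_pdf(x, m, g):
    """ln P(X | m, g) assembled from the exponential-family pieces."""
    theta = gaussian_natural_params(m, g)
    u = gaussian_sufficient_stats(x)
    g_theta = 0.5 * np.log(g / (2 * np.pi)) - 0.5 * g * m ** 2
    return theta @ u + g_theta          # f(X) = 0 for the Gaussian

# Check against the direct formula ln N(x | m, 1/g)
x, m, g = 1.3, 0.5, 2.0
direct = -0.5 * g * (x - m) ** 2 + 0.5 * np.log(g / (2 * np.pi))
assert np.isclose(gaussian_log_pdf(x, m, g), direct)
```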
VMP II: Conjugacy
• Parents and children are chosen to be conjugate, i.e. to have the same functional form:
  $\ln P(X \mid \boldsymbol\theta) = \boldsymbol\theta^{\mathsf T} \mathbf u(X) + g(\boldsymbol\theta) + f(X)$
  $\ln P(Z \mid X, Y) = \boldsymbol\varphi(Y, Z)^{\mathsf T} \mathbf u(X) + g'(X) + f'(Y, Z)$
  (both are linear in the same sufficient statistics u(X))
• Examples:
  • Gaussian for the mean of a Gaussian
  • Gamma for the precision of a Gaussian
  • Dirichlet for the parameters of a discrete distribution
[Figure: graph fragment with parent X, co-parent Y and child Z]
VMP III: Messages
• Conditionals:
  $\ln P(X \mid \boldsymbol\theta) = \boldsymbol\theta^{\mathsf T} \mathbf u(X) + g(\boldsymbol\theta) + f(X)$
  $\ln P(Z \mid X, Y) = \boldsymbol\varphi(Y, Z)^{\mathsf T} \mathbf u(X) + g'(X) + f'(Y, Z)$
• Messages:
  Parent to child (X → Z):  $\mathbf m_{X \to Z} = \big\langle \mathbf u(X) \big\rangle_{Q(X)}$
  Child to parent (Z → X):  $\mathbf m_{Z \to X} = \big\langle \boldsymbol\varphi(Y, Z) \big\rangle_{Q(Y)\,Q(Z)}$
VMP IV: Update
Q(X) has the same form as P(X|θ), but with an updated parameter vector θ*:
  $\boldsymbol\theta^{*} = \boldsymbol\theta + \sum_{j \in \mathrm{ch}(X)} \mathbf m_{j \to X}$
where θ is computed from the messages sent by the parents of X.
These messages can also be used to calculate the bound on the evidence L(Q) – see Winn & Bishop, 2004.
VMP Example
Learning the parameters of a Gaussian from N data points.
[Figure: mean μ and precision γ (inverse variance) as parents of the data x, on a plate of size N]
VMP Example
Message from γ to all x (needs an initial Q(γ)):
  $\mathbf m_{\gamma \to x_n} = \begin{pmatrix} \langle \gamma \rangle \\ \langle \ln \gamma \rangle \end{pmatrix}$
VMP Example
Messages from each xn to μ:
  $\mathbf m_{x_n \to \mu} = \begin{pmatrix} \langle \gamma \rangle x_n \\ -\tfrac{1}{2} \langle \gamma \rangle \end{pmatrix}$
Update the Q(μ) parameter vector:
  $\boldsymbol\theta^{*} = \boldsymbol\theta_{\mathrm{prior}} + \sum_n \mathbf m_{x_n \to \mu}$
VMP Example
Message from the updated μ to all x:
  $\mathbf m_{\mu \to x_n} = \begin{pmatrix} \langle \mu \rangle \\ \langle \mu^2 \rangle \end{pmatrix}$
VMP Example
Messages from each xn to γ:
  $\mathbf m_{x_n \to \gamma} = \begin{pmatrix} -\tfrac{1}{2} \big\langle (x_n - \mu)^2 \big\rangle \\ \tfrac{1}{2} \end{pmatrix}$
Update the Q(γ) parameter vector:
  $\boldsymbol\varphi^{*} = \boldsymbol\varphi_{\mathrm{prior}} + \sum_n \mathbf m_{x_n \to \gamma}$
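Putting the schedule together, here is a sketch of the message-passing loop in Python for this example, with hypothetical priors and data (the slides leave both unspecified). Messages are represented as natural-parameter vectors and summed into the prior's parameter vector, as in VMP IV above:

```python
import numpy as np
from scipy.special import digamma

# Hypothetical data and priors
rng = np.random.default_rng(1)
x = rng.normal(3.0, 0.7, size=200)
N = len(x)
m0, p0 = 0.0, 1e-3                          # prior on mu:    N(m0, 1/p0)
a0, b0 = 1e-3, 1e-3                         # prior on gamma: Gamma(a0, b0)
theta_prior = np.array([p0 * m0, -p0 / 2])  # w.r.t. u(mu)    = (mu, mu^2)
phi_prior   = np.array([-b0, a0 - 1.0])     # w.r.t. u(gamma) = (gamma, ln gamma)

E_gamma = 1.0                               # from an initial Q(gamma)

for _ in range(20):
    # Messages x_n -> mu are (<gamma> x_n, -<gamma>/2): sum them, add the prior
    theta = theta_prior + np.array([E_gamma * x.sum(), -E_gamma * N / 2])
    lam_N = -2 * theta[1]                   # Q(mu) = N(mu_N, 1/lam_N)
    mu_N = theta[0] / lam_N
    E_mu, E_mu2 = mu_N, mu_N ** 2 + 1 / lam_N   # moments sent in the mu -> x message

    # Messages x_n -> gamma are (-<(x_n - mu)^2>/2, 1/2): sum them, add the prior
    phi = phi_prior + np.array([-0.5 * np.sum(x ** 2 - 2 * x * E_mu + E_mu2), N / 2])
    b_N, a_N = -phi[0], phi[1] + 1          # Q(gamma) = Gamma(a_N, b_N)
    E_gamma = a_N / b_N                     # first component of the gamma -> x message
    E_ln_gamma = digamma(a_N) - np.log(b_N) # second component (needed only for L(Q))

print("E[mu] =", mu_N, " E[gamma] =", E_gamma)
```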
Features of VMP
• Graph does not need to be a tree – it can contain loops (but not cycles).
• Flexible message passing schedule – factors can be updated in any order.
• Distributions can be discrete or continuous, multivariate, truncated (e.g. rectified Gaussian).
• Can have deterministic relationships (A = B + C).
• Allows for point estimates, e.g. of hyperparameters.
VMP Software: VIBES
Free download from vibes.sourceforge.net
Overview
• Probabilistic models & Bayesian inference
• Variational Inference
• Variational Message Passing
• Vision example
Flexible sprite model
Proposed by Jojic & Frey (2001)
[Figure: a set of images x, e.g. frames from a video, on a plate of size N]
Flexible sprite model
[Figure: sprite appearance f and sprite shape π added as parents of each image x]
Flexible sprite model
[Figure: a discretised sprite transform T and a mask m added for each image x]
Flexible sprite model
[Figure: background b and observation noise β complete the model]
VMP
[Slide: the VMP messages exchanged between the sprite model variables π, f, b, β, T, m and the observed images x on a plate of size N]
Winn & Blake (NIPS 2004)
Results of VMP on hand video
[Video: the original video, the learned appearance and mask, and the learned transforms (i.e. motion)]
Conclusions
• Variational Message Passing allows approximate Bayesian inference for a wide range of models.
• VMP simplifies the construction, testing, extension and comparison of models.
• You can try VMP for yourself: vibes.sourceforge.net
That’s all folks!