Direct Approach to Cluster Variation Method in Graphs with Discrete

Download Report

Transcript Direct Approach to Cluster Variation Method in Graphs with Discrete

Direct Approach to Cluster Variation
Method in Graphs with Discrete Nodes
Michal Rosen-Zvi
Computer Science Division, UC Berkeley
Michael I. Jordan
Computer Science Division and the Statistics
Department, UC Berkeley
Outline
Introduction
The exponential family distributions
Variational approximation
Gibbs sampling
Undirected graphs
New approach to Gibbs sampling
Belief propagation revisited
The FNA
Directed graphs if time allows 
Estimating marginals of
discrete random variables
P(x|Q)=exp[QT f(x)-A(Q)]
The features
Log partition function
In a quadratic model:
f ij=xixj f i=xi
The exponential family form
Some definitions
Marginalizing over the mth node
mm=Sxi\xm P(x | Q )
x set of random variables
Q set of parameters
mmn=Sxi\{xm xn} P(x | Q )
xi={0,1} binary
Variational presentation of the
log-partition function
A(Q)=ln Sxexp[QT f(x)]
It is a convex function
(A*)*= A
A*(m)=supqR|I|{mTq- A(Q)}
A(Q)=supmM{mTq- A*(m)}
What set is this???
Dual parameters = marginals
m=Eq[f(x)]
For discrete random variables, M, is the
marginal ploytope defined by
M:={m R|I| | p(·) s.t. Sxif(x)p(x)=m}
CV Approximations: pseudo marginals, m
Mean field
Factorizing the joint probability distribution
P(x|m)=Pi P(xi |mi)= Pi [mi xi (1- mi )1-xi ]
Only one Lagrange multiplier for each node
but found by iterative algorithm – the numeric results
might not be in the approx. M
The objective function is not concave
Pseudomarginals set is convex lies within M
Mean field (cont.)
P(x|m)=Pi P(xi |mi)= Pi [mi xi (1- mi )1-xi ]
Appr. the canonical parm. + Padding with zeros
A*(m)=supqR|I|{mTq- SAi(qi)}
For pairwise and single nodes iter. qi = qi + qijmi/2
A(Q)=supmM{mTq- A*(m)}
A(Q)=supmM{Smiqi+Smimjqij - Smilnmi - S(1-mi)ln(1-mi)}
The objective function is not concave
Pseudomarginals set is convex lies within M
Gibbs Sampling
o Local updates according to the conditional
probability,
p(xi=1)= xN(i)p(xN(i))(SjN(i)qijxj)
(y)=exp(y)/[1+exp(y)]
o The measure converges to the Gibbs distribution –
the exponential form
o All moments are calculated using samples from the
equilibrium
Gibbs Sampling – dual space
pt+1(xi=1)=(1-1/N)pt(xi=1) +
t(x
1/ 
p
N xN(i)
N(i))(SjN(i)qijxj)
A set of 2N fixed point equations yields exact
relations between marginals
p(xi=1)=xN(i)p(xN(i))(SjN(i)qijxj)
p(xi=1,xj=1)=…
p(xi=1,xj=1,xk=1)=…
Gibbs Sampling and Bethe app.
p(xi=1) =xN(i)p(xN(i))(SjN(i)qijxj)
p(xN(i))= ~xiPjN(i)p(xixj)/p(xi)|N(i)|-1
mi=xN(i) xi PjN(I)p(xixj)/p(xi)|N(i)|-1 (SjN(i)qijxj)
mij=f(mi, mij’qij’) j’ stands for all neighbors of i and j.
Gibbs Sampling and the
Factorized Neighbors Algorithm
p(xi=1) =xN(i)p(xN(i))(SjN(i)qijxj)
p(xN(i))= ~PjN(i)p(xj)
The FNA:
mi= xN(i)PjN(i)p(xi)(SjN(i)qijxj)
The F. N. A.
o The approximation is less restricted than MF
o The algorithm is not exact on trees
o The approximation is more restricted than Bethe
Some comparisons for graphs with N nodes M edges and up to
n neighbors.
Time complexity:
MF: O(N) Bethe: O(M) FNA: O(Nexp(n))
Space complexity:
MF: O(N) Bethe: O(M) FNA: O(N)
The F. N. A. results on a grid
Pseudomarginals Vs. Exact
Errors i=(mi- mi)2/2
MF
FNA
BP
11
210-6
(210-6)
210-8
(310-8)
510-9
(910-9)
22
0.0054
(0.0031)
410-6
(610-6)
110-7
(110-7)
 periodic
0.2430
(0.0990)
0.2253
(0.0939)
0.0003
(0.0006)
boundary
conditions
Directed Gibbs sampling and
the parents factored app.
o As soon as a node is chosen all its
descendents are updated
o The local updates are according to the
parents’ current state.
o Factorized parents assumption
p(x(i))=Pj(i) p(xj)
p(xi=1)=x(i)Pj(i) p(xj)(Sj(i)qijxj)
Directed Gibbs Sampling –
dual space
pt+1(xi=1)=(1-i)pt(xi=1) + x(i)[1/Npt(x(i))+
(1/N-i) pt+1(x (i))](Sj(i)qijxj)
p(xi=1)=x(i)p(x(i))(Sj(i)qijxj)
p(xi=1,xj=1)=x(i)\j, x(j)p(x(i)\j, x(j))
(Sk(j)qjkxk) (Sk(i)\jqikxk+ qij)
Back to the CVM approach
P(x|m)=Pi P(xi,x(i)|mi, (i))/…
Padding with zeros to a higher space
A*(m)=entropy of some approx. canonical set
For pairwise and single nodes iterations:
qi = qi qij = qij qi, (i) = 0
A(Q)=supmM{mTq- A*(m)}
The objective function is concave
Pseudomarginals set is not necessarily within M
Directed lattice
Numerical results parents fact.
Evidence: x17=x18=x19=1
FPA makes use of the exact results in the evidence-free graph