Factor Graphs
Download
Report
Transcript Factor Graphs
Christopher M. Bishop,
Pattern Recognition and Machine Learning
1
Outline
Introduction
Directed Graphs
Undirected Graphs
Factor Graphs
Summary
2
Outline
Introduction
Directed Graphs
Undirected Graphs
Factor Graphs
Summary
3
Introduction
A graph consists of nodes (vertices) that are
connected by edges (links, arcs)
They provide a simple and clear way to
visualize the probabilistic model
Complex computations can be expressed in
terms of graphical manipulations
4
Probabilistic Graphical Models
There are two models: directed and
undirected graphical models
Each node represents a random variable
and the edges represent probabilistic
relationships between these variables
Directed
Undirected
5
Outline
Introduction
Directed Graphs
Undirected Graphs
Factor Graphs
Summary
6
Directed Graphical Models
An example: p ( a , b , c ) p ( c | a , b ) p ( a , b )
p (c | a , b ) p (b | a ) p ( a )
a
b
c
Definition: for a graph with K nodes, the joint
distribution is given by
K
p ( x1 ,
, xK )
p ( x k | pa k )
k 1
where pa k denotes the set of parents of x k
7
An Example
x1
x2
x4
x6
p ( x1 ,
x3
x5
x7
, x 7 ) p ( x1 ) p ( x 2 ) p ( x 3 ) p ( x 4 | x1 , x 2 , x 3 )
p ( x 5 | x1 , x 3 ) p ( x 6 | x 4 ) p ( x 7 | x 4 , x 5 )
8
Conditional Independence (1)
a
is conditionally independent of b given c :
p (a | b, c) p (a | c)
p ( a , b | c ) p ( a | b , c ) p (b | c ) p ( a | c ) p (b | c )
A shorthand notation:
There are three types of conditional
independencies for the directed graphs
9
Conditional Independence (2)
tail-to-tail
a
b
p ( a , b , c ) p ( a | c ) p (b | c ) p ( c )
p (a, b)
c
blocked
c
c
p ( a | c ) p (b | c ) p ( c )
a
b
p (a, b | c)
p (a, b, c)
p (c )
p ( a | c ) p (b | c )
10
Conditional Independence (2)
a
c
b
head-to-tail
b
a
c
head-to-head
b
a
c
dependence
Definition: d-separation is the notion of
being separated on a directed graph
11
D-separation: an example
f
a
e
b
c
12
Application: an Example
Hidden Markov model:
13
Outline
Introduction
Directed Graphs
Undirected Graphs
Factor Graphs
Summary
14
Undirected Graphical Models
C
B
A
Nodes of set A and B are separated by the
third set C A and B are conditionally
independent,
p ( a1 , b1 | c1 , c 2 ) p ( a1 | c1 , c 2 ) p ( b1 | c1 , c 2 )
15
Conditional Independence
C1
H1
H2
C2
The computers can infect each other via the
hubs and the hubs can infect each other via
the computers
16
Cliques
Definition: a subset of the nodes in a clique
is fully connected
Maximal cliques
x1
x2
x3
x4
We can define the factors in decomposition
of the joint distribution as functions of the
variable in the clique
17
Undirected Factorization
Consider factorizations of the form:
p(x )
1
Z
C
(x C )
C
where C (x C ) is a non-negative potential
function of a maximal clique
An example:
x
x
p ( x1 ,
1
2
x3
x4
, x 4 ) ( x1 , x 2 , x 3 ) ( x 2 , x 3 , x 4 )
18
An Example
Markov random field:
19
Directed versus Undirected (1)
x1
x3
x1
x2
moralization
x3
x2
moral graph
x4
p ( x1 ,
x4
, x 4 ) p ( x1 ) p ( x 2 ) p ( x 3 ) p ( x 4 | x1 , x 2 , x 3 )
( x1 , x 2 , x 3 , x 4 )
We have to discard some conditional
independence properties to complete this
transfer
20
Directed versus Undirected (2)
D
U
P
P: the set of all distributions over a given set
of variables
21
Outline
Introduction
Directed Graphs
Undirected Graphs
Factor Graphs
Summary
22
Factor Graphs (1)
A factor graph is a more general graph
It allows us to be more explicit about the
details of the factorization
An example:
x2
x1
x3
Variable node
Factor node
fa
fb
fc
fd
p ( x1 , x 2 , x 3 ) f a ( x1 , x 2 ) f b ( x1 , x 2 ) f c ( x 2 , x 3 ) f d ( x 3 )
23
Factor Graphs (2)
Definition: given a factor graph, the joint
probability distribution is given by
p(x )
fs (x s )
s
where the x s denotes a subset of the
variables that connect to the factor f s
Each factor f s is a function of a
corresponding set of variables x s
24
Factor Graphs (3)
( x1 , x 2 , x 3 )
f a ( x1f, (xx2 1, ,xx32), fxb3()x1 , x 2 )
25
Factor Graphs (4)
f a ( x1 ) pf ((xx11),, xf2b,(xx32)) pp((xx1 2) ),p (fxc 2( )x1p,(xx23, |xx3 1), x 2 p) ( x 3 | x1 , x 2 )
Directed and undirected graphs are special
cases of factor graphs
26
Sum-Product Algorithm (1)
Goal:
Obtain a efficient, exact inference algorithm for
finding marginals
Allow computations to be shared efficiently
By definition, the marginal is
p(x)
p (x )
x\x
27
Sum-Product Algorithm (2)
p(x )
Fs ( x , X s )
s ne ( x )
where
n e ( x ) : the factor nodes are neighbors of x
X s : all variables in the subtree
Fs ( x , X s ) : the product of all the factors in the
group associated with factor f s
28
Sum-Product Algorithm (3)
p( x)
p(x )
x\x
f
x\x
Fs ( x , X s )
s ne ( x )
Fs ( x , X s )
s ne ( x ) X s
s ne ( x )
f
s
x
( x)
can be view as messages from the
factor node fs to the variable node x
Fs ( x , X s ) which is a factor sub-graph can
itself be factorized
s
x
( x)
Fs ( x , X s ) f s ( x , x1 ,
, x M ) G1 ( x1 , X s1 )
G M ( x M , X sM )
29
Sum-Product Algorithm (4)
G1(x1,Xs1)
x1
x
x2
fs
xM
Fs ( x , X s ) f s ( x , x1 ,
, x M ) G1 ( x1 , X s1 )
G M ( x M , X sM )
30
Sum-Product Algorithm (5)
f
s
x
( x)
x1
xM
x1
xM
f s ( x , x1 ,
, x M ) G m ( x m , X sm )
m ne ( f s ) \ x X sm
f s ( x , x1 ,
, xM )
m ne ( f s ) \ x
x
m
fs
( xm )
31
Sum-Product Algorithm (6)
G m ( x m , X sm )
Fl ( x m , X m l )
l n e ( x m ) \ f s
x
m
fs
( xm )
l ne ( x m ) \ f s
Fl ( x m , X m l )
X ml
l ne ( x m ) \ f s
f
l
xm
( xm )
32
Sum-Product Algorithm (7)
Messages:
Variable node factor node: take the product of
the in coming messages along all of the other link
Factor node variable node: take the product of
the in coming messages along all of the other link
and multiply by the factor
33
Sum-Product Algorithm (8)
The sum-product algorithm can be viewed
purely in terms of messages sent out by
factor nodes to other factor nodes
34
Sum-Product Algorithm – an Example
root
x1
x2
x3
fb
fa
fc
x4
35
Max-Sum Algorithm (1)
Find a setting of the variables that has the
largest probability
x
m ax
arg m ax p ( x )
x
Find the value of that probability
p(x
m ax
) m ax p ( x )
x
m ax p ( x ) m ax
x
x1
m ax p ( x )
xM
36
Max-Sum Algorithm (2)
Compare this with the marginal:
p(x
m ax
) m ax p ( x )
x
m ax m ax p ( x )
x
x\x
p(x)
p (x )
x\x
That is similar to the sum-product algorithm
except that the summations are replaced by
maximization
37
Max-Sum Algorithm (3)
The max-product algorithm:
f
s
x
( x)
x1
xM
f x ( x ) m ax
x1
f s ( x , x1 ,
x
mne ( f s ) \ x
m ax f ( x , x1 ,
xM
x f ( x )
, xM )
, xM )
mne ( f ) \ x
l n e ( x ) \ f
f
l
x
m
fs
x
m
( xm )
f
( xm )
( x)
38
Max-Sum Algorithm (4)
It is convenient to work with the logarithm of
the joint distribution
The max-sum algorithm:
f x ( x ) m ax ln f ( x , x1 ,
x1 , , x M
x f ( x)
, xM )
l ne ( x ) \ f
m ne ( f ) \ x
f
l
x
x
m
f
( xm )
( x)
39
Max-Sum Algorithm (5)
We can find the maximum by propagating
messages from leaves to a root node
p
m ax
m ax f s x ( x )
x
s ne ( x )
Now we want to find the configuration of the
variables for which the joint distribution
attains this maximum value
40
Max-Sum Algorithm (6)
An example:
x2
x1
x3
f2,3
f1,2
x
m ax
N
xN-1
xN
fN-1,N
arg m ax f N 1 , N x N ( x N )
x
N
( x n ) arg m ax ln f n 1, n ( x n 1, x n ) x
x n 1
n 1 f n 1 , n
( xn )
m ax
N
Once we know x , we can propagate a
message back down the chain using
x n 1 ( x n
m ax
m ax
)
41
Max-Sum Algorithm (7)
It is known as back-tracking
This can be extended to a general treestructure factor graph
42
Examples
A Markov chain:
A hidden Markov model:
43
Outline
Introduction
Directed Graphs
Undirected Graphs
Factor Graphs
Summary
44
Summary
The author introduces three types of
probabilistic graphs
Graphical models are composed of
probability theory and graphical theory
The concept is to factorize a complicated
system into some simple components
45