Transcript Slides
Sampling Bayesian Networks
ICS 295
2008
1
Algorithm Tree
4
Sampling Fundamentals
Given a set of variables X = {X1, X2, …, Xn}, a joint probability distribution P(X), and some function g(X), we can compute the expected value of g(X):

$E[g] = \int_{D(X)} g(x)\,P(x)\,dx$

$E[g] = \sum_{x \in D(X)} g(x)\,p(x)$
5
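As a concrete illustration of the discrete-case estimator above, here is a minimal sketch in Python; the domain, distribution p, and function g are hypothetical toy choices, not from the slides.

```python
import random

# Hypothetical discrete distribution P(X) over a small domain D(X),
# used only to illustrate the estimator; any sampler for P would do.
domain = [0, 1, 2]
p = {0: 0.5, 1: 0.3, 2: 0.2}

def g(x):
    return x * x  # an arbitrary function g(X)

# Exact expectation: E[g] = sum over D(X) of g(x) * p(x)
exact = sum(g(x) * p[x] for x in domain)

# Monte Carlo estimate: draw iid samples S^1..S^T from P, average g(S^t)
random.seed(0)
T = 100_000
samples = random.choices(domain, weights=[p[x] for x in domain], k=T)
estimate = sum(g(s) for s in samples) / T

print(exact)     # 1.1
print(estimate)  # close to 1.1, by the Strong Law of Large Numbers
```

With T = 100,000 iid samples the estimate lands close to the exact value, which is what the Strong Law of Large Numbers on the next slide guarantees.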
Sampling From P(X)
A sample $S^t$ is an instantiation:

$S^t = \{x_1^t, x_2^t, \ldots, x_n^t\}$

Given independent, identically distributed (iid) samples S^1, S^2, …, S^T from P(X), it follows from the Strong Law of Large Numbers that:

$\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t) \rightarrow E[g]$
6
Sampling Challenge
It is hard to generate samples directly from P(X)
Trade-offs:
Generate independent samples from a proposal Q(X)
Forward Sampling, Likelihood Weighting, IS
Try to find Q(X) close to P(X)
Generate dependent samples forming a Markov chain from P'(X) that converges to P(X)
Metropolis, Metropolis-Hastings, Gibbs
Try to reduce dependence between samples
7
Markov Chain
A sequence of random values x^0, x^1, …, defined on a finite state space D(X), is called a Markov chain if it satisfies the Markov property:

$P(x^{t+1} = y \mid x^t = x, \ldots, x^0 = z) = P(x^{t+1} = y \mid x^t = x)$

If P(x^{t+1} = y | x^t) does not change with t (time-homogeneous), then it is often expressed as a transition function A(x,y), where:

$\sum_{y} A(x, y) = 1$

Liu, Ch. 12, p. 245
8
Markov Chain Monte Carlo
First, define a transition probability P(x^{t+1} = y | x^t)
Pick an initial state x^0; it is usually not important because it becomes "forgotten"
Generate samples x^1, x^2, …, sampling each next value from P(X | x^t):

x^0 → x^1 → … → x^t → x^{t+1} → …

If we choose a proper P(x^{t+1} | x^t), we can guarantee that the distribution represented by the samples x^0, x^1, … converges to P(X)
9
Markov Chain Properties
Irreducibility
Periodicity
Recurrence
Reversibility
Ergodicity
Stationary Distribution
10
Irreducible
A state x is said to be irreducible if, under the transition rule, one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps.
If one state is irreducible, then all the states must be irreducible.
Liu, Ch. 12, pp. 249, Def. 12.1.1
11
Aperiodic
A state x is aperiodic if the greatest common divisor of {n : A^n(x,x) > 0} is 1.
If state x is aperiodic and the chain is
irreducible, then every state must be
aperiodic.
Liu, Ch. 12, pp. 249-250, Def. 12.1.1
12
Recurrence
A state x is recurrent if the chain returns to x with probability 1.
State x is recurrent if and only if:

$\sum_{n \ge 0} p_{xx}^{(n)} = \infty$

Let M(x) be the expected number of steps to return to state x.
State x is positive recurrent if M(x) is finite.
The recurrent states in a finite-state chain are positive recurrent.
13
Ergodicity
A state x is ergodic if it is aperiodic
and positive recurrent.
If all states in a Markov chain are
ergodic then the chain is ergodic.
14
Reversibility
Detailed balance condition:

$P(x^{t+1} = j \mid x^t = i)\,P(x^t = i) = P(x^{t+1} = i \mid x^t = j)\,P(x^t = j)$

A Markov chain is reversible if there is a $\pi$ such that:

$\pi_i\, p_{ij} = \pi_j\, p_{ji}$

For a reversible Markov chain, $\pi$ is always a stationary distribution.
15
Stationary Distribution
If the Markov chain is time-homogeneous, then the vector $\pi(X)$ is a stationary distribution (aka invariant or equilibrium distribution, aka "fixed point") if its entries sum up to 1 and satisfy:

$\pi(x) = \sum_{y \in D(X)} \pi(y)\, A(y, x)$

An irreducible chain has a stationary distribution if and only if all of its states are positive recurrent. The distribution is unique.
16
Stationary Distribution In Finite State Space
A stationary distribution always exists but may not be unique.
If a finite-state Markov chain is irreducible and aperiodic, the stationary distribution is guaranteed to be unique, and A^(n) = P(x^n = y | x^0) converges to a rank-one matrix in which each row is the stationary distribution $\pi$.
Thus, the initial state x^0 is not important for convergence: it gets forgotten, and we start sampling from the target distribution.
However, it is important how long it takes to forget it!
17
Convergence Theorem
Given a finite-state Markov chain whose transition function is irreducible and aperiodic, A^n(x^0, y) converges to its invariant distribution $\pi(y)$ geometrically in variation distance; that is, there exist 0 < r < 1 and c > 0 s.t.:

$\| A^n(x, \cdot) - \pi \|_{var} \le c\, r^n$
18
Eigen-Value Condition
Convergence to the stationary distribution is driven by the eigenvalues of the matrix A(x,y).
"The chain will converge to its unique invariant distribution if and only if matrix A's second largest eigenvalue in modulus is strictly less than 1."
Liu, Ch. 12, p. 249
Many proofs of convergence are centered around analyzing the second eigenvalue.
19
Convergence In Finite State Space
Assume a finite-state Markov chain is irreducible and aperiodic.
The initial state x^0 is not important for convergence: it gets forgotten, and we start sampling from the target distribution.
However, it is important how long it takes to forget it, known as the burn-in time.
Since the first k states are not drawn exactly from $\pi$, they are often thrown away. Open question: how big a k?
20
Sampling in BN
Same idea: generate a set of T samples
Estimate P(Xi|E) from the samples
Challenge: X is a vector and P(X) is a huge distribution represented by the BN
Need to know:
How to generate a new sample ?
How many samples T do we need ?
How to estimate P(E=e) and P(Xi|e) ?
21
Sampling Algorithms
Forward Sampling
Gibbs Sampling (MCMC)
Blocking
Rao-Blackwellised
Likelihood Weighting
Importance Sampling
Sequential Monte-Carlo (Particle
Filtering) in Dynamic Bayesian Networks
23
Gibbs Sampling
Markov Chain Monte Carlo method
(Gelfand and Smith, 1990, Smith and Roberts, 1993, Tierney, 1994)
The transition probability equals the conditional distribution.
Example: $\pi(X,Y)$; A(x^{t+1} | y^t) = P(x|y), A(y^{t+1} | x^t) = P(y|x):

x^0 → y^0 → x^1 → y^1 → …
24
Gibbs Sampling for BN
Samples are dependent, form Markov Chain
Sample from P’(X|e) which converges to P(X|e)
Guaranteed to converge when all P > 0
Methods to improve convergence:
Blocking
Rao-Blackwellised
Error Bounds
Lag-t autocovariance
Multiple Chains, Chebyshev’s Inequality
25
Gibbs Sampling (Pearl, 1988)
A sample x^t, t ∈ [1, 2, …], is an instantiation of all variables in the network:

$x^t = \{X_1 = x_1^t, X_2 = x_2^t, \ldots, X_N = x_N^t\}$

Sampling process:
Fix values of observed variables e
Instantiate node values in sample x^0 at random
Generate samples x^1, x^2, …, x^T from P(x|e)
Compute posteriors from samples
26
Ordered Gibbs Sampler
Generate sample x^{t+1} from x^t, processing all variables in some order:

$X_1 = x_1^{t+1} \leftarrow P(x_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)$
$X_2 = x_2^{t+1} \leftarrow P(x_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)$
$\ldots$
$X_N = x_N^{t+1} \leftarrow P(x_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)$

In short, for i = 1 to N:
$X_i = x_i^{t+1}$ sampled from $P(x_i \mid x^t \setminus x_i, e)$
27
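The update rule above can be sketched as a runnable toy, using a hypothetical two-variable binary joint in place of a Bayesian network (whose full conditionals would instead come from the CPTs):

```python
import random

# Minimal ordered-Gibbs sketch on a hypothetical binary joint p(x1, x2);
# the table below is illustrative only.
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def conditional(i, other_value):
    """Full conditional P(x_i | x_other), obtained by normalizing the joint."""
    if i == 0:
        weights = [joint[(v, other_value)] for v in (0, 1)]
    else:
        weights = [joint[(other_value, v)] for v in (0, 1)]
    z = sum(weights)
    return [w / z for w in weights]

random.seed(0)
x = [0, 0]               # arbitrary initial sample x^0
counts = {0: 0, 1: 0}
T = 50_000
for t in range(T):
    # Process all variables in some (here: fixed) order, each sampled
    # from its full conditional given the current values of the others.
    x[0] = random.choices((0, 1), weights=conditional(0, x[1]))[0]
    x[1] = random.choices((0, 1), weights=conditional(1, x[0]))[0]
    counts[x[0]] += 1

# Marginal P(x1 = 1) estimated from the chain; the exact value is 0.6.
print(counts[1] / T)
```

The chain's samples are dependent, but their empirical distribution converges to the joint, so the marginal estimate approaches 0.6.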
Gibbs Sampling (cont’d)
(Pearl, 1988)

Important: $P(x_i \mid x^t \setminus x_i) = P(x_i \mid markov^t \setminus x_i)$:

$P(x_i \mid x^t \setminus x_i) = P(x_i \mid pa_i) \prod_{j \in ch_i} P(x_j \mid pa_j)$

Markov blanket:

$M(X_i) = pa_i \cup ch_i \cup \Big( \bigcup_{j \in ch_i} pa_j \Big)$

Given its Markov blanket (parents, children, and their parents), $X_i$ is independent of all other nodes.
28
Ordered Gibbs Sampling
Algorithm
Input: X, E
Output: T samples {x^t}
Fix evidence E
Generate samples from P(X | E):
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     X_i ← sample x_i^t from P(X_i | markov^t \ X_i)
29
Answering Queries
Query: P(xi | e) = ?
Method 1: count the number of samples where X_i = x_i:

$\hat P(X_i = x_i) = \frac{\#samples(X_i = x_i)}{T}$

Method 2: average probability (mixture estimator):

$\hat P(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i \mid markov^t \setminus X_i)$
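The two estimators can be compared on a small chain; the binary joint below is a hypothetical stand-in whose exact marginal is P(X1 = 1) = 0.6:

```python
import random

# Sketch comparing the histogram and mixture estimators on one Gibbs run
# over a hypothetical binary joint (illustrative only).
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def cond_x1(x2):
    w = [joint[(v, x2)] for v in (0, 1)]
    z = sum(w)
    return [wi / z for wi in w]

def cond_x2(x1):
    w = [joint[(x1, v)] for v in (0, 1)]
    z = sum(w)
    return [wi / z for wi in w]

random.seed(1)
x1, x2 = 0, 0
T = 50_000
count = 0        # Method 1: count samples with X1 = 1
mixture = 0.0    # Method 2: average the conditional P(X1 = 1 | current x2)
for t in range(T):
    p1 = cond_x1(x2)
    x1 = random.choices((0, 1), weights=p1)[0]
    x2 = random.choices((0, 1), weights=cond_x2(x1))[0]
    count += (x1 == 1)
    mixture += p1[1]

print(count / T)    # histogram estimator of P(X1 = 1)
print(mixture / T)  # mixture estimator, typically lower variance
```

Both estimators converge to the same value; the mixture estimator averages probabilities rather than indicator counts, which is the variance-reduction result cited later (Gelfand and Smith, 1990).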
Gibbs Sampling Example - BN
[Bayesian network over X1, …, X9]
X = {X1, X2, …, X9}
E = {X9}
31
Gibbs Sampling Example - BN
[Bayesian network over X1, …, X9]
Initialize at random:
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0,
X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
32
Gibbs Sampling Example - BN
[Bayesian network over X1, …, X9]
X1 ← P(X1 | x2^0, …, x8^0, x9)
E = {X9}
33
Gibbs Sampling Example - BN
[Bayesian network over X1, …, X9]
X2 ← P(X2 | x1^1, x3^0, …, x8^0, x9)
E = {X9}
34
Gibbs Sampling: Illustration
35
Gibbs Sampling Example – Init
Initialize nodes with random values:
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0,
X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
Initialize running sums:
SUM1 = SUM2 = SUM3 = SUM4 = SUM5 = SUM6 = SUM7 = SUM8 = 0
36
Gibbs Sampling Example – Step 1
Generate Sample 1:
compute SUM1 += P(x1 | x2^0, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9)
select and assign new value X1 = x1^1
compute SUM2 += P(x2 | x1^1, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9)
select and assign new value X2 = x2^1
compute SUM3 += P(x3 | x1^1, x2^1, x4^0, x5^0, x6^0, x7^0, x8^0, x9)
select and assign new value X3 = x3^1
…
At the end, we have the new sample:
S^1 = {x1^1, x2^1, x3^1, …, x8^1, x9}
37
Gibbs Sampling Example – Step 2
Generate Sample 2:
compute P(x1 | x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
select and assign new value X1 = x1^2
update SUM1 += P(x1 | x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
compute P(x2 | x1^2, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
select and assign new value X2 = x2^2
update SUM2 += P(x2 | x1^2, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
compute P(x3 | x1^2, x2^2, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
select and assign new value X3 = x3^2
update SUM3 += P(x3 | x1^2, x2^2, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
…
New sample: S^2 = {x1^2, x2^2, x3^2, …, x8^2, x9}
38
Gibbs Sampling Example –
Answering Queries
P(x1|x9) = SUM1 /2
P(x2|x9) = SUM2 /2
P(x3|x9) = SUM3 /2
P(x4|x9) = SUM4 /2
P(x5|x9) = SUM5 /2
P(x6|x9) = SUM6 /2
P(x7|x9) = SUM7 /2
P(x8|x9) = SUM8 /2
39
Gibbs Convergence
Stationary distribution = target sampling
distribution
MCMC converges to the stationary distribution
if network is ergodic
The chain is ergodic if all probabilities are positive:

$S_i \xrightarrow{\,p_{ij}\,} S_j, \qquad p_{ij} > 0$

If ∃ i, j such that p_ij = 0, then we may not be able to explore the full sampling space!
40
Gibbs Sampling: Burn-In
We want to sample from P(X | E)
But… the starting point is random
Solution: throw away the first K samples
Known as "burn-in"
What is K? Hard to tell. Use intuition.
Alternative: sample initial values from an approximation of P(x|e) (for example, run IBP first)
41
Gibbs Sampling: Performance
+Advantage: guaranteed to converge to P(X|E)
-Disadvantage: convergence may be slow
Problems:
Samples are dependent !
Statistical variance is too big in high-dimensional problems
42
Gibbs: Speeding Convergence
Objectives:
1. Reduce dependence between samples (autocorrelation)
   Skip samples
   Randomize variable sampling order
2. Reduce variance
   Blocking Gibbs Sampling
   Rao-Blackwellisation
43
Skipping Samples
Pick only every k-th sample (Geyer, 1992)
Can reduce dependence between samples!
Increases variance! Wastes samples!
44
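Thinning itself is a one-liner; the chain below is a hypothetical placeholder for a run of Gibbs samples:

```python
# Thinning sketch: keep every k-th sample of a chain (Geyer, 1992).
# `chain` is a hypothetical stand-in for samples x^1..x^T.
chain = list(range(1, 101))   # pretend these are 100 dependent samples
k = 10
thinned = chain[k - 1::k]     # samples k, 2k, 3k, ...
print(thinned)                # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

The thinned subsequence is less autocorrelated, but the discarded samples still carried information, which is why thinning increases the variance of the estimator.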
Randomized Variable Order
Random Scan Gibbs Sampler
Pick each next variable X_i for update at random with probability p_i, where $\sum_i p_i = 1$.
(In the simplest case, the p_i are distributed uniformly.)
In some instances, reduces variance
(MacEachern, Peruggia, 1999
“Subsampling the Gibbs Sampler: Variance Reduction”)
45
Blocking
Sample several variables together, as a block.
Example: Given three variables X, Y, Z, with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:

$x^{t+1} \leftarrow P(x \mid y^t, z^t) = P(x \mid w^t)$
$(y^{t+1}, z^{t+1}) = w^{t+1} \leftarrow P(y, z \mid x^{t+1}) = P(w \mid x^{t+1})$

+ Can improve convergence greatly when two variables are strongly correlated!
- Domain of the block variable grows exponentially with the number of variables in a block!
46
Blocking Gibbs Sampling
Jensen, Kong, Kjaerulff, 1993
“Blocking Gibbs Sampling Very Large
Probabilistic Expert Systems”
Select a set of subsets E1, E2, E3, …, Ek s.t.:
$E_i \subseteq X$
$\bigcup_i E_i = X$
$A_i = X \setminus E_i$
Sample from $P(E_i \mid A_i)$
47
Rao-Blackwellisation
Do not sample all variables!
Sample a subset!
Example: Given three variables X, Y, Z, sample only X and Y, summing out Z. Given sample (x^t, y^t), compute the next sample:

$x^{t+1} \leftarrow P(x \mid y^t)$
$y^{t+1} \leftarrow P(y \mid x^{t+1})$
48
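The X/Y example above can be made runnable: given a hypothetical unnormalized joint w(x, y, z) over three binary variables (illustrative only), Z is summed out analytically and only X and Y are sampled.

```python
import random

# Rao-Blackwellised Gibbs sketch: Z is marginalized out of a hypothetical
# unnormalized joint w(x, y, z), so the chain runs over (X, Y) only.
w = {(0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 1,
     (1, 0, 0): 2, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 4}

def p_x_given_y(y):
    # P(x | y) with Z summed out: sum_z w[x, y, z], then normalize
    weights = [w[(0, y, 0)] + w[(0, y, 1)], w[(1, y, 0)] + w[(1, y, 1)]]
    z = sum(weights)
    return [wi / z for wi in weights]

def p_y_given_x(x):
    weights = [w[(x, 0, 0)] + w[(x, 0, 1)], w[(x, 1, 0)] + w[(x, 1, 1)]]
    z = sum(weights)
    return [wi / z for wi in weights]

random.seed(0)
x, y = 0, 0
T = 50_000
acc = 0.0
for t in range(T):
    px = p_x_given_y(y)                                    # x^{t+1} ~ P(x | y^t)
    x = random.choices((0, 1), weights=px)[0]
    y = random.choices((0, 1), weights=p_y_given_x(x))[0]  # y^{t+1} ~ P(y | x^{t+1})
    acc += px[1]       # mixture estimate of P(X = 1)

print(acc / T)         # approaches 8/15 = 0.5333...
```

For this table the exact marginal is P(X = 1) = 8/15, and the estimate converges to it while only ever sampling two of the three variables.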
Rao-Blackwell Theorem
Bottom line: reducing the number of variables in a sample reduces variance!
49
Blocking vs. Rao-Blackwellisation
Consider three variables X, Y, Z:
(1) Standard Gibbs: P(x|y,z), P(y|x,z), P(z|x,y)
(2) Blocking: P(x|y,z), P(y,z|x)
(3) Rao-Blackwellised: P(x|y), P(y|x)

Var3 < Var2 < Var1

[Liu, Wong, Kong, 1994, "Covariance structure of the Gibbs sampler…"]
50
Rao-Blackwellised Gibbs: Cutset Sampling
Select C ⊆ X (possibly a cycle-cutset), |C| = m
Fix evidence E
Initialize nodes with random values: for i = 1 to m, C_i = c_i^0
For t = 1 to T, generate samples:
  For i = 1 to m:
    $C_i = c_i^{t+1} \leftarrow P(c_i \mid c_1^{t+1}, \ldots, c_{i-1}^{t+1}, c_{i+1}^t, \ldots, c_m^t, e)$
51
Cutset Sampling
Select a subset C = {C1, …, CK} ⊆ X
A sample c^t, t ∈ [1, 2, …], is an instantiation of C:

$c^t = \{C_1 = c_1^t, C_2 = c_2^t, \ldots, C_K = c_K^t\}$

Sampling process:
Fix values of observed variables e
Generate sample c^0 at random
Generate samples c^1, c^2, …, c^T from P(c|e)
Compute posteriors from samples
52
Cutset Sampling
Generating Samples
Generate sample c^{t+1} from c^t:

$C_1 = c_1^{t+1} \leftarrow P(c_1 \mid c_2^t, c_3^t, \ldots, c_K^t, e)$
$C_2 = c_2^{t+1} \leftarrow P(c_2 \mid c_1^{t+1}, c_3^t, \ldots, c_K^t, e)$
$\ldots$
$C_K = c_K^{t+1} \leftarrow P(c_K \mid c_1^{t+1}, c_2^{t+1}, \ldots, c_{K-1}^{t+1}, e)$

In short, for i = 1 to K:
$C_i = c_i^{t+1}$ sampled from $P(c_i \mid c^t \setminus c_i, e)$
53
Rao-Blackwellised Gibbs: Cutset Sampling
How to compute P(c_i | c^t \ c_i, e)?
Compute the joint P(c_i, c^t \ c_i, e) for each c_i ∈ D(C_i)
Then normalize:

$P(c_i \mid c^t \setminus c_i, e) = \alpha \, P(c_i, c^t \setminus c_i, e)$

Computation efficiency depends on the choice of C
54
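The normalization step is straightforward; the joint values below are hypothetical stand-ins for the exact-inference (e.g. BTE) outputs:

```python
# Normalization sketch: given unnormalized joint values P(ci, c^t \ ci, e)
# for each ci in D(Ci) (hypothetical numbers standing in for exact-inference
# outputs), the conditional is obtained by dividing by their sum.
joint_values = {"c1": 0.012, "c2": 0.028, "c3": 0.010}  # illustrative only
alpha = 1.0 / sum(joint_values.values())
conditional = {ci: alpha * v for ci, v in joint_values.items()}
print(conditional["c2"])  # 0.56
```

The normalization constant alpha is just the reciprocal of the sum over D(Ci), so the conditional values sum to 1 by construction.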
Rao-Blackwellised Gibbs: Cutset Sampling
How to choose C?
Special case: C is a cycle-cutset, O(N)
General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed
Pick C wisely so as to minimize w → the notion of a w-cutset
55
w-cutset Sampling
C = w-cutset of the network: a set of nodes such that, when C and E are instantiated, the adjusted induced width of the network is w
Complexity of exact inference: bounded by w!
The cycle-cutset is a special case
56
Cutset Sampling-Answering Queries
Query: c_i ∈ C, P(c_i|e) = ? Same as Gibbs:
Special case of w-cutset

$\hat P(c_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)$

computed while generating sample t

Query: P(x_i|e) = ?

$\hat P(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)$

computed after generating sample t
57
Cutset Sampling Example
[Bayesian network over X1, …, X9]
c^0 = {x2^0, x5^0}
E = {x9}
58
Cutset Sampling Example
Sample a new value for X2, with c^0 = {x2^0, x5^0}:

[Bayesian network over X1, …, X9]

Compute BTE(x2', x5^0, x9) and BTE(x2'', x5^0, x9) for the two values of X2; then:

$x_2^1 \leftarrow P(x_2 \mid x_5^0, x_9) = \frac{BTE(x_2, x_5^0, x_9)}{BTE(x_2', x_5^0, x_9) + BTE(x_2'', x_5^0, x_9)}$
59
Cutset Sampling Example
Sample a new value for X5, given x_2^1 ← P(x2 | x5^0, x9):

[Bayesian network over X1, …, X9]

Compute BTE(x2^1, x5', x9) and BTE(x2^1, x5'', x9); then:

$x_5^1 \leftarrow P(x_5 \mid x_2^1, x_9) = \frac{BTE(x_2^1, x_5, x_9)}{BTE(x_2^1, x_5', x_9) + BTE(x_2^1, x_5'', x_9)}$

c^1 = {x2^1, x5^1}
60
Cutset Sampling Example
Query P(x2|e) for sampled node X2:

[Bayesian network over X1, …, X9]

Sample 1: x_2^1 ← P(x2 | x5^0, x9)
Sample 2: x_2^2 ← P(x2 | x5^1, x9)
Sample 3: x_2^3 ← P(x2 | x5^2, x9)

$\hat P(x_2 \mid x_9) = \frac{1}{3} \left[ P(x_2 \mid x_5^0, x_9) + P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9) \right]$
61
Cutset Sampling Example
Query P(x3|e) for non-sampled node X3:

[Bayesian network over X1, …, X9]

c^1 = {x2^1, x5^1} → P(x3 | x2^1, x5^1, x9)
c^2 = {x2^2, x5^2} → P(x3 | x2^2, x5^2, x9)
c^3 = {x2^3, x5^3} → P(x3 | x2^3, x5^3, x9)

$\hat P(x_3 \mid x_9) = \frac{1}{3} \left[ P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9) \right]$
62
Gibbs: Error Bounds
Objectives:
Estimate the needed number of samples T
Estimate the error
Methodology:
1 chain → use lag-k autocovariance → estimate T
M chains → standard sampling variance → estimate error
63
Gibbs: lag-k autocovariance
$P_i^t = P(x_i \mid x^t \setminus x_i)$

$\bar P = \hat P(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid x^t \setminus x_i)$

Lag-k autocovariance:

$\gamma(k) = \frac{1}{T} \sum_{t=1}^{T-k} (P_i^t - \bar P)(P_i^{t+k} - \bar P)$

$Var(\bar P) = \frac{1}{T} \left[ \gamma(0) + 2 \sum_{i=1}^{2\delta - 1} \gamma(i) \right]$
64
Gibbs: lag-k autocovariance
Estimate the Monte Carlo variance:

$Var(\bar P) = \frac{1}{T} \left[ \gamma(0) + 2 \sum_{i=1}^{2\delta - 1} \gamma(i) \right]$

Here, $\delta$ is the smallest positive integer satisfying:

$\hat\gamma(2\delta) + \hat\gamma(2\delta + 1) < 0$

Effective chain size:

$\hat T = \frac{\hat\gamma(0)}{Var(\bar P)}$

In the absence of autocovariance: $\hat T = T$
65
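These estimates can be sketched on a synthetic correlated sequence (an AR(1)-style stand-in for the conditionals P_i^t; the 0.8 coefficient is an arbitrary choice, and the truncation stops at the first lag pair whose autocovariance sum turns negative, in the style of Geyer's initial-sequence estimators):

```python
import random

# Effective-chain-size sketch on a synthetic dependent sequence.
random.seed(0)
T = 20_000
P = [0.0] * T
P[0] = 0.5
for t in range(1, T):
    # correlated sequence mimicking dependent Gibbs outputs
    P[t] = 0.8 * P[t - 1] + 0.2 * random.random()

mean = sum(P) / T

def gamma(k):
    # lag-k autocovariance: (1/T) * sum_t (P^t - mean)(P^{t+k} - mean)
    return sum((P[t] - mean) * (P[t + k] - mean) for t in range(T - k)) / T

# delta: first lag pair whose summed autocovariance goes negative
delta = 1
while gamma(2 * delta) + gamma(2 * delta + 1) > 0:
    delta += 1

# truncated variance estimate and effective chain size T-hat
var = (gamma(0) + 2 * sum(gamma(i) for i in range(1, 2 * delta))) / T
T_eff = gamma(0) / var
print(f"effective size {T_eff:.0f} out of T={T}")
```

Because the sequence is positively autocorrelated, the effective chain size comes out much smaller than T, which is exactly why dependent Gibbs samples are worth less than iid ones.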
Gibbs: Multiple Chains
Generate M chains of size K
Each chain produces an independent estimate $P_m$:

$P_m = \hat P(x_i \mid e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)$

Estimate P(x_i|e) as the average of the $P_m(x_i|e)$:

$\bar P = \frac{1}{M} \sum_{m=1}^{M} P_m$

Treat the $P_m$ as independent random variables.
66
Gibbs: Multiple Chains
The {P_m} are independent random variables; therefore:

$Var(\bar P) = \frac{S^2}{M}, \qquad S^2 = \frac{1}{M-1} \sum_{m=1}^{M} (P_m - \bar P)^2 = \frac{1}{M-1} \left[ \sum_{m=1}^{M} P_m^2 - M \bar P^2 \right]$

Confidence interval half-width:

$t_{\alpha/2,\, M-1} \sqrt{\frac{S^2}{M}}$
67
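A sketch with hypothetical per-chain estimates standing in for M independent Gibbs runs; the only non-obvious constant is the Student t quantile t_{0.025, 9} = 2.262 for M = 10 chains:

```python
import random

# Multiple-chains sketch: M chains each yield an estimate Pm; here the
# Pm are simulated as iid noisy estimates of a true value 0.6
# (illustrative stand-ins for real per-chain Gibbs estimators).
random.seed(0)
M = 10
Pm = [0.6 + random.gauss(0, 0.01) for _ in range(M)]

P_bar = sum(Pm) / M
# S^2 = (1/(M-1)) * sum (Pm - P_bar)^2  -- sampling variance across chains
S2 = sum((p - P_bar) ** 2 for p in Pm) / (M - 1)

# 95% confidence half-width: t_{alpha/2, M-1} * sqrt(S^2 / M),
# with t_{0.025, 9} = 2.262 for M = 10
half_width = 2.262 * (S2 / M) ** 0.5
print(P_bar, half_width)
```

The half-width shrinks like 1/sqrt(M), so doubling the number of chains tightens the error bound by about 30%.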
Geman&Geman1984
Geman, S. & Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pat. Anal. Mach. Intel. 6, 721-41.
Introduce Gibbs sampling;
Place the idea of Gibbs sampling in a general setting
in which the collection of variables is structured in a
graphical model and each variable has a
neighborhood corresponding to a local region of the
graphical structure. Geman and Geman use the Gibbs
distribution to define the joint distribution on this
structured set of variables.
68
Tanner&Wong 1987
Tanner and Wong (1987)
Data-augmentation
Convergence Results
69
Pearl1988
Pearl,1988. Probabilistic Reasoning in
Intelligent Systems, Morgan-Kaufmann.
In the case of Bayesian networks, the
neighborhoods correspond to the Markov
blanket of a variable and the joint
distribution is defined by the factorization
of the network.
70
Gelfand&Smith,1990
Gelfand, A.E. and Smith, A.F.M., 1990.
Sampling-based approaches to
calculating marginal densities. J.
Am.Statist. Assoc. 85, 398-409.
Show variance reduction in using mixture
estimator for posterior marginals.
71
Neal, 1992
R. M. Neal, 1992. Connectionist learning of belief networks, Artificial Intelligence, v. 56, pp. 71-118.
Stochastic simulation in Noisy-Or
networks.
72
CPCS54 Test Results
CPCS54, n=54, |C|=15, |E|=3
[Two plots comparing Cutset and Gibbs]
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 54, D(Xi) = 2, |C| = 15, |E| = 4
Exact Time = 30 sec using Cutset Conditioning
73
CPCS179 Test Results
CPCS179, n=179, |C|=8, |E|=35
[Two plots comparing Cutset and Gibbs]
MSE vs. #samples (left) and time (right)
Non-Ergodic (1 deterministic CPT entry)
|X| = 179, |C| = 8, 2 <= D(Xi) <= 4, |E| = 35
Exact Time = 122 sec using Loop-Cutset Conditioning
74
CPCS360b Test Results
CPCS360b, n=360, |C|=21, |E|=36
[Two plots comparing Cutset and Gibbs]
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36
Exact Time > 60 min using Cutset Conditioning
Exact Values obtained via Bucket Elimination
75
Random Networks
RANDOM, n=100, |C|=13, |E|=15-20
[Two plots comparing Cutset and Gibbs]
MSE vs. #samples (left) and time (right)
|X| = 100, D(Xi) = 2, |C| = 13, |E| = 15-20
Exact Time = 30 sec using Cutset Conditioning
76
Coding Networks
[Coding network fragment: inputs x1-x4, code bits u1-u4, parity bits p1-p4, outputs y1-y4]
Coding Networks, n=100, |C|=12-14
[Plot comparing IBP, Gibbs, and Cutset]
MSE vs. time
Non-Ergodic, |X| = 100, D(Xi) = 2, |C| = 13-16, |E| = 50
Sample Ergodic Subspace U = {U1, U2, …, Uk}
Exact Time = 50 sec using Cutset Conditioning
77
Non-Ergodic Hailfinder
HailFinder, n=56, |C|=5, |E|=1
[Two plots comparing Cutset and Gibbs]
MSE vs. #samples (left) and time (right)
Non-Ergodic, |X| = 56, |C| = 5, 2 <= D(Xi) <= 11, |E| = 0
Exact Time = 2 sec using Loop-Cutset Conditioning
78
Non-Ergodic CPCS360b - MSE
cpcs360b, N=360, |E|=[20-34], w*=20, MSE
[Plot comparing Gibbs, IBP, |C|=26 fw=3, and |C|=48 fw=2]
MSE vs. Time
Non-Ergodic, |X| = 360, |C| = 26, D(Xi) = 2
Exact Time = 50 min using BTE
79
Non-Ergodic CPCS360b - MaxErr
cpcs360b, N=360, |E|=[20-34], MaxErr
[Plot comparing Gibbs, IBP, |C|=26 fw=3, and |C|=48 fw=2]
MaxErr vs. Time
80
Importance vs. Gibbs
Gibbs: $x^t \sim \tilde P(x \mid e)$, where $\tilde P(x \mid e) \rightarrow P(x \mid e)$:

$\hat f = \frac{1}{T} \sum_{t=1}^{T} f(x^t)$

Importance: $x^t \sim Q(x \mid e)$:

$\hat f = \frac{1}{T} \sum_{t=1}^{T} f(x^t)\, \frac{P(x^t)}{Q(x^t)} = \frac{1}{T} \sum_{t=1}^{T} w^t f(x^t)$
81
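The importance-sampling side of the comparison can be sketched directly; p, q, and f below are hypothetical toy choices on a three-value domain:

```python
import random

# Importance-sampling sketch matching the slide: draw x^t from a proposal
# q(x), weight by w^t = p(x^t)/q(x^t), and average w^t * f(x^t).
domain = [0, 1, 2]
p = {0: 0.5, 1: 0.3, 2: 0.2}     # target distribution
q = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # proposal (uniform)

def f(x):
    return float(x)

exact = sum(f(x) * p[x] for x in domain)   # E_p[f] = 0.7

random.seed(0)
T = 100_000
total = 0.0
for _ in range(T):
    x = random.choices(domain, weights=[q[v] for v in domain])[0]
    w = p[x] / q[x]                # importance weight w^t
    total += w * f(x)
estimate = total / T

print(exact)     # 0.7
print(estimate)  # close to 0.7
```

Unlike Gibbs, the draws here are independent, but the quality of the estimate hinges on Q being close to P; a poor proposal inflates the variance of the weights.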