Transcript Slides

Sampling Bayesian Networks
ICS 295
2008
1
Algorithm Tree
4
Sampling Fundamentals
Given a set of variables X = {X1, X2, …, Xn}, a joint probability distribution P(X), and some function g(X), we can compute the expected value of g(X):

E[g] = \int_{D(X)} g(x) P(x) \, dx

E[g] = \sum_{x \in D(X)} g(x) P(x)
5
Sampling From P(X)
A sample S^t is an instantiation of all variables:

S^t = \{ x_1^t, x_2^t, \ldots, x_n^t \}

Given independent, identically distributed (iid) samples S^1, S^2, …, S^T from P(X), it follows from the Strong Law of Large Numbers that:

\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)
6
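To make the estimator above concrete, here is a minimal Python sketch (not from the slides; the distribution, the function g, and the sample size are made-up examples) that estimates E[g] by averaging g over iid samples drawn from P(X):

```python
import numpy as np

# Hypothetical discrete distribution P(X) over D(X) = {0, 1, 2} and a function g.
values = np.array([0, 1, 2])
P = np.array([0.2, 0.5, 0.3])                # P(x), sums to 1
g = lambda x: x ** 2                         # any function g(X)

rng = np.random.default_rng(0)
T = 100_000
samples = rng.choice(values, size=T, p=P)    # iid samples S^1, ..., S^T from P(X)

g_hat = g(samples).mean()                    # (1/T) * sum_t g(S^t)
g_exact = np.sum(g(values) * P)              # sum_{x in D(X)} g(x) P(x)
print(g_hat, g_exact)                        # the estimate approaches E[g] as T grows
```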
Sampling Challenge
- It is hard to generate samples from P(X)
- Trade-offs:
  - Generate samples from Q(X)
    - Forward Sampling, Likelihood Weighting, IS
    - Try to find Q(X) close to P(X)
  - Generate dependent samples forming a Markov Chain from P'(X) ≈ P(X)
    - Metropolis, Metropolis-Hastings, Gibbs
    - Try to reduce dependence between samples
7
Markov Chain
- A sequence of random values x^0, x^1, …, defined on a finite state space D(X), is called a Markov Chain if it satisfies the Markov Property:

P(x^{t+1} = y \mid x^t = x, \ldots, x^0 = z) = P(x^{t+1} = y \mid x^t = x)

- If P(x^{t+1} = y \mid x^t) does not change with t (time homogeneous), then it is often expressed as a transition function A(x,y), where

\sum_y A(x, y) = 1
Liu, Ch.12, p 245
8
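As a small illustration (a toy two-state chain with made-up numbers, not from the slides), a time-homogeneous transition function A(x,y) can be stored as a row-stochastic matrix, with each row summing to 1, and the chain simulated by repeatedly sampling the next state from the current row:

```python
import numpy as np

# Toy transition function A(x, y) on D(X) = {0, 1}; row x gives P(x^{t+1} = y | x^t = x).
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])
assert np.allclose(A.sum(axis=1), 1.0)   # sum_y A(x, y) = 1 for every x

rng = np.random.default_rng(0)
x = 0                                    # initial state x^0
chain = [x]
for t in range(10_000):
    x = rng.choice(2, p=A[x])            # x^{t+1} ~ A(x^t, .)
    chain.append(x)
print(np.bincount(chain) / len(chain))   # empirical state frequencies
```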
Markov Chain Monte Carlo
- First, define a transition probability P(x^{t+1} = y | x^t)
- Pick an initial state x^0; it is usually not important because it becomes "forgotten"
- Generate samples x^1, x^2, …, sampling each next value from P(X | x^t):

  x^0 → x^1 → … → x^t → x^{t+1} → …

- If we choose a proper P(x^{t+1} | x^t), we can guarantee that the distribution represented by the samples x^0, x^1, … converges to the target distribution P(X)
9
Markov Chain Properties
- Irreducibility
- Periodicity
- Recurrence
- Reversibility
- Ergodicity
- Stationary Distribution
10
Irreducible
- A state x is said to be irreducible if under the transition rule one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps.
- If one state is irreducible, then all the states must be irreducible.
Liu, Ch. 12, pp. 249, Def. 12.1.1
11
Aperiodic
- A state x is aperiodic if the greatest common divisor of {n : A^n(x,x) > 0} is 1.
- If state x is aperiodic and the chain is irreducible, then every state must be aperiodic.
Liu, Ch. 12, pp.240-250, Def. 12.1.1
12
Recurrence
- A state x is recurrent if the chain returns to x with probability 1.
- State x is recurrent if and only if:

\sum_{n \ge 0} p_{xx}^{(n)} = \infty

- Let M(x) be the expected number of steps to return to state x.
- State x is positive recurrent if M(x) is finite.
  - The recurrent states in a finite-state chain are positive recurrent.
13
Ergodicity
- A state x is ergodic if it is aperiodic and positive recurrent.
- If all states in a Markov chain are ergodic, then the chain is ergodic.
14
Reversibility
- Detailed balance condition:

P(x_{t+1} = j \mid x_t = i) \, P(x_t = i) = P(x_t = i \mid x_{t+1} = j) \, P(x_{t+1} = j)

- A Markov chain is reversible if there is a \pi such that:

\pi_i p_{ij} = \pi_j p_{ji}

- For a reversible Markov chain, \pi is always a stationary distribution.
15
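A short numeric check (a toy two-state chain with my own numbers, not from the slides) that a π satisfying detailed balance π_i p_ij = π_j p_ji is indeed stationary, i.e. that π A = π:

```python
import numpy as np

A = np.array([[0.9, 0.1],        # toy reversible chain on two states
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])      # candidate stationary distribution

# Detailed balance: pi_i * p_ij == pi_j * p_ji for all i, j.
balance = pi[:, None] * A
assert np.allclose(balance, balance.T)

# Detailed balance implies stationarity: pi A = pi.
assert np.allclose(pi @ A, pi)
print("pi A =", pi @ A)
```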
Stationary Distribution
- If the Markov chain is time-homogeneous, then the vector \pi(X) is a stationary distribution (aka invariant or equilibrium distribution, aka "fixed point") if its entries sum up to 1 and satisfy:

\pi(x) = \sum_{y \in D(X)} \pi(y) A(y, x)

- An irreducible chain has a stationary distribution if and only if all of its states are positive recurrent. The distribution is unique.
16
Stationary Distribution In Finite State Space
- A stationary distribution always exists but may not be unique.
- If a finite-state Markov chain is irreducible and aperiodic, the stationary distribution is guaranteed to be unique, and A^{(n)} = P(x_n = y | x_0) converges to a rank-one matrix in which each row is the stationary distribution \pi.
- Thus, the initial state x_0 is not important for convergence: it gets forgotten and we start sampling from the target distribution.
- However, it is important how long it takes to forget it!
17
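A sketch (same toy chain and numbers as above, not from the slides) illustrating the claim: for an irreducible, aperiodic finite chain, A^n converges to a rank-one matrix whose rows all equal π, so the initial state is forgotten:

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

An = np.linalg.matrix_power(A, 50)   # A^n, i.e. P(x_n = y | x_0 = x)
print(An)                            # every row is (approximately) pi = [0.75, 0.25]

# The left eigenvector of A with eigenvalue 1, normalized to sum 1, gives the same pi.
w, v = np.linalg.eig(A.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
print(pi)
```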
Convergence Theorem
- Given a finite-state Markov Chain whose transition function is irreducible and aperiodic, A^n(x_0, y) converges to its invariant distribution \pi(y) geometrically in variation distance; that is, there exist 0 < r < 1 and c > 0 s.t.:

\| A^n(x, \cdot) - \pi \|_{var} \le c \, r^n
18
Eigen-Value Condition
- Convergence to the stationary distribution is driven by the eigen-values of the matrix A(x,y).
- "The chain will converge to its unique invariant distribution if and only if matrix A's second largest eigen-value in modulus is strictly less than 1."
  Liu, Ch. 12, p. 249
- Many proofs of convergence are centered around analyzing the second eigen-value.
19
Convergence In Finite State Space
- Assume a finite-state Markov chain is irreducible and aperiodic.
- The initial state x_0 is not important for convergence: it gets forgotten and we start sampling from the target distribution.
- However, it is important how long it takes to forget it! This is known as the burn-in time.
- Since the first k states are not drawn exactly from \pi, they are often thrown away. Open question: how big a k?
20
Sampling in BN
- Same idea: generate a set of T samples
- Estimate P(Xi | E) from the samples
- Challenge: X is a vector and P(X) is a huge distribution represented by the BN
- Need to know:
  - How to generate a new sample?
  - How many samples T do we need?
  - How to estimate P(E=e) and P(Xi | e)?
21
Sampling Algorithms
- Forward Sampling
- Gibbs Sampling (MCMC)
  - Blocking
  - Rao-Blackwellised
- Likelihood Weighting
- Importance Sampling
- Sequential Monte-Carlo (Particle Filtering) in Dynamic Bayesian Networks
23
Gibbs Sampling
- Markov Chain Monte Carlo method
  (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
- Transition probability equals the conditional distribution
- Example: P(X,Y), A(x^{t+1} | y^t) = P(x|y), A(y^{t+1} | x^t) = P(y|x)

  (figure: the chain alternates between updating x and y: x^0, y^0, x^1, y^1, …)
24
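A minimal sketch of the two-variable example (the joint P(X,Y) below is made up): the chain alternates between sampling from A(x|y) = P(x|y) and A(y|x) = P(y|x), and the visited states approach the target joint:

```python
import numpy as np

# Hypothetical joint P(X, Y) over binary X, Y (rows: x, columns: y); sums to 1.
P = np.array([[0.30, 0.20],
              [0.10, 0.40]])

rng = np.random.default_rng(0)

def sample_x_given_y(y):
    cond = P[:, y] / P[:, y].sum()       # P(x | y)
    return rng.choice(2, p=cond)

def sample_y_given_x(x):
    cond = P[x, :] / P[x, :].sum()       # P(y | x)
    return rng.choice(2, p=cond)

x, y = 0, 0                              # arbitrary initial state (x^0, y^0)
counts = np.zeros((2, 2))
for t in range(50_000):
    x = sample_x_given_y(y)              # x^{t+1} ~ P(x | y^t)
    y = sample_y_given_x(x)              # y^{t+1} ~ P(y | x^{t+1})
    counts[x, y] += 1

print(counts / counts.sum())             # approaches the target joint P(X, Y)
```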
Gibbs Sampling for BN
- Samples are dependent and form a Markov Chain
- Sample from P'(X|e) which converges to P(X|e)
- Guaranteed to converge when all P > 0
- Methods to improve convergence:
  - Blocking
  - Rao-Blackwellised
- Error bounds:
  - Lag-t autocovariance
  - Multiple chains, Chebyshev's Inequality
25
Gibbs Sampling (Pearl, 1988)
- A sample x^t, t = 1, 2, …, is an instantiation of all variables in the network:

x^t = \{ X_1 = x_1^t, X_2 = x_2^t, \ldots, X_N = x_N^t \}

- Sampling process:
  - Fix values of observed variables e
  - Instantiate node values in sample x^0 at random
  - Generate samples x^1, x^2, …, x^T from P(x|e)
  - Compute posteriors from samples
26
Ordered Gibbs Sampler
Generate sample x^{t+1} from x^t by processing all variables in some order:

X_1 = x_1^{t+1} \leftarrow P(x_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)
X_2 = x_2^{t+1} \leftarrow P(x_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)
...
X_N = x_N^{t+1} \leftarrow P(x_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)

In short, for i = 1 to N:

X_i = x_i^{t+1} \leftarrow \text{sampled from } P(x_i \mid x^t \setminus x_i, e)
27
Gibbs Sampling (cont'd)
(Pearl, 1988)
Important: P(x_i \mid x^t \setminus x_i) = P(x_i \mid markov^t \setminus x_i):

P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i) \prod_{j \in ch_i} P(x_j \mid pa_j)

Markov blanket:

M(X_i) = pa_i \cup ch_i \cup \left( \bigcup_{j \in ch_i} pa_j \right)

Given its Markov blanket (parents, children, and their parents), X_i is independent of all other nodes.
28
Ordered Gibbs Sampling Algorithm
Input: X, E
Output: T samples {x^t}
- Fix evidence E
- Generate samples from P(X | E):
  1. For t = 1 to T (compute samples)
  2.   For i = 1 to N (loop through variables)
  3.     X_i ← sample x_i^t from P(X_i | markov^t \ X_i)
29
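A sketch of the ordered Gibbs sampler on a tiny, made-up chain network A → B → C with evidence on C (the CPTs and names below are mine, not from the slides). Each full conditional is computed from the variable's Markov blanket, i.e. its own CPT times the CPTs of its children, then normalized:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy BN: A -> B -> C, all binary; evidence C = 1.  (All CPTs are made up.)
P_A = np.array([0.6, 0.4])                     # P(A)
P_B_given_A = np.array([[0.7, 0.3],            # P(B | A=0)
                        [0.2, 0.8]])           # P(B | A=1)
P_C_given_B = np.array([[0.9, 0.1],            # P(C | B=0)
                        [0.4, 0.6]])           # P(C | B=1)
c = 1                                          # observed value of C

def cond_A(b):
    # P(A | Markov blanket of A) is proportional to P(A) * P(b | A)
    p = P_A * P_B_given_A[:, b]
    return p / p.sum()

def cond_B(a):
    # P(B | Markov blanket of B) is proportional to P(B | a) * P(c | B)
    p = P_B_given_A[a, :] * P_C_given_B[:, c]
    return p / p.sum()

a, b = rng.integers(2), rng.integers(2)        # random initial sample x^0
T, counts_A = 20_000, np.zeros(2)
for t in range(T):                             # ordered scan: A first, then B
    a = rng.choice(2, p=cond_A(b))
    b = rng.choice(2, p=cond_B(a))
    counts_A[a] += 1

print("P(A | C=1) estimate:", counts_A / T)
```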
Answering Queries
- Query: P(x_i | e) = ?
- Method 1: count the number of samples where X_i = x_i:

\hat{P}(X_i = x_i) = \frac{\#\,\text{samples}(X_i = x_i)}{T}

- Method 2: average probability (mixture estimator):

\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i \mid markov^t \setminus X_i)
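To illustrate the difference between the two estimators, here is a small self-contained sketch (the per-step conditionals are fabricated, not produced by a real Gibbs run) comparing the histogram estimator of Method 1 with the mixture estimator of Method 2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend a Gibbs run over a binary variable X_i recorded, at each step t, the
# full conditional P(X_i | markov^t \ X_i) and the value sampled from it.
# Here the conditionals are simply fixed at [0.3, 0.7] to keep the example tiny.
T = 10_000
conditionals = np.tile(np.array([0.3, 0.7]), (T, 1))        # P(X_i | markov^t)
sampled = np.array([rng.choice(2, p=p) for p in conditionals])

# Method 1: histogram estimator -- fraction of samples with X_i = x_i.
p_hist = np.mean(sampled == 1)

# Method 2: mixture estimator -- average of P(X_i = x_i | markov^t \ X_i).
p_mix = conditionals[:, 1].mean()

print(p_hist, p_mix)   # both estimate P(X_i = 1 | e); the mixture estimate fluctuates less
```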
Gibbs Sampling Example - BN
(figure: Bayesian network over nodes X1–X9)
X = {X1,X2,…,X9}
E = {X9}
31
Gibbs Sampling Example - BN
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0,
X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
32
Gibbs Sampling Example - BN
X1 ← P(X1 | x2^0, …, x8^0, x9)
E = {X9}
33
Gibbs Sampling Example - BN
X2 ← P(X2 | x1^1, x3^0, …, x8^0, x9)
E = {X9}
34
Gibbs Sampling: Illustration
35
Gibbs Sampling Example – Init
Initialize nodes with random values:
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0,
X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
Initialize running sums:
SUM1 = 0, SUM2 = 0, SUM3 = 0, SUM4 = 0,
SUM5 = 0, SUM6 = 0, SUM7 = 0, SUM8 = 0
36
Gibbs Sampling Example – Step 1
- Generate Sample 1:
  - compute SUM1 += P(x1 | x2^0, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9)
  - select and assign new value X1 = x1^1
  - compute SUM2 += P(x2 | x1^1, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9)
  - select and assign new value X2 = x2^1
  - compute SUM3 += P(x3 | x1^1, x2^1, x4^0, x5^0, x6^0, x7^0, x8^0, x9)
  - select and assign new value X3 = x3^1
  - …
- At the end, we have the new sample:
  S^1 = {x1^1, x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9}
37
Gibbs Sampling Example – Step 2
- Generate Sample 2:
  - compute P(x1 | x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
  - select and assign new value X1 = x1^2
  - update SUM1 += P(x1 | x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
  - compute P(x2 | x1^2, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
  - select and assign new value X2 = x2^2
  - update SUM2 += P(x2 | x1^2, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
  - compute P(x3 | x1^2, x2^2, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
  - select and assign new value X3 = x3^2
  - update SUM3 += P(x3 | x1^2, x2^2, x4^1, x5^1, x6^1, x7^1, x8^1, x9)
  - …
- New sample: S^2 = {x1^2, x2^2, x3^2, x4^2, x5^2, x6^2, x7^2, x8^2, x9}
38
Gibbs Sampling Example –
Answering Queries
P(x1|x9) = SUM1 /2
P(x2|x9) = SUM2 /2
P(x3|x9) = SUM3 /2
P(x4|x9) = SUM4 /2
P(x5|x9) = SUM5 /2
P(x6|x9) = SUM6 /2
P(x7|x9) = SUM7 /2
P(x8|x9) = SUM8 /2
39
Gibbs Convergence
- Stationary distribution = target sampling distribution
- MCMC converges to the stationary distribution if the network is ergodic
- Chain is ergodic if all transition probabilities are positive:

  p_{ij} > 0 for all states S_i, S_j

- If there exist i, j such that p_{ij} = 0, then we may not be able to explore the full sampling space!
40
Gibbs Sampling: Burn-In
- We want to sample from P(X | E)
- But… the starting point is random
- Solution: throw away the first K samples
- Known as "burn-in"
- What is K? Hard to tell. Use intuition.
- Alternative: draw the first sample's values from an approximation of P(x|e) (for example, run IBP first)
41
Gibbs Sampling: Performance
+ Advantage: guaranteed to converge to P(X|E)
- Disadvantage: convergence may be slow
Problems:
- Samples are dependent!
- Statistical variance is too big in high-dimensional problems
42
Gibbs: Speeding Convergence
Objectives:
1. Reduce dependence between samples (autocorrelation)
   - Skip samples
   - Randomize variable sampling order
2. Reduce variance
   - Blocking Gibbs Sampling
   - Rao-Blackwellisation
43
Skipping Samples
- Pick only every k-th sample (Geyer, 1992)
- Can reduce dependence between samples!
- Increases variance! Wastes samples!
44
Randomized Variable Order
Random Scan Gibbs Sampler:
- Pick each next variable X_i for update at random with probability p_i, where \sum_i p_i = 1.
  (In the simplest case, the p_i are distributed uniformly.)
- In some instances, this reduces variance.
  (MacEachern, Peruggia, 1999, "Subsampling the Gibbs Sampler: Variance Reduction")
45
Blocking
- Sample several variables together, as a block.
- Example: Given three variables X, Y, Z, with domains of size 2, group Y and Z together to form a variable W = {Y,Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:

  x^{t+1} ← P(x | y^t, z^t) = P(x | w^t)
  (y^{t+1}, z^{t+1}) = w^{t+1} ← P(w | x^{t+1})

+ Can improve convergence greatly when two variables are strongly correlated!
- Domain of the block variable grows exponentially with the number of variables in a block!
46
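A sketch of the blocking idea on three made-up binary variables X, Y, Z (the joint below is mine): Y and Z are grouped into a block W = (Y, Z) with domain size 4, and the chain alternates x ← P(x | w) and w ← P(y, z | x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint P(X, Y, Z) over binary variables, shape (2, 2, 2); entries sum to 1.
P = np.array([[[0.10, 0.05], [0.05, 0.20]],
              [[0.20, 0.05], [0.05, 0.30]]])

w_states = [(0, 0), (0, 1), (1, 0), (1, 1)]    # block W = (Y, Z), domain size 4

x, (y, z) = 0, (0, 0)                          # initial sample (x^t, y^t, z^t)
counts_x = np.zeros(2)
T = 50_000
for t in range(T):
    # x^{t+1} ~ P(x | y^t, z^t) = P(x | w^t)
    px = P[:, y, z] / P[:, y, z].sum()
    x = rng.choice(2, p=px)
    # w^{t+1} = (y^{t+1}, z^{t+1}) ~ P(y, z | x^{t+1}) = P(w | x^{t+1})
    pw = np.array([P[x, yy, zz] for (yy, zz) in w_states])
    y, z = w_states[rng.choice(4, p=pw / pw.sum())]
    counts_x[x] += 1

print("P(X) estimate:", counts_x / T, "exact:", P.sum(axis=(1, 2)))
```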
Blocking Gibbs Sampling
Jensen, Kong, Kjaerulff, 1993
"Blocking Gibbs Sampling in Very Large Probabilistic Expert Systems"
- Select a set of subsets E_1, E_2, E_3, …, E_k s.t.:
  E_i ⊆ X
  ∪_i E_i = X
  A_i = X \ E_i
- Sample from P(E_i | A_i)
47
Rao-Blackwellisation
- Do not sample all variables!
- Sample a subset!
- Example: Given three variables X, Y, Z, sample only X and Y and sum out Z. Given sample (x^t, y^t), compute the next sample:

  x^{t+1} ← P(x | y^t)
  y^{t+1} ← P(y | x^{t+1})
48
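A matching sketch of the Rao-Blackwellised scheme (same kind of made-up three-variable joint): only X and Y are sampled, and Z is summed out analytically inside each conditional:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint P(X, Y, Z) over binary variables, shape (2, 2, 2); entries sum to 1.
P = np.array([[[0.10, 0.05], [0.05, 0.20]],
              [[0.20, 0.05], [0.05, 0.30]]])

P_XY = P.sum(axis=2)                 # Z summed out analytically: P(X, Y)

x, y = 0, 0
counts_x = np.zeros(2)
T = 50_000
for t in range(T):
    # x^{t+1} ~ P(x | y^t)  (Z is already marginalized out)
    px = P_XY[:, y] / P_XY[:, y].sum()
    x = rng.choice(2, p=px)
    # y^{t+1} ~ P(y | x^{t+1})
    py = P_XY[x, :] / P_XY[x, :].sum()
    y = rng.choice(2, p=py)
    counts_x[x] += 1

print("P(X) estimate:", counts_x / T, "exact:", P.sum(axis=(1, 2)))
```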
Rao-Blackwell Theorem
Bottom line: reducing the number of variables in a sample reduces the variance!
49
Blocking vs. Rao-Blackwellisation
(figure: three variables X, Y, Z)
- Standard Gibbs: P(x|y,z), P(y|x,z), P(z|x,y)   (1)
- Blocking: P(x|y,z), P(y,z|x)   (2)
- Rao-Blackwellised: P(x|y), P(y|x)   (3)
Var_3 < Var_2 < Var_1
[Liu, Wong, Kong, 1994
Covariance structure of the Gibbs sampler…]
50
Rao-Blackwellised Gibbs: Cutset Sampling
- Select C ⊆ X (possibly a cycle-cutset), |C| = m
- Fix evidence E
- Initialize nodes with random values: for i = 1 to m, set C_i = c_i^0
- For t = 1 to n, generate samples:
  For i = 1 to m:
    C_i = c_i^{t+1} ← P(c_i | c_1^{t+1}, …, c_{i-1}^{t+1}, c_{i+1}^t, …, c_m^t, e)
51
Cutset Sampling
- Select a subset C = {C_1, …, C_K} ⊆ X
- A sample c^t, t = 1, 2, …, is an instantiation of C:

c^t = \{ C_1 = c_1^t, C_2 = c_2^t, \ldots, C_K = c_K^t \}

- Sampling process:
  - Fix values of observed variables e
  - Generate sample c^0 at random
  - Generate samples c^1, c^2, …, c^T from P(c|e)
  - Compute posteriors from samples
52
Cutset Sampling
Generating Samples
Generate sample c^{t+1} from c^t:

C_1 = c_1^{t+1} \leftarrow P(c_1 \mid c_2^t, c_3^t, \ldots, c_K^t, e)
C_2 = c_2^{t+1} \leftarrow P(c_2 \mid c_1^{t+1}, c_3^t, \ldots, c_K^t, e)
...
C_K = c_K^{t+1} \leftarrow P(c_K \mid c_1^{t+1}, c_2^{t+1}, \ldots, c_{K-1}^{t+1}, e)

In short, for i = 1 to K:

C_i = c_i^{t+1} \leftarrow \text{sampled from } P(c_i \mid c^t \setminus c_i, e)
53
Rao-Blackwellised Gibbs: Cutset Sampling
How to compute P(c_i | c^t \ c_i, e)?
- Compute the joint P(c_i, c^t \ c_i, e) for each c_i ∈ D(C_i)
- Then normalize:

  P(c_i | c^t \ c_i, e) = α P(c_i, c^t \ c_i, e)

- Computation efficiency depends on the choice of C
54
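A sketch of the "compute the joint, then normalize" step for one cutset variable (the tiny chain network, its CPTs, and the variable names are my own; brute-force enumeration over the single non-cutset variable stands in for the exact BTE computation discussed on the next slide):

```python
import numpy as np

# Toy chain BN X1 -> X2 -> X3 -> X4 (binary), evidence X4 = 1; all CPTs made up.
P_X1 = np.array([0.6, 0.4])
P_X2_X1 = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X2 | X1)
P_X3_X2 = np.array([[0.9, 0.1], [0.4, 0.6]])   # P(X3 | X2)
P_X4_X3 = np.array([[0.5, 0.5], [0.1, 0.9]])   # P(X4 | X3)
e = 1                                          # observed value of X4

# Cutset C = {X1, X3}; X2 is the only non-cutset variable and is summed out exactly.
def joint_c_e(x1, x3):
    return sum(P_X1[x1] * P_X2_X1[x1, x2] * P_X3_X2[x2, x3] * P_X4_X3[x3, e]
               for x2 in (0, 1))

def cond_X1(x3):
    # P(x1 | x3, e) = alpha * P(x1, x3, e): joint for each value of X1, then normalize.
    p = np.array([joint_c_e(x1, x3) for x1 in (0, 1)])
    return p / p.sum()

print(cond_X1(x3=0), cond_X1(x3=1))
```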
Rao-Blackwellised Gibbs: Cutset Sampling
How to choose C?
- Special case: C is a cycle-cutset, O(N)
- General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed.
- Pick C wisely so as to minimize w → the notion of a w-cutset
55
w-cutset Sampling
- C = w-cutset of the network: a set of nodes such that, when C and E are instantiated, the adjusted induced width of the network is w
- Complexity of exact inference: bounded by w!
- The cycle-cutset is a special case
56
Cutset Sampling - Answering Queries
- Query: for c_i ∈ C, P(c_i | e) = ? Same as Gibbs (Gibbs is a special case of w-cutset sampling):

  P(c_i | e) ≈ \frac{1}{T} \sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)

  computed while generating sample t

- Query: P(x_i | e) = ? for x_i ∉ C:

  P(x_i | e) ≈ \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e)

  computed after generating sample t
57
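Continuing the same made-up chain network, a minimal cutset sampler over C = {X1, X3} that accumulates both estimators from this slide: P(c_i | e) from the conditionals used while sampling, and P(x_i | e) for the non-cutset variable X2 from P(x_2 | c^t, e):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same made-up chain BN X1 -> X2 -> X3 -> X4 (binary), evidence X4 = 1.
P_X1 = np.array([0.6, 0.4])
P_X2_X1 = np.array([[0.7, 0.3], [0.2, 0.8]])
P_X3_X2 = np.array([[0.9, 0.1], [0.4, 0.6]])
P_X4_X3 = np.array([[0.5, 0.5], [0.1, 0.9]])
e = 1

def joint(x1, x2, x3):
    return P_X1[x1] * P_X2_X1[x1, x2] * P_X3_X2[x2, x3] * P_X4_X3[x3, e]

def cond_cutset(i, x1, x3):
    # P(c_i | c \ c_i, e), enumerating X2 exactly and normalizing.
    if i == 1:
        p = np.array([sum(joint(v, x2, x3) for x2 in (0, 1)) for v in (0, 1)])
    else:
        p = np.array([sum(joint(x1, x2, v) for x2 in (0, 1)) for v in (0, 1)])
    return p / p.sum()

def cond_X2(x1, x3):
    # P(x2 | c^t, e) for the non-cutset variable X2.
    p = np.array([joint(x1, x2, x3) for x2 in (0, 1)])
    return p / p.sum()

x1, x3 = rng.integers(2), rng.integers(2)   # random initial cutset sample c^0
T = 20_000
sum_c1 = np.zeros(2)   # accumulates P(x1 | c^t \ x1, e) -> estimate of P(X1 | e)
sum_x2 = np.zeros(2)   # accumulates P(x2 | c^t, e)       -> estimate of P(X2 | e)
for t in range(T):
    p1 = cond_cutset(1, x1, x3); x1 = rng.choice(2, p=p1); sum_c1 += p1
    p3 = cond_cutset(3, x1, x3); x3 = rng.choice(2, p=p3)
    sum_x2 += cond_X2(x1, x3)

print("P(X1 | e) estimate:", sum_c1 / T)
print("P(X2 | e) estimate:", sum_x2 / T)
```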
Cutset Sampling Example
c^0 = {x_2^0, x_5^0}
(figure: Bayesian network over nodes X1–X9)
E=x9
58
Cutset Sampling Example
Sample a new value for X_2, given c^0 = {x_2^0, x_5^0}:

Compute BTE(x_2', x_5^0, x_9) and BTE(x_2'', x_5^0, x_9) for the values x_2', x_2'' of X_2, then normalize:

x_2^1 ← P(x_2 | x_5^0, x_9) = α [ BTE(x_2', x_5^0, x_9), BTE(x_2'', x_5^0, x_9) ]
59
Cutset Sampling Example
Sample a new value for X_5, given x_2^1 ← P(x_2 | x_5^0, x_9):

Compute BTE(x_2^1, x_5', x_9) and BTE(x_2^1, x_5'', x_9) for the values x_5', x_5'' of X_5, then normalize:

x_5^1 ← P(x_5 | x_2^1, x_9) = α [ BTE(x_2^1, x_5', x_9), BTE(x_2^1, x_5'', x_9) ]

c^1 = {x_2^1, x_5^1}
60
Cutset Sampling Example
Query P(x_2 | e) for sampled (cutset) node X_2:

Sample 1: x_2^1 ← P(x_2 | x_5^0, x_9)
Sample 2: x_2^2 ← P(x_2 | x_5^1, x_9)
Sample 3: x_2^3 ← P(x_2 | x_5^2, x_9)

P(x_2 | x_9) = \frac{1}{3} \left[ P(x_2 \mid x_5^0, x_9) + P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9) \right]
61
Cutset Sampling Example
Query P(x_3 | e) for non-sampled node X_3:

c^1 = {x_2^1, x_5^1} → P(x_3 | x_2^1, x_5^1, x_9)
c^2 = {x_2^2, x_5^2} → P(x_3 | x_2^2, x_5^2, x_9)
c^3 = {x_2^3, x_5^3} → P(x_3 | x_2^3, x_5^3, x_9)

P(x_3 | x_9) = \frac{1}{3} \left[ P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9) \right]
62
Gibbs: Error Bounds
Objectives:
- Estimate the needed number of samples T
- Estimate the error
Methodology:
- 1 chain → use lag-k autocovariance
  - Estimate T
- M chains → standard sampling variance
  - Estimate error
63
Gibbs: lag-k autocovariance
P^t = P(x_i \mid x^t \setminus x_i)

\bar{P} = \hat{P}(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid x^t \setminus x_i)

Lag-k autocovariance:

\gamma(k) = \frac{1}{T} \sum_{t=1}^{T-k} (P^t - \bar{P})(P^{t+k} - \bar{P})

Var(\bar{P}) = \frac{1}{T} \left[ \gamma(0) + 2 \sum_{i=1}^{2\delta+1} \gamma(i) \right]
64
Gibbs: lag-k autocovariance
Estimate of the Monte Carlo variance:

Var(\bar{P}) = \frac{1}{T} \left[ \gamma(0) + 2 \sum_{i=1}^{2\delta+1} \gamma(i) \right]

Here, \delta is the smallest positive integer satisfying:

\gamma(2\delta) + \gamma(2\delta + 1) > 0

Effective chain size:

\hat{T} = \frac{\gamma(0)}{Var(\bar{P})}

In the absence of autocovariance: \hat{T} = T
65
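A sketch of the single-chain variance estimate from these two slides, applied to a synthetic autocorrelated sequence of conditionals P^t (made-up data; the truncation of the autocovariance sum uses a Geyer-style "sum pairs while positive" rule as a stand-in for the δ rule above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic autocorrelated sequence P^t, standing in for the conditionals
# P(x_i | x^t \ x_i) recorded along one Gibbs chain (AR(1)-style, made up).
T = 5_000
P_t = np.empty(T)
P_t[0] = 0.5
for t in range(1, T):
    P_t[t] = np.clip(0.9 * P_t[t - 1] + 0.1 * 0.6 + 0.02 * rng.standard_normal(), 0, 1)

P_bar = P_t.mean()                       # estimate of P(x_i | e)

def gamma(k):
    # lag-k autocovariance: (1/T) * sum_{t=1}^{T-k} (P^t - P_bar)(P^{t+k} - P_bar)
    d = P_t - P_bar
    return np.dot(d[: T - k], d[k:]) / T

# Sum paired autocovariances Gamma_m = gamma(2m) + gamma(2m+1) while they stay
# positive; the result has the form gamma(0) + 2 * sum_{i=1}^{2*delta+1} gamma(i).
sigma2 = -gamma(0)
m = 0
while 2 * m + 1 < T - 1:
    pair = gamma(2 * m) + gamma(2 * m + 1)
    if pair <= 0:
        break
    sigma2 += 2 * pair
    m += 1

var_P_bar = sigma2 / T                   # Var(P_bar) ~ (1/T)[gamma(0) + 2 * sum gamma(i)]
T_eff = gamma(0) / var_P_bar             # effective chain size; equals T if no autocovariance
print(P_bar, var_P_bar, T_eff)
```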
Gibbs: Multiple Chains
- Generate M chains of size K
- Each chain produces an independent estimate P_m:

P_m = \hat{P}(x_i \mid e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)

- Estimate P(x_i | e) as the average of the P_m(x_i | e):

\bar{P} = \frac{1}{M} \sum_{m=1}^{M} P_m

- Treat the P_m as independent random variables.
66
Gibbs: Multiple Chains
{P_m} are independent random variables; therefore the sampling variance of the chain estimates is:

S^2 = \frac{1}{M-1} \sum_{m=1}^{M} \left( P_m - \bar{P} \right)^2 = \frac{1}{M-1} \left[ \sum_{m=1}^{M} P_m^2 - M \bar{P}^2 \right]

and the error bound (confidence interval half-width) is:

\Delta = t_{\alpha/2,\,M-1} \sqrt{\frac{S^2}{M}}
67
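A sketch of the multiple-chains error bound (the per-chain estimates P_m below are synthetic values, not from a real run): the M estimates are treated as independent, their sample variance S² is computed, and a t-based half-width gives the error bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-chain estimates P_m of P(x_i | e) from M independent chains
# (made-up values scattered around an assumed posterior of 0.6).
M = 10
P_m = 0.6 + 0.02 * rng.standard_normal(M)

P_bar = P_m.mean()                          # (1/M) * sum_m P_m

# Sample variance of the chain estimates: S^2 = 1/(M-1) * sum_m (P_m - P_bar)^2.
S2 = np.sum((P_m - P_bar) ** 2) / (M - 1)   # equals (sum_m P_m^2 - M * P_bar^2) / (M - 1)

# 95% error bound: Delta = t_{alpha/2, M-1} * sqrt(S^2 / M);
# t_{0.025, 9} ~= 2.262 (standard t-table value for M = 10 chains).
delta = 2.262 * np.sqrt(S2 / M)
print(P_bar, "+/-", delta)
```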
Geman & Geman, 1984
- Geman, S. & Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721-741.
- Introduces Gibbs sampling.
- Places the idea of Gibbs sampling in a general setting in which the collection of variables is structured in a graphical model and each variable has a neighborhood corresponding to a local region of the graphical structure. Geman and Geman use the Gibbs distribution to define the joint distribution on this structured set of variables.
68
Tanner & Wong, 1987
- Tanner and Wong (1987)
  - Data augmentation
  - Convergence results
69
Pearl, 1988
- Pearl, 1988. Probabilistic Reasoning in Intelligent Systems, Morgan-Kaufmann.
- In the case of Bayesian networks, the neighborhoods correspond to the Markov blanket of a variable and the joint distribution is defined by the factorization of the network.
70
Gelfand & Smith, 1990
- Gelfand, A.E. and Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
- Shows variance reduction in using the mixture estimator for posterior marginals.
71
Neal, 1992
- R. M. Neal, 1992. Connectionist learning of belief networks, Artificial Intelligence, v. 56, pp. 71-118.
- Stochastic simulation in Noisy-OR networks.
72
CPCS54 Test Results
(plots: CPCS54, n=54, |C|=15, |E|=3; MSE of Cutset vs. Gibbs plotted against # samples and against time (sec))
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 54, D(Xi) = 2, |C| = 15, |E| = 4
Exact Time = 30 sec using Cutset Conditioning
73
CPCS179 Test Results
(plots: CPCS179, n=179, |C|=8, |E|=35; MSE of Cutset vs. Gibbs plotted against # samples and against time (sec))
MSE vs. #samples (left) and time (right)
Non-Ergodic (1 deterministic CPT entry)
|X| = 179, |C| = 8, 2<= D(Xi)<=4, |E| = 35
Exact Time = 122 sec using Loop-Cutset Conditioning
74
CPCS360b Test Results
(plots: CPCS360b, n=360, |C|=21, |E|=36; MSE of Cutset vs. Gibbs plotted against # samples and against time (sec))
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 360, D(Xi)=2, |C| = 21, |E| = 36
Exact Time > 60 min using Cutset Conditioning
Exact Values obtained via Bucket Elimination
75
Random Networks
(plots: RANDOM, n=100, |C|=13, |E|=15-20; MSE of Cutset vs. Gibbs plotted against # samples and against time (sec))
MSE vs. #samples (left) and time (right)
|X| = 100, D(Xi) =2,|C| = 13, |E| = 15-20
Exact Time = 30 sec using Cutset Conditioning
76
Coding Networks
(figure: coding network with nodes x1–x4, u1–u4, p1–p4, y1–y4)
(plot: Coding Networks, n=100, |C|=12-14; MSE of IBP, Gibbs, and Cutset plotted against time (sec))
MSE vs. time (right)
Non-Ergodic, |X| = 100, D(Xi)=2, |C| = 13-16, |E| = 50
Sample Ergodic Subspace U={U1, U2,…Uk}
Exact Time = 50 sec using Cutset Conditioning
77
Non-Ergodic Hailfinder
(plots: HailFinder, n=56, |C|=5, |E|=1; MSE of Cutset vs. Gibbs plotted against # samples and against time (sec))
MSE vs. #samples (left) and time (right)
Non-Ergodic, |X| = 56, |C| = 5, 2 <=D(Xi) <=11, |E| = 0
Exact Time = 2 sec using Loop-Cutset Conditioning
78
Non-Ergodic CPCS360b - MSE
(plot: cpcs360b, N=360, |E|=[20-34], w*=20; MSE of Gibbs, IBP, |C|=26,fw=3, and |C|=48,fw=2 plotted against time (sec))
MSE vs. Time
Non-Ergodic, |X| = 360, |C| = 26, D(Xi)=2
Exact Time = 50 min using BTE
79
Non-Ergodic CPCS360b - MaxErr
(plot: cpcs360b, N=360, |E|=[20-34]; MaxErr of Gibbs, IBP, |C|=26,fw=3, and |C|=48,fw=2 plotted against time (sec))
80
Importance vs. Gibbs
Gibbs: x^t \sim \tilde{p}(x \mid e), where \tilde{p}(x \mid e) \to p(x \mid e) as T \to \infty

\hat{f} = \frac{1}{T} \sum_{t=1}^{T} f(x^t)

Importance: x^t \sim q(x \mid e)

\hat{f} = \frac{1}{T} \sum_{t=1}^{T} \frac{f(x^t)\, p(x^t)}{q(x^t)}, \qquad w^t = \frac{p(x^t)}{q(x^t)}
81
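A final sketch contrasting the two estimators on a tiny made-up discrete target p(x) (no evidence, purely to show the formulas): the Gibbs-style estimate averages f over samples from p (drawn iid here, as a stand-in for a converged chain), while the importance estimate draws from a proposal q and reweights by w^t = p(x^t)/q(x^t):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up discrete target p(x) and proposal q(x) over {0, 1, 2}; f is any function.
values = np.array([0, 1, 2])
p = np.array([0.2, 0.5, 0.3])
q = np.array([1 / 3, 1 / 3, 1 / 3])
f = lambda x: (x == 2).astype(float)          # e.g. f estimates P(x = 2)

T = 100_000

# "Gibbs-style" estimate: f_hat = (1/T) * sum_t f(x^t) with x^t ~ p.
x_p = rng.choice(values, size=T, p=p)
f_gibbs = f(x_p).mean()

# Importance sampling: x^t ~ q, reweight each term by w^t = p(x^t) / q(x^t).
x_q = rng.choice(values, size=T, p=q)
w = p[x_q] / q[x_q]
f_is = np.mean(f(x_q) * w)

print(f_gibbs, f_is, "exact:", np.sum(f(values) * p))
```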