Transcript class09

Bayesian Networks
Lecture 9
Edited by Dan Geiger from Nir Friedman's slides.
Bayesian Network
[Figure: a Bayesian network with nodes V, S, T, L, B, A, X, D, each annotated with its local distribution: p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).]

P(v, s, t, l, b, a, x, d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Bayesian network = Directed Acyclic Graph (DAG),
annotated with conditional probability distributions.
2
Bayesian Network (cont.)
Each Directed Acyclic Graph defines a factorization of the form:
p(x1,…,xn) = ∏_{i=1}^{n} p(xi | pa_i)

where pa_i denotes the parents of xi in the DAG.

[Figure: the same network as on the previous slide, with nodes V, S, T, L, B, A, X, D and local distributions p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).]

P(v, s, t, l, b, a, x, d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
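To make the factorization concrete, here is a minimal Python sketch (not part of the original slides) that stores each local distribution as a table and evaluates the joint as the product of local terms. The p(a|t,l) entries are taken from the "Local distributions" slide below; every other numeric value is an illustrative placeholder.

```python
# Sketch (not from the slides): the Visit-to-Asia network as a dict of CPTs.
# Each variable maps to (parents, table), where table gives
# P(variable = True | parent values).  Only p(A | T, L) uses the values from
# the "Local distributions" slide; all other numbers are made-up placeholders.
network = {
    "V": ((), {(): 0.01}),                                        # p(v)
    "S": ((), {(): 0.50}),                                        # p(s)
    "T": (("V",), {(True,): 0.05, (False,): 0.01}),               # p(t|v)
    "L": (("S",), {(True,): 0.10, (False,): 0.01}),               # p(l|s)
    "B": (("S",), {(True,): 0.60, (False,): 0.30}),               # p(b|s)
    "A": (("T", "L"), {(True, True): 0.99, (True, False): 0.60,   # p(a|t,l)
                       (False, True): 0.99, (False, False): 0.02}),
    "X": (("A",), {(True,): 0.98, (False,): 0.05}),               # p(x|a)
    "D": (("A", "B"), {(True, True): 0.90, (True, False): 0.70,   # p(d|a,b)
                       (False, True): 0.80, (False, False): 0.10}),
}

def joint(assign):
    """P(v,s,t,l,b,a,x,d) as the product of the local terms p(x_i | pa_i)."""
    p = 1.0
    for var, (parents, table) in network.items():
        p_true = table[tuple(assign[q] for q in parents)]
        p *= p_true if assign[var] else 1.0 - p_true
    return p

print(joint({"V": True, "S": True, "T": False, "L": True,
             "B": True, "A": True, "X": True, "D": True}))
```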
3
Local distributions
[Figure: parents Tuberculosis (Yes/No) and Lung Cancer (Yes/No) pointing to Abnormality in Chest (Yes/No), with local distribution p(A|T,L).]
Conditional Probability Table:
p(A=y|L=n, T=n) = 0.02
p(A=y|L=n, T=y) = 0.60
p(A=y|L=y, T=n) = 0.99
p(A=y|L=y, T=y) = 0.99
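The same table written as code (a sketch, not from the slides), using the dictionary-of-CPTs convention from the previous example:

```python
# p(A = yes | T, L), keyed by (T, L); values copied from the table above.
p_A_given_TL = {
    (False, False): 0.02,   # T = no,  L = no
    (True,  False): 0.60,   # T = yes, L = no
    (False, True):  0.99,   # T = no,  L = yes
    (True,  True):  0.99,   # T = yes, L = yes
}

def p_A(a, t, l):
    """Return p(A = a | T = t, L = l) for booleans a, t, l."""
    p_yes = p_A_given_TL[(t, l)]
    return p_yes if a else 1.0 - p_yes

assert abs(p_A(False, False, False) - 0.98) < 1e-12   # p(A = no | T = no, L = no)
```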
4
“Example”
[Figure: a network over four loci, Locus 1 through Locus 4, with Locus 2 being the disease locus.]
This model depicts the qualitative relations between the variables.
We will now specify the joint distribution over these variables.
5
The “Visit-to-Asia” Example
[Figure: the Visit-to-Asia network. Visit to Asia → Tuberculosis; Smoking → Lung Cancer and Bronchitis; Tuberculosis and Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest and Bronchitis → Dyspnea.]
6
Queries
There are many types of queries.
Most queries involve evidence.
Evidence e is an assignment of values to a set E of variables in the domain.
Examples (on the Visit-to-Asia network):
P(Dyspnea = Yes | Visit_to_Asia = Yes, Smoking = Yes)
P(Smoking = Yes | Dyspnea = Yes)
7
Queries: A posteriori belief
The conditional probability of a variable given the evidence:

P(x | e) = P(x, e) / P(e)

This is the a posteriori belief in x, given evidence e.
Often we compute the term P(x, e), from which we can recover the a posteriori belief by

P(x | e) = P(x, e) / Σ_x P(x, e)
Examples given in previous slide.
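As a non-slide illustration of this formula, the following sketch computes P(x, e) by brute-force enumeration over the unobserved variables and then normalizes over the values of x. It uses a small hypothetical chain network S → L → D with made-up CPTs, not the full Visit-to-Asia network.

```python
from itertools import product

# Sketch (not from the slides): P(x | e) = P(x, e) / sum_x P(x, e) by
# brute-force enumeration.  Toy chain S -> L -> D with made-up CPTs.
cpts = {
    "S": ((), {(): 0.5}),
    "L": (("S",), {(True,): 0.1, (False,): 0.01}),
    "D": (("L",), {(True,): 0.8, (False,): 0.1}),
}

def joint(assign):
    p = 1.0
    for var, (parents, table) in cpts.items():
        p_true = table[tuple(assign[q] for q in parents)]
        p *= p_true if assign[var] else 1.0 - p_true
    return p

def posterior(query_var, evidence):
    """Return P(query_var = True | evidence) by summing out the other variables."""
    hidden = [v for v in cpts if v != query_var and v not in evidence]
    weight = {True: 0.0, False: 0.0}
    for x in (True, False):
        for values in product([True, False], repeat=len(hidden)):
            assign = {query_var: x, **evidence, **dict(zip(hidden, values))}
            weight[x] += joint(assign)          # accumulates P(x, e)
    return weight[True] / (weight[True] + weight[False])

print(posterior("L", {"D": True}))   # e.g. P(L = yes | D = yes) in the toy chain
```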
8
A posteriori belief
[Figure: the Visit-to-Asia network.]
This query is useful in many other cases:
- Prediction: what is the probability of an outcome given the starting condition?
  The target is a descendant of the evidence (e.g., does a visit to Asia lead to Tuberculosis?)
- Diagnosis: what is the probability of a disease/fault given the symptoms?
  The target is an ancestor of the evidence (e.g., does the X-ray result indicate a higher probability of Tuberculosis?)
9
Example: Predictive+Diagnostic
Probabilistic inference can combine evidence from all parts of the network, diagnostic and predictive, regardless of the directions of edges in the model.
P(T = Yes | Visit_to_Asia = Yes, Dyspnea = Yes)
[Figure: the Visit-to-Asia network.]
10
Queries: MAP
- Find the maximum a posteriori assignment for some variables of interest (say H1,…,Hl).
- That is, find h1,…,hl that maximize the conditional probability P(h1,…,hl | e).
- Equivalent to maximizing the joint P(h1,…,hl, e).
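A brute-force sketch (not from the slides) of this equivalence: since P(e) does not depend on the hypothesis, maximizing P(h1,…,hl, e) yields the same assignment as maximizing P(h1,…,hl | e). The toy chain network and CPT values are the same hypothetical ones used above, and every variable is assumed to be either a hypothesis or an evidence variable.

```python
from itertools import product

# Sketch (not from the slides): brute-force MAP over hypothesis variables H,
# given evidence e.  Maximizing P(h | e) is equivalent to maximizing P(h, e),
# because P(e) is a constant that does not depend on h.
# Toy chain S -> L -> D with made-up CPTs; every variable is either in H or in e.
cpts = {
    "S": ((), {(): 0.5}),
    "L": (("S",), {(True,): 0.1, (False,): 0.01}),
    "D": (("L",), {(True,): 0.8, (False,): 0.1}),
}

def joint(assign):
    p = 1.0
    for var, (parents, table) in cpts.items():
        p_true = table[tuple(assign[q] for q in parents)]
        p *= p_true if assign[var] else 1.0 - p_true
    return p

def map_assignment(hypothesis_vars, evidence):
    """Return the h maximizing P(h, e) (here joint == P(h, e) since no other variables remain)."""
    best, best_p = None, -1.0
    for values in product([True, False], repeat=len(hypothesis_vars)):
        h = dict(zip(hypothesis_vars, values))
        p = joint({**h, **evidence})
        if p > best_p:
            best, best_p = h, p
    return best, best_p

print(map_assignment(["S", "L"], {"D": True}))
```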
11
Queries: MAP
[Figure: a diagnosis network with disease nodes D1,…,D4 and symptom nodes S1,…,S4.]
We can use MAP for:
- Explanation:
  - What is the most likely joint event, given the evidence (e.g., a set of likely diseases given the symptoms)?
  - What is the most likely scenario, given the evidence (e.g., a series of likely malfunctions that trigger a fault)?
[Figure: a car-diagnosis fragment with nodes Bad alternator, Bad magneto, Not charging, Bad battery, Dead battery.]
12
Complexity of Inference
Thm: Computing P(X = x) in a Bayesian network is NP-hard.
Not surprising, since we can simulate Boolean gates.
13
Proof
We reduce 3-SAT to Bayesian network computation
Assume we are given a 3-SAT problem:
- Let Q1,…,Qn be propositions,
- φ1,…,φk be clauses, such that φi = li1 ∨ li2 ∨ li3,
  where each lij is a literal over Q1,…,Qn (e.g., Q1 = true),
- φ = φ1 ∧ … ∧ φk.
We will construct a Bayesian network s.t.
P(X = t) > 0 iff φ is satisfiable.
14
[Figure: the reduction network. Proposition nodes Q1,…,Qn feed the clause nodes φ1,…,φk; the clause nodes are combined through a chain of AND nodes A1,…,Ak-2 into the output node X.]
- P(Qi = true) = 0.5,
- P(φi = true | Qi, Qj, Ql) = 1 iff Qi, Qj, Ql satisfy the clause φi,
- A1, A2,…, are simple binary AND gates.
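A small sketch (not from the slides) of this construction for a hypothetical two-clause instance; the clause nodes are deterministic indicators and X is the AND of all of them, so P(X = t) can be computed exactly by brute force:

```python
from itertools import product

# Sketch (not from the slides): the reduction for a tiny 3-SAT instance.
# phi = (Q1 v Q2 v ~Q3) ^ (~Q1 v Q2 v Q3); each clause is a list of
# (variable index, polarity) literals.
clauses = [[(0, True), (1, True), (2, False)],
           [(0, False), (1, True), (2, True)]]
n = 3

def clause_node(q, clause):
    """Deterministic CPT of a clause node: 1 iff one of its literals is satisfied."""
    return any(q[i] == polarity for i, polarity in clause)

def p_X_true():
    """P(X = t) with P(Qi = true) = 0.5 and X = AND of all clause nodes."""
    total = 0.0
    for q in product([True, False], repeat=n):
        x = all(clause_node(q, c) for c in clauses)   # the chain of AND gates
        total += (0.5 ** n) * (1.0 if x else 0.0)
    return total

p = p_X_true()
print(p, "-> satisfiable" if p > 0 else "-> unsatisfiable")
```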
15
It is easy to check:
- Polynomial number of variables
- Each Conditional Probability Table can be described by a small table (8 parameters at most)
- P(X = true) > 0 if and only if there exists a satisfying assignment to Q1,…,Qn
Conclusion: polynomial reduction of 3-SAT.
16
Inference is even #P-hard
- P(X = t) is the fraction of satisfying assignments to φ.
- Hence 2^n · P(X = t) is the number of satisfying assignments to φ.
- Thus, if we know how to compute P(X = t), we know how to count the number of satisfying assignments to φ.
- Consequently, computing P(X = t) is #P-hard.
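As a quick sanity check on the hypothetical two-clause instance from the earlier sketch: that formula has 6 satisfying assignments out of 2^3 = 8, so the sketch prints P(X = t) = 6/8 = 0.75, and 2^3 · 0.75 = 6 recovers the count.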
17
Hardness - Notes
- We used deterministic relations in our construction.
- The same construction works if we use (1-ε, ε) instead of (1, 0) in each gate, for any ε < 0.5.
  Homework: Prove it.
- Hardness does not mean we cannot solve inference.
- It implies that we cannot find a general procedure that works efficiently for all networks.
- For particular families of networks, we can have provably efficient procedures (e.g., trees, HMMs), such as variable elimination algorithms (see the sketch below).
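For instance, on a chain (a special case of a tree), variable elimination computes a marginal by summing out one variable at a time rather than enumerating all joint assignments. A minimal sketch with made-up numbers (not part of the slides):

```python
# Sketch (not from the slides): variable elimination on a tiny chain A -> B -> C,
# computing P(C = true) by summing out A and then B.  CPT values are placeholders.
p_A = {True: 0.3, False: 0.7}                               # P(A)
p_B_given_A = {True: 0.9, False: 0.2}                       # P(B = true | A)
p_C_given_B = {True: 0.8, False: 0.1}                       # P(C = true | B)

# Eliminate A: tau_B(b) = sum_a P(a) * P(b | a)
tau_B = {}
for b in (True, False):
    tau_B[b] = sum(p_A[a] * (p_B_given_A[a] if b else 1 - p_B_given_A[a])
                   for a in (True, False))

# Eliminate B: P(C = true) = sum_b tau_B(b) * P(C = true | b)
p_C_true = sum(tau_B[b] * p_C_given_B[b] for b in (True, False))
print(p_C_true)   # the chain is processed in time linear in its length
```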
18
Approximation
- Until now, we examined exact computation.
- In many applications, approximations are sufficient.
- Example: P(X = x | e) = 0.3183098861838
- Maybe P(X = x | e) ≈ 0.3 is a good enough approximation.
- e.g., we take action only if P(X = x | e) > 0.5
- Can we find good approximation algorithms?
19
Types of Approximations
Absolute error
- An estimate q of P(X = x | e) has absolute error ε if
  P(X = x | e) - ε ≤ q ≤ P(X = x | e) + ε
  equivalently
  q - ε ≤ P(X = x | e) ≤ q + ε
- Absolute error is not always what we want:
  - If P(X = x | e) = 0.0001, then an absolute error of 0.001 is unacceptable.
  - If P(X = x | e) = 0.3, then an absolute error of 0.001 is overly precise.
[Figure: a number line from 0 to 1 showing the estimate q with an interval of width 2ε around it.]
20
Types of Approximations
Relative error
- An estimate q of P(X = x | e) has relative error ε if
  P(X = x | e)(1 - ε) ≤ q ≤ P(X = x | e)(1 + ε)
  equivalently
  q/(1 + ε) ≤ P(X = x | e) ≤ q/(1 - ε)
- Sensitivity of the approximation depends on the actual value of the desired result.
[Figure: a number line from 0 to 1 showing q with the interval [q/(1+ε), q/(1-ε)].]
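A small helper (not from the slides) that makes the two definitions concrete and reproduces the point about small true values:

```python
def within_absolute(q, p, eps):
    """True if q is within absolute error eps of p:  p - eps <= q <= p + eps."""
    return p - eps <= q <= p + eps

def within_relative(q, p, eps):
    """True if q is within relative error eps of p:  p*(1-eps) <= q <= p*(1+eps)."""
    return p * (1 - eps) <= q <= p * (1 + eps)

# The slides' point: for a tiny true value, absolute error is too permissive.
print(within_absolute(0.0011, 0.0001, 0.001))   # True, although q is 11x too large
print(within_relative(0.0011, 0.0001, 0.001))   # False
```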
21
Complexity
- Exact inference is hard.
- Is approximate inference any easier?
22
Complexity: Relative Error
- Suppose that q is a relative error estimate of P(X = t).
- If φ is not satisfiable, then P(X = t) = 0. Hence,
  0 = P(X = t)(1 - ε) ≤ q ≤ P(X = t)(1 + ε) = 0,
  namely, q = 0. Thus, if q > 0, then φ is satisfiable.
An immediate consequence:
Thm: Given ε, finding an ε-relative error approximation is NP-hard.
23
Complexity: Absolute error
- We can find absolute error approximations to P(X = x) with high probability (via sampling). We will see such algorithms next class; a small sketch follows.
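As a preview (not part of the slides), a minimal forward-sampling sketch: sample each variable given its already-sampled parents, and estimate P(X = x) as the fraction of samples with X = x. The toy chain and CPT values are the same hypothetical placeholders used earlier.

```python
import random

# Preview sketch (not from the slides): estimate P(D = true) without evidence
# by forward sampling.  Toy chain S -> L -> D with made-up CPTs.
cpts = {
    "S": ((), {(): 0.5}),
    "L": (("S",), {(True,): 0.1, (False,): 0.01}),
    "D": (("L",), {(True,): 0.8, (False,): 0.1}),
}

def sample_once(rng):
    """Sample every variable given its already-sampled parents (ancestral order)."""
    sample = {}
    for var, (parents, table) in cpts.items():       # dict order is topological here
        p_true = table[tuple(sample[q] for q in parents)]
        sample[var] = rng.random() < p_true
    return sample

def estimate(var, value, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    hits = sum(sample_once(rng)[var] == value for _ in range(n_samples))
    return hits / n_samples

print(estimate("D", True))   # converges to the true P(D = true) as n_samples grows
```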
- However, once we have evidence, the problem is harder.
Thm: If ε < 0.5, then finding an estimate of P(X = x | e) with ε absolute error is NP-hard.
24
Proof
Recall our construction:
[Figure: the reduction network with proposition nodes Q1,…,Qn, clause nodes φ1,…,φk, AND nodes A1,…,Ak-2, and output node X.]
25
Proof (cont.)
Suppose we can estimate with ε absolute error:
- Let p1 ≈ P(Q1 = t | X = t)
  Assign q1 = t if p1 > 0.5, else q1 = f
- Let p2 ≈ P(Q2 = t | X = t, Q1 = q1)
  Assign q2 = t if p2 > 0.5, else q2 = f
…
- Let pn ≈ P(Qn = t | X = t, Q1 = q1, …, Qn-1 = qn-1)
  Assign qn = t if pn > 0.5, else qn = f
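A sketch (not from the slides) of this rounding procedure. round_assignment only needs black-box access to an estimate of P(Qi = t | X = t, fixed values); here a brute-force exact posterior on the hypothetical two-clause instance stands in for the assumed ε-absolute-error oracle.

```python
from itertools import product

def round_assignment(n, approx_posterior):
    """Build q1,…,qn by thresholding approximate posteriors at 0.5 (the proof's procedure).

    approx_posterior(i, fixed) is assumed to return an estimate of
    P(Q_i = t | X = t, Q_1 = fixed[1], …, Q_{i-1} = fixed[i-1])
    within absolute error eps < 0.5 (hypothetical oracle).
    """
    fixed = {}
    for i in range(1, n + 1):
        fixed[i] = approx_posterior(i, dict(fixed)) > 0.5   # q_i = t iff estimate > 0.5
    return fixed

# Demo with an *exact* oracle (eps = 0) on the two-clause instance used earlier.
clauses = [[(1, True), (2, True), (3, False)], [(1, False), (2, True), (3, True)]]
n = 3

def satisfies(q):
    return all(any(q[i] == pol for i, pol in c) for c in clauses)

def exact_posterior(i, fixed):
    """P(Q_i = t | X = t, fixed) by brute force (stands in for the oracle)."""
    num = den = 0
    for values in product([True, False], repeat=n):
        q = dict(zip(range(1, n + 1), values))
        if satisfies(q) and all(q[j] == v for j, v in fixed.items()):
            den += 1
            num += q[i]
    return num / den

q = round_assignment(n, exact_posterior)
print(q, "satisfying?", satisfies(q))
```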
26
Proof (cont.)
Claim: if φ is satisfiable, then q1,…,qn is a satisfying assignment.
- Suppose φ is satisfiable.
- By induction on i, there is a satisfying assignment with Q1 = q1, …, Qi = qi.
Base case:
If Q1 = t in all satisfying assignments
  ⇒ P(Q1 = t | X = t) = 1
  ⇒ p1 ≥ 1 - ε > 0.5
  ⇒ q1 = t
If Q1 = f in all satisfying assignments, then q1 = f.
Otherwise, the statement holds for any choice of q1.
27
Proof (cont.)
Claim: if φ is satisfiable, then q1,…,qn is a satisfying assignment.
- Suppose φ is satisfiable.
- By induction on i, there is a satisfying assignment with Q1 = q1, …, Qi = qi.
Induction argument:
If Qi+1 = t in all satisfying assignments s.t. Q1 = q1, …, Qi = qi
  ⇒ P(Qi+1 = t | X = t, Q1 = q1, …, Qi = qi) = 1
  ⇒ pi+1 ≥ 1 - ε > 0.5
  ⇒ qi+1 = t
If Qi+1 = f in all satisfying assignments s.t. Q1 = q1, …, Qi = qi,
then qi+1 = f.
28
Proof (cont.)
- We can efficiently check whether q1,…,qn is a satisfying assignment (linear time).
- If it is, then φ is satisfiable.
- If it is not, then φ is not satisfiable.
- Suppose we have an approximation procedure with ε absolute error
  ⇒ we can determine 3-SAT with n procedure calls
  ⇒ ε absolute error approximation is NP-hard.
29