Transcript class09
Bayesian Networks
Lecture 9
Edited by Dan Geiger from Nir Friedman's slides.
Bayesian Network
[Figure: the Visit-to-Asia DAG (edges V → T; S → L; S → B; T, L → A; A → X; A, B → D), with each node annotated by its local distribution: p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).]
P(v, s, t, l, b, a, x, d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Bayesian network = Directed Acyclic Graph (DAG),
annotated with conditional probability distributions.
Bayesian Network (cont.)
Each Directed Acyclic Graph defines a factorization of the form:
p(x_1, …, x_n) = ∏_{i=1}^{n} p(x_i | pa_i)
where pa_i denotes the set of parents of x_i in the DAG.
[Figure: the same Visit-to-Asia DAG, repeated with its local distributions.]
For this DAG:
P(v, s, t, l, b, a, x, d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
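As a concrete illustration of the factorization, here is a minimal Python sketch over a three-node fragment of the network (V → T, plus S). Only the structure comes from the slides; the CPT numbers are made up for illustration.

```python
# Each entry: node -> (parents, CPT mapping
# (node value, parent values...) -> probability).
CPTS = {
    "V": ((), {("y",): 0.01, ("n",): 0.99}),            # p(v), made up
    "T": (("V",), {("y", "y"): 0.05, ("n", "y"): 0.95,  # p(t|v), made up
                   ("y", "n"): 0.01, ("n", "n"): 0.99}),
    "S": ((), {("y",): 0.5, ("n",): 0.5}),              # p(s), made up
}

def joint(assignment):
    """P(full assignment) = product of p(node | parents) over all nodes."""
    p = 1.0
    for node, (parents, table) in CPTS.items():
        key = (assignment[node],) + tuple(assignment[q] for q in parents)
        p *= table[key]
    return p

print(joint({"V": "y", "T": "y", "S": "n"}))  # 0.01 * 0.05 * 0.5 = 0.00025
```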
Local distributions
[Figure: Tuberculosis (Yes/No) and Lung Cancer (Yes/No) are the parents of Abnormality in Chest (Yes/No), whose local distribution is p(A|T,L).]
Conditional Probability Table:
p(A=y|L=n, T=n) = 0.02
p(A=y|L=n, T=y) = 0.60
p(A=y|L=y, T=n) = 0.99
p(A=y|L=y, T=y) = 0.99
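The same table, written as a small Python dict (the slide gives the A = y rows; the A = n rows follow by complementation):

```python
# p(A=y | T, L), keyed by (t, l); the numbers are taken from the slide.
P_A_GIVEN_TL = {
    ("n", "n"): 0.02,
    ("y", "n"): 0.60,
    ("n", "y"): 0.99,
    ("y", "y"): 0.99,
}

def p_a(a, t, l):
    """p(A=a | T=t, L=l); the A=n row is 1 minus the A=y row."""
    p_yes = P_A_GIVEN_TL[(t, l)]
    return p_yes if a == "y" else 1.0 - p_yes

print(p_a("n", "y", "n"))  # 1 - 0.60 = 0.40
```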
“Example”
[Figure: four loci, Locus 1, Locus 2 (Disease), Locus 3, Locus 4.]
This model depicts the qualitative relations between the variables.
We will now specify the joint distribution over these variables.
The “Visit-to-Asia” Example
[Figure: the network. Visit to Asia → Tuberculosis; Smoking → Lung Cancer and Bronchitis; Tuberculosis and Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest and Bronchitis → Dyspnea.]
Queries
There are many types of queries.
Most queries involve evidence.
Evidence e is an assignment of values to a set E of variables in the domain.
Examples (in the Visit-to-Asia network):
P(Dyspnea = Yes | Visit_to_Asia = Yes, Smoking = Yes)
P(Smoking = Yes | Dyspnea = Yes)
Queries: A posteriori belief
The conditional probability of a variable given the evidence:
P(x | e) = P(x, e) / P(e)
This is the a posteriori belief in x, given evidence e.
Often we compute the term P(x, e), from which we can recover the a posteriori belief by normalizing:
P(x | e) = P(x, e) / Σ_x P(x, e)
Examples were given on the previous slide.
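A brute-force sketch of this computation: enumerate the hidden variables, accumulate P(x, e), and normalize. The `joint` argument is any full-joint function, such as the factorization sketch above; this is exponential in the number of variables and is meant only to make the formula concrete.

```python
from itertools import product

def posterior(joint, variables, query_var, evidence):
    """Return {value: P(query_var = value | evidence)} by enumeration."""
    hidden = [v for v in variables if v != query_var and v not in evidence]
    scores = {}
    for qval in ("y", "n"):
        total = 0.0
        for combo in product(("y", "n"), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment.update(zip(hidden, combo))
            assignment[query_var] = qval
            total += joint(assignment)
        scores[qval] = total              # this is P(x, e)
    z = sum(scores.values())              # this is P(e) = sum_x P(x, e)
    return {v: s / z for v, s in scores.items()}
```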
A posteriori belief
[Figure: the Visit-to-Asia network again (V, S, T, L, B, A, X, D).]
This query is useful in many other cases:
Prediction: what is the probability of an outcome given the starting condition?
The target is a descendant of the evidence (e.g., does a visit to Asia lead to Tuberculosis?).
Diagnosis: what is the probability of a disease/fault given the symptoms?
The target is an ancestor of the evidence (e.g., do the X-ray results indicate a higher probability of Tuberculosis?).
Example: Predictive+Diagnostic
Probabilistic inference can combine evidence from all parts of the network, diagnostic and predictive, regardless of the directions of the edges in the model.
P(T = Yes | Visit_to_Asia = Yes, Dyspnea = Yes)
[Figure: the Visit-to-Asia network, with evidence entering at both V and D.]
Queries: MAP
Find the maximum a posteriori assignment for some variables of interest (say H1,…,Hl).
That is, find h1,…,hl that maximize the conditional probability P(h1,…,hl | e).
This is equivalent to maximizing the joint P(h1,…,hl, e), since P(e) does not depend on h1,…,hl.
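A brute-force sketch of a MAP query under the same toy setup as the earlier snippets: enumerate assignments to H1,…,Hl, sum out the remaining non-evidence variables to get P(h1,…,hl, e), and keep the maximizer.

```python
from itertools import product

def map_query(joint, variables, hypothesis_vars, evidence):
    """Return (assignment to hypothesis_vars, P(h, e)) maximizing P(h | e)."""
    hidden = [v for v in variables
              if v not in hypothesis_vars and v not in evidence]
    best, best_p = None, -1.0
    for hvals in product(("y", "n"), repeat=len(hypothesis_vars)):
        h = dict(zip(hypothesis_vars, hvals))
        p = 0.0                            # accumulate P(h, e)
        for combo in product(("y", "n"), repeat=len(hidden)):
            assignment = {**evidence, **h, **dict(zip(hidden, combo))}
            p += joint(assignment)
        if p > best_p:
            best, best_p = h, p
    return best, best_p
```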
Queries: MAP
[Figure: a two-layer diagnosis network, diseases D1,…,D4 as parents of symptoms S1,…,S4.]
We can use MAP for explanation:
What is the most likely joint event, given the evidence (e.g., a set of likely diseases given the symptoms)?
What is the most likely scenario, given the evidence (e.g., a series of likely malfunctions that trigger a fault)?
[Figure: a car-fault example with nodes Bad alternator, Bad magneto, Not charging, Bad battery, Dead battery.]
Complexity of Inference
Thm: Computing P(X = x) in a Bayesian network is NP-hard.
This is not surprising, since we can simulate Boolean gates.
Proof
We reduce 3-SAT to Bayesian network computation.
Assume we are given a 3-SAT instance:
Q1,…,Qn are propositions,
φ1,…,φk are clauses, such that φi = li1 ∨ li2 ∨ li3,
where each lij is a literal over Q1,…,Qn (e.g., Q1 = true),
φ = φ1 ∧ … ∧ φk.
We will construct a Bayesian network s.t. P(X = t) > 0 iff φ is satisfiable.
[Figure: the construction. The propositions Q1,…,Qn are root nodes; each clause node φ1,…,φk has as parents the propositions appearing in its literals; AND nodes A1,…,Ak-2 chain the clause nodes together, with X as the final output.]
P(Qi = true) = 0.5
P(φi = true | Qi, Qj, Ql) = 1 iff the values of Qi, Qj, Ql satisfy the clause φi
A1, A2, … are simple binary AND gates.
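A sketch of the deterministic clause CPT, with literals encoded as (variable index, sign) pairs; the encoding is my own, not from the slides.

```python
def clause_prob(literals, values):
    """P(clause node = true | parent values): 1 iff some literal holds."""
    satisfied = any(values[var] == sign for var, sign in literals)
    return 1.0 if satisfied else 0.0

# Clause (Q0 v not-Q1 v Q2) under Q0=F, Q1=T, Q2=F: unsatisfied.
print(clause_prob([(0, True), (1, False), (2, True)],
                  {0: False, 1: True, 2: False}))  # 0.0
```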
It is easy to check:
The network has a polynomial number of variables.
Each Conditional Probability Table can be described by a small table (at most 8 parameters).
P(X = true) > 0 if and only if there exists a satisfying assignment to Q1,…,Qn.
Conclusion: a polynomial reduction of 3-SAT to Bayesian network inference.
Inference is even #P-hard
P(X = t) is the fraction of satisfying assignments to φ.
Hence 2^n · P(X = t) is the number of satisfying assignments to φ.
Thus, if we know how to compute P(X = t), we know how to count the number of satisfying assignments to φ.
Consequently, computing P(X = t) is #P-hard.
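A tiny brute-force check of this identity: under the uniform prior on Q1,…,Qn, P(X = t) is exactly the satisfying fraction. The clause encoding matches the earlier sketch.

```python
from itertools import product

def p_x_true(clauses, n):
    """P(X = t) = (# satisfying assignments) / 2^n, by enumeration."""
    sat = sum(
        all(any(q[var] == sign for var, sign in clause) for clause in clauses)
        for q in product((False, True), repeat=n)
    )
    return sat / 2 ** n

# A single clause (Q0 v Q1): 3 of the 4 assignments satisfy it.
print(p_x_true([[(0, True), (1, True)]], n=2))  # 0.75
```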
Hardness - Notes
We used deterministic relations in our construction.
The same construction works if we use (1-ε, ε) instead of (1, 0) in each gate, for any ε < 0.5.
Homework: Prove it.
Hardness does not mean we cannot solve inference.
It implies that we cannot find a general procedure that works efficiently for all networks.
For particular families of networks, we can have provably efficient procedures (e.g., trees, HMMs), such as variable elimination algorithms (a one-step flavor is sketched below).
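For intuition on why such families are easy, here is a one-step variable-elimination flavor on a made-up three-node chain A → B → C: each variable is summed out by one matrix-vector product instead of enumerating all joint assignments.

```python
import numpy as np

p_a = np.array([0.3, 0.7])             # p(a), made-up numbers
p_b_given_a = np.array([[0.9, 0.1],    # p(b|a): row = a, column = b
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.5, 0.5],    # p(c|b): row = b, column = c
                        [0.1, 0.9]])

p_b = p_a @ p_b_given_a                # eliminate A: p(b) = sum_a p(a) p(b|a)
p_c = p_b @ p_c_given_b                # eliminate B: p(c) = sum_b p(b) p(c|b)
print(p_c)                             # [0.264, 0.736]
```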
Approximation
Until now, we examined exact computation.
In many applications, approximations are sufficient.
Example: P(X = x|e) = 0.3183098861838
Maybe P(X = x|e) ≈ 0.3 is a good enough approximation,
e.g., if we take action only when P(X = x|e) > 0.5.
Can we find good approximation algorithms?
Types of Approximations
Absolute error
An estimate q of P(X = x | e) has absolute error ε if:
P(X = x|e) - ε ≤ q ≤ P(X = x|e) + ε
or equivalently:
q - ε ≤ P(X = x|e) ≤ q + ε
Absolute error is not always what we want:
If P(X = x | e) = 0.0001, then an absolute error of 0.001 is unacceptable.
If P(X = x | e) = 0.3, then an absolute error of 0.001 is overly precise.
[Figure: the interval [0, 1] with a window of width 2ε around the estimate q.]
Types of Approximations
Relative error
An estimate q of P(X = x | e) has relative error ε if:
P(X = x|e)(1 - ε) ≤ q ≤ P(X = x|e)(1 + ε)
or equivalently:
q/(1 + ε) ≤ P(X = x|e) ≤ q/(1 - ε)
[Figure: the interval [0, 1] with q marked, bracketed by q/(1+ε) and q/(1-ε).]
The sensitivity of the approximation depends on the actual value of the desired result.
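The two notions as predicates, to make the difference concrete (the example numbers echo the absolute-error slide):

```python
def within_absolute(q, p, eps):
    """q is within absolute error eps of p: p - eps <= q <= p + eps."""
    return abs(q - p) <= eps

def within_relative(q, p, eps):
    """q is within relative error eps of p: p(1-eps) <= q <= p(1+eps)."""
    return p * (1 - eps) <= q <= p * (1 + eps)

# Estimating 0.0001 as 0.001: fine in absolute terms, off by 10x relatively.
print(within_absolute(0.001, 0.0001, 0.001))  # True
print(within_relative(0.001, 0.0001, 0.5))    # False
```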
Complexity
Exact inference is hard.
Is approximate inference any easier?
Complexity: Relative Error
Suppose that q is an ε-relative-error estimate of P(X = t).
If φ is not satisfiable, then P(X = t) = 0. Hence,
0 = P(X = t)(1 - ε) ≤ q ≤ P(X = t)(1 + ε) = 0,
namely, q = 0. Thus, if q > 0, then φ is satisfiable.
An immediate consequence:
Thm: Given ε, finding an ε-relative-error approximation is NP-hard.
Complexity: Absolute error
We can find absolute-error approximations to P(X = x) with high probability (via sampling).
We will see such algorithms next class.
However, once we have evidence, the problem is harder.
Thm: If ε < 0.5, then finding an estimate of P(X = x|e) with absolute error ε is NP-hard.
Proof
Recall our construction:
[Figure: the reduction network again. Propositions Q1,…,Qn, clause nodes φ1,…,φk, AND gates A1,…,Ak-2, and output X.]
Proof (cont.)
Suppose we can estimate with absolute error ε < 0.5:
Let p1 ≈ P(Q1 = t | X = t).
Assign q1 = t if p1 > 0.5, else q1 = f.
Let p2 ≈ P(Q2 = t | X = t, Q1 = q1).
Assign q2 = t if p2 > 0.5, else q2 = f.
…
Let pn ≈ P(Qn = t | X = t, Q1 = q1, …, Qn-1 = qn-1).
Assign qn = t if pn > 0.5, else qn = f.
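A sketch of this self-reduction, where `approx` stands in for a hypothetical black-box estimator of P(Qi = t | X = t, fixed so far) with absolute error ε < 0.5 (no such efficient box exists unless P = NP, which is the point):

```python
def decode(approx, n):
    """Fix Q1..Qn one at a time by thresholding the estimate at 0.5."""
    fixed = {}                      # partial assignment built so far
    for i in range(n):
        p_i = approx(i, fixed)      # estimate of P(Qi = t | X = t, fixed)
        fixed[i] = p_i > 0.5
    return fixed                    # then verify against phi in linear time
```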
Proof (cont.)
Claim: if φ is satisfiable, then q1,…,qn is a satisfying assignment.
Suppose φ is satisfiable.
By induction on i: there is a satisfying assignment with Q1 = q1, …, Qi = qi.
Base case:
If Q1 = t in all satisfying assignments, then
P(Q1 = t | X = t) = 1,
so p1 ≥ 1 - ε > 0.5,
hence q1 = t.
If Q1 = f in all satisfying assignments, then q1 = f.
Otherwise, the statement holds for any choice of q1.
Proof (cont.)
Claim: if φ is satisfiable, then q1,…,qn is a satisfying assignment.
Suppose φ is satisfiable.
By induction on i: there is a satisfying assignment with Q1 = q1, …, Qi = qi.
Induction argument:
If Qi+1 = t in all satisfying assignments s.t. Q1 = q1, …, Qi = qi, then
P(Qi+1 = t | X = t, Q1 = q1, …, Qi = qi) = 1,
so pi+1 ≥ 1 - ε > 0.5,
hence qi+1 = t.
If Qi+1 = f in all satisfying assignments s.t. Q1 = q1, …, Qi = qi,
then qi+1 = f.
Proof (cont.)
We can efficiently check whether q1,…,qn is a satisfying assignment (linear time).
If it is, then φ is satisfiable.
If it is not, then φ is not satisfiable (by the claim).
So, given an approximation procedure with absolute error ε < 0.5, we can decide 3-SAT with n procedure calls.
Hence, absolute-error approximation is NP-hard.