Oregon State University – CS430 Intro to AI
Probabilistic Representation and Reasoning
Given a set of facts/beliefs/rules/evidence:
- Evaluate a given statement
- Determine the probability of a statement
- Find a statement that optimizes a set of constraints
- Most probable explanation (MPE): the setting of hidden variables that best explains the observations
Probability Theory
Random Variables
- Boolean: W1,2 (just like propositional logic). Two possible values: {true, false}
- Discrete: Weather ∈ {sunny, cloudy, rainy, snow}
- Continuous: Temperature ∈ ℝ
Propositions
- W1,2 = true, Weather = sunny, Temperature = 65
- These can be combined as in propositional logic
Example
Consider a car described by 3 random variables:
- Gas ∈ {true, false}: there is gas in the tank
- Meter ∈ {empty, full}: the gas gauge shows the tank is empty or full
- Starts ∈ {yes, no}: the car starts when you turn the key in the ignition
Joint Probability Distribution
Each row is called a “primitive event.” The rows are mutually exclusive and exhaustive, so the table corresponds to an “8-sided coin” with the indicated probabilities.

Gas    Meter  Starts  P(G,M,S)
false  empty  no      0.1386
false  empty  yes     0.0014
false  full   no      0.0594
false  full   yes     0.0006
true   empty  no      0.0240
true   empty  yes     0.0560
true   full   no      0.2160
true   full   yes     0.5040
Any Query Can Be Answered from the Joint Distribution
In general, to compute P(Q) for any proposition Q, add up the probability in all cells of the joint table (above) where Q is true. For example:
- P(Gas = false) = 0.2, the sum of all cells where Gas = false: 0.1386 + 0.0014 + 0.0594 + 0.0006
- P(Gas = false ∧ Meter = full ∧ Starts = yes) = 0.0006, a single cell
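In Python (a minimal sketch of ours, not from the slides), the joint table can be a dictionary keyed by (Gas, Meter, Starts) value tuples, and a query is any predicate over a cell; the names joint and prob are our own:

    joint = {
        ("false", "empty", "no"):  0.1386,
        ("false", "empty", "yes"): 0.0014,
        ("false", "full",  "no"):  0.0594,
        ("false", "full",  "yes"): 0.0006,
        ("true",  "empty", "no"):  0.0240,
        ("true",  "empty", "yes"): 0.0560,
        ("true",  "full",  "no"):  0.2160,
        ("true",  "full",  "yes"): 0.5040,
    }

    def prob(q):
        """P(Q): add up the probability in every cell where Q is true."""
        return sum(p for cell, p in joint.items() if q(*cell))

    print(prob(lambda g, m, s: g == "false"))  # 0.2 (up to float rounding)
    print(prob(lambda g, m, s: g == "false" and m == "full" and s == "yes"))  # 0.0006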
Notations
P(G,M,S) denotes the entire joint distribution (in the book, P is boldface). It is a table or function that maps from G, M, and S to a probability.
P(true,empty,no) denotes a single probability value: P(Gas=true ∧ Meter=empty ∧ Starts=no).
Operations on Probability Tables (1): Marginalization
Marginalization (“summing away”): ∑_{M,S} P(G,M,S) = P(G)
P(G) is called a “marginal” probability distribution. It consists of two probabilities:

Gas    P(G)
false  P(false,empty,yes) + P(false,empty,no) + P(false,full,yes) + P(false,full,no)
true   P(true,empty,yes) + P(true,empty,no) + P(true,full,yes) + P(true,full,no)
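A sketch of this operation on the dictionary representation above (marginalize is our own helper): summing away variables means adding up all cells that agree on the variables we keep.

    def marginalize(table, keep):
        """Sum away every variable except those at the index positions in keep."""
        out = {}
        for cell, p in table.items():
            key = tuple(cell[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out

    P_G = marginalize(joint, keep=[0])  # sum away Meter and Starts
    # {('false',): 0.2, ('true',): 0.8}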
Conditional Probability
Suppose we observe that M=full. What is the probability that the car will start? P(S=yes | M=full)
Definition: P(A|B) = P(A ∧ B) / P(B)
Conditional Probability
To compute P(S=yes | M=full) from the joint table:
1. Select the cells that match the condition (M=full)
2. Delete the remaining cells and the M column
3. Renormalize the table to obtain P(S,G | M=full)
4. Sum away Gas: ∑_G P(S,G | M=full) = P(S | M=full)

Selecting M=full and renormalizing (dividing by P(M=full) = 0.78):

Gas    Starts  P(G,S|M=full)
false  no      0.0762
false  yes     0.0008
true   no      0.2769
true   yes     0.6462

Summing away Gas:

Starts  P(S|M=full)
no      0.3531
yes     0.6469

Read the answer from the P(S=yes | M=full) cell: 0.6469.
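The same steps as a Python sketch, reusing joint and marginalize from the earlier sketches (condition is our own helper):

    def condition(table, index, value):
        """Select cells matching the condition, drop that column, renormalize."""
        selected = {cell[:index] + cell[index + 1:]: p
                    for cell, p in table.items() if cell[index] == value}
        total = sum(selected.values())          # here P(M=full) = 0.78
        return {cell: p / total for cell, p in selected.items()}

    P_GS = condition(joint, index=1, value="full")  # P(G,S | M=full)
    P_S = marginalize(P_GS, keep=[1])               # sum away Gas
    # {('no',): 0.3531, ('yes',): 0.6469}, up to rounding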
Operations on Probability Tables (2): Conditionalizing
Construct P(G,S | M) by normalizing the subtable corresponding to M=full and normalizing the subtable corresponding to M=empty:

Gas    Meter  Starts  P(G,M,S)  P(G,S|M)
false  empty  no      0.1386    0.6300
false  empty  yes     0.0014    0.0064
false  full   no      0.0594    0.0762
false  full   yes     0.0006    0.0008
true   empty  no      0.0240    0.1091
true   empty  yes     0.0560    0.2545
true   full   no      0.2160    0.2769
true   full   yes     0.5040    0.6462
Chain Rule of Probability
P(A,B,C) = P(A|B,C) · P(B|C) · P(C)
Proof:
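The proof, not reproduced on the slide, is two applications of the definition of conditional probability (assuming P(B,C) > 0 and P(C) > 0 so the divisions are defined):

    P(A|B,C) · P(B|C) · P(C)
      = [P(A,B,C) / P(B,C)] · [P(B,C) / P(C)] · P(C)
      = P(A,B,C)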
Chain Rule (2)
Holds for distributions too:
P(A,B,C) = P(A | B,C) · P(B | C) · P(C)
This means that for each setting of A, B, and C, we can substitute into the equation, and it is true.
Belief Networks (1): Independence
Defn: Two random variables X and Y are independent iff P(X,Y) = P(X) · P(Y)
Example: X is a coin with P(X=heads) = 0.4; Y is a coin with P(Y=heads) = 0.8. Joint distribution:

P(X,Y)   X=heads  X=tails
Y=heads  0.32     0.48
Y=tails  0.08     0.12
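A quick numeric check of this example (a sketch; the dictionaries are our own): the joint table of two independent coins is the cell-by-cell product of the marginals.

    px = {"heads": 0.4, "tails": 0.6}   # P(X)
    py = {"heads": 0.8, "tails": 0.2}   # P(Y)
    pxy = {(x, y): px[x] * py[y] for x in px for y in py}
    # {('heads','heads'): 0.32, ('heads','tails'): 0.08,
    #  ('tails','heads'): 0.48, ('tails','tails'): 0.12}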
Belief Networks (2): Conditional Independence
Defn: Two random variables X and Y are conditionally independent given Z iff P(X,Y | Z) = P(X|Z) · P(Y|Z)
Example: P(S,M | G) = P(S | G) · P(M | G)
Intuition: G independently causes S and M.

P(S,M|G):
S    M      G=false  G=true
no   empty  0.693    0.030
no   full   0.297    0.270
yes  empty  0.007    0.070
yes  full   0.003    0.630

P(S|G):
S    G=false  G=true
no   0.99     0.30
yes  0.01     0.70

P(M|G):
M      G=false  G=true
empty  0.70     0.10
full   0.30     0.90
Operations on Probability Tables (3): Conformal Product
Allocate space for the resulting table and then fill in each cell with the product of the corresponding cells: P(S,M | G) = P(S | G) · P(M | G)

P(S,M|G):
S    M      G=false  G=true
no   empty  a00·b00  a01·b01
no   full   a00·b10  a01·b11
yes  empty  a10·b00  a11·b01
yes  full   a10·b10  a11·b11

equals the product of

P(S|G):
S    G=false  G=true
no   a00      a01
yes  a10      a11

and

P(M|G):
M      G=false  G=true
empty  b00      b01
full   b10      b11
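A sketch of the conformal product in Python, under our own table representation (a pair of variable names and a dict from value tuples to numbers); the tables below are P(S|G) and P(M|G) from the previous slide:

    def conformal_product(t1, t2):
        """Multiply two tables cell-by-cell, matching values of shared variables."""
        v1, p1 = t1
        v2, p2 = t2
        extra = [v for v in v2 if v not in v1]   # variables appearing only in t2
        out = {}
        for c1, x1 in p1.items():
            a1 = dict(zip(v1, c1))
            for c2, x2 in p2.items():
                a2 = dict(zip(v2, c2))
                if all(a1[v] == a2[v] for v in v2 if v in a1):
                    out[c1 + tuple(a2[v] for v in extra)] = x1 * x2
        return tuple(v1) + tuple(extra), out

    P_S_G = (("S", "G"), {("no", "false"): 0.99, ("no", "true"): 0.30,
                          ("yes", "false"): 0.01, ("yes", "true"): 0.70})
    P_M_G = (("M", "G"), {("empty", "false"): 0.70, ("empty", "true"): 0.10,
                          ("full", "false"): 0.30, ("full", "true"): 0.90})
    P_SM_G = conformal_product(P_S_G, P_M_G)
    # cell ('no', 'false', 'empty') = 0.99 * 0.70 = 0.693, and so on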
Properties of Conformal Products
- Commutative
- Associative
- Works on normalized or unnormalized tables
- Works on joint or conditional tables
Conditional Independence Allows Us to Simplify the Joint Distribution
P(G,M,S) = P(M,S | G) · P(G)            [chain rule]
         = P(M | G) · P(S | G) · P(G)   [conditional independence]
Bayesian Networks
[Network: Gas → Meter, Gas → Starts]
- One node for each random variable
- Each node stores a probability distribution P(node | parents(node))
- Only direct dependencies are shown
- The joint distribution is the conformal product of the node distributions: P(G,M,S) = P(G) · P(M | G) · P(S | G)
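As a sketch, the whole network is just its node tables in the representation above: P(S|G) and P(M|G) as defined in the earlier sketch, plus the prior P(G), whose numbers can be read off the joint table from earlier. The joint, if ever needed, is the conformal product of the node tables.

    P_G = (("G",), {("false",): 0.20, ("true",): 0.80})
    # P(G,M,S) = P(G) · P(M|G) · P(S|G):
    P_GMS = conformal_product(conformal_product(P_G, P_M_G), P_S_G)
    # e.g. cell ('false', 'empty', 'no') = 0.20 * 0.70 * 0.99 = 0.1386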
Inference in Bayesian Networks
Suppose we observe that M=full. What is the probability that the car will start? P(S=yes | M=full)
Before, we handled this by the following steps:
1. Remove all rows corresponding to M=empty
2. Normalize the remaining rows to get P(S,G | M=full)
3. Sum over G: ∑_G P(S,G | M=full) = P(S | M=full)
4. Read the answer from the S=yes entry in the table
We want to get the same result, but without constructing the joint distribution first.
Inference in Bayesian Networks (2)
1. Remove all rows corresponding to M=empty from all nodes:
   - P(G) – unchanged
   - P(M | G) becomes P[G]
   - P(S | G) – unchanged
2. Sum over G: ∑_G P(G) · P[G] · P(S | G)
3. Normalize to get P(S | M=full)
4. Read the answer from the S=yes entry in the table
Inference with Tables
P(G):
G      P(G)
false  0.20
true   0.80

P(S|G):
S    G=false  G=true
no   0.99     0.30
yes  0.01     0.70

P(M|G):
M      G=false  G=true
empty  0.70     0.10
full   0.30     0.90
Inference with Tables
P(G):
G      P(G)
false  0.20
true   0.80

P(S|G):
S    G=false  G=true
no   0.99     0.30
yes  0.01     0.70

P[G] (the M=full row of P(M|G)):
G=false  G=true
0.30     0.90

Step 1: Delete the M=empty rows from all tables
Inference with Tables
P(G):
G      P(G)
false  0.20
true   0.80

P(S|G):
S    G=false  G=true
no   0.99     0.30
yes  0.01     0.70

P[G]:
G=false  G=true
0.30     0.90

Step 1: Delete the M=empty rows from all tables
Step 2: Perform algebra to push the summation inwards (a no-op in this case)
Inference with Tables
Conformal product P(G) · P[G] · P(S|G):
S    G=false  G=true
no   0.0594   0.2160
yes  0.0006   0.5040

Step 1: Delete the M=empty rows from all tables
Step 2: Perform algebra to push the summation inwards (a no-op in this case)
Step 3: Form the conformal product
Inference with Tables
S
no   0.2754
yes  0.5046

Step 1: Delete the M=empty rows from all tables
Step 2: Perform algebra to push the summation inwards (a no-op in this case)
Step 3: Form the conformal product
Step 4: Sum away G
Inference with Tables
S
no   0.3531
yes  0.6469

Step 1: Delete the M=empty rows from all tables
Step 2: Perform algebra to push the summation inwards (a no-op in this case)
Step 3: Form the conformal product
Step 4: Sum away G
Step 5: Normalize
Inference with Tables
S
no   0.3531
yes  0.6469

Step 1: Delete the M=empty rows from all tables
Step 2: Perform algebra to push the summation inwards (a no-op in this case)
Step 3: Form the conformal product
Step 4: Sum away G
Step 5: Normalize
Step 6: Read the answer from the table: 0.6469
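The six steps as one Python sketch, reusing P_G, P_S_G, and conformal_product from the earlier sketches:

    # Step 1: keep only the M=full row of P(M|G), leaving a table over G alone
    P_full = (("G",), {("false",): 0.30, ("true",): 0.90})
    # Steps 2-3: push sums inward (a no-op here) and form the conformal product
    vars_, t = conformal_product(conformal_product(P_G, P_full), P_S_G)
    # Step 4: sum away G
    p_s = {}
    for cell, p in t.items():
        s = dict(zip(vars_, cell))["S"]
        p_s[s] = p_s.get(s, 0.0) + p
    # Step 5: normalize
    z = sum(p_s.values())
    p_s = {s: p / z for s, p in p_s.items()}
    # Step 6: read the answer
    print(p_s["yes"])   # 0.6469 (approx)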
Notes
- We never created the joint distribution.
- Deleting the M=empty rows from the individual tables and then forming the conformal product has the same effect as forming the conformal product first and then deleting the M=empty rows.
- Normalization can be postponed to the end.
Another Example: “Asia” (all variables Boolean)
[Network: Cold → Sneeze ← Allergy, Cat → Allergy, Cat → Scratch]
Suppose we observe Sneeze. What is P(Cold | Sneeze) = P(Co | Sn)?
Answering the Query
The joint distribution, summed over the non-query, non-evidence variables:
∑_{A,Ca,Sc} P(Co) · P(Sn | Co,A) · P(Ca) · P(A | Ca) · P(Sc | Ca)
Apply the evidence sn (Sneeze = true), which turns P(Sn | Co,A) into the table P[Co,A]:
∑_{A,Ca,Sc} P(Co) · P[Co,A] · P(Ca) · P(A | Ca) · P(Sc | Ca)
Push summations in as far as possible:
P(Co) · ∑_A P[Co,A] · ∑_Ca P(A | Ca) · P(Ca) · ∑_Sc P(Sc | Ca)
Evaluate:
P(Co) · ∑_A P[Co,A] · ∑_Ca P(A | Ca) · P(Ca) · P[Ca]
P(Co) · ∑_A P[Co,A] · P[A]
P(Co) · P[Co]
Normalize and extract the answer.
Pruning Leaves
Leaf nodes not involved in the evidence or the query can be pruned. Example: Scratch.
[Figure: the network with Scratch pruned; Cold is the query, Sneeze is the evidence.]
Greedy Algorithm for Choosing the Elimination Order
nodes = set of tables (after applying evidence)
V = variables to sum over
while |nodes| > 1 do
    generate all pairs of tables in nodes that share at least one variable
    compute the size of the table that would result from the conformal product of each pair (summing over as many variables in V as possible)
    let (T1, T2) be the pair with the smallest resulting size
    delete T1 and T2 from nodes
    add the conformal product ∑_V T1 · T2 to nodes
end
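A rough Python sketch of this loop (ours, with tables represented only by their variable sets, so "size" counts variables, as in the examples that follow):

    def greedy_step(nodes, to_sum):
        """One greedy step. nodes: list of frozensets, each holding the
        variable names of one table. Picks the pair whose conformal product
        (after summing away every variable in to_sum that no other table
        mentions) has the fewest variables."""
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                if not nodes[i] & nodes[j]:
                    continue                  # must share at least one variable
                product = nodes[i] | nodes[j]
                others = set().union(*(nodes[k] for k in range(len(nodes))
                                       if k not in (i, j)))
                # a variable can be summed away only if no other table uses it
                result = frozenset(v for v in product
                                   if v not in to_sum or v in others)
                if best is None or len(result) < len(best[0]):
                    best = (result, i, j)
        result, i, j = best
        return [n for k, n in enumerate(nodes) if k not in (i, j)] + [result]

    nodes = [frozenset({"Co"}), frozenset({"Co", "A"}),
             frozenset({"A", "Ca"}), frozenset({"Ca"})]
    nodes = greedy_step(nodes, to_sum={"A", "Ca"})
    # picks P(A|Ca) · P(Ca) and sums away Ca, leaving a table over {A}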
Example of Greedy Algorithm
Given tables P(Co), P[Co,A], P(A|Ca), P(Ca); variables to sum: A, Ca. (Sizes below count the variables in each table.)

candidate pair      size of product  sum over  size of result
P(Co) · P[Co,A]     2                nil       2
P[Co,A] · P(A|Ca)   3                A         2
P(A|Ca) · P(Ca)     2                Ca        1

Choose: P[A] = ∑_Ca P(A|Ca) · P(Ca)
Example of Greedy Algorithm (2)
Given tables P(Co), P[Co,A], P[A]; variables to sum: A

candidate pair    size of product  sum over  size of result
P(Co) · P[Co,A]   2                nil       2
P[Co,A] · P[A]    2                A         1

Choose: P[Co] = ∑_A P[Co,A] · P[A]
Example of Greedy Algorithm (3)
Given tables P(Co), P[Co]; variables to sum: none

candidate pair   size of product  sum over  size of result
P(Co) · P[Co]    1                nil       1

Choose: P2[Co] = P(Co) · P[Co]
Normalize and extract the answer.
Bayesian Network For WUMPUS
P(P1,1, P1,2, …, P4,4, B1,1, B1,2, …, B4,4)
[Figure: the 4x4 grid of squares and the corresponding network, with one pit node Pi,j and one breeze node Bi,j per square; the parents of each breeze node are the pit nodes of the neighboring squares.]
Probabilistic Inference in WUMPUS
Suppose we have observed:
- No breeze in 1,1
- Breeze in 1,2 and 2,1
- No pit in 1,1, 1,2, and 2,1
[Figure: the explored corner of the grid: 1,1 OK, no breeze; 1,2 and 2,1 OK, breeze; 1,3 marked “????”.]
What is the probability of a pit in 1,3?
P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)
What is P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)?
[Figure: the full network with the evidence filled in: B1,1 = false, B1,2 = true, B2,1 = true, P1,1 = P1,2 = P2,1 = false; P1,3 is the query node.]
Prune Leaves Not Involved in Query or Evidence
[Figure: the same network with the unobserved breeze leaves (B1,3, B2,2, B3,1, B1,4, B2,3, B4,1, B2,4, …) pruned.]
Prune Independent Nodes
[Figure: after pruning pit nodes that are independent of the query given the evidence (P1,4, …), only the evidence nodes B1,1 = false, B1,2 = true, B2,1 = true and P1,1 = P1,2 = P2,1 = false, the query P1,3, and the pit nodes P2,2 and P3,1 remain.]
Solve Remaining Network
[Figure: the remaining network over P1,1 = P1,2 = P2,1 = false, B1,1 = false, B1,2 = B2,1 = true, the query node P1,3, and the pit nodes P2,2 and P3,1.]
∑_{P2,2,P3,1} P(B1,1 | P1,1,P1,2,P2,1) · P(B1,2 | P1,1,P1,2,P1,3) · P(B2,1 | P1,1,P2,1,P2,2,P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1)
Performing the Inference
NORM{ ∑_{P2,2,P3,1} P(B1,1 | P1,1,P1,2,P2,1) · P(B1,2 | P1,1,P1,2,P1,3) · P(B2,1 | P1,1,P2,1,P2,2,P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1) }
NORM{ ∑_{P2,2,P3,1} P[P1,3] · P[P2,2,P3,1] · P(P2,2) · P(P1,3) · P(P3,1) }
NORM{ P[P1,3] · P(P1,3) · ∑_{P2,2} P(P2,2) · ∑_{P3,1} P[P2,2,P3,1] · P(P3,1) }
P(P1,3) = ⟨0.69, 0.31⟩
A 31% chance of a pit in 1,3!
We have reduced the inference to a simple computation over 2x2 tables.
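A brute-force check of this answer (our sketch, not the slides' computation), using the textbook's pit prior of 0.2 per square and the standard adjacency, in which the breeze at 1,2 also depends on a pit at 2,2:

    from itertools import product

    def weight(p13, p22, p31):
        """Prior weight of a pit configuration consistent with the evidence."""
        # B1,1 = false holds automatically: 1,2 and 2,1 are known pit-free.
        b12 = p13 or p22            # breeze at 1,2 iff a pit in an adjacent square
        b21 = p22 or p31            # breeze at 2,1 iff a pit in an adjacent square
        if not (b12 and b21):
            return 0.0              # inconsistent with the observed breezes
        w = 1.0
        for pit in (p13, p22, p31):
            w *= 0.2 if pit else 0.8
        return w

    num = sum(weight(True, q, r) for q, r in product([True, False], repeat=2))
    den = num + sum(weight(False, q, r) for q, r in product([True, False], repeat=2))
    print(num / den)                # 0.31 (approx), matching the slide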
Summary
- The joint distribution is analogous to the truth table for propositional logic. It is exponentially large, but any query can be answered using it.
- Conditional independence allows us to factor the joint distribution using conformal products.
- Conditional independence relationships are conveniently visualized and encoded in a belief network DAG.
- Given evidence, we can reason efficiently by algebraic manipulation of the factored representation.
(c) 2003 Thomas G. Dietterich