Transcript: Lecture 24

CSC384: Lecture 24
Last time
• Variable elimination, decision trees
Today
• decision trees, decision networks
Readings:
• Today: 10.4 (decision trees, decision networks)
• Next week: wrap up
Announcements:
• none
CSC 384 Lecture Slides (c) 2002-2003, C. Boutilier and P. Poupart
Evaluating a Decision Tree
[Figure: a decision tree. Choice node s1 has decisions a and b leading to chance nodes n1 and n2. Chance node n1 leads to choice node s2 with probability .3 and to s3 with probability .7. Choice node s2 has decisions a and b leading to chance nodes n3 and n4; n3 yields utility 5 with probability .9 and utility 2 with probability .1, while n4 yields utility 3 with probability .8 and utility 4 with probability .2.]
Working from the leaves up:
• U(n3) = .9*5 + .1*2
• U(n4) = .8*3 + .2*4
• U(s2) = max{U(n3), U(n4)} (decision: a or b, whichever is max)
• U(n1) = .3 U(s2) + .7 U(s3)
• U(s1) = max{U(n1), U(n2)} (decision: a or b, whichever is max)
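A minimal sketch of this bottom-up evaluation, assuming the tree is stored as nested tuples (the representation is hypothetical; the numbers are those of the n3/n4/s2 subtree above). The evaluator returns the value of a node and records the maximizing decision at each choice node:

    # A choice node is ("choice", name, {decision: subtree}), a chance node is
    # ("chance", name, [(prob, subtree), ...]), and a leaf is just its utility.
    def evaluate(node, policy):
        if isinstance(node, (int, float)):      # leaf: return its utility
            return node
        kind, name, children = node
        if kind == "chance":                    # expected value over outcomes
            return sum(p * evaluate(sub, policy) for p, sub in children)
        # choice node: take the max over decisions and record it in the policy
        values = {d: evaluate(sub, policy) for d, sub in children.items()}
        best = max(values, key=values.get)
        policy[name] = best
        return values[best]

    # The subtree rooted at s2 in the figure above (s3 and n2 are omitted here).
    n3 = ("chance", "n3", [(.9, 5), (.1, 2)])
    n4 = ("chance", "n4", [(.8, 3), (.2, 4)])
    s2 = ("choice", "s2", {"a": n3, "b": n4})

    policy = {}
    print(evaluate(s2, policy))   # 4.7, i.e. max{.9*5 + .1*2, .8*3 + .2*4}
    print(policy)                 # {'s2': 'a'}

This also illustrates the point of the next slide: evaluation produces not just values but a policy, one decision per choice node.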
Decision Tree Policies
[Figure: a decision tree with choice nodes s1, s2, s3, s4 and chance nodes n1, n2, n3, n4; the chosen decision (a or b) is marked at each choice node.]
Note that we don't just compute values, but policies for the tree.
A policy assigns a decision to each choice node in the tree.
Large State Spaces (Variables)
Tree size: grows exponentially with depth
Specification: if each state is an assignment to a set of variables, then the number of distinct states can be huge
To represent outcomes of actions or decisions,
we need to specify distributions
• Pr(s|d) : probability of outcome s given decision d
• Pr(s|a,s’): prob. of state s given that action a is
performed in state s’
But state space exponential in # of variables
• spelling out distributions explicitly is intractable
Large State Spaces (Variables)
Bayes nets can be used to represent actions (action distributions)
• Pr(s|a,s')
• If s and s' are specified as assignments to a set of state variables, we have:
  Pr(s|a,s') = Pr(X1,X2,…,Xn | A, X'1,X'2,…,X'n)
• this is just a joint distribution over variables, conditioned on the action/decision and the previous state
• we have already seen that Bayes nets can represent such distributions compactly
Example Action using Dynamic BN
Variables: M – mail waiting; C – Craig has coffee; T – lab tidy; R – robot has coffee; L – robot located in Craig's office
Deliver Coffee action
[Figure: two-slice DBN with time-t variables Mt, Tt, Lt, Ct, Rt and time-(t+1) variables Mt+1, Tt+1, Lt+1, Ct+1, Rt+1; arcs run from the time-t variables to the time-(t+1) variables, with Ct+1 depending on Lt, Rt and Ct.]
CPT fJ(Tt, Tt+1):
  Tt = T:  Pr(Tt+1 = T) = 1.0, Pr(Tt+1 = F) = 0.0
  Tt = F:  Pr(Tt+1 = T) = 0.0, Pr(Tt+1 = F) = 1.0
CPT fR(Lt, Rt, Ct, Ct+1):
  Lt  Rt  Ct   Pr(Ct+1 = T)   Pr(Ct+1 = F)
  T   T   T    1.0            0.0
  F   T   T    1.0            0.0
  T   F   T    1.0            0.0
  F   F   T    1.0            0.0
  T   T   F    0.8            0.2
  F   T   F    0.0            1.0
  T   F   F    0.0            1.0
  F   F   F    0.0            1.0
Dynamic BN Action Representation
Dynamic Bayesian networks (DBNs):
• a way to use BNs to represent specific actions
• list all state variables for time t (pre-action)
• list all state variables for time t+1 (post-action)
• indicate the parents of all t+1 variables
  these can include time t and time t+1 variables
  the network must be acyclic though
• specify a CPT for each time t+1 variable
Note: generally no prior is given for the time t variables
• we're (generally) interested in the conditional distribution over post-action states given the pre-action state
• so time t vars are instantiated as "evidence" when using a DBN (generally)
Example of Dependence within Slice
Throw rock at window action
[Figure: two-slice DBN with arcs Brokent → Brokent+1, Alarmt → Alarmt+1, and a within-slice arc Brokent+1 → Alarmt+1.]
P(alt+1 | alt) = 1 (regardless of brt+1)
P(alt+1 | ~alt, brt+1) = .95
P(alt+1 | ~alt, ~brt+1) = 0
P(brokent+1 | brokent) = 1
P(brokent+1 | ~brokent) = .6
Throwing the rock has a certain probability of breaking the window and setting off the alarm; but whether the alarm is triggered depends on whether the rock actually broke the window.
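A minimal sketch of this two-slice model, assuming the CPTs above; it computes the joint distribution over the post-action state from a given pre-action state (here: window intact, alarm off). The within-slice dependence shows up in Alarm at t+1 being conditioned on Broken at t+1:

    # CPTs from the slide: P(broken_t1 | broken_t) and P(alarm_t1 | alarm_t, broken_t1).
    p_broken_t1 = {True: 1.0, False: 0.6}              # keyed by broken_t
    def p_alarm_t1(alarm_t, broken_t1):
        if alarm_t:
            return 1.0                                  # an alarm that is on stays on
        return 0.95 if broken_t1 else 0.0               # new alarm only if the window breaks

    def next_state_dist(broken_t, alarm_t):
        """Joint distribution over (broken_t+1, alarm_t+1) after throwing the rock."""
        dist = {}
        for broken_t1 in (True, False):
            pb = p_broken_t1[broken_t] if broken_t1 else 1 - p_broken_t1[broken_t]
            for alarm_t1 in (True, False):
                pa = p_alarm_t1(alarm_t, broken_t1)
                dist[(broken_t1, alarm_t1)] = pb * (pa if alarm_t1 else 1 - pa)
        return dist

    # Pre-action state: window not broken, alarm off.
    print(next_state_dist(broken_t=False, alarm_t=False))
    # {(True, True): 0.57, (True, False): 0.03, (False, True): 0.0, (False, False): 0.4}
    # (up to floating-point rounding)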
Use of BN Action Representation
DBNs: actions are specified concisely and naturally
• These look a bit like STRIPS and the situation calculus, but allow for probabilistic effects
How to use:
• use them to generate an "expectimax" search tree to solve decision problems
• use them directly in stochastic decision-making algorithms
The first use doesn't buy us much computationally when solving decision problems. But the second use allows us to compute expected utilities without enumerating the outcome space (tree)
• we'll see something like this with decision networks
Decision Networks
Decision networks (more commonly known as influence diagrams) provide a way of representing sequential decision problems
• basic idea: represent the variables in the problem as you would in a BN
• add decision variables – variables that you "control"
• add utility variables – how good different states are
Sample Decision Network
[Figure: decision network with chance nodes Chills, Fever, Disease and TstResult, decision nodes BloodTst and Drug, and a utility node U; one arc is marked "optional".]
Decision Networks: Chance Nodes
Chance nodes
• random variables, denoted by circles
• as in a BN, probabilistic dependence on parents
[Figure: chance nodes Disease, Fever and TstResult; Disease is a parent of Fever and of TstResult, and the decision node BloodTst is also a parent of TstResult.]
Disease: Pr(flu) = .3, Pr(mal) = .1, Pr(none) = .6
Fever: Pr(f|flu) = .5, Pr(f|mal) = .3, Pr(f|none) = .05
TstResult:
  Pr(pos|flu,bt) = .2   Pr(neg|flu,bt) = .8   Pr(null|flu,bt) = 0
  Pr(pos|mal,bt) = .9   Pr(neg|mal,bt) = .1   Pr(null|mal,bt) = 0
  Pr(pos|no,bt) = .1    Pr(neg|no,bt) = .9    Pr(null|no,bt) = 0
  Pr(pos|D,~bt) = 0     Pr(neg|D,~bt) = 0     Pr(null|D,~bt) = 1
Decision Networks: Decision Nodes
Decision nodes
• variables the decision maker sets, denoted by squares
• parents reflect the information available at the time the decision is to be made
In the example decision node: the actual values of Ch and Fev will be observed before the decision to take the test must be made
• the agent can make different decisions for each instantiation of the parents (i.e., policies)
[Figure: decision node BloodTst with parents Chills and Fever; BT ∊ {bt, ~bt}.]
Decision Networks: Value Node
Value node
• specifies utility of a state, denoted by a diamond
• utility depends only on state of parents of value node
• generally: only one value node in a decision network
In the example, utility depends only on Disease and Drug
[Figure: utility node U with parents Disease and Drug; the BloodTst node is also shown, with one arc marked "optional".]
U(fludrug, flu) = 20    U(fludrug, mal) = -300    U(fludrug, none) = -5
U(maldrug, flu) = -30   U(maldrug, mal) = 10      U(maldrug, none) = -20
U(no drug, flu) = -10   U(no drug, mal) = -285    U(no drug, none) = 30
Decision Networks: Assumptions
Decision nodes are totally ordered
• decision variables D1, D2, …, Dn
• decisions are made in sequence
• e.g., BloodTst (yes,no) decided before Drug (fd,md,no)
No-forgetting property
• any information available when decision Di is made is available
when decision Dj is made (for i < j)
• thus all parents of Di as well as Di would be parents of Dj
[Figure: the example network's decision nodes BloodTst and Drug with chance nodes Chills and Fever; dashed arcs ensure the no-forgetting property.]
Policies
Let Par(Di) be the parents of decision node Di
• Dom(Par(Di)) is the set of assignments to parents
A policy δ is a set of mappings δi, one for each
decision node Di
• δi : Dom(Par(Di)) → Dom(Di)
• δi associates a decision with each parent asst for Di
For example, a policy for BT might be:
• δBT(c,f) = bt
• δBT(c,~f) = ~bt
• δBT(~c,f) = bt
• δBT(~c,~f) = ~bt
Value of a Policy
The value of a policy δ is the expected utility given that decision nodes are executed according to δ
Given an asst x to the set X of all chance variables, let δ(x) denote the asst to decision variables dictated by δ
• e.g., the asst to D1 is determined by its parents' asst in x
• e.g., the asst to D2 is determined by its parents' asst in x, along with whatever was assigned to D1
• etc.
Value of δ :
EU(δ) = ΣX P(X, δ(X)) U(X, δ(X))
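A minimal sketch of this formula on a tiny, hypothetical one-decision network (the variables and numbers below are made up for illustration and are not from the slides): chance variable Rain, chance variable Forecast depending on Rain, and a decision Umbrella whose only parent is Forecast.

    from itertools import product

    P_rain = {True: 0.3, False: 0.7}                     # Pr(Rain)
    P_fc   = {(True, True): 0.8, (False, True): 0.2,     # Pr(Forecast = rainy | Rain), keyed (fc, rain)
              (True, False): 0.2, (False, False): 0.8}
    U      = {(True, True): 5, (True, False): -10,       # U(Rain, Umbrella)
              (False, True): 2, (False, False): 10}

    def eu(delta):
        """EU(delta) = sum over chance assignments x of P(x, delta(x)) U(x, delta(x))."""
        total = 0.0
        for rain, fc in product([True, False], repeat=2):  # enumerate all chance variables
            p = P_rain[rain] * P_fc[(fc, rain)]
            total += p * U[(rain, delta[fc])]              # delta maps the Forecast value to a decision
        return total

    take_if_rainy = {True: True, False: False}             # a policy for the Umbrella decision
    print(eu(take_if_rainy))                               # 6.48 with these toy numbers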
Value of a Policy
Note: once δ has been fixed, the decision nodes become just like ordinary probability nodes in a Bayes net. Thus we can compute a distribution over their values using variable elimination.
For the policy δBT(c,f) = bt, δBT(c,~f) = ~bt, δBT(~c,f) = bt, δBT(~c,~f) = ~bt, the decision node BloodTst (with parents Chills and Fever) becomes a chance node with the deterministic CPT:
Pr(bt|c,f) = 1.0
Pr(bt|c,~f) = 0.0
Pr(bt|~c,f) = 1.0
Pr(bt|~c,~f) = 0.0
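A minimal sketch of that conversion, assuming the policy is stored as a mapping from parent assignments to decisions (the representation is hypothetical; the resulting numbers match the CPT above):

    # Policy for BloodTst from the slide, keyed by (chills, fever).
    delta_BT = {
        (True, True):   "bt",
        (True, False):  "~bt",
        (False, True):  "bt",
        (False, False): "~bt",
    }

    def policy_to_cpt(delta, domain=("bt", "~bt")):
        """Turn a policy into a deterministic CPT: the chosen decision gets
        probability 1 for that parent assignment, every other value gets 0."""
        return {parents: {d: (1.0 if d == choice else 0.0) for d in domain}
                for parents, choice in delta.items()}

    cpt = policy_to_cpt(delta_BT)
    print(cpt[(True, True)])   # {'bt': 1.0, '~bt': 0.0}, i.e. Pr(bt|c,f) = 1.0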
Value of a Policy
With all nodes in the influence diagram "converted" to ordinary Bayes net nodes by the fixed δ, we can compute the posterior distribution over the various values of the utility node, again by variable elimination.
[Figure: utility node U with parents Drug and Disease.]
Pr(fludrug, flu) = .1     U(fludrug, flu) = 20
Pr(fludrug, mal) = .05    U(fludrug, mal) = -300
Pr(fludrug, none) = .05   U(fludrug, none) = -5
Pr(maldrug, flu) = .1     U(maldrug, flu) = -30
Pr(maldrug, mal) = .1     U(maldrug, mal) = 10
Pr(maldrug, none) = .2    U(maldrug, none) = -20
Pr(no drug, flu) = .1     U(no drug, flu) = -10
Pr(no drug, mal) = .1     U(no drug, mal) = -285
Pr(no drug, none) = .2    U(no drug, none) = 30
From this posterior we can compute the expected value.
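A minimal sketch of that last step, summing probability times utility over the joint posterior above (fd, md and no abbreviate fludrug, maldrug and no drug):

    # Joint posterior Pr(Drug, Disease) under the fixed policy, and U(Drug, Disease),
    # both taken from the tables above.
    pr = {("fd", "flu"): .1, ("fd", "mal"): .05, ("fd", "none"): .05,
          ("md", "flu"): .1, ("md", "mal"): .1,  ("md", "none"): .2,
          ("no", "flu"): .1, ("no", "mal"): .1,  ("no", "none"): .2}
    u  = {("fd", "flu"): 20,  ("fd", "mal"): -300, ("fd", "none"): -5,
          ("md", "flu"): -30, ("md", "mal"): 10,   ("md", "none"): -20,
          ("no", "flu"): -10, ("no", "mal"): -285, ("no", "none"): 30}

    # Expected utility of the policy: sum over all (drug, disease) outcomes.
    eu = sum(pr[k] * u[k] for k in pr)
    print(eu)   # -42.75 with the numbers above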
Optimal Policies
An optimal policy is a policy δ* such that
EU(δ*) ≥ EU(δ) for all policies δ
We can use the dynamic programming principle yet again to avoid enumerating all policies, using variable elimination to do the computation.
Computing the Best Policy
We can work backwards as follows
First compute the optimal policy for Drug (the last decision)
• for each asst to the parents (C,F,BT,TR) and for each decision value (D = md,fd,none), compute the expected value of choosing that value of D using VE
• set the policy choice for each value of the parents to be the value of D that has max value
• e.g.: δD(c,f,bt,pos) = md
[Figure: the decision network with chance nodes Chills, Fever, Disease, TstResult, decision nodes BloodTst and Drug, and utility node U.]
Computing the Best Policy
Next compute the policy for BT given the policy δD(C,F,BT,TR) just determined for Drug
• Once δD(C,F,BT,TR) is fixed, we can treat Drug as a normal random variable with deterministic probabilities
• i.e., for any instantiation of its parents, the value of Drug is fixed by the policy δD
• this means we can solve for the optimal policy for BT just as before
• the only uninstantiated variables are random variables (once we fix BT's parents)
Computing the Best Policy
Computing the expected values with VE:
• suppose we have the asst <c,f,bt,pos> to the parents of Drug
• we want to compute the EU of deciding to set Drug = md
• we run variable elimination, treating C,F,BT,TR,Dr as evidence
• this reduces the factors (e.g., U restricted to bt,md depends only on Dis)
• eliminate the remaining variables (e.g., only Disease is left)
• we are left with the factor: U() = ΣDis P(Dis | c,f,bt,pos,md) U(Dis)
We now know the EU of doing Dr = md when c,f,bt,pos is true. We can do the same for fd and no to decide which is best.
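A sketch of those computations by direct enumeration rather than a full VE run, using the Disease prior and CPTs from the "Chance Nodes" slide and the utilities from the "Value Node" slide. The Chills evidence is left out only because its CPT is not listed on these slides, so the posterior here is Pr(Dis | f, bt, pos):

    # Numbers from the earlier slides.
    prior   = {"flu": .3, "mal": .1, "none": .6}     # Pr(Disease)
    p_fever = {"flu": .5, "mal": .3, "none": .05}    # Pr(f | Disease)
    p_pos   = {"flu": .2, "mal": .9, "none": .1}     # Pr(pos | Disease, bt)
    U = {("fd", "flu"): 20,  ("fd", "mal"): -300, ("fd", "none"): -5,
         ("md", "flu"): -30, ("md", "mal"): 10,   ("md", "none"): -20,
         ("no", "flu"): -10, ("no", "mal"): -285, ("no", "none"): 30}

    # Posterior over Disease given the evidence (f, bt, pos).
    weights = {d: prior[d] * p_fever[d] * p_pos[d] for d in prior}
    z = sum(weights.values())
    posterior = {d: w / z for d, w in weights.items()}   # flu .5, mal .45, none .05

    # EU of each choice of Drug under that posterior, then pick the best one.
    eu = {drug: sum(posterior[d] * U[(drug, d)] for d in posterior)
          for drug in ("fd", "md", "no")}
    print(eu)                    # roughly {'fd': -125.25, 'md': -11.5, 'no': -131.75}
    print(max(eu, key=eu.get))   # 'md', matching the slide's deltaD(c,f,bt,pos) = md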
Computing Expected Utilities
The preceding illustrates a general phenomenon
• computing expected utilities with BNs is quite easy
• utility nodes are just factors that can be dealt with
using variable elimination
EU = ΣA,B,C P(A,B,C) U(B,C)
= ΣA,B,C P(C|B) P(B|A) P(A) U(B,C)
Just eliminate the variables in the usual way
[Figure: chain A → B → C with utility node U depending on B and C.]
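A minimal sketch of this computation with made-up CPT numbers (only the network structure comes from the slide): it computes EU by brute-force enumeration and again by summing A out first, which is the variable-elimination view of treating U as just another factor.

    from itertools import product

    # Hypothetical CPTs for the chain A -> B -> C (binary variables, toy numbers).
    P_A = {True: 0.6, False: 0.4}
    P_B = {(True, True): 0.7, (True, False): 0.2}   # Pr(B = True | A)
    P_C = {(True, True): 0.9, (True, False): 0.3}   # Pr(C = True | B)
    U   = {(True, True): 10, (True, False): 2,      # U(B, C)
           (False, True): 0, (False, False): 5}

    def p(table, value, cond):
        """Look up Pr(var = value | cond) from a table that stores Pr(var = True | cond)."""
        prob = table[(True, cond)]
        return prob if value else 1 - prob

    # Brute force: EU = sum over A,B,C of P(A) P(B|A) P(C|B) U(B,C).
    eu1 = sum(P_A[a] * p(P_B, b, a) * p(P_C, c, b) * U[(b, c)]
              for a, b, c in product([True, False], repeat=3))

    # Eliminate A first: f(B) = sum_A P(A) P(B|A); then EU = sum_{B,C} f(B) P(C|B) U(B,C).
    f_B = {b: sum(P_A[a] * p(P_B, b, a) for a in (True, False)) for b in (True, False)}
    eu2 = sum(f_B[b] * p(P_C, c, b) * U[(b, c)]
              for b, c in product([True, False], repeat=2))

    print(eu1, eu2)   # the two computations agree (6.35 with these toy numbers)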
Optimizing Policies: Key Points
If a decision node D has no decisions that follow it, we can find its policy by instantiating each of its parents and computing the expected utility of each decision for each parent instantiation
• no-forgetting means that all other decisions are instantiated (they must be parents)
• it's easy to compute the expected utility using VE
• the number of computations is quite large: we run expected utility calculations (VE) for each parent instantiation together with each possible decision D might allow
• policy: choose the max decision for each parent instantiation (see the sketch below)
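A sketch of that outer loop, assuming a helper eu(parents, d) that performs the VE computation from the previous slides (the helper name and the iteration scheme are hypothetical):

    from itertools import product

    def optimize_last_decision(parent_domains, decision_domain, eu):
        """Build the policy for the last decision node: for every assignment to its
        parents, pick the decision value with the highest expected utility."""
        policy = {}
        for parents in product(*parent_domains):            # every parent instantiation
            values = {d: eu(parents, d) for d in decision_domain}
            policy[parents] = max(values, key=values.get)    # the max decision
        return policy

    # For Drug in the running example: parent_domains would range over (C, F, BT, TR)
    # and decision_domain over {md, fd, no}, with eu(parents, d) computed via VE.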
Optimizing Policies: Key Points
When a decision node D is optimized, it can be treated as a random variable
• for each instantiation of its parents we now know what value the decision should take
• just treat the policy as a new CPT: for a given parent instantiation x, D gets δ(x) with probability 1 (all other decisions get probability zero)
If we optimize from the last decision to the first, at each point we can optimize a specific decision by (a bunch of) simple VE calculations
• its successor decisions (already optimized) are just normal nodes in the BN (with CPTs)
Decision Network Notes
Decision networks are commonly used by decision analysts to help structure decision problems
Much work has been put into computationally effective techniques to solve them
• common trick: replace the decision nodes with random variables at the outset and solve a plain Bayes net (a subtle but useful transformation)
Complexity much greater than BN inference
• we need to solve a number of BN inference problems
• one BN problem for each setting of decision node
parents and decision node value