Transcript Week 2
8/27
Current make-up: 13 UG; 21 G; +~3
Assignment 0 due next Wednesday 9/5
(Model-based reflex agents)
This one already assumes that the “sensorsfeatures” mapping has been done!
(aka Model-based Reflex Agents)
EXPLICIT MODELS OF THE ENVIRONMENT
--Blackbox models (child function)
--Logical models
--Probabilistic models
Representation & Reasoning
It is not always obvious what action to do now given a set of goals
You woke up in the morning. You want to attend a class. What should your action be?
Search (Find a path from the current state to goal state; execute the first op)
Planning (does the same for logical—non-blackbox state models)
..certain inalienable rights—life, liberty and pursuit of
?Money
?Daytime TV
?Happiness (utility)
--Decision Theoretic Planning
--Sequential Decision Problems
Discounting
• The decision-theoretic agent often needs to assess the
utility of sequences of states (also called behaviors).
– One technical problem is “How do keep the utility of an infinite
sequence finite?
– A closely related real problem is how do we combine the utility of
a future state with that of a current state (how does 15$ tomorrow
compare with 5000$ when you retire?)
– The way both are handled is to have a discount factor r (0<r<1)
and multiply the utility of nth state by rn
• r0 U(so)+ r1 U(s1)+…….+ rn U(sn)+
• Guaranteed to converge since power series converge for 0<r<n
– r is set by the individual agents based on how they think future
rewards stack up to the current ones
• An agent that expects to live longer may consider a larger r than one
that expects to live shorter…
Representation Mechanisms:
Logic (propositional; first order)
Probabilistic logic
Learning
the models
How the course topics stack up…
Search
Blind, Informed
Planning
Inference
Logical resolution
Bayesian inference
Learning
Dimensions:
What can be learned?
--Any of the boxes representing
the agent’s knowledge
--action description, effect probabilities,
causal relations in the world (and the
probabilities of causation), utility models
(sort of through credit assignment), sensor
data interpretation models
What feedback is available?
--Supervised, unsupervised,
“reinforcement” learning
--Credit assignment problem
What prior knowledge is available?
-- “Tabularasa” (agent’s head is a blank
slate) or pre-existing knowledge
Problem Solving Agents (Search-based Agents)
The important difference from the graph-search scenario you
learned in CSE 310 is that you want to keep the graph
implicit rather than explicit (i.e., generate only that part of
the graph that is absolutely needed to get the optimal path)
VERY important since for most problems, the graphs are
humongous..
“Factored Representation…”
8/29
Blog seems alive and well
Blog Czars appointed
Will Cushing
J. Benton
Menkes van den Briel
Kartik Talamadupula
+
Aravind (TA)
Rao (instructor)
You search in this
Space even if your
Init state is known
But actions are
Non-deterministic
Sensing reduces
State Uncertainty
Search in Multi-state
(inaccessible)
version
Set of states is
Called a “Belief State”
So we are searching in
the space of belief states
Utility of eyes (sensors) is
reflected in the size of the
effective search space!
In general, a subgraph
rather than a tree
(loops may be needed
consider closing a faulty door )
Given a state space of size n (or 2v where v is the # state variables)
the single-state problem searches for a path in the graph of size n (2v)
the multiple-state problem searches for a path in a graph of size 2n (22^v)
the contingency problem searches for a sub-graph in a graph of size 2n (22^v)
2n is the EVILthat every CS student’s nightmares are made of
??
Search algorithms differ based on the specific
queuing function they use
All search algorithms must do goal-test only when the
node is picked up for expansion
We typically analyze properties of search algorithms
on uniform trees
--with uniform branching factor b and goal depth
d (tree itself may go to depth dt )
Breadth first search on a uniform tree of b=10
Assume 1000nodes expanded/sec 100bytes/node
Qn: Is there a way of getting linear memory search
that is complete and optimal?
The search is “complete” now (since there is finite
space to be explored). But still inoptimal.
IDDFS: Review
Num iterations: (d+1)
Asymptotic ratio of # nodes expanded by IDDFS vs DFS
(b+1)/ (b-1) (approximates to 1 when b is large)
A
BFS:
A,B,G
DFS: A,B,C,D,G
IDDFS: (A), (A, B, G)
B
C
Note that IDDFS can do fewer
Expansions than DFS on a graph
Shaped search space.
D
G
A
BFS:
A,B,G
DFS: A,B,A,B,A,B,A,B,A,B
IDDFS: (A), (A, B, G)
B
C
Note that IDDFS can do fewer
Expansions than DFS on a graph
Shaped search space.
D
G
Search on undirected graphs or directed graphs with cycles…
Cycles galore…
Graph (instead of tree) Search:
Handling repeated nodes
Main points:
--repeated expansions is a bigger issue for DFS than for
BFS or IDDFS
--Trying to remember all previously expanded nodes and
comparing the new nodes with them is infeasible
--Space becomes exponential
--duplicate checking can also be exponential
--Partial reduction in repeated expansion can be done by
--Checking to see if any children of a node n have the same
state as the parent of n
-- Checking to see if any children of a node n have the same
state as any ancestor of n (at most d ancestors for n—where
d is the depth of n)
Uniform Cost Search
A
A
No:A (0)
1
N1:B(1)
B
1
1
9
C
N2:G(9)
Bait &
Switch
Graph
B
0.1
N3:C(2)
0.1
D
2
0.1
N4:D(3)
G
N5:G(5)
Notation:
C(n,n’) cost of the edge
between n and n’
g(n)
distance of n from root
dist(n,n’’) shortest distance
between n and n’’
Completeness?
Optimality?
9
C
D
25
if d < d’, then paths with
d distance explored before those
with d’
Branch & Bound argument
(as long as all op costs are +ve)
Efficiency?
(as bad as blind search..)
G
Visualizing Breadth-First & Uniform Cost Search
This is also a proof of
optimality…
Breadth-First goes level by level
Proof of Optimality of Uniform search
Proof of optimality:
Let N be the goal node we output.
Suppose there is another goal node N’
We want to prove that g(N’) >= g(N)
Suppose this is not true.
i.e. g(N’) < g(N) --Assumption A1
No
N’’
When N was picked up for expansion,
Either N’ itself, or some ancestor of N’,
Say N’’ must have been on the search queue
If we picked N instead of N’’ for expansion,
It was because
g(N) <= g(N’’) ---Fact f1
But g(N’) = g(N’’) + dist(N’’,N’)
So g(N’) >= g(N’’)
So from f1, we have
g(N) <= g(N’)
But this contradicts our assumption A1
N
N’
Holds only because
dist(N’’,N’) >= 0
This will hold if every operator has +ve cost
“Informing” Uniform search…
A
0.1
Bait &
Switch
Graph
B
0.1
0.1
9
C
N1:B(.1)
N2:G(9)
N3:C(.2)
N4:D(.3)
D
25
No:A (0)
G
N5:G(25.3)
Admissibility
Informedness
Would be nice if we could tell that
N2 is better than N1
--Need to take not just the distance
until now, but also distance to goal
--Computing true distance to goal is as
hard as the full search
--So, try “bounds” h(n)
prioritize nodes in terms of f(n) = g(n) +h(n)
two bounds: h1(n) <= h*(n) <= h2(n)
Which guarantees optimality?
--h1(n) <= h2(n) <= h*(n)
Which is better function?
*
A