CS 188: Artificial Intelligence
Fall 2008
Lecture 27: Conclusion
12/9/2008
Dan Klein – UC Berkeley
Autonomous Robotics
Policy Search
[DEMO]
Problem: often the feature-based policies that work well aren't the ones that approximate V / Q best
E.g. your value functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions
Same distinction between modeling and prediction showed up in classification (where?)
Solution: learn the policy that maximizes rewards rather than the value that predicts rewards
This is the idea behind policy search, such as what controlled the upside-down helicopter
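To make the idea concrete, here is a minimal policy-search sketch in Python (my own illustration, not the helicopter controller from the demo): the weights of a feature-based policy are perturbed at random, and a change is kept only if it raises the average reward earned in simulation. The simulate_episode function and the feature count are assumed to be supplied by whatever environment you have.

import random

def evaluate(weights, simulate_episode, episodes=20):
    # Average total reward of the feature-based policy over several rollouts.
    return sum(simulate_episode(weights) for _ in range(episodes)) / episodes

def policy_search(simulate_episode, num_features, iterations=200, step=0.1):
    # Hill-climb directly on the policy weights: keep a perturbation only if
    # it improves the rewards actually earned, not the accuracy of V or Q.
    weights = [0.0] * num_features
    best_score = evaluate(weights, simulate_episode)
    for _ in range(iterations):
        candidate = list(weights)
        i = random.randrange(num_features)
        candidate[i] += random.choice([-step, step])
        score = evaluate(candidate, simulate_episode)
        if score > best_score:
            weights, best_score = candidate, score
    return weights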
POMDPs
Up until now:
Search / MDPs: decision making when the world is fully observable (even if the actions are nondeterministic)
Probabilistic reasoning: computing beliefs in a static world
Learning: discovering how the world works
What about acting under uncertainty?
In general, the problem formalization is the partially observable Markov decision process (POMDP)
A simple case: value of information
POMDPs
MDPs have:
States S
Actions A
Transition fn P(s'|s,a) (or T(s,a,s'))
Rewards R(s,a,s')
POMDPs add:
Observations O
Observation function P(o|s) (or O(s,o))
[Diagram: the familiar expectimax tree over states s and actions a for MDPs, and the analogous tree over belief states b, actions a, and observations o for POMDPs]
POMDPs are MDPs over belief states b (distributions over S)
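The belief-state view boils down to one update. Here is a hedged sketch, assuming (my assumption, not from the slides) that the models are stored as dictionaries T[(s, a, s2)] = P(s2|s,a) and O[(s, o)] = P(o|s), and a belief is a dictionary from states to probabilities.

def update_belief(b, a, o, states, T, O):
    # b: dict state -> probability; T[(s, a, s2)] = P(s2 | s, a); O[(s, o)] = P(o | s).
    # New belief: b'(s') is proportional to P(o | s') * sum_s P(s' | s, a) * b(s).
    new_b = {}
    for s2 in states:
        predicted = sum(T.get((s, a, s2), 0.0) * b.get(s, 0.0) for s in states)
        new_b[s2] = O.get((s2, o), 0.0) * predicted
    total = sum(new_b.values())
    if total == 0.0:
        return new_b  # the observation had zero probability under the model
    return {s2: p / total for s2, p in new_b.items()}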
Example: Ghostbusters
In (static) Ghostbusters:
Belief state determined by evidence to date {e}
Tree really over evidence sets
Probabilistic reasoning needed to predict new evidence given past evidence
[Diagram: the belief-state tree (b, a, o, b') redrawn over evidence sets: from {e}, either bust, or sense and observe new evidence e', giving {e, e'}]
Solving POMDPs
One way: use truncated expectimax to compute approximate values of actions, e.g. U(a_bust, {e}) and U(a_bust, {e, e'})
What if you only considered busting or one sense (a_sense) followed by a bust?
You get the VPI agent from project 4!
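A rough Python sketch of that comparison (hypothetical helper functions, not the project 4 code): evidence is a set of readings, bust_utility(evidence) plays the role of U(a_bust, ·), and prob_of_reading predicts the next piece of evidence from what has been seen so far.

def value_of_bust_now(evidence, bust_utility):
    # U(a_bust, {e}): bust immediately given the evidence seen so far.
    return bust_utility(evidence)

def value_of_sense_then_bust(evidence, possible_readings, prob_of_reading, bust_utility):
    # Expectation over the predicted new evidence e', then bust: U(a_bust, {e, e'}).
    return sum(prob_of_reading(e_new, evidence) * bust_utility(evidence | {e_new})
               for e_new in possible_readings)

def choose_action(evidence, possible_readings, prob_of_reading, bust_utility, sense_cost=0.0):
    bust_now = value_of_bust_now(evidence, bust_utility)
    sense_first = (value_of_sense_then_bust(evidence, possible_readings,
                                            prob_of_reading, bust_utility) - sense_cost)
    return 'bust' if bust_now >= sense_first else 'sense'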
More Generally
General solutions map belief functions to actions
Can divide regions of belief space (set of belief functions) into policy regions (gets complex quickly)
Can build approximate policies using discretization methods (sketched below)
Can factor belief functions in various ways
Overall, POMDPs are very (actually PSPACE-) hard
Most real problems are POMDPs, but we can rarely solve them in general!
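As an illustration of the discretization idea, here is a hedged sketch for a two-state POMDP (the model layouts T[s][a][s2] = P(s2|s,a), O[s][o] = P(o|s), and R[s][a] are my assumptions, not from the lecture): the belief b = P(state 0) is tracked on a fixed grid, and value iteration over the belief MDP snaps each updated belief back to the nearest grid point.

def snap(b, grid):
    # Nearest grid point to belief b.
    return min(grid, key=lambda g: abs(g - b))

def belief_update(b, a, o, T, O):
    # Two-state case: returns (new belief in state 0, probability of observation o).
    probs = [b, 1.0 - b]
    pred = [sum(T[s][a][s2] * probs[s] for s in (0, 1)) for s2 in (0, 1)]
    joint = [O[s2][o] * pred[s2] for s2 in (0, 1)]
    total = sum(joint)
    return (joint[0] / total, total) if total > 0.0 else (b, 0.0)

def grid_value_iteration(T, O, R, actions, observations, n_points=21, gamma=0.95, sweeps=100):
    # Value iteration over the belief MDP, restricted to a fixed grid of beliefs.
    grid = [i / (n_points - 1) for i in range(n_points)]
    V = {g: 0.0 for g in grid}
    for _ in range(sweeps):
        new_V = {}
        for g in grid:
            best = float('-inf')
            for a in actions:
                exp_reward = g * R[0][a] + (1.0 - g) * R[1][a]
                future = 0.0
                for o in observations:
                    b2, p_o = belief_update(g, a, o, T, O)
                    future += p_o * V[snap(b2, grid)]
                best = max(best, exp_reward + gamma * future)
            new_V[g] = best
        V = new_V
    return V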
Pacman Contest
8 teams, 26 people qualified
3rd Place: Niels Joubert, Michael Ngo, Rohit Nambiar, Tim Chen
What they did: split offense / defense
Strong offense: feature-based balance between eating dots and helping defend
Pacman Contest
Blue Team: Yiding Jia
Red Team: William Li, York Wu
What they did (Yiding): Reflex plus tracking!
Probabilistic inference and particle filtering, considering direct ghost observations and dot vanishings (sketched below)
Defense: move toward distributions, hope to get better info and hunt, stay near remaining food
Offense: move toward guard-free dots, flee guard clouds
What they did (William, York): … ??
[DEMO]
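A minimal particle-filter step for that kind of ghost tracking might look like the following sketch (the move_model and obs_prob functions are placeholders; the contestants' actual code is not shown on the slide): elapse time, weight particles by the observation, then resample. Dot vanishings could be folded in as just another observation.

import random

def particle_filter_step(particles, move_model, obs_prob, observation):
    # particles: list of guessed ghost positions.
    # move_model(pos) samples where the ghost might move next;
    # obs_prob(observation, pos) is the likelihood of the reading at that position.
    n = len(particles)
    moved = [move_model(p) for p in particles]            # elapse time
    weights = [obs_prob(observation, p) for p in moved]   # weight by the evidence
    if sum(weights) == 0.0:
        return [random.choice(moved) for _ in range(n)]   # lost track: reset over moved particles
    return random.choices(moved, weights=weights, k=n)    # resample in proportion to weight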
Example: Stratagus
[DEMO]
Hierarchical RL
Stratagus: Example of a large RL task
Stratagus is hard for reinforcement learning algorithms
> 10^100 states
> 10^30 actions at each point
Time horizon ≈ 10^4 steps
Stratagus is hard for human programmers
Typically takes several person-months for game companies to write a computer opponent
Still, no match for experienced human players
Programming involves much trial and error
Hierarchical RL
Humans supply high-level prior knowledge using a partial program
Learning algorithm fills in the details
[From Bhaskara Marthi's thesis]
Partial “Alisp” Program
(defun top ()
  (loop
    (choose              ; learned choice: which resource to gather
      (gather-wood)
      (gather-gold))))

(defun gather-wood ()
  (with-choice
      (dest *forest-list*)   ; learned choice: which forest
    (nav dest)
    (action 'get-wood)
    (nav *base-loc*)
    (action 'dropoff)))

(defun gather-gold ()
  (with-choice
      (dest *goldmine-list*) ; learned choice: which goldmine
    (nav dest)
    (action 'get-gold)
    (nav *base-loc*)
    (action 'dropoff)))

(defun nav (dest)
  (until (= (pos (get-state)) dest)
    (with-choice
        (move '(N S E W NOOP))  ; learned choice: which direction to move
      (action move))))
Hierarchical RL
Define a hierarchical Q-function which learns a linear feature-based mini-Q-function at each choice point (sketched below)
Very good at balancing resources and directing rewards to the right region
Still not very good at the strategic elements of these kinds of games (i.e. the Markov game aspect)
[DEMO]
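A hedged sketch of the per-choice-point idea (my own simplification, not Marthi's ALisp learner): each named choice point keeps its own weight vector for a linear Q-function over features of (state, choice), updated with an ordinary TD rule.

from collections import defaultdict

class ChoicePointQ:
    # One linear "mini-Q-function" per named choice point.
    def __init__(self, feature_fn, alpha=0.1, gamma=0.99):
        self.weights = defaultdict(dict)   # choice point name -> {feature: weight}
        self.feature_fn = feature_fn       # feature_fn(state, choice) -> {feature: value}
        self.alpha, self.gamma = alpha, gamma

    def q_value(self, point, state, choice):
        feats = self.feature_fn(state, choice)
        w = self.weights[point]
        return sum(w.get(f, 0.0) * v for f, v in feats.items())

    def best_value(self, point, state, choices):
        return max(self.q_value(point, state, c) for c in choices)

    def update(self, point, state, choice, reward, next_state, next_choices):
        # Ordinary TD update, but the weights are local to this choice point.
        target = reward + self.gamma * self.best_value(point, next_state, next_choices)
        error = target - self.q_value(point, state, choice)
        for f, v in self.feature_fn(state, choice).items():
            self.weights[point][f] = self.weights[point].get(f, 0.0) + self.alpha * error * v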
Bugman
AI = Animal Intelligence?
Wim van Eck at Leiden University
Pacman controlled by a human
Ghosts controlled by crickets
Vibrations drive crickets toward or away from Pacman's location
[DEMO]
http://pong.hku.nl/~wim/bugman.htm
Where to go next?
Congratulations, you've seen the basics of modern AI!
More directions:
Robotics / vision / IR / language: cs194
There will be a web form to get more info, linked from the 188 page
Machine learning: cs281a
Cognitive modeling: cog sci 131
NLP: 288
Vision: 280
… and more; ask if you're interested
That’s It!
Help us out with some course evaluations
Have a good break, and always maximize your expected utilities!