Prediction and Cognition, or What is Knowledge, that a Machine may Know It


Toward Grounding Knowledge in Prediction
or
Toward a Computational Theory of Artificial Intelligence
Rich Sutton
AT&T Labs
with thanks to
Satinder Singh and Doina Precup
It’s Hard to Build Large AI Systems
• Brittleness
• Unforeseen interactions
• Scaling
• Requires too much manual complexity management
– people must understand, intervene, patch and tune
– like programming
• Need more autonomy
– learning, verification
– internal coherence of knowledge and experience
Marr’s Three Levels of Understanding
• Marr proposed three levels at which any
information-processing machine must be understood
– Computational Theory Level
• What is computed and why
– Representation and Algorithm Level
– Hardware Implementation Level
• We have little computational theory for Intelligence
– Many methods for knowledge representation, but no theory of knowledge
– No clear problem definition
– Logic
Reinforcement Learning provides a
little Computational Theory
• Policies (controllers):
  $\pi : \mathit{States} \rightarrow \Pr(\mathit{Actions})$
• Value Functions:
  $V^{\pi} : \mathit{States} \rightarrow \Re$
  $V^{\pi}(s) = E\left\{\, \sum_{t=1}^{\infty} \gamma^{\,t-1}\, \mathrm{reward}_t \;\middle|\; \text{start in } s_0 = s,\ \text{follow } \pi \right\}$
• 1-Step Models:
  $\Pr(s_{t+1} \mid s_t, a_t)$ and $E\{\, r_{t+1} \mid s_t, a_t \,\}$
Outline of Talk
• Experience
• Knowledge = Prediction
• Macro-Predictions
• Mental Simulation
offering a coherent candidate
computational theory of intelligence
Experience
• AI agent should be embedded in an ongoing
interaction with a world
[Diagram: the Agent sends actions to the World; the World returns observations]
Experience = these 2 time series
• Enables clear definition of the AI problem (cf. textbook definitions):
  – let {reward_t} be a function of {observation_t}
  – choose actions to maximize total reward
• Experience provides something for knowledge
to be about
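A minimal sketch of this interface (everything below is illustrative; the talk specifies only the two time series and reward as a function of observation):

```python
# Minimal sketch of the agent-world loop: experience is two time series.
import random

def world(action):
    """Black-box world: returns an observation in response to an action."""
    return random.gauss(action, 1.0)      # stand-in dynamics, illustrative

def agent(observation):
    """Agent: chooses the next action given the latest observation."""
    return random.choice([-1.0, +1.0])    # stand-in policy, illustrative

def reward(observation):
    """Reward defined as a function of observation, per the talk."""
    return -abs(observation)

actions, observations = [], []            # the two time series = experience
obs = 0.0
for t in range(100):
    act = agent(obs)
    obs = world(act)
    actions.append(act)
    observations.append(obs)

total_reward = sum(reward(o) for o in observations)  # the quantity to maximize
```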
What is Knowledge?
Deny the physical world
Deny existence of objects, people, space…
Deny all non-answers, correspondence theories
All we really know about is our experience
Knowledge must be in terms of experience
Grounded Knowledge
"A is always followed by B"
  if $o_t = A$ then $o_{t+1} = B$    (A, B observations)
  if $A(o_t)$ then $B(o_{t+1})$    (A, B predicates)
With $h_t$ conditioning, where $h_t = o_t, a_{t-1}, o_{t-1}, a_{t-2}, o_{t-2}, \ldots$, and action conditioning:
  if $A(h_t)$ and $C(a_t)$ then $B(h_{t+1})$
All of these are predictions
World Knowledge = Predictions
• The world is a black box, known only by its I/O
behavior (observations in response to actions)
• Therefore, all meaningful statements about the
world are statements about the observations it
generates
• The only observations worth talking about are
future ones
Therefore:
The only meaningful things to say about
the world are predictions
Non-predictive “Knowledge”
• Mathematical knowledge, theorems and proofs
– always true, but tell us nothing about the world
– not world knowledge
• Uninterpreted signals, e.g., useful representations
– real and useful, but not by themselves world knowledge,
only an aid to acquiring it
• Knowledge of the past
• Policies
– could be viewed as predictions of value
– but by themselves are more like uninterpreted signals
Predictions capture “regular”, descriptive world knowledge
Grounded Knowledge
"A is always followed by B"
1-step predictions:
  if $o_t = A$ then $o_{t+1} = B$    (A, B observations)
  if $A(o_t)$ then $B(o_{t+1})$    (A, B predicates)
  if $A(h_t)$ and $C(a_t)$ then $B(h_{t+1})$    ($h_t$ and action conditioning, with $h_t = o_t, a_{t-1}, o_{t-1}, a_{t-2}, o_{t-2}, \ldots$)
Still a pretty limited kind of knowledge.
Can’t say anything beyond one step!
Grounded Knowledge
"A is always followed by B"
1-step predictions:
  if $o_t = A$ then $o_{t+1} = B$    (A, B observations)
  if $A(o_t)$ then $B(o_{t+1})$    (A, B predicates)
  if $A(h_t)$ and $C(a_t)$ then $B(h_{t+1})$    ($h_t$ and action conditioning)
Macro-prediction:
  if $A(h_t)$ and <arbitrary experiment> then $B(\text{<outcome>})$ many steps later
The condition on $h_t$ is the prior grounding; the predicted outcome is the posterior grounding.
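To make the grounding operational, a minimal sketch (the predicates and data are my own illustration): an action-conditional prediction is a claim the machine itself can test against its recorded experience.

```python
# Minimal sketch: empirically testing "if A(h_t) and C(a_t) then B(h_{t+1})".
def prediction_holds(A, C, B, actions, observations):
    """Fraction of times the predicted outcome held when the condition fired."""
    fired = confirmed = 0
    for t in range(len(observations) - 1):
        h_t = observations[:t + 1]                 # history up to time t
        if A(h_t) and C(actions[t]):
            fired += 1
            confirmed += B(observations[:t + 2])   # h_{t+1}
    return confirmed / fired if fired else None

# "A is always followed by B", conditioned on taking action 1 (toy data):
obs = list("ABABAC")
acts = [1, 1, 1, 1, 1, 1]
A = lambda h: h[-1] == 'A'      # condition on the history
C = lambda a: a == 1            # condition on the action
B = lambda h: h[-1] == 'B'      # predicted outcome one step later

print(prediction_holds(A, C, B, acts, obs))   # 0.666... on this toy stream
```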
Both Prior and Posterior Grounding
are Needed
• “Classical” AI systems omit prior grounding
– e.g., “Tweety is a bird”, “John loves Mary”
– sometimes called the “symbol grounding problem”
• Modern AI systems tend to skimp on the posterior grounding
  – supervised learning, Bayes nets, robotics…
• It is not OK to leave posterior grounding to
external, human observers
– the information is just not in the machine
– we don’t understand it; we haven’t done our job!
• Yet this is such an appealing shortcut that we have
almost always done it
Outline of Talk
• Experience
• Knowledge = Prediction
• Macro-Predictions
• Mental Simulation
offering a coherent candidate
computational theory of intelligence
Macro-Predictions (Options)
à la Sutton, Precup & Singh, 1999
Let $\pi : \mathit{States} \rightarrow \Pr(\mathit{Actions})$ be an arbitrary policy
Let $b : \mathit{States} \rightarrow \Pr(\{0,1\})$ be a termination condition
Then $\langle \pi, b \rangle$ is a kind of experiment:
  – do $\pi$ until $b = 1$
  – measure something about the resulting experience
Suppose we measure as the outcome:
  – the state at the end of the experiment
  – the total reward during the experiment
Then the macro-prediction for $\langle \pi, b \rangle$ would predict
$\Pr(\text{end-state})$ and $E\{\text{total reward}\}$, given the start-state
This is a very general, expressive form of prediction
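A minimal sketch of such an experiment in code (the environment and all names are illustrative): run $\pi$ from a start state until $b$ fires, record the end-state and total reward, and average over runs to estimate the macro-prediction.

```python
# Minimal sketch: running the experiment <pi, b> and estimating its outcome.
from collections import Counter

def run_option(env, state, pi, b):
    """Follow pi from `state` until b(state) says stop; return outcome."""
    total_reward = 0.0
    while not b(state):
        action = pi(state)
        state, reward = env(state, action)   # env: (s, a) -> (s', r)
        total_reward += reward
    return state, total_reward

def estimate_macro_prediction(env, start, pi, b, n_runs=1000):
    """Monte Carlo estimate of Pr(end-state) and E{total reward} from `start`."""
    end_states, rewards = Counter(), []
    for _ in range(n_runs):
        end, r = run_option(env, start, pi, b)
        end_states[end] += 1
        rewards.append(r)
    pr_end = {s: c / n_runs for s, c in end_states.items()}  # Pr(end-state)
    return pr_end, sum(rewards) / n_runs                     # E{total reward}
```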
Rooms Example (Sutton, Precup, & Singh, 1999)
[Figure: a four-room gridworld with hallways and goals G1 and G2. There are 4 stochastic primitive actions (up, down, left, right), each failing 33% of the time, and 8 multi-step options, one to each room's 2 hallways (e.g., o1, o2). An inset shows the policy of one option, directed at its target hallway.]
Planning with Macro-Predictions
[Figure: value iteration with V(goal) = 1, shown at iterations #0, #1, and #2, with cell-to-cell primitive actions versus room-to-room options]
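As a gloss (my reconstruction using the standard options/SMDP formulation; the slide itself shows only the figure), planning with macro-predictions replaces the 1-step Bellman backup with an option-level backup whose $R(s,o)$ and $P(s' \mid s,o)$ are exactly the macro-prediction's two outcome measures:

```latex
% Option-level value iteration backup (standard SMDP form, assumed here;
% \mathcal{O}(s) is the set of options available in s, and the discount
% for the option's duration is folded into P):
V(s) \;\leftarrow\; \max_{o \in \mathcal{O}(s)}
    \Big[\, R(s,o) \;+\; \sum_{s'} P(s' \mid s, o)\, V(s') \,\Big]
```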
Learning Path-to-Goal with and
without Hallway Macros (Options)
[Plot: steps per episode (log scale, up to 1000) versus episodes (log scale, up to 10,000), with one learning curve each for primitive actions only, for macros & actions, and for macros only]
Mental Simulation
• Knowledge can be gained from experience
– by actually performing experiments
• But knowledge can also be gained without overt
experience
– we call this thinking, reasoning, planning, cognition…
• This can be done through “thought experiments”
– internal simulation of experience
– generated from predictive knowledge
– subject to learning methods as before
• Much thought can be achieved this way...
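A minimal sketch in the spirit of Dyna (my example; the talk does not prescribe an algorithm): a learned 1-step model generates simulated transitions, and the same update rule learns from real and imagined experience alike.

```python
# Minimal Dyna-style sketch: thinking = learning from simulated experience.
import random

actions = [0, 1]
alpha, gamma = 0.1, 0.9
Q = {}        # action values, keyed by (state, action)
model = {}    # learned deterministic 1-step model: (s, a) -> (r, s')

def q_update(s, a, r, s2):
    """One-step Q-learning update, shared by real and simulated experience."""
    best = max(Q.get((s2, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best - old)

def learn_from_world(s, a, r, s2):
    """Learn from overt experience and remember what the world did."""
    q_update(s, a, r, s2)
    model[(s, a)] = (r, s2)

def think(n_thoughts=50):
    """Thought experiments: replay transitions sampled from the model."""
    if not model:
        return
    for _ in range(n_thoughts):
        (s, a), (r, s2) = random.choice(list(model.items()))
        q_update(s, a, r, s2)   # learning from simulated experience
```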
Illustration: Dynamic Mission Planning for UAVs
• Mission: fly over (observe) the most valuable sites and return to base
• Stochastic weather affects the observability (cloudy or clear) of sites
• Limited fuel
• Intractable with classical optimal control methods
• Temporal scales:
  – Tactics: which way to fly now
  – Strategies: which site to head for
• Strategies compress space and time
  – reduce the number of states from ~10^11 to ~10^6
  – reduce the tour length from ~600 to ~6
• Reinforcement learning planning with strategies and real-time control outperforms an optimal tour planner that assumes static weather
[Figure: a map of sites with rewards (25, 15, 8, 5, 10, 50, 30, …) around a base, some sites' observability uncertain; a bar chart of expected reward per mission (roughly 40 to 60) comparing RL planning with strategies and real-time control against a static replanner, under high-fuel and low-fuel conditions]
Barto, Sutton, and Moll, Adaptive Networks Laboratory, University of Massachusetts
What to Compute and Why
[Diagram relating Reward, Policy, Value Functions, and Knowledge/Predictions]
The ultimate goal is reward, but our AI spends most of its time with knowledge.
A Candidate Computational Theory
of Artificial Intelligence
• AI Agent should be focused on finding general
macro-predictions of experience
• Especially seeking predictions that enable rapid
computation of values and optimal actions
• Predictions and their associated experiments are
the coin of the realm
– they have a clear semantics, can be tested & learned
– can be combined to produce other predictions, e.g., values
• Mental Simulation (plus learning)
– makes new predictions from old
– start of a computational theory of knowledge use
Conclusions
• World knowledge must be expressed in terms of the data
• Such posterior grounding is challenging:
  – we lose expressiveness in the short term
  – we lose external (human) coherence and explainability
• But it can be done step by step
• And it brings palpable benefits:
  – autonomous learning, verification, and extension of knowledge
  – autonomous complexity management due to internal coherence
  – knowledge suited to a general reasoning process: mental simulation
• We must provide this grounding!