Learning in Worlds with Objects
Leslie Pack Kaelbling
MIT Artificial Intelligence Laboratory
With Tim Oates, Natalia Hernandez, Sarah Finney
NTT-MIT Collaboration Meeting, 2001
What is an Agent?
A system that has an ongoing interaction with an external environment
• household robot
• factory controller
• web agent
• Mars explorer
• pizza delivery robot
[Figure: the agent and its Environment, linked by Observation and Action]
Agents Must Learn
Learning is a crucial aspect of intelligent behavior
• human programmers lack required knowledge
• agents should work in a variety of environments
• agents should work in changing environments
What to learn?
• World dynamics: What happens when I take a
particular action?
• Reward: What world states are good?
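
As a rough illustration of the two things to learn, the sketch below estimates world dynamics and reward by counting observed transitions. It is a minimal tabular example under stated assumptions: the class name, method names, and the valve states are hypothetical and do not come from the talk.

```python
from collections import Counter, defaultdict


class TabularModel:
    """Estimates world dynamics and reward from observed transitions (illustrative only)."""

    def __init__(self):
        # (state, action) -> Counter of next states seen after that action
        self.next_counts = defaultdict(Counter)
        # state -> total reward received on arriving in that state
        self.reward_sum = defaultdict(float)
        # state -> number of arrivals in that state
        self.visits = Counter()

    def observe(self, state, action, next_state, reward):
        """Record one step of experience."""
        self.next_counts[(state, action)][next_state] += 1
        self.reward_sum[next_state] += reward
        self.visits[next_state] += 1

    def transition_prob(self, state, action, next_state):
        """World dynamics: what happens when I take a particular action?"""
        counts = self.next_counts[(state, action)]
        total = sum(counts.values())
        return counts[next_state] / total if total else 0.0

    def expected_reward(self, state):
        """Reward: how good is this world state, on average?"""
        n = self.visits[state]
        return self.reward_sum[state] / n if n else 0.0


# Hypothetical experience: opening the valve succeeds two times out of three.
model = TabularModel()
model.observe("valve_closed", "open_valve", "valve_open", 1.0)
model.observe("valve_closed", "open_valve", "valve_open", 1.0)
model.observe("valve_closed", "open_valve", "valve_closed", 0.0)
```

A table like this needs one entry per distinct state, which is workable only while the set of states stays small.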
Crisis
Current state-of-the-art learning methods will not work in
domains with multiple objects.
These are crucial domains for robots of the future.
Representation
Learning requires some sort of representation of states
of the world.
The choice of representation affects
• what information can be represented
• what kinds of generalizations the agent can make
Attribute Vector
State-of-the-art representation for learning
temperature = 48.2
pressure = 57.9 mB
valve1 = open
valve2 = closed
time = 10:48AM
backlog = 78
volume = 32.2
production = 45.5
…
Generalization over Attribute Vectors
[Figure: plot of x against time and temp; tree of tests "temp > 22", "pressure < 3", "time < 10AM" leading to actions "close valve", "open valve", "add reagent", "increase temp"]
Complex Everyday Domains
Attribute vector is impossibly big
book1-on-book2: true
book2-on-book1: false
pen-is-yellow: true
pen-is-blue: false
lamp-on: true
lamp-off: false
ink-bottle-level: 50%
lamp-in-bottle: false
bottle-on-lamp: false
paper1-color: gray
paper2-color: white
fabric-behind-lamp: true
book2-is-clear: false
book4-is-clear: false
book1-is-clear: true
block1-on-block2: false
block3-unstable: true
block2-on-table: false
block1-in-front-of-lamp: true
…
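
To see how quickly such a vector grows, the sketch below enumerates one true/false attribute per ordered pair of objects for each binary relation. The function name and the object and relation lists are illustrative assumptions, not taken from the talk.

```python
from itertools import permutations


def propositional_attributes(objects, relations):
    """List one attribute name per (relation, ordered object pair)."""
    return [
        f"{a}-{rel}-{b}"
        for rel in relations
        for a, b in permutations(objects, 2)
    ]


# Four objects and a single relation already need 4 * 3 = 12 attributes.
small = propositional_attributes(["book1", "book2", "pen", "lamp"], ["on"])

# Fifty objects and three relations need 3 * 50 * 49 = 7350 attributes.
many_objects = [f"obj{i}" for i in range(50)]
large = propositional_attributes(many_objects, ["on", "in", "behind"])
```

The count grows with the square of the number of objects for each binary relation, before any unary properties such as color are added.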
Generalization over Objects
• If book1 is on book2 and I move book2, then book1
will move
• If the cup is on the table and I move the table, then
the cup will move
• If the pen is on the paper and I move the paper,
then the pen will move
• If the coat is on the chair and I move the chair,
then the coat will move
For all objects A and B:
If A is on B and I move B, then A will move
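
The quantified rule can be sketched as a single function that works for any pair of objects. This is a minimal illustration: the fact encoding and the function name are assumptions, and applying the rule repeatedly (so that things stacked on A also move) is an extension of the one-step rule stated above.

```python
def predict_moved(on_facts, moved):
    """Return every object predicted to move when `moved` is moved.

    on_facts is a set of (a, b) pairs meaning "a is on b".
    The rule "if A is on B and I move B, then A will move" is applied
    until no new object is added.
    """
    result = {moved}
    changed = True
    while changed:
        changed = False
        for a, b in on_facts:
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result


# Hypothetical scene; one rule covers books, cups, pens, and paper alike.
on_facts = {
    ("book1", "book2"),
    ("cup", "table"),
    ("pen", "paper"),
    ("paper", "table"),
}
```

For example, `predict_moved(on_facts, "table")` includes the cup, the paper, and the pen resting on the paper.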
Referring to Objects
Traditional symbolic AI has the problem of “symbol
grounding”:
How do I know what object is named by book1?
on(book1,book2)
Deictic Expressions
“Deixis” is Greek for “pointing”
ima (now)
koko (here)
watashi-ga motteiru hako (the box I am holding)
watashi-ga miteiru hako (the box I am looking at)
Automatic Generalization
If I have an object in my hand and I open my hand, then
the object that was in my hand is now on the table
This is true no matter what object is in my hand.
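
A minimal sketch of this rule, assuming the state is described only through deictic keys such as "the-object-in-my-hand" (the key names and the function are hypothetical). Because the rule never mentions a particular object name, it applies unchanged to any object.

```python
def open_hand(state):
    """Deictic rule: whatever is in my hand ends up on the table."""
    held = state["the-object-in-my-hand"]
    if held is None:
        # Nothing in the hand, so opening it changes nothing.
        return dict(state)
    return {
        **state,
        "the-object-in-my-hand": None,
        "objects-on-the-table": state["objects-on-the-table"] | {held},
    }


# The same rule handles a cup or a block without modification.
before = {
    "the-object-in-my-hand": "cup",
    "objects-on-the-table": frozenset({"pen"}),
}
after = open_hand(before)
```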
Communicating with Humans
Natural language communication
• speaks of the world in terms of objects and their
relationships
• uses deictic expressions
Our robots of the future will have to be able to
understand and generate human descriptions of the
world
Long-Term Research Goal
A robotic system with hand and cameras that can
• learn to achieve tasks efficiently through trial and
error
• acquire natural language descriptions of the
objects and their properties through “conversation”
with humans
Short-Term Research Plan
Explore deictic, object-based representation for learning
algorithms
• build simulated hand-eye robot system that
manipulates blocks (with real physics)
• have simulated robot learn to carry out tasks from
trial and error
Demonstrate empirically and theoretically that deictic
representation is crucial for efficient learning
First Example Domain
Unreliable block stacking:
• robot is rewarded for making tall piles of blocks
• the taller a pile is, the more likely it is to fall over
when another block is added
• a pile can be made more stable by building piles to
its sides
Once the robot learns to do this task, keep the physics
of the domain the same, but reward a more complex
behavior.
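
A toy stand-in for this domain might look like the sketch below. It is not the simulator described in the talk: the fall-probability formula, the reward (height of the tallest pile), and all names are assumptions chosen only to reproduce the three stated properties.

```python
import random


class UnreliableStacking:
    """Toy block-stacking world; the fall model is an assumption."""

    def __init__(self, n_columns=3, base_fall_prob=0.1, seed=0):
        self.heights = [0] * n_columns
        self.base_fall_prob = base_fall_prob
        self.rng = random.Random(seed)

    def fall_probability(self, column):
        """Taller piles fall more often; piles to the sides reduce the risk."""
        height = self.heights[column]
        neighbors = [
            self.heights[i]
            for i in (column - 1, column + 1)
            if 0 <= i < len(self.heights)
        ]
        # A neighbor supports the pile only up to the pile's own height.
        support = sum(min(h, height) for h in neighbors)
        return min(1.0, self.base_fall_prob * height / (1 + support))

    def add_block(self, column):
        """Add a block; the pile may collapse. Returns the reward."""
        if self.rng.random() < self.fall_probability(column):
            self.heights[column] = 0
        else:
            self.heights[column] += 1
        return max(self.heights)  # assumed reward: height of the tallest pile


env = UnreliableStacking()
env.heights = [0, 3, 0]
p_alone = env.fall_probability(1)
env.heights = [2, 3, 2]
p_supported = env.fall_probability(1)
```

Keeping the dynamics in one place and the reward in another makes it easy to hold the physics fixed while rewarding a different behavior.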
Learning by Doing
Having an initial task to perform focuses the robot’s
attention on aspects of the environment
• Use an extension of the U-Tree learning algorithm to select
important aspects of the environment
• Generate new deictic expressions dynamically:
the-block-on-top-of(the-block-I-am-looking-at)
• Extend reinforcement learning methods to apply to
object-based representations
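
One way to picture dynamically generated deictic expressions is as functions that are composed and then resolved against the current scene. The sketch below is only an illustration: the Scene class, its fields, and the function names are hypothetical.

```python
class Scene:
    """Minimal scene: which block rests on which, and where the agent looks."""

    def __init__(self, on, gaze):
        self.on = on      # dict: block -> block directly beneath it
        self.gaze = gaze  # the block currently being looked at


def the_block_i_am_looking_at(scene):
    """Base deictic expression, resolved from the agent's gaze."""
    return scene.gaze


def the_block_on_top_of(expr):
    """Build a new deictic expression from an existing one."""
    def resolve(scene):
        below = expr(scene)
        if below is None:
            return None
        for block, support in scene.on.items():
            if support == below:
                return block
        return None  # nothing is on top of the referenced block
    return resolve


# the-block-on-top-of(the-block-I-am-looking-at)
expr = the_block_on_top_of(the_block_i_am_looking_at)
# Expressions can be nested further as the task demands.
nested = the_block_on_top_of(expr)

scene = Scene(on={"red": "green", "green": "blue"}, gaze="blue")
```

The same expression picks out a different block whenever the gaze shifts, so nothing learned about it is tied to a block's name.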
Extracting General Rules
There are too many facts that are true in any interesting
environment.
Solving tasks focuses attention on
• particular objects (named with deictic
expressions)
• particular properties of those objects
These objects and properties are likely to be of general
importance: use them as input to an association-rule
learning algorithm to learn facts like:
The thing that is on the thing that I am holding will
probably fall off if I move
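
The word "probably" in such a fact corresponds to the confidence of an association rule. The sketch below computes support and confidence for one candidate rule over a handful of made-up observations; the fact names and the data are hypothetical, and a full association-rule learner would also search over candidate rules.

```python
def rule_stats(observations, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    Each observation is a set of facts that held in one experience.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for obs in observations if antecedent <= obs)
    n_both = sum(
        1 for obs in observations if (antecedent | consequent) <= obs
    )
    support = n_both / len(observations) if observations else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up experiences, described with deictic facts.
observations = [
    {"something-on-held-thing", "I-moved", "it-fell-off"},
    {"something-on-held-thing", "I-moved", "it-fell-off"},
    {"something-on-held-thing", "I-moved"},
    {"something-on-held-thing"},
]
support, confidence = rule_stats(
    observations,
    antecedent={"something-on-held-thing", "I-moved"},
    consequent={"it-fell-off"},
)
```

Here the rule holds in two of the three experiences where its antecedent applies, so its confidence is 2/3.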
Enabling Planning
Given general rules, the
agent can “think” about
the consequences of its
actions and decide what
to do, rather than learn
through trial and error.
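
As a sketch of what "thinking" could mean here, the code below runs a breadth-first search over states using rules written as preconditions, added facts, and deleted facts. This is a deterministic simplification: learned rules would be probabilistic, and the rule format, rule names, and facts are all assumptions.

```python
from collections import deque


def plan(start, goal, rules, max_depth=10):
    """Search for a shortest action sequence that makes all goal facts true.

    rules maps an action name to (preconditions, add, delete), each a set
    of facts. Returns a list of action names, or None if no plan is found.
    """
    start = frozenset(start)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if goal <= state:
            return actions
        if len(actions) >= max_depth:
            continue
        for name, (pre, add, delete) in rules.items():
            if pre <= state:
                # Predict the consequence of the action without executing it.
                next_state = frozenset((state - delete) | add)
                if next_state not in seen:
                    seen.add(next_state)
                    frontier.append((next_state, actions + [name]))
    return None


# Hypothetical rules for a tiny block world.
rules = {
    "pick-up": (
        {"hand-empty", "block-on-table"},
        {"holding-block"},
        {"hand-empty", "block-on-table"},
    ),
    "put-on-pile": (
        {"holding-block"},
        {"block-on-pile", "hand-empty"},
        {"holding-block"},
    ),
}
steps = plan({"hand-empty", "block-on-table"}, {"block-on-pile"}, rules)
```

No action is executed in the world during the search; every consequence is predicted from the rules.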
In the Future
An ambitious research project
• vision algorithms for learning segmentation and
object recognition
• learning good properties and relations for
characterizing the domain (“concept learning”)
• connect with natural language learning for word
meanings
Don’t miss
any dirt!