The MIT Artificial Intelligence Lab

Intelligent Agents that Learn
Leslie Pack Kaelbling
MIT Artificial Intelligence Laboratory — Research Directions
Making Reinforcement Learning Really Work
• Typical RL methods require far too much data to be practical in an online setting. Address the problem by
– strong generalization techniques
– using human input to bootstrap
• Let humans do what they’re good at
• Let learning algorithms do what they’re good at
Incorporating Human Input
• Humans can help, even if they are bad at the task
– Human provides initial trajectories
– No attempt is made to learn to reproduce the trajectories
– Reinforcement learning takes place in parallel
– Once learned policy is good, use it
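A minimal sketch of the two-phase idea above, under loud assumptions: it uses tabular Q-learning on a hypothetical 10-cell discrete corridor, which is a stand-in and not the method or task from these slides (the corridor task described later is continuous). The names `supplied_policy`, `run_episode`, and all constants are illustrative. In phase one the supplied controller picks every action while the learner only updates its value estimates from the experience; in phase two the learned greedy policy takes over.

```python
import random

N_STATES = 10        # hypothetical discrete corridor; the goal is the last cell
ACTIONS = (-1, +1)   # step backward or forward
GOAL_REWARD = 10.0   # reward only at the end of the corridor, 0 elsewhere


def step(state, action):
    """Move along the corridor; return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (GOAL_REWARD if done else 0.0), done


def supplied_policy(state, rng):
    """Stand-in for the human or hand-coded controller: deliberately mediocre."""
    return +1 if rng.random() < 0.7 else -1


def greedy_policy(q, state):
    """Pick the action with the highest current value estimate."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """Off-policy Q-learning update; it does not try to imitate the controller."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def run_episode(q, rng, learner_in_control, max_steps=500):
    """Run one episode from mid-corridor; learning happens in both phases."""
    s, steps = N_STATES // 2, 0
    while steps < max_steps:
        a = greedy_policy(q, s) if learner_in_control else supplied_policy(s, rng)
        s2, r, done = step(s, a)
        q_update(q, s, a, r, s2, done)
        s, steps = s2, steps + 1
        if done:
            break
    return steps


rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
phase_one = [run_episode(q, rng, learner_in_control=False) for _ in range(50)]
phase_two = [run_episode(q, rng, learner_in_control=True) for _ in range(10)]
print(sum(phase_one) / len(phase_one), sum(phase_two) / len(phase_two))
```

Because the update is off-policy, the learner can end up better than the controller that generated its data, which is the point of not learning to reproduce the trajectories.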
Learning Phase One
[Diagram: Environment, Supplied Control Policy, and Learning System blocks, connected by signals labeled R, O, and A]
Learning Phase Two
[Diagram: Environment, Supplied Control Policy, and Learning System blocks, connected by signals labeled R, O, and A]
Early Results: Corridor Following
Corridor-Following
• 3 continuous state dimensions
– corridor angle
– offset from middle
– distance to end of corridor
• 1 continuous action dimension
– rotation velocity
• Supplied example policy
– Average 110 steps to goal
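One way to picture the state and action spaces listed above, as a sketch only: the three state dimensions become fields of a record and the single action is a rotation velocity. The units, the sign conventions, and the proportional controller `example_policy` with its gains are all assumptions made for illustration; the slides do not describe how the supplied policy was built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorridorState:
    angle: float            # corridor angle relative to heading (radians assumed)
    offset: float           # signed offset from the corridor middle (meters assumed)
    distance_to_end: float  # remaining distance to the end of the corridor (meters assumed)


def example_policy(state, k_angle=1.0, k_offset=0.5, max_rate=1.0):
    """Hypothetical supplied controller: steer against angle and offset errors.

    Returns a rotation velocity clamped to [-max_rate, max_rate].
    """
    rate = -(k_angle * state.angle + k_offset * state.offset)
    return max(-max_rate, min(max_rate, rate))


# A robot slightly rotated and off-center gets a corrective turn.
print(example_policy(CorridorState(angle=0.2, offset=0.1, distance_to_end=5.0)))
```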
Experimental Set-Up
– Initial training runs start from roughly the middle of the corridor
– Translation speed has a fixed policy
– Evaluation on a number of set starting points
– Reward
» 10 at end of corridor
» 0 everywhere else
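The reward described above is sparse: 10 at the end of the corridor and 0 everywhere else. A minimal sketch follows, assuming the end of the corridor is detected when the remaining distance falls below a hypothetical `goal_tolerance` (the slides do not say how arrival is detected).

```python
def reward(distance_to_end, goal_tolerance=0.1):
    """Sparse reward: 10 at the end of the corridor, 0 everywhere else.

    goal_tolerance is an assumed threshold, not a value from the slides.
    """
    return 10.0 if distance_to_end <= goal_tolerance else 0.0


def episode_return(distances):
    """Undiscounted sum of rewards along a sequence of distances to the end."""
    return sum(reward(d) for d in distances)


# Only the final step, which reaches the end, earns any reward.
print(episode_return([3.0, 1.5, 0.05]))
```

With a reward like this, the learner gets no signal until it first reaches the end of the corridor, which is one reason supplied trajectories that do reach the goal are useful early on.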
Corridor-Following
[Plot: results across Phase 1 and Phase 2, with reference levels labeled “Average training” and “Best” possible]
Corridor Following: Initial Policy
Corridor Following: After Phase 1