Using Hierarchical Reinforcement Learning to Solve a Problem with Multiple Conflicting Sub-problems
By: Stephen Robertson
Supervisor: Phil Sterne
Presentation Outline
• Project Motivation
• Project Aim
• Progress so far
• The Gridworld Problem
• Flat Reinforcement Learning Implementation
• Results
• Still to do
Project Motivation
• Reinforcement Learning is an attractive form of machine learning, but because of the curse of dimensionality it becomes inefficient on complex problems
• Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality
Project Aim
• Applying various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem
• Comparing the various algorithms to each other and to flat Reinforcement Learning
Progress
• Gridworld implemented in Java
• Flat Reinforcement Learning implemented on a 6x6 gridworld in Java
• Feudal Reinforcement Learning in the process of being implemented
Rules of the gridworld
• Possible actions: Left, Right, Up, Down and Rest
• Collecting food and drink increases nourishment and hydration
• Landing on the tree gives the explorer wood, with which it can repair its shelter
Rules of the gridworld
• Resting in a repaired shelter increases health
• Landing on the lion decreases health
• With time, nourishment, hydration, health and shelter condition all gradually decrease
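The rules above could be sketched in Java roughly as follows. This is a minimal illustration, not the project's actual code: the class name, the cell positions, the decay period, the coordinate convention (Up decreases y), and the choice that arriving at the shelter with wood repairs it are all assumptions, since the slides do not specify them.

```java
// Hypothetical sketch of the explorer gridworld. All numeric values and the
// map layout are assumptions; the slides only describe the rules in words.
class ExplorerGridworld {
    enum Action { LEFT, RIGHT, UP, DOWN, REST }

    static final int SIZE = 6;          // 6x6 grid, as used for flat RL
    static final int MAX_LEVEL = 3;     // 4 discrete levels: 0..3
    static final int DECAY_PERIOD = 10; // assumed: levels drop every 10 steps

    // Assumed cell positions {x, y}.
    static final int[] FOOD = {0, 5};
    static final int[] DRINK = {5, 5};
    static final int[] TREE = {5, 0};
    static final int[] LION = {3, 3};
    static final int[] SHELTER = {0, 0};

    int x = 0, y = 0;
    int nourishment = MAX_LEVEL, hydration = MAX_LEVEL;
    int health = MAX_LEVEL, shelter = MAX_LEVEL;
    boolean carryingWood = false;
    int time = 0;

    private boolean at(int[] cell) {
        return x == cell[0] && y == cell[1];
    }

    void step(Action a) {
        // Move, clamping at the grid boundary (assumed: walls block movement).
        switch (a) {
            case LEFT:  x = Math.max(0, x - 1); break;
            case RIGHT: x = Math.min(SIZE - 1, x + 1); break;
            case UP:    y = Math.max(0, y - 1); break;
            case DOWN:  y = Math.min(SIZE - 1, y + 1); break;
            case REST:  break;
        }

        // Cell effects apply whenever a step ends on the cell (assumption).
        if (at(FOOD))  nourishment = Math.min(MAX_LEVEL, nourishment + 1);
        if (at(DRINK)) hydration = Math.min(MAX_LEVEL, hydration + 1);
        if (at(TREE))  carryingWood = true;
        if (at(LION))  health = Math.max(0, health - 1);

        // Assumed repair rule: reaching the shelter with wood fully repairs it.
        if (at(SHELTER) && carryingWood) {
            shelter = MAX_LEVEL;
            carryingWood = false;
        }

        // Resting in a repaired shelter (assumed: condition > 0) restores health.
        if (a == Action.REST && at(SHELTER) && shelter > 0) {
            health = Math.min(MAX_LEVEL, health + 1);
        }

        // Gradual decay of all four levels over time.
        time++;
        if (time % DECAY_PERIOD == 0) {
            nourishment = Math.max(0, nourishment - 1);
            hydration = Math.max(0, hydration - 1);
            health = Math.max(0, health - 1);
            shelter = Math.max(0, shelter - 1);
        }
    }
}
```

The conflicting sub-problems show up directly in this sketch: food, drink, wood and rest are at different cells, and every step spent on one need lets the other levels decay.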
Flat Reinforcement learning
• SARSA with eligibility traces was used
• To get Flat Reinforcement Learning working at all I needed to simplify the task a bit
• 6x6 gridworld
• Nourishment, Hydration, Health and Shelter Condition reduced to 4 discrete levels each
• Total states: 6 x 6 x 4 x 4 x 4 x 4 x 2 = 18432
• Manageable
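As an illustration of how a state space of this size fits in a lookup table, here is a minimal tabular SARSA(lambda) sketch in Java. It is not the project's code: the class and method names are hypothetical, the hyperparameter values are placeholders, accumulating traces are one common choice among several, and reading the final factor of 2 as a "carrying wood" flag is an assumption based on the gridworld rules.

```java
import java.util.Random;

// Hypothetical tabular SARSA(lambda) sketch with accumulating eligibility
// traces. Hyperparameters are placeholders, not values from the slides.
class SarsaLambda {
    static final int NUM_STATES = 6 * 6 * 4 * 4 * 4 * 4 * 2; // 18432
    static final int NUM_ACTIONS = 5; // Left, Right, Up, Down, Rest

    final double alpha = 0.1;    // learning rate (assumed)
    final double gamma = 0.95;   // discount factor (assumed)
    final double lambda = 0.9;   // trace decay (assumed)
    final double epsilon = 0.1;  // exploration rate (assumed)

    final double[][] q = new double[NUM_STATES][NUM_ACTIONS];
    final double[][] e = new double[NUM_STATES][NUM_ACTIONS];
    final Random rng = new Random(0);

    // Mixed-radix encoding of the state variables into one table index.
    // The boolean is assumed to be the "carrying wood" flag.
    static int encode(int x, int y, int nourishment, int hydration,
                      int health, int shelter, boolean carryingWood) {
        int s = x;
        s = s * 6 + y;
        s = s * 4 + nourishment;
        s = s * 4 + hydration;
        s = s * 4 + health;
        s = s * 4 + shelter;
        s = s * 2 + (carryingWood ? 1 : 0);
        return s;
    }

    // Epsilon-greedy action selection over the current Q-values.
    int chooseAction(int s) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(NUM_ACTIONS);
        }
        int best = 0;
        for (int a = 1; a < NUM_ACTIONS; a++) {
            if (q[s][a] > q[s][best]) best = a;
        }
        return best;
    }

    // One SARSA(lambda) update for the transition (s, a, r, s2, a2).
    void update(int s, int a, double r, int s2, int a2) {
        double delta = r + gamma * q[s2][a2] - q[s][a];
        e[s][a] += 1.0; // accumulating trace
        for (int i = 0; i < NUM_STATES; i++) {
            for (int j = 0; j < NUM_ACTIONS; j++) {
                q[i][j] += alpha * delta * e[i][j];
                e[i][j] *= gamma * lambda;
            }
        }
    }

    // Clear all traces, e.g. when a new episode begins.
    void resetTraces() {
        for (double[] row : e) java.util.Arrays.fill(row, 0.0);
    }
}
```

Each update here sweeps the whole table, which is affordable at 18432 x 5 entries but grows multiplicatively with every extra state variable or discretisation level. That growth is the curse of dimensionality the hierarchical methods are meant to address.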
Results
Still to do
• Finish implementing Feudal Reinforcement Learning
• Implement Phil’s interpretation of Feudal Reinforcement Learning
• Implement MaxQ hierarchical reinforcement learning
• And perhaps others…
• Compare them
Questions?