Reinforcement Learning in Real-Time Strategy Games
Nick Imrei
Supervisors: Matthew Mitchell & Martin Dick
Outline
Reasons
  What this research is about
  Motivation and Aim
Background
  RTS games
  Reinforcement Learning explained
  Applying RL to RTS
This project
  Methodology
  Evaluation
  Summary
Motivation and Aims
Problem:
  AI has been a neglected area: game developers have adopted a "not broken, so why fix it" philosophy
  Getting thrashed in online play: my own experience
Aim:
  Use learning to develop a human-like player
  Simulate beginner → intermediate level play
  Use RL and A-life-like techniques (e.g. Black and White, Pengi [Scott])
RTS Games – The Domain
Two or more teams of individuals/cohorts in a warlike situation on a series of battlefields
  E.g. Command & Conquer, StarCraft, Age of Empires, Red Alert, Empire Earth
Teams can have a variety of:
  Weapons
  Units
  Resources
  Buildings
Players are required to manage all of the above to achieve the end goal (destroy all units, capture the flag, etc.)
Challenges offered in RTS games
Real-time constraints on actions
High-level strategies combined with low-level tactics
Multiple goals and choices
The Aim and Approach
Create a human-like opponent:
  Realistic
  Diverse behavior (not boring)
  This is difficult to do!
Tactics and strategy: agents will be reactive to the environment
Learn rather than code: Reinforcement Learning
The Approach Part 1 – Reinforcement Learning
Reward and penalty:
  Action rewards / penalties:
    Penalize being shot
    Reward killing a player on the other team
  Strategic rewards / penalties:
    Securing / occupying a certain area
    Staying in certain group formations
    Destroying all enemy units
Aim to receive maximum reward over time
Problem: credit assignment. What rewards should be given to which behaviors? A sketch of one possible reward scheme follows.
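As a concrete illustration, a reward scheme along these lines might be written as a simple event-to-reward table. This is a minimal sketch: the event names and reward magnitudes below are assumptions for illustration, not values taken from the project.

```python
# A minimal sketch of an RTS reward function. Event names and reward
# magnitudes are illustrative assumptions, not the project's values.
ACTION_REWARDS = {
    "was_shot": -1.0,       # penalize being shot
    "killed_enemy": 5.0,    # reward killing a player on the other team
}
STRATEGIC_REWARDS = {
    "secured_area": 10.0,            # securing / occupying a certain area
    "in_formation": 0.5,             # staying in a certain group formation
    "destroyed_all_enemies": 100.0,  # destroying all enemy units
}
REWARDS = {**ACTION_REWARDS, **STRATEGIC_REWARDS}

def reward(events):
    """Sum the rewards for every event observed this time step."""
    return sum(REWARDS.get(e, 0.0) for e in events)

# e.g. reward(["was_shot", "in_formation"]) == -0.5
```

Credit assignment is then the question of how these summed, often delayed, rewards get attributed back to the individual decisions that caused them.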
The Approach Part 2 – Credit Assignment
States and actions:
  Decide on a state space and an action space
  Assign values to:
    States, or
    States and actions
  Train the agent in this space (a sketch follows)
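One standard way to assign values to state-action pairs is tabular Q-learning. The sketch below is a minimal illustration under assumed names: the action list is hypothetical, and the project's actual state and action spaces are not specified in the slides.

```python
# A minimal sketch of tabular Q-learning over a hand-chosen action
# space. The action names here are hypothetical.
import random
from collections import defaultdict

ACTIONS = ["attack", "retreat", "hold_position", "regroup"]
Q = defaultdict(float)   # Q[(state, action)] -> estimated value

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy: usually exploit current values, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning backup of the value of (state, action)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
```

The learned table directly encodes the credit assignment: over many trials, actions that lead (even indirectly) to reward accumulate higher values.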
Reinforcement Learning Example
[The worked example figures from these slides are not reproduced in the transcript; a stand-in code sketch follows]
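Since the original example figures are not available, here is a stand-in sketch of how a training episode could tie together the choose_action/update functions above. The env_step function is an assumed stand-in for the game environment, returning the next state, the reward, and whether the episode has ended.

```python
# A stand-in training loop (not from the original slides) using the
# Q-learning sketch above; env_step is an assumed environment function
# returning (next_state, reward, done).
def train(env_step, initial_state, episodes=1000):
    for _ in range(episodes):
        state, done = initial_state, False
        while not done:
            action = choose_action(state)
            next_state, r, done = env_step(state, action)
            update(state, action, r, next_state)
            state = next_state
```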
Why Use Reinforcement Learning?
Well suited to problems where there is a delayed reward (tactics and strategy)
The trained agent acts in (worst-case) linear time (reactive)
Problems:
  Large state spaces (addressed by state aggregation; see the sketch below)
  Long training times (addressed by experience replay and shaping)
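State aggregation collapses many detailed game states into one coarse state so the value table stays tractable. The sketch below is illustrative only: the unit attributes, feature names, and thresholds are assumptions, not taken from the slides.

```python
# A minimal sketch of state aggregation: many detailed game states map
# to one coarse, discrete state. Attributes and thresholds are assumed.
import math

def aggregate(unit, enemies):
    """Map a detailed observation to a small discrete state tuple."""
    health = "low" if unit.hp < 30 else "high"
    nearest = min(
        (math.hypot(unit.x - e.x, unit.y - e.y) for e in enemies),
        default=float("inf"),
    )
    threat = "near" if nearest < 10 else "far"
    outnumbered = len(enemies) > unit.allies_nearby
    return (health, threat, outnumbered)   # e.g. ("low", "near", True)
```

With this aggregation, thousands of raw unit configurations share a handful of table entries, at the cost of the agent no longer distinguishing states within a bucket.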
The Approach Part 3 – Getting Diversity
A-life-like behavior using aggregated state spaces
[Diagram: agents mapped onto an aggregated agent state space]
Research Summary
Investigate this approach using a simple RTS game
Issues:
  Empirical research
  Applying RL in a novel way
  Not using the entire state space
Need to investigate:
  Appropriate reward functions
  Appropriate state spaces
Problems with Training
Will need lots of trials (the propagation problem)
The number of trials can be reduced using shaping [Mahadevan] and experience replay [Lin]; a replay sketch follows
Training by self-play; other possibilities include A* and human opponents [Tesauro, Samuel]
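Experience replay [Lin] stores transitions from each trial and re-presents them to the learner many times, so each costly game counts for more; shaping [Mahadevan] instead trains on progressively harder versions of the task. Below is a minimal replay sketch, reusing the hypothetical update() function from earlier.

```python
# A minimal sketch of experience replay [Lin]: transitions are stored
# and re-learned from repeatedly, reducing the number of trials needed.
import random
from collections import deque

replay_buffer = deque(maxlen=10_000)   # (state, action, reward, next_state)

def remember(state, action, r, next_state):
    replay_buffer.append((state, action, r, next_state))

def replay(batch_size=32):
    """Re-run Q-learning backups on a random batch of stored experience."""
    batch = random.sample(list(replay_buffer), min(batch_size, len(replay_buffer)))
    for s, a, r, s2 in batch:
        update(s, a, r, s2)
```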
Methodology
Hypothesis:
  "The combination of RL and reduced state spaces in a rich (RTS) environment will lead to human-like gameplay"
Empirical investigation to test the hypothesis:
  Evaluate system behavior
  Analyze the observed results
  Describe interesting phenomena
Evaluation
Measure the diversity of strategies:
  How big a change (and of what type) is required to change the behaviour? A qualitative analysis of this
Success of strategies:
  I.e. what level of gameplay does it achieve?
  Time to win, points scored, resemblance to human play
Compare to human strategies:
  The "10 requirements of a challenging and realistic opponent" [Scott]
Summary
Interested in a human-level game-playing program
Want to avoid brittle, predictable hand-programmed solutions
Search the program space for the most diverse solutions, using RL to direct the search
  This allows specification of the desired results without needing to specify how they are achieved
Evaluate the results
References
Bob Scott. The illusion of intelligence. AI Game Programming Wisdom, pages 16–20, 2002.
Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311–364, 1992.
Long-Ji Lin. Reinforcement learning for robots using neural networks. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1993.
Mark Bishop Ring. Continual Learning in Reinforcement Environments. MIT Press, 1994.
Stay Tuned!
For more information, see http://www.csse.monash.edu.au/~ngi/
Thanks for listening!