
Reinforcement Learning in Real-Time Strategy Games
Nick Imrei
Supervisors: Matthew Mitchell & Martin Dick

Outline

- Reasons
  - What this research is about
  - Motivation and Aim
- Background
  - RTS games
  - Reinforcement Learning explained
  - Applying RL to RTS
- This project
  - Methodology
  - Evaluation
  - Summary

Motivation and Aims

- Problem:
  - AI has been a neglected area – game developers have adopted the "not broken, so why fix it" philosophy
  - Internet thrashing – my own experience
- Aim:
  - Use learning to develop a human-like player
  - Simulate beginner → intermediate level play
  - Use RL and A-life-like techniques
    - E.g. Black and White, Pengi [Scott]

RTS Games – The Domain

- Two or more teams of individuals/cohorts in a warlike situation on a series of battlefields
  - E.g. Command & Conquer, Starcraft, Age of Empires, Red Alert, Empire Earth
- Teams can have a variety of:
  - Weapons
  - Units
  - Resources
  - Buildings
- Players are required to manage all of the above to achieve the end goal (destroy all units, capture the flag, etc.)

Challenges Offered in RTS Games

- Real-time constraints on actions
- High-level strategies combined with low-level tactics
- Multiple goals and choices

The Aim and Approach

- Create a human-like opponent
  - Realistic
  - Diverse behavior (not boring)
  - This is difficult to do!
  - Tactics and strategy
- Agents will be reactive to the environment
- Learn rather than code – reinforcement learning

The Approach Part 1 – Reinforcement Learning

- Reward and penalty
  - Action rewards/penalties
    - Penalize being shot
    - Reward killing a player on the other team
  - Strategic rewards/penalties
    - Securing/occupying a certain area
    - Staying in certain group formations
    - Destroying all enemy units
- Aim to receive maximum reward over time
- Problem: credit assignment
  - What rewards should be given to which behaviors?
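
The reward scheme above can be sketched as a single per-step function. The event names and reward magnitudes below are illustrative assumptions, not values fixed by the project:

```python
# Sketch of a per-step reward function combining the action-level and
# strategic rewards/penalties listed above. Event names and magnitudes
# are illustrative assumptions.

def step_reward(events):
    """events: a set of strings describing what happened this time step."""
    reward = 0.0
    if "was_shot" in events:          # action penalty: being shot
        reward -= 1.0
    if "killed_enemy" in events:      # action reward: killing an enemy unit
        reward += 1.0
    if "holding_area" in events:      # strategic: securing/occupying an area
        reward += 0.1
    if "in_formation" in events:      # strategic: staying in formation
        reward += 0.05
    if "all_enemies_dead" in events:  # strategic: destroying all enemy units
        reward += 10.0
    return reward

print(step_reward({"killed_enemy", "in_formation"}))  # ≈ 1.05
```

The agent's objective is then to maximize the accumulated reward over time, which is exactly where the credit assignment problem enters.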

The Approach Part 2 – Credit Assignment

- States and actions
  - Decide on a state space and an action space
  - Assign values to:
    - States, or
    - States and actions
  - Train the agent in this space
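
One standard way to assign values to state–action pairs is tabular Q-learning. The sketch below is a minimal, generic version; the states, actions, and hyperparameters are made-up placeholders, not the project's actual design:

```python
from collections import defaultdict

# Minimal tabular Q-learning backup: assign a value to each
# (state, action) pair and improve it from observed transitions.
# States, actions, and hyperparameters are illustrative assumptions.

ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term reward

def update(state, action, reward, next_state, actions):
    """One Q-learning backup for a single observed transition."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

actions = ["advance", "retreat"]
update("near_enemy", "advance", 1.0, "won", actions)
print(Q[("near_enemy", "advance")])  # 0.1 (first backup toward reward 1.0)
```

Training the agent then consists of repeating this backup over many transitions until the values stabilize.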

Reinforcement Learning Example

[Figure-only slides: a worked reinforcement learning example]

Why Use Reinforcement Learning?

- Well suited to problems where there is a delayed reward (tactics and strategy)
- The trained agent moves in (worst case) linear time (reactive)
- Problems:
  - Large state spaces (state aggregation)
  - Long training times (experience replay and shaping)
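
State aggregation, mentioned above as the remedy for large state spaces, maps many raw game states onto one coarse description so the value table stays small. The features and bucket boundaries below are illustrative assumptions:

```python
# Sketch of state aggregation: collapse a huge raw game state into a
# small tuple of coarse features, so many raw states share one table
# entry. The specific features and thresholds are assumptions.

def aggregate(unit_health, enemy_distance, own_units, enemy_units):
    health = "low" if unit_health < 0.33 else "mid" if unit_health < 0.66 else "high"
    distance = "near" if enemy_distance < 10 else "far"
    advantage = "ahead" if own_units > enemy_units else "behind"
    return (health, distance, advantage)

# Two very different raw states can land in the same aggregated state:
print(aggregate(0.9, 5, 12, 7))   # ('high', 'near', 'ahead')
print(aggregate(0.7, 8, 30, 2))   # ('high', 'near', 'ahead')
```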

The Approach Part 3 – Getting Diversity

- A-life-like behavior using aggregated state spaces

[Diagram: an agent and its aggregated state space]

Research Summary

- Investigate this approach using a simple RTS game
- Issues:
  - Empirical research
  - Applying RL in a novel way
  - Not using the entire state space
- Need to investigate:
  - Appropriate reward functions
  - Appropriate state spaces

Problems with Training

- Will need many trials – the propagation problem
- The number of trials can be reduced using shaping [Mahadevan] and experience replay [Lin]
- Self-play – other possibilities include A* and human opponents (cf. Tesauro, Samuel)
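
Experience replay [Lin] cuts the number of live trials by storing observed transitions and re-presenting them to the learner many times. A minimal sketch, with buffer capacity and batch size as assumptions:

```python
import random
from collections import deque

# Minimal experience replay buffer [Lin]: store transitions and replay
# random batches of them, so each costly game trial is reused many
# times. Capacity and batch size are illustrative assumptions.

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer()
for t in range(100):
    buf.add(f"s{t}", "advance", 0.0, f"s{t + 1}")
batch = buf.sample(8)  # replay 8 stored transitions to the learner
print(len(batch))  # 8
```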

Methodology

- Hypothesis:
  - "The combination of RL and reduced state spaces in a rich (RTS) environment will lead to human-like gameplay"
- Empirical investigation to test the hypothesis
- Evaluate system behavior
  - Analyze the observed results
  - Describe interesting phenomena

Evaluation

- Measure the diversity of strategies
  - How big a change (and what type) is required to change the behavior – a qualitative analysis
- Success of strategies
  - I.e. what level of gameplay it achieves
    - Time to win, points scored, resemblance to human play
- Compare to human strategies
  - "10 requirements of a challenging and realistic opponent" [Scott]
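
One simple quantitative complement to the qualitative analysis above would be to score how varied an agent's choices are, e.g. via the entropy of its action distribution. The metric and the sample strategies below are illustrative assumptions, not the project's evaluation method:

```python
import math
from collections import Counter

# A possible diversity score (an assumption, not the project's actual
# measure): the entropy of the agent's action distribution. A fully
# predictable agent scores 0 bits; one that mixes strategies scores more.

def action_entropy(actions):
    counts = Counter(actions)
    total = len(actions)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(action_entropy(["rush"] * 8))                              # 0.0
print(action_entropy(["rush", "turtle", "expand", "raid"] * 2))  # 2.0
```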

Summary

- Interested in a human-level game program
- Want to avoid brittle, predictable programmed solutions
- Search the program space for the most diverse solutions, using RL to direct the search
  - Allows specification of results without needing to specify how they are achieved
- Evaluate the results

References

- Bob Scott. The illusion of intelligence. AI Game Programming Wisdom, pages 16–20, 2002.
- Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence 55, pages 311–364, 1992.
- L. Lin. Reinforcement learning for robots using neural networks. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1993.
- Mark Bishop Ring. Continual Learning in Reinforcement Environments. MIT Press, 1994.

Stay Tuned!

For more information, see http://www.csse.monash.edu.au/~ngi/

Thanks for listening!