Constructing Intelligent Agents via Neuroevolution

Download Report

Transcript Constructing Intelligent Agents via Neuroevolution

Constructing Intelligent Agents
via Neuroevolution
By Jacob Schrum
[email protected]
Motivation
• Intelligent agents are needed
– Search-and-rescue robots
– Mars exploration
– Training simulations
– Video games
• Insight into nature of intelligence
– Sufficient conditions for emergence of:
• Cooperation
• Communication
• Multimodal behavior
Talk Outline
• Bio-inspired learning methods
– Neural networks
– Evolutionary computation
• My research
– Learning multimodal behavior
– Modular networks in Ms. Pac-Man
– Human-like behavior in Unreal Tournament
• Future work
• Conclusion
Artificial Neural Networks
• Brain = network of neurons
• ANN = abstraction of brain
– Neurons organized into layers
Inputs
Outputs
What Can Neural Networks Do?
• In theory, anything!
– Universal Approximation
Theorem
N
M
– [0,1]  [0,1]
• Can’t program: too complicated
• In practice, learning/training is hard
– Supervised: Backpropagation
– Unsupervised: Self-Organizing Maps
– Reinforcement Learning: Temporal-Difference
and Evolutionary Computation
Evolutionary Computation
• Computational abstraction of evolution
– Descent with modification (mutation)
– Sexual reproduction (crossover)
– Survival of the fittest (natural selection)
• Evolution + Neural Nets = Neuroevolution
– Population of neural networks
– Mutation and crossover modify networks
– Net used as control policy to evaluate fitness
Neuroevolution Example
Start With
Parent Population
Neuroevolution Example
Start With
Parent Population
Evaluate and
Assign Fitness
100
90
75
61
56
50
31
Neuroevolution Example
Start With
Parent Population
Evaluate and
Assign Fitness
Clone, Crossover
and Mutate
To Get Child
Population
100
90
75
61
56
50
31
Neuroevolution Example
Start With
Parent Population
Evaluate and
Assign Fitness
100
90
75
61
56
50
31
100
120
69
99
60
83
50
Clone, Crossover
and Mutate
Children Are Now
the New Parents
Repeat Process:
Fitness Evaluations
As the process continues, each successive population improves performance
Neuroevolution Applications
Double Pole Balancing
F. Gomez and R. Miikkulainen, “2-D Pole Balancing With
Recurrent Evolutionary Networks” ICANN 1998
Neuroevolution Applications
Finless Rocket Control
F. Gomez and R. Miikkulainen, “Active Guidance for a
Finless Rocket Using Neuroevolution” GECCO 2003
Neuroevolution Applications
Vehicle Crash Warning System
N. Kohl, K. Stanley, R. Miikkulainen, M. Samples, and R. Sherony,
"Evolving a Real-World Vehicle Warning System" GECCO 2006
Neuroevolution Applications
http://nerogame.org/
Training Video Game Agents
K. O. Stanley, B. D. Bryant, I. Karpov, R. Miikkulainen, "Real-Time
Evolution of Neural Networks in the NERO Video Game" AAAI 2006
What is Missing?
• NERO agents are specialists
– Sniping from a distance
– Aggressively rushing in
• Humans can do all of this, and more
• Multimodal behavior
– Different behaviors for
different situations
• Human-like behavior
– Preferred by humans
What I do With Neuroevolution
• Discover complex agent behavior
• Discover multimodal behavior
Contributions:
• Use multi-objective evolution
– Different objectives for different modes
• Evolve modular networks
– Networks with modules for
each mode
• Human-like behavior
– Constrain evolution
Pareto-based Multiobjective Optimization
Imagine game with two objectives :
- Damage Dealt
- Health Remaining


 
v dominates u , i.e. v  u 
1. i  {1,  , n}(vi  ui ) and
High health but did not deal much damage
Tradeoff between objectives
2. i  {1,  , n}(vi  ui )
Non - dominated points best :
A  F is Pareto optimal 
A contains all points in F s.t.


 
x  Ay  F ( y  x )
More than one optimal solution
Dealt lot of damage,
but lost lots of health
Non-dominated Sorting Genetic Algorithm II
•
•
•
•
Population P with size N; Evaluate P
Use mutation (& crossover) to get P´ size N; Evaluate P´
Calculate non-dominated fronts of P P´ size 2N
New population size N from highest fronts of P  P´
K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, "A Fast Elitist Non-dominated sorting
genetic algorithm for multi-objective optimization: NSGA-II" PPSN VI, 2000
Ms. Pac-Man
• Popular classic game
• Predator-prey scenario
– Ghosts are predators
– Until power pill is eaten
• Multimodal behavior
needed
– Running from threats
– Chasing edible ghosts
– More?
Modular Networks
• Different areas of brain specialize
– Structural modularity → functional modularity
• Apply to evolved neural networks
– Separate module → behavioral mode
• Preference neurons (grey)
arbitrate between modules
• Use module with highest
preference output
(
)(
)
Module Mutation
• Let evolution decide how many modules
Networks start with
one module
New modules added
by one of several
module mutations
Previous
Duplicate
Random
Intelligent Module Usage
• Evolution discovers a
novel task division
– Not programmed
• Dedicates one module
to luring (cyan)
• Improves ghost eating
when using other
module
Comparison With Other Work
Authors
Method
Game
AVG
MAX
Alhejali and Lucas [1]
GP FourMaze 16,014 44,560
Alhejali and Lucas [2]
GP+Camps FourMaze 11,413 31,850
My Module Mutation Duplicate Results FourMaze 32,647 44,520
Brandstetter and Ahmadi [3]
Recio et al. [4]
Alhejali and Lucas [5]
GP CIG 2011
19,198 33,420
ACO CIG 2011
36,031 43,467
GP+MCTS CIG 2011
32,641 69,010
My Module Mutation Duplicate Results
CIG 2011
63,299 84,980
[1] A.M. Alhejali, S.M. Lucas: Evolving diverse Ms. Pac-Man playing agents using genetic programming. UKCI 2010.
[2] A.M. Alhejali, S.M. Lucas: Using a training camp with Genetic Programming to evolve Ms Pac-Man agents. CIG 2011.
[3] M.F. Brandstetter, S. Ahmadi: Reactive control of Ms. Pac Man using information retrieval based on Genetic
Programming. CIG 2012.
[4] G. Recio, E. Martín, C. Estébanez, Y. Sáez: AntBot: Ant Colonies for Video Games. TCIAIG 2012.
[5] A.M. Alhejali, S.M. Lucas: Using genetic programming to evolve heuristics for a Monte Carlo Tree Search Ms Pac-Man
agent. CIG 2013.
Types of Intelligence
• Evolved intelligent Ms. Pac-Man behavior
– Surprising module usage
– Evolution discovers the unexpected
– Diverse collection of solutions
• Still not human-like
– Human-like vs. optimal
– Human intelligence
Modern Game: Unreal Tournament
•
•
•
•
3D world with simulated physics
Multiple human and software agents interacting
Agents attack, retreat, explore, etc.
Multimodal behavior required to succeed
Human-like Behavior: BotPrize
• International competition at CIG conference
• A Turing Test for video game bots
– Judge as human over 50% of time to win
– After 5 years, we won in 2012
• Evolved combat behavior
– Constrained to
be human-like
Guessing Game
•
•
•
•
•
•
•
•
Coleman: ????
Milford: ????
Moises: ????
Lawerence: ????
Clifford: ????
Kathe: ????
Tristan: ????
Jackie: ????
Judging Game
Player Identities
•
•
•
•
•
•
•
•
Coleman: UT^2 (Our winning bot)
Milford: ICE-2010 (bot)
Moises: Discordia (bot)
Lawerence: Native UT2004 bot
Clifford: w00t (bot)
Kathe: Human
Tristan: Human
Jackie: Native UT2004 bot
Human Subject Study
• Six participants played the judging game
• Recorded extensive post-game interviews
• What criteria to humans claim to judge by?
Lessons Learned
Item
• Don’t be too skilled
– Evolved with accuracy restrictions
– Disable elaborate dodging
Bot
• Humans are “tenacious”
– Opponent-relative actions
– Encourage “focusing” on opponent
• Don’t repeat mistakes
– Database of human traces to get unstuck
Enemy
Bot Architecture
Future Work
• Evolving teamwork
– Ghosts must cooperate to eat Ms. Pac-Man
– Unreal Tournament supports team play
• Domination, Capture the Flag, etc.
• Interactive evolution
– Evolve in response to human interaction
• Adaptive opponents/assistants
• Evolutionary art
• Content generation
http://picbreeder.org/
Conclusion
• Evolution discovers unexpected behavior
• Modular networks learn multimodal behavior
• Human behavior not optimal
– Evolution can be constrained to be
more human-like
• Many directions for future research
Questions?
contact
Jacob Schrum
[email protected]