Evolving Multimodal Behavior
PhD Proposal
Jacob Schrum
11/4/09
Introduction
Challenge: Discover behavior automatically
Simulations, video games, robotics
Why challenging?
Noisy sensors
Complex domains
Continuous states/actions
Multiple agents, teamwork
Multiple objectives
Multimodal behavior required (focus)
What is Multimodal Behavior?
Working definition: Agent exhibits distinct kinds of actions under different circumstances
Examples:
Offensive & defensive modes in soccer
Search for weapons or opponents in a video game
Animal with foraging & fleeing modes
Very important for teams
Roles correspond to modes
Example domains will involve teamwork
Previous Approaches
Design Approaches
Hand code in a structured manner
Value-Function Based Approaches
Learn the utility of actions (RL)
Evolutionary Approaches
Selectively search based on performance
Design
Subsumption Architecture (Brooks 1986)
Hierarchical design
Lower levels independent of higher levels
Built incrementally
Common in robotics
Hand coded
Value-Function Based (1/2)
MAXQ (Dietterich 1998)
Hand-designed hierarchy
TD learning at multiple levels
Reduce state space
Taxi domain
Still just a grid world
Discrete state & action
Value-Function Based (2/2)
Basis Behaviors (Matarić 1997)
Low-level behaviors pre-defined
Learn high-level control
Discrete state space
High-level features (conditions)
Reward shaping necessary
Applied to real robots
Too much expert knowledge
Evolutionary (1/2)
Layered Evolution (Togelius 2004)
Evolve components of subsumption architecture
Applied to:
EvoTanks (Thompson and Levine 2008)
Unreal Tournament 2004 (van Hoorn et al. 2009)
Must specify:
Hierarchy
Training tasks
Similar to Layered Learning (Stone 2000)
Evolutionary (2/2)
Neuro-Evolving Robotic Operatives (Stanley et al. 2005)
ML game
Train robot army
Many objectives
Weighted sum: z-scores method
User changes weights during training
Dynamic objective management
Leads to multimodal behavior
Multiple Objectives
Multimodal problems are typically multiobjective
Modes associated with objectives
Traditional: weighted sum (Cohon 1978)
Must tune the weights
Only one solution
Bad for non-convex surfaces
Need better formalism
Each point on the trade-off surface corresponds to one set of specific weights
Non-convex regions cannot be captured by a weighted sum
Greatest Mass Sarsa (Sprague and Ballard 2003)
Multiple MDPs with shared action space
Learn via Sarsa(0) update rule: $Q_i(s_i,a) \leftarrow Q_i(s_i,a) + \alpha\big[r_i + \gamma Q_i(s_i',a') - Q_i(s_i,a)\big]$
Best value is sum of component values: $Q(s,a) = \sum_i Q_i(s_i,a)$
Used in sidewalk navigation task
Like weighted sum
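To make the update concrete, below is a minimal tabular sketch of greatest-mass action selection with a per-component Sarsa(0) update. The `GMSarsa` class, its dictionary-based Q-tables, and the epsilon-greedy exploration are illustrative assumptions, not details taken from Sprague and Ballard.

```python
import random
from collections import defaultdict

class GMSarsa:
    """Minimal tabular sketch of Greatest Mass Sarsa: each component MDP keeps
    its own Q-table, and actions are chosen greedily with respect to the SUM
    of the component Q-values (like a weighted sum with all weights fixed at 1)."""

    def __init__(self, num_components, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [defaultdict(float) for _ in range(num_components)]  # Q_i(s_i, a)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, states):
        """states[i] is the state of component MDP i (all components share one action set)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        # Greatest mass: the best action maximizes the sum of component values.
        return max(self.actions,
                   key=lambda a: sum(q[(s, a)] for q, s in zip(self.q, states)))

    def update(self, states, action, rewards, next_states, next_action):
        """One Sarsa(0) update per component MDP, using the actions actually taken."""
        for q, s, r, s2 in zip(self.q, states, rewards, next_states):
            td_target = r + self.gamma * q[(s2, next_action)]
            q[(s, action)] += self.alpha * (td_target - q[(s, action)])
```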
Convex Hull Iteration (Barrett and Narayanan 2008)
Changes MDP formalism:
Vector reward
Find solutions for all possible weightings $\vec{w}$ where $w_i \ge 0$
Maximize: $\vec{w} \cdot \vec{Q}(s,a)$
Results in compact set of solutions
Different trade-offs
Cannot capture non-convex surfaces
Discrete states/actions only
Need a way to capture non-convex surfaces!
Pareto-based Multiobjective Optimization (Pareto 1890)
Imagine game with two objectives: Damage Dealt, Health Remaining
$\vec{v}$ dominates $\vec{u}$ iff
1. $v_i \ge u_i$ for all $i$, and
2. $v_i > u_i$ for some $i$
Points in the population that are not dominated are best: the Pareto Front
[Figure: tradeoff between objectives; one point has high health but did not deal much damage, another dealt a lot of damage but lost lots of health]
Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000)
Population P with size N; Evaluate P
Use mutation to get P´ with size N; Evaluate P´
Calculate non-dominated fronts of {P ∪ P´}, size 2N
New population of size N from highest fronts of {P ∪ P´} (see the sketch below)
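The steps above can be illustrated with a short sketch. This is a simplified rendering under my own naming (`dominates`, `nondominated_fronts`, `nsga2_generation`); the crowding-distance tie-breaking that full NSGA-II applies within the last admitted front is omitted here.

```python
def dominates(v, u):
    """v Pareto-dominates u iff v is >= in every objective and > in at least one."""
    return all(vi >= ui for vi, ui in zip(v, u)) and any(vi > ui for vi, ui in zip(v, u))

def nondominated_fronts(scores):
    """Sort the indices of `scores` (lists of objective values) into Pareto fronts."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)}
        fronts.append(sorted(front))
        remaining -= front
    return fronts

def nsga2_generation(population, evaluate, mutate):
    """One simplified NSGA-II step: mutate P to get P', evaluate the union,
    and keep the best N individuals taken front by front."""
    n = len(population)
    combined = population + [mutate(p) for p in population]   # {P ∪ P'}, size 2N
    scores = [evaluate(p) for p in combined]
    survivors = []
    for front in nondominated_fronts(scores):
        for i in front:
            if len(survivors) < n:          # crowding-distance tie-breaking omitted
                survivors.append(combined[i])
    return survivors
```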
Constructive Neuroevolution
Genetic Algorithms + Neural Networks
Build structure incrementally
Good at generating control policies
Three basic mutations (no crossover used)
Other structural mutations possible (more later)
Perturb Weight
Add Connection
Add Node
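As a rough illustration of the three mutations, here is a toy genome with one method per mutation. The `Genome` class and its connection dictionary are stand-ins invented for this sketch, not the encoding actually used in this work.

```python
import random

class Genome:
    """Toy network genome: nodes are integer ids, connections map (src, dst) -> weight."""
    def __init__(self, num_inputs, num_outputs):
        self.nodes = list(range(num_inputs + num_outputs))
        self.connections = {}
        self._next_node = len(self.nodes)

    def perturb_weight(self, sigma=0.5):
        """Add Gaussian noise to one existing connection weight."""
        if self.connections:
            key = random.choice(list(self.connections))
            self.connections[key] += random.gauss(0.0, sigma)

    def add_connection(self):
        """Pick two nodes at random; if not yet connected, add a random-weight link."""
        src, dst = random.choice(self.nodes), random.choice(self.nodes)
        if (src, dst) not in self.connections:
            self.connections[(src, dst)] = random.gauss(0.0, 1.0)

    def add_node(self):
        """Split an existing connection by inserting a new node in the middle."""
        if not self.connections:
            return
        (src, dst), w = random.choice(list(self.connections.items()))
        new = self._next_node
        self._next_node += 1
        self.nodes.append(new)
        del self.connections[(src, dst)]
        self.connections[(src, new)] = 1.0   # roughly preserve behavior:
        self.connections[(new, dst)] = w     # 1.0 * w along the new path
```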
Evolution of Teamwork
Heterogeneous
Different roles
Cooperation harder to evolve
Team-level multimodal behavior
Homogeneous
Shared policy
Individuals know how teammates act
Individuals fill roles as needed: multimodal
Completed Work
Benefits of Multiobjective Neuroevolution
Pareto-based evolution leads to multimodal behavior
Targeting Unachieved Goals (TUG)
Speed up evolution with objective management
Evolving Multiple Output Modes
Allow networks to have multiple policies/modes
Need a domain to experiment in …
Battle Domain
Evolved monsters (yellow)
Scripted fighter (green)
Approach nearest monster
Swing bat repeatedly
Monsters can hurt fighter
Bat can hurt monsters
Multiple objectives
Deal damage
Avoid damage
Stay alive
Can multimodal teamwork evolve?
Benefits of Multiobjective Neuroevolution
Research Questions:
NSGA-II better than z-scores (weighted sum)?
Homogeneous or heterogeneous teams better?
30 trials for each combination
Three evaluations per individual
Average scores to overcome noisy evals
Incremental Evolution
Hard to evolve against scripted strategy
Could easily fail to evolve interesting behavior
Incremental evolution against increasing speeds
0%, 40%, 80%, 90%, 95%, 100%
Increase speed when all goals are met
End when goals met at 100%
Goals
Average population performance high enough? Then increase speed
Each objective has a goal:
At least 50 damage to bot (1 kill)
Less than 20 damage per monster on average (2 hits)
Survive at least 540 time steps (90% of trial)
Goal achieved when the AVG population objective score meets the goal value (see the sketch below)
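A small sketch of the goal test that drives the speed increases: the goal values follow the bullets above, but the `GOALS` table, the field names, and the `next_speed` helper are hypothetical.

```python
# Goal values from the slides: damage dealt to the bot, damage received per
# monster, and time survived (540 steps, 90% of the trial). Names are hypothetical.
GOALS = {
    "damage_dealt":    lambda avg: avg >= 50,   # at least 1 kill
    "damage_received": lambda avg: avg < 20,    # fewer than 2 hits on average
    "time_alive":      lambda avg: avg >= 540,  # 90% of the trial
}

SPEEDS = [0.0, 0.4, 0.8, 0.9, 0.95, 1.0]        # fighter speed schedule

def average_scores(population_scores):
    """population_scores: list of dicts, one per individual, objective -> score."""
    keys = population_scores[0].keys()
    n = len(population_scores)
    return {k: sum(s[k] for s in population_scores) / n for k in keys}

def next_speed(current_speed, population_scores):
    """Advance the fighter's speed only when the population AVERAGE meets every goal."""
    avg = average_scores(population_scores)
    if all(check(avg[k]) for k, check in GOALS.items()):
        i = SPEEDS.index(current_speed)
        return SPEEDS[min(i + 1, len(SPEEDS) - 1)]
    return current_speed
```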
Evolved Behaviors
Baiting + Side-Swiping
Lure fighter
Turns allow team to catch up
Attacks on left side of fighter
Taking Turns
Hit and run
Next counter-clockwise monster rushes in
Fighter hit on left side
Multimodal behaviors!
Multiobjective Conclusions
NSGA-II faster than z-scores
NSGA-II more likely to generate multimodal behavior
Many runs did not finish/were slow
Several “successful” runs did not have multimodal behavior
Targeting Unachieved Goals
Research Question:
How to speed up evolution and make it more reliable?
When objective’s goal is met, stop using it
Restore objective if scores drop below goal
Focuses on the most challenging objectives
Combine NSGA-II with TUG
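A minimal sketch of TUG's objective management as described above, assuming all objectives are maximized and that the caller supplies average population scores; the function and argument names are mine.

```python
def tug_active_objectives(avg_scores, goals, currently_active):
    """Targeting Unachieved Goals: deactivate an objective once the population's
    average score meets its goal, and reactivate it if the average drops back
    below the goal. Selection (e.g. NSGA-II) then uses only active objectives.

    Assumes every objective is maximized (e.g. damage received is negated).

    avg_scores:       objective name -> average population score
    goals:            objective name -> goal value the average must reach
    currently_active: set of objective names currently used by selection
    """
    active = set(currently_active)
    for obj, goal in goals.items():
        if avg_scores[obj] >= goal:
            active.discard(obj)   # goal met: stop selecting on this objective
        else:
            active.add(obj)       # goal unmet (or score dropped): select on it again
    return active
```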
Evolved Behaviors
Alternating Baiting
Bait until another monster hits
Then baiting monster attacks
Fighter knocked back and forth
Synchronized Formation
Move as a group
Fighter chases one bait
Other monster rushes in with side-swipe attacks
More multimodal behaviors!
TUG Conclusions
TUG results in huge speed-up
No wasted effort on achieved goals
TUG runs finish more reliably
Heterogeneous runs have more multimodal behavior than homogeneous
Some runs still did not finish
Some “successful” runs still did not have multimodal behavior
Fight or Flight
Separate Fight and Flight trials
Fight = Battle Domain
Flight:
Scripted prey (red) instead of fighter
Has no bat; has to escape
Monsters confine and damage
New objective: Deal damage in Flight
Flight task requires teamwork
Requires multimodal behavior
New-Mode Mutation
Encourage multimodal behavior
New mode with inputs from preexisting mode
Initially very similar
Maximum preference node determines mode
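One way to picture the arbitration described above: each mode contributes one preference output plus its policy outputs, and the mode with the highest preference controls the agent that time step. The flat output layout assumed here is for illustration only.

```python
import numpy as np

def choose_mode_outputs(network_outputs, num_modes, actions_per_mode):
    """Assumes the network's outputs are laid out per mode as
    [preference, action_1, ..., action_k]. The mode whose preference
    neuron is highest supplies the action outputs used this time step."""
    outputs = np.asarray(network_outputs).reshape(num_modes, 1 + actions_per_mode)
    preferences = outputs[:, 0]
    chosen = int(np.argmax(preferences))   # maximum preference node wins
    return chosen, outputs[chosen, 1:]     # that mode's policy outputs

# Example: 2 modes, 3 action outputs each -> 8 network outputs total.
raw = [0.2, 0.5, -0.1, 0.9,    # mode 0: preference 0.2
       0.7, -0.3, 0.8, 0.1]    # mode 1: preference 0.7 (wins)
mode, action = choose_mode_outputs(raw, num_modes=2, actions_per_mode=3)
print(mode, action)            # -> 1 [-0.3  0.8  0.1]
```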
Evolving Multiple Output Modes
Research Question:
How to evolve teams that do well in both tasks?
Compare 1Mode to ModeMutation
Three evals in Fight and three in Flight
Same networks for two different tasks
1Mode Behaviors
Aggressive + Corralling
Aggressive in Fight task
Take lots of damage
Deal lots of damage
Corralling in Flight task
Run/Rush + Crowding
Run/Rush in Fight task
Good timing on attack
Kill fighter w/o taking too much damage
Crowding in Flight task
Get too close to prey
Knock prey out and it escapes
Networks can’t handle both tasks!
ModeMutation Behaviors
Alternating Baiting + Corralling
Alternating baiting in Fight task
Corralling in Flight task
Spread out to prevent escape
Individuals rush in to attack
Hit into Crowd + Crowding
Hitting into Crowd in Fight task
One attacker knocks fighter into others
Crowding in Flight task
Rush prey, ricochet back and forth
Sometimes knocks prey free
Networks succeed at both tasks!
Mode Mutation Conclusions
ModeMutation slower than 1Mode
ModeMutation better at producing multimodal behaviors
Harder task resulted in more failed runs
Many unused output modes created
Slows down execution
Bloats output layer
Proposed Work
Extensions
1. Avoiding Stagnation by Promoting Diversity
2. Extending Evolution of Multiple Output Modes
3. Heterogeneous Teams Using Subpopulations
4. Open-Ended Evolution + TUG
Evaluate in new tasks
Killer App: Unreal Tournament 2004
1. Avoiding Stagnation by Promoting Diversity
Behavioral diversity avoids stagnation
Add a diversity objective (Mouret et al. 2009)
Behavior vector: given input vectors, concatenate outputs
Diversity objective: AVG distance from other behavior vectors in pop. (see the sketch below)
[Figure: example behavior vectors, e.g. 0.5 1 0.2 1.7 -2 … and 1.5 -1 0.6 0.3 2 …]
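A sketch of the behavior-vector diversity objective, assuming networks are callables that map an input vector to an output array; the use of Euclidean distance is an illustrative choice.

```python
import numpy as np

def behavior_vector(network, input_vectors):
    """Concatenate the network's outputs on a fixed set of sample inputs."""
    return np.concatenate([np.asarray(network(x)) for x in input_vectors])

def diversity_scores(networks, input_vectors):
    """Diversity objective: average distance from each individual's behavior
    vector to every other behavior vector in the population."""
    vectors = [behavior_vector(net, input_vectors) for net in networks]
    scores = []
    for i, v in enumerate(vectors):
        others = [np.linalg.norm(v - u) for j, u in enumerate(vectors) if j != i]
        scores.append(sum(others) / max(len(others), 1))
    return scores  # appended as an extra objective alongside the task objectives
```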
2. Extending Evolution of Multiple Output Modes
Encourage mode differences
Random input sources
Probabilistic arbitration
Bad modes less likely to persist
Like softmax action selection
Restrict New-Mode Mutation
New objective: punish unused modes, reward used modes
Delete similar modes
Based on behavior metric
Limit modes: make best use of limited resources
Dynamically increase the limit?
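A sketch of the proposed probabilistic arbitration: preferences are pushed through a softmax so weaker modes are still chosen occasionally (and can therefore be evaluated and pruned), rather than always taking the maximum. The temperature parameter is an illustrative detail.

```python
import numpy as np

def softmax_mode_choice(preferences, temperature=1.0, rng=None):
    """Pick an output mode stochastically in proportion to exp(pref / T),
    like softmax action selection: bad modes are chosen rarely, so they are
    less likely to persist, but they still get occasional evaluation."""
    rng = rng or np.random.default_rng()
    prefs = np.asarray(preferences, dtype=float) / temperature
    prefs -= prefs.max()                          # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(probs), p=probs))

# Example: three modes with preferences 2.0, 0.5, and -1.0.
print(softmax_mode_choice([2.0, 0.5, -1.0]))
```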
3. Heterogeneous Teams Using Subpopulations
Each team member from different subpopulation (Yong 2007)
Encourages division of labor across teammates
Different roles lead to multimodal team behavior
4. Open-Ended Evolution + TUG
Keep increasing goals
Evolution has something to strive towards
Preserves benefits of TUG
Does not settle early
When to increase goals?
When all goals are achieved
As individual goals are achieved
New Tasks
More tasks require more modes
Investigate single-agent tasks
Only teams so far
Investigate complementary objectives
TUG only helps contradictory?
Are hard when combined with others
Tasks:
1. Predator
Opposite of Flight
Partial observability
2. Sink the Ball
Very different from previous
Needs more distinct modes?
Less mode sharing?
Unreal Tournament 2004
Commercial First-Person Shooter (FPS)
Challenging domain
Continuous state and action
Multiobjective
Partial information
Multimodal behaviors required
Programming API: Pogamut
Competitions:
Botprize
Deathmatch
Unreal Deathmatch
Packaged bots are hand-coded
Previous winners of botprize hand-coded
Learning attempts
Simplified version of game (van Hoorn et al. 2009)
Limited to certain behaviors (Kadlec 2008)
Multimodal behavior in full game: not done yet
Unreal Teams
Team Deathmatch
Largely ignored?
Capture the Flag
Teams protect own flag
Bring enemy flag to base
GP approach could not beat UT bots (Kadlec 2008)
Domination
King of the hill
Teams defend key locations
RL approach learned group strategy of hand-coded bots (Smith et al. 2007)
Review
System for developing multimodal behavior
Multiobjective Evolution
Targeting Unachieved Goals
New-Mode Mutation
Behavioral Diversity
Extending Mode Mutation
Subpopulations
Open-Ended Evolution
Final evaluation in Unreal Tournament 2004
Conclusion
Create system:
Automatically discovers multimodal behavior
No high-level hierarchy needed
No low-level behaviors needed
Works in continuous, noisy environments
Discovers team behavior as well
Agents w/array of different useful behaviors
Lead to better agents/behaviors in simulations, games, and robotics
Questions?
Auxiliary Slides
Design (2)
Behavior Trees (Isla 2005)
Top-down approach
Used first in Halo 2
Other commercial games since
“Brute Force Approach to Common Sense”
Add a behavior for every situation
Hand coded
Evolutionary (0)
Dangerous Foraging (Stanley et al. 2003)
Don’t know if food is safe or poison
Partial information
Multimodal:
Eat food
Avoid food
Adaptive Teams of Agents (Bryant and Miikkulainen 2003)
Homogeneous teams
Roman legions defend
Barbarians plunder
Multimodal:
Defend town
Chase barbarian
Other MOEAs
PESA-II (Corne et al. 2001)
External archive and internal population
Region-based selection using squeeze factor
SPEA2 (Zitzler and Thiele 1999)
External archive and internal population
Fitness based on strength and density
[Figure annotations: strength values (0, 1), archive membership, squeeze factors (1, 2)]
1. Predator
Scripted agent is predator
Chases nearest monster
Monsters run to avoid damage
When combined with Flight, is partially observable
Opposite behavior from Flight
Complementary objectives
Avoid damage
Stay alive
2. Sink the Ball
Monsters push ball around
No scripted agent
Monster agent sensors tuned to ball
Move ball to goal to sink it
Very different from previous tasks
More distinct behavioral modes
Complementary objectives
Min. distance between ball and goal
Min. time to sink goal
Heterogeneous z vs. NSGA-II
Homogeneous z vs. NSGA-II
The z-scores method was faster
Heterogeneous NSGA-II vs. TUG
Homogeneous NSGA-II vs. TUG
Heterogeneous 1Mode vs. ModeMutation
1Mode was faster
Homogeneous 1Mode vs. ModeMutation
1Mode was faster