schrum.gecco2010
Download
Report
Transcript schrum.gecco2010
Evolving Agent Behavior in
Multiobjective Domains Using
Fitness-Based Shaping
Jacob Schrum and Risto Miikkulainen
University of Texas at Austin
Department of Computer Science
Typical Uses of MOEAs
Where have MOEAs proven themselves?
Wireless Sensor Networks (Woehrle et al, 2010)
Groundwater Management (Siegfried et al 2009)
Hydrologic model calibration (Tang et al, 2006)
Epoxy polymerization (Deb et al, 2004)
Voltage-controlled oscillator design (Chu et al, 2004)
Multi-spindle gear-box design (Deb & Jain, 2003)
Foundry casting scheduling (Deb & Reddy, 2001)
Multipoint airfoil design (Poloni & Pediroda, 1997)
Design of aerodynamic compressor blades (Obayashi, 1997)
Electromagnetic system design (Michielssen & Weile, 1995)
Microprocessor design (Stanley & Mudge, 1995)
Design of laminated ceramic composites (Belegundu et al, 1994)
Many engineering/design problems!
New Domains for MOEAs
Simulated agents often face multiple objectives
Automatic discovery of intelligent behavior
Video game opponents in Unreal
Tournament (van Hoorn, 2009)
Predator/prey scenarios
(Schrum & Miikkulainen 2009)
Race car driving in TORCS
(Agapitos et al, 2008)
Comparatively little so far
Direct application of MOEA seldom successful
Success often depends on “shaping”
What is Shaping?
Term from Behavioral Psychology
Identified by B. F. Skinner (1938)
Task-Based Example:
Train rat to press lever
First reward proximity
Then any interaction with lever
Then actual pressing of lever
Evolutionary Shaping
Environment changes, making task harder
Evolution shapes behavior across generations
Example: Migration given continental drift [1]
Animals become accustomed to short migration
Continental drift increases distance of migration
Ability to travel increasing distances required
Arctic Tern
Atlantic Salmon
EC models with incremental evolution (ex. [2])
[1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975.
[2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.
Fitness-Based Shaping
Not extensively used
Little/no domain knowledge needed
Multiobjective approach a good fit
Selection criteria change
Exploiting ignored objectives (TUG)
Exploiting unfilled niches (BD)
Objective Space
Dominated, but
exploiting mostly
ignored objective
Uncrowded Niches
Behavior Space
Crowded Niches
Mutiobjective Optimization
v u iff
Pareto dominance:
i 1, , n: v u
i
i
i 1, , n: v u
i
i
Assumes maximization
Want nondominated points
NSGA-II used in this work
What to evolve?
NNs as control policies
Nondominated
Constructive Neuroevolution
Genetic Algorithms + Neural Networks
Build structure incrementally (complexification)
Good at generating control policies
Three basic mutations (no crossover used)
Perturb Weight
Add Connection
Add Node
Targeting Unachieved Goals
Main ideas:
“Hard” and “easy” defined in terms of goal values
Temporarily deactivate “easy” objectives
Focus on “hard” objectives
Easy: average fitness “persists” above goal (achieved)
Hard: goal not yet achieved
Objectives reactivated when no longer achieved
Increase goal values when all achieved
Hard Objectives
TUG Example
Other goals also achieved → Goals increase
Noisy evaluations
Goal achieved
Reset recency-weighted average
Behavioral Diversity
Originally developed for single-objective tasks [3]
Add behavioral diversity objective
Encourage exploration of new behaviors
Domain-specific behavior measure required
Extensions in this work:
Senses
Multiobjective task
Domain independent method
Only requires policy mapping
ℝN to ℝ M, e.g. NNs
Actions
[3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive
problems in neuro-evolution. 2009.
Behavioral Diversity Details
Behavior vector:
Given input vectors, concatenate outputs
0.1 2.3 4.3 5.2 3.2
Behavior vector
0.5 5.3 7.5 3.4 2.1
2.4 4.3 0.7 4.2
…
1.3 4.2 5.6 4.5 7.7
Behavioral diversity objective:
AVG distance from other
behavior vectors
High average distance from other points
…
2.1 3.5
Battle Domain
Evolved monsters (blue)
Scripted fighter (green)
Bat can hurt monsters
Three objectives
Monsters can hurt fighter
Deal damage
Avoid damage
Stay alive
Previous work required
incremental evolution to solve
Experimental Comparison
NN copied to 4 monsters
In paper
Control: Plain NSGA-II
TUG: NSGA-II with TUG using expert initial goals
BD: NSGA-II with BD using random input vectors
Additional methods since publication
Homogeneous teams
TUG-Low: NSGA-II with TUG using minimal initial goals
BD-Obs: NSGA-II with BD using inputs from evaluations
Each repeated 30 times
Attainment Surfaces [4]
Result attainment surface
Shows space dominated by single Pareto front
Summary attainment surface s
Union of space dominated in at least s out of n runs
Surface s weakly dominates s+1, etc.
Individual
surfaces intersect
Surface 1
Surface 2
Surface 3
Pareto Fronts
(Approximation Sets)
Result Attainment
Surfaces
Summary
Attainment Surfaces
[4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance
of stochastic multiobjective optimizers. 2005.
Final Summary Attainment Surfaces
Animation: worst to best summary attainment surface
Control
TUG
TUG-Low
BD
BD-Obs
Hypervolume Metric [5]
Hypervolume of result attainment surface
WRT reference point
Simply “volume” for 3 domain objectives
Slightly less than minimum scores
Pareto-compliant metric
F1 F2 HV1 HV2
Hypervolume = A + B + C + D
[5] E. Zitzler and L. Thiele. Multiobjective optimization
using evolutionary algorithms – a comparative case
study. 1998.
Hypervolume
Successful Behaviors
BD
TUG
BD-Obs
TUG-Low
Discussion
Control: more extreme trade-offs
BD: more precise timing
BD-Obs and BD similar
“Real” inputs give no
advantage
TUG: more teamwork
Particular initial objectives
TUG-Low more like BD than TUG
ALL are better than Control
Future Work
How to combine TUG and BD
Naïve combination doesn’t work
Scaling up
Many objectives
More complex domains
Current work in Unreal Tournament promising
Conclusion
BD and TUG improve MO evolution
Domain independence!
Contrast to task-based shaping
Expand MOEAs to a new range of domains
Questions?
Email: [email protected]
See movies at:
http://nn.cs.utexas.edu/?fitness-shaping
TUG Details
Persistence:
rt 1 rt ( xt 1 rt )
Goals:
Recency-weighted average rt
surpasses goal
Initial values based on domain knowledge
Or simply the minimal values for objectives
Increase each goal g when all are achieved
g o g o (omax g o )
Goal achieved
o
Objectives reactivated when no longer achieved
TUG Cycles