
Optimizing Flocking Controllers using Gradient Descent
Kevin Forbes
Motivation
• Flocking models can animate complex scenes in a cost-effective way
• But they are hard to control: many parameters interact in non-intuitive ways, so animators find good values by trial and error
• Can we use machine learning techniques to optimize the parameters instead of setting them by hand?
Background – Flocking model
Reynolds (1987): Flocks, Herds, and Schools: A Distributed Behavioral Model
Reynolds (1999): Steering Behaviors For Autonomous Characters
• Each agent can “see” other agents in its neighbourhood
• Motion is derived from a weighted combination of force vectors: Alignment, Cohesion, and Separation (a sketch of this combination follows below)
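As an illustration of that weighted combination, here is a minimal Python sketch; the function, the exact force formulas, and the alpha weighting are assumptions of mine, not the talk's actual implementation:

import numpy as np

def steering_force(agent_pos, agent_vel, neighbour_pos, neighbour_vel, alpha):
    """Weighted combination of the three classic flocking forces.

    A minimal sketch, not the talk's simulator. `alpha` is a hypothetical
    3-vector of weights for [cohesion, separation, alignment]; the inputs
    are the agent's state plus arrays of its visible neighbours' states.
    """
    offsets = neighbour_pos - agent_pos                   # vectors toward each neighbour
    dists = np.linalg.norm(offsets, axis=1, keepdims=True) + 1e-9

    cohesion = offsets.mean(axis=0)                       # steer toward neighbours' centroid
    separation = -(offsets / dists**2).sum(axis=0)        # push away, stronger when close
    alignment = neighbour_vel.mean(axis=0) - agent_vel    # match neighbours' mean velocity

    return alpha[0] * cohesion + alpha[1] * separation + alpha[2] * alignment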
Background – Learning Model
Lawrence (2003): Efficient Gradient Estimation for Motor Control Learning
Policy search: finds optimal settings of a system’s control parameter vector, as evaluated by some objective function.
Stochastic elements in the system result in noisy gradient estimates, but there are techniques to limit their effects.
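In symbols (a standard formulation of policy search, in my notation rather than the talk's): given a control parameter vector α and an objective J, policy search solves

\[
\alpha^{*} \;=\; \arg\min_{\alpha}\; \mathbb{E}\big[\, J(\alpha) \,\big],
\]

where the expectation is taken over the stochastic elements of the system (e.g., random initial conditions and noise forces).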
Simple 2-parameter example
Axes: values of control parameters
Color: value of objective function
Blue arrows: negative gradient of objective function
Red line: result of gradient descent
Project Steps
1. Define physical agent model
2. Define flocking forces
3. Define objective function
4. Take derivatives of all system elements w.r.t. all control parameters
5. Do policy search
1. Agent Model
Position, velocity, and acceleration are defined as in Reynolds (1999); a sketch of the update equations follows below.
The definition is recursive: the base case is the system’s initial condition.
If there are no stochastic forces, the system is deterministic (w.r.t. the initial conditions).
The flock’s policy is defined by the alpha vector.
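A minimal sketch of the forward-Euler updates from Reynolds (1999), with my notation for the learnable weights (α_k) and force terms (F_k); the talk's exact symbols may differ:

\[
\mathbf{a}_t \;=\; \sum_{k} \alpha_k\, \mathbf{F}_k(\mathbf{x}_t, \mathbf{v}_t), \qquad
\mathbf{v}_{t+1} \;=\; \mathbf{v}_t + \mathbf{a}_t\,\Delta t, \qquad
\mathbf{x}_{t+1} \;=\; \mathbf{x}_t + \mathbf{v}_{t+1}\,\Delta t
\]

Unrolling this recursion back to the initial state (x_0, v_0) is what makes every later position and velocity differentiable with respect to α.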
2. Forces
The simulator includes the following forces:
Flocking Forces:
Cohesion*, Separation*, Alignment
Single-Agent Forces:
Noise, Drag*
Environmental Forces:
Obstacle Avoidance, Goal Seeking*
* Implemented with learnable coefficients (so far)
3. Objective Function
The exact function used depends upon the goals of the particular animation.
I used the following objective function for the flock at time t:
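One plausible form for such an objective, reconstructed from the target-distance experiments later in the talk (the symbols d_ij, d*, and N(·) are my assumptions, not necessarily the talk's):

\[
J_t \;=\; \sum_{i} \sum_{j \neq i} N\!\big(d_{ij}\big)\,\big(d_{ij} - d^{*}\big)^{2},
\qquad d_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert,
\]

i.e., agents within each other's neighbourhood are penalized for deviating from a target distance d*.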
The neighbourhood function implied here (and in the force calculations) will come back to haunt us on the next slide...
4. Derivatives
To estimate the gradient of the objective function, the objective must be differentiable.
We can build an appropriate N-function by multiplying transformed sigmoids together:
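One such construction (a sketch; the gain k and radii r_min, r_max are illustrative symbols of mine): with the logistic sigmoid σ(x) = 1/(1 + e^{-x}),

\[
N(d) \;=\; \sigma\!\big(k\,(d - r_{\min})\big)\,\cdot\,\sigma\!\big(k\,(r_{\max} - d)\big)
\]

is ≈1 inside the band [r_min, r_max], falls smoothly to ≈0 outside it, and, unlike a hard cutoff, is differentiable everywhere.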
Other derivative-related wrinkles:
• Cannot use max/min truncations
• Numerical stability issues
• Increased memory requirements
5. Policy Search
Use Monte Carlo to estimate the expected value of the gradient:
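A standard Monte Carlo estimator of this form (my notation; the x_0^{(n)} are N sampled initial conditions):

\[
\nabla_{\alpha}\, \mathbb{E}\big[\, J(\alpha) \,\big] \;\approx\; \frac{1}{N} \sum_{n=1}^{N} \nabla_{\alpha}\, J\big(\alpha;\, \mathbf{x}_0^{(n)}\big)
\]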
This assumes that the only random variables are the initial conditions. A less-noisy estimate can be made if the distribution of the stochastic forces in the model is taken into account using importance sampling.
The Simulator
Features:
• Forward flocking simulation
• Policy learning and mapping
• Optional OpenGL visualization
• Spatial sorting gives good performance (a sketch of the idea follows below)
Limitations:
• Wraparound world boundaries
• Not all forces are learnable yet
• Buggy neighbourhood function derivative
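A minimal Python sketch of the spatial-sorting idea (uniform grid binning in 2-D so each agent only tests nearby cells; all names here are illustrative, not the simulator's actual code):

import numpy as np
from collections import defaultdict

def neighbours_within(positions, radius):
    """For each agent, return indices of other agents within `radius`.

    Sketch of uniform-grid spatial sorting: bin the (n, 2) position
    array into cells of size `radius`, then each query inspects only
    the 3x3 block of cells around the agent instead of all O(n^2) pairs.
    """
    cells = defaultdict(list)
    for i, p in enumerate(positions):
        cells[tuple((p // radius).astype(int))].append(i)

    neighbours = []
    for i, p in enumerate(positions):
        cx, cy = (p // radius).astype(int)
        hits = [j
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                for j in cells.get((cx + dx, cy + dy), [])
                if j != i and np.linalg.norm(positions[j] - p) < radius]
        neighbours.append(hits)
    return neighbours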
Experimental Method
Simple gradient descent:
1. Initialize flock, assign a random alpha
2. Run simulation (N times)
3. Step (with annealing) in the negative gradient direction
4. Reset flock
Steps 2-4 are repeated for a fixed number of descent steps (a sketch of the loop follows below).
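A minimal sketch of that loop in Python; the helper grad_fn and the specific annealing schedule are assumptions of mine, not the talk's code:

import numpy as np

def policy_search(grad_fn, dim, iters=100, lr0=0.1, n_samples=10, seed=0):
    """Simple annealed gradient descent over the policy vector alpha.

    `grad_fn(alpha, rng)` is an assumed helper: it resets the flock to
    random initial conditions, runs one forward simulation, and returns
    the gradient of the objective with respect to alpha.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(-1.0, 1.0, size=dim)      # assign a random alpha
    for k in range(iters):
        # Monte Carlo estimate: average gradients over n_samples runs
        grad = np.mean([grad_fn(alpha, rng) for _ in range(n_samples)], axis=0)
        step = lr0 / (1.0 + k)                    # annealing: shrink the step size
        alpha = alpha - step * grad               # step in the negative gradient direction
    return alpha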
Results - ia
Simple system test:
• 1 agent
• 2 forces: seek and drag
• Seek target in front of agent; agent initially moving towards target
• Simulate 2 minutes
• No noise
• Take best of 10 descents
Results - ib
Simple system test w/noise:
• Same as before, but set the wander force to strength 1
• Used N=10 for the Monte Carlo estimate
Effects of noise:
• Optimal seek force is larger
• Both surface and descent path are noisy
Results - ii
More complicated system:
• 2 Agents
• Seek and drag coefficients fixed at 10 and 0.1
• Learn cohesion, separation
• Seek target orbiting the agents’ start position
• Simulate 2 minutes
• Target distances of 5 and 10
• Noise in initial conditions
Results:
• The target distance does influence the optimized values
• The search often gets caught in foothills (shallow local minima)
Results - iii
Higher dimensions:
• 10 agents
• Learn cohesion, separation, seek, drag
• Otherwise the same as the last test
Results:
• Objective function is being optimized (albeit slowly)!
• Target distance is matched!
Conclusion
• Technique shows promise
• Implementation has poor search performance
Future Work
• Implement more learnable parameters
• Fix neighbourhood derivative
• Improve gradient search method
• Use importance sampling!
Demonstrations