Evolutionary Robotics - uni


A TUTORIAL
Stefano Nolfi
Dario Floreano
Neural Systems & Artificial Life
National Research Council
Roma, Italy
Microengineering Dept.
Swiss Federal Institute of Technology
Lausanne, Switzerland
The method


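[Figure: the evolutionary cycle. The individuals of generation 0 are tested and scored by the fitness function; the best are selected, reproduced and mutated to form generation 1; the cycle repeats up to generation n. Each genotype is turned into a phenotype through a genotype-to-phenotype mapping. Example fitness function shown: $\Phi = V \left(1 - \sqrt{\Delta v}\right)(1 - i)$]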
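The test, select, reproduce-and-mutate cycle can be sketched as follows. This is a minimal illustration and not the authors' implementation: the population size, the Gaussian mutation, the truncation selection with elitism, the identity `decode` mapping and the `toy_fitness` function are all assumptions. In a real experiment the fitness would come from testing a robot controller.

```python
import random

def decode(genotype):
    """Genotype-to-phenotype mapping (identity here; an assumption)."""
    return list(genotype)

def toy_fitness(genotype):
    """Hypothetical stand-in for testing a robot: best when all genes are 0."""
    phenotype = decode(genotype)
    return -sum(x * x for x in phenotype)

def evolve(fitness, genome_length=10, pop_size=20, n_parents=5,
           generations=30, mutation_std=0.1, seed=0):
    """Run a simple generational loop and return (best genotype, history)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    history = []  # best fitness of each generation
    for _ in range(generations):
        # Test: score every individual and rank the population.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Select: keep the best individuals as parents.
        parents = ranked[:n_parents]
        # Reproduce and mutate: unchanged copies of the parents (elitism)
        # plus mutated copies of randomly chosen parents.
        offspring = [list(p) for p in parents]
        while len(offspring) < pop_size:
            parent = rng.choice(parents)
            offspring.append([g + rng.gauss(0.0, mutation_std) for g in parent])
        population = offspring
    return max(population, key=fitness), history
```

Calling `evolve(toy_fitness)` returns the best genotype of the final generation and the best fitness recorded at each generation. Because the parents are copied unchanged, the recorded best fitness cannot decrease for a deterministic fitness function.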
TUTORIAL
Stefano Nolfi & Dario Floreano, 2000
Behavior-Based Robotics & ER
[Figure: behavior-based robotics [Brooks, 1986] places layers such as "avoid hitting things", "locomote", "explore", "build maps" and "manipulate the world" between sensors and actuators; for evolutionary robotics the layers between sensors and actuators are shown as question marks.]
Learning Robotics & ER
[Kodjabachian & Meyer, 1999]
[Figure: a controller with sensors as input and motors as output, together with a desired output or teaching signal]
Artificial Life & ER
[Menczer and Belew, 1997]
[Floreano and Mondada, 1994]
How to Evolve Robots
evolution in the real world [Floreano and Nolfi, 1998]
evolution in simulation + test on the real robot [Nolfi, Floreano, Miglino, Mondada 1994]
Evolution in the Real World
mechanical robustness
energy supply
analysis
[Photos © K-Team SA]
[Floreano and Mondada, 1994]
Evolution in Simulation
Different physical sensors and actuators may perform differently
because of slight differences in their electronics or mechanics.
[Figure: two plots over x, y and z axes, one for the 4th IF sensor and one for the 8th IF sensor]
Physical sensors deliver uncertain values and commands to
actuators have uncertain effects.
The body of the robot and the environment should be accurately
reproduced in the simulation.
[Nolfi, Floreano, Miglino and Mondada 1994; Miglino, Lund, Nolfi, 1995]
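One common way to account for these points in a simulator is to perturb sensor readings and motor commands and to give each simulated device its own gain. The sketch below is only an illustration under stated assumptions: the Gaussian noise model, the noise levels, the `[0, 1]` sensor range and all function names are hypothetical and are not taken from the cited papers.

```python
import random

def make_device_gains(n_devices, rng, spread=0.1):
    """Draw one gain per device to model small electronic or mechanical differences."""
    return [1.0 + rng.uniform(-spread, spread) for _ in range(n_devices)]

def noisy_sensor(true_value, gain, rng, noise_std=0.05, lo=0.0, hi=1.0):
    """Return an uncertain sensor reading, clipped to the sensor range."""
    reading = gain * true_value + rng.gauss(0.0, noise_std)
    return min(hi, max(lo, reading))

def noisy_motor(command, gain, rng, noise_std=0.05):
    """Return the wheel speed actually produced by a motor command."""
    return gain * command + rng.gauss(0.0, noise_std)

rng = random.Random(42)
sensor_gains = make_device_gains(8, rng)  # e.g. eight proximity sensors
readings = [noisy_sensor(0.5, g, rng) for g in sensor_gains]
```

Evolving controllers against such perturbed values is one way to discourage solutions that depend on details a real robot would not reproduce.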
Designing the Fitness Function
[Floreano et al, 2000]
FEE functions describe how the controller should work (functional), rate the system on
the basis of several variables and constraints (explicit), and employ precise external
measuring devices (external). They are appropriate for optimizing a set of parameters for a complex
but well-defined control problem in a well-controlled environment.
BII functions rate only the behavioral outcome of an evolutionary controller (behavioral),
rely on few variables and constraints (implicit), and can be computed on board (internal). They are
suitable for developing adaptive robots capable of autonomous operation in partially
unknown and unpredictable environments without human intervention.
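As an illustration of a function toward the BII end of this space, the sketch below scores straight, fast, obstacle-free motion using only quantities a robot can measure on board: its wheel speeds and its most active proximity sensor. It uses a per-step score of the form V(1 - sqrt(Δv))(1 - i). The variable ranges, the averaging over steps and the function names are assumptions made for the example.

```python
import math

def step_score(left_speed, right_speed, max_proximity):
    """Score one time step.

    Assumed ranges: wheel speeds in [-0.5, 0.5], max_proximity in [0, 1].
    """
    v = abs(left_speed) + abs(right_speed)   # V: overall wheel speed
    dv = abs(left_speed - right_speed)       # delta v: speed difference (turning)
    i = max_proximity                        # i: most active proximity sensor
    return v * (1.0 - math.sqrt(dv)) * (1.0 - i)

def behavioral_fitness(log):
    """Average the step score over a trial; log holds (left, right, proximity) tuples."""
    log = list(log)
    if not log:
        return 0.0
    return sum(step_score(*step) for step in log) / len(log)
```

The function says nothing about how the controller should achieve the behavior; it only rewards the outcome, which leaves evolution free to find its own strategy.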
Genetic Encoding
Evolvability
Expressive power
Compactness
Simplicity
[Gruau, 1994, Nolfi and Floreano 2000]
Adaptation is more Powerful than
Decomposition and Integration
The main strategy followed to develop mobile robots has
been that of Divide and Conquer:
1) divide the problem into a list of hopefully simpler sub-problems
2) build a set of modules or layers able to solve each sub-problem
3) integrate the modules so as to solve the whole problem
Unfortunately, it is not clear how a desired behavior should
be broken down.
Proximal and Distal Descriptions of
Behaviors
[Figure: the distal description of behavior (discriminate, approach, avoid, explore) set against the proximal description, which involves the sensory space, the motor space and the environment]
[Nolfi, 1997]
Discrimination Task (1)
[Figure: decomposition and integration. Modules (discriminate, approach, avoid, explore) sit between sensors and actuators. Tasks shown: walls and cylinders; small and large cylinders.]
[Nolfi, 1996, 1999]
Discrimination Task (2)
[Figure: genotype and phenotype of an evolved controller connecting sensors to actuators, shown with the behaviors discriminate, approach, avoid and explore]
[Nolfi, 1996]
Discrimination Task (3)
Evolved robots act so as to select sensory patterns that are
easy to discriminate.
[Scheier, Pfeifer, and Kuniyoshi, 1998]
The Importance of Self-organization
Operating a decomposition at the level of the distal description
of behavior does not necessarily simplify the challenge.
By allowing individuals to self-organise, artificial evolution
tends to find simple solutions that exploit the interaction
between the robot and the environment and between the
different internal mechanisms of the control system.
[Nolfi, 1996,1997]
Modularity and Behaviors
[Figure: a garbage-collecting behavior obtained by human design and by adaptation. Human design: sub-behaviors Explore, Discriminate, Displace in front, Pick-up and Release, plus a COORDINATE component. Adaptation: the components are shown as question marks.]
Is modularity useful in ER?
What is the relation between self-organized
neural modules and behaviors?
[Nolfi, 1997]
The Garbage Collecting Task (1)
[Figure: four controller architectures (A, B, C, D) connecting the IR-sensors and the BL-sensor to the motors (left motor, right motor, pick-up, release). Labels include selector neurons, output neurons, and motor outputs (a) and (b).]
[Nolfi, 1997]
The Garbage Collecting Task (2)
[Figure: successful epochs (0 to 12) versus generations (0 to 1000)]
Modular neural controllers able to
self-organize outperform other
architectures
There is no correspondence
between self-organized neural
modules and sub-behaviors
[Nolfi, 1997]
Evolving “complex” behaviors
Bootstrap problem: selecting individuals directly for their
ability to solve a task only works for simple tasks
Incremental evolution: start with a simplified version
of the task and then progressively increase its complexity
Also include in the selection criterion a reward for
sub-components of the desired behavior
Start with a simplified version of the task and then
progressively increase its complexity by modifying
the selection criterion
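The two ideas above, rewarding sub-components and raising the difficulty step by step, can be sketched as follows. This is a hedged illustration only: the `shaped_fitness` weighting, the `IncrementalSchedule` class, its threshold and the level names are hypothetical and do not come from the tutorial.

```python
def shaped_fitness(task_score, subgoal_scores, subgoal_weight=0.2):
    """Selection criterion that also rewards sub-components of the behavior."""
    if subgoal_scores:
        bonus = sum(subgoal_scores) / len(subgoal_scores)
    else:
        bonus = 0.0
    return task_score + subgoal_weight * bonus

class IncrementalSchedule:
    """Move to a harder version of the task once the best fitness is high enough."""

    def __init__(self, levels, threshold=0.8):
        self.levels = list(levels)
        self.threshold = threshold
        self.index = 0

    @property
    def current(self):
        return self.levels[self.index]

    def update(self, best_fitness):
        """Call once per generation; returns the level to use next."""
        if best_fitness >= self.threshold and self.index < len(self.levels) - 1:
            self.index += 1
        return self.current

schedule = IncrementalSchedule(["simplified", "intermediate", "full task"])
```

In an evolutionary run, `schedule.update(best_fitness)` would be called after each generation and the returned level used to configure the next round of tests.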
Visually-Guided Robots
[Cliff et al. 1993; Harvey et al. 1994]
Learning & Evolution: Interactions
• Different time scales, different mechanisms, similar effects
• Learning Advantages in Evolution [Nolfi & Floreano, 1999]:
– Adapt to changes that occur faster than a generation
– Extract information that might channel the course of evolution
– Help and guide evolution
– Reduce genetic complexity and increase population diversity
• Learning Costs in Evolution [Mayley, 1997]:
– Delay in the ability to achieve fit behaviors
– Increased unreliability (learning wrong things)
– Physical damage, energy waste, tutoring
• Baldwin effect [Baldwin, 1896; Morgan, 1896; Waddington, 1942]
Hinton & Nowlan model [1987]
[Figure: a network whose connections are specified by the alleles 0, 1 or ?]
Example genotype: 00?11???0111?0?1?0?1
Fitness=correct combination of weights
• Learning samples space in the surrounding of the individual
• Fitness landscape is smoothed and evolution becomes faster
• Baldwin effect (assimilation of features normally «learnt»)
• Model constraints:
– Learning task and evolutionary task are the same
– Learning is a random process
– Environment is static
– Genotype and Phenotype space are correlated
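The fitness evaluation of such a model can be sketched in a few lines. The sketch assumes a single correct combination of weights (`target`), treats learning as random guessing of the `?` alleles, and rewards individuals that find the correct combination early; the exact scaling `1 + (n - 1) * remaining / trials` and all names are assumptions made for illustration.

```python
import random

def hinton_nowlan_fitness(genotype, target, trials=1000, rng=random):
    """Fitness of a genotype over the alleles '0', '1' and '?' (learnable)."""
    # A wrong fixed allele cannot be repaired by learning.
    for allele, wanted in zip(genotype, target):
        if allele != "?" and allele != wanted:
            return 1.0
    unknown = [k for k, allele in enumerate(genotype) if allele == "?"]
    n = len(genotype)
    for trial in range(trials):
        # Learning is a random process: guess every '?' position at once.
        if all(rng.choice("01") == target[k] for k in unknown):
            remaining = trials - trial
            return 1.0 + (n - 1) * remaining / trials
    return 1.0  # correct combination never found within the lifetime
```

A genotype with all alleles fixed correctly gets the maximum score, one with a single wrong fixed allele gets the minimum, and genotypes with a few `?` alleles fall in between. This is what smooths the landscape around the correct combination.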
Different Tasks [Nolfi, Elman, Parisi, 1994]
Evolving for food
Learning predictions
Learning mechanism=BP
- Increased speed & fitness
- Genetic assimilation
Perspectives on Landscape
[Figure: landscape with the labeled points A, B1, B2 and C, and the labels P and Q]
A=weights evolved for food finding
C=weights trained for prediction
B1, B2=new positions after mutation
Fitness=higher when closer to A
Correlated landscapes [Parisi & Nolfi, 1996]
Relearning effects to compensate mutations [Harvey, 1997]
(it may hold only in few cases)
Evolutionary Reinforcement Learning
• Evolving both action and evaluation connection strengths [Ackley & Littman, 1991]
• Action module modifies weights during lifetime using CRBP
• ERL gives better performance than E alone or RL alone
• Baldwin effect
• Method validated on mobile robots [Meeden, 1996]
Evolutionary Auto-teaching
• All weights genetically encoded, but one half teaches the other half using Delta rule [Nolfi & Parisi, 1991]
• Individuals can live in one of two environments, randomly determined at birth
• Learning individuals adapt strategy to environment and display higher fitness
[Figure: comparison of the "Learning" and "No learning" conditions]
Evolution of Learning Mechanisms (1)
Genetically-determined: 1 synapse = synapse sign, synapse strength
Adaptive: 1 synapse = synapse sign, learning rule (hebb, postsynaptic, presynaptic, covariance), learning rate
• Encoding learning rules, NOT learning weights [Floreano & Mondada, 1994]
• Weights always initialized to random values
• Different weights can use different rules within same network
• Adaptive method can be applied to node encoding (short genotypes)
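The adaptive encoding can be sketched as a synapse whose genes specify a sign, a learning rule and a learning rate, while the strength itself starts at a random value and changes during the robot's life. The four rule formulas below are assumed Hebbian variants written for illustration and may differ from the rules used in the cited work; the initial range `[0, 0.1]`, the clipping of the strength to `[0, 1]` and the class name are also assumptions.

```python
import math
import random

def plain_hebb(w, pre, post):
    return (1.0 - w) * pre * post

def postsynaptic(w, pre, post):
    # Weakens the synapse when the postsynaptic unit is active alone.
    return w * (pre - 1.0) * post + (1.0 - w) * pre * post

def presynaptic(w, pre, post):
    # Weakens the synapse when the presynaptic unit is active alone.
    return w * pre * (post - 1.0) + (1.0 - w) * pre * post

def covariance(w, pre, post):
    # Strengthens when the two activations are similar, weakens otherwise.
    f = math.tanh(4.0 * (1.0 - abs(pre - post)) - 2.0)
    return (1.0 - w) * f if f > 0.0 else w * f

RULES = {"hebb": plain_hebb, "postsynaptic": postsynaptic,
         "presynaptic": presynaptic, "covariance": covariance}

class AdaptiveSynapse:
    """Synapse built from genes (sign, rule, learning_rate); strength is not encoded."""

    def __init__(self, sign, rule, learning_rate, rng):
        self.sign = sign                      # +1 or -1, genetically encoded
        self.rule = RULES[rule]               # learning rule, genetically encoded
        self.learning_rate = learning_rate    # genetically encoded
        self.strength = rng.uniform(0.0, 0.1) # random initial value

    def update(self, pre, post):
        """Apply the encoded rule once; activations are assumed to lie in [0, 1]."""
        delta = self.rule(self.strength, pre, post)
        self.strength = min(1.0, max(0.0, self.strength + self.learning_rate * delta))

    @property
    def weight(self):
        return self.sign * self.strength
```

Because each synapse carries its own rule name, different synapses in the same network can use different rules, as the slide notes.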
Sequential task & unpredictable change
• Faster and better results [Floreano & Urzelai, 2000]
• Automatic decomposition of sequential task
• Synapses continuously change
• Evolved robots adapt online to unpredictable change [Urzelai & Floreano, 2000]:
– Illumination
– From simulations to robots
– Environmental layout
– Different robotic platform
– Lesions to motor gears
[Eggenberger et al., 1999]
[Figure labels: Genetically-determined; Adaptive]
Summary
• Learning is very useful for robotic evolution:
– accelerates and boosts evolutionary performance
– can cope with fast changing environments
– can adapt to unpredictable sources of change
• Lamarckian evolution (inheriting learned properties) may provide short-term gains [Lund, 1999], but it does not display all the advantages
listed above [Sasaki & Tokoro, 1997, 1999]
• Distinction between learning and adaptation [Floreano & Urzelai, 2000]:
– Adaptation does not necessarily develop and capitalize upon
new skills and knowledge
– Learning is an incremental process whereby new skills and
knowledge are gradually acquired and integrated
Competitive Co-evolution
• Fitness of each population depends on fitness of opponent
population. Examples:
– Predator-prey
– Host-parasite
• It may increase adaptive power by producing an
evolutionary arms race [Dawkins & Krebs, 1979]
• More complex solutions may incrementally emerge as
each population tries to win over the opponent
• It may be a solution to the bootstrap problem
• Fitness function plays a less important role
• Continuously changing fitness landscape may help to
prevent stagnation in local minima [Hillis, 1990]
Co-evolutionary Pitfalls
The same set of solutions
may be discovered over
and over again. This
cycling behavior may end
up in very simple solutions.
Solution: Retain best
individuals of last few gens
(Hall-of-Fame->all gens).
Whereas in conventional evolution the fitness
landscape is static and fitness is a monotonic
function of progress, in competitive co-evolution
the fitness landscape can be modified by the
competitor and fitness function is no longer an
indicator of progress.
Solution: Master Fitness (after evolution test
each best against all best), CIAO graphs (test
each best against all previous best).
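The three remedies named above can be sketched as small helper functions. This is an illustrative sketch only: `play(a, b)` is a hypothetical function that returns the score of individual `a` against opponent `b`, and the averaging and sampling choices are assumptions.

```python
import random

def hall_of_fame_score(individual, hall, play, sample_size, rng):
    """Test an individual against a random sample of retained past best opponents."""
    if not hall:
        raise ValueError("hall of fame is empty")
    opponents = rng.sample(hall, min(sample_size, len(hall)))
    return sum(play(individual, o) for o in opponents) / len(opponents)

def master_fitness(best_a, best_b, play):
    """After evolution, test each best of A against the best of B from all generations."""
    return [sum(play(a, b) for b in best_b) / len(best_b) for a in best_a]

def ciao_matrix(best_a, best_b, play):
    """Row g holds the scores of the generation-g best of A against the best
    of B from generations 0..g (needs len(best_b) >= len(best_a))."""
    return [[play(a, best_b[h]) for h in range(g + 1)]
            for g, a in enumerate(best_a)]
```

With `best_a` and `best_b` holding the best individual of each generation for the two populations, a rising master fitness across generations, or a CIAO matrix whose rows keep winning against older opponents, suggests real progress and not cycling.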
Examples of Co-evolutionary Agents
Ball-catching agents
[Sims, 1994]
Distance-based fitness
Rare good results
Simulated predator-prey
[Cliff & Miller, 1997]
Distance-based fitness
100s generations
CIAO method et al.
Evolution of sensors
Co-evolutionary Robots
• Energetically autonomous
• Predator-prey scenario
• Time-based fitness
• Controllers downloaded to
increase reaction speed
• Retain last best 5 controllers
for testing individuals
Floreano, Nolfi, & Mondada, 1998
• Predators=vision+proximity
• Prey=proximity+faster
• Predator genotype longer
• Prey has initial position
advantage
Co-evolutionary Results
[Figure: best predator fitness and best prey fitness (0 to 1) versus generations (0 to 100), with curves labeled t and d]
Predators do not attempt to minimize distance
Prey maximize distance
Increasing Environmental Complexity
…prevents premature cycling [Nolfi & Floreano, 1999]
Summary
• Competitive co-evolution is challenging because:
– Fitness landscape is continuously changing
– Hard to monitor progress online
– Cycling local minima
• When the environment is sufficiently complex, or the Hall-of-Fame
method is used, the system develops increasingly
complex solutions
• It can work and capitalize on very implicit, internal, and
behavioral fitness functions by exploring a large range of
behaviors triggered by opponents
• When co-evolving adaptive mechanisms, prey resort to
random actions whereas predators adapt online to the prey
strategy and report better performance [Floreano & Nolfi, 1997]
Evolvable Hardware
• Evolution of electronic circuits
http://www.cogs.susx.ac.uk/users/adrianth/EHW_groups.html
• Evolution of body morphologies (including sensors)
• Why evolve hardware?
– Hardware choice constrains environmental interactions and
the course of evolution
– Evolved solutions can be more efficient than those designed
by humans
– Develop new adaptive materials with self-configuration and
self-repair abilities
Evolutionary Control Circuits
• Thompson’s unconstrained evolution
• Xilinx, family 6000, overwrite global
synchronization
• Tone reproduction
• Robot control
• Fitness landscape studies (very rugged,
neutral networks)
Evolvable Hardware
Module for Khepera
http://www.aai.ca
Evolutionary Control Circuits
• Keymeulen: evolution of vision
based controllers
• Find ball while avoiding obstacles
• Constrained evolution, entirely
on physical robot
• De Garis: CAM Brain, composed
of tens of Xilinx FPGAs, 6000 family
• Growth of neural circuits using CA
with evolved rules
• Willing to evolve brain for kitten robot.
Pitfall: speed limited by sensorimotor loop.
Evolutionary Morphologies
• Evolution of Lego Structures [Funes et al., 1997]
• Bridges
• Cranes
• Extended to objects and robot bodies
• see www.demo.cs.brandeis.edu
• Example of evolved crane [Funes et al., 1997]
Co-evolutionary Morphologies
Karl Sims, 1994
Komosinski & Ulatowski, 1999
http://www.frams.poznan.pl
Effect of doubling sensor range on body/wheel size [Lund et al., 1997]
Suggestions for Further Research
• Encoding and mapping of control systems
• Exploration of alternative building blocks
• Integration of growth, learning, and maturation
• Incremental and open-ended evolution
• Morphology and sensory co-evolution
• Application to large-scale circuits
• User-directed evolution
• Comparison with other adaptive techniques
• Further readings:
– Nolfi, S. & Floreano, D. Evolutionary Robotics. The Biology, Technology, and Intelligence of Self-Organizing Machines. MIT Press, October 2000
– Husbands, P. & Meyer, J-A. (Eds.) Evolutionary Robotics. Proceedings of the 1st European Workshop, Springer Verlag, 1998
– Gomi, T. (Ed.) Evolutionary Robotics. Volume series: I (1997), II (1998), III (2000), AAI Books.
Evorobot Simulator
Sources, binaries, and documentation files freely
available at:
http://gral.ip.rm.cnr.it/evorobot/simulator.html
[Nolfi, 2000]