Combinations of Evolution & Learning in Artificial Adaptive Systems

Download Report

Transcript Combinations of Evolution & Learning in Artificial Adaptive Systems

Relationships between
Evolutionary Computation &
Biology
Focal Points:
Combining Evolution & Learning in Hybrid
Adaptive Systems
Exploiting Coevolution in Evolutionary
Computation
Biology in Evolutionary Computation
• Basic Neo-Darwinian Evolution
– The essence of most EAs
• Genotype-Phenotype Distinction
– Developmental processes that derive ptype from gtype
• Fundamental Theoretical Results
– Hardy-Weinberg Law
– Fisher’s Theorem
• Combinations of Evolution & Learning
– Lamarckianism
– The Baldwin Effect
• Coevolution
2
Diploid Genetics
Chromosome Pair
A
b
C
d
a
B
C
d
Heterozygous for genes 1 & 2
Homozygous for genes 3 & 4
• For each gene, the allele pair will interact in some way to produce
a protein (phenotypic trait).
• In some cases, one allele “dominates” while the other is “recessive”
=> one allele is completely expressed; the other is ignored.
• In other cases, the two alleles have a combined effect.
3
Diploid Recombination
Generation K
Sex*
Crossover
Gamete Pool
a
b
c
d
w
x
y
z
A
B
C
D
W
X
Y
Z
Generation K+1
A
b
c
D
w
x
y
Z
A
B
c
d
a
B
C
d
W
X
Y
z
w
x
Y
Z
a
b
C
D
W
X
y
z
a
b
C
D
a
B
C
d
w
x
Y
Z
w
x
y
Z
4
Population Genetics Notation
Gene Pool
AA
AA
aa
aa AA
AA
Aa
Gene frequencies
p = fraction of A’s = 9/16
q = fraction of a’s = 1 - p = 7/16
Genotype frequencies
D = fraction of AA’s = NAA/N = 4/8
H = fraction of Aa’s = NAa/N = 1/8
R = fraction of aa’s = Naa/N = 3/8
aa
Population Size
N=8
Genotype Numbers
NAA(4), NAa(1),Naa(3)
p = (2NAA + NAa) / 2N = D + H/2
q = (2Naa + NAa) / 2N = R + H/2
5
The Hardy-Weinberg Law
Given:
• Initial genotype distribution: D0,H0,R0 => p0, q0
Assume:
• Random mating
• No mutation
• No selection - all genotypes have same fitness
• No genetic drift - population size large enough that
the mating probabilities and child genotype
distributions equal the statistical estimates.
Results:
• D1 = p02
H1 = 2p0q0
R1 = q02
• p1 = p0
q1 = q0
• D, H, R values remain constant after gen 1.
• Gene and Genotype freqs at stable equilibrium.
6
Lessons for Evolutionary Computation
• Diploid EAs can be useful
– Allow recessive traits to linger in the population without always being
expressed. If the environment (problem) changes such that expression
of the recessive gene is desirable, then aa’s will have a selective
advantage and rise in frequency.
– Without diploidy, we would have to wait for the proper mutation to
create an a.
– We get recessiveness for free in GP, since some subtrees may not be
expressed in one individual, but when combined with other code, they
get executed. E.g. (if nil SubtreeA SubtreeB)
• Don’t forget Hardy, Weinberg & Fisher!
– Without selective pressure and mutation, evolution can quickly
stagnate, even when random recombination (I.e., crossover) is used.
– Evolutionary speed is a function of fitness variance.
– Fitness functions need to have a wide range of outputs.
– Selection mechanisms need to maintain selection pressure, or create it
in cases where all individuals have similar fitness.
7
Learning + Evolution
• Individuals improve during their lifetimes (plasticity/learning) &
populations improve (avg fitness increases) over many generations.
• Learning speeds up evolution by:
– Lamarckianism: Direct inheritance of acquired traits
• Results of learning (the improved phenotype) is reverseencoded into the genome prior to reproduction => children
inherit what their parents learned/acquired.
• Disprove biologically, but useful for EC.
– The Baldwin Effect: Indirect inheritance of acquired traits
• Some genotypes are not inherently optimal, but their
corresponding phenotypes can become so via learning. Hence,
these genotypes have a selective advantage and will remain in
the population until a mutation produces optimal genotypes.
• No ptype-to-gtype reverse encoding necessary.
• Accepted by biologists, but difficult to test.
• Useful in EC.
8
Lamarck -vs- Baldwin
• Direct -vs- Indirect inheritance of acquired characteristics
• Below, the blue 3-fingered component is an acquired trait.
Lamarck
Baldwin
Phenotype
Many generations
1 generation
Genotype
Selective
Advantage
9
Mutation
Lamarckianism
• Jean-Baptiste Lamarck (1744-1829)
• Earliest advocate of an evolutionary theory
• Inheritance of Acquired Characteristics (1809)
Basic Idea:
If organisms adapt during their lifetimes, then those changes can be
DIRECTLY passed on to their offspring. (inheritance of acquired
traits).
Modern Implications
The results of individual plasticity, such as learning or physical changes,
can be reverse encoded into the genome prior to reproduction. So
children inherit the fruits of their parents plasticity as innate traits.
Biologically disproven, except in a few rare cases.
10
Contributions of Lamarckianism
3 Things that Lamarck may have gotten right (Schull, 1996):
1. Individual plasticity can influence evolution.
2. Similar individual & populational adaptivity.
3. Collective activity is relevant to evolutionary theory.
Agents behave (and learn) purposely, so these
collective activities could have an emergent effect
upon the course of evolution.
Lamarck’s main mistake is still very useful:
Lamarckian inheritance can improve evolutionary
computation!!
11
Evolutionary Computation + Local Search
Local Search = Learning/Plasticity
Lamarckianism
Phenotype
Genotype
12
Lamarckianism in EC
Basic Process
– Convert genotypes to phenotypes
– Phenotypes change via learning
– At generation end, CONVERT THE
UPDATED PHENOTYPES BACK INTO A
CORRESPONDING GENOTYPE
– Crossover and mutate new genotypes to create
the next generation
13
Partial Lamarckianism
(Houck, et. al., 1997)
– Only N% of genotypes are changed via the reverse
encoding from phenotypes.
– N = 20-40% gives best results on benchmarks.
– This beats pure Lamarckianism (N = 100) and pure
Baldwin Effect (N = 0 with plastic ptypes)
– N = 100 is dangerous
• Leads to premature convergence in static environments
• Kids should NOT inherit too much of what parents learned in
dynamic environments, since kids’ world is different from
parents’
14
The Baldwin Effect
A new factor in evolution (James Baldwin, 1896)
Basic Idea:
If organisms adapt during their lifetimes, then those changes can be
INDIRECTLY passed on to their descendants, although it may take many
generations.
Two Phases
1. Plasticity enhances evolution by INCREASING FITNESS DIVERSITY
and smoothing the fitness landscape. **************************
Individuals who adapt their way to higher fitness have a
selective advantage over those who, despite adaptivity, cannot.
2. The results of plasticity are assimilated into the gene pool via chance
mutations.
15
• Biologically plausible, but hard to prove in
natural systems.
• Hinton & Nowlan (1987) - Trivial
evolutionary simulations clearly show it!
16
Baldwin Effect Phase I: Learning Enhancement
•Learning/Plasticity = Local Hill Climbing in Fitness Space
•Phenotypes near the base of the peak can achieve fitness increases.
Smoother
Fitness
Landscape
D2
D1
A
Phenotype
Genotype
17
• D1, D2: All phenotypes have the ability to
learn, but only those near the base of the
peak can achieve fitness increases. This
selective advantage moves the
genotype/phenotype distribution from D1 to
D2, hence closer to the optimal phenotype
P*.
18
Baldwin Effect Phase II: Genetic Assimilation
• Mutation produces genotypes hard-wired for phenotype A.
• Learning has a cost => Selection favors Natural Born A’s
Economy of Flexibility
1. Learning accelerates
evolution,
2. Evolution removes the
need for learning.
3. Evolution removes the
ability to learn!
D2
D4
A
D3
Phenotype
Genotype
19
• Genetic operations spread the population
distribution from D2 to D3, producing
individuals with hard-wired optimal
penotypes, P*. Due to the cost of learning.
These natural-born P*s have a selective
advantage over the learned P*s, and the
distribution moves from D3 TO D4.
20
Preconditions for the Baldwin Effect (Mayley, 1996)
1.
2.
Learning has both benefits (phase I) and costs (phase II)
The genotype and phenotype spaces are correlated
•
There needs to be an achievable genotypic change for each
learned phenotypic change (Observe nature !).
•
Without this, the mutations that should produce phase II will fail
to find a genotype that encodes the previously-learned trait
(There should be a possible genetypic change for each
acquired trait).
It’s the exploitation of the benefits and reduction of the costs that provide
the selection pressure for :
1.
The adoption of a learned behaviour.
2.
Its genetic assimilation.
21
Benefits of Learning (Mayley, 1996)
1.
2.
3.
Adaptive members of the population are able to ”find”
new advantageous behaviours that less plastic
individuals are unable to perform. This means that these
adaptive individuals gain the upper hand and are
selected for.
Learning provides individuals with the ability to adapt to
an environment that is changing at a faster time scale
than that on which evolution operates.
A learning mechanism may be able to provide an
individual with behaviours that are simply very hard to
evolve. (i.e. acquiring a language)
22
Costs of Learning (Mayley, 1996)
1.
Cost as function of time in the juvenile (learning) stage
– Time wasted in “ignorant” state.
– Energy expended during knowledge acquisition
– Delayed reproduction (until beyond juvenile stage)
2. Catastrophic costs
– Unreliability - environment may never provide the necessary
stimuli for learning
– Death due to state of ignorance
3. Fixed costs
– Extra expense to support plasticity (particularly evident in EAs
with a learning component)
4. Population-level costs (don’t affect juveniles directly)
– Parental investment in teaching the young.
All 4 types of costs can affect The Baldwin Effect
– Phase I: Too high costs => no learning => no Baldwin Effect
Phase II: Higher costs => more selective pressure for assimilation
23
=> faster convergence (Leaning phase is cut short)
Genotype-Phenotype Correlation
Correlated: near(G1,G2) <=> near(P1,P2)
P1
P2
P1
P2
Phenotype
Genotype
G2 G1
Easy mutation
G2
G1
Harder mutation
24
Hinton & Nowlan (1987)
• First evidence of the Baldwin Effect in a natural or artificial system
Needle-in-a-haystack Task:
Goal: Find a particular 20-bit string
Evolution: GA with 20-gene chromosomes
3 Alleles: 0, 1, * (wildcard)
Learning: Max of 1000 random guesses per individual.
Guess = replace a * with a 0 or 1.
Fitness Function:
F = 1 + 19(1000 – NG)/1000
NG = # guesses (low better!)
Learning has a cost!
Without learning:
needle-in-haystack search
With learning:
smoother, navigable space
25
Gtype/Ptype
Hinton & Nowlan (2)
• Without learning: no fitness variance, since all individuals are equally
bad (unless, by pure chance, the needle is found).
– Thus, no evolutionary progress (avg fitness does not change)
– Search is random.
• With learning:partial solutions (that can learn their way to the correct
solution) get partial credit and move search in the proper direction (up
the smooth hill).
– Learning smoothes the fitness landscape by making more
informative the points in the neighborhood of the optimum.
1
Baldwin Effect
Phase I Phase II
Correct
Wildcard
Degree of genetic
assimilation = size of
Correct-Wildcard gap
and is directly
proportional to the
learning cost.
Wrong
26
Generations
50
Hinton & Nowlan (3)
• PHASE 1 : Larger diversity with learning, hence
the probability that some and more of those
individuals will get closer to the correct solution is
higher. That’s the reason why number of corrects
increase while wrongs are low in proportion.
• PHASE 2: The ones closer to the correct
phenotype are favored and mutated. Wrong ones
are almost completely eliminated while the correct
ones are even more. Natural born ones are also
favored over others.
27
How does Baldwin effect work in
Evolution?
• Acquired traits are not coded back into the
genotype. However, the genes which could
acquire highly fit traits are expected to acquire
similar highly fit traits in future if they are allowed
to stay in the populations of coming generations.
If so, with mutation these genes are expected to
have traits acquired by learning until now to be
CODED into their genoypes. This the way how
learning effects evolution. Ordinary genes are not
expected to be genetically mutated in a way to
acquire traits that they are not able to do so with
learning.
28
Evolutionary Reinforcement Learning (ERL)
Ackley & Litmann (1991)
Complementary Neural Networks:
#1 learns via backprop; original weights from GA.
#2 provides state evaluations for #1; static weights from GA.
#1
#2
Current State
Baldwin Effect:
Next State
Action Network
Evaluation
Network
Action
State
Evaluation
World
Phase I:
Genes for ANN #2 are
relatively fixed early on
=> selective pressure for
learning in ANN #1, which
depends on ANN#2.
Phase II:
ANN #2 no need for
learning. Good
behaviors are assimilated
in ANN #1.
29
Using Learning to Faciliate the Evolution of
Features for Recognizing Visual Concepts (Bala
et. al. 1996)
Objectives:
1. Assess the ability of a hybrid evolution and
learning architecture to identify feature sets
which result in good classificiation performance.
2. Understand clearly the role that learning played
in this process.
3. Understand the extend to which various aspects
of the Baldwin effect were present.
30
Enhancing Machine Learning with Evolution
Bala et. al, 1996
Training
Data
Feature Set
Genetic
Algorithm
Test
Data
Decision Tree
C4.5
Classify
Test
Fitness (Feature Set) = -1*Size + Entropy + Accuracy
• Each GA genome is a subset of the universal feature set.
• Size(Feature Set) = Learning Effort/Cost
31
Optimal Feature Subset
Full Feature Set
GA MODULE
Feature Subsets
Fitness Measure
EVALUATION MODULE
Infomax measure
Entropy
Trees
Acuracy
Tree Evaluation
Tree Induction
Testing data
Feature Set Size
Training data
Image Data
Feature
Extraction
Examples
Feature
Mask
32
GA + C4.5 Tests
Tasks:
Satellite Imaging & Facial Recognition
Evolution + Learning beats:
Learning alone = C4.5 using universal feature set.
Evolution alone =Fitness function without the accuracy term.
Baldwin Effect:
Stage 1: Learning selected for. Large feature sets dominate initially.
Stage 2: GA population converges to small feature sets that match the
optimal sets learned by solo C4.5 (Genetic assimilation).
First example of the Baldwin Effect on a significant problem.
33
Reinforced Genetic Programming (RGP)
Downing (2001)
if
>
X
if
5
=
Y
Pick
Move
Pick
Move
move
north
3
• Strongly-typed classify-and-act genetic programs.
• Actions = simple moves or monitored choices.
• Choice nodes enable Q-Learning by keeping track of previous actions
and resulting rewards.
• Evolution provides problem-space abstractions for RL to work with.
34
• RL enhances GP evolution via Baldwin Effect or Lamarckianism.
RGP Maze Walkers
10 x 10
maze
* Goal
Start
4
2.5
Baldwin Effect
3.5
3
1.5
Max
Avg
Min
1
Avg-#-Decisions
Fitness
2
Phase I
2.5
Phase II
2
1.5
1
0.5
0.5
0
0
100
200
Generation
300
400
0
0
100
200
Generation
300
35
400
Coevolution
• In nature, the playing field is always changing. Populations are never
evolving against a static environment, but against an ever-changing
world, consisting of many other evolving populations.
• So changes to both the environment and to the other populations need
to be taken into account w.r.t. design changes for improved fitness.
Competitive Coevolution (Arms Race)
• Competitors (e.g. predators & prey) must constantly improve to keep
pace with one another.
• Over time, the individuals have higher absolute fitness (I.e. they are
much better than their ancestors), but their relative fitness (vis-à-vis
their current competitors) remains unchanged.
Cooperative Coevolution
• 2 or more species evolve such that each enhances its own selective
advantage plus that of the other(s).
36
Coevolution in EC
Dilemma #1: Problem Impossibility
The problem can be so difficult, and the fitness function so hard to design, that
all individuals in early generations get very low fitness => low fitness variance
=> slow or no evolution.
Dilemma #2: Problem Mastery
After many generations, all individuals may have the same high fitness but
may not yet have found a solution. Once again, fitness variance is low =>
evolution may stop, and the optimal solutions may never be found.
Competitive Coevolution to the Rescue!
• Use 2 competing populations: problems & solutions.
• Fitness(problem) = # errors/omissions made by the solution individuals on that
particular problem.
Early: Problems are easier, so at least some of the early solutions can solve
some of the problems => fitness variance => evolution.
Late: Problems become more difficult, and all solutions are pretty good. But
there are usually tough problems that only some solutions can handle =>
fitness variance => evolution.
• So coevolution keeps selection pressure from getting too low, thus avoiding
dilemmas #1 and #2 and maintaining evolutionary progress.
37
EC in Evolutionary Biology (Mitchell, 1996)
Standard Evol Bio techniques (drawbacks/problems in italics)
• Fossil records: incomplete
• View adaptations in real ecosystems: can’t do controlled experiments.
• Lab experiements (e.g. fruit flies): too few generations for major
evolutionary changes.
• DNA analysis to reconstruct phylogenic trees: ambiguous & junk DNA.
• Mathematical models (usually differential equations): extremely
simplified - abstract away many individual behaviors.
Computer Simulations of Evolution (e.g. Artificial Life)
• Controllable (via modifiable parameters)
• Repeatable
• Many generations can be run.
• Individual-based models require fewer abstracting assumptions: but it’s
still a model.
• Generate lots of data: data interpretation problems
38
Summary: EC & Biology
• Biological metaphors (even failed ones) have useful applications in
artificial adaptive systems.
– NeoDarwinism
– Lamarckianism
– Baldwinism
– Diploidy
– Coevolution
• Evolutionary Computation can contribute back to biology
– Existence proof of the Baldwin Effect.
– Assessing roles of mutation & crossover in evolution
– GP for discovering metabolic pathways and solving protein-folding
problems.
– GA in some algorithms that help to map the human genome.
– Evolutionary Simulations for exploring possible complex
39
interactions among evolving species.
Lamarckian Modesty
Can there be any more important
conclusion...or any to which more attention
should be paid that that which I have just set
forth?
... Lamarck’s final sentence of an excerpt
published in 1914.
40