X 1 - Homepages | The University of Aberdeen

Download Report

Transcript X 1 - Homepages | The University of Aberdeen

Genetic Algorithms and
Genetic Programming
Evolutionary Computation
1. Computational procedures patterned
after biological evolution
2. Search procedure that probabilistically
applies search operators to a set of
points in the search space.
Biological Evolution
• Lamarck and others
– Species “transmute” over time
• Darwin and Wallace
– Consistent heritable variation among individuals in the
population
– Natural selection of the fittest
• Mendel and genetics
– A mechanism for inheriting traits
– genotype -> phenotype mapping
Biological Terminology
• gene
• functional entity that codes for a specific feature e.g. eye
color
• set of possible alleles
• allele
• value of a gene e.g. blue, green, brown
• codes for a specific variation of the gene/feature
• locus
• position of a gene on the chromosome
• genome
• set of all genes that define a species
• the genome of a specific individual is called genotype
• the genome of a living organism is composed of several
chromosomes
• population
• set of competing genomes/individuals
Genotype versus Phenotype
• genotype
• blue print that contains the information to construct an
organism e.g. human DNA
• genetic operators such as mutation and recombination
modify the genotype during reproduction
• genotype of an individual is immutable
(no Lamarckian evolution)
• phenotype
• physical make-up of an organism
• selection operates on phenotypes
(Darwin’s principle : “survival of the fittest”
The Genetic Algorithm
• Directed search algorithms based on
the mechanics of biological evolution
• Developed by John Holland, University
of Michigan (1970’s)
– To understand the adaptive processes of
natural systems
– To design artificial systems software that
retains the robustness of natural systems
The Genetic Algorithm (cont.)
• Provide efficient, effective techniques
for optimization and machine learning
applications
• Widely-used today in business,
scientific and engineering circles
Components of a GA
A problem to solve, and ...
• Encoding technique
(gene, chromosome)
• Initialization procedure
(creation)
• Evaluation function
(environment)
• Selection of parents
(reproduction)
• Genetic operators (mutation, recombination)
• Parameter settings
(practice and art)
Genotype Operators
• recombination (crossover)
• combines two parent genotypes into a new offspring
• generates new variants by mixing existing genetic
material
• stochastic selection among parent genes
• mutation
• random alteration of genes
• maintain genetic diversity
• in genetic algorithms crossover is the major operator
whereas mutation only plays a minor role
Representation
Phenotype space
Genotype space =
{0,1}L
Encoding
(representation)
10010001
10010010
010001001
011101001
Decoding
(inverse representation)
Simple Genetic Algorithm
{
initialize population;
evaluate population;
while TerminationCriteriaNotSatisfied
{
select parents for reproduction;
perform recombination and mutation;
evaluate population;
}
}
The GA Cycle of Reproduction
reproduction
children
modified
children
parents
population
deleted
members
discard
modification
evaluation
evaluated children
Population
population
Chromosomes could be:
–
–
–
–
–
–
Bit strings
Real numbers
Permutations of element
Lists of rules
Program elements
... any data structure ...
(0101 ... 1100)
(43.2 -33.1 ... 0.0 89.2)
(E11 E3 E7 ... E1 E15)
(R1 R2 R3 ... R22 R23)
(genetic programming)
Reproduction
reproduction
children
parents
population
Parents are selected at random with
selection chances biased in relation to
chromosome evaluations.
Chromosome Modification
children
modification
modified children
• Modifications are stochastically triggered
• Operator types are:
– Mutation
– Crossover (recombination)
The simple GA
• Has been subject of many (early) studies
– still often used as benchmark for novel GAs
• Shows many shortcomings, e.g.
– Representation is too restrictive
– Mutation & crossovers only applicable for bit-string &
integer representations
– Selection mechanism sensitive for converging
populations with close fitness values
– Generational population model (step 5 in SGA repr.
cycle) can be improved with explicit survivor selection
Representing Hypotheses
Represent:
(Outlook = Overcast v Rain) ^ (Wind = Strong)
by
Outlook Wind
011
10
Represent:
IF Wind = Strong THEN PlayTennis = yes
by
Outlook Wind
PlayTennis
011
10
10
SGA operators: 1-point
crossover
•
•
•
•
Choose a random point on the two parents
Split parents at this crossover point
Create children by exchanging tails
Pc typically in range (0.6, 0.9)
SGA operators: mutation
• Alter each gene independently with a probability pm
• pm is called the mutation rate
– Typically between 1/pop_size and 1/ chromosome_length
Alternative Crossover
Operators
• Performance with 1 Point Crossover depends on the
order that variables occur in the representation
– more likely to keep together genes that are near
each other
– Can never keep together genes from opposite ends
of string
– This is known as Positional Bias
– Can be exploited if we know about the structure of
our problem, but this is not usually the case
n-point crossover
•
•
•
•
Choose n random crossover points
Split along those points
Glue parts, alternating between parents
Generalisation of 1 point (still some positional bias)
Uniform crossover
•
•
•
•
Assign 'heads' to one parent, 'tails' to the other
Flip a coin for each gene of the first child
Make an inverse copy of the gene for the second child
Inheritance is independent of position
Order 1 crossover
• Idea is to preserve relative order that elements occur
• Informal procedure:
1. Choose an arbitrary part from the first parent
2. Copy this part to the first child
3. Copy the numbers that are not in the first part, to
the first child:
• starting right from cut point of the copied part,
• using the order of the second parent
• and wrapping around at the end
4. Analogous for the second child, with parent roles
reversed
Order 1 crossover example
• Copy randomly selected set from first parent
• Copy rest from second parent in order
1,9,3,8,2
Crossover OR mutation?
• Decade long debate: which one is better / necessary /
main-background
• Answer (at least, rather wide agreement):
–
–
–
–
it depends on the problem, but
in general, it is good to have both
both have another role
mutation-only-EA is possible, xover-only-EA would not work
Evaluation
evaluated
children
modified
children
evaluation
• The evaluator decodes a chromosome and assigns it
a fitness measure
• The evaluator is the only link between a classical GA
and the problem it is solving
Selection Schemes
• stochastic sampling
• roulette wheel selection
• spin wheel N times
• stochastic universal sampling
• roulette wheel selection
• single spin, wheel has N equally
spaced markers
• tournament selection
• choose k candidates at random with uniform probability
• pick best one for reproduction
• expected number of offspring
best :  k , average  k ½ k-1 , worst  k 1/N k-1
Replacement Schemes
• generational replacement
• entire population is replaced each generation
• non-overlapping population
10010
01001
11001
Selection
Crossover
Mutation
10111
01000
01011
• steady state replacement
• a single individual (worst, random) is replaced by
one offspring
• overlapping population
10010
Selection
10010
01001
01110
Crossover
01001
11001
Mutation
11001
Deletion
population
discarded members
discard
• Generational GA:
entire populations replaced with each iteration
• Steady-state GA:
a few members replaced each generation
An Abstract Example
Distribution of Individuals in Generation 0
Distribution of Individuals in Generation N
A Simple Example
The Traveling Salesman Problem:
Find a tour of a given set of cities so that
– each city is visited only once
– the total distance traveled is minimized
Representation
Representation is an ordered list of city
numbers known as an order-based GA.
1) London
2) Venice
3) Dunedin
4) Singapore
5) Beijing 7) Tokyo
6) Phoenix 8) Victoria
CityList1
(3 5 7 2 1 6 4 8)
CityList2
(2 5 7 6 8 1 3 4)
Crossover
Crossover combines inversion and
recombination:
*
*
Parent1
(3 5 7 2 1 6 4 8)
Parent2
(2 5 7 6 8 1 3 4)
Child
(5 8 7 2 1 6 3 4)
This operator is called the Order1 crossover.
Mutation
Mutation involves reordering of the list:
Before:
*
*
(5 8 7 2 1 6 3 4)
After:
(5 8 6 2 1 7 3 4)
TSP Example: 30 Cities
120
100
y
80
60
40
20
0
0
10
20
30
40
50
x
60
70
80
90
100
Solution i (Distance = 941)
TSP30 (Performance = 941)
120
100
y
80
60
40
20
0
0
10
20
30
40
50
x
60
70
80
90
100
Solution j(Distance = 800)
TSP30 (Performance = 800)
120
100
80
y
44
62
69
67
78
64
62
54
42
50
40
40
38
21
35
67
60
60
40
42
50
99
60
40
20
0
0
10
20
30
40
50
x
60
70
80
90
100
Solution k(Distance = 652)
TSP30 (Performance = 652)
120
100
y
80
60
40
20
0
0
10
20
30
40
50
x
60
70
80
90
100
Best Solution (Distance = 420)
TSP30 Solution (Performance = 420)
120
100
80
y
42
38
35
26
21
35
32
7
38
46
44
58
60
69
76
78
71
69
67
62
84
94
60
40
20
0
0
10
20
30
40
50
x
60
70
80
90
100
Overview of Performance
TSP30 - Overview of Performance
1800
1600
1400
Distan ce
1200
1000
800
600
400
200
0
1
3
5
7
9
11
13
15
17
19
21
Gen er atio ns (1000)
23
25
27
29
31
Best
Wors t
A v erage
Population Models
• SGA uses a Generational model:
– each individual survives for exactly one generation
– the entire set of parents is replaced by the offspring
• At the other end of the scale are Steady-State
models:
– one offspring is generated per generation,
– one member of population replaced,
• Generation Gap
– the proportion of the population replaced
– 1.0 for GGA, 1/pop_size for SSGA
GABIL [de Jong et al, 1993]
Learn disjunctive set of propositional rules; competitive with C4.5
Fitness
Fitness(h) = (correct(h))2
Representation
IF a1=T ^ a2=F THEN c=T; IF a2=T THEN c=F
represented by:
a1 a2 c
a1 a2 c
10 01 1
11 10 0
Genetic operators???
– want variable length rule sets
– want only well-formed bit-string hypotheses
Crossover with variable-length
bit-strings
Start with:
a1 a2 c
h1 10 01 1
h1 01 11 0
a1 a2 c
11 10 0
10 01 0
1.
2.
choose crossover points for h1, e.g. after bits 1, 8
now restrict points in h2 to those that produce bitstrings with welldefined semantics, e.g. 〈1,3〉, 〈1,8〉, 〈6,8〉.
if we choose 〈1,3〉, the results is:
a1 a2 c
h3:
h4:
a1 a2 c
00 01 1
11 10 0
a1 a2 c
11 11 0
a1 a2 c
10 01 0
GABIL extensions
Add new genetic operators, also applied
probabilistically:
1.
2.
AddAlternative: generalise constraint on a1 by changing
a 0 to 1
DropCondition: generalise constraint on ai by changing
every 0 to 1.
And, add new field to bitstring to determine whether to
allow these:
a1 a2 c
a1 a2 cAA
01 11 0
10 01 0 1
so now the learning strategy also evolves.
DC
0
GABIL Results
• Performance of GABIL comparable to
symbolic rule/tree learning methods C4.5,
ID5R, AQ14.
• Average performance on a set of 12 synthetic
problems:
– GABIL without AA and DC operators: 92.1%
accuracy
– GABIL with AA and DC operators 95.2% accuracy
– symbolic learning methods ranged from 91.2% to
96.6%
Extensions to the Simple GA
Encoding schemes
• gray encoding
• messy genetic algorithms
Replacement schemes
• generational replacement
• steady state replacement
Fitness scaling
• linear scaling
• - truncation
• ranking
Selection schemes
• stochastic sampling
• tournament selection
Genetic Programming
• automatic generation of computer programs
by means of natural evolution see Koza 1999
• programs are represented by a parse tree (LISP expression)
• tree nodes correspond to functions :
- arithmetic functions {+,-,*,/}
- logarithmic functions {sin,exp}
• leaf nodes correspond to terminals :
+
- input variables {X1, X2, X3}
- constants {0.1, 0.2, 0.5 }
X1
tree is parsed from left to right:
(+ X1 (* X2 X3))
X1+(X2*X3)
*
X2
X3
Genetic Programming : Crossover
-
+
parent A
X1
parent B
X2
*
X2
-
X3
-
X1
X2
*
+
offspring A
X2
/
X2
/
X3
X2
-
X1
X3
offspring B
X1
X3
Block-Stacking Problem
N
U
A
L
stack
S
table
I
R
V
E
U
N
I
V
E
R
S
A
L
• objective: place the blocks in the correct order such
that the stack forms the word universal
• functions: set of actions, logical operators, do-until loop
• terminals: set of sensors that indicate top block on stack,
next suitable block on table etc.
• each program tree is tested on 166 different initial
configurations
• fitness: #configurations for which the stack was correct
after program execution
Block-Stacking Problem
CS
TB
N
U
A
L
NN
S
I
R
V
E
Sensors:
• CS: current stack, name of the top block of the stack
• TB: top correct block, name of the topmost block on the stack
such that it and all blocks underneath are in correct order
• NN: next block needed, name of the block needed above TB
Block-Stacking Problem
MS TB
N
S
N
U
A
L
MS NN
S
I
R
V
E
Functions:
• MS(X): move block X to the top of the stack, return value X
• MT(X): moves the block on top of the stack to the table
if X is anywhere in the stack returns X
• DU(exp1, exp2): execute exp1 until the predicate exp2
becomes true
• NOT(exp1) : negation of expression exp1
• EQ(exp1, exp2) : returns true if exp1 and exp2 are equal
Block-Stacking Problem
CS
TB
U
A
L
NN
S
I
R
V
E
N
• (EQ (MS NN) (EQ (MS NN) (MS NN)))
move next needed block to the stack three times in a row
(succeeds with a stack VERSAL and U N I on the table
• (DU (MS NN) (NOT NN))
move next needed block to the stack until no more blocks
are needed
• (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
empty the stack on the table and then build the stack in
the correct order
• (EQ (DU (MT CS) (EQ (CS TB))) (DU (MS NN) (NOT NN)))
Learned Program
• Trained to fit 166 test problems
• Using population of 300 programs,
found this after 10 generations:
(EQ (DU (MT CS)(NOT CS))(DU (MS
NN)(NOT NN)))
Genetic Programming
• More interesting example: design electronic filter
circuits
– individual are programs that transform beginning cct to final
cct, by adding/subtracting components and connections.
– Use population of 640,000 run on 64 node parallel processor
– Discover circuits competitive with best human desgn
GP for classifying images
• Teller & Veloso, 1997
• Fitness: based on coverage & accuracy
• Representation:
– Primitives include Add, Sub, Mult, Div, Not, Max, Min, Read,
Write, If-Then-Else, Either, Pixel, Least, Most, Ave,
Variance, Difference, Mini, Library
– Mini refers to a local subroutine that is separately co-evolved
– Library refers to a global library subroutine (evolved by
selecting the most useful minis)
• Genetic operators:
– Crossover mutation
– Create “mating pools” and use rank proportionate
reproduction.
Biological Evolution
• Lamarck (19th century)
– believed individual genetic makeup was
altered by lifetime experience
– but current evidence contradicts this view
• What is the impact of individual learning
on population evolution?
Baldwin Effect
• Assume
– individual learning has no direct influence on individual DNA
– But ability to learn reduces need to “hard wire” traitsin DNA
• Then
– Ability of individuals to learn will support more diverse gene
pool
• because learning allow individuals with various “hard
wired” traits to be successful
– More diverse gene pool will support faster evolution of gene
pool.
– > individual learning (indirectly) increases rate of evolution.
Baldwin Effect
Plausible example:
1. New predator appears in the environment
2. Individual who can learn (to avoid it) will
be selected
3. Increase in learning individuals will
support more diverse gene pool
4. resulting in faster evolution
5. possibly resulting in new non-learned
traits such as instinctive fear of predator
Computer experiements on
Baldwin Effect
• Evolve simple neural nets
– [Hinton & Nowlan, 1987]
– some network weights fixed during lifetime, others trainable
– genetic makeup determines which are fixed and their weight
values
• Results:
– With no individual learning, population failed to improve over
time
– when individual learning allowed
• early generations: population contained many individuals with
many trainable weights
• later generations: higher fitness, while number of trainable
weights decreased.