Genetic algorithms / genetic programming


Brief introduction to genetic algorithms and genetic programming
A.E. Eiben
Free University Amsterdam
Genetic algorithm(s)

Developed: USA in the 1970s
Early names: J. Holland, K. DeJong, D. Goldberg
Typically applied to:
• discrete optimization
Attributed features:
• not too fast
• good solver for combinatorial problems
Special:
• many variants, e.g. reproduction models, operators
• formerly "the GA", nowadays "a GA", GAs
Representation

Genotype space = {0,1}^L
Phenotype space
Encoding (representation): phenotype → genotype
Decoding (inverse representation): genotype → phenotype

(figure: example bit strings such as 10010001, 10010010, 010001001, 011101001 mapped between the two spaces)
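A minimal sketch of such an encoding in Python, assuming the phenotypes are non-negative integers (the function names are my own choices):

```python
def encode(x: int, length: int) -> str:
    """Encoding: map an integer phenotype to a fixed-length bit-string genotype."""
    return format(x, f"0{length}b")

def decode(bits: str) -> int:
    """Decoding (inverse representation): map a genotype back to its phenotype."""
    return int(bits, 2)

assert decode(encode(13, 5)) == 13   # 13 <-> '01101'
```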
GA: crossover (1)

Crossover is used with probability pc.

1-point crossover:
• Choose a random point on the two parents (same for both)
• Split parents at this crossover point
• Create children by exchanging tails

n-point crossover:
• Choose n random crossover points
• Split along those points
• Glue parts, alternating between parents

Uniform crossover:
• Assign 'heads' to one parent, 'tails' to the other
• Flip a coin for each gene of the first child
• Make an inverse copy of the gene for the second child

(a sketch of all three operators follows below)
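A minimal Python sketch of the three operators on bit strings (the string representation and function names are my own choices):

```python
import random

def one_point(p1: str, p2: str) -> tuple[str, str]:
    """1-point crossover: split both parents at the same random point, swap tails."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def n_point(p1: str, p2: str, n: int) -> tuple[str, str]:
    """n-point crossover: glue segments, alternating between parents."""
    cuts = sorted(random.sample(range(1, len(p1)), n))
    c1, c2, swap, prev = [], [], False, 0
    for cut in cuts + [len(p1)]:
        a, b = (p2, p1) if swap else (p1, p2)
        c1.append(a[prev:cut]); c2.append(b[prev:cut])
        swap, prev = not swap, cut
    return "".join(c1), "".join(c2)

def uniform(p1: str, p2: str) -> tuple[str, str]:
    """Uniform crossover: coin flip per gene; second child gets the inverse choice."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return "".join(c1), "".join(c2)
```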
GA: crossover (2)

(figure: the crossover variants illustrated on example bit strings)
GA: mutation

Mutation: alter each gene independently with probability pm.

There is a relatively large chance of not being mutated at all
(exercise: L = 100, pm = 1/L).
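Working out the exercise: the probability that a string of length L escapes mutation entirely is

(1 - pm)^L = (1 - 1/L)^L ≈ e^(-1) ≈ 0.37   (for L = 100, pm = 1/L),

so more than a third of the offspring are not mutated at all.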
Crossover OR mutation?

Decade-long debate: which one is better / necessary / main / background?
Answer (at least, rather wide agreement):
• it depends on the problem, but
• in general, it is good to have both
• each has a different role
• a mutation-only EA is possible, a crossover-only EA would not work
Crossover OR mutation? (cont'd)

Exploration: discovering promising areas in the search space, i.e. gaining information on the problem
Exploitation: optimising within a promising area, i.e. using information
There is co-operation AND competition between them
Crossover is explorative: it makes a big jump to an area somewhere "in between" two (parent) areas
Mutation is exploitative: it creates random small diversions, thereby staying near (i.e. in the area of) the parent
Crossover OR mutation? (cont'd)

• Only crossover can combine information from two parents
• Only mutation can introduce new information (alleles)
• Crossover does not change the allele frequencies of the population (thought experiment: 50% 0's on the first bit in the population, ?% after performing n crossovers)
• To hit the optimum you often need a 'lucky' mutation
Selection

Main idea: better individuals get a higher chance
Secondary idea: chances proportional to fitness
Implementation: roulette wheel technique
• Assign to each individual a part of the roulette wheel ("unfair": size proportional to its fitness)
• Spin the wheel n times to select n individuals (fair)

(figure: a wheel with fitness(A) = 3 → 3/6 = 50%, fitness(B) = 1 → 1/6 = 17%, fitness(C) = 2 → 2/6 = 33%)
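A minimal Python sketch of the roulette wheel technique, assuming strictly positive fitness values:

```python
import random

def roulette_wheel(population, fitnesses, n):
    """Spin the wheel n times; each individual's slice is proportional to its fitness."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(n):
        spin = random.uniform(0, total)
        acc = 0.0
        for ind, fit in zip(population, fitnesses):
            acc += fit
            if spin <= acc:
                chosen.append(ind)
                break
    return chosen

# With fitness(A)=3, fitness(B)=1, fitness(C)=2 (the wheel above),
# A is selected ~50% of the time, B ~17%, C ~33%.
print(roulette_wheel(["A", "B", "C"], [3, 1, 2], 4))
```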
Selection (cont'd)

Fitness proportional selection (FPS):
Expected number of times individual i is selected for mating: f_i / f̄, its fitness divided by the average population fitness. (Per spin the selection probability is f_i / Σ_j f_j, so over n spins the expectation is n · f_i / Σ_j f_j = f_i / f̄.)

Disadvantages:
• Outstanding individuals take over the entire population very quickly → danger of premature convergence
• Low selection pressure when fitness values are near each other
• Behaves differently on transposed versions of the same function
Selection (cont'd)

Tournament selection:
• Pick k individuals randomly, without replacement
• Select the best of these k by comparing their fitness values
• k is called the size of the tournament
• Selection is repeated as many times as necessary
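A minimal sketch of one tournament (the names are my own):

```python
import random

def tournament(population, fitnesses, k):
    """Tournament selection: pick k distinct individuals at random,
    return the one with the highest fitness."""
    contestants = random.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitnesses[i])]
```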
Generational GA reproduction cycle

1. Select parents for the mating pool (size of mating pool = population size)
2. Shuffle the mating pool
3. For each consecutive pair apply crossover with probability pc
4. For each newborn apply mutation (bit-flip with probability pm)
5. Replace the whole population by the resulting mating pool

(a sketch of one cycle follows below)
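One cycle as Python, reusing the roulette_wheel and one_point sketches above (a sketch under my own naming, not Eiben's code; fitness is passed in as a function):

```python
import random

def generational_cycle(population, fitness, pc, pm):
    """One generational cycle on bit strings (assumes an even population size)."""
    fits = [fitness(ind) for ind in population]
    pool = roulette_wheel(population, fits, len(population))   # 1. select parents
    random.shuffle(pool)                                       # 2. shuffle pool
    offspring = []
    for a, b in zip(pool[::2], pool[1::2]):                    # 3. pairwise crossover
        if random.random() < pc:
            a, b = one_point(a, b)
        offspring += [a, b]
    flip = {"0": "1", "1": "0"}
    def bitflip(ind):                                          # 4. bit-flip mutation
        return "".join(flip[g] if random.random() < pm else g for g in ind)
    return [bitflip(ind) for ind in offspring]                 # 5. replace population
```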
Generational GA reproduction cycle

Generation t    Mating pool    Generation t+1
string1         string2        child1(2,4)
string2         string4        mut(child2(2,4))
string3         string2        string2
string4         string1        mut(string1)
…               …              …

Notes:
• Offspring can be: clone, pure mutant, pure crossing, mutated crossing
• Generational replacement: the whole population is deleted/replaced
To be discussed: no survival of the fittest here?
An example after Goldberg '89 (1)

Simple problem: max x² over {0,1,…,31}
GA approach:
• Representation: binary code, e.g. 01101 ↔ 13
• Population size: 4
• 1-point xover, no mutation (just an example!)
• Roulette wheel selection
• Random initialisation
One generational cycle is worked out by hand (shown on the next slides, and sketched in code below)
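Under the same assumptions, the cycle can be reproduced with the earlier sketches (taking pc = 1 so every pair is crossed, pm = 0 for no mutation; the seed is only for a reproducible illustration):

```python
import random

random.seed(1)
pop = [encode(random.randrange(32), 5) for _ in range(4)]   # random initialisation
next_pop = generational_cycle(pop, fitness=lambda b: decode(b) ** 2,
                              pc=1.0, pm=0.0)
print(next_pop, [decode(b) for b in next_pop])
```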
An example after Goldberg '89 (2)

(table: the generational cycle worked out by hand, part 1)

An example after Goldberg '89 (3)

(table: the generational cycle worked out by hand, part 2)
The simple GA

• Has been the subject of many (early) studies
• Is often used as a benchmark for novel GAs
• Shows many shortcomings, e.g.
  • Representation is too restrictive
  • Mutation & crossover only applicable to bit-string & integer representations
  • Selection mechanism is sensitive to converging populations with close fitness values
  • Generational population model can be improved with explicit survivor selection
Genetic programming

Developed: USA in the 1990s
Early names: J. Koza
Typically applied to:
• machine learning tasks
Attributed features:
• competes with neural nets and the like
• slow
• needs huge populations (thousands)
Special:
• non-linear chromosomes: trees, graphs
• mutation possible but not necessary (disputed!)
Motivation

Why introduce yet another EA? Because fixed-length linear representations are too rigid.
Reasons:
• Elements of a search space may vary in length
• Linear representation may be too 'unnatural'
• A complex variable hierarchy cannot be (easily) mapped onto linear structures
Example search space:
• Graphs without restriction on size and structure
Credit score example (1)

Given: lots of historical data on:
• customer profile and
• creditability index (good/bad)
Needed: a model that classifies good customers (to be used for evaluating loan applicants).

(table: data description for customer profiles)
Credit score example (2)

A possible model for classification:
  IF (NOC = 2) AND (S > 80000) THEN good
In general: IF formula THEN good.
Need to find the right formula.
Natural representation of formulas: parse trees

(figure: the parse tree for the formula, with AND at the root, "=" over NOC and 2 on the left, ">" over S and 80000 on the right)

Natural fitness of models: percentage of well-classified cases.
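A minimal sketch of that parse tree as a nested Python tuple, with a recursive evaluator (NOC and S follow the slide; the dict-based customer profile is my assumption):

```python
# IF (NOC = 2) AND (S > 80000) THEN good
TREE = ("AND", ("=", "NOC", 2), (">", "S", 80000))

def evaluate(node, customer):
    """Recursively evaluate a parse tree against one customer profile."""
    if not isinstance(node, tuple):              # leaf: variable name or constant
        return customer[node] if isinstance(node, str) else node
    op, left, right = node
    l, r = evaluate(left, customer), evaluate(right, customer)
    return {"AND": l and r, "=": l == r, ">": l > r}[op]

print(evaluate(TREE, {"NOC": 2, "S": 90000}))    # True -> classified as good
```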
GP: representation

Problem domain: modelling (forecasting, regression, classification, data mining, robot control).
Fitness: the performance on a given (training) data set, e.g. the nr. of hits/matches/good predictions.
Representation: implied by the problem domain, i.e. individual = model = parse tree
• parse trees sometimes viewed as LISP expressions → GP = evolving computer programs
• parse trees sometimes viewed as just-another-genotype → GP = a GA sub-dialect
GP: mutation

Replace a randomly chosen subtree by a randomly generated (sub)tree
GP: crossover
Exchange randomly selected subtrees in the parents
GP: selection

Standard GA selection is usual.
Sometimes overselection is used to increase efficiency:
• rank population by fitness and divide it into two groups:
  • group 1: best c% of population
  • group 2: other (100 - c)%
• when executing selection:
  • 80% of selection operations choose from group 1
  • 20% from group 2
• for pop. size = 1000, 2000, 4000, 8000 the portion c is c = 32%, 16%, 8%, 4%
• %'s come from a rule of thumb

(a sketch follows below)
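A minimal sketch of overselection; the slides fix only the two groups and the 80/20 split, so uniform choice within each group is my assumption:

```python
import random

def overselect(ranked, c, n):
    """Overselection: `ranked` is the population sorted best-first.
    80% of the n picks come from the top fraction c, 20% from the rest."""
    split = max(1, int(c * len(ranked)))
    group1, group2 = ranked[:split], ranked[split:]
    return [random.choice(group1 if random.random() < 0.8 else group2)
            for _ in range(n)]

# e.g. population size 1000 -> c = 0.32
```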
Generating random trees

Given a:
• function set F and a
• terminal set T,
• both satisfying the closure property.
Trees are randomly generated by:
• Full method:
  each branch is of length Dmax (pre-specified),
  nodes with depth < Dmax are from F,
  nodes with depth = Dmax are from T
• Grow method:
  maximum branch length is Dmax (pre-specified)
• Ramped half-and-half:
  for each D ≤ Dmax an equal nr. of trees,
  • half of them with the full method
  • half of them with the grow method

(a sketch of all three methods follows below)
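A minimal sketch of the three methods; the concrete F and T, and the grow method's terminal-picking probability, are my assumptions (the slides do not fix them):

```python
import random

F = ["+", "-", "*"]   # function set, all binary here (an assumption)
T = ["x", "1"]        # terminal set

def full(depth):
    """Full method: every branch reaches exactly the pre-specified depth."""
    if depth == 0:
        return random.choice(T)
    return (random.choice(F), full(depth - 1), full(depth - 1))

def grow(depth):
    """Grow method: branches may end early, never exceeding the maximum depth."""
    if depth == 0 or random.random() < len(T) / (len(T) + len(F)):
        return random.choice(T)
    return (random.choice(F), grow(depth - 1), grow(depth - 1))

def ramped_half_and_half(n, d_max):
    """For each depth 1..d_max an equal number of trees, half full, half grow."""
    return [(full if i % 2 == 0 else grow)(d)
            for d in range(1, d_max + 1)
            for i in range(n // d_max)]
```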
Mutation of trees

Replace a randomly chosen subtree by a randomly generated (sub)tree.
Crossover of trees
Exchange randomly selected subtrees in the parents
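A minimal sketch of both operators on the nested-tuple trees, reusing grow from the previous sketch (the path-based helpers are my own):

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def mutate_tree(tree, d_max=3):
    """Mutation: swap a randomly chosen subtree for a freshly grown one."""
    path, _ = random.choice(list(subtrees(tree)))
    return replace_at(tree, path, grow(d_max))

def crossover_trees(t1, t2):
    """Crossover: exchange randomly selected subtrees of the two parents."""
    p1, s1 = random.choice(list(subtrees(t1)))
    p2, s2 = random.choice(list(subtrees(t2)))
    return replace_at(t1, p1, s2), replace_at(t2, p2, s1)
```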
Standard parameters in GP (1)

Qualitative variables:
• Initialisation: ramped half-and-half.
• Fitness: adjusted fitness is used.
• Selection:
  • fitness proportionate,
  • elitist strategy is not used,
  • over-selection is used for populations of M ≥ 1000.
Over-selection for population size = 1000:
• rank population by fitness and divide it into two groups:
  group 1: best c = 32% of pop, group 2: other 68%
• 80% of selection operations choose from group 1, 20% from group 2
• for pop. size = 2000, 4000, 8000 the portion c is c = 16%, 8%, 4%
• motivation: to increase efficiency; %'s come from a rule of thumb
Standard parameters in GP (2)

Major numerical parameters:
• Population size M = 500.
• Maximum number of generations G = 51.
Minor numerical parameters:
• Probability pm of mutation = 0% !!!
• Probability pr of reproduction = 10%
• Probability pc of crossover = 90%
• Probability pip of choosing internal points for xover = 90%
• Maximum size Di for initial random S-expressions = 6
• Maximum size Dc for S-expressions during the run = 17
… and some "exotic" options usually set at 0 (e.g. permutation, editing, encapsulation, decimation)
Simple symbolic regression (1)

Given a number of sample points in ℝ²:
  (x1, y1), (x2, y2), … , (xn, yn)
Find a one-dimensional numerical function f(x) such that:
  ∀ i ∈ {1, …, n} : f(xi) = yi
In the present test 20 sample points are generated by:
  x⁴ + x³ + x² + x
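A minimal sketch of the fitness evaluation for this problem; the sample x-values and the absolute-error measure are my assumptions, and eval_tree works on the nested-tuple trees from the earlier sketches:

```python
def eval_tree(node, x):
    """Evaluate an arithmetic parse tree at the point x."""
    if node == "x":
        return x
    if not isinstance(node, tuple):
        return float(node)                 # numeric constant leaf
    op, a, b = node
    a, b = eval_tree(a, x), eval_tree(b, x)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

# 20 sample points generated by the benchmark x^4 + x^3 + x^2 + x
target = lambda x: x**4 + x**3 + x**2 + x
samples = [(i / 10.0, target(i / 10.0)) for i in range(20)]

def error(tree):
    """Sum of absolute errors over the samples; 0 means a perfect solution."""
    return sum(abs(eval_tree(tree, x) - y) for x, y in samples)
```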
Simple symbolic regression (2)

(table: specification of the GP for the symbolic regression problem)
Simple symbolic regression (3)

(figure: graphical representation of an individual and the benchmark formula x⁴ + x³ + x² + x)
Simple symbolic regression (5)

Best individual representing a perfect solution (parse tree over +, •, -, cos shown in the figure):

X + (X • (X + (X • (X • (X + (cos(X - X) - (X - X))))))) =
X + (X • (X + (X • (X • (X + 1))))) =
X⁴ + X³ + X² + X

(since cos(X - X) = cos 0 = 1 and X - X = 0)