Transcript Selection

Evolutionary Computation
Stefano Cagnoni
University of Parma, Italy
Evolution
In every population in nature mutations occur from
time to time.
Mutations may generate individuals who are fitter,
with respect to the environment.
These will survive longer (natural selection)
producing more numerous offspring (reproduction).
Their offspring partly share their parents’ genetic
characters (chromosomes, made up of genes),
partly define new types, obtained by mixing such
characters (crossover).
Evolution
The following generations are more likely to
have the same characters as the individuals
who have the highest fitness with respect to the
environment.
An individual’s genetic code is called genotype.
The manifestation of the characters encoded in
such code (the individual) is called phenotype.
Evolutionary Computation
In science:
• Verification of hypotheses in biology, sociology,
religion, etc. through simulations
In engineering:
•
•
•
•
Function Optimization
Combinatorial Optimization
Machine Learning
More generally, search for good solutions to hard
problems
Nature/Computing Correspondences
Individual
Solution to a problem
Population
A set of solutions
Fitness
Quality of a solution
Chromosome*
Representation of a solution
Gene*
Component of a representation
Crossover, Mutation
Operators used to search
solutions
Natural Selection
Re-use of good solutions
Evolution
Search of good solutions
*only for Genetic Algorithms
Components of an evolutionary
algorithm
1. Representation (encoding)
2. Evaluation Function (fitness function)
3. Population
4. (Parents’) Selection Strategy
5. Operators (Modification/recombination)
6. Survivors’ Selection Strategy (Substitution)
7. Initialization Strategy
8. Termination Condition
General evolutionary algorithm
1. Initialize a population
2. Evaluate population’s fitness
3. Repeat:
a. select a population subset, based on fitness
b. from the selected individuals, generate a new
population using the modification/recombination
operators
c. Evaluate the new population’s fitness until a
termination criterion is met
Genetic Algorithms
In a genetic algorithm, new solutions are obtained
operating on their encoding: in genetic terms, only the
genotype is affected (as in nature). A genotype-tophenotype decoding needs therefore to be defined.
Chromosomes are represented as strings of symbols,
e.g. 0’s and 1’s.
Individuals may be anything that can be represented by
a string of symbols.
Phenotypes may be, for example, vectors of parameters,
list of possible choices, etc.
Basic Genetic Algorithm
1.
Generate a random population of chromosomes.
2.
Decode each chromosome to obtain an individual.
3.
Evaluate each individual’s fitness.
4.
Generate a new population, partly by cloning
(copying), partly by recombining, partly by
mutating the chromosomes of the fittest individuals.
5.
Repeat 2,3,4 until a termination condition is met.
Representation (n-bit chromosome)
Numbers
Integer
(from 0 to 2n-1, from K to K+2n-1,
from 0 to M con M2n-1)
Real
Elements belonging to finite sets
Vectors of numbers or parameters
Representation
Similar representations must represent similar entities
Gray Code
Representations of consecutive integers differ by 1 bit
0
1
2
3
Gray
000
001
011
010
Bin
000
001
010
011
4
5
6
7
Gray
110
111
101
100
Bin
100
101
110
111
Inverting one bit produces small changes.
When the change is large, it is larger than with the
traditional binary encoding.
Fitness Function
Fundamental hypotheses
1. A measure Q exists for the quality of a solution.
2. Q is positive
3. It has to be maximized
4. An individual’s Q is its fitness
Population
A population is a multiset (a set which admits the
presence of more copies of the same element) of
solutions.
It is characterized either by its size (number of
individuals) or, possibly, by a spatial structure according
to which individuals are arranged.
The population size is most often kept constant through
the generations.
The population diversity is the number of different
individuals which are contained in it.
Selection
The strategy according to which individuals (actually,
their genotype, represented by the chromosomes) are
selected for reproduction.
To simulate natural selection, higher-fitness
individuals have higher probability to be selected.
Different selection strategies exist, some of which are
not biologically plausible.
Usually in a genetic algorithm:
1. A set of solutions is selected for mating (mating pool)
2. Pairs of individuals are randomly extracted from the
mating pool and are coupled (sexual reproduction)
Selection
Fitness-proportionate selection
The most commonly-used selection strategy.
Each individual is assigned a probability to be
selected, which is proportional to its fitness
pi = fi/Skfk
NB It is properly a probability, since Sipi = 1
Selection
Implementation (fitness-proportionate selection)
Suppose 4 individuals have fitness
f1=f2=10
f3=15 f4=25
Then (probability of selection):
p1=p2=1/6
p3=1/4
p4=5/12
Selection
Implementation (fitness-proportionate selection)
1. Roulette-wheel strategy
Each individual is assigned a wheel sector, whose
size is proportional to its fitness. Every position of the
arrow corresponds to a number. A random number is
extracted and the individual which ‘owns’ the region
where the arrow is pointing, is selected.
Selection
Other implementations (fitness-proportionate selection)
2. Vector of size N
112233344444
0
N-1
Each individual is represented a number of times proportional
to its fitness. A random number from 0 to N-1 is generated
and the individual corresponding to the vector value in that
position is selected.
3. Real number between 0 e Sjfj
The fitness values are ‘enqueued’, and a random number r in
that interval is extracted. The individual is selected such that
Sj=1,i-1 fj  r < Sj=1,i fj
Selection
Problems
Premature convergence
If an individual’s fitness is much higher than the average
fitness of the population, but much lower than the
optimum fitness, it will tend to be repeatedly selected so
that a mediocre uniform population is generated.
Stagnation
If all individuals have similar fitness, they tend to have
the same the same selection probability, causing the
search to become random.
Selection
Rank selection
Individuals are ranked by fitness (in decreasing order).
A probability distribution function, decreasing with rank, is
defined, independent of fitness values.
Advantages
• No premature convergence: no individual’s selection
probability is much higher than any other individual’s
• No stagnation: the probability distribution does not change.
Disadvantages
Computationally heavier.
Note: it is not biologically plausible.
Selection
Tournament selection
To select each individual, a random set of individuals
is picked, the best of which is selected.
Advantages
Same as rank-based selection, with no need for
ordering.
Selection
Elitist Selection
At least one copy of the best individual is kept in the
new generation.
Advantages
Good solutions do not get lost due to ‘random’
selection strategies
Disadvantages.
If the best individual’s characters become dominant,
this may lead to premature convergence.
Survivors’ selection (substitution)
If m is the population size and l the number of offspring which are
generated, from m parents plus l offspring, m individuals must be
selected which will compose the next generation.
If l = m we have a generational algorithm, if l < m we have a steady
state algorithm.
Selection strategies may be based on fitness or be independent of it.
Age-based strategies
Independently of fitness, each individual survives for a pre-set
number of generations. Implementation is trivial if l = m (the whole
generation t+1 is made up of all offspring of generation t).
If l < m fitness should be taken into account (with a random
substitution the probability to lose the best individual is very high).
Survivors’ selection (substitution)
Fitness-based strategies
The same strategies used to define the mating pool can be
used (fitness proportionate, rank-based, tournament).
It is also possible to consider age, requiring, for instance (if l
< m ) that all offspring make up the next generation, along with
the best m - l parents.
The replace-worst strategy replaces the worst l individuals.
Elitist strategies require that the best individual of
generation t be present in generation t+1.
Genetic Operators: Crossover
Offspring is generated by recombining genetic
material of individuals comprised in the mating pool.
This is called crossover or recombination.
Crossover generates, as with sexual reproduction in
nature, new individuals whose genetic code derives
partly from one parent and partly from the other one.
Genetic operators: Crossover
SINGLE-POINT CROSSOVER
1110001101001010
1000100111001010
PARENTS
1110000111001010
1000101101001010
OFFSPRING
A point within the genome is randomly chosen and the right or
left sections are swapped
TWO-POINT CROSSOVER
1110001101001010
1000100111001010
PARENTS
1110100111001010
1000001101001010
OFFSPRING
The string is circular. Two ‘cuts’ are made and the internal or
external sections are swapped
Genetic operators: Crossover
UNIFORM CROSSOVER
1110001101001010
1000100111001010
PARENTS
1010000111001010
1100101101001010
OFFSPRING
Each bit is randomly selected from one of the two
parents for the first child and from the other parent for
the second one.
The same operators can be used if a representation
based on vectors of integers is used.
Genetic operators: Crossover
If the representation is based on floating point vectors the
operators are conceptually similar. However, instead of selecting
the representation elements by copying them from one of the two
parents x and y, they perform a weighted sum of their values.
Simple Recombination: a point k is selected and a < 1
Child 1: < x1, x2, … , xk , a yk+1 + (1-a) xk+1, …. , a yN + (1-a) xN >
Child 2, same as Child 1 but x and y are swapped
Single Recombination: a point k is chosen and a < 1
Child 1: < x1, x2, … , a yk + (1-a) xk, xk+1 …. , xN >
Child 2, same as Child 1 but x and y are swapped
Full recombination
Child 1: a y + (1-a) x
Child 2: a x + (1-a) y
Crossover between permutations
Partially-mapped crossover (PMX):
1. Two points are chosen and the values within them are copied into C1
P1 : 123456789
P2: 937826514
C 1:
...4567..
2. Starting from the first point, the elements of P2 , comprised between the
two selected points, which have not been copied yet are considered
3. For each of them (i) the element (j) of C1 which occupies the
corresponding position is considered
4. i is moved into the position occupied by j in P2
C1: ...4567.8
5. If the position occupied by j in P2 has already been taken in C1 by k, i is
moved into the position occupied by k in P2
C1: ..24567.8
6. The other elements of C1 are directly copied from P2
C1: 932456718 . C2 is created similarly, swapping the parents.
Genetic Operators: Mutation
Mutation is aimed at maintaining genetic diversity to try
and explore also regions in the search space which are
not ‘occupied’ by the present population.
Mutation for binary representations
A bit is chosen randomly and is inverted.
1001010010101
1000010010101
For integer representations it is possible to substitute a
gene with a valid random value, or add a positive or
negative quantity to it, from a probability distribution
having its maximum in zero.
Genetic Operators: Mutation
Mutation for floating point representations
From <x1,x2,. . ., xn>, xi[Li,Ui], we generate <x’1,x’2,. . ., x’n>,
x’i[Li,Ui] by substituting values randomly
Uniform Mutation
All elements are substituted by a random value belonging to
the same interval.
Non-uniform Mutation with fixed distribution
Each element of the new vector is generated by adding a
random number belonging to a distribution centered in zero e
decreasing with the absolute value (e.g., Gaussian G(0,s)).
Genetic Operators: Mutation
Mutation for permutations
Two random points are chosen and ….
By swap
123456789
153426789
By insertion
123456789
125346789
By shuffle
123456789
153246789
By inversion
123456789
154326789
Parameters of a genetic algorithm
• Population size
• Termination criterion
• max number of iterations
• fitness threshold (evolution ends if a sufficiently good
solution is found)
• Distribution of the genetic operators:
• probability of clonation (survival)
• crossover probability
• mutation probability
Genetic Programming
Genetic
algorithm,
applied
to
a
different
representation, semantically very different.
Functions are evolved, represented as syntactic trees
Functions (tree nodes)
Terminal symbols
(tree leaves)
Genetic Programming
Tree nodes are functions; tree leaves are terminal
symbols (constants, pointers or data)
Operators must satisfy the closure requirement, which
requires that the same data type (defined over the same
domain) be used for the input and outputs of all nodes so
that any permutation of nodes and of terminal symbols
creates a valid tree (a typed variant also exists, anyway).
The function is evaluated by traversing the syntactic tree.
If a tree is traversed in pre-order, LISP-like code (prefix
notation) is obtained, if it is traversed symmetrically the
function is represented in algebraic notation (infix
notation)
Genetic Programming
Operators must be adapted to fit the representation
Mutation
Genetic Programming
• The properties of genetic operators are very
different from those of the corresponding operator
applied to strings
• The result of mutation may be a tree whose
depth is higher than the original depth
• The result of crossover can be a tree whose
depth is higher than both parents’ depth
• Genetic Programming is a genetic algorithm,
anyway: it is controlled by the same parameters
Genetic Programming
Genetic programming works at a higher level of
abstraction with respect to genetic algorithms:
• In GAs the representation has always symbols
(bits, most frequently) as atomic elements. It is
only necessary to define the semantics of
representation
• In GP the atomic elements need to be defined as
well, i.e. the set of terminal symbols and of the
functions that can be used to build the trees
Genetic Programming
It is necessary to define:
• A function set F (tree nodes) for each of which arity
(number of arguments) must be specified
• A terminal symbol set T (tree leaves)
that meet the following requirements:
• All elements of T are correct expressions
• If f  F is a function with arity n and (e1,…., en) are
correct expressions, then also f(e1,….,en) is correct
• There are no other possible correct expressions
These conditions are equivalent to the closure
requirement: for each function/node, domain and codomain are coincident.
Genetic Programming
The final problem to be solved is constant definition
• A possible solution (Ephemeral Random Constants)
consists of using a constant terminal symbol defined
within a certain interval;
• When a new tree is generated (initialization), i.e., the
first time an ERC is used, it is assigned a random
value within its domain.
• For every subsequent use, an ERC behaves as a
constant, keeping the same value to which it has
been initialized
Genetic Programming
Selection
The same strategies used for GAs are used. However,
since large population are usually generated, an overselection strategy is also used:
• Population is ordered by fitness and divided into two
groups, the former containing x% of the population,
the latter the remaining (100-x)%
• 80% of the individuals is selected from the first
group, the 20% from the second one.
Genetic Programming
Initialization
The so-called ramped half-and-half initialization is
generally used
A maximum depth Dmax for the trees is set, then the
population is initialized using one of the following methods
with equal probability:
• Full method: each branch has depth Dmax; each element is
taken from F if depth D < Dmax, from T otherwise.
• Grow method: trees can be unbalanced, so each element
is taken from F U T, if D < Dmax, from T otherwise.
Genetic Programming
Bloat
Since GP uses a variable-length representation, tree depth
tends to increase with time
This phenomenon is called bloat or, as a joke, “survival of
the fattest” (as opposed to “survival of the fittest”).
Possible remedies:
• Operations which produce individuals with depth D >
Dmax are forbidden
• A term is added to the fitness function, which penalizes
the largest trees (Es. F = Err% + 0.0001 * Size)
(parsimony pressure)