genetic algorithms
Download
Report
Transcript genetic algorithms
Genetic Algorithms
Beasley, Bull and Martin,
An Overview of Genetic Algorithms:
Part 1, Fundamentals &
Part 2, Research Topics
University Computing, 1993
Background of GAs
GAs are based on genetic processes of biological
organisms, i.e. evolution according to principles of
natural selection and survival of the fittest
In nature, individuals in a population compete
with each other for resources and to attract a mate
The fittest ones survive and produce offspring,
spreading their genetic properties to population
Combination of good properties may in time
produce “superfit” offspring
GAs were first proposed by Holland (1975)
Basic notions of GAs
GAs work with a population of individuals, each
representing a solution to the problem at hand
Each individual is assigned a fitness score
Highly fit individuals are selected as parents and
given an opportunity to reproduce by crossover,
leading to exploration of the most promising
regions in the search space
Offspring produced share features taken from their
parents and may be subject to mutation
Basic notions of GAs (cont.)
Best individuals are selected (from parents and/or
offspring) to form the next generation
Over many generations, good features spread
throughout the population, being mixed and
exchanged with other good features
This leads to convergence to a good solution
GAs are robust and used for a variety of problems
The main area for GAs is difficult problems for
which there are no specialized techniques
A generic GA
Generate an initial population of size population_size
Compute fitness of each individual
Repeat
Repeat for population_size/2
Select two parents based on fitness values
Recombine the parents to produce two offspring by
applying crossover (with rate/probability pc)
and mutation (with probability pm)
Compute fitness values of offspring
Form the population for the next generation
Until population has converged
Decisions in a GA
Chromosome representation or coding of a
solution
Fitness function
Population size, generation of initial population
Parent selection for reproduction
Crossover rate/probability (pc), crossover operator
Mutation probability (pm), mutation operator
Forming the population for next generation
Stopping (convergence) condition
Coding or representation
A potential solution to a problem may be coded or
represented by a set of variables
In GAs, each of these variables (solution
components) is called a gene
A string of genes, representing a complete
solution, is called a chromosome
The set of variables represented by a chromosome
is called genotype, solution constructed using
these variables is called phenotype
The ideal representation scheme is binary coding
Examples of binary coding
Optimization of function f(x,y,z)
5
0
x
,
y
,
z
2
1 63 )
(assuming
Phenotype: x=46, y=24, z=13
Genotype: 0 1 1 1 0 1 | 0 0 0 1 1 0 | 1 0 1 1 0 0
Examples of binary coding (cont.)
Assignment problem
Phenotype: facility f1 is assigned to location l1, f2
to l4, f3 to l2
Genotype:
location
1 2 3 4
1 1 0 0 0
facility 2 0 0 0 1
3 0 1 0 0
Examples of binary coding (cont.)
TSP
Phenotype: the tour 1-3-2-4-1
Genotype:
to city
1 2 3 4
1 0 0 1 0
2 0 0 0 1
from city
3 0 1 0 0
4 1 0 0 0
Fitness function
Fitness function returns a single numerical fitness
for a chromosome
Fitness value is used in probabilistic selection of
parents for reproduction; usually the higher the
fitness, the higher the probability of selection
Fitness function may simply be the objective
function where we optimize a single criterion
It may also be a more complicated measure
involving multiple criteria and penalties for
infeasibility
Fitness function (cont.)
Fitness function should be smooth and regular
(similar chromosomes must have close fitness)
It should reflect real value of a chromosome
Let the magnitude of penalty reflect the “amount”
of constraint violation, e.g. how much will it cost
to convert the chromosome into a valid one?
Use approximate function evaluation when
Evaluating true fitness is too costly
Fitness function is stochastic
Population size and generation of
initial population
Increasing the population size usually increases
solution quality but requires more computation
Common population sizes are 20, 30, 50, 100
In any case, population size is such a small
fraction of the search space that increasing it
further is not justified
Initial population can be generated
Randomly
Completely or partly (called seeding) by heuristic(s)
Reproduction
Reproduction involves
Parent selection
Recombination by using crossover and mutation
operators
Crossover is the more important of the two for
rapidly exploring the search space
Mutation provides a small amount of random
search and ensures that every point in search space
is accessible
Parent selection
Some individuals from the population are selected
to form a mating pool (multiple copies of the same
individual may be allowed)
Size of the mating pool depends on the crossover
rate/probability and the replacement scheme
Two extremes are
For 100% replacement: Size of the mating pool is the
same as the population size, pc=1.0
For steady-state replacement: Two parents are selected
to reproduce two offspring, which replace two worst
parents
Parent selection (cont.)
One idea is to allocate the number of reproductive
trials (the number of times an individual is copied
into the mating pool) or to assign selection
probability to individuals in proportion to their
fitness, e.g. for maximization
fi
# of reproducti ve trials for individual i
f
f i f min
selection probabilit y for individual i
f max f min
Parent selection (cont.)
The number of reproductive trials may not be
integer in which case we can use a stochastic
sampling method, e.g. if reproductive trials for
individuals i and j are 1.8 and 1.2 then each will
have one copy in the mating pool and the third
will be either i or j with remainder probabilities
Selection probabilities may not add up to 1.0 in
which case we can normalize.
Parent selection (cont.)
The above would have worked with infinite
population size but with finite population it may
cause a few highly fit individuals to dominate the
population rapidly (premature convergence)
Conversely, the population may converge after
many generations, but without precisely locating
the optimum due to insufficient gradient in fitness
function to push the GA towards the optimum
(slow finishing)
Fitness remapping
Fitness remapping is used to avoid
Premature convergence, by compressing the range of
fitness values
Slow finishing, by expanding the range of fitness values
Selection pressure (ratio of maximum to average
reproductive trials allocated) can be adjusted by
explicit or implicit fitness remapping
Fitness remapping (cont.)
1. Fitness scaling: Bring the maximum number of
reproductive trials allocated to an individual to
desired level by shifting the fitness values, i.e.
f max s f max
f s
f
2. Fitness windowing: Same as above where s is the
minimum fitness observed during the last n
(typically 10) generations
Fitness remapping (cont.)
3. Fitness ranking: To avoid using extreme fitness
values, sort individuals according to raw fitness,
then assign reproductive trials according to rank
(found to be superior to scaling)
4. Tournament selection: Select a pair of
individuals from the population at random (with
replacement), copy the better one into the mating
pool, repeat until the pool is full
5. Probabilistic tournament selection: Better
individual is selected with a probability > 0.5.
Crossover operator
Typically, crossover takes two parents, cuts their
chromosome strings at a randomly chosen
position, swaps the head (or tail) segments to
produce two offspring
Crossover is not usually applied to all pairs of
parents selected for mating
Likelyhood of crossover being applied to a pair,
pc, is usually between 0.6 and 1.0
If crossover is not applied, offspring are produced
by duplicating their parents (no disruption)
Crossover operator (cont.)
Alternatively, taking pc as constant crossover rate,
pc x population_size/2 pairs are selected, and
crossover is applied to all selected pairs
Most common crossover operators for binary
representation are:
1-point crossover
2-point crossover (chromosome is viewed as a loop
rather than a string)
Uniform crossover
1- and 2-point crossover
1-point
crossover
Parent 1:
Parent 2:
1010|001110
0011|010010
Offspring 1: 1 0 1 0 | 0 1 0 0 1 0
Offspring 2: 0 0 1 1 | 0 0 1 1 1 0
2-point
crossover
Parent 1:
Parent 2:
1010|001|110
0011|010|010
Offspring 1: 1 0 1 0 | 0 1 0 | 1 1 0
Offspring 2: 0 0 1 1 | 0 0 1 | 0 1 0
Uniform crossover
A binary crossover mask is used to determine
which gene will be taken from which parent
Parent 1:
Parent 2:
1010001110
0011010010
Crossover mask:
1001011100
Offspring 1:
Offspring 2:
1010001110
0011010010
Mutation operator
Mutation is applied to every offspring by altering
each binary gene with a small probability, pm
(typically 0.001)
Offspring:
1010010010
Mutated offspring:
1010000010
Alternatively, the entire chromosome may be
mutated at once by a higher pm, particularly when
a non-binary representation and problem specific
genetic operators are used
Forming population for next
generation (replacement)
After two offspring are produced and mutated,
they may replace their parents
Unconditionally (a generation gap of 100%)
If they are more fit than their parents
Alternatively, all parents and offspring may be
sorted together according to their fitness, and the
best population_size of them may be selected
Steady-state replacement replaces only a few
parents (two worst parents by two best offspring)
Convergence
A gene is said to converge when 95% of the
population share the same value
The population is said to converge when all genes
have converged
To monitor convergence, plot population average,
population best and incumbent solution
throughout the generations
As the population converges, average fitness
approaches the best
Convergence (cont.)
Why GAs work: Schemata
A schema is a pattern of gene values represented
by a string of characters in the alphabet {0, 1, #}
where # matches anything
For example, the chromosome 1010 contains,
among others, the schemata 10##, #0#0, ##1#,
101#
Order of a schema is the number of non-# symbols
it contains (2, 2, 1, 3 in the example)
Defining length of a schema is the distance
between the outermost non-# symbols (2, 3, 1, 3)
Schema theorem
It is assumed that an individual’s high fitness is
due to the good schemata it contains
Holland (1975) showed that, under simplifying
assumptions, the optimum way to explore the
search space is to allocate reproductive trials to
individuals in proportion to their fitness values
In this way, good schemata receive an
exponentially increasing number of reproductive
trials in successive generations (this is called the
schema theorem)
Schema theorem (cont.)
Holland also showed that, since each individual
contains many different schemata, the number of
schemata effectively processed in each generation
is in the order of population_size3
This property is known as implicit parallelism, and
is one of the explanations for the good
performance of GAs
Binary coding is thought to be ideal because these
theoretical results are valid for binary coding
Building block hypothesis
Goldberg (1989) claims that power of the GA lies
in it being able to find good building blocks
Building blocks are schemata of short defining
length consisting of genes that work well together,
and improve performance when incorporated into
an individual
Short defining length is needed so that building
blocks are disrupted less by random cut points in
1- or 2-point crossover
Building block hypothesis (cont.)
Hence a successful coding scheme encourages
formation of building blocks by ensuring that
Related genes are close together on the chromosome
There is little interaction between genes
Interaction (epistasis) means that contribution of a
gene to the fitness depends on values of other
genes in the chromosome
There is always some interaction between genes in
multimodal fitness functions, and the above
conditions are not easy to satisfy
Exploration and exploitation
Any good search algorithm must find a tradeoff
between exploration and exploitation, e.g. random
search does only exploration whereas traditional
descent does only exploitation
Holland showed that a GA does both
simultaneously in an optimal way, assuming that
Population size is infinite
Fitness function accurately reflects the solution’s utility
Genes in a chromosome do not interact significantly
Exploration and exploitation
(cont.)
The first assumption can never be satisfied in
practice; GA’s “population” is only a sample and
stochastic error is unavoidable
Genetic drift: Even in the absence of any selection
pressure (i.e. a constant fitness function), the GA
will still converge if, by chance, a chromosome
becomes predominant in the population
For the GA to properly exploit, the fitness function
must provide a sufficiently large slope to
counteract the genetic drift
Mutation can be useful in avoiding genetic drift
Comparison of GAs with others
Random search does only exploration, traditional
ascent (hillclimbing) does only exploitation, GA
does both.
Iterated hillclimbing with random restarting points
allocates its trials evenly over the search space,
GA allocates increasing trials to promising regions
SA and TS deal with one candidate solution at a
time, GA has a population and implicit parallelism
TS is usually deterministic, GA is stochastic
SA does not have memory, TS does, GA?
Part 2: Crossover revisited
2-point crossover is better than 1-point because a
chromosome, when consired as a loop, can contain
more building blocks
Schemata of a particular order are equally likely to
be disrupted by uniform crossover, irrespective of
their defining length
Schemata with long defining length are more
likely to be disrupted by 2-point crossover,
irrespective of their order
Crossover revisited (cont.)
Schemata with short defining length are more
likely to be disrupted by uniform crossover, but
the same is not true for longer defining length (?)
Hence, total amount of schemata disruption may
be lower with uniform crossover
Ordering of genes in the chromosome is not
important with uniform crossover, hence it is more
robust than 2-point crossover
Theoretical and empirical results show that there
is no overall winner
Mutation revisited
Mutation is traditionally regarded less important
than crossover and used to provide a small amount
of random search and to avoid genetic drift
However, asexual reproduction can also result in
successful evolution
Naive evolution (just selection and mutation)
results in slower evolution than crossover alone,
but it may find better solutions at the end
Indeed, as the population converges, mutation
becomes more productive than crossover
Inversion and reordering
Order of genes on a chromosome is critical for the
building block hypothesis to work effectively
Purpose of inversion/reordering is to find gene
orderings that have better evolutionary potential
Inversion is a special form of reordering which
reverses the order of genes between two randomly
selected positions
Reordering does not lower epistasis; nor does it
help when linear ordering of genes is not possible
Reordering also expands the search space
More on epistasis (interaction)
In biology, a gene is epistatic when its presence
suppresses the effect of a gene at another position
Even when individual genes are not epistatic, there
will be “chains of influence” (one gene’s product
affects another gene’s function)
Hence, interaction among genes is unavoidable
Interaction is inherent in some problems, e.g.
In TSP, it is the relationship (distance) between cities
that counts, not the cities themselves
Two facilities cannot be assigned to the same location
Deception
Normally, short, low-order schemata contained in
global optimal solution are expected to increase in
frequency throughout the evolution
If schemata not contained in optimal solution have
higher fitness, then they will increase in frequency
faster, and the GA will be misled
Deception is a special case of epistasis
Deceptive problems may be difficult to solve, but
the bias introduced in average fitness estimation
after the first generation may help solve them
Tackling epistasis
Epistasis can be tackled in two ways
As a coding problem
As a GA theory problem
If taken as a coding problem, the solution is to
find a different coding scheme and to develop
appropriate genetic operators, e.g.
Goldsberg’s order-schemata and PMX crossover
Expansive coding which uses a larger number of
weakly interacting genes (larger search space) instead
of a small number of strongly interacting genes
Tackling epistasis (cont.)
We will see such examples for ordering problems
when we discuss genetic operators for TSP
When treated as a GA theory problem, a new
theory (and new algorithms) may have to be
developed, which takes epistasis into account
Although Holland’s convergence proof assumes
low epistasis, there may be a weaker proof for
domains of high epistasis
Non-binary representations
Binary representation, where each gene has a
cardinality of two, is traditionally believed to give
the largest number of schemata and to provide
highest degree of implicit parallelism
Recently, higher-cardinality representations are
claimed to contain more schemata; they can
perform well
Integer or real numbers can be used as highcardinality alphabets, and meaningful problem
specific genetic operators can be defined easily
Non-binary representations
(cont.)
Examples of non-binary crossover operators
Take arithmetic average of the two gene values
Take geometric mean (square root of the product)
Take the difference between the two gene values, add it
to the higher or subtract it from the lower
Examples of non-binary mutation operators
Replace the current value with a random one
Add or subtract a small random amount (creep)
Multiply by a random amount close to one (geometric
creep)
Dynamic operator probabilities
Crossover probability, pc, and mutation
probability, pm, may vary during the evolution
For better exploration, pc may decrease and pm
may increase during the run according to a fixed
schedule, e.g. linearly
For convergence, pm may decrease exponentially
(similar to the temperature in SA)
pc and pm can be adjusted dynamically, depending
on the spread of fitness values, e.g. increase pm as
the spread decreases (as the population converges)
Dynamic operator probabilities
(cont.)
Probability of the more successful operator can be
increased, e.g.
Monitor the fitness improvement due to crossover and
mutation operators over the last n reproductive trials
Give more weight to the more successful operator
For each reproductive trial, choose one of the operators
probabilistically according to its weight
Different crossover and mutation operators may
also be weighted in a similar manner
Niche and speciation
In nature, different species evolve to fill different
ecological niches
Speciation is the process by which a single species
differentiates into two or more different species
Niches are analogous to alternative maxima of
fitness values in GAs
Normally, a GA cannot find these alternatives
because of genetic drift and convergence (a GA
does not allow speciation and the entire population
end up in the same niche)
Niche and speciation (cont.)
To solve this problem, we should
Maintain diversity by encouraging speciation
Share the payoff associated with a niche
Preselection: Offspring replace the parents only if
the offspring’s fitness is higher than that of the
inferior parent (this maintains population diversity
since similar individuals replace each other)
Crowding: Offspring is compared with a few
randomly selected individuals and replaces the
most similar one (again for diversity)
Niche and speciation (cont.)
Restricted mating: Individuals are allowed to mate
only if they are similar (this encourages
speciation); offspring of two highly fit but
dissimilar parents may be unfit
Multiple subpopulations: Population is divided
into subpopulations, each evolving in itself, and
migration is allowed at a limited rate (again for
speciation)
Local mating: Similar to multiple subpopulations,
but without explicit boundaries
Niche and speciation (cont.)
Sharing: Similar individuals that are in the same
niche share the fitness payoff among them
A full niche is no longer rewarding since the payoff is
shared and individual fitness values are reduced
Sharing distributes individuals to peaks in fitness
function in proportion to the height of the peak
Sharing is found to be superior to crowding
Sequential niches: Multiple GA runs are made,
each locating a new peak (previously found peaks
are cancelled out from the fitness function)
Diploidy and dominance
Diploidy: Higher lifeforms like mamals have two
sets of genes; of a pair of genes, one is dominant
and the other is recessive
Diploidy allows two solutions to be remembered
instead of one, and provides higher diversity
Potentially useful gene sets can be maintained in
recessive position
An extension can be to keep the best individuals
(elite solutions?) and try reintroducing them to the
population if the performance falls
Problem specific knowledge
Binary coding, random initial population, and
traditional crossover/mutation operators follow the
biological process more closely (are generic) but
do not make use of problem specific knowledge
Using problem specific knowledge, we can
Find more suitable representation schemes
Generate initial population using heuristics
Develop problem specific genetic operators that
guarantee feasibility, particularly in ordering problems
Use local improvement as a form of mutation