Lecture 10: Learning - Genetic algorithms
Intro to AI
Genetic Algorithm
Ruth Bergman
Fall 2002
Imitating Nature
Aspects of the evolution of organisms:
• Organisms that are ill-suited to their environment
have little chance to reproduce (natural selection)
• Conversely, the best-fitting organisms have a better chance to
survive and reproduce
Reproduction:
• Offspring are similar to their parents
• Random mutations occur, and they can produce better- (or worse-)
fitting individuals
“On the Origin of Species by Means of Natural Selection”, C.
Darwin (1859)
Encoding:
• An organism is fully represented by its DNA, a string over a
finite alphabet of 4 symbols (A, C, G, T)
• Each element of this string is called a gene (a simplification:
biologically, a gene spans many such symbols)
Genetic Algorithm (GA)
• Developed by John Holland in the early 1970s
• Optimization and machine learning techniques
inspired by the process of natural evolution and
evolutionary genetics
– Solutions are encoded as chromosomes
– Search proceeds by maintaining a population of
solutions
– Reproduction favors “better” chromosomes
– New chromosomes are generated during reproduction
through processes such as mutation and crossover
GA Framework
[Figure: the GA cycle. A population of 5-bit chromosomes drawn from the search space (A = 01000, B = 10110, C = 11010, D = 01011) goes through fitness evaluation, selection, crossover (e.g. B and D produce 10011 and 01110), mutation, and reproduction.]
GA Procedure
• Start with a population of N individuals
1. Apply the fitness function to all the individuals
2. Select pairs of individuals for reproduction (repetition
allowed)
3. Each pair generates two children (reproduction with crossover)
4. Apply random mutation to the children; the children become
the next generation
5. Repeat steps 1-4 until a termination criterion is met
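The five steps above fit in one short loop. This is a minimal Python sketch, assuming the 8-bit "equal number of 1s and 0s" problem used later in these notes; the names `fitness`, `select_pair`, `crossover`, `mutate`, and `run_ga` are hypothetical. It stops when an individual reaches the known maximum fitness (an assumption that only works when the optimum value is known in advance) or after `max_gens` generations.

```python
import random

N_BITS = 8  # chromosome length (assumed: the 8-bit example problem)


def fitness(h: str) -> int:
    # f(h) = 8 - |n1 - n0|: equals 8 when h has four 1s and four 0s
    n1 = h.count("1")
    return N_BITS - abs(n1 - (len(h) - n1))


def select_pair(pop, scores, rng):
    # Step 2: fitness-proportional choice of two parents, repetition allowed
    if sum(scores) == 0:
        return rng.choices(pop, k=2)  # every fitness is 0: choose uniformly
    return rng.choices(pop, weights=scores, k=2)


def crossover(a, b, rng):
    # Step 3: one-point crossover, c drawn from 1..n (c = n copies the parents)
    c = rng.randint(1, len(a))
    return a[:c] + b[c:], b[:c] + a[c:]


def mutate(h, p_mut, rng):
    # Step 4: flip each bit independently with probability p_mut
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_mut else bit
        for bit in h
    )


def run_ga(n=20, p_mut=0.01, max_gens=100, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(N_BITS)) for _ in range(n)]
    for gen in range(max_gens):
        scores = [fitness(h) for h in pop]  # step 1
        best = pop[scores.index(max(scores))]
        if fitness(best) == N_BITS:  # step 5: optimum reached
            return best, gen
        children = []
        while len(children) < n:  # steps 2 and 3
            a, b = select_pair(pop, scores, rng)
            children.extend(crossover(a, b, rng))
        pop = [mutate(h, p_mut, rng) for h in children[:n]]  # step 4
    return max(pop, key=fitness), max_gens  # step 5: generation limit
```

For example, `best, gen = run_ga(n=20, p_mut=0.01)` returns the best string found and the generation at which the loop stopped.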
Encoding Scheme
• An individual (an organism) is intended to be a
possible solution to the problem you want to solve
• An individual is represented by a binary string. Such
a string is intended to be the complete description of
the individual
• Example:
Suppose you have to find a number between 0 and 255
whose binary representation contains the same number of 1s
and 0s.
An individual is a string of 8 bits, e.g.:
h = 01111110 = 126
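Decoding an individual back to the number it stands for is a base-2 conversion. A minimal sketch, assuming individuals are stored as Python strings of '0'/'1' characters with the most significant bit first; `encode` and `decode` are hypothetical helper names:

```python
N_BITS = 8  # individuals are 8-bit strings, as in the example


def encode(x: int) -> str:
    # Fixed-width binary string, most significant bit first
    if not 0 <= x < 2 ** N_BITS:
        raise ValueError(f"x must be between 0 and {2 ** N_BITS - 1}")
    return format(x, f"0{N_BITS}b")


def decode(h: str) -> int:
    # Interpret the bit string as an unsigned base-2 integer
    return int(h, 2)


print(encode(126))         # 01111110
print(decode("01111110"))  # 126
```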
Fitness Function
• A fitness function says how good a solution is, i.e. how well
an individual fits the environment
• Example:
$f(h) = 8 - |n_1 - n_0|$, where $n_1$ and $n_0$ are the numbers of 1s and 0s in $h$
Note that the fitness function takes its minimum value (0)
when $n_1 = 8$ or $n_0 = 8$, and its maximum value (8)
when $n_1 = n_0 = 4$
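The fitness function translates directly into code. A minimal sketch, assuming the same string-of-bits representation; using `len(h)` in place of the constant 8 is an assumption that only matters for strings of other lengths:

```python
def fitness(h: str) -> int:
    # n1 and n0 count the 1s and 0s in the individual h
    n1 = h.count("1")
    n0 = h.count("0")
    # f(h) = 8 - |n1 - n0| for 8-bit strings (len(h) is 8 here)
    return len(h) - abs(n1 - n0)
```

For instance, `fitness("01111110")` is 4 (six 1s, two 0s) and `fitness("00110110")` is 8.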
The Initial Population
01111110
11111110
00100100
00000001
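The slide does not say how these initial strings were produced. A common choice, assumed here, is to draw every bit independently and uniformly at random; `initial_population` is a hypothetical helper:

```python
import random


def initial_population(n, n_bits=8, seed=None):
    # Each of the n individuals is n_bits independent fair coin flips
    rng = random.Random(seed)
    return ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(n)]


pop = initial_population(4, seed=1)  # four random 8-bit strings
```

Passing a fixed `seed` makes the population reproducible between runs.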
Optimization
• Avoids getting trapped in a local optimum
cf. [Figure: Hill-climbing Method vs. GA Search Method]
Selection
• Roulette wheel selection
– Compute each individual’s contribution to the global fitness as
$P(h_i) = f(h_i) / \sum_{j=1}^{N} f(h_j)$
– Pairs for reproduction are chosen by randomly drawing
individuals (with replacement) according to the distribution P
Individual   Encoding   Fitness   P(h_i)
A            01111110   4         .33
B            11111110   2         .17
C            00100100   4         .33
D            00000001   2         .17
[Figure: Roulette Wheel with slices A 33%, B 17%, C 33%, D 17%]
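Roulette-wheel selection can be written out explicitly: each individual owns a slice of the wheel proportional to its fitness, and each spin lands on one slice. A minimal sketch using the table's population; `roulette_select` is a hypothetical name:

```python
import bisect
import itertools
import random


def roulette_select(pop, fitnesses, k, rng):
    # Individual i owns a slice of width f(h_i) on a wheel of
    # circumference sum_j f(h_j), so P(h_i) = f(h_i) / sum_j f(h_j)
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    edges = list(itertools.accumulate(fitnesses))  # right edge of each slice
    picks = []
    for _ in range(k):  # k independent spins: selection with replacement
        spin = rng.random() * total
        i = bisect.bisect_right(edges, spin)
        picks.append(pop[min(i, len(pop) - 1)])  # guard against rounding
    return picks


pop = ["01111110", "11111110", "00100100", "00000001"]  # A, B, C, D
fit = [4, 2, 4, 2]
print([round(f / sum(fit), 2) for f in fit])  # [0.33, 0.17, 0.33, 0.17]
parents = roulette_select(pop, fit, k=2, rng=random.Random(0))
```

Python's `random.choices(pop, weights=fit, k=2)` does the same job in one call; the explicit version just makes the wheel visible.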
Crossover
– Randomly choose a crossover point c, i.e. a number
between 1 and n (n is the string length)
– Return two children: one composed of the first c bits of the
first parent and the last n-c bits of the second parent, the
other composed of the first c bits of the second parent and
the last n-c bits of the first parent
Example (c = 6):
01111110 and 00100100 → 01111100 and 00100110
11111110 and 00000001 → 11111101 and 00000010
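One-point crossover is two string slices per child. A minimal sketch reproducing the example above; `one_point_crossover` and `random_crossover` are hypothetical names:

```python
import random


def one_point_crossover(p1: str, p2: str, c: int):
    # Child 1: first c bits of p1 + last n-c bits of p2; child 2: the reverse
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    return p1[:c] + p2[c:], p2[:c] + p1[c:]


def random_crossover(p1: str, p2: str, rng: random.Random):
    # c drawn uniformly from 1..n as on the slide (c = n just copies the parents)
    c = rng.randint(1, len(p1))
    return one_point_crossover(p1, p2, c)


print(one_point_crossover("01111110", "00100100", 6))  # ('01111100', '00100110')
print(one_point_crossover("11111110", "00000001", 6))  # ('11111101', '00000010')
```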
Mutation
• Mutation on individuals:
some of the children’s bits are changed (each with a small,
independent probability)
Example: 00111110 → 00110110 (the fifth bit is flipped)
Fitness after mutation:
01100100   $f = 8 - |3 - 5| = 6$
00110110   $f = 8 - |4 - 4| = 8$   (maximum found)
00100110   $f = 8 - |3 - 5| = 6$
10111100   $f = 8 - |5 - 3| = 6$
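Bit-flip mutation visits every bit and flips it with a small, independent probability. A minimal sketch; `mutate` and `p_mut` are hypothetical names:

```python
import random


def mutate(h: str, p_mut: float, rng: random.Random) -> str:
    # Each bit flips independently with probability p_mut
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_mut else bit
        for bit in h
    )


rng = random.Random(0)
child = mutate("00100110", p_mut=0.01, rng=rng)  # usually unchanged at 1%
```

With `p_mut = 0` the string is returned unchanged; with `p_mut = 1` every bit is inverted.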
Stopping Criteria
• Convergence:
– A population is said to converge when all the genes have
converged, i.e. when every bit position has the same value in
at least 95% of the individuals in the population
• Since convergence is not guaranteed, we must
consider other stopping criteria:
– Number of generations
– The fitness of the best individual stays almost constant
– The average fitness of the population stays almost constant
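The convergence test and the other stopping criteria can be checked once per generation. A minimal sketch; the function names, the `window` of 10 generations, and the tolerance `tol` used to decide that a value is "almost constant" are all assumptions, not values from the slides:

```python
def has_converged(pop, threshold=0.95):
    # Every gene (bit position) must show one value in >= threshold of individuals
    n = len(pop)
    for i in range(len(pop[0])):
        ones = sum(h[i] == "1" for h in pop)
        if max(ones, n - ones) < threshold * n:
            return False
    return True


def has_stalled(history, window=10, tol=1e-9):
    # True if the tracked value (best or average fitness per generation)
    # moved by at most tol over the last `window` generations
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tol


def should_stop(gen, max_gens, pop, best_history):
    return gen >= max_gens or has_converged(pop) or has_stalled(best_history)
```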
Parameter Settings
• Population size
– How many chromosomes are in the population
• Too few chromosomes: only a small part of the search space is explored
• Too many chromosomes: the GA slows down
– Recommendation: 20-30, 50-100
• Probability of crossover
– How often crossover is performed
– Recommendation: 80%-95%
• Probability of mutation
– How often parts of a chromosome are mutated
– Recommendation: 0.5%-1%
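The three parameters can be bundled and validated in one place. A minimal sketch; `GAParams` and `make_children` are hypothetical names, the defaults are assumed values picked inside the recommended ranges, and reading the mutation probability as a per-bit rate is also an assumption. The crossover probability is applied with a common convention: with probability `p_crossover` a pair is recombined, otherwise its children are copies of the parents.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GAParams:
    # Defaults are assumed values chosen inside the recommended ranges
    population_size: int = 50  # 20-30 or 50-100
    p_crossover: float = 0.9   # 80%-95%
    p_mutation: float = 0.01   # 0.5%-1%, applied per bit (assumed)

    def __post_init__(self):
        if self.population_size < 2:
            raise ValueError("population_size must be at least 2")
        for name in ("p_crossover", "p_mutation"):
            p = getattr(self, name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name} must be a probability")


def make_children(a: str, b: str, params: GAParams, rng: random.Random):
    # With probability p_crossover the pair is recombined at a random point
    # c in 1..n; otherwise the children are plain copies of the parents
    if rng.random() < params.p_crossover:
        c = rng.randint(1, len(a))
        return a[:c] + b[c:], b[:c] + a[c:]
    return a, b
```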
Genetic Programming
• One of the central challenges of CS is to get
a computer to do what needs to be done,
without telling it how to do it
– Automatic programming (or program synthesis)
• GP is a branch of genetic algorithms
• The main difference between GP and GA
– Representation of the solution (computer program)
• GA: a string of numbers
– fixed-length character strings
• GP: a computer program (e.g. in Lisp or Scheme)
– represents hierarchical computer programs of dynamically
varying sizes and shapes
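To make the contrast concrete, here is a minimal sketch of a GP-style representation, assuming programs are stored as nested Python tuples that mirror Lisp S-expressions; the operator set and the names `OPS`, `evaluate`, and `size` are hypothetical. It shows only how such programs are represented and run, not the GP search itself.

```python
import operator

# A program is a nested tuple in prefix (Lisp-like) order, e.g. the
# S-expression (+ x (* x 3)) is written ("+", "x", ("*", "x", 3))
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    if isinstance(tree, tuple):  # internal node: function applied to subtrees
        op, *args = tree
        return OPS[op](*(evaluate(a, env) for a in args))
    if isinstance(tree, str):  # leaf: variable looked up in env
        return env[tree]
    return tree  # leaf: numeric constant


def size(tree):
    # Number of nodes; unlike a GA chromosome, this varies between individuals
    if isinstance(tree, tuple):
        return 1 + sum(size(a) for a in tree[1:])
    return 1


small = ("+", "x", 1)
large = ("+", "x", ("*", "x", ("-", "x", 3)))
print(evaluate(small, {"x": 2}), size(small))  # 3 3
print(evaluate(large, {"x": 2}), size(large))  # 0 7
```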