Genetic Programming
Download
Report
Transcript Genetic Programming
Genetic Programming
•
•
•
•
Dinesh Dharme
Prateek Srivastav
Pankhil Chheda
Harshad Kanhere
Outline
•
•
•
•
•
•
•
Introduction
Motivation
Theory
Symbolic Regression using GP
Applications
Summary
References
Inspiration
• Based on Darwin’s concept of “Survival of
Fittest”
• Fitter individuals survive and reproduce at a
higher rate.
• Structure of individuals in population
changes because of natural selection.
What is Genetic Programming?
• Part of larger discipline called Machine Learning.
Machine learning can be best described as "the study
of computer algorithms that improve automatically
through experience" (Mitchell 1996).
• It attempts to solve the problem - How can computers
be made to do what needs to be done without being
told exactly how to do it?
• This is where the aspect of Artificial Intelligence
comes into play.
Why Genetic Programming?
• It saves time by freeing the human from having to
design complex algorithms. Not only designing the
algorithms but creating ones that give optimal
solutions.
• “It combines genetic algorithms with the basic thrust
of AI, which was to get computers to do things
automatically – to evolve a population of programs”
- John R. Koza
Algorithm
• Randomly create an initial population of programs from the
available primitives
• Repeat
• Execute each program and ascertain its fitness
• Select one or two program(s) from the population with a
probability based on fitness to participate in genetic
operations
• Create new individual program(s) by applying genetic
operations with specified probabilities
• until an acceptable solution is found or some other stopping
condition is met
• return the best-so-far individual
Tree Structure
• Programs are
represented as
tree.
(+ 1 2 (IF (> TIME 10) 3 4))
Basics
• Terminal: The variables and constants in the
program (x, y and 3) which forms leaves of the
tree.
• Funtion: The arithmetic operations (+, * and
max) are internal nodes called functions.
• Primitive set : Functions and terminals together
from primitive set.
Initialization of population
• Full Method: Selecting internal nodes from
function set untill maximum depth is reached.
At the end, select node from terminal set. All
leaves are at same level.
• Grow Method: Select nodes from the primitive
set until maximum depth is reached.
• Ramped half-and-half: Half the initial
population is constructed using full and half is
constructed using grow. Depth limits of trees not
fixed.
Initialization of population(Contd.)
• Initialization need not be entirely random.
• If properties of the desired solution is known,
produce trees accordingly. This creates intial
bias in population for faster convergence.
Selection Strategies
• Tournament Selection: Best individual from a
set of randomly selected individuals is chosen to
be parent.
• Advantage: It rescales fitness among the
population. A best program doesn't populate the
next generation with its children reducing
diversity.
• Disadvantage: A small difference in fitness gets
amplified in next generation. Even though it is
just marginally better than its competitior.
Selection Strategies(Contd.)
• FitnessProportionate
Selection: Select
individuals based
on their fitness
values.
0.17
0.25
0.08
0.5
Breeding Next Generation
• Crossover, mutation and reproduction are main
operators applied on parents to generate
offsprings.
• Subtree Crossover: Create the offspring by
replacing the subtree rooted at the crossover
point in a copy of the first parent with a copy of
the subtree rooted at the crossover point in the
second parent.
• Reproduction: Simply copy an elite individual to
next generation. Ensures passing of good genes.
Subtree Crossover
•
Crossover contd.
• Since most of the nodes of tree are leaves.Hence
crossover results in replacement/exchange of
small genetic material.
• John R. Koza suggested 90 times functions and
10 times terminals should be selected for
crossover.
Mutation
• Subtree Mutation:
Replace the
mutation point by
randomly generated
tree.
• Point Mutation:
Randomly select a
node and replace it
with another
primitive of same
arity.
Contraints on Operators:closure
• Type Consistency: Subtree crossover can mix
and join nodes arbitrarily. Hence all functions
used must be type consistent. i.e they return
values of same type.
• Evaluation safety: Subtree after evaluation
should produce a valid value to be used. For
this, protected versions of functions used. (e.g.
“/” with 2nd argument 0 gives 1 as its output).
Sufficiency
• Sufficiency means solution to the problem can
be expressed through combinations of different
primitives used.
• In general sufficiency cannot be guaranteed. In
areas where theory is well developed, we can
decide on primitives to be used.
Fitness Function
• Gives a measure of how good a computer program
is at solving the given problem.
• Proper encoding of problem through fitness
functions is important for GP's success.
• Measuring fitness of a given program means
executing the program.
• Evaluation of program tree is done in depth first
search order.
GP Parameters
Control parameters:
• Population Size, N
• Maximum number of generations, Gmax
• Probability Pc of crossover
• Probability Pm of mutation
• Probability Pr of reproduction
Five Preparatory steps to set up GP
•
•
•
•
•
Determining the set T of terminals.
Determining the set F of functions.
Determining the fitness measures.
Determining the GP parameters.
Determining the Termination criteria and
result designation
Symbolic Regression using GP
• Symbolic regression: The process of
mechanically creating a computer
program that fit certain numerical data.
• We want to evolve an expression whose
values match those of the quadratic
polynomial x2 + x + 1 in the range [−1,1].
SYMBOLIC REGRESSION
Independent variable X
Dependent variable Y
-1.00
-0.80
-0.60
-0.40
-0.20
0.00
0.20
0.40
0.60
0.80
1.00
1.00
0.84
0.76
0.76
0.84
1.00
1.24
1.56
1.96
2.44
3.00
Preparatory Steps
Objective:
1
Terminal set:
Find a computer program with one input
(independent variable X) whose output
equals the given data
T = {X, Random-Constants}
2
Function set:
F = {+,
3
Fitness:
The sum of the absolute value of the
differences
between
the
candidate
program’s output and the given data
(computed over numerous values of the
independent variable x from –1.0 to +1.0)
4
Parameters:
Population size M = 4
5
Termination:
An individual emerges whose sum of
absolute errors is less than 0.1
-,
*, %}
Symbolic Regression
(initialization)
Generation 0 with
4 individuals
• Two parental
chromosomes exchange part of
•
•
their genetic information to create new
hybrid combinations (recombinant).
No loss of genes, but an exchange of genes
between two previous chromosomes.
No new genes created, preexisting old ones
mixed together.
Symbolic regression(fitness)
x+1
x2 + 1
2
x
0.67
1.00
1.70
2.67
Symbolic Regression
(Generation1)
Reproductio
n:Copy of (a)
Mutation of (c)
: Picking “2” as
mutation point
First offspring of
crossover of (a)
and (b) picking
“+” of parent (a)
and left-most “x”
of parent (b) as
crossover points
Second offspring
of crossover of
(a) and (b)
picking “+” of
parent (a) and
left-most “x” of
parent (b) as
crossover points
Areas where GP will do well
• The interrelationships among the relevant variables
is unknown or poorly understood.
• Finding the size and shape of the ultimate solution
is a major part of the problem.
• Search domain is very large and solutions are
sparsely distributed.
• There are good simulators to test the performance
of tentative solutions to a problem, but poor
methods to directly obtain good solutions.
Applications
• GP was used in designing of an antenna for
deployment on NASA’s Space Technology 5
Mission.
• Designs generated are complex, non-intuitive, and
100% compliant with the mission antenna
performance requirements.
Applications contd.
• Result of
Genetic
Programming
which was
allowed to do
branching.
• Below we
show the
genotype for
this antenna.
Ref. [4]
• forward(L,R): creates a rod of length L and radius R.
• rotate-x(rad): rotates the rod by angle rad around x
axis. Similarly others are defined.
Applications contd.
• Result of
genetic
algorithm.
• Branching
not
allowed.
Ref. [4]
Summary
• Genetic programming now routinely delivers highreturn human-competitive machine intelligence.
• The AI ratio (the “artificial-to-intelligence” ratio) of
a problem-solving method as the ratio of that which
is delivered by the automated operation of the
artificial method to the amount of intelligence that
is supplied by the human applying the method to a
particular problem.
• GP has very high AI ratio.
Summary
• Routine:A problem solving method is routine if it is
general and relatively little human effort is required
to get the method to successfully handle new
problems within a particular domain and to
successfully handle new problems from a different
domain.
• Human-Competitiveness: Previously patented, an
improvement over a patented invention, or
patentable today. Publishable in its own right as a
new scientific result.
References
[1]Book: “A Field Guide to Genetic Programming” by
Poli, Langdon, McPhee. 2008.
[2]www.genetic-pragramming.com
[3]http://en.wikipedia.org/wiki/Genetic_programming
[4]Figures: “An Evolved Antenna for Deployment on
Nasa’s ST5 Mission” by J. Lohn, G. Hornby, D. Linden,
GTPT, May-2004
Suggested Readings
1.Book: “Genetic Programming: On the programming
of computers by means of natural selection”
by John R. Koza, MIT Press
2.Book: “Adaptation in Natural and Artificial
Systems” by John H. Holland, MIT Press