ceng 562 machine learning - METU Computer Engineering

Download Report

Transcript ceng 562 machine learning - METU Computer Engineering

CENG 562
MACHINE LEARNING
GENETIC ALGORITHMS
Presented by
Abdurrahman Çarkacıoğlu
OUTLINE
•
•
•
•
What is GA?
Biological Basis
Terminology
Operations
–
–
–
–
Fitness Function
Reproduction
Crossover
Mutation
• Genetic Programming
• Traditional Methods
vs. GA.
• Applications and
Research Topics
• Summary &
Conclusion
WHAT IS GA ?.
• Search algorithms based on
– mechanics of natural selection
– natural genetics
• Robust, flexible, efficient in vast complex
spaces.
• Use the idea of randomness search but not
directionless.
BIOLOGICAL BASIS OF GA
• Chromosomes
– Heredity aspects of the individual
• Genes
– Encodes a specific feature of the individual.
– Actual value allele.
• Mating
– Both parent pass their chromosomes onto their offsprings.
• The weaker ones tend to die
• Stronger ones survive and produce new offsprings.
• Rarely mutation occurs
GA’s TERMINOLOGY
• String Structures
Chromosomes
(Hypothesis)
• Elements of the String Gene's (Allele)
• Hypothesis/String Structures
– Composed of features, or detectors.
REPRESENTATION of
HYPOTHESIS
• Bit Strings (most of the time)
• Examples:
Outlook={Sunny,Overcast,Rain}
001 encodes Outlook=Rain
011 encodes Outlook=Overcast or Rain
111 encodes Outlook=Sunny or Overcast
or Rain
Rule Representation Example
• Rule :
Whatever the outlook,
if wind=strong then PlayTennis.
Given:
Wind
 {Strong,Weak}
PlayTennis {Yes,No}
Solution: Outlook Wind
PlayTennis
111
10
10
BASIC GA OPERATIONS
 Inner workings of GA is simple.
 Based on
 simple string copying
 substracting concetation
 nothing more,nothing less.
FITNESS FUNCTION
 Returns a single numerical fitness value.
 Problem dependent
 For a function optimization search, it is simply the
value of the function.
• For learning classification rules, it is classification
accuracy over a set of training examples
REPRODUCTION
 Allows
individual
strings to be copied
for possible inclusion
in
the
next
generation.
 Copying chance is
based on the fitness
value.
String Ftn
01001 5
10000 12
01110 9
Perc.
19%
46%
35%
 46% of the time
10000 is selected.
CROSSOVER
• Produce 2 new offspring from 2 parent string.
Initial Strings
Offsprings
MUTATION
• Applied after crossover.
• Randomly alters each gene with a small
probability, typically 0,001.
Point Mutation Example
Initial String
After Mutation
11101101

11111101
Pseudo Code for GA
• choose initial population at random
• evaluate each individual's fitness
• repeat
select individuals to reproduce
mate pairs at random
apply crossover operator
apply mutation operator
evaluate each individual's fitness
• until terminating condition
RECENT WORKS
 Using real numbers instead of bit strings
 Use of knowledge-augmented
operators for GAs.
 Strings may be variable lengths.
genetic
GENETIC PROGRAMMING
• Branch of genetic algorithms
• Difference is the representation of the
solution
• Creates computer programs in the Lisp or
Scheme computer languages as the solution.
Pseudo Code for GP
1)Create an initial population of random composition
of the functions and terminals of the problem.
2) Execute each program, and assign it a fitness
value according to the how well it solves a
problem.
3)Create new population of computer programs.
• )Copy the best existing programs.
• )Create new programs by mutation
• )Create new computer programs by crossover
4) Announce the best computer program when
predefined conditions are satisfied.
GA vs. GP
 GP is much more powerfull than GA.
 The output of the GA is a quantity, while output
of GP is a another computer program.
 Computer programs that program themselves.
 Where there is no ideal solution, GP works best
(ex. A program that drives a car).
Traditional Methods vs. GAs
– .GA a coded form of the function values, rather than with
the actual values themselves.
– .GA use a set, or population not just a single point on the
problem space.
– .GA use only payoff information to guide themselves.
– .Probabilistic, not deterministic.
– .Inherently parallel.
A Brief Survey of GA Technology
•
•
•
•
•
•
•
•
•
.Criminal suspect recognition
.Music Composition
.Constructing and Training Neural Networks
.Scheduling Problems
.Games Playing
.Structural Optimization
.Function Optimization
.Database Query Optimization
Aircraft Design
GABIL by DeJong (1993)
 Uses GA to learn boolean concepts, represented
by disjunctive set of propositional rules.
 Roughly comparable performance among C4.5,
ID5R and AQ14 systems.
 Over 12 synthetic problems
Average generalization accuracy for
GABIL
Others


92.1%
91.6% to 96.6%
Summary & Conclusions
 conduct a randomized, parallel, hill-climbing search
for hypothesis that optimize a predefined fitness
function.
 operate
on
populations
of
strings/hypothesis/chromosomes.
 string represents some underlying parameter set.
 reproduction, crossover and mutation are used to
create new populations.