
Lecture 34 of 42
Genetic and Evolutionary Computation
Discussion: GA, GP
Wednesday, 19 November 2008
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/v9v3
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Sections 22.1, 22.6-7, Russell & Norvig 2nd edition
Learning Hidden Layer Representations
 Hidden Units and Feature Extraction
 Training procedure: hidden unit representations that minimize error E
 Sometimes backprop will define new hidden features that are not explicit in the
input representation x, but which capture properties of the input instances that
are most relevant to learning the target function t(x)
 Hidden units express newly constructed features
 Change of representation to linearly separable D’
 A Target Function (Sparse aka 1-of-C Coding)
  Input      Hidden Values       Output
  10000000   0.89  0.04  0.08    10000000
  01000000   0.01  0.11  0.88    01000000
  00100000   0.01  0.97  0.27    00100000
  00010000   0.99  0.97  0.71    00010000
  00001000   0.03  0.05  0.02    00001000
  00000100   0.22  0.99  0.99    00000100
  00000010   0.80  0.01  0.98    00000010
  00000001   0.60  0.94  0.01    00000001
 Can this be learned? (Why or why not?)
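To make the example above concrete, here is a minimal NumPy sketch (my own illustration, not code from the lecture) of the 8-3-8 identity-encoder experiment: a fully connected 8→3→8 sigmoid network trained by batch gradient descent (backprop). After training, the three hidden-unit activations typically settle into a distinct, roughly binary code for each of the eight inputs, as in the table above.

```python
# Sketch only: an 8-3-8 identity encoder trained with backprop (NumPy).
import numpy as np

rng = np.random.default_rng(0)
X = np.eye(8)                       # the eight 1-of-8 (sparse) input vectors
T = X.copy()                        # target function t(x) = x (identity)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Small random initial weights (so the initial network is near-linear).
W1 = rng.uniform(-0.1, 0.1, (8, 3)); b1 = np.zeros(3)
W2 = rng.uniform(-0.1, 0.1, (3, 8)); b2 = np.zeros(8)
eta = 0.3                           # learning rate (illustrative value)

for epoch in range(10000):          # may need more epochs, or a momentum term
    H = sigmoid(X @ W1 + b1)        # hidden layer: the learned 3-unit encoding
    O = sigmoid(H @ W2 + b2)        # output layer
    dO = (O - T) * O * (1 - O)      # backprop of squared error through sigmoid
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= eta * H.T @ dO;  b2 -= eta * dO.sum(axis=0)
    W1 -= eta * X.T @ dH;  b1 -= eta * dH.sum(axis=0)

print(np.round(sigmoid(X @ W1 + b1), 2))   # hidden codes, cf. the table above
```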
Training:
Evolution of Error and Hidden Unit Encoding
[Plots vs. training epochs: output errors errorD(ok); hidden-unit values hj(01000000), 1 ≤ j ≤ 3]
Training:
Weight Evolution
[Plot vs. training epochs: input-to-hidden weights ui1, 1 ≤ i ≤ 8]
 Input-to-Hidden Unit Weights and Feature Extraction
 Changes in first weight layer values correspond to changes in hidden layer
encoding and consequent output squared errors
 w0 (bias weight, analogue of threshold in LTU) converges to a value near 0
 Several changes in first 1000 epochs (different encodings)
Convergence of Backpropagation
 No Guarantee of Convergence to Global Optimum Solution
 Compare: perceptron convergence (to best h ∈ H, provided c ∈ H; i.e., D linearly separable)
 Gradient descent to some local error minimum (perhaps not global minimum…)
 Possible improvements on backprop (BP)
• Momentum term (BP variant with slightly different weight update rule)
• Stochastic gradient descent (BP algorithm variant)
• Train multiple nets with different initial weights; find a good mixture
 Improvements on feedforward networks
• Bayesian learning for ANNs (e.g., simulated annealing) - later
• Other global optimization methods that integrate over multiple networks
 Nature of Convergence
 Initialize weights near zero
 Therefore, initial network near-linear
 Increasingly non-linear functions possible as training progresses
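As a hedged illustration of the momentum variant listed above (not the lecture's own code): the update adds a fraction α of the previous weight change to the current gradient step, Δw(t) = -η∇E(w) + αΔw(t-1). Here grad_E stands in for whatever routine computes the error gradient.

```python
# Sketch: gradient descent with a momentum term (the BP variant noted above).
import numpy as np

def train_with_momentum(w, grad_E, eta=0.05, alpha=0.9, epochs=1000):
    delta = np.zeros_like(w)                        # previous weight change
    for _ in range(epochs):
        delta = -eta * grad_E(w) + alpha * delta    # Δw(t) = -η·∇E + α·Δw(t-1)
        w = w + delta
    return w
```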
Overtraining in ANNs
 Recall: Definition of Overfitting
 h overfits if some h’ is worse than h on Dtrain but better on Dtest
 Overtraining: A Type of Overfitting
 Due to excessive iterations
 Avoidance: stopping criterion
(cross-validation: holdout, k-fold)
 Avoidance: weight decay
[Plots: error versus epochs, Example 1 and Example 2]
Overfitting in ANNs
 Other Causes of Overfitting Possible
 Number of hidden units sometimes set in advance
 Too few hidden units (“underfitting”)
• ANNs with no growth
• Analogy: overdetermined linear system of equations (more equations
than unknowns)
 Too many hidden units
• ANNs with no pruning
• Analogy: fitting a quadratic polynomial with an approximator of degree
>> 2
 Solution Approaches
 Prevention: attribute subset selection (using pre-filter or wrapper)
 Avoidance
• Hold out cross-validation (CV) set or split k ways (when to stop?)
• Weight decay: decrease each weight by some factor on each epoch
 Detection/recovery: random restarts, addition and deletion of weights,
units
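A hedged sketch of the two avoidance techniques named above, weight decay and hold-out early stopping; the `net` object with `backprop_epoch`, `error`, and `weights` is an assumed interface, not an API from the lecture.

```python
# Sketch (assumed trainer interface): weight decay each epoch plus early
# stopping on a held-out validation set.
import numpy as np

def train(net, D_train, D_val, epochs=500, eta=0.1, decay=1e-4, patience=20):
    best_w, best_err, since_best = net.weights.copy(), np.inf, 0
    for epoch in range(epochs):
        net.backprop_epoch(D_train, eta)     # one epoch of backprop updates
        net.weights *= (1.0 - decay)         # weight decay: shrink every weight
        err = net.error(D_val)               # hold-out (validation) error
        if err < best_err:
            best_w, best_err, since_best = net.weights.copy(), err, 0
        else:
            since_best += 1
            if since_best >= patience:       # stopping criterion: no recent improvement
                break
    net.weights = best_w                     # keep the best validated weights
    return net
```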
Example:
Neural Nets for Face Recognition
[Figure: 30 x 32 inputs; four outputs (Left, Straight, Right, Up); output layer weights (including w0 = θ) after 1 epoch; hidden layer weights after 1 epoch and after 25 epochs]
 90% Accurate Learning Head Pose, Recognizing 1-of-20 Faces
 http://www.cs.cmu.edu/~tom/faces.html
Example:
NetTalk
 Sejnowski and Rosenberg, 1987
 Early Large-Scale Application of Backprop
 Learning to convert text to speech
• Acquired model: a mapping from letters to phonemes and stress marks
• Output passed to a speech synthesizer
 Good performance after training on a vocabulary of ~1000 words
 Very Sophisticated Input-Output Encoding
 Input: 7-letter window; determines the phoneme for the center letter using three
letters of context on each side; distributed (i.e., sparse) representation: 200 bits
 Output: units for articulatory modifiers (e.g., “voiced”), stress, closest
phoneme; distributed representation
 40 hidden units; 10000 weights total
 Experimental Results
 Vocabulary: trained on 1024 of 1463 (informal) and 1000 of 20000 (dictionary)
 78% on informal, ~60% on dictionary
 http://en.wikipedia.org/wiki/NETtalk_(artificial_neural_network)
NeuroSolutions Demo
PAC Learning:
Definition and Rationale
 Intuition
 Can’t expect a learner to learn exactly
• Multiple consistent concepts
• Unseen examples: could have any label (“OK” to mislabel if “rare”)
 Can’t always approximate c closely (probability of D not being representative)
 Terms Considered
 Class C of possible concepts, learner L, hypothesis space H
 Instances X, each of length n attributes
 Error parameter ε, confidence parameter δ, true error errorD(h)
 size(c) = the encoding length of c, assuming some representation
 Definition
 C is PAC-learnable by L using H if for all c ∈ C, distributions D over X, ε such
that 0 < ε < 1/2, and δ such that 0 < δ < 1/2, learner L will, with probability at least
(1 - δ), output a hypothesis h ∈ H such that errorD(h) ≤ ε
 Efficiently PAC-learnable: L runs in time polynomial in 1/ε, 1/δ, n, size(c)
PAC Learning:
Results for Two Hypothesis Languages
 Unbiased Learner
 Recall: sample complexity bound m ≥ (1/ε)(ln |H| + ln (1/δ))
 Sample complexity not always polynomial
 Example: for unbiased learner, |H| = 2^|X|
 Suppose X consists of n booleans (binary-valued attributes)
• |X| = 2^n, |H| = 2^(2^n)
• m ≥ (1/ε)(2^n ln 2 + ln (1/δ))
• Sample complexity for this H is exponential in n
 Monotone Conjunctions
 Target function of the form y = f(x1, …, xn) = x′1 ∧ … ∧ x′k
 Active learning protocol (learner gives query instances): n examples needed
 Passive learning with a helpful teacher: k examples (k literals in true concept)
 Passive learning with randomly selected examples (proof to follow):
m ≥ (1/ε)(ln |H| + ln (1/δ)) = (1/ε)(n ln 2 + ln (1/δ))
PAC Learning:
Monotone Conjunctions [1]
 Monotone Conjunctive Concepts
 Suppose c ∈ C (and h ∈ H) is of the form x1 ∧ x2 ∧ … ∧ xm
 n possible variables: either omitted or included (i.e., positive literals only)
 Errors of Omission (False Negatives)
 Claim: the only possible errors are false negatives (h(x) = -, c(x) = +)
 Mistake iff (z ∈ h) ∧ (z ∉ c) ∧ (∃ x ∈ Dtest . x(z) = false): then h(x) = -, c(x) = +
 Probability of False Negatives
 Let z be a literal; let Pr(Z) be the probability that z is false in a positive x drawn from D
 z in target concept (correct conjunction c = x1 ∧ x2 ∧ … ∧ xm) ⇒ Pr(Z) = 0
 Pr(Z) is the probability that a randomly chosen positive example has z = false
(inducing a potential mistake, or deleting z from h if training is still in progress)
 error(h) ≤ Σz∈h Pr(Z)
[Figure: instance space X with regions for target concept c and hypothesis h, positive (+) and negative (-) examples marked]
PAC Learning:
Monotone Conjunctions [2]
 Bad Literals
 Call a literal z bad if Pr(Z) > ε = ε′/n
 z does not belong in c, and is likely to be dropped from h (by appearing with value false
in a positive x ∈ D), but has not yet appeared in such an example
 Case of No Bad Literals
 Lemma: if there are no bad literals, then error(h) ≤ ε′
 Proof: error(h) ≤ Σz∈h Pr(Z) ≤ Σz∈h ε′/n ≤ ε′ (worst case: all n literals are in h but not in c)
 Case of Some Bad Literals
 Let z be a bad literal
 Survival probability (probability that it will not be eliminated by a given
example): 1 - Pr(Z) < 1 - ε′/n
 Survival probability over m examples: (1 - Pr(Z))^m < (1 - ε′/n)^m
 Worst case survival probability over m examples (n bad literals) = n (1 - ε′/n)^m
 Intuition: more chance of a mistake = greater chance to learn
PAC Learning:
Monotone Conjunctions [3]
 Goal: Achieve An Upper Bound for Worst-Case Survival Probability
 Choose m large enough so that probability of a bad literal z surviving across m
examples is less than δ
 Pr(z survives m examples) = n (1 - ε′/n)^m < δ
 Solve for m using inequality 1 - x < e^(-x)
• n e^(-mε′/n) < δ
• m > (n/ε′)(ln n + ln (1/δ)) examples needed to guarantee the bounds
 This completes the proof of the PAC result for monotone conjunctions
 Nota Bene: a specialization of m ≥ (1/ε)(ln |H| + ln (1/δ)); n/ε′ = 1/ε
 Practical Ramifications
 Suppose δ = 0.1, ε′ = 0.1, n = 100: we need 6907 examples
 Suppose δ = 0.1, ε′ = 0.1, n = 10: we need only 460 examples
 Suppose δ = 0.01, ε′ = 0.1, n = 10: we need only 690 examples
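A quick check (my own, not from the slides) of the three figures above using the bound m > (n/ε′)(ln n + ln(1/δ)); the slide's values differ only in rounding.

```python
# Sketch: verify the example sample sizes with m > (n/eps') * (ln n + ln(1/delta)).
import math

def m_bound(n, eps_prime, delta):
    return (n / eps_prime) * (math.log(n) + math.log(1.0 / delta))

print(round(m_bound(100, 0.1, 0.1)))   # ~6908 (slide: 6907)
print(round(m_bound(10, 0.1, 0.1)))    # ~461  (slide: 460)
print(round(m_bound(10, 0.1, 0.01)))   # ~691  (slide: 690)
```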
PAC Learning:
k-CNF, k-Clause-CNF, k-DNF, k-Term-DNF
 k-CNF (Conjunctive Normal Form) Concepts: Efficiently PAC-Learnable
 Conjunctions of any number of disjunctive clauses, each with at most k literals
 c = C1 ∧ C2 ∧ … ∧ Cm; Ci = l1 ∨ l2 ∨ … ∨ lk; ln (|k-CNF|) = ln (2^((2n)^k)) = Θ(n^k)
 Algorithm: reduce to learning monotone conjunctions over n^k pseudo-literals Ci
 k-Clause-CNF
 c = C1 ∧ C2 ∧ … ∧ Ck; Ci = l1 ∨ l2 ∨ … ∨ lm; ln (|k-Clause-CNF|) = ln (3^(kn)) = Θ(kn)
 Efficiently PAC learnable? See below (k-Clause-CNF, k-Term-DNF are duals)
 k-DNF (Disjunctive Normal Form)
 Disjunctions of any number of conjunctive terms, each with at most k literals
 c = T1 ∨ T2 ∨ … ∨ Tm; Ti = l1 ∧ l2 ∧ … ∧ lk
 k-Term-DNF: “Not” Efficiently PAC-Learnable (Kind Of, Sort Of…)
 c = T1 ∨ T2 ∨ … ∨ Tk; Ti = l1 ∧ l2 ∧ … ∧ lm; ln (|k-Term-DNF|) = ln (k·3^n) = Θ(n + ln k)
 Polynomial sample complexity, not computational complexity (unless RP = NP)
 Solution: Don’t use H = C! k-Term-DNF ⊆ k-CNF (so let H = k-CNF)
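A hedged sketch of the reduction mentioned above (my own construction): enumerate every disjunctive clause of at most k literals as a pseudo-literal, then run the Elimination algorithm for monotone conjunctions over those pseudo-literals.

```python
# Sketch: learn k-CNF by reduction to monotone conjunctions over pseudo-literals.
from itertools import combinations, product

def clauses(n, k):
    """All disjunctive clauses of at most k literals over variables 0..n-1."""
    for size in range(1, k + 1):
        for vars_ in combinations(range(n), size):
            for signs in product((True, False), repeat=size):
                yield tuple(zip(vars_, signs))       # (variable index, is_positive)

def clause_true(clause, x):
    return any(bool(x[v]) == sign for v, sign in clause)

def learn_k_cnf(positive_examples, n, k):
    """Elimination: keep exactly the clauses satisfied by every positive example."""
    return [c for c in clauses(n, k)
            if all(clause_true(c, x) for x in positive_examples)]

h = learn_k_cnf([(1, 0, 1), (1, 1, 1)], n=3, k=2)    # h(x): all kept clauses true
```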
Consistent Learners
 General Scheme for Learning
 Follows immediately from definition of consistent hypothesis
 Given: a sample D of m examples
 Find: some h ∈ H that is consistent with all m examples
 PAC: show that if m is large enough, a consistent hypothesis must be close
enough to c
 Efficient PAC (and other COLT formalisms): show that you can compute the
consistent hypothesis efficiently
 Monotone Conjunctions
 Used an Elimination algorithm (compare: Find-S) to find a hypothesis h that is
consistent with the training set (easy to compute; see the sketch below)
 Showed that with sufficiently many examples (polynomial in the parameters),
then h is close to c
 Sample complexity gives an assurance of “convergence to criterion” for
specified m, and a necessary condition (polynomial in n) for tractability
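A hedged sketch of the Elimination algorithm referred to above, for monotone conjunctions (examples are assumed to be 0/1 tuples with a Boolean label): start with every variable conjoined and delete any variable that is false in some positive training example.

```python
# Sketch of Elimination (Find-S-like) for monotone conjunctions.
def eliminate(examples, n):
    h = set(range(n))                          # start with every variable conjoined
    for x, label in examples:
        if label:                              # positive example
            h -= {i for i in h if x[i] == 0}   # drop variables it falsifies
    return h                                   # h(x) = all(x[i] == 1 for i in h)

D = [((1, 1, 0, 1), True), ((1, 1, 1, 0), True), ((0, 1, 1, 1), False)]
print(sorted(eliminate(D, 4)))                 # [0, 1]: hypothesis x1 AND x2
```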
VC Dimension:
Framework
 Infinite Hypothesis Space?
 Preceding analyses were restricted to finite hypothesis spaces
 Some infinite hypothesis spaces are more expressive than others, e.g.,
• rectangles vs. 17-sided convex polygons vs. general convex polygons
• linear threshold (LT) function vs. a conjunction of LT units
 Need a measure of the expressiveness of an infinite H other than its size
 Vapnik-Chervonenkis Dimension: VC(H)
 Provides such a measure
 Analogous to | H |: there are bounds for sample complexity using VC(H)
VC Dimension:
Shattering A Set of Instances
 Dichotomies
 Recall: a partition of a set S is a collection of disjoint sets Si whose union is S
 Definition: a dichotomy of a set S is a partition of S into two subsets S1 and S2
 Shattering
 A set of instances S is shattered by hypothesis space H if and only if for every
dichotomy of S, there exists a hypothesis in H consistent with this dichotomy
 Intuition: a rich set of functions shatters a larger instance space
 The “Shattering Game” (An Adversarial Interpretation)
 Your client selects a set S of instances (from instance space X)
 You select an H
 Your adversary labels S (i.e., chooses a point c from concept space C = 2^X)
 You must then find some h ∈ H that “covers” (is consistent with) c
 If you can do this for any c your adversary comes up with, H shatters S
VC Dimension:
Examples of Shattered Sets
 Three Instances Shattered [figure: three points of instance space X, shattered]
 Intervals
 Left-bounded intervals on the real axis: [0, a), for a ∈ R, a ≥ 0
• Sets of 2 points cannot be shattered
• Given 2 points, can label so that no hypothesis will be consistent
[Figure: number line with marks at 0 and a; one point labeled -, one labeled +]
 Intervals on the real axis ([a, b], a, b ∈ R, b > a): can shatter 1 or 2 points, not 3
[Figure: number line with two points labeled +, at a and b]
 Half-spaces in the plane (non-collinear): 1? 2? 3? 4?
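A hedged brute-force illustration (mine, not the lecture's) of the interval examples above: enumerate every dichotomy of a small point set and test whether some closed interval [a, b] is consistent with it.

```python
# Sketch: brute-force shattering test for intervals [a, b] on the real line.
from itertools import combinations

def realizable(points, positives, a, b):
    return all((a <= p <= b) == (p in positives) for p in points)

def shattered_by_intervals(points):
    ends = list(points) + [min(points) - 1, max(points) + 1]  # candidate endpoints
    for r in range(len(points) + 1):
        for pos in combinations(points, r):       # one dichotomy: pos labeled +
            if not any(realizable(points, set(pos), a, b)
                       for a in ends for b in ends):
                return False                      # no interval realizes this dichotomy
    return True

print(shattered_by_intervals([1.0, 2.0]))         # True: 2 points can be shattered
print(shattered_by_intervals([1.0, 2.0, 3.0]))    # False: +,-,+ has no consistent [a, b]
```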
Lecture Outline
 Readings for Friday
 Finish Chapter 20, Russell and Norvig 2e
 Suggested: Chapter 1, 6.1-6.5, Goldberg; 9.1 – 9.4, Mitchell
 Evolutionary Computation
 Biological motivation: process of natural selection
 Framework for search, optimization, and learning
 Prototypical (Simple) Genetic Algorithm
 Components: selection, crossover, mutation
 Representing hypotheses as individuals in GAs
 An Example: GA-Based Inductive Learning (GABIL)
 GA Building Blocks (aka Schemas)
 Taking Stock (Course Review)
Simple Genetic Algorithm (SGA)
 Algorithm Simple-Genetic-Algorithm (Fitness, Fitness-Threshold, p, r, m)
// p: population size; r: replacement rate (aka generation gap width); m: mutation rate
 P ← p random hypotheses		// initialize population
 FOR each h in P DO f[h] ← Fitness(h)		// evaluate Fitness: hypothesis → R
 WHILE (Max(f) < Fitness-Threshold) DO
 1. Select: Probabilistically select (1 - r)p members of P to add to PS, where
P(hi) = f(hi) / Σj=1..p f(hj)
 2. Crossover:
 Probabilistically select (r · p)/2 pairs of hypotheses from P
 FOR each pair <h1, h2> DO
PS += Crossover (<h1, h2>)		// PS[t+1] = PS[t] + <offspring1, offspring2>
 3. Mutate: Invert a randomly selected bit in m · p random members of PS
 4. Update: P ← PS
 5. Evaluate: FOR each h in P DO f[h] ← Fitness(h)
 RETURN the hypothesis h in P that has maximum fitness f[h]
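A minimal Python sketch of the SGA above for fixed-length bit strings (roulette-wheel selection, single-point crossover, bit-flip mutation). Parameter names follow the slide; the fitness function here (counting 1 bits) is only a placeholder.

```python
# Minimal sketch of the Simple GA above; "count the ones" fitness is a stand-in.
import random

def fitness(h):                        # placeholder Fitness: hypothesis -> R
    return h.count("1")

def select(P, f, k):                   # roulette wheel: P(hi) = f(hi) / sum_j f(hj)
    return random.choices(P, weights=[f[h] for h in P], k=k)

def crossover(h1, h2):                 # single-point crossover
    i = random.randrange(1, len(h1))
    return h1[:i] + h2[i:], h2[:i] + h1[i:]

def mutate(h):                         # invert one randomly selected bit
    i = random.randrange(len(h))
    return h[:i] + ("0" if h[i] == "1" else "1") + h[i + 1:]

def sga(fitness_threshold, p=50, r=0.6, m=0.05, length=20):
    P = ["".join(random.choice("01") for _ in range(length)) for _ in range(p)]
    f = {h: fitness(h) for h in P}
    while max(f.values()) < fitness_threshold:
        PS = select(P, f, int((1 - r) * p))               # 1. Select
        for _ in range(int(r * p / 2)):                   # 2. Crossover
            PS.extend(crossover(*select(P, f, 2)))
        for _ in range(int(m * p)):                       # 3. Mutate
            i = random.randrange(len(PS)); PS[i] = mutate(PS[i])
        P = PS                                            # 4. Update
        f = {h: fitness(h) for h in P}                    # 5. Evaluate
    return max(P, key=fitness)

print(sga(fitness_threshold=18))       # evolves a string with at least 18 ones
```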
GA-Based Inductive Learning (GABIL)
 GABIL System [DeJong et al., 1993]
 Given: concept learning problem and examples
 Learn: disjunctive set of propositional rules
 Goal: results competitive with those for current decision tree learning
algorithms (e.g., C4.5)
 Fitness Function: Fitness(h) = (Correct(h))^2
 Representation
 Rules: IF a1 = T ∧ a2 = F THEN c = T; IF a2 = T THEN c = F
 Bit string encoding: a1 [10] . a2 [01] . c [1] . a1 [11] . a2 [10] . c [0] = 10011 11100
 Genetic Operators
 Want variable-length rule sets
 Want only well-formed bit string hypotheses
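A hedged helper (not GABIL's actual code) that produces the bit-string encoding shown above: one bit per attribute value, a 1 meaning the value is allowed, plus one bit for the rule's class.

```python
# Sketch: encode GABIL-style rules as bit strings (format as on this slide).
ATTRS = [("a1", ["T", "F"]), ("a2", ["T", "F"])]

def encode_rule(conditions, c):
    """conditions: dict attr -> required value; a missing attr is unconstrained."""
    bits = ""
    for attr, values in ATTRS:
        if attr in conditions:
            bits += "".join("1" if v == conditions[attr] else "0" for v in values)
        else:
            bits += "1" * len(values)          # e.g., unconstrained a1 -> "11"
    return bits + ("1" if c == "T" else "0")

# IF a1 = T AND a2 = F THEN c = T ;  IF a2 = T THEN c = F
print(encode_rule({"a1": "T", "a2": "F"}, "T"))   # -> 10011
print(encode_rule({"a2": "T"}, "F"))              # -> 11100
```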
Crossover:
Variable-Length Bit Strings
 Basic Representation
 Start with
      a1   a2   c    a1   a2   c
h1 =  1[0  01   1    11   1]0  0
h2 =  0[1  1]1  0    10   01   0
 Idea: allow crossover to produce variable-length offspring
 Procedure
 1. Choose crossover points for h1, e.g., after bits 1, 8
 2. Now restrict crossover points in h2 to those that produce bitstrings with
well-defined semantics, e.g., <1, 3>, <1, 8>, <6, 8>
 Example
 Suppose we choose <1, 3>
 Result
h3
11 10
h4
00
CIS 530 / 730: Artificial Intelligence
0
01 1 11
11
0 10
01
0
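A hedged sketch of this crossover (my own paraphrase of the procedure above): a two-point crossover in which the second parent's cut points must fall at the same offsets within a rule as the first parent's, which is what keeps the offspring well-formed. It reproduces h3 and h4 from the example.

```python
# Sketch: GABIL-style two-point crossover on variable-length rule strings
# (rule length 5 here: 2 bits for a1, 2 bits for a2, 1 bit for c).
RULE_LEN = 5

def crossover(h1, cuts1, h2, cuts2):
    (a1, b1), (a2, b2) = cuts1, cuts2
    assert a1 % RULE_LEN == a2 % RULE_LEN and b1 % RULE_LEN == b2 % RULE_LEN, \
        "cut offsets must match within a rule so offspring stay well-formed"
    h3 = h1[:a1] + h2[a2:b2] + h1[b1:]
    h4 = h2[:a2] + h1[a1:b1] + h2[b2:]
    return h3, h4

h1 = "1001111100"                    # 10 01 1  11 10 0
h2 = "0111010010"                    # 01 11 0  10 01 0
h3, h4 = crossover(h1, (1, 8), h2, (1, 3))
print(h3)                            # 11100            -> 11 10 0
print(h4)                            # 000111111010010  -> 00 01 1  11 11 0  10 01 0
```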
GABIL Extensions
 New Genetic Operators
 Applied probabilistically
 1. AddAlternative: generalize constraint on ai by changing a 0 to a 1
 2. DropCondition: generalize constraint on ai by changing every 0 to a 1
 New Field
 Add fields to bit string to decide whether to allow above operators
a1   a2   c    a1   a2   c    AA   DC
01   11   0    10   01   0     1    0
 So now learning strategy also evolves!
 aka genetic wrapper
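A hedged sketch of the two generalization operators above, applied to the bits that encode a single attribute constraint (variable names are mine).

```python
# Sketch: GABIL's AddAlternative and DropCondition on one attribute's bits.
import random

def add_alternative(bits):
    """Flip one randomly chosen 0 to 1, i.e., allow one more value."""
    zeros = [i for i, b in enumerate(bits) if b == "0"]
    if not zeros:
        return bits
    i = random.choice(zeros)
    return bits[:i] + "1" + bits[i + 1:]

def drop_condition(bits):
    """Change every 0 to 1: the attribute becomes unconstrained."""
    return "1" * len(bits)

print(add_alternative("10"))   # "11": a2 = T generalizes to a2 = T or F
print(drop_condition("01"))    # "11"
```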
GABIL Results
 Classification Accuracy
 Compared to symbolic rule/tree learning methods
 C4.5 [Quinlan, 1993]
 ID5R
 AQ14 [Michalski, 1986]
 Performance of GABIL comparable
 Average performance on a set of 12 synthetic problems: 92.1% test
accuracy
 Symbolic learning methods ranged from 91.2% to 96.6%
 Effect of Generalization Operators
 Result above is for GABIL without AA and DC
 Average test set accuracy on 12 synthetic problems with AA and DC: 95.2%
Building Blocks
(Schemas)
 Problem
 How to characterize evolution of population in GA?
 Goal
 Identify basic building block of GAs
 Describe family of individuals
 Definition: Schema
 String containing 0, 1, * (“don’t care”)
 Typical schema: 10**0*
 Instances of above schema: 101101, 100000, …
 Solution Approach
 Characterize population by number of instances representing each schema
 m(s, t) ≡ number of instances of schema s in population at time t
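A small hedged helper for the definition above: count m(s, t), the number of individuals in the current population matching a schema.

```python
# Sketch: m(s, t) = number of population members that are instances of schema s.
def matches(schema, individual):
    return all(s == "*" or s == b for s, b in zip(schema, individual))

def m(schema, population):
    return sum(matches(schema, h) for h in population)

P_t = ["101101", "100000", "111111", "000000"]
print(m("10**0*", P_t))        # 2: the instances are 101101 and 100000
```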
Selection and Building Blocks
 Restricted Case: Selection Only
 f t 
 average fitness of population at time t
 m(s, t)  number of instances of schema s in population at time t
 ûs, t   average fitness of instances of schema s at time t
 Quantities of Interest
 Probability of selecting h in one selection step
f h 
P h   n
i 1f hi 
 Probability of selecting an instance of s in one selection step
f h  uˆ s, t 
P h  s   

 ms, t 
n  f t 
h s  pt  n  f t 
 Expected number of instances of s after n selections
E ms, t  1 
CIS 530 / 730: Artificial Intelligence
uˆ s, t 
 ms, t 
f t 
Schema Theorem
 Theorem
E[m(s, t+1)] ≥ (û(s, t) / f̄(t)) · m(s, t) · (1 - pc · d(s)/(l - 1)) · (1 - pm)^o(s)
 m(s, t) ≡ number of instances of schema s in population at time t
 f̄(t) ≡ average fitness of population at time t
 û(s, t) ≡ average fitness of instances of schema s at time t
 pc ≡ probability of single point crossover operator
 pm ≡ probability of mutation operator
 l ≡ length of individual bit strings
 o(s) ≡ number of defined (non “*”) bits in s
 d(s) ≡ distance between rightmost, leftmost defined bits in s
 Intuitive Meaning
 “The expected number of instances of a schema in the population tends
toward its relative fitness”
 A fundamental theorem of GA analysis and design
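A hedged numerical illustration of the bound above, with a small made-up population and illustrative operator probabilities.

```python
# Sketch: evaluate the schema-theorem lower bound on E[m(s, t+1)].
def defined(s):
    return [i for i, ch in enumerate(s) if ch != "*"]

def schema_bound(s, population, fitness, p_c, p_m):
    insts = [h for h in population if all(h[i] == s[i] for i in defined(s))]
    m_st = len(insts)                                  # m(s, t)
    u_hat = sum(fitness(h) for h in insts) / m_st      # average fitness of instances
    f_bar = sum(fitness(h) for h in population) / len(population)   # population average
    l, o = len(s), len(defined(s))                     # string length, order o(s)
    d = max(defined(s)) - min(defined(s))              # defining length d(s)
    return (u_hat / f_bar) * m_st * (1 - p_c * d / (l - 1)) * (1 - p_m) ** o

P = ["101101", "100000", "111111", "000000"]
print(schema_bound("10**0*", P, fitness=lambda h: h.count("1"), p_c=0.7, p_m=0.01))
```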
Genetic Programming
 Readings / Viewings
 View GP videos 1-3
 GP1 – Genetic Programming: The Video
 GP2 – Genetic Programming: The Next Generation
 GP3 – Genetic Programming: Invention
 GP4 – Genetic Programming: Human-Competitive
 Suggested: Chapters 1-5, Koza
 Previously
 Genetic and evolutionary computation (GEC)
 Generational vs. steady-state GAs; relation to simulated annealing, MCMC
 Schema theory and GA engineering overview
 Today: GP Discussions
 Code bloat and potential mitigants: types, OOP, parsimony, optimization,
reuse
 Genetic programming vs. human programming: similarities, differences
GP Flow Graph
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez
http://www.geneticprogramming.com
Structural Crossover
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez
http://www.geneticprogramming.com
Structural Mutation
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez
http://www.geneticprogramming.com
Terminology
 Evolutionary Computation (EC): Models Based on Natural Selection
 Genetic Algorithm (GA) Concepts
 Individual: single entity of model (corresponds to hypothesis)
 Population: collection of entities in competition for survival
 Generation: single application of selection and crossover operations
 Schema aka building block: descriptor of GA population (e.g., 10**0*)
 Schema theorem: representation of schema proportional to its relative fitness
 Simple Genetic Algorithm (SGA) Steps
 Selection
 Proportionate (aka roulette wheel): P(individual) ∝ f(individual)
 Tournament: let individuals compete in pairs or tuples; eliminate unfit
ones
 Crossover
 Single-point: 11101001000 × 00001010101 → { 11101010101, 00001001000 }
 Two-point: 11101001000 × 00001010101 → { 11001011000, 00101000101 }
 Uniform: 11101001000 × 00001010101 → { 10001000100, 01101011001 }
 Mutation: single-point (“bit flip”), multi-point
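A hedged sketch of the three crossover operators, written with crossover masks; the single-point call below reproduces the first offspring pair listed above (the two-point and uniform results depend on the points and mask chosen).

```python
# Sketch: single-point, two-point, and uniform crossover via crossover masks
# (mask bit 1 = child 1 takes this bit from parent 1, child 2 from parent 2).
import random

def apply_mask(p1, p2, mask):
    c1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

def single_point(p1, p2, i):
    return apply_mask(p1, p2, "1" * i + "0" * (len(p1) - i))

def two_point(p1, p2, i, j):
    return apply_mask(p1, p2, "1" * i + "0" * (j - i) + "1" * (len(p1) - j))

def uniform(p1, p2):
    return apply_mask(p1, p2, "".join(random.choice("01") for _ in p1))

print(single_point("11101001000", "00001010101", 5))
# -> ('11101010101', '00001001000'), the single-point pair above
```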
Summary Points
 Evolutionary Computation
 Motivation: process of natural selection
 Limited population; individuals compete for membership
 A method for parallel, stochastic search
 Framework for problem solving: search, optimization, learning
 Prototypical (Simple) Genetic Algorithm (GA)
 Steps
 Selection: reproduce individuals probabilistically, in proportion to fitness
 Crossover: generate new individuals probabilistically, from pairs of “parents”
 Mutation: modify structure of individual randomly
 How to represent hypotheses as individuals in GAs
 An Example: GA-Based Inductive Learning (GABIL)
 Schema Theorem: Propagation of Building Blocks
 Next Lecture: Genetic Programming, The Movie