Machine Learning for Information Retrieval
Evolutionary Computation
(진화연산)
Byoung-Tak Zhang (장병탁)
School of Computer Science and Engineering
Seoul National University
E-mail: [email protected]
http://bi.snu.ac.kr/~btzhang/
This material is also available online at http://bi.snu.ac.kr/
Outline
1. Basic Concepts
2. Theoretical Backgrounds
3. Applications
4. Current Issues
5. References and URLs
2
1. Basic Concepts
3
Charles Darwin (1859)
“Owing to this struggle for life, any variation, however
slight and from whatever cause proceeding, if it be in any
degree profitable to an individual of any species, in its
infinitely complex relations to other organic beings and to
external nature, will tend to the preservation of that
individual, and will generally be inherited by its
offspring.”
4
Evolutionary Algorithms
A Computational Model Inspired by Natural Evolution and
Genetics
Proved Useful for Search, Machine Learning and
Optimization
Population-Based Search (vs. Point-Based Search)
Probabilistic Search (vs. Deterministic Search)
Collective Learning (vs. Individual Learning)
Balance of Exploration (Global Search) and Exploitation
(Local Search)
5
Biological Terminology
Gene
Functional entity that codes for a specific feature, e.g., eye color
Set of possible alleles
Allele
Value of a gene, e.g., blue, green, brown
Codes for a specific variation of the gene/feature
Locus
Position of a gene on the chromosome
Genome
Set of all genes that define a species
The genome of a specific individual is called genotype
The genome of a living organism is composed of several
Chromosomes
Population
Set of competing genomes/individuals
6
Analogy to
Evolutionary Biology
Individual (Chromosome) = Possible Solution
Population = A Collection of Possible Solutions
Fitness = Goodness of Solutions
Selection (Reproduction) = Survival of the Fittest
Crossover = Recombination of Partial Solutions
Mutation = Alteration of an Existing Solution
7
The Evolution Loop
initialize population
evaluate
repeat until (terminate):
    select mating partners
    recombine
    mutate
    evaluate
    select survivors
8
Basic Evolutionary Algorithm
1. Generation of initial solutions (a priori knowledge, results of an earlier run, or random)
2. Evaluation
3. Solution sufficiently good? If YES: END
4. If NO: selection, then generation of variants by mutation and crossover; return to step 2
9
Procedure
begin
    t <- 0;
    initialize P(t);
    evaluate P(t);
    while (not termination condition) do
        recombine P(t) to yield C(t);
        evaluate C(t);
        select P(t+1) from P(t) and C(t);
        t <- t+1;
    end
end
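As a concrete illustration, here is a minimal Python sketch of the procedure above. The bitstring encoding, the OneMax-style fitness, and all parameter values are illustrative assumptions, not part of the original slides.

import random

POP_SIZE, GENOME_LEN, GENERATIONS = 20, 10, 30
P_CROSSOVER, P_MUTATION = 0.9, 0.02

def fitness(ind):
    # Toy objective (OneMax): count of 1-bits; stands in for any evaluation.
    return sum(ind)

def mutate(ind):
    return [1 - bit if random.random() < P_MUTATION else bit for bit in ind]

def recombine(population):
    # Produce offspring C(t) by one-point crossover of randomly chosen parents.
    offspring = []
    while len(offspring) < POP_SIZE:
        a, b = random.sample(population, 2)
        if random.random() < P_CROSSOVER:
            cut = random.randint(1, GENOME_LEN - 1)
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        offspring += [mutate(a), mutate(b)]
    return offspring[:POP_SIZE]

# begin
t = 0
P = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]  # initialize P(t)
while t < GENERATIONS:                      # termination condition: generation limit
    C = recombine(P)                        # recombine P(t) to yield C(t)
    merged = sorted(P + C, key=fitness, reverse=True)   # evaluate P(t) and C(t)
    P = merged[:POP_SIZE]                   # select P(t+1) from P(t) and C(t)
    t += 1
print(max(fitness(ind) for ind in P))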
10
General Structure of EAs
[Figure: solutions are encoded as bitstring chromosomes (e.g., 1100101010, 1011101110, 0011011001, 1100110001); crossover exchanges substrings between pairs of chromosomes, mutation flips single bits (e.g., 0011011001 -> 0011001001); decoded solutions are evaluated (fitness computation), and roulette-wheel selection forms the new population.]
11
Population and Fitness
[Figure: an example population of eight individuals with fitness values 6, 4, 6, 2, 8, 4, 6, 6.]
12
Selection, Crossover, and Mutation
[Figure: one generation applied to the example population: high-fitness individuals reproduce, pairs recombine by crossover, single individuals mutate, and the least fit die out, yielding offspring with fitness values such as 8 and 10.]
13
Simulated Evolution
[Figure: the simulated-evolution cycle: a population of chromosomes is decoded into strings and evaluated (fitness); selection forms a mating pool of parents; genetic operators (manipulation, mating, reproduction) produce offspring, which form the new generation.]
14
Selection Strategies
Proportionate Selection
Reproduce offspring in proportion to fitness $f_i$:
$p_s(a_i^t) = f(a_i^t) \, / \, \sum_{j=1}^{\lambda} f(a_j^t)$
Ranking Selection
Select individuals according to $\mathrm{rank}(f_i)$ rather than according to $f(a_i^t)$ itself.
Tournament Selection
Choose q individuals at random, the best of which survives:
$p_s(a_i^t) = \lambda^{-q}\left((\lambda - i + 1)^q - (\lambda - i)^q\right)$ for the individual of rank i
Other Ways
15
Roulette Wheel Selection
Selection is a stochastic process
Probability of reproduction: $p_i = f_i / \sum_k f_k$
intermediate parent population:
01011
11010
10001
10001
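A minimal Python sketch of roulette-wheel selection under this probability, reusing the slide's four example strings; the bit-count fitness is an illustrative assumption.

import random

def roulette_wheel_select(population, fitnesses):
    # Pick one individual with probability p_i = f_i / sum_k f_k.
    total = sum(fitnesses)
    r = random.uniform(0, total)          # spin the wheel
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running >= r:
            return individual
    return population[-1]                 # numerical safety net

# Example: four bitstrings with fitness = number of 1-bits.
pop = ["01011", "11010", "10001", "10001"]
fits = [s.count("1") for s in pop]
parents = [roulette_wheel_select(pop, fits) for _ in range(len(pop))]
print(parents)   # an intermediate parent population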
16
Genetic Operators for Bitstrings
• Reproduction: make copies of a chromosome (the fitter the chromosome, the more copies)
10000100  ->  10000100, 10000100
• Crossover: exchange subparts of two chromosomes
100|00100  x  111|11111  ->  10011111, 11100100
• Mutation: randomly flip some bits
00000100  ->  00000000
17
Mutation
For a binary string, just randomly “flip” a bit
For a more complex structure, randomly select a
site, delete the structure associated with this site,
and randomly create a new sub-structure
Some EAs just use mutation (no crossover)
Normally, however, mutation is used to search in
the “local search space”, by allowing small
changes in the genotype (and therefore hopefully
in the phenotype)
18
Recombination (Crossover)
Crossover is used to swap (fit) parts of individuals,
in a similar way to sexual reproduction
Parents are selected based on fitness
Crossover sites selected (randomly, although other
mechanisms exist), with some prob.
Parts of the parents are exchanged to produce
children
19
Crossover
One-point crossover (cut after position 4):
parent A: 1101|0    offspring A: 1101|1
parent B: 1000|1    offspring B: 1000|0
Two-point crossover (cuts after positions 2 and 4):
parent A: 11|01|0   offspring A: 11|00|0
parent B: 10|00|1   offspring B: 10|01|1
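The following Python sketch reproduces the slide's 5-bit examples; the cut positions are passed in explicitly here, whereas a real GA would draw them at random.

import random

def one_point_crossover(a, b, cut=None):
    cut = cut if cut is not None else random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point_crossover(a, b, cut1=None, cut2=None):
    if cut1 is None:
        cut1, cut2 = sorted(random.sample(range(1, len(a)), 2))
    return (a[:cut1] + b[cut1:cut2] + a[cut2:],
            b[:cut1] + a[cut1:cut2] + b[cut2:])

def bit_flip_mutation(s, p=0.1):
    # Flip each bit independently with probability p.
    return "".join(b if random.random() > p else "10"[int(b)] for b in s)

# The slide's example: parents 11010 and 10001.
print(one_point_crossover("11010", "10001", cut=4))   # ('11011', '10000')
print(two_point_crossover("11010", "10001", 2, 4))    # ('11000', '10011')
print(bit_flip_mutation("00000100", p=0.2))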
20
2. Theoretical Backgrounds
21
Major Evolutionary Algorithms
Genetic Algorithms, Evolution Strategies, Evolutionary Programming, Genetic Programming, Classifier Systems, and hybrids such as the BGA.
The variants differ in:
• Genetic representation of candidate solutions
• Genetic operators
• Selection scheme
• Problem domain
22
Variants of Evolutionary Algorithms
Genetic Algorithm (Holland et al., 1960’s)
Bitstrings, mainly crossover, proportionate selection
Evolution Strategy (Rechenberg et al., 1960’s)
Real values, mainly mutation, truncation selection
Evolutionary Programming (Fogel et al., 1960’s)
FSMs, mutation only, tournament selection
Genetic Programming (Koza, 1990)
Trees, mainly crossover, proportionate selection
Hybrids: BGA (Muehlenbein et al., 1993)
BGP (Zhang et al., 1995) and others.
23
Evolution Strategy (ES)
Problem of real-valued optimization
Find the extremum (minimum) of a function $F(X): \mathbb{R}^n \to \mathbb{R}$
Operate directly on real-valued vector X
Generate new solutions through Gaussian
mutation of all components
Selection mechanism for determining new parents
24
ES: Representation
One individual:
$\vec{a} = (x_1, \ldots, x_n,\ \sigma_1, \ldots, \sigma_{n_\sigma},\ \alpha_1, \ldots, \alpha_{n_\alpha}) \in I$
The three parts of an individual:
$\vec{x}$: object variables (evaluated by the fitness $f(\vec{x})$)
$\vec{\sigma}$: standard deviations (variances of the mutation)
$\vec{\alpha}$: rotation angles (covariances of the mutation)
25
ES: Operator - Recombination
$r = r_{x}\, r_{\sigma}\, r_{\alpha}$, where $r_x, r_\sigma, r_\alpha \in \{-, d, D, i, I, g, G\}$, e.g. $r_{dII}$

$x'_i = \begin{cases}
x_{S,i} & \text{no recombination } (-) \\
x_{S,i} \text{ or } x_{T,i} & \text{discrete } (r_d) \\
x_{S,i} \text{ or } x_{T_i,i} & \text{panmictic discrete } (r_D) \\
x_{S,i} + (x_{T,i} - x_{S,i})/2 & \text{intermediate } (r_i) \\
x_{S,i} + (x_{T_i,i} - x_{S,i})/2 & \text{panmictic intermediate } (r_I) \\
x_{S,i} + \chi\,(x_{T,i} - x_{S,i}) & \text{generalized intermediate } (r_g) \\
x_{S,i} + \chi_i\,(x_{T_i,i} - x_{S,i}) & \text{panmictic generalized intermediate } (r_G)
\end{cases}$
26
ES: Operator - Mutation
$m_{\{\tau, \tau', \beta\}} : I \to I$ is an asexual operator.

Correlated mutation ($n_\sigma = n$, $n_\alpha = n(n-1)/2$):
$\sigma'_i = \sigma_i \exp(\tau' N(0,1) + \tau N_i(0,1))$
$\alpha'_j = \alpha_j + \beta N_j(0,1)$, with $\beta \approx 0.0873$
$\vec{x}' = \vec{x} + \vec{N}(\vec{0}, C(\vec{\sigma}', \vec{\alpha}'))$

Standard mutation ($1 < n_\sigma < n$, $n_\alpha = 0$):
$\sigma'_i = \sigma_i \exp(\tau' N(0,1) + \tau N_i(0,1))$
$x'_i = x_i + \sigma'_i N_i(0,1)$
with $\tau' \propto 1/\sqrt{2n}$ and $\tau \propto 1/\sqrt{2\sqrt{n}}$

Simple mutation ($n_\sigma = 1$, $n_\alpha = 0$):
$\sigma' = \sigma \exp(\tau_0 N(0,1))$
$x'_i = x_i + \sigma' N_i(0,1)$
with $\tau_0 \propto 1/\sqrt{n}$
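A Python sketch of the standard (uncorrelated) self-adaptive mutation above, with one step size per coordinate; the proportionality constants for tau and tau' are set to 1, which is an assumption.

import math, random

def es_mutate(x, sigma):
    # Self-adaptive ES mutation for the n_sigma = n, n_alpha = 0 case.
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)            # global learning rate
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))       # coordinate-wise learning rate
    global_draw = random.gauss(0, 1)                # shared N(0,1) sample
    new_sigma = [s * math.exp(tau_prime * global_draw + tau * random.gauss(0, 1))
                 for s in sigma]
    new_x = [xi + si * random.gauss(0, 1) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma

x, sigma = [0.5, -1.2, 3.0], [1.0, 1.0, 1.0]
print(es_mutate(x, sigma))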
27
ES: Illustration of Mutation
Hyperellipsoids
Line of equal probability density to place an offspring
28
ES: Evolution Strategy vs.
Genetic Algorithm
[Figure: two loops side by side. Genetic Algorithm: create random initial population -> evaluate population -> select individuals for variation -> vary -> insert into population. Evolution Strategy: create random initial population -> evaluate population -> vary -> selection -> insert into population. The main difference is where selection acts: before variation (mating selection) or after variation (survivor selection).]
29
Evolutionary Programming (EP)
Original form (L. J. Fogel)
Uniform random mutations
Discrete alphabets
$(\mu + \mu)$ selection
Extended Evolutionary Programming (D. B. Fogel)
Continuous parameter optimization
Similarities with ES
• Normally distributed mutation
• Self-adaptation of mutation parameters
30
EP: Representation
Constraints on the domain and the variances are used for initialization only; the operators do not enforce them.
• The search space is principally unconstrained.
Individual
$\vec{a} = (\vec{x}, \vec{\sigma}) \in I = A_x \times A_s$,
where $A_x \subseteq \mathbb{R}^n$, and $A_s = \emptyset$ (std EP) or $A_s \subseteq \mathbb{R}^n$ (meta EP)
31
EP: Operator - Recombination
No recombination
Gaussian mutation alone does better in some comparisons (Fogel and Atmar), though not in all situations.
Argument from evolutionary biology: the role of crossover is often overemphasized; mutation drives evolutionary optimization, while crossover mainly segregates defects.
Researchers in the field of Genetic Algorithms generally take the opposite point of view.
32
EP: Operator - Mutation
General form (std. EP):
$x'_i = x_i + \sqrt{\beta_i f(\vec{x}) + \gamma_i}\; N_i(0,1)$
Exogenous parameters $\beta_i$, $\gamma_i$ must be tuned for a particular task.
Usually $\beta_i = 1$, $\gamma_i = 0$, so that $x'_i = x_i + \sqrt{f(\vec{x})}\; N_i(0,1)$.
Problems
• If the global minimum's fitness value is not zero, an exact approach to it is not possible.
• If fitness values are very large, the search is almost a random walk.
• If the user does not know the approximate position of the global minimum, parameter tuning is not possible.
33
EP: Differences from ES
Procedures for self-adaptation
Deterministic vs. Probabilistic selection
$(\mu, \lambda)$-ES vs. $(\mu + \mu)$-EP
Level of abstraction: Individual vs. Species
34
Genetic Programming
Applies principles of natural selection to computer search and
optimization problems - has advantages over other procedures
for “badly behaved” solution spaces [Koza, 1992]
Genetic programming uses variable-size tree-representations
rather than fixed-length strings of binary values.
Program tree
= S-expression
= LISP parse tree
Tree = Functions (Nonterminals) + Terminals
35
GP: Representation
S-expression: (+ 1 2 (IF (> TIME 10) 3 4))
Terminals = {1, 2, 3, 4, 10, TIME}
Functions = {+, >, IF}
[Tree: root +, with children 1, 2, and an IF node; the IF node's children are (> TIME 10), 3, and 4.]
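A possible way to make this concrete: the Python sketch below represents the slide's program as nested tuples and evaluates it; the environment dictionary supplying TIME is an assumed convention.

# Represent (+ 1 2 (IF (> TIME 10) 3 4)) as nested tuples and evaluate it.
def evaluate(node, env):
    if not isinstance(node, tuple):                 # terminal: constant or variable
        return env.get(node, node)
    op, *args = node
    vals = [evaluate(a, env) for a in args]
    if op == "+":
        return sum(vals)
    if op == ">":
        return vals[0] > vals[1]
    if op == "IF":
        return vals[1] if vals[0] else vals[2]
    raise ValueError(f"unknown function {op}")

program = ("+", 1, 2, ("IF", (">", "TIME", 10), 3, 4))
print(evaluate(program, {"TIME": 12}))   # -> 6   (1 + 2 + 3)
print(evaluate(program, {"TIME": 5}))    # -> 7   (1 + 2 + 4)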
36
GP: Operator - Crossover
[Figure: subtree crossover on two parent trees built from +, a, and b; randomly chosen subtrees are swapped between the parents to produce two offspring trees.]
37
GP: Operator - Mutation
[Figure: subtree mutation on a tree built from +, /, a, and b; a randomly chosen subtree is deleted and replaced by a newly generated random subtree.]
38
Breeder GP (BGP)
[Zhang and Muehlenbein, 1993, 1995]
ES (real-vector) + GA (bitstring)  --Muehlenbein et al. (1993)-->  Breeder GA (BGA): real-vector + bitstring
BGA + GP (tree)  --Zhang et al. (1993)-->  Breeder GP (BGP): tree + real-vector + bitstring
39
GAs: Theory of Simple GA
Assumptions
Bitstrings of fixed size
Proportionate selection
Definitions
Schema H: A template describing a set of strings (e.g., H = 1**0)
Order o: number of fixed positions (FP)
(e.g., o(H) = 2)
Defining length d: distance between leftmost FP and
rightmost FP (e.g., d(H) = 3)
40
GAs: Schema Theorem (Holland, 1975)

$m(H, t+1) \;\ge\; m(H, t)\,\frac{f(H,t)}{\bar{f}(t)}\left(1 - p_c\,\frac{d(H)}{n-1}\right)\left(1 - p_m\right)^{o(H)}$

$m(H, t)$: number of members of H in the population at generation t
$f(H, t)$: average fitness of the members of H; $\bar{f}(t)$: average fitness of the population
$p_c$, $p_m$: probability of crossover and mutation, respectively
Interpretation: fit, short, low-order schemata (or building blocks) grow exponentially.
41
ES: Theory
Convergence velocity of the (μ +, λ)-ES
[The slide gives the closed-form expression for the convergence velocity of (μ + λ)- and (μ, λ)-ES on the sphere model, written in terms of order-statistic selection probabilities and exponential/erf integrals; the two strategies differ only in a bound k_min appearing in the expression.]
42
EP: Theory (1)
Analysis of std. EP (Fogel)
Aims at giving a proof of convergence for the resulting algorithm.
Mutation: $x'_i = x_i + \sqrt{f(\vec{x})}\; N_i(0,1)$
Analysis of the special case EP(1, 0, q, 1):
• identical to a (1+1)-ES with $\sigma = \sqrt{f(\vec{x})}$
Objective function: simplified sphere model
$f_2(\vec{x}) = \sum_{i=1}^{n} x_i^2 = r^2$
43
EP: Theory (2)
Combination with the optimal standard deviation $\sigma^* \approx 1.224\, r / n$ on the sphere model $f_2$.
When the dimension n is increased, the performance is worse than that of an algorithm able to retain the optimal standard deviation.
[The slide gives the resulting closed-form convergence rate of the (1+1)-EP as a function of n and r, involving exponential and error-function (erf) terms.]
44
Breeder GP: Motivation for GP
Theory
In GP, parse trees of Lisp-like programs are used as chromosomes.
The performance of a program is evaluated by its training error, and program size tends to grow as training error decreases.
The eventual goal of learning is a small generalization error, but generalization error tends to increase as program size grows.
How can program growth be controlled?
45
Breeder GP: MDL-Based Fitness
Function
$F_i(g) = F(A_i \mid D) = \alpha\, E(D \mid A_i^g) + \beta\, C(A_i^g)$

$E(D \mid A_i^g)$: training error of neural tree $A_i$ on data set D
$C(A_i^g)$: structural complexity of neural tree $A_i$
$\alpha$, $\beta$: relative importance of each term
46
Breeder GP: Adaptive Occam
Method (Zhang et al., 1995)
$F_i(t) = E_i(t) + \alpha(t)\, C_i(t)$

$\alpha(t) = \begin{cases}
\dfrac{1}{N^2}\,\dfrac{E_{best}(t-1)}{C_{best}(t)} & \text{if } E_{best}(t-1) > \epsilon \\[2ex]
\dfrac{1}{N^2}\,\dfrac{1}{E_{best}(t-1)\, C_{best}(t)} & \text{otherwise}
\end{cases}$

$\epsilon$: desired performance level in error
$E_{best}(t-1)$: training error of the best program at generation t-1
$C_{best}(t)$: complexity of the best program at generation t
N: number of training examples
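A small Python sketch of this fitness computation, following the case distinction above; the numeric inputs are made up for illustration, and N is taken to be the number of training examples.

def adaptive_alpha(e_best_prev, c_best, n_examples, epsilon):
    # Adaptive Occam factor alpha(t).
    # e_best_prev: training error of the best program at generation t-1
    # c_best:      complexity of the best program at generation t
    if e_best_prev > epsilon:                  # error still above the desired level
        return e_best_prev / (n_examples ** 2 * c_best)
    return 1.0 / (n_examples ** 2 * e_best_prev * c_best)

def fitness(error, complexity, alpha):
    # F_i(t) = E_i(t) + alpha(t) * C_i(t)
    return error + alpha * complexity

alpha = adaptive_alpha(e_best_prev=0.12, c_best=35.0, n_examples=100, epsilon=0.01)
print(fitness(error=0.15, complexity=42.0, alpha=alpha))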
47
Bayesian Evolutionary Computation (1/2)
The best individual is defined as the most probable model of the data D given the prior knowledge:

$P(A \mid D) = \frac{P(D \mid A)\, P(A)}{P(D)} = \frac{P(D \mid A)\, P(A)}{\int P(D \mid A)\, P(A)\, dA} \approx \frac{P(D \mid A)\, P(A)}{\sum_{A \in \mathcal{A}(g)} P(D \mid A)\, P(A)}$

The objective of evolutionary computation is then to find the model A* that maximizes the posterior probability:
$A^* = \arg\max_A P(A \mid D)$
Bayes' theorem is used to estimate P(A|D) from a population $\mathcal{A}(g)$ of individuals at each generation.
48
Bayesian Evolutionary Computation (2/2)
Bayesian process: The BEAs attempt to explicitly
estimate the posterior distribution of the
individuals from their prior probability and
likelihood, and then sample offspring from the
distribution.
[Zhang, 99]
49
Canonical BEA
(Initialize) Generate $\mathcal{A}^0 = \{A_1^0, \ldots, A_M^0\}$ from the prior distribution $P_0(A)$. Set the generation count t <- 0.
(P-step) Estimate the posterior distribution $P_t(A \mid D)$ of the individuals in $\mathcal{A}^t$.
(V-step) Generate L variations $\mathcal{A}' = \{A'_1, \ldots, A'_L\}$ by sampling from $P_t(A \mid D)$.
(S-step) Select M individuals from $\mathcal{A}'$ into $\mathcal{A}^{t+1}$ based on $P_t(A'_i \mid D)$. Set the best individual found so far.
(Loop) If the termination condition is met, then stop. Otherwise, set t <- t+1 and go to (P-step).
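A schematic Python sketch of the P-, V-, and S-steps; the fitness-proportional weights used here are only a crude stand-in for a real posterior estimate, and the toy model space is an assumption.

import random

def canonical_bea(prior_sample, posterior_weight, vary, M=20, L=40, generations=50):
    # Schematic loop of the canonical BEA: estimate, vary (sample), select.
    population = [prior_sample() for _ in range(M)]           # (Initialize) from P0(A)
    for t in range(generations):
        weights = [posterior_weight(a) for a in population]   # (P-step) posterior surrogate
        offspring = [vary(random.choices(population, weights=weights)[0])
                     for _ in range(L)]                        # (V-step) sample L variations
        population = sorted(offspring, key=posterior_weight, reverse=True)[:M]  # (S-step)
    return max(population, key=posterior_weight)

# Toy instance: real-valued "models", posterior surrogate peaked at 3.0.
best = canonical_bea(prior_sample=lambda: random.uniform(-10, 10),
                     posterior_weight=lambda a: 1.0 / (1.0 + (a - 3.0) ** 2),
                     vary=lambda a: a + random.gauss(0, 0.5))
print(best)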
50
Basic Idea behind BEA
51
Evolving Neural Trees with BEA
Posterior probability of a neural tree A
$P(A \mid D) \propto P(D \mid A)\, P(A) = P(D \mid \mathbf{w}, k)\, p(\mathbf{w}, k) = P(D \mid \mathbf{w}, k)\, P(\mathbf{w} \mid k)\, P(k)$

Likelihood (Gaussian noise model over the N training cases):
$P(D \mid \mathbf{w}, k) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left(-\frac{\sum_{c=1}^{N}\bigl(y_c - f_{(\mathbf{w},k)}(\mathbf{x}_c)\bigr)^2}{2\sigma^2}\right)$

Prior $P(\mathbf{w} \mid k)\, P(k)$: a Gaussian prior over the weights $w_j$ combined with a prior penalizing the tree complexity k (the slide writes it with a factor involving $(k+3)!$).
52
Features of EAs
Evolutionary techniques are good for problems
that are ill-defined or difficult
Many different forms of representation
Many different types of EA
Leads to many different types of crossover,
mutation, etc.
Some problems with convergence, efficiency
However, they are able to solve a diverse range of
problems
53
Advantages of EAs
Efficient investigation of large search spaces
Quickly investigate a problem with a large number of
possible solutions
Problem independence
Can be applied to many different problems
Best suited to difficult combinatorial problems
54
Disadvantages of EAs
No guarantee for finding optimal solutions with a
finite amount of time: True for all global
optimization methods.
No complete theoretical basis (yet), but much progress is being made.
Parameter tuning is largely based on trial and error (genetic algorithms); solution: self-adaptation (evolution strategies).
Often computationally expensive; remedy: parallelism.
55
3. Applications
56
Application Fields (1)
Experimental optimization & optimization with
subjective evaluation
Coffee recipes; general food recipes
Biochemical fermentation processes
Wind tunnel experiments
Two-phase nozzle optimization experiments
Technical optimization
Design & Production
Logistics
Control of dynamic processes
57
Application Fields (2)
Structure optimization
Structure & parameters of plants
Connection structure & weights of neural nets
Number and thicknesses of layers in multilayer structures
Data Mining
Clustering (number & centers of clusters)
Fitting models to data
Time series prediction
58
Application Fields (3)
Path Planning
Traveling Salesman Problem
Robot Control
Evolutionary Robotics
Evolvable Hardware
59
Application Example 1
Hot Water Flashing Nozzle (1)
Hans-Paul Schwefel
performed the original
experiments
Start
Hot water entering
Steam and droplet at exit
At throat: Mach 1 and onset of flashing
60
Application Example 1
Hot Water Flashing Nozzle (2)
61
Application Example 2
Minimal Weight Truss Layout
Load
Start
922 kp
(LP minimum)
Optimum
738 kp
62
Application Example 3
Concrete Shell Roof
under own and outer load (snow and wind)
Spherical shape vs. optimal shape
Height 1.34 m, half span 5.00 m
Orthogonal bending strength
Savings: 36% shell thickness, 27% reinforcement
63
Application Example 4
Dipole Magnet Structure (1)
[Figure: a row of dipole magnets (N/S poles) above a field region of interest along the x-axis; the y-position of each magnet (e.g., point P1) may vary within a given y-range.]
64
Application Example 4
Dipole Magnet Structure (2)
Analysis of the magnetic field by the Finite Element Method (FEM)
Minimize sum of squared deviations from the
ideal
Individuals: Vectors of positions (y1, …, yn)
Middle: 9.8% better than upper graphic; bottom:
2.7% better
65
Application Example 5
Optical Multilayers (1)
[Figure: a stack of filter layers (varying thicknesses and materials) on a glass substrate; the plot shows the desired and the current reflection as a function of wavelength.]
Goal: Find a filter structure whose real reflection behavior matches the desired one as closely as possible.
Application Example 5
Optical Multilayers (2)
Problem parameters:
Thicknesses $d = (d_1, \ldots, d_n)$ of the layers
Layer materials of the n layers (integer values)
Number of layers n
A mixed-integer, variable-dimensional problem.
67
Application Example 5
Optical Multilayers (3)
Objective function:
$f(d) = \sum_{\lambda} \left( R(d, \lambda) - \tilde{R}(\lambda) \right)^2$

$R(d, \lambda)$: reflection of the actual filter for wavelength $\lambda$ (calculated according to the matrix method)
$\tilde{R}(\lambda)$: desired reflection value
68
Application Example 5
Optical Multilayers (4)
Example topology: Only layer thicknesses vary;
n=2
69
Application Example 5
Optical Multilayers (5)
Example structure:
70
Application Example 5
Optical Multilayers (6)
Parallel evolutionary algorithm
Per node: EA for mixed-integer representation
Isolation and migration of best individuals
Mutation of discrete variables: Fixed pm per population
71
Application Example 6
Circuit Design (1)
Difficulty of automated circuit design:
A very hard problem
Exponential in the number of components
More than 10^300 circuits with a mere 20 components
An important problem
Too few analog designers
There is an “Egg-shell” of analog circuitry around almost all
digital circuits
Analog circuits must be redesigned with each new generation of
process technology
No existing automated techniques
In contrast with digital
Existing analog techniques do only sizing of components, but do
not create the topology
72
Application Example 6
Circuit Design (2)
Development of a circuit with Genetic Programming
In
Out
Developing Circuit
73
Application Example 6
Circuit Design (3)
Each function in the circuit-constructing tree acts
on a part of the circuit and changes it in some way
(e.g. creates a capacitor, creates a parallel structure,
adds a connection to ground, etc)
A “writing head” points from each function to the
part of the circuit that the function will act on.
Each function inherits writing heads from its
parent in the tree
74
Application Example 6
Circuit Design (4)
Example of a circuit-constructing program tree
(LIST (C (-0.963 (- (- -0.875 -0.113) 0.880)) (series (flip end) (series (flip end) (L -0.277 end) end) (L (- -0.640 0.749) (L -0.123 end)))) (flip (nop (L -0.657 end)))))
[Figure: the same circuit-constructing program drawn as a tree, with root LIST and subtrees built from C, L, SERIES, FLIP, NOP, END nodes and numeric arguments.]
75
Application Example 7
Neural Network Design (1)
Introduction: EC for NNs
Preprocessing of Training Data
Feature selection
Training set optimization
Training of Network Weights
Non-gradient search
Optimization of Network Architecture
Topology adaptation
Pruning unnecessary connections/units
76
Application Example 7
Neural Network Design (2)
Encoding Schemes for NNs
Bit-strings
Properties of network structure are encoded as bitstrings.
Rules
Network configuration is specified by a graph-generation grammar.
Trees
Network is represented as “neural trees”.
77
Application Example 7
Neural Network Design (3)
Neural Tree Models
Neural trees
Tree-structured neural networks
Nonterminal nodes: neural units, $F = \{\Sigma, \Pi\}$
Terminal nodes: inputs, $T = \{x_1, x_2, \ldots, x_n\}$
Links: connection weights $w_{ij}$ from node j to node i
Layer of node i: length of the longest path to a terminal node of the subtrees of i
Types of units
Sigma unit: the sum of weighted inputs, $net_i = \sum_j w_{ij}\, y_j$
Pi unit: the product of weighted inputs, $net_i = \prod_j w_{ij}\, y_j$
Output of a neuron: sigmoid transfer function, $y_i = f(net_i) = \dfrac{1}{1 + e^{-net_i}}$
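A Python sketch of the two unit types and the sigmoid output; the weights and inputs are illustrative.

import math
from functools import reduce

def sigma_unit(weights, inputs):
    # net_i = sum_j w_ij * y_j
    return sum(w * y for w, y in zip(weights, inputs))

def pi_unit(weights, inputs):
    # net_i = prod_j w_ij * y_j
    return reduce(lambda acc, wy: acc * wy, (w * y for w, y in zip(weights, inputs)), 1.0)

def output(net):
    # y_i = f(net_i) = 1 / (1 + exp(-net_i))
    return 1.0 / (1.0 + math.exp(-net))

w, x = [0.5, -1.0, 2.0], [1.0, 0.3, 0.8]
print(output(sigma_unit(w, x)), output(pi_unit(w, x)))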
78
Application Example 7
Neural Network Design (4)
Tree-Structured Model
[Figure: an example tree-structured model with sigma units S1-S6, pi units P1-P2, weighted links w_ij, and input terminals x1-x5.]
79
Application Example 7
Neural Network Design (5)
Evolving Neural Networks by BEAs
[Flowchart: generate M trees from the prior distribution -> evaluate the fitness of the trees (posterior distribution) -> if the termination condition holds, STOP; otherwise create L new trees (model variation via variation operators), select the fitter trees (model selection), and repeat.]
80
Application Example 7
Neural Network Design (6)
Structural Adaptation by Subtree Crossover
Neuron type, topology, size and shape of networks are
adapted by crossover.
Neuron types and receptive fields are adapted by mutation.
Connection weights are adapted by an EC-based stochastic
hill-climbing.
81
Application Example 8
Fuzzy System Design (1)
Fuzzy system comprises
Fuzzy membership functions (MF)
Rules
Task is to tailor the MF and rules to get best
performance
Every change to MF affects the rules
Every change to the rules affects the MF
82
Application Example 8
Fuzzy System Design (2)
Solution is to design MF and rules simultaneously
Encode in chromosomes
Parameters of the MF
Associations and Certainty Factors in rules
Fitness is measured by performance of the Fuzzy
System
83
Application Example 8
Fuzzy System Design (3)
Evolutionary Computation and Fuzzy Systems
Fuzzy Sets (Zadeh): Points have Memberships
[Figure: fuzzy membership function for "COOL": membership value between 0.0 and 1.0 plotted over temperature (C), with the transition around 0 to 2.5 degrees.]
Evolutionary Computation can be Used to Optimize Fuzzy
Control Systems
Evolve Membership Functions/Rules
84
Application Example 8
Fuzzy System Design (4)
GA for Fuzzy Control of Cart
[Figure: cart-centering problem: a force F is applied to a cart of mass m with position x and velocity v; x and v are fuzzy-partitioned into NM, NS, ZE, PS, PM. The 5x5 rule table (the fuzzy control strategy after 100 generations) maps each (x, v) pair to an output force label such as NM or PM, with some entries left empty.]
NM=Negative Medium, NS=Negative Small, ZE=Zero, PS=Positive Small, PM=Positive Medium, _=No Entry
Defuzzification is just the weighted centroid.
Mutation: up/down by one. Two-point crossover on the string representing the 5x5 matrix.
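A Python sketch of weighted-centroid defuzzification; the membership degrees and output-set centroids in the example are illustrative values, not taken from the slide's rule table.

def weighted_centroid(rule_activations):
    # Defuzzify as the membership-weighted average of each fired rule's
    # output centroid: sum(mu_i * c_i) / sum(mu_i).
    num = sum(mu * c for mu, c in rule_activations)
    den = sum(mu for mu, _ in rule_activations)
    return num / den if den else 0.0

# Example: three fired rules with (membership degree, centroid of output set).
print(weighted_centroid([(0.2, -5.0), (0.7, 0.0), (0.4, 10.0)]))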
85
Application Example 9
Data Mining (1)
Data Mining Task
Types of problem to be solved:
Classification
Clustering
Dependence Modeling
etc., etc.
86
Application Example 9
Data Mining (2)
Basic Ideas of GAs in Data Mining
Candidate rules are represented as individuals of a
population
Rule quality is computed by a fitness function
Using task-specific knowledge
87
Application Example 9
Data Mining (3)
Classification with Genetic Algorithms - two ways to encode rules:
Each individual represents a complete rule set, i.e., an independent candidate solution
Each individual represents a single rule, and a set of individuals (or the entire population) together represents the candidate solution (rule set)
88
Application Example 9
Data Mining (4)
Individual Representation
In GABIL, an individual is a rule set, encoded as a bit string
[DeJong et al. 93]
It uses k bits for the k values of a categorical attribute
If all k bits of an attribute are set to 1 the attribute is not used by
the rule
Goal attribute: Buy furniture
Marital_status: Single/Married/Divorced
House:Own/Rented/University
Attributes: Marital_status | House | Buy?
The string 011 | 100 | 1 represents the rule
IF (Marital_status = Married or Divorced) AND (House = Own) THEN (Buy furniture = yes)
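A Python sketch that decodes the GABIL-style string above; the attribute value orderings follow the slide, while the helper names are hypothetical.

ATTRIBUTES = [("Marital_status", ["Single", "Married", "Divorced"]),
              ("House", ["Own", "Rented", "University"])]

def decode_rule(bits):
    # Decode one fixed-length rule: k bits per k-valued attribute + 1 class bit.
    # All-ones means the attribute is not used by the rule.
    conditions, pos = [], 0
    for name, values in ATTRIBUTES:
        chunk = bits[pos:pos + len(values)]
        pos += len(values)
        if "0" in chunk:                      # attribute actually constrains the rule
            allowed = [v for v, b in zip(values, chunk) if b == "1"]
            conditions.append(f"{name} in {allowed}")
    target = "yes" if bits[pos] == "1" else "no"
    return "IF " + " AND ".join(conditions or ["TRUE"]) + f" THEN Buy_furniture = {target}"

print(decode_rule("0111001"))   # the slide's example string 011 100 1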
89
Application Example 9
Data Mining (5)
An individual is a variable-length string
representing a set of fixed-length rules
rule 1
011 100 1
rule 2
101 110 0
Mutation: traditional bit inversion
Crossover: corresponding crossover points in the
two parents must semantically match
90
Application Example 10
Information Retrieval (1)
[Figure: text data passes through preprocessing and indexing and feeds three systems: a classification system (text classification), a DB template filling & information extraction system (producing DB records with fields such as Location and Date), and an information filtering system that uses a user profile and feedback to answer questions and return filtered data from the DB.]
91
Application Example 10
Information Retrieval (2)
Evolutionary Learning:
Applications in IR - Web-Document Retrieval
[Kim & Zhang, 2000]
[Figure: each chromosome is a vector of weights w1, w2, w3, ..., wn, one per HTML tag (<TITLE>, <H>, <B>, ..., <A>) and link information; retrieval performance with the weighted document representation provides the fitness.]
92
Application Example 10
Information Retrieval (3)
Evolutionary Learning: Applications in IR – Tag Weighting
Crossover: from chromosomes X = (x1, x2, x3, ..., xn) and Y = (y1, y2, y3, ..., yn), each offspring gene of chromosome Z is zi = (xi + yi)/2 with probability Pc.
Mutation: each value of chromosome X is changed with probability Pm, giving chromosome X'.
Truncation selection
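A Python sketch of the three operators described above; the Gaussian perturbation used for mutation and the truncation fraction are assumptions, since the slide does not specify them.

import random

def crossover(x, y, p_c=0.9):
    # z_i = (x_i + y_i) / 2 with probability P_c, otherwise copy x_i
    return [(xi + yi) / 2 if random.random() < p_c else xi for xi, yi in zip(x, y)]

def mutate(x, p_m=0.1, scale=0.05):
    # Change each tag weight with probability P_m (Gaussian perturbation assumed here).
    return [xi + random.gauss(0, scale) if random.random() < p_m else xi for xi in x]

def truncation_selection(population, fitness, fraction=0.5):
    # Keep the top fraction of chromosomes as parents for the next generation.
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]

# Toy usage: chromosomes are weight vectors, one weight per HTML tag.
pop = [[random.random() for _ in range(6)] for _ in range(10)]
parents = truncation_selection(pop, fitness=sum)   # 'sum' stands in for retrieval fitness
child = mutate(crossover(parents[0], parents[1]))
print(child)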
93
Application Example 10
Information Retrieval (4)
Evolutionary Learning : Applications in IR - Experimental Results
Datasets
TREC-8 Web Track Data
2GB, 247491 web documents (WT2g)
No. of training topics: 10, No. of test topics: 10
Results
94
Application Example 11
Time Series Prediction (1)
[Zhang & Joung, 2000]
[Flow: raw time series -> preprocessing -> evolutionary neural trees (ENTs, autonomous model discovery) -> combining the neural trees' outputs -> prediction.]
95
Application Example 11
Time Series Prediction (2)
Experimental setup for the far-infrared NH3 laser data

Design parameter            Value used
population size             200
max generation              50
crossover rate              0.95
mutation rate               0.1
no. of iterations           100
training set size           1000
test set size               1000
size of committee pool      10
no. of committee members    3
96
Application Example 11
Time Series Prediction (3)
Experimental Result
[Figure: two plots of x(t) for t = 1 to about 2000: the target function (top) and the computed approximation (bottom).]
97
Application Example 12
Traveling Salesman Problem (1)
This is a minimization problem
Given n cities, find the shortest route that visits each city exactly once
Want to minimize total distance traveled
Also must obey the “Visit Once” constraint
98
Application Example 12
Traveling Salesman Problem (2)
Encoding represents the order of cities to visit
Candidates scored by total distance traveled
[Figure: seven cities labeled 0-6 with a candidate tour connecting them.]
99
Application Example 12
Traveling Salesman Problem (3)
Traveling Salesman Problem (TSP)
A classic combinatorial optimization problem
A salesman seeks the shortest tour through n cities
NP-hard (its decision version is NP-complete)
100
Application Example 12
Traveling Salesman Problem (4)
Simple Example with the TSP
The “house-call problem”
• Problem: Doctor must visit patients once and only once and
return home in the shortest possible path
Difficulty: The number of possible routes
increases faster than exponentially
• 10 patients = 181,440 possible routes
• 22 patients = 10,000,000,000,000,000,000,000
101
Application Example 12
Traveling Salesman Problem (5)
Representation
Permutation Representation
• Cities are listed in the order in which they are visited
[3 2 5 4 7 1 6 9 0 8]  : tour 3-2-5-4-7-1-6-9-0-8
• Also called path representation or order representation
• Leads to illegal tours if the traditional one-point crossover operator is used
Random Keys Representation
• Encodes a solution with random numbers from (0, 1)
[0.23 0.82 0.45 0.74 0.87 0.11 0.56 0.69 0.78]
Position i in the list represents city i; the random number in position i determines the visiting order of city i in the tour
Sort the random keys in ascending order to get the tour: 6-1-3-7-8-4-9-2-5
This eliminates the infeasibility of the offspring
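A Python sketch of decoding a random-keys chromosome into a tour, using the slide's key vector; the tour_length helper (with an assumed distance lookup) shows how such a tour would be scored.

def decode_random_keys(keys):
    # Sort positions by their random key (ascending); position i stands for city i+1.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [i + 1 for i in order]

def tour_length(tour, dist):
    # dist is assumed to be a lookup such that dist[a][b] gives the distance a -> b.
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

keys = [0.23, 0.82, 0.45, 0.74, 0.87, 0.11, 0.56, 0.69, 0.78]
print(decode_random_keys(keys))   # -> [6, 1, 3, 7, 8, 4, 9, 2, 5]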
102
Application Example 13
Cooperating Robots (1)
What is Evolutionary Robotics?
An attempt to create robots which evolve using various
evolutionary computational methods
Evolve behaviors or competence modules implemented
in various manners: several languages, relay, neuro
chips, FPGA’s, etc.
“Intelligence is emergent”
Presently limited mostly to evolution of the robot's control software; however, some attempts to evolve hardware have begun.
GA and its variants used. Most current attempts center
around {population size=50 ~ 500, generations = 50 ~
500}
103
Application Example 13
Cooperating Robots (2)
Industrial Robots Autonomous Mobile Robots
OPEN !!
CLOSED
104
Application Example 13
Cooperating Robots (3)
Cooperating Autonomous Robots
105
Application Example 13
Cooperating Robots (4)
Why Build Cooperating Robots?
Increased scope for missions inherently
distributed in:
• Space
• Time
• Functionality
Increased reliability, robustness (through
redundancy)
Decreased task completion time (through
parallelism)
Decreased cost (through simpler individual
robot design)
106
Application Example 13
Cooperating Robots (5)
Cooperating Autonomous Robots: Application
domain
Mining
Construction
Planetary exploration
Automated manufacturing
Search and rescue missions
Cleanup of hazardous waste
Industrial/household maintenance
Nuclear power plant decommissioning
Security, surveillance, and reconnaissance
107
Application Example 14
Co-evolving Soccer Softbots (1)
Co-evolving
Soccer Softbots
With Genetic
Programming
At RoboCup there are two "leagues": the "real" robot
league and the "virtual" softbot league
How do you do this with GP?
GP breeding strategies: homogeneous and heterogeneous
Decision of the basic set of functions with which to evolve players
Creation of an evaluation environment for our GP individuals
108
Application Example 14
Co-evolving Soccer Softbots (2)
Initial Random Population
109
Application Example 14
Co-evolving Soccer Softbots (3)
Kiddie Soccer
110
Application Example 14
Co-evolving Soccer Softbots (4)
Learning to Block the Goal
111
Application Example 14
Co-evolving Soccer Softbots (5)
Becoming Territorial
112
Application Example 15
Evolvable Hardware (1)
EVOLVABLE HARDWARE
HARDWARE IMPLEMENTATION OF
EVOLVABLE SOFTWARE (i.e. GP)
Reconfigurable logic is too slow to make it worthwhile
113
Application Example 15
Evolvable Hardware (2)
FPGAs
Bad, because they are designed for
conventional electronic design
Good, because they can be abused and allow
the exploitation of the physics
WHAT WE NEED
FPMAs
Field Programmable Matter Arrays
114
Application Example 15
Evolvable Hardware (3)
What is an FPMA?
[Figure: a single piece of material on a chemical substrate, with wires defining regions to which a voltage may be applied.]
KEY REQUIREMENT: removing the voltage must cause the material to relax to its former state.
Can we, by evolving the voltages, configure the material to carry out a useful function?
115
Application Example 15
Evolvable Hardware (4)
XC6216 FPGA
116
Application Example 15
Evolvable Hardware (5)
The Evolvable Hardware Robot Controller
The logic functions are evolved and implemented in a RAM chip.
Each signal can either be synchronised to a clock of evolved frequency or passed straight through asynchronously, all under genetic control.
All evaluations are performed using the REAL HARDWARE.
117
Application Example 15
Evolvable Hardware (6)
118
4. Current Issues
119
Innovative Techniques for EC
Effective Operators
Novel Representation
Exploration/Exploitation
Population Sizing
Niching Methods
Dynamic Fitness Evaluation
Multi-objective Optimization
Co-evolution
Self-Adaptation
EA/NN/Fuzzy Hybrids
Distribution Estimation Algorithms
Parallel Evolutionary Algorithms
Molecular Evolutionary Computation
120
1000-Pentium Beowulf-Style
Cluster Computer for Parallel GP
121
Molecular Evolutionary
Computing
011001101010001
ATGCTCGAAGCT
122
Applications of EAs
Optimization
Machine learning
Data mining
Intelligent Agents
Bioinformatics
Engineering Design
Telecommunications
Evolvable Hardware
Evolutionary Robotics
123
5. References and URLs
124
Journals & Conferences
Journals:
Evolutionary Computation (MIT Press)
Trans. on Evolutionary Computation (IEEE)
Genetic Programming & Evolvable Hardware (Kluwer)
Evolutionary Optimization
Conferences:
Congress on Evolutionary Computation (CEC)
Genetic and Evolutionary Computation Conference
(GECCO)
Parallel Problem Solving from Nature (PPSN)
125
WWW Resources
• Hitch-Hiker’s Guide to Evolutionary Computation
• http://alife.santafe.edu/~joke/encore/www
FAQ for comp.ai.genetic
• Genetic Algorithms Archive
• http://www.aic.nrl.navy.mil/galist
Repository for GA related information, conferences, etc.
• EVONET European Network of Excellence on
Evolutionary Comp. : http://www.dcs.napier.ac.uk/evonet
• Genetic Programming Notebook
• http://www.geneticprogramming.com
software, people, papers, tutorial, FAQs
126
Books
• Bäck, Th., Evolutionary Algorithms in Theory and Practice, Oxford University
Press, New York, 1996.
• Mitchell, M., An Introduction to Genetic Algorithms, MIT Press, Cambridge,
MA, 1996.
• Fogel, D., Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence, IEEE Press, NJ, 1995.
• Schwefel, H-P., Evolution and Optimum Seeking, Wiley, New York, 1995.
• Koza, J., Genetic Programming, MIT Press, Cambridge, MA, 1992.
• Goldberg, D., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
• Holland, J., Adaptation in Natural and Artificial Systems, University of
Michigan Press, Ann Arbor, 1975.
• Rechenberg, I., Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution, Frommann-Holzboog Verlag,
Stuttgart, 1973.
• Fogel, L.J., Owens, A.J, and Walsh, M.J., Artificial Intelligence through
Simulated Evolution, John Wiley, NY, 1966.
127
For more information:
http://bi.snu.ac.kr/