Machine Learning for Information Retrieval


Evolutionary Computation
Byoung-Tak Zhang
School of Computer Science and Engineering
Seoul National University
E-mail: [email protected]
http://bi.snu.ac.kr/~btzhang/
This material is also available online at http://bi.snu.ac.kr/
Outline
1. Basic Concepts
2. Theoretical Backgrounds
3. Applications
4. Current Issues
5. References and URLs
1. Basic Concepts
Charles Darwin (1859)
“Owing to this struggle for life, any variation, however
slight and from whatever cause proceeding, if it be in any
degree profitable to an individual of any species, in its
infinitely complex relations to other organic beings and to
external nature, will tend to the preservation of that
individual, and will generally be inherited by its
offspring.”
Evolutionary Algorithms
• A Computational Model Inspired by Natural Evolution and Genetics
• Proved Useful for Search, Machine Learning, and Optimization
• Population-Based Search (vs. Point-Based Search)
• Probabilistic Search (vs. Deterministic Search)
• Collective Learning (vs. Individual Learning)
• Balance of Exploration (Global Search) and Exploitation (Local Search)
Biological Terminology
• Gene
  - Functional entity that codes for a specific feature, e.g. eye color
  - Set of possible alleles
• Allele
  - Value of a gene, e.g. blue, green, brown
  - Codes for a specific variation of the gene/feature
• Locus
  - Position of a gene on the chromosome
• Genome
  - Set of all genes that define a species
  - The genome of a specific individual is called its genotype
  - The genome of a living organism is composed of several chromosomes
• Population
  - Set of competing genomes/individuals
Analogy to Evolutionary Biology
• Individual (Chromosome) = Possible Solution
• Population = A Collection of Possible Solutions
• Fitness = Goodness of Solutions
• Selection (Reproduction) = Survival of the Fittest
• Crossover = Recombination of Partial Solutions
• Mutation = Alteration of an Existing Solution
The Evolution Loop
1. Initialize population
2. Evaluate
3. Select mating partners
4. Recombine
5. Mutate
6. Evaluate
7. Select survivors; repeat from step 3 until termination
Basic Evolutionary Algorithm
1. Generation of initial solutions (a priori knowledge, results of an earlier run, or random)
2. Evaluation
3. Solution sufficiently good? If YES, end; if NO, continue
4. Selection
5. Generation of variants by mutation and crossover; return to step 2
Procedure
begin
  t <- 0;
  initialize P(t);
  evaluate P(t);
  while (not termination condition) do
    recombine P(t) to yield C(t);
    evaluate C(t);
    select P(t+1) from P(t) and C(t);
    t <- t + 1;
  end
end
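The procedure maps directly onto code. The following minimal Python sketch (illustrative only: the OneMax fitness, bitstring length, and rates are arbitrary choices, not from the slides) implements the same loop for bitstrings:

import random

def evolve(fitness, n_bits=20, pop_size=30, p_c=0.9, p_m=0.01, generations=100):
    # initialize P(0) with random bitstrings
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for t in range(generations):
        # recombine P(t) to yield C(t): one-point crossover of random parent pairs
        children = []
        while len(children) < pop_size:
            a, b = random.sample(pop, 2)
            if random.random() < p_c:
                cut = random.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # bit-flip mutation with probability p_m per bit
            children += [[x ^ (random.random() < p_m) for x in c] for c in (a, b)]
        # select P(t+1) from P(t) and C(t): keep the pop_size fittest
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

best = evolve(fitness=sum)   # OneMax: fitness = number of 1-bits
print(best)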
General Structure of EAs
• Solutions are encoded as chromosomes, e.g. bitstrings such as 1100101010, 1011101110, 0011011001, 1100110001
• Crossover exchanges substrings between pairs of chromosomes, e.g. 110010|1010 and 101110|1110 yield 110010|1110 and 101110|1010
• Mutation flips single bits, e.g. 0011011001 becomes 0011001001
• Decoded solutions are evaluated by fitness computation
• Selection (e.g. roulette wheel) forms the new population from the fitter chromosomes
Population and Fitness
[Figure: a population of eight individuals with fitness values 6, 4, 6, 2, 8, 4, 6, 6]
Selection, Crossover, and Mutation
[Figure: one generation of evolution on the population above; reproduction, crossover, and mutation produce offspring, with the best fitness rising to 10]
Simulated Evolution
Population (chromosomes) → decoded strings → evaluation (fitness) → selection (mating pool) → parents → genetic operators (mating, manipulation) → offspring → new generation
Selection Strategies
• Proportionate Selection
  - Reproduce offspring in proportion to fitness $f_i$:
    $p_s(a_i^t) = f(a_i^t) \,/\, \sum_{j=1}^{\lambda} f(a_j^t)$
• Ranking Selection
  - Select individuals according to $\mathrm{rank}(f_i)$.
• Tournament Selection
  - Choose q individuals at random, the best of which survives (see the code sketch after this list):
    $p_s(a_i^t) = \frac{1}{\lambda^q}\left[(\lambda - i + 1)^q - (\lambda - i)^q\right]$
• Other Ways
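Tournament selection in particular is a one-liner. A minimal sketch (the tournament size q, population, and OneMax fitness are placeholders, not from the slides):

import random

def tournament_select(pop, fitness, q=3):
    # choose q individuals uniformly at random; the best of them survives
    return max(random.sample(pop, q), key=fitness)

pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
parent = tournament_select(pop, fitness=sum)   # OneMax fitness for illustration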
Roulette Wheel Selection
• Selection is a stochastic process
• Probability of reproduction: $p_i = f_i \,/\, \sum_k f_k$
• Example intermediate parent population: 01011, 11010, 10001, 10001
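In code, one spin of the roulette wheel can be sketched as follows (a sketch that assumes strictly positive fitness values):

import random

def roulette_select(pop, fitness):
    # p_i = f_i / sum_k f_k : spin the wheel once
    total = sum(fitness(ind) for ind in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for ind in pop:
        acc += fitness(ind)
        if acc >= r:
            return ind
    return pop[-1]   # guard against floating-point round-off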
Genetic Operators for Bitstrings
• Reproduction: make copies of a chromosome (the fitter the chromosome, the more copies)
  10000100 → 10000100, 10000100
• Crossover: exchange subparts of two chromosomes
  100|00100 and 111|11111 → 10011111 and 11100100
• Mutation: randomly flip some bits
  00000100 → 00000000
Mutation
• For a binary string, just randomly “flip” a bit
• For a more complex structure, randomly select a site, delete the structure associated with this site, and randomly create a new sub-structure
• Some EAs use only mutation (no crossover)
• Normally, however, mutation is used to search the “local search space” by allowing small changes in the genotype (and therefore, hopefully, in the phenotype)
Recombination (Crossover)
• Crossover is used to swap (fit) parts of individuals, in a way similar to sexual reproduction
• Parents are selected based on fitness
• Crossover sites are selected with some probability (randomly, although other mechanisms exist)
• Parts of the parents are exchanged to produce children
Crossover
One-point crossover (cut after position 4):
  parent A: 1101|0    offspring A: 1101|1
  parent B: 1000|1    offspring B: 1000|0
Two-point crossover (cuts after positions 2 and 4):
  parent A: 11|01|0   offspring A: 11|00|0
  parent B: 10|00|1   offspring B: 10|01|1
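Both variants take only a few lines of Python (a sketch; parents are assumed to be equal-length bitstrings, here given as strings):

import random

def one_point(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b):
    i, j = sorted(random.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

print(one_point("11010", "10001"))   # e.g. ('11011', '10000') for cut = 4
print(two_point("11010", "10001"))   # e.g. ('11000', '10011') for cuts 2, 4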
2. Theoretical Backgrounds
Major Evolutionary Algorithms
• Genetic Algorithms
• Evolution Strategies
• Evolutionary Programming
• Genetic Programming
• Classifier Systems
• Hybrids: BGA
The variants differ in:
• Genetic representation of candidate solutions
• Genetic operators
• Selection scheme
• Problem domain
Variants of Evolutionary Algorithms
• Genetic Algorithm (Holland et al., 1960s): bitstrings, mainly crossover, proportionate selection
• Evolution Strategy (Rechenberg et al., 1960s): real values, mainly mutation, truncation selection
• Evolutionary Programming (Fogel et al., 1960s): FSMs, mutation only, tournament selection
• Genetic Programming (Koza, 1990): trees, mainly crossover, proportionate selection
• Hybrids: BGA (Muehlenbein et al., 1993), BGP (Zhang et al., 1995), and others
Evolution Strategy (ES)
• Problem of real-valued optimization: find the extremum (minimum) of a function $F(X): R^n \to R$
• Operates directly on the real-valued vector X
• Generates new solutions through Gaussian mutation of all components
• Selection mechanism for determining the new parents
ES: Representation
One individual:
$\vec a = (\vec x, \vec\sigma, \vec\alpha) \in I = R^n \times R_+^{n_\sigma} \times R^{n_\alpha}$
The three parts of an individual:
• $\vec x$: object variables, with fitness $f(\vec x)$
• $\vec\sigma$: standard deviations (variances)
• $\vec\alpha$: rotation angles (covariances)
ES: Operator - Recombination
$r = (r_x, r_\sigma, r_\alpha)$, where $r_x, r_\sigma, r_\alpha \in \{-, d, D, i, I, g, G\}$, e.g. $r_{dII}$:

$x'_i = \begin{cases}
x_{S,i} & \text{no recombination } (-) \\
x_{S,i} \text{ or } x_{T,i} & \text{discrete } (r_d) \\
x_{S,i} \text{ or } x_{T_i,i} & \text{panmictic discrete } (r_D) \\
x_{S,i} + (x_{T,i} - x_{S,i})/2 & \text{intermediate } (r_i) \\
x_{S,i} + (x_{T_i,i} - x_{S,i})/2 & \text{panmictic intermediate } (r_I) \\
x_{S,i} + \chi\,(x_{T,i} - x_{S,i}) & \text{generalized intermediate } (r_g) \\
x_{S,i} + \chi_i\,(x_{T_i,i} - x_{S,i}) & \text{panmictic generalized intermediate } (r_G)
\end{cases}$
ES: Operator - Mutation
$m_{\{\sigma,\alpha\}}: I \to I$ is an asexual operator.
• Case $n_\sigma = n$, $n_\alpha = n(n-1)/2$ (correlated mutations):
  $\sigma'_i = \sigma_i \cdot \exp(\tau' \cdot N(0,1) + \tau \cdot N_i(0,1))$
  $\alpha'_j = \alpha_j + \beta \cdot N_j(0,1)$
  $\vec x\,' = \vec x + \vec N(\vec 0, C(\vec\sigma\,', \vec\alpha\,'))$
  with $\tau' \propto (\sqrt{2n})^{-1}$, $\tau \propto (\sqrt{2\sqrt{n}})^{-1}$, $\beta \approx 0.0873$
• Case $1 < n_\sigma < n$, $n_\alpha = 0$:
  $\sigma'_i = \sigma_i \cdot \exp(\tau' \cdot N(0,1) + \tau \cdot N_i(0,1))$
  $x'_i = x_i + \sigma'_i \cdot N_i(0,1)$
• Case $n_\sigma = 1$, $n_\alpha = 0$:
  $\sigma' = \sigma \cdot \exp(\tau_0 \cdot N(0,1))$
  $x'_i = x_i + \sigma' \cdot N_i(0,1)$
  with $\tau_0 \propto 1/\sqrt{n}$
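For the common case $n_\sigma = n$, $n_\alpha = 0$, self-adaptive mutation can be sketched in Python as follows (a sketch using the learning rates above with proportionality constant 1; these constants are a choice, not fixed by the slides):

import math, random

def es_mutate(x, sigma):
    # case n_sigma = n, n_alpha = 0: one step size per coordinate
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    common = tau_prime * random.gauss(0, 1)          # global N(0,1), drawn once
    new_sigma = [s * math.exp(common + tau * random.gauss(0, 1)) for s in sigma]
    new_x = [xi + si * random.gauss(0, 1) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma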
ES: Illustration of Mutation
[Figure: hyperellipsoids - lines of equal probability density for placing an offspring]
ES: Evolution Strategy vs. Genetic Algorithm
[Figure: two loops compared. ES: create random initial population → evaluate population → vary → selection → insert into population. GA: create random initial population → evaluate population → select individuals for variation → vary → insert into population.]
Evolutionary Programming (EP)
• Original form (L. J. Fogel)
  - Uniform random mutations
  - Discrete alphabets
  - $(\mu + \mu)$ selection
• Extended Evolutionary Programming (D. B. Fogel)
  - Continuous parameter optimization
  - Similarities with ES: normally distributed mutation, self-adaptation of mutation parameters
EP: Representation
• Constraints for domain and variances are used by initialization only; the operators do not observe them
  - The search space is principally unconstrained.
• Individual:
  $\vec a = (\vec x, \vec\sigma) \in I = A_x \times A_\sigma$,
  with $A_x \subseteq R^n$ and $A_\sigma = \emptyset$ (std. EP) or $A_\sigma \subseteq R_+^n$ (meta-EP)
EP: Operator - Recombination
• No recombination
  - Gaussian mutation does better (Fogel and Atmar), though not in all situations
• Evolutionary biology view:
  - The role of crossover is often overemphasized
  - Mutation enhances evolutionary optimization
  - Crossover segregates defects
  - This contrasts with the main point of view of researchers in the field of Genetic Algorithms
EP: Operator - Mutation
• General form (std. EP):
  $x'_i = x_i + \sqrt{\beta_i f(\vec x) + \gamma_i} \cdot N_i(0,1)$
  The exogenous parameters $\beta_i, \gamma_i$ must be tuned for a particular task
• Usually $x'_i = x_i + \sqrt{f(\vec x)} \cdot N_i(0,1)$
• Problems:
  - If the global minimum's fitness value is not zero, an exact approach is not possible
  - If fitness values are very large, the search is almost a random walk
  - If the user does not know the approximate position of the global minimum, parameter tuning is not possible
EP: Differences from ES
• Procedures for self-adaptation
• Deterministic vs. probabilistic selection
• $(\mu, \lambda)$-ES vs. $(\mu + \mu)$-EP
• Level of abstraction: individual vs. species
Genetic Programming
• Applies principles of natural selection to computer search and optimization problems; has advantages over other procedures for “badly behaved” solution spaces [Koza, 1992]
• Genetic programming uses variable-size tree representations rather than fixed-length strings of binary values
• Program tree = S-expression = LISP parse tree
• Tree = Functions (Nonterminals) + Terminals
GP: Representation
S-expression: (+ 1 2 (IF (> TIME 10) 3 4))
Terminals = {1, 2, 3, 4, 10, TIME}
Functions = {+, >, IF}
[Parse tree: + at the root with children 1, 2, and IF; IF has children (> TIME 10), 3, and 4]
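A sketch of how such a program tree is evaluated (nested tuples stand in for the LISP parse tree, and TIME is bound in an environment; this rendering is an assumption for illustration, not Koza's implementation):

def eval_tree(node, env):
    # terminals: numbers or named variables
    if not isinstance(node, tuple):
        return env.get(node, node)
    op, *args = node
    vals = [eval_tree(a, env) for a in args]
    if op == '+':  return sum(vals)
    if op == '>':  return vals[0] > vals[1]
    if op == 'IF': return vals[1] if vals[0] else vals[2]
    raise ValueError(op)

tree = ('+', 1, 2, ('IF', ('>', 'TIME', 10), 3, 4))
print(eval_tree(tree, {'TIME': 12}))   # 1 + 2 + 3 = 6
print(eval_tree(tree, {'TIME': 5}))    # 1 + 2 + 4 = 7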
GP: Operator - Crossover
[Figure: two parent trees exchange randomly chosen subtrees; the selected subtrees are cut out and swapped, producing two offspring trees]
GP: Operator - Mutation
[Figure: a randomly chosen subtree is deleted and replaced by a newly generated random subtree]
Breeder GP (BGP) [Zhang and Muehlenbein, 1993, 1995]
• GA (bitstring) + ES (real-vector) → Breeder GA (BGA): real-vector + bitstring (Muehlenbein et al., 1993)
• BGA + GP (tree) → Breeder GP (BGP): tree + real-vector + bitstring (Zhang et al., 1993)
GAs: Theory of Simple GA
• Assumptions
  - Bitstrings of fixed size
  - Proportionate selection
• Definitions
  - Schema H: a set of substrings (e.g., H = 1**0)
  - Order o: number of fixed positions (FP) (e.g., o(H) = 2)
  - Defining length d: distance between leftmost FP and rightmost FP (e.g., d(H) = 3)
GAs: Schema Theorem (Holland, 1975)
$m(H, t+1) \ge m(H, t)\,\frac{f(H,t)}{\bar f(t)}\left[1 - p_c\,\frac{d(H)}{n-1}\right](1 - p_m)^{o(H)}$
$m(H, t)$: number of members of H at generation t ($n$: string length)
$p_c, p_m$: probability of crossover and mutation, respectively
Interpretation: fit, short, low-order schemata (or building blocks) grow exponentially.
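To make the growth factor concrete, here is a worked example with illustrative numbers (not from the slides): with $f(H,t)/\bar f(t) = 1.25$, $p_c = 0.6$, $d(H) = 2$, string length $n = 11$, $p_m = 0.01$, and $o(H) = 2$, the factor is $1.25 \cdot (1 - 0.6 \cdot 2/10) \cdot 0.99^2 \approx 1.08$, so H is expected to gain about 8% more members per generation, compounding exponentially while these conditions hold.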
ES: Theory
• Convergence velocity of the (μ +, λ)-ES
[Equations: closed-form expressions for the convergence velocity on the sphere model, built from selection probabilities $p_{k \to k'}$ (with $k_{min} = 0$ for a $(\mu + \lambda)$-ES and $k_{min} = -\infty$ for a $(\mu, \lambda)$-ES) and progress integrals involving exponential and error-function terms]
EP: Theory (1)
• Analysis of std. EP (Fogel)
  - Aims at giving a proof of convergence for the resulting algorithm
• EP(1, 0, q, ·)
  - Mutation: $x'_i = x_i + \sqrt{f(\vec x)} \cdot N_i(0,1)$
• Analysis of a special case EP(1, 0, q, 1)
  - Identical to a (1+1)-ES having $\sigma = \sqrt{f(\vec x)}$
• Objective function
  - Simplified sphere model: $\tilde f_2(\vec x) = \sum_{i=1}^n x_i^2 = r^2$
EP: Theory (2)
• Combining the mutation rule with the optimal standard deviation $\sigma^* = 1.224\, r / n$ gives $\sqrt{\tilde f_2(\vec x)} = r = \sigma^* n / 1.224$
• When the dimension is increased, the performance is worse than that of an algorithm able to retain the optimal SD
[Equations: the convergence rate of a (1+1)-EP with $\sigma \propto r$, expressed through exponential and error-function terms]
Breeder GP: Motivation for GP Theory
• In GP, parse trees of Lisp-like programs are used as chromosomes.
• Performance of programs is evaluated by training error, and program size tends to grow as training error decreases.
• The eventual goal of learning is a small generalization error, and generalization error tends to increase as program size grows.
• How to control the program growth?
Breeder GP: MDL-Based Fitness Function
$F_i(g) = F(A_i \mid D) = \alpha\, E(D \mid A_i^g) + \beta\, C(A_i^g)$
$E(D \mid A_i^g)$: training error of neural tree A for data set D
$C(A_i^g)$: structural complexity of neural tree A
$\alpha, \beta$: relative importance of each term
Breeder GP: Adaptive Occam Method (Zhang et al., 1995)
$F_i(t) = E_i(t) + \alpha(t)\, C_i(t)$
$\alpha(t) = \begin{cases} \dfrac{1}{N^2} \dfrac{E_{best}(t-1)}{\hat C_{best}(t)} & \text{if } E_{best}(t-1) > \epsilon \\[2mm] \dfrac{1}{N^2} \dfrac{1}{E_{best}(t-1)\, \hat C_{best}(t)} & \text{otherwise} \end{cases}$
$\epsilon$: desired performance level in error
$E_{best}(t-1)$: training error of the best program at generation t-1
$\hat C_{best}(t)$: complexity of the best program at generation t
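A minimal sketch of this schedule in Python (the helper names are hypothetical; N, epsilon, and the best-of-generation statistics are supplied by the surrounding EA, and this is not the paper's code):

def occam_alpha(E_best_prev, C_best, N, eps):
    # alpha(t) trades training error against structural complexity
    if E_best_prev > eps:                          # error still above target
        return E_best_prev / (N**2 * C_best)
    return 1.0 / (N**2 * E_best_prev * C_best)     # error small: prune harder

def occam_fitness(E_i, C_i, alpha):
    return E_i + alpha * C_i                       # F_i(t) = E_i(t) + alpha(t) * C_i(t)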
Bayesian Evolutionary Computation (1/2)
• The best individual is defined as the most probable model of the data D given the a priori knowledge:
  $P(A \mid D) = \dfrac{P(D \mid A)\, P(A)}{P(D)} = \dfrac{P(D \mid A)\, P(A)}{\int P(D \mid A)\, P(A)\, dA} \approx \dfrac{P(D \mid A)\, P(A)}{\sum_{A \in \mathbf{A}(g)} P(D \mid A)\, P(A)}$
• The objective of evolutionary computation is to find the model A* that maximizes the posterior probability:
  $A^* = \arg\max_A P(A \mid D)$
• Bayes' theorem is used to estimate P(A|D) from a population A(g) of individuals at each generation.
Bayesian Evolutionary Computation (2/2)
• Bayesian process: the BEAs attempt to explicitly estimate the posterior distribution of the individuals from their prior probability and likelihood, and then sample offspring from that distribution. [Zhang, 99]
Canonical BEA
1. (Initialize) Generate $\mathbf{A}^0 = \{A_1^0, \ldots, A_M^0\}$ from the prior distribution $P_0(A)$. Set the generation count $t \leftarrow 0$.
2. (P-step) Estimate the posterior distribution $P_t(A \mid D)$ of the individuals in $\mathbf{A}^t$.
3. (V-step) Generate L variations $\mathbf{A}' = \{A'_1, \ldots, A'_L\}$ by sampling from $P_t(A \mid D)$.
4. (S-step) Select M individuals from $\mathbf{A}'$ into $\mathbf{A}^{t+1}$ based on $P_t(A'_i \mid D)$. Keep the best individual.
5. (Loop) If the termination condition is met, stop. Otherwise, set $t \leftarrow t+1$ and go to the P-step.
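The loop can be sketched as follows. This is a toy rendering under two explicit assumptions: the posterior is approximated over the current population only (unnormalized likelihood × prior), and `vary` stands for problem-specific variation operators; neither is claimed to be the paper's implementation.

import random

def bea(sample_prior, likelihood, prior, vary, M=50, L=100, T=100):
    post = lambda a: likelihood(a) * prior(a)             # unnormalized posterior
    pop = [sample_prior() for _ in range(M)]              # A^0 ~ P_0(A)
    for t in range(T):
        weights = [post(a) for a in pop]                  # P-step
        variants = [vary(random.choices(pop, weights)[0]) # V-step: sample, then vary
                    for _ in range(L)]
        pop = sorted(variants, key=post, reverse=True)[:M]  # S-step
    return max(pop, key=post)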
Basic Idea behind BEA
Evolving Neural Trees with BEA
• Posterior probability of a neural tree A with weights $\mathbf w$ and size k:
  $P(A \mid D) \propto P(D \mid A)\, P(A) = P(D \mid \mathbf w, k)\, P(\mathbf w \mid k)\, P(k)$
  $\propto \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\!\left(-\frac{\sum_{c=1}^{N} (y_c - f_{(\mathbf w, k)}(\mathbf x_c))^2}{2\sigma^2}\right) \cdot \left(\frac{1}{2\pi}\right)^{(k+1)/2} \exp\!\left(-\frac{\sum_{j=1}^{k+1} w_j^2}{2}\right) \cdot \frac{\lambda^{k+3}\, e^{-\lambda}}{(k+3)!}$
  i.e. a Gaussian likelihood over the N training cases, a Gaussian prior over the weights, and a Poisson-type prior over the model size.
Features of EAs
• Evolutionary techniques are good for problems that are ill-defined or difficult
• Many different forms of representation
• Many different types of EA
• This leads to many different types of crossover, mutation, etc.
• Some problems with convergence and efficiency
• However, they are able to solve a diverse range of problems
Advantages of EAs
• Efficient investigation of large search spaces
  - Can quickly investigate a problem with a large number of possible solutions
• Problem independence
  - Can be applied to many different problems
• Best suited to difficult combinatorial problems
Disadvantages of EAs
• No guarantee of finding optimal solutions in a finite amount of time: true for all global optimization methods.
• No complete theoretical basis yet, but much progress is being made.
• Parameter tuning is largely based on trial and error (genetic algorithms); solution: self-adaptation (evolution strategies).
• Often computationally expensive: parallelism helps.
3. Applications
Application Fields (1)
• Experimental optimization & optimization with subjective evaluation
  - Coffee recipes; general food recipes
  - Biochemical fermentation processes
  - Wind tunnel experiments
  - Two-phase nozzle optimization experiments
• Technical optimization
  - Design & production
  - Logistics
  - Control of dynamic processes
Application Fields (2)
• Structure optimization
  - Structure & parameters of plants
  - Connection structure & weights of neural nets
  - Number and thicknesses of layers in multilayer structures
• Data mining
  - Clustering (number & centers of clusters)
  - Fitting models to data
  - Time series prediction
Application Fields (3)
• Path planning
  - Traveling Salesman Problem
• Robot control
  - Evolutionary robotics
  - Evolvable hardware
Application Example 1: Hot Water Flashing Nozzle (1)
Hans-Paul Schwefel performed the original experiments.
[Figure: the starting nozzle geometry; hot water enters, steam and droplets exit; at the throat: Mach 1 and onset of flashing]
Application Example 1: Hot Water Flashing Nozzle (2)
Application Example 2: Minimal Weight Truss Layout
[Figure: truss under load; starting layout 922 kp (LP minimum), optimum 738 kp]
Application Example 3: Concrete Shell Roof
Under its own and outer load (snow and wind); height 1.34 m, half span 5.00 m.
[Figure: spherical shape vs. optimal shape; objective: max |m| → min for the orthogonal bending strength]
Savings: 36% shell thickness, 27% reinforcement.
Application Example 4: Dipole Magnet Structure (1)
[Figure: a stack of n dipole magnets with alternating N/S poles; the y-positions are variable within a y-range, and the field is evaluated over a region of interest in the x-y plane around point P1]
Application Example 4: Dipole Magnet Structure (2)
• Analysis of the magnetic field by finite element analysis (FEM)
• Minimize the sum of squared deviations from the ideal field
• Individuals: vectors of positions (y1, ..., yn)
• Results: the middle layout is 9.8% better than the upper one; the bottom layout is another 2.7% better
Application Example 5: Optical Multilayers (1)
[Figure: filter layers (thicknesses, materials) on a glass substrate; current vs. desired reflection as a function of wavelength]
Goal: find a filter structure such that the real reflection behavior matches the desired one as closely as possible.
Application Example 5: Optical Multilayers (2)
Problem parameters:
• Thicknesses $d = (d_1, \ldots, d_n)$ of the layers
• Layer materials $\eta = (\eta_1, \ldots, \eta_n)$ (integer values)
• Number of layers n
A mixed-integer, variable-dimensional problem.
Application Example 5: Optical Multilayers (3)
Objective function:
$f(d, \eta) = \int \left[ R(d, \eta, \lambda) - \tilde R(\lambda) \right]^2 d\lambda$
$R(d, \eta, \lambda)$: reflection of the actual filter for wavelength $\lambda$, calculated according to the matrix method
$\tilde R(\lambda)$: desired reflection value
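Discretized over a grid of wavelengths, the objective becomes a sum of squared deviations. A minimal sketch (the `reflection` callable stands in for the matrix-method computation, which is not shown here, and `desired` for the target curve; both names are hypothetical):

def filter_error(d, eta, wavelengths, reflection, desired):
    # f(d, eta) ~ sum over lambda of (R(d, eta, lambda) - R~(lambda))^2
    return sum((reflection(d, eta, lam) - desired(lam)) ** 2
               for lam in wavelengths)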
Application Example 5: Optical Multilayers (4)
• Example topology: only layer thicknesses vary; n = 2
Application Example 5: Optical Multilayers (5)
• Example structure
Application Example 5: Optical Multilayers (6)
• Parallel evolutionary algorithm
  - Per node: an EA for the mixed-integer representation
  - Isolation and migration of the best individuals
  - Mutation of discrete variables: fixed pm per population
Application Example 6: Circuit Design (1)
Difficulty of automated circuit design:
• A very hard problem
  - Exponential in the number of components
  - More than 10^300 circuits with a mere 20 components
• An important problem
  - Too few analog designers
  - There is an “egg-shell” of analog circuitry around almost all digital circuits
  - Analog circuits must be redesigned with each new generation of process technology
• No existing automated techniques
  - In contrast with digital design
  - Existing analog techniques do only sizing of components; they do not create the topology
Application Example 6: Circuit Design (2)
Development of a circuit with genetic programming
[Figure: developing circuit between input and output terminals]
Application Example 6: Circuit Design (3)
• Each function in the circuit-constructing tree acts on a part of the circuit and changes it in some way (e.g. creates a capacitor, creates a parallel structure, adds a connection to ground, etc.)
• A “writing head” points from each function to the part of the circuit that the function will act on.
• Each function inherits writing heads from its parent in the tree.
Application Example 6: Circuit Design (4)
Example of a circuit-constructing program tree:
(LIST (C (- 0.963 (- (- -0.875 -0.113) 0.880)) (series (flip end) (series (flip end) (L -0.277 end) end) (L (- -0.640 0.749) (L -0.123 end)))) (flip (nop (L -0.657 end))))
[Figure: the same program drawn as a tree of LIST, C, SERIES, FLIP, NOP, L, END, and arithmetic nodes]
Application Example 7: Neural Network Design (1)
Introduction: EC for NNs
• Preprocessing of training data
  - Feature selection
  - Training set optimization
• Training of network weights
  - Non-gradient search
• Optimization of network architecture
  - Topology adaptation
  - Pruning unnecessary connections/units
Application Example 7: Neural Network Design (2)
Encoding schemes for NNs:
• Bitstrings: properties of the network structure are encoded as bitstrings
• Rules: the network configuration is specified by a graph-generation grammar
• Trees: the network is represented as “neural trees”
Application Example 7: Neural Network Design (3)
Neural tree models
• Neural trees
  - Tree-structured neural networks
  - Nonterminal nodes: neural units, $F = \{\Sigma, \Pi\}$
  - Terminal nodes: inputs $T = \{x_1, x_2, \ldots, x_n\}$
  - Links: connection weights $w_{ij}$ from j to i
  - Layer of node i: path length of the longest path to a terminal node of the subtrees of i
• Types of units
  - Sigma unit: the sum of weighted inputs, $net_i = \sum_j w_{ij} y_j$
  - Pi unit: the product of weighted inputs, $net_i = \prod_j w_{ij} y_j$
  - Output of a neuron: sigmoid transfer function $y_i = f(net_i) = \dfrac{1}{1 + e^{-net_i}}$
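A sketch of evaluating such a neural tree (nodes as nested tuples with weights attached to child links; the node encoding and all names here are illustrative choices, not from the slides):

import math

def eval_neural_tree(node, x):
    kind, payload = node                  # kind: 'sigma', 'pi', or 'input'
    if kind == 'input':
        return x[payload]                 # payload holds the input index
    acts = [w * eval_neural_tree(child, x) for w, child in payload]
    net = sum(acts) if kind == 'sigma' else math.prod(acts)
    return 1.0 / (1.0 + math.exp(-net))   # sigmoid transfer function

# y = sigmoid(w1*x1 + w2*x2) for a single sigma unit:
tree = ('sigma', [(0.5, ('input', 0)), (-1.2, ('input', 1))])
print(eval_neural_tree(tree, [1.0, 0.3]))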
Application Example 7: Neural Network Design (4)
[Figure: a tree-structured model with sigma units S1-S6 and pi units P1, P2, weighted links w11, w12, ..., and inputs x1-x5 at the leaves]
Application Example 7: Neural Network Design (5)
Evolving neural networks by BEAs:
1. Generate M trees (prior distribution)
2. Evaluate the fitness of the trees (posterior distribution)
3. If the termination condition holds, stop
4. Otherwise create L new trees (model variation by the variation operators), select the fitter trees (model selection), and repeat from step 2
Application Example 7: Neural Network Design (6)
• Structural adaptation by subtree crossover: neuron type, topology, size, and shape of the networks are adapted by crossover.
• Neuron types and receptive fields are adapted by mutation.
• Connection weights are adapted by an EC-based stochastic hill-climbing.
Application Example 8: Fuzzy System Design (1)
• A fuzzy system comprises
  - Fuzzy membership functions (MF)
  - Rules
• The task is to tailor the MF and rules to get the best performance
• Every change to the MF affects the rules
• Every change to the rules affects the MF
Application Example 8: Fuzzy System Design (2)
• The solution is to design the MF and rules simultaneously
• Encode in chromosomes:
  - Parameters of the MF
  - Associations and certainty factors in the rules
• Fitness is measured by the performance of the fuzzy system
Application Example 8: Fuzzy System Design (3)
Evolutionary computation and fuzzy systems
• Fuzzy sets (Zadeh): points have memberships
[Figure: fuzzy membership function for “cool”, falling from membership 1.0 to 0.0 over a temperature range]
• Evolutionary computation can be used to optimize fuzzy control systems
  - Evolve membership functions/rules
Application Example 8: Fuzzy System Design (4)
GA for fuzzy control of a cart
[Figure: cart centering with force F, mass m, velocity v, position x; fuzzy partitions NM, NS, ZE, PS, PM over x and v, and the 5x5 rule matrix of the fuzzy control strategy after 100 generations]
• NM = Negative Medium, NS = Negative Small, ZE = Zero, PS = Positive Small, PM = Positive Medium, _ = No Entry
• Defuzzification is just the weighted centroid
• Mutation moves an entry up/down by one; two-point crossover on the string representing the 5x5 matrix
Application Example 9: Data Mining (1)
• Data mining task
  - Types of problem to be solved: classification, clustering, dependence modeling, etc.
Application Example 9: Data Mining (2)
• Basic ideas of GAs in data mining
  - Candidate rules are represented as individuals of a population
  - Rule quality is computed by a fitness function
  - Task-specific knowledge is used
Application Example 9: Data Mining (3)
• Classification with genetic algorithms
  - Either each individual represents a rule set, i.e. an independent candidate solution,
  - or each individual represents a single rule, and a set of individuals (or the entire population) represents a candidate solution (rule set)
Application Example 9: Data Mining (4)
• Individual representation
  - In GABIL, an individual is a rule set, encoded as a bit string [DeJong et al. 93]
  - It uses k bits for the k values of a categorical attribute
  - If all k bits of an attribute are set to 1, the attribute is not used by the rule
  - Goal attribute: Buy furniture
  - Marital_status: Single/Married/Divorced; House: Own/Rented/University
  - The string “011 100 1” (Marital_status, House, Buy?) represents the rule:
    IF (Marital_status = Married or Divorced) AND (House = Own) THEN (Buy furniture = yes)
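A sketch of decoding and matching one GABIL-style rule against an example (attribute names follow the slide; the helper itself is hypothetical, not from GABIL):

ATTRS = [("Marital_status", ["Single", "Married", "Divorced"]),
         ("House", ["Own", "Rented", "University"])]

def rule_matches(rule_bits, example):
    # one bit per categorical value; a 1-bit means that value is allowed
    pos = 0
    for name, values in ATTRS:
        bits = rule_bits[pos:pos + len(values)]
        pos += len(values)
        if bits[values.index(example[name])] != 1:
            return False
    return True

rule = [0, 1, 1, 1, 0, 0]   # "011 100": (Married or Divorced) and (House = Own)
print(rule_matches(rule, {"Marital_status": "Married", "House": "Own"}))  # True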
Application Example 9: Data Mining (5)
• An individual is a variable-length string representing a set of fixed-length rules, e.g.
  rule 1: 011 100 1, rule 2: 101 110 0
• Mutation: traditional bit inversion
• Crossover: corresponding crossover points in the two parents must semantically match
Application Example 10: Information Retrieval (1)
[Figure: text data flows through preprocessing and indexing into three systems: a classification system (text classification), a DB template filling & information extraction system (DB records with fields such as Location and Date), and an information filtering system (user profile, question/answer, filtered data, feedback, DB)]
Application Example 10: Information Retrieval (2)
Evolutionary learning: applications in IR - web-document retrieval [Kim & Zhang, 2000]
• Retrieval uses link information and HTML tag information (<TITLE>, <H>, <B>, ..., <A>)
• Chromosomes encode tag weight vectors w1 w2 w3 ... wn; fitness is computed from retrieval performance
Application Example 10: Information Retrieval (3)
Evolutionary learning: applications in IR - tag weighting
• Crossover: offspring chromosome Z is built from parents X = (x1, ..., xn) and Y = (y1, ..., yn) by zi = (xi + yi) / 2 with probability Pc
• Mutation: chromosome X becomes X' by changing each value with probability Pm
• Truncation selection
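The two operators on real-valued weight vectors, sketched in Python (Pc, Pm, and the Gaussian mutation noise are placeholder choices; the slides do not specify the mutation distribution):

import random

def crossover(x, y, p_c=0.8):
    # averaging crossover: z_i = (x_i + y_i) / 2 with probability Pc
    return [(xi + yi) / 2 if random.random() < p_c else xi
            for xi, yi in zip(x, y)]

def mutate(x, p_m=0.05, scale=0.1):
    # perturb each weight with probability Pm
    return [xi + random.gauss(0, scale) if random.random() < p_m else xi
            for xi in x]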
Application Example 10: Information Retrieval (4)
Evolutionary learning: applications in IR - experimental results
• Datasets
  - TREC-8 Web Track data: 2 GB, 247,491 web documents (WT2g)
  - No. of training topics: 10; no. of test topics: 10
• Results
Application Example 11: Time Series Prediction (1)
[Zhang & Joung, 2000]
[Figure: raw time series → preprocessing → autonomous model discovery with evolutionary neural trees (ENTs) → combining neural tree outputs → prediction]
Application Example 11: Time Series Prediction (2)
• Experimental setup for the far-infrared NH3 laser

  Design Parameters           Values Used
  population size             200
  max generation              50
  crossover rate              0.95
  mutation rate               0.1
  no. of iterations           100
  training set size           1000
  test set size               1000
  size of committee pool      10
  no. of committee members    3
Application Example 11: Time Series Prediction (3)
Experimental result
[Figure: the target function x(t) and the computed approximation over t = 1 to 2000]
Application Example 12: Traveling Salesman Problem (1)
• This is a minimization problem
• Given n cities, what is the shortest route through the cities, visiting each city exactly once?
• We want to minimize the total distance traveled
• We must also obey the “visit once” constraint
Application Example 12: Traveling Salesman Problem (2)
• The encoding represents the order in which the cities are visited
• Candidates are scored by total distance traveled
[Figure: an example tour through cities 0-6]
Application Example 12: Traveling Salesman Problem (3)
• Traveling Salesman Problem (TSP)
  - A combinatorial optimization problem
  - A salesman seeks the shortest tour through n cities
  - An NP-complete problem
Application Example 12: Traveling Salesman Problem (4)
A simple example with the TSP: the “house-call problem”
• Problem: a doctor must visit patients once and only once and return home in the shortest possible path
• Difficulty: the number of possible routes increases faster than exponentially
  - 10 patients = 181,440 possible routes (9!/2)
  - 22 patients = about 10,000,000,000,000,000,000,000
Application Example 12: Traveling Salesman Problem (5)
• Representation
  - Permutation representation
    - Cities are listed in the order in which they are visited, e.g. [3 2 5 4 7 1 6 9 0 8] : 3-2-5-4-7-1-6-9-0-8
    - Also called path representation or order representation
    - Leads to illegal tours if the traditional one-point crossover operator is used
  - Random keys representation
    - Encodes a solution with random numbers from (0, 1), e.g. [0.23 0.82 0.45 0.74 0.87 0.11 0.56 0.69 0.78]
    - Position i in the list represents city i; the random number in position i determines the visiting order of city i in the tour
    - Sort the random keys in ascending order to get the tour: 6-1-3-7-8-4-9-2-5
    - Eliminates the infeasibility of the offspring
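Decoding a random-keys chromosome is a single sort, as this sketch shows (city numbering as in the example above):

def decode_random_keys(keys):
    # city i gets key keys[i-1]; visit cities in ascending key order
    return sorted(range(1, len(keys) + 1), key=lambda i: keys[i - 1])

keys = [0.23, 0.82, 0.45, 0.74, 0.87, 0.11, 0.56, 0.69, 0.78]
print(decode_random_keys(keys))   # [6, 1, 3, 7, 8, 4, 9, 2, 5]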
Application Example 13: Cooperating Robots (1)
• What is evolutionary robotics?
  - An attempt to create robots which evolve, using various evolutionary computational methods
  - Evolve behaviors or competence modules implemented in various manners: several languages, relays, neuro chips, FPGAs, etc.
  - “Intelligence is emergent”
  - Presently limited mostly to evolution of the robot's control software, although some attempts to evolve hardware have begun
  - GAs and their variants are used; most current attempts center around population size = 50-500 and generations = 50-500
Application Example 13: Cooperating Robots (2)
[Figure: industrial robots vs. autonomous mobile robots]
Application Example 13: Cooperating Robots (3)
Cooperating autonomous robots
Application Example 13: Cooperating Robots (4)
• Why build cooperating robots?
  - Increased scope for missions inherently distributed in space, time, and functionality
  - Increased reliability and robustness (through redundancy)
  - Decreased task completion time (through parallelism)
  - Decreased cost (through simpler individual robot design)
Application Example 13: Cooperating Robots (5)
• Cooperating autonomous robots: application domains
  - Mining
  - Construction
  - Planetary exploration
  - Automated manufacturing
  - Search and rescue missions
  - Cleanup of hazardous waste
  - Industrial/household maintenance
  - Nuclear power plant decommissioning
  - Security, surveillance, and reconnaissance
Application Example 14: Co-evolving Soccer Softbots (1)
Co-evolving soccer softbots with genetic programming
• At RoboCup there are two “leagues”: the “real” robot league and the “virtual” softbot league
• How do you do this with GP?
  - GP breeding strategies: homogeneous and heterogeneous
  - Decision on the basic set of functions with which to evolve players
  - Creation of an evaluation environment for the GP individuals
Application Example 14: Co-evolving Soccer Softbots (2)
• Initial random population
Application Example 14: Co-evolving Soccer Softbots (3)
• Kiddie soccer
Application Example 14: Co-evolving Soccer Softbots (4)
• Learning to block the goal
Application Example 14: Co-evolving Soccer Softbots (5)
• Becoming territorial
Application Example 15: Evolvable Hardware (1)
• Hardware implementation of evolvable software (i.e. GP)
  - Reconfigurable logic is too slow to make it worthwhile
Application Example 15: Evolvable Hardware (2)
FPGAs
• Bad, because they are designed for conventional electronic design
• Good, because they can be abused and allow the exploitation of the physics
What we need: FPMAs (Field Programmable Matter Arrays)
Application Example 15: Evolvable Hardware (3)
What is an FPMA?
[Figure: a single piece of material on a chemical substrate, with wires leading to regions to which voltages may be applied]
• Key requirement: removing the voltage must cause the material to relax to its former state
• Can we, by evolving the voltages, configure the material to carry out a useful function?
Application Example 15: Evolvable Hardware (4)
• XC6216 FPGA
Application Example 15: Evolvable Hardware (5)
The evolvable hardware robot controller
• The logic functions are evolved and implemented in a RAM chip
• A clocking mechanism allows a signal to be either synchronised to a clock of evolved frequency, or passed straight through asynchronously, all under genetic control
• All evaluations are performed using the REAL HARDWARE
Application Example 15: Evolvable Hardware (6)
4. Current Issues
Innovative Techniques for EC
• Effective Operators
• Novel Representation
• Exploration/Exploitation
• Population Sizing
• Niching Methods
• Dynamic Fitness Evaluation
• Multi-objective Optimization
• Co-evolution
• Self-Adaptation
• EA/NN/Fuzzy Hybrids
• Distribution Estimation Algorithms
• Parallel Evolutionary Algorithms
• Molecular Evolutionary Computation
1000-Pentium Beowulf-Style Cluster Computer for Parallel GP
Molecular Evolutionary Computing
[Figure: a bitstring (011001101010001) mapped to a DNA sequence (ATGCTCGAAGCT)]
Applications of EAs
• Optimization
• Machine Learning
• Data Mining
• Intelligent Agents
• Bioinformatics
• Engineering Design
• Telecommunications
• Evolvable Hardware
• Evolutionary Robotics
5. References and URLs
Journals & Conferences
• Journals:
  - Evolutionary Computation (MIT Press)
  - IEEE Transactions on Evolutionary Computation
  - Genetic Programming and Evolvable Machines (Kluwer)
  - Evolutionary Optimization
• Conferences:
  - Congress on Evolutionary Computation (CEC)
  - Genetic and Evolutionary Computation Conference (GECCO)
  - Parallel Problem Solving from Nature (PPSN)
WWW Resources
• Hitch-Hiker’s Guide to Evolutionary Computation: http://alife.santafe.edu/~joke/encore/www
  - FAQ for comp.ai.genetic
• Genetic Algorithms Archive: http://www.aic.nrl.navy.mil/galist
  - Repository for GA-related information, conferences, etc.
• EvoNet, European Network of Excellence on Evolutionary Computing: http://www.dcs.napier.ac.uk/evonet
• Genetic Programming Notebook: http://www.geneticprogramming.com
  - Software, people, papers, tutorials, FAQs
Books
• Bäck, Th., Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York, 1996.
• Mitchell, M., An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA, 1996.
• Fogel, D., Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, NJ, 1995.
• Schwefel, H.-P., Evolution and Optimum Seeking, Wiley, New York, 1995.
• Koza, J., Genetic Programming, MIT Press, Cambridge, MA, 1992.
• Goldberg, D., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
• Holland, J., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
• Rechenberg, I., Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog Verlag, Stuttgart, 1973.
• Fogel, L.J., Owens, A.J., and Walsh, M.J., Artificial Intelligence through Simulated Evolution, John Wiley, NY, 1966.
For more information:
http://bi.snu.ac.kr/