slides - Indico
Download
Report
Transcript slides - Indico
Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India
1
2
Introduction to evolutionary computation
Gene Expression Programming (GEP)
Application of GEP for Event Selection
Conclusion
Liliana Teodorescu, Brunel University
3
Evolutionary computation simulates the natural evolution on a computer
process leading to maintenance or increase of a population
ability to survive and reproduce in a specific environment
quantitatively measured by evolutionary fitness
Goal of natural evolution - to generate a population of individuals with
increasing fitness
Goal of evolutionary computation - to generate a set of solutions (to a
problem) of increasing quality
Liliana Teodorescu, Brunel University
4
Individual – candidate solution to a problem
decoding
encoding
Chromosome – representation of the candidate solution
Gene – constituent entity of the chromosome
Population – set of individuals/chromosomes
Fitness function – representation of how good a candidate solution is
Genetic operators – operators applied on chromosomes in order
to create genetic variation (other chromosomes)
Liliana Teodorescu, Brunel University
5
Natural evolution simulation - core of the evolutionary algorithms:
Basic evolutionary algorithm
Run
Start
Initial population creation (randomly)
Fitness evaluation (of each chromosome)
New generation
Problem definition
Encoding of the
candidate solution
Fitness definition
Run
Decoding the best fitted
chromosome = solution
yes
Terminate?
Stop
no
Selection of individuals (proportional with fitness)
Reproduction (genetic operators)
Replacement of the current population with the new one
Liliana Teodorescu, Brunel University
6
Genetic Algorithms (GA) (J. H. Holland, 1975)
Genetic Programming (GP) (J. R. Koza, 1992)
Gene Expression Programming (GEP) (C. Ferreira, 2001)
Main differences
Encoding method
Reproduction method
Liliana Teodorescu, Brunel University
7
search for the computer program that solve the problem (as GP)
works with two entities: chromosomes and expression trees
Encoding
Candidate solution represented
by an expression tree (ET)
Q
( a b ) (c d )
*
Q*-+abcd
+
a
ET encoded in a chromosome:
read ET left - right
and top - down
b
c
Q means sqrt
d
Decoding the chromosome (translates the chromosome in an ET)
•first line of ET (root) – first element of the chromosome
•next line of ET – as many arguments needed by the element in
the previous line
Liliana Teodorescu, Brunel University
8
Reproduction
Genetic operators applied on chromosoms not on ET =>
always produce sintactically correct structures!
Recombination – exchanges parts of two chromosomes
Mutation – changes the value of a node
Transposition – moves a part of the chromosome to another location
in the same chromosome
e.g. Mutation: Q replaced with *
*b+a-aQab+//+b+babbabbbababbaaa
*
+
b
-
a
a
*b+a-aQab+//+b+babbabbbababbaaa
*
+
b
-
a
Q
a
a
*
a
b
Liliana Teodorescu, Brunel University
9
Chromosome – has one or more genes of equal length
Gene – head: contains both functions and terminals (length h)
- tail: contains only terminals (length t)
t=h(n-1)+1
n – number of arguments of the function
with the highest number of arguments
e.g. set of functions: Q,*,/,-,+
set of terminals: a,b
n=2; h=15 (choosen)
t =16 => length of gene=15+16=31
*
+
b
-
a
a
*b+a-aQab+//+b+babbabbbababbaaa
ET ends before the end of the gene!
Q
a
Liliana Teodorescu, Brunel University
10
GEP for event selection
cuts/selection criteria finding
classification problem (signal/background classification)
statistical learning approach
Data samples:
Monte-Carlo simulation from BaBar experiment
Ks production in e+e- (~10 GeV)
5000 training events (for classification rule extraction)
5000 test events (others than training events)
limitations imposed by APS 3.0
S/N = 0.25, 1, 5
Software resources
APS 3.0 (Automatic Problem Solver)
- commercial package (Windows based)
- www.gepsoft.com
- function finding, classification, time series analysis
Liliana Teodorescu, Brunel University
11
Functions and constants to be used in the classification rules (cut type rule)
10 functions
- AND1 (x<0 and y<0 => 1 else 0), AND2 (x0 and y0 => 1 else 0)
- OR1 (x<0 or y<0 => 1 else 0), OR2 (x0 or y0 => 1 else 0)
- <, >, <=, >=, =, !=
36 functions
-previous 10 functions + common mathematical functions
constants
- floating point constants (-10,10)
Data – variables usually used in a cut based analysis for K S selection
- doca (distance of closest approach)
- |cos(hel)|
- Fsig (Flight Significance)
- Mass (KS reconstructed mass)
- RXY, |RZ| (region around interaction point)
- SFL (Signed Flight Length)
- Pchi (2 probability of the vertex)
GEP parameters
- fitness function: number of hits (events correctly classified)
- gene length (head = 1-20)
- no. of chromosomes per generation: 100
- no. of generations per run: 1000-20000
- genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1
Liliana Teodorescu, Brunel University
12
Data sample: S/N =0.25
10 functions
No. of genes = 1, Head length =10
Fsig 5.26,
Rxy < 0.19,
doca <1,
Pchi > 0
Classification Accuracy = 95.36%
Liliana Teodorescu, Brunel University
13
Data samples: S/N =0.25, 1, 5
10 functions
95%
S/N = 0.25
5%
92%
S/N = 1
92%
S/N = 5
8%
8%
Liliana Teodorescu, Brunel University
14
Data sample: S/N =0.25
10 functions
Head
Acc. (%)
Selection criteria
1
83.34
Fsig 9.93
2
90.76
Fsig 8.80, doca <1
3
94.88
Fsig > 3.67, Rxy Pchi
4
94.88
Fsig > 3.67, Rxy Pchi
5
95.04
Fsig 3.63, |Rz| 2.65, Rxy < Pchi
7
95.04
Fsig 3.64, Rxy < Pchi, Pchi > 0
10
95.360
Fsig 5.26, Rxy < 0.19, doca <1, Pchi > 0
20
95.50
Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0,
doca > 0, Rxy Mass
Liliana Teodorescu, Brunel University
15
GEP
20
Cut-based analysis
95.50
Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0,
doca > 0, Rxy Mass
No cut
Fsig 4.0
Rxy 0.2cm
SFL 0cm
Pchi > 0.001
doca 0.4cm
|Rz| 2.8cm
Previous cuts
+
doca > 0
Rxy Mass
Reduction
S: 15%
B: 98%
Reduction
S: 16%
B: 98.3%
Fsig 4.1
Rxy 0.2cm
SFL 0.2cm
Pchi > 0
Previous cuts
+
doca 0.4
| Rz| 2.8cm
Liliana Teodorescu, Brunel University
16
GEP
5
Reduction
S: 7.6%
B: 87.8%
95.04
Fsig 3.63, |Rz| 2.65, Rxy < Pchi
Fsig 3.63
|Rz| 2.65cm
Reduction
S: 16%
B: 97.8%
Fsig 3.63
|Rz| 2.65cm
Rxy<Pchi
Liliana Teodorescu, Brunel University
17
Data samples: S/N =0.25, 1, 5
10 functions
S/N = 0.25
S/N = 5
S/N = 1
Liliana Teodorescu, Brunel University
18
Data sample: S/N =0.25
36 mathematical functions
No. of genes = 1, Head length =10
Classification Accuracy = 95.00%
Liliana Teodorescu, Brunel University
19
Data sample: S/N =0.25
36 functions
Classification Accuracy 95%
Liliana Teodorescu, Brunel University
20
GEP allows
fast identification of powerful cuts
signal/background separation of 92-95% accuracy
for samples with S/N = 0.25, 1, 5
potential of discovering new correlations between
variables
large number of selection functions does not improve
the classification accuracy
GEP
is still in the R&D phase
needs software development -> underway
Liliana Teodorescu, Brunel University