Using data sets to simulate evolution within complex environments

Bruce Edmonds
Centre for Policy Modelling
Manchester Metropolitan University
Main Issue
• Does the complexity of the environment
significantly affect evolutionary processes?
• Where “complexity” means that there are
exploitable patterns in the environment but
these are difficult to discover
• Adding randomness to an environment
and/or fitness is not satisfactory
• NK model of fitness adjusts the difficulty of
a fitness space (second order uniformity)
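As a point of comparison for the NK bullet above, here is a minimal sketch of an NK-style fitness function. It assumes the standard formulation (each of N loci contributes a random value that depends on its own state and on K other loci; fitness is the mean contribution). The function name `make_nk_landscape` and the lazily filled tables are illustrative choices, not something taken from the talk.

```python
import random

def make_nk_landscape(n, k, rng):
    """Return a fitness function over bit tuples of length n (NK-style sketch)."""
    # Each locus i depends on itself plus k other randomly chosen loci.
    neighbours = [[i] + rng.sample([j for j in range(n) if j != i], k)
                  for i in range(n)]
    tables = [dict() for _ in range(n)]  # random contributions, filled on demand

    def fitness(genome):
        total = 0.0
        for i in range(n):
            key = tuple(genome[j] for j in neighbours[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()  # contribution drawn uniformly from [0, 1)
            total += tables[i][key]
        return total / n  # mean contribution over all loci

    return fitness

rng = random.Random(0)
smooth = make_nk_landscape(n=8, k=0, rng=rng)  # K = 0: loci contribute independently
rugged = make_nk_landscape(n=8, k=7, rng=rng)  # K = N - 1: maximally interdependent
genome = tuple(rng.randint(0, 1) for _ in range(8))
print(smooth(genome), rugged(genome))
```

Raising K makes the landscape more rugged, but every region of it is rugged in the same statistical way, which is one reading of the "second order uniformity" noted above.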
Using data sets to simulate evolution, Bruce Edmonds, Complexity of Evolutionary Processes, Manchester, June 13th 2011, slide 2
Idea of Talk
• Evolutionary data-mining is where ideas
from biological evolution are applied to
data-mining – finding patterns in data
• Data sets exist for the purpose of testing
different ML algorithms; these data sets have
patterns in them, albeit difficult to discover
• Reversing this... I am suggesting the use of
complex data sets as a test bed to
investigate how the complexity of the
environment might affect evolution
The Data Set Environment
• Find a rich data set (preferably one derived from a
naturally complex system) with many independent
variables
• The gene of an individual is an arbitrary arithmetic
expression stored as a tree (or similar technique)
• Resource in the model is distributed to those
individuals that predict the outcome variable of local
data better than their competitors
• The genes are mutated and crossed as the
simulation progresses
• Individuals are selected for/against depending on
their total success in predicting
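The gene representation described above can be sketched as follows. This is a minimal illustration, not the model's actual code: the nested-tuple encoding, the protected division, the operator set, and the mutation and crossover rules are all assumptions, and `VARIABLES` is just a small subset of column names chosen for the example.

```python
import operator
import random

VARIABLES = ["fbs", "restecg", "oldpeak"]  # illustrative subset of column names

def safe_div(a, b):
    # Protected division (an assumption, common in genetic programming).
    return a / b if abs(b) > 1e-9 else 1.0

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": safe_div}

def random_tree(rng, max_depth):
    """Grow a random expression; a leaf is a variable name or a constant."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(VARIABLES)
        return round(rng.uniform(-1, 1), 2)
    return (rng.choice(list(OPS)),
            random_tree(rng, max_depth - 1),
            random_tree(rng, max_depth - 1))

def evaluate(tree, row):
    """Evaluate an expression tree against one data row (a dict of values)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]
    return tree  # numeric constant

def depth(tree):
    """Depth of the tree; a single leaf has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0

def mutate(tree, rng, p=0.2):
    """With probability p replace this node by a fresh subtree, else recurse."""
    if rng.random() < p:
        return random_tree(rng, 2)
    if isinstance(tree, tuple):
        return (tree[0], mutate(tree[1], rng, p), mutate(tree[2], rng, p))
    return tree

def random_subtree(tree, rng):
    """Walk down random branches and return the subtree reached."""
    while isinstance(tree, tuple) and rng.random() < 0.5:
        tree = tree[rng.choice((1, 2))]
    return tree

def crossover(a, b, rng):
    """Copy of a with one randomly chosen node replaced by a subtree of b."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return random_subtree(b, rng)
    if rng.random() < 0.5:
        return (a[0], crossover(a[1], b, rng), a[2])
    return (a[0], a[1], crossover(a[2], b, rng))

gene = ("+", ("+", "restecg", "fbs"), 1)  # the expression restecg+fbs+1
row = {"fbs": 1.0, "restecg": 2.0, "oldpeak": 0.5}
print(evaluate(gene, row), depth(gene))
```

Tuples are immutable, so mutation and crossover always build new trees and parents are never altered in place.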
Cleveland Heart Disease Data
• 281 Data Points
• 13 Diagnostic variables: age, sex, cp (chest
pain), trestbps (resting blood pressure),
chol (cholesterol), fbs (fasting blood sugar),
restecg (resting ECG type), thalach (max
heart rate), exang (exercise-induced
angina), oldpeak (ST depression induced
by exercise), slope (slope of the peak exercise
ST segment), ca (number of blood vessels), thal
• Predicts severity of Heart Attack (0-4)
The Evolutionary Model I
[Figure: Data Space, with example values 1.1, 3.7, 0.8]
• For each data point (or a random subset of them)
evaluate (a random selection of) near individuals
to determine the share of fitness each receive
(depending on predictive success)
• Individuals each with genes composed of an
arithmetic expression to predict HD based on the
other 13 variables
• Sum of fitness determines which breed and die
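One way to read the fitness-sharing step on this slide is sketched below. The slide does not give the exact sharing rule, so this assumes a winner-takes-all rule: for each data point, the nearby individual with the smallest absolute prediction error receives one unit of fitness, and ties are split equally. The dictionary layout (`pos`, `inputs`, `target`, `predict`) and the name `share_fitness` are hypothetical.

```python
import math

def share_fitness(data_points, individuals, radius):
    """Return the summed fitness of each individual, in population order."""
    fitness = [0.0] * len(individuals)
    for point in data_points:
        # Individuals within `radius` of the data point compete for it.
        near = [i for i, ind in enumerate(individuals)
                if math.dist(ind["pos"], point["pos"]) <= radius]
        if not near:
            continue
        errors = {i: abs(individuals[i]["predict"](point["inputs"]) - point["target"])
                  for i in near}
        best = min(errors.values())
        winners = [i for i in near if errors[i] == best]
        for i in winners:
            fitness[i] += 1.0 / len(winners)  # assumption: ties split the unit equally
    return fitness

points = [
    {"pos": (0.2, 0.2), "inputs": {"ca": 0}, "target": 0},
    {"pos": (0.8, 0.8), "inputs": {"ca": 0}, "target": 2},
]
individuals = [
    {"pos": (0.25, 0.2), "predict": lambda row: row["ca"]},  # gene "ca"
    {"pos": (0.2, 0.25), "predict": lambda row: 1},          # gene "1"
    {"pos": (0.8, 0.75), "predict": lambda row: 1},          # gene "1", elsewhere
]
print(share_fitness(points, individuals, radius=0.1))
```

In this toy case the constant gene "1" earns nothing where it competes with a better predictor, but earns a full unit where it is the only individual near a data point, which shows why locality matters for what survives.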
The Evolutionary Model II
[Figure: Data Space, with individuals labelled by values 15.5, 12.5, 17.6, 8.6, 9.0, 3.2, 23.7, 12.3, 8.1]
N times:
1. probabilistically select
a winner on fitness
2. probabilistically select
a loser on lack of
fitness
3. kill loser
Then either:
4. propagate winner locally with possible
mutation, or
5. mate with another local based on fitness
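The loop above can be sketched roughly as follows. Several details are assumptions rather than facts from the slides: selection weights are proportional to fitness (winner) and to the gap below the current best fitness (loser), the child takes the loser's slot at a position near the winner, the child starts with zero fitness, and mating is chosen with probability `p_mate` when a local partner exists. The `mutate` and `crossover` arguments stand for whatever gene operators the model uses.

```python
import math
import random

def pick(weights, rng):
    """Index drawn with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def evolve(population, n_events, rng, mutate, crossover, radius=0.1, p_mate=0.5):
    """Run n_events birth/death events in place (needs at least 2 individuals)."""
    eps = 1e-9  # keeps every candidate's weight positive; fitness assumed >= 0
    for _ in range(n_events):
        fits = [ind["fitness"] for ind in population]
        w = pick([f + eps for f in fits], rng)               # 1. select a winner
        top = max(fits)
        l = pick([0.0 if i == w else top - f + eps           # 2. select a loser
                  for i, f in enumerate(fits)], rng)
        winner = population[w]
        local = [ind for i, ind in enumerate(population)
                 if i not in (w, l)
                 and math.dist(ind["pos"], winner["pos"]) <= radius]
        if local and rng.random() < p_mate:                  # 5. mate with a local
            mate = local[pick([ind["fitness"] + eps for ind in local], rng)]
            gene = crossover(winner["gene"], mate["gene"], rng)
        else:                                                # 4. propagate winner
            gene = mutate(winner["gene"], rng)
        # The child lands near the winner, clamped to the unit square.
        pos = tuple(min(1.0, max(0.0, c + rng.uniform(-radius, radius)))
                    for c in winner["pos"])
        population[l] = {"pos": pos, "gene": gene, "fitness": 0.0}  # 3. loser replaced
    return population

rng = random.Random(3)
population = [{"pos": (rng.random(), rng.random()), "gene": g, "fitness": f}
              for g, f in [("a", 3.0), ("b", 1.0), ("c", 0.0), ("d", 2.0)]]
# Stand-in operators for the demo: genes are copied unchanged.
evolve(population, 10, rng,
       mutate=lambda gene, r: gene,
       crossover=lambda a, b, r: a,
       radius=0.5)
print([ind["gene"] for ind in population])
```

Because each event kills one individual and creates one, the population size stays constant in this sketch.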
Start of Simulation (HD Data)
Data points from set distributed over space dependent on 2 variables: chol (x) & thalach (y)
Individuals each with gene which is an arithmetic expression, e.g.:
After 25 ticks (HD Data)
After 50 ticks (HD Data)
After 75 ticks (HD Data)
After 300 then 100 w/o Variation
[Figure labels (expressions): 1, ca, sex, slope, fbs/oldpeak, restecg+fbs+1]
Illustrative Results
• Heart Disease Data Set
• 20 runs with each setting
• 1000 individuals, 1000 iterations
• Locality parameter 0.1 (radius)
• Comparison of Original vs Ersatz Data Sets
• Fixed normal noise (0, 0.1) added to both
data sets
Ersatz Data Set
• Comparison Data Set
• For each variable separately: approximate a
normal distribution of its values
• Then reconstruct a data set using this
distribution for each value independently
• Results in a Data Set with similar shape
and randomness
• But without the predictive variable being
linked to the explanatory variables
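The ersatz construction described above can be sketched as below. It fits a normal distribution (mean and standard deviation) to each column separately and then resamples every value independently. The function names, the use of the population standard deviation, and the absence of any rounding for categorical columns are assumptions; the toy data is invented purely to show the effect.

```python
import math
import random
import statistics

def make_ersatz(rows, rng):
    """Resample every column independently from a normal fitted to that column."""
    columns = list(rows[0].keys())
    params = {}
    for c in columns:
        values = [r[c] for r in rows]
        # Assumption: population standard deviation; the talk does not specify.
        params[c] = (statistics.fmean(values), statistics.pstdev(values))
    return [{c: rng.gauss(*params[c]) for c in columns} for _ in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
# Toy "original" data in which y is completely determined by x.
original = [{"x": v, "y": 2 * v} for v in (rng.random() for _ in range(500))]
ersatz = make_ersatz(original, rng)

print(pearson([r["x"] for r in original], [r["y"] for r in original]))
print(pearson([r["x"] for r in ersatz], [r["y"] for r in ersatz]))
```

Each column keeps roughly the same mean and spread, but any relationship between columns, including the link between the outcome and the explanatory variables, is destroyed by the independent sampling.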
Fitness (HD Data)
Spread (HD Data)
Gene Complexity/Depth (HD Data)
All Runs’ Complexity
[Panels: Original; Ersatz]
Fitness (White Wine Data)
Depth (White Wine Data)
Depth (White Wine Data)
[Panels: Original; Original with 0.1 noise]
Depth – locality 0.1 (HD Data)
Depth – locality 0.2 (HD Data)
Depth – locality 0.4 (HD Data)
Concluding Questions
• When might the complexity of the environment
affect evolutionary processes?
• How might the complexity of the environment affect
evolutionary processes?
• Will models with a simple environment tell us about
evolution in the wild?
– When and about what aspects will models with simple
environments be sufficient?
– In what ways might evolution differ when in complex
environments?
• What kind of complexity might we need?
• How might one measure this complexity in the wild
(if this is even possible)?
The End
Bruce Edmonds
http://bruce.edmonds.name
Centre for Policy Modelling
http://cfpm.org
White Wine Quality Data
• 4898 Data Points
• 11 Diagnostic variables: fixed acidity,
volatile acidity, citric acid, residual sugar,
chlorides, free sulfur dioxide, total sulfur
dioxide, density, pH, sulphates, alcohol
• Predicts judged quality of wine (0-10)