Transcript Slides

Satisfiability
and
Evolution
Adi Livnat, Christos Papadimitriou,
Aviad Rubinstein, Greg Valiant,
Andrew Wan
“One curious aspect of evolution is that
everybody thinks he understands it!”
Jacques Monod
“Nothing makes sense in life except in the
light of evolution”
Theodosius Dobzhansky
Conrad H. Waddington
Waddington’s Experiment (1952)
Generation 0
Temp:
20o C
Waddington’s Experiment (1952)
Generation 1
Temp:
40o C
~15% changed
Select and breed
those
Waddington’s Experiment (1952)
Generation 2
Temp:
40o C
~60% changed
Select and breed
those
Waddington’s Experiment (1952)
Generation 3
Temp:
40o C
~63% changed
Select and breed
those
Waddington’s Experiment (1952)
(…)
Generation 20
o
Temp: 40 C
~99% changed
Surprise!
Generation 20
Temp:
20o C
~25% stay changed!!
Genetic Assimilation
• Adaptations to the environment become
genetic!
Is There a Genetic Explanation?
Suppose:
•The “red phenotype” depends
on genes x1, …, xn plus h = “high temp”
•“red” = F(x, h), Boolean
•Each gene has two alleles, 0-1
•Initially xi ~ i.i.d. uniform, h = 0
How do Allele Frequencies Change
from Generation to Generation?
• Suppose xi = 1 with probability pi
• Next generation?
pi' = probp [ xi = 1 | F(x, h) ]
A Genetic Explanation?
Wanted: Boolean function F ( x, h ) with these
properties:
•Initially, Prob x ~ p[0] [F ( x, h = 0)] ≈ 0%
•Then Probp[0][F ( x, 1)] ≈ 15%
•After breeding Probp[1][F ( x, 1)] ≈ 60%
•Successive breedings, Probp[20][F( x,1)] ≈ 99%
•Finally, Probp[21][F( x, 0)] ≈ 25%
A Genetic Explanation!
“red” = “x1 + x2 + … + x10 + 3h ≥ 10”
•T = 0, ~0%
•T = 1, ~15%
•T = 2, ~60%
•T = 20, ~ 99%
•h = 0, ~25%
•[C. Stern American Naturalist 1958]
Stepping Back now:
Evolution Today
• A powerful and prestigious theory
• Founded on the ideas of Darwin and Mendel
• Informed by sophisticated math models
developed in the early 20th century (mainly,
population genetics)
• Currently weathering a deluge of molecular
data
• Experimental science: R. Lenski
Yet many important
mysteries remain:
• What is the role of sex/recombination?
• Why is there so much genetic diversity
within species?
• How do complex traits emerge?
Now back to Waddington’s
Experiment
•The “red phenotype” seems to be a complex
trait which actually does emerge…
•(with a little help from Conrad W.)
So, how about generalizing it
• We have an arbitrary Boolean function of
genes (no environmental variable h)
• Suppose the satisfying genotypes have a
small fitness advantage (1 vs. 1 + ε, say)
• (Instead of a 0 - 1 advantage as in
Waddington’s experiment)
• Will this trait be fixed eventually?
Perhaps Monotone Functions?
• Recall: pi' = probp [ xi = 1 | F(x, h) ]
• Now, pi’ ≈ (1 – ε) × pi +
ε × probp [ xi = 1 | F(x) ]
• If F(x) is monotone, and xi has some
influence, then pi' > pi
• After exponentially many steps, done
Monotone Functions (cont.)
• After exponentially many steps, done
Theorem: n / (ε×σ0) steps suffice
σ0 = initial satisfaction probability
Proof: Boolean Fourier analysis
Note: essentially best possible (tribes)
A perenthesis: Genetic
Algorithms
• My road to Evolution
• In life sex is succesful and ubiquitous
• Why do GA perform so poorly when
compared to Simulated Annealing?
• Answer: Evolution is not a good metaphor
for heuristics
• Sex a mediocre optimizer of fitness
Back to monotone functions:
But wait a minute…
• Why are we assuming a product
distribution?
• Isn’t there linkage (correlation) in genetics?
• For example, imagine F = … (x1 = x2) …
• Prob [x1 = x2 ] = ¾ ≠ ½ × ½ + ½ × ½
• Linkage disequilibrium: LD = 1/8
Nagylaki’s Theorem
Theorem [N 1993]: After O(log n)
generations, LD = O(ε)
[recall, ε is the selection strength, the
max difference in fitness, assumed to
be tiny]
Why? Trace genes in ancestor tree
…
3 log n
…
Bounding
Linkage Disequilibrium
• But if there is selection, sampling is not
quite uniform
• ~ ε bias introduced at each generation
• Therefore, LD = O(ε log n)
• The analysis in Nagylaki’s Theorem gets rid
of the log n factor
So, Assumption Justified!
• OK to assume product distribution.
• (Since fitness values are 1 and 1 + ε)
• So, back to the question about arbitrary
Boolean functions
Arbitrary Boolean Functions!
Main Theorem: Any Boolean function
of genes which confers a small
evolutionary advantage will be
eventually fixed with high probability,
with polynomial population and number
of generations
Wait a minute, this is wrong!
•
•
•
•
XOR: Suppose that F = (x1 ≠ x2)
What will happen if we start uniform?
No change!
Also, if, e.g., F = “exactly k out of n are
true” and we start at xi = k/n
• The result fails in the infinite population
approximation
Main Theorem:
Precise statement
• To form the next generation:
1. Sample from current product
distribution {pi}, N individuals, let the
empirical distribution be {qi}
(gets you unstuck in XOR etc.)
2. Then apply Selection:
pi’ ≈ qi + ε × probq [ xi = 1 | F(x) ]
Main Theorem:
Detailed Parameters
•
•
•
•
•
•
n = number of genes involved
σ0 = initial satisfaction probability
ε = selection strength, must be < 1/n
N = population > n3 / (σ0)4
T = number of generations = poly (n, 1/σ0)
Failure probability < 2/n
ε < 1/n ?!?
• How come a theorem about the
effectiveness of selection seems to fail
when selection is strong?
• Intuitive explanation: In the interior of the
cube the process is close to gradient
descent
• Which is hard to analyze when the step is
large…
Main Open Problem
Greg’s Conjecture: Result remains true even
if the fitness is 0 – 1
•Evidence from experiments
•Note that Nagylaki’s Theorem does not
provide a justification here
Main Theorem: Proof
• We want to show that the sample - select
process leads to a satisfying population
• We track the expected fitness f[p]
• Core of the proof: We bound the variance
introduced by sampling by the expected
fitness increase in the selection step:
pi’ = qi + ε × probq [ xi = 1 | F(x) ]
Main Theorem: Proof (cont.)
pi’ = qi + ε × probq [ xi = 1 | F(x) ]
•We show:
variance introduced
fitness increase
Eq[(f (q) –f (p))2] ≤ E [f (p’) – f (q)]/(N(1 – nε))
ε must be small!
Main Theorem: Proof (cont.)
Eq[(f (q) –f (p))2] ≤ Eq[f (p’) – f (q)]/(N(1 – nε))
Main Theorem: Proof (cont.)
Eq[(f (q) –f (p))2] ≤
linear mass of the q-biased Fourier transform of f
Eq [Σi (Fq{i})2]/N
≤
Eq[f (p’) – f (q)]/(N(1 – nε))
Main Theorem: Proof (cont.)
Eq[(f (q) –f (p))2] ≤
Eq[Σi (Fq{i})2]/N
≤ Eq[f (p’) – f (q)]/(N(1 – nε))
Main Theorem: Proof (cont.)
Eq[(f (q) –f (p))2] ≤
Eq[Σi (Fq{i})2]/N
≤
Eq[f (p’) – f (q)]/(N(1 – nε))
Main Theorem: Proof (cont.)
Eq[(f (q) –f (p))2] ≤
Σi (Fq{i})2
≤
(f (p’) – f (q))/(1 – nε)
Main Theorem: Proof (cont.)
• Next: the total effect of the variance steps is
small
• Idea: Σt f (qt+1) – f (pt) is a martingale
• But no obvious upper bound
• Need specialized martingale inequality
• Therefore increase prevails (and the process
will get unstuck from spurious fixpoints such
as XOR)
Main Theorem: Proof (cont.)
• Finally: the process gets so close to the
boundary that increase is miniscule
• Random walk (with absorbing boundaries!)
will eventually get stuck at a vertex of the cube
• End of proof
• What? All diversity lost?
Discussion
• An interesting and nontrivial algorithmic fact
about satisfiability
• Remember Greg’s conjecture
• Parameter bounds should be very improvable
• Monotone functions bound essentially tight
• Faster for threshold functions? (Rocco
Servedio and Li-Yang Tan, in progress)
Implications for Evolution?
• Interesting new mechanism for the
emergence of complex traits
• Impossible without sex
Implications for Evolution?
(cont.)
• Is the “soup of alleles” model used here a
productive computational metaphor for
studying Evolution under sex?
Implications for Evolution?
(cont.)
• [CLPV, PNAS 2014]: Evolution under sex
is tantamount to a repeated coordination
game played by the genes: the strategies are
the allleles, the probabilities are the
frequencies in the population, the utility is
fitness, and the game is played through
multiplicative weights!
• [Meir and Parkes 2014], [Mehta, Panageas,
Piliouras 2014],…
Soooo, Evolution and TCS
Remember the three mysteries of Evolution
(not the only ones, btw):
• What is the role of sex/recombination?
• Why is there so much genetic diversity
within species?
• How do complex traits emerge?
TCS has important contributions to make to
all three!
Thank You!