Continuous-Time Coalescence

N-gene Coalescent Problems
• Probability density of the 1st success after waiting t, given a time-constant (rate), a ~ p, of success:
$$\mathrm{Exp}(a, t) = a e^{-at}$$
$$E(\mathrm{Exp}(a, t)) = \frac{1}{a}$$
$$\mathrm{Var}(\mathrm{Exp}(a, t)) = \frac{1}{a^2}$$
3/26/2016
Comp 790– Continuous-Time Coalescence
1
Review N-genes
• Likelihood k genes have a distinct lineage is:
$$\frac{2N-1}{2N} \cdot \frac{2N-2}{2N} \cdots \frac{2N-(k-1)}{2N} = \prod_{i=1}^{k-1}\left(1 - \frac{i}{2N}\right)$$
The 1st gene can choose its parent freely, but the next k-1 must choose from the remainder (genes without a child).
• Manipulating a little:
$$\prod_{i=1}^{k-1}\left(1 - \frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1}\frac{i}{2N} + O\left(\frac{1}{N^2}\right) = 1 - \binom{k}{2}\frac{1}{2N} + O\left(\frac{1}{N^2}\right)$$
• Where, for large N, $1/N^2$ is negligible

Approx N-gene Coalescence
• Approximate probability k-genes have different parents:
$$1 - \binom{k}{2}\frac{1}{2N}$$
Recall that the 2-gene case had a similar form, but with 1 in place of the combinatorial term. Here the combinatorial term accounts for all possible k-choose-2 pairs, which are treated independently.
• The probability two or more have a common parent:
$$1 - \left(1 - \binom{k}{2}\frac{1}{2N}\right) = \binom{k}{2}\frac{1}{2N}$$
• Repeated distinct lineages for j generations leads to a geometric distribution, with $p = \binom{k}{2}\frac{1}{2N}$:
$$P(N = j) = \left(1 - \binom{k}{2}\frac{1}{2N}\right)^{j-1}\binom{k}{2}\frac{1}{2N}$$
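The geometric form can be checked numerically. The following is a minimal sketch, assuming the approximate per-generation coalescence probability p = C(k,2)/2N; the function name `coalescence_pmf` and the truncation point of the sum are illustrative choices, not part of the lecture.

```python
import math

def coalescence_pmf(j: int, k: int, two_n: int) -> float:
    """Approximate probability that the first coalescence among k genes
    happens exactly j generations back (geometric distribution)."""
    p = math.comb(k, 2) / two_n          # approximate per-generation probability
    return (1.0 - p) ** (j - 1) * p

# Illustrative values: k = 3 genes, 2N = 10.
k, two_n = 3, 10
# Truncated sums; the tail beyond 2000 generations is negligible here.
total = sum(coalescence_pmf(j, k, two_n) for j in range(1, 2001))
mean = sum(j * coalescence_pmf(j, k, two_n) for j in range(1, 2001))
print(f"total probability ~ {total:.6f}, mean wait ~ {mean:.4f} generations")
```

The mean of a geometric distribution is 1/p, so the printed mean should sit close to 2N / C(k,2).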
Impact of Approximation
• Approximation is not “proper” for all values of k < 2N
$$1 - \binom{k}{2}\frac{1}{2N} = 1 - \frac{k(k-1)}{4N} < 0 \quad \text{for } k > \frac{1 + \sqrt{16N + 1}}{2}$$
• Considering the following values of N, the smallest k for which the approximation is negative:

N          k
10         7
100        21
1000       64
10000      201
100000     633
1000000    2001
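The thresholds can be recomputed from the condition k(k-1) > 4N. This is a small sketch using integer arithmetic only; the helper name `smallest_improper_k` is hypothetical.

```python
import math

def smallest_improper_k(n: int) -> int:
    """Smallest integer k with 1 - k(k-1)/(4N) < 0, i.e. k(k-1) > 4N."""
    # Start at or just below the real root (1 + sqrt(16N + 1)) / 2.
    k = (1 + math.isqrt(16 * n + 1)) // 2
    while k * (k - 1) <= 4 * n:
        k += 1
    return k

for n in (10, 100, 1000, 10_000, 100_000, 1_000_000):
    print(n, smallest_improper_k(n))
```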
Fix N and Vary k
• Comparing the actual to the approximation
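The comparison can be reproduced numerically. The sketch below evaluates the exact product and the approximation side by side; the population size 2N = 200 and the list of k values are assumptions chosen for illustration, not the values used on the original slide.

```python
def exact_distinct(k: int, two_n: int) -> float:
    """Exact probability that k genes all have different parents."""
    prob = 1.0
    for i in range(1, k):
        prob *= 1.0 - i / two_n
    return prob

def approx_distinct(k: int, two_n: int) -> float:
    """Approximation 1 - C(k,2)/2N (can go negative for large k)."""
    return 1.0 - k * (k - 1) / 2 / two_n

two_n = 200  # assumed example population size (2N)
for k in (2, 5, 10, 20, 30):
    e, a = exact_distinct(k, two_n), approx_distinct(k, two_n)
    print(f"k={k:3d}  exact={e:.4f}  approx={a:.4f}  diff={e - a:.4f}")
```

For k = 2 the two agree exactly; the gap widens as k grows relative to 2N.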
Concrete Example
• In a population of 2N = 10 the probability that 3 genes have one ancestor in the previous generation is:
$$\frac{1}{10} \cdot \frac{1}{10} = \frac{1}{100}$$
The 1st gene can choose its parent freely, while the next 2 must choose the same one.
• The probability that all 3 have a different ancestor is:
$$\frac{10}{10} \cdot \frac{9}{10} \cdot \frac{8}{10} = \frac{72}{100}$$
The 1st gene can choose its parent from the 10, while the next 2 must choose from the remainder.
• The remaining probability is that the 3 genes have two parents in the previous generation:
$$1 - \frac{1}{100} - \frac{72}{100} = \frac{27}{100}$$
Example Continued
• The probability that 2 or more genes have common parents in the previous generation is:
$$\frac{27}{100} + \frac{1}{100} = \frac{28}{100}$$
This is the probability that exactly 2 have a common parent plus the probability all 3 have a common parent.
• By our approximation term the probability that two or more genes share a common parent is:
$$\binom{3}{2}\frac{1}{10} = \frac{3}{10}$$
Error in approximation for k=3, 2N=10:
$$\text{error} = \frac{3}{10} - \frac{28}{100} = \frac{2}{100}$$
• Leads to a MRCA estimate of
$$\frac{1}{p} = \frac{1}{\binom{3}{2}\frac{1}{10}} = \frac{10}{3} \approx 3.33$$
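The numbers in this example can be confirmed by brute force. The sketch below enumerates all 10^3 equally likely parent assignments for 3 genes in a population of 2N = 10 and counts how many distinct parents each assignment uses; variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

two_n, k = 10, 3

# Count assignments by the number of distinct parents chosen.
counts = {1: 0, 2: 0, 3: 0}
for parents in product(range(two_n), repeat=k):
    counts[len(set(parents))] += 1

total = two_n ** k
probs = {m: Fraction(c, total) for m, c in counts.items()}

shared = 1 - probs[3]                     # two or more genes share a parent
approx = Fraction(k * (k - 1) // 2, two_n)  # C(3,2) / 2N
print("one parent:   ", probs[1])
print("two parents:  ", probs[2])
print("three parents:", probs[3])
print("exact shared: ", shared, " approx:", approx, " error:", approx - shared)
```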
For Large N and Small k
• For 2N > 100, the agreement improves, so long as k << 2N
• The advantage of the approximation is that it fits the “form” of a geometric distribution, and thus can be generalized to a continuous-time model
Continuous-time Coalescent
• In the Wright-Fisher model time is measured in discrete units,
generations.
• A continuous time approximation is conceptually more useful,
and via the given approximation, computationally simple
• Moreover, a continuous model can be constructed that is
independent of the population size (2N), so long as our
sample size, k, is much smaller (one of those rare cases where
a small sample size simplifies matters)
• The only time we will need to consider population size (2N) is
when we want to convert from time back into generations.
Continuous-time Derivation
• As before, let $t = \frac{j}{2N}$, where j is now time measured in generations
• It follows that j = 2Nt translates continuous time, t, back into generations j. In practice floor(2Nt) is used to assign a discrete generation number.
• The waiting time, $T_k^c$, for k genes to have k – 1 or fewer ancestors is exponentially distributed, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$, derived from t = j/2N, M = 2N and $p = \binom{k}{2} / 2N$
• Giving:
$$P(T_k^c \le t) = 1 - e^{-\binom{k}{2} t}$$
The probability that k genes will have k-1 or fewer ancestors at some time less than or equal to t
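A minimal sketch of the two operations described on this slide: evaluating the exponential distribution function for the waiting time with rate C(k,2), and converting a continuous time back to a generation number with floor(2Nt). The function names and the example population size are assumptions for illustration.

```python
import math

def coalescence_cdf(k: int, t: float) -> float:
    """P(T_k^c <= t) for an exponential waiting time with rate C(k,2)."""
    rate = math.comb(k, 2)
    return 1.0 - math.exp(-rate * t)

def to_generations(t: float, two_n: int) -> int:
    """Convert continuous time t (in units of 2N generations) to a generation number."""
    return math.floor(two_n * t)

two_n = 20_000  # assumed example population size (2N)
for k in (3, 4, 5, 6):
    print(f"k={k}  P(T <= 0.5) = {coalescence_cdf(k, 0.5):.4f}")
print("t = 0.5 corresponds to generation", to_generations(0.5, two_n))
```

Larger k gives a larger rate, so the distribution function rises faster; population size enters only in the final conversion step.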
Visualization
 
• Plots of $P(T_k^c \le t)$, for k = [3, 4, 5, 6]
[Figure: one curve per sample size, labeled k=3, k=4, k=5, k=6]
Continuous Coalescent Time Scale
• In the continuous-time model, the time constant is a measure of ancestral population
size, with the original at time 0, ½ the original at time 0.5, and ¼ at 1.0
[Figure: vertical axis labeled "Population size" with ticks 0, N, 2N, 2.6N aligned with t = 0.0, 0.5, 1.0, 1.3; horizontal axis ticks 1 to 6]
A Coalescent Model
• The continuous coalescent lends itself to generative models
• The following algorithm constructs a plausible genealogy for
n genes
1. Start with k = n genes
2. Simulate the waiting time, $T_k^c$, to the next event, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$
3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs
4. Merge i and j into one gene and decrease the sample size by one, k → k - 1
5. Repeat from step 2 while k > 1


• This model is backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture
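The five steps can be sketched in a few lines of Python. This is one possible reading of the algorithm, not the course's reference implementation: the function name `simulate_coalescent`, the labeling of internal nodes, and the event-list output format are all assumptions. Times are in units of 2N generations.

```python
import random

def simulate_coalescent(n, rng=None):
    """Return a list of merge events (time, child_a, child_b, parent).

    Leaves are labeled 0..n-1; each merge creates a new label n, n+1, ...
    """
    rng = rng or random.Random()
    lineages = list(range(n))            # step 1: start with k = n genes
    next_label = n
    t = 0.0
    events = []
    while len(lineages) > 1:             # step 5: repeat while k > 1
        k = len(lineages)
        rate = k * (k - 1) / 2           # C(k,2)
        t += rng.expovariate(rate)       # step 2: waiting time ~ Exp(C(k,2))
        i, j = rng.sample(range(k), 2)   # step 3: uniform random pair
        a, b = lineages[i], lineages[j]
        for idx in sorted((i, j), reverse=True):
            lineages.pop(idx)            # step 4: merge the pair, k -> k - 1
        lineages.append(next_label)
        events.append((t, a, b, next_label))
        next_label += 1
    return events

events = simulate_coalescent(6, random.Random(1))
for time, a, b, parent in events:
    print(f"t={time:.3f}: {a} + {b} -> {parent}")
```

The time of the last event is the height of the simulated tree; multiplying it by 2N and taking the floor converts it to generations.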
Properties of a Coalescent Tree
• The height, $H_n$, of the tree is the sum of time epochs, $T_j$, where there
are j = n, n-1, n-2, … , 2 ancestors.
• The distribution of $H_n$ amounts to a convolution of the exponential variables whose result is:
$$P(H_n \le t) = \sum_{k=1}^{n} e^{-\binom{k}{2} t} \, \frac{(-1)^{k-1} (2k-1) F(k)}{G(k)}$$
• Where
$$F(k) = n(n-1)(n-2)\cdots(n-k+1)$$
$$G(k) = n(n+1)(n+2)\cdots(n+k-1)$$
• With
$$E(H_n) = \sum_{j=2}^{n} E(T_j) = 2\sum_{j=2}^{n} \frac{1}{j(j-1)} = 2\left(1 - \frac{1}{n}\right)$$
$$\mathrm{Var}(H_n) = \sum_{j=2}^{n} \mathrm{Var}(T_j) = 4\sum_{j=2}^{n} \frac{1}{j^2(j-1)^2}$$
As n → ∞, $E(H_n)$ → 2, and, if n=2, $E(H_2)$=1. Thus, the waiting time for n genes to find their common ancestor is less than twice the time for 2!