Coalescence with Mutations

Download Report

Transcript Coalescence with Mutations

Coalescence with Mutations
• Towards incorporating greater realism
• Last time we discussed 2 idealized models
– Infinite Alleles, Infinite Sites
• A realistic model would also incorporate
– Insertions, deletions, inversions
• Sequences are large, but not infinite
– Mutations can occur at the same point but on
different lineages
4/1/2016
Comp 790– Coalescence with Mutations
1
Finite Sites Model
• Jukes-Cantor model- all positions are equally likely to mutate,
mutations to any of the 3 other nucleotides are equally probable
• Kimura model- accommodates that
transitions (A<->T and C<->G)
occur more frequently
ACCTGCAT
than transversions
ACGTGCAT
• Positions evolve
ACGTGCTT
independently
TCCTGCAT
ACGTGCTT
ACGTGCTA
ACGTGCAA
TCCTGCAT
4/1/2016
ACGTGCTT ACGTGCTA ACGTGCAA TCCTGCAT TCCTGCAT
Comp 790– Coalescence with Mutations
2
Wright-Fisher with Mutations
• Each gene passed on to the next
generation is subject to mutation
with probability u (probability 1 - u
it is copied without modification)
• Can accommodate any one of Infinite
Alleles, Infinite Sites, or Finite Sites
• Overall structure is the same
• Working backwards from the present,
the probability that a lineage experiences
the first mutation j generations in
the past is: P(T  j)  u(1u) j1
M
4/1/2016
Population of 8 with 4 alleles;
one group of 5, and 3 of 1
Comp 790– Coalescence with Mutations
3
Return of the Basic Coalescent
• Formula is similar in form to the discrete coalescent
• TM denotes the number of generations until the first
mutation event. It is a geometric variable with
parameter u.
• If time is measured in units of 2N generations then:
P(TM  j)  1  (1 u)  1  e
j
t
2
 P(TMc  t)
• Where θ = 4Nu, the population mutation rate, can be
interpreted as the expected number of mutations
separating two samples
4/1/2016
Comp 790– Coalescence with Mutations
4
Probabilities of Events
• If we consider n disjoint lineages then the time to the first
mutation along any line the distribution is exponential with
parameter n
2
• Wait for both mutation and coalescence events, then the
parameter is the sum of the two parameters (consequence of
min(U,V)
~ Exp(a+b))

n  n n(n 1   )

 
2  2
2
• Whether the 1st event is a coalescent of mutation event is
determined by a biased Bernoulli trials, with probabilities

n(n 1)
n 1
2
n 1
 , of mutation

for coalescence, and, 1 

n(n 1) n n 1  
n 1   n 1  

2
2
4/1/2016
Comp 790– Coalescence with Mutations
5
Simulating Sequence Evolution
• Simulating a set of genes with mutations
1. Put k=n, where n is the sample size
2. Choose an exponential variable with parameter k(k1+θ)/2
3. With probability (k-1)/(k-1+θ) the event is a
coalescent event and probability θ/(k-1+θ) it is a
mutation event
4. If a coalescent event, choose a pair randomly to
coalesce, set k  k-1
5. If a mutation event, choose a lineage to mutate
6. Continue until k is one
4/1/2016
Comp 790– Coalescence with Mutations
6
Coalescent in Python
• Straightforward translation
into Python
4/1/2016
T = [[i,0.0] for i in xrange(N)]
# gene number and time of merge
k=N
theta = 4.0*N*0.1
t = 0.0
while k > 1:
t += expovariate(0.5*k*(k-1+theta))
if (random() < theta/(k-1+theta)):
i = randint(0,k-1)
T[i] = [T[i], t]
else:
i = randint(0,k-1)
j = randint(0,k-1)
while i == j:
j = randint(0,k-1)
T[i] = [T[i], T[j], t]
T.pop(j)
k -= 1
Comp 790– Coalescence with Mutations
7
An Alternate Algorithm
• Waiting time until a mutation along a lineage is an exponential distribution
with parameter θ/2 (slide 4)
• Equivalent to distributing mutations along a t-length path with Poisson
distribution having parameter tθ/2
(t ) j t /2
P(Mt  j) 
e
j
j!2
• With mean, tθ/2.
• Given the number of mutations on a branch, the times are random
• The number and
 times of mutations on each branch are independent
• Thus, coalescence can be computed first, then mutations can be inserted
along each branch
• The placing mutations on branches is called a “Poisson process”
4/1/2016
Comp 790– Coalescence with Mutations
8
Poisson Algorithm
• A variant of the continuous-time coalescent with
mutations
1. Simulate the genealogy of N sequences
according
k 
to a coalescent process with rate 2  (algorithm from
 
lecture 4)
2. For each branch generate a random number, Mt,
using a Poisson distribution with parameter tθ/2,
where t is the length of the branch
3. For each branch, the times of the Mt mutation
events are chosen at random
4/1/2016
Comp 790– Coalescence with Mutations
9
Benefits of New Algorithm
• Mutations can be added onto
a genealogy in retrospect
• Tweak mutation and
coalescent parameters
independently
• Mutation does not effect the
fundamental results of
coalescence
• Mutations only impact the
extant Alleles
4/1/2016
Comp 790– Coalescence with Mutations
1.
2.
How long ago did N
haplotypes diverge?
What mutation rate
would explain the
diversity seen?
Book
• What’s ahead
• I will finish chapter 2 next Tuesday
• From there on, one of you will be responsible
for the a chapter
– Each chapter in 2 lectures (pick up the pace a bit),
a Thursday followed by a Tuesday
– You’ll have a weekend to prepare each lecture
– I will do chapter 8
4/1/2016
Comp 790– Continuous-Time Coalescence
11