Transformation of Markov Jump Processes and Applications in

Download Report

Transcript Transformation of Markov Jump Processes and Applications in

Transformation of Markov
Jump Processes and
Applications in Genetic Study
Zhi-Ming Ma
7ICSSA,2014 Aug 7,Seoul
Email: [email protected]
http://www.amt.ac.cn/member/mazhiming/index.html
The talk is base on our recent joint papers:
• A Transformation of Markov Jump Processes and
Applications in Gnetic Study, Discrete and Continuous Dynamical
Systems - Series A, 2014, Vol. 34, no. 12, 4997 – 5043
Xian Chen, Zhi-Ming Ma
• Markov Jump Processes in Modeling Coalescent with
Recombination Annals of Statistics 2014, Vol. 42, No. 4, 1361-1393
Xian Chen, Zhi-Ming Ma, Ying Wang
,
• A Model for Coalescent with Recombination,
BMC Bioinformatics, accepted, 2014.
Ying Wang, Ying Zhou, Linfeng Li, Xian Chen,
Yutig Liu, Zhi-Ming Ma, Shuhua Xu
1. Academy of Math and Systems Science, CAS 2. CAS-MPG Partner Institute for Computational Biology
3: Beijing Jiaotong University,
M. Sharpe (1988)
General theory of Markov Processes
The result may be traced back to E.B. Dynkin (1965),
Markov Process, Section 6 of Chapter 10
Generator of Markov Jump process
-----q-pair (q(x), q(x,A))
A Markov jump process is called a q-process,
if its transition function satisfies (2) and (3).
q-pair and q-processes
• For rigorous definition and discussion of the q-pair
and q-process we refer to e.g. [Chen 204]. The well
known uniqueness problem of q-processes was
completely solved by Chinese experts in the last
century (c.f. [Hou 1974] [Chen-Zheng 1983].
• q-pair plays the role of generator in the study of
Markov jump processes. In practice sometimes it is
more convenient to collect the information of q-pairs,
or one can only obtain the information of q-pairs.
Therefore it is worth to investigate the transform
of Markov jump processes in terms of q-pairs
q-consistency
In most situation of applications (e.g. both
the state spaces are complete separable
metric spaces) we need only to check weak
q-consistency.
The proof of (ii) employs the well known
Blackwell Theorem
Main Results
Discussion and Examples
Further Remark:
Lumpability of Markov Chain
Motivation and
Applications
Markov Jump Processes in Modeling
Coalescent with Recombination ,
Xian Chen, Zhi-Ming Ma, Ying Wang ,
Annals of Statistics 2014, Vol. 42, 1361-1393
Background : What is recombination?
Germ cells

Recombination is a
process by which a
molecule of nucleic
acid (usually DNA,
but can also be RNA)
is broken and then
joined to a different
one.
Chromosome
breaks up
gamete 1
gamete 2
Why study recombination?
• An important mechanism generating and
maintaining diversity
• One of the main sources to provide new
genetic material to let nature selection carry
on
Mutation
Selection
Recombination
Application of recombination
information

DNA sequencing


Disease study


Identify the alleles that are co-located on the same
chromosome
Estimate disease risk for each region of genome
Population history study

Discover admixture history

Reconstruct human phylogeny
Statistical Inference of Recombination
The phenomenon of recombination is extremely complex.
Simulation methods are
indispensable in the statistical
inference of recombination.
--can be applied to exploratory data
analysis.
Samples simulated under various models
can be combined with data to test hypotheses.
-- can be used to estimate recombination rate.
Basic model assumption


Wright-fisher model with recombination

The population has constant size N,

With probability 1-r, uniformly choose one parent to copy from
(no recombination happens), with probability r, two parents are
chosen uniformly at random, and a breakpoint s is chosen by a
specified density (recombination event happens ).
Continuous model is obtained by letting N tends to
infinity. Time is measured in units of 2N, and the
recombination rate per gene per generation r is scaled
by 2rN=constant. The limit model is a continuous
time Markov jump process .
Model the sequence data
Without recombination
–
Sequence can be regarded as a point
With recombination
– Sequence should be regarded as an
interval [0,1)
Coalescent without recombination

Trace the ancestry of the
samples

Markov jump process --
Coalescent
Process
(Kingmann 1982)
A realization for sample
of size 5
Coalescent with recombination
Back in time model

First proposed (Hudson 1983)

Ancestry recombination graph (ARG)
(Griffiths R.C., Marjoram P. 1997)

Software: ms (Hudson 2002)
Spatial model along sequences

Point process along the sequence
(Wuif C., Hein J. 1999)

Approximations: SMC(2005)、
SMC’(2006)、MaCS(2009)
Resulting structure: ARG
Back in time model
• Merit
Due to the Markov property, it is
computationally straightforward and simple
• Disadvantage
It is hard to make approximation, hence it is
not suitable for large recombination rate
Spatial model along sequences
• Merits
- the spatial moving program is easier to approximate
- approximations: SMC(2005)、SMC’(2006)、
MaCS(2009)
• Disadvantages
- it will produce redundant branches
- complex non-Markovian structure
- the mathematical formulation is cumbersome and
up to date no rigorous mathematical formulation
Our model: SC algorithm
• SC is also a spatial algorithm
• SC does not produce any redundant
branches which are inevitable in Wuif
and Hein’s algorithm.
• Existing approximation algorithm (SMC,
SMC’, MaCS) are all special cases of our
model.
Rigorous Mathematical Model
• We prove rigorously for the first time that the
statistical properties of the ARG generated by
our spatial moving model and that generated
by a back in time model are the same: they
share the same probability distribution on the
space of ARG
• Provides a unified interpretation for the
algorithms of simulating coalescent with
recombination.
Mathematical models
• Markov jump process behind back in time model
- state space
- existence of Markov jump process
- sample paths concentrated on G
• Point process
corresponding to the
spatial model
- construct
on G
- projection of q-processes
(a special transformation)
- distribution of
• Identify the probability distribution
Back in time model
Starts at the present
and performs backward
in time generating
successive waiting
times together with
recombination and
coalescent events until
GMRCA (Grand Most
Recent Common
Ancestor)
State space of the process
0.4
0.4
𝑓1 = 1 𝐼[0,0.4) , 𝑓2 =
2 𝐼[0,0.4) + 1,2 𝐼[0.4,1) , 𝑓3 = 3 𝐼[0,1)
0.4
𝑓1 = 1 𝐼[0,0.4) , 𝑓2 = 1 𝐼[0.4,1) , 𝑓3 =
2 𝐼[0,1) , 𝑓4 = 3 𝐼[0,1)
𝑓1 = 1 𝐼[0,1) , 𝑓2 = 2 𝐼[0,1) , 𝑓3 = 3 𝐼[0,1)
State space of the process
Introduce suitable operators on E
avoiding
redundant
recombination
coalescence
Q-pair of the Markov Jump Process
Define further
Key point: prove that
Uniqueness of the q-process
Hence the q-pair is regular.
Theorem 2.25 in [Chen 2004]
We have proved that our q-pair satisfies the
above condition. Hence the q-process is unique.
Existence of the q-process
Intuitively the q-process will arrive at the absorbing state in at
most finite many jumps. A rigorous proof needs a result in
[Chen2004] concerning monotonicity .
Monotonicity
(application of order preserving coupling)
Define
𝑃1 t, x, ∆
→ 1 𝑎𝑠 𝑡 → ∞
Monotonicity
(application of order preserving coupling)
We define
By [Chen 2004] Th.5.47 one can check that
X(t) will almost surely arrive
at
in at most finite many jumps
Monotonicity
(application of order preserving coupling)
Hypothesis 5.46 in [Chen 2004]
Theorem 5.47 in [Chen 2004]:
Point process corresponding to spatial model: construct on
0.7
0.4
0.7
0.4 0.7
0.4
0.7
0.7
0.4
Point process corresponding to spatial model: construct on
T11
11  ({3},{3})
0.7
0.4
0.7
0.4 0.7
0.4
0.7
0.4
0.7
Z 1  ((T01 ,  01 ), (T11 , 11 ))
T01
01  ( ,{2})
0.4
 1  ( f ( S0 ), f ( S1 ))
Point process corresponding to spatial model: construct on
T32
32  ({1, 2},{2},{1})
0.7
0.4
T22
 22  ( ,  ,{1})
0.7
0.4 0.7
0.4
0.7
0.7
0.4
T12
12  ({2},  ,{1})
2
0
T
02  ( ,  ,{1})
0.7
0.4
Z 2  ((T02 , 021 ), (T12 , 12 ), (T22 ,  22 ), (T32 , 32 )),  2  ( f ( S0 ), f ( S1 ), f ( S2 ))
Projection of q-processes
Projection of q-processes
a Markov jump process ?
Point process corresponding to spatial model: thinning of the
point process
Waiting time is exponentially distributed with parameter
depends on the total length of the current local tree
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
Sketch of the proof :
Step 1:
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
Step 2: Let 𝛼𝑚 = 2−𝑚 ([2𝑚 𝑆𝑖 ]+1), calculate
Step 3: Approximation
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
The position of
on the current local tree
is uniformly distributed on the local tree
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
Sketch of the proof:
Step 1: Let 𝛼𝑚 = 2−𝑚 ([2𝑚 𝑆𝑖 ]+1), 𝜎𝑛 = 2−𝑛 ([2𝑛 𝑆𝑖 ]+1). Define
Then one can check that
Point processes 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
Step 2:
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
The proof is cumbersome and complicated.
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
It will coalesce to any existing branches
independently with rate 1.
Point processes 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
It will move along the branch and leave it
with rate
Point process 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0 corresponding to
spatial model: the distribution of 𝑆𝑖 ,𝑍 𝑖 :𝑖 ≥ 0
If it does not leave the branch before 𝜀 𝑔 ,
it will move along to the next branch.
SC algorithm
•
•
•
•
: a standard coalescent
: the th recombination point
: the ARG constrained on
: the extra branched on than
• The whole ARG can be considered as a point
process
Identical probability distribution
of the back in time model and spatial model
Identical probability distribution
of the back in time model and spatial model
Thank you !