Transcript tutorial5

Markov Chains
Algorithms in Computational Biology
Spring 2006
Slides were edited by Itai Sharon from Dan Geiger and Ydo Wexler
1
Dependencies Along
Biological Sequences

So far we assumed every letter in a sequence is
sampled randomly from some distribution q()


There are special subsequences in the genome in
which dependencies between nucleotides exist



This model could suffice for alignment scoring, but it is not the case in
true genomes.
Example 1: TATA within the regulatory area, upstream a gene.
Example 2: CG pairs
We model such dependencies by Markov chains and
hidden Markov Models (HMMs)
2
Markov Chains

A chain of random variables in which the next one
depends only on the current
 Given X=x1…xn, then P(xi|x1…xi-1) =P(xi|xi-1)
X1

X2
Xn-1
Xn
The general case: kth–order Markov process
 Given X=x1…xn, then P(xi|x1…xi-1) =P(xi|xi-1…xi-k)
3
Markov Chains

An integer time stochastic process, consisting of
a domain D of m>1 states {s1,…,sm} and



An m dimensional initial distribution vector (p(s1),.., p(s,)).
An m x m transition probabilities matrix M = (aij)
For example:



D can be the letters {A, C, G, T}
p(A) the probability of A to be the 1st letter in a sequence
aAG the probability that G follows A in a sequence.
4
Markov Chains

For each integer n, a Markov Chain assigns
probability to sequences (x1…xn) over D (i.e, xiD)
as follows:
n
p (( x1 , x2 ,...xn ))  p( X 1  x1 ) p( X i  xi | X i 1  xi 1 )
n
 p( x1 ) axi1xi
i 2
i 2

Similarly, (x1…xi…) is a sequence of probability
distributions over D. There is a rich theory which
studies the properties of these sequences.
5
Matrix Representation

The transition probabilities Matrix M=(ast)
 M is a stochastic Matrix:
A
B
0.95
0
 t ast  1
A

The initial distribution vector
(U1…Um) defines the distribution
of X1 P(X1= si)=Ui
B
C
D

C
D
0.05
0
0.2
0.5
0
0.3
0
0.2
0
0.8
0
0
1
0
Then after one move, the distribution changes to
x2=x1M
6
Matrix Representation

Example:


if X1=(0, 1, 0, 0)
Then X2=(0.2, 0.5, 0, 0.3)
And if X1=(0, 0, 0.5, 0.5)
then X2=(0, 0.1, 0.5, 0.4).
A
B
C
D

A
B
C
0.95
0
0.05
0
0.2
0.5
0
0.3
0
0.2
0
0.8
0
0
1
0
D
The ith distribution is Xi=X1Mi-1
7
Representing a Markov Model
as a Digraph
0.95
A
0.2
0.5
B
0.05
0.2
0.3
0.8
1
C
D
A
B
C
0.95
0
0.05
0
0.2
0.5
0
0.3
0
0.2
0
0.8
0
0
1
0
Each directed edge AB is associated with the
transition probability from A to B.
D
8
Markov Chains – Weather
Example


Weather forecast:

raining today

No rain today




40% rain tomorrow
60% no rain tomorrow
20% rain tomorrow
80% no rain tomorrow
Stochastic FSM:
0.6
0.4
rain
0.8
no rain
9
0.2
Markov Chains – Gambler
Example

Gambler starts with 10$

At each play we have one of the following:



Gambler wins 1$ with probability p
Gambler looses 1$ with probability 1-p
Game ends when gambler goes broke, or gains a
fortune of 100$
p
0
1
1-p
p
p
99
2
1-p
p
100
1-p
1-p
Start
(10$)
10
Properties of Markov Chain
States

States of Markov chains are classified by the digraph
representation (omitting the actual probability
values)

Recurrent states:


s is recurrent if it is accessible from
all states that are accessible from s.
C and D are recurrent states.
Transient states:

“s is transient” if it will be visited
a finite number of times as n.
A and B are transient states.
11
Irreducible Markov Chains

A Markov Chain is irreducible if the corresponding
graph is strongly connected (and thus all its states
are recurrent).
E
12
Properties of Markov Chain
states

A state s has a period k if k is the GCD of the lengths
of all the cycles that pass via s.

Periodic states


A state is periodic if it has a period k>1.
in the shown graph the period of A is 2.
Aperiodic states

E
A state is aperiodic if it has
a period k=1.
in the shown graph the period
of F is 1.
13
Ergodic Markov Chains

A Markov chain is ergodic if:



the corresponding graph is irreducible.
It is not peridoic
Ergodic Markov Chains are important

they guarantee the corresponding Markovian process converges
to a unique distribution, in which all states have strictly positive
probability.
14
Stationary Distributions for
Markov Chains

Let M be a Markov Chain of m states, and let V=(v1,…, vm)
be a probability distribution over the m states

V=(v1,…, vm) is stationary distribution for M if VM=V.
 one step of the process does not change the distribution
V is a stationary distribution
V is a left (row) Eigenvector of M with Eigenvalue 1
15
“Good” Markov chains

A Markov Chains is good if the distributions Xi
satisfy the following as i:



converge to a unique distribution, independent of the initial
distribution
In that unique distribution, each state has a positive probability
The Fundamental Theorem of Finite Markov Chains:

A Markov Chain is good  the corresponding graph is ergodic.
16
“Bad” Markov Chains

A Markov chains is not “good” if either :



It does not converge to a unique distribution
It does converge to a unique distribution, but some states in this
distribution have zero probability
For instance:


Chains with periodic states
Chains with transient states
17
An Example: Searching the
Genome for CpG Islands

In the human genome, the pair CG appears less than
expected



Due to biological reasons, this process is sometimes
suppressed in short stretches of genome


the pair CG often transforms to (methyl-C) G which often transforms
to TG.
Hence the pair CG appears less than expected from independent
frequencies of C and G alone.
such as in the start regions of many genes.
These areas are called CpG islands (p denotes “pair”).
18
CpG Islands

We consider two questions (and some variants):



Question 1: Given a short stretch of genomic data, does it come
from a CpG island ?
Question 2: Given a long piece of genomic data, does it contain
CpG islands in it, where, what length?
We “solve” the first question by modeling strings with
and without CpG islands as Markov Chains


States are {A,C,G,T} but
Transition probabilities are different
19
CpG Islands

The “+” model



Use transition matrix A+=(a+st), Where:
a+st = (the probability that t follows s in a CpG island)
The “-” model


Use transition matrix A-=(a-st), Where:
A-st = (the probability that t follows s in a non CpG island)
20
CpG Islands

To solve Question 1 we need to decide whether a
given short sequence of letters is more likely to
come from the “+” model or from the “–” model.

This is done by using the definitions of Markov
Chain, in which the parameters are determined by
known data and the log odds-ratio test.
21
CpG Islands – the “+” Model

We need to specify p+(xi|xi-1) where + stands for CpG
Island. From Durbin et al we have:
A
C
G
T
A
0.180
0.274
0.426
0.120
C
0.171
0.368
0.274
0.188
G
0.161
0.339
0.375
0.125
T
0.079
0.355
0.384
0.182
22
CpG Islands – the “-” Model

p-(xi|xi-1) for non-CpG Island is given by:
A
C
G
T
A
0.300
0.205
0.285
0.210
C
0.322
0.298
0.078
0.302
G
0.248
0.246
0.298
0.208
T
0.177
0.239
0.292
0.292
23
CpG Islands
X1

X2
XL-1
XL
Given a string X=(x1,…, xL), now compute the ratio
L 1
RATIO 
p ( x |  model)

p ( x |  model)
p
 ( xi 1
| xi )
i 0
L 1
 p (x

i 1
| xi )
i 0


RATIO>1  CpG island is more likely
RATIO<1  non-CpG island is more likely.
24