Transcript ppt

Genome evolution:
a sequence-centric approach
Lecture 3: From Trees to HMMs
Web: www.wisdom.weizmann.ac.il/~atanay/GenomeEvo
Access ppts and ex. directly: /home/atanay/public_html/GenomeEvo/
Subscribe to course messages: [email protected]
Course outline
(Diagram of course topics:)
(Continuous time) Markov Chains
Simple Tree Models
(Probability, Calculus/Matrix theory, some graph theory, some statistics)
Probabilistic models
Genome structure
Inference
Mutations
Parameter estimation
Population
Inferring Selection
Stochastic Processes and Stationary Distributions
(Diagram: a process evolving over time t, its model, and the stationary model)
Inference on trees and the EM algorithm: summary
Inference using dynamic programming (up-down message passing):

$up(x_i) = \sum_{x_{left(i)},\, x_{right(i)}} up(x_{left(i)})\, \Pr(x_{left(i)} \mid x_i)\; up(x_{right(i)})\, \Pr(x_{right(i)} \mid x_i)$

$down(x_i) = \sum_{x_{pa(i)},\, x_{sib(i)}} down(x_{pa(i)})\, \Pr(x_i \mid x_{pa(i)})\; up(x_{sib(i)})\, \Pr(x_{sib(i)} \mid x_{pa(i)})$

$up(s_i) = \text{observed} \;?\; 1 : 0 \qquad down(root) = \Pr(root)$
Marginal/Posterior probabilities:

$\Pr(x_i \mid s) = \frac{1}{Z}\, down(x_i)\, up(x_i)$

$\Pr(x_i, x_{pa(i)} \mid s) = \frac{1}{Z} \sum_{x_{sib(i)}} up(x_{sib(i)})\, down(x_{pa(i)})\, up(x_i)\, \Pr(x_{sib(i)} \mid x_{pa(i)})\, \Pr(x_i \mid x_{pa(i)})$

$\Pr(s \mid \theta) = \sum_{root} up(root)\, down(root)$
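As an illustration (not part of the lecture), here is a minimal Python sketch of the up-down recursions and posterior formulas above on a tiny invented tree (root with children anc and L3, and anc with leaf children L1 and L2); the substitution matrix, observations, and all names are assumptions made for the example.

```python
import numpy as np

# Toy tree (assumed): root -> (anc, L3), anc -> (L1, L2); 4 states (A, C, G, T).
states = 4
children = {"root": ["anc", "L3"], "anc": ["L1", "L2"]}
parent = {"anc": "root", "L3": "root", "L1": "anc", "L2": "anc"}
obs = {"L1": 0, "L2": 0, "L3": 2}            # observed leaf states
prior = np.full(states, 0.25)                # Pr(root)

eps = 0.1                                    # one shared Pr(x_child | x_parent)
P = np.full((states, states), eps / 3)
np.fill_diagonal(P, 1 - eps)

def up(node):
    """up(x_i): probability of the observed leaves below i, given x_i."""
    if node in obs:                          # leaf: indicator on the observed value
        u = np.zeros(states); u[obs[node]] = 1.0
        return u
    left, right = children[node]
    return (P @ up(left)) * (P @ up(right))

up_msg = {n: up(n) for n in ["L1", "L2", "L3", "anc", "root"]}

# down(x_i): fold in the parent's down message and the sibling's up message
down_msg = {"root": prior.copy()}
for node in ["anc", "L3", "L1", "L2"]:       # parents before children
    pa = parent[node]
    inc = down_msg[pa].copy()
    for sib in (c for c in children[pa] if c != node):
        inc *= P @ up_msg[sib]
    down_msg[node] = P.T @ inc               # sum over the parent's state

Z = np.dot(up_msg["root"], down_msg["root"])         # Pr(s | theta)
for node in ["root", "anc"]:
    print(node, up_msg[node] * down_msg[node] / Z)   # Pr(x_i | s)
```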
The EM update rule (conditional probabilities per lineage):

$\Pr(x_i \mid x_{pa(i)}, \theta^{k+1}) = \sum_{j=1,..,N} \Pr(x_i, x_{pa(i)} \mid s^j, \theta^k) \Big/ \sum_{j=1,..,N} \Pr(x_{pa(i)} \mid s^j, \theta^k)$
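A small sketch of the update rule above (all numbers and the table layout are assumed): given per-site posterior tables Pr(x_i, x_pa(i) | s^j, theta_k) from the up-down pass, the new conditional probabilities are a ratio of expected counts.

```python
import numpy as np

# joint_tables[j][pa, child] stands for Pr(x_i, x_pa(i) | s^j, theta_k) at site j
def em_update(joint_tables):
    expected = sum(joint_tables)                            # sum_j Pr(x_i, x_pa | s^j)
    return expected / expected.sum(axis=1, keepdims=True)   # / sum_j Pr(x_pa | s^j)

# two toy sites over a 2-letter alphabet
joint_tables = [np.array([[0.50, 0.10], [0.10, 0.30]]),
                np.array([[0.40, 0.20], [0.05, 0.35]])]
print(em_update(joint_tables))   # row = parent state, columns = Pr(child | parent)
```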
Bayesian Networks
Defining the joint probability for a set of random variables given:
1) a directed acyclic graph $G = (X, pa)$
2) Conditional probabilities $\Pr(x_i \mid pa_{x_i})$

$\Pr(x) = \prod_i \Pr(x_i \mid pa_{x_i})$

Claim: if $G$ has no cycles,
$\sum_x \prod_i \Pr(x_i \mid pa_{x_i}) = 1$
Whiteboard/exercise
Claim: if G is a tree, we can compute marginals and total probability efficiently
Proof: exactly what we did last time.
Claim: For General G, inference is NP hard
Why will the up-down algorithm not work?
Whiteboard/exercise
We will discuss methods for approximate inference in detail later; for now, let's look for easier cases.
Markov Models
$x_t$ is the state at time $t$
Transition probabilities define the process.
Add an initial condition to define a distribution on infinite sequences:
$\Pr(x_{t+1} \mid x_t),\; \Pr(x_0) \;\Rightarrow\; \Pr(x_0, x_1, .., x_t, ....)$
Problem: we observe finite sequences… and infinite probability spaces are difficult to work with.
Solution: add an absorbing finish state. Add a start state to express the probability at time 0.
$\Pr(s) = \Pr(start, s^1, .., s^n, finish, finish, ...)$
Caution! This is NOT the HMM Bayes Net:
Hidden Markov Models
1. Cycles
2. States are NOT random vars!
Emission space
We observe only the emissions of states into some probability space E.
Each state is equipped with an emission distribution $\Pr(e \mid x)$ ($x$ a state, $e$ an emission).
$\Pr(s, e) = \prod_i \Pr(s^i \mid s^{i-1})\, \Pr(e^i \mid s^i)$
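For concreteness, a tiny sketch of the joint probability formula above for one state path and one emission string; the two states ('H', 'L') and all probabilities are invented for the example.

```python
trans = {("start", "H"): 0.5, ("H", "H"): 0.8, ("H", "L"): 0.2, ("L", "L"): 0.9}
emit = {("H", "G"): 0.4, ("H", "C"): 0.4, ("H", "A"): 0.1, ("H", "T"): 0.1,
        ("L", "A"): 0.4, ("L", "T"): 0.4, ("L", "G"): 0.1, ("L", "C"): 0.1}

def joint(path, emissions):
    """Pr(s, e) = prod_i Pr(s^i | s^{i-1}) Pr(e^i | s^i)."""
    p, prev = 1.0, "start"
    for s, e in zip(path, emissions):
        p *= trans[(prev, s)] * emit[(s, e)]
        prev = s
    return p

print(joint("HHL", "GCA"))   # 0.5*0.4 * 0.8*0.4 * 0.2*0.4
```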
Hidden Markov Models
The HMM can be viewed as a template-model
Given a sequence of observations or just its length, we can form a BN
Since the BN will have a tree topology, we know how to compute posteriors
(Diagram of the unrolled Bayes net: Start, States, Emissions, Finish)
Inference in HMM
Basically, exactly what we saw for trees
Forward formula (like the down alg):

$f_s^i = \sum_{s' \in N(s)} f_{s'}^{i-1}\, \Pr(s \mid s')\, \Pr(e^{i-1} \mid s')$

$f_s^0 = (s = start) \;?\; 1 : 0$

Backward formula (like the up alg):

$b_s^i = \sum_{s' \in N(s)} b_{s'}^{i+1}\, \Pr(s' \mid s)\, \Pr(e^i \mid s)$

$b_s^N = (s = finish) \;?\; 1 : 0$
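Here is a hedged forward-backward sketch for such an HMM. The parameters are toy numbers, and the indexing follows the common convention where the state at position i emits e[i] (the slide attaches the emission to the previous index, which only shifts the bookkeeping).

```python
# Toy two-state HMM with explicit start and finish states (all numbers assumed)
emitting = ["H", "L"]
trans = {"start": {"H": 0.5, "L": 0.5},
         "H": {"H": 0.7, "L": 0.2, "finish": 0.1},
         "L": {"H": 0.1, "L": 0.8, "finish": 0.1}}
emit = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

e = "GGCACTGAA"
N = len(e)

# Forward: f[i][s] = Pr(e[0..i], state at position i is s)
f = [{s: trans["start"][s] * emit[s][e[0]] for s in emitting}]
for i in range(1, N):
    f.append({s: emit[s][e[i]] * sum(f[i-1][sp] * trans[sp][s] for sp in emitting)
              for s in emitting})

# Backward: b[i][s] = Pr(e[i+1..], finish | state at position i is s)
b = [dict() for _ in range(N)]
b[N-1] = {s: trans[s]["finish"] for s in emitting}
for i in range(N - 2, -1, -1):
    b[i] = {s: sum(trans[s][sp] * emit[sp][e[i+1]] * b[i+1][sp] for sp in emitting)
            for s in emitting}

Z = sum(f[N-1][s] * trans[s]["finish"] for s in emitting)   # Pr(e)
posterior = [{s: f[i][s] * b[i][s] / Z for s in emitting} for i in range(N)]
print(Z)
print(posterior[0], posterior[N-1])
```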
EM for HMMs
Can we apply the tree EM verbatim?
Almost, but we have to share parameters:
$\Pr(s \mid s', \theta^{k+1}) \sim \sum_i f_{s'}^i\, b_s^{i+1}\, \Pr(e^i \mid s', \theta^k)\, \Pr(s \mid s', \theta^k)$
$\Pr(e \mid s, \theta^{k+1}) \sim \sum_{i:\, e^i = e} f_s^i\, b_s^{i+1}\, \Pr(e^i \mid s, \theta^k)$
Claim: HMM EM monotonically improves the likelihood
(i.e., sharing parameters is OK)
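Continuing the forward-backward sketch above (same toy model and variables), one shared-parameter EM step for the transition probabilities pools posterior transition counts over all positions; transitions into the finish state are ignored here for brevity.

```python
# Reuses f, b, Z, trans, emit, e, N, emitting from the forward-backward sketch.
def expected_transitions(a, c):
    """Expected number of a -> c transitions given the observed emissions."""
    return sum(f[i][a] * trans[a][c] * emit[c][e[i + 1]] * b[i + 1][c]
               for i in range(N - 1)) / Z

counts = {a: {c: expected_transitions(a, c) for c in emitting} for a in emitting}
new_trans = {a: {c: counts[a][c] / sum(counts[a].values()) for c in emitting}
             for a in emitting}
print(new_trans)   # one EM step for Pr(s | s'), shared across all positions
```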
Hidden states
Example:
Two Markov models describe our data
Switching between models occurs at random
How to model this?
(Diagram: hidden, non-emitting states connecting the two models)
Forward and backward recursions with hidden (non-emitting) states:

$f_s^i\ (s\ \text{emitting}) = \sum_{s'\,(\text{emitting})} f_{s'}^{i-1}\, \Pr(s \mid s')\, \Pr(e^{i-1} \mid s') \;+\; \sum_{s'\,(\text{hidden})} f_{s'}^{i-1}\, \Pr(s \mid s')$

$f_s^i\ (s\ \text{hidden}) = \sum_{s'\,(\text{emitting})} f_{s'}^{i}\, \Pr(s \mid s')\, \Pr(e^{i} \mid s') \;+\; \sum_{s'\,(\text{hidden})} f_{s'}^{i}\, \Pr(s \mid s')$

$b_s^i\ (s\ \text{emitting}) = \sum_{s'\,(\text{emitting})} b_{s'}^{i+1}\, \Pr(s' \mid s)\, \Pr(e^{i} \mid s) \;+\; \sum_{s'\,(\text{hidden})} b_{s'}^{i}\, \Pr(s' \mid s)\, \Pr(e^{i} \mid s)$

$b_s^i\ (s\ \text{hidden}) = \sum_{s'\,(\text{emitting})} b_{s'}^{i+1}\, \Pr(s' \mid s) \;+\; \sum_{s'\,(\text{hidden})} b_{s'}^{i}\, \Pr(s' \mid s)$

What about hidden cycles?
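A sketch of the forward pass with a hidden (non-emitting) state, under the simplifying assumption that there are no hidden-to-hidden transitions (no hidden cycles); the model, its two emitting states X and Y, and the switching state h are all invented for the example.

```python
# Convention: the state at position i emits e[i]; hidden states sit between
# positions, so their forward value shares the index of the last emission.
emitting, hidden = ["X", "Y"], ["h"]
trans = {"start": {"X": 1.0},
         "X": {"X": 0.8, "h": 0.1, "finish": 0.1},
         "Y": {"Y": 0.8, "h": 0.1, "finish": 0.1},
         "h": {"X": 0.5, "Y": 0.5}}
emit = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}

e = "aabba"
N = len(e)
fe = [dict() for _ in range(N)]   # forward values of emitting states
fh = [dict() for _ in range(N)]   # forward values of the hidden state

fe[0] = {s: trans["start"].get(s, 0.0) * emit[s][e[0]] for s in emitting}
fh[0] = {h: sum(fe[0][s] * trans[s].get(h, 0.0) for s in emitting) for h in hidden}
for i in range(1, N):
    fe[i] = {s: emit[s][e[i]] * (sum(fe[i-1][sp] * trans[sp].get(s, 0.0) for sp in emitting)
                                 + sum(fh[i-1][h] * trans[h].get(s, 0.0) for h in hidden))
             for s in emitting}
    fh[i] = {h: sum(fe[i][s] * trans[s].get(h, 0.0) for s in emitting) for h in hidden}

print(sum(fe[N-1][s] * trans[s].get("finish", 0.0) for s in emitting))  # Pr(e)
```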
Profile HMM for Protein or DNA motifs
(Diagram: profile HMM architecture with a chain of Match (M) states, Insert (I) and Delete (D) states at each position, from Start (S) to Finish (F))
•M (Match) states emit a certain amino acid/nucleotide profile
•I (Insert) states emit some background profile
•D (Delete) states are hidden
•Can use EM to train the parameters from a set of examples
•Then use the model for classification or annotation
•(Both emissions and transition probabilities are informative!)
•(How do we determine the right size of the model?)
(google PFAM, Prosite, “HMM profile protein domain”)
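To make the architecture concrete, here is a small sketch (layout assumed, following common profile-HMM conventions rather than any particular tool) that enumerates the states and allowed transitions of a profile HMM with L match columns.

```python
# Profile HMM state space: Start, L match columns with Insert/Delete, Finish.
L = 4
states = (["S"] + [f"M{k}" for k in range(1, L + 1)]
               + [f"I{k}" for k in range(0, L + 1)]
               + [f"D{k}" for k in range(1, L + 1)] + ["F"])

def successors(s):
    """Allowed transitions: M/D advance to column k+1, I may self-loop."""
    if s == "S":
        return ["M1", "I0", "D1"]
    kind, k = s[0], int(s[1:])
    nxt = ["F"] if k == L else [f"M{k+1}", f"D{k+1}"]
    if kind == "I":                 # insert state: self-loop, does not advance a column
        return [s] + nxt
    return [f"I{k}"] + nxt          # match/delete: may also enter the insert state

for s in states[:-1]:
    print(s, "->", successors(s))
```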
N-order Markov model
•For evolutionary models, the Markov property makes much sense
•For spatial (sequence) effects, the Markov property is a (horrible) heuristic
•N-order relations can be modeled naturally
Common error:
Forward/Backward in an N-order HMM: would dynamic programming work?
1-HMM Bayes Net: (Diagram: Start, States, Emissions, Finish)
2-HMM Bayes Net: (Diagram: Start, States, Emissions, Finish, with second-order dependencies)
(You shall explore the inference problem in Ex 2)
Pair-HMM
Given two sequences s1,s2, an alignment is defined by a set of ‘gaps’ (or
indels) in each of the sequences.
indel
ACGCGAACCGAATGCCCAA---GGAAAACGTTTGAATTTATA
ACCCGT-----ATGCCCAACGGGGAAAACGTTTGAACTTATA
indel
Affine gap cost: $\gamma + \delta l$
Standard distance metric:
Substitution matrix: $\sigma(a, b) \sim \log(\Pr(a \to b))$
$d(l^1, l^2, s^1, s^2) = \sum_i \sigma(s^1[l^1_i], s^2[l^2_i]) + (\#\text{gaps})\,\gamma + \delta \sum_{\text{gaps}} l_{gap}$
The standard dynamic programming algorithm computes the best alignment given such a distance metric:
$d(s^1, s^2) = \min_{l^1, l^2} d(l^1, l^2, s^1, s^2)$
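A minimal sketch of the dynamic-programming alignment referred to above, using a toy substitution cost and a simple linear gap cost; the affine-gap version on the slide needs three DP matrices (as in Gotoh's algorithm), which is omitted here for brevity.

```python
def align(s1, s2, match=0, mismatch=2, gap=3):
    """Global alignment distance by dynamic programming (toy costs assumed)."""
    n, m = len(s1), len(s2)
    # d[i][j] = best (minimal) distance aligning s1[:i] with s2[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution / match
                          d[i - 1][j] + gap,       # gap in s2
                          d[i][j - 1] + gap)       # gap in s1
    return d[n][m]

print(align("ACGCGAACCGAATG", "ACCCGTATGCCCAA"))
```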
Pair-HMM
Generalize the HMM concept to probabilistically model alignments.
Problem: we are observing two sequences, not a priori related. What will be
emitted from our HMM?
(Diagram: pair-HMM states S (start), M (match), G1 and G2 (gap in sequence 1 or 2), F (finish))
Match states emit an aligned nucleotide pair
Gap states emit a nucleotide from one of the sequences only
Pr(M->Gi) – “gap open cost”, Pr(G1->G1) – “gap extension cost”
Is it a BN template?
Forward-backward formula?
Whiteboard/exercise
Mixture models
$P(x \mid \theta) = \sum_i p_i\, N(x; \mu_i, \Sigma_i)$
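As a quick illustration of the mixture density above (one-dimensional Gaussian components, toy parameters assumed; inference and EM are left to the exercise):

```python
import numpy as np

def mixture_pdf(x, weights, means, sigmas):
    """sum_i p_i N(x; mu_i, sigma_i) for 1-D Gaussian components."""
    x = np.asarray(x, dtype=float)[..., None]
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return comps @ weights

weights = np.array([0.3, 0.7])
means   = np.array([-1.0, 2.0])
sigmas  = np.array([0.5, 1.0])
print(mixture_pdf([0.0, 2.0], weights, means, sigmas))
```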
Inference?
EM for Parameter estimation?
What about very high dimensions?
Whiteboard/exercise