Transcript Slide 1

Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions
Shay Mozes
Oren Weimann (MIT)
Michal Ziv-Ukelson (Tel-Aviv U.)
In short:
• Hidden Markov Models are extensively used to model
processes in many fields
• The runtime of HMM algorithms is usually linear in
the length of the input
• We show how to exploit repetitions in the observed string to obtain a speedup
• First provable speedup of Viterbi’s algorithm
• Can use different compression schemes
• Applies to several decoding and training algorithms
2
Markov Models
• states q1 , … , qk
• transition probabilities Pi←j
• emission probabilities ei(σ), σ ∈ Σ
• time independent, discrete, finite
[Figure: a two-state example. Transitions: P1←1 = 0.9, P2←1 = 0.1, P2←2 = 0.8, P1←2 = 0.2. Emissions: e1(A) = 0.3, e1(C) = 0.2, e1(G) = 0.2, e1(T) = 0.3; e2(A) = 0.2, e2(C) = 0.3, e2(G) = 0.3, e2(T) = 0.2]
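To make the example concrete, here is a minimal Python sketch of the two-state model above as plain data structures; the variable names (states, trans, emit) and the sanity checks are illustrative additions, not part of the talk.

```python
# Illustrative encoding of the two-state example above; names are not from the talk.
states = [1, 2]

# trans[(i, j)] = P_{i <- j}: probability of moving to state i from state j.
trans = {
    (1, 1): 0.9, (2, 1): 0.1,   # out of state 1
    (2, 2): 0.8, (1, 2): 0.2,   # out of state 2
}

# emit[i][c] = e_i(c): probability that state i emits character c.
emit = {
    1: {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3},
    2: {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
}

# Sanity check: outgoing transitions and emissions of each state sum to 1.
for j in states:
    assert abs(sum(trans[(i, j)] for i in states) - 1.0) < 1e-9
    assert abs(sum(emit[j].values()) - 1.0) < 1e-9
```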
3
Hidden Markov Models
[Figure: the model unrolled over time. Each column is one time step with hidden states 1, 2, …, k; the columns emit the observed string x1 x2 x3 … xn]
• We are only given the description of the model and
the observed string
• Decoding: find the hidden sequence of states that is
most likely to have generated the observed string
4
Decoding – Viterbi’s Algorithm
[Figure: the Viterbi dynamic-programming table, states 1 to 6 (rows) by time 1 to n (columns), over the observed string a a c g a c g g t]
v6[4] = maxj{ e4(c) · P4←j · v5[j] }
(the figure highlights the candidate j = 2, contributing e4(c) · P4←2 · v5[2])
v5[2] = probability of the best sequence of states that emits the first 5 characters and ends in state 2
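A compact Python sketch of this recurrence, assuming the model is given as dictionaries trans[(i, j)] = Pi←j and emit[i][σ] = ei(σ); the uniform initial vector and all names are illustrative assumptions, and probabilities are multiplied directly rather than in log-space.

```python
def viterbi(x, states, trans, emit, init):
    """Most likely state sequence for the observed string x.

    trans[(i, j)] = P_{i <- j}, emit[i][c] = e_i(c), init[j] = v_0[j].
    Probabilities are multiplied directly (no log-space), mirroring the slide.
    """
    v = dict(init)            # v[j] = probability of the best path ending in state j
    back = []                 # back[t][i] = maximizing previous state j at step t
    for c in x:
        new_v, ptr = {}, {}
        for i in states:
            # v_t[i] = max_j  e_i(c) * P_{i<-j} * v_{t-1}[j]
            j_best = max(states, key=lambda j: trans[(i, j)] * v[j])
            new_v[i] = emit[i][c] * trans[(i, j_best)] * v[j_best]
            ptr[i] = j_best
        back.append(ptr)
        v = new_v
    # Trace the maximizing states backwards from the best final state.
    state = max(states, key=lambda i: v[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))[1:]   # one state per character of x

# Illustrative call on the earlier two-state model, with an assumed uniform start.
states = [1, 2]
trans = {(1, 1): 0.9, (2, 1): 0.1, (2, 2): 0.8, (1, 2): 0.2}
emit = {1: {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3},
        2: {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2}}
print(viterbi("AACGACGGT", states, trans, emit, {1: 0.5, 2: 0.5}))
```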
5
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
6
VA in Matrix Notation
v1[i]=maxj{ei(x1)·Pi←j · v0[j]}
v1[i]=maxj{ Mij(x1) · v0[j]}
Mij(σ) = ei (σ)·Pi←j
(A⊗B)ij = maxk{ Aik · Bkj }
v1 = M(x1) ⊗ v0
v2 = M(x2) ⊗ M(x1) ⊗ v0
vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0
Viterbi's algorithm (n matrix-vector ⊗ steps): O(k2n)
Computing vn by matrix-matrix ⊗ steps instead: O(k3n)
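A hedged numpy sketch of the same idea: one Viterbi step is a max-times matrix-vector product, so vn is obtained by applying M(xn), …, M(x1) to v0 right to left. The two-state numbers mirror the earlier example slide; the uniform v0 and the function names are assumptions for illustration.

```python
import numpy as np

def M(P, e, c):
    """M_ij(c) = e_i(c) * P_{i<-j}, exactly as defined above."""
    return e[c][:, None] * P

def otimes_vec(A, v):
    """(A ⊗ v)_i = max_j A_ij * v_j : one Viterbi step, O(k^2) work."""
    return (A * v[None, :]).max(axis=1)

# Illustrative two-state model (the numbers from the earlier example slide).
P = np.array([[0.9, 0.2],      # P[i, j] = P_{i <- j}, columns sum to 1
              [0.1, 0.8]])
e = {'A': np.array([0.3, 0.2]), 'C': np.array([0.2, 0.3]),
     'G': np.array([0.2, 0.3]), 'T': np.array([0.3, 0.2])}
v0 = np.array([0.5, 0.5])      # assumed uniform start, not from the talk

# v_n = M(x_n) ⊗ M(x_n-1) ⊗ ... ⊗ M(x_1) ⊗ v_0, applied right to left.
x = "AACGACGGT"
v = v0
for c in x:                    # n matrix-vector steps: O(k^2 n) in total
    v = otimes_vec(M(P, e, c), v)
print(v)                       # v[i] = probability of the best path ending in state i
```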
7
Exploiting Repetitions
Observed string: c a t g a a c t g a a c
vn=M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c)⊗v0
12 steps
• compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once
• use it twice!
vn=M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c) ⊗v0
6 steps
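The saving works because ⊗ is associative, so a block of matrix-vector steps can be collapsed into one precomputed matrix. The sketch below checks this on the string from the slide; the random stand-in matrices and the names otimes, Ms are illustrative, not from the talk.

```python
import numpy as np

def otimes(A, B):
    """Max-times product: matrix ⊗ matrix (k^3 work) or matrix ⊗ vector (k^2 work)."""
    if B.ndim == 1:
        return (A * B[None, :]).max(axis=1)
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

rng = np.random.default_rng(0)
k = 3                                            # illustrative number of states
Ms = {c: rng.random((k, k)) for c in "acgt"}     # stand-ins for the matrices M(σ)
v0 = rng.random(k)                               # stand-in initial vector

x = "catgaactgaac"                               # the observed string from the slide

# Plain chain: one matrix-vector step per character (12 steps).
v_plain = v0
for c in x:
    v_plain = otimes(Ms[c], v_plain)

# Precompute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once (3 matrix-matrix steps) ...
W = "gaac"
MW = Ms[W[0]]
for c in W[1:]:
    MW = otimes(Ms[c], MW)

# ... and use it for both occurrences of W (6 matrix-vector steps in total).
v_fast = v0
for block in ["c", "a", "t", W, "t", W]:
    v_fast = otimes(MW if block == W else Ms[block], v_fast)

assert np.allclose(v_plain, v_fast)              # ⊗ is associative: same result
```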
8
Exploiting Repetitions
ℓ – length of the repetition W
λ – number of times W repeats in the string
computing M(W) costs (ℓ-1)k3 (ℓ-1 matrix-matrix multiplications)
each time W appears we save (ℓ-1)k2 (ℓ-1 fewer matrix-vector multiplications)
W is good if λ(ℓ-1)k2 > (ℓ-1)k3, i.e. if the number of repeats λ > k, the number of states
9
General Scheme
I. dictionary selection:
choose the set D = {Wi} of good substrings
II. encoding:
compute M(Wi) for every Wi in D
(steps I and II are done offline)
III. parsing:
partition the input X into good substrings
X = Wi1Wi2 … Win'
X' = i1, i2, … , in'
IV. propagation:
run Viterbi's Algorithm on X' using the matrices M(Wi) (sketched below)
10
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
11
LZ78
• The next LZ-word is the longest LZ-word previously seen plus one character
• Use a trie
• Number of LZ-words is asymptotically < n ∕ log n
• Example: aacgacg is parsed into the LZ-words a, ac, g, acg
[Figure: the trie built from these LZ-words]
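A short sketch of LZ78 parsing with the trie kept as nested dictionaries; the function name and representation are illustrative.

```python
def lz78_parse(x):
    """LZ78 parse of x: each new word is the longest previously seen word
    plus one character. The trie is kept as nested dicts, one node per word."""
    trie, words, i = {}, [], 0
    while i < len(x):
        node, j = trie, i
        while j < len(x) and x[j] in node:     # walk down the longest known word
            node = node[x[j]]
            j += 1
        if j < len(x):
            node[x[j]] = {}                    # new word = longest match + one char
            j += 1
        words.append(x[i:j])
        i = j
    return words

print(lz78_parse("aacgacg"))   # ['a', 'ac', 'g', 'acg']
```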
12
Using LZ78
I. dictionary selection: D = words in the LZ parse of X. Cost: O(n)
II. encoding: use the incremental nature of LZ, M(Wσ) = M(σ) ⊗ M(W). Cost: O(k3n ∕ log n)
III. parsing: X' = LZ parse of X. Cost: O(n)
IV. propagation: run VA on X' using the matrices M(Wi). Cost: O(k2n ∕ log n)
Speedup: k2n ∕ (k3n ∕ log n) = log n ∕ k
13
Improvement
[Figure: the same LZ78 trie; good words correspond to trie nodes with more than k descendants]
• Remember the speedup condition: λ > k
• Use just LZ-words that appear more than k times (a sketch of this selection follows below)
• These words are represented by trie nodes with more than k descendants
• Now must parse X (step III) differently
• Ensures graceful degradation with increasing k:
Speedup: min(1, log n ∕ k)
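A possible sketch of this selection step: parse with LZ78, then keep only words that occur more than k times, counting each occurrence of a word as a prefix of an LZ-word in the parse (the trie-node-descendant criterion from the slide, up to counting the node itself). Names are illustrative, and the modified parse of step III is not shown.

```python
def lz78_parse(x):
    """LZ78 parse of x (as on the previous slide)."""
    trie, words, i = {}, [], 0
    while i < len(x):
        node, j = trie, i
        while j < len(x) and x[j] in node:
            node, j = node[x[j]], j + 1
        if j < len(x):
            node[x[j]] = {}
            j += 1
        words.append(x[i:j])
        i = j
    return words

def good_words(x, k):
    """Keep only LZ-words occurring more than k times, where an occurrence of w
    is counted whenever w is a prefix of some word in the parse (w's own trie
    node plus its descendants)."""
    count = {}
    for word in lz78_parse(x):
        for l in range(1, len(word) + 1):      # every prefix is itself an LZ-word
            count[word[:l]] = count.get(word[:l], 0) + 1
    return {w for w, c in count.items() if c > k}
```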
14
Experimental results
~5x faster on:
• Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
• Long: the 22 Mbp human Y chromosome
15
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
16
Run-Length Encoding
aaaccggggg → a3c2g5
aaaccggggg → a2a1c2g4g1 (each run split into sub-runs whose lengths are powers of 2)
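A small sketch of both encodings: plain run-length encoding, and the variant shown on the second line where each run is split into sub-runs of power-of-two length (so that, as assumed here, M(σ^(2^j)) can be built by repeatedly squaring M(σ) with ⊗); function names are illustrative.

```python
from itertools import groupby

def rle(x):
    """Run-length encode: aaaccggggg -> [('a', 3), ('c', 2), ('g', 5)]."""
    return [(c, len(list(g))) for c, g in groupby(x)]

def rle_pow2(x):
    """Split every run into sub-runs whose lengths are powers of two,
    e.g. aaaccggggg -> [('a', 2), ('a', 1), ('c', 2), ('g', 4), ('g', 1)]."""
    out = []
    for c, n in rle(x):
        p = 1 << (n.bit_length() - 1)          # largest power of two <= n
        while n:
            if n >= p:
                out.append((c, p))
                n -= p
            p >>= 1
    return out

print(rle("aaaccggggg"))       # [('a', 3), ('c', 2), ('g', 5)]
print(rle_pow2("aaaccggggg"))  # [('a', 2), ('a', 1), ('c', 2), ('g', 4), ('g', 1)]
```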
17
Summary of results
• General framework
• LZ78: log(n) ∕ k
• RLE: r ∕ log(r)
• Byte-Pair Encoding: r
• Path reconstruction: O(n)
• F/B algorithms (standard matrix multiplication)
• Viterbi training: same speedups apply
• Baum-Welch training: speedup, many details
• Parallelization
18
Thank you!
Any questions?
19
Path traceback
• In VA, easy to do in O(n) time by keeping
track of maximizing states during computation
• The problem: we run VA on X’, so we get the
sequence of states for X’, not for X.
we only get the states on the boundaries of
good substrings of X
• Solution: keep track of maximizing states
when computing the matrices M(w).
Takes O(n) time and O(nk2) space
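A minimal sketch of that bookkeeping, assuming numpy matrices: while building M(W) we also store, for every prefix of W, the k×k table of maximizing intermediate states, and later walk those tables backwards to recover the states inside an occurrence of W. Function names are illustrative.

```python
import numpy as np

def otimes_argmax(A, B):
    """C = A ⊗ B together with K[i, j] = argmax_k A[i, k] * B[k, j]."""
    prod = A[:, :, None] * B[None, :, :]      # prod[i, k, j] = A[i, k] * B[k, j]
    return prod.max(axis=1), prod.argmax(axis=1)

def encode_word(word, Ms):
    """Compute M(W) while recording the maximizing intermediate states.

    Stores one k x k argmax table per prefix of W, i.e. O(|W| k^2) space per word."""
    M = Ms[word[0]]
    tables = []
    for c in word[1:]:
        M, K = otimes_argmax(Ms[c], M)
        tables.append(K)
    return M, tables

def inner_path(tables, i_end, j_start):
    """Given the state before W (j_start) and after W (i_end), recover the
    hidden states inside W from the stored argmax tables."""
    path = [i_end]
    for K in reversed(tables):
        path.append(K[path[-1], j_start])
    return path[::-1]          # states after reading 1, 2, ..., |W| characters
```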
20
Training
• Estimate the unknown parameters Pi←j , ei(σ)
• Use Expectation Maximization:
1. Decoding
2. Recalculate parameters
• Viterbi Training: each iteration costs O( VA + n + k2 )
(VA = decoding, the bottleneck, which is where the speedup applies; n + k2 = path traceback + updating Pi←j , ei(σ))
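For the "recalculate parameters" step, a hedged sketch of the Viterbi-training update: given the decoded path, re-estimate Pi←j and ei(σ) by counting transitions and emissions along it. The pseudocount smoothing is an assumption added here, not something stated in the talk.

```python
from collections import Counter

def reestimate(x, path, states, alphabet, pseudo=1e-6):
    """Viterbi-training update sketch: recount transitions and emissions along
    the decoded path (one state per character of x) and renormalize."""
    t_count = Counter(zip(path[1:], path[:-1]))    # (to, from) pairs
    e_count = Counter(zip(path, x))                # (state, char) pairs
    trans = {(i, j): (t_count[(i, j)] + pseudo) /
                     sum(t_count[(i2, j)] + pseudo for i2 in states)
             for j in states for i in states}
    emit = {i: {c: (e_count[(i, c)] + pseudo) /
                   sum(e_count[(i, c2)] + pseudo for c2 in alphabet)
                for c in alphabet}
            for i in states}
    return trans, emit
```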
21
Baum-Welch Training
• each iteration costs O( FB + nk2 )
(FB = decoding with forward-backward, itself O(nk2); the nk2 term covers path traceback + updating Pi←j , ei(σ))
• If a substring W of length ℓ repeats λ times and satisfies λ ≥ ℓk2, then the entire process can be sped up by precalculation
22