Transcript Slide 1

Speeding Up Algorithms for Hidden
Markov Models by Exploiting
Repetitions
Yuri Lifshits (Caltech)
Shay Mozes (Brown Uni.)
Oren Weimann (MIT)
Michal Ziv-Ukelson (Ben-Gurion Uni.)
Hidden Markov Models (HMMs)
Hidden states: sunny, rainy, cloudy (q1, …, qk)
Transition probabilities: the probability of the weather given the previous day's weather.
Pi←j = the probability of making a transition to state i from state j
Hidden Markov Models (HMMs)
Observable states: the states of the process that are 'visible'.
Σ = {soggy, damp, dryish, dry} (e.g., humidity in IBM)
Emission probabilities: the probability of observing a particular observable state given that we are in a particular hidden state.
ei(σ) = the probability of observing σ ∈ Σ given that the state is i
In short:
• Hidden Markov Models are extensively used to model
processes in many fields (error-correction, speech
recognition, computational linguistics, bioinformatics)
• We show how to exploit repetitions to obtain speedup
of HMM algorithms
• Can use different compression schemes
• Applies to several decoding and training algorithms
HMM Decoding and Training
• Decoding:
Given the model (emission and transition probabilities) and an
observed sequence X, find the hidden sequence of states that is
most likely to have generated the observed sequence.
– X = dry,dry,damp,soggy…
• Training:
Given the number of hidden states and an observed sequence, estimate the emission and transition probabilities Pi←j, ei(σ).
Decoding
[Trellis figure: states 1, 2, …, k at every time step; one column per character x1, x2, x3, …, xn of the observed string]
• Decoding: Given the model and the observed string
find the hidden sequence of states that is most likely
to have generated the observed string
Decoding – Viterbi’s Algorithm (VA)
[Trellis figure: states 1 to 6, time steps 1 to 9, observed string a a c g a c g g t]
v6[4] = maxj{ e4(c) · P4←j · v5[j] }
(the figure highlights the j = 2 term, P4←2 · v5[2], where v5[2] is the probability of the best sequence of states that emits the first 5 characters and ends in state 2)
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
VA in Matrix Notation
Viterbi’s algorithm: v1[i] = maxj{ ei(x1) · Pi←j · v0[j] }
Define Mij(σ) = ei(σ) · Pi←j, so v1[i] = maxj{ Mij(x1) · v0[j] }
With the max-times product (A⊗B)ij = maxk{ Aik · Bkj }:
v1 = M(x1) ⊗ v0
v2 = M(x2) ⊗ M(x1) ⊗ v0
vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0
Evaluating the chain with matrix-matrix products costs O(k3n); evaluating it as matrix-vector products (standard VA) costs O(k2n).
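(Not from the slides: a minimal Python/NumPy sketch of VA in this max-times matrix notation, assuming k states, a transition matrix P with P[i, j] = Pi←j, and emissions given as per-state dictionaries; the function names are illustrative.)

```python
import numpy as np

def max_times(A, B):
    # (A ⊗ B)[i, j] = max_m A[i, m] * B[m, j]; also works when B is a k x 1 vector.
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def viterbi_matrix_form(P, e, x, v0):
    """P[i, j] = P_{i<-j}; e[i][sigma] = e_i(sigma); x = observed string; v0 = initial vector.
    Returns v_n, where v_t[i] is the probability of the best state sequence that emits
    x_1..x_t and ends in state i. One max-times matrix-vector product per character: O(k^2 n)."""
    k = P.shape[0]
    v = np.asarray(v0, dtype=float)
    for sigma in x:
        M = np.array([[e[i][sigma] * P[i, j] for j in range(k)] for i in range(k)])  # M_ij(sigma)
        v = max_times(M, v.reshape(-1, 1)).ravel()    # v_t = M(x_t) ⊗ v_{t-1}
    return v
```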
Exploiting Repetitions
Observed string: c a t g a a c t g a a c
vn=M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c)⊗v0
12 steps
• compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once
• use it twice!
vn=M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c) ⊗v0
6 steps
Exploiting repetitions
ℓ = length of the repeated substring W
λ = number of times W repeats in the string
Computing M(W) costs (ℓ-1)k3 (matrix-matrix multiplications).
Each time W appears we save (ℓ-1)k2 (matrix-vector multiplications).
W is good if λ(ℓ-1)k2 > (ℓ-1)k3, i.e. if
number of repeats = λ > k = number of states
General Scheme
I. dictionary selection:
choose the set D={Wi } of good substrings
II. encoding:
compute M(Wi ) for every Wi in D
(steps I and II are done offline)
III. parsing:
partition the input X into good substrings
X’ = Wi1Wi2 … Win’
IV. propagation:
run Viterbi’s Algorithm on X’ using M(Wi)
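(A hedged Python sketch of the four-phase scheme; the greedy longest-match parser and the way the dictionary is passed in are illustrative assumptions, not the authors' exact choices.)

```python
import numpy as np

def max_times(A, B):
    # (A ⊗ B)[i, j] = max_m A[i, m] * B[m, j]
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def compressed_viterbi(M_of, x, v0, dictionary):
    """M_of(sigma): k x k matrix M(sigma); dictionary: the chosen set D of good substrings.
    Phase I (dictionary selection) is assumed already done by the caller."""
    # II. encoding: compute M(Wi) for every word in D
    enc = {}
    for W in dictionary:
        M = M_of(W[0])
        for sigma in W[1:]:
            # the next character's matrix is applied on the left, matching
            # v_n = M(x_n) ⊗ ... ⊗ M(x_1) ⊗ v_0
            M = max_times(M_of(sigma), M)
        enc[W] = M
    # III. parsing: partition X into dictionary words (greedy longest match;
    # single characters always act as fallback words)
    words, i = [], 0
    while i < len(x):
        W = max((w for w in enc if x.startswith(w, i)), key=len, default=x[i])
        if W not in enc:
            enc[W] = M_of(W)
        words.append(W)
        i += len(W)
    # IV. propagation: one max-times matrix-vector product per word of X'
    v = np.asarray(v0, dtype=float)
    for W in words:
        v = max_times(enc[W], v.reshape(-1, 1)).ravel()
    return v
```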
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Using the Four-Russians Method
I. dictionary selection: D = all strings over Σ of length ≤ l (cost: O(1))
II. encoding: incremental construction, M(Wσ) = M(W) ⊗ M(σ) (cost: O(2|Σ|^l k3))
III. parsing: X' = split X into words of length l (cost: O(n))
IV. propagation: run VA on X' using the M(Wi) (cost: O(k2n ∕ l))
Speedup: k2n ∕ (2|Σ|^l k3 + k2n ∕ l) = Θ(log n)
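(A Python sketch of phases I to III of the Four-Russians variant; enumerating the words with itertools and building each matrix from its one-character-shorter prefix are assumptions consistent with the slide, not a verbatim implementation.)

```python
import itertools
import numpy as np

def max_times(A, B):
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def four_russians_encode(M_of, alphabet, l):
    """Phases I + II: enumerate every word of length 1..l over the alphabet and build its
    matrix incrementally, one max-times matrix-matrix product per word (O(|Sigma|^l k^3) total).
    Each word's matrix extends the matrix of its one-character-shorter prefix."""
    enc = {(sigma,): M_of(sigma) for sigma in alphabet}
    for length in range(2, l + 1):
        for word in itertools.product(alphabet, repeat=length):
            # new character applied on the left, matching v_n = M(x_n) ⊗ ... ⊗ M(x_1) ⊗ v_0
            enc[word] = max_times(M_of(word[-1]), enc[word[:-1]])
    return {''.join(w): M for w, M in enc.items()}

def parse_blocks(x, l):
    """Phase III: cut X into consecutive words of length l (the last block may be shorter)."""
    return [x[i:i + l] for i in range(0, len(x), l)]
```

With l chosen on the order of log n (for example roughly ½·log|Σ| n), the |Σ|^l k3 encoding cost stays below the k2n ∕ l propagation cost, which is where the Θ(log n) speedup on the slide comes from.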
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Lempel Ziv 78
• The next LZ-word is the longest LZ-word
previously seen plus one character
• Use a trie
• Number of LZ-words is asymptotically < n ∕ log n
[Trie figure of the LZ-words for the example string aacgacg]
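(A minimal LZ78 parser sketch in Python; the dict-of-dicts trie and the function name are mine.)

```python
def lz78_parse(x):
    """Split x into LZ78 words: each new word is the longest previously seen word
    plus one character, found by walking a trie (kept as nested dicts)."""
    trie, words, i = {}, [], 0
    while i < len(x):
        node, j = trie, i
        while j < len(x) and x[j] in node:   # follow the longest previously seen word
            node = node[x[j]]
            j += 1
        if j < len(x):
            node[x[j]] = {}                  # new trie node = matched word + one character
            j += 1
        words.append(x[i:j])
        i = j
    return words

# e.g. lz78_parse("aacgacg") returns ['a', 'ac', 'g', 'acg']
```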
Using LZ78
I. dictionary selection: D = all LZ-words in X (cost: O(n))
II. encoding: use the incremental nature of LZ, M(Wσ) = M(W) ⊗ M(σ) (cost: O(k3n ∕ log n))
III. parsing: X' = LZ parse of X (cost: O(n))
IV. propagation: run VA on X' using the M(Wi) (cost: O(k2n ∕ log n))
Speedup: k2n ∕ (k3n ∕ log n) = log n ∕ k
Improvement
• Remember the speedup condition: λ > k
• Use only LZ-words that appear more than k times
• These words are represented by trie nodes with more than k descendants
• Now X must be parsed differently in step III (a sketch follows below)
• Ensures graceful degradation with increasing k:
  Speedup: min(1, log n ∕ k)
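(A sketch of the improved dictionary selection, assuming the word list comes from an LZ78 parse such as the one sketched above; counting over proper prefixes is my way of obtaining trie descendant counts.)

```python
from collections import Counter

def good_lz_words(words, k):
    """Keep only LZ-words whose trie node has more than k descendants, i.e. words that
    more than k later LZ-words extend. In LZ78 every proper prefix of a word is itself
    a word, so descendant counts can be accumulated by charging each word to its prefixes."""
    desc = Counter()
    for w in words:
        for t in range(1, len(w)):       # every proper prefix of w is an ancestor trie node
            desc[w[:t]] += 1
    return {w for w in words if desc[w] > k}
```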
Experimental Results – CpG Islands
~5x faster:
• Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
• Long: the 22 Mbp human Y chromosome
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Run Length Encoding
aaaccggggg → a3c2g5
Each run is decomposed into powers of two: aaaccggggg → a2a1c2g4g1
(the matrices for these power-of-two runs are computed offline)
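(A Python sketch of the power-of-two run matrices suggested by the decomposition above; computing them by repeated max-times squaring is an assumption consistent with the "offline" label, not necessarily the authors' exact procedure.)

```python
import numpy as np

def max_times(A, B):
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def rle_tables(M_of, alphabet, max_run):
    """Offline: powers[j] = M(sigma^(2^j)) for every letter sigma, by repeated squaring."""
    tables = {}
    for sigma in alphabet:
        powers = [M_of(sigma)]
        while 2 ** len(powers) <= max_run:
            powers.append(max_times(powers[-1], powers[-1]))
        tables[sigma] = powers
    return tables

def propagate_run(v, sigma, r, tables):
    """Push the state vector across a run sigma^r using its binary decomposition,
    e.g. a^3 = a^2 a^1 and g^5 = g^4 g^1 as in the example above."""
    j = 0
    while r:
        if r & 1:
            v = max_times(tables[sigma][j], v.reshape(-1, 1)).ravel()
        r >>= 1
        j += 1
    return v
```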
Path traceback
• In VA, easy to do in O(n) time by keeping
track of maximizing states during computation
• The problem: we only get the states on the
boundaries of good substrings of X
• Solution: keep track of maximizing states
when computing the matrices
M(W)=M(W1) ⊗ M(W2).
Takes O(n) time and O(n’k2) space
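(A sketch of how the maximizing states can be recorded: each max-times product also returns an argmax table; pairing the value and index tables this way is my illustration of the idea, not the paper's exact bookkeeping.)

```python
import numpy as np

def max_times_with_argmax(A, B):
    """C[i, j] = max_m A[i, m] * B[m, j] together with arg[i, j] = the maximizing m.
    Storing `arg` whenever M(W) = M(W1) ⊗ M(W2) is formed lets the traceback expand a
    pair of states at the boundaries of W into the state at the split point, recursively,
    recovering the full hidden path in O(n) time with O(n'k^2) space for the tables."""
    prod = A[:, :, None] * B[None, :, :]
    return prod.max(axis=1), prod.argmax(axis=1)
```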
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Training
• Estimate the model θ = {Pi←j, ei(σ)} given X:
  – find θ that maximizes P(X | θ).
• Use Expectation Maximization:
  1. Decode using the current θ
  2. Use the decoding result to update θ
VA Training
• Aij = # of times state i follows state j in the most likely sequence of states.
• Ei(σ) = # of times the letter σ is emitted by state i in the most likely sequence.
• Each iteration costs O(VA + n + k2):
  VA = decoding (the bottleneck, which is where the speedup applies),
  n + k2 = path traceback + updating Pi←j, ei(σ)
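(A Python sketch of one Viterbi-training update from a decoded path; the column normalization of P and the dictionary form of e are assumptions about representation.)

```python
import numpy as np

def viterbi_training_update(path, x, k, alphabet):
    """path[t] = most likely state at position t (from decoding + traceback), x = observations.
    A[i, j] counts how often state i follows state j on the path; E[i][sigma] counts emissions.
    Both are renormalized into updated transition / emission probabilities."""
    A = np.zeros((k, k))
    E = {i: {sigma: 0 for sigma in alphabet} for i in range(k)}
    for t in range(1, len(path)):
        A[path[t], path[t - 1]] += 1
    for t, sigma in enumerate(x):
        E[path[t]][sigma] += 1
    P = A / np.maximum(A.sum(axis=0, keepdims=True), 1)   # P_{i<-j}: each column sums to 1
    e = {i: {s: c / max(sum(E[i].values()), 1) for s, c in E[i].items()} for i in range(k)}
    return P, e
```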
The Forward-Backward Algorithm
– The forward algorithm calculates ft[i], the probability of observing the sequence x1, x2, …, xt and that the t-th state is i:
  ft = M(xt) ● M(xt-1) ● … ● M(x1) ● f0
– The backward algorithm calculates bt[i], the probability of observing the sequence xt+1, xt+2, …, xn given that the t-th state is i:
  bt = bn ● M(xn) ● M(xn-1) ● … ● M(xt+2) ● M(xt+1)
(here ● denotes ordinary matrix multiplication, with sums in place of the max used by ⊗)
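(A sketch of the two passes with ordinary matrix-vector products, the ● above; numerical scaling, which a real implementation needs, is omitted. f0 and bn are the initial and terminal vectors.)

```python
import numpy as np

def forward_backward(M_of, x, f0, bn):
    """f[t][i] = P(x_1..x_t, state_t = i);  b[t][i] = P(x_{t+1}..x_n | state_t = i).
    In contrast to Viterbi, these use ordinary (sum-product) matrix multiplication."""
    n = len(x)
    f = [np.asarray(f0, dtype=float)]
    for t in range(n):
        f.append(M_of(x[t]) @ f[-1])          # f_t = M(x_t) . f_{t-1}
    b = [None] * (n + 1)
    b[n] = np.asarray(bn, dtype=float)
    for t in range(n - 1, -1, -1):
        b[t] = b[t + 1] @ M_of(x[t])          # b_t = b_{t+1} . M(x_{t+1}); x is 0-indexed
    return f, b
```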
Baum-Welch Training (in a nutshell)
• Aij = Σt ft[j] ● Pi←j ● ei(xt+1) ● bt+1[i]
• Each iteration costs O(FB + nk2):
  FB = decoding (itself O(nk2)),
  nk2 = path traceback + updating Pi←j, ei(σ)
• If a substring W of length l repeats λ times and satisfies λ > lk2 ∕ (l - k2), then the entire process can be sped up by precalculation.
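(A sketch of the Aij accumulation written directly from the sum above, using f and b as returned by the forward-backward sketch; normalization by P(X) and the update of ei(σ) are omitted.)

```python
import numpy as np

def expected_transition_counts(f, b, x, P, e):
    """A[i, j] = sum_t f_t[j] * P_{i<-j} * e_i(x_{t+1}) * b_{t+1}[i]
    (expected number of j -> i transitions, up to normalization by P(X))."""
    k = P.shape[0]
    A = np.zeros((k, k))
    for t in range(1, len(x)):
        emit = np.array([e[i][x[t]] for i in range(k)])    # e_i(x_{t+1}); x is 0-indexed
        A += np.outer(emit * b[t + 1], f[t]) * P
    return A
```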
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Summary of results
• General framework
• Four-Russians: log(n)
• LZ78: log(n) ∕ k
• RLE: r ∕ log(r)
• Byte-Pair Encoding: r
• SLP, LZ77: r ∕ k
• Path reconstruction: O(n)
• Forward-Backward: same speedups
• Viterbi training: same speedups
• Baum-Welch training: speedup, many details
• Parallelization
Thank you!