
Structured Hidden Markov Model:
a General Framework for Modeling Complex Sequences
Ugo Galassi, Attilio Giordana and Lorenza Saitta
Dipartimento di Informatica, Università del Piemonte Orientale,
Via Bellini 25G, 15100 Alessandria, Italy.
www.edygroup.di.unipmn.it
Talk outline
- The Markovian approach to process modeling.
- The Structured HMM (S-HMM).
- S-HMM properties.
- Modeling duration, gaps and motifs.
- A comparison between S-HMM and alternative Markovian approaches.
- A glance at a project based on S-HMM.
The Notion of Process in Statistics
A process is a path through a set of possible states, governed by an
assigned probability distribution.
First-order observable Markov processes: the transition to state q(t)
depends only on the state q(t-1).
The process is fully characterized by a 4-tuple <Q, O, A, π>:
• Q is the set of states
• O = {o_i | q_i ∈ Q, 1 ≤ i ≤ |Q|} is the set of output values produced in
the states of Q
• A is a |Q| × |Q| matrix governing the transition probability for
every state pair
• π is a vector assigning to each state the probability of being the
initial state
The Observable Markov Model
1. A process is a path through a set of possible states.
2. A transition to a new state depends only upon the current state.
3. In any state a deterministic observation is generated.
[Figure: a five-state observable Markov model generating the sequence g a c c t g; each
state deterministically emits one symbol (0|g, 1|a, 2|c, 3|c, 4|t) and the arcs a_ij carry
the transition probabilities]
A encodes the probability distribution of the transitions between any pair of states.
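As an illustration of the observable model (a sketch, not from the slides; the three-state chain, its deterministic emissions and all probabilities below are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observable Markov model: three states, each one
# deterministically emitting a single symbol (the slide's "0|g, 1|a, ..." notation).
emission = {0: "g", 1: "a", 2: "c"}            # deterministic output per state
A = np.array([[0.1, 0.6, 0.3],                 # A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
pi = np.array([0.7, 0.2, 0.1])                 # probability of being the initial state

def sample_path(T):
    """Sample a state path of length T and the (deterministic) observation string."""
    states = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        states.append(rng.choice(len(pi), p=A[states[-1]]))
    return states, "".join(emission[s] for s in states)

print(sample_path(6))   # e.g. ([0, 1, 2, ...], 'gac...')
```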
Observable MPs may be too complex to estimate...
• HMM: an approximation where a single hidden state represents a subset of
observable states.
An HMM is a 5-tuple λ = <Q, O, A, B, π>.
B is a matrix assigning to each state q_i the probability of producing output o_j.
A(obs) ≈ A(hmm) × B(hmm)
• Further simplifications can be obtained by setting constraints on A(hmm) and B(hmm):
task-specific models, Factorial HMM, Hierarchical HMM.
HMM Formal Definition
Given a set Q of states and an alphabet O of symbols, an
HMM is fully defined by a 5-tuple λ = <Q, O, A, B, π>,
where:
• Q is the set of possible states
• O is an alphabet describing the observations
• A is a distribution governing the probability of transition from
any state q_i to any other state q_j belonging to Q
• B is a distribution governing the probability of observing
symbol o_k belonging to O in every state q_i
• π is a distribution specifying for every state q_i the
probability of being the initial state
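A minimal sketch of how such a 5-tuple could be represented in code (the field names and the consistency check are my own, not part of the slides):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    """lambda = <Q, O, A, B, pi> stored as plain arrays."""
    states: list          # Q, the state names
    symbols: list         # O, the observation alphabet
    A: np.ndarray         # A[i, j] = P(state q_j at t+1 | state q_i at t)
    B: np.ndarray         # B[i, k] = P(symbol o_k | state q_i)
    pi: np.ndarray        # pi[i]   = P(q_i is the initial state)

    def is_consistent(self) -> bool:
        # Every row of A and B, and pi itself, must be a probability distribution.
        return (np.allclose(self.A.sum(axis=1), 1.0)
                and np.allclose(self.B.sum(axis=1), 1.0)
                and np.isclose(self.pi.sum(), 1.0))
```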
The Hidden Markov Model (HMM)
• The emission is governed by a distribution B depending upon the current state.
• The current state is no longer identified by the current observation!
[Figure: the same five states (0-4), now emitting the sequence g a c c t g stochastically
according to B]
B (emission probabilities; one column per state):
        0    1    2    3    4
  a   0.0  0.6  0.2  0.1  0.0
  c   0.1  0.2  0.6  0.6  0.3
  g   0.6  0.1  0.0  0.2  0.0
  t   0.3  0.1  0.2  0.1  0.7
Why use an HMM instead of an OMM?
Some processes are HMMs!
State = pressure: {High, Low}
Observation: {Rain, Sun}
A (state transitions):
         S:H   S:L
  S:H   0.72  0.28
  S:L   0.32  0.68
B (emissions):
         S:H   S:L
  O:S   0.82  0.25
  O:R   0.18  0.75
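For concreteness, a sketch instantiating this weather model in Python and sampling from it; the row/column arrangement follows the tables above, while the initial distribution π is an assumption of mine:

```python
import numpy as np

rng = np.random.default_rng(42)

states = ["High", "Low"]                 # hidden pressure states
symbols = ["Sun", "Rain"]                # observations
A  = np.array([[0.72, 0.28],             # P(next pressure | High)
               [0.32, 0.68]])            # P(next pressure | Low)
B  = np.array([[0.82, 0.18],             # P(Sun / Rain | High)
               [0.25, 0.75]])            # P(Sun / Rain | Low)
pi = np.array([0.5, 0.5])                # assumed initial distribution

def sample(T):
    """Sample T days of hidden pressure states and observed weather."""
    q = rng.choice(2, p=pi)
    path, obs = [], []
    for _ in range(T):
        path.append(states[q])
        obs.append(symbols[rng.choice(2, p=B[q])])
        q = rng.choice(2, p=A[q])
    return path, obs

print(sample(7))
```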
Why use an HMM instead of an OMM?
The equivalent observable model has many more parameters to estimate!
[Figure: expanding the hidden states High/Low into observable state-output pairs (0S, 1S,
0R, 1R) gives transition probabilities of the form a_ij * b_jk (e.g. a00*b00, a01*b00,
a11*b10, a00*b01)]
|A_O| = |A_H| × |B_H|
What we Earn and what we Pay for
Earning:
the HMM has a much smaller number of parameters to estimate than the equivalent
observable model.
Paying:
fewer parameters => fewer degrees of freedom to tune the model;
the HMM is a rougher approximation of reality.
The Structured HMM: S-HMM
Basic block: <A_k, B_k, I_k, E_k>
Composite block: <A_0, I_0, E_0>
Null block
Depending on the basic block structure, it may be necessary to require that A_k
be acyclic in the composite blocks.
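As a rough illustration of the block idea (my own sketch under simplifying assumptions, not the authors' exact construction): each basic block is a small HMM with a single entry state I_k and a single exit state E_k, and a composite block wires block exits to block entries through its own transition matrix A_0.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class BasicBlock:
    """A small HMM <A_k, B_k, I_k, E_k> with one entry and one exit state."""
    A: np.ndarray   # internal transitions, n_k x n_k
    B: np.ndarray   # emissions, n_k x |O|
    entry: int      # index of I_k
    exit: int       # index of E_k

def compose(blocks, A0):
    """Sketch of a composite block: embed each block's A on the diagonal of a
    larger matrix, then add arcs exit(i) -> entry(j) weighted by A0[i, j].
    Assumes each exit state has no internal outgoing arcs, so rows remain
    valid distributions when the rows of A0 sum to 1."""
    sizes = [b.A.shape[0] for b in blocks]
    offsets = np.cumsum([0] + sizes[:-1])
    N = sum(sizes)
    A = np.zeros((N, N))
    for b, off in zip(blocks, offsets):
        A[off:off + b.A.shape[0], off:off + b.A.shape[0]] = b.A
    for i, (bi, oi) in enumerate(zip(blocks, offsets)):
        for j, (bj, oj) in enumerate(zip(blocks, offsets)):
            A[oi + bi.exit, oj + bj.entry] += A0[i, j]
    return A
```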
The Basic Functions Characterizing an HMM
Given a sequence of observations o_1, o_2, ..., o_t, ..., o_T:
α_t(i) computes the probability of observing o_1, o_2, ..., o_t and being in state
q_i at time t.
Recursive definition:
1) Initial step (t = 1): α_1(i) = π_i b_i(O_1), for 1 ≤ i ≤ N
2) Inductive step: α_{t+1}(i) = [ Σ_{k=1..N} α_t(k) a_{ki} ] b_i(O_{t+1}),
for 1 ≤ i ≤ N and 1 ≤ t ≤ T-1
3) Termination: P(O|λ) = Σ_{i=1..N} α_T(i)
β_t(i) computes the probability of the future observation sequence o_{t+1}, ..., o_T
given that the process is in state q_i at time t.
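The α recursion above translates almost line by line into code; a sketch (variable names are mine, and the example reuses the weather model from the earlier slide with an assumed initial distribution):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward pass: alpha[t, i] = P(o_1..o_t, q_t = i).
    pi: (N,) initial distribution; A: (N, N) transitions;
    B: (N, M) emissions; obs: sequence of symbol indices."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # initial step
    for t in range(T - 1):                              # inductive step
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                       # P(O | lambda)

# Example with the two-state weather model from the earlier slide
A  = np.array([[0.72, 0.28], [0.32, 0.68]])
B  = np.array([[0.82, 0.18], [0.25, 0.75]])             # columns: Sun, Rain
pi = np.array([0.5, 0.5])                               # assumed
alpha, likelihood = forward(pi, A, B, obs=[0, 0, 1])    # Sun, Sun, Rain
print(likelihood)
```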
Functions and Revisited for S-HMM
Strong reduction of the
complexity……
Theorem: Given a S-HMM, the complexity C of function  and  is
being T the length of the sequence of observations, ni the number of
states in block Bi, and M the cardinality of the observation set O.
Advantages and Disadvantages of the S-HMM Architecture
The S-HMM can be seen as a restriction of the Hierarchical HMM (Fine, Singer
and Tishby).
Advantages:
1. Strong reduction of the parameters to estimate.
2. Blocks can be locally trained/modified.
3. Blocks learned from different data and with different methods can be patched
together.
4. Low computational complexity.
Disadvantages:
The modeling power is reduced with respect to the basic HMM.
Modeling Motifs
[Figure: a motif model with a start state S and an end state E, matching states emitting
the motif symbols (P, A, A, R), insertion states (I), and silent states]
The complexity for α is O(NT).
The Problem of Modeling «Duration»
An approach originating in the signal processing community:
• add a new probability distribution modeling the permanence in a state;
• the model becomes a Hidden Semi-Markov Model!
[Figure: a two-state model over symbols A and B with explicit state-duration distributions]
An alternative approach is to expand a state into a sequence of states:
• fine-grained discrete time representation;
• the number of states can grow very large.
Example sequence: AAABBBBABAAAAAAABBBAAAAA
Modeling gap duration
States can be thought of as insertion states emitting (A)*.
A single state q with self-loop probability p and exit probability 1-p yields a
geometric duration distribution: P(d_q = t) = p^(t-1) (1-p).
In general, this is not plausible!
[Figure: the geometric duration distribution P(d_q = t) compared with the distribution
obtained by expanding the gap into a left-to-right chain of states (with transition
probabilities such as 0.25, 0.75, 0.8, 0.2, 1.0), which can shape an arbitrary
duration profile]
The complexity for α is O(2NT).
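A quick numerical illustration of the two duration models (my own sketch): a self-loop yields the geometric distribution above, while a left-to-right chain with per-state stay/exit probabilities can produce a peaked, non-geometric duration profile.

```python
import numpy as np

def geometric_duration(p, t_max=8):
    """P(d = t) for a state with self-loop probability p and exit probability 1 - p."""
    t = np.arange(1, t_max + 1)
    return (p ** (t - 1)) * (1 - p)

def chain_duration(stay_probs, t_max=8):
    """Duration distribution of a left-to-right chain: at step k the process
    either stays in the chain (prob stay_probs[k]) or exits (1 - stay_probs[k])."""
    dist, alive = [], 1.0
    for k in range(t_max):
        p_stay = stay_probs[k] if k < len(stay_probs) else 0.0
        dist.append(alive * (1 - p_stay))   # probability of leaving after k+1 steps
        alive *= p_stay
    return np.array(dist)

print(geometric_duration(0.7))                        # monotonically decreasing
print(chain_duration([1.0, 1.0, 0.75, 0.8, 0.25]))    # peaked, non-geometric
```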
An Experimental Evaluation of α's Complexity
[Figure: abstract symbolic description of the S-HMM architecture, which EDY can discover
from a set of sequences]
A Complex Model can be Constructed Incrementally
[Figure: successive refinements of an S-HMM between start (S) and end (E) blocks,
alternating gap blocks (G1...G6) and motif blocks (M1, M2, M3)]
Motifs are detected as regularities emerging from noise.
Inside a gap, new motifs can be detected which were not reliably detectable when
considering the entire sequence:
P(motif | S) < P(motif | S, λ)
«Process Profiling» case studies
The typing activity is logged.
[Figure: fragment of a keystroke log; each typed character (c, h, i, ..., s, o, g, n, a,
d, i, ...) is recorded together with a timing value (e.g. c 78, _ 16, h 47, _ 0, i 31, ...)]
Name and Surname
1: SSSSS...............AAAAA.........IIIIIII..........................TTTTT..........TTTTTTT..............…AAAAAA
2: SSSSSSSSSSSSSS...........AAAAAIIIIII......................TTTTT...........TTTTTTT...…AAAAAAAAAAAAAAA
3:SSSSSSSSSSSSSSS........IIIIIIAAAAAAA..................TTTTTT..........TTTTTTTT....................................................
................................................AAAAAAA
1. S-HMM for the name
2. S-HMM for the surname
[Figure: the two profiles PS and PN; LS: 30 sequences, TS+: 90 sequences, TS-: 150 sequences]
An S-HMM example:
thick lines denote "motifs", light lines denote gaps.
The model also accounts for the typical 'mistakes' of a user.
User Profiling
1. Ten users typed a corpus of about 3000 words.
2. Nine short "key words", frequently occurring in the corpus, have been chosen.
3. For every user a set U of 9 S-HMMs has been constructed (one S-HMM for every
key word).
4. The set U is the profile of user u.
5. Every profile U has been evaluated both when the performer is user u and when
the performer is another user.
P = t [P(Ot|w)/Pavg(u,w)]
[Plot: probability (log scale) vs. number of tested key words, comparing the case in which
the performer corresponds to the profile with the case in which it does not]
www.edygroup.di.unipmn.it
Process Interaction
[Figure: two observation streams, OBS1 and OBS2, generated by processes that may interact]
Further extensions
The basic HMM doesn't consider possible conditioning on events occurring in the
"world".
Non-stationary HMM: the probability distributions A and B change over time (some
hidden variable is conditioning A and B).
IO-HMM: transitions from one state to another (distribution A) depend upon
observations generated by other processes.
Conclusion
We presented the Structured Hidden Markov Model, a restriction of the
Hierarchical Hidden Markov Model.
We have shown how the complexity can be drastically limited by exploiting the
structure of the model, and how incremental learning algorithms can be designed.
Applications involving very large numbers of states have already been deployed.
Other applications, in process profiling, signal processing and molecular biology,
are in progress.