Summarizing Sequential Data with Closed Partial Orders

Download Report

Transcript Summarizing Sequential Data with Closed Partial Orders

Summarizing Sequential Data with
Closed Partial Orders
Gemma Casas-Garriga
Proceedings of the SIAM International Conference on Data Mining
(SDM'05)
Advisor:Jia-Ling Koh
Speaker:Chun-Wei Hsieh
03/10/2006
1
Introduction



Closed patterns is a compact and
significative set
The number of closed patterns may be
still quite large
Summarizing closed patterns with postprocessing
2
Motivation

<(A)(C)(C)(C)(A)>,<(C)(A)(C)(C)(A)>
Which is better than the other ?
3
Main steps


Grouping Closed Sequential Patterns
Obtaining Closed Partial Orders
4
Grouping Closed Sequential Patterns



A valid pair (S, T )
S ⊆CS is a nonredundant set of closed
sequences, whose tid lists are at least T
T ⊆ D is the maximal set of
transactions where all s ∈ S are
contained.
5
Grouping Closed Sequential Patterns
6
Grouping Closed Sequential Patterns


A naive approach is
to group closed
sequences with the
same tid list
<(C)(A)(C)>?
The naive way may
miss some element

Ex: <(C)(A)(C)>
7
Grouping Closed Sequential Patterns

Let (S, T ) be a valid pair, then we have
that S =
t
tT


for all s ∈ S we have that tid(s) is at least T
It has to use the transactions of the
database
8
Grouping Closed Sequential Patterns

Given two valid pairs (S′, T′) and (S, T ),
if T ⊆ T′ then for all s′∈ S′
there exists s ∈ S s.t. s′⊆ s.
(S′, T ′)
(S, T )
9
Grouping Closed Sequential Patterns
10
Grouping Closed Sequential Patterns
11
Obtaining Closed Partial Orders


obtain a compact representation from
each valid pair (S, T )
A partial order can be modelled as a triple
p = (V,E, l)
12
Obtaining Closed Partial Orders

Given a set of sequences S and
let s, s′ ∈ S be two sequences
s = <(I1 ) (Ii ) (In )> , S' = <(I' )
1

if
(I' j )
(I'm )>
− I i = I ' j ; and,
− head (s, I ) ⋄ tail ( S' , j + 1) ⊆ S'' , for some S'' ∈ S; and,
− head (S' , j ) ⋄ tail ( s , i + 1) ⊆ S'' , for some S''∈ S.
then that position i of s matches with
position j of S' ; note it by p[i] ∼ q[j].
13
Obtaining Closed Partial Orders

S={<(A)(C)(C)(C)(A)>,<(C)(A)(C)(C)(A)>}
AC CCA
C ACCA
S
CCCA
ACACCA  S
ACC CA
CAC CA
CACCA
S
ACCCA
S
14
Obtaining Closed Partial Orders
15
Obtaining Closed Partial Orders
16
Obtaining Closed Partial Orders


Using the transitivity property to improve
the algorithm
Transitivity: Given a valid pair (S, T )
let s, S', S'' ∈ S,
if s[i] ∼S' [j] and S' [j] ∼S'' [k],
then s[i] ∼ S'' [k].
17
Simultaneity condition of input sequences
18
Experiment

3 different sequential database



Synthetic data (1000 transactions)
The command history of a unix computer
user (607 transactions)
The first chapter of the book “1984” by
George Orwell (340 transactions)
19
Experiment
20
Experiment
21
Experiment
22