Making Time-series Classification
More Accurate Using Learned
Constraints
© Chotirat “Ann” Ratanamahatana
Eamonn Keogh
2004 SIAM International Conference on DATA MINING
April 22, 2004
Roadmap
• Time series and their similarity measures
• Euclidean distance and its limitation
• Dynamic time warping (DTW)
• Global constraints
• R-K band
• Experimental Evaluation
• Conclusions and future work
Important Note!
You are free to use any slides in this talk for teaching
purposes, provided that the authorship of the slides is clearly
attributed to Ratanamahatana and Keogh.
You may not use any text or images contained here in a
paper (including tech reports or unpublished works) or
tutorial without the express permission of Dr. Keogh.
Chotirat Ann Ratanamahatana and Eamonn Keogh. Making Time-series
Classification More Accurate Using Learned Constraints. In proceedings of SIAM
International Conference on Data Mining (SDM '04), Lake Buena Vista, Florida,
April 22-24, 2004. pp. 11-22
Classification in Time Series
Classification, in general,
maps data into predefined
groups (supervised learning)
Will this person buy a
computer?
Age   Income   Student   CreditRating   Class: buy comp.
28    High     No        Fair           No
25    High     No        Excellent      No
35    High     No        Fair           Yes
45    Medium   No        Excellent      No
18    Low      Yes       Fair           Yes
49    High     No        Fair           ??
Pattern Recognition is a
type of supervised
classification where an
input pattern is classified
into one of the classes
based on its similarity to
these predefined classes.
Which class does the new (unlabeled) time series belong to:
Class A or Class B?
Euclidean Distance Metric
Given two time series
Q = q1, …, qn and
C = c1, …, cn
their Euclidean distance is defined as
$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$
[Figure: time series Q and C.]
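As a minimal sketch of the definition above, the Euclidean distance between two equal-length series can be computed as follows. The function name `euclidean_distance` and the plain-list representation are assumptions made for illustration, not part of the talk.

```python
import math


def euclidean_distance(q, c):
    """Euclidean distance between two equal-length sequences Q and C."""
    if len(q) != len(c):
        raise ValueError("Q and C must have the same length n")
    # Sum the squared point-by-point differences, then take the square root.
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))
```

Because the i-th point of Q is always compared with the i-th point of C, a small shift along the time axis can change the result considerably.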
Limitations of Euclidean Metric
Very sensitive to some
distortion in the data
Training data consists
of 10 instances from
each of the 3 classes
Perform 1-nearest-neighbor classification
with leave-one-out evaluation,
averaged over 100 runs.
Euclidean distance error rate:
29.77%
DTW error rate:
3.33%
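The evaluation protocol mentioned above (1-nearest-neighbor with leave-one-out) can be sketched roughly as below. This is an illustrative outline only: the function name `loo_error_rate` and the `dist` callback are hypothetical, and the sketch does not reproduce the averaging over 100 runs or the error rates reported on the slide.

```python
def loo_error_rate(series, labels, dist):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier.

    series: list of sequences; labels: their class labels;
    dist: any distance function taking two sequences (assumed interface).
    """
    errors = 0
    for i, query in enumerate(series):
        # Hold out instance i and find its nearest neighbor among the rest.
        nearest = min(
            (j for j in range(len(series)) if j != i),
            key=lambda j: dist(query, series[j]),
        )
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(series)
```

Any distance measure (Euclidean, DTW, or band-constrained DTW) can be passed as `dist`, which is what makes the error rates of the different measures comparable.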
Dynamic Time Warping (DTW)
Euclidean Distance
One-to-one alignments
Time Warping Distance
Non-linear alignments are allowed
How Is DTW Calculated? (I)
$$DTW(Q, C) = \min\left\{ \sqrt{\sum_{k=1}^{K} w_k} \right\}$$
[Figure: the warping matrix of Q and C, with the warping path w.]
How Is DTW Calculated? (II)
The optimal warping path w can be found using dynamic programming to evaluate
the following recurrence:
$$\gamma(i, j) = d(q_i, c_j) + \min\{ \gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1) \}$$
where γ(i, j) is the cumulative distance: the distance d(q_i, c_j) of the current
cell plus the minimum of the cumulative distances of its three adjacent cells.
[Figure: cell (i, j) and its adjacent cells (i-1, j-1), (i-1, j), and (i, j-1).]
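The recurrence can be turned into a short dynamic-programming routine. The sketch below assumes that d(q_i, c_j) is the squared difference of the two points and that the square root of the final cumulative value is returned, so the result is comparable with the Euclidean distance; both choices, and the name `dtw_distance`, are assumptions rather than details given on the slide.

```python
import math


def dtw_distance(q, c):
    """Unconstrained DTW distance via the cumulative-distance recurrence."""
    n, m = len(q), len(c)
    # gamma[i][j] is the cumulative distance of cell (i, j), 1-indexed;
    # row 0 and column 0 are padding so the borders need no special case.
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2  # assumed local distance
            gamma[i][j] = d + min(
                gamma[i - 1][j - 1],  # diagonal step
                gamma[i - 1][j],      # step along Q
                gamma[i][j - 1],      # step along C
            )
    return math.sqrt(gamma[n][m])
```

For example, `dtw_distance([1, 2, 3], [2, 3, 4])` is the square root of 2, whereas the Euclidean distance between the same two series is the square root of 3.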
Global Constraints (I)
Prevent any unreasonable warping
[Figure: two warping matrices of Q and C, one constrained by the Sakoe-Chiba Band and one by the Itakura Parallelogram.]
Global Constraints (II)
A Global Constraint for a sequence of size m is defined by R, where
$$R_i = d, \quad 0 \le d \le m, \quad 1 \le i \le m.$$
R_i defines a freedom of warping above and to the right of the diagonal
at any given point i in the sequence.
[Figure: R_i illustrated on the Sakoe-Chiba Band and the Itakura Parallelogram.]
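A band R can be plugged into the DTW computation by skipping every cell that lies outside it. The sketch below assumes equal-length series, a band that is symmetric about the diagonal (cell (i, j) is allowed when |i - j| <= R at the smaller of the two indices), and a squared-difference local distance; these are illustrative assumptions, as are the names `dtw_band` and `sakoe_chiba_band`.

```python
import math


def sakoe_chiba_band(m, width):
    """A Sakoe-Chiba Band is the special case where R_i is constant."""
    return [width] * m


def dtw_band(q, c, r):
    """DTW restricted to a band r, where r[i] is the allowed warping at point i."""
    n = len(q)
    if len(c) != n or len(r) != n:
        raise ValueError("q, c and r must all have the same length")
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # Assumed symmetric band: skip cells too far from the diagonal.
            if abs(i - j) > r[min(i, j) - 1]:
                continue
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    # The diagonal is always inside the band, so a path always exists.
    return math.sqrt(gamma[n][n])
```

With a band of all zeros only the diagonal remains, so the result equals the Euclidean distance; with R_i = m everywhere the constraint has no effect.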
Is a Wider Band Always Better?
Euclidean distance = 2.4836
DTW dist = 1.6389 (R = 1)
DTW dist = 1.0204 (R = 10)
DTW dist = 1.0204 (R = 25), identical to R = 10
Wider Isn’t Always Better
Recall this example.
Most accuracies peak at a smaller window size.
[Figure: accuracy (%) and CPU time (msec, x 10^4) plotted against warping window size (0 to 70, starting from Euclidean) for the auslan, gun, digit, trace, and wordspotting datasets.]
A larger warping window is not always a good thing.
Ratanamahatana-Keogh Band
(R-K Band)
Solution: we create a band of arbitrary shape and size that is
appropriate for the data we want to classify.
How Many Bands Do We Need?
• Of course, we could use ONE band to classify
all the classes, as almost all researchers do.
• But…the width of the band does depend on the
characteristics of the data within each class. Having
one single band for classification is unlikely to
generalize.
• Our proposed solution:
We create an arbitrary band (R-K band) for each
class and use it accordingly for classification.
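One way to read "use it accordingly" is that, when a query is compared with a training instance of some class, the band learned for that class constrains the warping. The sketch below illustrates that reading only; the symmetric band test, the squared-difference local distance, and the names `dtw_band` and `classify_with_class_bands` are assumptions, not the authors' implementation.

```python
import math


def dtw_band(q, c, r):
    """DTW of two equal-length series, restricted to band r (assumed symmetric)."""
    n = len(q)
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > r[min(i, j) - 1]:
                continue  # cell lies outside the band
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    return math.sqrt(gamma[n][n])


def classify_with_class_bands(query, train, labels, bands):
    """1-NN classification where each class contributes its own band.

    bands maps a class label to that class's band (a list of widths).
    """
    best_label, best_dist = None, math.inf
    for series, label in zip(train, labels):
        # Use the band of the training instance's class for this comparison.
        dist = dtw_band(query, series, bands[label])
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

In a toy case such as `train = [[0, 1, 0, 0], [0, 0, 0, 0]]` with labels `["A", "B"]`, the query `[0, 0, 1, 0]` is assigned to A when class A has a band of width 1, and to B when both bands have width 0, which shows how the per-class band can change the decision.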
How Do We Create an R-K Band?
First attempt: we could look at the data and manually create the shape of the bands
(then we would also need to adjust the width of each band until we get a good result).
[Figure: example time series and the manually created bands.]
100% accuracy!
Learning an R-K Band Automatically
Our heuristic search algorithm automatically learns the bands from the data.
(Sometimes we even get an unintuitive shape that gives a good result.)
[Figure: example time series and the automatically learned bands.]
100% accuracy as well!
R-K Band Learning With Heuristic Search
Calculate h(1)
Calculate h(2)
h(2) > h(1) ?
Yes
No
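The flowchart only shows the core comparison: evaluate a heuristic before a change, h(1), and after it, h(2), and keep the change when h(2) > h(1). A generic greedy search built around that comparison might look like the sketch below. Everything beyond the comparison itself is an assumption for illustration (starting from a zero-width band, widening a segment by one cell at a time, splitting a segment in half when widening does not help, and the names `learn_band`, `h`, and `max_width`); it should not be read as the authors' exact procedure.

```python
def learn_band(m, h, max_width):
    """Greedy band search (illustrative). h(band) returns a score to maximize."""
    band = [0] * m  # assumed starting point: Euclidean (zero-width) band

    def search(start, end):
        current = h(band)  # h(1): score before the change
        while True:
            candidate = list(band)
            for i in range(start, end):
                candidate[i] = min(candidate[i] + 1, max_width)
            if candidate == band:
                return  # segment already at the width cap
            score = h(candidate)  # h(2): score after the change
            if score > current:
                band[:] = candidate  # keep the change and try widening again
                current = score
            else:
                break  # undo (candidate is discarded)
        if end - start > 1:
            # Refine: try each half of the segment separately.
            mid = (start + end) // 2
            search(start, mid)
            search(mid, end)

    search(0, m)
    return band
```

In practice `h` would typically be some measure of classification quality on the training data, such as leave-one-out accuracy with the candidate band; the choice of heuristic function is left open here.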
R-K Band Learning in Action!
[Animated figure: R-K Band learning in progress.]
Experiment: Datasets
1. Gun Problem
2. Trace (transient classification benchmark)
3. Handwritten Word Spotting data
[Figures: example time series from each dataset.]
Experimental Design
We measure the accuracy and CPU time on
each dataset, using the following methods:
1. Euclidean distance
2. Uniform warping window (size 1 to 100)
3. Learning different R-K bands for all classes, and
performing classification based on them.
Leave-one-out 1-nearest-neighbor classification is used to
measure the accuracy.
A lower bounding method is also used to prune off unnecessary
DTW calculations.
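The slide does not say which lower bound is used. As one illustration of the pruning idea, the sketch below uses the envelope-based bound commonly known as LB_Keogh with a uniform band of width r: if the cheap bound is already no better than the best distance found so far, the full DTW computation is skipped. The function names and the uniform-band simplification are assumptions.

```python
import math


def dtw_uniform(q, c, r):
    """DTW of equal-length series under a uniform band |i - j| <= r."""
    n = len(q)
    gamma = [[math.inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    return math.sqrt(gamma[n][n])


def lb_keogh(q, c, r):
    """Envelope-based lower bound on dtw_uniform(q, c, r)."""
    n = len(q)
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - r):min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        # Only the part of c that leaves the envelope of q contributes.
        if ci > upper:
            total += (ci - upper) ** 2
        elif ci < lower:
            total += (ci - lower) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, candidates, r):
    """1-NN search that skips DTW whenever the lower bound cannot win."""
    best_index, best_dist, dtw_calls = -1, math.inf, 0
    for index, candidate in enumerate(candidates):
        if lb_keogh(query, candidate, r) >= best_dist:
            continue  # pruned: the true DTW distance cannot be smaller
        dist = dtw_uniform(query, candidate, r)
        dtw_calls += 1
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist, dtw_calls
```

The third return value counts how many full DTW computations were actually performed, which gives a rough sense of how much work the bound saves.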
Experimental Results (I)
[Figure: example series and bands for the two classes, Gun Draw and Point.]

                   Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)     5.5         1.0 (width = 4)      4.5 (width = 15)    0.5 (max width = 4)
CPU Time (msec)    N/A         2,440                5,430               1,440
CPU Time (no LB)   60          11,820               17,290              9,440
Experimental Results (II)
[Figure: four time-series panels and their bands.]

                   Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)     11          0 (width = 8)        0 (width = 27)      0 (max width = 7)
CPU Time (msec)    N/A         16,020               34,980              7,420
CPU Time (no LB)   210         144,470              185,460             88,630
Conclusions
• Different shapes and widths of the band
contribute to the classification accuracy.
• Each class can be better recognized using its
own individual R-K Band.
• A heuristic search algorithm is a good approach to
R-K Band learning.
• Combining the R-K Band with the lower bounding
technique yields higher accuracy and makes a
classification task much faster.
Future Work
• Investigate other choices that may make envelope
learning more accurate.
– Heuristic functions
– Search algorithm (refining the search)
• Is there a way to always guarantee an optimal solution?
• Examine the best way to deal with multi-variate time
series.
• Consider a more generalized form of our framework, i.e.
a single R-K Band is learned for a particular domain.
• Explore the utility of R-K Band specifically on real-world
problems: music, bioinformatics, biomedical data, etc.
Contact: [email protected]
[email protected]
Homepage: http://www.cs.ucr.edu/~ratana
All datasets are publicly available at:
UCR Time Series Data Mining Archive: http://www.cs.ucr.edu/~eamonn/TSDMA