How to index Dynamic Time Warping?

Download Report

Transcript How to index Dynamic Time Warping?

Exact Indexing of Dynamic Time Warping
Dr Eamonn Keogh
University of California – Riverside
Computer Science & Engineering Department
What are Time Series?
• A time series is a collection of observations made sequentially in time.
• Lots of useful information can be obtained by measuring time series data
over times.
pattern that are commonly being classified
• Time series occur in virtually every medical, scientific and businesses
domain
• Finding out the similarity between two time series is the heart of many
time series data mining applications
What are the challenges of working with Time Series
data?
• subjective notion of similarity
 How do we define similarity
• large amount of data
• different type of data format
 How do we search quickly
Any solutions available?
•We need a method that allows an elastic shifting of the time axis, to
accommodate sequences which are similar, but out of time phase
•Euclidean Distance
•most popular approach for defining similarity and indexing of time series
data.
•a very brittle distance approach which cannot index time series accurately
among two different time phases.
•Dynamic Time Warping
•base on dynamic programming which proved to be a very reliable method.
•does not obey the triangular inequality. This has resisted attempts at exact
indexing.
•“… performance on large database may be a limitation.”
What are the challenges of working with Time Series
data? cont.
Classification experiment on Cylinder-Bell-Funnel dataset
• Training data consists of 10 exemplars from each class.
• (One) Nearest Neighbor Algorithm.
• “Leaving-one-out” evaluation, averaged over 100 runs.
Times Series A
Times Series B
Comparison of two approaches
mean error rate
speed
Euclidean Distance Metric
0.2734
X
Dynamic Time Warping (DTW)
0.0269
230X
•The result proved the reliability of DTW and motivates the necessity of introducing
technique to index DTW
What is Dynamic Time Warping ?
• DTW is being used in different area like chemical engineering, pattern
matching, bioinformatics, . . .
What is Time Warping?
Given: two sequences x1,x2,...,xn and y1,y2,...,ym
two series in different time phase
Wanted: align two sequence base on a common
time-axis
shifting of time axis
Aligning time series with Dynamic Programming Matrix
optimal warping path
two time series Q and C,
length n and m respectively
an (n*m) matrix is constructed to
store the distance between
items in Q and C.
the result alignment
What is Dynamic Time Warping? cont.
• There're three basic constraints for time
warping
Boundary conditions
-we want the path not to skip a part
at the beginning or ending of utterance
Continuity
-no jumps
Monotonicity
- we can't go back in time
•In the matrix, there are many warping paths that satisfy
the three basic constraints.
Goal : How can we find a path that gives
the minimal overall distance?
formula of dynamic programming
(i,j) = d(qi,cj) + min{ (i-1,j-1) , (i-1,j ) , (i,j-1) }
Demonstration of computing the Minimal Editing Distance: http://isl.ira.uka.de/speechCourse/slides/dtw/editdist/applet/applet.html
How to speed up the calculation of DTW?
Basic idea: Approximate the time series with some compressed or down sampled
representation, and do DTW on the new representation.
Q
p
w
k
C
j
11 w1
n
i
Solution : Lower Bounding Measure with Global Path Constraint
What is Global Path Constraints ?
- path should be close to diagonal
- in theory, it limits warping path by how far it may
stay from the diagonal
- in practice, it constrains the range of indices in the
warping path
Why using global constraints ?
- speed up the DTW distance calculation
(reduces the search effort from O(n2) to O(n))
- to avoid a relatively small section of one sequence
maps onto a relatively large section of another
sequence.
warping window
How to speed up the calculation of DTW? cont.
What is Lower Bounding Measure ?
- a dimensionality reduction technique
Why using Lower Bounding Measure ?
- both Euclidean metric and DTW is either heavily I/O bound or very demanding in terms of CPU
time.
- a fast lower bounding function can address this problem by erasing sequences that could not
possibly be a best match.
How to define a good Lower Bounding Measure ?
A good lower bounds function should basically match two criteria
- must be fast to compute
- must produces a relatively tight lower bounds which means that it can more tightly
approximates the true DTW distance
Two existing type of Lower bounding measure
[LB_Kim]
[LB_Yi]
squared different between two series'
first (A), last (D), min (B) and max (C)
sum of squared length of gray line
represent the minimum the corresponding
points contribution to the overall DTW
How to speed up the calculation of DTW? cont.
Proposed lower bounding measure : LB_Keogh with global constraint
Notation
A: bounding envelope - Sako-China Band (global constraint)
B: bounding envelope - Itakura Parallelogram (global constraint)
Q: original sequence U: Upper L: Lower
[LB_Keogh]
1.
Limiting the range of warping path by using
global constraint
2.
Approximating the tightest lower bound by using
LB_Keogh
Itakura Parallelogram and LB_Keogh together
produces the tightest bounds
squared sum of the distances from every part of the
candidate sequence C not falling within the bounding
envelope, to the nearest orthogonal edge of the
bounding envelope is returned as the lower bound.
LB_Keogh lower bound <= DTW bound
How to index Dynamic Time Warping?
Piecewise Constant Approximation (PAA)
Basic idea: Represent the time series as a sequence of box basis functions.
Each box is in same length
• Reducing the time series from n dimensions to N
dimensions, the data is divided into N equal sized
“frames”.
Why using PAA ?
• time series data may include hundreds to thousands items,
this will rapidly degrade the performance of indexing.
• 16 dimension time series will be reasonably handled by multidimensional index structure.
• a way is needed to further reduce the dimension of lower
bound by LB_Keogh
• PAA is the most efficient technique among other approaches
(Wavelets, Fourier Transforms, Adaptive Piecewise Constant Approximation)
A sequence of length 256 is
reduced to 16 dimensions
How to index Dynamic Time Warping? cont.
Modified PAA to index time warped queries [LB_PAA]
• there are two time series data sets (Q and C) in length n, both are being divided into N
dimension. C is a candidate sequence. Q is a query sequence.
• approximate the minimum bounding rectangle (R) in each dimension of candidate
sequence C
• approximate the max (U^) and min (L^) point in each dimension of query sequence Q
by using LB_PAA
How to index Dynamic Time Warping? cont.
Modified PAA to index time warped queries
•define a MINDIST(Q,R) function that returns a lower bounding measure of the distance
between a query Q, and R, were R is a Minimum Bounding Rectangle (MBR) of C.
l4
original dimension
l5
L^2
h2
reduced dimension
U^
4
U^
5
How to search time series with DTW ?
K-Nearest Neighbor Search Algorithm
What is K-NN Search - KNNSearch(Q,K)?
• query sequence Q and desired number of K time series neighbors
from a set C
• priority queue is being used for storing the index in an increasing
order of distance
RangeSearch Algorithm
What is RangeSearch Algorithm - RangeSearch(Q,E,T)?
• answering a range queries
• a classic R-tree-style recursive search algorithm
Experimental Evaluation
Evaluation among three lower bounding measures (LB_Kim, LB_Yi, LB_Keogh)
•Comparing tightness of lower bound against
query length
•Comparing pruning power against database size
Evaluation between linear scan and LB_Keogh
•Comparing normalized CPU cost against data size
Conclusion
• This paper override the traditional believe of "dynamic time
warping ...cannot be speeded up by indexing
• However, it based on two assumption
- both time series data are in the same length
(out of time phase is allowed)
- index sequence when warping path is constrained
(Boundary conditions, Continuity, Monotonicity, Global constraint)
• The proposed approach is state of the art in terms of efficiency
and flexibility. It may benefit the matching of 2 and 3 dimensional
shapes.
Acknowledge
Dr Eamonnn Keogh
(Computer Science & Engineering Department, University of
California – Riverside, Riverside,CA 92521)
Exact Indexing of Dynamic time Warping
A Tutorial on Indexing and Mining Time Series Data
http://www.cs.ucr.edu/~eamonn/tutorial_on_time_series.ppt
Carnegie Mellon University
Automatic Speech Recognition
http://werner.ira.uka.de/speechCourse