
Duong Tuan Anh
Faculty of Computer Science and Technology
Ho Chi Minh City University of Technology
Tutorial MIWAI December 2012
OUTLINE
 Introduction
 Definitions of time series motifs
 Applications of time series motifs
 Some well-known algorithms for finding motifs
 Our proposed method
 Conclusions
What are time series?
A time series is a collection of observations made sequentially in time.
[Figure: an example time series, shown both as a column of raw readings (25.1750, 25.2250, 25.2500, …, 24.7500) and as a plot of about 500 data points with values between 23 and 29.]
Examples: Financial time series, scientific time series
Time series data mining
 Q. Yang & X. Wu, "10 Challenging Problems in Data Mining Research", Int. Journal of Information Technology and Decision Making, Vol. 5, No. 4 (2006), 597-604.
 Problem #3 on their list: mining sequence data and time series data.
Time series data mining
 Time series data mining is a subfield of data mining that deals with the challenges arising from the characteristics of time series data.
 Time series data have the following characteristics:
 Very large datasets (terabyte-sized)
 Subjectivity (the definition of similarity depends on the user)
 Different sampling rates
 Noise, missing data, etc.
What do we want to do with time series data?
 Clustering
 Motif Discovery
 Classification
 Rule Discovery
 Query by Content
 Visualization
 Novelty Detection
Time series motifs
 Motif: the most frequently occurring pattern in a
longer time series
Motif Discovery
Problem description: unsupervised detection and modeling of previously unknown recurring patterns in real-valued time series.
The discovery must cope with several unknowns:
 the number of motifs and their occurrences
 the location and length of each occurrence
 the "shape" of each motif
J. Lin, E. Keogh, Patel, P. and Lonardi, S., Finding Motifs in Time Series, The
2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002.
Definitions
 Definition 1 Time Series:
 A time series T = t1,…,tm is an ordered set of m real-valued variables.
 Definition 2 Subsequences:
 Given a time series T of length m, a subsequence C of T is a sampling of length n ≤ m of contiguous positions from T, that is, C = tp,…,tp+n-1 for 1 ≤ p ≤ m – n + 1.
 Definition 3. Match:
 Given a positive real number R (called range) and a time
series T containing a subsequence C beginning at
position p and a subsequence M beginning at q, if D(C,
M) ≤ R, then M is called a matching subsequence of C.
Definitions
 Definition 4 Trivial Match:
 Given a time series T, containing a subsequence C
beginning at position p and a matching subsequence M
beginning at q, we say that M is a trivial match to C if
either p = q or there does not exist a subsequence M’
beginning at q’ such that D(C, M’) > R, and either q < q’<
p or p < q’< q.
Definitions
 Definition 5 1-Motif:
Given a time series T, a subsequence length n and a range R, the most significant motif in T (called the 1-Motif) is the subsequence C1 that has the highest count of non-trivial matches.
 Definition 6 K-Motifs:
The K-th most significant motif in T (hereafter called the K-Motif) is the subsequence CK that has the highest count of non-trivial matches and satisfies D(CK, Ci) > 2R for all 1 ≤ i < K.
Definitions
 K-Motifs:
If the motifs are only required to be R apart, as in panel A of the original figure, then two motifs may share the majority of their elements. In contrast, panel B illustrates that requiring the centers to be at least 2R apart ensures that the motifs are unique.
Algorithm Find-1-Motif-Brute-Force(T, n, R)
  best_motif_count_so_far = 0
  best_motif_location_so_far = null
  for i = 1 to length(T) – n + 1
    count = 0; pointers = null
    for j = 1 to length(T) – n + 1
      if Non_Trivial_Match(C[i : i + n – 1], C[j : j + n – 1], R) then
        count = count + 1
        pointers = append(pointers, j)
      end
    end
    if count > best_motif_count_so_far then
      best_motif_count_so_far = count
      best_motif_location_so_far = i
      motif_matches = pointers
    end
  end
Note: Non_Trivial_Match calls the distance function, so the algorithm requires O(m²) calls to the distance function.
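Below is a minimal Python sketch of this brute-force search. It is an illustration under stated assumptions, not the authors' implementation: it uses the plain Euclidean distance, and it approximates the non-trivial-match test of Definition 4 by simply requiring the two windows not to overlap.

import numpy as np

def find_1_motif_brute_force(T, n, R):
    """Return the start index of the 1-motif and the starts of its matches."""
    T = np.asarray(T, dtype=float)
    m = len(T)
    best_count, best_loc, best_matches = -1, None, []
    for i in range(m - n + 1):
        matches = []
        for j in range(m - n + 1):
            # crude non-trivial-match test: the two windows must not overlap
            if abs(i - j) >= n:
                d = np.sqrt(np.sum((T[i:i+n] - T[j:j+n]) ** 2))
                if d <= R:
                    matches.append(j)
        if len(matches) > best_count:
            best_count, best_loc, best_matches = len(matches), i, matches
    return best_loc, best_matches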
Motif-based classification of time series
(Buza et al., 2009)
Motifs can be used for time series classification. This can be done in two steps (a minimal sketch follows the reference below):
 (i) Motifs of all time series are extracted
 (ii) Each time series is represented as an attribute vector over the motifs, so that a classifier such as SVM, Naïve Bayes, etc. can be applied.
Buza, K. and Schmidt-Thieme, L.: Motif-based Classification of Time Series with Bayesian Networks and SVMs. In: A. Fink et al. (eds.) Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, Knowledge Organization. Springer-Verlag, pp. 105-114 (2010).
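A hypothetical Python sketch of step (ii), assuming the motifs from step (i) are available as equal-length arrays. Each time series is described by how many of its subsequences match each motif within range R, and scikit-learn's SVC is trained on these attribute vectors; this particular feature design is an illustrative assumption, not necessarily the exact representation used by Buza and Schmidt-Thieme.

import numpy as np
from sklearn.svm import SVC

def motif_features(series, motifs, R):
    """Attribute vector: count of subsequences matching each motif within R."""
    feats = []
    for motif in motifs:
        n = len(motif)
        count = sum(
            np.linalg.norm(np.asarray(series[p:p+n]) - motif) <= R
            for p in range(len(series) - n + 1)
        )
        feats.append(count)
    return feats

# train_series: list of time series; labels: their class labels
# X = np.array([motif_features(s, motifs, R) for s in train_series])
# clf = SVC().fit(X, labels)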
Motif-based clustering of time series
(Phu & Anh, 2011)
Motif information is used to initialize k-Means clustering of time series:
 Step 1: We find the 1-motifs of all time series in the database.
 Step 2: We apply k-Means clustering on the 1-motifs of all time series to obtain clusters of motifs. From the centers of the motif clusters, we derive the associated time series and choose these time series as the initial centers for the k-Means clustering.
Phu, L. and Anh, D. T., Motif-based Method for Initialization k-Means
Clustering of Time Series Data, Proc. of 24th Australasian Joint Conference (AI
2011), Perth, Australia, Dec. 5-8. Dianhui Wang, Mark Reynolds (Eds.), LNAI
7106, Springer-Verlag, 2011, pp. 11-20.
Motif-based time series prediction
Stock Temporal Prediction based on Time Series Motifs (Jiang et al., 2009)
 For a certain length n, we can find a motif. A motif is a set of subsequences that are non-trivial matches of each other. Each subsequence in this set is called an instance of the motif.
 For different subsequence lengths, we can find different motifs in a time series.
 The idea: if the subsequence in the current sliding window matches the prefix of a particular motif, we can predict that the series will continue like the suffix of that motif.
Motif-based time series prediction (cont.)
 But the subsequence in the current window may fit a number of motifs, which yields many possible suffixes.
 Rule: if a subsequence is similar to every instance of a motif, then we can conclude that it belongs to the motif, and we can use that motif for prediction.
 This method is applied to short-term stock prediction (a sketch of the rule follows the reference below).
Jiang, Y., Li, C., Han, J.: Stock temporal prediction based on time series
motifs. In: Proc. of 8th Int. Conf. on Machine Learning and Cybernetics, Baoding,
China, July 12-15 (2009).
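A small Python sketch of this matching rule, under the assumption that each motif is stored as a list of equal-length instances, all longer than the current window. Averaging the matching suffixes is one simple way to produce the forecast; that choice is an illustrative assumption, not necessarily Jiang et al.'s exact scheme.

import numpy as np

def predict_from_motifs(window, motifs, R):
    """Return a predicted continuation of `window`, or None if no motif applies."""
    w = len(window)
    window = np.asarray(window, dtype=float)
    for instances in motifs:                 # each motif: list of 1-D arrays
        # the rule: the window must be similar to EVERY instance's prefix
        if all(np.linalg.norm(window - inst[:w]) <= R for inst in instances):
            suffixes = np.array([inst[w:] for inst in instances])
            return suffixes.mean(axis=0)     # predicted suffix
    return None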
Signature verification using time series motifs (Gruber et al., 2006)
The process consists of 4 steps:
 Step 1: Signatures are converted to time series
 Step 2: Time series motifs are extracted using the EP-C algorithm (using important Extreme points and Clustering)
 Step 3: Motifs are used to train a dynamic radial basis function network (DRBF) that can classify time series
 Step 4: Time series classification is applied to online signature verification
Gruber, C., Coduro, M., Sick, B.: Signature Verification with Dynamic RBF Networks and Time Series Motifs. In: Proc. of 10th Int. Workshop on Frontiers in Handwriting Recognition (2006).
Finding repeated images in a database of shapes (Xi et al., 2007)
 Convert 2-dimensional shapes into time series
 Find repeated images, or "image motifs", in these time series
 Define a new form of Euclidean distance ("rotation-invariant Euclidean distance")
 Use a modified variant of Random Projection
 "Image motifs" can be applied in anthropology, palaeography (the study of old texts) and zoology.
Xi, X., Keogh, E., Wei, L., Mafra-Neto, A., Finding Motifs in a Database of Shapes, Proc. of SIAM 2007, pp. 249-270.
TWO WELL-KNOWN ALGORITHMS FOR FINDING TIME SERIES MOTIFS
- Random Projection
- Mueen-Keogh Algorithm
1. Random projection (B. Chiu, 2003)
 An algorithm that adapts the PROJECTION method for detecting motifs in biological sequences to detecting time series motifs.
 It is based on locality-preserving hashing.
 The algorithm requires some pre-processing:
 First, apply PAA, a method for dimensionality reduction
 Then discretize the transformed time series into symbolic strings (apply SAX)
PAA for dimensionality reduction
 Time series databases are often extremely large. Searching directly on the raw data is complex and inefficient.
 To overcome this problem, we use transformation methods to reduce the dimensionality of the time series.
 These transformation methods are called dimensionality reduction techniques.
 Some popular dimensionality reduction techniques: DFT (Discrete Fourier Transform), DWT (Discrete Wavelet Transform), PAA (Piecewise Aggregate Approximation), etc.
PAA
 To reduce the time series from n dimensions to w dimensions, the
data is divided into w equal-sized segments.
 The mean value of the data within a segment is calculated and a
vector of these values becomes the data-reduced representation.
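A one-function Python sketch of PAA, assuming for simplicity that the series length n is divisible by the number of segments w:

import numpy as np

def paa(series, w):
    """Reduce a length-n series to w segment means (assumes n % w == 0)."""
    x = np.asarray(series, dtype=float)
    return x.reshape(w, len(x) // w).mean(axis=1)

# e.g. paa(range(8), 4) -> array([0.5, 2.5, 4.5, 6.5])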
DISCRETIZATION WITH SAX
 Discretization of a time series is transforming it into a symbolic string.
 The main benefit of this discretization: there is an enormous wealth of existing algorithms and data structures that allow the efficient manipulation of symbolic representations.
 Lin, Keogh et al. (2003) proposed a method called Symbolic Aggregate Approximation (SAX), which allows the discretization of an original time series into a symbolic string.
 This discretization method is based on the PAA representation and assumes normality of the resulting aggregated values.
 SAX is a process that maps the PAA representation of a time series into a sequence of discrete symbols.
SAX
 Let a be the size of the alphabet used to discretize the time series. To symbolize the time series we have to find the breakpoints β1,…,βa-1 (β0 and βa are defined as -∞ and +∞).
 We notice that real financial time series often have a Gaussian distribution. To give each symbol the same likelihood (1/a), we pick the breakpoint values from a statistical table for the Gaussian distribution.
 Definition: Breakpoints are a sorted list of numbers B = β1,…,βa-1 such that the area under a Gaussian curve from βi to βi+1 equals 1/a.
Breakpoints and symbols
Using the breakpoints, the PAA representation T = t1,…,tw of the time series will be discretized into the symbolic string C = c1c2…cw. Each PAA value ti is coded as a symbol ci by the rule:
  ci = α1  if ti < β1
  ci = αa  if ti ≥ βa-1
  ci = αk  if βk-1 ≤ ti < βk
where αk denotes the k-th symbol in the alphabet, α1 the 1st symbol and αa the a-th symbol.
 For example, Table 1 gives the breakpoints for the values of a from 3 to 10.
 Assume the size of the alphabet is a = 3. We divide the range of time series values into three regions in such a way that the cumulative probability of each region is equal (1/3).
 Based on the standard normal distribution, β1 corresponds to the value x with P(X < x) = 1/3, and β2 to the value x with P(X < x) = 1/3 + 1/3 = 2/3. Therefore we get β1 = -0.43 and β2 = 0.43; these two breakpoints correspond to cumulative probabilities of about 0.33 and 0.66, respectively.
Table 1: A lookup table containing the breakpoints that divide a Gaussian distribution into an arbitrary number (from 3 to 10) of equiprobable regions.

βi \ a    3      4      5      6      7      8      9      10
β1      -0.43  -0.67  -0.84  -0.97  -1.07  -1.15  -1.22  -1.28
β2       0.43   0.00  -0.25  -0.43  -0.57  -0.67  -0.76  -0.84
β3              0.67   0.25   0.00  -0.18  -0.32  -0.43  -0.52
β4                     0.84   0.43   0.18   0.00  -0.14  -0.25
β5                            0.97   0.57   0.32   0.14   0.00
β6                                   1.07   0.67   0.43   0.25
β7                                          1.15   0.76   0.52
β8                                                 1.22   0.84
β9                                                        1.28
Note we made two parameter choices: the word size (w), in this case 8, and the alphabet size (cardinality, a), in this case 3.
[Figure: a time series C of about 120 points is reduced to a PAA representation of w = 8 segments, and each segment mean is mapped to one of a = 3 symbols (a, b, c), yielding an 8-symbol SAX word.]
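A compact Python sketch of the whole SAX pipeline, assuming a z-normalized input whose length is divisible by w; the breakpoints are taken from the Gaussian quantiles, which reproduces the columns of Table 1.

import numpy as np
from scipy.stats import norm

def sax(series, w, a):
    """Discretize a z-normalized time series into a SAX word of length w."""
    # Gaussian breakpoints giving a equiprobable regions (cf. Table 1);
    # for a = 3 this yields approximately [-0.43, 0.43]
    breakpoints = norm.ppf(np.arange(1, a) / a)
    segments = np.asarray(series, float).reshape(w, -1).mean(axis=1)  # PAA
    # symbol index k such that beta_{k-1} <= segment < beta_k
    idx = np.searchsorted(breakpoints, segments, side='right')
    return ''.join(chr(ord('a') + k) for k in idx)

# x = (x - x.mean()) / x.std()   # z-normalize first
# word = sax(x, w=8, a=3)        # e.g. an 8-symbol word over {a, b, c}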
Random projection algorithm
 It uses a collision matrix whose rows and columns correspond to the SAX representation of each time series subsequence.
 At each iteration, it selects certain positions of each word (as a "mask") and traverses the word list.
 For each pair of words that match on the unmasked positions, the corresponding collision matrix entry is incremented.
 At the end, the large entries in the collision matrix (greater than a threshold s) are selected as motif candidates.
 Finally, each candidate is checked for validity against the original data (a simplified sketch follows).
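A simplified Python sketch of the iteration loop, hashing masked words into buckets rather than scanning an explicit matrix; the final verification of candidates against the raw data is omitted.

import random
from collections import defaultdict

def random_projection(words, iterations, mask_size, s):
    """Return index pairs whose collision count exceeds the threshold s.
    `words` is the list of SAX words, one per subsequence."""
    L = len(words[0])
    collisions = defaultdict(int)
    for _ in range(iterations):
        # randomly choose the positions that stay visible (the rest are masked)
        kept = sorted(random.sample(range(L), L - mask_size))
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            buckets[''.join(word[p] for p in kept)].append(idx)
        for bucket in buckets.values():   # words agreeing on unmasked positions
            for i in bucket:
                for j in bucket:
                    if i < j:
                        collisions[(i, j)] += 1
    return [pair for pair, c in collisions.items() if c > s]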
Random projection (cont.)
A working example
[Figures illustrating the algorithm step by step are omitted.]
Remarks on Random Projection
 It is the most popular algorithm for detecting time series motifs.
 It can find motifs in linear time and is robust to noise.
 However, it still has three drawbacks:
 (i) it has several parameters that need to be tuned by the user;
 (ii) if the distribution of the projections is not sufficiently wide, it becomes quadratic in time and space;
 (iii) it is based on locality-preserving hashing, which is effective only for a relatively small number of projected dimensions (10 to 20).
 It is also quite slow for large time series.
2. Mueen-Keogh Algorithm
(Mueen and Keogh, 2009)
 Based on the Brute-force algorithm
 MK works directly on raw time series data
 Three techniques to speed up the algorithm:
 Exploiting the Symmetry of Euclidean Distance
 Exploiting Triangular Inequality and Reference Point
 Applying Early Abandoning
Exploiting the Symmetry of Euclidean Distance
 Since D(A, B) = D(B, A), we can prune half of the distance computations by storing D(A, B) and reusing the value when we need D(B, A).
Exploiting Triangular Inequality and Reference Point
 Given two subsequences Ca and Cb and a reference subsequence Q, the triangular inequality gives
   D(Q, Ca) ≤ D(Q, Cb) + D(Ca, Cb).
 From that, we derive: D(Ca, Cb) ≥ D(Q, Ca) – D(Q, Cb).
 So if we want to check whether D(Ca, Cb) ≥ R, we only need to look at D(Q, Ca) – D(Q, Cb): if D(Q, Ca) – D(Q, Cb) ≥ R, we can conclude that D(Ca, Cb) ≥ R without computing it (see the sketch below).
 Given a reference subsequence Q, we have to compute the distances from Q to all the subsequences in the time series Ti, i.e. D(Q, ti) for each subsequence ti of Ti.
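A Python sketch of this pruning idea, as an illustration rather than the full MK search: the distances to one reference subsequence Q are precomputed, and a pair survives only when the lower bound |D(Q, Ca) – D(Q, Cb)| is below R.

import numpy as np

def candidate_pairs(T, n, q, R):
    """Return pairs of window starts that still need a real distance computation."""
    T = np.asarray(T, dtype=float)
    starts = range(len(T) - n + 1)
    Q = T[q:q+n]                                       # reference subsequence
    dQ = {p: np.linalg.norm(T[p:p+n] - Q) for p in starts}
    pairs = []
    for a in starts:
        for b in starts:
            if b <= a:
                continue
            if abs(dQ[a] - dQ[b]) >= R:
                continue            # pruned: D(Ca, Cb) >= R is guaranteed
            pairs.append((a, b))
    return pairs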
Applying Early Abandoning
 In case the triangular inequality cannot help, we have to compute the Euclidean distance D(Ca, Cb); there we can apply the early abandoning technique.
 The idea: we can abandon the Euclidean distance computation as soon as the cumulative sum goes beyond the range R (a sketch follows):
   D(X, Y) = √( Σi=1,…,m (Xi – Yi)² )
Experiments of MK Algorithm

Table 1: Experiments on the number of distance function calls (Stock dataset)

Motif length   Brute_Force   MK algorithm   Magnitude of improvement
8              1034046       6639           155.75
16             1017072       5583           182.17
32             985056        4602           214.05
64             922560        3590           256.98
 Limitation: using the Euclidean distance directly on raw time series data gives rise to robustness problems when dealing with noisy data.
Our proposed method:
EP-C algorithm
Significant Extreme Points & Clustering (Gruber et al., 2006)
 We can compress a time series by selecting some of its minima and maxima, called important points, and dropping the other points.
Important extreme points (a one-pass sketch follows the definitions)
 Important minimum:
 am ∈ T = {a1,…,an} is an important minimum if there are i, j with i < m < j such that:
 am is the minimum among ai,…,aj, and
 ai/am ≥ R and aj/am ≥ R (R is the compression rate)
 Important maximum:
 am ∈ T = {a1,…,an} is an important maximum if there are i, j with i < m < j such that:
 am is the maximum among ai,…,aj, and
 am/ai ≥ R and am/aj ≥ R (R is the compression rate)
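A one-pass Python sketch of extracting important extreme points under these definitions. It is a simplification that assumes strictly positive values, which the ratio test requires; the published algorithm of Pratt and Fink is more refined.

def important_extrema(a, R):
    """Indices of important extrema, alternating between minima and maxima."""
    ext = []
    i_min = i_max = 0
    looking_for = None               # undecided until the first swing of size R
    for j in range(1, len(a)):
        if a[j] < a[i_min]:
            i_min = j
        if a[j] > a[i_max]:
            i_max = j
        if looking_for != 'max' and a[j] / a[i_min] >= R:
            ext.append(i_min)        # a rise by factor R confirms the minimum
            looking_for, i_max = 'max', j
        elif looking_for != 'min' and a[i_max] / a[j] >= R:
            ext.append(i_max)        # a drop by factor R confirms the maximum
            looking_for, i_min = 'min', j
    return ext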
Finding Time Series Motifs
 (i) Compute all important extreme points
 (ii) Extract candidate motifs
 (iii) Cluster the candidate motifs
 (iv) Select the motifs from the result of the clustering
K. B. Pratt and E. Fink, “Search for patterns in compressed
time series”, International Journal of Image and Graphics,
vol. 2, no. 1, pp. 89-106, 2002.
Extracting Motif Candidates
Function getMotifCandidateSequence(T)
  N = length(T)
  EP = findSignificantExtremePoints(T, R)
  maxLength = MAX_MOTIF_LENGTH
  for i = 1 to (length(EP) – 2) do
    motifCandidate = getSubsequence(T, epi, epi+2)
    if length(motifCandidate) > maxLength then
      // resample the candidate by spline interpolation or by homothety
      addMotifCandidate(resample(motifCandidate, maxLength))
    else
      addMotifCandidate(motifCandidate)
    end if
  end for
end
Homothetic transformation
Homothety is a transformation in affine space. Given a point O and a value k ≠ 0, a homothety with center O and ratio k transforms a point M to the point M' such that OM' = k·OM (as vectors).
[Figure: a homothety with center O and ratio k = ½ transforms the triangle MNP to the triangle M'N'P'.]
Homothety (cont.)
The algorithm that performs a homothety to transform a motif candidate T of length N (T = {Y1,…,YN}) into a motif candidate of length N' is as follows (a sketch follows the list):
 1. Let Y_Max = Max{Y1,…,YN}; Y_Min = Min{Y1,…,YN}
 2. Find the center I of the homothety, with coordinates X_Center = N/2, Y_Center = (Y_Max + Y_Min)/2
 3. Perform the homothety with center I and ratio k = N'/N.
Minimum Euclidean Distance
 Besides, we note that two 'similar' motif candidates will not be recognized as such when a vertical offset exists between them.
 To make the clustering step in the EP-C algorithm able to handle not only uniform scaling but also shifting along the vertical axis, we modify the Euclidean distance, using the Minimum Euclidean Distance to negate the differences caused by vertical-axis offsets.
 Given two motif candidates T' = {T'1, T'2,…,T'N'} and Q' = {Q'1, Q'2,…,Q'N'}, the Euclidean distance between them is normally given by the following formula:
   D(T', Q') = √( Σi=1,…,N' (T'i – Q'i)² )
[Figure: a) uniform scaling; b) shifting along the vertical axis.]
Minimum Euclidean Distance
 In this work we use the Minimum Euclidean Distance, which is defined by the following equation:
   D(T', Q') = minb √( Σi=1,…,N' (T'i – Q'i + b)² )    (2)
 From Eq. (2), we can derive the value of the shifting parameter b that gives the best match between the two motif candidates Q' and T':
   b = (1/N') Σi=1,…,N' (Q'i – T'i)
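In Python, this reduces to a few lines, since the optimal offset b is just the mean difference between the two candidates:

import numpy as np

def minimum_euclidean_distance(T, Q):
    """Euclidean distance after the optimal vertical shift of Eq. (2)."""
    T, Q = np.asarray(T, dtype=float), np.asarray(Q, dtype=float)
    b = np.mean(Q - T)                       # optimal shifting parameter
    return np.sqrt(np.sum((T - Q + b) ** 2))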
Clustering of Motif Candidates
function getHierarchicalClustering(MCS, u, d)
  C = getInitialClustering(MCS)
  while size(C) > u do
    [Ci, Cj] = getMostSimilarClusters(C)
    addCluster(C, mergeClusters(Ci, Cj))
    removeCluster(C, Ci)
    removeCluster(C, Cj)
  endwhile
  return C
end
Comparing the EP-C algorithm to Random Projection
 We implement three variants of EP-C, in order to compare EP-C using spline interpolation (for resampling a longer motif candidate to a shorter one) against EP-C using homothety, and EP-C using HAC clustering against EP-C using K-means clustering. We denote these variants as follows:
 HAC|SI: the EP-C algorithm using spline interpolation and HAC clustering.
 HAC|HT: the EP-C algorithm using homothety and HAC clustering.
 K-means|HT: the EP-C algorithm using homothety and K-means clustering.
The test datasets
 In this experiment, we tested the algorithms on four publicly available datasets:
 ECG (7900 data points),
 Memory (6800 data points),
 Power (35000 data points) and
 ECG (140000 data points).
 All four datasets were obtained from the UCR Time Series Data Mining Archive (Keogh and Folias, 2002).
Parameter settings
 For the RP algorithm, we use the following parameter setting: the length of each PAA segment l = 16, the alphabet size a = 5, the number of iterations in RP i = 10, and the number of errors allowed by RP d = 1. The SAX word length w depends on the dataset, for example, w = 15 for the ECG dataset and w = 26 for the Memory dataset.
 For the EP-C algorithm, we use the following parameter setting: the compression ratio for extracting significant extreme points R = 1.2, the minimum length of motif candidates l_min = 50, and the ratio r = (number of clusters)/(number of extreme points) = 0.2. When HAC clustering is used, the distance between two clusters is computed as the minimum distance, i.e. the distance between the two nearest objects, one from each cluster.
Experimental results

Dataset (length)   Algorithm     Number of motif instances   Runtime (sec)
ECG (7900)         RP            24                          9
                   HAC|SI        13                          0.5
                   HAC|HT        15                          0.13
                   K-means|HT    17                          0.2
Memory (6800)      RP            15                          8
                   HAC|SI        9                           3
                   HAC|HT        7                           1
                   K-means|HT    8                           1
Power (35000)      RP            N/A                         N/A
                   HAC|SI        21                          21
                   HAC|HT        36                          2
                   K-means|HT    44                          3
ECG (140000)       RP            N/A                         N/A
                   HAC|SI        65                          50
                   HAC|HT        85                          3
                   K-means|HT    85                          8
From the experimental results, we can see that:
 1. The three variants of the EP-C algorithm are much more effective than RP in terms of time efficiency. The number of motif instances found by RP is larger than that found by EP-C; the reason is that RP uses a sliding window to extract subsequences for motif discovery, while EP-C uses significant extreme points to extract motif candidates. However, the accuracy of EP-C is still better than that of RP.
 2. With large datasets such as Power (35000 data points) and ECG (140000 data points), RP cannot work, since it cannot tackle large datasets, while HAC|HT can find the motif in a very short time (2 or 3 seconds).
 3. EP-C with homothety (HAC|HT) is better than EP-C with spline interpolation (HAC|SI) in terms of time efficiency and accuracy.
 4. HAC clustering is more suitable than K-means for clustering motif candidates in the EP-C algorithm.
[Figure: Left) ECG dataset. Right) A motif discovered in the dataset.]
[Figure: Left) Power dataset. Right) A motif discovered in the dataset.]
Efficiency of EP-C
 We can evaluate the efficiency of the EP-C algorithm by considering the ratio of the number of times the Euclidean distance function must be called by EP-C to the number of times it must be called by the brute-force algorithm given in (Lin et al., 2002).
 Note: the number of Euclidean distance function calls is the same for all three variants of EP-C (HAC|SI, HAC|HT and K-means|HT).

Table 2: The efficiency of the EP-C algorithm on various datasets

Dataset      ECG (7900)   Memory (6800)   Power (35000)   ECG (140000)
Efficiency   0.0000638    0.0000920       0.0001304       0.0000635
Remarks on EP-C Algorithm
This motif discovery method (EP-C) is very efficient, especially for large time series; it is much more efficient than Random Projection.
For example, in an experiment on the Koski_ECG dataset from the UCR Archive (http://www.cs.ucr.edu/~eamonn/SAX/koski_ecg.dat), a time series of 144002 points, the runtime for motif detection was 3 seconds.
Conclusions
 Time series motif discovery has several practical
applications.
 Through experiments, we see that the EP-C (Extreme points and Clustering) method is much more efficient than Random Projection.
Future work
 Developing methods to detect motifs in streaming
time series.
 Exploiting motif information in time series prediction:
 Banking data
 Environmental data
REFERENCES
1. Buza, K. and Schmidt-Thieme, L.: Motif-based Classification of Time Series with Bayesian Networks and SVMs. In: A. Fink et al. (eds.) Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification, Data Analysis, Knowledge Organization. Springer-Verlag, pp. 105-114 (2010).
2. Chiu, B., Keogh, E. and Lonardi, S.: Probabilistic Discovery of Time Series Motifs. In: Proc. of the 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, D.C. (2003).
3. Gruber, C., Coduro, M. and Sick, B.: Signature Verification with Dynamic RBF Networks and Time Series Motifs. In: Proc. of the 10th Int. Workshop on Frontiers in Handwriting Recognition (2006).
4. Lin, J., Keogh, E., Patel, P. and Lonardi, S.: Finding Motifs in Time Series. In: Proc. of the 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada (2002).
5. Mueen, A., Keogh, E., Zhu, Q., Cash, S. and Westover, B.: Exact Discovery of Time Series Motifs. In: Proc. of SDM (2009).
6. Jiang, Y., Li, C. and Han, J.: Stock Temporal Prediction Based on Time Series Motifs. In: Proc. of the 8th Int. Conf. on Machine Learning and Cybernetics, Baoding, China, July 12-15 (2009).
7. Li, Q., Lopez, I.F.V. and Moon, B.: Skyline Index for Time Series Data. IEEE Trans. on Knowledge and Data Engineering, Vol. 16, No. 4 (2004).
8. Phu, L. and Anh, D.T.: Motif-based Method for Initialization of k-Means Clustering of Time Series Data. In: Proc. of the 24th Australasian Joint Conference (AI 2011), Perth, Australia, Dec. 5-8. Dianhui Wang, Mark Reynolds (eds.), LNAI 7106, Springer-Verlag, pp. 11-20 (2011).
9. Son, N.T. and Anh, D.T.: Discovering Approximate Time Series Motifs based on the MP_C Method with the Support of a Skyline Index. In: Proc. of the 4th Int. Conf. on Knowledge and Systems Engineering (KSE 2012), Aug. 17-19, Da Nang, Vietnam (to appear).
10. Son, N.T. and Anh, D.T.: Time Series Similarity Search based on Middle Points and Clipping. In: Proc. of the 3rd Conference on Data Mining and Optimization (DMO), 27-29 June, Putrajaya, Malaysia, pp. 13-19 (2011).
11. Xi, X., Keogh, E., Wei, L. and Mafra-Neto, A.: Finding Motifs in a Database of Shapes. In: Proc. of SIAM 2007, pp. 249-270 (2007).