Time Series Data


N.Y.U.S.T.
I. M.
Hierarchical Clustering of Time Series Data Streams
Presenter: Shao-Wei Cheng
Authors: Pedro Pereira Rodrigues, João Gama, and João Pedro Pedroso
TKDE 2007
Intelligent Database Systems Lab
Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Personal Comments
Hierarchical Clustering

Organized as a hierarchical tree.
Data:
        x    y
p1      2    2
p2      5    4
p3      1    1
p4      3    9
p5      6    6
p6      8   12
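
To make the hierarchical tree concrete, the six points above can be merged bottom-up. This is only a sketch: the slide does not say which linkage or distance is used, so single linkage and Euclidean distance are assumptions, and the function name `single_linkage` is illustrative.

```python
import math
from itertools import combinations

# The six 2-D points from the slide.
points = {
    "p1": (2, 2), "p2": (5, 4), "p3": (1, 1),
    "p4": (3, 9), "p5": (6, 6), "p6": (8, 12),
}

def cluster_distance(points, a, b):
    """Single linkage (assumed): distance between the closest members of a and b."""
    return min(math.dist(points[u], points[v]) for u in a for v in b)

def single_linkage(points):
    """Agglomerative clustering; returns the merges in the order they happen."""
    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Pick the two clusters that are currently closest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(points, pair[0], pair[1]),
        )
        distance = cluster_distance(points, a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, round(distance, 3)))
    return merges

merges = single_linkage(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

Reading the merges from first to last gives the tree from the leaves up: under these assumptions p1 and p3 join first, and p6 is the last point to join.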
Time Series Data

Data streams usually consist of variables producing examples
continuously over time.

The basic idea behind clustering streaming time series is to find groups
of variables that behave similarly through time.

Each variable is a time series, and each new example holds the values
observed across all time series at a particular moment.
Variables (data) as rows, examples (time) as columns:
        t1   t2   t3   t4   t5   t6   t7
p1       2    2    …    …    …    …    …
p2       5    4    …    …    …    …    …
p3       1    1    …    …    …    …    …
p4       3    9    …    …    …    …    …
p5       6    6    …    …    …    …    …
p6       8   12    …    …    …    …    …
…
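
One way to judge whether two variables "behave similarly through time" without storing the stream is to keep running sums and derive the Pearson correlation from them. The sketch below assumes that approach. The class name `StreamStats`, the variable names `a`, `b`, `c`, and the four examples are hypothetical, not data from the slides.

```python
import math
from itertools import combinations

class StreamStats:
    """Running sums over a stream, enough to get any pairwise Pearson correlation."""

    def __init__(self, names):
        self.names = list(names)
        self.n = 0
        self.s = {v: 0.0 for v in self.names}    # sum of x
        self.ss = {v: 0.0 for v in self.names}   # sum of x^2
        self.sp = {pair: 0.0 for pair in combinations(self.names, 2)}  # sum of x*y

    def update(self, example):
        """Consume one example: a dict with the current value of every variable."""
        self.n += 1
        for v in self.names:
            self.s[v] += example[v]
            self.ss[v] += example[v] ** 2
        for a, b in self.sp:
            self.sp[(a, b)] += example[a] * example[b]

    def correlation(self, a, b):
        key = (a, b) if (a, b) in self.sp else (b, a)
        cov = self.sp[key] - self.s[a] * self.s[b] / self.n
        var_a = self.ss[a] - self.s[a] ** 2 / self.n
        var_b = self.ss[b] - self.s[b] ** 2 / self.n
        if var_a <= 0 or var_b <= 0:
            return float("nan")  # correlation is undefined for a constant series
        return cov / math.sqrt(var_a * var_b)

# Hypothetical stream: one dict per time step, one value per variable.
stats = StreamStats(["a", "b", "c"])
for example in [
    {"a": 1, "b": 2, "c": 8},
    {"a": 2, "b": 4, "c": 6},
    {"a": 3, "b": 6, "c": 4},
    {"a": 4, "b": 8, "c": 2},
]:
    stats.update(example)

print(stats.correlation("a", "b"))  # a and b rise together
print(stats.correlation("a", "c"))  # c falls as a rises
```

Each example is discarded after `update`, so memory grows with the number of variable pairs rather than with the length of the stream.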
Motivation

In recent real-world applications, data flows continuously from a data
stream at high speed, producing examples over time.


Examples include sensor data, web clicks, credit card usage, multimedia
data, and stock market data.
Traditional models cannot adapt to the high-speed arrival of new
examples, so new algorithms have been developed that aim to process
data in real time.
Objectives

This paper presents and analyzes an incremental system for clustering
streaming time series.


Online Divisive-Agglomerative Clustering (ODAC)
An adaptive system to perform hierarchical clustering of variables.

Memory usage

Time consumption
Methodology

Online Divisive-Agglomerative Clustering (ODAC)
Methodology
Similarity and dissimilarity measure:
Hoeffding bound:
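
A minimal sketch of both quantities, assuming the correlation-based dissimilarity rnomc(a, b) = sqrt((1 - corr(a, b)) / 2) and the standard Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the variable, δ the allowed error probability, and n the number of examples seen. The function and parameter names are illustrative and not taken from the slides.

```python
import math

def rnomc(correlation):
    """Rooted normalized one-minus-correlation (assumed measure).

    Maps a Pearson correlation in [-1, 1] to a dissimilarity in [0, 1]:
    perfectly correlated series get 0, perfectly anti-correlated series get 1.
    """
    return math.sqrt((1.0 - correlation) / 2.0)

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon.

    With probability 1 - delta, the mean observed over n independent examples
    of a variable with the given range is within epsilon of the true mean.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# The bound shrinks as more examples arrive, so decisions become safer over time.
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(value_range=1.0, delta=0.05, n=n))
```

Because rnomc lies in [0, 1], a range of R = 1 is a natural choice when the bound is applied to dissimilarities; that choice is also an assumption here.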
Experiments
Detecting compact and well-separated clusters
Independent of the number of clusters
Hierarchy Quality
Conclusion

Experimental results indicate that the system's performance is nearly
equivalent to that of batch divisive clustering on stationary time series.

The results also show good performance in finding the correct number of
clusters, with the reference number obtained from multiple runs of k-means.

The main challenge is balancing a fast response to changes against the
time and space required to update the model.
Personal Comments

Advantage


The first proposal of a hierarchical approach to the problem.
Drawback


…
Application

Clustering of time series data streams.