Subsequences

Download Report

Transcript Subsequences

Visually Mining and Monitoring
Massive Time Series
Lin, J, Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M.
In Proceedings of the 10th ACM SIGKDD International Converence on Knowledge Discovery
and Data Mining, 2004.
Amy Karlson
V. Shiv Naga Prasad
15 February 2004
CMSC 838S
Images courtesy of Jessica Lin and Eamonn Keogh
What are Time Series?

Simply:


Observations of a variable made over time
Typical across a wide variety of domains






Medicine
Physiology
Finance
Microbiology
Meteorology
Surveillance
Motivation:
Critical Decision Making

Domains



Spacecraft Launch
Medicine
Research Directions

Mining Archives



Extract rules, patterns, regularities
Visualizing Streams
Novel visualization and interaction for:



Query by content
Motif discovery
Anomaly detection
3
Some Visual Time Series Systems

Time Searcher

Direct Manipulation
Pattern Query
Hochheiser and Shniederman

Theme Rivers

dot.com stocks
1999-2002
Theme strength
over time
Havre, Hetzler, Whitney & Nowell InfoVis 2000

Spirals

Periodic Data with
known period
Weber et. al
VizTree
Construct a subsequence tree to span the
space of subsequences of a given time
series.
 Use this to collect statistics about the
series.
 Size of the structure is independent of the
length of the series.

VizTree Approach - Overview
Place windows along the time series to
obtain subsequences.
 Quantize along time and value dimension
to obtain sequences of discrete symbols.
 Construct a subsequence tree to represent
all possible such sequences.
 Collect frequencies of traversal of the
branches of the subsequence tree.
 Use these for motif and anomaly
detection, and for comparing time series.

Subsequences
Place windows along the
time series to obtain
subsequences.
Discretization
Subsequences are patterns.
 Take windows along time series
– length of window ~ length of
subsequence.
 Discretize the range of data - one symbol
for each quantum.
 Divide window into segments ~ represent
one segment with one symbol.

Symbolic Aggregate approXimation
(SAX)
Quantization levels
Representative
symbols
Segments
One subsequence
Discrete version = acdcbdba
Subsequence Tree - example
a
a
b
c
a
b
b
c
a
c
b
c

symbols={a,b,c}
#segments
per window=2
Tree
spans the space of
subsequences.
#Branch
factor ~ # symbols
(size of alphabet)
Depth
window
~ # segments per
Branch
thickness ~ freq. of
occurrence of subsequence.
VisTree Tool
Demo
Query by Content:
Subsequence Matching
Finding known patterns
 Chunking



Breaking a time series into individual series
Methods



Time
Shape
(e.g. power usage)
(e.g. heart beats)
---------
VizTree
Search Approaches


Exact - Slow
Approximate - Fast
 Exploration
 Hypothesis Testing
---------------------
VizTree
Motif Discovery
Finding unknown patterns
 Not exact matches
 VisTree allows exploration at varying levels
of precision


E.g., cc** vs. ccac
A
0
500
20
1500
2000
B
40
60
80
100
120
140
0
20
C
(The angular speed of reel 2)
1000
A
0
Winding Dataset
B
2500
C
40
60
80
100
120
140
0
20
40
60
80
100
120
140
Anomaly Detection
Finding abnormal patterns.
 Use data already seen to identify
anomalies
 Identified by thin
branches

Comparing Series:
Diff Tree
Same parameters  same tree structure
 Compare the test branch frequencies with
respect to reference branch frequencies





Blue = underrepresented
Green = overrepresented
Red = equivalent
Thickness = magnitude
Thoughts on VizTree (Vis.)

Most of “discovery” is implicit




Manual search
Parameter setting might be an issue
Automation might help
Tree Visualization




Use of real estate?
Effective?
Intuitive?
Alternatives?
Thoughts on VizTree (HCI)

Primarily a tool to for researchers now


(Also, we might have an outdated version)
Even so, some HCI suggestions:




Indication of how tree detail relates to tree
overview
Zoom into a specific area of the time series
(rather than zoom+scroll)
Selection in subsequence detail relates to
subsequence overview
Unfortunately, least interesting patterns are
most easily accessed (branches at root)


“snap to branch” or “snap to intersection” ?
Ability to turn off highlighting (undo)
Summary:
Unique Contributions
Fundamental support for aperiodic series
 Scalable



Resource requirements do not grow linearly
with length series
Rich visual feature set



Global summaries
Diff-trees between multiple series
Local patterns and anomalies