Subsequences
Download
Report
Transcript Subsequences
Visually Mining and Monitoring
Massive Time Series
Lin, J, Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M.
In Proceedings of the 10th ACM SIGKDD International Converence on Knowledge Discovery
and Data Mining, 2004.
Amy Karlson
V. Shiv Naga Prasad
15 February 2004
CMSC 838S
Images courtesy of Jessica Lin and Eamonn Keogh
What are Time Series?
Simply:
Observations of a variable made over time
Typical across a wide variety of domains
Medicine
Physiology
Finance
Microbiology
Meteorology
Surveillance
Motivation:
Critical Decision Making
Domains
Spacecraft Launch
Medicine
Research Directions
Mining Archives
Extract rules, patterns, regularities
Visualizing Streams
Novel visualization and interaction for:
Query by content
Motif discovery
Anomaly detection
3
Some Visual Time Series Systems
Time Searcher
Direct Manipulation
Pattern Query
Hochheiser and Shniederman
Theme Rivers
dot.com stocks
1999-2002
Theme strength
over time
Havre, Hetzler, Whitney & Nowell InfoVis 2000
Spirals
Periodic Data with
known period
Weber et. al
VizTree
Construct a subsequence tree to span the
space of subsequences of a given time
series.
Use this to collect statistics about the
series.
Size of the structure is independent of the
length of the series.
VizTree Approach - Overview
Place windows along the time series to
obtain subsequences.
Quantize along time and value dimension
to obtain sequences of discrete symbols.
Construct a subsequence tree to represent
all possible such sequences.
Collect frequencies of traversal of the
branches of the subsequence tree.
Use these for motif and anomaly
detection, and for comparing time series.
Subsequences
Place windows along the
time series to obtain
subsequences.
Discretization
Subsequences are patterns.
Take windows along time series
– length of window ~ length of
subsequence.
Discretize the range of data - one symbol
for each quantum.
Divide window into segments ~ represent
one segment with one symbol.
Symbolic Aggregate approXimation
(SAX)
Quantization levels
Representative
symbols
Segments
One subsequence
Discrete version = acdcbdba
Subsequence Tree - example
a
a
b
c
a
b
b
c
a
c
b
c
symbols={a,b,c}
#segments
per window=2
Tree
spans the space of
subsequences.
#Branch
factor ~ # symbols
(size of alphabet)
Depth
window
~ # segments per
Branch
thickness ~ freq. of
occurrence of subsequence.
VisTree Tool
Demo
Query by Content:
Subsequence Matching
Finding known patterns
Chunking
Breaking a time series into individual series
Methods
Time
Shape
(e.g. power usage)
(e.g. heart beats)
---------
VizTree
Search Approaches
Exact - Slow
Approximate - Fast
Exploration
Hypothesis Testing
---------------------
VizTree
Motif Discovery
Finding unknown patterns
Not exact matches
VisTree allows exploration at varying levels
of precision
E.g., cc** vs. ccac
A
0
500
20
1500
2000
B
40
60
80
100
120
140
0
20
C
(The angular speed of reel 2)
1000
A
0
Winding Dataset
B
2500
C
40
60
80
100
120
140
0
20
40
60
80
100
120
140
Anomaly Detection
Finding abnormal patterns.
Use data already seen to identify
anomalies
Identified by thin
branches
Comparing Series:
Diff Tree
Same parameters same tree structure
Compare the test branch frequencies with
respect to reference branch frequencies
Blue = underrepresented
Green = overrepresented
Red = equivalent
Thickness = magnitude
Thoughts on VizTree (Vis.)
Most of “discovery” is implicit
Manual search
Parameter setting might be an issue
Automation might help
Tree Visualization
Use of real estate?
Effective?
Intuitive?
Alternatives?
Thoughts on VizTree (HCI)
Primarily a tool to for researchers now
(Also, we might have an outdated version)
Even so, some HCI suggestions:
Indication of how tree detail relates to tree
overview
Zoom into a specific area of the time series
(rather than zoom+scroll)
Selection in subsequence detail relates to
subsequence overview
Unfortunately, least interesting patterns are
most easily accessed (branches at root)
“snap to branch” or “snap to intersection” ?
Ability to turn off highlighting (undo)
Summary:
Unique Contributions
Fundamental support for aperiodic series
Scalable
Resource requirements do not grow linearly
with length series
Rich visual feature set
Global summaries
Diff-trees between multiple series
Local patterns and anomalies