Applications and Parameter Analysis of Temporal Chaos Game


Spatiotemporal Stream Mining Applied to Seismic+ Data
Margaret H. Dunham
CSE Department
Southern Methodist University
Dallas, Texas 75275 USA
[email protected]
9/15/2008
CTBTO Data Mining/Data Fusion
Workshop
Outline
• CTBTO Data
• CTBTO Modeling Requirements
• EMM
CTBTO Data
As a data miner, I must first understand your DATA:
• Diverse: Seismic, Hydroacoustic, Infrasound, Radionuclide
• Spatial (source and sensor)
• Temporal
• STREAM Data
From Sensors to Streams
• Stream data: data captured and sent by a set of sensors
• Real-time sequence of encoded signals which contain desired information
• Continuous, ordered (implicitly by arrival time or explicitly by timestamp or by geographic coordinates) sequence of items
• Stream data is infinite: the data keeps coming
11/26/07 – IRADSN’07
CTBTO & Data Mining
• Data mining techniques must be defined based on your data and applications.
• Predefined fixed models and prediction/classification techniques can't be used as-is.
• The large body of algorithms already created must not be redone.
CTBTO + DM Requirements
• Model:
  • Handle different data types (seismic, hydroacoustic, etc.)
  • Spatial + Temporal (Spatiotemporal)
  • Hierarchical
  • Scalable
  • Online
  • Dynamic
• Anomaly Detection:
  • Not just specific wave type or data values
  • Relationships between arrival of waves/data
  • Combined values of data from all sensors
EMM (Extensible Markov Model)
• Time-varying, discrete, first-order Markov model
• Nodes are clusters of real-world states.
• Overlap of learning and validation phases
• Learning:
  • Transition probabilities between nodes
  • Node labels (centroid or medoid of cluster)
  • Nodes are added and removed as data arrives
• Applications: prediction, anomaly detection
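The learning behavior listed on this slide can be illustrated with a minimal Python sketch. It is not the published EMM algorithm: the class name SimpleEMM, the Euclidean distance, the threshold parameter, and the running-centroid update are all assumptions made for illustration, and node removal is omitted.

```python
import math
from collections import defaultdict


class SimpleEMM:
    """Illustrative EMM-like model: nodes are cluster centroids,
    arcs carry transition counts. All names here are hypothetical."""

    def __init__(self, threshold):
        self.threshold = threshold       # assumed: max distance to join a node
        self.centroids = []              # node labels (cluster centroids)
        self.sizes = []                  # vectors absorbed by each node
        self.counts = defaultdict(int)   # (from_node, to_node) -> count
        self.current = None              # node of the previous input vector

    def _nearest(self, x):
        """Return (index, distance) of the closest node, or (None, inf)."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)          # assumed similarity measure
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def learn(self, x):
        """Absorb one input vector and update nodes and transition counts."""
        node, d = self._nearest(x)
        if node is None or d > self.threshold:
            # No node is close enough: add a new node.
            self.centroids.append(list(x))
            self.sizes.append(1)
            node = len(self.centroids) - 1
        else:
            # Merge into the existing node with a running-mean centroid.
            n = self.sizes[node] + 1
            self.centroids[node] = [
                c + (xi - c) / n for c, xi in zip(self.centroids[node], x)
            ]
            self.sizes[node] = n
        if self.current is not None:
            self.counts[(self.current, node)] += 1
        self.current = node
        return node

    def transition_prob(self, i, j):
        """Estimated probability of moving from node i to node j."""
        total = sum(c for (a, _), c in self.counts.items() if a == i)
        return self.counts.get((i, j), 0) / total if total else 0.0


emm = SimpleEMM(threshold=2.0)
for vector in [(0, 0), (0, 1), (10, 10), (0, 0), (10, 10)]:
    emm.learn(vector)
print(len(emm.centroids), emm.transition_prob(0, 1))
```

Because nodes and counts are updated one vector at a time, the model can be queried for predictions while it is still learning, which is one way to read the "overlap of learning and validation phases" point.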
Research Objectives
• Apply proven spatiotemporal modeling technique to seismic data
• Construct EMM to model sensor data
  • Local EMM at location or area
  • Hierarchical EMM to summarize lower-level models
  • Represent all data in one vector of values
  • EMM learns normal behavior
• Develop new similarity metrics to include all sensor data types (Fusion)
• Apply anomaly detection algorithms
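EMM Creation/Learning
[Figure: an EMM built step by step from the input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, and <18,10,3,3,1,1,0>. Nodes N1, N2, and N3 are connected by arcs labeled with transition probabilities such as 1/1, 1/2, 1/3, and 2/3.]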
Input Data Representation
• Vector of sensor values (numeric) at precise time points or aggregated over time intervals.
• Values need not come from the same sensor types.
• Similarity/distance between vectors is used to determine creation of new nodes in the EMM.
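As a rough illustration of the "aggregated over time intervals" option, the sketch below buckets timestamped readings into fixed-width windows and emits one vector per window. The function name aggregate, the (timestamp, sensor, value) reading layout, and the use of the mean as the aggregate are assumptions made for this example, not details from the talk.

```python
from collections import defaultdict


def aggregate(readings, sensors, width):
    """Group (timestamp, sensor, value) readings into windows of `width`
    time units and return {window_index: [mean value per sensor]}.
    A sensor with no reading in a window is reported as None."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for t, sensor, value in readings:
        window = int(t // width)
        sums[window][sensor] += value
        counts[window][sensor] += 1
    vectors = {}
    for window in sorted(sums):
        vectors[window] = [
            sums[window][s] / counts[window][s] if counts[window][s] else None
            for s in sensors
        ]
    return vectors


# Hypothetical readings from two different sensor types.
readings = [
    (0.0, "seismic", 1.0),
    (0.5, "seismic", 3.0),
    (0.2, "infrasound", 10.0),
    (1.1, "seismic", 5.0),
]
vectors = aggregate(readings, ["seismic", "infrasound"], width=1.0)
print(vectors)
```

Each resulting vector mixes values from different sensor types, so any distance applied to it would need to account for differing units and for missing entries.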
Anomaly Detection with EMM
Objective: Detect rare (unusual, surprising) events
Advantages:
• Dynamically learns what is normal
• Based on this learning, can predict what is not normal
• Do not have to indicate normal behavior a priori
Applications:
• Network intrusion
• Data: IP traffic data, automobile traffic data
Seismic:
• Unusual seismic events
• Automatically filter out normal events
[Figure: Minnesota DOT traffic data, weekdays vs. weekend; an unusual weekend traffic pattern was detected.]
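One simple way to turn "learn what is normal, then flag what is not" into code is to treat a transition as anomalous when its learned probability is low. The sketch below assumes transition counts have already been collected; the function names, the min_prob cutoff, and the rule itself are illustrative assumptions rather than the detection rule used in the EMM work.

```python
def transition_probability(counts, src, dst):
    """Estimate P(dst | src) from a {(src, dst): count} dictionary."""
    total = sum(c for (a, _), c in counts.items() if a == src)
    return counts.get((src, dst), 0) / total if total else 0.0


def is_anomalous(counts, src, dst, min_prob=0.05):
    """Flag a transition whose learned probability falls below min_prob.
    Transitions never seen during learning have probability 0 and are flagged."""
    return transition_probability(counts, src, dst) < min_prob


# Hypothetical counts learned from normal behavior.
counts = {(0, 1): 97, (0, 2): 3}
print(is_anomalous(counts, 0, 1))  # common transition
print(is_anomalous(counts, 0, 2))  # rare transition
print(is_anomalous(counts, 0, 3))  # never observed
```

Common transitions pass silently, which corresponds to automatically filtering out normal events.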
11/3/04
EMM with Seismic Data
• Input: wave arrivals (all or one per sensor)
• Identify states and changes of states in seismic data
• The waveform would first have to be converted into a series of vectors representing the activity at various points in time.
• Initial testing with RDG data
• Use amplitude, period, and wave type
New Distance Measure
• Data = <amplitude, period, wave type>
• Different wave type = 100% difference
• For events of same wave type:
  • 50% weight given to the difference in amplitude
  • 50% weight given to the difference in period
• If the distance is greater than the threshold, a state change is required.

$$\Delta\text{amplitude} = \frac{|\text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}}|}{\text{amplitude}_{\text{average}}}$$

$$\Delta\text{period} = \frac{|\text{period}_{\text{new}} - \text{period}_{\text{average}}|}{\text{period}_{\text{average}}}$$
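The distance rules on this slide can be written as a short function. This is a sketch under stated assumptions: the function name seismic_distance, the tuple layout, and the example threshold value are hypothetical, and the two relative differences are combined as a plain 50/50 weighted sum, which is one reading of the slide.

```python
def seismic_distance(new, node_average):
    """Distance between a new event and a node's average values.
    Both arguments are (amplitude, period, wave_type) tuples.
    Assumes the node's average amplitude and period are nonzero."""
    amp_new, per_new, type_new = new
    amp_avg, per_avg, type_avg = node_average
    if type_new != type_avg:
        return 1.0  # different wave type = 100% difference
    d_amp = abs(amp_new - amp_avg) / amp_avg  # relative amplitude difference
    d_per = abs(per_new - per_avg) / per_avg  # relative period difference
    return 0.5 * d_amp + 0.5 * d_per          # 50% weight each


THRESHOLD = 0.3  # hypothetical value; the talk does not state one

node = (1.0, 1.0, "P")
event = (2.0, 1.0, "P")
distance = seismic_distance(event, node)
print(distance, distance > THRESHOLD)
```

Because the differences are relative to the node average, two events of the same wave type can score above 1.0 when the new amplitude or period is more than double the average.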

EMM with Seismic Data
States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
Preliminary Testing
• RDG data, February 1, 1981: 6 earthquakes
• Find transition times close to known earthquakes
• 9 total nodes
• 652 total transitions
• Found all quakes
EMM Nodes
Node #   Average amplitude   Average period   Phase code
1        1.649m              0.119 sec        P (primary wave)
2        8.353m              0.803 sec        P (primary wave)
3        23.237m             0.898 sec        P (primary wave)
4        87.324m             0.997 sec        P (primary wave)
5        253.333m            1.282 sec        P (primary wave)
6        270.524m            0.96 sec         P (primary wave)
7        7.719m              20.4 sec         P (primary wave)
8        723.088m            1.962 sec        P (primary wave)
9        1938.772m           1.2 sec          P (primary wave)
Hierarchical EMM
[Figure: three-level hierarchy with one Summary EMM at the top, two Regional EMMs below it, and five Local EMMs at the lowest level.]
Now What?
• Interest DM community
• Data needed
• Noise may not be bad
• KDD Cup
References
• Zhigang Li and Margaret H. Dunham, “STIFF: A Forecasting Framework for Spatio-Temporal Data,” Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9.
• Zhigang Li, Liangang Liu, and Margaret H. Dunham, “Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp 519-531.
• Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
• Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
• Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
• Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp 258-265.
• Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport, Louisiana.