Spatiotemporal Stream Mining Applied to Seismic+ Data
Margaret H. Dunham
CSE Department
Southern Methodist University
Dallas, Texas 75275 USA
9/15/2008
CTBTO Data Mining/Data Fusion Workshop
Outline
CTBTO Data
CTBTO Modeling Requirements
EMM
CTBTO Data
As a data miner, I must first understand your DATA:
• Diverse – Seismic, Hydroacoustic, Infrasound, Radionuclide
• Spatial (source and sensor)
• Temporal
• STREAM Data
From Sensors to Streams
Stream Data: data captured and sent by a set of sensors
Real-time sequence of encoded signals that contain the desired information.
Continuous, ordered (implicitly by arrival time, or explicitly by timestamp or by geographic coordinates) sequence of items
Stream data is infinite: the data keeps coming.
11/26/07 – IRADSN’07
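Because the stream never ends, anything built on it has to work in a single pass with bounded memory. A minimal Python sketch of that idea, assuming a hypothetical generator that emits one timestamped vector of sensor values per time step (simulated here with random values), keeps only a running per-sensor mean:

```python
import itertools
import random
from typing import Iterator, List, Tuple

# (timestamp, vector of sensor values); this layout is an assumption
Reading = Tuple[float, List[float]]


def sensor_stream(n_sensors: int, seed: int = 0) -> Iterator[Reading]:
    """Hypothetical unbounded sensor stream, simulated with random values."""
    rng = random.Random(seed)
    for t in itertools.count():  # never terminates on its own
        yield float(t), [rng.gauss(0.0, 1.0) for _ in range(n_sensors)]


def running_means(stream: Iterator[Reading]) -> Iterator[Reading]:
    """Single pass, constant memory: update each sensor's mean as items arrive."""
    means: List[float] = []
    for count, (ts, values) in enumerate(stream, start=1):
        if not means:
            means = [0.0] * len(values)
        for i, v in enumerate(values):
            means[i] += (v - means[i]) / count  # incremental mean update
        yield ts, list(means)
```

To inspect a finite prefix, drive the consumer with something like itertools.islice(running_means(sensor_stream(3)), 1000); nothing in running_means depends on the stream ever ending.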
CTBTO & Data Mining
Data Mining techniques must be defined based on your data and applications.
Can’t use predefined fixed models and prediction/classification techniques.
Must not reimplement the large body of algorithms that already exists.
CTBTO + DM Requirements
• Model:
Handle different data types (seismic, hydroacoustic, etc.)
Spatial + Temporal (Spatiotemporal)
Hierarchical
Scalable
Online
Dynamic
• Anomaly Detection:
Not just specific wave types or data values
Relationships between arrivals of waves/data
Combined values of data from all sensors
EMM (Extensible Markov Model)
Time-Varying Discrete First-Order Markov Model
Nodes are clusters of real-world states.
Overlap of learning and validation phases
Learning:
• Transition probabilities between nodes
• Node labels (centroid or medoid of cluster)
• Nodes are added and removed as data arrives
Applications: prediction, anomaly detection
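The learning steps above can be sketched in Python as follows. This is an illustrative toy, not the published EMM implementation: it assumes Euclidean distance, a fixed threshold for creating a new node, node labels kept as running centroids, and transition probabilities computed as the number of i -> j transitions divided by all transitions out of node i.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


class SimpleEMM:
    """Toy Extensible Markov Model (assumed Euclidean distance + fixed threshold)."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.centroids: Dict[int, List[float]] = {}  # node id -> label (centroid)
        self.sizes: Dict[int, int] = {}              # node id -> vectors absorbed
        self.counts: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
        self.current: Optional[int] = None           # node of the previous input
        self._next_id = 1

    def _nearest(self, x: Sequence[float]):
        best, best_d = None, math.inf
        for node, c in self.centroids.items():
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = node, d
        return best, best_d

    def learn(self, x: Sequence[float]) -> int:
        """Cluster x into a node (adding one if none is close) and count the transition."""
        node, d = self._nearest(x)
        if node is None or d > self.threshold:
            node = self._next_id                     # new node as data arrives
            self._next_id += 1
            self.centroids[node] = list(x)
            self.sizes[node] = 1
        else:
            n = self.sizes[node] + 1                 # update the centroid label
            c = self.centroids[node]
            self.centroids[node] = [ci + (xi - ci) / n for ci, xi in zip(c, x)]
            self.sizes[node] = n
        if self.current is not None:
            self.counts[self.current][node] += 1
        self.current = node
        return node

    def probability(self, i: int, j: int) -> float:
        """Estimated transition probability from node i to node j."""
        out = self.counts.get(i, {})
        total = sum(out.values())
        return out.get(j, 0) / total if total else 0.0

    def remove_node(self, node: int) -> None:
        """Delete a node and every transition into or out of it."""
        self.centroids.pop(node, None)
        self.sizes.pop(node, None)
        self.counts.pop(node, None)
        for out in self.counts.values():
            out.pop(node, None)
        if self.current == node:
            self.current = None
```

For example, learning the one-dimensional sequence [0.0], [0.2], [5.0], [0.1], [5.1] with threshold=1.0 yields two nodes, and the estimated probability of moving from node 1 to node 2 is 2/3.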
Research Objectives
Apply a proven spatiotemporal modeling technique to seismic data
Construct EMM to model sensor data
• Local EMM at location or area
• Hierarchical EMM to summarize lower level models
• Represent all data in one vector of values
• EMM learns normal behavior
Develop new similarity metrics to include all sensor data types (Fusion)
Apply anomaly detection algorithms
EMM Creation/Learning
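[Figure: EMM creation/learning example. Input vectors such as <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, and <18,10,3,3,1,1,0> are clustered into nodes N1, N2, and N3; edges are labeled with transition probabilities such as 1/3, 1/2, 2/3, and 1/1.]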
Input Data Representation
Vector of numeric sensor values at precise time points, or aggregated over time intervals.
The values need not come from the same sensor type.
The similarity/distance between vectors is used to decide when a new node is created in the EMM.
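One way to build such vectors from heterogeneous sensors is to bucket readings into fixed time intervals and average each sensor's values per interval. The sketch below assumes readings arrive as (timestamp, sensor id, value) tuples, that the interval width is chosen by the user, and that a sensor silent in an interval contributes a placeholder value; all of these are illustrative choices, not part of the original method.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (timestamp in seconds, sensor id, numeric value); sensor ids are hypothetical
Reading = Tuple[float, str, float]


def interval_vectors(readings: Iterable[Reading], sensors: List[str],
                     width: float, missing: float = 0.0) -> Dict[int, List[float]]:
    """Average each sensor's readings within fixed-width time intervals.

    Returns {interval index: vector} with one component per sensor, in the
    order of `sensors`; a sensor with no reading in an interval gets `missing`.
    """
    buckets: Dict[int, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for ts, sensor, value in readings:
        buckets[int(ts // width)][sensor].append(value)
    return {
        k: [mean(b[s]) if s in b else missing for s in sensors]
        for k, b in sorted(buckets.items())
    }
```

Mixing, say, a seismic channel and an infrasound channel in one vector works the same way; the distance measure applied to the vectors is what has to account for their different units.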
Anomaly Detection with EMM
Objective: Detect rare (unusual, surprising) events
Advantages:
• Dynamically learns what is normal
• Based on this learning, can predict what is not normal
• Normal behavior does not have to be specified a priori
Applications:
• Network Intrusion
• Data: IP traffic data, Automobile traffic data
Seismic:
• Unusual Seismic Events
• Automatically filter out normal events
[Figure: Minnesota DOT Traffic Data, Weekdays vs. Weekend; the EMM detected an unusual weekend traffic pattern.]
11/3/04
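A minimal sketch of how a learned transition model can flag unusual behavior without a predefined notion of normal. It assumes inputs have already been mapped to EMM nodes (state labels) and uses a hypothetical rule: a step is anomalous if it reaches a never-seen state, or if the learned probability of the observed transition falls below min_prob. The warm-up length and threshold are illustrative parameters.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional


class TransitionAnomalyDetector:
    """Flag rare transitions between already-clustered states (assumed rule)."""

    def __init__(self, min_prob: float = 0.05, warmup: int = 10):
        self.min_prob = min_prob
        self.warmup = warmup  # steps to observe before flagging anything
        self.counts: Dict[Hashable, Dict[Hashable, int]] = defaultdict(lambda: defaultdict(int))
        self.seen = set()
        self.prev: Optional[Hashable] = None
        self.steps = 0

    def observe(self, state: Hashable) -> bool:
        """Return True if this step looks anomalous, then learn from it."""
        anomalous = False
        if self.steps >= self.warmup:
            if state not in self.seen:
                anomalous = True  # a state never seen before
            elif self.prev is not None:
                out = self.counts.get(self.prev, {})
                total = sum(out.values())
                p = out.get(state, 0) / total if total else 0.0
                anomalous = p < self.min_prob  # a rare transition
        # Keep learning regardless of the verdict, so "normal" can drift.
        if self.prev is not None:
            self.counts[self.prev][state] += 1
        self.seen.add(state)
        self.prev = state
        self.steps += 1
        return anomalous
```

Trained on an alternating A, B pattern, the detector accepts B -> A but flags A -> A, a transition it has never observed, and it also flags any brand-new state.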
EMM with Seismic Data
Input – Wave arrivals (all, or one per sensor)
Identify states and changes of state in seismic data
The waveform would first have to be converted into a series of vectors representing the activity at various points in time.
Initial testing with RDG data
Use amplitude, period, and wave type
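The slides do not specify how a waveform would be turned into vectors. As one hypothetical illustration, the sketch below cuts a sampled waveform into fixed-length windows and summarizes each window by its peak absolute amplitude and an estimated period (twice the mean spacing between zero crossings). Window length, sampling rate, and both features are assumptions, and wave type is not estimated here.

```python
from typing import List, Sequence, Tuple


def waveform_to_vectors(samples: Sequence[float], rate_hz: float,
                        window_s: float) -> List[Tuple[float, float, float]]:
    """Return (window start time, peak amplitude, estimated period) per window.

    Hypothetical features: peak absolute amplitude, and a period taken as twice
    the mean spacing between zero crossings (0.0 if fewer than two crossings).
    A trailing partial window is dropped.
    """
    n = max(1, int(round(window_s * rate_hz)))  # samples per window
    vectors = []
    for start in range(0, len(samples) - n + 1, n):
        w = samples[start:start + n]
        peak = max(abs(v) for v in w)
        # Index of the first sample after each sign change.
        crossings = [i for i in range(1, n) if (w[i - 1] < 0) != (w[i] < 0)]
        if len(crossings) >= 2:
            mean_gap = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
            period = 2.0 * mean_gap / rate_hz  # two crossings per cycle
        else:
            period = 0.0
        vectors.append((start / rate_hz, peak, period))
    return vectors
```

A 2 Hz sine of amplitude 3 sampled at 100 Hz, cut into 1 s windows, comes out as vectors with a peak close to 3 and a period of 0.5 s.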
New Distance Measure
Data = <amplitude, period, wave type>
Different wave type = 100% difference
For events of the same wave type:
• 50% weight given to the difference in amplitude.
• 50% weight given to the difference in period.
If the distance is greater than the threshold, a state change is required.
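$$\text{amplitude} = \frac{\lvert \text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}} \rvert}{\text{amplitude}_{\text{average}}}$$
$$\text{period} = \frac{\lvert \text{period}_{\text{new}} - \text{period}_{\text{average}} \rvert}{\text{period}_{\text{average}}}$$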
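Combining the two terms with the stated 50% weights gives distance = 0.5 · amplitude + 0.5 · period for arrivals of the same wave type, and 1 (100%) otherwise. A direct Python transcription, assuming node averages are nonzero and leaving the threshold as a free parameter because the slides do not give its value:

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    amplitude: float
    period: float    # seconds
    wave_type: str   # phase code, e.g. "P"


def distance(new: Arrival, avg: Arrival) -> float:
    """Distance between a new arrival and a node's average arrival."""
    if new.wave_type != avg.wave_type:
        return 1.0  # different wave type = 100% difference
    d_amp = abs(new.amplitude - avg.amplitude) / avg.amplitude  # relative amplitude change
    d_per = abs(new.period - avg.period) / avg.period          # relative period change
    return 0.5 * d_amp + 0.5 * d_per


def needs_state_change(new: Arrival, avg: Arrival, threshold: float) -> bool:
    """A state change is required when the distance exceeds the threshold."""
    return distance(new, avg) > threshold
```

Because each relative difference can exceed 1, two arrivals of the same wave type can end up farther apart than two of different types; the slides do not say whether same-type distances are capped.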
EMM with Seismic Data
[Figure: EMM built from seismic data.] States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
Preliminary Testing
RDG data, February 1, 1981: 6 earthquakes
Find transition times close to known earthquakes
9 total nodes
652 total transitions
Found all quakes
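The check "find transition times close to known earthquakes" amounts to a windowed match. A sketch, assuming sorted transition timestamps and a tolerance window that the slides do not specify:

```python
from bisect import bisect_left
from typing import List, Sequence


def quakes_detected(transition_times: Sequence[float], quake_times: Sequence[float],
                    tolerance: float) -> List[bool]:
    """For each known quake time, report whether some EMM transition lies within
    `tolerance` (same time units) of it. `transition_times` must be sorted."""
    found = []
    for q in quake_times:
        # First transition at or after the start of the window around q.
        i = bisect_left(transition_times, q - tolerance)
        found.append(i < len(transition_times) and transition_times[i] <= q + tolerance)
    return found
```

With transitions at 10, 50, 100.5, and 300 seconds and a tolerance of 1 second, a quake at 100 s is matched and a quake at 200 s is not.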
EMM Nodes
Node #   Average amplitude   Average period   Phase code
1        1.649m              0.119 sec        P (primary wave)
2        8.353m              0.803 sec        P (primary wave)
3        23.237m             0.898 sec        P (primary wave)
4        87.324m             0.997 sec        P (primary wave)
5        253.333m            1.282 sec        P (primary wave)
6        270.524m            0.96 sec         P (primary wave)
7        7.719m              20.4 sec         P (primary wave)
8        723.088m            1.962 sec        P (primary wave)
9        1938.772m           1.2 sec          P (primary wave)
Hierarchical EMM
[Figure: Hierarchical EMM. A Summary EMM summarizes Regional EMMs, which in turn summarize Local EMMs.]
Now What?
Interest the DM COMMUNITY
DATA NEEDED
NOISE MAY NOT BE BAD
KDD CUP