Mining Weather Data for
Decision Support
Roy George
Army High Performance Computing Research Center
Clark Atlanta University
Atlanta, GA 30314
Research
Clustering Algorithms for Data Mining
Spatio-Temporal Domain
Parallelization of Algorithms
Algorithms for Feature Extraction and
Knowledge Discovery
Challenges of Geographical Data
Complexities associated with data volume
Domain complexities
Systems are interconnected
Data gathering and sampling
Interesting signals hidden by stronger patterns
Complexities caused by local variation
Terabyte databases
Interpretation of aggregated data
Formalizing the domain
Background: Issues with Hard
Clustering
Issue: Force data with imprecision and/or
uncertainty into discrete classes
Result: Missing important outliers,
boundary patterns
Approach: Use of Approximate Clustering
Technique
Background: K-Means
Clustering
Partition the data into K homogeneous clusters
Algorithm
Select K time series as initial centroids
Assign all time series to the most similar centroid
Re-compute the centroids
Repeat until the centroids no longer change
Variations based on different measures of
similarity
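The four steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the Euclidean distance, the random choice of initial centroids, and the names `k_means`, `euclidean`, and `mean_series` are all assumptions. The `distance` parameter is where a different similarity measure would be swapped in.

```python
import math
import random


def euclidean(a, b):
    """Distance between two equal-length time series (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_series(group):
    """Point-wise mean of a non-empty list of equal-length time series."""
    return [sum(values) / len(group) for values in zip(*group)]


def k_means(series, k, distance=euclidean, max_iter=100, seed=0):
    """Cluster time series into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Select K time series as the initial centroids.
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every time series to its most similar centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(s, centroids[j])) for s in series
        ]
        # Stop once the assignments (and hence the centroids) no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-compute each centroid as the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_series(members)
    return centroids, labels
```

Calling `k_means(series, 2)` on a list of equal-length lists returns the centroids and one cluster label per series.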
Unsupervised Fuzzy K-Means
(UKFM) Clustering
Choose the initial number of clusters
Develop a clustering using Fuzzy K-Means
Merge the cluster pair that has the maximum
correlation
Compute the validity measure
Repeat until the termination condition is reached
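The loop above might look like the sketch below. The slides do not say which correlation, validity measure, or termination condition is used, so several choices here are assumptions: Pearson correlation between centroids, the partition coefficient as the validity measure, averaging the two centroids on a merge, and stopping at a hypothetical `min_clusters`. All function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def memberships(data, centroids, m):
    """U[i][j]: degree to which point i belongs to cluster j (rows sum to 1)."""
    U = []
    for x in data:
        w = [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centroids]
        total = sum(w)
        U.append([wj / total for wj in w])
    return U


def fuzzy_k_means(data, centroids, m=2.0, iters=50):
    """Standard fuzzy k-means updates starting from the given centroids."""
    n, dims = len(data), len(data[0])
    for _ in range(iters):
        U = memberships(data, centroids, m)
        centroids = [
            [sum(U[i][j] ** m * data[i][t] for i in range(n))
             / sum(U[i][j] ** m for i in range(n)) for t in range(dims)]
            for j in range(len(centroids))
        ]
    return centroids, memberships(data, centroids, m)


def partition_coefficient(U):
    """Assumed validity measure: mean over points of the squared memberships."""
    return sum(u * u for row in U for u in row) / len(U)


def ukfm(data, initial_centroids, min_clusters=2):
    """Return one (cluster_count, validity, centroids) tuple per merge stage."""
    centroids = [list(c) for c in initial_centroids]
    history = []
    while len(centroids) > min_clusters:
        # Develop a clustering using fuzzy k-means.
        centroids, U = fuzzy_k_means(data, centroids)
        # Merge the cluster pair with the maximum correlation.
        k = len(centroids)
        a, b = max(
            ((p, q) for p in range(k) for q in range(p + 1, k)),
            key=lambda pq: pearson(centroids[pq[0]], centroids[pq[1]]),
        )
        merged = [(x + y) / 2 for x, y in zip(centroids[a], centroids[b])]
        centroids = [c for j, c in enumerate(centroids) if j not in (a, b)] + [merged]
        U = [[u for j, u in enumerate(row) if j not in (a, b)] + [row[a] + row[b]]
             for row in U]
        # Compute the validity measure for the merged clustering.
        history.append((len(centroids), partition_coefficient(U), centroids))
    return history
```

Under these assumptions, the stage with the highest validity, `max(history, key=lambda h: h[1])`, would play the role of the "optimal" clustering, while the last entry is the final one.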
UKFM Results
Weather Data Set
Initial: 11 Clusters
Optimal: 8 Clusters
Final: 4 Clusters
Global Earth Science Data
Collaborative Effort with V. Kumar (UMinn)
Test bed for UKFM (comparison with existing
techniques)
Data Set
Ocean Climate Indices
Global Sea Pressure (1989 – 1993)
Capture Teleconnections
Result
UKFM can capture even weaker OCIs using
coarse clusters
Global Climate Data
(Sea Level Pressure)
Intermediate: 60 Clusters
Global Climate Data
(Sea Level Pressure)
Final: 26 Clusters
Relation with SOI
Integrating Multiple Datasets in
UKFM Clustering
Motivation: Data-based approach to
determining “interesting” clusters
Validate using multiple datasets
Rule: Retain clusters that have supporting data
Applicable in Data Rich Environment
UKFM Clustering with Multi-Dataset Validation
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Validate the clusters against the other datasets D_i, i = 1, ..., n
• Merge if the clusters are uncorrelated;
else consider the next candidate pair to merge
• Repeat until the termination condition is reached
UKFM Multi-Dataset Results
Panels: Height, Windspeed, Pressure, Temperature
Multi-threading Parallel Algorithm
For each clustering stage
For each iteration
Slaves: Calculate M for each cluster
Master: Normalize M
Slaves: Calculate C for each cluster
Master: Normalize C
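One iteration of this master/worker split could be sketched as below. The slides do not define M and C, so this assumes M is the fuzzy membership matrix and C the set of cluster centroids, with the standard fuzzy k-means update formulas. The function names are hypothetical. In CPython, threads will not reproduce the reported speed-up because of the global interpreter lock, so the sketch shows only how the work is divided.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def raw_membership(data, centroid, m):
    """Worker task: unnormalized memberships of all points in one cluster."""
    power = -2.0 / (m - 1.0)
    return [max(math.dist(x, centroid), 1e-12) ** power for x in data]


def raw_centroid(data, column, m):
    """Worker task: weighted coordinate sums and total weight for one cluster."""
    weights = [u ** m for u in column]
    dims = len(data[0])
    sums = [sum(w * x[t] for w, x in zip(weights, data)) for t in range(dims)]
    return sums, sum(weights)


def parallel_fuzzy_iteration(data, centroids, pool, m=2.0):
    """One iteration; returns (M, C) with M indexed as M[cluster][point]."""
    # Worker threads: calculate M for each cluster.
    raw = list(pool.map(lambda c: raw_membership(data, c, m), centroids))
    # Master: normalize M so each point's memberships sum to 1.
    totals = [sum(col[i] for col in raw) for i in range(len(data))]
    M = [[col[i] / totals[i] for i in range(len(data))] for col in raw]
    # Worker threads: calculate C for each cluster.
    parts = list(pool.map(lambda col: raw_centroid(data, col, m), M))
    # Master: normalize C by each cluster's total weight.
    C = [[s / total for s in sums] for sums, total in parts]
    return M, C
```

A caller would create the pool once, for example `with ThreadPoolExecutor(max_workers=4) as pool:`, and invoke `parallel_fuzzy_iteration` repeatedly inside each clustering stage.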
Multi-threading Result
Implemented on Sun Fire workstation with
four 900-MHz UltraSPARC® III processors
Near Linear Speed Up Obtained
Relevance to the Army
Directly supports the FBKOF STO (B.
Broome)
Development of the Weather Information and
Tactical Support (WITS) System
Weather Information and
Tactical Support (WITS)
Objective: Extract patterns from weather data
and fuse them with external databases
(logistics, terrain, forces, etc.) for
higher-level planning
Approach
Development of an OLAP Weather Repository
GA Weather (1981-2002)
Sources: Nat. Weather Svc, GA Env. Network
Development of WITS Modules
Ad-hoc Querying
Real time Analysis and Planning
Effects on Army Systems
Integration with IWEDA
Abstract Data Representation (diagram labels): YEAR, MONTH, DAY;
TEMPERATURE, PRECIPITATION, WIND SPEED, etc.
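An OLAP-style roll-up over such a repository might be sketched as below, treating YEAR, MONTH, and DAY as dimensions and the weather variables as measures. That reading of the diagram is an assumption, as are the record layout and the `roll_up` helper; the values are made-up toy numbers, not real observations.

```python
from collections import defaultdict

# Toy records, invented for illustration only:
# (year, month, day, temperature, precipitation, wind_speed)
RECORDS = [
    (2001, 1, 1, 40.0, 0.10, 5.0),
    (2001, 1, 2, 44.0, 0.00, 7.0),
    (2001, 2, 1, 50.0, 0.30, 6.0),
    (2002, 1, 1, 38.0, 0.20, 9.0),
]

DIMENSIONS = ("year", "month", "day")
MEASURES = ("temperature", "precipitation", "wind_speed")


def roll_up(records, by, measure):
    """Average one measure, grouped by a subset of the time dimensions."""
    dim_idx = [DIMENSIONS.index(d) for d in by]
    measure_idx = len(DIMENSIONS) + MEASURES.index(measure)
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[i] for i in dim_idx)].append(rec[measure_idx])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

For instance, `roll_up(RECORDS, ("year", "month"), "temperature")` aggregates daily values to monthly averages, and passing `("year",)` rolls up one level further.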
WITS System Design
Diagram components: Data Acquisition Agents; Data Cleaning &
Transformation; Data Warehouse; Data Mining Modules; Query Modules;
Knowledge Bases (IWEDA); TAPS Module; IQ Module; Real Time Module;
User Interface
WITS/IQ
WITS/IQ
WITS/IWEDA
WITS/Analysis
WITS/Analysis
Work in Progress
Characterization of Analysis Queries
Incorporation of Data Mining Algorithms into
WITS
Enhancement of WITS/TAPS
Implementation of WITS/Real
Hybrid Genetic Fuzzy Systems
for Feature Extraction and Knowledge
Discovery
Project Goals
Design and implement hybrid genetic fuzzy
system for knowledge discovery.
Develop API/Tools.
Apply tools to Army related problems.
Contribution
Hybrid system based on the Simple Genetic
Algorithm (SGA). Enhanced the SGA by adding
three levels of knowledge discovery.
Level 1: Discovers up to k possible rules for a given set of
inputs and outputs. It then attempts to minimize the
number of rules and tune the knowledge base.
Level 2: Takes the set of rules from Level 1 and further
minimizes the rules. In addition, it also tunes the
knowledge base.
Level 3: Makes one last attempt to further tune the
architecture of the knowledge base.
Rule Discovery
Search for k possible rules from the set of p possible rules. k
is an input parameter of the GA application.
Discover the smallest value of k, thereby reducing the
number of rules needed.
Example Rules:
If INPUT_1 is low AND INPUT_2 is medium THEN
OUTPUT_1 is high
If INPUT_1 is high THEN OUTPUT_1 is low
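The two example rules could be encoded and evaluated as in the sketch below. The membership functions, the assumption that inputs are scaled to [0, 1], the use of `min` for AND, and all names (`TERMS`, `RULES`, `firing_strength`, `evaluate`) are illustrative choices, not details given in the slides.

```python
def clamp(value):
    """Limit a value to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


# Assumed membership functions for inputs scaled to [0, 1].
TERMS = {
    "low": lambda x: clamp(1.0 - 2.0 * x),
    "medium": lambda x: clamp(1.0 - abs(2.0 * x - 1.0)),
    "high": lambda x: clamp(2.0 * x - 1.0),
}

# Each rule: (antecedent {input name: term}, consequent (output name, term)).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, ("OUTPUT_1", "high")),
    ({"INPUT_1": "high"}, ("OUTPUT_1", "low")),
]


def firing_strength(antecedent, inputs):
    """Degree to which a rule applies; AND is taken as the minimum."""
    return min(TERMS[term](inputs[name]) for name, term in antecedent.items())


def evaluate(rules, inputs):
    """Strongest firing strength per consequent (max aggregation)."""
    result = {}
    for antecedent, consequent in rules:
        strength = firing_strength(antecedent, inputs)
        result[consequent] = max(result.get(consequent, 0.0), strength)
    return result
```

With `{"INPUT_1": 0.25, "INPUT_2": 0.5}`, the first rule fires with strength 0.5 and the second does not fire at all.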
Relevance to the Army
Collaborators: Jeff Passner, John Raby (ARL)
IMETS weather modeling
Post-processing used to predict additional
parameters
Visibility, turbulence, fog, etc.
Use of Knowledge Discovery to Predict Parameters
Visibility Application
Generate and tune a system that can predict
visibility based on input parameters
Tasks for the fuzzy genetic system
Search for a set of k rules from p possible rules that
describe the relationship of the input parameters with
the output (visibility)
Concurrently discover the architecture and optimize
the performance of the knowledge bases in relation to
the k rules
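The first task, searching for k rules out of p candidates, could be approached with a simple genetic algorithm along the lines of the sketch below. It covers rule selection only and leaves out the concurrent tuning of the knowledge base. The encoding (a list of k distinct rule indices), the tournament selection, the crossover and mutation operators, and the caller-supplied `fitness` function are all assumptions; the slides do not describe the actual operators.

```python
import random


def simple_ga(p, k, fitness, generations=40, pop_size=20, mutation_rate=0.2, seed=0):
    """Search for k rule indices out of p candidates that maximize fitness."""
    rng = random.Random(seed)
    population = [rng.sample(range(p), k) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best]  # elitism: the best rule set found so far survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # Crossover: draw k distinct rules from the parents' combined rules.
            pool = sorted(set(parent_a) | set(parent_b))
            child = rng.sample(pool, k)
            # Mutation: swap one rule for a rule not currently in the child.
            if rng.random() < mutation_rate:
                unused = [r for r in range(p) if r not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return sorted(best)
```

In a visibility application, `fitness` would score how well the selected rules predict visibility on training data; the sketch leaves that function to the caller.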
Results for
Low Visibility Classifier
Results for
Medium Visibility Classifier