
Mining Weather Data for Decision Support
Roy George
Army High Performance Computing Research Center
Clark Atlanta University
Atlanta, GA 30314

Research
• Clustering Algorithms for Data Mining
  • Spatio-Temporal Domain
  • Parallelization of Algorithms
• Algorithms for Feature Extraction and Knowledge Discovery

Challenges of Geographical Data
• Complexities associated with data volume
• Domain complexities
• Systems are interconnected
• Data gathering and sampling
• Interesting signals hidden by stronger patterns
• Complexities caused by local variation
• Terabyte databases
• Interpretation of aggregated data
• Formalizing the domain

Background: Issues with Hard Clustering
• Issue: Hard clustering forces data with imprecision and/or uncertainty into discrete classes
• Result: Important outliers and boundary patterns are missed
• Approach: Use an approximate clustering technique
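To illustrate what hard clustering loses at a boundary, here is a minimal sketch that compares a hard (nearest-centroid) assignment with fuzzy memberships. It assumes the standard fuzzy c-means membership formula with fuzzifier m = 2 and 1-D points for brevity; the function names are hypothetical.

```python
def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means membership of point x in each centroid (1-D for brevity)."""
    dists = [abs(x - c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that centroid.
    for i, d in enumerate(dists):
        if d == 0:
            return [1.0 if j == i else 0.0 for j in range(len(centroids))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exp for dj in dists) for di in dists]


def hard_assignment(x, centroids):
    """Index of the nearest centroid, which is all that hard clustering reports."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


centroids = [0.0, 10.0]
boundary_point = 5.1
print(hard_assignment(boundary_point, centroids))    # 1
print(fuzzy_memberships(boundary_point, centroids))  # roughly [0.48, 0.52]
```

The hard assignment reports only cluster 1, while the memberships show the point sits almost evenly between both clusters, which is exactly the boundary information the slide says is lost.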

Background: K-Means Clustering
• Partition the data into K clusters that are homogeneous
• Algorithm
  • Select K time series as initial centroids
  • Assign each time series to the most similar centroid
  • Recompute the centroids
  • Repeat until the centroids do not change
• Variations based on different measures of similarity
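The four algorithm steps can be sketched in plain Python as follows. This is a minimal sketch assuming Euclidean distance as the similarity measure and equal-length time series; swapping in another distance function gives the "variations based on different measures of similarity" mentioned above. The names `kmeans` and `euclidean` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(series, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: select K time series as the initial centroids.
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = [0] * len(series)
    for _ in range(max_iter):
        # Step 2: assign every series to its most similar (nearest) centroid.
        labels = [min(range(k), key=lambda j: euclidean(s, centroids[j]))
                  for s in series]
        # Step 3: recompute each centroid as the point-wise mean of its members.
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                new_centroids.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans([[1, 2, 3], [1, 2, 4], [10, 11, 12], [10, 12, 12]], k=2)
```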

Unsupervised Fuzzy K-Means (UFKM) Clustering
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Merge the cluster pair that has the maximum correlation
• Compute the validity measure
• Repeat until the termination condition is reached
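A minimal sketch of this loop in plain Python. The slide does not name the correlation, the validity measure, or the termination condition, so this sketch assumes: correlation is the Pearson correlation between cluster centroid time series, validity is the Xie-Beni index (lower is better), and the loop stops at `k_min` clusters and returns the clustering with the best validity seen. All function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def fuzzy_kmeans(data, centers, m=2.0, iters=50):
    """Standard fuzzy c-means updates, started from the given centers."""
    exp = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u[i][j] is how strongly series i belongs to cluster j.
        u = []
        for x in data:
            d = [max(math.dist(x, c), 1e-12) for c in centers]
            u.append([1.0 / sum((dj / dk) ** exp for dk in d) for dj in d])
        # Center update: mean of all series weighted by u^m.
        centers = []
        for j in range(len(u[0])):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append([sum(wi * x[t] for wi, x in zip(w, data)) / total
                            for t in range(len(data[0]))])
    return u, centers


def xie_beni(data, u, centers, m=2.0):
    """Xie-Beni validity index (lower is better): compactness / separation."""
    compact = sum(u[i][j] ** m * math.dist(x, c) ** 2
                  for i, x in enumerate(data) for j, c in enumerate(centers))
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (len(data) * max(sep, 1e-12))


def ufkm(data, k_init, k_min=2, m=2.0, seed=0):
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(data, k_init)]
    best = None
    while True:
        u, centers = fuzzy_kmeans(data, centers, m)
        score = xie_beni(data, u, centers, m)
        if best is None or score < best[0]:
            best = (score, u, centers)
        if len(centers) <= k_min:  # assumed termination condition
            return best
        # Merge the pair of clusters whose centroid series are most correlated;
        # the averaged centroid seeds the next fuzzy clustering round.
        pairs = [(a, b) for a in range(len(centers)) for b in range(a + 1, len(centers))]
        i, j = max(pairs, key=lambda p: pearson(centers[p[0]], centers[p[1]]))
        merged = [(p + q) / 2 for p, q in zip(centers[i], centers[j])]
        centers = [c for t, c in enumerate(centers) if t not in (i, j)] + [merged]
```

Calling `ufkm(series, k_init=11)` would mirror the weather experiment's starting point of 11 clusters, although the validity index and stopping rule here are stand-ins rather than the ones used in the study.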

UFKM Results: Weather Data Set
Figures: Initial (11 clusters), Optimal (8 clusters), Final (4 clusters)

Global Earth Science Data
• Collaborative effort with V. Kumar (UMinn)
• Test bed for UFKM (comparison with existing techniques)
• Data set
  • Ocean Climate Indices (OCIs)
  • Global Sea Level Pressure (1989–1993)
  • Capture teleconnections
• Result
  • UFKM can capture even weaker OCIs using coarse clusters

Global Climate Data (Sea Level Pressure)
Figures: Intermediate (60 clusters), Final (26 clusters)

Relation with SOI (Southern Oscillation Index)

Integrating Multiple Datasets in UFKM Clustering
• Motivation: a data-based approach to determining “interesting” clusters
• Validate using multiple datasets
• Rule: retain clusters that have supporting data
• Applicable in data-rich environments

UFKM Clustering with Multi-Dataset Validation
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Validate the clusters with the other datasets Di, i = 1, …, n
• Merge if the clusters are uncorrelated;
  otherwise, consider the next candidate pair to merge
• Repeat until the termination condition is reached
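The merge step above can be sketched as a search over candidate pairs. The slide does not say how "uncorrelated" is measured against each dataset Di, so in this minimal sketch the check is passed in as a function, and it is assumed (not confirmed by the slide) that a pair must pass the check for every Di before it is merged. The names `choose_merge` and `is_uncorrelated`, and the toy correlation values, are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def choose_merge(candidates: Sequence[Pair],
                 datasets: Sequence[object],
                 is_uncorrelated: Callable[[Pair, object], bool]) -> Optional[Pair]:
    """Return the first candidate pair that passes validation on every dataset."""
    for pair in candidates:
        # Merge only if the pair is uncorrelated in every auxiliary dataset D_i.
        if all(is_uncorrelated(pair, d) for d in datasets):
            return pair
        # Otherwise fall through and consider the next candidate pair.
    return None  # no admissible merge: one possible termination condition


# Toy stand-ins: each "dataset" maps a cluster pair to a hypothetical correlation.
height = {(0, 1): 0.9, (1, 2): 0.1, (0, 2): 0.2}
wind = {(0, 1): 0.8, (1, 2): 0.05, (0, 2): 0.7}


def check(pair, d):
    return abs(d[pair]) < 0.3


print(choose_merge([(0, 1), (1, 2), (0, 2)], [height, wind], check))  # (1, 2)
```

Pair (0, 1) is rejected because Height still shows a strong correlation, so the search moves on and merges (1, 2), mirroring the "else consider next candidate pair" branch.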

UFKM Multi-Dataset Results
Figures: Height, Wind Speed, Pressure, Temperature

Multi-threading Parallel Algorithm
For each clustering stage
  For each iteration
    Slaves: calculate M for each cluster
    Master: normalize M
    Slaves: calculate C for each cluster
    Master: normalize C
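A minimal sketch of one iteration of this scheme using Python threads. It assumes, since the slide does not say, that M is the fuzzy membership matrix and C the matrix of cluster centers from fuzzy K-means, and that each slave task handles one cluster. The function names are hypothetical. In CPython the global interpreter lock keeps pure-Python threads from running this arithmetic in parallel, so the sketch shows the master/slave data flow rather than the speedup reported for the Sun Fire machine.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def slave_membership(j, data, centers, m):
    # Unnormalized membership of every point in cluster j: d_ij^(-2/(m-1)).
    c = centers[j]
    return [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0)) for x in data]


def slave_center(j, data, u, m):
    # Weighted coordinate sums and total weight for cluster j.
    w = [row[j] ** m for row in u]
    dim = len(data[0])
    sums = [sum(wi * x[t] for wi, x in zip(w, data)) for t in range(dim)]
    return sums, sum(w)


def parallel_fcm_iteration(data, centers, m=2.0, workers=4):
    k = len(centers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Slaves: calculate M, one task per cluster (one column of M each).
        cols = list(pool.map(lambda j: slave_membership(j, data, centers, m), range(k)))
        # Master: normalize M so each point's memberships sum to 1.
        u = []
        for i in range(len(data)):
            row = [cols[j][i] for j in range(k)]
            s = sum(row)
            u.append([v / s for v in row])
        # Slaves: calculate C, one task per cluster.
        parts = list(pool.map(lambda j: slave_center(j, data, u, m), range(k)))
    # Master: normalize C by dividing each weighted sum by its total weight.
    new_centers = [[v / total for v in sums] for sums, total in parts]
    return u, new_centers
```

Dividing d_ij^(-2/(m-1)) by its row sum gives the same memberships as the usual fuzzy c-means formula, which is why the slaves can work independently per cluster and leave only the cheap normalization to the master.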

Multi-threading Result
• Implemented on a Sun Fire workstation with four 900-MHz UltraSPARC® III processors
• Near-linear speedup obtained
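For reference, speedup and parallel efficiency on p processors are conventionally defined as below, where T_1 is the single-processor run time and T_p the run time on p processors (the timings themselves are not given on the slide). "Near-linear" means the speedup stays close to p, here close to 4.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p}, \qquad
\text{near-linear on } p = 4:\; S_4 \approx 4,\; E_4 \approx 1
```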

Relevance to the Army
• Directly supports the FBKOF STO (B. Broome)
• Development of the Weather Information and Tactical Support (WITS) System

Weather Information and Tactical Support (WITS)
• Objective: extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning

Approach
• Development of an OLAP weather repository
  • GA weather (1981–2002)
  • Sources: National Weather Service, GA Env. Network
  • Data cube over YEAR, MONTH, DAY with TEMPERATURE, PRECIPITATION, WIND SPEED, etc.
  • Abstract data representation
• Development of WITS modules
  • Ad-hoc querying
  • Real-time analysis and planning
  • Effects on Army systems
  • Integration with IWEDA
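The YEAR / MONTH / DAY hierarchy of the repository supports the usual OLAP roll-up. A minimal sketch in plain Python, assuming hypothetical daily records and assuming mean temperature and total precipitation as the aggregates; the record layout and the `roll_up` name are illustrative, not the WITS schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records: (year, month, day, temperature_C, precipitation_mm)
records = [
    (2001, 7, 1, 31.0, 0.0),
    (2001, 7, 2, 29.0, 12.5),
    (2001, 8, 1, 30.0, 3.0),
    (2002, 7, 1, 33.0, 0.0),
]


def roll_up(rows, level):
    """Aggregate daily rows to the YEAR or (YEAR, MONTH) level of the cube."""
    groups = defaultdict(list)
    for year, month, day, temp, precip in rows:
        key = (year,) if level == "year" else (year, month)
        groups[key].append((temp, precip))
    # Mean temperature and total precipitation for each cell of the rolled-up cube.
    return {key: (mean(t for t, _ in vals), sum(p for _, p in vals))
            for key, vals in sorted(groups.items())}


print(roll_up(records, "month"))
# {(2001, 7): (30.0, 12.5), (2001, 8): (30.0, 3.0), (2002, 7): (33.0, 0.0)}
```

An ad-hoc query such as "July across all years" is then a slice on the month key of the rolled-up result.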

WITS System Design
Diagram components: User Interface; IQ Module; Real Time Module; TAPS Module; Query Modules; Data Mining Modules; Knowledge Bases (IWEDA); Data Warehouse; Data Cleaning & Transformation; Data Acquisition Agents

WITS/IQ

WITS/IWEDA

WITS/Analysis

Work in Progress
• Characterization of analysis queries
• Incorporation of data mining algorithms into WITS
• Enhancement of WITS/TAPS
• Implementation of WITS/Real

Hybrid Genetic Fuzzy Systems for Feature Extraction and Knowledge Discovery

Project Goals
• Design and implement a hybrid genetic fuzzy system for knowledge discovery
• Develop APIs/tools
• Apply the tools to Army-related problems

Contribution
• A hybrid system based on the Simple Genetic Algorithm (SGA), enhanced with three levels of knowledge discovery:
  • Level 1: Discovers up to k possible rules for a given set of inputs and outputs, then attempts to minimize the number of rules and tune the knowledge base.
  • Level 2: Takes the rule set from Level 1 and minimizes it further, while also tuning the knowledge base.
  • Level 3: Makes a final attempt to further tune the architecture of the knowledge base.
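The rule-selection part of Level 1 can be sketched as a textbook simple genetic algorithm (roulette-wheel selection, one-point crossover, bit-flip mutation) over bit strings of length p, where bit r means "candidate rule r is in the rule base". This is a minimal sketch, not the project's implementation: the toy coverage sets `COVERS`, the fitness (samples explained minus a per-rule penalty, with rule bases larger than k rejected), and all names are hypothetical, and the tuning of the knowledge base is not shown.

```python
import random

# Hypothetical toy problem: candidate rule r "explains" the sample ids in COVERS[r].
COVERS = [{0, 1}, {2, 3}, {0, 1, 2, 3}, {4}, {4, 5}, {5}]


def fitness(bits, k, penalty=0.5):
    chosen = [r for r, b in enumerate(bits) if b]
    if not chosen or len(chosen) > k:
        return 0.0  # an empty rule base or one with more than k rules is rejected
    covered = set().union(*(COVERS[r] for r in chosen))
    # Reward explained samples; penalize each rule so smaller rule bases win.
    return max(len(covered) - penalty * len(chosen), 0.0)


def sga(p, k, pop_size=30, generations=60, pc=0.8, pm=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    best = max(pop, key=lambda b: fitness(b, k))
    for _ in range(generations):
        weights = [fitness(b, k) + 1e-9 for b in pop]  # roulette-wheel selection
        parents = rng.choices(pop, weights=weights, k=pop_size)
        nxt = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < pc:  # one-point crossover
                cut = rng.randint(1, p - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a[:], b[:]]  # copy so mutation never alters a shared parent
        for child in nxt:  # bit-flip mutation
            for i in range(p):
                if rng.random() < pm:
                    child[i] ^= 1
        pop = nxt
        best = max(pop + [best], key=lambda b: fitness(b, k))  # keep the best so far
    return best
```

With these toy coverage sets the best admissible rule base is rules 2 and 4 (full coverage with only two rules). A run such as `sga(6, 3)` will usually find it, though a GA offers no guarantee of reaching the optimum.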

Rule Discovery
• Search for k possible rules from the set of p possible rules; k is an input parameter of the GA application.
• Discover the smallest value of k, thereby reducing the number of rules needed.
• Example rules:
  If INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high
  If INPUT_1 is high THEN OUTPUT_1 is low
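To show how rules of this form produce an output, here is a minimal sketch that evaluates the two example rules. The triangular "low / medium / high" sets on inputs scaled to [0, 1], AND taken as the minimum, and a firing-strength-weighted average of the output set peaks (a zero-order Sugeno-style simplification) are all assumptions for illustration; the slides do not specify the membership functions or the inference method.

```python
# Hypothetical triangular fuzzy sets (a, peak b, c) on values scaled to [0, 1].
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
# Peak of each output set, used for weighted-average defuzzification.
PEAKS = {"low": 0.0, "medium": 0.5, "high": 1.0}

# The two example rules: (antecedent clauses, consequent term for OUTPUT_1).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, "high"),
    ({"INPUT_1": "high"}, "low"),
]


def tri(x, a, b, c):
    """Triangular membership: 0 at a, 1 at the peak b, 0 again at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def infer(inputs):
    num = den = 0.0
    for antecedent, consequent in RULES:
        # AND across clauses is taken as the minimum membership degree.
        strength = min(tri(inputs[var], *SETS[term]) for var, term in antecedent.items())
        num += strength * PEAKS[consequent]
        den += strength
    # Firing-strength-weighted average of the consequent peaks; None if no rule fires.
    return num / den if den else None


print(infer({"INPUT_1": 0.1, "INPUT_2": 0.5}))  # 1.0 (only the first rule fires)
print(infer({"INPUT_1": 0.9, "INPUT_2": 0.5}))  # 0.0 (only the second rule fires)
```

In the genetic fuzzy system, the GA would choose which of the p candidate rules appear in `RULES` and tune parameters such as those in `SETS`.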

Relevance to the Army
• Collaborators: Jeff Passner, John Raby (ARL)
• IMETS weather modeling
  • Post-processing used to predict additional parameters: visibility, turbulence, fog, etc.
• Use of knowledge discovery to predict these parameters

Visibility Application
• Generate and tune a system that can predict visibility from input parameters
• Tasks for the fuzzy genetic system:
  • Search for a set of k rules, from p possible rules, that describes the relationship between the input parameters and the output (visibility)
  • Concurrently discover the architecture of the knowledge bases and optimize their performance with respect to the k rules

Results for Low Visibility Classifier

Results for Medium Visibility Classifier