Descriptive Analysis of the Global Climate System and Predictive

Comparing Predictive Power in Climate Data: Clustering Matters
Karsten Steinhaeuser
University of Minnesota
joint work with
Nitesh Chawla & Auroop Ganguly
12th International Symposium on
Spatial and Temporal Databases
Minneapolis, MN
August 24, 2011
Outline
• Motivation
• Networks Primer
• From Data to Networks
• Motivating Networks in Climate Science
• Descriptive Analysis and Predictive Modeling
• Empirical Evaluation & Comparison
• Conclusions
08/24/2011
University of Minnesota
2
Mining Complex Data
• Complex spatio-temporal data pose unique challenges
• Tobler’s First Law of Geography:
“Everything is related to everything else, but near things are more related than distant things.”
– But are all near things equally related?
– Are there phenomena explained by interactions among distant things (teleconnections)?
Networks Primer
What is a Network?
• Oxford English Dictionary:
network, n.: Any netlike or complex system or collection of interrelated things, as topographical features, lines of transportation, or telecommunications routes (esp. telephone lines).
• My working definition:
Any set of items that are connected or related to each other.
(“items” and “connections” can be concrete or abstract)
Networks Primer
Community Detection in Networks
• Identify groups of nodes that are relatively more tightly connected to each other than to other nodes in the network
• Computationally challenging problem for real-world networks
From Data to Networks
• Networks are pervasive in social science, technology, and nature
• Many datasets explicitly define network structure
• But networks can also represent other types of data, providing a framework for identifying relationships, patterns, etc.
Motivating Networks in Climate
Uncertainty derives from many known and often many more unknown sources
Motivating Networks in Climate
• Projections of climate rely on many factors
– Understanding of the physical processes
– Ability to implement this understanding in computational models
– Assumptions about the future
Source: IPCC SRES and AR-4
Motivating Networks in Climate
• Some processes are well understood and modeled, others much less credible
• Comparison to observations shows varying skill
Motivating Networks in Climate
• Models cannot capture some features/processes
• Comparison to observations illustrates severe geographic variability, topographic bias
Motivating Networks in Climate
Research Question:
Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Answer:
Stay Tuned…
Knowledge Discovery for Climate
[Figure: knowledge discovery workflow. System inputs: observations, GCM outputs. Components: data storage & management, knowledge discovery, high-performance computing, visualization. Labeled elements: HPSS, Oracle DB, basic OLAP capabilities, complex networks, data mining, HPCC, Lens/Jaguar, ARM data, ArcGIS, PowerWall. System outputs: novel insights, decision support.]
Historical Climate Data
• NCEP/NCAR Reanalysis (proxy for observation)
• Monthly for 60 years (1948-2007) on a 5°×5° grid
• Seven variables:
– Sea surface temperature (SST)
– Sea level pressure (SLP)
– Geopotential height (GH)
– Precipitable water (PW)
– Relative humidity (RH)
– Horizontal wind speed (HWS)
– Vertical wind speed (VWS)
[Figure: processing pipeline, Raw Data → De-Seasonalize → Anomaly Series]
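The de-seasonalization step can be sketched as follows. This is a minimal illustration under an assumption the slide does not state: that de-seasonalizing means subtracting each calendar month's long-term mean from a monthly series that starts in January (optionally also dividing by that month's standard deviation). `monthly_anomalies` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def monthly_anomalies(series, standardize=False):
    """Remove the seasonal cycle from a monthly series that starts in January.

    Assumption: anomaly = value minus that calendar month's long-term mean;
    with standardize=True it is also divided by the month's population
    standard deviation (a z-score).
    """
    if len(series) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    anomalies = [0.0] * len(series)
    for month in range(12):
        values = series[month::12]  # every January, every February, ...
        mu = mean(values)
        sigma = pstdev(values) if standardize else 1.0
        for k, v in enumerate(values):
            # `or 1.0` guards against a zero standard deviation
            anomalies[month + 12 * k] = (v - mu) / (sigma or 1.0)
    return anomalies
```

For the 60-year record above, each grid cell would contribute 720 monthly values, and a series made only of a repeating annual cycle maps to all zeros.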
Network Construction
• View global climate system as a collection of interacting oscillators [Tsonis & Roebber, 2004]
– Vertices represent locations in space
– Edges denote correlation in variability
• Link strength estimated by correlation; low-weight edges are pruned from the network
• Construct networks only for locations over the oceans
– Relatively better captured by models
Geographic Properties
• Examine network structure in spatial context
– Link lengths computed as great-circle distance
– Compare autocorrelation / de-correlation lengths for different variables, interpret within the domain
[Figure panels: Sea Level Pressure, Precipitable Water, Vertical Wind Speed; annotations: Autocorrelation, Teleconnection]
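A link's great-circle length can be computed with the haversine formula. This sketch assumes a spherical Earth with mean radius 6371 km; the slide does not say which distance formula or radius was used, and `link_length_km` is a hypothetical name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def link_length_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # min() guards against h creeping above 1 through rounding
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))
```

For example, two cell centers 5° apart along the equator are about 556 km apart, which gives a sense of the shortest link lengths possible on a 5° grid.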
Clustering Climate Networks
• Apply community detection to partition networks
– Use Walktrap algorithm [Pons & Latapy, 2006]
– Efficient and works well for dense networks
• Visualize spatial pattern using GIS tools
• Cluster structure suggests relationships within the climate system
[Figure panels: Sea Level Pressure, Precipitable Water]
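The core idea of Walktrap can be sketched in plain Python: vertices that random walks of length t reach with similar probabilities are merged agglomeratively, always joining the adjacent pair of communities with the smallest increase Δσ in mean squared random-walk distance, and the partition with the highest modularity is kept. This is an unoptimized sketch based on our reading of Pons & Latapy (2006), including the self-loop added at every vertex; it is not the implementation used in this work, and t = 4 (python-igraph's default) is an assumption. For networks of realistic size, a library routine such as python-igraph's `Graph.community_walktrap` would be the practical choice.

```python
from itertools import combinations


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def modularity(adj, communities):
    """Newman modularity of a partition of an undirected weighted graph."""
    m = sum(w for u in adj for w in adj[u].values()) / 2.0
    q = 0.0
    for c in communities:
        internal = sum(w for u in c for v, w in adj[u].items() if v in c) / 2.0
        degree = sum(w for u in c for w in adj[u].values())
        q += internal / m - (degree / (2.0 * m)) ** 2
    return q


def walktrap(adj, t=4):
    """Walktrap-style agglomerative clustering (unoptimized sketch).

    adj: symmetric {node: {neighbor: weight}} with nodes labelled 0..n-1.
    Returns the partition (list of sets) with the highest modularity.
    """
    n = len(adj)
    # Random-walk transition matrix, with a self-loop added at every vertex.
    W = [[adj[i].get(j, 0.0) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in W]
    P = [[w / deg[i] for w in W[i]] for i in range(n)]
    Pt = P
    for _ in range(t - 1):  # Pt = P^t
        Pt = matmul(Pt, P)

    # community id -> (member set, mean t-step probability vector)
    comms = {i: ({i}, Pt[i]) for i in range(n)}

    def delta_sigma(a, b):
        """Increase in mean squared walk distance if a and b merge."""
        (ma, va), (mb, vb) = comms[a], comms[b]
        r2 = sum((x - y) ** 2 / d for x, y, d in zip(va, vb, deg))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * r2 / n

    def adjacent(a, b):
        return any(v in comms[b][0] for u in comms[a][0] for v in adj[u])

    best = [set(m) for m, _ in comms.values()]
    best_q = modularity(adj, best)
    next_id = n
    while len(comms) > 1:
        pairs = [(a, b) for a, b in combinations(comms, 2) if adjacent(a, b)]
        if not pairs:
            break  # remaining communities are disconnected
        a, b = min(pairs, key=lambda p: delta_sigma(*p))
        (ma, va), (mb, vb) = comms.pop(a), comms.pop(b)
        size = len(ma) + len(mb)
        merged = [(len(ma) * x + len(mb) * y) / size for x, y in zip(va, vb)]
        comms[next_id] = (ma | mb, merged)
        next_id += 1
        partition = [set(m) for m, _ in comms.values()]
        q = modularity(adj, partition)
        if q > best_q:
            best, best_q = partition, q
    return best
```

On two triangles joined by a single edge, this sketch recovers the two triangles as communities; the naive recomputation makes it suitable only for small illustrative graphs.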
Update!
Research Question:
Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Revised Answer:
Yes… but that’s not all.
Descriptive → Predictive
• Network representation is able to capture interactions, reveal patterns in climate
– Validate existing assumptions / knowledge
– Suggest potentially new insights or hypotheses for climate science
• Want to extract the relationships between atmospheric dynamics over ocean and land
– i.e., “Learn” physical phenomena from the data
Predictive Modeling
• Use network clusters as candidate predictors
• Create response variables for target regions around the globe (illustrated below)
• Build regression model relating ocean clusters to land climate
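The modeling step can be sketched as below, under two assumptions the slides do not spell out: each ocean cluster is summarized by the mean anomaly series of its member locations, and the regression is ordinary least squares with an intercept, solved here through the normal equations. `cluster_series` and `fit_ols` are hypothetical names.

```python
from statistics import fmean


def cluster_series(anomalies, clusters):
    """Average the anomaly series of each cluster's member locations.

    anomalies: {location: series}; clusters: list of sets of locations.
    Returns one predictor series per cluster.
    """
    return [[fmean(vals) for vals in zip(*(anomalies[loc] for loc in c))]
            for c in clusters]


def fit_ols(predictors, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via normal equations."""
    X = [[1.0] + list(row) for row in zip(*predictors)]  # design matrix
    p = len(X[0])
    # Augmented system (X^T X | X^T y), solved by Gauss-Jordan elimination
    # with partial pivoting.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]
```

Here the response `y` would be the anomaly series of a land target region; the normal equations are adequate for a handful of cluster predictors but are less numerically stable than QR-based solvers.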
Illustrative Example
• Predictive model for air temperature in Peru
– Long-term variability highly predictable due to well-documented relation to El Niño
• Small number of clusters have majority of skill
– Feature selection (blue line) improves predictions
[Figure legend: Raw Data; All Clusters; Feature Selection]
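The slides do not say which feature-selection method was used. Greedy forward selection, sketched below purely as an illustration, is one common choice: starting from an intercept-only model, it repeatedly adds the cluster predictor that most lowers RMSE on held-out data and stops when no candidate helps. The train/validation split, the RMSE criterion, the tolerance `tol`, and all function names are assumptions.

```python
from math import sqrt
from statistics import fmean


def lstsq(X, y):
    """Least squares via the normal equations (X^T X) b = X^T y, Gauss-Jordan."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def design(predictors, chosen):
    """Rows [1, x_j1, x_j2, ...] built from the chosen predictor series."""
    return [[1.0] + [predictors[j][t] for j in chosen]
            for t in range(len(predictors[0]))]


def rmse(b, X, y):
    return sqrt(fmean((sum(bi * xi for bi, xi in zip(b, row)) - t) ** 2
                      for row, t in zip(X, y)))


def forward_select(train_X, train_y, val_X, val_y, tol=1e-9):
    """Greedily add the predictor that most lowers validation RMSE.

    train_X / val_X: lists of predictor series (one per cluster).
    Stops when no remaining predictor improves RMSE by more than `tol`.
    """
    def score(cols):
        b = lstsq(design(train_X, cols), train_y)
        return rmse(b, design(val_X, cols), val_y)

    chosen, best = [], score([])  # start from an intercept-only model
    remaining = set(range(len(train_X)))
    while remaining:
        j = min(remaining, key=lambda c: score(chosen + [c]))
        s = score(chosen + [j])
        if s > best - tol:
            break  # no meaningful improvement
        chosen.append(j)
        remaining.remove(j)
        best = s
    return chosen, best
```

Because each step refits on the training period and scores on a separate period, the procedure tends to keep only the few predictors that carry most of the skill.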
Update!
Research Question:
Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Revised Answer:
Yes and Yes… but wait, there’s more.
Results on Train/Test
Predictive Skill
Update!
Research Question:
Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Revised Answer:
Yes, Yes, and Yes.
Variations / Extensions
• Compare network approach to traditional clustering methods
– k-means, k-medoids, spectral, EM, etc.
• Compare different types of predictive models
– (linear) regression, regression trees, neural nets, support vector regression
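As a reference point for the clustering comparison above, k-means (one of the listed baselines) can be sketched directly on the anomaly series, treating each location's series as a vector under Euclidean distance. The deterministic farthest-point initialization and the iteration cap are simplifications chosen here for reproducibility, not details taken from the study, and `kmeans` is a hypothetical name. Unlike community detection, k-means needs the number of clusters k up front.

```python
from math import dist  # Python 3.8+


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization.

    points: list of equal-length numeric sequences (e.g. anomaly series).
    Returns a list of cluster labels, one per point.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # next center: the point farthest from its nearest existing center
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each resulting cluster could then be summarized and used as a predictor in the same way as a network community, which is what makes the two approaches directly comparable.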
Compare Clustering Methods
Compare Predictive Models
Refining Model Projections
Conclusions
• Networks capture behavior of the climate system
• Clusters (or “communities”) derived from these networks have useful predictive skill
– Statistically significantly better than predictors based on clusters derived using traditional methods
• Potential for advancing climate science
– Understanding of physical processes
– Complement climate model simulations
Upcoming Events
1. First International Workshop on Climate Informatics, New York, NY, August 26, 2011
http://www.nyas.org/climateinformatics
2. NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011
https://c3.ndc.nasa.gov/dashlink/projects/43/
3. IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011
http://www.nd.edu/~dial/climkd11/
Thanks & Questions
Contact
[email protected]
Personal Homepage
http://www.nd.edu/~ksteinha
NSF Expeditions on
Understanding Climate Change
http://climatechange.cs.umn.edu
This work was supported in part by the National Science Foundation
under Grants OCI-1029584 and BCS-0826958. This research was
also funded in part by the project entitled “Uncertainty Assessment
and Reduction for Climate Extremes and Climate Change Impacts”
under the initiative “Understanding Climate Change Impact: Energy,
Carbon, and Water Initiative” within the Laboratory Directed
Research and Development (LDRD) Program of the Oak Ridge
National Laboratory, managed by UT-Battelle, LLC for the U.S.
Department of Energy under Contract DE-AC05-00OR22725.