
Proposed Microsoft Water TCI+
Development of the AmeriFlux and
Central Valley Data Portals
conducted through a partnership with the Berkeley Water Center
http://esd.lbl.gov/BWC/
Susan Hubbard (BWC), Deb Agarwal (LBNL, UCB) &
Catharine van Ingen (MSFT)
Feb. 2006
Outline


•Overview of Berkeley Water Center (BWC)
•Motivation and General Objectives for Development of Water Data Portals
•Description of Proposed Portals:
  - Carbon-Climate
  - Central Valley Cyber-Infrastructure
•Proposal Specifics:
  - Requested Support
  - Project Timeline
•Summary
Berkeley Water Center (BWC):
A Water Center of Excellence

•Is developing a new mode of doing business at Berkeley by seamlessly integrating UCB and LBNL expertise;
•BWC involves faculty from 3 UCB Colleges and 3 LBNL Divisions;
•Conducts interdisciplinary investigations that are coordinated through research thrust areas;
•Accelerates thrust-area results into applications;
•Develops collaborations between Berkeley water researchers and other expert groups;
•Creates strong, mutually beneficial partnerships between Berkeley and other academic, governmental, and private sector institutions.
Motivation for the Water Data Portals
•Meeting the water needs of humans is one of the greatest challenges of the 21st
century;
•Hydrological processes are highly complex and dynamic over various spatial and
temporal scales;
•Understanding hydrological processes with sufficient accuracy in the face of
anthropogenic and global changes is a prerequisite to successful water management.
•Simple access to curated data and related metadata is a necessary component of a
modern cyber-infrastructure that enables researchers and water managers to
assimilate complex, multi-scale datasets, collected by everything from networked
microsensors to global satellite platforms, and to use those data with modeling or
mining tools to test hypotheses.
Portal Prototypes: General Objectives

•Demonstrate an advanced approach for tackling 21st-century challenges by leveraging web service concepts, Microsoft technologies, and information technology expertise;
•Develop the prototypes in close collaboration with water scientists so that the result is immediately seen as useful for doing water science;
•Focus early on the components most critical to addressing relevant science questions, rather than on creating a fully developed problem-solving environment;
•Continually demonstrate and “dogfood” the prototypes with end-to-end scenarios, and use the feedback to refine and augment them;
•Work on two different, yet scientifically related, projects that will:
  - Permit us to understand what is common and what is distinct between different water research approaches;
  - Allow us to work with a wide range of water datasets and analysis techniques;
  - Provide demonstration vehicles to two different water research communities.
We propose the development of two data portals, each designed around the needs of a different water research community:
CARBON-CLIMATE: CARBON-CLIMATE DATA PORTAL
•Community has well-defined research objectives;
•Protocols for acquisition and reporting of AmeriFlux data are well developed;
•Datasets are ripe for synthesis but lack cyberinfrastructure.

CALIFORNIA HYDROLOGY: CENTRAL VALLEY DATA PORTAL
•Research objectives are not well defined;
•Large datasets are extremely diverse and ‘dirtier’;
•Curation and infrastructure are needed prior to synthesis.
Carbon Climate Data Portal

•The ability to make global change predictions requires information about carbon stocks and fluxes and about their impact on climate;
•AmeriFlux datasets are used to assess carbon fluxes:
  - These datasets are collected from 149 environmental observatories located across the Americas;
  - Protocols are already developed for data acquisition and reporting to a central facility;
  - The complete historical dataset is a few hundred MB in size.
Carbon-Climate Feedbacks: Example


•Climate warming is associated with an earlier onset of spring, which is expected to enhance plant growth and to increase carbon sequestration;
•Berkeley researchers (Angert et al., 2005, PNAS) recently found that:
  - Earlier springs permit more uptake of CO2;
  - However, increased drought (hotter, drier summers) resulted in lower net CO2 uptake, which cancels out the earlier enhanced uptake.
•Carbon-climate feedbacks are important. The ability to compute simple correlations across sites, measurements, and seasons will enable other such interactions to be discovered and thereby improve global change predictions.
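As a minimal sketch of the kind of cross-site calculation described above, the Python below computes a Pearson correlation between two seasonal summaries over a set of sites. The site codes and anomaly values are invented for illustration (they are not AmeriFlux data), and a correlation on its own would only flag a candidate interaction for closer study, not establish a mechanism.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-site seasonal summaries (illustrative numbers only):
# site -> (spring net CO2 uptake anomaly, summer net CO2 uptake anomaly)
site_summaries = {
    "SITE_A": (1.2, -0.9),
    "SITE_B": (0.4, -0.2),
    "SITE_C": (-0.5, 0.6),
    "SITE_D": (0.9, -1.1),
}
spring = [s for s, _ in site_summaries.values()]
summer = [u for _, u in site_summaries.values()]
r = pearson(spring, summer)
print(f"spring vs summer uptake anomaly across sites: r = {r:.2f}")
```

With real data, the same function would simply be applied to per-site summaries pulled from the portal for any pair of measurements and seasons.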
Figure: Examples of Carbon-Climate Datasets (observatory datasets; spatially continuous soils, climate, and remote sensing datasets)
Prototype AmeriFlux Portal Development

•Design a schema capable of versioning and researcher annotation of AmeriFlux data;
•Build a data loading pipeline with basic data cleaning capability by leveraging SQL Server 2005 Integration Services;
•Develop a web portal that provides simple dataset selection and downloading across measurement sites, parameters, versions, and time windows;
•Integrate with commonly used data visualization tools to allow simple data mining and browsing.
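One way the versioning, annotation, and selection requirements above might fit together is sketched below, using Python's built-in sqlite3 module as a stand-in for SQL Server 2005. The table and column names (site, observation, annotation), the site code US-XX1, and the sample values are all hypothetical. The design choice illustrated is that each reprocessed release is stored as a new version rather than overwriting older values, so a download can be pinned to a specific version, set of sites, variables, and time window.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not an AmeriFlux standard.
SCHEMA = """
CREATE TABLE site (
    site_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Each reprocessed release gets a new version number; older values are kept.
CREATE TABLE observation (
    site_id  TEXT    NOT NULL REFERENCES site(site_id),
    variable TEXT    NOT NULL,   -- e.g. 'NEE'
    ts       TEXT    NOT NULL,   -- ISO-8601 timestamp, so text order = time order
    version  INTEGER NOT NULL,
    value    REAL,               -- NULL marks a gap
    PRIMARY KEY (site_id, variable, ts, version)
);
-- Researcher notes attached to one version of one observation.
CREATE TABLE annotation (
    site_id  TEXT    NOT NULL,
    variable TEXT    NOT NULL,
    ts       TEXT    NOT NULL,
    version  INTEGER NOT NULL,
    author   TEXT    NOT NULL,
    note     TEXT    NOT NULL,
    FOREIGN KEY (site_id, variable, ts, version)
        REFERENCES observation (site_id, variable, ts, version)
);
"""


def select_window(conn, sites, variables, version, start, end):
    """Rows for the given sites, variables, and version with start <= ts < end."""
    site_marks = ",".join("?" * len(sites))
    var_marks = ",".join("?" * len(variables))
    sql = (
        "SELECT site_id, variable, ts, value FROM observation "
        f"WHERE site_id IN ({site_marks}) AND variable IN ({var_marks}) "
        "AND version = ? AND ts >= ? AND ts < ? "
        "ORDER BY site_id, variable, ts"
    )
    return conn.execute(sql, [*sites, *variables, version, start, end]).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO site VALUES ('US-XX1', 'Hypothetical site')")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
    [
        ("US-XX1", "NEE", "2005-06-01T00:00", 1, -2.1),
        ("US-XX1", "NEE", "2005-06-01T00:00", 2, -2.3),  # reprocessed value
        ("US-XX1", "NEE", "2005-06-01T00:30", 2, None),  # gap
        ("US-XX1", "NEE", "2005-07-01T00:00", 2, -1.0),  # outside the window
    ],
)
conn.execute(
    "INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)",
    ("US-XX1", "NEE", "2005-06-01T00:30", 2, "analyst", "sensor outage"),
)
rows = select_window(conn, ["US-XX1"], ["NEE"], 2, "2005-06-01", "2005-07-01")
print(rows)
```

Because version 1 remains in the table, a researcher who downloaded the earlier release can still reproduce it exactly by passing version 1 to the same query.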
Prototype AmeriFlux Portal Adoption

•Perform end-to-end scenario demonstrations and live dogfooding in collaboration with BWC scientists;
•Refine and augment based on feedback. Potential augmentations:
  - Federate with other data sources, such as MODIS remote sensing, soils, and climate datasets;
  - Link to numerical models that permit hypothesis testing;
  - Leverage workflow components to automate key analysis tasks;
•Locate a long-term home for further development and use of the portal.
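Federating flux data with remote sensing mostly comes down to aligning records that were sampled differently. A minimal sketch, assuming daily flux values and multi-day composite products (some MODIS land products, such as LAI, are distributed as 8-day composites), is shown below. The function name attach_modis, the tuple layouts, and the sample values are hypothetical. Days with no covering composite are kept with None rather than silently dropped.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

FluxRow = Tuple[str, date, float]   # (site, day, daily flux value)
ModisKey = Tuple[str, date]         # (site, composite start day)


def attach_modis(flux_daily: List[FluxRow],
                 modis: Dict[ModisKey, float],
                 period_days: int = 8) -> List[Tuple[str, date, float, Optional[float]]]:
    """Pair each daily flux value with the composite value covering that day.

    A composite is assumed to cover period_days days starting on its start
    day, so we look back up to period_days - 1 days for a matching start.
    """
    joined = []
    for site, day, flux in flux_daily:
        match: Optional[float] = None
        for offset in range(period_days):
            key = (site, day - timedelta(days=offset))
            if key in modis:
                match = modis[key]
                break
        joined.append((site, day, flux, match))
    return joined


# Hypothetical example: one site, two composites, two flux days.
modis_lai = {("US-XX1", date(2005, 6, 2)): 3.1, ("US-XX1", date(2005, 6, 10)): 3.4}
flux = [("US-XX1", date(2005, 6, 3), -2.0), ("US-XX1", date(2005, 6, 20), -1.5)]
result = attach_modis(flux, modis_lai)
for row in result:
    print(row)
```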
Carbon-Climate Workbench Vision
Figure: Data portals host AmeriFlux, climate, STATSGO soils, and MODIS product data (LAI, temperature, FPAR, vegetation index, surface reflectance, NPP, albedo), together with statistical and graphical tools, behind a web service interface to data and tools. Through web-based workbench access, a researcher chooses an AmeriFlux area/transect, time range, and data type, imports other datasets, and designs a workflow (data harvest from sites 1-16, gap filling by alternative techniques A and B, version control, Canoak model runs at individual sites, statistical and graphical analysis, network display of LAI). The Carbon-Climate Workbench draws on an Ecology Toolbox of data cleaning tools, data mining and analysis tools, knowledge generation tools, modeling tools, visualization tools, and compute resources.
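The workbench workflow includes alternative gap-filling techniques ("A" and "B") that are not named here. Purely as an illustration of what one pluggable gap-filling step could look like, the sketch below fills short interior gaps in a regularly sampled series by linear interpolation. This is a simple stand-in chosen for clarity, not a method prescribed by AmeriFlux or the BWC; the function name gap_fill_linear and the max_gap limit are hypothetical.

```python
from typing import List, Optional


def gap_fill_linear(values: List[Optional[float]], max_gap: int = 4) -> List[Optional[float]]:
    """Fill interior runs of None no longer than max_gap by linear interpolation.

    Gaps at the start or end of the series, and gaps longer than max_gap,
    are left as None so that a different technique can handle them.
    """
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i
        while i < n and values[i] is None:
            i += 1
        end = i  # index of the first value after the gap, or n
        gap_len = end - start
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = values[start - 1], values[end]
            for k in range(start, end):
                frac = (k - (start - 1)) / (gap_len + 1)
                filled[k] = left + frac * (right - left)
    return filled


series = [1.0, None, None, 4.0, None, 5.0, None]
print(gap_fill_linear(series))
```

Keeping the gap limit as a parameter, and leaving unfilled values as None, makes it easy to record which technique filled which value under version control.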
Central Valley Data Portal

•Across the US, groundwater supplies roughly 40 percent of drinking water;
•The State of California alone uses about 16 million acre-feet of groundwater each year, more than any other state in the nation, and 80% of that goes toward crop irrigation;
•The 400-mile-long Central Valley supplies ¼ of the food in the US;
•California groundwater quantity and quality are critical to the economic viability of the state!
•A data portal will enable joint analysis of a range of datasets and tools that are critical to California water resources and water quality.
Map: Central Valley study areas (Southern Sacramento, 86 wells; Northern San Joaquin, 70 wells; Southeast San Joaquin, ~100 wells)
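For readers who work in SI units, the groundwater figures quoted on this slide convert as sketched below. The only inputs are the slide's own numbers and the standard definitions of the acre-foot (43,560 cubic feet) and the international foot (0.3048 m).

```python
# 1 acre-foot = 43,560 cubic feet, and 1 international foot = 0.3048 m exactly.
ACRE_FOOT_M3 = 43_560 * 0.3048 ** 3        # about 1,233.5 cubic meters

annual_use_af = 16e6        # about 16 million acre-feet per year (slide figure)
irrigation_share = 0.80     # 80% goes to crop irrigation (slide figure)

irrigation_af = annual_use_af * irrigation_share
annual_use_km3 = annual_use_af * ACRE_FOOT_M3 / 1e9    # 1 km^3 = 1e9 m^3
irrigation_km3 = irrigation_af * ACRE_FOOT_M3 / 1e9

print(f"irrigation: {irrigation_af / 1e6:.1f} million acre-feet per year")
print(f"total: {annual_use_km3:.1f} km^3/yr; irrigation: {irrigation_km3:.1f} km^3/yr")
```

This works out to about 12.8 million acre-feet per year for irrigation, or roughly 19.7 km³ of total annual groundwater use, of which about 15.8 km³ goes to irrigation.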
USGS Projects

•The importance of Central Valley water resources and quality has prompted the USGS to develop a $50M program to monitor groundwater quality;
•The USGS project focuses on intensive data collection; no plans have been made to curate these data or to federate them with the other water datasets critical for understanding water balance and quality over time in the Central Valley.
Examples of Central Valley Water Datasets
•Basin boundary and stream network
•Hydrological units from well logs
•Water levels
•GAMA water quality data (GAMA Project, Ken Belitz)
List of analytes:
  - Volatile organic compounds
  - Pesticides
  - Stable isotopes, D, O-18
  - Tritium-3He / noble gases
  - Specific conductance
  - Carbon isotopes (C-13, C-14)
  - Radon, radium, gross alpha/beta
  - Field parameters: temp, EC, DO, turbidity, pH, alk.
  - Major ions and trace elements
  - Arsenic and iron speciation
  - Nutrients (nitrates, phosphates)
  - Dissolved organic carbon
  - Emerging contaminants
  - E. coli, total coliform, coliphage
Selected “emerging contaminants”:
  - Pharmaceuticals
  - N-nitrosodimethylamine (NDMA)
  - Perchlorate
  - 1,4-dioxane
  - Chromium (total and VI)
Prototype Central Valley Portal Development

•Follows the approach described for Carbon-Climate portal development (data curation, cleaning, mining, and visualization):
  - The data loading pipeline and cleaning will be more challenging because the datasets are larger, more diverse, and ‘dirtier’ than AmeriFlux;
  - Data visualization will likely include some form of mapping to display measurements across the Valley.
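Because the Central Valley data are expected to be ‘dirtier’ than AmeriFlux, the cleaning stage needs to normalize inconsistent records and set aside the ones it cannot trust. The sketch below is a minimal example under stated assumptions: it takes hypothetical (well_id, date_text, depth_text) water-level rows, accepts a few assumed date formats, drops duplicate well/date pairs, and routes unparseable rows to a rejected list with a reason for curation review. The formats, field layout, and function names are illustrative, not those of any actual USGS or state source.

```python
from datetime import date, datetime
from typing import List, Optional, Tuple

# Hypothetical input date formats; real Central Valley sources may differ.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known format in turn; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_well_levels(raw_rows: List[Tuple[str, str, str]]):
    """Return (clean, rejected) lists from raw (well_id, date, depth) text rows."""
    clean, rejected, seen = [], [], set()
    for well_id, date_text, depth_text in raw_rows:
        well_id = well_id.strip().upper()
        day = parse_date(date_text)
        if day is None:
            rejected.append((well_id, date_text, depth_text, "unparseable date"))
            continue
        try:
            depth = float(depth_text)
        except ValueError:
            rejected.append((well_id, date_text, depth_text, "non-numeric depth"))
            continue
        if (well_id, day) in seen:
            rejected.append((well_id, date_text, depth_text, "duplicate"))
            continue
        seen.add((well_id, day))
        clean.append((well_id, day, depth))
    return clean, rejected


raw = [
    ("w-001", "2005-06-03", "42.1"),
    ("W-001 ", "06/03/2005", "42.1"),   # same well and day in another format
    ("W-002", "2005-13-01", "10.0"),    # invalid month
    ("W-003", "03-Jun-2005", "n/a"),    # missing value
]
clean, rejected = clean_well_levels(raw)
print(clean)
print([r[3] for r in rejected])
```

Keeping rejected rows with a reason, instead of discarding them, leaves an audit trail that curators can review and fix at the source.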
Prototype Central Valley Portal Adoption


•Follows the approach described for Carbon-Climate portal development (end-to-end scenario demonstrations and live dogfooding in collaboration with BWC scientists);
•Use the portal to link a subset of the data to a modest numerical model and attempt a specific scientific investigation using the data (Kesterson salt balance), which:
  - Demonstrates the value of the portal;
  - Demonstrates the value of the dataset;
•Refine and augment based on feedback. Potential augmentations are similar to those for the Carbon-Climate portal, although the other data sources, or the data within those sources, will differ;
•Locate a long-term home for further development and use of the portal. BWC will host the prototype portal for demonstrations and dogfooding.
Central Valley Data Vision
Figure: Building Water Cyberinfrastructure to Connect Data, Resources, and People. Components: distributed Central Valley data sets; a BWC Data Gateway (data harvesting and transformations); a BWC Analysis Gateway (data cleaning, models, and analysis tools; computational resources); and the BWC Water Portal (knowledge discovery, hypothesis testing, water synthesis, dissemination and archiving).
Example research and policy questions for the Central Valley Portal
•Short term: Is salt leaking from the scattered farm irrigation runoff ponds in the Kesterson Valley? If so, when will that become a significant water quality concern?
•Long term: What is the long-term impact of groundwater constituents, such as fertilizers and emerging contaminants, on the human and economic health of California?
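The short-term question ("when will that become a significant water quality concern?") is the kind a modest numerical model can frame. The sketch below is a deliberately simple illustration, not the BWC model: it assumes a single well-mixed groundwater box of volume V receiving pond seepage at a constant rate Q and salt concentration C_seep, with outflow equal to inflow, so that V dC/dt = Q (C_seep - C). It steps that equation forward with explicit Euler, checks the result against the closed-form solution C(t) = C_seep + (C0 - C_seep) exp(-Q t / V), and reports when a threshold concentration is first reached. All function names and parameter values are hypothetical; the model ignores dispersion, reactions, and spatial structure.

```python
import math
from typing import Optional


def years_to_threshold(volume_m3: float, seepage_m3_per_yr: float, c_seep: float,
                       c0: float, c_threshold: float,
                       dt_yr: float = 0.01, max_years: float = 500.0) -> Optional[float]:
    """Explicit-Euler time (years) for C to reach c_threshold, or None if never."""
    rate = seepage_m3_per_yr / volume_m3      # flushing rate Q/V, in 1/yr
    c, t = c0, 0.0
    while t < max_years:
        if c >= c_threshold:
            return t
        c += dt_yr * rate * (c_seep - c)      # V dC/dt = Q (C_seep - C)
        t += dt_yr
    return None


def years_to_threshold_exact(volume_m3: float, seepage_m3_per_yr: float, c_seep: float,
                             c0: float, c_threshold: float) -> Optional[float]:
    """Closed form from C(t) = C_seep + (C0 - C_seep) exp(-Q t / V)."""
    if c0 >= c_threshold:
        return 0.0
    if c_threshold >= c_seep:
        return None                           # approached only asymptotically
    rate = seepage_m3_per_yr / volume_m3
    return -math.log((c_seep - c_threshold) / (c_seep - c0)) / rate


# Hypothetical inputs: 1e7 m^3 box, 5e5 m^3/yr seepage, concentrations in mg/L.
args = (1e7, 5e5, 10_000.0, 500.0, 1_000.0)
t_num = years_to_threshold(*args)
t_exact = years_to_threshold_exact(*args)
print(f"Euler: {t_num:.2f} yr, exact: {t_exact:.2f} yr")
```

In a real investigation, the portal would supply measured seepage rates, pond and well concentrations, and aquifer geometry in place of these made-up inputs.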
Transferability of Central Valley Prototype

•Developing an infrastructure to study Central Valley water is a critical step in fusing science into water management and decision-making processes;
•Because of the importance of the Central Valley in the water community, the infrastructure will serve as a prototype for basins across the world;
•The portal will serve as a springboard for subsequent BWC water research in the Central Valley, such as investigation of the impact of global change on Central Valley productivity.
Proposal Project Parameters

•IT components of the proposed project will be led by Dr. Deb Agarwal (LBNL/UCB) and Dr. Catharine van Ingen (MSFT);
•The BWC will ensure that the prototypes benefit from good scientific input and are distributed through the community;
•We request support for 2 programmers and 1 graduate student per year (~$350k/year) for two years;
•The programmers will start with development of the more straightforward Carbon-Climate portal and then transition to development of the more challenging Central Valley portal;
•One programmer will focus primarily on data loading and web service access, while the other will focus on data cleaning, mining, and visualization tools;
•Biweekly seminars will be held to facilitate exchange between the programmers and the BWC scientists involved in portal development.
Project Timeline
03/06
•Hire programmers, postdocs, and graduate students
•Begin intensive work on the prototype AmeriFlux Portal
•Begin conceptual development of the architecture for the Central Valley Portal and obtain data for curation
09/06
•Complete an early prototype AmeriFlux Portal
•Begin AmeriFlux demonstrations to the carbon flux community
•Begin intensive work on the prototype Central Valley Portal
03/07
09/07
•Refine the prototype AmeriFlux Portal based on user feedback and make it available to researchers for early use
•Begin federating the AmeriFlux Portal with models and with climate and remote sensing datasets
•Begin short- and longer-term model federation demonstrations with the prototype Central Valley Portal
•Refine and augment the prototype Central Valley Portal as needed
03/08
•Transfer the Central Valley and AmeriFlux Portals to the respective scientific communities.
Summary

•The projects will demonstrate what modern commodity tools and commercial data handling practices can bring to water resources investigations and water management;
•Through close interaction between computer scientists and the BWC water specialists and partners, we envision that the data portals developed through this TCI will be immediately beneficial to water science professionals and serve as an example for the broader e-science community;
•We request support of $700k over two years for development of the proposed portals.