The NEON data model, accessed through

2010-2011 NEON Computer Science Clinic
Common Data Services for Ecological Data
NEON
NEON, the National Ecological Observatory Network, is an
NSF-funded project to collect and manage data from across
the U.S. NEON's mission is to enable understanding and
forecasting of the impacts of climate change, land-use
change and invasive species on continental-scale ecology
by providing infrastructure and consistent methodologies to
support research and education in these areas.
Problem Statement
System Design
This project uses JUnit to automate the testing of software
deliverables. A set of unit tests covers the data-access API call,
getData, as well as getConfig, a configuration-access API call.
Both edge cases and common scenarios are considered.
[Figure: system design diagram. Labels, upper tiers: Applications; Ecological studies; Environmental modeling / monitoring; Community; data products]
Testing
The majority of these tests use data from a test database to
verify their correctness. To support these tests, the team has
written and provided scripts to populate the test database.
[Figure: system design diagram. Labels, lower tiers: Data formatting / visualization; Data gap-filling / smoothing / aggregation; Common Data Services; Data: NEON's SQL database, other data files, raw data from towers, scientists, others...]
Deliverables
Thus, the team is delivering to NEON:
NEON's mission is to make its large, varied datastores
available to the scientific community. In addition, NEON
application developers, and the applications themselves, need the ability
to access data systematically and without worrying about
the details of the model by which the data is stored.
NEON organizes the U.S. into 20 ecoclimatic domains.
Data Sources
NEON’s Computational Infrastructure (CI) systems will handle
a wide variety of data from different sources and in different formats:
• The Fundamental Instrument Unit (FIU) monitors physical
and chemical climate properties such as CO2 and moisture.
NEON is designing and deploying observation platforms to
provide such data. Those data will be collected into NEON's
large SQL database at its headquarters in Boulder, CO.
• The Fundamental Sentinel Unit (FSU) collects specimens of
local species and data on biodiversity and populations.
• The Airborne Observation Platform (AOP) analyzes
changing land use, vegetation cover, and species migration.
• The Land Use Analysis Package (LUAP) compiles and
assesses historical data, much of which is in NetCDF format.
Thus, NEON asked the HMC clinic team to design and
prototype a Common Data Services (CDS) software layer.
The CDS provides application developers a consistent,
extensible abstraction through which to access NEON's
data, whether it resides in the database or in files. The team has also
documented and tested its final, deliverable CDS system.
• A Hibernate-based system that traverses NEON's database
and returns Java objects encapsulating data/configurations.
Given a query for data, the Common Data Services (CDS)
layer provides a list of NetCDF files as its primary output
method. If desired, the user can choose to access the
“bare” data, provided as a Java object, instead of the file.
NetCDF files generated by the CDS are lazily created: they
aren’t generated until the user requests them via a function
call. This saves time and disk access in the case that the
calling application isn’t interested in them.
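The lazy creation described above can be sketched roughly as follows. This is only an illustration, not the team's code: the class name `LazyDataResult` and its methods are hypothetical, the values are arbitrary placeholders rather than real measurements, and a plain text file stands in for NetCDF output, which would require a NetCDF library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class LazyFileDemo {
    /** Hypothetical result object: holds the bare data, writes its file only on request. */
    static final class LazyDataResult {
        private final String site;
        private final String product;
        private final List<Double> values;
        private Path file; // stays null until getFile() is first called

        LazyDataResult(String site, String product, List<Double> values) {
            this.site = site;
            this.product = product;
            this.values = Collections.unmodifiableList(new ArrayList<>(values));
        }

        /** The "bare" data, available without any disk access. */
        List<Double> getValues() { return values; }

        synchronized boolean isFileCreated() { return file != null; }

        /** Writes the file on the first call; later calls return the same path. */
        synchronized Path getFile() {
            if (file == null) {
                try {
                    Path tmp = Files.createTempFile("cds-result-", ".txt");
                    tmp.toFile().deleteOnExit();
                    // Assumption: plain text stands in for the NetCDF encoding here.
                    List<String> lines = new ArrayList<>();
                    lines.add("# site: " + site);
                    lines.add("# product: " + product);
                    for (Double v : values) lines.add(String.valueOf(v));
                    Files.write(tmp, lines);
                    file = tmp;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return file;
        }
    }

    public static void main(String[] args) {
        // Placeholder values, not real CO2 measurements.
        LazyDataResult result =
            new LazyDataResult("Mauna Loa, HI", "Raw CO2", Arrays.asList(1.0, 2.0, 3.0));
        System.out.println("file created? " + result.isFileCreated());
        System.out.println("bare values: " + result.getValues());
        Path p = result.getFile();
        System.out.println("file created? " + result.isFileCreated());
        System.out.println("same path on second call? " + p.equals(result.getFile()));
    }
}
```

A caller that only reads `getValues()` never triggers a write, which is the saving in time and disk access that the text describes.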
Database Traversal
NetCDF
Example: getData( "Mauna Loa, HI", "Raw CO2" )
NetCDF is a community-standard scientific data file format
supported by a large number of existing applications and
well-maintained libraries. Because the ecological community
and NEON itself use NetCDF already, we chose NetCDF as
our common intermediate data format. The Common Data
Services layer outputs the results of a query in NetCDF form,
whether the data are stored in a flat file or NEON's database.
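One way to picture "same output regardless of storage" is a small backend abstraction. The sketch below is an assumption-laden illustration: the names `Backend`, `MapBackend`, and `DataServices` are hypothetical, in-memory maps stand in for both the flat files and the database, "Site B" is an invented site name, and a plain list of values stands in for the NetCDF result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommonInterfaceSketch {
    /** Hypothetical storage abstraction: one implementation per kind of store. */
    interface Backend {
        /** Returns the values for a site/product pair, or null if this store has none. */
        List<Double> read(String site, String product);
    }

    /** In-memory stand-in for a real store (database or flat files). */
    static final class MapBackend implements Backend {
        private final Map<String, List<Double>> store = new HashMap<>();

        MapBackend put(String site, String product, Double... values) {
            store.put(site + "|" + product, Arrays.asList(values));
            return this;
        }

        @Override
        public List<Double> read(String site, String product) {
            return store.get(site + "|" + product);
        }
    }

    /** The caller sees one method and one result type, whatever the store. */
    static final class DataServices {
        private final List<Backend> backends;

        DataServices(Backend... backends) {
            this.backends = Arrays.asList(backends);
        }

        List<Double> getData(String site, String product) {
            for (Backend b : backends) {
                List<Double> values = b.read(site, product);
                if (values != null) return new ArrayList<>(values);
            }
            return new ArrayList<>(); // nothing found in any store
        }
    }

    public static void main(String[] args) {
        // Placeholder values, not real measurements.
        Backend database = new MapBackend().put("Mauna Loa, HI", "Raw CO2", 1.0, 2.0);
        Backend files = new MapBackend().put("Site B", "Moisture", 3.0);
        DataServices cds = new DataServices(database, files);
        System.out.println(cds.getData("Mauna Loa, HI", "Raw CO2")); // prints [1.0, 2.0]
        System.out.println(cds.getData("Site B", "Moisture"));       // prints [3.0]
    }
}
```

The calling code never learns which store answered the query; that is the property the common intermediate format is meant to provide.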
Hibernate
Hibernate is a Java-based object/relational mapping library
that allows developers to treat database tables as objects.
Thus, Hibernate simplifies the process of reading databases:
instead of constructing complex SQL queries, the CDS
traverses a collection of interconnected Java objects.
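To illustrate what "traversing interconnected Java objects" looks like, here is a minimal sketch in plain Java. The classes `Site`, `Sensor`, and `Measurement` are hypothetical and do not reflect NEON's actual data model; with Hibernate they would be mapped entities whose collections are loaded from the database, whereas here they are simply built in memory with placeholder values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TraversalSketch {
    // Hypothetical entity classes; a real mapping would tie each to a database table.
    static final class Measurement {
        final String product;
        final double value;
        Measurement(String product, double value) {
            this.product = product;
            this.value = value;
        }
    }

    static final class Sensor {
        final String name;
        final List<Measurement> measurements;
        Sensor(String name, List<Measurement> measurements) {
            this.name = name;
            this.measurements = measurements;
        }
    }

    static final class Site {
        final String name;
        final List<Sensor> sensors;
        Site(String name, List<Sensor> sensors) {
            this.name = name;
            this.sensors = sensors;
        }
    }

    /** Walks site -> sensors -> measurements instead of issuing a SQL join. */
    static List<Double> valuesFor(List<Site> sites, String siteName, String product) {
        List<Double> out = new ArrayList<>();
        for (Site site : sites) {
            if (!site.name.equals(siteName)) continue;
            for (Sensor sensor : site.sensors) {
                for (Measurement m : sensor.measurements) {
                    if (m.product.equals(product)) out.add(m.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder values, not real measurements.
        Site site = new Site("Mauna Loa, HI", Arrays.asList(
            new Sensor("sensor-A", Arrays.asList(
                new Measurement("Raw CO2", 1.0),
                new Measurement("Moisture", 2.0))),
            new Sensor("sensor-B", Arrays.asList(
                new Measurement("Raw CO2", 3.0)))));
        List<Site> sites = Collections.singletonList(site);
        System.out.println(valuesFor(sites, "Mauna Loa, HI", "Raw CO2")); // prints [1.0, 3.0]
    }
}
```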
A NEON observation platform for collecting FIU data.
The NEON data model, accessed through Hibernate
In addition, we use Hibernate Spatial and the JTS
Topology Suite in order to access and manipulate geometric
objects from the database. These libraries also offer a large
set of transformations, such as polygon intersections and
area computations, on arbitrary geometric objects.
• A netCDFHandler class that can write Java DataResult
objects into NetCDF files. File creation is done lazily.
• An overall CDS layer that thus offers NEON's application
developers a consistent interface -- NetCDF files -- whether
the data is stored in NetCDF or in NEON's database. If
desired, developers can access the underlying Java objects.
• The clinic team has also created a JavaScript/PHP interface
to highlight an example of the type of application CDS
might support. It uses historic CO2 data from Mauna Loa, Hawaii.
• The provided CDS system is necessarily a prototype,
because NEON and its data handling policies are evolving
during this early phase of its deployment. The clinic’s CDS
system offers a flexible foundation layer to which NEON and
other developers will add additional capabilities in order to
monitor, maintain, and investigate the nation’s largest store of
ecological data.
Acknowledgments
Team Members
Jason Garrett-Glaser '11
Keith Ingram '11 (PM)
Alejandro Lopez-Lago '11
HamsterBob Stewart '11
NEON Liaisons
Robert Tawa
DJ Spiess
Faculty Advisor
Zachary Dodds