Integrating Physical and Biological Oceanographic Data


Toward a distributed information system
for marine biology and limnology
(aka PAKT project)
Presenting: Karen Stocks, Amarnath Gupta, Chris Condit
Peter Arzberger (PI), Paul Brewin, Li Chen, Heasoo Hwang, Yannis
Papakonstantinou, Xufei Qian, Simone Santini, Reza Wahadj, Ilya
Zaslavsky
+ Rutgers University, University of Auckland, U. Wisconsin
Funding from the Gordon and Betty Moore Foundation
The Big Challenge:
Integrating distributed and heterogeneous
data resources to advance marine ecology
and limnology
Opening the “Data Closet”
OBIS
Seamounts
Lakes Testbed
CalCOFI
Marine Testbed
Information Technology Development
Seamounts
(undersea mountains)
Seamounts are
- biologically unique
- heavily fished habitats
SeamountsOnline: Centralized relational database
Seamount Science Example
Can seamount diversity be predicted from
seamount depth, distance from continental
margin, geological age, surface productivity,
etc.? Does endemism follow the predictions of
Island Biogeography Theory?
Seamount Challenges
Combine multiple, distributed datatypes:
• relational species distributions data in
SeamountsOnline (seamounts.sdsc.edu)
• bathymetry data and seamount morphology data
in the Seamount Catalog (earthref.org)
• raster physical data from World Ocean Atlas,
satellite imagery, etc.
Users
Research: CenSeam
– Data Analysis Working Group
– Expedition Planning
Management
– United Nations: IUCN-sponsored workshop on deepwater
corals on seamounts
– International Seabed Authority workshop
Seamount Research Coordination Network, NSF
OBIS: Ocean Biogeographic Information System (www.iobis.org)
OBIS
• The Ocean Biogeographic Information System is
an international federation of 50+ distributed
data providers (7 mil data records) sharing
species distribution data
• OBIS has a well established community
(secretariat funding, 10 regional node centers,
etc.) but limited resources to build infrastructure
• The current DiGIR client-server system allows
~70 fields of data to be transferred (an extended
Darwin Core) (www.iobis.org)
OBIS Science Examples
• Evaluating biogeographic provinces with
real data
• Predicting the spread of invasive species
• Identifying diversity hotspots/siting marine
protected areas
• Evaluating our state of knowledge
OBIS Challenges
• integrate OBIS biological data with emerging
physical data resources
• hierarchical data
• allow habitat-specific data exploration
• extend query functionality (e.g. to complex
spatial queries)
• capture more data when registering new data
providers/serve specific communities better
Integrate OBIS biological data with emerging physical data resources
CalCOFI
- CalCOFI (the California Cooperative Ocean
Fisheries Investigations) is a 50+ year long
monitoring study off of Southern California
- 4 times per year a regular grid of stations is
sampled for larval fish, zooplankton, and
physical ocean parameters
CalCOFI Science Examples
• Determining scales of variability in
biological components in space and time
• Correlating fluctuations in larval fish
abundance with physical parameters over
time.
• Developing ecosystem models for habitat-based management
Technical Challenges
• Multiple data types: relational, hierarchical,
raster, point, voxel, etc.
• Geospatial data operations
• Ontologies
• Higher knowledge sources
Integrating Physical and
Biological Oceanographic Data
The Information Systems Viewpoint
What are we integrating and why?
• The Science Goals
– Explain biodiversity
• Of a species
• Of any taxonomic grouping of species
• Around a habitat
• By correlating distribution of a taxonomic group with the
spatial (temporal) distribution of physical phenomena
• By creating groupings of physical and biological parameters
that correlate with the distribution and abundance of species
– Perhaps for specific habitats
– Create predictive models
• Given physical parameters or habitat characteristics, predict
species distribution and abundance
• Given species distribution, predict physical parameters
• …
A Conceptual Framework for a Global Biodiversity Schema
[Figure: schema diagram. Recoverable node labels: Studies; Samples; Collection Method; Collection System; Collection Target; Loc. Classes; Time/Frequency; Organism Classes; Observations; Location; Generic Locational Reference Of Organisms; Organism Properties; Referred Object (Point-in-space, Surface-in-space, Spatial-Volume: solid, annular); Organism-Class Existence; Organism-Class Abundance; Organism-Class Rel. Abundance; Individual Organism Contributions; Environ. Ontology-1 … Ontology-k; Environmental Parameters; Generic Environ. Reference Of Organisms; Environmental Region Properties. Recoverable edge labels: collected-for; taken-from; collected-from; observed-at; occur-at; associated-with; Partial-mapping; Intra-class-relationships (parameterized); spatial relationships; enviro-location relationships.]
A Conceptual Framework for a Global
Physical Oceanography Schema
[Figure: schema diagram. Recoverable labels: Measurement (data/function); collection metadata; resolution; spatial collection pattern (point, surface, volume); time/frequency; parameters; coverage (dense, sparse); value (scalar, vector, prob.); Referred Object (Point-in-space, Surface-in-space, Spatial-Volume: solid, annular); Phenomena (name, properties, view-definition).]
What are we integrating and why?
• Data elements
– The central elements
• Distribution of biological and physical variables
– Point distributions
– Field distributions
– Object-bound distributions
• Grouping of biological and physical variables
– Hierarchical groupings
– Hypergraph groupings
– Additional elements
• Geographic boundaries
• Details of observations
• Details of habitats and objects therein
• …
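The two kinds of grouping named above can be sketched as data structures. This is only an illustration under stated assumptions: the slides do not define the groupings formally, so the reading here (a hierarchy gives each member one parent, a hypergraph lets a group join any number of variables and lets a variable sit in several groups) and every name in the code are hypothetical.

```python
# Hierarchical grouping: each member has exactly one parent, as in a taxonomic tree.
# All names are placeholders, not real taxa or real parameters.
parent = {
    "species_a": "genus_x",
    "species_b": "genus_x",
    "genus_x": "family_y",
}

def ancestors(node):
    """Walk up the hierarchy and return the chain of enclosing groups."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

# Hypergraph grouping: each hyperedge joins any number of biological and
# physical variables, and one variable may belong to several hyperedges.
hyperedges = {
    "group_1": frozenset({"species_a", "temperature", "depth"}),
    "group_2": frozenset({"species_b", "temperature", "chlorophyll"}),
}

def groups_containing(variable):
    """Return the names of all hyperedges that include the variable."""
    return sorted(name for name, members in hyperedges.items() if variable in members)

lineage = ancestors("species_a")
shared = groups_containing("temperature")
```

Here `lineage` is the single chain of groups above one species, while `shared` shows one physical variable taking part in two groupings at once, which a strict hierarchy cannot express.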
Point, Field & Object-bound
Distributions
• Distributions
– Point distributions are sparse
• Continuous distributions
– Field distributions are dense
• Often discrete
– Object-bound distributions are sparse
• Around objects
• Associated with other object-related properties
• Modeling field distributions as arrays
– Can be modeled using nested-relational
calculus (algebra) + indices + counting (Libkin
95)
• Special access functions can be useful (Marathe
98)
– Non-uniform field (NUF) distributions: aligned arrays with nulls
• NRC + indices + counting + list operations
• Dimension transformation + interpolation
– Containment vs. overlap semantics
We have yet to show
the relationship
between Map Algebra
and Array Algebra
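A minimal sketch of a non-uniform field with nulls and of the difference between containment and overlap semantics when selecting cells. The interpretation is an assumption, since the slides do not define the two semantics: here "containment" keeps only cells lying entirely inside the query interval, and "overlap" keeps any cell that intersects it. The depth profile and its values are made up for illustration.

```python
from typing import List, Optional, Tuple

# One cell of a 1-D non-uniform field: (lower edge, upper edge, value or None for null).
Cell = Tuple[float, float, Optional[float]]

def select_cells(cells: List[Cell], lo: float, hi: float, mode: str) -> List[Cell]:
    """Select cells related to the query interval [lo, hi] under the given semantics."""
    if mode == "containment":
        # Cell must lie entirely inside the query interval.
        return [c for c in cells if lo <= c[0] and c[1] <= hi]
    if mode == "overlap":
        # Cell only needs to intersect the query interval.
        return [c for c in cells if c[0] < hi and c[1] > lo]
    raise ValueError(f"unknown mode: {mode}")

def mean_ignoring_nulls(cells: List[Cell]) -> Optional[float]:
    """Average the non-null cell values; return None if every cell is null."""
    values = [v for _, _, v in cells if v is not None]
    return sum(values) / len(values) if values else None

# Hypothetical temperature profile on a non-uniform depth axis (metres).
profile: List[Cell] = [(0, 10, 18.0), (10, 30, 16.0), (30, 75, None), (75, 150, 10.0)]

contained = select_cells(profile, 5, 80, "containment")
overlapping = select_cells(profile, 5, 80, "overlap")
mean_contained = mean_ignoring_nulls(contained)
mean_overlapping = mean_ignoring_nulls(overlapping)
```

The same query interval yields different aggregates under the two semantics, which is why the choice has to be made explicit before fields on different grids are combined.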
Integration of Point with NUF
Distribution Data Sources
• Some issues
– Value AT POINT queries
– Neighborhood queries
• Two possible “join” semantics
– “snap” points to array-cells
– “regrid” arrays to point resolution with interpolation
• Planning the joins in a mediator
– Scenario
• A prior sub query selects a set of points P
• Another prior subquery selects a set of array cells by condition C
• Find value of function F for the points at the corresponding cells
– Solutions
• Get P and C-result at the mediator and compute F at the mediator
• Collect the set P at the mediator, call function F on array with condition C for
each element of P
• Send an array indexing function to point source and return indexes, and
perform an indexed selection from array source
– Not implemented yet
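The two "join" semantics can be sketched as follows. This is an illustration, not the project's implementation (the slide itself says the join planning is not implemented yet). It assumes a regular latitude/longitude grid held as a nested list, uses nearest-cell lookup for "snap" and bilinear interpolation for "regrid", and all function names and data values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]  # grid[i][j] is the value at (lats[i], lons[j]); None is null

def snap(lats: List[float], lons: List[float], grid: Grid,
         lat: float, lon: float) -> Optional[float]:
    """'Snap' semantics: take the value of the cell whose centre is nearest the point."""
    i = min(range(len(lats)), key=lambda k: abs(lats[k] - lat))
    j = min(range(len(lons)), key=lambda k: abs(lons[k] - lon))
    return grid[i][j]

def _bracket(axis: List[float], x: float) -> Optional[Tuple[int, float]]:
    """Return (index of lower neighbour, fractional offset), or None if x is off the axis."""
    if not (axis[0] <= x <= axis[-1]):
        return None
    k = min(bisect_right(axis, x) - 1, len(axis) - 2)
    return k, (x - axis[k]) / (axis[k + 1] - axis[k])

def regrid(lats: List[float], lons: List[float], grid: Grid,
           lat: float, lon: float) -> Optional[float]:
    """'Regrid' semantics: bilinear interpolation of the array to the point location."""
    bi, bj = _bracket(lats, lat), _bracket(lons, lon)
    if bi is None or bj is None:
        return None
    (i, u), (j, v) = bi, bj
    c00, c01, c10, c11 = grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1]
    if None in (c00, c01, c10, c11):
        return None  # a null corner makes the interpolated value undefined
    return (1 - u) * (1 - v) * c00 + (1 - u) * v * c01 + u * (1 - v) * c10 + u * v * c11

# Made-up field: value = 10 + 2 * (lat - 30) + (lon + 120), with one null cell.
lats = [30.0, 31.0, 32.0]
lons = [-120.0, -119.0, -118.0]
grid: Grid = [[10.0, 11.0, 12.0], [12.0, 13.0, 14.0], [14.0, 15.0, None]]

points = [(30.25, -119.25), (31.75, -118.25)]  # the point set P from a prior subquery
joined = [(p, snap(lats, lons, grid, *p), regrid(lats, lons, grid, *p)) for p in points]
```

Snapping is cheap and keeps the array's own values, while regridding returns a value at the point's own resolution but needs all neighbouring cells to be non-null. In a mediator, either function could run at the mediator or be pushed to the array source, which is the planning choice the scenario above describes.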
The General Integration Problem
• Sources need to export different data models
– Different algebras
– Semantics of structures
– Semantics of values
– Constraints among values and domains
• How do we register this information?
• What combined algebra does the mediator support?
• How do we control addition of newer sources?
• How does this work in the GAV or GLAV integration
framework?
• How do we include type and structure transformations,
and domain-specific value-association as part of the
mediation process?
The Current Integration Framework
• Some Decisions
– All data are “relationalized”
– Algebraic operations are implemented on top of relational
sources as functions
– Functions are modeled in the BIRN mediator as relations with
binding patterns
– Popular native formats like OpenDAP are semantically too
heterogeneous and have poor query capabilities
• Value-based queries are disallowed
• We need to augment the registration mechanism to (semi-automatically) ingest all metadata
• We will ingest the data and store it relationally in a network-accessible relational system
– Will consider the problems of adding vector-data and unaligned
array data as a next step
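The decisions above can be illustrated with a small sketch: a gridded field stored as ordinary tuples, and an array operation exposed as a relation with a binding pattern. This does not use the BIRN mediator or its API. The table names, the `temperature_at` function, and the data are all hypothetical, and the in-memory SQLite database merely stands in for a network-accessible relational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A "relationalized" grid: one tuple per array cell.
    CREATE TABLE grid_cell (lat REAL, lon REAL, temperature REAL);
    -- Point-based biological samples.
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, taxon TEXT, lat REAL, lon REAL);
""")
conn.executemany("INSERT INTO grid_cell VALUES (?, ?, ?)", [
    (30.0, -120.0, 15.0), (30.0, -119.0, 15.5),
    (31.0, -120.0, 14.0), (31.0, -119.0, 14.5),
])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    (1, "taxon A", 30.2, -119.9),
    (2, "taxon B", 30.9, -119.1),
])

def temperature_at(lat, lon):
    """Relation temperature_at(lat, lon, temperature) with binding pattern (bound, bound, free).

    lat and lon must be supplied by the caller; the function returns the matching
    tuples, here using the nearest grid cell as the lookup rule.
    """
    if lat is None or lon is None:
        raise ValueError("lat and lon must be bound before temperature_at can be called")
    row = conn.execute(
        "SELECT temperature FROM grid_cell "
        "ORDER BY (lat - ?) * (lat - ?) + (lon - ?) * (lon - ?) LIMIT 1",
        (lat, lat, lon, lon),
    ).fetchone()
    return [] if row is None else [(lat, lon, row[0])]

def samples_with_temperature():
    """Join samples with the function-relation: bind (lat, lon) from each sample first."""
    rows = conn.execute(
        "SELECT sample_id, taxon, lat, lon FROM sample ORDER BY sample_id"
    ).fetchall()
    out = []
    for sample_id, taxon, lat, lon in rows:
        for _, _, temperature in temperature_at(lat, lon):
            out.append((sample_id, taxon, temperature))
    return out

result = samples_with_temperature()
```

The binding pattern fixes the join order: the sample relation has to be evaluated first so that its coordinates can bind the function's inputs, which is the kind of constraint a mediator's planner has to respect.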
The Demonstration
• The global schema
The marked tables are
augmented with physical
parameters from the World
Ocean Atlas – over two
different grids
Technology Overview
• Microsoft ASP.NET
• Asynchronous JavaScript and XML (AJAX)
• Google Maps
Google Maps
• Pros
– Intuitive U.I.
– Bathymetry
– Simple JavaScript API
– Speed
– Cost
• Cons
– Google dependent
– Data volume limitation
• Alternatives Under Consideration
– ESRI ArcGIS Server
– 3D Client (ArcGlobe, GoogleEarth, WorldWind)
– Some combination
Data Sources
• SeamountsOnline
– Biological Oceanography Information
• World Ocean Atlas
– Physical Oceanography Information
• Biological and Physical Combination
Next Steps
• Interface Refinement
• Apply learning to OBIS
• Questions?
Contact Information
• Amarnath Gupta ([email protected])
• Karen Stocks ([email protected])
• Chris Condit ([email protected])