E2E presentation

End-to-End Data Services
A Few Personal Thoughts
Unidata Staff Meeting
2 September 2009
Vision
“Unidata’s vision calls for providing comprehensive, well-integrated and end-to-end data services for the
geosciences. These include an array of functions for
collecting, finding, and accessing data; data management
tools for generating, cataloging, and exchanging metadata;
and submitting or publishing, sharing, analyzing,
visualizing, and integrating data.”
What does this vision statement mean to each of us?
Background
• Providing real-time weather data (and related
tools) was the primary reason why Unidata was
created and it has been the bread and butter of
Unidata’s mission for more than two decades.
• But as our work has evolved, along with our
community, it has become clear that simply
providing real-time data or facilitating data
access is not enough.
• Hence the vision statement.
• Let’s think about how some of the capabilities
available on Amazon/eBay/YouTube/Flickr can
be facilitated for geosciences “data”.
Objectives
1. Create “integrated” data services across all stages of
data life cycle, beginning with observations and ending
with data curation/archiving.
• A) Observations/Sensors → Ingest → Data collection
systems → Data providers → Disseminate → Users (both
end users and data archival systems)
• B) From beginning till end of a workflow (LEAD example):
• Observations → Ingest → Analysis/Assimilation →
Prediction → Output → Dissemination → Users (both end
users and data repositories)
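One way to picture workflow B is as a chain of independent stages, each handing its result to the next. The sketch below is only an illustration under that assumption: the stage names come from the slide, while `make_stage`, `run_pipeline`, and the dictionary record are hypothetical placeholders, not an existing Unidata or LEAD interface.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]

# Stage names taken from workflow B; the stages themselves are placeholders.
WORKFLOW_B = ["ingest", "analysis/assimilation", "prediction", "output", "dissemination"]


def make_stage(name: str) -> Stage:
    """Return a hypothetical stage that only records that it ran."""
    def stage(record: Record) -> Record:
        history = list(record.get("history", []))
        history.append(name)
        # Return a new record so each stage stays independent of the others.
        return {**record, "history": history}
    return stage


def run_pipeline(stages: List[Stage], record: Record) -> Record:
    """Apply the stages in order, feeding each result into the next stage."""
    return reduce(lambda rec, stage: stage(rec), stages, record)


result = run_pipeline([make_stage(n) for n in WORKFLOW_B], {"source": "observations"})
print(" -> ".join(result["history"]))
```

Because every stage has the same signature, a stage can be swapped, dropped, or inserted without touching the rest of the chain.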
Imperatives
• Integrated services does not imply a monolithic
system, but a set of modular services that are
configurable, flexible, extensible, and scalable.
• We need to think about what [essential] services are needed
by our users and the use cases.
– Users include students, faculty, scientists, data
providers, outreach providers, field project personnel
– Use cases include class room & lab use, research
studies, weather websites, field projects, projects like
LEAD, portals, and data centers
– Both programmatic and interactive invocation
• We may not work on all of the functionalities
ourselves but we need to facilitate as many of them
as possible.
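A minimal sketch of what "modular services" with both programmatic and interactive invocation might look like, assuming a simple name-based registry. Everything here is hypothetical: `DATASETS`, `register`, `invoke`, and the two toy services are illustrative names, not part of any Unidata software.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory holdings standing in for real data servers.
DATASETS: Dict[str, List[float]] = {
    "surface-obs": [281.2, 282.0, 283.4],
    "model-output": [279.9, 280.5],
}

# Registry mapping a service name to the function that implements it.
SERVICES: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under the given name."""
    def decorator(func):
        SERVICES[name] = func
        return func
    return decorator


@register("discovery")
def list_datasets() -> List[str]:
    """Toy discovery service: list the known dataset identifiers."""
    return sorted(DATASETS)


@register("access")
def get_dataset(dataset_id: str) -> List[float]:
    """Toy access service: return a copy of one dataset's values."""
    return list(DATASETS[dataset_id])


def invoke(name: str, *args, **kwargs):
    """Look up a service by name, as an interactive front end might."""
    if name not in SERVICES:
        raise KeyError(f"unknown service: {name}")
    return SERVICES[name](*args, **kwargs)


print(list_datasets())                  # programmatic: call the function directly
print(invoke("access", "surface-obs"))  # interactive style: look up by name
```

New services can be added by registering another function, which is one way to stay extensible without building a monolithic system.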
Strategies and Tactics
• Integration achieved via both loosely and
tightly coupled components and services
• Incrementalism is the only practical option
for a program like Unidata where many
technologies already exist and resources
are scarce.
• Leverage as much as possible both our
own technologies and what is
available from the outside.
What do I mean by Data?
• Scientific data (binary, ASCII, netCDF,
HDF, XML, GRIB, …)
• Metadata (ASCII, XML, etc.)
• Data in databases (e.g., SQL)
• GIS data (Shapefiles, KML, etc.)
• Derived products from scientific data
• Ancillary data objects (images, videos,
documents – PDF, Word, HTML, PPT, etc.)
Integration Capabilities
• Different data types (feeds, obs., platforms,
model output, and GIS information)
• Different data formats
• Data on different projections
• Distributed data holdings
• Data operation (e.g., GDS, netCDF operators)
• Metadata addition
• Integration of scientific data with metadata
content, documents, and other information
Not develop ourselves but
perhaps provide hooks to
• Collaboration tools
• Wikis
• Forums
• Blogs
• Chat and IM/SMS
• Social network apps (Facebook, Twitter,
etc.)
• RSS, email and other notifications
A list of possible data services
• Data collection service (routine ingest via LDM, FTP, etc.)
• Data submission service
• Metadata service for submitting, editing, and exchanging metadata
• Cataloging service
• Data discovery service
• Monitoring and notification service for new data, metadata, and products
• Data access services
• Data delivery/transport services, including copying/moving data to other
servers and personal space and streaming data on demand
• Security and authentication services
• Subsetting service, including capability for progressive disclosure
• Aggregation services for data and metadata
• Services for CF conformance checking
• Decoding and data translation services
• Unit conversion services
• Visualization and product generation services
• Data fusion and data manipulation services (e.g., netCDF operator services)
• GIS services
• Output handling services
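To make a few of the listed services concrete, here is a small sketch that chains aggregation, subsetting, and unit conversion on toy data. It assumes samples are plain `(hour, kelvin)` tuples; the function names and the data layout are hypothetical and do not describe any existing Unidata service.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (hour of day, temperature in kelvin); hypothetical layout


def aggregate(*chunks: List[Sample]) -> List[Sample]:
    """Aggregation: merge several chunks into one series ordered by hour."""
    merged = [sample for chunk in chunks for sample in chunk]
    return sorted(merged, key=lambda sample: sample[0])


def subset(samples: List[Sample], start_hour: int, stop_hour: int) -> List[Sample]:
    """Subsetting: keep samples with start_hour <= hour < stop_hour."""
    return [s for s in samples if start_hour <= s[0] < stop_hour]


def to_celsius(samples: List[Sample]) -> List[Sample]:
    """Unit conversion: kelvin to degrees Celsius, rounded to two decimals."""
    return [(hour, round(kelvin - 273.15, 2)) for hour, kelvin in samples]


morning = [(6, 283.15), (9, 285.15)]
afternoon = [(12, 288.15), (15, 287.15)]

# Chunks may arrive out of order; aggregation restores the time ordering.
result = to_celsius(subset(aggregate(afternoon, morning), 9, 15))
print(result)
```

Each function does one job, so the same pieces could be offered separately or combined into a single request.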
IMO, Beyond Unidata’s Scope
• Data mining
• Ontologies
• Brokering and workflow orchestration
• Federation and mediation
• Provenance
• Curation and stewardship
Final Thoughts
• It is important that we develop consensus on
what we mean by integrated, end-to-end data
services. Therefore, we need to hear your
thoughts.
• Once we have an idea of what it is that we want
to build, we need to agree on how to go about
building it.
• Again, I believe in an incremental approach, but
there may be other ideas. With RAMADDA,
THREDDS, netCDF, and LDM, many of the
pieces already exist upon which to build E2E
data services.
• We need to identify the next steps and a
concrete project in this potentially long journey.
Develop a pilot effort? A prototype?