The VIRTUAL SOLAR-TERRESTRIAL OBSERVATORY


Shifting the Burden from the User
to the Data Provider
Peter Fox
High Altitude Observatory,
NCAR (***)
With thanks to eGY and various NSF, DoE and
NASA projects
Outline
• Background, definitions
• Informatics -> e-Science
• Data has lots of uses
– Virtual Observatories: use cases
– Data Framework: Examples
– Data ingest, integration, mining and …
• Discussion
Fox HDF: Semantic Data Burden Shift Oct 15, 2008
Background
Scientists should be able to access a global, distributed
knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple instruments, using
various protocols, in differing vocabularies, using
(sometimes unstated) assumptions, with
inconsistent (or non-existent) meta-data. It may be
inconsistent, incomplete, evolving, and distributed
And… there exist(ed) significant levels of semantic
heterogeneity, large-scale data, complex data
types, legacy systems, inflexible and unsustainable
implementation technology…
Information products have Lots of Audiences
But data has Lots of Audiences
[Figure: audiences ranked from More Strategic to Less Strategic; SCIENTISTS TOO]
From “Why EPO?”, a NASA internal
report on science education, 2005
The Information Era: Interoperability
Modern information and communications
technologies are creating an
“interoperable” information era in which
ready access to data and information can
be truly universal. Open access to data
and services enables us to meet the new
challenges of understanding the Earth and
its space environment as a complex
system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.
Shifting the Burden from the User
to the Provider
Modern capabilities
Mind the Gap!
As a result of finding out who is doing what, sharing
experience/expertise, and substantial coordination:
• There is/was still a gap between science and the
underlying infrastructure and technology that is
available
• Cyberinfrastructure is the new research
environment(s) that support advanced data
acquisition, data storage, data management, data
integration, data mining, data visualization and other
computing and information processing services
over the Internet.
• Informatics - information science includes the
science of (data and) information, the practice
of information processing, and the engineering
of information systems. Informatics studies the
structure, behavior, and interactions of natural
and artificial systems that store, process and
communicate (data and) information. It also
develops its own conceptual and theoretical
foundations. Since computers, individuals and
organizations all process information,
informatics has computational, cognitive and
social aspects, including study of the social
impact of information technologies. Wikipedia.
Progression after progression
[Diagram: layers of Informatics]
• IT Cyberinfrastructure
• Cyber Informatics
• Core Informatics
• Science Informatics, aka Xinformatics
• Science, SBAs
Virtual Observatories
• Conceptual examples:
• In-situ: Virtual measurements
– Related measurements
• Remote sensing: Virtual, integrative
measurements
– Data integration
• Managing virtual data products/ sets
Virtual Observatories
Make data and tools quickly and easily accessible
to a wide audience.
Operationally, virtual observatories need to find the
right balance of data/model holdings, portals and
client software that researchers can use without
effort or interference as if all the materials were
available on his/her local computer using the
user’s preferred language: i.e. appear to be
local and integrated
Likely to provide controlled vocabularies that may
be used for interoperation in appropriate
domains along with database interfaces for
access and storage and “smart” tools for
evolution and maintenance.
Early days of discipline specific VOs
[Diagram: VO1, VO2 and VO3 above databases DB1, DB2, DB3, …, DBn, with a "?" between the VOs]
The Astronomy approach; datatypes as a service
[Diagram: VO App1, VO App2 and VO App3 reach DB1, DB2, DB3, …, DBn through a VO layer]
• VO layer: VOTable, Simple Image Access Protocol, Simple
Spectrum Access Protocol, Simple Time Access Protocol
• Lightweight semantics: limited meaning, hard coded
• Limited interoperability
• Limited extensibility
• Under review
• Open Geospatial Consortium: Web {Feature, Coverage, Mapping}
Service and Sensor Web Enablement: Sensor {Observation, Planning,
Analysis} Service use the same approach
Added value: education, clearinghouses, other services,
disciplines, etc.
Semantic mediation layer - mid-upper-level
[Diagram: VO Portal, Web Serv. and VO API sit on top of the layers below]
Semantic interoperability
Added value: semantic query, hypothesis and inference
Mediation Layer
• Ontology - capturing concepts of Parameters,
Instruments, Date/Time, Data Product (and
associated classes, properties) and Service
Classes
• Maps queries to underlying data
• Generates access requests for metadata,
data
• Allows queries, reasoning, analysis, new
hypothesis generation, testing, explanation,
etc.
Semantic mediation layer - VSTO - low level
Metadata, schema, data
DB1, DB2, DB3, …, DBn
Query, access and use of data
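The mapping role of the mediation layer can be illustrated with a minimal Python sketch. It is not the VSTO implementation: the concept names, database names and native field names are hypothetical, and a plain dictionary stands in for the ontology. The point is only that the user asks for a concept and the layer generates one access request per underlying database.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ontology fragment: parameter concept -> (database, native field).
# None of these names come from VSTO; they only illustrate the mapping.
ONTOLOGY = {
    "NeutralTemperature": [("DB1", "tn"), ("DB2", "T_neutral")],
    "ElectronDensity": [("DB1", "ne")],
}


@dataclass(frozen=True)
class AccessRequest:
    """One request against one underlying database, in its own vocabulary."""
    database: str
    field: str
    start: str
    end: str


def mediate(parameter: str, start: str, end: str) -> list[AccessRequest]:
    """Map a concept-level query onto per-database access requests."""
    if parameter not in ONTOLOGY:
        raise KeyError(f"unknown parameter concept: {parameter}")
    return [AccessRequest(db, field, start, end) for db, field in ONTOLOGY[parameter]]


# The user names the concept once; the layer fans it out to both databases.
requests = mediate("NeutralTemperature", "2005-01-26", "2005-01-27")
for request in requests:
    print(request)
```

Keeping the vocabulary differences inside the lookup is what moves the burden from the user to the provider: adding a database means adding entries to the mapping, not teaching users a new field name.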
Content: Coupling Energetics and Dynamics
of Atmospheric Regions WEB
Community data
archive for
observations and
models of
Earth's upper
atmosphere and
geophysical
indices and
parameters
needed to
interpret them.
Includes
browsing
capabilities by
periods, > 310
instruments,
models, > 820
parameters…
Content: Mauna Loa Solar Observatory
Near real-time data products
from Hawaii from
a variety of solar
instruments.
Source for space
weather, solar
variability, and
basic solar
physics
Other content used
too - Center for
Integrated Space
Weather Modeling
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Diagram: iterative development cycle]
Use Case -> Small Team, mixed skills -> Analysis -> Develop
model/ontology -> Use Tools -> Science/Expert Review & Iteration ->
Adopt Technology Approach -> Leverage Technology Infrastructure ->
Rapid Prototype -> Open World: Evolve, Iterate, Redesign, Redeploy
Science and technical use cases
Find data which represents the state of the neutral
atmosphere anywhere above 100km and toward the
arctic circle (above 45N) at any time of high
geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and
integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services
via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve
data, filtered by constraints on Instrument, Date-Time,
and Parameter in any order and with constraints
included in any combination.
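The "any order, any combination" requirement can be sketched in Python with optional keyword constraints. This is a sketch under stated assumptions, not the SOAP service itself: the catalog records, instrument names and parameter names below are invented for illustration.

```python
from datetime import datetime

# Hypothetical catalog records; instrument and parameter names are illustrative only.
CATALOG = [
    {"instrument": "FabryPerot", "parameter": "NeutralTemperature",
     "time": datetime(2005, 1, 26, 21, 0)},
    {"instrument": "FabryPerot", "parameter": "NeutralWind",
     "time": datetime(2005, 1, 27, 3, 0)},
    {"instrument": "IncoherentScatterRadar", "parameter": "ElectronDensity",
     "time": datetime(2005, 1, 26, 22, 0)},
]


def query(catalog, instrument=None, parameter=None, start=None, end=None):
    """Return the records that satisfy every constraint that was supplied.

    Constraints left as None are ignored, so they can be given in any
    order and in any combination.
    """
    results = []
    for record in catalog:
        if instrument is not None and record["instrument"] != instrument:
            continue
        if parameter is not None and record["parameter"] != parameter:
            continue
        if start is not None and record["time"] < start:
            continue
        if end is not None and record["time"] > end:
            continue
        results.append(record)
    return results


by_instrument = query(CATALOG, instrument="FabryPerot")
by_parameter_and_time = query(
    CATALOG, start=datetime(2005, 1, 26), parameter="ElectronDensity"
)
```

Because every constraint is optional, the same entry point serves a user who starts from an instrument, from a parameter or from a date range.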
VSTO - semantics and ontologies in an operational
environment: vsto.hao.ucar.edu, www.vsto.org
Web Service
Fox RPI: Semantic Data Frameworks May 14, 2008
Semantic filtering by
domain or instrument
hierarchy
Partial exposure of
Instrument
class
hierarchy - users
seem to LIKE THIS
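Filtering by a class hierarchy amounts to asking whether an instrument's class has the selected class among its ancestors. The Python sketch below assumes a small hand-written hierarchy; the class and instrument names are hypothetical and do not reproduce the VSTO ontology.

```python
# Hypothetical instrument class hierarchy, stored as child -> parent.
PARENT = {
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
    "Interferometer": "OpticalInstrument",
    "FabryPerotInterferometer": "Interferometer",
    "Photometer": "OpticalInstrument",
    "IncoherentScatterRadar": "Radar",
}

# Hypothetical instruments and the class each one belongs to.
INSTRUMENTS = {
    "instrument_a": "FabryPerotInterferometer",
    "instrument_b": "Photometer",
    "instrument_c": "IncoherentScatterRadar",
}


def is_a(cls, ancestor):
    """Walk up the hierarchy from cls and report whether ancestor is reached."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def filter_by_class(instruments, ancestor):
    """Keep the instruments whose class is the ancestor or one of its subclasses."""
    return [name for name, cls in instruments.items() if is_a(cls, ancestor)]


optical = filter_by_class(INSTRUMENTS, "OpticalInstrument")
print(optical)
```

Selecting a node high in the hierarchy returns everything beneath it, which is why exposing even part of the hierarchy lets users narrow a search without knowing individual instrument names.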
Inferred plot type
and return formats
for data products
Inferred plot type
and return required
axes data
Semantic Web Benefits
• Unified/ abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the
number of selections from eight to three
• Generates only syntactically correct queries: which could not always be
ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a
reasoner, our application has the opportunity to expose only coherent
queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain
codes) to account for numerous different ways to combine and plot the
data whereas now semantic mediation provides the level of sensible data
integration required, now exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis,
transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional
research associates and those from outside the fields)
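The idea of exposing only coherent queries can be sketched as follows: once one selection is made, the remaining choices are narrowed to those that can return data. This Python sketch uses a hand-written set of facts in place of an ontology and reasoner; the instrument and parameter names are hypothetical.

```python
# Hypothetical facts standing in for the ontology: (instrument, parameter it measures).
MEASURES = {
    ("FabryPerot", "NeutralTemperature"),
    ("FabryPerot", "NeutralWind"),
    ("IncoherentScatterRadar", "ElectronDensity"),
    ("IncoherentScatterRadar", "IonTemperature"),
}


def coherent_parameters(instrument=None):
    """Parameters still selectable once an instrument is (optionally) chosen."""
    return sorted({p for i, p in MEASURES if instrument is None or i == instrument})


def coherent_instruments(parameter=None):
    """Instruments still selectable once a parameter is (optionally) chosen."""
    return sorted({i for i, p in MEASURES if parameter is None or p == parameter})


print(coherent_parameters("FabryPerot"))
print(coherent_instruments("ElectronDensity"))
```

A portal built this way never offers an instrument and parameter pair that cannot be satisfied, which is one way the number of selections a user must make can shrink.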
What is a Non-Specialist Use Case?
A teacher accesses the internet, goes
to an educational virtual
observatory and enters a
search for “Aurora”.
Someone should be able to query a virtual observatory
without having specialist knowledge
What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials:
http://www.meted.ucar.edu/topics_spacewx.php and
http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via VSTO, VSPO and
VITMO, knows to search for brightness, or green/red
line emission
3) Did you know?: Aurora is a phenomenon of the
upper terrestrial atmosphere (ionosphere) also
known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora
Australis, etc.
Semantic Information Integration:
Concept map for educational use of
science data in a lesson plan
Issues for Virtual Observatories
• Scaling to large numbers of data providers and
redefining the role(s)/ relations with them
• Crossing discipline boundaries
• Security, access to resources, policies
• Branding and attribution (where did this data come
from and who gets the credit, is it the correct version,
is this an authoritative source?)
• Provenance/derivation (propagating key information
as it passes through a variety of services, copies of
processing algorithms, …)
• Data quality, preservation, stewardship
Problem definition
• Data is coming in faster, in greater volumes and outstripping our
ability to perform adequate quality control
• Data is being used in new ways and we frequently do not have
sufficient information on what happened to the data along the
processing stages to determine if it is suitable for a use we did not
envision
• We often fail to capture, represent and propagate manually
generated information that needs to go with the data flows
• Each time we develop a new instrument, we develop a new data
ingest procedure and collect different metadata and organize it
differently. It is then hard to use with previous projects
• The task of event determination and feature classification is onerous
and we don't do it until after we get the data
Use cases
• Determine which flat field calibration was applied to the image taken on
January 26, 2005 around 2100UT by the ACOS Mark IV polarimeter.
• Which flat-field algorithm was applied to the set of images taken during the
period November 1, 2004 to February 28, 2005?
• How many different data product types can be generated from the ACOS
CHIP instrument?
• What images comprised the flat field calibration image used on January 26,
2007 for all ACOS CHIP images?
• What processing steps were completed to obtain the ACOS PICS limb
image of the day for January 26, 2005?
• Who (person or program) added the comments to the science data file for
the best vignetted, rectangular polarization brightness image from January
26, 2005 1849:09UT taken by the ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric seeing conditions during the
local morning of January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21, 2008, 1900UT missing?
• Why does this image look bad?
Provenance
• Origin or source from which something
comes, intention for use, who/what
generated for, manner of manufacture,
history of subsequent owners, sense of
place and time of manufacture, production
or discovery, documented in detail
sufficient to allow reproducibility
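A provenance record of this kind can be sketched in Python as a history of processing steps attached to each data product. The class names, step names and identifiers below are hypothetical and are not taken from SPCDIS; the sketch only shows how capturing the agent and inputs of each step makes questions such as "which calibration was applied, and by whom?" answerable later.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingStep:
    """One event in a data product's history."""
    name: str
    agent: str  # person or program that performed the step
    inputs: list = field(default_factory=list)


@dataclass
class DataProduct:
    """A data product that carries its own processing history."""
    identifier: str
    history: list = field(default_factory=list)

    def record(self, name, agent, inputs=()):
        """Append a step at the time it happens, not after the fact."""
        self.history.append(ProcessingStep(name, agent, list(inputs)))

    def steps_named(self, name):
        """Return every recorded step with the given name, in order."""
        return [step for step in self.history if step.name == name]


# Hypothetical identifiers, for illustration only.
image = DataProduct("hypothetical_image_001")
image.record("flat_field_calibration", "calibrate.py", inputs=["flat_a", "flat_b"])
image.record("add_comment", "observer")

flat_steps = image.steps_named("flat_field_calibration")
print(flat_steps[0].agent, flat_steps[0].inputs)
```

Recording the step when it happens is the design choice that matters here: the information cannot be reconstructed reliably once the data has passed through several services.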
Visual browse
Discussion (1)
• Taken together, an emerging set of collected
experience manifests an emerging informatics
core capability that is starting to take data
intensive science into a new realm of realizability
and potentially, sustainability
– Use cases (i.e. real users)
– X-informatics
– Core Informatics
– Cyber Informatics
• There are implications for data models
Progression after progression
[Diagram: layers of Informatics]
• IT Cyberinfrastructure
• Cyber Informatics
• Core Informatics
• Science Informatics
• Science, SBAs
Example:
• CI = OPeNDAP server running over HTTP/HTTPS
• Cyberinformatics = Data (product) and service ontologies, triple store
• Core informatics = Reasoning engine (Pellet), OWL
• Science (X) informatics = Use cases, science domain terms, concepts in
an ontology
Discussion (2)
• Data and information science is becoming
the ‘fourth’ column (along with theory,
experiment and computation)
• Semantics (of the data) are a very key
ingredient -> may imply richer data models
Summary
• Informatics is playing a key role in filling the gap
between science (and the spectrum of non-expert)
use and generation and the underlying
cyberinfrastructure, i.e. in shifting the burden
– This is evident due to the emergence of Xinformatics
(world-wide)
• Our experience is implementing informatics as
semantics in Virtual Observatories (as a working
paradigm) and Grid environments
– VSTO is only one example of success
– Data mining, data integration, smart search, provenance
are close behind
• Informatics is a profession and a community activity
and requires efforts in all 3 sub-areas (science, core,
cyber) and must be synergistic
More Information
• Virtual Solar Terrestrial Observatory (VSTO):
http://vsto.hao.ucar.edu, http://www.vsto.org
• Semantically-Enabled Science Data Integration (SESDI):
http://sesdi.hao.ucar.edu
• Semantic Provenance Capture in Data Ingest Systems
(SPCDIS): http://spcdis.hao.ucar.edu
• Semantic Knowledge Integration Framework (SKIF/SAM):
http://skif.hao.ucar.edu
• Semantic Web for Earth and Environmental Terminology
(SWEET): http://sweet.jpl.nasa.gov
• Conferences: AGU 2008, EGU 2009, ISWC 2008, CIKM
2008, …
• Peter Fox [email protected]