Informatics - Marine Metadata Interoperability


Informatics: Filling the gap between
science and ICT in a sustainable
way
Peter Fox
Tetherless World Constellation
Rensselaer Polytechnic Institute
Formerly: High Altitude Observatory, NCAR
1
Background
Scientists should be able to access a global, distributed
knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means (instruments
and models), using various protocols, in differing
vocabularies, using (sometimes unstated)
assumptions, with inconsistent (or non-existent)
meta-data. It may be inconsistent, incomplete,
evolving, and distributed
And… there exist(ed) significant levels of semantic
heterogeneity, large-scale data, complex data
types, legacy systems, inflexible and unsustainable
implementation technology…
Virtual Observatories
Make data and tools quickly and easily accessible
to a wide audience.
Operationally, virtual observatories need to find the
right balance of data/model holdings, portals and
client software that researchers can use without
effort or interference, as if all the materials were
available on their local computer in the
user’s preferred language: i.e. they appear to be
local and integrated.
3
Science and technical use cases
Find data which represents the state of the neutral
atmosphere anywhere above 100km and toward the
arctic circle (above 45N) at any time of high
geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and
integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services
via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve
data, filtered by constraints on Instrument, Date-Time,
and Parameter in any order and with constraints
included in any combination.
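The technical use case above (constraints on Instrument, Date-Time and Parameter, given in any order and in any combination) can be sketched as a filter with optional keyword-only arguments. This is a minimal illustration only: `Record`, `query` and the sample holdings are hypothetical names and made-up data, and the sketch does not represent the observatory's actual SOAP interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """One hypothetical data holding; not a real observatory type."""
    instrument: str
    time: datetime
    parameter: str


def query(
    records: Iterable[Record],
    *,
    instrument: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    parameter: Optional[str] = None,
) -> List[Record]:
    """Return the records that satisfy every constraint that was supplied.

    Constraints are keyword-only, so callers may give them in any order,
    and any constraint left as None is simply not applied.
    """
    matches = []
    for rec in records:
        if instrument is not None and rec.instrument != instrument:
            continue
        if start is not None and rec.time < start:
            continue
        if end is not None and rec.time > end:
            continue
        if parameter is not None and rec.parameter != parameter:
            continue
        matches.append(rec)
    return matches


# Made-up holdings, for illustration only.
HOLDINGS = [
    Record("radar-A", datetime(2005, 1, 1), "neutral temperature"),
    Record("radar-A", datetime(2005, 6, 1), "neutral wind"),
    Record("imager-B", datetime(2005, 6, 2), "neutral temperature"),
]

# Any subset of constraints, in any order:
by_parameter = query(HOLDINGS, parameter="neutral temperature")
by_time_and_instrument = query(
    HOLDINGS, start=datetime(2005, 5, 1), instrument="radar-A"
)
```

Leaving a constraint out is the same as not constraining on it, which is one simple way to accept constraints in any combination without writing a separate entry point for each one.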
4
Fox Informatics and Semantics, © 2008
But data has (information products have) lots of audiences
[Figure: audiences ranged from More Strategic to Less Strategic; SCIENTISTS TOO]
From “Why EPO?”, a NASA internal
report on science education, 2005
5
What is a Non-Specialist Use Case?
Teacher accesses the internet, goes
to an Educational Virtual
Observatory, and enters a
search for “Aurora”.
Someone should be able to query a virtual observatory
without having specialist knowledge.
6
What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials:
http://www.meted.ucar.edu/topics_spacewx.php and
http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via a range of science
VOs, knows to search for brightness, or green/red
line emission
3) Did you know?: Aurora is a phenomenon of the
upper terrestrial atmosphere (ionosphere) also
known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora
Australis, etc.
7
Shifting the Burden from the User
to the Provider
8
Response (so far)
As a result of finding out who is doing what, sharing
experience/ expertise, and substantial coordination:
• There is/ was still a gap between science and the
underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s)
that support advanced data acquisition, data storage, data
management, data integration, data mining, data
visualization and other computing and information
processing services over the Internet.
• Informatics - information science includes the science of
(data and) information, the practice of information
processing, and the engineering of information systems.
Informatics studies the structure, behavior, and
interactions of natural and artificial systems that store,
process and communicate (data and) information. It also
develops its own conceptual and theoretical foundations.
Since computers, individuals and organizations all process
information, informatics has computational, cognitive and
social aspects, including study of the social impact of
information technologies. Wikipedia.
9
Progression after progression
[Diagram: layers labeled IT, Cyber Infrastructure (CI), Cyber Informatics, Core Informatics, Science Informatics, and Science, SBAs]
• CI = Discipline neutral, e.g. OPeNDAP server running over HTTP/HTTPS
• Cyberinformatics = Data (product) and service ontologies, triple store, map to
schema
• Core informatics = Reasoning engine (Pellet), OWL (computer science)
• Science (X) informatics = Use cases, science domain terms, concepts in an
ontology or controlled vocabulary
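To make the middle layers concrete, the following is a toy sketch of a triple store with pattern matching and one small inference step (following subclass links). It is an assumption-laden illustration, not Pellet, OWL, or any real triple-store API: `TripleStore`, its methods, the predicate names and the sample terms are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A tiny in-memory triple store; hypothetical, for illustration only."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

    def instances_of(self, cls: str) -> List[str]:
        """Instances of cls or of any transitive subclass of cls.

        A stand-in for what a reasoner would infer; a real reasoning
        engine does far more than this.
        """
        classes = {cls}
        frontier = [cls]
        while frontier:
            current = frontier.pop()
            for sub, _, _ in self.match(predicate="subClassOf", obj=current):
                if sub not in classes:
                    classes.add(sub)
                    frontier.append(sub)
        return sorted(
            s for s, _, o in self.match(predicate="type") if o in classes
        )


# Made-up domain terms standing in for a science ontology.
store = TripleStore()
store.add("Radar", "subClassOf", "Instrument")
store.add("IncoherentScatterRadar", "subClassOf", "Radar")
store.add("isr-1", "type", "IncoherentScatterRadar")
store.add("imager-1", "type", "Instrument")

asserted = store.match(predicate="type", obj="Instrument")
inferred = store.instances_of("Instrument")
```

The plain pattern match returns only what was stated directly, while `instances_of` also returns `isr-1`, which is an Instrument only by inference through the subclass chain.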
10
A moment of history
• In the late 1950s (actually around 1957-1958)
the modern informatics term was coined
• It existed as one field for a while, then split into library
science and computer science, which developed as
separate fields and became disconnected
• Now coming back to be relevant to science
• Informatics IS NOT just having a scientist
work with an “IT/ICT” person (NOT, NOT,
NOT)
11
Cyberinformatics
• The first match between the domain and
the underlying domain-neutral e-infrastructure/ cyberinfrastructure
• When the underlying infrastructure changes (once
it becomes real infrastructure and not just
software), this is one part that
needs to change
• Less brittle since upper layers remain
intact
12
Core informatics
• The realm of computer science (for the
most part, also librarians)
• Strongly influenced by science (and
medical applications) above and below
this layer
• If we can leverage this, we do not need to
do the specialist work, however …
• We must work with these scientists,
sustainably
13
Science Informatics
• Where science meets the underlying
technical capabilities and methods
• Must be expressible in science terms;
increasingly use cases
• The people in this area are multi-lingual
and both interdisciplinary and multidisciplinary; few are trained or literate here
• Team, or really a community of practice
(CoP)
14
Assume
• Mark and Charlie and others have
addressed aspects of professional and
credit for data aspects/ management
• Dave, Hans and others have ‘data’
journals and the ability to cite data
• Projects and communities adopt these
• Probably others but this is enough for now
15
Sustaining
• Visibility: capitalize and maintain this
– ICSU/SCID report
– U.S. NRC Decadal survey
– IUGG/UCDI, IUGS/CGI, geounions
– EGU/ESSI
• Need a CoP close to their science and
able to share experience, expertise
• Balance research and production ***
• Crosses disciplines by definition **
16
More sustaining
• Institutional structure that is sustained is the
academic one
– Peers, journals, curricula, incentives, rewards
– Can then feed into institutions, agencies, projects
– Need instructors with experience
• MUST not become isolated as its own field; to
some extent this is happening now within AGU
• MUST re-engage library and computer science
• MUST stay close to science (X-informatics)
• MUST maintain interfaces across layers of
informatics
17
Harmonizing the Hierarchies
• Working level (L, self-G), e.g. many
• National/ regional societies (L, what is role for G?),
e.g. AGU, EGU, more needed
• ‘Mission’/ ‘Production’ agencies (G, what is role for
L?), e.g. BGS, USGS, ESA, NASA, NOAA, JAXA, BGR …
• Programmes - regional and global (some L, G?),
e.g. GEOSS, GMES, GCOS, OneGeology
• International association/ union (some L and some
G but not uniform), e.g. IAGA, IAU, IUGS, IUG
• International alliances, e.g. IVOA, CEOS, SPASE
• Global, inter-union (G, need L), e.g. ICSU, GEO, CODATA,
WGISS
Leadership - L : Governance - G
Discussion
• Taken together, the growing body of collected
experience shows an emerging informatics
core capability that is starting to take
data-intensive science into a new realm of realizability
and, potentially, sustainability
– X-informatics, Core Informatics, Cyber Informatics
• Gaps – must bridge these very soon (I*Ys work)
– Asia: WPGM, AOGS, Japan, China
– Russia, Australia, Africa, South America
– In hierarchies: from group to world
• Pursue the academic model
19