5_Semantics+Metadata_Fox20150223

In Search of What Some of It Means
RDA Semantics and Metadata Workshop
Feb 23, 2015
Peter Fox (RPI) [email protected]
Tetherless World Constellation
Metadata and documentation
Not more code!
Spectral synthesis
components and flow
Getting the metadata?
What I wanted ~ 1994-6
Scientists should be able to access a global, distributed knowledge
base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means (instruments, models,
analysis) using various protocols, in differing vocabularies, using
(sometimes unstated) assumptions, with inconsistent (or nonexistent) metadata. It may be inconsistent, incomplete, evolving,
and distributed. And, it is almost always created in a manner to
facilitate its generation not its use.
And… there exist(ed) significant levels of semantic heterogeneity,
large-scale data, complex data types, legacy systems, inflexible
and unsustainable implementation technology…
What I was doing…
pro read_spec, spectra_name, description, auxiliary_info, model_size, $
    mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, $
    brightness_temperature, index1, index2, percent
ncopts = 0
description_start=0
description_edges=80
i=0
j=0
k=0
; Construct the DB filename
ncid=ncdf_open(string(getenv("SPECTRA")))
inq_struct=ncdf_inquire(ncid)
; /* get dimension info */
; each ncdf_diminq call returns the dimension name (dummy) and its size
tmp_id=ncdf_dimid(ncid, "comment_dim")
ncdf_diminq, ncid, tmp_id, dummy, comment_dim
tmp_id=ncdf_dimid(ncid, "mu_dim")
ncdf_diminq, ncid, tmp_id, dummy, mu_dim
tmp_id=ncdf_dimid(ncid, "wave_dim")
ncdf_diminq, ncid, tmp_id, dummy, wave_dim
tmp_id=ncdf_dimid(ncid, "model_dim")
ncdf_diminq, ncid, tmp_id, dummy, model_dim
tmp_id=ncdf_dimid(ncid, "smodel_dim")
ncdf_diminq, ncid, tmp_id, dummy, smodel_dim
tmp_id=ncdf_dimid(ncid, "item_dim")
What I was doing… etc.
tmp_id=ncdf_varid(ncid, "description")
ncdf_varget, ncid, tmp_id, description, OFFSET=0, COUNT=comment_dim
; Id's for variables
tmp_id=ncdf_varid(ncid, "spectra_name")
ncdf_varget, ncid, tmp_id, spectra_name, OFFSET=0, COUNT=comment_dim
tmp_id=ncdf_varid(ncid, "auxiliary_info")
ncdf_varget, ncid, tmp_id, auxiliary_info, OFFSET=0, COUNT=comment_dim
tmp_id=ncdf_varid(ncid, "model_size")
ncdf_varget, ncid, tmp_id, model_size, OFFSET=0, COUNT=item_dim
; 1-D reads of model_size elements
start=intarr(1)
edges=intarr(1)
start(0)=0
edges(0)=model_size
tmp_id=ncdf_varid(ncid, "mu_size")
ncdf_varget, ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges
tmp_id=ncdf_varid(ncid, "model")
ncdf_varget, ncid, tmp_id, model, OFFSET=start, COUNT=edges
; 2-D read: smodel_dim by model_size
start=intarr(2)
edges=intarr(2)
start(0)=0
edges(0)=smodel_dim
start(1)=0
edges(1)=model_size
tmp_id=ncdf_varid(ncid, "smodel")
ncdf_varget, ncid, tmp_id, smodel, OFFSET=start, COUNT=edges
What does It all Mean?
Some version of this…
~Metadata?
[Diagram labels: Experience, Data, Creation, Gathering, Information, Presentation, Organization, Knowledge, Integration, Conversation, Context]
It and Meaning
• It = things that matter
– Context
• Meaning = duh -> semantics
• Relations!! Real ones!
• But it was more than that, though that
often comes later…
– Syntax (structure/form)
– Semantics (meaning)
– Pragmatics (use)
Metadata-Information-Knowledge Ecosystem
[Diagram labels: Experience, Metadata, Creation, Gathering, Information, Formalization, Organization, Knowledge, Integration, Shared Conceptualization, Context]
Provenance
• Origin or source from which something
comes, intention for use, who/what
generated for, manner of manufacture,
history of subsequent owners, sense of
place and time of manufacture, production
or discovery, documented in detail
sufficient to allow reproducibility
• Provenance: metadata in a given context!
Swallow that.
• Knowledge provenance; meaning and
relations in multiple contexts!
Perfect is the enemy of the
good… (thanks Voltaire)
Origins …
• In 2000-2001 the need for capturing and
preserving knowledge in science data became
very clear but the barriers were high
• In 2004 we started a virtual observatory project
based on semantic technologies
• Use case driven – in solar and solar-terrestrial
physics with an emphasis on instrument-based
measurements and real data pipelines; we needed
implementations
• We knew we also needed integration and
provenance (but that came later)
• We aimed to push semantics into our systems to
build new ‘prototypes’ but we ‘failed’ ;-)
In 2004
• 2004 – OWL was a W3C Recommendation!!
• Protégé 2.x and the Protégé-Java-OWL
API
• SWOOP was a viable editor
• Jena and the Jena API were in good
shape
• Pellet worked
• SPARQL was still a twinkle in the RDF
working group’s eye
• Semantics were still the realm of computer
scientists
Design and Development
• We made a conscious decision only to develop
ontologies that were required to answer
specific use cases and migrate metadata
– Both Classes AND Properties (uh-oh…)
• We made a conscious effort to use whatever
ontologies were available (cf. trends in
metadata… nuff said)
• We were pretty sure that rules would be
needed (complex logic or late semantic
binding)
• We ignored query
(see implementation)
Use Case example
• Plot the neutral temperature from the Millstone Hill
Fabry-Perot, operating in the non-vertical mode during
January 2000, as a time series.
– Meanings and relations
• Objects=Things!
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is an) observatory
– Fabry-Perot is an interferometer is an optical instrument is an instrument
– Non-vertical mode is an instrument operating mode
– January 2000 is a date-time range
– Time is an independent variable/coordinate
– Time series is a data plot is a data product
• Metadata just appeared everywhere…
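The is-a chains in this use case can be written down as subject-predicate-object triples and walked mechanically. A minimal sketch in Python, assuming plain tuples stand in for RDF triples; the `TRIPLES` list, the `is_a` predicate name, and the `ancestors` helper are hypothetical illustrations, not the project's actual OWL ontology or tooling:

```python
# Hypothetical encoding of the use-case "is a" chains as
# (subject, predicate, object) tuples; not the project's real ontology.
IS_A = "is_a"

TRIPLES = [
    ("neutral temperature", IS_A, "temperature"),
    ("temperature", IS_A, "parameter"),
    ("Millstone Hill", IS_A, "ground-based observatory"),
    ("ground-based observatory", IS_A, "observatory"),
    ("Fabry-Perot", IS_A, "interferometer"),
    ("interferometer", IS_A, "optical instrument"),
    ("optical instrument", IS_A, "instrument"),
    ("non-vertical mode", IS_A, "instrument operating mode"),
    ("January 2000", IS_A, "date-time range"),
    ("time", IS_A, "independent variable"),
    ("time series", IS_A, "data plot"),
    ("data plot", IS_A, "data product"),
]


def ancestors(term, triples=TRIPLES):
    """Return every class reachable from `term` via is_a links, nearest first."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop(0)
        for subject, predicate, obj in triples:
            if subject == current and predicate == IS_A and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def is_a(term, cls, triples=TRIPLES):
    """True if `cls` is reachable from `term` by following is_a links."""
    return cls in ancestors(term, triples)


print(ancestors("Fabry-Perot"))
# ['interferometer', 'optical instrument', 'instrument']
print(is_a("Fabry-Perot", "instrument"))
# True
```

Following the links is what lets a request for "an instrument" match the Fabry-Perot without anyone having tagged it as an instrument directly.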
Semantics - Modern informatics enables
a new scale-free** framework approach
• Use cases
• Stakeholders
• Distributed authority
• Access control
• Ontologies
• Maintaining Identity
Semantics between
2004 and 2009
• Ontologies were needed for data integration
and provenance and mediation for data mining
• Protégé 3.x and then 4.0 came out
• SWOOP development was interrupted
• Cmap added OWL predicate support*
• SPARQL became a recommendation
• Triple stores exploded in use and capability
• Linked Open Data started to take off
• Pellet 2.0 came out
• I used the “M” word less frequently!
Working with knowledge
Expressivity
Implementability
Maintainability/ Extensibility
Working with semantics
Query
Inference
Rule execution
Semantics between
2009 and now
• Semantic data framework (SeSF)
• Substantial knowledge provenance work
• Data quality, uncertainty and bias
representations and applications (oh,
these are in production at NASA)
• Multi-sensor data synergy advisor
• Applications:
– Sea Ice, Carbon Observatory, Integrated
Ecosystem Assessments, globalchange.gov,
ocean.data.gov, energy.data.gov ….
Respect and Mediation … how
Discovering new data
NCA links to GCIS entities
http://data.globalchange.gov
Information model
Ontology
Core and Framework Semantics: multi-tiered interoperability
[Diagram: tiers from people down to data repositories, with a semantic mediation layer between tiers]
People: Agency Policy Makers, System Scientists, Politicians
Decision-level semantic mediation: high-level vocabularies that facilitate policy-level decision-making
Integrated Applications (used by people): Inter-disciplinary Data Visualization Apps; Integration Frameworks & Methodologies; Eco & other system Assessment Apps; linked by semantic interoperability
Application-level semantic mediation: mid-level vocabularies that facilitate the interoperability of system models and data products
Software, Tools & Apps: Discipline-specific model(s); Data product Generator; Information/Science Apps; linked by semantic interoperability; semantic query, hypothesis and inference; query, access and use of data
Data-level semantic mediation: lower-level vocabularies applied to each data source for a specific science domain of interest
Data Repositories: Federal Repository; Commercial Database; Researcher Private Database; Other Data Sources; metadata, schema, data
Closing thoughts
• Go ahead, create all the metadata you
want, we’ll “materialize” some of it into
triples based on semantics for use!
• Go ahead, create all the schema and
encodings you want but remember –
semantics now lives in an open-world
(some of it). You are not the only source of
metadata. Not all formal. Link over map.
• Semantics make metadata useful but we
do not need all of your metadata
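The "materialize some of it" idea can be sketched roughly as follows: a flat metadata record is turned into triples only for the fields that a vocabulary mapping covers, and everything else stays where it is. The record, the field names, the `ex:` prefixed terms, and the `materialize` helper are all hypothetical, chosen only for illustration:

```python
# Hypothetical mapping from metadata field names to vocabulary terms.
# Only fields listed here are turned into triples.
FIELD_TO_PREDICATE = {
    "instrument": "ex:measuredBy",
    "parameter": "ex:hasParameter",
    "observatory": "ex:locatedAt",
}


def materialize(subject, record, mapping=FIELD_TO_PREDICATE):
    """Return (triples, skipped): triples for mapped fields, names of the rest."""
    triples = []
    skipped = []
    for field, value in record.items():
        predicate = mapping.get(field)
        if predicate is None:
            skipped.append(field)  # left in the source metadata, not materialized
        else:
            triples.append((subject, predicate, value))
    return triples, skipped


# A made-up metadata record for one dataset.
record = {
    "instrument": "Fabry-Perot",
    "parameter": "neutral temperature",
    "observatory": "Millstone Hill",
    "file_format": "netCDF",
    "comment": "reprocessed",
}

triples, skipped = materialize("ex:dataset1", record)
for triple in triples:
    print(triple)
print("not materialized:", skipped)
```

Keeping the mapping separate from the record means the metadata producer changes nothing; the consumer decides which fields carry meaning for a given use.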
Contact
• [email protected]
• http://tw.rpi.edu
• @taswegian