Oral (click to slides) - Tetherless World Constellation

TWC
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies
Xiaogang (Marshall) Ma
Tetherless World Constellation
Rensselaer Polytechnic Institute
[email protected]
x.marshall.ma
rpi.edu/~max7
MarshallXMa
@MarshallXMa
0000-0002-9110-7369
William Smith's 1815 geologic map of England and Wales with part of Scotland
William Smith (1769-1839)
(Image source: Geological Society of London)
Evolution of the Geological Map of British Islands / UK: 1874, 1906, 1939, 1969, 2007, 2013
(Image source: British Geological Survey)
Definition of “Quaternary” in several versions of the International Stratigraphic Chart (2004, 2005, 2008, 2009)
Distributed datasets: Regional geologic time scales (Haq, 2007)
Distributed datasets: Mismatches of geological units across political boundaries
• Italy/France near Cuneo/Colmar (Asch et al., 2012)
• Wyoming/Colorado (Ma et al., 2014)
Map labels: Cambrian; Carboniferous; Felsic and hornblendic gneisses; Granitic rocks
(Base map courtesy: OneGeology-Europe and USGS)
• Data and models, vocabularies, and ontologies
– Have we ever had model-independent datasets?
• Ontology dynamics and a data life cycle
CONCEPT and COLLECTION: Initial concepts; Questions and answers; Grant info; Questionnaire; Coded instrument; CAI metadata; Paradata
PROCESSING: Data specs; Recodes; Summary descriptive info
ARCHIVING: Preservation metadata; Confidentiality; Additional processing
DISTRIBUTION: Terms of use; Citation; Packaging info
DISCOVERY: Catalog record; Indexing; Related publications
ANALYSIS: Replication code; Publications
REPURPOSING: Post-hoc harmonization; Data transformations
Diagram reproduced from (Spencer, 2012)
Ontology dynamics
• Ontology Mapping
• Ontology Morphism
• Ontology Matching
• Ontology Articulation
• Ontology Translation
• Ontology Evolution
• Ontology Debugging
• Ontology Versioning
• Ontology Integration
• Ontology Merging
(Flouris et al., 2008)
Potential challenges
• Reworking of the extant data in a data center
– e.g. caused by ontology/vocabulary versioning
• Semantic mismatch among data sources
– e.g. heterogeneity in ontologies on the same topic
• Differing understanding of the same dataset between data providers and data users
– e.g. a data provider understands Quaternary as 1.806 Ma-present, and a data user understands it as 2.588 Ma-present
• Error propagation in cross-discipline data re-use
– e.g. heterogeneous datasets may cause misconceptions in subsequent work
(Ma et al., 2014)
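The Quaternary example can be made concrete with a minimal Python sketch. It assumes only the two lower boundaries quoted on the slide (1.806 Ma and 2.588 Ma); the class name, the "provider"/"user" labels, and the sample ages are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalDefinition:
    """One definition of a geologic time interval, with ages in Ma."""
    name: str
    held_by: str            # hypothetical label for who uses this definition
    older_bound_ma: float   # lower (older) boundary
    younger_bound_ma: float = 0.0  # 0.0 stands for "present"

    def contains(self, age_ma: float) -> bool:
        """True if the age falls inside this definition of the interval."""
        return self.younger_bound_ma <= age_ma <= self.older_bound_ma


# The two understandings of "Quaternary" quoted on the slide.
PROVIDER_VIEW = IntervalDefinition("Quaternary", "provider", 1.806)
USER_VIEW = IntervalDefinition("Quaternary", "user", 2.588)


def interpretations_differ(age_ma: float,
                           a: IntervalDefinition,
                           b: IntervalDefinition) -> bool:
    """True if the two definitions disagree on whether the age is inside."""
    return a.contains(age_ma) != b.contains(age_ma)


if __name__ == "__main__":
    for age in (1.0, 2.0, 3.0):  # hypothetical sample ages in Ma
        flag = interpretations_differ(age, PROVIDER_VIEW, USER_VIEW)
        print(f"{age} Ma: provider and user disagree = {flag}")
```

Only ages between the two boundaries are read differently, which is why a record labelled "Quaternary" needs to say which definition it follows.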
A few recent works of interest
OneGeology-Europe (a contribution to INSPIRE)
• 20 European nations providing national geologic maps at scale ~1:1M
• Harmonized geological terms and map legends
• Multilingual labels in 18 languages
• Central portal for data browsing/query among distributed data sources
http://www.onegeology-europe.org
Federated query: Result of geologic units with age ‘Cenozoic - from 66 million years to today’
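One way to picture such a federated query is the Python sketch below. It is not the OneGeology-Europe implementation: the two sources, their unit identifiers, and the label maps are hypothetical, and the age bounds are approximate illustrative values. The idea shown is that each source keeps its own local labels, a per-source map ties them to shared concepts, and the age filter runs on the shared concepts.

```python
# Shared (harmonized) concepts with approximate bounds in Ma: (older, younger).
# Values are illustrative assumptions, not taken from the slides.
AGE_BOUNDS_MA = {
    "cenozoic": (66.0, 0.0),
    "paleogene": (66.0, 23.03),
    "neogene": (23.03, 2.588),
    "quaternary": (2.588, 0.0),
    "cretaceous": (145.0, 66.0),
}

# Hypothetical distributed sources, each with local labels and a label map.
SOURCES = {
    "survey_a": {
        "label_map": {"Quaternary": "quaternary", "Cretaceous": "cretaceous"},
        "units": [("A-1", "Quaternary"), ("A-2", "Cretaceous")],
    },
    "survey_b": {
        "label_map": {"Quaternaire": "quaternary", "Paléogène": "paleogene"},
        "units": [("B-1", "Paléogène"), ("B-2", "Quaternaire")],
    },
}


def federated_age_query(sources, older_ma, younger_ma):
    """Return (source, unit_id, concept) for units lying inside the age range."""
    hits = []
    for source_name, source in sources.items():
        for unit_id, local_label in source["units"]:
            concept = source["label_map"][local_label]  # local -> shared term
            unit_older, unit_younger = AGE_BOUNDS_MA[concept]
            if unit_older <= older_ma and unit_younger >= younger_ma:
                hits.append((source_name, unit_id, concept))
    return hits


if __name__ == "__main__":
    # "Cenozoic - from 66 million years to today"
    older, younger = AGE_BOUNDS_MA["cenozoic"]
    for hit in federated_age_query(SOURCES, older, younger):
        print(hit)
```

The Cretaceous unit is excluded because its older bound exceeds 66 Ma, while units labelled in either language are matched through the shared concept.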
CGI Geoscience Terminology Workgroup
• Construct a collection of vocabularies for populating information interchange documents and enabling interoperability
• Provide labels for concepts, scoped to various communities defined by language, science domain, or application domain
http://cgi-iugs.org/tech_collaboration/geoscience_terminology_working_group.html
Recently finished CGI vocabularies:
• Earth Resource Form
• Environmental Impact Value
• Exploration Activity Type
• Exploration Result
• UNFC Value
• Earth Resource Expression
• Earth Resource Shape
• Enduse Potential
• Mineral Occurrence Type
• Mining Activity Type
• Processing Activity Type
• Mining Waste Type Value
• Commodity Code
• Mineral Deposit Group
• Mineral Deposit Type
• Product Value
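The idea of one concept carrying labels for several language communities can be sketched in Python as follows. This is an assumption-laden illustration, not the CGI vocabulary format: the `Concept` class, the identifier `example:granite`, and the fallback rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A vocabulary concept with one label per language code."""
    concept_id: str  # hypothetical identifier, not a real CGI URI
    labels: dict[str, str] = field(default_factory=dict)

    def label_for(self, language: str, fallback: str = "en") -> str:
        """Return the label in the requested language, else the fallback
        language, else the bare identifier."""
        if language in self.labels:
            return self.labels[language]
        if fallback in self.labels:
            return self.labels[fallback]
        return self.concept_id


# Illustrative labels only.
granite = Concept(
    "example:granite",
    {"en": "granite", "de": "Granit", "es": "granito"},
)

if __name__ == "__main__":
    for lang in ("en", "de", "es", "ja"):
        print(lang, "->", granite.label_for(lang))
```

Because the identifier stays fixed while labels vary, data exchanged between communities can refer to the same concept regardless of the display language.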
USGS Online Geologic Maps
• Standardized vocabulary with detailed annotation
• Forward and backward queries between spatial data and attribute data
• Links to further data sources, e.g. aeromagnetic survey, mineral resources data, soils, geochemical samples, etc.
http://mrdata.usgs.gov/geology/state/map.html
Records of a point in the San Francisco area
Recommendations
• Communities of practice on ontology and vocabulary
– Bottom-up and self-organized, with loose top-down control
• Formalize the ‘Concept’ step in a data life cycle
– Top-down, adopting outputs from the bottom-up approach
• Make it a virtuous circle between the bottom-up and top-down approaches
Thanks for listening.
[email protected]
@MarshallXMa