Toward verifiable science assessment
reporting: The Global Change Information
System (GCIS)
MPI-M Seminar – September 17, 2014
Peter Fox + (RPI) - GCIS Semantics Lead, [email protected],
@taswegian, http://tw.rpi.edu
+ lots of others (esp. R. Wolfe, C. Tilmes, X. Ma)
www.globalchange.gov
Overview
• U.S. National Climate Assessment
• About the GCIS
– Who are we?
– What did we do and why?
– Underlying methods and technologies
– What are our plans for the future?
• Sneak peek of more verifiable science…
U.S. Global Change Research Program
The Program:
• Coordinates Federal research to
better understand and prepare
the nation for global change
• Prioritizes and supports cutting
edge scientific work in global
change
• Assesses the state of scientific
knowledge and the Nation’s
readiness to respond to global
change
• Communicates research findings
to inform, educate, and engage
the global community
Global Change Research Act (1990), Section 106
…not less frequently than every 4 years, the
Council… shall prepare… an assessment which–
• integrates, evaluates, and interprets the findings
of the Program and discusses the scientific
uncertainties associated with such findings;
• analyzes the effects of global change on the
natural environment, agriculture, energy
production and use, land and water resources,
transportation, human health and welfare,
human social systems, and biological diversity;
and
• analyzes current trends in global change, both
human-induced and natural, and projects major
trends for the subsequent 25 to 100 years.
National Climate Assessments
Climate Change Impacts on the
United States (2000)
Global Climate Change Impacts
in the United States (2009)
Climate Change Impacts in
the United States (2014)
See: http://globalchange.gov/
NCA 2009
http://nca2009.globalchange.gov
Outline for Third NCA Report
• Letter to the American People
• Executive Summary: Report Findings
• Introduction
• Our Changing Climate
• Sectors & Sectoral Cross-cuts
• Regions & Biogeographical Cross-cuts
• Responses
– Decision support
– Mitigation
– Adaptation
• Agenda for Climate Change Science
• The NCA Long-term Process
• Appendices
– Commonly Asked Questions
– Expanded Climate Science Info
Regions & Biogeographical Cross-Cuts
• Oceans and Marine Resources
• Coasts, Development, and Ecosystems
Sectors
• Water Resources
• Energy Supply and Use
• Transportation
• Agriculture
• Forestry
• Ecosystems and Biodiversity
• Human Health
Sectoral Cross-Cuts
• Water, Energy, and Land Use
• Urban Systems, Infrastructure,
and Vulnerability
• Impacts of Climate Change on
Tribal, Indigenous, and Native
Lands and Resources
• Land Use and Land Cover
Change
• Rural Communities
• Biogeochemical Cycles
National Climate Assessment 2014
Global Change Information System
(GCIS)
Long Term Vision:
The Global Change Information System (GCIS) is intended to
eventually become a unified web based source of authoritative,
accessible, usable and timely information about climate and global
change for use by scientists, decision makers, and the public.
Initial Prototype:
Coincident with the release of the Third National Climate
Assessment (NCA) on May 6, 2014, the GCIS supports the
distribution, presentation and documentation needs of the NCA,
integrating that content into the USGCRP web site and
demonstrating the potential for GCIS to support the long term
vision.
Information Quality Act
• Reproducibility means that the information is capable of being substantially reproduced, subject to an acceptable degree of imprecision. For information judged to have more (less) important impacts, the degree of imprecision that is tolerated is reduced (increased). With respect to analytic results, "capable of being substantially reproduced" means that independent analysis of the original or supporting data using identical methods would generate similar analytic results, subject to an acceptable degree of imprecision or error.
• Transparency is not defined in the OMB Guidelines, but the Supplementary Information to the OMB Guidelines indicates (p. 8456) that "transparency" is at the heart of the reproducibility standard. The Guidelines state that "The purpose of the reproducibility standard is to cultivate a consistent agency commitment to transparency about how analytic results are generated: the specific data used, the various assumptions employed, the specific analytic methods applied, and the statistical procedures employed. If sufficient transparency is achieved on each of these matters, then an analytic result should meet the reproducibility standard." In other words, transparency - and ultimately reproducibility - is a matter of showing how you got the results you got.
http://www.cio.noaa.gov/services_programs/IQ_Guidelines_011812.html
Complete Traceability for NCA Content
Transparency --> Reproducibility
Easier --> Harder
Traceable Sources
• References
• Image sources
• Data sources
Traceable Data
• Link to datasets
• Complete metadata
Traceable Processes
• Description of methods
• Access to process info & review
Traceable Tools
• Access to computer code
• Description of systems and platforms
Data and The National Climate Assessment
The Challenge
• More than 250 named authors (>1000 contributing!)
• 827 pages
• 43 Chapters and Appendices
• 284 Figures
• More than 600 Images
• 3395 References
• Approximately 83 data sources used across as many as 235 instances
Data and National Climate Assessment
The Solution
• Defined categories of information within the report:
– Figure
– Image
– Data Source
• Build a process for collecting source information that will satisfy
IQA and HISA requirements:
– Named sources and contacts for every figure, image, and data
source
– Web-based survey that requests inputs addressing transparency
and reproducibility and builds a foundation for metadata that
conforms to the ISO 19115 standard
– IT infrastructure that connects and promotes automation between
the web-based survey, a structured data server (SDS)/GCIS, and
publication to an official, interactive NCA web site
The use case-driven iterative approach
More details at: http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
Ontology engineering use case
The first use case
• Title: Find data used to generate a report figure
• Actor and system: A reader of the National Climate Assessment
• Flow of interactions: A reader wishes to identify the source of the data
used to produce a particular figure in the NCA. A reference to the paper
in which the image contained in this figure was originally published
appears in the figure caption. Clicking that reference displays a page of
metadata information about the paper, including links to the datasets
used in that paper. Pursuing each of those links presents a page of
metadata information about the dataset, including a link back to the
agency/data center web page describing the dataset in more detail and
making the actual data available for order or download.
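The flow of interactions in this use case amounts to following links through a small metadata graph: figure, then cited paper, then datasets, then the data center page. A minimal Python sketch of that traversal, assuming made-up records (every identifier, title and URL below is a hypothetical placeholder, not GCIS content):

```python
# Hypothetical in-memory metadata records; all identifiers and URLs are placeholders.
FIGURES = {
    "figure-1": {"caption": "Example figure", "cited_paper": "paper-a"},
}
PAPERS = {
    "paper-a": {"title": "Example paper", "datasets": ["dataset-x", "dataset-y"]},
}
DATASETS = {
    "dataset-x": {"name": "Example dataset X", "landing_page": "http://example.org/x"},
    "dataset-y": {"name": "Example dataset Y", "landing_page": "http://example.org/y"},
}


def datasets_for_figure(figure_id):
    """Follow figure -> cited paper -> datasets; return (name, landing page) pairs."""
    paper_id = FIGURES[figure_id]["cited_paper"]
    return [
        (DATASETS[d]["name"], DATASETS[d]["landing_page"])
        for d in PAPERS[paper_id]["datasets"]
    ]


for name, url in datasets_for_figure("figure-1"):
    print(f"{name}: {url}")
```

Each hop in the function corresponds to one click in the use case; the landing page stands in for the agency/data center page where the data can be ordered or downloaded.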
An intuitive concept map of the 1st use case
Classes and properties recognized from the use case
From an intuitive model to an ontology:
(1) A defined class or property should be meaningful and robust
enough to meet the requirements of various use cases
(2) An ontology can be extended by adding classes and properties
recognized from new use cases through the iterative approach
Data and The National Climate Assessment
The Solution
Metadata Entry (diagram components):
• globalchange.gov website
• NCA Resources Site Web Form
• ATRAC/XML File Generator
• Structured Data Server
Dataset metadata from a figure
Dataset metadata from an image in a figure
The second use case
• Title: Identify roles of people in the generation of a chapter in the draft
NCA3
• Actor and system: a viewer of the GCIS website
• Flow of interactions: A viewer sees that Chapter 6 (Agriculture) in the
draft NCA3 was written by a group of authors mentioned in a list. On
the title page of that chapter the reader can view the role of each
author, e.g., convening lead author, lead author or contributing author,
in the generation of this report chapter.
• We decided to use the PROV-O ontology to describe this use case
The three Starting Point classes in
PROV-O ontology and the
properties that relate them
Source: http://www.w3.org/TR/prov-o/
Mapping the use case
into PROV-O
Diagram labels: Author of Chapter 6; Chapter 6 in NCA3; Writing of Chapter 6 in NCA3 (each linked by isA to a PROV-O class)
Roles of agents in an
activity in PROV-O
Source: http://www.w3.org/TR/prov-o/
Mapping roles of chapter
authors into PROV-O
Diagram labels: Author of Chapter 6; Writing of Chapter 6 in NCA3; Convening lead author; Lead author; Contributing author (linked by isA to PROV-O classes)
Roles of people in the activity ‘Writing of Chapter 6’
Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter.
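One way to express such roles is the PROV-O qualified association pattern (prov:qualifiedAssociation, prov:Association, prov:agent, prov:hadRole). The sketch below generates the corresponding triples in plain Python. The example.org namespace, the author identifiers and the role names are hypothetical placeholders; the actual GCIS URIs are not shown in these slides.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def role_triples(activity, authors):
    """Return (subject, predicate, object) triples that attach each author
    to the activity with a role, via one prov:Association per author."""
    triples = [(EX + activity, RDF_TYPE, PROV + "Activity")]
    for i, (person, role) in enumerate(authors, start=1):
        assoc = f"{EX}{activity}-association-{i}"
        triples += [
            (EX + person, RDF_TYPE, PROV + "Agent"),
            (EX + activity, PROV + "wasAssociatedWith", EX + person),
            (EX + activity, PROV + "qualifiedAssociation", assoc),
            (assoc, RDF_TYPE, PROV + "Association"),
            (assoc, PROV + "agent", EX + person),
            (assoc, PROV + "hadRole", EX + role),
        ]
    return triples


# Three placeholder authors, one per role named on the slide.
authors = [
    ("author-1", "convening-lead-author"),
    ("author-2", "lead-author"),
    ("author-3", "contributing-author"),
]
triples = role_triples("writing-of-chapter-6", authors)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The printed lines are N-Triples, so they could be loaded into any triple store for inspection.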
Re-using existing ontologies for the GCIS ontology
By such mappings we can use reasoners that are suitable for the PROV-O ontology,
and thus retrieve provenance graphs from the established GCIS
GCIS Structured Data Server
• Capture – Obtain from a variety of sources: manual input
by trusted parties – support staff, agency partners, data
centers; automated harvesting from publishers, agency
data centers, etc.
• Identify – Assign persistent, resolvable, controlled
identifiers to each element.
• Organize – Capture, discover and represent relationships
between elements, including across various types of
elements; across data centers; and across agency
boundaries.
• Present – Provide machine accessible interfaces to retrieve
structured metadata, and to search/data mine it.
• Maintain – Develop tools and processes to ensure quality
and integrity of database contents over time.
Global Change Content Elements
• Reports, Figures, Images, Research Papers,
Journals, Measurements, Datasets,
Instruments, Agencies, Projects, People,
Models, Algorithms, …
• Findings – “Climate is changing.” “Sea Level
is Rising.”
• Concepts: “Impacts of Climate Change on
Human Health” “Adaptation”
Machine Accessible Metadata
Diagram components:
• globalchange.gov website
• NCA Resources Site Web Form
• ATRAC/XML File Generator
• Structured Data Server
Linked Open Data
http://5stardata.info
Identifier Resolution
doi:10.5067/MEASURES/GSSTF/DATA308
A common, persistent, citable reference to that dataset.
We build GCIS-specific identifiers from those:
http://data.globalchange.gov/doi/10.5067/MEASURES/GSSTF/DATA308
Then we can resolve it (with content negotiation) on our site,
and link it with identifiers for our other resources, including
asserting equivalence and linking with the data center
responsible for stewardship and distribution of the actual
data. We can also refer and link to other repositories of
information about those resources.
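The DOI-to-GCIS mapping shown on this slide can be sketched as a small helper. The function name and the handling of an optional "doi:" prefix are assumptions made for illustration; only the URL pattern comes from the slide.

```python
GCIS_DOI_BASE = "http://data.globalchange.gov/doi/"


def gcis_uri_for_doi(doi):
    """Build a GCIS identifier from a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return GCIS_DOI_BASE + doi


print(gcis_uri_for_doi("doi:10.5067/MEASURES/GSSTF/DATA308"))
```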
Content Negotiation
http://data.globalchange.gov/doi/10.5067/MEASURES/GSSTF/DATA308
The server response from the URI depends on what you
ask for:
• A traditional browser will ask for HTML, and receive and
render a human readable description of the resource.
• Web services can request formal, structured XML or RDF
metadata about the resource.
Our goal is to provide a curated collection of authoritative
global change information, but always link back to the data
center or publisher responsible for the long term
stewardship of the resource.
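In HTTP terms, "what you ask for" is the Accept header. The sketch below prepares two requests for the same URI without sending them. The media type text/turtle is an assumption about what the server offers; the slide only says HTML, XML or RDF.

```python
from urllib.request import Request

RESOURCE = "http://data.globalchange.gov/doi/10.5067/MEASURES/GSSTF/DATA308"


def negotiated_request(uri, media_type):
    """Prepare (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})


browser_like = negotiated_request(RESOURCE, "text/html")
service_like = negotiated_request(RESOURCE, "text/turtle")  # assumed to be offered

for req in (browser_like, service_like):
    print(req.get_method(), req.full_url, "Accept:", req.get_header("Accept"))
# Passing either request to urllib.request.urlopen() would perform the fetch.
```

Both requests target the same identifier; only the header differs, which is the point of content negotiation.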
GCIS Structured Data Server
data.globalchange.gov
GCIS Database/API
• RESTful API at data.globalchange.gov
• URLs correspond to ontology URIs
• Primary storage: RDBMS (PostgreSQL)
• Representation is serialized (for JSON) or
used in templates (for Turtle)
• Turtle representation is exported into a
triple store (Virtuoso) which provides a
SPARQL endpoint.
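To illustrate the two output paths (direct serialization for JSON, templating for Turtle), here is a rough Python sketch. It is not the GCIS implementation: the row fields, the example.org namespace and the template text are hypothetical, and only the Dublin Core terms are real vocabulary.

```python
import json
from string import Template

# Hypothetical row as it might come back from the relational store.
row = {
    "identifier": "example-figure",
    "title": "Example figure title",
    "chapter": "example-chapter",
}

# JSON path: the record is serialized directly.
as_json = json.dumps(row, sort_keys=True)

# Turtle path: the same fields are substituted into a template.
# dcterms:title and dcterms:isPartOf are Dublin Core terms; ex: is a placeholder.
TURTLE_TEMPLATE = Template(
    '@prefix dcterms: <http://purl.org/dc/terms/> .\n'
    '@prefix ex: <http://example.org/> .\n'
    'ex:$identifier dcterms:title "$title" ;\n'
    '    dcterms:isPartOf ex:$chapter .\n'
)
as_turtle = TURTLE_TEMPLATE.substitute(row)

print(as_json)
print(as_turtle)
```

A real template would also need to escape quotes and other special characters in literal values before substitution.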
GCIS Ontology (version 1.2)
(a) Classes and properties representing a brief structure of the NCA3
(b) Classes and properties relevant to the findings of the NCA3 and each
chapter in it
(c) Classes and properties about sensors, instruments, platforms, and
algorithms, etc. through which datasets are generated
A few classes are asserted as sub-classes of PROV-O classes
Full GCIS Ontology documents are available at:
http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology
(part of) GCIS Ontology
For more info, see http://data.globalchange.gov
Final output of the GCIS ontology
• Ontology documentation
– http://escience.rpi.edu/ontology/GCISIMSAP/2/GCISOntology_v_1_2.htm
• Concept map
– http://cmapspublic3.ihmc.us/rid=1MCJMLST01G0CSWH-2YH4/GCIS_Ontology_v1_2.cmap
• Ontology RDF serialized in Turtle format
– http://escience.rpi.edu/ontology/GCISIMSAP/2/GCISOntology_v_1_2.ttl
Global Change Keywords (GCMD)
Sample finding:
Certain types of extreme weather events have become more frequent and intense, including heat waves, floods, and droughts in some regions. The increased intensity of heat waves has been most prevalent in the western parts of the country, while the intensity of flooding events has been more prevalent over the eastern parts. Droughts in the Southwest and heat waves everywhere are projected to become more intense in the future.
GCMD v8.0
• ATMOSPHERIC/OCEAN INDICATORS > EXTREME WEATHER
• EXTREME WEATHER > EXTREME PRECIPITATION
• PRECIPITATION > PRECIPITATION RATE
• EXTREME WEATHER > HEAT/COLD WAVE FREQUENCY/INTENSITY
• NATURAL HAZARDS > HEAT
• NATURAL HAZARDS > FLOODS
• PRECIPITATION > PRECIPITATION AMOUNT
• PRECIPITATION > RAIN
• SURFACE WATER > FLOODS
• ATMOSPHERIC PHENOMENA > DROUGHT
• EXTREME WEATHER > EXTREME DROUGHT
• NATURAL HAZARDS > DROUGHTS
SPARQL Example
• http://data.globalchange.gov/examples
• List 10 figures and datasets from which they were derived
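# Prefix declarations added for completeness; the gcis: namespace is an assumption
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?figure ?dataset
FROM <http://data.globalchange.gov>
WHERE {
  ?figure gcis:hasImage ?img .
  ?img prov:wasDerivedFrom ?dataset .
}
LIMIT 10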
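A query like this is typically sent to an endpoint as the "query" parameter of an HTTP GET, per the SPARQL protocol. The sketch below only builds that URL and does not contact any server. The endpoint path /sparql and the gcis: namespace URI are assumptions, not taken from the slides.

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.globalchange.gov/sparql"  # assumed endpoint path

# The gcis: namespace below is an assumption; prov: is the W3C PROV namespace.
QUERY = """\
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?figure ?dataset
FROM <http://data.globalchange.gov>
WHERE {
  ?figure gcis:hasImage ?img .
  ?img prov:wasDerivedFrom ?dataset .
}
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Encode a query as the 'query' parameter of a SPARQL protocol GET URL."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Fetching the URL with an Accept header such as application/sparql-results+json would ask for JSON results, if the endpoint supports that format.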
Two Parallel Paths
1. Third National Climate Assessment (NCA3)
2. GCIS
Traceable Sources
• References
• Image sources
• Data sources
Traceable Data
• Link to datasets
• Complete metadata
Traceable Processes
• Description of methods
• Access to process info & review
Traceable Tools
• Access to computer code
• Description of systems and platforms
Two Parallel Paths
1. NCA3 release
2. Populate GCIS
Traceable Sources
• References
• Image sources
• Data sources
Traceable Data
• Link to datasets
• Complete metadata
Traceable Processes
• Description of methods
• Access to process info & review
Traceable Tools
• Access to computer code
• Description of systems and platforms
Data and GCIS
The Future
Diagram components:
• globalchange.gov website
• Structured Data Server
Interagency Information Integration
GCIS can use relationships between all relevant
information about global change across the agencies:
o From observations to datasets to research papers to models to
analyses to organizations to people to synthesized reports to
human impacts...
o Determine agency interdependencies -- An EPA analysis uses a
NOAA model dependent on observations from a NASA satellite.
o Can present unique interagency metrics, e.g., "How many papers
referenced datasets from a specific satellite?"
o Direct users back to agency data centers for more detailed
information and the actual content and data.
GCIS Data Mining
Structured information with relationships allows integrated
data mining, searching, metrics.
o What projects provided data used to produce figures that were
referenced in the 2013 NCA section about coastal sea level rise
impacts?
o Which data centers hold data referenced by papers related to
forests in the Midwest?
o Which agencies have people working on projects related to societal
impacts of extreme weather events?
o Show me the latest papers about health impacts of air quality in
California. Which datasets were used in the analysis of air quality
in California?
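Questions of this kind reduce to chaining relationships. A toy Python sketch for one of them (which data centers hold data referenced by papers on a topic), assuming a handful of made-up relationship records; the relation names and identifiers are hypothetical and do not reflect the GCIS schema:

```python
# Hypothetical relationship records: (subject, relation, object).
LINKS = [
    ("paper-1", "about", "forests"),
    ("paper-2", "about", "forests"),
    ("paper-3", "about", "air-quality"),
    ("paper-1", "references", "dataset-a"),
    ("paper-2", "references", "dataset-b"),
    ("paper-3", "references", "dataset-c"),
    ("dataset-a", "held_by", "data-center-1"),
    ("dataset-b", "held_by", "data-center-2"),
    ("dataset-c", "held_by", "data-center-1"),
]


def objects(subject, relation):
    """All objects linked from `subject` by `relation`."""
    return {o for s, r, o in LINKS if s == subject and r == relation}


def data_centers_for_topic(topic):
    """Data centers holding datasets referenced by papers about `topic`."""
    papers = {s for s, r, o in LINKS if r == "about" and o == topic}
    datasets = set().union(*(objects(p, "references") for p in papers))
    return set().union(*(objects(d, "held_by") for d in datasets))


print(sorted(data_centers_for_topic("forests")))
```

In a triple store the same chain would be a short SPARQL pattern; the in-memory version just makes the joins explicit.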
Schedule
Timeline: 2013, 2014 (now: September), 2015, 2016
• NCA Report (release 5/6)
• Initial data sets
• Full data sets
• Indicators
• Demo
• Pilot
• Health Assessment
• Ontology Improvements
• Sustained NCA
• Earth Observation Assessment (possible support)
Staff (some of many contributors)
U.S. Global Change Research Program (USGCRP), National Coordination Office (NCO):
Robert Wolfe (1), Curt Tilmes (1), Steve Aulenbach (2), Brian Duggan (2), Justin Goldstein (2),
Amanda McQueen (2), Julie Morris (2), Glynis Lough (2)
National Climate Assessment (NCA) Technical Support Unit (TSU):
David Easterling (3), Paula Hennon (4), Angel Li (4), April Sides (6), Mark Phillips (5), Sarah Champion (4),
Andrew Buddenberg (4), Devin Thomas (6)
Habitat Seven (NCA Web Design and Development):
Jamie Herring, Phil Evans, Aires Almeida, Graham Blair
Rensselaer Polytechnic Institute (RPI) Tetherless World Constellation (TWC) (Semantic Web Information Modeling):
Peter Fox, Xiaogang Ma, Patrick West, Stephan Zednik, Jin Zheng
Forum One (globalchange.gov Web Design, Development and Integration):
Michael Rader, John Schneider, Keenan Holloway, Sarah LeNguyen
Affiliations:
(1) NASA
(2) University Corporation for Atmospheric Research
(3) NOAA/NCDC
(4) The Cooperative Institute for Climate and Satellites (CICS), North Carolina State University
(5) National Environmental Modeling and Analysis Center (NEMAC), UNC Asheville
(6) ERT, Inc.
See also
• Ma, X., Fox, P., Tilmes, C., Jacobs, K., Waple, A., 2014. Capturing and presenting provenance of global change information. Nature Climate Change, 4, 409-413. doi:10.1038/nclimate2141
• Tilmes, C., Fox, P., Ma, X., McGuinness, D., Privette, A.P., Smith, A., Waple, A., Zednik, S., Zheng, J., 2013. Provenance representation for the National Climate Assessment in the Global Change Information System. IEEE Transactions on Geoscience and Remote Sensing, 51 (11), 5160-5168.
• Ma, X., Zheng, J.G., Goldstein, J.C., Zednik, S., Fu, L., Duggan, B., Aulenbach, S.M., West, P., Tilmes, C., Fox, P., 2014. Ontology engineering in provenance enablement for the National Climate Assessment. Environmental Modelling and Software, 61, 191-205. doi:10.1016/j.envsoft.2014.08.002
– Open access (until October 17, 2014): http://authors.elsevier.com/a/1Pc6G4sKhE0y1E
Sneak peek for what is next
http://tw.rpi.edu/web/project/ECOOP
Courtesy: C Tilmes
Climate Informatics: Human Experts and the End-to-End System, by Rood and Edwards
http://www.earthzine.org/2014/05/22/climate-informatics-human-experts-and-the-end-to-end-system/
Questions and Comments?
For more information visit
http://www.globalchange.gov and http://data.globalchange.gov
Next … iPython meets NCA
NCA=National Climate Assessment
Stace Beaulieu