Linked Data and the Provenance
Explosion
Deborah L. McGuinness
Tetherless World Constellation Chair
Professor of Computer Science and Cognitive
Science
Director RPI Web Science Trust Network Lab
Rensselaer Polytechnic Institute
DPDM, March 18, 2011 Melbourne, Australia
Outline
– Motivating scenarios
• Light Weight data connectivity in health setting
– PopSciGrid
– Provenance-related questions
• Mid-weight semantic modeling in
interdisciplinary virtual observatories
– Virtual Solar Terrestrial Observatory -> Semantic
Provenance Capture in Data Ingest Systems
• Mobile platform, social networking enhanced
advisor
– Semantic Sommelier
– Provenance Infrastructure and Directions
– Discussion and Directions
Selected Background
• Bell Labs: designing Description Logics (DLs) &
environments aimed at supporting applications such
as configuration.
– led to research on making DL-based systems
useful – with focus on explanation
• Stanford University: focus on ontology-enabled xx,
large hybrid systems, later X-informatics often for
eScience
– led to ontology evolution & diagnostic
environments, expanded explanation settings
including “messy” hybrid systems with new
provenance emphasis
Background cont.
• Rensselaer Polytechnic Institute/ Tetherless
World Constellation: next generation web,
web science research center, open data,
next generation semantic eScience
– Led to more connections with social platforms,
empowering collections (of users, data, etc.)
TWC
Population Sciences Grid Goals
(with NIH/NCI, Northwestern)
• Convey complex health-related information to
consumer and public health decision makers
for community health impact
• Leverage the growing evidence base for
communicating health information on the
Internet
• Inform the development of future research
opportunities effectively utilizing
cyberinfrastructure for cancer prevention and
control.
Computer Science Slant
• How can semantic technologies be used to integrate, present,
and analyze data for a wide range of users?
• Can tools allow lay people to build their own demos and
support public usage and accurate interpretation?
• How do we facilitate collaboration and making applications
“viral”?
• Within PopSciGrid:
– Which policies (taxation, smoking bans, etc) impact health and health
care costs?
– What data should we display to help scientists and lay people evaluate
related questions?
– What data might be presented so that people choose to make (positive)
behavior changes?
– What does the data show? Why should someone believe that?
– What are appropriate follow ups?
PopSciGrid
PopSciGrid II
http://logd.tw.rpi.edu/demo/tax-cost-policy-prevalence
PopSciGrid III
Questions
• Overall data:
– What data is used?
– How recent is it?
– What are the conditions under which it was
obtained?
– Is it reliable for this purpose?
• Pick one item like prevalence – is this the best
parameter to focus on?
– What is prevalence (definition)?
– How is it measured (overall / in this data set)?
– Is there another or better proxy (e.g., packs sold)?
• Do we need more data, more inference, more xxx…
Virtual Observatory
(VSTO)
• General: Find data subject to
certain constraints and plot
appropriately
• Specific: Plot the
observed/measured Neutral
Temperature as recorded by
the Millstone Hill Fabry-Perot
interferometer while looking in
the vertical direction at any
time of high geomagnetic
activity in a way that makes
sense for the data.
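The specific request above can be read as a constrained selection over observation records. The following is a minimal sketch of that reading, not the VSTO implementation: the `Observation` fields, the use of a Kp index as the activity measure, the cutoff of 5.0 for "high geomagnetic activity", and all data values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    instrument: str       # e.g. "Millstone Hill Fabry-Perot interferometer"
    parameter: str        # e.g. "Neutral Temperature"
    look_direction: str   # e.g. "vertical"
    time: datetime
    kp_index: float       # assumed field: geomagnetic activity at observation time
    value: float


HIGH_ACTIVITY_KP = 5.0    # assumed cutoff for "high geomagnetic activity"


def select(observations, instrument, parameter, look_direction,
           min_kp=HIGH_ACTIVITY_KP):
    """Return the matching observations in time order, ready for a time-series plot."""
    hits = [
        o for o in observations
        if o.instrument == instrument
        and o.parameter == parameter
        and o.look_direction == look_direction
        and o.kp_index >= min_kp
    ]
    return sorted(hits, key=lambda o: o.time)


FPI = "Millstone Hill Fabry-Perot interferometer"
# Made-up records, for illustration only.
records = [
    Observation(FPI, "Neutral Temperature", "vertical", datetime(2005, 1, 2, 3, 0), 6.0, 950.0),
    Observation(FPI, "Neutral Temperature", "vertical", datetime(2005, 1, 1, 3, 0), 7.0, 1010.0),
    Observation(FPI, "Neutral Temperature", "vertical", datetime(2005, 1, 3, 3, 0), 2.0, 800.0),
    Observation(FPI, "Neutral Temperature", "north", datetime(2005, 1, 1, 4, 0), 7.0, 990.0),
]

hits = select(records, FPI, "Neutral Temperature", "vertical")
for o in hits:
    print(o.time.isoformat(), o.value)
```

In this sketch the user still has to name the instrument and the threshold. The point of the semantic approach described in the slides is that such choices, and the appropriate plot type, can be inferred rather than supplied by hand.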
Partial exposure of Instrument class hierarchy to users
VSTO Results
Many Benefits:
– Reduced query formation from 8 to 3 steps and reduced choices at
each stage
– Allowed scientists to get data from instruments they never knew of
before (e.g., photometers in example)
– Supported augmentation and validation of data
– Useful and related data provided without having to be an expert to
ask for it
– Integration and use (e.g. plotting) based on inference
– Ask and answer questions not possible before
BUT Needed Provenance
Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. In the Proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07). Vancouver, British Columbia, Canada, July 22-26, 2007.
Peter Fox, Deborah L. McGuinness, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. Ontology-supported Scientific Data Frameworks: The Virtual Solar-Terrestrial Observatory Experience. In Computers and Geosciences - Elsevier. Volume 35, Issue 4 (2009).
Inference Web (IW)
[Diagram: End Users; End-User Interaction services (Explanation via Graph, Explanation via Annotation, Explanation via Customized Summary); Data Access & Data Analysis services (Validate PML data, Access published PML data); Distributed PML data]
Inference Web is a semantic web-based knowledge provenance management infrastructure:
• Uses a provenance interlingua (PML) for encoding and interchange of provenance metadata
in distributed environments
• Provides interactive explanation services for end-users
• Provides data access and analysis services for enriching the value of knowledge provenance
• It has been used in a wide range of applications
Making Systems Actionable
using Knowledge Provenance
– Mobile Wine Agent
– CALO
– Combining Proofs in TPTP
– Intelligence Analyst Tools
– GILA
– Knowledge Provenance in Virtual Observatories
– NOW including Data-gov
Proof/Provenance
Markup Language (PML)
• A kind of linked data on the Web
[Diagram: PML data linked to nodes labeled D, spread across the World Wide Web and two Enterprise Webs]
• Modularized & extensible
– Provenance: annotate provenance properties
– Justification: encodes provenance relations (including support for
multiple justifications)
– Trust: add trust annotation
• Semantic Web based
Users Require Provenance!
Users demand it! If users (humans and agents) are to use, reuse, and
integrate system answers, they must trust them.
Intelligence analysts: (from DTO/IARPA’s NIMD)
Andrew Cowell, Deborah McGuinness, Carrie Varley, and David A. Thurman. Knowledge-Worker Requirements
for Next Generation Query Answering and Explanation Systems. Proc. of Intelligent User Interfaces for
Intelligence Analysis Workshop, Intl Conf. on Intelligent User Interfaces (IUI 2006), Sydney, Australia.
Intelligent Assistant Users: (from DARPA’s PAL/CALO)
Alyssa Glass, Deborah L. McGuinness, Paulo Pinheiro da Silva, and Michael Wolverton. Trustable Task
Processing Systems. In Roth-Berghofer, T., and Richter, M.M., editors, KI Journal, Special Issue on
Explanation, Künstliche Intelligenz, 2008.
Virtual Observatory Users: (from NSF’s VSTO)
Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don
Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for
Scientific Research. Proc. of the Nineteenth Conference on Innovative Applications of Artificial Intelligence
(IAAI-07). Vancouver, British Columbia, Canada.
And… as systems become more diverse, distributed, embedded, and depend
on more varied data and communities, more provenance and more types are
needed
CHIP Pipeline
(Chromospheric Helium Image Photometer)
• Mauna Loa Solar Observatory (MLSO), Hawaii
– Chromospheric Helium-I Image Photometer
– Raw data capture: raw image data captured by CHIP
• National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO
– Raw image data
– Follow-up processing on raw data (e.g., flat field calibration)
– Quality checking (images graded: GOOD, BAD, UGLY)
• Publishes
– Intensity images (GIF)
– Velocity images (GIF)
Semantic Provenance Capture for
Data Ingest Systems (SPCDIS)
Fact: Scientific data services are increasing in usage
and scope, and with these increases comes
growing need for access to provenance information.
Provenance Project Goal: to design a reusable,
interoperable provenance infrastructure.
Science Project Goal: design and implement an
extensible provenance solution that is deployed at
the science data ingest/product generation time.
Outcome: implemented provenance solution in one
science setting AND operational specification for
other scientific data applications.
Extends vsto.org
ACOS
Data Ingest
• Typical science data
processing pipelines
• Distributed
• Some metadata in silos
• Much metadata lost
• Many human-in-loop
decisions, events
• No metadata infrastructure
for any user
• Community is broadening
Chromospheric Helium Imaging Photometer (CHIP) Data Ingest
ACOS – Advanced Coronal Observing System
The Advanced Coronal Observing
System case for Provenance
• Provenance metadata currently not propagated with or linked
to the data products
– Processing metadata
– Origin (observation) metadata
• Data products are the result of “black box” systems
– Most users do not know what calibrations,
transformations, and QA processing have been applied
to the data product
[Diagram: Source, Processing, Product, with the links between them marked “?”]
Advanced Coronal Observing System
(ACOS) Provenance Use Cases
• What were the cloud cover and seeing
conditions during the observation period
of this image?
• What calibrations have been applied to
this image?
• Why does this image look bad?
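One way to picture these use cases is as lookups against a provenance record that travels with each image. The sketch below is an illustration under stated assumptions, not the ACOS design: `ProcessingStep`, `ImageProvenance`, the field names, the image identifier, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingStep:
    kind: str        # "calibration", "transformation", or "qa"
    name: str        # e.g. "flat field calibration"
    note: str = ""   # free-text outcome, e.g. a quality grade


@dataclass
class ImageProvenance:
    image_id: str
    observing_conditions: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)

    def calibrations(self):
        """Answer: what calibrations have been applied to this image?"""
        return [s.name for s in self.steps if s.kind == "calibration"]

    def why_bad(self):
        """Collect the recorded observing conditions and QA notes that
        could help explain why the image looks bad."""
        evidence = [f"{k}: {v}" for k, v in self.observing_conditions.items()]
        evidence += [f"QA {s.name}: {s.note}"
                     for s in self.steps if s.kind == "qa" and s.note]
        return evidence


# Made-up record, for illustration only.
prov = ImageProvenance(
    image_id="chip-quicklook-example",
    observing_conditions={"cloud cover": "heavy", "seeing": "poor"},
    steps=[
        ProcessingStep("calibration", "flat field calibration"),
        ProcessingStep("qa", "quality check", "graded BAD"),
    ],
)

print(prov.calibrations())
print(prov.why_bad())
```

All three questions become simple reads once origin and processing metadata are kept with the product. Without that record, the same questions require asking the people who ran the pipeline.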
PML Usage in SPCDIS
• Justification
– Explanation
– Causality graph
• Provenance
– Conclusion
– Source
– Engine
– Rule
• Trust
– Trust/Belief metrics
[Diagram: NodeSets, each with a Conclusion and a Justification; a Justification points to an Engine (hasInferenceEngine), a Rule (hasInferenceRule), a SourceUsage (hasSourceUsage) with Source and DateTime, and antecedent NodeSets (hasAntecedentList)]
PML in Action in
SPCDIS
• This is the PML provenance encoding for a “quick look”
gif file, which is generated from two image datasets
– Node set for the quicklook gif file
– hasConclusion: a reference to the gif file itself
– InferenceStep (hasInferenceRule, hasInferenceEngine): how the gif file was derived
– hasAntecedents: the “antecedents” of the quicklook gif file are other node sets
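The structure on this slide can be sketched as a small recursive data model. This is a simplified illustration and not the PML vocabulary itself: the Python class and field names only mirror the slide's labels (node set, hasConclusion, InferenceStep, hasInferenceRule, hasInferenceEngine, hasAntecedents), and the file names, rule name, and engine name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceStep:
    rule: str      # mirrors hasInferenceRule
    engine: str    # mirrors hasInferenceEngine
    antecedents: List["NodeSet"] = field(default_factory=list)  # mirrors hasAntecedents


@dataclass
class NodeSet:
    conclusion: str                        # mirrors hasConclusion: reference to the artifact
    step: Optional[InferenceStep] = None   # None when no derivation is recorded


def lineage(node, depth=0):
    """Yield (depth, conclusion) for a node set and everything it was derived from."""
    yield depth, node.conclusion
    if node.step is not None:
        for parent in node.step.antecedents:
            yield from lineage(parent, depth + 1)


# Hypothetical names: the slide does not say which two datasets are used.
dataset_a = NodeSet("image_dataset_a")
dataset_b = NodeSet("image_dataset_b")
quicklook = NodeSet(
    "quicklook.gif",
    InferenceStep(rule="render quick look", engine="quicklook generator",
                  antecedents=[dataset_a, dataset_b]),
)

for depth, name in lineage(quicklook):
    print("  " * depth + name)
```

Because antecedents are themselves node sets, the same walk works for derivation chains of any depth, which is what lets an explanation service trace a published image back to its raw inputs.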
A PML-Enhanced Image
CHIP PML-Enhanced Quick-Look
CHIP Quick-Look
provenance
Integrated View
• Observer log’s information added into quicklook image’s provenance
Provenance aware
faceted search
Current Issues
• Successful interdisciplinary VO; needed provenance
• Successful provenance integration for experts; needs to
support more diverse audience
– As the user base diversifies, what updates are needed?
– Will a domain ontology for MLSO/NCAR-affiliated staff be
understandable by citizen scientists?... No
– How can our representational infrastructure be extended with
contextual information relevant to user needs? E.g., linking data
products from one part of the CHIP pipeline to specific solar
events or events at MLSO (such as reports of bad weather)
– Should provenance ontologies provide extensional capabilities
to include domain-informed extensions?... Yes
[1] Stephan Zednik, Peter Fox and Deborah L. McGuinness, “System Transparency, or How I Learned to Worry about Meaning and Love Provenance!” Proceedings of IPAW 2010
[2] James R. Michaelis, Li Ding, Zhenning Shangguan, Stephan Zednik, Rui Huang, Paulo Pinheiro da Silva, Nicholas Del Rio and Deborah L. McGuinness, “Towards Usable and Interoperable Workflow Provenance: Empirical Case Studies Using PML” Proceedings of SWPM 2009
[3] AGU 2010 papers with Fox et al., McGuinness et al., Zednik et al., West et al., Michaelis et al., …
Wine Agent – Semantic
Sommelier
Wine Agent for iPhone
• Client application which
talks to a SW service
• Make requests for dishes
and wines using auto-generated interfaces
• Make recommendations
to the system for others
Getting the
Recommendation
• Recommendations are
made up of two classes: 1
dish, 1 wine
• When the instance is
realized, the agent looks
up matching
recommendations and
returns the results
• Tapping a particular
recommendation causes
the wine agent to look for
pairings which match the
recommendation
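The lookup described above can be sketched in two steps: match recommendations against the classes a dish was realized as, then find wines that fit the chosen recommendation. This is a minimal sketch, not the Wine Agent's code. It assumes a reasoner has already computed the class memberships, and every class name, wine name, and function name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    dish_class: str   # a recommendation pairs exactly one dish class...
    wine_class: str   # ...with exactly one wine class


@dataclass(frozen=True)
class Wine:
    name: str
    classes: frozenset   # classes the wine instance belongs to (assumed precomputed)


def recommendations_for(dish_classes, recommendations):
    """Return the recommendations whose dish class the realized dish belongs to."""
    return [r for r in recommendations if r.dish_class in dish_classes]


def pairings_for(recommendation, wines):
    """Return the names of wines that belong to the recommended wine class."""
    return [w.name for w in wines if recommendation.wine_class in w.classes]


# Made-up knowledge base, for illustration only.
RECOMMENDATIONS = [
    Recommendation("Seafood", "DryWhiteWine"),
    Recommendation("RedMeat", "FullBodiedRedWine"),
]
CELLAR = [
    Wine("Example Chardonnay", frozenset({"WhiteWine", "DryWhiteWine"})),
    Wine("Example Cabernet", frozenset({"RedWine", "FullBodiedRedWine"})),
    Wine("Example Riesling", frozenset({"WhiteWine", "SweetWhiteWine"})),
]

# Suppose the user's dish was realized as both Seafood and Shellfish.
matches = recommendations_for({"Seafood", "Shellfish"}, RECOMMENDATIONS)
for rec in matches:
    print(rec.dish_class, "->", rec.wine_class, pairings_for(rec, CELLAR))
```

Keeping recommendations at the class level, rather than tying them to specific bottles, is what lets one user's suggestion apply to dishes and wines that other users add later.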
Our Position
System Transparency supports user understanding and trust
Our Research Goal: Provide interoperable infrastructure that
supports explanations of sources, assumptions, and answers
as an enabler for trust
Provenance Events
[Diagram labels: CSV2RDF, Archive, Enhance, SemDiff, visualize, derive, revision]
Challenges for Data Aggregators
(with Tim Lebo, Greg Williams)
Discussion
• Provenance is growing in acceptance, need,
and type
• Provenance data could easily dwarf other data
in volume
• Some interlinguas have emerged that see significant
usage, have shown significant value, and are ready
to be used (a standard is also likely from W3C)
• Interdisciplinary eScience and open data are
increasing the need and pace
Discussion II
A few trends we have observed:
– Techniques for supporting interaction with large diverse
communities are needed (we believe user annotation is one
such critical technique)
– Data aggregators face additional challenges if provenance is
not available… and may accelerate the demand for
provenance and provenance standards
– Getting back to the portion of the source used is critical for
some
– Tracking manipulations is critical for some
– Providing and creating provenance as part of a larger ecosystem is key
– Domain-specific extensions can be of value
Open (govt, science, etc) data (along with semantic web applications
with embedded information about knowledge provenance and term
meaning) is providing many new opportunities and will continue to
change our lives.
• Questions? dlm <at> cs <dot> rpi <dot> edu