
Foundations VI: Provenance
Deborah McGuinness
TA Weijing Chen
Semantic eScience
Week 9, October 31, 2011
References
• PML: McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular
Explanation Interlingua. AAAI 2007 Workshop on Explanation-aware Computing,
Vancouver, Can., 7/07. Stanford Tech report KSL-07-07.
http://www.ksl.stanford.edu/KSL_Abstracts/KSL-07-07.html
• Inference Web: McGuinness and Pinheiro da Silva. Explaining Answers from the
Semantic Web: The Inference Web Approach. Web Semantics: Science,
Services and Agents on the World Wide Web, Special issue: International
Semantic Web Conference 2003, edited by K. Sycara and J. Mylopoulos. Volume
1, Issue 4. Journal published Fall, 2004.
http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html
• McGuinness, D.L.; Zeng, H.; Pinheiro da Silva, P.; Ding, L.; Narayanan, D.;
Bhaowal, M. Investigations into Trust for Collaborative Information Repositories:
A Wikipedia Case Study. The Workshop on the Models of Trust for the Web
(MTW'06), Edinburgh, Scotland, May 22, 2006.
http://www.ksl.stanford.edu/KSL_Abstracts/KSL-06-05.html
• More from http://inference-web.org/wiki/Publications
Note the LOGD converter generates PML
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: development process cycle. Stages: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
Ingest/pipelines: problem definition
• Data is coming in faster, in greater volumes and outstripping our ability to perform
adequate quality control
• Data is being used in new ways and we frequently do not have sufficient
information on what happened to the data along the processing stages to
determine if it is suitable for a use we did not envision
• We often fail to capture, represent and propagate manually generated
information that needs to go with the data flows
• Each time we develop a new instrument, we develop a new data ingest
procedure and collect different metadata and organize it differently. It is then hard
to use with previous projects
• The task of event determination and feature classification is onerous and we
don't do it until after we get the data
20080602 Fox VSTO et al.
Use cases
• Who (person or program) added the comments
to the science data file for the best vignetted,
rectangular polarization brightness image from
January 26, 2005 1849:09UT taken by the
ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric
seeing conditions during the local morning of
January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21,
2008, 1900UT missing?
• Why does this image look bad?
Provenance
• Origin or source from which something
comes, intention for use, who/what it was generated
for, manner of manufacture, history of
subsequent owners, sense of place and time
of manufacture, production or discovery,
documented in detail sufficient to allow
reproducibility
• Knowledge provenance; enrich with
ontologies and ontology-aware tools
Semantic Technology Foundations
• PML – Proof Markup Language – used for
knowledge provenance interlingua
• Inference Web Toolkit – used to manipulate and
access knowledge provenance
• OWL-DL ontologies (including SWEET and VSTO
ontologies)
• PML: McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular Explanation Interlingua. AAAI 2007 Workshop on
Explanation-aware Computing, Vancouver, Can., 7/07. Stanford Tech report KSL-07-07.
• Inference Web: McGuinness and Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web
Approach. Web Semantics: Science, Services and Agents on the World Wide Web, Special issue: International Semantic
Web Conference 2003, edited by K. Sycara and J. Mylopoulos. Volume 1, Issue 4. Journal published Fall, 2004.
Inference Web Explanation Architecture
[Figure: Inference Web explanation architecture. Recoverable labels:
– Inputs: WWW; SDS (OWL-S/BPEL): trace of web service discovery; JTP/CWM (KIF/N3): theorem prover/rules; SPARK (SPARK-L): trace of task execution; UIMA (text analytics): trace of information extraction; Learners: learning conclusions
– Proof Markup Language (PML): Trust, Justification, Provenance
– Toolkit: IWTrust (trust computation); IW Explainer/Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration)]
• Semantic Web based infrastructure
• PML is an explanation interlingua
– Represent knowledge provenance (who, where, when…)
– Represent justifications and workflow traces across system boundaries
• Inference Web provides a toolkit for data management and
visualization
Global View and More
• Explanation as a graph
• Customizable browser options
– Proof style
– Sentence format
– Lens magnitude
– Lens width
• More information
– Provenance metadata
– Source PML
– Proof statistics
– Variable bindings
[Figure: Views of Explanation (in PML): global, focused, filtered, abstraction, discourse, trust, provenance]
Provenance View
• Source metadata: name, description, …
• Source-Usage metadata: which fragment of
a source has been used when
Trust View
• (preliminary) simple trust representation
• Provides colored (mouseable) view based on trust values
• Enables sharing and collaborative computation and propagation of trust values
[Figure: Trust Tab, with a detailed trust explanation and fragments colored by trust value]
Discourse View
• (Limited) natural language interface
• Mixed initiative dialogue
• Exemplified in CALO domain
• Explains task execution component powered by learned and human
generated procedures
Selected IW and PML Applications
• Portable proofs across reasoners: JTP (with temporal and
context reasoners) (Stanford), CWM (W3C), SNARK (SRI),
…
• Explaining web service composition and discovery (SNRC)
• Explaining information extraction (more emphasis on
provenance: KANI, UIMA)
• Explaining intelligence analysts’ tools (NIMD/KANI)
• Explaining task processing (SPARK / CALO)
• Explaining learned procedures (TAILOR, LAPDOG /
CALO)
• Explaining privacy policy law validation (TAMI)
• Explaining decision making and machine learning (GILA)
• Explaining trust in social collaborative networks (TrustTab)
• Registered knowledge provenance: IW Registrar
(Explainable Knowledge Aggregation)
• Explaining natural science provenance: VSTO, SPCDIS,
…
PML1 vs. PML2
• PML1 was introduced in 2002
– It has been used in multiple contexts ranging from
explaining theorem provers to text analytics to machine
learning.
– It was specified as a single ontology
• PML2 improves PML1 by
– Adopting a modular design: splitting the original ontology
into three pieces: provenance, justification, and trust
• This improves reusability, particularly for applications
that only need certain explanation aspects, such as
provenance or trust.
– Enhancing explanation vocabulary and structure
• Adding new concepts, e.g. information
• Refining explanation structure
PML Provenance Ontology
• Scope: annotating
provenance metadata
• Highlights
– Information
– Source Hierarchy
– Source Usage
Referencing, Encoding and
Annotating a Piece of Information
• Referencing a piece of information
– using URI
• Encoding the content of information
– Complete Quote:
<hasRawString>(type TonysSpecialty SHELLFISH) </hasRawString>
– Obtained from URL:
<hasURL>http://inferenceweb.org/ksl/registry/storage/documents/tonys_fact.kif</hasURL>
• Annotations
– For human consumption:
<hasPrettyString>Tonys’ Specialty is ShellFish</hasPrettyString>
– For machine consumption
• Language:
<hasLanguage rdf:resource="http://inference-web.org/registry/LG/KIF.owl#KIF" />
• Format:
<hasFormat rdf:resource="http://inference-web.org//registry/FM/PDF.owl#PDF" />
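The properties above can be assembled into one annotated piece of information. The following is a minimal sketch using only the Python standard library; the `pmlp` namespace URI, the element name `Information`, and the identifier `#info1` are assumptions made for illustration and are not given on the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace for the PML provenance ontology (not stated on the slide).
PMLP_NS = "http://inference-web.org/2.0/pml-provenance.owl#"


def build_information(uri, raw_string=None, url=None, pretty_string=None,
                      language_uri=None, format_uri=None):
    """Build an Information element carrying the slide's properties."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("pmlp", PMLP_NS)
    info = ET.Element(f"{{{PMLP_NS}}}Information", {f"{{{RDF_NS}}}about": uri})
    # Literal-valued properties: the content itself, where it was obtained,
    # and the human-readable rendering.
    for tag, text in (("hasRawString", raw_string),
                      ("hasURL", url),
                      ("hasPrettyString", pretty_string)):
        if text is not None:
            ET.SubElement(info, f"{{{PMLP_NS}}}{tag}").text = text
    # Resource-valued properties: machine-oriented language and format.
    for tag, resource in (("hasLanguage", language_uri),
                          ("hasFormat", format_uri)):
        if resource is not None:
            ET.SubElement(info, f"{{{PMLP_NS}}}{tag}",
                          {f"{{{RDF_NS}}}resource": resource})
    return info


info = build_information(
    "#info1",  # hypothetical identifier
    raw_string="(type TonysSpecialty SHELLFISH)",
    pretty_string="Tonys’ Specialty is ShellFish",
    language_uri="http://inference-web.org/registry/LG/KIF.owl#KIF",
)
xml_text = ET.tostring(info, encoding="unicode")
```

Printing `xml_text` shows the serialized element; a real PML document would wrap it in an `rdf:RDF` root.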
Source Hierarchy
• Source is the container of information
• Our source hierarchy offers
– Many well-known sources such as
• Sensor (e.g. geo-science)
• InferenceEngine (e.g. reasoner)
• WebService (e.g. workflow)
– Finer granularity of source than just document
• DocumentFragment (for text analytics)
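As a rough illustration of the hierarchy above, the sketch below models sources as Python classes. The class names follow the slide; the fields (`name`, `document`, `start`, `end`) and the choice to make every kind a direct subclass of `Source` are assumptions, not the ontology's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A container of information."""
    name: str


@dataclass
class Sensor(Source):
    pass


@dataclass
class InferenceEngine(Source):
    pass


@dataclass
class WebService(Source):
    pass


@dataclass
class Document(Source):
    pass


@dataclass
class DocumentFragment(Source):
    """A span inside a document (fields are hypothetical)."""
    document: Document
    start: int
    end: int


report = Document(name="report")  # hypothetical source
fragment = DocumentFragment(name="report-span", document=report, start=10, end=42)
```

Because a fragment is itself a source, a text analytics tool can attribute an extracted fact to the exact span it came from rather than to the whole document.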
Source Usage
• Source Usage
– logs the action that accesses a source at a
certain dateTime to retrieve information
– is part of PML1
• Example: Source #ST was accessed on
certain date
<pmlp:SourceUsage rdf:about="#usage1">
<pmlp:hasUsageDateTime>2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
<pmlp:hasSource rdf:resource="#ST"/>
</pmlp:SourceUsage>
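A consumer might read such a record as follows. This is a minimal sketch with the Python standard library; the enclosing `rdf:RDF` element and the `pmlp` namespace URI are assumptions added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace for the PML provenance ontology (not stated on the slide).
PMLP_NS = "http://inference-web.org/2.0/pml-provenance.owl#"

SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:pmlp="{PMLP_NS}">
  <pmlp:SourceUsage rdf:about="#usage1">
    <pmlp:hasUsageDateTime>2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
    <pmlp:hasSource rdf:resource="#ST"/>
  </pmlp:SourceUsage>
</rdf:RDF>"""


def read_source_usages(rdf_xml):
    """Return one dict per SourceUsage: its id, its source, and the access time."""
    root = ET.fromstring(rdf_xml)
    usages = []
    for usage in root.findall(f"{{{PMLP_NS}}}SourceUsage"):
        when = usage.findtext(f"{{{PMLP_NS}}}hasUsageDateTime")
        source = usage.find(f"{{{PMLP_NS}}}hasSource")
        usages.append({
            "usage": usage.get(f"{{{RDF_NS}}}about"),
            "source": source.get(f"{{{RDF_NS}}}resource"),
            # Replace the trailing "Z" so older Python versions can parse it.
            "when": datetime.fromisoformat(when.replace("Z", "+00:00")),
        })
    return usages


usages = read_source_usages(SNIPPET)
```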
PML Justification Ontology
• Scope: annotating
justification process
• Highlights
– Template for question-answer/justification
– Four types of justification
Four Types of Justification
• Goal: conclusion without justification
• Assumption: conclusion assumed (using Assumption rule), asserted by an
InferenceEngine, no antecedent
• Direct Assertion: conclusion directly asserted (using DirectAssertion rule)
by an InferenceEngine, no antecedent
• Regular: conclusion derived from antecedent conclusions
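The four cases can be told apart from the shape of a conclusion's justification. The sketch below is one possible reading, not the ontology's definition: the `InferenceStep` type and the rule identifiers `"Assumption"` and `"DirectAssertion"` are hypothetical stand-ins for the rules named on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceStep:
    """Hypothetical stand-in for one justification of a conclusion."""
    rule: str
    engine: Optional[str] = None
    antecedents: List[str] = field(default_factory=list)


def justification_type(step: Optional[InferenceStep]) -> str:
    """Classify a conclusion by how (or whether) it is justified."""
    if step is None:
        return "Goal"  # conclusion without justification
    if step.antecedents:
        return "Regular"  # derived from antecedent conclusions
    if step.rule == "Assumption":
        return "Assumption"
    if step.rule == "DirectAssertion":
        return "Direct Assertion"
    raise ValueError(f"rule {step.rule!r} needs at least one antecedent")


kinds = [
    justification_type(None),
    justification_type(InferenceStep("Assumption", engine="JTP")),
    justification_type(InferenceStep("DirectAssertion", engine="JTP")),
    justification_type(InferenceStep("ModusPonens", "JTP", ["#c1", "#c2"])),
]
```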
PML Trust Ontology
• Scope: annotate trust and
belief assertions
• Highlights
– Extensible trust representation
(user may plug in their
quantitative metrics using OWL
class inheritance feature)
– Has been used to provide a
trust tab filter for Wikipedia;
see McGuinness, Zeng, Pinheiro da
Silva, Ding, Narayanan, and Bhaowal.
Investigations into Trust for Collaborative
Information Repositories: A Wikipedia
Case Study. WWW2006 Workshop on the
Models of Trust for the Web (MTW'06),
Edinburgh, Scotland, May 22, 2006.
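The extensibility point above (plugging in a quantitative metric through class inheritance) can be pictured with ordinary subclassing. This is only an analogy in Python, not the OWL ontology itself: `TrustRepresentation`, `FloatTrust`, the [0, 1] range, and the color thresholds are all hypothetical.

```python
class TrustRepresentation:
    """Base class standing in for a generic trust representation."""

    def describe(self) -> str:
        raise NotImplementedError


class FloatTrust(TrustRepresentation):
    """Hypothetical user-supplied metric: a trust value in [0, 1]."""

    def __init__(self, value: float):
        if not 0.0 <= value <= 1.0:
            raise ValueError("trust value must be in [0, 1]")
        self.value = value

    def describe(self) -> str:
        return f"float trust {self.value:.2f}"


def trust_color(trust: FloatTrust) -> str:
    """Map a trust value to a display color (thresholds are hypothetical)."""
    if trust.value >= 0.75:
        return "green"
    if trust.value >= 0.4:
        return "yellow"
    return "red"


colors = [trust_color(FloatTrust(v)) for v in (0.9, 0.5, 0.2)]
```

A view that colors text fragments by trust value, as in the trust tab, would apply a mapping like `trust_color` to each fragment.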
Quick look browse
Visual browse
Search and structured query
[Figure labels: Search; Structured Query]
Search
Provenance within
SemantAqua
SemantAqua System Architecture
[Figure: system architecture diagram; recoverable labels: Virtuoso, access]
Provenance
• Preserves provenance in the Proof Markup
Language (PML).
• Data Source Level Provenance:
– The captured provenance data are used to support
provenance-based queries.
• Reasoning level provenance:
– When a water source has been marked as polluted, the user can
access the supporting provenance data for the
explanation, including the URLs of the source data,
intermediate data, converted data, and regulatory
data.
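One way to picture the reasoning-level lookup is a walk back through "derived from" links, collecting the URL of every supporting artifact. The sketch below is illustrative only: the artifact names, the dependency table, and the `example.org` URLs are hypothetical and do not describe SemantAqua's actual data or implementation.

```python
# Hypothetical "derived from" links between artifacts.
DERIVED_FROM = {
    "pollution-flag": ["converted-data", "regulatory-data"],
    "converted-data": ["intermediate-data"],
    "intermediate-data": ["source-data"],
}

# Hypothetical locations of each artifact.
URLS = {
    name: f"http://example.org/{name}"
    for name in ("converted-data", "regulatory-data",
                 "intermediate-data", "source-data")
}


def supporting_urls(conclusion):
    """Collect URLs of everything the conclusion was derived from."""
    seen, stack, urls = set(), [conclusion], []
    while stack:
        node = stack.pop()
        for parent in DERIVED_FROM.get(node, []):
            if parent not in seen:
                seen.add(parent)
                urls.append(URLS[parent])
                stack.append(parent)
    return urls


explanation = supporting_urls("pollution-flag")
```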
Visualization
http://tinyurl.com/iswc-swqp
Visualization
• Time series visualization:
– Presents data as a time series
for the user to explore and analyze
Violation, measured value: 2032.8
Violation, measured value: 971
Limit value: 400
http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=1&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110009444869
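The violation labels in the time series come down to comparing each measurement with the regulatory limit. A minimal sketch, using the two measured values and the limit shown on the slide; the sample labels and the third, compliant value are hypothetical, and the slide does not state units.

```python
def find_violations(measurements, limit):
    """Return the (label, value) pairs whose value exceeds the limit."""
    return [(label, value) for label, value in measurements if value > limit]


LIMIT = 400  # limit value shown on the slide
measurements = [
    ("sample-1", 2032.8),  # measured value from the slide
    ("sample-2", 971),     # measured value from the slide
    ("sample-3", 350),     # hypothetical compliant value
]

violations = find_violations(measurements, LIMIT)
```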
Selected Results
• Provenance information encoded using semantic
web technology supports transparency and trust.
• SemantAqua provides detailed provenance information:
– Original data, intermediate data, data source
• “What if” Scenario:
– User can apply a stricter regulation from another state to a local
water source.
• User may be interested only in certain sources and can
use the interface to control queries
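The "what if" scenario amounts to re-evaluating the same measurement against a different regulation. The sketch below assumes a regulation can be reduced to a single numeric limit; the regulation names, both limits, and the measured value are hypothetical.

```python
def what_if(measured_value, regulations):
    """Report, per regulation, whether the measurement would be a violation."""
    return {name: measured_value > limit for name, limit in regulations.items()}


# Hypothetical limits: the local rule and a stricter rule from another state.
regulations = {"local": 400.0, "stricter_other_state": 300.0}

result = what_if(350.0, regulations)  # hypothetical measurement
```

Here the same water source passes the local rule but would be flagged under the stricter one.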
Aim at providing at least as
much provenance as
SemantAqua
• Questions?