Provenir ontology: Towards a Framework for eScience Provenance Management
Satya S. Sahoo, Amit P. Sheth
Kno.e.sis Center,
Wright State University
Microsoft eScience Workshop 2009
Pittsburgh, Oct 16
Outline
• Provenance: A Tale of Two Use Cases
• Provenance Ontologies: A Modular Approach
• Provenir: A Foundational Model of Provenance
• Provenance Query Infrastructure
• Application to Parasite Research
Provenance in GlycoProtein Analysis
[Figure: glycoprotein analysis workflow. Labels recoverable from the diagram: Cell Culture; extract; Glycoprotein Fraction; proteolysis (Proteolytic enzyme); Glycopeptides Fraction; Separation technique I; Glycopeptides Fraction; PNGase; Peptide Fraction; Separation technique II; Peptide Fraction; Mass spectrometry; ms data and ms/ms data; Data reduction; ms peaklist and ms/ms peaklist; binning; N-dimensional array; Signal integration; Peptide identification; Peptide list; Parent protein and peptide list; Data correlation. Cardinality annotations 1, n, and n*m appear on the fractionation steps.]
Provenance in Parasite Research
• Provenance, from the French word “provenir”, describes the lineage or history of a data entity
• For Verification and Validation of Data Integrity, Process Quality, and Trust
• Issues in Provenance Management
o Interoperability
o Consistent Modeling
o Reduce Terminological Heterogeneity
[Figure: Gene Knockout and Strain Creation*. Labels recoverable from the diagram: Gene Name; Sequence Extraction; Drug Resistant Plasmid; 3' & 5' Region; Plasmid Construction; Knockout Construct Plasmid; T.cruzi sample; Transfection; Transfected Sample; Drug Selection; Selected Sample; Cell Cloning; Cloned Sample.]
*T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
Ontologies for Provenance Modeling
• Advantages of using Ontologies
o Formal Description: Machine Readability, Consistent Interpretation
o Use Reasoning: Knowledge Discovery over Large Datasets
• Problem: A single gigantic, monolithic provenance ontology is not feasible
• Solution: Modular Approach using a Foundational Ontology
[Figure: a FOUNDATIONAL ONTOLOGY shown together with domain modules labeled PARASITE EXPERIMENT, GLYCOPROTEIN EXPERIMENT, and OCEANOGRAPHY.]
Provenir Ontology
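[Figure: the Gene Knockout and Strain Creation workflow (Gene Name; Sequence Extraction; Drug Resistant Plasmid; 3' & 5' Region; Plasmid Construction; Knockout Construct Plasmid; T.cruzi sample; Transfection; Transfected Sample; Drug Selection; Selected Sample; Cell Cloning; Cloned Sample) annotated with the Provenir labels DATA, PROCESS, and AGENT. The labels has_agent and Transfection Machine also appear next to the Transfection step.]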
Provenir Ontology Schema
[Figure: Provenir ontology schema. Classes shown: DATA, PROCESS, AGENT, PARAMETER, DATA COLLECTION, and SPATIAL, THEMATIC, TEMPORAL. Relations shown: is_a and has_agent.]
Domain-specific Provenance: Parasite Experiment ontology
[Figure: the PARASITE EXPERIMENT ONTOLOGY extending the PROVENIR ONTOLOGY. Provenir classes shown: agent, data, process, parameter, data_collection, spatial_parameter, temporal_parameter, domain_parameter. Domain-level labels shown: transfection_machine, drug_selection, location, sample, transfection, cell_cloning, strain_creation_protocol, Time:DateTime, Description, transfection_buffer, Tcruzi_sample. Relations shown: is_a, has_agent, has_participant, has_parameter.]
*Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia
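The modular idea (domain classes attached to Provenir classes through is_a) can be sketched in a few lines of Python. This is only an illustration: the individual is_a edges cannot be recovered from the slide, so every child-to-parent pair below is an assumption, and the helper names `ancestors` and `is_subclass_of` are hypothetical.

```python
# Assumed is_a edges (child -> parent). The class names come from the slide,
# but which class sits under which parent is NOT recoverable from it, so
# every pair here is an illustrative assumption.
IS_A = {
    # Foundational (Provenir) level
    "data_collection": "data",
    "parameter": "data",
    "spatial_parameter": "parameter",
    "temporal_parameter": "parameter",
    "domain_parameter": "parameter",
    # Domain (Parasite Experiment) level
    "transfection_machine": "agent",
    "transfection": "process",
    "drug_selection": "process",
    "cell_cloning": "process",
    "sample": "data",
    "Tcruzi_sample": "sample",
    "location": "spatial_parameter",
    "transfection_buffer": "domain_parameter",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first, by walking is_a."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, parent):
    """True if cls equals parent or reaches it through is_a edges."""
    return cls == parent or parent in ancestors(cls)
```

With a lookup of this kind, a query phrased against a foundational class such as `process` could also match domain classes such as `transfection` without naming them, which is the practical benefit of the modular approach.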
Trident Ontology for Oceanography
Provenance Query Classification
Provenance queries are classified into three categories:
• Type 1: Querying for Provenance Metadata
o Example: Which gene was used to create the cloned sample with ID = 65?
• Type 2: Querying for Specific Data Set
o Example: Find all knockout construct plasmids created by researcher Michelle using “Hygromycin” drug resistant plasmid between April 25, 2008 and August 15, 2008
• Type 3: Operations on Provenance Metadata
o Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information.
Provenance Query Operators
Four Query Operators, based on the Query Classification
• provenance() - Closure operation; returns the complete set of provenance metadata for the input data entity
• provenance_context() - Given a set of constraints defined on provenance, retrieves the datasets that satisfy the constraints
• provenance_compare() - Compares two sets of provenance information by adapting the RDF graph equivalence definition
• provenance_merge() - Combines two sets of provenance information using the RDF graph merge
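The four operators can be sketched over a toy set of triples. This is a simplified illustration, not the query engine itself: the property names (`derived_from`, `created_by`), the identifiers, and the function signatures are all assumptions. The graphs here contain no blank nodes, so equivalence reduces to set equality and merge reduces to set union.

```python
# Toy provenance graph as (subject, predicate, object) triples.
# All identifiers and property names are hypothetical.
G = {
    ("cloned_sample_65", "derived_from", "selected_sample_65"),
    ("selected_sample_65", "derived_from", "transfected_sample_65"),
    ("transfected_sample_65", "derived_from", "knockout_plasmid_7"),
    ("knockout_plasmid_7", "derived_from", "gene_X"),
    ("knockout_plasmid_7", "created_by", "Michelle"),
    ("cloned_sample_46", "derived_from", "selected_sample_46"),
}


def provenance(graph, entity):
    """Closure: every triple reachable by following edges out of entity."""
    seen, frontier, result = {entity}, [entity], set()
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append(o)
    return result


def provenance_context(graph, entities, constraints):
    """Entities whose provenance contains every (predicate, object) pair."""
    matches = []
    for entity in entities:
        pairs = {(p, o) for _, p, o in provenance(graph, entity)}
        if all(c in pairs for c in constraints):
            matches.append(entity)
    return matches


def provenance_compare(a, b):
    """Ground graphs (no blank nodes) are equivalent iff they are equal."""
    return a == b


def provenance_merge(a, b):
    """The merge of two ground graphs is the union of their triples."""
    return a | b
```

For example, `provenance(G, "cloned_sample_65")` walks back from the cloned sample to the gene, which corresponds to the Type 1 query, and `provenance_context(G, [...], [("created_by", "Michelle")])` filters entities by a constraint on their provenance, as in the Type 2 query. Graphs with blank nodes would need a proper isomorphism check and blank-node renaming instead of plain set operations.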
Provenance Query Engine Architecture
• Available as API for integration with provenance management systems
• Input:
o Type of provenance query operator: provenance()
o Input value to query operator: cloned sample 65
o User details to connect to underlying Oracle RDF store
[Figure: query engine architecture; labeled components include QUERY OPTIMIZER and TRANSITIVE CLOSURE.]
T.cruzi SPSE Provenance Management System
Conclusions
• Provenir ontology as a foundational model for provenance
• Extensible to model domain-specific provenance
o Parasite Experiment ontology
o Trident ontology
o ProPreO ontology
• Query Infrastructure to support provenance modeled using Provenir ontology
• Application in an NIH-funded project for Parasite Research
Acknowledgement
• Roger Barga – Microsoft Research, eScience
• D. Brent Weatherly – Center for Tropical and Emerging Diseases, University of Georgia
• Flora Logan – The Wellcome Trust Sanger Institute, Cambridge, UK
• Raghava Mutharaju – Kno.e.sis Center, Wright State University
• Pramod Anantharam – Kno.e.sis Center, Wright State University
References
• Provenir ontology: http://wiki.knoesis.org/index.php/Provenir_Ontology
• Provenance Management in Parasite Research: http://knoesis.wright.edu/library/resource.php?id=00712
• Provenance Management Framework: http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
• T.cruzi Semantic Problem Solving Environment: http://knoesis.wright.edu/research/semsci/application_domain/sem_life_sci/tcruzi_pse/