OMII f2f Meeting, London, 19-20/4/06

myGrid/Taverna Provenance
Daniele Turi
University of Manchester
OMII f2f Meeting, London, 19-20/4/06
Components
• Identifiers
– LSIDs
• Data
– JDBC data store
• Metadata
– RDF Provenance Plugin
• Browsing
– Provenance Browser Plugin
• Security
– Under development
LSID
LSID: Life Science Identifier
• URN specification in progress
• 5 part identifier (with optional version id)
– urn:lsid:www.mygrid.org.uk:lsdocument:X1234
– urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:7717376
• protocol for retrieving data and metadata
about an object
• commitment by the provider to always return
the same data for an ID
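As an illustration of the five-part structure, here is a minimal Python sketch that splits an LSID into authority, namespace, object and optional version. The names `LSID` and `parse_lsid` are hypothetical and not part of myGrid or Taverna; the sketch checks only the overall shape of the identifier and does not contact any resolver.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The variable parts of an LSID (the 'urn' and 'lsid' prefixes are fixed)."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its parts; raise ValueError if it is malformed."""
    parts = text.split(":")
    # Five mandatory parts, plus an optional sixth (the version id).
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    urn, scheme, authority, namespace, object_id = parts[:5]
    if urn.lower() != "urn" or scheme.lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in {text!r}")
    version = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, version)


example = parse_lsid("urn:lsid:www.mygrid.org.uk:lsdocument:X1234")
print(example.authority, example.namespace, example.object_id)
```

Treating the version as optional mirrors the "5 part identifier (with optional version id)" wording above.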
LSID (ctd)
• Issue
– LSID Authorities
• Resolution
– LSID Resolvers
• Examples
– myGrid
– Long Term Ecological Research Network
– BioPathways Consortium
LSID (ctd 2)
• abstraction
• lightweight
• independent from actual storage implementation
– database
– file system
– application
• both for private and public data sources
Data
Data Storage (current)
• Taverna can persist inputs, outputs and
intermediate results in an SQL database
via JDBC
• Persistence is optional; it is enabled by configuring a
Baclava Data Store
• Allows the LSIDs of data items to be
resolved against the actual data
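The idea of resolving an LSID against stored data can be sketched as a table keyed by LSID. This is an illustration only: Python's built-in `sqlite3` stands in for the JDBC-backed store, and the table `data_items` and the functions `open_store`, `store` and `resolve` are hypothetical, not the Baclava Data Store schema or API.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store holding one row per LSID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_items ("
        "lsid TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, lsid: str, content: bytes) -> None:
    """Persist a data item under its LSID."""
    # A plain INSERT: writing a second item under the same LSID raises
    # sqlite3.IntegrityError, so an LSID keeps naming the same data.
    with conn:
        conn.execute(
            "INSERT INTO data_items (lsid, content) VALUES (?, ?)",
            (lsid, content),
        )


def resolve(conn: sqlite3.Connection, lsid: str) -> bytes:
    """Return the data stored for an LSID, or raise KeyError."""
    row = conn.execute(
        "SELECT content FROM data_items WHERE lsid = ?", (lsid,)
    ).fetchone()
    if row is None:
        raise KeyError(lsid)
    return row[0]


conn = open_store()
store(conn, "urn:lsid:www.mygrid.org.uk:lsdocument:X1234", b"ACGT")
print(resolve(conn, "urn:lsid:www.mygrid.org.uk:lsdocument:X1234"))
```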
Data Storage (future)
• Domain-specific databases
– use outside myGrid
• Develop:
– Taverna processor for JDBC/OGSA-DAI
– associated interface (cf BioMart)
• Users will be able to study the contents of an
existing database and:
– write queries that extract data from the database,
where the query may be parameterised with values
passed in from the workflow;
– write requests that insert data from the workflow into a
named table in the database.
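The two planned operations, a parameterised query and an insert into a named table, might look roughly as follows. This is a sketch under assumptions: `sqlite3` stands in for JDBC/OGSA-DAI, and `run_query`, `insert_row` and the `genes` table are hypothetical names, not the planned processor's interface.

```python
import re
import sqlite3

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def run_query(conn, sql, params):
    """Run a user-written query; workflow values are bound, never spliced in."""
    return conn.execute(sql, params).fetchall()


def insert_row(conn, table, row):
    """Insert one row of workflow data (column name -> value) into a named table."""
    if not row:
        raise ValueError("nothing to insert")
    # Table and column names cannot be bound as parameters, so they are
    # validated before being placed in the SQL text.
    for name in (table, *row):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    columns = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    with conn:
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({marks})",
            tuple(row.values()),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (name TEXT, organism TEXT)")
insert_row(conn, "genes", {"name": "BRCA2", "organism": "human"})
rows = run_query(
    conn,
    "SELECT name FROM genes WHERE organism = :organism",
    {"organism": "human"},  # value passed in from the workflow
)
print(rows)
```

Binding values as parameters keeps workflow data out of the SQL text; only the table and column names need separate validation.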
Metadata
Metadata Generation
• Taverna Provenance Plugin
• Listen to Taverna Events
– WorkflowEventListener
• Faithfully record them as ontological
instance data
– RDF graphs (one for each Taverna run)
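The listener pattern can be sketched in a few lines of Python. Everything here is hypothetical: the class `ProvenanceRecorder`, its method names, and the shortened identifiers are illustrative stand-ins, not the Taverna `WorkflowEventListener` interface or the plugin's real vocabulary. The direction of the `runs` and `executed` edges is also an assumption.

```python
from collections import defaultdict


class ProvenanceRecorder:
    """Records workflow events as triples, one graph (set of triples) per run."""

    def __init__(self):
        # Maps a run identifier to the set of triples recorded for that run.
        self.graphs = defaultdict(set)

    def workflow_started(self, run, workflow):
        # Assumed direction: the run instance 'runs' the workflow definition.
        self.graphs[run].add((run, "runs", workflow))

    def process_completed(self, run, process_run):
        # Assumed direction: the run instance 'executed' the process run.
        self.graphs[run].add((run, "executed", process_run))


recorder = ProvenanceRecorder()
recorder.workflow_started("wfInstance:8", "workflow:6")
recorder.process_completed("wfInstance:8", "processRun:84")
recorder.process_completed("wfInstance:8", "processRun:51")
print(sorted(recorder.graphs["wfInstance:8"]))
```

Keeping one set of triples per run corresponds to the "one graph for each Taverna run" point above.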
Metadata
• Representation
• Ontology (Schema)
• Storage
• Query
• Browsing
Representation
• RDF
– triples
• subject –predicate object
– URIs (hence easy data integration)
– semantic web language
– XML serialization
– flexible, powerful
– sets of triples give rise to graphs
Workflow Run
[Figure: RDF graph of a single workflow run]
• Nodes: urn:lsid:…:workflow:6, urn:lsid:…:org:HY7, urn:lsid:..:wfInstance:8, urn:lsid:…:person:4, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51
• Edge labels: runs, launchedBy, executed (twice), belongsTo
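The workflow-run graph can be written down as plain subject, predicate, object triples. In the Python sketch below the identifiers are copied as they appear on the slide, elisions included. The edge directions are an assumption read off the labels, since the transcript does not preserve the arrows. `to_graph` and `objects` are hypothetical helper names.

```python
from collections import defaultdict

RUN = "urn:lsid:..:wfInstance:8"
PERSON = "urn:lsid:…:person:4"

# Assumed edge directions; the slide transcript does not preserve them.
TRIPLES = [
    (RUN, "runs", "urn:lsid:…:workflow:6"),
    (RUN, "launchedBy", PERSON),
    (RUN, "executed", "urn:lsid:…:processRun:84"),
    (RUN, "executed", "urn:lsid:…:processRun:51"),
    (PERSON, "belongsTo", "urn:lsid:…:org:HY7"),
]


def to_graph(triples):
    """Index triples by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def objects(graph, subject, predicate):
    """All objects reachable from subject along edges labelled predicate."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


graph = to_graph(TRIPLES)
print(objects(graph, RUN, "executed"))
```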
Schema
• Ontology
– RDF schema
• Taxonomic inferences
– also available as OWL
• opens it up to complex reasoning
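A taxonomic inference in the RDF Schema sense follows subclass links: an instance of a class is also an instance of every superclass. The sketch below illustrates this in Python. The hierarchy is hypothetical (the superclasses `Person` and `Agent` are invented for illustration and are not taken from the provenance ontology), and it assumes at most one direct superclass per class, which RDFS itself does not require.

```python
# Hypothetical subclass links: class -> its direct superclass.
SUBCLASS_OF = {
    "Experimenter": "Person",
    "Person": "Agent",
}


def superclasses(cls, subclass_of):
    """Follow subclass links upwards, stopping if a cycle is detected."""
    found = []
    while cls in subclass_of and subclass_of[cls] not in found:
        cls = subclass_of[cls]
        found.append(cls)
    return found


def inferred_types(asserted_type, subclass_of):
    """The asserted type plus every type implied by the taxonomy."""
    return [asserted_type, *superclasses(asserted_type, subclass_of)]


print(inferred_types("Experimenter", SUBCLASS_OF))
```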
Typed Workflow Run
[Figure: the workflow-run RDF graph with its nodes typed by Provenance Ontology classes]
• Ontology classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization
• Properties: runs, launchedBy, executed, belongsTo
• Instances: urn:lsid:…:workflow:6, urn:lsid:…:org:HY7, urn:lsid:..:wfInstance:8, urn:lsid:…:person:4, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51
Storage
• Named RDF graphs
– retrieve whole graphs (e.g. workflows)
– implementation in
• NG4J (Jena + MySQL)
– scalability issues
• Sesame2 native store
– scalable
– Java 5
Query
• RDF query languages
– TriQL, SeRQL, SPARQL
• query languages for named RDF graphs
• Ontology inspection/reasoning
• Canned Queries
– workflows with failed processes
– input/output of past process runs
– workflows with data changed by user
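As an illustration of a canned query, the first one ("workflows with failed processes") can be sketched in Python over named graphs held in memory. This is not TriQL, SeRQL or SPARQL. The `status` predicate, the `failed` value, the shortened identifiers and the edge directions are all assumptions made for the example.

```python
def workflows_with_failed_processes(graphs):
    """Return the workflows for which some run executed a failed process.

    graphs maps a graph name to a set of (subject, predicate, object) triples.
    """
    result = set()
    for triples in graphs.values():
        failed = {s for s, p, o in triples if p == "status" and o == "failed"}
        bad_runs = {s for s, p, o in triples if p == "executed" and o in failed}
        result |= {o for s, p, o in triples if p == "runs" and s in bad_runs}
    return sorted(result)


# Hypothetical data: two named graphs, one per workflow run.
GRAPHS = {
    "graph:run8": {
        ("wfInstance:8", "runs", "workflow:6"),
        ("wfInstance:8", "executed", "processRun:84"),
        ("processRun:84", "status", "failed"),
    },
    "graph:run9": {
        ("wfInstance:9", "runs", "workflow:7"),
        ("wfInstance:9", "executed", "processRun:90"),
        ("processRun:90", "status", "completed"),
    },
}

print(workflows_with_failed_processes(GRAPHS))
```

Because each run lives in its own named graph, the query can be evaluated graph by graph without joining across runs.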
Browsing
Provenance Browsing
• Provenance Browser Plugin
– reusing Taverna GUI components
• Matthew Gamble
Analysis
Provenance Analysis
• Comparison
• Aggregation
• etc
– see work by Jun Zhao
Security
• User sends LSID ref and credentials to the Access Point
• Access Point returns data and metadata or denies
access as follows:
– credentials are passed to a User Directory
– User Directory passes the corresponding user to the
Authorization Authority
– Authorization Authority returns the user attributes in the form of a
(possibly signed) SAML assertion
– this assertion, together with the LSID and its corresponding
metadata, is passed to the Policy Enforcement Point (PEP)
– PEP uses these three inputs to form an XACML request that is
passed to a Policy Decision Point (PDP) that is preloaded with
an XACML Policy Set.
– PDP evaluates the request against its policy set and returns an
XACML response to PEP
– PEP decodes the response and either allows data/metadata to
be returned to the user or denies access.
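The flow above can be sketched as a single function with each component passed in as a callable or lookup table. This is a sketch under stated assumptions: plain dictionaries stand in for the SAML assertion and the XACML request and response, and `access_point` and every stub below are hypothetical, not the myGrid implementation.

```python
def access_point(lsid, credentials, *, user_directory, authorization_authority,
                 pdp, metadata_store, data_store):
    """Return (data, metadata) for lsid, or raise PermissionError."""
    # Credentials are passed to the User Directory.
    user = user_directory(credentials)
    if user is None:
        raise PermissionError("unknown credentials")
    # Stand-in for the (possibly signed) SAML assertion: a dict of attributes.
    assertion = authorization_authority(user)
    metadata = metadata_store.get(lsid, {})
    # PEP: combine assertion, LSID and metadata into one request for the PDP.
    request = {"subject": assertion, "resource": {"lsid": lsid, **metadata}}
    # PEP: decode the PDP response and enforce it.
    if pdp(request) != "Permit":
        raise PermissionError(f"access to {lsid} denied")
    return data_store[lsid], metadata


# Hypothetical stubs for a demonstration run.
DOC = "urn:lsid:www.mygrid.org.uk:lsdocument:X1234"
components = dict(
    user_directory={"secret-token": "alice"}.get,
    authorization_authority=lambda user: {"id": user, "role": "supervisor"},
    pdp=lambda request: (
        "Permit" if request["subject"]["role"] == "supervisor" else "Deny"
    ),
    metadata_store={DOC: {"owner": "bob"}},
    data_store={DOC: b"workflow output"},
)

data, metadata = access_point(DOC, "secret-token", **components)
print(data, metadata)
```

Passing the components in as parameters keeps the enforcement logic separate from the policy, so the PDP can be swapped without touching the access point.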
myGrid XACML Policy
• Scenario
– supervisors can access all workflows in the
organization
– students can access only their own workflows
– blacklisted users cannot access anything
• See policySet.xml on myGrid wiki
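The three rules of the scenario can be expressed as a small decision function. This Python sketch is not the content of policySet.xml: the attribute names (`role`, `organization`, `id`, `owner`, `blacklisted`) are hypothetical, and it assumes that the blacklist rule overrides the others and that anything not explicitly permitted is denied.

```python
def decide(subject, resource):
    """Return "Permit" or "Deny" for a subject requesting a workflow."""
    # Assumed precedence: a blacklisted user is denied regardless of role.
    if subject.get("blacklisted"):
        return "Deny"
    role = subject.get("role")
    # Supervisors can access all workflows in their organization.
    organization = subject.get("organization")
    if (role == "supervisor" and organization
            and organization == resource.get("organization")):
        return "Permit"
    # Students can access only their own workflows.
    user_id = subject.get("id")
    if role == "student" and user_id and user_id == resource.get("owner"):
        return "Permit"
    # Default: deny anything no rule permits.
    return "Deny"


workflow = {"owner": "sam", "organization": "HY7"}
print(decide({"id": "pat", "role": "supervisor", "organization": "HY7"}, workflow))
print(decide({"id": "kim", "role": "student", "organization": "HY7"}, workflow))
```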