Towards a Model of Provenance and User Views In Scientific


Modeling Provenance through User Views
Sarah Cohen-Boulakia
Shirley Cohen
Susan Davidson
Thunyarat (Bam) Amornpetchkul
Olivier Biton
Database group, University of Pennsylvania
Provenance Challenge, Sept. 2006
Our approach

- Model of provenance
  - Based on a study of user requirements (CIPRES)
  - Based on careful studies of workflow systems (Kepler, MyGrid, Chimera): minimal information to reason about provenance
  - No workflow system is proposed
- User views
  - Capability of workflow systems to group steps (forming boxes) and to zoom into boxes
  - Multi-granularity levels of provenance
- Implemented in Oracle 10g and Java
  - Relational framework augmented with transitive closure
  - Java/Spring/JDBC: object layer and user interface
Workflow Representation

[Figure: workflow fragment labeled "input data", "reslice: step-class", "8.reslice: step", "output data"]

Terminology
- Step-classes (static)
- An execution of a workflow generates a partial order of steps (dynamic)
  - Steps are instances of step-classes
- Each step has input and output data
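
The terminology above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation (which uses Oracle and Java): the class names `StepClass` and `Step` are hypothetical, and the data items attached to step `8.reslice` are assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StepClass:
    """Static description of a kind of step, e.g. 'reslice'."""
    name: str


@dataclass
class Step:
    """Dynamic instance of a step-class, created by one workflow execution."""
    step_id: str
    step_class: StepClass
    inputs: list = field(default_factory=list)   # names of data items read
    outputs: list = field(default_factory=list)  # names of data items written


reslice = StepClass("reslice")

# Step '8.reslice' is one instance of the step-class 'reslice'.
# The input and output names below are assumed for illustration.
step8 = Step("8.reslice", reslice,
             inputs=["Warp param1"], outputs=["Resliced Image1"])
```

Keeping the step-class separate from the step mirrors the static/dynamic split: the workflow definition refers to step-classes, while a provenance trace refers to steps.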
Provenance Trace

Base tables
- Data(dataid, name, type), DataAttributes(dataid, attribute, value)
  - Data(1, Anatomy Image1, Anatomy Image)
  - DataAttributes(1, center, UChicago), i.e. Center=UChicago
- InstanceOf(Step, Step-Class, ts), StepParams(step, attribute, value), StageInstance(step, stage)
- Input(stepId, dataId, ts) / Output(stepId, dataId, ts)
  - stepId takes as input / produces dataId at time ts

Views
- Process(stepId, stepClass, input, output, time)
- …
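
As a rough illustration of how a Process view could be derived from the base tables, here is a sketch using Python's built-in sqlite3 module rather than Oracle. The view definition is an assumption (the slides do not give it): it pairs every input of a step with every output of the same step. The step identifiers, timestamps, and the second and third data rows are hypothetical sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data(dataid INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE InstanceOf(step TEXT, stepClass TEXT, ts INTEGER);
CREATE TABLE Input(stepId TEXT, dataId INTEGER, ts INTEGER);
CREATE TABLE Output(stepId TEXT, dataId INTEGER, ts INTEGER);

-- Assumed definition: one row per (input, output) pair of the same step.
CREATE VIEW Process AS
SELECT i.stepId AS stepId, c.stepClass AS stepClass,
       i.dataId AS input, o.dataId AS output, o.ts AS "time"
FROM Input i
JOIN Output o ON o.stepId = i.stepId
JOIN InstanceOf c ON c.step = i.stepId;

-- Hypothetical sample trace: align_warp feeds reslice.
INSERT INTO Data VALUES (1, 'Anatomy Image1', 'Anatomy Image');
INSERT INTO Data VALUES (2, 'Warp param1', 'Warp Params');
INSERT INTO Data VALUES (3, 'Resliced Image1', 'Resliced Image');
INSERT INTO InstanceOf VALUES ('1.align_warp', 'align_warp', 10);
INSERT INTO InstanceOf VALUES ('8.reslice', 'reslice', 12);
INSERT INTO Input VALUES ('1.align_warp', 1, 10);
INSERT INTO Output VALUES ('1.align_warp', 2, 11);
INSERT INTO Input VALUES ('8.reslice', 2, 12);
INSERT INTO Output VALUES ('8.reslice', 3, 13);
""")

rows = conn.execute(
    "SELECT stepId, stepClass, input, output FROM Process ORDER BY stepId"
).fetchall()
```

Each row of `rows` links one input data item to one output data item through a step, which is the shape the recursive provenance queries traverse.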
Provenance Queries
Q1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is

-- Column and table names follow the Process view and Data table from the Provenance Trace slide.
SELECT DISTINCT stepId, stepClass, input, output
FROM Process
START WITH output = (
  SELECT dataid FROM Data WHERE name = 'Atlas X Graphic'
)
CONNECT BY PRIOR input = output  -- move from a row to the rows that produced its input
ORDER BY stepId;

CONNECT BY implements transitive closure, which is necessary to return all the data used (recursively) to compute Atlas X Graphic.
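
The transitive closure that the hierarchical query computes can also be sketched outside the database. The following Python function is a minimal illustration, not the system's code: it walks backwards over Process-like rows with a breadth-first search. The function name `deep_provenance` and the sample rows (step identifiers and data names) are hypothetical.

```python
from collections import deque


def deep_provenance(process_rows, target):
    """Return the rows reachable backwards from `target`, sorted by step.

    process_rows: iterable of (step, step_class, input, output) tuples.
    """
    # Index rows by the data item they produce.
    by_output = {}
    for row in process_rows:
        by_output.setdefault(row[3], []).append(row)

    seen_data = {target}
    result = []
    queue = deque([target])
    while queue:
        data = queue.popleft()
        for row in by_output.get(data, []):
            result.append(row)
            if row[2] not in seen_data:   # follow the input one level deeper
                seen_data.add(row[2])
                queue.append(row[2])
    return sorted(set(result))            # DISTINCT ... ORDER BY step


# Hypothetical trace; the last row belongs to an unrelated branch.
rows = [
    ("1.reslice", "reslice", "Warp param1", "Resliced Image1"),
    ("2.softmean", "softmean", "Resliced Image1", "Atlas Image"),
    ("3.slicer", "slicer", "Atlas Image", "Atlas X Slice"),
    ("4.convert", "convert", "Atlas X Slice", "Atlas X Graphic"),
    ("5.convert", "convert", "Atlas Y Slice", "Atlas Y Graphic"),
]
trace = deep_provenance(rows, "Atlas X Graphic")
steps = [row[0] for row in trace]
```

Stopping after the first level of the loop would give only immediate provenance; the queue is what makes the answer deep (recursive).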
Provenance Queries (Cont.)

- All the queries can be answered by our system
  - Code available on TWiki
- Using SQL
  - CONNECT BY operators
  - Joins with several tables (e.g. Parameters, DataAttribute)
  - MINUS and UNION operators
- The generalization of Q7 (difference between workflows) is currently not answerable
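
To illustrate the role a MINUS-style operator plays in comparing two specific runs, here is a small Python sketch using set difference. It is an assumption-laden toy, not the query published on the TWiki: the function name `run_difference`, the step identifiers, and the reduction of each run to (step, step-class) pairs are all hypothetical. The step-class names follow the challenge scenario in which convert is replaced by pgmtoppm and pnmtojpeg.

```python
def run_difference(run_a, run_b):
    """Step-classes used only in run_a and only in run_b.

    Each run is an iterable of (step, step_class) pairs. The two results
    correspond to two MINUS queries, one in each direction.
    """
    classes_a = {step_class for _, step_class in run_a}
    classes_b = {step_class for _, step_class in run_b}
    return classes_a - classes_b, classes_b - classes_a


# Hypothetical runs: the second replaces convert with two other procedures.
run1 = [("s1", "slicer"), ("s2", "convert")]
run2 = [("t1", "slicer"), ("t2", "pgmtoppm"), ("t3", "pnmtojpeg")]
only_in_run1, only_in_run2 = run_difference(run1, run2)
```

A comparison like this only reports which step-classes differ. It says nothing about how the two workflows correspond structurally, which is one way to read why the general form of Q7 is harder.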
Workflow Variant: User Views

What are User views?
- Level of detail the user wishes to track
- Permissions given to the user
- Ability of the user to see / know the sub-steps (distributed computation)

Why use User Views?
- Throw away unimportant intermediate results
- Better understanding of the workflow
- Reduce the amount of work to be redone

[Figure: workflow with steps grouped into Box1 and Box2, showing the user views UBio and UBlackBox; UAdmin can see everything]
Querying within User Views

Need information from
- Workflow: step-class containment and user views
- Cinput(sid, idid, tsi), Coutput(sid, idid, tso)
- View UProcess(usr, step, step-class, input, output)

Query: What are all the data items used to produce "Resliced Image1"?

SELECT *
FROM uProcess upc
WHERE upc.usr = :userName
START WITH upc.outputName = 'Resliced Image1'
-- PRIOR is on input so the walk goes backwards, to the rows that produced each input (as in Q1).
CONNECT BY PRIOR upc.input = upc.output
       AND PRIOR upc.usr = upc.usr;  -- stay within one user's view

Answers
- UAdmin: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header, Warp param1
- UBio: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header
- UBlackBox: empty answer!
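
The three different answers can be reproduced with a small Python sketch of how a user view might hide data. This is a simplified illustration under stated assumptions, not the system's UProcess definition: a view assigns each step to a box, and a data item produced in a box is visible only if another box consumes it or nothing consumes it. The function names, the three-step workflow, and the box assignments for UBio and UBlackBox are hypothetical.

```python
def view_process(steps, box_of):
    """Collapse steps into boxes; return box -> (visible inputs, visible outputs).

    steps: dict step -> (set of input data, set of output data)
    box_of: dict step -> box name in this user's view
    """
    boxes = {}
    for step, (ins, outs) in steps.items():
        b_in, b_out = boxes.setdefault(box_of[step], (set(), set()))
        b_in |= ins
        b_out |= outs

    consumed_anywhere = set().union(*(b_in for b_in, _ in boxes.values()))
    view = {}
    for box, (b_in, b_out) in boxes.items():
        used_elsewhere = set()
        for other, (o_in, _) in boxes.items():
            if other != box:
                used_elsewhere |= o_in
        visible_in = b_in - b_out  # data produced inside the box is internal
        visible_out = {d for d in b_out
                       if d in used_elsewhere or d not in consumed_anywhere}
        view[box] = (visible_in, visible_out)
    return view


def data_used_to_produce(view, target):
    """All visible data items that (recursively) led to `target` in a view."""
    found, frontier = set(), [target]
    while frontier:
        data = frontier.pop()
        for ins, outs in view.values():
            if data in outs:
                for d in ins - found:
                    found.add(d)
                    frontier.append(d)
    return found


# Hypothetical three-step fragment of the workflow.
sources = {"Anatomy Header 1", "Anatomy Image1", "Reference Image", "Reference Header"}
steps = {
    "align_warp": (set(sources), {"Warp param1"}),
    "reslice": ({"Warp param1"}, {"Resliced Image1"}),
    "softmean": ({"Resliced Image1"}, {"Atlas Image"}),
}
# Assumed views: UAdmin sees every step, UBio groups the first two steps,
# UBlackBox sees the whole workflow as a single box.
u_admin = {s: s for s in steps}
u_bio = {"align_warp": "Box1", "reslice": "Box1", "softmean": "softmean"}
u_blackbox = {s: "Box" for s in steps}

answers = {
    name: data_used_to_produce(view_process(steps, boxes), "Resliced Image1")
    for name, boxes in [("UAdmin", u_admin), ("UBio", u_bio), ("UBlackBox", u_blackbox)]
}
```

Under these assumptions UBio no longer sees the warp parameters, because they never leave Box1, and UBlackBox gets an empty answer because "Resliced Image1" itself is internal to its single box.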
Conclusion, Perspectives

- Able to answer the queries, including
  - Data and Step provenance
  - Immediate and Deep (recursive) provenance
- Variation of the workflow and queries considering user views
  - Multi-granularity levels of provenance
  - Only visible and necessary data are kept
- Open questions
  - What is the meaning of "stage" in a workflow (with respect to user views)?
  - What are we expecting as an answer to the difference between two workflows (cf. query 7)?
  - Are all the procedures of the workflow "biologically significant" (cf. user views)?
Acknowledgements

- Kepler Group
  - Shawn Bowers
  - Bertram Ludascher
  - Timothy McPhillips
- Biologists from the CIPRES project
- Members from the Database group, University of Pennsylvania
- This work is supported by NSF grants 0513778, 0415810, and 0612177
User interface