preservAHM2005a

Paul Groth, Simon Miles, Luc Moreau
UK e-Science All Hands Meeting 2005
Outline
▪ Process Documentation for Provenance
▪ Power of the P-Structure
▪ P-assertion Recording Protocol
▪ PReServ’s Functionality
▪ Performance
▪ Pitch
Provenance
▪ The Provenance Question
– Lots of definitions…
– Boil them down to a single question:
– What is the process that led to a particular result?
▪ How do we answer this question?
– Search through documentation of the process.
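As a rough illustration of answering the provenance question by searching documentation, the sketch below assumes each recorded step is documented simply as "this item was produced from these items" and walks those links backwards from a result. The `derived_from` mapping and all item names are hypothetical; real process documentation is richer than this.

```python
from collections import deque

def trace_process(result, derived_from):
    """Return every item that (transitively) contributed to `result`.

    `derived_from` maps an item to the items it was produced from; it is a
    hypothetical stand-in for searching recorded process documentation.
    """
    seen = set()
    queue = deque([result])
    while queue:
        item = queue.popleft()
        for source in derived_from.get(item, []):
            if source not in seen:  # avoid revisiting shared ancestors
                seen.add(source)
                queue.append(source)
    return seen

# Hypothetical documentation of a small workflow.
docs = {
    "report": ["compressed_sample", "statistics"],
    "compressed_sample": ["raw_sequence"],
    "statistics": ["compressed_sample"],
}
print(sorted(trace_process("report", docs)))
# ['compressed_sample', 'raw_sequence', 'statistics']
```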
Documentation
▪ Process Documentation
– encompasses all other documentation
▪ SOA-based model of process
▪ Actors communicate via message passing
▪ Actors make ASSERTIONS to document the process; these are termed p-assertions.
▪ How should these p-assertions be organised?
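One way to picture p-assertions in code is sketched below. The three kinds used here (interaction, relationship and actor state p-assertions) follow the PASOA provenance model as we understand it; they are an assumption, not something this slide defines. All class names, field names and the example exchange are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class InteractionKey:
    """Identifies one message exchange between a sender and a receiver."""
    sender: str
    receiver: str
    message_id: str

@dataclass(frozen=True)
class InteractionPAssertion:
    """Documents the content of a message, as seen by the asserting actor."""
    asserter: str
    key: InteractionKey
    content: str

@dataclass(frozen=True)
class RelationshipPAssertion:
    """Documents that one message was caused by earlier messages."""
    asserter: str
    effect: InteractionKey
    relation: str
    causes: Tuple[InteractionKey, ...]

@dataclass(frozen=True)
class ActorStatePAssertion:
    """Documents actor-internal state relevant to an interaction."""
    asserter: str
    key: InteractionKey
    state: str

# Hypothetical exchange: a client asks a compression service for a result.
request = InteractionKey("client", "compressor", "m1")
response = InteractionKey("compressor", "client", "m2")
documented = [
    InteractionPAssertion("compressor", request, "<compress>ACGT</compress>"),
    ActorStatePAssertion("compressor", request, "algorithm=gzip"),
    RelationshipPAssertion("compressor", response, "isOutputOf", (request,)),
]
for pa in documented:
    print(type(pa).__name__, pa.asserter)
```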
P-Structure
P-Structure View
Benefits
▪ Domain-independent queries that are provenance-specific
▪ The p-structure is a shared logical organisation of p-assertions
▪ It does not prescribe exactly how an implementation stores p-assertions.
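To illustrate a query that is provenance-specific yet domain-independent, the sketch below serialises a tiny p-structure as XML and queries it with Python's `xml.etree.ElementTree` path lookups, standing in for XQuery. The element names are hypothetical, not PReServ's actual schema. The query only reads structural elements such as `receiver` and never the application payload in `content`, so the same query would work for any domain.

```python
import xml.etree.ElementTree as ET

# Hypothetical p-structure serialisation; element names are illustrative only.
P_STRUCTURE = """
<provenanceStore>
  <interactionRecord>
    <interactionKey>m1</interactionKey>
    <sender>client</sender>
    <receiver>compressor</receiver>
    <content><compress>ACGT</compress></content>
  </interactionRecord>
  <interactionRecord>
    <interactionKey>m2</interactionKey>
    <sender>compressor</sender>
    <receiver>client</receiver>
    <content><result>0.42</result></content>
  </interactionRecord>
</provenanceStore>
"""

def interactions_received_by(xml_text, actor):
    """Return the interaction keys of messages received by `actor`.

    Only p-structure elements are inspected, so the query is independent
    of the application-specific <content> payload.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        record.findtext("interactionKey")
        for record in root.findall("interactionRecord")
        if record.findtext("receiver") == actor
    ]

print(interactions_received_by(P_STRUCTURE, "compressor"))  # ['m1']
```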
PReP (P-assertion Recording Protocol)
▪ Introduces the Provenance Store
– A separate entity for maintaining process documentation
▪ PReP specifies how an actor can communicate with the Provenance Store.
▪ PReP has a number of useful properties:
– Statelessness
– Idempotence
– Termination
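A minimal sketch of why statelessness and idempotence help: if each record request carries everything the store needs and resubmitting an identical p-assertion leaves the store unchanged, an actor can safely resend after a lost acknowledgement. The request fields, the `(interaction key, asserter, local id)` identity and the acknowledgement format are assumptions for illustration, not PReP's wire format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecordRequest:
    """A self-contained record message; no session state is needed to process it."""
    interaction_key: str
    asserter: str
    local_id: str   # hypothetical asserter-chosen id for this p-assertion
    content: str

class ProvenanceStoreSketch:
    def __init__(self):
        self._assertions = {}

    def record(self, request: RecordRequest) -> str:
        """Store a p-assertion and reply with exactly one acknowledgement.

        Entries are keyed on (interaction_key, asserter, local_id) and the
        first write wins, so resubmitting the same request changes nothing.
        """
        ident = (request.interaction_key, request.asserter, request.local_id)
        self._assertions.setdefault(ident, request.content)
        return f"ack:{request.interaction_key}:{request.local_id}"

    def count(self) -> int:
        return len(self._assertions)

store = ProvenanceStoreSketch()
req = RecordRequest("m1", "compressor", "pa-1", "<compress>ACGT</compress>")
print(store.record(req))  # ack:m1:pa-1
print(store.record(req))  # retry after a lost ack: same reply
print(store.count())      # 1
```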
An Implementation
▪ What is PReServ?
– A Web Services implementation of a Provenance Store
– Implements
• PReP for recording
• XQuery for querying
– Provides libraries and wrappers for making applications provenance-aware.
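Client-side libraries and wrappers let an application document its interactions without changing its business logic. The decorator below is a hypothetical Python analogue of that idea (PReServ itself is pure Java and uses Axis handlers); it records each call's request and response as p-assertion dictionaries into any object with a `record(dict)` method. `provenance_aware`, `ListStore` and the dictionary keys are all illustrative names.

```python
import functools
import itertools

_message_ids = itertools.count(1)  # hypothetical message-id generator

def provenance_aware(actor, service, store):
    """Wrap a call so its request and response are documented in `store`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            request_id = f"m{next(_message_ids)}"
            store.record({"key": request_id, "sender": actor,
                          "receiver": service, "content": repr((args, kwargs))})
            result = func(*args, **kwargs)
            response_id = f"m{next(_message_ids)}"
            # The response also notes which request caused it.
            store.record({"key": response_id, "sender": service,
                          "receiver": actor, "content": repr(result),
                          "caused_by": request_id})
            return result
        return wrapper
    return decorator

class ListStore:
    """Trivial stand-in for a provenance store client."""
    def __init__(self):
        self.records = []

    def record(self, p_assertion):
        self.records.append(p_assertion)

store = ListStore()

@provenance_aware("client", "compressor", store)
def compress(sequence):
    return sequence.lower()

compress("ACGT")
print([r["key"] for r in store.records])  # ['m1', 'm2']
```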
PReServ Implementation Diagram
[Diagram components: WS Client, Web Service and Query Actor WS, each using a PS Client Side Library; Axis Handlers; Provenance Store with a Backend Store Interface; Backend Stores: Database Store, In-Memory Store. Legend: WS Calls, Java Calls.]
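The Backend Store Interface separates the Provenance Store from how p-assertions are persisted. A hedged sketch of that separation follows: an abstract interface with an in-memory store and a database store. Python's built-in `sqlite3` is used purely as a stand-in for the Berkeley DB backend PReServ uses, and the method names `put` and `get` are hypothetical.

```python
import sqlite3
from abc import ABC, abstractmethod

class BackendStore(ABC):
    """Storage-agnostic interface used by the (hypothetical) store front end."""

    @abstractmethod
    def put(self, key: str, p_assertion: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> list: ...

class InMemoryStore(BackendStore):
    def __init__(self):
        self._data = {}

    def put(self, key, p_assertion):
        self._data.setdefault(key, []).append(p_assertion)

    def get(self, key):
        return list(self._data.get(key, []))

class DatabaseStore(BackendStore):
    """sqlite3 stand-in for a transactional database backend."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS pa (key TEXT, body TEXT)")

    def put(self, key, p_assertion):
        with self._conn:  # commits the transaction on success
            self._conn.execute("INSERT INTO pa VALUES (?, ?)", (key, p_assertion))

    def get(self, key):
        rows = self._conn.execute(
            "SELECT body FROM pa WHERE key = ? ORDER BY rowid", (key,))
        return [body for (body,) in rows]

def record_all(backend: BackendStore, items):
    """Front-end code depends only on the interface, not on the backend."""
    for key, body in items:
        backend.put(key, body)
    return backend.get("m1")

items = [("m1", "request"), ("m1", "response"), ("m2", "other")]
print(record_all(InMemoryStore(), items))  # ['request', 'response']
print(record_all(DatabaseStore(), items))  # ['request', 'response']
```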
Implementation cont.
▪ Caching mechanism to improve performance
▪ Berkeley DB Java Edition 2.0
• No setup required
• Completely transactional
[Diagram components: SOAP Msg, SOAP Msg, …; Dispatcher; Store Plug In, Query Plug In, …; Backend Store Interface; Java Object Database, Memory, …]
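The diagram shows incoming SOAP messages, a Dispatcher, and store and query plug-ins. A minimal sketch of one plausible reading, in which the dispatcher routes each message to a plug-in registered for its message type, is below. Plain dictionaries replace SOAP messages, a list replaces the Backend Store Interface, and the message types and `register`/`handle` methods are hypothetical. Under this reading, new functionality is added by registering another plug-in.

```python
class Dispatcher:
    """Routes incoming messages to plug-ins registered by message type."""

    def __init__(self):
        self._plugins = {}

    def register(self, message_type, plugin):
        self._plugins[message_type] = plugin

    def dispatch(self, message):
        plugin = self._plugins.get(message["type"])
        if plugin is None:
            raise ValueError(f"no plug-in for message type {message['type']!r}")
        return plugin.handle(message)

class StorePlugIn:
    def __init__(self, backend):
        self.backend = backend

    def handle(self, message):
        self.backend.append(message["body"])
        return "recorded"

class QueryPlugIn:
    def __init__(self, backend):
        self.backend = backend

    def handle(self, message):
        # Toy query: return stored bodies containing the search term.
        return [body for body in self.backend if message["body"] in body]

backend = []  # stands in for the Backend Store Interface
dispatcher = Dispatcher()
dispatcher.register("record", StorePlugIn(backend))
dispatcher.register("query", QueryPlugIn(backend))

dispatcher.dispatch({"type": "record", "body": "compress ACGT"})
print(dispatcher.dispatch({"type": "query", "body": "ACGT"}))  # ['compress ACGT']
```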
Requirements
▪ Apache Tomcat 5.0
▪ Apache Ant 1.6.2
▪ Java 1.5 (1.4 supported with some help)
▪ Pure Java, tested on
– Windows
– Mac OS X
– Debian Linux
Evaluation
Deployment
▪ Protein Compressibility Experiment
– HPDC’05
▪ Workflow runs under VMware
– deployment consistency
– ease of development
▪ Workflow is executed on one machine
▪ PReServ runs on another machine
– Version 0.1.5 of PReServ
Record Performance
Query Performance
Applications
Conclusion
▪ The p-structure allows for domain-independent, provenance-specific queries using XQuery.
▪ Both recording and query times scale linearly.
▪ PReServ has an extensible architecture, allowing further functionality to be added easily.
Download!
▪ Try it out!
▪ Download PReServ 0.2:
– The AHM release
– Released under the open-source MIT License
▪ www.pasoa.org
– Click "Software"
▪ Contact us and we will try to help you make your application provenance-aware.
Configuration
▪ Red Hat Linux 9.1 on VMware on Windows XP
▪ Pentium 4, 2.8 GHz, 1.5 GB RAM
▪ PReServ on another machine
– Database backend: Berkeley DB Java Edition
▪ 100 Mbit/s local Ethernet