preservAHM2005a
Paul Groth, Simon Miles, Luc Moreau
UK e-Science All Hands Meeting 2005
Outline
Process Documentation for Provenance
Power of the P-Structure
P-assertion Recording Protocol
PReServ’s Functionality
Performance
Pitch
Provenance
The Provenance Question
– Lots of definitions…
– Boil it down to a question.
– What is the process that led to a particular result?
How do we answer this question?
– Search through documentation.
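Answering the provenance question means searching the documentation backwards from a result. The sketch below is a minimal illustration, assuming the documentation has been reduced to a hypothetical "derived from" map; the data item names and the `process_leading_to` helper are invented for this example and are not part of PReServ.

```python
from collections import deque

# Hypothetical, simplified process documentation: each key was produced
# from the data items listed as its value.
derived_from = {
    "result": ["compressed", "original_size"],
    "compressed": ["sequence"],
    "original_size": ["sequence"],
    "sequence": ["raw_data"],
}

def process_leading_to(item, documentation):
    """Breadth-first walk back from `item`; returns every data item it
    depends on, nearest first, each listed once."""
    seen = []
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for cause in documentation.get(current, []):
            if cause not in seen:
                seen.append(cause)
                queue.append(cause)
    return seen

print(process_leading_to("result", derived_from))
# ['compressed', 'original_size', 'sequence', 'raw_data']
```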
Documentation
Process Documentation
– Encompasses all other documentation
SOA-based model of process
Actors communicate via message passing
Actors make assertions to document the process; these are termed p-assertions.
How should these p-assertions be organised?
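To make the idea concrete, here is a minimal Python sketch of p-assertions made by actors that communicate by message passing. It assumes three kinds of p-assertion (interaction, relationship and actor state), as described in the PASOA work; all class and field names are hypothetical and are not PReServ's API.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical data model; names are illustrative, not PReServ's API.

@dataclass(frozen=True)
class InteractionPAssertion:
    """Documents one message exchanged between two actors."""
    interaction_key: str  # identifies the message exchange
    sender: str
    receiver: str
    message: str

@dataclass(frozen=True)
class RelationshipPAssertion:
    """States that data in one interaction was caused by other interactions."""
    interaction_key: str  # interaction carrying the effect
    relation: str         # e.g. "derivedFrom"
    causes: tuple         # interaction keys of the causes

@dataclass(frozen=True)
class ActorStatePAssertion:
    """Documents an actor's internal state around an interaction."""
    interaction_key: str
    actor: str
    state: str

PAssertion = Union[InteractionPAssertion, RelationshipPAssertion, ActorStatePAssertion]

# Illustrative documentation of one request and its reply.
log = [
    InteractionPAssertion("i1", "client", "compressor", "compress(sequence)"),
    InteractionPAssertion("i2", "compressor", "client", "compressed(sequence)"),
    RelationshipPAssertion("i2", "derivedFrom", ("i1",)),
    ActorStatePAssertion("i1", "compressor", "level=9"),
]

print(sum(isinstance(p, InteractionPAssertion) for p in log))  # 2
```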
P-Structure
P-Structure View
Benefits
Domain-independent queries that are provenance-specific
The p-structure is a shared logical organisation of p-assertions
It does not prescribe exactly how p-assertions are stored in an implementation.
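PReServ answers such queries with XQuery. The sketch below swaps in Python's `xml.etree.ElementTree`, which supports only a small XPath subset, to show the idea: a query that navigates only the p-structure's own elements works for any application, whatever the message contents are. The XML element and attribute names are hypothetical and are not PReServ's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout loosely modelled on the p-structure: one record
# per interaction key, split into the sender's and the receiver's views.
P_STRUCTURE = """
<pstructure>
  <interactionRecord key="i1">
    <view kind="sender" actor="client">
      <interactionPAssertion>compress(sequence)</interactionPAssertion>
    </view>
    <view kind="receiver" actor="compressor">
      <interactionPAssertion>compress(sequence)</interactionPAssertion>
      <actorStatePAssertion>level=9</actorStatePAssertion>
    </view>
  </interactionRecord>
  <interactionRecord key="i2">
    <view kind="sender" actor="compressor">
      <relationshipPAssertion relation="derivedFrom" cause="i1"/>
    </view>
  </interactionRecord>
</pstructure>
"""

root = ET.fromstring(P_STRUCTURE.strip())

def causes_of(root, key):
    """Domain-independent query: relies only on p-structure element names,
    never on the (domain-specific) message contents."""
    path = f"./interactionRecord[@key='{key}']/view/relationshipPAssertion"
    return [rel.get("cause") for rel in root.findall(path)]

def actors_in(root, key):
    """All actors that documented the given interaction."""
    path = f"./interactionRecord[@key='{key}']/view"
    return sorted({view.get("actor") for view in root.findall(path)})

print(causes_of(root, "i2"))  # ['i1']
print(actors_in(root, "i1"))  # ['client', 'compressor']
```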
PReP
Introduces the Provenance Store
– A separate entity for maintaining process documentation
PReP specifies how an actor can communicate with the Provenance Store.
PReP has a number of useful properties:
– Statelessness
– Idempotence
– Termination
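The three properties can be illustrated with a toy record operation. This is a sketch under stated assumptions, not PReP's actual message format: the request fields, `RecordAck` and `ProvenanceStoreSketch` are hypothetical names. Each request carries everything the store needs (statelessness), re-sending it changes nothing (idempotence), and every request gets exactly one reply (termination).

```python
from dataclasses import dataclass

# Hypothetical message types; not PReP's actual wire format.

@dataclass(frozen=True)
class RecordRequest:
    # Self-contained request (statelessness): no session is needed to
    # interpret it.
    interaction_key: str
    asserter: str
    local_id: str
    content: str

@dataclass(frozen=True)
class RecordAck:
    interaction_key: str
    local_id: str

class ProvenanceStoreSketch:
    """Hypothetical store used only to illustrate the properties."""

    def __init__(self):
        self._assertions = {}

    def record(self, request: RecordRequest) -> RecordAck:
        key = (request.interaction_key, request.asserter, request.local_id)
        # Idempotence: a repeated request with the same key leaves the store
        # unchanged, so a client may safely retry after a lost acknowledgement.
        self._assertions.setdefault(key, request.content)
        # Termination: every request is answered with exactly one acknowledgement.
        return RecordAck(request.interaction_key, request.local_id)

    def size(self) -> int:
        return len(self._assertions)

store = ProvenanceStoreSketch()
req = RecordRequest("i1", "client", "p1", "compress(sequence)")
store.record(req)
store.record(req)  # retry, e.g. after a timeout
print(store.size())  # 1
```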
An Implementation
What is PReServ?
– A Web Services implementation of a
Provenance Store
– Implements
• PReP for recording
• XQuery for querying
– Provides libraries and wrappers for making applications provenance-aware.
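In PReServ, the client-side library and Axis handlers wrap web service calls so that messages are documented without changing application logic. A rough Python analogue is a decorator; in this sketch, `provenance_aware`, the in-process `recorded` list and the tuple layout are all hypothetical stand-ins for a real Provenance Store client.

```python
import functools
import itertools

# Hypothetical in-process stand-in for a provenance store client.
recorded = []
_keys = itertools.count(1)

def provenance_aware(actor, service):
    """Decorator that documents each call as two interaction p-assertions:
    one for the request message and one for the response message."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            request_key = f"i{next(_keys)}"
            recorded.append((request_key, actor, service, repr((args, kwargs))))
            result = func(*args, **kwargs)
            response_key = f"i{next(_keys)}"
            recorded.append((response_key, service, actor, repr(result)))
            return result
        return wrapper
    return decorate

@provenance_aware(actor="client", service="compressor")
def compress(text):
    # Toy "service" whose body is unaware of provenance recording.
    return text.replace("aa", "2a")

compress("aaab")
for entry in recorded:
    print(entry)
```

The wrapped function's body stays unchanged, which mirrors the aim of making existing applications provenance-aware with little modification.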
PReServ Implementation Diagram
[Figure: components shown are a WS client and a web service, each with Axis handlers and the PS client-side library; a query actor WS with the PS client-side library; the Provenance Store; connections labelled "WS calls" and "Java calls"; and the Backend Store Interface over the backend stores (database store, in-memory store, …).]
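The Backend Store Interface decouples the Provenance Store from how p-assertions are persisted, matching the earlier point that the p-structure does not prescribe storage. A minimal Python sketch of that design, assuming hypothetical `put`/`get` methods (not PReServ's Java interface):

```python
from abc import ABC, abstractmethod

class BackendStore(ABC):
    """Hypothetical counterpart of the Backend Store Interface: the store
    depends only on this interface, so backends can be swapped."""

    @abstractmethod
    def put(self, interaction_key: str, p_assertion: str) -> None: ...

    @abstractmethod
    def get(self, interaction_key: str) -> list: ...

class InMemoryStore(BackendStore):
    """Keeps p-assertions in a dict; nothing survives a restart."""

    def __init__(self):
        self._data = {}

    def put(self, interaction_key, p_assertion):
        self._data.setdefault(interaction_key, []).append(p_assertion)

    def get(self, interaction_key):
        return list(self._data.get(interaction_key, []))

class ProvenanceStore:
    def __init__(self, backend: BackendStore):
        self._backend = backend  # chosen at deployment time

    def record(self, key, p_assertion):
        self._backend.put(key, p_assertion)

    def query(self, key):
        return self._backend.get(key)

store = ProvenanceStore(InMemoryStore())
store.record("i1", "compress(sequence)")
print(store.query("i1"))  # ['compress(sequence)']
```

A database-backed store would implement the same two methods, leaving `ProvenanceStore` untouched.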
Implementation cont.
Caching mechanism to improve performance
Berkeley DB Java Edition 2.0 backend
• No setup required
• Fully transactional
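The slide does not say what PReServ caches or how. As an assumption, this sketch shows one common design: a read-through LRU cache in front of a slower backend, invalidated on writes. `CountingBackend`, `CachedStore` and the capacity are hypothetical.

```python
from collections import OrderedDict

class CountingBackend:
    """Stand-in for a persistent backend; counts how often it is read."""

    def __init__(self):
        self.data, self.reads = {}, 0

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

class CachedStore:
    """Hypothetical read-through LRU cache placed in front of a backend."""

    def __init__(self, backend, capacity=2):
        self.backend, self.capacity = backend, capacity
        self._cache = OrderedDict()

    def put(self, key, value):
        self.backend.put(key, value)
        self._cache.pop(key, None)  # drop any stale cached copy

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self.backend.get(key)
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return value

backend = CountingBackend()
store = CachedStore(backend)
store.put("i1", "compress(sequence)")
store.get("i1")
store.get("i1")
print(backend.reads)  # 1
```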
[Figure: Provenance Store internals. Labels shown: SOAP msg, dispatcher, store plug-in, query plug-in, …, Backend Store Interface, Java object database, memory, ….]
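The dispatcher and plug-ins explain why the architecture is extensible: new functionality arrives as a new plug-in rather than a change to the dispatcher. A minimal sketch, assuming messages are plain dicts with an `operation` field instead of SOAP envelopes; `Dispatcher`, `store_plugin` and `query_plugin` are hypothetical names.

```python
from typing import Callable, Dict

class Dispatcher:
    """Hypothetical dispatcher: routes each message to the plug-in
    registered for its operation."""

    def __init__(self):
        self._plugins: Dict[str, Callable[[dict], dict]] = {}

    def register(self, operation: str, plugin: Callable[[dict], dict]) -> None:
        self._plugins[operation] = plugin

    def dispatch(self, message: dict) -> dict:
        plugin = self._plugins.get(message.get("operation"))
        if plugin is None:
            return {"status": "fault", "reason": "unknown operation"}
        return plugin(message)

backend = {}  # stands in for the Backend Store Interface

def store_plugin(message):
    backend.setdefault(message["key"], []).append(message["body"])
    return {"status": "ack"}

def query_plugin(message):
    return {"status": "ok", "result": backend.get(message["key"], [])}

dispatcher = Dispatcher()
dispatcher.register("record", store_plugin)
dispatcher.register("query", query_plugin)

dispatcher.dispatch({"operation": "record", "key": "i1", "body": "compress(sequence)"})
print(dispatcher.dispatch({"operation": "query", "key": "i1"}))
# {'status': 'ok', 'result': ['compress(sequence)']}
```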
Requirements
Apache Tomcat 5.0
Apache Ant 1.6.2
Java 1.5 (1.4 supported with some help)
Pure Java, tested on
– Windows
– Mac OS X
– Debian Linux
Evaluation
Deployment
Protein Compressibility Experiment
– HPDC’05
Workflow runs under VMware
– deployment consistency
– ease of development
Workflow is executed on one machine
PReServ runs on another machine
– Version 0.1.5 of PReServ
Record Performance
Query Performance
Applications
Conclusion
The p-structure allows domain-independent, provenance-specific queries using XQuery.
Both recording and query times are linear
PReServ has an extensible architecture that allows further functionality to be added easily.
Download!
Try it out!
Download PReServ 0.2:
– The AHM release
– Released under the open-source MIT License
www.pasoa.org
– Click "Software"
Contact us; we will try to help you make your application provenance-aware.
Configuration
Red Hat Linux 9.1 on VMware on Windows XP
Pentium 4, 2.8 GHz, 1.5 GB RAM
PReServ on another machine
– Database backend: Berkeley DB Java Edition
100 Mb/s local Ethernet