Recording Actor Provenance in
Scientific Workflows
Ian Wootten, Shrija Rajbhandari, Omer Rana
[email protected]
Cardiff University, UK
What?
Provenance is concerned with process
This may or may not be documented
Data Provenance: the process which leads to a particular piece of data
Actor Provenance: the process which leads to a particular actor state
How an actor (client or service) arrived at a particular state during an interaction (for stateless actors)
What? Actor Provenance
[Diagram: an Enactment Engine and two Services, with points labelled A1, A2, B1 and B2]
Actor State Assertions: Asserting the state of an actor at a particular time during an interaction.
Interaction Assertions: Asserting the contents of a message by an actor sending or receiving it.
Metrics for Actor State Assertion
Static
No variation in value over actor lifetime
Per Node - Node identity, Operating system
Per Actor - Actor identity, Name, Owner, Version
Dynamic
Variation in value over actor lifetime
Per Node - Memory usage, Network traffic
Per Actor - Execution Time, Availability
Instrumented
Actor is ‘Instrumented’ at Key Points in its Execution
Description of internal data flow
E.g. German Aerospace Center (DLR)
Completion states for action events and file transfers
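The static/dynamic and per-node/per-actor split above can be sketched as a small data structure. This is a minimal illustration, not the authors' representation: the names `Kind`, `Scope`, `Metric` and `metrics_to_sample` are hypothetical, and the policy of sampling static metrics only once is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    STATIC = "static"    # no variation in value over the actor lifetime
    DYNAMIC = "dynamic"  # value varies over the actor lifetime


class Scope(Enum):
    NODE = "node"
    ACTOR = "actor"


@dataclass(frozen=True)
class Metric:
    name: str
    kind: Kind
    scope: Scope


# Examples taken from the slide; the identifiers themselves are illustrative.
METRICS = [
    Metric("operating_system", Kind.STATIC, Scope.NODE),
    Metric("actor_version", Kind.STATIC, Scope.ACTOR),
    Metric("memory_usage", Kind.DYNAMIC, Scope.NODE),
    Metric("execution_time", Kind.DYNAMIC, Scope.ACTOR),
]


def metrics_to_sample(metrics, already_recorded):
    """Assumed policy: static metrics are sampled once, dynamic ones every time."""
    return [
        m for m in metrics
        if m.kind is Kind.DYNAMIC or m.name not in already_recorded
    ]
```

Under this assumed policy, a second assertion for the same actor would only need to re-sample the dynamic metrics.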
How? Actor Provenance
[Diagram: an Enactment Engine and two Services; labels: Instrumented Output, Monitor Output, M1, M2, B1, B2]
Instrumented Actor: Service information obtained from instrumented points within an actor.
Monitoring Sources: Service information derived from the hosting platform via monitoring sources (e.g. Ganglia).
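One way an actor might be instrumented at key points is to wrap each point and log its completion state. The sketch below is an assumption for illustration only: `InstrumentationLog` and `key_point` are hypothetical names, and neither the prototype nor the DLR work is known to use this mechanism.

```python
import time
from contextlib import contextmanager


class InstrumentationLog:
    """Collects one event per instrumented key point (hypothetical helper)."""

    def __init__(self):
        self.events = []

    @contextmanager
    def key_point(self, name):
        start = time.perf_counter()
        status = "completed"
        try:
            yield
        except Exception:
            status = "failed"
            raise  # the actor's own error handling still sees the exception
        finally:
            self.events.append({
                "point": name,
                "status": status,
                "elapsed_s": time.perf_counter() - start,
            })


log = InstrumentationLog()
with log.key_point("file_transfer"):
    payload = bytes(1024)  # stand-in for the real work done at this point
```

Each entry in `log.events` could then be reported as part of an actor state assertion, alongside values obtained from monitoring sources.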
Why? Standalone and Combined Value
Standalone State Assertion Value
Actor Selection
Performance
• Evaluation of Past / Prediction of Future
Resource Allocation
Actor administrator allocates resources according to performance metrics
Combined Value - Putting Assertions into Context
Interaction – Through Actor State Assertions
Determining the likely cause of error / results
Understanding what an actor is doing
Actor – Through Interaction Assertions
Understanding performance pattern observations
Understanding instrumented metric observations
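As an illustration of the standalone use for actor selection, recorded execution times could be used to rank candidate actors. This is a minimal sketch under assumptions: `select_actor` is a hypothetical function, and choosing the lowest mean past execution time is just one possible selection criterion.

```python
from statistics import mean


def select_actor(history):
    """Pick the actor with the lowest mean past execution time.

    history maps an actor id to a list of recorded execution times in ms.
    Actors with no recorded history are skipped.
    """
    averages = {actor: mean(times) for actor, times in history.items() if times}
    if not averages:
        raise ValueError("no actor has recorded execution times")
    return min(averages, key=averages.get)


# Hypothetical recorded values, for illustration only.
past_times_ms = {
    "service-a": [120.0, 140.0, 130.0],
    "service-b": [90.0, 95.0],
    "service-c": [],
}
chosen = select_actor(past_times_ms)
```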
How? Actor Provenance Registry
Attempt to provide a mechanism to specify and record actor state assertions for any application
Generic Mechanism Problems
No Knowledge of Potential Resources
Monitoring sources, containers
No Direct Knowledge of Implementation
Instrumented Data Capture
How? Actor Provenance Registry
Resource and Rule Registration
Resource – Monitoring Tool
Rule - User defined instructions
Indirectly from Resources
Coordinator polls resources for information
Times of interest – Service Invocation, Request
Directly from actor
Collection of Instrumented data
Representation?
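The registration and polling steps above can be sketched as follows. This is an illustration under assumptions, not the prototype's interface: `ActorProvenanceRegistry` and its methods are hypothetical, a resource is modelled as a callable returning a dictionary, and rules are plain Python functions rather than the XQuery rules the prototype uses.

```python
import time


class ActorProvenanceRegistry:
    def __init__(self):
        self.resources = {}   # resource name -> zero-argument polling callable
        self.rules = []       # (rule name, resource name, extraction function)
        self.assertions = []  # recorded actor state assertions

    def register_resource(self, name, poll):
        self.resources[name] = poll

    def register_rule(self, name, resource, extract):
        if resource not in self.resources:
            raise KeyError(f"unknown resource: {resource}")
        self.rules.append((name, resource, extract))

    def record(self, actor_id, event):
        """Poll each needed resource once, apply the rules, store the result."""
        readings = {}
        values = {}
        for rule_name, resource, extract in self.rules:
            if resource not in readings:
                readings[resource] = self.resources[resource]()
            values[rule_name] = extract(readings[resource])
        assertion = {
            "actor": actor_id,
            "event": event,  # a time of interest, e.g. "invocation"
            "time": time.time(),
            "values": values,
        }
        self.assertions.append(assertion)
        return assertion


registry = ActorProvenanceRegistry()
# Stand-in for a monitoring tool; the metric names are made up.
registry.register_resource("monitor", lambda: {"mem_free_kb": 2048, "load_one": 0.5})
registry.register_rule("free_memory", "monitor", lambda r: r["mem_free_kb"])
assertion = registry.record("service-1", "invocation")
```

Polling each resource at most once per recorded event keeps the cost of adding further rules on the same resource small, which is one possible design choice among several.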
How? Actor Provenance Registry
Integration with PReP [Groth et al.]
[Diagram labels: Client, Service, Invoke, Result, Provenance Store, Local Store (x2), Registry (x2), Monitoring Sources (x2), Record Provenance (x2), Record Actor Provenance (x2)]
Data Mining Prototype
Record assertions using registry during invocation of a data modelling service
Service takes incoming data sets and generates a model based upon them
Uses Quantitative Structure-Activity Relationship (QSAR) to attempt to correlate biological activity with a chemical compound
Larger data set = longer run time
Performance Evaluation
[Plot: Invocation Time (ms, axis 0 to 40000) against Size of Data Set (KB, axis 0 to 400), with series for No rules, 1 rule and 5 rules]
Conclusions / Future Work
Actor Provenance data is important
Without it, we don’t get the full picture
Prototype shows that it can be done
Room for improvement
Interface to Monitoring System
Caching of results
No inclusion of ‘instrumented’ actor capture
Requires service provider adoption to work
Prototype Configuration
Single machine holding client, service and registry
Rules executed on invocation of service
XQuery
Invocations performed 100 times on datasets between 30 KB and 340 KB in size
Coordinator records rule results to a local file store
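A measurement loop of the kind described above might look like the following sketch. It is an assumption for illustration: `time_invocations` is a hypothetical helper, the service call and rules are stand-in callables, and the real prototype executes XQuery rules rather than Python functions.

```python
import time
from statistics import mean


def time_invocations(invoke, rules, repetitions=100):
    """Return the mean time in ms of one invocation plus its rule executions."""
    durations_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        invoke()
        for rule in rules:
            rule()  # rules are executed on each invocation of the service
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(durations_ms)


def fake_service():
    # Stand-in workload; a real run would call the data modelling service.
    sum(range(1000))


def fake_rule():
    # Stand-in for evaluating one registered rule.
    sum(range(100))


mean_by_rule_count = {
    n: time_invocations(fake_service, [fake_rule] * n)
    for n in (0, 1, 5)  # the rule counts used in the evaluation
}
```

Repeating this for each data set size would give one series per rule count, comparable in shape to the evaluation plot.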