A Generic Provenance Middleware for Queries, Updates, and


Interoperability for Provenance-aware Databases using PROV and JSON
Xing Niu
Illinois Institute of Technology
Raghav Kapoor, Boris Glavic
Illinois Institute of Technology
Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy
Oracle Corporation
Venkatesh Radhakrishnan
Facebook
Outline
① Introduction
② Related Work
③ Overview
④ Export and Import
⑤ Experimental Results
⑥ Conclusions and Future Work
Introduction
• The PROV standards
– A standardized, extensible representation of provenance graphs
– Exchange of provenance information between systems
• Provenance-aware DBMS
– Computing the provenance of database operations
– E.g., Perm [1], GProM [2], DBNotes [3], Orchestra [4], LogicBlox [5]
[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer, 2013.
[2] B. Arab, D. Gawlick, V. Radhakrishnan, H. Guo, and B. Glavic. A generic provenance middleware for database queries, updates, and transactions. In TaPP, 2014.
[3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005.
[4] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3):19, 2013.
[5] S. Huang, T. Green, and B. Loo. Datalog and emerging applications: an interactive tutorial. In SIGMOD, pages 1213–1216, 2011.
Introduction
• Example: extracting demographic information from tweets
Introduction
• Problem:
– No relational database system supports both tracking database provenance and importing/exporting provenance in PROV
– Existing provenance-aware databases cannot export provenance into standardized formats
• E.g., GProM:
– Essentially produces wasDerivedFrom edges between the output tuples of a query Q and its inputs
– However, these edges are not available as PROV graphs
– There is no way to track the derivation back to non-database entities
Introduction
• GProM System
– Computes provenance for database operations
• Queries, updates, transactions
– Using SQL language extensions
• E.g., PROVENANCE OF (SELECT ...)
Introduction
• Example of GProM in action
– The result of PROVENANCE OF for query Q
– Each tuple in this result represents one wasDerivedFrom assertion
• E.g., tuple to1 was derived from tuple t1
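The mapping from a PROVENANCE OF result to PROV can be sketched in Python as follows. It assumes each result row has already been reduced to a pair (output tuple id, input tuple id); the identifiers, the `ex:` namespace, and the function name `to_prov_json` are hypothetical and do not reflect GProM's actual output schema. Each pair becomes one PROV-JSON `wasDerivedFrom` record.

```python
import json


def to_prov_json(prov_rows):
    """Turn (output_id, input_id) pairs into a minimal PROV-JSON document.

    Every pair yields one wasDerivedFrom record; both endpoints are
    declared as entities. The 'ex:' namespace is a placeholder.
    """
    doc = {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {},
        "wasDerivedFrom": {},
    }
    for i, (out_id, in_id) in enumerate(prov_rows):
        doc["entity"][f"ex:{out_id}"] = {}
        doc["entity"][f"ex:{in_id}"] = {}
        # Relation records are keyed by an identifier; blank-node style
        # ids ("_:...") are used for these anonymous derivations.
        doc["wasDerivedFrom"][f"_:wdf{i}"] = {
            "prov:generatedEntity": f"ex:{out_id}",
            "prov:usedEntity": f"ex:{in_id}",
        }
    return doc


# Hypothetical provenance rows: output tuple to1 derived from inputs t1 and t2
print(json.dumps(to_prov_json([("to1", "t1"), ("to1", "t2")]), indent=2))
```

Keying relations by blank-node identifiers keeps the document a plain dictionary, so it can be serialized directly with `json.dumps`.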
Introduction
• Goal: make databases interoperable with other provenance systems
• Approach:
– Export and import of provenance
• PROV-JSON
– Propagation of imported provenance
– Implemented in GProM using SQL
Related Work
• Integrating provenance graphs by identifying common elements [6]
• Addressing the interoperability problem between databases and other provenance-aware systems through
– A common model for both types of provenance [7][8][9]
– Monitoring database access to link database provenance with other provenance systems [10][11]
[6] A. Gehani and D. Tariq. Provenance integration. In TaPP, 2014.
[7] U. Acar, P. Buneman, J. Cheney, J. van den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow
provenance. In TaPP, 2010.
[8] Y. Amsterdamer, S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. Putting Lipstick on Pig: Enabling Database-style
Workflow Provenance. PVLDB, 5(4):346–357, 2011.
[9] D. Deutch, Y. Moskovitch, and V. Tannen. A provenance framework for data-dependent process analysis. PVLDB, 7(6), 2014.
[10] F. Chirigati and J. Freire. Towards integrating workflow and database provenance. In IPAW, pages 11–23, 2012.
[11] Q. Pham, T. Malik, B. Glavic, and I. Foster. LDV: Light-weight Database Virtualization. In ICDE, pages 1179–1190, 2015.
Overview
• We introduce techniques for
– Exporting database provenance as PROV documents
– Importing PROV graphs alongside data
– Linking outputs of SQL operations to the imported provenance of their inputs
• The implementation in GProM offloads generation of PROV documents to the backend database
– Using SQL and string concatenation
Export and Import
• Export
– Added a TRANSLATE AS clause
• E.g., PROVENANCE OF (SELECT ...) TRANSLATE AS …
– Constructs a PROV-JSON document from the database provenance in three steps
① Runs several projections over the provenance computation
– E.g., '"_:wgb\(' || F0.STATE || '|' || F0."AVG(AGE)" || '\)'…
② Uses aggregation to concatenate all snippets of a certain type
– E.g., entity nodes, wasGeneratedBy edges, used edges
③ Uses string concatenation to create the final document
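The three export steps can be mimicked in SQLite through Python's `sqlite3` module, since SQLite also offers `||` for string concatenation and `group_concat` for aggregating strings. This is a sketch under stated assumptions: the table `prov_result` is a hypothetical stand-in for the output of a provenance computation (one row per pair of output tuple and input tuple), and the identifier scheme (`ex:out_<state>|<avg_age>`, `ex:in_<id>`, `_:wdf<rowid>`) is invented for illustration. The SQL that GProM actually generates will differ.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical provenance-computation output: each row pairs an output tuple
# (state, avg_age) with the id of one input tuple it was derived from.
conn.execute("CREATE TABLE prov_result (state TEXT, avg_age INTEGER, in_id INTEGER)")
conn.executemany(
    "INSERT INTO prov_result VALUES (?, ?, ?)",
    [("IL", 45, 1), ("IL", 45, 2), ("CA", 30, 3)],
)

EXPORT_SQL = """
WITH snippets AS (
  -- Step 1: projections render each row as PROV-JSON string snippets
  SELECT
    '"ex:out_' || state || '|' || avg_age || '": {}' AS out_entity,
    '"ex:in_' || in_id || '": {}' AS in_entity,
    '"_:wdf' || rowid || '": {"prov:generatedEntity": "ex:out_' || state || '|'
      || avg_age || '", "prov:usedEntity": "ex:in_' || in_id || '"}' AS wdf
  FROM prov_result
),
agg AS (
  -- Step 2: aggregation concatenates all snippets of one type (comma-separated)
  SELECT
    (SELECT group_concat(s) FROM (SELECT out_entity AS s FROM snippets
                                  UNION SELECT in_entity FROM snippets)) AS entities,
    (SELECT group_concat(wdf) FROM snippets) AS derivations
)
-- Step 3: string concatenation assembles the final document
SELECT '{"entity": {' || entities || '}, "wasDerivedFrom": {' || derivations || '}}'
FROM agg
"""

doc_text = conn.execute(EXPORT_SQL).fetchone()[0]
doc = json.loads(doc_text)  # fails loudly if the SQL produced malformed JSON
print(json.dumps(doc, indent=2))
```

The `UNION` removes duplicate entity snippets (the two `IL` rows share one output tuple), and `group_concat`'s default comma separator happens to be exactly what a JSON object body needs.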
Export and Import
• Example: part of the final PROV document
Red dotted lines in DB
Export and Import
• Import
– Import PROV for an existing relation
– Provide a language construct IMPORT PROV FOR ...
– Import available PROV graphs for imported tuples and store them alongside the data
– Add three columns to each table to store imported provenance
• prov_doc: stores a PROV-JSON snippet representing the tuple's provenance
• prov_eid: indicates which of the entities in this snippet represents the imported tuple
• prov_time: stores the time at which the tuple was imported
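A minimal sketch of this storage layout, using SQLite via Python's `sqlite3`: the three provenance columns are added with `ALTER TABLE` and filled with an imported PROV-JSON snippet. The table contents, the snippet (a tweet-extraction activity echoing the introductory example), and the column types are assumptions; the IMPORT PROV FOR construct itself is GProM syntax and is not reproduced here.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (name TEXT, age INTEGER, state TEXT)')
conn.execute('INSERT INTO "user" VALUES (?, ?, ?)', ("Alice", 45, "IL"))

# Add the three provenance columns (TEXT is an assumption of this sketch).
for col in ("prov_doc", "prov_eid", "prov_time"):
    conn.execute(f'ALTER TABLE "user" ADD COLUMN {col} TEXT')

# Hypothetical imported PROV-JSON snippet: the tuple ex:user1 was generated
# by an extraction activity that used a tweet.
snippet = {
    "entity": {"ex:tweet1": {}, "ex:user1": {}},
    "activity": {"ex:extract": {}},
    "used": {"_:u1": {"prov:activity": "ex:extract", "prov:entity": "ex:tweet1"}},
    "wasGeneratedBy": {"_:g1": {"prov:entity": "ex:user1", "prov:activity": "ex:extract"}},
}
conn.execute(
    'UPDATE "user" SET prov_doc = ?, prov_eid = ?, prov_time = ? WHERE name = ?',
    (json.dumps(snippet), "ex:user1", datetime.now(timezone.utc).isoformat(), "Alice"),
)

row = conn.execute('SELECT name, prov_eid, prov_doc FROM "user"').fetchone()
print(row[0], row[1], json.loads(row[2])["wasGeneratedBy"])
```

Storing the snippet as text keeps the base schema unchanged apart from the three extra columns; `prov_eid` is what later lets an export locate the entity for this tuple inside the snippet.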
Export and Import
• Import: example
– Relation user with imported provenance
– Attribute value d is the PROV graph from the earlier example, without the database activities and entities
Export and Import
• Using Imported Provenance During Export
– Include the imported provenance as bundles in the generated PROV graph
• Bundles [13] enable nesting of PROV graphs within PROV graphs, treating a nested graph as a new entity
– Connect the entities representing input tuples in the imported provenance to the query activity and the output tuple entities
[13] P. Missier, K. Belhajjame, and J. Cheney. The W3C PROV family of specifications for modelling provenance metadata. In EDBT, pages 773–776, 2013.
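The bundle-based linking can be sketched in Python as below. Each input tuple contributes its stored snippet (the hypothetical `prov_doc` value) as a named bundle, and the entity named by its `prov_eid` is connected to the query activity through `used` and to the output tuple through `wasDerivedFrom`. The function name, argument layout, and identifiers are assumptions made for this illustration.

```python
import json


def export_with_bundles(query_activity, output_entity, inputs):
    """Build a PROV-JSON document that nests imported provenance as bundles.

    inputs: list of (bundle_id, prov_eid, prov_doc) triples, where prov_doc
    is the imported PROV-JSON snippet (a dict) stored with an input tuple.
    """
    doc = {
        "entity": {output_entity: {}},
        "activity": {query_activity: {}},
        "wasGeneratedBy": {
            "_:g0": {"prov:entity": output_entity, "prov:activity": query_activity}
        },
        "used": {},
        "wasDerivedFrom": {},
        "bundle": {},
    }
    for i, (bundle_id, eid, snippet) in enumerate(inputs):
        doc["bundle"][bundle_id] = snippet  # nested imported PROV graph
        doc["entity"][bundle_id] = {}  # in PROV-DM a bundle is itself an entity
        # Link the imported entity for the input tuple to the query activity
        doc["used"][f"_:u{i}"] = {"prov:activity": query_activity, "prov:entity": eid}
        # and to the output tuple it contributed to.
        doc["wasDerivedFrom"][f"_:d{i}"] = {
            "prov:generatedEntity": output_entity,
            "prov:usedEntity": eid,
        }
    return doc


imported = {"entity": {"ex:user1": {}, "ex:tweet1": {}}}  # hypothetical snippet
print(json.dumps(export_with_bundles("ex:q1", "ex:out1", [("ex:b1", "ex:user1", imported)]), indent=2))
```

Because PROV identifiers are qualified names, the top-level `used` and `wasDerivedFrom` records can refer to `ex:user1` even though that entity is described inside the bundle.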
Export and Import
• Example of Bundles:
Export and Import
• Handling Updates
– If a tuple is modified, e.g., by running an SQL UPDATE statement, this should be reflected when provenance is exported
• Example
– Assume the user has run an update to correct tuple t1’s age value (setting age to 70) before running the query
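One possible way to reflect such an update in exported provenance is to model tuple versions as separate entities, as in the Python sketch below: the corrected version of t1 is generated by the update activity and derived from the old version. The version-naming scheme (`ex:t1_v1`, `ex:t1_v2`), the activity id, and the function name are assumptions, not the representation prescribed by GProM.

```python
import json


def update_provenance(tuple_id, old_version, new_version, update_id):
    """PROV-JSON fragment for one updated tuple (hypothetical naming scheme).

    The new tuple version is generated by the update activity, which used the
    old version; the new version is also recorded as derived from the old one.
    """
    old_e = f"ex:{tuple_id}_v{old_version}"
    new_e = f"ex:{tuple_id}_v{new_version}"
    return {
        "entity": {old_e: {}, new_e: {}},
        "activity": {update_id: {}},
        "used": {"_:u": {"prov:activity": update_id, "prov:entity": old_e}},
        "wasGeneratedBy": {"_:g": {"prov:entity": new_e, "prov:activity": update_id}},
        "wasDerivedFrom": {
            "_:d": {"prov:generatedEntity": new_e, "prov:usedEntity": old_e}
        },
    }


# Correcting t1's age (setting age to 70) produces version 2 of t1
print(json.dumps(update_provenance("t1", 1, 2, "ex:update1"), indent=2))
```

Under this modelling, a query run after the update would link its outputs to `ex:t1_v2` rather than `ex:t1_v1`, so the correction stays visible in the exported graph.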
Export and Import
• Challenge
– How to track the provenance of updates under transactional semantics
• Solution
– GProM uses the novel concept of reenactment queries
• Users can request the provenance of a past update, a transaction, or a set of updates executed within a given time interval
– Construct the PROV document using provenance for updates computed on-the-fly
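The core idea of reenactment can be illustrated with SQLite: an update `UPDATE R SET a = e WHERE c` is simulated by the query `SELECT CASE WHEN c THEN e ELSE a END ... FROM R` evaluated over the state before the update, so that provenance techniques for queries can then be applied to it. SQLite has no time travel, so this sketch keeps an explicit copy of the pre-update table; the table names and values are hypothetical, and this toy example ignores the transactional semantics that GProM's reenactment handles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 7), (2, 35)])

# Stand-in for time travel: an explicit snapshot of the pre-update state
conn.execute("CREATE TABLE users_before AS SELECT * FROM users")

# The update whose provenance is requested (correct t1's age to 70)
conn.execute("UPDATE users SET age = 70 WHERE id = 1")

# Reenactment query over the old state:
#   UPDATE R SET a = e WHERE c  ~>  SELECT CASE WHEN c THEN e ELSE a END FROM R
reenacted = conn.execute(
    "SELECT id, CASE WHEN id = 1 THEN 70 ELSE age END AS age "
    "FROM users_before ORDER BY id"
).fetchall()
actual = conn.execute("SELECT id, age FROM users ORDER BY id").fetchall()
print(reenacted == actual, reenacted)  # the query reproduces the update's effect
```

Since the reenactment is an ordinary query over the old state, each output row can be traced back to the pre-update row it came from, which is the link the exported PROV document needs.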
Experimental Results
• TPC-H [14] benchmark datasets
– Scale factors from 0.01 to 10 (10 MB up to 10 GB)
• Run on a machine with
– 2 x AMD Opteron 3.3 GHz processors
– 128 GB RAM
– 4 x 1 TB 7.2K RPM disks configured in RAID 5
• Queries
– Provenance of a three-way join between relations customer, orders, and nation
– With additional selection conditions to control selectivity (and, thus, the size of the exported PROV-JSON document)
[14] TPC. TPC-H Benchmark Specification, 2009.
Experimental Results
[Figure: results for the 1 GB and 10 GB datasets]
Conclusions and Future Work
Conclusions
• Integrated import and export of provenance represented as PROV-JSON into/from provenance-aware databases
• Construct PROV graphs on-the-fly using SQL
• Connect database provenance to imported PROV data
Future Work
• Full implementation for updates
• Automatic storage management (e.g., deduplication) for imported provenance
• Automatic cross-referencing
Questions
• My Webpage
– http://www.cs.iit.edu/~dbgroup/people/xniu.php
• Our Group’s Webpage
– http://cs.iit.edu/~dbgroup/research/index.html
• GProM
– http://www.cs.iit.edu/~dbgroup/research/gprom.php
Others
• Provenance querying
• Provenance for JSON