ICDE10 talk - Leonidas Fegaras
Download
Report
Transcript ICDE10 talk - Leonidas Fegaras
Propagating Updates Through XML Views
Using Lineage Tracing
Leonidas Fegaras
University of Texas at Arlington
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
1
Updating XML Views
Problem: Given a virtual XML view over a relational database
expressed in XQuery, translate an XQuery update over the XML
view to SQL updates over the base tables (embedded into pure
XQuery)
Assumptions:
We use the XQuery Update Facility (XUF)
Snapshot semantics: one XQuery with updates = a single transaction
Goals:
1. prevent most view side-effects statically
... but detect the rest efficiently at run-time
2. collectively consider all compatible updates in a transaction
This is far more expressive than considering each update in isolation
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
2
Highlights
Prevent many forms of view side-effects by using syntactic
restrictions and a static analysis
Allow the full extent of XQuery to define views but restrict the table
connections in a view to trees
Use a conservative static analysis to detect exclusive data sources
These are the table columns that can not cause view side-effects when updated
Based on polymorphic type inference and type usage analysis
Detect the rest of the view side-effects using a general (but
efficient) run-time analysis
Propagate information about the origins of the updatable data through
views and XQuery updates
to be used at the update destinations
Methodology: lineage tracing
The lineage is not just a tuple-id/column-name
an XML element has a context: a chain of tuples used in reaching the element
The table updates are generated by considering all chains in a transaction
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
3
Exclusive Data Source (EDS)
An exclusive data source (EDS) is a table column whose values
1. do not appear in the view output at all
ie, they are not updatable
2. or they appear in the view output only once, but they are not accessed
elsewhere in the view code
Exclusive data sources cannot cause view side effects when
updated
Our goal is to prevent/detect side effects on non-EDS columns as
well
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
4
How to Detect Exclusive Data Sources?
Use polymorphic type inference with a simple usage analysis:
each table column = a fresh type variable
if the principal type of a view is polymorphic over a type variable
=> the view is oblivious to the table column values
in addition, if the type variable appears once in the view principal type
=> the table column is an exclusive data source
A proof based on the parametricity theorem ('theorem for free')
Problems:
Not all XQuery engines support static validation
May be too conservative
Limitation: the analysis granularity is a table column
Not good for generic mappings, eg, a view that uses a single tag table
The tag table may be used multiple times in the view (once per tag)
Cannot statically deduce disjointness
But may use extra annotations in the view to specify disjointness
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
5
Updating non-EDS Columns
Database $DB:
Department ( dno, dname, dphone )
Employee ( ename, dno, salary )
View $view:
View tree:
Employee
ename
dno
salary
for $i in $DB/Employee
let $d := $DB/Department[dno=$i/dno]
return <employee>{ $i/ename, $i/salary,
$d/dname, $d/dphone
}</employee>
EDS columns:
Employee.ename, Employee.dno, Employee.salary
Update on a non-EDS column:
dno
dname dphone
Department
replace $view[ename="Smith"]/dname with "CS"
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
6
Example (cont.)
Update on an non-EDS column:
replace $view[ename="Smith"]/dname with "CS"
If the CS department exists, can we just connect Smith to CS?
May cause a side-effect on dphone
What if the dphone were not visible in the view?
ename
dno
salary
Smith
update
EE
dno
x1234
dname dphone
CS
dno
x5678
dname dphone
What if there is an update
replace $view[ename="Smith"]/dphone with "x5678"
in the same transaction?
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
7
Checking for Side-Effects at Run-Time
An XQuery replace update generates database updates of the form:
replace ( T.A, value, tids )
to replace T.A with a new value
tids is the tuple id list of the tuple chain to reach the update destination tuple
We do not need to start from a root
anchor tuple: the lowest-level tuple in the chain whose foreign key in the chain
is EDS
If T.A is EDS, then tids contains the tuple id of the destination only
At the end of each transaction
1. We collect all updates associated with the same anchor tuple
2. We construct a shadow tuple chain that contains the updated values
3. If there is an existing chain in the database with the same visible tuple values
... we update the anchor foreign key to link it to the existing chain
A visible column is one that appears in the view output (from usage analysis)
Otherwise, we try to insert the tuples of the shadow chain
... which may cause key constraint violation
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
8
Example
E must be an EDS column
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
9
More on Updates
This translation of a replace update will not cause side-effects
But is this what the programmer intended to do?
replace $view[dname="EE"]/dname with "CS"
Too expensive to test at run-time
Deletions are handled similarly
We generate replace calls to change T.A to null
If all visible column values in the shadow chain are null, then disconnect
the chain from the anchor
Needs garbage collection
Very hard to handle insertions
1. Generate replace-null(T.A,value,tids) to replace only if T.A is null
2. Generate insert(T) to insert a new tuple
Hard to generate these calls from a general XQuery insert
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
10
Using Lineage Tracing
The replace/replace-null/insert calls are generated at compiletime
but the tuple chain ids (tids) used must be propagated at run-time
At run-time, every XML atomic value (xs:anyAtomicType) is
annotated with lineage attributes
tids: sequence of tuple ids
path: sequence of foreign key names T.A
Requires minor extensions to an existing XQuery engine
Constructed when an atomic value is generated from a table in a view
from the view tree
XQuery operations must propagate these values as is
Newly constructed values have empty lineage
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
11
Other Contributions
A general algorithm for translating XQuery updates to embedded
SQL update calls
Generates pure XQuery with embedded replace/replace-null/insert calls
It is type-guided (based on the update destination type)
Limitation: XQuery insert translations are not complete
requires that there is at least one child in the parent node to copy parts of its
lineage
Each XQuery update query is translated to pure XQuery with
embedded SQL code to retrieve data,
calls to replace/replace-null/insert to update data
A general framework for optimizing XQuery with embedded
SQL statements
Source-to-source rewrite heuristics to
promote XQuery predicates to SQL code
fuse SQL statements in pairs to form SQL join queries
An implementation in Haskell:
Leonidas Fegaras
http://lambda.uta.edu/HXQ/
Propagating Updates Through XML Views Using Lineage Tracing
12
Conclusion
Presented a framework for updating virtual XML views
Characteristics:
Hybrid of schema-based and data-driven methods for detecting view sideeffects
Allows general XQueries but restricts the table connections in the views
XQuery updates are translated statically to embedded SQL code
... but the resulting SQL updates use information that is propagated at runtime (through lineage tracing)
Current work:
A source-to-source, type-guided approach to incremental maintenance of
XML views
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
13
Related Work
L. Wang et al [vldb'06]: HUX
A hybrid of schema-based and data-driven methods
Uses data lineage
Uses the concept of clean extended source
Semi-decidable method to determine translatability of updates
deduced from relationship cardinalities
Barsalou et al [sigmod'91]: Object view updates
View trees from ownership connections
Literal translation to base updates within a dependency island
Braganholo et al [vldb'04]: Map XML views to relational views
B. Choi et al [icde'07]: data-driven with XPath views
Needs to maintain a reachability matrix
Bidirectionalization research in programming languages
Relational lenses: Bohannon et al [pods'06]
Bidirectionalization for free: Voigtlander [popl'09]
Leonidas Fegaras
Propagating Updates Through XML Views Using Lineage Tracing
14