Recursive XML Schemas, Recursive XML Queries, and Relational

Download Report

Transcript Recursive XML Schemas, Recursive XML Queries, and Relational

Recursive XML Schemas, Recursive XML
Queries, and Relational Storage: XML-to-SQL
Query Translation
Krishnamurthy, R., Chakaravarthy, V.T., Kaushik, R., &
Naughton, J.F. (2004) Proceedings of the 20th
International Conference on Data Engineering
(ICDE’04)
Presenter:
Jochen Stoesser
mailto: [email protected]
Motivation (1)
• General topic: XML-to-SQL query translation
• 3 scenarios of XML shredding
 schema-oblivious shredding
 XML Publishing
 schema-based shredding
April 5, 2005
Advanced Database Systems, Jochen Stoesser
2
Motivation (2): Schema Graph
• Tree schema graph (see example)
• directed acyclic (DAG)
schema graph
 multiple incoming
edges
• recursive schema
graph
Source: Krishnamurthy et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
3
Motivation (3): Schema-based Shredding
• for each non-leaf node create separate
relation
 Book(id, title)
 Author(id, parentid)
 Section(id, parentid,
parentcode, title)
 Para(id, parentid)
 Figure(id, parentid, caption, image)
• each leaf node is associated with
a column name
• parentid and parentcode to preserve structure
April 5, 2005
Advanced Database Systems, Jochen Stoesser
4
Motivation (4)
• Focus of the paper: XML-to-SQL query translation with
schema-based shredding, especially in the presence of
 recursive XML query, e.g.
/book//section/title
//  /descendant-or-self::node()/
 recursive XML schemas
•
(>> example)
not solved by
 existing schema-based shredding algorithms,
 schema-oblivious shredding algorithms,
 XML Publishing algorithms
April 5, 2005
Advanced Database Systems, Jochen Stoesser
5
Queries: SQL(p)
• example: retrieve nodes 9 (titles of subsections, i.e.
nodes 7)
/book/section/section/title
• path p = <book, section(4), section(7), title>
• SQL(p):
select
from
where
S2.title
Book B, Section S1, Section S2
B.id=S1.parentid and S1.parentcode=1 and
S1.id=S2.parentid and S2.parentcode=2;
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
6
Queries: RtoL(l)
• root-to-leaf SQL query
• possibly multiple root-to-leaf paths p1, ..., pm to leaf l
m
• RtoL(l) := i = 1SQL(pi )
• retrieves all information that would be retrieved
traversing all paths from the root to leaf l
• Problem: recursive schema
 possibly infinite number of root-to-leaf paths
 RtoL query: union of infinitely many queries
April 5, 2005
Advanced Database Systems, Jochen Stoesser
*
7
Query Translation
• two stages:
1. Identify the paths in the XML schema graph that satisfy the
query: PathId stage
2. Use annotations (schemas) from XML-to-Relational mapping
to construct equivalent relational query: SQLGen stage
XML schema
PathId
Mapping
schema
SQLGen
SQL query
XML query
April 5, 2005
Advanced Database Systems, Jochen Stoesser
8
PathId Stage
• Problems:
 recursive schema: number of paths possibly infinite
 DAG graph: exponential number of paths
• General idea:
 Represent matching paths as graph instead of
enumerating to reflect shared information across
multiple paths (will become important for SQLGen
stage)
 execute query on a schema graph and identify
statisfying paths
April 5, 2005
Advanced Database Systems, Jochen Stoesser
9
PathId Stage
• Algorithm outline:
 Take automaton AS representing schema graph S
 Translate Query into DFA AQ
 Create cross-product automaton ASQ from AS and AQ,
eliminate all dead states
 ASQ contains all matching paths
 View ASQ as mapping schema SSQ
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
10
PathId Stage: XPath Semantics
Query: //section//title
Source: Krishnamurthy et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
11
SQLGen Stage
XML schema
PathId
Mapping
schema
SQLGen
XML query
• informally: union of all RtoL in mapping schema SSQ
corresponds to query result
• problem: DAG and recursive graphs & queries
 DAG: number of paths can be exponential in the size of
the component
 recursive graphs & queries: infinite number of matching
paths
April 5, 2005
Advanced Database Systems, Jochen Stoesser
12
SQLGen Stage
• Solution: SQL99  with-clause
• Used to create temporary relations for
• Nodes in DAG components
 shared computation, reflects shared information
contained in paths through DAG components
 decrease exponential to polynomial complexity!
• Recursive components
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
13
SQLGen: Algorithm (1)
• Query: /E0//E10
c1
• Find mapping schema SSQ (S=SSQ)
c2
• strongly connected, non-recursive
components (i = Ei):
{0}, {1}, {2},{3}, {4}, {5}, {6},
{10}, {11}
• merge adjacent components ci
and cj if ci dominates cj
c3
c1 = {0, 1, 2, 3, 4, 5, 6}, c3 = {10, 11}
E11
• recursive component c2 = {7, 8, 9}
Source: Krishnamurthy et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
14
SQLGen: Algorithm (2), DAG components
• further process in top-down topological order
• c1 = {0, 1, 2, 3, 4, 5, 6} non-recursive DAG component
• create temporary relation for each non-root node that is
 leaf node
 has child or parent in different
component
 multiple incoming/outgoing edges
( shared computation)
• N1 = {2, 3, 6}
>>
April 5, 2005
c1
Advanced Database Systems, Jochen Stoesser
15
SQLGen: Algorithm (3), DAG components
• for example for node E6  N1:
shared computation
with T6 as (
select R6.* from R4, T3 where
R4.id=R6.parentid and
T3.id=R4.parentid and
R6.parentcode=1
union all
select R6.* from R5, T3 where
R5.id=R6.parentid and
T3.id=R5.parentid and
R6.parentcode=2
)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
16
SQLGen: Algorithm (4), recursive
components
• c2 = {7, 8, 9} recursive component
• temporary relation TR, schema is
union of the schemas of nodes in TR
• two parts:
1. Initialization part
captures all incoming edges
2. Recursive part
captures recursion
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
17
SQLGen: Algorithm (5), initialization
• incoming edges from other components:
(2, 8) and (3, 7)
shared computation
• Q1 = select R8.*, id(8) as schemanode from T2, R8
where R8.parentcode=2 and R8.parentid=T2.id
• Q2 = select R7.*, id(7) as schemanode from T3, R7
where R7.parentcode=3
and R7.parentid=T3.id
• Qinit = Q1  Q2
April 5, 2005
Advanced Database Systems, Jochen Stoesser
18
SQLGen: Algorithm (6), recursion
• edges within the recursive component c2:
(7, 9), (8, 7), (8, 9), (9, 8)
• construct recursive query for each of these edges, e.g.
Qe1=(7,9) = select R9.*, id(9) as schemanode from TR, R9
where TR.schemanode=id(7) and
R9.parentid=TR.id and R9.parentcode=1
• recursive part QR =  jQ ej
TR = Qinit  QR
• for each n  c2: temporary relation
T(n) = select * from TR where schemanode=id(n)
(>> example)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
19
SQLGen: Final Query
• result of SQLGen stage:
 temporary relations
-
T2, T3, T6 for c1
-
T7, T8, T9, and TR for c2
-
T10 and T11 for c3
 Final query:
 /E0//E10
algorithm select elemid from T11
April 5, 2005
Advanced Database Systems, Jochen Stoesser
20
Empirical Evaluation (1)
• XMark Benchmark schema
• Translation of query fragments that appear in query suite
• XML-to-Relational mapping schema has 101 nodes
• Size of cross-product schema in all cases < 100 nodes
• Result: Translation processes for each query < 6ms
April 5, 2005
Advanced Database Systems, Jochen Stoesser
21
Empirical Evaluation (2)
• test under extreme conditions
• all transitions on single label x
• query //x//x//x//x//x
• mapping schema complete graph (clique) of n nodes
• runtime of translation algorithm:
Source: Krishnamurthy
et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
22
Contributions
• Claim: „first to present a generic algorithm that translates
path expression queries to SQL in the presence of recursion
in the schema in the context of schema-based XML storage
shredding of XML into relations“
• Algorithm translates a path expression query into a single
SQL query, irrespective of the XML schema‘s complexity
• SQL query‘s size polynomial in size of the input XML-toRelational mapping and the XML query
April 5, 2005
Advanced Database Systems, Jochen Stoesser
23
Limitations
• High complexity of SQL query even for relatively easy XML
queries
• Although running time may be small, memory requirements
may be high due to many temporary relations
• Furthermore, although authors indicate solutions for XPath
semantics and branching path expression queries (e.g.
p1[p2 op value]), there is no proposition about increase in
complexity regarding runtime, memory requirements etc.
April 5, 2005
Advanced Database Systems, Jochen Stoesser
24
Q&A
?
April 5, 2005
Advanced Database Systems, Jochen Stoesser
25
References
• Krishnamurthy, R. (2004) XML-to-SQL Query Translation.
Dissertation at the University of Wisconsin-Madison. Retrieved at
March 18 from http://www.cs.wisc.edu/~sekar/research/main.pdf
• XPath v1.0. W3C. Retrieved at March 18 from
http://www.w3.org/TR/1999/REC-xpath-19991116.html
• Tian, F., DeWitt, D.J., Chen, J., & Zhang, C. The Design and
Performance Evaluation of Alternative XML Storage Strategies.
Retrieved at April 4 from
http://www.cs.wisc.edu/~czhang/doc/publications/feng6page.pdf
April 5, 2005
Advanced Database Systems, Jochen Stoesser
26
Appendix: Recursive XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="1.0">
<xs:element name="element">
<xs:complexType>
<xs:sequence>
<xs:any processContents="skip" minOccurs="0" />
<xs:element ref="element" minOccurs="0" />
</xs:sequence>
*
</xs:complexType>
</xs:element>
element
</xs:schema>
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
27
Appendix: SQL(path p)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
28
Appendix: PathId(Q,S)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
29
Appendix: SQLGen(SSQ)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
30
Appendix: SQLFromDAG(c)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
31
Appendix: SQLFromRecursive(c) (1)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
32
Appendix: SQLFromRecursive(c) (2)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
33
Appendix: TR example
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
34