Relational XML-PPT

Download Report

Transcript Relational XML-PPT

Storing and Querying Ordered
XML Using a Relational Database
System
By Khang Nguyen
Based on the paper of Igor
Tatarinov and Statis Viglas
Introduction


Researchers proposed using relational
database system to decompose XML
documents into relations and translate XML
queries into SQL queries over these relation.
The paper will answer the question of how
(and whether) the ordered XML data model
can be effectively supported by the
unordered relational data model.
Introduction (Cont.)


The paper proposes three order
encoding methods (Global Order, Local
Order, and Dewey Order) to represent
XML order in the relational data model.
The paper will also answer the question
of when and why to use which encoding
method.
1. Ordered XML: Data Model, Query
Language and Query Dimensions

1.1 The XML Data Model




An XML can be viewed as a tree.
Leaf Nodes = Data Values (text).
Internal Nodes = XML elements.
Document Order = the orders of the
elements in the XML document.
1. Ordered XML: Data Model, Query
Language and Query Dimensions

1.2 Order in XML Query Language

1.2.1 XPath


is a language for specifying navigation within
an XML document.
An XPath expression’s syntax:



Path ::= /Step1/Step2/…/Stepn
Path ::= /films/film/writers/writer/Murray Burnett
An XPath expression is evaluated sequentially,
“step” by “step”.
1. Ordered XML: Data Model, Query
Language and Query Dimensions (Cont.)

1.2.2 XQuery




Is a more complex language based on XPath.
Has all the functionalities of XPath.
Includes Before and After operators that take
two node sequences (XPath expressions) and
Return the nodes from the first sequence that
are before or after some node in the second
sequence.
1. Ordered XML: Data Model, Query
Language and Query Dimensions (Cont.)

1.3 Evaluation Modes for XML Queries

1.3.1 Select Mode:



The nodes in an input XML document are assumed to
have unique identifiers (IDs).
The results of executing an XPath expression is an
ordered set of node IDs.
1.3.2 Reconstruct Mode:


Combines selection and extraction.
The result of evaluating an XPath expression in
reconstruct mode is an ordered set of XML elements.
2. XML Order Encoding
Methods


To capture the document order in the
relational data model is accomplished
by encoding each node’s position in an
XML document as a data value.
Unfortunately, there is no single
encoding method, which is optimal for
both queries and updates.
2. XML Order Encoding
Methods (Cont.)

2.1 Global Order Encoding:



Each node is assigned a number that represents
the node’s absolute position in the document.
Poor insertion performance is the primary
weakness.
To improve updates’ performance, use sparse
numbering which does not require the remaining
nodes be numbered when deleting XML
fragments.
Global Order Encoding
1
play
2
4
title
act
3
#text
act
5
title
7
scen
e
6
#text
Global Order
2. XML Order Encoding
Methods (Cont.)

2.3 Local (Sibling) Order Encoding:



Each node is assigned a number that represents
its relative position among its siblings.
Combining a node’s position with that of its
ancestors yields a path vector that uniquely
identifies the absolute position of the node within
the document.
The advantage is the low overhead incurred by
updates.
Local Order Encoding
1
play
1
title
act
1
#text
3
2
act
1
title
2
scen
e
1
#text
Local Order
2. XML Order Encoding
Methods (Cont.)

2.3 Dewey Order Encoding:



Each node is assigned a vector that represents the
path from the document’s root to the node.
Is “lossless” because each path uniquely identifies
the absolute position of the node within the
document.
One potential disadvantage is the extra space
required to store paths from the root to each
node.
Dewey Order Encoding
1
play
1.1
title
1.1.1
#text
1.3
1.2
act
act
1.2.1
1.2.2
title
#text
scen
e
1.2.1.1
Dewey Order
3. Shredding Ordered XML
into Relations

3.1 The Schema-less Case:




The schema of input document is unknown.
The Edge shredding approach is proposed.
A single relation, the Edge table, is used to store
an entire document.
The Edge table is defined


Edge(id, parent, name, value)
To reduce storage, a separate relation (the Path
table) can be used to store paths and their
identifiers.
3. Shredding Ordered XML
into Relations (Cont.)

3.1.1 Storing Order Information

Global Order


Local Order


Edge(id, parent_id, end_des_id, path_id, value)
Edge(id, parent_id, sIndex, path_id, value)
Dewey Order

Edge(dewey, path_id, value)
3. Shredding Ordered XML
into Relations (Cont.)

3.2 The Schema-aware Case:



Inlining shredding technique can be used when an
XML Schema (DTD) is known.
One advantage is the possibility of more efficient
navigation from an element to its sub-elements.
Another advantage is Inlining shreds and stores
XML documents into a set of tables. Queries tend
to access less data and perform better.
Summary



Relational database systems can support
most ordered XML queries efficiently.
Global Order is best for query-mostly
workloads. Dewey Order is best for a mix of
queries and updates. Local Order is best for
update-intensive environments.
Edge and Inlining methods are well known for
translating ordered XML queries into SQL
queries.