Bottom-up Evaluation of XPath Queries

Download Report

Transcript Bottom-up Evaluation of XPath Queries

Bottom-up Evaluation of
XPath Queries
Stephanie H. Li
Zhiping Zou
Outline
 Overview of XPath
 Motivation
 Algorithms : bottom-up evaluation
 Design and implementation
Introduction- Overview
 Overview of Xpath

XPath is a querying language and is designed for
addressing nodes of XML documents.
Data model
 Syntax
 Expressions

 Location paths
 Operators
 Functions

Evaluation(context)
Data Model
 Data Model


XML document = tree of nodes
7 kinds of nodes:

Element

Attribute
Text
Namespace
Processing-instruction
Comment
Document (root) nodes.





Data Model(Example)
<a>
<b/>
<b/>
<b/>
<b/>
The root node
r
</a>
The root element
a
b
b
b
b
Expression
 XPath uses expressions to select nodes from
XML documents
 The main types of expressions are
Location Paths, Functions and operators
Location Paths
 Although there are many different kinds of
XPath expressions, the one that’s of primary
use in Java programs is the location path.
 Location Path:
/child::movies/child::movie[po
sition()=5]
step
axis nodetest predicate
location path
Location Step
 Axis::Nodetest[predicts]



Axis: chooses the direction to move from the
context node
Node test: determines what kinds of nodes will
be selected along that axis
Predicts: further filter the node-set.
XPath Axis
 Axis---main navigator for a XML doc
ancestor
ancestor-or-self
child
descendant
descendant-or-self
following
following-sibling
parent
preceding
preceding-sibling
: nodes along the path to the root
: same but including the context node
: children of the context node
: descendants of the context node
: same but including the context node
: nodes after the context node in document order,
excluding descendants
: following sibling of the context node
: the parent of the context node
: nodes before the context node in document
order,excluding ancestors
: preceding sibling of the context node
Node Test
 Node Type test

Example
T(root()) = {r},
 T(element()) = {a; b1; : : : ; b4}
 T(element(a))= {a}
 T(element(b)) = {b1; : : : ; b4}

 Node Name test

Element node name
Operators and Functions
 Arithmetic Ops
 Ops for comparisons and boolean logic:

{<,>,<=,>=,=,!=} {or, and}
 Functions


Position()
Last()
Xpath Query Evalutation
 Query evaluation is a major algorithmic problem


Main construct is the expression
Each expression is evaluated to yield an object one of
these four types:




Node-set (an unordered collection of nodes without
duplicates )
Boolean(true or false)
Number(a floating-point number )
String
Context
 All XPath expressions are evaluated w.r.t. a
Context,which consists of



A context node
A context position(int)
A context size(int)
 The input context for query evaluation is
chosen by the user.
Motivation
 Claim:

The way XPath is defined in W3C XPath
recommendation motivates an inefficient
implementation (exponential-time).
 This paper propose more efficient way
(polynomial-time)
Basic query evaluation strategy
Procedure process-location-step(n0, Q)
/* n0
is the context node;
query Q is a list of location steps */
Begin
node set S := apply Q.first to node n0;
if (Q.tail is not empty) then
for each node n ∈ S do
process-location-step(n, Q.tail);
End
The algorithm recursively
evaluates each remaining
step for each matching
node of the current step
Time(|Q|) = |D| * Time(|Q|-1) or |D||Q| when |Q| > 0
1
when |Q| = 0
Xpath Evaluate in PTime
 Theorem: Let e be an arbitrary XPath expression.
Then, for context node x, position k, and size n, the
value of e is v, where v is the unique value such that
<x,k,n,v>∈ E↑[e]
 The main principle that the paper propose to obtain an
XPath evaluation algorithm with PTime complexity is
the notion of a context-value table(CVT)
Context-value table Principle
 Given an expression e, the CVT of e specifies all valid
combinations of contexts c<x,k,n> and values v, s.t. e
evaluates to v in context c<x,k,n>
 Such a table for expression e is obtained by first
computing the CVTs of the direct subexpressions of
e and then combining them into the CVT for e.
 The size of each of the CVTs has a polynomial bound
 Each of the combination steps can be effected in
PTime
 Thus, query evaluation in total under our principle also
has a PTime bound
Bottom-up evaluation of XPath
Bottom-up evaluation of XPath
Algorithm (Bottom-up algorithm for XPath)
By a bottom-up algorithm we
Input: An XPath query Q;
mean a method of processing
Output: E↑[Q]
XPath while traversing the parse
Method:
tree of the query from its leaves
Let Tree(Q) be the parse tree of query Q;
up to its root.
R:=Ø;
For each atomic expression l ∈ leaves(Tree(Q)) do
compute table E↑[l] and add it to R; [Note: we use JDom to do this]
While E↑[root(Tree(Q))]! ∈ R do
Begin
take an Op(l1,…ln) nodes(Tree(Q))
s.t. E↑[l1],… E↑[ln] ∈ R;
compute E↑[Op(l1,…ln)] using E↑[l1],…, E↑[ln];
add E↑[Op(l1,…ln)] to R;
End;
Return E↑[root(Tree(Q))]
Bottom-up evaluation of XPath
 Example

XML :
<?xml version="1.0"?>
<people>
<person born="1912" died="1954" id="p342">
<name> Alan Turing </name>
<!-- Did the word computer scientist exist in Turing's day? -->
<profession>computer scientist</profession>
<profession>mathematician</profession>
<profession>cryptographer</profession>
<homepage>href="http://www.turing.org.uk/"</homepage>
</person>
<person born="1918" died="1988" id="p4567">
<name>Richard M. Feynman</name>
<profession>physicist</profession>
<hobby>Playing the bongoes</hobby>
</person>
</people>
Example: XML Doc Tree
Example: XPath Query tree
Parse tree XPath query:
descendant:: profession/following-sibling::*[position()!= last()]
Example: Evaluate
subexpressions
Example: Evaluate
subexpressions
Example: Evaluate
subexpressions
Design and Implementaion
 Environment




Java,JDK1.5.0
Jdom1.0
XPath1.0
Features:
Only Element nodes are queried
 Not support abbreviated xpath expressions
 Not support format of location steps in predicts.

System Structure
XML file
Query
Context node
User input
(MyDriver.java)
JDom XML parser
(org.jdom.input.SAXBuilder)
XML document tree
Query Parser
(Parser.java BinaryTree.java,Node.java)
Query tree
Evaluator( QueryEval.java)
Context value tables (ContextValTable.java and others)
Result for the full xpath query
Conclusion
 XPath query evaluation algorithm that runs in
polynomial time with respect to the size of both the
data and the query (linear in the size of queries and
quadratic in the size of data)
 No optimization, strictly coheres to the specification
given in the paper
References
 G. Gottlob, C. Koch, and R. Pichler. "Xpath Processing in a Nutshell".
In Proceedings of the 19th IEEE International Conference on Data
Engineering (ICDE'03), Bangalore, India, Mar. 2003.
 G. Gottlob, C. Koch, and R. Pichler. "Efficient Algorithms for
Processing XPath Queries". In Proceedings of the 28th International
Conference on Very Large Data Bases (VLDB'02), Hong Kong, China,
Aug. 2002.
 G. Gottlob, C. Koch, and R. Pichler. "XPath Query Evaluation:
Improving Time and Space Efficiency". In Proceedings of the 19th
IEEE International Conference on Data Engineering (ICDE'03),
Bangalore, India, Mar. 2003.
 http://www.ibiblio.org/xml/books/xmljava/chapters/ch16.html