Bottom-up Evaluation of XPath Queries
Download
Report
Transcript Bottom-up Evaluation of XPath Queries
Bottom-up Evaluation of
XPath Queries
Stephanie H. Li
Zhiping Zou
Outline
Overview of XPath
Motivation
Algorithms : bottom-up evaluation
Design and implementation
Introduction- Overview
Overview of Xpath
XPath is a querying language and is designed for
addressing nodes of XML documents.
Data model
Syntax
Expressions
Location paths
Operators
Functions
Evaluation(context)
Data Model
Data Model
XML document = tree of nodes
7 kinds of nodes:
Element
Attribute
Text
Namespace
Processing-instruction
Comment
Document (root) nodes.
Data Model(Example)
<a>
<b/>
<b/>
<b/>
<b/>
The root node
r
</a>
The root element
a
b
b
b
b
Expression
XPath uses expressions to select nodes from
XML documents
The main types of expressions are
Location Paths, Functions and operators
Location Paths
Although there are many different kinds of
XPath expressions, the one that’s of primary
use in Java programs is the location path.
Location Path:
/child::movies/child::movie[po
sition()=5]
step
axis nodetest predicate
location path
Location Step
Axis::Nodetest[predicts]
Axis: chooses the direction to move from the
context node
Node test: determines what kinds of nodes will
be selected along that axis
Predicts: further filter the node-set.
XPath Axis
Axis---main navigator for a XML doc
ancestor
ancestor-or-self
child
descendant
descendant-or-self
following
following-sibling
parent
preceding
preceding-sibling
: nodes along the path to the root
: same but including the context node
: children of the context node
: descendants of the context node
: same but including the context node
: nodes after the context node in document order,
excluding descendants
: following sibling of the context node
: the parent of the context node
: nodes before the context node in document
order,excluding ancestors
: preceding sibling of the context node
Node Test
Node Type test
Example
T(root()) = {r},
T(element()) = {a; b1; : : : ; b4}
T(element(a))= {a}
T(element(b)) = {b1; : : : ; b4}
Node Name test
Element node name
Operators and Functions
Arithmetic Ops
Ops for comparisons and boolean logic:
{<,>,<=,>=,=,!=} {or, and}
Functions
Position()
Last()
Xpath Query Evalutation
Query evaluation is a major algorithmic problem
Main construct is the expression
Each expression is evaluated to yield an object one of
these four types:
Node-set (an unordered collection of nodes without
duplicates )
Boolean(true or false)
Number(a floating-point number )
String
Context
All XPath expressions are evaluated w.r.t. a
Context,which consists of
A context node
A context position(int)
A context size(int)
The input context for query evaluation is
chosen by the user.
Motivation
Claim:
The way XPath is defined in W3C XPath
recommendation motivates an inefficient
implementation (exponential-time).
This paper propose more efficient way
(polynomial-time)
Basic query evaluation strategy
Procedure process-location-step(n0, Q)
/* n0
is the context node;
query Q is a list of location steps */
Begin
node set S := apply Q.first to node n0;
if (Q.tail is not empty) then
for each node n ∈ S do
process-location-step(n, Q.tail);
End
The algorithm recursively
evaluates each remaining
step for each matching
node of the current step
Time(|Q|) = |D| * Time(|Q|-1) or |D||Q| when |Q| > 0
1
when |Q| = 0
Xpath Evaluate in PTime
Theorem: Let e be an arbitrary XPath expression.
Then, for context node x, position k, and size n, the
value of e is v, where v is the unique value such that
<x,k,n,v>∈ E↑[e]
The main principle that the paper propose to obtain an
XPath evaluation algorithm with PTime complexity is
the notion of a context-value table(CVT)
Context-value table Principle
Given an expression e, the CVT of e specifies all valid
combinations of contexts c<x,k,n> and values v, s.t. e
evaluates to v in context c<x,k,n>
Such a table for expression e is obtained by first
computing the CVTs of the direct subexpressions of
e and then combining them into the CVT for e.
The size of each of the CVTs has a polynomial bound
Each of the combination steps can be effected in
PTime
Thus, query evaluation in total under our principle also
has a PTime bound
Bottom-up evaluation of XPath
Bottom-up evaluation of XPath
Algorithm (Bottom-up algorithm for XPath)
By a bottom-up algorithm we
Input: An XPath query Q;
mean a method of processing
Output: E↑[Q]
XPath while traversing the parse
Method:
tree of the query from its leaves
Let Tree(Q) be the parse tree of query Q;
up to its root.
R:=Ø;
For each atomic expression l ∈ leaves(Tree(Q)) do
compute table E↑[l] and add it to R; [Note: we use JDom to do this]
While E↑[root(Tree(Q))]! ∈ R do
Begin
take an Op(l1,…ln) nodes(Tree(Q))
s.t. E↑[l1],… E↑[ln] ∈ R;
compute E↑[Op(l1,…ln)] using E↑[l1],…, E↑[ln];
add E↑[Op(l1,…ln)] to R;
End;
Return E↑[root(Tree(Q))]
Bottom-up evaluation of XPath
Example
XML :
<?xml version="1.0"?>
<people>
<person born="1912" died="1954" id="p342">
<name> Alan Turing </name>
<!-- Did the word computer scientist exist in Turing's day? -->
<profession>computer scientist</profession>
<profession>mathematician</profession>
<profession>cryptographer</profession>
<homepage>href="http://www.turing.org.uk/"</homepage>
</person>
<person born="1918" died="1988" id="p4567">
<name>Richard M. Feynman</name>
<profession>physicist</profession>
<hobby>Playing the bongoes</hobby>
</person>
</people>
Example: XML Doc Tree
Example: XPath Query tree
Parse tree XPath query:
descendant:: profession/following-sibling::*[position()!= last()]
Example: Evaluate
subexpressions
Example: Evaluate
subexpressions
Example: Evaluate
subexpressions
Design and Implementaion
Environment
Java,JDK1.5.0
Jdom1.0
XPath1.0
Features:
Only Element nodes are queried
Not support abbreviated xpath expressions
Not support format of location steps in predicts.
System Structure
XML file
Query
Context node
User input
(MyDriver.java)
JDom XML parser
(org.jdom.input.SAXBuilder)
XML document tree
Query Parser
(Parser.java BinaryTree.java,Node.java)
Query tree
Evaluator( QueryEval.java)
Context value tables (ContextValTable.java and others)
Result for the full xpath query
Conclusion
XPath query evaluation algorithm that runs in
polynomial time with respect to the size of both the
data and the query (linear in the size of queries and
quadratic in the size of data)
No optimization, strictly coheres to the specification
given in the paper
References
G. Gottlob, C. Koch, and R. Pichler. "Xpath Processing in a Nutshell".
In Proceedings of the 19th IEEE International Conference on Data
Engineering (ICDE'03), Bangalore, India, Mar. 2003.
G. Gottlob, C. Koch, and R. Pichler. "Efficient Algorithms for
Processing XPath Queries". In Proceedings of the 28th International
Conference on Very Large Data Bases (VLDB'02), Hong Kong, China,
Aug. 2002.
G. Gottlob, C. Koch, and R. Pichler. "XPath Query Evaluation:
Improving Time and Space Efficiency". In Proceedings of the 19th
IEEE International Conference on Data Engineering (ICDE'03),
Bangalore, India, Mar. 2003.
http://www.ibiblio.org/xml/books/xmljava/chapters/ch16.html