Conditional XPath
Download
Report
Transcript Conditional XPath
Conditional XPath, the first
order complete XPath dialect
Maarten Marx
Presented by: Einav Bar-Ner
1
Introduction
XPath 1.0 is a variable free language used for
selecting nodes from XML documents.
What is missing?
Core XPath (xpath query which interpreted on
XML document tree models) cannot express
queries with conditional paths as exemplified
by “do a child step, while test is true at the
resulting node.”
2
Introduction – Cont.
Solution:
We add conditional axis relations to Core XPath and
show that the resulting language, called conditional
XPath, is equally expressive as first-order logic when
interpreted on ordered trees.
Specifically, conditional XPath extends Core XPath with
the until operator.
This paper introduces and motivates this addition.
3
XPath 1.0 VS. XPath 2.0
XPath 2.0 contains variables which are
used in if–then–else, for, and quantified
expressions. The available axis relations
are the same in both versions.
4
XPath 1.0 VS. XPath 2.0
As XPath 2.0 adds variables and first
order quantifiers it is natural to ask
whether XPath 1.0 is already
expressively complete with respect to
first order logic .
5
XPath VS. First Order Logic
More precisely, is every first order query
Φ(x) which selects a set of nodes from
an XML document model equivalent to
an XPath expression?
As we will see, it is not, but a simple and
natural addition is sufficient.
6
Lecture’s Topics
1.
Introduction + motivation
2.
A brief introduction to XPath, CXPath
3.
Proving the expressively completeness
of CXPath
7
“Until–like” query
The answer set of “until-like” query consists of
all nodes n’ satisfying the statement:
“n’ is a descendant of n, the label of n’ is A,
and for all z, if z is a descendant of n and n’
is a descendant of z, then the label of z is A.”
This query can be expressed by (child :: A)+,
which we call a conditional axis.
8
Conditional XPath (CXPath) vs.
Core XPath
While CXPath is more expressive than
Core XPath, in all other aspects it is
very conservative.
CXPath has the same syntax as XPath 1.0
and Core XPath and the same (standard
W3C) semantics.
9
A Brief Introduction to XPath –
Cont.
The primitive axis of XPath are
, , ,
and those of CXPath are
, , , , fexpr , fexpr , fexpr , fexpr
10
A Brief Introduction to XPath
Definition 1.
The syntax of the XPath languages XCore and CXPath
is defined by the grammar
Locpath ::= axis ‘::’ntst | axis ‘::’ ntst ‘[’fexpr‘]’ |
‘/’locpath | locpath ‘/’ locpath |
locpath ‘ | ’ locpath
fexpr
::= locpath | not fexpr | fexpr and fexpr |
fexpr or fexpr
axis
::= self | primitive_axis | primitive_axis+ |
primitive_axis*.
11
A Brief Introduction to XPath –
Cont.
“locpath” is the start production
“axis” – denotes axis relations e.g. descendant,
ancestor.
“ntst” denotes tags labeling document nodes or the
star ‘*’ that matches all tags (these are called node
tests).
“fexpr” will be called filter expressions after their use
as filters in location paths.
With an XPath expression we always mean a “locpath”.
12
A Brief Introduction to XPath –
Cont.
Interpretation of the axis relations:
- Assigns a boolean value: “true” if the
node n satisfies fexpr, else false.
Given a tree T and an expression A, the
meaning of A in T is written as [[A]]T.
13
The Semantics of XCore and
CXPath
14
Example
Consider the following information need:
give elements whose next element in
document order has tag A. This can be
expressed in first order logic by
yx y A( y) z( x z y)
in which << abbreviates descendant or
ancestor_or_self/following_sibling/descendant_or_self.
15
Example – Cont.
In the following case, answer ser will not
contain x.
x
y
A
z
16
Example – Cont.
To “program” this information in navigational
XPath we seem to need to express the “next in
document order” relation. To do this we use a
few macros: first, last and leaf abbreviate
self :: [not :: ] , self :: [not :: ] and self :: [not :: ]
We also use the converses of the conditional axis,
written as last Its meaning is the transitive
closure of the relation self :: [last ] / ::
Now we can write the “next in document order”
relation in conditional XPath by the following
case distinction:
:: [ first ] | self :: [leaf ] / :: | last :: / ::
17
What kind of queries does
XPath express?
The answer set of locpath evaluated on a
tree R (notation: answerR(locpath)) is the
set
{n Є R| there exists an m, (m,n) Є [locpath]R}
Thus for each expression A, answerR(A)
equals answerR / * :: * / A .
18
Motivation
We will show that for every first order
query Φ(x) there exists an absolute
CXPath expression A which is
equivalent in the following strong
sense: for each tree R, for each node
n, R|=Φ(n)
if and only if n Є answerR(A).
19
Conditional XPath and First
order logic
Let
be the first order language with two
binary relation symbols < and and countably
many unary predicates P, Q, . . . L
is interpreted on node labeled sibling
ordered trees in the obvious manner: < is
interpreted as the descendant relation R ,
as the strict total order R on the siblings,
and the unary predicates P as the sets of
nodes labeled with P.
20
Conditional XPath and First
order logic – Cont.
Theorem 4. Every
formula in one free variable
is, on ordered trees, equivalent to an CXPath filter
expression.
Meaning: For each filter expression fexpr
εR(n, fexpr) is true if and only if
*
n Є answerR (/ :: *[fexpr ])
Note that we do not make a restriction to finite trees.
The result holds for the class of all trees.
21
Defining Xuntil
Define Xuntil as the propositional modal
language with four binary until–like modal
operators , , ,
The syntax is given by the grammar
The pi are propositional variables and
22
Defining Xuntil – Cont.
Formally, a model R is a structure (T, h), with T a
tree and h an assignment function from the set of
propositional variables to the powerset of the set
of tree nodes.
Truth of a formula is defined relative to a model R
and a node n in that model via the following
recursive definition:
23
Example
h(p) = {n1’’, n2’’, n3’’}, h(q) = {n’}
Φ = q, ψ = p, n |=
(q, p)
n
n1’’
n2’’
n3’’
n’
p
p
p
q
24
Xuntil and CXPath
Proposition 6.
Every Xuntil formula is, on ordered trees,
equivalent to an CXPath filter expression.
25
proof of Theorem 4
Instead of proving that Theorem directly we show
expressive completeness of Xuntil , which is
sufficient by Proposition 6.
We say that Xuntil is expressively complete
if for every
formula Φ(x) there exists an Xunti
formula σ such that for every tree model R, for
all nodes n in R,
R|=Φ(n) if and only if R,n |= Φ.
The key idea of the proof is the brilliant notion of
separation.
26
Separation
Let T be a tree and a node t є T. Define the following
partition on T:
Now let h, h’ be two assignments. We say that h, h’
agree on the future of t iff for any atom q and any
s є future(t), s є h(q) iff s є h’(q).
We similarly define this notion for the present, past,
left and right.
27
Example
h and h’ agree on the future of t:
h’:
h:
t
q1
t
q2
s1
s2
s1
s2
q2
q3
q2
q3
28
Example – Cont.
h and h’ do not agree on the future of t:
h’:
h:
t
q1
t
q2
s1
s2
s1
s2
q2
q3
q2
q3, q4
29
Separation – Cont.
We say that a wff A is a pure future wff iff for each
tree T, for all t є T, for all assignments h, h’, if h, h’
agree on the future of t, then t, h |= A iff t, h’ |= A.
Similarly, we define pure present, past, left and right
wffs.
We say that a wff A is separable iff there exists a wff
which is a boolean combination of pure present,
future, past, left and right wffs and is equivalent to A
everywhere on any tree.
30
Example – Not a Pure Future
wff
(q4, q5)
h:
q2 s1
q4 s5
h’:
s4 q5
s4 q6
s3 q4
s3 q4
t
q1
t
q1
s2
q3
q2 s1
s2
s2 q3
q4 s5
31
Example –Pure Future wff
(q2, q4)
h:
q2 s1
q4 s5
h’:
s4 q5
s4 q6
s3 q4
s3 q4
t
q1
t
q1
s2
q3
q2 s1
s2
s2 q3
q4 s5
32
Separation – Cont.
Theorem 7.
If every Xuntil wff is separable over trees,
then Xuntil is expressively complete.
Theorem 8.
Each Xuntil wff is, over trees, separable.
Corollary:
Xuntil is expressively complete.
33
Separation – Cont.
CXPath
Xuntil
Separable
Expressively
Complete
34
Separation – Cont.
The next lemma describes a syntactic
criterion for pure wffs.
Theorem 8 is shown by rewriting each wff
into a boolean combination of wffs
satisfying these syntactic criteria.
35
Separation – Cont.
Lemma 9.
1. Each boolean combination of atoms is a pure present wff.
2. Each boolean combination of wffs in whose scope occur
only atoms, ,
and
wffs is a pure future wff.
3. Each boolean combination of
wffs in whose scope occur
only atoms,
wffs and pure future wffs is a pure right
wff.
4. Each boolean combination of
wffs in whose scope occur
only atoms,
wffs and pure future wffs is a pure left wff.
5. Each boolean combination of wffs in whose scope occur
only atoms, wffs and pure left and right wffs is a pure
past wff.
36
Separating Formulas
We shall describe a syntactic procedure
for separating each wff into a boolean
combination of wffs of the form
described in Lemma 9. Then that
Lemma yields the theorem.
37
Separating Formulas – Cont.
what do we need to do?
1. pull out wffs from under the scope of
wffs, and conversely;
2. pull out wffs from under the scope of
, and wffs;
3. pull out wffs from under the scope of
wffs.
38
Separating Formulas – Cont.
Lemma 10. The following are valid on all
structures, for any of the four
orientations
39
Creating Macros for the Unary
Connectives.
For
one of the four arrows:
A abbreviates ( A, )
A abbreviates ( A, T)
A abbreviates A
corresponds to the “next” operator.
corresponds to its transitive closure.
A expresses that everywhere below A
holds.
40
Separating Formulas – Cont.
Call a wff horizontal if it does not contain and
wffs.
Theorem 12 (Gabbay).
Each horizontal wff is equivalent to a wff in which
no
wff occurs in the scope of a
wff and
conversely.
The next lemma allows us to bring wffs out of the
scope of horizontal wffs.
41
Separating Formulas – Cont.
Lemma 13. Let a, q, A, B be arbitrary wffs.
The following are valid over trees. They are
also valid if
is replaced everywhere by
42
Separating Formulas – Cont.
Proof of Lemma 13.
The equivalences follow from the
observation that if xR y ,
then for any ( A, B)
x | ( A, B) if and only if y | ( A, B)
43
Separating Formulas – Cont.
The only cases left are wffs in the scope of
wffs and conversely.
Lemma 14. Let a, q, A and B be atoms. Consider
the followings wffs:
44
Separating Formulas – Cont.
Each of the above wffs is equivalent, over
ordered trees, to another wff in which the
only appearances of the connective are as
+ (A, B) and if an appearance of that wff is
in the scope of , then (A, B) is in the
scope of a
or a
wff (which itself is in
the scope of the ).
45
The final step.
We now know the basic steps of the
separation proof. We simply keep pulling
out ’s from under the scope of ’s
etcetera until there are no more. Given a
wff A, this process will eventually lead to
a syntactically separated wff.
46
COMPLEXITY
Query evaluation for Core XPath can be
done in time O( |D| · |Q| ),
with |D| the size of the data and
|Q| the size of the query.
CXPath is more expressive, but the upper
bound remains.
47
CONCLUSION
We defined an easy to use, variable free XPath
dialect which is expressively complete with
respect to first order logic when interpreted
on ordered trees.
According to Marx, the lack of variables is one
of the reasons for the success of XPath, so it
is nice to know that they are not needed for
expressivity reasons.
Besides that, query evaluation for CXPath can
still be done in linear time.
48