Conflict Scheduling of Transactions on XML Documents

Download Report

Transcript Conflict Scheduling of Transactions on XML Documents

Scheduling of Transactions on
XML Documents
Author: Stijin Dekeyser
Jan Hidders
Reviewed by Jason Chen, Glenn,
Steven, Christian
Native XML Database
• It’s a database that stores and retrieves
XML documents efficiently
• The query language should be based on
XML, and the structure of XML
• The indexes should ensure that the fast
execution of queries against XML
documents
G.Dobbie 2004
Pros and cons of NXD compare to
relational Database
Pros
Cons
• NXD is more efficient • NXD is slow to
in terms of documents
retrieve a view
retrieval
compare to relational
database
• NXD requires less
system space
compare to relational
DB since it store less
nulls and tables
compare to relational
DB
G.Dobbie 2004
Motivation for conflict schduler
• NXD becomes more popular for
companies to use
• Currently there is no NXD scheduler that
can guarantee high concurrency and
serializability
• State of the art of concurrency control in
NXD is currently locking the entire
document
Data Model of xml document in
Conflict scheduler
• Similar to Xpath data model representation
n
1
<document id = “0”>
<person id = “1”, age = “22”>
<name>Jason</name>
<address>33 Colmar Rd
</address>
<occupation>Student</occupation>
<father>
<person id = 3, age =
“54”>
Roo
t
n
2
n
3
n
4
<name>Ming</name>
</person>
Id =
1
Name =
“Jason”
Figure 1
elemen
tattribut
e
tex
t
Occupation=
“Research Assistant”
</father>
</person>
<person id = “2”, age = “23”>
<name>Tanya</name>
<occupation>Office
Assistant</occupation>
</person>
</document>
figure 1
Figure 2
Data Manipulation language in
conflict scheduler
• DML is used to query a resultset in a DB,
in this case it is a NXD DB
• Also similar to Xpath DML
• P ::= F | P/ F | P// F
• F ::= E | *
• Where f can be root or an element of a xml
document
Dekeyser, S. & Hidders, J. (2004)
Operations in Conflict Scheduler
• A(n, a)
• D(n)
• Q(n, p)
This is the update operation which add a new
node a to an existing node n
The new node a is returned as the result of
the operation
The operation fails if the document can not
represent as a tree structure.
The update operation delete an existing node
n from the document tree
This operation does not return any result
This operation fails if the document structure
is not a tree after the updates
This query operation return all the nodes
which is connected from node n by path p
Dekeyser, S. & Hidders, J. (2004)
Locking Scheme in Conflict
Scheduler
• Locking scheme is used to lock the node
when one user is accessing it. In this way
it prevents other user to modify it at the
same time to protect data integrity
• There are 2 locking scheme involved in
the conflict scheduler
– Path Lock Propagation Scheme
– Path Lock Satisfiability Scheme
Dekeyser, S. & Hidders, J. (2004)
Path Lock Propagation Scheme
• In this locking scheme, if node n is locked,
all nodes which connect from n with path
expression “p” is also locked
<a1(Q(, document/person/father/person), t1)>
will perform the locking of this node
rl(t1, r, document/person/father/person)
the propagation process will also perform the following locking
rl(t1, n1, person/father/person)
rl(t1, n2, father/person)
rl(t1, n3, father/person)
rl(t1, n8, person)
Dekeyser, S. & Hidders, J. (2004)
Path Lock Satisfiability Scheme
• This locking generates less locks than
Path Lock Propagation Scheme but is
hard to complex in terms of testing conflict
therefore can also effect the efficiency in
terms of processing speed of the database
system.
<a1(Q(, document/person/father/person), t1)>
will perform the locking of this node
rl(t1, r, document/person/father/person)
Dekeyser, S. & Hidders, J. (2004)
The Conflict Scheduler
• The conflict scheduler is the automaton
whose state consists of a schedule S of
actions that it has previously accepted and
processed, a set of locks L, a dependency
graph G which is directed graph whose
nodes are transaction identifiers, and an
instance graph I.
Dekeyser, S. & Hidders, J. (2004)
The Conflict Scheduler (cont)
• Consider the following schedule S in given
the instance graph in figure 1
S in = < a1( Q(r,document/person/occupation, t1),
a2 (Q(r, document), t2),
a3 (A (n1, person), t2),
a4( A(n28, occupation),
a5(D ( n22)>
The Conflict Scheduler (cont)
• Initially :
• S: empty, G empty
• First operation ( Q(r,
document/person/occupation, t1)
dependency graph G1
t1
with only 1 node
The Conflict Scheduler (cont)
• Second operation:
document), t2)
• Dependency graph G2
t1
a2 (Q(r,
t2
• two nodes without any edge between them
The Conflict Scheduler (cont)
• Third operation: a3 (A (n1, person), t2)
Root
n1
new person node
n2
n3
n28
n4
n20
n5
Id = 1
n9
n21
Name = “Jason”
element
attribute
n22
n23
Occupation=
“Research Assistant”
text
• Dependency graph G3 is still the same
to the G2 from the second operation
The Conflict Scheduler (cont)
• Forth operation: a4( A(n28, occupation)
t1
• This
operation conflicts with operation a1. ie the
node a4 required (occupation is locked by a1 ) thus,
G4 now contains an edge from t1 to t2, after a4, no
further conflicts appear. The conflict scheduler
finishes and accepts Sin as its output schedule
• Dependency graph G4:
t1
t2
The Conflict Scheduler (cont)
• So now a serial schedule equivalent to S
in is the schedule obtained by first taking
all actions of transaction t1 and then all
actions of transaction t2. The order is
based on the dependency graph.
Conclusion
• Concurrency becomes important as NXD
is gaining popularity
• Conflict scheduler has improved
concurrency compare to commit scheduler
• The serializability of conflict scheduler has
also been proven
• Future development will aim to realize it in
the industry