Sedna: A Native XML DBMS

Download Report

Transcript Sedna: A Native XML DBMS

Sedna: A Native XML DBMS
Andrey Fomichev
Maxim Grinev
Sergey Kuznetsov
Institute for System Programming of RAS
SOFSEM 2006
23 January
Agenda
Sedna overview and goals
 Data organization
 Memory management
 Query evaluation
 Conclusion

Challenges
Fernandez, M.F., Semeon, J.: Growing XQuery.
ECOOP 2003
 Extending XQuery with data update facilities
 Growing XQuery to a program language
Physical layer for supporting these aspects is
required. The layer is primarily based on
 Data structures
 Memory management
Sedna Overview






Full-featured database system (external and
main memory management, query and
update facilities, concurrency etc.)
Native XML database
Based on the XQuery language and the
XQuery/XPath data model
XUpdate language
Implemented in Scheme and C/C++
Supported platforms are Windows and Linux
Agenda

Sedna overview and goals
 Data
organization
Memory management
 Query evaluation
 Conclusion

Data Organization

Descriptive schema driven storage strategy is
used, which consists in clustering nodes of
XML document according to their position in
descriptive schema
 Direct pointers are used to represent relations
between nodes of an XML document such as
parent, child and sibling relationships
Descriptive Schema (Data Guide)
<library>
<book>
<title>Foundation on databases</title>
<author>Abiteboul</author>
<author>Hull</author>
<author>Vianu</author>
</book>
. . .
<book>
<title>An Introduction to Database
Systems</title>
<author>Date</author>
<issue>
<publisher>Addison-Wesley</publisher>
<year>2004</year>
</issue>
</book>
<paper>
<title>A Relational Model for Large Shared
Data Banks</title>
<author>Codd</author>
<paper>
. . .
<paper>
<title>The Complexity of Relational Query
Languages</title>
<author>Codd</author>
<paper>
</library>
library
book
title
paper
author
issue
publisher
title
year
/child::library/child::book/child::title
book
Data Structures
title
...
Indirection
table
parent
prev-in-block
left-sibling
node handle
label
children “by descriptive schema”
next-in-block
right-sibling
Structural query efficiency
When we answer structural queries like
/child::library/child::book/child::title
We
 Read only blocks containing necessary
information and do not read other blocks
 Every block, which is being read, does
contain only those nodes that are to be in the
answer
Node updates efficiency

Node descriptors
have fixed size aside
the block
 Node descriptors are
partly ordered
 Immutable numbering
scheme
 Indirection table for
parents
parent
leftsibling
node
rightsibling
indirection
table
child
…
child
Agenda
Sedna overview and goals
 Data organization

 Memory
management
Query evaluation
 Conclusion

Memory Management


Pointers are used to present relationships between
nodes and traversing nodes results in intensive
pointer dereferencing, so the dereferencing operation
should be effective
Database address space should be big enough to
represent large volumes of data
OS memory management restrictions
 Restriction on the size of address space caused by
32-bit architecture that prevails nowadays
 We can’t control the page replacement (swapping)
procedure
Layered Address Space (LAS)
Transaction
process
Layered Address
Space
(layer, addr)
addr
OS Virtual Process
Address Space
MapViewOfFile(Windows)
mmap (Linux)
Buffer
Manager
Buffer Memory
VirtualLock (Windows)
mlock (Linux)
layer * LAYER_SIZE + addr
External
Memory
(Disk)
Sedna Memory Management
Benefits



Emulating 64-bit virtual address space on the
standard 32-bit architecture allows removing
restrictions on the size of database
Pointer dereferencing in LAS is comparable to
dereferencing of ordinary pointer in a low-level
programming language because we map the layer to
process virtual address space on an equality basis
The same pointer representation in main and
secondary memory is used that allows avoiding
costly pointer swizzling
Agenda
Sedna overview and goals
 Data organization
 Memory management

 Query

evaluation
Conclusion
Query Evaluation Aspects

Suspended element constructors
 Different strategies for XPath queries
evaluation
 Combining Lazy and Strict Semantics
Element constructors
XML element construction requires
deep copy of its content (so, the
operation is heavy)
 Suspended element constructors (the
copy is performed on demand when
some operation gets into the
constructed element)

Different strategies for XPath
queries evaluation
library
book
title
/library/book[issue/year=2004]
paper
author
issue
publisher
title
book
year
/library/book/issue/
year[.=2004]/../..
Combining Lazy and Strict
Semantics (1)

Iterative result computation (open; next;
close)
 Iterative result computation with functional
programming language give lazy evaluation
 On the other hand, strict semantic of a
language is more efficient comparing with
lazy semantics
 So, we combine strict and lazy semantics for
XQuery
Combining Lazy and Strict
Semantics (2)

Query evaluations starts in lazy mode
 Every function call is a reason to switch to
strict mode if the sizes of arguments are
relatively small
 The large input sequence for any physical
operation in the strict mode is the subject to
switch to lazy mode
Conclusion

Efficient evaluation of structured XPath
queries
 Local node-level updates
 Effective processing of XML data in main
memory comparable to general purpose
programming language
Thank you for your attention
You can find more about Sedna at
http://modis.ispras.ru/Development/sedna.htm