Sedna: A Native XML DBMS
Download
Report
Transcript Sedna: A Native XML DBMS
Sedna: A Native XML DBMS
Andrey Fomichev
Maxim Grinev
Sergey Kuznetsov
Institute for System Programming of RAS
SOFSEM 2006
23 January
Agenda
Sedna overview and goals
Data organization
Memory management
Query evaluation
Conclusion
Challenges
Fernandez, M.F., Semeon, J.: Growing XQuery.
ECOOP 2003
Extending XQuery with data update facilities
Growing XQuery to a program language
Physical layer for supporting these aspects is
required. The layer is primarily based on
Data structures
Memory management
Sedna Overview
Full-featured database system (external and
main memory management, query and
update facilities, concurrency etc.)
Native XML database
Based on the XQuery language and the
XQuery/XPath data model
XUpdate language
Implemented in Scheme and C/C++
Supported platforms are Windows and Linux
Agenda
Sedna overview and goals
Data
organization
Memory management
Query evaluation
Conclusion
Data Organization
Descriptive schema driven storage strategy is
used, which consists in clustering nodes of
XML document according to their position in
descriptive schema
Direct pointers are used to represent relations
between nodes of an XML document such as
parent, child and sibling relationships
Descriptive Schema (Data Guide)
<library>
<book>
<title>Foundation on databases</title>
<author>Abiteboul</author>
<author>Hull</author>
<author>Vianu</author>
</book>
. . .
<book>
<title>An Introduction to Database
Systems</title>
<author>Date</author>
<issue>
<publisher>Addison-Wesley</publisher>
<year>2004</year>
</issue>
</book>
<paper>
<title>A Relational Model for Large Shared
Data Banks</title>
<author>Codd</author>
<paper>
. . .
<paper>
<title>The Complexity of Relational Query
Languages</title>
<author>Codd</author>
<paper>
</library>
library
book
title
paper
author
issue
publisher
title
year
/child::library/child::book/child::title
book
Data Structures
title
...
Indirection
table
parent
prev-in-block
left-sibling
node handle
label
children “by descriptive schema”
next-in-block
right-sibling
Structural query efficiency
When we answer structural queries like
/child::library/child::book/child::title
We
Read only blocks containing necessary
information and do not read other blocks
Every block, which is being read, does
contain only those nodes that are to be in the
answer
Node updates efficiency
Node descriptors
have fixed size aside
the block
Node descriptors are
partly ordered
Immutable numbering
scheme
Indirection table for
parents
parent
leftsibling
node
rightsibling
indirection
table
child
…
child
Agenda
Sedna overview and goals
Data organization
Memory
management
Query evaluation
Conclusion
Memory Management
Pointers are used to present relationships between
nodes and traversing nodes results in intensive
pointer dereferencing, so the dereferencing operation
should be effective
Database address space should be big enough to
represent large volumes of data
OS memory management restrictions
Restriction on the size of address space caused by
32-bit architecture that prevails nowadays
We can’t control the page replacement (swapping)
procedure
Layered Address Space (LAS)
Transaction
process
Layered Address
Space
(layer, addr)
addr
OS Virtual Process
Address Space
MapViewOfFile(Windows)
mmap (Linux)
Buffer
Manager
Buffer Memory
VirtualLock (Windows)
mlock (Linux)
layer * LAYER_SIZE + addr
External
Memory
(Disk)
Sedna Memory Management
Benefits
Emulating 64-bit virtual address space on the
standard 32-bit architecture allows removing
restrictions on the size of database
Pointer dereferencing in LAS is comparable to
dereferencing of ordinary pointer in a low-level
programming language because we map the layer to
process virtual address space on an equality basis
The same pointer representation in main and
secondary memory is used that allows avoiding
costly pointer swizzling
Agenda
Sedna overview and goals
Data organization
Memory management
Query
evaluation
Conclusion
Query Evaluation Aspects
Suspended element constructors
Different strategies for XPath queries
evaluation
Combining Lazy and Strict Semantics
Element constructors
XML element construction requires
deep copy of its content (so, the
operation is heavy)
Suspended element constructors (the
copy is performed on demand when
some operation gets into the
constructed element)
Different strategies for XPath
queries evaluation
library
book
title
/library/book[issue/year=2004]
paper
author
issue
publisher
title
book
year
/library/book/issue/
year[.=2004]/../..
Combining Lazy and Strict
Semantics (1)
Iterative result computation (open; next;
close)
Iterative result computation with functional
programming language give lazy evaluation
On the other hand, strict semantic of a
language is more efficient comparing with
lazy semantics
So, we combine strict and lazy semantics for
XQuery
Combining Lazy and Strict
Semantics (2)
Query evaluations starts in lazy mode
Every function call is a reason to switch to
strict mode if the sizes of arguments are
relatively small
The large input sequence for any physical
operation in the strict mode is the subject to
switch to lazy mode
Conclusion
Efficient evaluation of structured XPath
queries
Local node-level updates
Effective processing of XML data in main
memory comparable to general purpose
programming language
Thank you for your attention
You can find more about Sedna at
http://modis.ispras.ru/Development/sedna.htm