Natix - Al Akhawayn University
Natix
Done by
Asmaa Hassanain
CSC 5370
Dr. Hachim Haddoutti
12/8/2003
Contents
XML data management Techniques
What is Natix
Natix Architecture
Storage Layer: Logical Data Model
Mapping between XML and the Logical Model
XML Page Interpreter Storage Format
XML segment mapping for large trees
Index Structures
Natix Physical Algebra
Example Plans
To do...
CSC 5370 XML and Data Management
XML data management
Techniques
Store data as a plain text file
But: need to parse the entire file for processing every query
Map data to relational database
But:
Data-centric view: large number of tables
Unnormalized relations
Document-centric view: all information in a single data item (e.g. CLOB)
Store data as objects
But: OOD systems are not developed enough to provide efficient querying capabilities
Native XML database
Designing systems from scratch
Natix
What is Natix?
Natix is a native XML Repository
Proposed by Kanne and Moerkotte at
University of Mannheim (Germany)
Natix requires Linux to run (kernel
2.2.16 or later, or 2.4.*), with CODA
support enabled in the kernel.
Still under development
Natix Architecture
Natix Architecture
Binding Layer: map between the Natix
Engine Interface and different
application interfaces
Natix Architecture
e.g. NatixFS:
File system interface – Natix can be mounted like
an ordinary file system
Allows an XML tree to be viewed as a file system tree
Importing a document – just copy it to a
directory, e.g. cp bib.xml /natix
Exporting a document – just open it, e.g.
more /natix/bib.xml
Removing a document – just delete the file, e.g.
rm /natix/bib.xml
XPath expressions – just use one as a file name, e.g.
more /natix/{%%title}
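Because the binding is a file system, the import, export and remove operations above need nothing beyond ordinary file calls. A minimal Python sketch, assuming NatixFS is mounted at a directory supplied by the caller (the slide uses /natix); the three helper names are hypothetical and not part of Natix:

```python
import os
import shutil


def import_document(src_path, mount_dir):
    """Import: copy the XML file into the mounted repository (like cp)."""
    return shutil.copy(src_path, mount_dir)


def export_document(mount_dir, name):
    """Export: open the document and read it back as text (like more)."""
    with open(os.path.join(mount_dir, name), encoding="utf-8") as f:
        return f.read()


def remove_document(mount_dir, name):
    """Remove: delete the file that represents the document (like rm)."""
    os.remove(os.path.join(mount_dir, name))
```

Against an ordinary directory these calls behave as plain file operations; using an XPath expression as a file name only has meaning on a real NatixFS mount.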
Natix Architecture
Service Layer: Provides all DBMS
functionality required in addition to
simple storage and retrieval
Natix Engine Interface
Query execution engine
Query compiler
Transaction manager
Object manager
Natix Architecture
Natix Engine Interface:
The interface through which the
database services communicate with
each other and with applications
Provides a unified facade to specify
requests to the database system
Natix Architecture
Query compiler: translates queries
expressed in XML query languages
into optimized query execution plans
Natix Architecture
Query execution engine: evaluates
queries
Interprets the plan passed by the
query compiler
Able to execute all queries
expressible in a typical XML query
language like XQuery
Natix Architecture
Transaction management: contains
classes that provide ACID-style
transactions, plus components for
recovery
Adapts the ARIES protocol for
recovery
For synchronization, an S2PL-based
scheduler is introduced
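To illustrate the locking idea, here is a toy lock table in Python. It reads S2PL as strict two-phase locking and simplifies it: all locks are held until the transaction ends, and a conflicting request is refused rather than queued. The class and method names are hypothetical and do not come from Natix.

```python
from collections import defaultdict


class LockTable:
    """Toy lock table: locks are released only when the transaction ends."""

    def __init__(self):
        self.shared = defaultdict(set)   # item -> transactions holding S locks
        self.exclusive = {}              # item -> transaction holding the X lock

    def lock_shared(self, txn, item):
        owner = self.exclusive.get(item)
        if owner is not None and owner != txn:
            return False                 # conflict: another writer holds the item
        self.shared[item].add(txn)
        return True

    def lock_exclusive(self, txn, item):
        owner = self.exclusive.get(item)
        if owner is not None and owner != txn:
            return False                 # conflict: another writer holds the item
        if self.shared[item] - {txn}:
            return False                 # conflict: other readers hold the item
        self.exclusive[item] = txn
        return True

    def end_transaction(self, txn):
        """Commit or abort: the only point at which locks are released."""
        for holders in self.shared.values():
            holders.discard(txn)
        for item in [i for i, t in self.exclusive.items() if t == txn]:
            del self.exclusive[item]
```

A real scheduler would block or queue the refused request and handle deadlocks; the sketch only shows that no lock is given up before the end of the transaction.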
Natix Architecture
Storage Layer: manages all persistent data
structures and their transfer between main
and secondary memory.
Contains classes for efficient XML storage,
indexes and metadata storage.
Manages the storage of the recovery log and
controls the transfer of data between main
and secondary storage.
Accesses raw disks or file system files and
provides a memory space divided into
segments, each a linear collection of
equal-sized pages.
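Since a segment is a linear collection of equal-sized pages, a page number translates directly into a byte offset. A minimal Python sketch of that addressing; the page size, the class name and the in-memory buffer standing in for a disk or file are all assumptions made for illustration:

```python
PAGE_SIZE = 8192  # assumed page size in bytes; the slides do not state one


class Segment:
    """A segment as a linear collection of equal-sized pages over one buffer."""

    def __init__(self, num_pages, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.num_pages = num_pages
        self.data = bytearray(num_pages * page_size)  # stands in for disk or file

    def _offset(self, page_no):
        if not 0 <= page_no < self.num_pages:
            raise IndexError("page number out of range")
        return page_no * self.page_size

    def read_page(self, page_no):
        start = self._offset(page_no)
        return bytes(self.data[start:start + self.page_size])

    def write_page(self, page_no, payload):
        if len(payload) > self.page_size:
            raise ValueError("payload larger than one page")
        start = self._offset(page_no)
        # Pad with zero bytes so every page keeps the same size.
        self.data[start:start + self.page_size] = payload.ljust(self.page_size, b"\0")
```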
Storage Layer: Logical Data Model
Logical Data Model: logical tree
New nodes can be inserted as children
or siblings of existing nodes
Any node can be removed
Individual documents are represented
as ordered trees
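The operations of the logical model (insert as child, insert as sibling, remove any node) can be sketched with a small ordered-tree class in Python. This is an illustration of the model only, with hypothetical names, not the Natix classes:

```python
class Node:
    """Node of an ordered tree; children keep their document order."""

    def __init__(self, label):
        self.label = label
        self.parent = None
        self.children = []

    def insert_child(self, node, position=None):
        """Insert node as a child, by default as the last one."""
        node.parent = self
        if position is None:
            position = len(self.children)
        self.children.insert(position, node)
        return node

    def insert_sibling_after(self, node):
        """Insert node as the next sibling of this node."""
        if self.parent is None:
            raise ValueError("the root has no siblings")
        index = self.parent.children.index(self)
        return self.parent.insert_child(node, index + 1)

    def remove(self):
        """Detach this node (and its subtree) from its parent."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
```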
Mapping between XML and the
Logical Model
A small wrapper class is used to map the
XML model with its node types and
attributes to a simple tree model and vice
versa:
Elements are mapped one to one to tree
nodes of Logical Data Model
Attributes are mapped to child nodes of an
additional attribute container child node
The names of referenced entities are
retained in special internal nodes
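The element and attribute rules above can be sketched in Python with the standard xml.etree.ElementTree parser. The label conventions ("#attributes", "@name", "#text:...") are invented for this example; entity references are not covered because the parser expands them, and only the XML-to-tree direction is shown:

```python
import xml.etree.ElementTree as ET


def to_logical_tree(element):
    """Map an ElementTree element to a nested (label, children) tuple."""
    children = []
    if element.attrib:
        # All attributes hang below one extra container child node.
        attrs = [("@" + name, [("#text:" + value, [])])
                 for name, value in element.attrib.items()]
        children.append(("#attributes", attrs))
    if element.text and element.text.strip():
        children.append(("#text:" + element.text.strip(), []))
    for child in element:
        children.append(to_logical_tree(child))   # one element, one tree node
        if child.tail and child.tail.strip():
            children.append(("#text:" + child.tail.strip(), []))
    return (element.tag, children)


tree = to_logical_tree(ET.fromstring('<book year="2003"><title>Natix</title></book>'))
```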
XML Page Interpreter Storage
Format
The logical data tree is partitioned
into subtrees
Each subtree is stored in a single
record of variable length
Each record contains a pointer to
the record containing the parent
node and the document identifier
XML Page Interpreter Storage
Format
Subtrees of the original XML document are
stored together in a single physical record
Clusters connected subtrees of the
document tree into large records and
represents intra-record references
differently from inter-record references
The inner structure of the subtrees is
retained
XML segment mapping for large
trees
Proxy nodes refer to connected subtrees not stored in
the same record
Helper aggregate nodes group together a subset of
children of a node
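The role of proxy nodes can be sketched in Python: when a subtree is too large for one record, a child subtree moves to its own record and a proxy node takes its place. The greedy "move the largest child" rule, the node-count capacity and all names are assumptions made for this sketch; it is not the split algorithm Natix uses, and helper aggregate nodes are not modelled.

```python
class TreeNode:
    def __init__(self, label, children=None, target=None):
        self.label = label
        self.children = children or []
        self.target = target      # record id, set only on proxy nodes


def size(node):
    """Number of nodes in a subtree; a proxy node counts as one node."""
    return 1 + sum(size(child) for child in node.children)


def partition(root, capacity):
    """Split the tree in place into records of roughly `capacity` nodes."""
    records = []   # record id = position in this list

    def store(node, parent_record):
        record_id = len(records)
        # Each record remembers the record that holds its parent node.
        records.append({"parent": parent_record, "root": node})
        while size(node) > capacity:
            movable = [c for c in node.children if c.target is None and c.children]
            if not movable:
                break   # only leaves and proxies left; grouping is not modelled
            largest = max(movable, key=size)
            position = node.children.index(largest)
            child_record = store(largest, record_id)
            node.children[position] = TreeNode("#proxy", target=child_record)
        return record_id

    store(root, None)
    return records
```

A record can still exceed the capacity when one node has very many children; grouping subsets of such children is the job the slide assigns to helper aggregate nodes.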
Index Structures
Natix uses two Index Structures:
Full text index framework (inverted files): store lists of document references to indicate in which documents search terms appear
Index: maps search terms to list identifiers and stores these mappings persistently. Provides the main interface for the user to work with inverted files
ListManager: maps the list identifiers to the actual lists (managing the directory of the inverted file)
FragmentedList: lists are divided into fragments that fit on a page, are linked, and can be traversed sequentially. It manages all the fragments of one list and controls insertions and deletions on this list
ContextDescription: establishes the actual representation in which data is stored in a list
eXtended Access Support Relation (XASR): preserves the parent/child, ancestor/descendant, and preceding/following relationships between nodes
The XASR combined with a full text index provides a powerful method to search on contents of nodes
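The division of labour between Index, ListManager and FragmentedList can be sketched in Python. The class names follow the slide, but every method, the in-memory storage and the fragment capacity are assumptions; persistence, deletion and ContextDescription are left out.

```python
class FragmentedList:
    """Posting list split into fixed-capacity fragments, traversed in order."""

    def __init__(self, fragment_capacity):
        self.fragment_capacity = fragment_capacity
        self.fragments = [[]]     # each inner list stands in for one page

    def append(self, doc_ref):
        if len(self.fragments[-1]) == self.fragment_capacity:
            self.fragments.append([])      # start the next fragment in the chain
        self.fragments[-1].append(doc_ref)

    def __iter__(self):
        for fragment in self.fragments:
            yield from fragment


class ListManager:
    """Directory of the inverted file: list identifier -> actual list."""

    def __init__(self, fragment_capacity):
        self.fragment_capacity = fragment_capacity
        self.lists = []

    def create(self):
        self.lists.append(FragmentedList(self.fragment_capacity))
        return len(self.lists) - 1

    def get(self, list_id):
        return self.lists[list_id]


class Index:
    """User-facing interface: search term -> list identifier."""

    def __init__(self, fragment_capacity=2):
        self.manager = ListManager(fragment_capacity)
        self.term_to_list = {}

    def add_document(self, doc_ref, text):
        for term in sorted(set(text.lower().split())):
            if term not in self.term_to_list:
                self.term_to_list[term] = self.manager.create()
            self.manager.get(self.term_to_list[term]).append(doc_ref)

    def lookup(self, term):
        list_id = self.term_to_list.get(term.lower())
        return [] if list_id is None else list(self.manager.get(list_id))
```

A lookup goes through both mappings in turn: the Index resolves the term to a list identifier, the ListManager resolves the identifier to a list, and the list is read fragment by fragment.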
Natix Physical Algebra
‘Let’, ‘for’, ‘where’ and ‘return’ in
XQuery are supported
‘Select’, ‘map’, ‘join’, ‘grouping’ and
‘sort’ operations are performed by
standard algebraic operators
borrowed from relational context
‘D-join’ and ‘unary and binary
grouping’ are borrowed from the
object oriented context
Natix Physical Algebra
Scan operations: e.g. ExpressionScan
ExpressionScan: generates a tuple containing
the root of the document identified by its
name by evaluating a given expression
UnnestMap is used to generate variable bindings
for XPath expressions
e.g. /a//b/c:
UnnestMap$4=child($3,c)(
UnnestMap$3=desc($2,b)(
UnnestMap$2=child($1,a)([$1])))
‘BA-Map’, ‘FL-Map’, ‘Groupify-GroupApply’ and
‘NGroupify-NGroupApply’ are used to construct
the XML result
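The nested UnnestMap plan for /a//b/c can be imitated in Python with generators over tuples of variable bindings. This is a sketch of the operator's behaviour as described on the slide, not Natix code: the function names are hypothetical, tuples are plain dicts, and a wrapper element stands in for the document root $1 because ElementTree has no document node.

```python
import xml.etree.ElementTree as ET


def children(node, name):
    """child axis: direct children with the given tag, in document order."""
    return [c for c in node if c.tag == name]


def descendants(node, name):
    """descendant axis: all proper descendants with the given tag."""
    found = []
    for child in node:
        if child.tag == name:
            found.append(child)
        found.extend(descendants(child, name))
    return found


def unnest_map(attr, expr, tuples):
    """For every input tuple, emit one output tuple per item expr returns."""
    for t in tuples:
        for item in expr(t):
            yield {**t, attr: item}


def plan(document):
    """/a//b/c as three nested UnnestMaps over the singleton input [$1]."""
    s1 = [{"$1": document}]
    s2 = unnest_map("$2", lambda t: children(t["$1"], "a"), s1)
    s3 = unnest_map("$3", lambda t: descendants(t["$2"], "b"), s2)
    return unnest_map("$4", lambda t: children(t["$3"], "c"), s3)


# ElementTree has no document node, so a wrapper element plays the role of $1.
doc = ET.Element("document")
doc.append(ET.fromstring("<a><x><b><c>1</c></b></x><b><c>2</c><d/></b></a>"))
results = [t["$4"].text for t in plan(doc)]
```

Each output tuple carries the bindings $1 to $4, so later operators can still refer to the a and b nodes that led to a given c.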
Example Plans (1):
This query retrieves the
title and the year for all
recent books
Example Plans (2):
To do...
Support for functions inside XPath
expressions
Cannot import DTDs as of now
Support for different character encodings
Support for XML namespaces
Preparing for the launch of the first
full commercial end-user release of
Natix, which may support all these
features
Questions ?
References
Natix: A Technology Overview:
http://pi3.informatik.uni-mannheim.de/publications.html#79
Efficient Storage of XML Data:
http://pi3.informatik.uni-mannheim.de/publications.html#79
Anatomy of a Native XML Base Management System:
http://pi3.informatik.uni-mannheim.de/publications.html#79
Algebraic XML Construction and its Optimization in Natix:
http://pi3.informatik.uni-mannheim.de/publications.html#79
Data ex machina:
www.dataexmachina.de/natix.html
Thank You