7. XML_Native Storage
Storing XML using native storage
Presented by Molato Badr
Supervised by Dr. H.Haddouti
Introduction
• XML is used more and more frequently, which drives the development of systems that store and query XML data efficiently
• Research to improve system performance:
– Indexing paths
– Optimizing XML queries
– Storage configuration of XML data on disk, which affects the efficiency of an XML Data Management System
Outline
I. Native storage as a definition
II. Several native storage strategies
III. Comparison to DBMS storage
Native storage?
• Based on XML data models such as the Document Object Model (DOM)
• NXDs: a native XML database is simply a database for storing and accessing XML using XML.
NXDs
• An NXD defines a (logical) model for an XML document, and stores and retrieves documents according to that model.
• It has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
• Documents go in and documents come out. Thus an NXD may not actually be a standalone database at all.
• An NXD is intended to help developers by providing robust storage and manipulation of XML documents.
• NXDs manage collections of documents, allowing you to query and manipulate those documents as a set.
Native storage strategies
• Schema-independent
– Subtree-based strategy (Natix)
– Document-based strategy (Apache Xindice system)
– Element-based strategy (TIMBER): each element node is a record.
• Schema-guided (OrientStore), with two storage strategies:
– Element-Based Clustering (EBC)
– Logical Partition-Based Clustering (LPC)
Subtree-based strategy (Natix)
• Natix (University of Mannheim, Germany)
– Semantically partitions a large document into subtrees based on its tree structure
– Stores each subtree in one record (the unit of storage), which is atomic
– Proxy nodes connect subtrees held in different records
– Provides primitives to read, write, insert and delete elements
– Record size need not be statically configured; it can be a dynamic value that adapts to the size and structure of the document at runtime
– The original tree is reconstructed by replacing proxies with the subtrees they reference
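The partition-and-reconstruct idea can be sketched in a few lines. This is only an illustration: the names `Node`, `Proxy`, `partition` and `reconstruct`, the `max_nodes` limit and the greedy "detach the largest child" rule are assumptions made for this sketch, not Natix's actual splitting algorithm or API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


@dataclass
class Proxy:
    record_id: int  # index of the record that holds the detached subtree


def size(node):
    """Number of nodes in a (possibly proxied) subtree; a proxy counts as one."""
    if isinstance(node, Proxy):
        return 1
    return 1 + sum(size(child) for child in node.children)


def partition(root, max_nodes, records):
    """Split the tree into records of at most max_nodes nodes where possible.

    Greedy rule (an assumption of this sketch): working bottom-up, while a
    subtree is too large, move its largest child into its own record and
    leave a proxy behind. Returns the id of the record holding the root.
    """
    def shrink(node):
        copy = Node(node.tag, [shrink(child) for child in node.children])
        while size(copy) > max_nodes:
            real = [i for i, c in enumerate(copy.children) if isinstance(c, Node)]
            if not real:  # only proxies left, cannot shrink further
                break
            i = max(real, key=lambda k: size(copy.children[k]))
            records.append(copy.children[i])
            copy.children[i] = Proxy(len(records) - 1)
        return copy

    records.append(shrink(root))
    return len(records) - 1


def reconstruct(records, record_id):
    """Rebuild the original tree by replacing each proxy with its subtree."""
    def expand(node):
        if isinstance(node, Proxy):
            return expand(records[node.record_id])
        return Node(node.tag, [expand(child) for child in node.children])

    return expand(records[record_id])


doc = Node("book", [
    Node("title"),
    Node("chapter", [
        Node("section", [Node("para"), Node("para")]),
        Node("section", [Node("para")]),
    ]),
    Node("chapter", [Node("section", [Node("para")])]),
])
records = []
root_id = partition(doc, max_nodes=4, records=records)
restored = reconstruct(records, root_id)
```

Here `max_nodes` stands in for the record size; in the strategy described above that limit can be chosen at runtime rather than fixed in advance.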
Document based strategy (Apache Xindice system)
• No mapping to relational required
• Stores documents in tokenized form
• Provides quick fragment retrieval
• Supports optimized XML querying
Document based strategy (Apache Xindice system), cont'd
• Basic unit of data is a Document
• Sets of Documents are Collections
• Collections may contain Collections
• Think of it as a file system for XML
• Collections may be indexed
• Collections may maintain XMLObjects
• XMLObjects are like Stored Procedures
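The "file system for XML" analogy can be illustrated with a toy model in which collections nest like directories and documents sit in them like files. The `Collection` class and its methods below are hypothetical names invented for this sketch; they are not the Xindice API, and the query uses the limited XPath subset of Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


class Collection:
    """Toy collection: holds documents by key and may contain sub-collections."""

    def __init__(self, name):
        self.name = name
        self.documents = {}    # key -> XML text
        self.collections = {}  # name -> Collection

    def create_collection(self, name):
        return self.collections.setdefault(name, Collection(name))

    def put(self, key, xml_text):
        ET.fromstring(xml_text)  # raises ParseError if not well-formed
        self.documents[key] = xml_text

    def get(self, path):
        """Fetch a document by a path such as 'books/b1'."""
        *dirs, key = path.strip("/").split("/")
        node = self
        for name in dirs:
            node = node.collections[name]
        return node.documents[key]

    def query(self, xpath):
        """Run one path expression over every document in this collection."""
        hits = []
        for key, text in self.documents.items():
            root = ET.fromstring(text)
            hits.extend((key, element.text) for element in root.findall(xpath))
        return hits


db = Collection("db")
books = db.create_collection("books")
books.put("b1", "<book><title>XML</title></book>")
books.put("b2", "<book><title>Shore</title></book>")
titles = books.query(".//title")
```

Documents go in and come out as whole units (`put`, `get`), while `query` treats the collection as a set.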
Element-based strategy (TIMBER)
• Built on Shore (responsible for disk management)
• Takes an XML document as input and produces a parse tree as output.
• Takes each node of this parse tree as it is produced and transforms it into an internal representation.
• Stores it in Shore as an atomic unit of storage.
• Each node corresponds to an element; sub-elements become child nodes.
• All attributes of an element node are grouped into a single node, stored as a child node of that element.
• The content of an element node is pulled out into a child node.
• Mixed content: each piece of text is pulled out into a separate child node.
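The node layout described above can be sketched as a function that turns a document into one record per node. The record fields (`id`, `parent`, `kind`, `name`, `value`) and the function name `to_records` are assumptions for illustration; TIMBER's real internal representation and its Shore storage calls are not shown.

```python
import xml.etree.ElementTree as ET


def to_records(xml_text):
    """Flatten a document into one record per node, in document order."""
    records = []

    def emit(kind, parent, name=None, value=None):
        records.append({"id": len(records), "parent": parent,
                        "kind": kind, "name": name, "value": value})
        return len(records) - 1

    def visit(elem, parent):
        me = emit("element", parent, name=elem.tag)
        if elem.attrib:
            # all attributes share a single child node of the element
            emit("attributes", me, value=dict(elem.attrib))
        if elem.text and elem.text.strip():
            emit("content", me, value=elem.text.strip())
        for child in elem:
            visit(child, me)
            # text after a sub-element (mixed content) gets its own node
            if child.tail and child.tail.strip():
                emit("content", me, value=child.tail.strip())

    visit(ET.fromstring(xml_text), None)
    return records


records = to_records(
    '<book year="2004" lang="en"><title>XML</title>intro<b>bold</b>outro</book>'
)
kinds = [r["kind"] for r in records]
```

Each dictionary stands for one atomic unit of storage; the `parent` field is what lets the tree be rebuilt from the flat records.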
Schema-guided strategy (OrientStore)
• EBC (Element-Based Clustering): similar to the element-based strategy, but clusters the element records so that records with the same schemaNodeID are stored together.
• LPC (Logical Partition-Based Clustering): partitions the schema graph into semantic blocks.
• A semantic block describes a relatively integrated logical unit.
EBC (Element-Based Clustering)
Clusters all the title elements together, and all their text values together.
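A minimal sketch of this clustering, assuming each element record carries a `schemaNodeID` and that pages hold a fixed number of records. The numeric ids, the `page_capacity` parameter and the function name `ebc_pages` are illustrative assumptions, not OrientStore's actual layout.

```python
from collections import defaultdict


def ebc_pages(records, page_capacity):
    """Group records by schemaNodeID, then cut each group into pages."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record["schemaNodeID"]].append(record)

    pages = []
    for schema_id in sorted(clusters):
        group = clusters[schema_id]
        for start in range(0, len(group), page_capacity):
            pages.append(group[start:start + page_capacity])
    return pages


# Records in document order; ids 1, 2, 3 are made-up schema node ids.
records = [
    {"schemaNodeID": 1, "node": "book", "value": None},
    {"schemaNodeID": 2, "node": "title", "value": None},
    {"schemaNodeID": 3, "node": "#text", "value": "t1"},
    {"schemaNodeID": 1, "node": "book", "value": None},
    {"schemaNodeID": 2, "node": "title", "value": None},
    {"schemaNodeID": 3, "node": "#text", "value": "t2"},
]
pages = ebc_pages(records, page_capacity=4)
```

With this layout a query that touches only title values reads the pages of one cluster, at the cost of spreading a single book across several pages.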
LPC (Logical Partition-Based Clustering)
• Book and its children title and publisher form a semantic block.
• Records are instances of the formed semantic blocks: v (n, b1, b2) is an instance of vendor (name, book).
Logical Partition-Based Clustering
• All the instances of the same semantic block are clustered together. Thus the records b1 (p1, t1) and b2 (p2, t2) in Figure 2(b) will be stored in one physical page,
• while v (n, b1, b2) may be stored in another physical page.
N.B.: LPC lies between the subtree-based strategy and the element-based strategy.
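The vendor/book example can be sketched as follows, assuming the schema graph has already been partitioned so that `vendor` and `book` are the roots of the two semantic blocks. The `BLOCK_ROOTS` set, the generated record ids and the function name `lpc_cluster` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed result of partitioning the schema graph: roots of semantic blocks.
BLOCK_ROOTS = {"vendor", "book"}


def lpc_cluster(xml_text):
    """Build one record per semantic-block instance, clustered by block."""
    clusters = defaultdict(list)  # block name -> records of that block
    counters = defaultdict(int)

    def build(elem):
        counters[elem.tag] += 1
        record_id = f"{elem.tag}{counters[elem.tag]}"
        fields = []

        def collect(node):
            for child in node:
                if child.tag in BLOCK_ROOTS:
                    # a nested block becomes its own record; keep a reference
                    fields.append(("ref", build(child)))
                else:
                    fields.append((child.tag, (child.text or "").strip()))
                    collect(child)

        collect(elem)
        clusters[elem.tag].append((record_id, fields))
        return record_id

    build(ET.fromstring(xml_text))
    return dict(clusters)


clusters = lpc_cluster(
    "<vendor><name>n</name>"
    "<book><publisher>p1</publisher><title>t1</title></book>"
    "<book><publisher>p2</publisher><title>t2</title></book>"
    "</vendor>"
)
```

Each record is larger than a single element but smaller than an arbitrary subtree: the two book records land in the `book` cluster, and the vendor record holds references to them rather than their content.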
Comparison with DBMS