Transcript Khan - EECS

A Performance Evaluation of
Alternative Mapping Schemes
for Storing XML Data in a
Relational Database
By
Daniela Florescu
Donald Kossmann
Presented by:
Intakhab Mehboob Khan
Table of Contents
• Introduction
• Approaches to Store Semi-Structured Data
• Data Model for Semi-Structured Data
• Query Language and XML-QL
• Storing XML Data in Relational Database
– Mapping Attributes
– Mapping Values
• Evaluating the Mapping Schemes
• Conclusion
Introduction
• August 3, 1999
• How XML data can be stored and queried
• Presents alternative mapping schemes for storing XML data
• Performance experiments that analyze the
tradeoffs of the schemes
Approaches to Store Semi-Structured Data
• Special Purpose Database System
– Examples are Lore, Rufus and Strudel
– Store and retrieve XML data using specially designed structures and indices
• Object-Oriented Database System
– Examples are O2 and ObjectStore
– The rich data modeling capabilities of OODBMSs are exploited
• Standard Relational Database System
– Data is mapped into tables of a relational schema
Data Model for Semi-Structured Data
• Characteristics of Semi-Structured Data
– The schema is not given in advance and may be implicit in the data
– The schema is relatively large and may change frequently
– The schema is descriptive rather than prescriptive
– Data is not strongly typed
• A simple labeled-graph data model similar to OEM (Object Exchange Model)
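To make the graph model concrete, here is a minimal Python sketch, assuming a simplified OEM-like view: every element with children or XML attributes becomes a node with an integer oid, and every subelement or XML attribute becomes an ordered, labeled edge (source, ordinal, label, target) whose target is either another oid or a string value. The function `to_graph` and the sample document are hypothetical; mixed content and the root element's own tag are ignored.

```python
import xml.etree.ElementTree as ET

def to_graph(xml_text):
    """Map an XML string to labeled edges (source, ordinal, label, target).

    Elements with children or XML attributes become nodes with integer oids;
    text-only elements and XML attributes become edges to string values.
    """
    root = ET.fromstring(xml_text)
    edges = []
    next_oid = [0]  # mutable counter shared with the nested helpers

    def new_oid():
        next_oid[0] += 1
        return next_oid[0]

    def visit(elem, oid):
        ordinal = 0
        for name, value in elem.attrib.items():
            ordinal += 1
            edges.append((oid, ordinal, name, value))
        for child in elem:
            ordinal += 1
            if len(child) == 0 and not child.attrib:
                # Text-only element: the edge points directly at a value.
                edges.append((oid, ordinal, child.tag, (child.text or "").strip()))
            else:
                # Complex element: allocate a node and recurse into it.
                child_oid = new_oid()
                edges.append((oid, ordinal, child.tag, child_oid))
                visit(child, child_oid)

    root_oid = new_oid()
    visit(root, root_oid)
    return root_oid, edges

SAMPLE = ('<doc><person id="p1"><name>Ann</name><age>30</age></person>'
          '<person id="p2"><name>Bob</name></person></doc>')
ROOT, EDGES = to_graph(SAMPLE)
```

The ordinal keeps document order among siblings, which an unordered graph would otherwise lose.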
Query Language and XML-QL
• All query languages for semi-structured data are based on a labeled-graph data model
• Features of semi-structured query languages
– Regular path expressions
– Ability to query the schema
• In addition, XML-QL provides a restructuring mechanism
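A regular path expression selects nodes by matching a pattern against the labels along a path. The sketch below evaluates a deliberately restricted, hypothetical notation (not XML-QL syntax): each step is a label, `_` for any single label, or `label*` for zero or more edges with that label. It assumes the edge tuples (source, ordinal, label, target) of the graph model above.

```python
from collections import defaultdict

def eval_path(edges, start, steps):
    """Evaluate a restricted regular path expression starting at node `start`.

    Each step is a label, "_" (any single label), or "label*" (zero or more
    edges carrying that label). Returns the set of reached oids or values.
    """
    out = defaultdict(list)
    for source, _ordinal, label, target in edges:
        out[source].append((label, target))

    current = {start}
    for step in steps:
        if step.endswith("*"):
            # Kleene star: breadth-first closure over edges with this label.
            label = step[:-1]
            reached = set(current)
            frontier = set(current)
            while frontier:
                nxt = {t for n in frontier for (l, t) in out.get(n, []) if l == label}
                frontier = nxt - reached
                reached |= frontier
            current = reached
        else:
            current = {t for n in current for (l, t) in out.get(n, []) if step in ("_", l)}
    return current

# Hypothetical nested document: sections containing sections and titles.
EDGES = [
    (1, 1, "section", 2),
    (1, 2, "title", "Top"),
    (2, 1, "section", 3),
    (2, 2, "title", "Mid"),
    (3, 1, "title", "Deep"),
]
```

For example, `eval_path(EDGES, 1, ["section*", "title"])` collects the titles at every nesting depth, something a fixed-length path cannot express.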
Storing XML Data in Relational Database [Mapping Attributes]
• Edge Approach
– Store all attributes in a single table
– Edge(source, ordinal, name, flag, target)
– Indexes support forward and backward traversals
– Variant of the Edge approach: store attribute names in a separate table
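A minimal sketch of the Edge table, using SQLite as a stand-in for the commercial RDBMS. We assume `flag` holds either `"ref"` (target is an oid) or a value type such as `"string"`, and we assume one index on `source` for forward traversal and one on `(name, target)` for backward traversal; the index names, sample data, and helper functions are hypothetical.

```python
import sqlite3

# Hypothetical sample graph: (source, ordinal, name, flag, target).
# flag "ref" means target is another node's oid; "string" means a value.
EDGES = [
    (1, 1, "person", "ref", 2),
    (2, 1, "name", "string", "Ann"),
    (2, 2, "age", "string", "30"),
    (1, 2, "person", "ref", 3),
    (3, 1, "name", "string", "Bob"),
]

def build_edge_db(edges):
    conn = sqlite3.connect(":memory:")
    # target has no declared type, so oids stay integers and values stay text.
    conn.execute("CREATE TABLE Edge(source INTEGER, ordinal INTEGER, "
                 "name TEXT, flag TEXT, target)")
    conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", edges)
    # Forward traversal: from a node to its outgoing edges.
    conn.execute("CREATE INDEX edge_source ON Edge(source)")
    # Backward traversal: from a (label, value) pair back to the owning node.
    conn.execute("CREATE INDEX edge_name_target ON Edge(name, target)")
    return conn

def children(conn, oid):
    """Forward traversal: outgoing edges of one node, in document order."""
    return conn.execute(
        "SELECT name, flag, target FROM Edge WHERE source = ? ORDER BY ordinal",
        (oid,),
    ).fetchall()

def persons_named(conn, value):
    """Backward traversal: oids of person nodes whose name edge equals value."""
    rows = conn.execute(
        """SELECT p.target FROM Edge p JOIN Edge n ON n.source = p.target
           WHERE p.name = 'person' AND n.name = 'name' AND n.target = ?""",
        (value,),
    ).fetchall()
    return [r[0] for r in rows]

CONN = build_edge_db(EDGES)
```

Every path step becomes a self-join on the one Edge table, which keeps the schema fixed no matter how irregular the document is.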
Storing XML Data in Relational Database [Mapping Attributes]
• Attribute Approach
– All attributes with the same name are stored in one table
– Resembles the binary storage scheme proposed for storing semi-structured data
– Aname(source, ordinal, flag, target)
– Indexing
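The Attribute approach horizontally partitions the Edge table by name, so the `name` column disappears and each table (Aperson, Aname, ...) holds only one label. A minimal SQLite sketch, assuming the same hypothetical edge tuples as the Edge sketch and assuming indexes on `source` and `target` of every table; attribute names are interpolated into quoted identifiers, which presumes they are valid XML names.

```python
import sqlite3

# Hypothetical sample graph: (source, ordinal, name, flag, target).
EDGES = [
    (1, 1, "person", "ref", 2),
    (2, 1, "name", "string", "Ann"),
    (2, 2, "age", "string", "30"),
    (1, 2, "person", "ref", 3),
    (3, 1, "name", "string", "Bob"),
]

def build_attribute_db(edges):
    conn = sqlite3.connect(":memory:")
    for name in sorted({e[2] for e in edges}):
        # One table per attribute name, e.g. "Aname"; no name column needed.
        conn.execute(f'CREATE TABLE "A{name}"(source INTEGER, ordinal INTEGER, '
                     "flag TEXT, target)")
        conn.execute(f'CREATE INDEX "A{name}_source" ON "A{name}"(source)')
        conn.execute(f'CREATE INDEX "A{name}_target" ON "A{name}"(target)')
    for source, ordinal, name, flag, target in edges:
        conn.execute(f'INSERT INTO "A{name}" VALUES (?, ?, ?, ?)',
                     (source, ordinal, flag, target))
    return conn

CONN = build_attribute_db(EDGES)
TABLES = [r[0] for r in CONN.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Path person.name becomes a join of two small tables instead of a self-join.
NAMES = [r[0] for r in CONN.execute(
    'SELECT n.target FROM "Aperson" p JOIN "Aname" n ON n.source = p.target '
    "ORDER BY p.ordinal")]
```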
Storing XML Data in Relational Database [Mapping Attributes]
• Universal Table
– A single universal table stores all attributes of an XML document
– Universal(source, ordinal_n1, flag_n1, target_n1, …)
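One way to read the Universal table is as the full outer join of all Attribute tables on `source`: one column triple per attribute name, filled with NULLs where a node lacks that name. The Python sketch below builds those rows directly (not via SQL) under that assumption, with hypothetical data. It shows the cost: a node with two `phone` values yields two rows that repeat its `name`.

```python
from itertools import product

# Hypothetical edges (source, ordinal, name, flag, target); node 2 has two phones.
EDGES = [
    (1, 1, "person", "ref", 2),
    (1, 2, "person", "ref", 3),
    (2, 1, "name", "string", "Ann"),
    (2, 2, "phone", "string", "111"),
    (2, 3, "phone", "string", "222"),
    (3, 1, "name", "string", "Bob"),
]

def universal_rows(edges):
    """Return (names, rows); each row is (source, ordinal_n1, flag_n1, target_n1, ...)."""
    names = sorted({e[2] for e in edges})
    by_source = {}
    for source, ordinal, name, flag, target in edges:
        by_source.setdefault(source, {}).setdefault(name, []).append((ordinal, flag, target))
    rows = []
    for source in sorted(by_source):
        # A missing attribute contributes one all-NULL triple (outer join).
        per_name = [by_source[source].get(n, [(None, None, None)]) for n in names]
        # Multi-valued attributes multiply out, duplicating the other columns.
        for combo in product(*per_name):
            row = [source]
            for triple in combo:
                row.extend(triple)
            rows.append(tuple(row))
    return names, rows

NAMES, ROWS = universal_rows(EDGES)
```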
Storing XML Data in Relational Database [Mapping Attributes]
• Normalized Universal Table
– Multi-valued attributes are stored in separate overflow tables
– UnivNorm(source, ordinal_n1, flag_n1, target_n1, …)
– Overflow(source, ordinal, flag, target), …
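Normalizing removes the row explosion: each node gets exactly one UnivNorm row, and attributes with several values for that node move to a per-name overflow table. In the sketch below we assume the UnivNorm row marks such an attribute with a flag value `"m"`; that marker, the function name, and the data are hypothetical choices for illustration.

```python
# Hypothetical edges (source, ordinal, name, flag, target); node 2 has two phones.
EDGES = [
    (1, 1, "person", "ref", 2),
    (1, 2, "person", "ref", 3),
    (2, 1, "name", "string", "Ann"),
    (2, 2, "phone", "string", "111"),
    (2, 3, "phone", "string", "222"),
    (3, 1, "name", "string", "Bob"),
]

def normalize(edges):
    """Split edges into one UnivNorm row per source plus per-name overflow lists."""
    names = sorted({e[2] for e in edges})
    grouped = {}
    for source, ordinal, name, flag, target in edges:
        grouped.setdefault(source, {}).setdefault(name, []).append((ordinal, flag, target))
    univ_norm, overflow = [], {n: [] for n in names}
    for source in sorted(grouped):
        row = [source]
        for n in names:
            values = grouped[source].get(n, [])
            if len(values) == 1:
                row.extend(values[0])            # single value: kept inline
            elif values:
                row.extend((None, "m", None))    # hypothetical multi-valued marker
                overflow[n].extend((source, *v) for v in values)
            else:
                row.extend((None, None, None))   # attribute absent for this node
        univ_norm.append(tuple(row))
    return names, univ_norm, overflow

NAMES, UNIV_NORM, OVERFLOW = normalize(EDGES)
```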
Storing XML Data in Relational Database [Mapping Values]
• Storing values in separate tables
– One value table per data type: all integers in one table, all dates in another, all strings in a third
• Vtype(vid, value)
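With separate value tables, an attribute row stores only a value id in `target`, and `flag` records which Vtype table to join with. A minimal SQLite sketch, assuming tables named Vstring, Vint, and Vdate, globally unique vids, dates stored as ISO strings, and the ordinal column omitted for brevity; all names and data are hypothetical.

```python
import sqlite3
from datetime import date

# Hypothetical leaf values: (source oid, attribute name, value).
LEAVES = [
    (2, "name", "Ann"),
    (2, "age", 30),
    (2, "born", date(1969, 5, 1)),
    (3, "name", "Bob"),
]

# Python type -> (flag stored in Edge, value table holding that type).
TYPE_TABLES = {str: ("string", "Vstring"), int: ("integer", "Vint"),
               date: ("date", "Vdate")}

def build(leaves):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Edge(source INTEGER, name TEXT, flag TEXT, target INTEGER)")
    for _flag, table in TYPE_TABLES.values():
        conn.execute(f"CREATE TABLE {table}(vid INTEGER PRIMARY KEY, value)")
    for vid, (source, name, value) in enumerate(leaves, start=1):
        flag, table = TYPE_TABLES[type(value)]
        stored = value.isoformat() if isinstance(value, date) else value
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (vid, stored))
        # Edge.target holds the vid; the flag says which value table to join.
        conn.execute("INSERT INTO Edge VALUES (?, ?, ?, ?)", (source, name, flag, vid))
    return conn

CONN = build(LEAVES)
NAMES = CONN.execute(
    """SELECT e.source, v.value FROM Edge e JOIN Vstring v ON v.vid = e.target
       WHERE e.name = 'name' AND e.flag = 'string' ORDER BY e.source""").fetchall()
```

Reading any value therefore costs one extra join, which is the overhead that inlining removes.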
Storing XML Data in Relational Database [Mapping Values]
• Storing values together with attributes
– One column for each data type: inlining
– No flag is needed
– Indexes are built on every value column separately, in addition to source and target
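Inlining replaces the flag with a choice of column: each row fills exactly one of `v_int`, `v_string`, or `target` (for references), leaving the others NULL. The SQLite sketch below combines inlining with Attribute tables and indexes each value column separately. The column names, the input type tags (`"int"`, `"string"`, `"ref"`), and the data are hypothetical; the tags exist only in the input, not in the stored tables.

```python
import sqlite3

# Hypothetical edges (source, ordinal, name, kind, value).
EDGES = [
    (1, 1, "person", "ref", 2),
    (2, 1, "name", "string", "Ann"),
    (2, 2, "age", "int", 30),
    (1, 2, "person", "ref", 3),
    (3, 1, "name", "string", "Bob"),
    (3, 2, "age", "int", 22),
]

# The input type tag decides which column receives the value.
COLUMN_FOR = {"int": "v_int", "string": "v_string", "ref": "target"}

def build_inlined(edges):
    conn = sqlite3.connect(":memory:")
    for name in sorted({e[2] for e in edges}):
        conn.execute(f'CREATE TABLE "A{name}"(source INTEGER, ordinal INTEGER, '
                     "v_int INTEGER, v_string TEXT, target INTEGER)")
        for col in ("source", "target", "v_int", "v_string"):
            conn.execute(f'CREATE INDEX "A{name}_{col}" ON "A{name}"({col})')
    for source, ordinal, name, kind, value in edges:
        col = COLUMN_FOR[kind]
        # Columns not listed default to NULL, so no flag column is stored.
        conn.execute(f'INSERT INTO "A{name}"(source, ordinal, {col}) VALUES (?, ?, ?)',
                     (source, ordinal, value))
    return conn

CONN = build_inlined(EDGES)
# Names of persons older than 25: typed comparison, no extra value-table join.
OLDER_THAN_25 = CONN.execute("""
    SELECT n.v_string FROM Aperson p
    JOIN Aage a ON a.source = p.target
    JOIN Aname n ON n.source = p.target
    WHERE p.source = 1 AND a.v_int > 25""").fetchall()
```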
Evaluating the Mapping Schemes
• Plan of Attack
– The size of the relational database for each mapping scheme
– The time to bulkload the relational database
given an XML document
– The time to reconstruct the XML document
from the relational data
– The time to execute different classes of XML
queries
– The time to execute different kinds of update
functions
Evaluating the Mapping Schemes
• Experimental Platform
– Commercial relational database system, installed on a Sun SPARCstation 20 with
• Two 75 MHz processors
• 128 MB of main memory and a disk that stores the database and intermediate results of query processing
– The machine runs Solaris 2.6; the database's main-memory buffer is limited to 6.4 MB
– Calls to the relational database from the Java programs are implemented with JDBC
Evaluating the Mapping Schemes
• Benchmark Specification
– Benchmark Database
– Benchmark Queries
– Update Functions
• Experimental Results
– Database Size
– Bulkloading Times
– Reconstructing the XML Document
– Running Times of the Queries
– Running Times of the Update Functions
Conclusion
• Relational databases have the following advantages
– Mature and scale very well
– Traditional and semi-structured data can coexist in a relational database
– RDBMSs are capable of executing complex XML queries on large databases
• Disadvantages
– Very expensive to reconstruct the original XML data from the relational database
– Components such as authorization and concurrency control need to be implemented outside the RDBMS
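One plausible reason reconstruction is expensive: with a naive strategy, rebuilding the document walks the graph and issues one query per element node, then reassembles the tree in the application. The sketch below does this against an Edge table in SQLite. It assumes `flag = "ref"` marks references, that the root's tag (`doc` here) is supplied by the caller because the Edge table does not store it, and it counts queries in a hypothetical `stats` dictionary.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample graph: (source, ordinal, name, flag, target).
EDGES = [
    (1, 1, "person", "ref", 2),
    (2, 1, "name", "string", "Ann"),
    (1, 2, "person", "ref", 3),
    (3, 1, "name", "string", "Bob"),
]

def reconstruct(conn, oid, tag, stats):
    """Rebuild the subtree rooted at `oid`, issuing one SELECT per element node."""
    stats["queries"] += 1
    elem = ET.Element(tag)
    rows = conn.execute(
        "SELECT name, flag, target FROM Edge WHERE source = ? ORDER BY ordinal", (oid,)
    ).fetchall()
    for name, flag, target in rows:
        if flag == "ref":
            elem.append(reconstruct(conn, target, name, stats))  # recurse into node
        else:
            ET.SubElement(elem, name).text = str(target)        # leaf value
    return elem

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edge(source INTEGER, ordinal INTEGER, name TEXT, flag TEXT, target)")
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", EDGES)
stats = {"queries": 0}
XML = ET.tostring(reconstruct(conn, 1, "doc", stats), encoding="unicode")
QUERY_COUNT = stats["queries"]
```

Batching (for example, one join per nesting level) would reduce the number of round trips, but the relational engine still has to reassemble what the mapping split apart.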
Conclusion (Cont’d)
• The results for the alternative mapping schemes show:
– Creating an attribute table for every attribute name that occurs in an XML document, with values inlined into these attribute tables, is the best approach