Storage-IITB

XML Storage
Maya Ramanath
Database Systems Lab
SERC, IISc
How should we store XML?
File system
Text files - represented differently during processing
Text files - with additional indexes
Database systems
Relational
Object-oriented, etc.
File System
Advantages
Easy to store
Document granularity is maintained
Disadvantages
Inefficient query processing
Database Systems
Advantages
Well understood technology
Optimized for query processing
Disadvantages
XML does not directly map to any data model
Not amenable to a rigid schema
Example
<STUDENT>
  <NAME> Maya </NAME>
  <ADDRESS>
    <INSTITUTE> IISc </INSTITUTE>
    <CITY> Bangalore </CITY>
  </ADDRESS>
</STUDENT>
<STUDENT>
  <NAME> Charuta </NAME>
  <ADDRESS> IIT, Mumbai </ADDRESS>
</STUDENT>
Example (contd.)
STUDENT
  NAME
    “Maya”
  ADDRESS
    INSTITUTE
      “IISc”
    CITY
      “Bangalore”
STUDENT
  NAME
    “Charuta”
  ADDRESS
    “IIT, Mumbai”
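The tree view above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library parser; the <STUDENTS> wrapper element and the helper name tree_lines are assumptions added here so that the two fragments form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The two STUDENT fragments from the example, wrapped in a hypothetical
# <STUDENTS> root so that the text is a single well-formed document.
DOC = """<STUDENTS>
<STUDENT>
  <NAME> Maya </NAME>
  <ADDRESS>
    <INSTITUTE> IISc </INSTITUTE>
    <CITY> Bangalore </CITY>
  </ADDRESS>
</STUDENT>
<STUDENT>
  <NAME> Charuta </NAME>
  <ADDRESS> IIT, Mumbai </ADDRESS>
</STUDENT>
</STUDENTS>"""


def tree_lines(elem, depth=0):
    """Return the element tree as indented lines, one node per line."""
    lines = ["  " * depth + elem.tag]
    text = (elem.text or "").strip()
    if text:
        # Non-empty character data becomes a quoted leaf under its element.
        lines.append("  " * (depth + 1) + f'"{text}"')
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
for student in root:
    print("\n".join(tree_lines(student)))
```

The two trees differ in shape under ADDRESS, which is exactly the irregularity a fixed relational schema has to cope with.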
XML to Relational
1. STUDENT1 (KEY, NAME, INSTITUTE, CITY)
STUDENT2 (KEY, NAME, ADDRESS)
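As an illustration of mapping 1, the sketch below shreds the example into the two tables using SQLite. It is not taken from any particular system: the <STUDENTS> wrapper, the use of the document position as KEY, and the rule "structured address goes to STUDENT1, flat address goes to STUDENT2" are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Example data, wrapped in a hypothetical <STUDENTS> root element.
DOC = """<STUDENTS>
<STUDENT><NAME>Maya</NAME>
  <ADDRESS><INSTITUTE>IISc</INSTITUTE><CITY>Bangalore</CITY></ADDRESS></STUDENT>
<STUDENT><NAME>Charuta</NAME><ADDRESS>IIT, Mumbai</ADDRESS></STUDENT>
</STUDENTS>"""

conn = sqlite3.connect(":memory:")
# KEY is quoted because it is also an SQL keyword.
conn.execute('CREATE TABLE STUDENT1 ("KEY" INTEGER PRIMARY KEY, NAME TEXT, INSTITUTE TEXT, CITY TEXT)')
conn.execute('CREATE TABLE STUDENT2 ("KEY" INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT)')

for key, student in enumerate(ET.fromstring(DOC).iter("STUDENT"), start=1):
    name = student.findtext("NAME").strip()
    address = student.find("ADDRESS")
    if len(address) > 0:
        # Structured address: ADDRESS has INSTITUTE and CITY children.
        conn.execute("INSERT INTO STUDENT1 VALUES (?, ?, ?, ?)",
                     (key, name,
                      address.findtext("INSTITUTE").strip(),
                      address.findtext("CITY").strip()))
    else:
        # Flat address: ADDRESS holds only text.
        conn.execute("INSERT INTO STUDENT2 VALUES (?, ?, ?)",
                     (key, name, address.text.strip()))

student1_rows = conn.execute("SELECT * FROM STUDENT1").fetchall()
student2_rows = conn.execute("SELECT * FROM STUDENT2").fetchall()
print(student1_rows)
print(student2_rows)
```

With this mapping, a query over all students has to consult both tables.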
XML to Relational (contd.)
2. STUDENT (KEY, NAME)
ADDRESS1 (KEY, INSTITUTE, CITY)
ADDRESS2 (KEY, ADDRESS)
3. STUDENT (KEY, NAME, INSTITUTE, CITY, ADDRESS)
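Mappings 2 and 3 can be compared side by side in SQLite. The sketch below is only illustrative: the students list stands in for values already extracted from the example, and the table name STUDENT_WIDE is a hypothetical name chosen so that mapping 3 can coexist with mapping 2's STUDENT table in one database.

```python
import sqlite3

# Values already extracted from the two example STUDENT elements (None = absent).
students = [
    {"key": 1, "name": "Maya", "institute": "IISc", "city": "Bangalore", "address": None},
    {"key": 2, "name": "Charuta", "institute": None, "city": None, "address": "IIT, Mumbai"},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Mapping 2: names and the two address shapes live in separate tables.
CREATE TABLE STUDENT  ("KEY" INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ADDRESS1 ("KEY" INTEGER, INSTITUTE TEXT, CITY TEXT);
CREATE TABLE ADDRESS2 ("KEY" INTEGER, ADDRESS TEXT);
-- Mapping 3: a single wide table; fields that are absent become NULL.
CREATE TABLE STUDENT_WIDE ("KEY" INTEGER PRIMARY KEY, NAME TEXT,
                           INSTITUTE TEXT, CITY TEXT, ADDRESS TEXT);
""")

for s in students:
    conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (s["key"], s["name"]))
    if s["address"] is None:
        conn.execute("INSERT INTO ADDRESS1 VALUES (?, ?, ?)",
                     (s["key"], s["institute"], s["city"]))
    else:
        conn.execute("INSERT INTO ADDRESS2 VALUES (?, ?)", (s["key"], s["address"]))
    conn.execute("INSERT INTO STUDENT_WIDE VALUES (?, ?, ?, ?, ?)",
                 (s["key"], s["name"], s["institute"], s["city"], s["address"]))

# Mapping 2 needs a join to pair a name with a city.
joined = conn.execute(
    'SELECT S.NAME, A.CITY FROM STUDENT S JOIN ADDRESS1 A ON S."KEY" = A."KEY"'
).fetchall()
# Mapping 3 needs no join, but rows carry NULLs for the shape they do not use.
rows_with_nulls = conn.execute(
    "SELECT COUNT(*) FROM STUDENT_WIDE WHERE INSTITUTE IS NULL OR ADDRESS IS NULL"
).fetchone()[0]
print(joined, rows_with_nulls)
```

The trade-off is visible even on two rows: mapping 2 fragments the data across tables, while mapping 3 spends space on NULLs.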
STORED (Deutsch et al.)
A query language is used to define mappings
Q = FROM STUDENT : $X
    { NAME : $N,
      ADDRESS : {INSTITUTE : $I, CITY : $C} }
    STORE STUDENT($X, $N, $I, $C)
STORED (contd.)
Mappings are generated by data-mining algorithms
Non-conforming data is stored in overflow graphs
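The effect of the mapping Q can be imitated in Python. This is a rough sketch, not the STORED implementation: match_student is a hypothetical helper that hand-codes the pattern in Q, the <STUDENTS> wrapper is added for well-formedness, and the overflow list of serialized elements is only a stand-in for an overflow graph. Generating the mapping by data mining is not shown.

```python
import xml.etree.ElementTree as ET

# Example data, wrapped in a hypothetical <STUDENTS> root element.
DOC = """<STUDENTS>
<STUDENT><NAME>Maya</NAME>
  <ADDRESS><INSTITUTE>IISc</INSTITUTE><CITY>Bangalore</CITY></ADDRESS></STUDENT>
<STUDENT><NAME>Charuta</NAME><ADDRESS>IIT, Mumbai</ADDRESS></STUDENT>
</STUDENTS>"""


def match_student(student):
    """Return bindings ($N, $I, $C) if the element fits the pattern in Q, else None."""
    name = student.find("NAME")
    institute = student.find("ADDRESS/INSTITUTE")
    city = student.find("ADDRESS/CITY")
    if name is None or institute is None or city is None:
        return None
    return name.text.strip(), institute.text.strip(), city.text.strip()


student_table = []  # rows of STUDENT($X, $N, $I, $C)
overflow = []       # non-conforming elements, kept here as serialized XML

for x, student in enumerate(ET.fromstring(DOC).iter("STUDENT"), start=1):
    bindings = match_student(student)
    if bindings is not None:
        student_table.append((x, *bindings))
    else:
        overflow.append((x, ET.tostring(student, encoding="unicode").strip()))

print(student_table)
print(overflow)
```

Only the first student matches the pattern; the second has no INSTITUTE or CITY under ADDRESS and ends up in the overflow store.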
XML to OO (Christophides et al.)
An OO schema is derived based on the DTD
<!ELEMENT STUDENT (NAME, ADDRESS)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (INSTITUTE, CITY)>
<!ELEMENT INSTITUTE (#PCDATA)>
<!ELEMENT CITY (#PCDATA)>
XML to OO (contd.)
STUDENT
  NAME
    TEXT
  ADDRESS
    INSTITUTE
      TEXT
    CITY
      TEXT
class STUDENT public type tuple
  ( name: NAME,
    address: ADDRESS )
class ADDRESS public type tuple
  ( institute: INSTITUTE,
    city: CITY )
class NAME inherit Text
class INSTITUTE inherit Text
class CITY inherit Text
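For readers more used to a general-purpose language, the derived schema can be mirrored with Python classes. This is an analogy only, under the assumption that tuple types correspond to dataclasses and that Text behaves like a string; load_student is a hypothetical loader, not part of the original system.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


class Text(str):
    """Stand-in for the OO system's built-in Text class."""


class NAME(Text):
    pass


class INSTITUTE(Text):
    pass


class CITY(Text):
    pass


@dataclass
class ADDRESS:
    institute: INSTITUTE
    city: CITY


@dataclass
class STUDENT:
    name: NAME
    address: ADDRESS


def load_student(xml_text):
    """Build a STUDENT object from an element that conforms to the DTD."""
    elem = ET.fromstring(xml_text)
    return STUDENT(
        name=NAME(elem.findtext("NAME").strip()),
        address=ADDRESS(
            institute=INSTITUTE(elem.findtext("ADDRESS/INSTITUTE").strip()),
            city=CITY(elem.findtext("ADDRESS/CITY").strip()),
        ),
    )


maya = load_student(
    "<STUDENT><NAME> Maya </NAME><ADDRESS><INSTITUTE> IISc </INSTITUTE>"
    "<CITY> Bangalore </CITY></ADDRESS></STUDENT>"
)
print(maya)
```

An element whose ADDRESS holds plain text, like the second student in the earlier example, does not conform to this DTD and would not fit these classes.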
Natix (U. Mannheim)
A basic record manager is used
Each node or set of nodes is stored in a record
Splitting strategies on the tree can be employed to optimize query processing
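To make the idea of splitting a tree into records concrete, here is a toy sketch. It is not the Natix algorithm: the capacity measured in nodes, the "spill the largest child subtree first" rule, and the proxy:N placeholder labels are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def size(fragment):
    """Number of nodes in a nested (label, [sub-fragments]) fragment."""
    _label, subs = fragment
    return 1 + sum(size(s) for s in subs)


def pack(node, capacity, records):
    """Return the fragment for `node`, spilling child subtrees into `records`
    (each replaced by a one-node proxy) until the fragment fits in `capacity`."""
    subs = [pack(child, capacity, records) for child in node.children]
    fragment = (node.label, subs)
    while size(fragment) > capacity:
        # Only subtrees larger than one node are worth replacing by a proxy.
        candidates = [s for s in subs if size(s) > 1]
        if not candidates:
            break  # cannot shrink further; the record stays oversized
        biggest = max(candidates, key=size)
        records.append(biggest)
        subs[subs.index(biggest)] = (f"proxy:{len(records) - 1}", [])
    return fragment


student = Node("STUDENT", [
    Node("NAME", [Node("Maya")]),
    Node("ADDRESS", [
        Node("INSTITUTE", [Node("IISc")]),
        Node("CITY", [Node("Bangalore")]),
    ]),
])

records = []
root_fragment = pack(student, capacity=4, records=records)
records.append(root_fragment)  # the root fragment becomes the last record
for number, record in enumerate(records):
    print(number, record)
```

Which subtrees share a record determines how many records a query touches, which is why the choice of splitting strategy matters for query processing.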
Natix (contd.)
[Figure: the STUDENT tree (NAME “Maya”; ADDRESS with INSTITUTE “IISc” and CITY “Bangalore”) and its division into records]
Summary of issues involved
Determining the ‘best’ mapping
Space occupied
Data fragmentation
Support for ‘overflow’ data
Lossless…?
Translation of XML queries to DB queries
Reconstruction of XML documents from the DB
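The last two issues can be illustrated on the single-table mapping STUDENT (KEY, NAME, INSTITUTE, CITY, ADDRESS). The sketch below is a hand-written example under stated assumptions: the path query /STUDENT[ADDRESS/CITY = "Bangalore"]/NAME is a hypothetical query chosen for illustration, its SQL translation is done by hand rather than by a translator, and rebuild is a hypothetical helper.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE STUDENT ("KEY" INTEGER PRIMARY KEY, NAME TEXT, '
             "INSTITUTE TEXT, CITY TEXT, ADDRESS TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", [
    (1, "Maya", "IISc", "Bangalore", None),
    (2, "Charuta", None, None, "IIT, Mumbai"),
])

# Hand translation of the path query /STUDENT[ADDRESS/CITY = "Bangalore"]/NAME.
names = [row[0] for row in
         conn.execute("SELECT NAME FROM STUDENT WHERE CITY = ?", ("Bangalore",))]


def rebuild(row):
    """Rebuild one STUDENT element from a row of the table."""
    _key, name, institute, city, address = row
    student = ET.Element("STUDENT")
    ET.SubElement(student, "NAME").text = name
    addr = ET.SubElement(student, "ADDRESS")
    if address is not None:
        addr.text = address  # flat address stored as text
    else:
        ET.SubElement(addr, "INSTITUTE").text = institute
        ET.SubElement(addr, "CITY").text = city
    return ET.tostring(student, encoding="unicode")


documents = [rebuild(row) for row in
             conn.execute('SELECT * FROM STUDENT ORDER BY "KEY"')]
print(names)
print(documents)
```

The rebuilt elements have the same structure as the originals, but details such as the whitespace around the text values are gone, which is one reason the question of losslessness is not trivial.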