Storage-IITB
XML Storage
Maya Ramanath
Database Systems Lab
SERC, IISc
How should we store XML?
File system
Text files, converted to a different representation during processing
Text files with additional indexes
Database systems
Relational
Object-oriented, etc.
File System
Advantages
Easy to store
Document granularity is maintained
Disadvantages
Query processing is inefficient
Database Systems
Advantages
Well-understood technology
Optimized for query processing
Disadvantages
XML does not map directly to any of these data models
XML data is not amenable to a rigid schema
Example
<STUDENT>
  <NAME> Maya </NAME>
  <ADDRESS>
    <INSTITUTE> IISc </INSTITUTE>
    <CITY> Bangalore </CITY>
  </ADDRESS>
</STUDENT>
<STUDENT>
  <NAME> Charuta </NAME>
  <ADDRESS> IIT, Mumbai </ADDRESS>
</STUDENT>
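To see why the example resists a single fixed schema, a minimal Python sketch can parse both fragments with the standard `xml.etree.ElementTree` module and report how each `ADDRESS` is represented. The enclosing `<STUDENTS>` root element and the helper `address_shape` are hypothetical additions, needed only because a well-formed XML document has exactly one root element.

```python
import xml.etree.ElementTree as ET

# The two STUDENT fragments from the example, wrapped in a hypothetical
# <STUDENTS> root element so that together they form one well-formed document.
DOC = """
<STUDENTS>
  <STUDENT>
    <NAME> Maya </NAME>
    <ADDRESS>
      <INSTITUTE> IISc </INSTITUTE>
      <CITY> Bangalore </CITY>
    </ADDRESS>
  </STUDENT>
  <STUDENT>
    <NAME> Charuta </NAME>
    <ADDRESS> IIT, Mumbai </ADDRESS>
  </STUDENT>
</STUDENTS>
"""


def address_shape(student: ET.Element) -> str:
    """Describe how the ADDRESS of one STUDENT is represented."""
    address = student.find("ADDRESS")
    if address is None:
        return "missing"
    children = [child.tag for child in address]
    if children:
        return "elements: " + ", ".join(children)  # nested sub-elements
    return "text: " + (address.text or "").strip()  # plain text content


root = ET.fromstring(DOC.strip())
for student in root.findall("STUDENT"):
    name = student.findtext("NAME").strip()
    print(name, "->", address_shape(student))
```

It prints `Maya -> elements: INSTITUTE, CITY` and `Charuta -> text: IIT, Mumbai`: the same element name carries nested structure in one document and plain text in the other.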
Example (contd.)
STUDENT
  NAME: “Maya”
  ADDRESS
    INSTITUTE: “IISc”
    CITY: “Bangalore”
STUDENT
  NAME: “Charuta”
  ADDRESS: “IIT, Mumbai”
XML to Relational
1. One relation per student structure:
   STUDENT1 (KEY, NAME, INSTITUTE, CITY)
   STUDENT2 (KEY, NAME, ADDRESS)
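Mapping 1 can be sketched with Python's built-in `sqlite3` module: each `STUDENT` is routed to `STUDENT1` or `STUDENT2` according to the shape of its `ADDRESS`. Using the document position as `KEY` is an assumption for illustration; the slides do not say how keys are assigned. `KEY` is quoted in the DDL because it is an SQL keyword.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The running example, wrapped in a hypothetical <STUDENTS> root element.
DOC = """<STUDENTS>
  <STUDENT><NAME> Maya </NAME>
    <ADDRESS><INSTITUTE> IISc </INSTITUTE><CITY> Bangalore </CITY></ADDRESS>
  </STUDENT>
  <STUDENT><NAME> Charuta </NAME><ADDRESS> IIT, Mumbai </ADDRESS></STUDENT>
</STUDENTS>"""


def text(elem: ET.Element) -> str:
    """Text content of a leaf element, without surrounding whitespace."""
    return (elem.text or "").strip()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE STUDENT1 ("KEY" INTEGER PRIMARY KEY, NAME TEXT, INSTITUTE TEXT, CITY TEXT)')
conn.execute('CREATE TABLE STUDENT2 ("KEY" INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT)')

# KEY = position of the STUDENT in the document (an assumption).
for key, student in enumerate(ET.fromstring(DOC).findall("STUDENT"), start=1):
    name = text(student.find("NAME"))
    address = student.find("ADDRESS")
    if address.find("INSTITUTE") is not None:
        # Structured address -> STUDENT1 (KEY, NAME, INSTITUTE, CITY)
        row = (key, name, text(address.find("INSTITUTE")), text(address.find("CITY")))
        conn.execute("INSERT INTO STUDENT1 VALUES (?, ?, ?, ?)", row)
    else:
        # Text-only address -> STUDENT2 (KEY, NAME, ADDRESS)
        conn.execute("INSERT INTO STUDENT2 VALUES (?, ?, ?)", (key, name, text(address)))

print(conn.execute("SELECT * FROM STUDENT1").fetchall())
print(conn.execute("SELECT * FROM STUDENT2").fetchall())
```

A query such as "all student names" must now read both relations (for example with `UNION`), because the data for one element type is spread over several tables.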
XML to Relational (contd.)
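2. Names and addresses in separate relations, one per address structure:
   STUDENT (KEY, NAME)
   ADDRESS1 (KEY, INSTITUTE, CITY)
   ADDRESS2 (KEY, ADDRESS)
3. A single relation covering all fields (absent fields are NULL):
   STUDENT (KEY, NAME, INSTITUTE, CITY, ADDRESS)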
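Mappings 2 and 3 trade joins for NULLs. The sketch below uses `sqlite3` again, with hypothetical surrogate keys 1 and 2, and assumes that the `KEY` of `ADDRESS1` and `ADDRESS2` refers to the owning student. It loads each mapping into its own in-memory database: rebuilding the students needs two outer joins under mapping 2 but none under mapping 3, where each row instead carries NULLs for the fields it lacks.

```python
import sqlite3

# (KEY, NAME, ADDRESS) for the two example students; a dict stands for a
# structured address, a string for a text-only one.
students = [
    (1, "Maya", {"INSTITUTE": "IISc", "CITY": "Bangalore"}),
    (2, "Charuta", "IIT, Mumbai"),
]


def load_mapping2(conn: sqlite3.Connection) -> None:
    conn.execute('CREATE TABLE STUDENT ("KEY" INTEGER PRIMARY KEY, NAME TEXT)')
    conn.execute('CREATE TABLE ADDRESS1 ("KEY" INTEGER, INSTITUTE TEXT, CITY TEXT)')
    conn.execute('CREATE TABLE ADDRESS2 ("KEY" INTEGER, ADDRESS TEXT)')
    for key, name, address in students:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (key, name))
        if isinstance(address, dict):
            conn.execute("INSERT INTO ADDRESS1 VALUES (?, ?, ?)",
                         (key, address["INSTITUTE"], address["CITY"]))
        else:
            conn.execute("INSERT INTO ADDRESS2 VALUES (?, ?)", (key, address))


def load_mapping3(conn: sqlite3.Connection) -> None:
    conn.execute('CREATE TABLE STUDENT ("KEY" INTEGER PRIMARY KEY, NAME TEXT, '
                 'INSTITUTE TEXT, CITY TEXT, ADDRESS TEXT)')
    for key, name, address in students:
        if isinstance(address, dict):
            row = (key, name, address["INSTITUTE"], address["CITY"], None)
        else:
            row = (key, name, None, None, address)
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", row)


m2 = sqlite3.connect(":memory:")
load_mapping2(m2)
# Mapping 2: reassembling a student needs outer joins across three relations.
rows2 = m2.execute('''
    SELECT s.NAME, a1.INSTITUTE, a1.CITY, a2.ADDRESS
    FROM STUDENT s
    LEFT JOIN ADDRESS1 a1 ON a1."KEY" = s."KEY"
    LEFT JOIN ADDRESS2 a2 ON a2."KEY" = s."KEY"
    ORDER BY s."KEY"''').fetchall()

m3 = sqlite3.connect(":memory:")
load_mapping3(m3)
# Mapping 3: no joins, but NULLs stand in for every absent field.
rows3 = m3.execute('SELECT NAME, INSTITUTE, CITY, ADDRESS FROM STUDENT ORDER BY "KEY"').fetchall()

print(rows2)
print(rows3)
```

Both queries return `[('Maya', 'IISc', 'Bangalore', None), ('Charuta', None, None, 'IIT, Mumbai')]`; the mappings differ in how the work is split between joins and stored NULLs.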
STORED (Deutsch et al.)
A query language is used to define mappings
Q = FROM STUDENT : $X
{ NAME : $N,
ADDRESS : {INSTITUTE : $I, CITY : $C} }
STORE STUDENT($X, $N, $I, $C)
STORED (contd.)
Mappings are generated through data-mining algorithms
Non-conforming data is stored in overflow graphs
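The effect of the STORED query `Q`, together with the overflow store, can be approximated in plain Python: nodes that match the pattern contribute a tuple to `STUDENT`, and the rest are set aside. This is a simplified illustration, not the STORED system: the dictionary representation, the object identifiers `x1`/`x2`, and the function `match` are hypothetical, the mapping is written by hand rather than mined, and fields of a matching node that lie outside the pattern are simply ignored here.

```python
from typing import Any, Optional

# Hypothetical in-memory model: an element is a dict from child label to a
# string (text content) or to another dict (nested element).
students = [
    ("x1", {"NAME": "Maya", "ADDRESS": {"INSTITUTE": "IISc", "CITY": "Bangalore"}}),
    ("x2", {"NAME": "Charuta", "ADDRESS": "IIT, Mumbai"}),
]


def match(node: dict[str, Any]) -> Optional[tuple[str, str, str]]:
    """Match { NAME : $N, ADDRESS : {INSTITUTE : $I, CITY : $C} }.
    Returns the bindings ($N, $I, $C), or None if the node does not conform."""
    name, address = node.get("NAME"), node.get("ADDRESS")
    if not isinstance(name, str) or not isinstance(address, dict):
        return None
    institute, city = address.get("INSTITUTE"), address.get("CITY")
    if not isinstance(institute, str) or not isinstance(city, str):
        return None
    return name, institute, city


student_rows = []  # relation STUDENT(X, N, I, C)
overflow = []      # non-conforming data, kept in its semistructured form
for oid, node in students:
    bindings = match(node)
    if bindings is None:
        overflow.append((oid, node))
    else:
        student_rows.append((oid, *bindings))  # STORE STUDENT($X, $N, $I, $C)

print(student_rows)
print(overflow)
```

Maya's student matches and becomes the tuple `('x1', 'Maya', 'IISc', 'Bangalore')`; Charuta's text-only address does not fit the pattern, so that whole node lands in the overflow list.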
XML to OO (Christophides et al.)
An OO schema is derived from the DTD
<!ELEMENT STUDENT (NAME, ADDRESS)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (INSTITUTE, CITY)>
<!ELEMENT INSTITUTE (#PCDATA)>
<!ELEMENT CITY (#PCDATA)>
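A DTD of this form admits only the structured form of `ADDRESS`, so the second student of the running example does not conform to it. A small Python checker makes this concrete. It is a hypothetical helper, not a validating parser (the standard library's `xml.etree.ElementTree` does not validate against DTDs), and it understands only the two content-model forms used here: a comma-separated sequence and `(#PCDATA)`.

```python
import re
import xml.etree.ElementTree as ET

DTD = """
<!ELEMENT STUDENT (NAME, ADDRESS)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (INSTITUTE, CITY)>
<!ELEMENT INSTITUTE (#PCDATA)>
<!ELEMENT CITY (#PCDATA)>
"""


def parse_dtd(text: str) -> dict:
    """Map each element name to "#PCDATA" or to its list of child elements.
    Only plain sequences and (#PCDATA) are understood (an assumption)."""
    models = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>", text):
        parts = [p.strip() for p in model.split(",")]
        models[name] = "#PCDATA" if parts == ["#PCDATA"] else parts
    return models


def conforms(elem: ET.Element, models: dict) -> bool:
    """Check an element, recursively, against the parsed content models."""
    if elem.tag not in models:
        return False
    model = models[elem.tag]
    children = list(elem)
    if model == "#PCDATA":
        return not children  # text only, no child elements
    if [c.tag for c in children] != model:
        return False  # children must be exactly the sequence, in order
    # A sequence allows only whitespace around its child elements.
    if (elem.text or "").strip() or any((c.tail or "").strip() for c in children):
        return False
    return all(conforms(c, models) for c in children)


models = parse_dtd(DTD)
MAYA = ("<STUDENT><NAME> Maya </NAME><ADDRESS><INSTITUTE> IISc </INSTITUTE>"
        "<CITY> Bangalore </CITY></ADDRESS></STUDENT>")
CHARUTA = "<STUDENT><NAME> Charuta </NAME><ADDRESS> IIT, Mumbai </ADDRESS></STUDENT>"
print(conforms(ET.fromstring(MAYA), models))     # True
print(conforms(ET.fromstring(CHARUTA), models))  # False
```

Any OO schema derived from this DTD therefore has no place for Charuta's text-only address unless the DTD is widened.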
XML to OO (contd.)
STUDENT
  NAME (Text)
  ADDRESS
    INSTITUTE (Text)
    CITY (Text)
class STUDENT public type tuple
  ( name: NAME,
    address: ADDRESS )
class ADDRESS public type tuple
  ( institute: INSTITUTE,
    city: CITY )
class NAME inherit Text
class INSTITUTE inherit Text
class CITY inherit Text
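The derivation rule visible in this example can be written as a short Python function: an element with a sequence content model becomes a tuple class with one attribute per child (named in lowercase, as in the example), and a `(#PCDATA)` element becomes a subclass of `Text`. This is an illustrative sketch under that two-case assumption; `derive_classes` is a hypothetical name, and DTD constructs such as choice (`|`) or repetition (`*`, `+`) are not handled.

```python
import re

DTD = """
<!ELEMENT STUDENT (NAME, ADDRESS)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (INSTITUTE, CITY)>
<!ELEMENT INSTITUTE (#PCDATA)>
<!ELEMENT CITY (#PCDATA)>
"""


def derive_classes(dtd: str) -> list[str]:
    """Emit one class declaration per element declaration, in DTD order."""
    classes = []
    for name, model in re.findall(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>", dtd):
        parts = [p.strip() for p in model.split(",")]
        if parts == ["#PCDATA"]:
            # Text-only element: a subclass of the built-in Text class.
            classes.append(f"class {name} inherit Text")
        else:
            # Sequence of children: a tuple type with one attribute per child.
            fields = ", ".join(f"{p.lower()}: {p}" for p in parts)
            classes.append(f"class {name} public type tuple ({fields})")
    return classes


for declaration in derive_classes(DTD):
    print(declaration)
```

Run on the DTD, it prints tuple classes for `STUDENT` and `ADDRESS` and `inherit Text` declarations for `NAME`, `INSTITUTE`, and `CITY`.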
Natix (U. Mannheim)
A basic record manager is used
Each node or set of nodes is stored in a record
Splitting strategies on the tree can be employed to optimize query processing
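A record-splitting strategy can be illustrated with a small Python sketch. Everything here is an assumption made for illustration, not Natix's actual algorithm: record capacity is counted in nodes rather than bytes, the split is greedy (the largest child subtree moves out first), and a moved subtree is replaced in its parent by a hypothetical `Proxy` that names the record holding it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Proxy:
    record_id: int  # hypothetical pointer to the record that holds a moved subtree


@dataclass
class Node:
    label: str
    children: list[Node | Proxy] = field(default_factory=list)


def size(n: Node | Proxy) -> int:
    """Size in nodes; a proxy occupies one unit in its parent's record."""
    if isinstance(n, Proxy):
        return 1
    return 1 + sum(size(c) for c in n.children)


def split(root: Node, capacity: int) -> dict[int, Node]:
    """Greedily move the largest child subtree into its own record until each
    record holds at most `capacity` nodes (or no child can be moved)."""
    records: dict[int, Node] = {}

    def visit(n: Node) -> None:
        for c in n.children:
            if isinstance(c, Node):
                visit(c)  # bottom-up: split the children first
        while size(n) > capacity:
            movable = [i for i, c in enumerate(n.children) if isinstance(c, Node)]
            if not movable:
                break
            i = max(movable, key=lambda j: size(n.children[j]))
            rid = len(records) + 1
            records[rid] = n.children[i]
            n.children[i] = Proxy(rid)

    visit(root)
    records[0] = root  # record 0 holds the document root
    return records


def render(n: Node | Proxy) -> str:
    if isinstance(n, Proxy):
        return f"->record {n.record_id}"
    if not n.children:
        return n.label
    return f"{n.label}(" + ", ".join(render(c) for c in n.children) + ")"


# The first STUDENT from the example; text values are modelled as leaf nodes.
doc = Node("STUDENT", [
    Node("NAME", [Node("Maya")]),
    Node("ADDRESS", [Node("INSTITUTE", [Node("IISc")]), Node("CITY", [Node("Bangalore")])]),
])
records = split(doc, capacity=5)
for rid in sorted(records):
    print(rid, render(records[rid]))
```

With a capacity of five nodes, the first student's tree is stored as two records, `0 STUDENT(NAME(Maya), ->record 1)` and `1 ADDRESS(INSTITUTE(IISc), CITY(Bangalore))`, so a query that needs only `NAME` touches a single record.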
Natix (contd.)
[Figure: the first STUDENT tree (NAME “Maya”; ADDRESS with INSTITUTE “IISc” and CITY “Bangalore”), shown as a logical tree and as stored in records]
Summary of issues involved
Determining the ‘best’ mapping
Space occupied
Data fragmentation
Support for ‘overflow’ data
Lossless…?
Translation of XML queries to DB queries
Reconstruction of XML documents from the DB
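Reconstruction can be sketched for mapping 3 (the single wide `STUDENT` relation) in Python with `sqlite3` and `xml.etree.ElementTree`. The sketch assumes, hypothetically, that `KEY` records document order, that a non-NULL `ADDRESS` column means the address was plain text, and that the output is wrapped in an invented `<STUDENTS>` root. The values were stored trimmed, so the spaces around text such as `" Maya "` in the original document are not recovered, which is one concrete way a mapping can fail to be lossless.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Mapping 3 of the running example: one wide relation, NULLs for absent fields.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE STUDENT ("KEY" INTEGER PRIMARY KEY, NAME TEXT, '
             'INSTITUTE TEXT, CITY TEXT, ADDRESS TEXT)')
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", [
    (1, "Maya", "IISc", "Bangalore", None),
    (2, "Charuta", None, None, "IIT, Mumbai"),
])


def reconstruct(conn: sqlite3.Connection) -> ET.Element:
    """Rebuild the STUDENT elements, in KEY order, under a <STUDENTS> root."""
    root = ET.Element("STUDENTS")
    rows = conn.execute('SELECT NAME, INSTITUTE, CITY, ADDRESS FROM STUDENT ORDER BY "KEY"')
    for name, institute, city, address in rows:
        student = ET.SubElement(root, "STUDENT")
        ET.SubElement(student, "NAME").text = name
        addr = ET.SubElement(student, "ADDRESS")
        if address is not None:
            addr.text = address  # text-only address
        else:
            ET.SubElement(addr, "INSTITUTE").text = institute
            ET.SubElement(addr, "CITY").text = city
    return root


print(ET.tostring(reconstruct(conn), encoding="unicode"))
```

The reconstruction depends on knowledge the relation itself does not store: which columns came from which elements, and which NULL pattern marks which shape of `ADDRESS`.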