ppt - Intelligent Data Systems Laboratory
Download
Report
Transcript ppt - Intelligent Data Systems Laboratory
Relational-Style XML Query
Taro L. Saito, Shinichi Morishita
University of Tokyo
June 10th, SIGMOD 2008
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea
Vancouver, Canada
Presented by Sangkeun-Lee
Reference slides: www.xerial.org/pub/paper/2008/leo-sigmod2008-slides.pdf
If your Manager Says…
It’s a tragedy…
Center for E-Business Technology
Copyright 2009 by CEBT
Migration to XML Database
Benefits of using XML
XML is a portable text-data format
Tree-structured XML can reduce redundancy of relational data
Co
Company
Employee Office
1
e1
NY
1
e2
NY
Emp
Emp
e1
Relational Data
Office
e2
Office
NY
XML Data
Center for E-Business Technology
Copyright 2009 by CEBT
NY
Problem
Querying relational data translated into XML
Q: Retrieve a node tuple (Co, Emp, Office) from the XML data
E.g. Xpath, a path expression query
/Co/Emp/Office
Co
Company
Employee Office
1
e1
NY
1
e2
NY
Emp
Emp
e1
Relational Data
Office
e2
Office
NY
XML Data
Center for E-Business Technology
Copyright 2009 by CEBT
NY
Structure Variations
Tree-representation of relational data is not unique
Company
Employee Office
1
e1
NY
1
e2
NY
Co
Emp
Office
Co
Emp
e1
NY
Office
Office
Co
NY
Emp
Emp
e1
Emp
Emp
e1
Center for E-Business Technology
e2
Copyright 2009 by CEBT
Office
NY
e2
e2
NY
Inconvenience of Xpath Query
User must know the entire XML structures to produce correct
path queries
Co
Co
Office
Office
Emp
NY
NY
Co
Emp
e1
Emp
Emp
e1
e2
/Co/Office/Emp
Center for E-Business Technology
Emp
Emp
e1
e2
Office
Office
NY
//Office[Co]/Emp
Copyright 2009 by CEBT
e2
NY
//Co/Emp[Office]
Relational-Style XML Query
Query relations in XML
With an SQL-like syntax
SELECT Co, Emp, Office from (XML Data)
Co
Co
Office
Office
Emp
e1
Emp
NY
Co
Emp
Input XML
NY
Emp
Emp
e1
e2
Emp
e1
e2
Office
e2
Office
NY
NY
Result
Center for E-Business Technology
Copyright 2009 by CEBT
Company
Employee
Office
1
e1
NY
1
e2
NY
To Retrieve Relations in XML
Center for E-Business Technology
Copyright 2009 by CEBT
Problem Definition
Convert an SQL query, SELECT A,B,C, into an XML structure
query
There can be many structural variations of (A,B,C)
For N nodes, there exists N^(N-1) structural variations
–
3^2 = 9
…
Center for E-Business Technology
Copyright 2009 by CEBT
An Example
This example involves various tree
structures that denote data in the
same relation.
Center for E-Business Technology
Copyright 2009 by CEBT
Amoeba
A node tuple (A,B,C) is an amoeba iff one of the A,B and C is a
common ancestor of the others
…
Amoeba join retrieves all amoeba structures in the XML data
Center for E-Business Technology
Copyright 2009 by CEBT
Relation in XML
A Key observation
Relation is simply embedded in XML
Company
Employee
Office
1
e1
NY
1
e2
NY
Co
Co
Office
Office
NY
Emp
NY
Co
Emp
Emp
e1
Emp
Emp
e1
e1
e2
e2
/Co/Office/Emp
Center for E-Business Technology
Emp
Office
e2
Office
NY
//Office[Co]/Emp
Copyright 2009 by CEBT
//Co/Emp[Office]
NY
Hidden Semantics in XML
Some amoeba structure may not form a relation
Why this structure is not allowed?
Because there are functional dependencies (FD) implied in the
XML structure
Company
1
M
Company
Office
Emp
Office
1
Office
Emp
Emp
Center for E-Business Technology
N
Employee
Emp
Copyright 2009 by CEBT
Functional Dependencies (FD)
FD: X->Y (From a given X,Y is uniquely determined)
Employee -> office (Each employee belongs to an office)
Office -> company (Each office belongs to a company)
Relation in XML must have an amoeba structure corresponding to each FD
Relations and FDs are sufficient to describe a schema of XML
Company
1
M
Invalid
structure!!
Company
Office
Emp
N
Employee
Office
Emp
Emp
Office
1
Emp
Center for E-Business Technology
Copyright 2009 by CEBT
Examples of FDs
Center for E-Business Technology
Copyright 2009 by CEBT
Detecting FDs
A type of FDs required to determine XML structures to query is
one-to-many(or one-to-one) relationships:
FD: Emp -> Office
–
Each employee belongs to an office
–
An office may have several employees (one-to-many)
We can observe these relationships by counting node
occurrences or directory from the ER-diagram
Company
Company
Office
Emp
M
Office
1
Office
Emp
Center for E-Business Technology
Emp
1
N
Employee
Emp
Copyright 2009 by CEBT
If FDs are ignored…
The company has M offices, and each office has N employees:
# of (company, office, employee) tuples:
When M = 100, N=5 100x(100x5)
While, # of correct answers is only M*N = 500
Company
1
Company
Office
Emp
Emp
Office
Emp
Center for E-Business Technology
Emp
M
Office
Emp
Emp
Copyright 2009 by CEBT
Office
1
N
Employee
Avoiding Invalid Relations
However, an amoeba structure itself is a connected component
of tree nodes, and thus invalid nodes may be connected, as
illustrated in Figure 9.
To avoid these irrelevant node connections, while allowing
various tree structures in describing XML data, we introduce a
restricted class of XML structures, called a tree relation
Center for E-Business Technology
Copyright 2009 by CEBT
FD-Aware Amoeba Join
Tree Relation
<<Emp,Office,Company>>&<<Emp,Office>>&<<Office,Company>>
FDs: Emp-> Office, Office -> Company
Bottom-up construction of query results
Amoeba Join (Employee, Office)
Amoeba Join (Office, Company)
FD-aware amoeba join avoids invalid XML structures
Company
Company
1
M
Office
Emp
Emp
Center for E-Business Technology
Office
Emp
Emp
Office
Emp
Emp
Copyright 2009 by CEBT
Office
1
N
Employee
Query Performance
FD-aware amoeba join scales well
For various sizes of XML data
Center for E-Business Technology
Copyright 2009 by CEBT
Experiments – Query Set
Center for E-Business Technology
Copyright 2009 by CEBT
Think in Relational-Style
First, consider
XML := Relations + their annotations
Steps
Detect relational part from XML data
Detect one-to-many(one) relationships (FDs)
Write relational queries
–
SELECT Co, Emp, Office
Things more in the paper
XML Algebra
Data Integration
Pushing Structural Constraints for better performance
Query Incomplete Relations
Other experiments
Center for E-Business Technology
Copyright 2009 by CEBT
Contributions & Conclusions
Relation in XML
Defined using amoeba structure and FDs
XML Algebra
Relational-Style XML Query
Retrieves relations in XML with a SQL-like query syntax (SQL over XML)
Allows structural variations of XML data
Departure from path expression queries
Target XML structures are automatically determined
Applications
Database integration
Managing relational data enhanced with XML syntax
“It’s Just SQL”
A large number of XML data and queries are still relational
Center for E-Business Technology
Copyright 2009 by CEBT
Contributions & Conclusions
It’s not tragedy
anymore…
Center for E-Business Technology
Copyright 2009 by CEBT
My thought on
Really?
Center for E-Business Technology
Thoughts on The Paper
Good Points
Generally, I think it’s a very good paper
The paper shows us very interesting motivation and clearly define the problem
The approach in the paper is intuitive and well-explained
The authors performed well-designed experiments
The paper proposes many future works
–
Maybe, one of us can work on them
However,
I doubt if it is really useful – Can we use this for an important business project?
Although the author insists that their algorithm scales well, some queries caused out of
memory and didn’t work
–
It’s a crucial problem to be used in real-life business
Isn’t Xpath good enough?
–
People who use XML docs usually know the structure of XML docs
Center for E-Business Technology
Copyright 2009 by CEBT