CSE 350 - Query Processing with XML


Query Processing with XML
CSE 350 – Advanced Database Topics
Jeffrey R. Ellis

Query Processing Topics

Why?
Java and Other Programming Languages
XPath/XSLT
XQuery (W3C-sponsored Query Language)
Current Research
  – Other Query Languages
  – XISS (XML Indexing and Storage System)

FIRST – Distinction between XML and HTML/Web Technologies

XML's spotlight is analogous to Java's
  – Immediate benefits applied to the World Wide Web
  – Long-range, more exciting benefits in applications
XML IS NOT AN HTML REPLACEMENT
  – HTML marks pages up for presentation on the web
  – XML marks text for semantic information purposes
XML can encode HTML pages, but HTML works well on the Web

XML Data Storage

XML Documents
  – Data is delineated semantically
  – Schemas/DTDs control contents of elements
  – Semi-structured format allows flexibility
  – Text is human-readable and machine-parsable
  – Open standards work with common tools
  – File data storage allows for easy sharing
  – Can queries control access to data?

Traditional Database Storage

Databases
  – Data is delineated semantically
  – Schemas control contents of rows
  – No flexibility from semi-structured storage
  – Data is machine-parsable but not human-readable
  – Proprietary standards prevent interoperability
  – Proprietary storage prevents data sharing
  – Queries control access to data

XML for Query Processing

If we can get efficient query processing, XML document storage provides many benefits over traditional database storage.
Sample application
  – Employee database document
  – XML Schema assumed to exist
  – Employee information queried as per standard HR processing

<?xml version="1.0"?>
<employees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="employee.xsd">
  <emp gender='m'>
    <name>
      <last>Bissell</last>
      <first>Brian</first>
    </name>
    <position>IT Specialist</position>
    <salary>35000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name>
      <last>Pham</last>
      <first>Hung</first>
      <mi>Q</mi>
    </name>
    <position>Senior IT Specialist</position>
    <salary>45000</salary>
    <location>CT</location>
  </emp>
  …
</employees>

Tree Structure of XML Document

Remember that XML documents are trees
emp
  @gender
  name
    last
    first
    mi
  position
  salary
  location

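To see the tree programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes only the two employees shown in the sample (the elided records are left out, salaries written as plain numbers), and the helper `outline` is a hypothetical name. Run on the second employee, it lists the same nodes as the tree, marking the attribute with `@`.

```python
import xml.etree.ElementTree as ET

# The two employees shown in the sample document; the elided records are omitted.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45000</salary>
    <location>CT</location>
  </emp>
</employees>"""


def outline(elem, depth=0):
    """Return one indented line per element, plus one '@name' line per attribute."""
    pad = "  " * depth
    lines = [pad + elem.tag]
    for attr_name in elem.attrib:
        lines.append(pad + "  @" + attr_name)
    for child in elem:  # iterating an Element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)   # root is the <employees> element
pham = root.findall("emp")[1]  # the second <emp>, which has an <mi>
print("\n".join(outline(pham)))
```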
Query Processing – Programming Languages

XML documents are flat files
Any language with file I/O can read an XML document
Any language with string parsing capabilities can use XML data
Query processing done through language syntax
“Obvious” result, but different from traditional databases

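As an illustration of the file I/O plus string matching approach, a minimal Python sketch that reads the document as plain text and pulls element contents out with a regular expression. The two-employee sample and the helper names `read_document` and `naive_values` are assumptions for illustration; the last call shows why this approach is fragile.

```python
import os
import re
import tempfile

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45000</salary>
  </emp>
</employees>"""


def read_document(path):
    """Plain file I/O: an XML document is just a text file."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def naive_values(xml_text, tag):
    """Return the text between every literal <tag> and </tag> pair.

    Only exact, attribute-free start tags match, and comments, CDATA,
    entities and nested same-name elements are not handled: string
    matching works, but it is neither robust nor efficient.
    """
    pattern = re.compile("<{0}>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL)
    return pattern.findall(xml_text)


# Write the sample to a temporary file so the example exercises real file I/O.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as tmp:
    tmp.write(SAMPLE)
text = read_document(tmp.name)
os.remove(tmp.name)

print(naive_values(text, "last"))  # ['Bissell', 'Pham']
print(naive_values(text, "emp"))   # [] because <emp gender='m'> carries an attribute
```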
Query Processing – Programming Languages

Strategy
  – Basic file I/O through the language
  – Basic string matching to identify elements
  – Processing possible, but not necessarily efficient
Languages have gathered XML processing tools in libraries
  – Xerces – Apache library for Java and C++
Two methods for parsing XML data
  – DOM
  – SAX

DOM

Document Object Model
Defined by W3C for XML, HTML, and stylesheets
Provides a hierarchical, object-based view of the document
DOMParser parses the whole file, then provides access to its nodes
Key: every item in an XML document is a node

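DOM Example

Node (Element) name="emp"
  attribute1: Node (Attr) name="gender", value="m"
  child1: Node (Element) name="name"
    child1: Node (Element) name="last"
      child1: Node (Text) value="Bissell"
Each child node keeps a parent reference to the node above it; the Attr node instead refers to its ownerElement (an Attr has no DOM parent)
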
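The same nodes can be reached through a real DOM implementation. A minimal sketch with Python's standard `xml.dom.minidom`, assuming a trimmed copy of the first sample employee; the walk follows the example from `emp` down to the `Bissell` text node and then back up through the parent links.

```python
from xml.dom import Node, minidom

# First employee from the sample, trimmed to the fields used here.
SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
</employees>"""

doc = minidom.parseString(SAMPLE)             # DOM: the whole document is parsed into memory
emp = doc.getElementsByTagName("emp")[0]      # Node (Element) name="emp"
gender = emp.getAttributeNode("gender")       # Node (Attr) name="gender", value="m"
name = emp.getElementsByTagName("name")[0]    # Node (Element) name="name"
last = name.getElementsByTagName("last")[0]   # Node (Element) name="last"
text_node = last.firstChild                   # Node (Text) value="Bissell"

print(emp.nodeType == Node.ELEMENT_NODE, emp.tagName)
print(gender.nodeType == Node.ATTRIBUTE_NODE, gender.name, gender.value)
print(text_node.nodeType == Node.TEXT_NODE, text_node.data)
# Child nodes link back to their parent; an Attr links to its ownerElement instead.
print(text_node.parentNode.tagName, last.parentNode.tagName, name.parentNode.tagName)
print(gender.ownerElement.tagName)
```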
SAX

Simple API for XML
Defined by the XML-DEV mailing list
Provides event-driven processing of the document
XMLReader parses through the file and calls different handler methods as it encounters elements
Key: methods are defined in an interface, implemented in user code

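A minimal SAX sketch using Python's standard `xml.sax`, assuming the two employees shown in the sample; the handler class `LastNameHandler` and its fields are hypothetical. The parser calls the overridden `startElement`, `characters`, and `endElement` methods as it reads, so no document tree is ever built.

```python
import xml.sax

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


class LastNameHandler(xml.sax.ContentHandler):
    """Collects each <last> value and each emp's gender as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.inside_last = False
        self.buffer = []
        self.last_names = []
        self.genders = []

    def startElement(self, name, attrs):
        if name == "emp":
            self.genders.append(attrs.get("gender"))  # attributes arrive with the start tag
        elif name == "last":
            self.inside_last = True
            self.buffer = []

    def characters(self, content):
        # One text node may be delivered in several chunks, so buffer it.
        if self.inside_last:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "last":
            self.last_names.append("".join(self.buffer))
            self.inside_last = False


handler = LastNameHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.last_names, handler.genders)  # ['Bissell', 'Pham'] ['m', 'm']
```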
DOM versus SAX

SAX is primarily Java-based; DOM is defined for most languages
DOM requires storage of the entire document in memory; SAX processes the document as it reads
DOM mirrors a document that can be revisited; suited for document processing
SAX mirrors object lifecycles; suited for data processing

Query Processing - XPath/XSLT

Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure
XPath identifies the location of various document elements
XSL stylesheets provide methods for transforming data from one format to another
Combining XPath and XSLT provides easy generation of result sets based on queries

XPath

Provides element, value, and attribute identification
/employees/emp/name/first = "Brian", "Hung", "Sara", "Brian"
//salary = "35000", "40000", "35000", "60000"
count(/employees/emp) = 4
//mi = "Q"

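Python's `xml.etree.ElementTree` understands only a limited XPath subset (paths, `//`, and simple predicates, but no functions such as `count()`), so this sketch expresses the same lookups with path strings plus a little Python. It assumes the two employees shown in the sample, so it returns two first names and a count of 2 rather than the slide's four.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45000</salary>
  </emp>
</employees>"""

root = ET.fromstring(SAMPLE)  # context node: <employees>

first_names = [e.text for e in root.findall("emp/name/first")]   # /employees/emp/name/first
salaries = [e.text for e in root.iter("salary")]                 # //salary
emp_count = len(root.findall("emp"))                             # count(/employees/emp)
middle_initials = [e.text for e in root.findall(".//mi")]        # //mi
male_last_names = [e.text for e in root.findall("emp[@gender='m']/name/last")]  # attribute predicate

print(first_names, salaries, emp_count, middle_initials, male_last_names)
```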
XSLT

Stylesheet transforms data from one form into another
<xsl:template match="name">
  <xsl:value-of select="first"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="last"/>
</xsl:template>
= Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos

Combine XPath and XSLT for Queries

Query: Find the last name and position of each employee named Brian
<xsl:template match='employees'>
  <xsl:for-each select='emp'>
    <xsl:if test='name/first="Brian"'>
      <xsl:value-of select='name/last'/>
      <xsl:text>:</xsl:text>
      <xsl:value-of select='position'/>
      <xsl:text>; </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>

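For comparison with the host-language approach, a minimal Python sketch that produces the same string as this stylesheet. It assumes the two employees shown in the sample; the function name `last_and_position` is hypothetical, and each line is commented with the XSLT instruction it mirrors.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def last_and_position(root, first="Brian"):
    """Same output as the stylesheet: 'last:position; ' for each matching emp."""
    out = []
    for emp in root.findall("emp"):                # xsl:for-each select='emp'
        if emp.findtext("name/first") == first:    # xsl:if test='name/first="Brian"'
            out.append(emp.findtext("name/last"))  # xsl:value-of select='name/last'
            out.append(":")                        # xsl:text
            out.append(emp.findtext("position"))   # xsl:value-of select='position'
            out.append("; ")                       # xsl:text
    return "".join(out)


root = ET.fromstring(SAMPLE)
print(repr(last_and_position(root)))  # 'Bissell:IT Specialist; '
```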
Combine XPath and XSLT for Queries

Query: Find the average salary of all non-managers
<xsl:template match='employees'>
  <xsl:variable name='running_sum'>
    <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
  </xsl:variable>
  <xsl:variable name='running_count'>
    <xsl:value-of select='count(emp[position!="Manager"])'/>
  </xsl:variable>
  <xsl:value-of select='$running_sum div $running_count'/>
</xsl:template>

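The same aggregate computed in Python, as a sketch under the same two-employee assumption (the function name is hypothetical). It keeps XPath's behaviour in two places: an `emp` without a `<position>` fails the `!=` test, and dividing by a zero count yields NaN. Salaries are assumed to be plain numbers, since a value such as "35,000" would not convert to a number in XPath either.

```python
import math
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45000</salary>
  </emp>
</employees>"""


def average_non_manager_salary(root):
    total = 0.0
    count = 0
    for emp in root.findall("emp"):
        position = emp.findtext("position")
        # XPath's != is false for an empty node-set, so an emp with no
        # <position> is excluded from both sum() and count().
        if position is None or position == "Manager":
            continue
        salary = emp.findtext("salary")
        if salary is not None:        # sum() only sees salaries that exist
            total += float(salary)
        count += 1                    # count() counts the emp itself
    return total / count if count else math.nan  # XPath: 0 div 0 is NaN


root = ET.fromstring(SAMPLE)
print(average_non_manager_salary(root))  # 40000.0
```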
Results XSLT/XPath

Many SQL queries can be accomplished
  – XPath provides element (data) access
  – XPath provides basic functions (e.g., sum())
  – XPath provides WHERE functionality
  – XSLT provides SELECT functionality
  – XSLT provides ORDER BY functionality (xsl:sort)
  – XSLT provides result set formatting
  – UNION functionality provided? (XPath's | operator unions node-sets only)

Querying with XPath and XSLT

Important questions
  – Is it sufficient?
  – Is it efficient?
  – Is there a better way?
The XML community needs a full query language
XQuery – Working Draft published 7 June 2001

Query Processing - XQuery

XML provides flexibility in representing many kinds of information
A good query language must be likewise flexible
  – Pre-XQuery languages are good for specific types of data
Goal: “[S]mall, easily implementable language in which queries are concise and easily understood.”

XQuery Forms

1. Path expressions
2. Element constructors
3. FLWR expressions
4. Operator/Function expressions
5. Conditional expressions
6. Quantified expressions
7. Data Type expressions

XQuery – Path Expressions

Contribution of XPath
XQuery 1.0 and XPath 2.0 Data Model
document("sample1.xml")//emp/salary
/employees/emp/name[../@gender='f']
//emp[1 TO 3]/name/first

XQuery – Element Constructors

Queries can generate new elements
Similar to XSLT abilities
<worker>
  {$name/last}
  {$position}
</worker>

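XQuery's constructor copies existing nodes into a new element. A minimal Python sketch of that behaviour with `xml.etree.ElementTree`, assuming `$name` and `$position` have been bound (for example by a FOR clause) to the first sample employee's `<name>` and `<position>`; the `worker` helper is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
</employees>"""


def worker(name, position):
    """Build <worker>{$name/last}{$position}</worker> from existing nodes."""
    result = ET.Element("worker")
    for node in (name.find("last"), position):
        clone = copy.deepcopy(node)  # copy the node; the source document stays intact
        clone.tail = None            # drop the source document's whitespace after the node
        result.append(clone)
    return result


root = ET.fromstring(SAMPLE)
emp = root.find("emp")
new_worker = worker(emp.find("name"), emp.find("position"))
print(ET.tostring(new_worker, encoding="unicode"))
# <worker><last>Bissell</last><position>IT Specialist</position></worker>
```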
XQuery – FLWR Expressions

For clause/Let clause/Where clause/Return clause
Similar to SQL
FOR $e IN document("sample1.xml")//emp
WHERE $e/salary > 38000
  AND $e/@gender = 'f'
RETURN $e/name

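The FLWR query maps onto an ordinary loop. A sketch in Python with hypothetical function and parameter names, assuming the two employees shown in the sample; since neither shown employee is female, the query as written returns nothing, so the usage also runs it with `gender="m"`.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>45000</salary>
  </emp>
</employees>"""


def flwr(root, min_salary=38000, gender="f"):
    result = []
    for e in root.iter("emp"):                                   # FOR $e IN ...//emp
        salary = e.findtext("salary")
        if (salary is not None and float(salary) > min_salary    # WHERE $e/salary > 38000
                and e.get("gender") == gender):                  #   AND $e/@gender = 'f'
            result.append(e.find("name"))                        # RETURN $e/name
    return result


root = ET.fromstring(SAMPLE)
print([n.findtext("last") for n in flwr(root)])              # []
print([n.findtext("last") for n in flwr(root, gender="m")])  # ['Pham']
```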
XQuery – Operator/Function Expressions

Pre-defined and user-defined operators and functions
Still under development: Union, Intersect, Except
FOR $e IN //employees/emp
WHERE not(empty($e//mi))
RETURN $e/name

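A sketch of the `not(empty($e//mi))` test in Python, assuming the two employees shown in the sample (the function name is hypothetical): an employee qualifies when at least one `<mi>` descendant exists.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
  </emp>
</employees>"""


def names_with_middle_initial(root):
    # not(empty($e//mi)): keep an emp when it has at least one <mi> descendant.
    # Compare with None explicitly; an Element's truth value reflects its children.
    return [e.find("name") for e in root.findall("emp") if e.find(".//mi") is not None]


root = ET.fromstring(SAMPLE)
print([n.findtext("first") for n in names_with_middle_initial(root)])  # ['Hung']
```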
XQuery – Conditional Expressions

If-then-else expressions are not yet limited to boolean conditions (ongoing discussion)
FOR $e IN /employees/emp
RETURN
  <worker>
    {$e/name}
    {IF ($e/position = "Manager") THEN <manager/> ELSE ()}
  </worker>

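A Python sketch of the conditional constructor, assuming the two employees shown in the sample (the helper name `workers` is hypothetical): every employee yields a `<worker>` holding a copy of its `<name>`, and a `<manager/>` child is added only when the position is "Manager".

```python
import copy
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def workers(root):
    result = []
    for e in root.findall("emp"):                  # FOR $e IN /employees/emp
        w = ET.Element("worker")
        name = copy.deepcopy(e.find("name"))       # {$e/name}
        name.tail = None
        w.append(name)
        if e.findtext("position") == "Manager":    # IF (...) THEN <manager/>
            ET.SubElement(w, "manager")
        # ELSE (): nothing is added
        result.append(w)
    return result


root = ET.fromstring(SAMPLE)
for w in workers(root):
    print(ET.tostring(w, encoding="unicode"))
```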
Quantified Expressions

Some/Every conditions
Some/Every evaluates to True or False
FOR $e IN //employees
WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
RETURN $e

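SOME and EVERY behave like Python's `any()` and `all()`, including on empty sequences (SOME is false, EVERY is true). A sketch assuming the two employees shown in the sample, with hypothetical helper names:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def some_position(employees, predicate):
    # SOME $p IN $e//emp/position SATISFIES predicate($p)
    return any(predicate(p.text) for p in employees.iterfind(".//emp/position"))


def every_position(employees, predicate):
    # EVERY $p IN $e//emp/position SATISFIES predicate($p)
    return all(predicate(p.text) for p in employees.iterfind(".//emp/position"))


root = ET.fromstring(SAMPLE)
# FOR $e IN //employees WHERE SOME ... = "Manager" RETURN $e
matches = [e for e in root.iter("employees") if some_position(e, lambda t: t == "Manager")]
print(matches)                                               # []
print(every_position(root, lambda t: "IT Specialist" in t))  # True
```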
Data Types

Data types based on those available from XML Schema
Data types can be literal (“Brian”), from constructor functions (date(“2001-10-11”)), or from casting (CAST AS xsd:integer(24))
User-defined data types are also allowable and parsable

XQuery

More choices than the XSLT/XPath combination
Work in progress
Current focus of W3C query language efforts
Influencing the future design of the core XML technologies (XPath)
Aims to be flexible enough for all future XML applications

Query Processing – Research

XQuery specification continues to undergo review and change
  – 6 of 7 specification documents released since June
  – All specifications released in 2001
Other avenues of research
  – Other query languages
  – Indexing strategies
  – Implementation

Query Processing – Other Query Languages

Many query languages exist
  – Quilt (basis for XQuery)
  – W3C early languages (XML-QL, XQL)
  – Adopted traditional languages (OQL, XSQL)
  – Research papers (XML-GL, YATL, Lorel)
Other query languages are often optimized for a particular subset of XML documents
Query language field *MAY* be standardizing to XQuery

Query Processing – Indexing Strategy

Query language less important; better indexing techniques lead to efficiency
XISS (XML Indexing and Storage System)
  – Published September 19, 2001
  – Builds sets of indexes on XML data elements and attributes on the initial parse of the XML document
  – Lookup becomes constant-time through the various built indexes
  – Demonstrated successes in test runs

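To make the indexing idea concrete, a greatly simplified Python sketch that builds an element-name index and an attribute-value index in one pass after parsing. This is not XISS's actual design, only an illustration of why lookups become hash-table probes instead of document scans; the function and index names are hypothetical, and the data is the two-employee excerpt of the sample.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>45000</salary>
  </emp>
</employees>"""


def build_indexes(root):
    """One pass over the parsed tree builds a name index and an attribute-value index."""
    element_index = defaultdict(list)    # element name -> elements, in document order
    attribute_index = defaultdict(list)  # (attribute name, value) -> owning elements
    for elem in root.iter():             # root.iter() visits every element once
        element_index[elem.tag].append(elem)
        for attr_name, value in elem.attrib.items():
            attribute_index[(attr_name, value)].append(elem)
    return element_index, attribute_index


element_index, attribute_index = build_indexes(ET.fromstring(SAMPLE))
# Each lookup is an (expected constant-time) hash probe, not a scan of the document.
print([s.text for s in element_index["salary"]])  # ['35000', '45000']
print(len(attribute_index[("gender", "m")]))      # 2
```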
Query Processing - Implementation

XML is currently in a state of flux
  – Standards are still being revised
  – Industry is cautious before embracing a new technology
  – Economic slowdown may prevent new research and development efforts
XML is still waiting for its “Killer App”, an application that forces immediate acceptance

XML Query Processing

XML is a functional database storage language
An efficient query language is needed to turn XML into a viable database
Query language solutions are being developed
  – Java/C++ hooks first developed – OK
  – XSLT/XPath implemented – GOOD
  – XQuery being designed – GREAT?
  – Future additions – ????