Transcript xml
Introduction to Semistructured
Data and XML
Chapter 27
Database Management Systems, R. Ramakrishnan
1
How the Web is Today
HTML documents
• often generated by applications
• consumed by humans only
• easy access: across platforms, across organizations
No application interoperability:
• HTML not understood by applications
• Database technology: client-server
Database Management Systems, R. Ramakrishnan
2
New Universal Data Exchange
Format: XML
A recommendation from the W3C
XML = data
XML generated by applications
XML consumed by applications
Easy access: across platforms, organizations
Database Management Systems, R. Ramakrishnan
3
Paradigm Shift on the Web
From documents (HTML) to data (XML)
From information retrieval to data
management
For databases, also a paradigm shift:
• from relational model to semistructured data
• from data processing to data/query translation
• from storage to transport
Database Management Systems, R. Ramakrishnan
4
Semistructure data
1.
2.
3.
4.
Information integration: important new application
that motivates what follows.
Semistructured data: a new data model designed to
cope with problems of information integration.
XML (Extensible Markup Language) : a new Web
standard that is essentially semistructured data.
XQUERY: an emerging standard query language
for XML data.
Database Management Systems, R. Ramakrishnan
5
Information Integration
Problem: related data exists in many places. They talk
about the same things, but differ in model, schema,
conventions (e.g., terminology).
Example: In the real world, every bar has its own
database.
Some may have relations like beer-price; others have
an Microsoft Word file from which the menu is
printed.
Some keep phones of manufacturers but not
addresses.
Some distinguish beers and ales; others do not.
Database Management Systems, R. Ramakrishnan
6
The Semistructured Data Model
Bib
Object Exchange
Model (OEM)
&o1
complex object
paper
paper
book
references
&o12
&o24
references
author
title
year
&o29
references
author
http
page
author
title publisher
title
author
author
author
&o43
&25
&96
1997
last
firstname
firstname
lastname
&243
“Serge”
“Abiteboul”
“Victor”
lastname
first
&206
“Vianu”
122
133
atomic object
Database Management Systems, R. Ramakrishnan
7
Characteristics of Semistructured
Data
Missing or additional attributes
Multiple attributes
Different types in different objects
Heterogeneous collections
Self-describing, irregular data, no a priori structure
Database Management Systems, R. Ramakrishnan
8
Comparison with Relational Data
row
nam e
phone
John
3634
Sue
6343
D ic k
6363
Database Management Systems, R. Ramakrishnan
row
row
name phone name phone name phone
“John” 3634 “Sue” 6343 “Dick”
6363
{ row: { name: “John”, phone: 3634 },
row: { name: “Sue”, phone: 6343 },
row: { name: “Dick”, phone: 6363 }
}
9
XML (Extensible Markup Language)
A W3C standard to complement HTML
Origins: Structured text SGML
• Large-scale electronic publishing
• Data exchange on the web
Motivation:
• HTML describes presentation
• XML describes content
Database Management Systems, R. Ramakrishnan
10
From HTML to XML
HTML describes the presentation
Database Management Systems, R. Ramakrishnan
11
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
Database Management Systems, R. Ramakrishnan
12
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
Database Management Systems, R. Ramakrishnan
13
Why are we DB’ers interested?
It’s data. That’s us.
Database issues:
• How are we going to model XML? (graphs).
• How are we going to query XML? (XQuery)
• How are we going to store XML (in a relational
database? object-oriented? native?)
• How are we going to process XML efficiently?
(many interesting research questions!)
Database Management Systems, R. Ramakrishnan
14
XML Terminology
Tags: book, title, author, …
• start tag: <book>, end tag: </book>
Elements: <book>…<book>,<author>…</author>
• elements can be nested
• empty element: <red></red> (Can be abbrv. <red/>)
XML document: Has a single root element
Well-formed XML document: Has matching tags
Valid XML document: conforms to a schema
Database Management Systems, R. Ramakrishnan
15
Well-Formed XML
1. Declaration = <? ... ?> .
• Normal declaration is
<? XML VERSION = "1.0" STANDALONE = "yes"
?>
• “Standalone” means that there is no DTD specified.
2. Root tag surrounds the entire balance of the document.
<FOO> is balanced by </FOO>, as in HTML.
3. Any balanced structure of tags OK.
• Option of tags that don’t require balance, like <P> in
HTML.
Database Management Systems, R. Ramakrishnan
16
XML: An Example
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<BOOKLIST>
<BOOK genre="Science" format="Hardcover">
<AUTHOR>
<FIRSTNAME>Richard</FIRSTNAME><LASTNAME>Feynman</LASTNAME>
</AUTHOR>
<TITLE>The Character of Physical Law</TITLE>
<PUBLISHED>1980</PUBLISHED>
</BOOK>
<BOOK genre="Fiction">
<AUTHOR>
<FIRSTNAME>R.K.</FIRSTNAME><LASTNAME>Narayan</LASTNAME>
</AUTHOR>
<TITLE>Waiting for the Mahatma</TITLE>
<PUBLISHED>1981</PUBLISHED>
</BOOK>
<BOOK genre="Fiction">
<AUTHOR>
<FIRSTNAME>R.K.</FIRSTNAME><LASTNAME>Narayan</LASTNAME>
</AUTHOR>
<TITLE>The English Teacher</TITLE>
<PUBLISHED>1980</PUBLISHED>
</BOOK>
</BOOKLIST>
Database Management Systems, R. Ramakrishnan
17
XML – Elements
<BOOK genre="Science" format="Hardcover">…</BOOK>
attribute
open tag
element name
attribute value
data
closing tag
Xml is case and space sensitive
Element opening and closing tag names must be identical
Opening tags: “<” + element name + “>”
Closing tags: “</” + element name + “>”
Empty Elements have no data and no closing tag:
• They begin with a “<“ and end with a “/>”
<BOOK/>
Database Management Systems, R. Ramakrishnan
18
XML – Attributes
<BOOK genre="Science" format="Hardcover">…</BOOK>
attribute
open tag
attribute value
element name
data
closing tag
Attributes provide additional information for element tags.
There can be zero or more attributes in every element; each one
has the form:
attribute_name=“attribute_value”
- There is no space between the name and the “=”
- Attribute values must be surrounded by “ or ‘ characters
Multiple attributes are separated by white space (one or more
spaces or tabs).
Database Management Systems, R. Ramakrishnan
19
Elements
The segment of an XML document between an opening
and a corresponding closing tag is called an element.
element
<person>
<name> Malcolm Atchison </name>
<tel> (215) 898 4321 </tel>
<tel> (215) 898 4321 </tel>
<email> [email protected] </email>
</person>
element, a sub-element
of
Database Management Systems, R. Ramakrishnan
not an element
20
XML – Data and Comments
<BOOK genre="Science" format="Hardcover">…</BOOK>
attribute
open tag
attribute value
element name
closing tag
data
Xml data is any information between an opening and closing
tag
Xml data must not contain the ‘<‘ or ‘>’ characters
Comments:
<!- comment ->
Database Management Systems, R. Ramakrishnan
21
XML text
XML has only one “basic” type -- text.
It is bounded by tags, e.g.
<title> The Big Sleep </title>
<year> 1935 </ year> --- 1935 is still text
XML text is called PCDATA (for parsed
character data). It uses a 16-bit encoding.
Database Management Systems, R. Ramakrishnan
22
XML – Nesting & Hierarchy
Xml tags can be nested in a tree hierarchy
Xml documents can have only one root tag
Between an opening and closing tag you can insert:
1. Data
2. More Elements
3. A combination of data and elements
<root>
<tag1>
Some Text
<tag2>More</tag2>
</tag1>
</root>
Database Management Systems, R. Ramakrishnan
23
Representing relational DBs:
Two ways
projects:
title
employees:
name
budget
ssn
Database Management Systems, R. Ramakrishnan
managedBy
age
24
Project and Employee relations in XML
Projects and employees are intermixed
<db>
<project>
<employee>
<name> Sandra </name>
<title> Pattern recognition </title>
<ssn> 2234 </ssn>
<budget> 10000 </budget>
<age> 35 </age>
<managedBy> Joe</managedBy>
</employee>
</project>
<project>
<employee>
<title> Auto guided vehicle </title>
<budget> 70000 </budget>
<name> Joe </name>
<managedBy> Sandra </managedBy>
<ssn> 344556 </ssn>
</project>
<age> 34 < /age>
:
</employee>
</db>
Database Management Systems, R. Ramakrishnan
25
Project and Employee relations in XML (cont’d)
Employees follows projects
<db>
<projects>
<project>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy>Joe </managedBy>
</project>
<project>
<title>Auto guided vehicles</title>
<budget> 70000 </budget>
<managedBy>Sandra</managedBy>
</project>
:
</projects>
Database Management Systems, R. Ramakrishnan
<employees>
<employee>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
</employee>
<employee>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age>35 </age>
</employee>
:
<employees>
</db>
26
More XML: Oids and References
<person id=“o555”> <name> Jane </name> </person>
<person id=“o456”> <name> Mary </name>
<children idref=“o123 o555”/>
</person>
<person id=“o123” mother=“o456”><name>John</name>
</person>
oids and references in XML are just syntax
Database Management Systems, R. Ramakrishnan
27
XML Data Model (Graph)
db
#0
book
book
publisher
b1
b2
pub
title
#1
pcdata
mkp
author
#2
pcdata
title
#3
pcdata
pub
author
#5
#4
pcdata
pcdata
Complete... Chamberlin Principles... Bernstein
Database Management Systems, R. Ramakrishnan
author
Newcomer
name
#6
pcdata
state
#7
pcdata
Morgan... CA
28
Document Type Descriptors
Sort of like a schema but not really.
<!ELEMENT Book (title, author*) >
<!ELEMENT title #PCDATA>
<!ELEMENT author (name, address,age?)>
<!ATTLIST Book id ID #REQUIRED>
<!ATTLIST Book pub IDREF #IMPLIED>
Inherited from SGML DTD standard
BNF grammar establishing constraints on element
structure and content
Definitions of entities
Database Management Systems, R. Ramakrishnan
29
DTD – An Example
<?xml version='1.0'?>
<!ELEMENT Basket (Cherry+, (Apple | Orange)*) >
<!ELEMENT Cherry EMPTY>
<!ATTLIST Cherry flavor CDATA #REQUIRED>
<!ELEMENT Apple EMPTY>
<!ATTLIST Apple color CDATA #REQUIRED>
<!ELEMENT Orange EMPTY>
<!ATTLIST Orange location ‘Florida’>
--------------------------------------------------------------------------------
<Basket>
<Cherry flavor=‘good’/>
<Apple color=‘red’/>
<Apple color=‘green’/>
</Basket>
Database Management Systems, R. Ramakrishnan
<Basket>
<Apple/>
<Cherry flavor=‘good’/>
<Orange/>
</Basket>
30
DTD - !ELEMENT
<!ELEMENT Basket (Cherry+, (Apple | Orange)*) >
Name
Children
!ELEMENT declares an element name, and
what children elements it should have
Content types:
•
•
•
•
•
Other elements
#PCDATA (parsed character data)
EMPTY (no content)
ANY (no checking inside this structure)
A regular expression
Database Management Systems, R. Ramakrishnan
31
DTD - !ELEMENT (Contd.)
A regular expression has the following
structure:
• exp1, exp2, exp3, …, expk: A list of regular
expressions
• exp*: An optional expression with zero or more
occurrences
• exp+: An optional expression with one or more
occurrences
• exp1 | exp2 | … | expk: A disjunction of
expressions
Database Management Systems, R. Ramakrishnan
32
DTD - !ATTLIST
<!ATTLIST Cherry flavor CDATA #REQUIRED>
Element Attribute
Type
Flag
<!ATTLIST Orange location CDATA #REQUIRED
color ‘orange’>
!ATTLIST defines a list of attributes for an
element
Attributes can be of different types, can be
required or not required, and they can have
default values.
Database Management Systems, R. Ramakrishnan
33
DTD – Well-Formed and Valid
<?xml version='1.0'?>
<!ELEMENT Basket (Cherry+)>
<!ELEMENT Cherry EMPTY>
<!ATTLIST Cherry flavor CDATA #REQUIRED>
--------------------------------------------------------------------------------
Not Well-Formed
Well-Formed but Invalid
<basket>
<Job>
<Cherry flavor=good>
<Location>Home</Location>
</Basket>
</Job>
Well-Formed and Valid
<Basket>
<Cherry flavor=‘good’/>
</Basket>
Database Management Systems, R. Ramakrishnan
34
Example: An Address Book
<person>
<name> MacNiel, John </name>
<greet> Dr. John MacNiel </greet>
<addr>1234 Huron Street </addr>
<addr> Rome, OH 98765 </addr>
<tel> (321) 786 2543 </tel>
<fax> (321) 786 2543 </fax>
<tel> (321) 786 2543 </tel>
<email> [email protected] </email>
</person>
Database Management Systems, R. Ramakrishnan
Exactly one name
At most one greeting
As many address lines
as needed (in order)
Mixed telephones
and faxes
As many
as needed
35
Specifying the structure
name
to specify a name element
greet?
to specify an optional
(0 or 1) greet elements
name,greet?
to specify a name followed by
an optional greet
Database Management Systems, R. Ramakrishnan
36
Specifying the structure (cont)
addr*
to specify 0 or more address lines
tel | fax
a tel or a fax element
(tel | fax)*
0 or more repeats of tel or fax
email*
0 or more email elements
Database Management Systems, R. Ramakrishnan
37
A DTD for the address book
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person
(name, greet?, address*, (fax | tel)*, email*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT greet (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT tel
(#PCDATA)>
<!ELEMENT fax
(#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
Database Management Systems, R. Ramakrishnan
38
DTD for the example relational DB
<!DOCTYPE db [
<!ELEMENT db
<!ELEMENT projects
<!ELEMENT employees
<!ELEMENT project
<!ELEMENT employee
...
]>
Database Management Systems, R. Ramakrishnan
(projects,employees)>
(project*)>
(employee*)>
(title, budget, managedBy)>
(name, ssn, age)>
39
Summary of XML regular
expressions
Each element name is a tag.
Its components are the tags that appear nested
within, in the order specified.
A
The tag A occurs
e1,e2
The expression e1 followed by e2
e*
0 or more occurrences of e
e?
Optional -- 0 or 1 occurrences
e+
1 or more occurrences
e1 | e2
either e1 or e2
(e)
grouping
Database Management Systems, R. Ramakrishnan
40
XML Namespaces
Namespaces are a simple and straightforward way to
distinguish names used in XML documents, no matter where
they come from
The only reason namespaces exist, is to give elements and
attributes programmer-friendly names that will be unique
across the whole Internet
41
Database Management Systems, R. Ramakrishnan
41
Example
<h:html xmlns:xdc="http://www.xml.com/books"
xmlns:h="http://www.w3.org/HTML/1998/html4">
<h:head><h:title>Book
Review</h:title></h:head>
<h:body>
<xdc:bookreview> <xdc:title>XML: A
Primer</xdc:title>
<h:table> <h:tr align="center">
<h:td>Author</h:td><h:td>Price</h:td>
<h:td>Pages</h:td><h:td>Date</h:td></h:tr>
<h:tr align="left">
<h:td><xdc:author>Simon St.
Laurent</xdc:author></h:td>
<h:td><xdc:price>31.98</xdc:price></h:td>
<h:td><xdc:pages>352</xdc:pages></h:td>
<h:td><xdc:date>1998/01</xdc:date></h:td>
</h:tr> </h:table> </xdc:bookreview>
42
</h:body>
</h:html>
Database Management Systems, R. Ramakrishnan
42
XML Namespaces
The prefixes are linked to the full names using the attributes
on the top element whose names begin xmlns:
The prefixes are just shorthand placeholders for the full
names
Those full names are URIs, i.e. Web addresses
43
Database Management Systems, R. Ramakrishnan
43
Extensibility in XML
Anyone can invent new tags and attach a meaning to
those tags
But if every user creates its own XML definition for
describing his data, it is not possible to achieve
interoperability
For example, one may prefer to use the tag name
“POR”, while another prefers using the tag name
“PurchaseOrderReq”
In other words, a tagged document is not very
useful without some kind of agreement on the tags
among inter-operating applications
44
Database Management Systems, R. Ramakrishnan
44
Many Efforts for Standardized
Tags…
HL7 for healthcare
RosettaNet for supply chain integration in
Information Technology and Electronic
Components domain
MPEG7 in multimedia applications
ebXML for eBusiness
Common Business Library (CBL) for
electronic catalogs, purchase orders, etc.
…
45
Database Management Systems, R. Ramakrishnan
45
XML Parsers
A parser takes an XML document and makes its
structure and content available to an application
through an API
There are two main Application Programming
Interfaces (APIs) for writing parsers:
• Document Object Model (DOM) and
• Simple API for XML (SAX)
Today, many parsers are both DOM and SAX
compliant
46
Database Management Systems, R. Ramakrishnan
46
XML DOM Parser
Application Code
Begin parsing
Initialize Parser
In memory
DOM: Perform
Processing
Parsing complete
XML
Parser
XML
Document
A parser validates and makes the data
contained in an XML document available
to the application
47
Database Management Systems, R. Ramakrishnan
47
XSLT Processor
XSLT Processor
XSL Style
Sheet 1
Parser
XSL Style
Sheet 2
Output from
Style
Sheet 1
Output from
Style
Sheet 2
XML
Document
• Converts an XML document to
another form
• An XSL style sheet is a set of
transformation instructions for
converting a source XML document to
a target document
48
Database Management Systems, R. Ramakrishnan
48
table.xsl
bar.xsl
art.xsl
49
Database Management Systems, R. Ramakrishnan
49
XML Querying
Path Expressions :
Bib.paper
Bib.book.publisher
Bib.paper.author.lastname
Given an OEM instance, the value of a path
expression p is a set of objects
Database Management Systems, R. Ramakrishnan
50
Path Expressions
Bib
&o1
Examples:
paper
paper
book
references
&o12
&o24
references
DB =
author
title
&o43
year
&o44
&o29
references
author
http
author
title publisher
title
author
author
author
&o45 &o46
&o52
page
&25
&96
1997
firstname
lastname
&o70
“Serge”
&o47 &o48 &o49 &o50 &o51
firstname
&o71
“Abiteboul”
Bib.paper={&o12,&o29}
Bib.book.publisher={&o51}
Bib.paper.author.lastname={&o71,&206}
Database Management Systems, R. Ramakrishnan
&243
“Victor”
last
lastname
first
&206
“Vianu”
122
133
51
XQuery
Emerging standard for querying XML documents. Basic form:
FOR <variables ranging over sets of elements>
WHERE <condition>
RETURN <set of elements>;
Sets of elements described by paths, consisting of:
1. URL, if necessary.
2. Element names forming a path in the semistructured data
graph, e.g., //BAR/NAME =
“start at any BAR node and go to a NAME child.”
3. Ending condition of the form
[<condition about subelements, @attributes, and values>]
Database Management Systems, R. Ramakrishnan
52
XQuery
Overview:
FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
FOR/LET Clauses
List of tuples
WHERE Clause
List of tuples
ORDERBY/RETURN Clause
Database Management Systems, R. Ramakrishnan
Instance of Xquery data model
53
XQuery
FOR $x in expr -- binds $x to each value in
the list expr
LET $x = expr -- binds $x to the entire list
expr
• Useful for common subexpressions and for
aggregations
Database Management Systems, R. Ramakrishnan
54
FOR v.s. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
LET $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Database Management Systems, R. Ramakrishnan
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
Returns:
<result> <book>...</book>
<book>...</book>
<book>...</book>
...
</result>
55
XQuery
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
Database Management Systems, R. Ramakrishnan
56
XQuery
For each author of a book by Morgan
Kaufmann, list all books s/he published:
FOR $a IN distinct(document("bib.xml")
/bib/book[publisher=“Morgan Kaufmann”]/author)
RETURN <result>
$a,
FOR $t IN /bib/book[author=$a]/title
RETURN $t
</result>
distinct = a function that eliminates duplicates
Database Management Systems, R. Ramakrishnan
57
XQuery
Result:
<result>
<author>Jones</author>
<title> abc </title>
<title> def </title>
</result>
<result>
<author> Smith </author>
<title> ghi </title>
</result>
Database Management Systems, R. Ramakrishnan
58
XQuery
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
count = a (aggregate) function that returns the number of elms
Database Management Systems, R. Ramakrishnan
59
XQuery
Find books whose price is larger than average:
LET $a=avg(document("bib.xml")/bib/book/price)
FOR $b in document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
Database Management Systems, R. Ramakrishnan
60
Examples for XQuery queries
FOR $x IN
doc(www.company.com/info.xml)
//employee [employeeSalary gt 70000]/employeeName
RETURN <res> $x/firstName, $x/lastName </res>
FOR $x IN
doc(www.company.com/info.xml)/company/employee
WHERE $x/employeeSalary gt 70000
RETURN <res> $x/employeeName/firstName,
$x/employeeName/lastName </res>
FOR $x IN
doc(www.company.com/info.xml)/company
/project [projectNumber = 5]/projectWorker,
$y IN
doc(www.company.com/info.xml)/company/employee
WHERE $x/hours gt 20.0 AND $y.ssn = $x.ssn
RETURN <res> $x/EmployeeName/firstName,
$y/employeeName/lastName, $x/hours </res>
Database Management Systems, R. Ramakrishnan
61
transcript.xml
<Transcripts>
<Transcript>
<Student StudId=“111111111” Name=“John Doe” />
<CrsTaken CrsCode=“CS308” Semester=“F1997” Grade=“B” />
<CrsTaken CrsCode=“MAT123” Semester=“F1997” Grade=“B” />
<CrsTaken CrsCode=“EE101” Semester=“F1997” Grade=“A” />
<CrsTaken CrsCode=“CS305” Semester=“F1995” Grade=“A” />
</Transcript>
<Transcript>
<Student StudId=“987654321” Name=“Bart Simpson” />
<CrsTaken CrsCode=“CS305” Semester=“F1995” Grade=“C” />
<CrsTaken CrsCode=“CS308” Semester=“F1994” Grade=“B” />
</Transcript>
… … cont’d … …
Database Management Systems, R. Ramakrishnan
62
transcript.xml (cont’d)
<Transcript>
<Student StudId=“123454321” Name=“Joe Blow” />
<CrsTaken CrsCode=“CS315” Semester=“S1997” Grade=“A” />
<CrsTaken CrsCode=“CS305” Semester=“S1996” Grade=“A” />
<CrsTaken CrsCode=“MAT123” Semester=“S1996” Grade=“C” />
</Transcript>
<Transcript>
<Student StudId=“023456789” Name=“Homer Simpson” />
<CrsTaken CrsCode=“EE101” Semester=“F1995” Grade=“B” />
<CrsTaken CrsCode=“CS305” Semester=“S1996” Grade=“A” />
</Transcript>
</Transcripts>
Database Management Systems, R. Ramakrishnan
63
XQuery
Example:
FOR $t IN
document(“http://xyz.edu/transcript.xml”)/Transcript
WHERE $t/CrsTaken/@CrsCode = “MAT123”
RETURN $t/Student
Result:
<Student StudId=“111111111” Name=“John Doe” />
<Student StudId=“123454321” Name=“Joe Blow” />
Database Management Systems, R. Ramakrishnan
64
XQuery (cont’d)
Previous query doesn’t produce a well-formed
XML document; the following does:
FLWR
<StudentList>
{
inside XML
FOR $t IN document(“transcript.xml”)/Transcript
WHERE $t/CrsTaken/@CrsCode = “MAT123”
RETURN $t/Student
}
</StudentList>
FOR binds $t to Transcript elements one by one, filters
using WHERE, then places Student-children as children of
StudentList using RETURN
Database Management Systems, R. Ramakrishnan
65
Document Restructuring with XQuery
Reconstruct lists of students taking each class using the
Transcript records:
FOR $c IN distinct(document(“transcript.xml”)/CrsTaken)
RETURN
<ClassRoster CrsCode = {$c/@CrsCode} Semester =
{$c/@Semester}>
{
FOR $t IN document(“transcript.xml”)/Transcript
WHERE $t/CrsTaken/@CrsCode = $c/CrsCode
AND $t/CrsTaken/@Semester = $c/@Semester
RETURN $t/Student
ORDERBY ($t/Student/@StudId)
FLWOR inside
}
RETURN
</ClassRoster>
ORDERBY ($c/@CrsCode)
Database Management Systems, R. Ramakrishnan
66
Document Restructuring (cont’d)
Output elements have the form:
<ClassRoster CrsCode=“CS305” Semester=“F1995” >
<Student StudId=“111111111” Name=“John Doe” />
<Student StudId=“987654321” Name=“Bart Simpson” />
</ClassRoster>
Problem: the above element will be output twice – once
when $c is bound to
<CrsTaken CrsCode=“CS305” Semester=“F1995” Grade=“A” />
and once when it is bound to
Bart Simpson’s
John Doe’s
<CrsTaken CrsCode=“CS305” Semester=“F1995” Grade=“C” />
Note: grades are different – distinct( ) won’t eliminate
transcript records that refer to same class!
Database Management Systems, R. Ramakrishnan
67
Document Restructuring (cont’d)
Solution: instead of
FOR $c IN distinct(document(“transcript.xml”)/CrsTaken)
use
FOR $c IN document(“classes.xml”)//Class
Document
on next
slide
where classes.xml lists course offerings (course
code/semester) explicitly (no need to extract them from
transcript records).
Then $c is bound to each class exactly once, so each class
roster will be output exactly once
Database Management Systems, R. Ramakrishnan
68
http://xyz.edu/classes.xml
<Classes>
<Class CrsCode=“CS308” Semester=“F1997” >
<CrsName>SE</CrsName> <Instructor>Adrian Jones</Instructor>
</Class>
<Class CrsCode=“EE101” Semester=“F1995” >
<CrsName>Circuits</CrsName> <Instructor>David Jones</Instructor>
</Class>
<Class CrsCode=“CS305” Semester=“F1995” >
<CrsName>Databases</CrsName> <Instructor>Mary Doe</Instructor>
</Class>
<Class CrsCode=“CS315” Semester=“S1997” >
<CrsName>TP</CrsName> <Instructor>John Smyth</Instructor>
</Class>
<Class CrsCode=“MAR123” Semester=“F1997” >
<CrsName>Algebra</CrsName> <Instructor>Ann White</Instructor>
</Class>
</Classes>
Database Management Systems, R. Ramakrishnan
69
XQuery Semantics
So far the discussion was informal
XQuery semantics defines what the expected
result of a query is
Defined analogously to the semantics of SQL
Database Management Systems, R. Ramakrishnan
70
XQuery Semantics (cont’d)
Step 1: Produce a list of bindings for variables
• The FOR clause binds each variable to an ordered
list of nodes specified by an XQuery expression.
The expression can be:
• An XQuery query
• A function that returns a list of nodes
• End result of a FOR clause:
• Ordered list of tuples of document nodes
• Each tuple is a binding for the variables in the FOR
clause
Database Management Systems, R. Ramakrishnan
71
XQuery Semantics (cont’d)
Example (bindings):
• Let FOR declare $A and $B
• Bind $A to document nodes {v,w}; $B to {x,y,z}
• Then FOR clause produces the following list of
bindings for $A and $B:
• $A/v, $B/x
• $A/v, $B/y
• $A/v, $B/z
• $A/w, $B/x
• $A/w, $B/y
• $A/w, $B/z
Database Management Systems, R. Ramakrishnan
72
XQuery Semantics (cont’d)
Step 2: filter the bindings via the WHERE clause
• Use each tuple-binding to substitute its components for
variables; retain those bindings that make WHERE true
• Example: WHERE $A/CrsTaken/@CrsCode =
$B/Class/@CrsCode
• Binding: $A/w, where w = <CrsTaken CrsCode=“CS308” …/>
$B/x, where x = <Class CrsCode=“CS308” … />
• Then w/CrsTaken/@CrsCode = x/Class/@CrsCode, so the
WHERE condition is satisfied & binding retained
Database Management Systems, R. Ramakrishnan
73
XQuery Semantics (cont’d)
Step 3: Construct result
• For each retained tuple of bindings, instantiate the
RETURN clause
• This creates a fragment of the output document
• Do this for each retained tuple of bindings in sequence
Database Management Systems, R. Ramakrishnan
74