XML: Extensible Markup Language

FST-UMAC
Gong Zhiguo
How the Web is Today
• HTML documents
• all intended for human consumption
• many generated automatically by applications
Easy to fetch any Web page, from any server, any platform
Gong Z.G.
Limits of the Web Today
• Applications cannot consume HTML
• HTML wrapper technology is brittle
– screen scraping
• OO technology (CORBA) requires a controlled environment
• Companies merge and form partnerships; they need interoperability fast
Paradigm Shift on the Web
• new Web standard XML:
– XML generated by applications
– XML consumed by applications
• data exchange
– across platforms: enterprise interoperability
– across enterprises
Web: from a collection of documents to data and documents
XML
• a W3C standard to complement HTML
• origins: structured text SGML
• motivation:
– HTML describes presentation
– XML describes content
• XML is a simplified subset of SGML; HTML 4.0 is an SGML application
• http://www.w3.org/TR/REC-xml (2/98)
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document: single root element
well-formed XML document: one whose tags match and nest properly
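A minimal sketch of a well-formedness check, assuming Python's standard-library parser `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample strings are illustrative, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML: matching, properly nested
    tags and exactly one root element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<book><title>Foundations</title><red/></book>"
bad_nesting = "<book><title>Foundations</book></title>"  # tags overlap
two_roots = "<book/><book/>"                             # no single root

print(is_well_formed(good))
print(is_well_formed(bad_nesting))
print(is_well_formed(two_roots))
```

The parser only checks well-formedness here; it does not check the document against any DTD.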
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
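As a rough illustration of the difference between attributes and child elements, the sketch below reads both from the book element using Python's standard `xml.etree.ElementTree`. The elided parts of the slide's example are simply left out of the sample document.

```python
import xml.etree.ElementTree as ET

doc = """
<book price="55" currency="USD">
  <title>Foundations of Databases</title>
  <author>Abiteboul</author>
  <year>1995</year>
</book>
"""

book = ET.fromstring(doc)

# Attributes: at most one value per name, always plain strings.
price = book.get("price")
currency = book.attrib["currency"]

# Child elements: may repeat and may themselves contain structure.
year = book.findtext("year")
authors = [a.text for a in book.findall("author")]

print(price, currency, year, authors)
```

One practical consequence: a repeated item such as author fits naturally as a child element, while a single scalar such as price can go either way.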
Query Languages: Motivation
• granularity of the HTML Web: one file
• granularity of Web data varies:
– single data item: “get John’s salary”
– entire database: “get all salaries”
– aggregates: “get average salary”
• need query language to define granularity
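The three granularities can be sketched with the limited path syntax of Python's `xml.etree.ElementTree`. The employees document and its salary figures are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the slides do not give a concrete document.
doc = """
<employees>
  <employee><name>John</name><salary>50000</salary></employee>
  <employee><name>Mary</name><salary>70000</salary></employee>
</employees>
"""
root = ET.fromstring(doc)

# Single data item: "get John's salary"
john_salary = int(root.find("employee[name='John']/salary").text)

# Entire database: "get all salaries"
all_salaries = [int(s.text) for s in root.iter("salary")]

# Aggregate: "get average salary"
average_salary = sum(all_salaries) / len(all_salaries)

print(john_salary, all_salaries, average_salary)
```

The aggregate has to be computed in host-language code here, which is one reason a dedicated query language is attractive.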
XML-QL: A Query Language for XML
• http://www.w3.org/TR/NOTE-xml-ql (8/98)
• features:
– regular path expressions
– patterns, templates
– Skolem Functions
• based on OEM data model
Pattern Matching in XML-QL
where <book language="french">
<publisher>
<name> Morgan Kaufmann </name>
</publisher>
<author> $a </author>
</book> in "www.a.b.c/bib.xml"
construct $a
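To make the semantics of the pattern concrete, here is a hand-written Python equivalent, not an XML-QL engine. It assumes a small in-memory stand-in for bib.xml; the author names A1, A2, A3 and the function `match_authors` are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of bib.xml.
bib = ET.fromstring("""
<bib>
  <book language="french">
    <publisher><name> Morgan Kaufmann </name></publisher>
    <author> A1 </author>
    <author> A2 </author>
  </book>
  <book language="english">
    <publisher><name> Morgan Kaufmann </name></publisher>
    <author> A3 </author>
  </book>
</bib>
""")


def match_authors(root):
    """Bind $a to every author of a French book published by Morgan Kaufmann."""
    results = []
    for book in root.iter("book"):
        if book.get("language") != "french":
            continue
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if "Morgan Kaufmann" not in names:
            continue
        # One binding of $a per author element inside the matched book.
        results.extend((a.text or "").strip() for a in book.findall("author"))
    return results


print(match_authors(bib))
```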
Simple Constructors in XML-QL
where <book language = $l>
<author> $a </>
</> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>
Note: </> abbreviates </book> or </result> or ...
<result> <author>Smith</author><lang>English</lang></result>
<result> <author>Smith</author><lang>Mandarin</lang></result>
<result> <author>Doe</author><lang>English</lang></result>
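The construct clause emits one result element per binding of ($a, $l). A sketch of that behavior in plain Python follows; the input document is an assumption chosen so that the output reproduces the three results listed above, and `construct_results` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed input; the slides show only the query and its output.
bib = ET.fromstring(
    "<bib>"
    '<book language="English"><author>Smith</author></book>'
    '<book language="Mandarin"><author>Smith</author></book>'
    '<book language="English"><author>Doe</author></book>'
    "</bib>"
)


def construct_results(root):
    """Build one <result> per (author, language) binding."""
    out = []
    for book in root.iter("book"):
        lang = book.get("language")
        if lang is None:
            continue  # the pattern requires a language attribute
        for author in book.findall("author"):
            result = ET.Element("result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "lang").text = lang
            out.append(ET.tostring(result, encoding="unicode"))
    return out


for line in construct_results(bib):
    print(line)
```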
Schemas in XML
• Document Type Definition (DTD)
• XML Schema
• RDF Schema
Document Type Definition: DTD
• part of the original XML specification
• an XML document may have a DTD
• terminology for XML:
– well-formed: if tags are correctly closed
– valid: if it has a DTD and conforms to it
• validation is useful in data exchange
DTDs as Grammars
<!DOCTYPE paper [
<!ELEMENT paper (section*)>
<!ELEMENT section ((title,section*) | text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
<paper> <section> <text> </text> </section>
<section> <title> </title> <section> … </section>
<section> … </section>
</section>
</paper>
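Since each content model is a regular expression over child element names, the grammar view can be sketched directly with Python's `re` module. This is a hand translation of the four declarations above, not a general DTD validator (the standard `xml.etree.ElementTree` parser does not validate against DTDs). The names `CONTENT_MODELS`, `is_valid`, and `validate` are illustrative, and character data is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Each child tag is encoded as "name," so (title,section*) becomes
# "title,(section,)*". Hand-translated from the DTD above.
CONTENT_MODELS = {
    "paper": r"(section,)*",
    "section": r"(title,(section,)*|text,)",
    "title": r"",  # #PCDATA: no child elements allowed
    "text": r"",
}


def is_valid(elem) -> bool:
    """Check the element's children against its content model, recursively."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # element type not declared
    children = "".join(child.tag + "," for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child) for child in elem)


def validate(xml_text: str) -> bool:
    """Well-formed, rooted at <paper>, and conforming to the content models."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == "paper" and is_valid(root)


ok = ("<paper><section><text/></section>"
      "<section><title/><section><text/></section></section></paper>")
wrong_order = "<paper><section><text/><title/></section></paper>"

print(validate(ok), validate(wrong_order))
```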
DTDs as Schemas
Not so well suited:
• impose unwanted constraints on order
<!ELEMENT person (name,phone)>
• references cannot be constrained
• can be too vague:
<!ELEMENT person ((name|phone|email)*)>
XML Storage
• text file (XML)
• store in ternary relation
• use DTD to derive schema
• mine data to derive schema
• build special purpose repository (Lore)
XML Storage: Text File
• advantages
– simple
– less space than one thinks
– reasonable clustering
• disadvantages
– no updates
– requires a special-purpose query processor
Store XML in Ternary Relation
Ref relation (one row per edge):
Source | Label  | Dest
&o1    | paper  | &o2
&o2    | title  | &o3
&o2    | author | &o4
&o2    | author | &o5
&o2    | year   | &o6

Val relation (one row per leaf):
Node | Value
&o3  | The Calculus
&o4  | …
&o5  | …
&o6  | 1986

[Figure: data graph in which &o1 has a paper edge to &o2, and &o2 has title, author, author, and year edges to the leaves &o3, &o4, &o5, and &o6 ("1986").]
[Florescu, Kossmann 1999]
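A minimal sketch of this edge-based storage, using Python's `sqlite3` and `xml.etree.ElementTree`. The table and column names follow the Ref and Val relations; the author values are placeholders because the slide elides them, the function name `shred` is hypothetical, and mixed content is ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count


def shred(xml_text, conn):
    """Store one Ref row per edge and one Val row per leaf element."""
    conn.execute("CREATE TABLE Ref (source TEXT, label TEXT, dest TEXT)")
    conn.execute("CREATE TABLE Val (node TEXT, value TEXT)")
    ids = count(1)

    def visit(elem, source):
        node = f"&o{next(ids)}"
        conn.execute("INSERT INTO Ref VALUES (?, ?, ?)", (source, elem.tag, node))
        children = list(elem)
        if children:
            for child in children:
                visit(child, node)
        else:
            conn.execute("INSERT INTO Val VALUES (?, ?)",
                         (node, (elem.text or "").strip()))

    root_id = f"&o{next(ids)}"  # the node that the root edge starts from
    visit(ET.fromstring(xml_text), root_id)


# Author names are placeholders, not taken from the slide.
doc = ("<paper><title>The Calculus</title><author>A</author>"
       "<author>B</author><year>1986</year></paper>")
conn = sqlite3.connect(":memory:")
shred(doc, conn)

# "Titles of papers from 1986" becomes a self-join over Ref plus two Val lookups.
titles = conn.execute("""
    SELECT tv.value
    FROM Ref t JOIN Val tv ON tv.node = t.dest
         JOIN Ref y ON y.source = t.source
         JOIN Val yv ON yv.node = y.dest
    WHERE t.label = 'title' AND y.label = 'year' AND yv.value = '1986'
""").fetchall()
print(titles)
```

Every step along a path costs one join on Ref, which is the main price of this very generic schema.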
Use DTD to derive Schema
• DTD:
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
• ODMG classes:
class Employee public type tuple
(name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)
• [Christophides et al. 1994, Shanmugasundaram et al. 1999]
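One possible way to mechanize the derivation is sketched below: a sequence becomes a tuple of fields and a starred element becomes a list. The set of atomic (string-valued) elements is an assumption, because the slide does not show their declarations, and `derive_classes` is a hypothetical helper rather than the cited authors' algorithm.

```python
import re

DTD = """
<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>
"""

# Assumption: these elements are declared as #PCDATA elsewhere in the DTD.
ATOMIC = {"name", "street", "city", "state", "zip"}


def derive_classes(dtd, atomic):
    """Map each element declaration to a class name and its typed fields."""
    decls = re.findall(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>", dtd)
    classes = {}
    for elem, model in decls:
        fields = {}
        for part in model.split(","):
            part = part.strip()
            name = part.rstrip("*")
            # Atomic elements become strings, everything else a class reference.
            base = "string" if name in atomic else name.capitalize()
            # A starred element (zero or more) becomes a list.
            fields[name] = f"List({base})" if part.endswith("*") else base
        classes[elem.capitalize()] = fields
    return classes


for cls, fields in derive_classes(DTD, ATOMIC).items():
    print(cls, fields)
```

This handles only comma-separated sequences with an optional star; choice groups and nested parentheses would need a real content-model parser.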
Mine Data to Derive Schema
[Figure: a data graph of paper elements with author (fn, ln), title, and year children, and two relations mined from it, Paper1 (fn1, ln1, fn2, ln2, title, year) and Paper2 (author, title), with cells marked X or -.]
[Deutsch et al. 1999]
XML and Databases (1)
• “Is XML a database?”
• In a strict sense, no.
• In a more liberal sense, yes, but …
– XML has:
• Storage (the XML document)
• A schema (DTD)
• Query languages (XQL, XML-QL, …)
• Programming interfaces (SAX, DOM)
– XML lacks:
• Efficient storage, indexes, security, transactions, multiuser access, triggers, queries across multiple documents
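The two programming interfaces differ mainly in who holds the document: DOM builds the whole tree in memory, while SAX streams events to a handler. A small sketch using Python's standard `xml.dom.minidom` and `xml.sax` modules follows; the sample document and the `TitleHandler` class are illustrative.

```python
import xml.sax
from xml.dom import minidom

DOC = ("<bibliography><book><title>Data on the Web</title>"
       "<year>1999</year></book></bibliography>")

# DOM: parse everything first, then navigate the in-memory tree.
dom = minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]


# SAX: react to start/end/character events while the parser streams the input.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        if self._in_title:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles, sax_titles)
```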
XML and Databases (2)
• Data versus Documents
– There are two ways to use XML in a database environment:
• Use XML as a data transport, i.e., to get data in and out of the database
– Data is stored in a relational or object-oriented database
– Middleware converts between the database and XML
• Use a “native XML” database, i.e., store data in document form
– Use a content management system
XML and Databases (3)
• Data-centric documents
– Fairly regular structure
– Fine-grained data
– Little or no mixed content
– Order of sibling elements often not significant
• Document-centric documents
– Irregular structure
– Larger-grained data
– Lots of mixed content
– Order of sibling elements is significant
XML and Databases (4)
• Data-centric storage and retrieval systems
– Use a database
• Add middleware to convert to/from XML
– Use an XML server (specialized product for e-commerce)
– Use an XML-enabled web server with a database backend
• Document-centric storage and retrieval systems
– Content management system
– Persistent DOM implementation
XML and Databases (5)
• Mapping document structure to database structure
– Template-driven
• No predefined mapping
• Embedded commands process (retrieve) data
• Currently only available from RDBMS to XML
<?xml version="1.0"?>
<FlightInfo>
<Intro>The following flights have
available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber,
Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your
needs</Conclude>
</FlightInfo>
XML and Databases (6)
– Template-driven - Example result:
<?xml version="1.0"?>
<FlightInfo>
<Intro>The following flights have
available seats:</Intro>
<Flights>
<Row>
<Airline>ACME</Airline>
<FltNumber>123</FltNumber>
<Depart>Dec 12, 2000, 13:43</Depart>
<Arrive>Dec 13, 2000, 01:21</Arrive>
</Row>
</Flights>
<Conclude>We hope one of these meets your
needs</Conclude>
</FlightInfo>
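A rough sketch of what template-driven middleware might do, written with Python's `sqlite3` and `xml.etree.ElementTree`: each SelectStmt element is executed and replaced by its rows. The function `expand_template`, the wrapper tag passed as `table_tag`, and the in-memory Flights table are assumptions for illustration, not a description of any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def expand_template(template, conn, table_tag="Flights"):
    """Replace each top-level <SelectStmt> with a <table_tag> element of <Row>s."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)  # only safe for trusted templates
        columns = [d[0] for d in cursor.description]
        result = ET.Element(table_tag)
        for row in cursor:
            row_elem = ET.SubElement(result, "Row")
            for col, value in zip(columns, row):
                ET.SubElement(row_elem, col).text = str(value)
        root.remove(child)
        root.insert(index, result)  # keep the statement's position in the document
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 2000, 13:43', 'Dec 13, 2000, 01:21')")

expanded = expand_template(TEMPLATE, conn)
print(ET.tostring(expanded, encoding="unicode"))
```

Because the template carries the query, no mapping between document structure and database schema has to be defined in advance.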
XML and Databases (7)
• Mapping document structure to database structure
– Model-driven
• A data model is imposed on the structure of the XML document
• This model is mapped to the structures in the database
• There are two common models:
– Model the XML document as a single table or a set of tables
– Model the XML document as a tree of data-specific objects (good for OODBMS mapping)
XML and Databases (8)
– Single table or set of tables:
<?xml version="1.0"?>
<database>
<table>
<row>
<column1>...</column1>
<column2>...</column2>
...
</row>
</table>
</database>
– Tree organization:
           Orders
             |
         SalesOrder
        /    |    \
 Customer  Item  Item
             |     |
           Part  Part
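For the table model, the mapping is mechanical: each row element becomes a tuple and each child of a row becomes a column. A sketch with Python's `sqlite3` follows; the sample values, the table name `t`, and the helper `load_table` are illustrative, and the code assumes at least one row, identical columns in every row, and trusted element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample values are made up; the slide shows only the element skeleton.
DOC = """<database>
  <table>
    <row><column1>a</column1><column2>1</column2></row>
    <row><column1>b</column1><column2>2</column2></row>
  </table>
</database>"""


def load_table(xml_text, conn, table_name):
    """Create a table from the first <table> element and insert its rows."""
    table = ET.fromstring(xml_text).find("table")
    rows = [{col.tag: col.text for col in row} for row in table.findall("row")]
    columns = list(rows[0])  # column names come from the first row
    conn.execute(f"CREATE TABLE {table_name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table_name} VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_table(DOC, conn, "t")
print(loaded, conn.execute("SELECT column1, column2 FROM t ORDER BY column1").fetchall())
```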
XML and Databases (9)
• Generating DTDs from a database schema and vice versa
– The DTD for an application often changes rarely, so it does not need to be generated automatically.
– Some simple conversions are possible
• Example: DTD from relational schema:
– For each table, create an ELEMENT.
– For each column in a table, create an attribute or a PCDATA-only child ELEMENT.
– For each primary key/foreign key relationship in which a column of the table contributes the primary key, create a child ELEMENT.
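The three rules can be sketched as a small generator. The schema below (Orders and Items, with Items referencing Orders) is hypothetical, columns are mapped to PCDATA-only child elements rather than attributes, and the `*` cardinality on referencing tables is an added assumption. Column names are assumed to be unique and distinct from table names, because element names in a DTD are global.

```python
def dtd_from_schema(tables, foreign_keys):
    """tables: {table: [columns]}
    foreign_keys: [(referencing_table, referenced_table)]"""
    lines = []
    declared = set()
    for table, columns in tables.items():
        # Rule 2: every column becomes a child element of the table's element.
        children = list(columns)
        # Rule 3: tables whose foreign key points at this table's primary key
        # become child elements (zero or more is an assumption).
        children += [child + "*" for child, parent in foreign_keys if parent == table]
        # Rule 1: one ELEMENT per table.
        lines.append(f"<!ELEMENT {table} ({', '.join(children)})>")
        for col in columns:
            if col not in declared:
                declared.add(col)
                lines.append(f"<!ELEMENT {col} (#PCDATA)>")
    return "\n".join(lines)


# Hypothetical schema, for illustration only.
tables = {
    "Orders": ["OrderNumber", "Date"],
    "Items": ["ItemNumber", "Quantity"],
}
foreign_keys = [("Items", "Orders")]

dtd = dtd_from_schema(tables, foreign_keys)
print(dtd)
```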
XML and Databases (10)
• Document-centric storage and retrieval systems
– Content management system
• Allows the storage of discrete content fragments, such as examples, procedures, chapters, as well as metadata such as author names, revision dates, etc.
• Many content management systems are built on top of relational or object-oriented database systems.
• Examples:
– BladeRunner (Interleaf), SigmaLink (STEP), Parlance Content Manager (XyEnterprise), Target 2000 (Progressive Information Technology)
– Persistent DOM implementation
Further Readings
www.w3.org/XML
www-db.stanford.edu/~widom
www-rocq.inria.fr/~abiteboul
db.cis.upenn.edu
www.research.att.com/~suciu
Abiteboul, Buneman, Suciu
Data on the Web: From Relational to Semistructured to XML
Morgan Kaufmann, 1999 (appears in October)