XML
Salman Azhar
Semi-structured Data
XML (Extensible Markup Language)
Well-formed and Valid XML
Document Type Definitions
IDs and IDREFs
These slides use some figures, definitions, and explanations from Elmasri and Navathe’s Fundamentals of Database Systems and Garcia-Molina, Ullman, and Widom’s Database Systems.
2/6/05
Salman Azhar: Database Systems
Framework
1. Information Integration: making databases from various places work as one.
2. Semi-structured Data: a new data model designed to cope with problems of information integration.
3. XML: a standard language for describing semi-structured data schemas and representing data.
1. Information Integration
Generally, the databases in an enterprise have:
Several underlying database management
systems
Oracle,
Informix,
MS SQL Server,
Sybase (SQL Server),
DB2,
MS Access, etc.
Several underlying database schemas
Information in an employee table can contain, for example:
Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
Employee Name, SSN, DOB, title, degree, createTime, createBy
Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy
2. Semi-structured Data
A new data model designed to
cope with problems of information
integration
Accommodates different DBMSs
Oracle,
Informix,
MS SQL Server,
Sybase (SQL Server),
DB2,
MS Access, etc.
Integrates different schemas
Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
Employee Name, SSN, DOB, title, degree, createTime, createBy
Employee Name, SSN, DOB, title, salary, createTime, createBy, modifiedTime, modifiedBy
3. XML
A standard language for describing
semi-structured data schemas and
representing data.
The Information-Integration Problem
Major bottleneck in enterprise application integration
For example…
Hewlett Packard split into HP and Agilent
Need to separate data into different destinations
HP bought Compaq
Need to integrate data from different sources
The Information-Integration Problem
Related data exists in many places and could, in principle, work together.
But different databases differ in:
1. Model: relational or object-oriented?
2. Schema: normalized or denormalized?
3. Terminology: are consultants employees? Retirees? Subcontractors?
4. Conventions: meters versus feet?
Example
Consider the merger of two stores in a mall
There may be some overlap in the products sold
But the databases are different
Example
Each company has a database
One may use a relational DBMS; the other keeps the data in an MS-Word document
One stores the phones of distributors; the other does not
One distinguishes products by department; the other doesn’t
One counts inventory by number of items; the other by cases
Two Approaches to Integration
1. Warehousing
Makes a copy of the data
More developed of the two
2. Mediation
Creates a view of the data
Newer and less developed
Warehouse Diagram
[Figure: the user query goes to the Warehouse, which returns the result; the warehouse is loaded from Source 1 and Source 2, each through its own Wrapper]
A Mediator
[Figure: the user query goes to the Mediator, which sends queries through a Wrapper to each of Source 1 and Source 2 and combines their results into the result returned to the user]
Warehousing
Make copies of the data sources at a central site and transform them to a common schema
Reconstruct the data daily or weekly
Do not try to keep it more up-to-date than that
Pro:
very well-developed
several commercial tools are available
Con:
data can be stale, since updates are expensive
24-hour availability is threatened by large data updates
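Below is a minimal Python sketch of warehousing. It assumes the two-store example from earlier, where one store counts inventory by items and the other by cases. The source layouts, the wrapper functions, and the factor of 24 items per case are hypothetical. Each wrapper translates its source into a common schema. The warehouse is a copy rebuilt in batch, so queries against it can return stale answers between rebuilds.

```python
from dataclasses import dataclass

ITEMS_PER_CASE = 24  # hypothetical conversion factor used by store B


@dataclass(frozen=True)
class InventoryRow:
    """Common warehouse schema: one row per (store, product)."""
    store: str
    product: str
    items: int


# Hypothetical source databases with different schemas and conventions.
store_a = [{"prod_name": "Pepsi", "qty_items": 120}]  # counts single items
store_b = [{"product": "Pepsi", "cases": 3}]          # counts cases


def wrap_store_a(rows):
    """Wrapper: translate store A rows into the common schema."""
    return [InventoryRow("A", r["prod_name"], r["qty_items"]) for r in rows]


def wrap_store_b(rows):
    """Wrapper: translate store B rows, converting cases to items."""
    return [InventoryRow("B", r["product"], r["cases"] * ITEMS_PER_CASE)
            for r in rows]


def rebuild_warehouse():
    """Batch load (e.g., nightly): copy every source into the common schema."""
    return wrap_store_a(store_a) + wrap_store_b(store_b)


def total_items(warehouse, product):
    """User queries read only the copy, never the live sources."""
    return sum(r.items for r in warehouse if r.product == product)


warehouse = rebuild_warehouse()
print(total_items(warehouse, "Pepsi"))  # 192 (120 items + 3 cases * 24)

store_a[0]["qty_items"] = 0             # a source changes...
print(total_items(warehouse, "Pepsi"))  # ...but the copy is stale: still 192

warehouse = rebuild_warehouse()
print(total_items(warehouse, "Pepsi"))  # 72 after the next rebuild
```

The cost sits in rebuild_warehouse: with large sources, this full copy is the expensive update the Con list refers to.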
Mediation
Create a view of all sources, as if they were integrated
Answer a view query by translating it into the terminology of the sources and querying them
Pro:
Current data
Con:
Can be slow, as it requires real-time merging of different data sources
Few tools are available
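For contrast, here is a minimal mediator sketch under the same hypothetical two-store setup. The names, layouts, and conversion factor are illustrative, not from the slides. Nothing is copied. When a query arrives, each wrapper translates it into its own source's terms and units, and the mediator combines the answers. Results are therefore always current, but every query must contact every source.

```python
ITEMS_PER_CASE = 24  # hypothetical conversion factor used by store B

# Hypothetical live sources; the mediator never copies them.
store_a = {"Pepsi": {"qty_items": 120}}       # keyed by product, counts items
store_b = [{"product": "Pepsi", "cases": 3}]  # list of rows, counts cases


class StoreAWrapper:
    def items_of(self, product):
        # Translate "items of product" into store A's terms: a keyed lookup.
        row = store_a.get(product)
        return row["qty_items"] if row else 0


class StoreBWrapper:
    def items_of(self, product):
        # Store B only knows cases, so query cases and convert to items.
        cases = sum(r["cases"] for r in store_b if r["product"] == product)
        return cases * ITEMS_PER_CASE


class Mediator:
    """Answers queries on the integrated view by querying each source live."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def total_items(self, product):
        return sum(w.items_of(product) for w in self.wrappers)


mediator = Mediator([StoreAWrapper(), StoreBWrapper()])
print(mediator.total_items("Pepsi"))  # 192
store_a["Pepsi"]["qty_items"] = 0
print(mediator.total_items("Pepsi"))  # 72: the change is visible at once
```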
Semi-structured: Motivation
The most effective approach to information integration:
the semi-structured data model
or semi-structured objects
Semi-structured: Motivation
Main limitation of object-oriented models:
Object models are strongly typed
Objects of a class have one structure only
The semi-structured approach solves this problem
Semi-structured Data
Purpose:
Represent data from independent sources more flexibly than either relational or object-oriented models
Semi-structured Data
Each object has a class of its own, and its properties are defined by whatever labels are attached to that object
Properties mean:
attributes,
relationships,
methods, etc.
Semi-structured Data
Think of objects,
but the type of each object is the object’s own business,
not that of its “class”
Labels indicate the meaning of substructures
Semi-structured Graphs
It is easy to think of semi-structured data as graphs:
Nodes = objects
Labels on arcs =
attributes, leading to a leaf node
relationships, leading to another node
Semi-structured Graphs
Atomic values at leaf nodes
(nodes with no arcs out)
Flexibility: no restriction on…
the labels out of a node
the number of successors with a given label
Example: Data Graph
The root object represents the entire DB. Data graphs often look like trees, but are not.
[Figure: a data graph whose root has arcs labeled rest and soda. The restaurant object for KFC (arc in labeled rest) has arcs labeled name, to KFC, and addr, to Main St. The soda object for Pepsi (arc in labeled soda) has arcs labeled name, to Pepsi, and manf, to PepsiCo. A second soda object is Sobe. Other labels in the figure: sellsAt, prize, year (2003), and award (BestSeller), with the callout “Notice a new kind of data.”]
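One way to represent such a data graph in Python is sketched below, under assumptions of our own. A node is an object holding a list of (label, target) arcs, and leaf nodes hold atomic values. Because arcs are a plain list, a node may carry any labels and any number of arcs with the same label. The sellsAt arc from Pepsi to KFC is a hypothetical reading of the figure. It gives the KFC node a second incoming arc, which is why the graph is not a tree.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(eq=False)  # compare nodes by identity, like objects
class Node:
    value: Any = None  # atomic value, used only at leaf nodes
    arcs: List[Tuple[str, "Node"]] = field(default_factory=list)

    def add(self, label: str, target: "Node") -> "Node":
        """Add a labeled arc from this node to target and return target."""
        self.arcs.append((label, target))
        return target

    def is_leaf(self) -> bool:
        return not self.arcs  # a leaf has no arcs out


def follow(node: Node, *path: str) -> List[Node]:
    """All nodes reachable by following the given sequence of arc labels."""
    frontier = [node]
    for label in path:
        frontier = [t for n in frontier for (lab, t) in n.arcs if lab == label]
    return frontier


root = Node()
kfc = root.add("rest", Node())
kfc.add("name", Node("KFC"))
kfc.add("addr", Node("Main St"))
for soda_name in ("Pepsi", "Sobe"):  # several arcs may share the label "soda"
    soda = root.add("soda", Node())
    soda.add("name", Node(soda_name))

pepsi = follow(root, "soda")[0]
pepsi.add("sellsAt", kfc)  # hypothetical arc: KFC now has two incoming arcs

print([n.value for n in follow(root, "soda", "name")])             # ['Pepsi', 'Sobe']
print([n.value for n in follow(root, "soda", "sellsAt", "name")])  # ['KFC']
```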
Stage is Now Set for XML
A technology can be applied to different situations:
the foundations remain the same
the applications change
Extensible Markup Language (XML)
XML uses tags for semantics (e.g., “this is an address”)
HTML uses tags for formatting (e.g., “italic”)
Key idea:
create tag sets for a domain (e.g., genomics)
translate all data into properly tagged XML documents
Well-Formed and Valid XML
Well-Formed XML
allows you to invent your own tags
similar to labels in a semi-structured data graph
Valid XML
involves a DTD (Document Type Definition)
DTD gives
a grammar for the use of labels
limits the set of labels out of a node
the order and number of times a label occurs
Well-Formed XML
All XML documents have
a header
a body
The header defines
the version
and specifies that the document is in well-formed XML
The body can include
a root tag
several properly matching tags
Well-Formed XML: Header
Start the document with a declaration surrounded by <?xml … ?>.
The normal declaration for well-formed XML is:
<?xml version="1.0" standalone="yes"?>
version indicates the XML version number
standalone="yes" means the document does not depend on an external DTD; in these slides it signals that there is no DTD at all
no DTD means the document only needs to be well-formed
Well-Formed XML: Body
The body of the document is a root tag surrounding nested tags.
The body can include:
several properly matching tags
a special tag called the root tag
(as in the HTML structure)
can have a special meaning, such as the document type
or can be generic
Tags
Tags, as in HTML,
are normally matched pairs, as <BLAH> … </BLAH>
may be nested arbitrarily
Unlike HTML’s <P>, an XML tag cannot be left unmatched; an element with no content may instead be written as a single empty-element tag, such as <BLAH/>
however, we will not use these in examples
Example: Well-Formed XML
<?xml version="1.0" standalone="yes"?>
<RESTS>
   <REST>
      <NAME>Taco Bell</NAME>
      <SODA><NAME>Pepsi</NAME>
         <PRICE>1.00</PRICE></SODA>
      <SODA><NAME>Sobe</NAME>
         <PRICE>2.00</PRICE></SODA>
   </REST>
   <REST> …
   </REST>
   …
</RESTS>
The root tag RESTS surrounds the entire document
Each REST tag is one of several nested REST tags, each representing information about a single REST
The <NAME> tag specifies the REST name
The <SODA> tags give the name and price of each soda
Literal data items are contained at the atomic level, nested in <NAME> and <PRICE> tags
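Any XML parser can check well-formedness. Below is a minimal sketch using Python's standard xml.etree.ElementTree module, whose parser rejects documents that are not well-formed. GOOD is the example above, trimmed to a single REST. BAD closes two tags in the wrong order. The helper name is_well_formed is our own. ElementTree does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

GOOD = """<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>"""

BAD = "<RESTS><REST><NAME>KFC</REST></NAME></RESTS>"  # tags not properly nested


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML (well-formedness only, no DTD check)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False

root = ET.fromstring(GOOD)
print(root.tag, [s.findtext("NAME") for s in root.iter("SODA")])  # RESTS ['Pepsi', 'Sobe']
```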
XML and Semi-structured Data
Consider this…
Are well-formed XML documents with nested tags exactly the same idea as trees of semi-structured data?
Tags are the labels on edges
Nodes represent the data between matching tags
The parent-child relationship is immediate nesting in XML
XML and Semi-structured Data
The semi-structured approach allows for non-tree structures
We shall see that XML also enables non-tree structures
mimicking the semi-structured data model
Group Exercise
Convert the following into a semi-structured representation:
<?xml version="1.0" standalone="yes"?>
<RESTS>
   <REST>
      <NAME>Taco Bell</NAME>
      <SODA><NAME>Pepsi</NAME>
         <PRICE>1.00</PRICE></SODA>
      <SODA><NAME>Sobe</NAME>
         <PRICE>2.00</PRICE></SODA>
   </REST>
   <REST> …
   </REST>
   …
</RESTS>
Note: Do not turn over to the next page before attempting this exercise yourself!
Solution:
The semi-structured representation of the document in the exercise:
RESTS
   REST
      NAME
         Taco Bell
      SODA
         NAME
            Pepsi
         PRICE
            1.00
      SODA
         NAME
            Sobe
         PRICE
            2.00
   REST ...
   REST ...
Note: Data is stored in leaf nodes and structure (tags) in internal nodes
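The conversion can also be sketched in Python with the standard xml.etree.ElementTree module. This sketch makes two assumptions: an element with children becomes an internal node, and an element without children becomes a leaf holding its text. The (label, content) tuples and the names to_tree and show are our own illustrative choices.

```python
import xml.etree.ElementTree as ET

DOC = """<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>"""


def to_tree(elem):
    """Return (tag, text) for a leaf element or (tag, [subtrees]) otherwise."""
    children = list(elem)
    if not children:
        return (elem.tag, (elem.text or "").strip())   # data lives in the leaves
    return (elem.tag, [to_tree(c) for c in children])  # tags give the structure


def show(tree, depth=0):
    """Print the tree with one level of indentation per nesting level."""
    label, content = tree
    if isinstance(content, list):
        print("  " * depth + label)
        for sub in content:
            show(sub, depth + 1)
    else:
        print("  " * depth + f"{label}: {content}")


show(to_tree(ET.fromstring(DOC)))
# RESTS
#   REST
#     NAME: Taco Bell
#     SODA
#       NAME: Pepsi
#       PRICE: 1.00
#     SODA
#       NAME: Sobe
#       PRICE: 2.00
```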
Valid XML
Switching gears: from well-formed to valid XML
Valid XML is the most interesting use of XML
It is specified by a DTD, which is essentially a context-free grammar describing XML tags and their nesting
Each domain of interest creates one DTD that describes all the documents this group will share
For example, the electronic components industry, the travel industry, etc., will each have their own DTDs
DTD Structure
<!DOCTYPE <root tag> [
   <!ELEMENT <name> ( <components> )>
   <more elements>
]>
Note: !DOCTYPE is a keyword, with <root tag> being the name of the DOCTYPE
Between [ … ] is a list of ELEMENT definitions
Each !ELEMENT has a <name> with the allowed list of <components>, usually in the order listed
DTD Elements
An element definition consists of
its name (tag)
and a parenthesized description of any nested tags
including the order of subtags
and their multiplicity (0, 1, or many times)
Leaves (text elements) have #PCDATA in place of nested tags
Example: DTD
<!DOCTYPE RESTS [
   <!ELEMENT RESTS (REST*)>
   <!ELEMENT REST (NAME, SODA+)>
   <!ELEMENT NAME (#PCDATA)>
   GROUP EXERCISE: COMPLETE THE DTD
]>
RESTS can have * (0 or more) REST
REST has NAME and then + (1 or more) SODA… Order matters!
SODA has NAME followed by PRICE
SODA’s NAME and PRICE are data (#PCDATA): no more tags, just text
Note: Do not turn over to the next page before attempting this exercise yourself!
Example: DTD
<!DOCTYPE RESTS [
   <!ELEMENT RESTS (REST*)>
   <!ELEMENT REST (NAME, SODA+)>
   <!ELEMENT NAME (#PCDATA)>
   <!ELEMENT SODA (NAME, PRICE)>
   <!ELEMENT PRICE (#PCDATA)>
]>
RESTS can have * (0 or more) REST
REST has NAME and then + (1 or more) SODA… Order matters!
SODA has NAME followed by PRICE
NAME and PRICE are data (#PCDATA): no more tags, just text
NAME is declared only once, although it appears inside both REST and SODA; a DTD may not declare the same element twice
Element Description Rules
Subtags must appear in the order shown
A tag may be followed by a symbol to indicate its multiplicity, just as in UNIX regular expressions:
* = zero or more
+ = one or more
? = zero or one
Alternative sequences of tags can be connected by the symbol |
Example: Element Description
A name is
either an optional title (e.g., “Dr.”), a first name, and a last name, in that order,
or it is an IP address
<!ELEMENT NAME (
   (TITLE?, FIRST, LAST) | IPADDR
)>
The | symbol separates the alternatives
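Because the multiplicity symbols behave like regular-expression operators, an element-only content model can be checked by translating it into a regular expression over the sequence of child tag names. Below is a minimal Python sketch. It assumes models built only from element names, commas, |, parentheses, and ?, *, +, with no #PCDATA or mixed content. content_model_to_regex and children_match are hypothetical helper names.

```python
import re

_TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate e.g. "(NAME, SODA+)" into a regex over child tag names."""
    parts = []
    for tok in _TOKEN.findall(model):
        if tok == ",":
            continue                      # a sequence is plain concatenation
        elif tok == "(":
            parts.append("(?:")           # non-capturing group
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)             # same meaning as in the DTD
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child tag name
    return re.compile("".join(parts))


def children_match(model: str, child_tags: list) -> bool:
    """True if the sequence of child tags conforms to the content model."""
    seq = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(seq) is not None


print(children_match("(NAME, SODA+)", ["NAME", "SODA", "SODA"]))  # True
print(children_match("(NAME, SODA+)", ["NAME"]))                  # False: no SODA
print(children_match("(NAME, SODA+)", ["SODA", "NAME"]))          # False: wrong order
name_model = "((TITLE?, FIRST, LAST) | IPADDR)"
print(children_match(name_model, ["FIRST", "LAST"]))              # True
print(children_match(name_model, ["IPADDR"]))                     # True
```

A real validating parser does more than this, for example checking text content and attributes. The sketch only shows why "order and multiplicity of subtags" is a regular-expression question.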
Use of DTDs
To specify that a document follows a particular DTD:
1. Set standalone="no" in the XML declaration, and
a) either include the DTD as a preamble of the XML document,
b) or follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD is stored
Example (a)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE RESTS [
   <!ELEMENT RESTS (REST*)>
   <!ELEMENT REST (NAME, SODA+)>
   <!ELEMENT NAME (#PCDATA)>
   <!ELEMENT SODA (NAME, PRICE)>
   <!ELEMENT PRICE (#PCDATA)>
]>
<RESTS>
   <REST>
      <NAME>Taco Bell</NAME>
      <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></SODA>
      <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA>
   </REST>
   <REST> …
   </REST>
   …
</RESTS>
DTD: everything from <!DOCTYPE to ]>
Document: the same as earlier, but this time it conforms to the DTD above
Example (b)
Assume the RESTS DTD is in the file rest.dtd
<?xml version="1.0" standalone="no"?>
<!DOCTYPE RESTS SYSTEM "rest.dtd">
<RESTS>
   <REST>
      <NAME>Taco Bell</NAME>
      <SODA><NAME>Pepsi</NAME>
         <PRICE>1.00</PRICE></SODA>
      <SODA><NAME>Sobe</NAME>
         <PRICE>2.00</PRICE></SODA>
   </REST>
   <REST> …
   </REST>
   …
</RESTS>
The DOCTYPE line gets the DTD from the file rest.dtd; the name after DOCTYPE must match the root tag, RESTS
Document: the same as earlier, but this time it conforms to the DTD in rest.dtd
Attributes
Attributes are another important component of DTDs and XML documents
Opening tags in XML can have attributes, like <A HREF = “…”> in HTML
In a DTD, <!ATTLIST <elementname> … > gives a list of attributes and their data types for this element
Example: Attributes
RESTs can have an attribute kind, which is either qsr, family, or other
The element definition is unchanged
However, we add an ATTLIST:
<!ELEMENT REST (NAME, SODA+)>
<!ATTLIST REST kind (qsr | family | other) #IMPLIED>
#IMPLIED marks the attribute as optional
Example: Attribute Use
In a document that allows REST tags, we might see:
<REST kind="qsr">
   <NAME>KFC</NAME>
   <SODA><NAME>Pepsi</NAME>
      <PRICE>1.00</PRICE></SODA>
   ...
</REST>
New info: kind = “qsr”
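In Python's standard xml.etree.ElementTree, you read an attribute from a parsed element with a dictionary-style get. ElementTree does not read DTDs. So in this minimal sketch, the allowed values of kind are copied from the ATTLIST by hand (an assumption), and a missing kind is accepted because the attribute is optional. The document contents, including the "Corner Diner" REST, are illustrative.

```python
import xml.etree.ElementTree as ET

# Copied by hand from the ATTLIST for REST; ElementTree does not read DTDs.
ALLOWED_KINDS = {"qsr", "family", "other"}

DOC = """<RESTS>
  <REST kind="qsr"><NAME>KFC</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA></REST>
  <REST><NAME>Taco Bell</NAME>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA></REST>
  <REST kind="diner"><NAME>Corner Diner</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA></REST>
</RESTS>"""


def bad_kinds(root):
    """NAMEs of RESTs whose kind attribute is present but not allowed."""
    bad = []
    for rest in root.iter("REST"):
        kind = rest.get("kind")  # None when the optional attribute is omitted
        if kind is not None and kind not in ALLOWED_KINDS:
            bad.append(rest.findtext("NAME"))  # direct NAME child only
    return bad


root = ET.fromstring(DOC)
print([r.get("kind") for r in root.iter("REST")])  # ['qsr', None, 'diner']
print(bad_kinds(root))                              # ['Corner Diner']
```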
IDs and IDREFs
IDs and IDREFs introduce links from one object to another
They allow the structure of an XML document to be a general graph rather than just a tree
These are pointers from one object to another, in analogy to HTML’s NAME = “blah” and HREF = “#blah”
Creating IDs
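In the DTD, we give an element Elephant an attribute Attention of type ID
When using the tag <Elephant> in an XML document, give its attribute Attention a value that is unique within the document
For example, <Elephant Attention="e213">
An ID value must be an XML name, so it cannot begin with a digit; a bare "213" would not be valid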
Creating IDREFs
IDREFs are similar to IDs:
To allow objects of type Fig to refer to another object with an ID attribute, give Fig an attribute of type IDREF (a single string of type ID)
Or, let the attribute have type IDREFS, so the Fig object can refer to any number of other objects (any number of strings of type ID, separated by whitespace)
Example: IDs and IDREFs
Let us redesign our RESTS DTD to include both REST and SODA sub-elements
Both rests and sodas will have ID attributes called name
Rests have PRICE sub-objects, consisting of a number (the price of one soda) and an IDREF theSoda leading to that soda
Sodas have an attribute soldBy, which is an IDREFS leading to all the rests that sell it
The DTD
<!DOCTYPE RESTS [
   <!ELEMENT RESTS (REST*, SODA*)>
   <!ELEMENT REST (PRICE+)>
   <!ATTLIST REST name ID #REQUIRED>
   <!ELEMENT PRICE (#PCDATA)>
   <!ATTLIST PRICE theSoda IDREF #REQUIRED>
   <!ELEMENT SODA EMPTY>
   <!ATTLIST SODA name ID #REQUIRED
                  soldBy IDREFS #IMPLIED>
]>
RESTS have 0+ REST and 0+ SODA
REST objects have name as an ID attribute and have one or more PRICE sub-objects
PRICE objects have a number (the price) and one reference to a soda
SODA objects have no content (EMPTY), an ID attribute called name, and a soldBy attribute that is a set of REST names
Example Document
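<RESTS>
   <REST name="TacoBell">
      <PRICE theSoda="Pepsi">1.00</PRICE>
      <PRICE theSoda="Sobe">2.00</PRICE>
   </REST> …
   <SODA name="Pepsi" soldBy="KFC TacoBell …">
   </SODA> …
</RESTS>
ID values must be XML names, so “Taco Bell” is written TacoBell; the IDREFS in soldBy are separated by spaces, and attributes are not separated by commas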
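A non-validating parser such as Python's xml.etree.ElementTree treats ID, IDREF, and IDREFS attributes as plain strings, so you have to resolve the links yourself. Below is a minimal sketch. It assumes the RESTS/SODA layout above, with IDREFS values separated by whitespace. The document contents and the helpers index_ids and dangling_refs are illustrative. The sketch indexes the ID values and reports references that do not resolve. It then follows links in both directions, which is the graph structure that plain nesting cannot express.

```python
import xml.etree.ElementTree as ET

DOC = """<RESTS>
  <REST name="TacoBell">
    <PRICE theSoda="Pepsi">1.00</PRICE>
    <PRICE theSoda="Sobe">2.00</PRICE>
  </REST>
  <REST name="KFC">
    <PRICE theSoda="Pepsi">1.00</PRICE>
  </REST>
  <SODA name="Pepsi" soldBy="KFC TacoBell"></SODA>
  <SODA name="Sobe" soldBy="TacoBell"></SODA>
</RESTS>"""


def index_ids(root):
    """Map each ID value to its element; IDs must be unique document-wide."""
    ids = {}
    for elem in root.iter():
        key = elem.get("name")  # "name" is the ID attribute of REST and SODA
        if key is None:
            continue
        if key in ids:
            raise ValueError(f"duplicate ID {key!r}")
        ids[key] = elem
    return ids


def dangling_refs(root, ids):
    """IDREF/IDREFS values that do not name any element in the document."""
    refs = [p.get("theSoda") for p in root.iter("PRICE")]  # IDREF
    for soda in root.iter("SODA"):
        refs.extend(soda.get("soldBy", "").split())        # IDREFS
    return [r for r in refs if r not in ids]


root = ET.fromstring(DOC)
ids = index_ids(root)
print(dangling_refs(root, ids))  # []

# Follow links both ways: REST -> SODAs it prices, SODA -> RESTs that sell it.
print([(p.get("theSoda"), ids[p.get("theSoda")].tag, p.text)
       for p in ids["TacoBell"].iter("PRICE")])
# [('Pepsi', 'SODA', '1.00'), ('Sobe', 'SODA', '2.00')]
print([ids[r].tag for r in ids["Pepsi"].get("soldBy").split()])  # ['REST', 'REST']
```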
Recap
Semi-structured Data
XML (Extensible Markup Language)
Well-formed and Valid XML
Document Type Definitions
IDs and IDREFs
Perspective
Here XML is used as an EDI medium
EDI = electronic data interchange
There are many other uses for XML
Each has its own purpose
Questions?
Questions???
Doesn’t mean you will get all the answers!