10/10 - SEAS - University of Pennsylvania
Download
Report
Transcript 10/10 - SEAS - University of Pennsylvania
Normal Forms and XML
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
October 10, 2007
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Announcements
Homework 3 will be due Monday10/22
Midterm: Wednesday 10/24
Please start forming into project groups of 4, with a
list of names due 10/24
2
Lossless Join Decomposition
R1, … Rk is a lossless join decomposition of R w.r.t. an FD set F if
for every instance r of R that satisfies F,
R1(r) ⋈ ... ⋈ Rk(r) = r
Consider:
sid
name
serno
subj
cid
exp-grade
1
23
Sam
Nitin
570103
550103
AI
DB
570
550
B
A
What if we decompose on
(sid, name) and (serno, subj, cid, exp-grade)?
3
Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F
iff at least one of the following dependencies is in F+
(R1 R2) R1 – R2
(R1 R2) R2 – R1
So for the FD set:
sid name
serno cid, exp-grade
cid subj
Is (sid, name) and (serno, subj, cid, exp-grade) a lossless
decomposition?
4
Dependency Preservation
Ensures we can “easily” check whether a FD X Y
is violated during an update to a database:
The projection of an FD set F onto a set of attributes Z,
FZ is
{X Y | X Y F +, X Y Z}
i.e., it is those FDs local to Z’s attributes
A decomposition R1, …, Rk is dependency preserving if
F + = (FR1 ... FRk)+
The decomposition hasn’t “lost” any essential FD’s, so
we can check without doing a join
5
Example of Lossless and
Dependency-Preserving Decompositions
Given relation scheme
R(name, street, city, st, zip, item, price)
And FD set name street, city
street, city st
street, city zip
name, item price
Consider the decomposition
R1(name, street, city, st, zip) and R2(name, item, price)
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD by name, street city?
6
Another Example
Given scheme: R(sid, fid, subj)
and FD set: fid subj
sid, subj fid
Consider the decomposition
R1(sid, fid) and R2(fid, subj)
Is it lossless?
Is it dependency preserving?
7
FD’s and Keys
Ideally, we want a design s.t. for each nontrivial
dependency X Y, X is a superkey for some
relation schema in R
We just saw that this isn’t always possible
Hence we have two kinds of normal forms
8
Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation
scheme R and for every X A that holds over R,
either A X (it is trivial) ,or
or X is a superkey for R
Third Normal Form (3NF). For every relation scheme
R and for every X A that holds over R,
either A X (it is trivial), or
X is a superkey for R, or
A is a member of some key for R
9
Normal Forms Compared
BCNF is preferable, but sometimes in conflict with the
goal of dependency preservation
It’s strictly stronger than 3NF
Let’s see algorithms to obtain:
A BCNF lossless join decomposition (nondeterministic)
A 3NF lossless join, dependency preserving
decomposition
10
BCNF Decomposition Algorithm
(from Korth et al.; our book gives a recursive version)
result := {R}
compute F+
while there is a relation schema Ri in result that isn’t in BCNF
{
i.e., A doesn’t
form a key
let A B be a nontrivial FD on Ri
s.t. A Ri is not in F+
and A and B are disjoint
}
result:= (result – Ri) {(Ri - B), (A,B)}
11
An Example
Given the schema:
Stuff(sid, name, serno, classroom, cid, fid, prof)
And FDs:
sid name
fid prof
serno classroom, cid, fid
Find the Boyce-Codd Normal Form for this schema
What if instead:
sid name
fid prof
classroom, cid serno
serno cid
12
3NF Decomposition Algorithm
Let F be a minimal cover
i:=0
for each FD A B in F {
if none of the schemas Rj, 1 j i, contains AB
{
increment i
Ri := (A, B)
}
}
if no schema Rj, 1 j i contains a candidate key for R {
increment i
Ri := any candidate key for R
}
return (R1, …, Ri)
Build dep.preserving
decomp.
Ensure
lossless
decomp.
13
An Example
Given the schema:
Stuff(sid, name, serno, classroom, cid, fid, prof)
And FDs:
sid name
fid prof
serno classroom, cid, fid
Find the Third Normal Form for this schema
What if instead:
sid name
fid prof
classroom, cid serno
serno cid
14
Summary of Normalization
We can always decompose into 3NF and get:
Lossless join
Dependency preservation
But with BCNF:
We are only guaranteed lossless joins
The algorithm is nondeterministic, so there is not a
unique decomposition for a given schema R
BCNF is stronger than 3NF: every BCNF schema is
also in 3NF
15
Normalization Is Good… Or Is It?
In some cases, we might not mind redundancy, if the
data isn’t directly updated:
Reports (people like to see breakdowns by semester,
department, course, etc.)
Warehouses (archived copies of data for doing complex
analysis)
Data sharing (sometimes we may export data into objectoriented or hierarchical formats)
16
XML: A Semi-Structured Data Model
17
Why XML?
XML is the confluence of several factors:
The Web needed a more declarative format for data
Documents needed a mechanism for extended tags
Database people needed a more flexible interchange format
“Lingua franca” of data
It’s parsable even if we don’t know what it means!
Original expectation:
The whole web would go to XML instead of HTML
Today’s reality:
Not so… But XML is used all over “under the covers”
18
Why DB People Like XML
Can get data from all sorts of sources
Allows us to touch data we don’t own!
This was actually a huge change in the DB community
Interesting relationships with DB techniques
Useful to do relational-style operations
Leverages ideas from object-oriented, semistructured data
Blends schema and data into one format
Unlike relational model, where we need schema first
… But too little schema can be a drawback, too!
19
XML Anatomy
Processing Instr.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
Open-tag
<mastersthesis mdate="2002-01-03" key="ms/Brown92">
<author>Kurt P. Brown</author>
<title>PRPL: A Database Workload Specification Language</title>
<year>1992</year>
Element
<school>Univ. of Wisconsin-Madison</school>
</mastersthesis>
<article mdate="2002-01-03" key="tr/dec/SRC1997-018">
<editor>Paul R. McJones</editor>
Attribute
<title>The 1995 SQL Reunion</title>
<journal>Digital System Research Center Report</journal>
<volume>SRC1997-018</volume>
Close-tag
<year>1997</year>
<ee>db/labs/dec/SRC1997-018.html</ee>
<ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
20
</article>
Well-Formed XML
A legal XML document – fully parsable by an XML
parser
All open-tags have matching close-tags (unlike so many
HTML documents!), or a special:
<tag/> shortcut for empty tags (equivalent to <tag></tag>
Attributes (which are unordered, in contrast to elements)
only appear once in an element
There’s a single root element
XML is case-sensitive
21
XML as a Data Model
XML “information set” includes 7 types of nodes:
Document (root)
Element
Attribute
Processing instruction
Text (content)
Namespace
Comment
XML data model includes this, plus typing info, plus
order info and a few other things
22
XML Data Model Visualized
(and simplified!)
Root
?xml
2002…
p-i
element
dblp
article
mdate
key
author title year school
1992
ms/Brown92
key
editor title journal volume year ee ee
2002…
tr/dec/…
PRPL…
Kurt P….
attribute
text
mastersthesis
mdate
root
Digital…
Univ….
1997
The…
Paul R.
db/labs/dec
SRC…
http://www.
23
What Does XML Do?
Serves as a document format (super-HTML)
Allows custom tags (e.g., used by MS Word, openoffice)
Supplement it with stylesheets (XSL) to define formatting
Data exchange format (must agree on terminology)
Marshalling and unmarshalling data in SOAP and
Web Services
24
XML as a Super-HTML
(MS Word)
<h1 class="Section1">
<a name="_top“ />CIS 550: Database and Information
Systems</h1>
<h2 class="Section1">Fall 2004</h2>
<p class="MsoNormal">
<place>311 Towne</place>,
Tuesday/Thursday
<time Hour="13" Minute="30">1:30PM –
3:00PM</time>
</p>
25
XML Easily Encodes Relations
Student-course-grade
sid serno exp-grade
1 570103
B
23 550103
A
<student-course-grade>
<tuple><sid>1</sid><serno>570103</serno><expgrade>B</exp-grade></tuple>
<tuple><sid>23</sid><serno>550103</serno><expgrade>A</exp-grade></tuple>
</student-course-grade>
26
But XML is More Flexible…
“Non-First-Normal-Form” (NF2)
<parents>
<parent name=“Jean” >
<son>John</son>
<daughter>Joan</daughter>
<daughter>Jill</daughter>
</parent>
<parent name=“Feng”>
<daughter>Felicity</daughter>
</parent>
Coincides with “semi-structured data”,
…
invented by DB people at Penn and Stanford
27
XML and Code
Web Services (.NET, recent Java web service toolkits) are
using XML to pass parameters and make function calls
Why?
Easy to be forwards-compatible
Easy to read over and validate (?)
Generally firewall-compatible
Drawbacks? XML is a verbose and inefficient encoding!
XML is used to represent:
SOAP: the “envelope” that data is marshalled into
XML Schema: gives some typing info about structures being passed
WSDL: the IDL (interface def language)
UDDI: provides an interface for querying about web services
28
Integrating XML: What If We Have
Multiple Sources with the Same Tags?
Namespaces allow us to specify a context for different tags
Two parts:
Binding of namespace to URI
Qualified names
<root xmlns=“http://www.first.com/aspace” xmlns:otherns=“…”>
<tag xmlns:myns=“http://www.fictitious.com/mypath”>
<thistag>is in the default namespace (aspace)</thistag>
<myns:thistag>is in myns</myns:thistag>
<otherns:thistag>is a different tag in otherns</otherns:thistag>
</tag>
</root>
29