\input /u/ullman/doc/nmacs

Download Report

Transcript \input /u/ullman/doc/nmacs

3
Chapter 3.6-7
Normalization of Database
Tables
Spring 2007
1
3
Normalization
• Normalization is process for assigning attributes to
entities
Reduces data redundancies
Helps eliminate data anomalies
Produces controlled redundancies to link tables
• Normalization stages
1NF - First normal form
2NF - Second normal form
3NF - Third normal form
4NF - Fourth normal form
Spring 2007
2
Example:
3
title
year
Name
length
Stars
address
Starts-in
Movies
Owns
Name
Studios
address
Spring 2007
3
Movies
3
Problem Example
Title
Year
Length FilmType
Star Wars
Star Wars
Star Wars
1977 124
1977 124
1977 124
Fugitive
Wayne’s World
Wayne’s World
1995 135
1992 95
1992 95
StudioName StarName
StarPhone
Color
Color
Color
Fox
Fox
Fox
Carrie Fisher
Harrison Ford
Mark hamill
713-872-9282
832-999-2002
512-472-0282
Color
Color
Color
Universal
Paramount
Paramount
Harrison Ford
Dana Carvey
Mike Meyers
832-999-2002
882-333-9999
222-355-1111
• Update anomalies: If Harrison Ford’s phone #
changes, must change it in each of his tuples. If Length
value of Star Wars needs to be changed, must change
all occurrences
•Deletion anomalies: If we delete Wayne’s World
entries from database, we also loose all info about Dana
Carvey & Mike Meyers
Spring 2007
4
3
Conversion to 1NF
• Repeating groups must be eliminated
Proper primary key developed
• Uniquely identifies each tuple
Dependencies
can be identified
• undesirable dependencies allowed
– Partial
» based on part of composite primary key
– Transitive
» one nonprime attribute depends on another nonprime
attribute
Spring 2007
5
Conversion to 1NF Cont.
3
• An attribute that is at least part of a key is known
as a prime attribute or key attribute or primary
key.
Spring 2007
6
Example
•
•
•
•
3
Projects assigned to employees
Each project has a number and a name
Each employee has a number, a name, a job class
Each employee working on a project, need to keep number
of hours spent on project, and hourly rate.
• Project Assignments Table :
( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME,
JOB_CLASS, CHG_HOUR, HOURS)
• What’s the Key for this relation?
Spring 2007
7
3
Data Organization: 1NF
Spring 2007
8
3
Dependency Diagram (1NF)
PROJ_ NUM, EMP_NUM --> PROJ_NAME, EMP_NAME,
JOB_CLASS, CHG_HOUR, HOURS
PROJ_NUM --> PROJ_NAME
EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS
JOB_CLASS --> CHG_HOUR
Spring 2007
9
3
1NF Summarized
•
•
•
•
Spring 2007
All key attributes defined
Primary Key identified
No repeating groups in table
All attributes dependent on
primary key
10
3
2NF Summarized
• In 1NF, but
• Includes no partial dependencies
• Partial dependency:
An attribute is functionally dependent on a
portion of the primary key.
• Example:
 PROJ_NUM  PROJ_NAME
EMP_NUM-->EMP_NAME, JOB_CLASS,
CHG_HOURS
Spring 2007
11
Conversion to 2NF
1.
2.
3.
4.
5.
6.
3
Start with 1NF format:
Write each key component on a separate line
Write dependent attributes after each key component
Write original key on last line
Write any remaining attributes after original key
Each component is new table
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
Spring 2007
12
3
2NF Conversion Results
Spring 2007
13
3
2NF Summarized
• In 1NF
• Includes no partial dependencies
• Still possible to exhibit transitive dependency
Attributes may be functionally dependent on
non-key attributes
Spring 2007
14
3
Conversion to 3NF
• decompose table(s) to eliminate transitive
functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
Spring 2007
15
3
3NF Summarized
• In 2NF
• Contains no transitive dependencies
Spring 2007
16
Boyce-Codd Normal Form
(BCNF)
3
• Formally, R is in BCNF if for every nontrivial FD
for R, say X  A, X is a superkey.
“Nontrivial” =
right-side attribute not in left side.
Trivial FDs examples
• AA
• ABA
• Informally: the only arrows in the FD diagram are
arrows out of superkeys
Note: BCNF violation when relation has more
than one superkey that overlap
Spring 2007
18
3
What normal form?
3NF Table Not in BCNF
Spring 2007
19
3
Decomposition of Table
Structure to Meet BCNF
Spring 2007
20
Decomposition to Reach BCNF
3
Setting: relation R with FDs F.
Suppose relation R has BCNF violation X  B and X
not a superkey.
Spring 2007
21
X+.
1. Compute
Cannot be all attributes – why?
2. Decompose R into X+ and (R–X+)  X.
R
3
X X+
3. Find the FD’s for the decomposed relations.
Project the FD’s from F = calculate all
consequents of F that involve only attributes
from X+ or only from (RX+)  X.
Spring 2007
22
Decomposition to Reach BCNF
3
• Identify the violating FD
• E.g. : X1X2…Xn  B1B2…Bm
• Add to the right-hand side of FD as many
attributes as are functionally determined by
(X1X2…Xn)
• Decompose relation R into two relations:
One relation has all attributes Xs & Bs
Second relation has the Xs plus any other
remaining attributes from R other than Bs
Spring 2007
23
BCNF--Example
3
Assume R(S, J, T)
• S: Student
• J: subject
• T: Teacher
• Student S is taught subject J by teacher T.
• Constraints:
For each subject, each student of that subject is
taught by only one teacher
Each teacher teaches only one subject (but each
subject is taught by several teachers)
Spring 2007
24
BCNF--Example
S
J
T
Smith
Smith
Math
Physics
Prof. White
Prof. Green
Jane
Jane
Math
Physics
Prof. White
Prof. Brown
3
Functional Dependencies:
•S , J  T
•T  J
Spring 2007
25
BCNF--Example
3
• Candidate keys: {S, J} and {S, T}
S
T
J
• 3NF but not in BCNF
• Update anomaly: if we delete the info that Jane is
studying Physics we also loose the info that Prof.
Brown teaches Physics
• Solution: two relations R1{S, T} R2{T, J}
Spring 2007
26
3
Decomposition Based on BCNF is
Necessarily Correct
Attributes A, B, C.
Relations R1[A,B]
FD: B  C
R2[B, C]
Tuples in R: (a, b, c)
Tuples in R1: (a, b)
Tuples in R2: (b, c)
Natural join of R1 and R2 = (a, b, c)  original relation R
can be reconstructed by forming the natural join of R1 and
R2.
Spring 2007
27
3
Decomposition Based on BCNF is
Necessarily Correct
Attributes A, B, C.
Relations R1[A,B]
FD: B  C
R2[B, C]
Tuples in R: (a, b, c) , (d, b, e)
Tuples in R1: (a, b), (d, b)
Tuples in R2: (b, c), (b, e)
Tuples in the natural join of R1 and R2:
(a,b,c), (a,b, e) (d, b, c), (d, b, e)
Can (a,b,e), (d, b, c) be a bogus tuples?
Spring 2007
28
Decomposition Based on BCNF is
Necessarily Correct
3
• Answer: No
• Because: B  C i.e. if 2 tuples have same B
attribute then they must have the same C attribute.
 (b,c) = (b,e)
 (a, b,e) = (a, b,c) and (d, b, c) = (d, b, e)
Spring 2007
29
Theorem
3
• Any two-attribute relation is in BCNF.
Spring 2007
30
3
Decomposition Theorem
• Suppose we decompose a relation R(X, Y, Z) into
R1(X, Y) and R2(X,Z) and project the R onto R1
and R2.
• Then join(R1, R2) is guaranteed to reconstruct R
if and only if XY or XZ
• Notice that whenever we decompose because of a
BNCF violation, one of the above FDs holds.
Spring 2007
31
3NF
One FD structure causes problems in BCNF:
3
• If you decompose, you can’t recover all of the original FD’s.
• If you don’t decompose, you violate BCNF.
Abstractly: AB  C and C  B.
• Example : street city  zip, and zip  city.
BCNF violation: C  B has a left side that is not a superkey.
• Based on previous algorithm, decompose into BC and AC.
 But the FD AB  C does not hold in new tables.
Spring 2007
32
3
Example
A = street, B = city, C = zip.
city
Houston
Houston
street
1 Main St.
14000 Main St.
zip
77002
77005
zip  city
BCNF vio
Decompose:
street
1 Main St.
14000 Main St.
city
Houston
Houston
Spring 2007
zip
77002
77005
zip
77002
77005
It is a bad idea to
decompose relation
because you loose the
ability to check the
dependency:
street
city  zip
33
3
Example
Houston
Houston
Boston
1 Main St.
14000 Main St.
1 Main St.
77002
77005
33555
zip  city
Decompose:
1 Main St.
14000 Main St.
1 Main St.
Houston
Houston
Boston
Spring 2007
77002
77005
33555
77002
77005
33555
It is a bad idea to
decompose relation
because you loose the
ability to check the
dependency:
street
city  zip
34
“Elegant” Workaround
3
Define the problem away.
• A relation R is in 3NF iff (if and only if)
for every nontrivial FD X  A, either:
1. X is a superkey, or
2. A is prime = member of at least one key.
• Thus, if we just normalize to the 3NF, the problem
goes away.
Spring 2007
35
What 3NF and BCNF Give You
•
3
There are two important properties of a
decomposition:
1.
Recovery : it should be possible to project the
original relations onto the decomposed schema,
and then reconstruct the original.
2.
Dependency Preservation : it should be possible
to check in the projected relations whether all the
given FD’s are satisfied.
Spring 2007
36
3NF and BCNF, Continued
3
• We can get (1) with a BCNF decomposition.
• We can get both (1) and (2) with a 3NF
decomposition.
• But we can’t always get (1) and (2) with a BCNF
decomposition.
street-city-zip is an example.
Spring 2007
37
3
Mutli-valued Dependencies
Fourth Normal Form
Spring 2007
38
Definition of MVD
3
• A multivalued dependency is an assertion that
two attributes (sets of attributes) are
independent of one another.
• Formally: A multivalued dependency (MVD)
on R, X ->->Y , says that if two tuples of R
agree on all the attributes of X, then their
components in Y may be swapped, and the
result will be two tuples that are also in the
relation.
Spring 2007
39
Example
3
Actors(name, addr, phones, cars) with MVD
Name  phones.
name addr phones
cars
sue
a
p1
b1
sue
a
p2
b2
it must also have the same tuples with phones components
swapped:
name addr phones
cars
sue
a
p2
b1
sue
a
p1
b2
Note: we must check this condition for all pairs of tuples that agree
on name, not just one pair.
Spring 2007
40
Example 2
name
C. Fisher
C. Fisher
C. Fisher
C. Fisher
C. Fisher
C. Fisher
street
123 Maple St.
5 Locust Ln.
123 Maple St.
5 Locust Ln.
123 Maple St.
5 Locust Ln.
city
Hollywood
Malibu
Hollywood
Malibu
Hollywood
Malibu
3
title
year
Star Wars
1977
Star Wars
1977
Empire
1980
Empire
1980
Return of the Jedi1983
Return of the Jedi1983
•An actor may have more than one address
•Key?
What normal form?
•Note the redundancies
•MVD: name  street city
• read: name determines 1 or more street &
city independent of all other attributes
Spring 2007
41
MVD Rules
3
1. Every FD is an MVD.
 Because if X Y, then swapping Y’s between tuples that
agree on X doesn’t create new tuples.
 Example, in Actors: name  addr.
• Note: the opposite is not true i.e. not every MVD is a FD
2. Complementation: if X  Y, then X  Z, where Z is all
attributes not in X or Y.
 Example: since name  phones holds in Actors,
the name  addr cars.
Spring 2007
42
3
Splitting Doesn’t Hold
• name  street
city
holds, but
• name  street
does not hold
Name does not determine 1 or more street independent
of city.
• name  city
Spring 2007
does not
hold
43
Example 2
name
C. Fisher
C. Fisher
C. Fisher
C. Fisher
C. Fisher
C. Fisher
street
123 Maple St.
5 Locust Ln.
123 Maple St.
5 Locust Ln.
123 Maple St.
5 Locust Ln.
city
Hollywood
Malibu
Hollywood
Malibu
Hollywood
Malibu
3
title
year
Star Wars
1977
Star Wars
1977
Empire
1980
Empire
1980
Return of the Jedi1983
Return of the Jedi1983
•An actor may have more than one address
•MVD: name  street city
• read: name determines 1 or more street &
city independent of all other attributes
•Also (complement MVD):
Spring 2007
name  title year
44
Fourth Normal Form
3
• The redundancy that comes from MVD’s is
not removable by putting the database
schema in BCNF.
• There is a stronger normal form, called 4NF,
that (intuitively) treats MVD’s as FD’s when it
comes to decomposition, but not when
determining keys of the relation.
Spring 2007
45
3
4NF
Eliminate redundancy due to multiplicative effect of MVD’s.
• Roughly: treat MVD’s as FD's for decomposition, but not for finding
keys.
• Formally: R is in Fourth Normal Form if whenever MVD
X  Y is nontrivial (Y is not a subset of X, and X  Y is not all
attributes), then X is a superkey.
 Remember, X  Y implies X  Y, so 4NF is more stringent
than BCNF.
• Decompose R, using
4NF violation X  Y,
into XY and X  (R—Y).
Spring 2007
R
X Y
46
Example
3
Drinkers(name, addr, phones, cars)
• FD:
name  addr
name  phones
name  cars.
• Only key: {name, phones, cars}
• Nontrivial MVD’s:
• All three dependencies above violate 4NF. Why?
• Successive decomposition yields 4NF relations:
D1(name, addr)
D2(name, phones)
D3(name, cars)
Spring 2007
47
Example 2
name
C. Fisher
C. Fisher
C. Fisher
C. Fisher
C. Fisher
C. Fisher
street
123 Maple St.
5 Locust Ln.
123 Maple St.
5 Locust Ln.
123 Maple St.
5 Locust Ln.
city
Hollywood
Malibu
Hollywood
Malibu
Hollywood
Malibu
 name  street
3
title
year
Star Wars
1977
Star Wars
1977
Empire
1980
Empire
1980
Return of the Jedi1983
Return of the Jedi1983
city
Decompose into:
R1(name, street, city)
R2(name, title, year)
Spring 2007
48