Transcript - bYTEBoss
Logical Database Design (3 of 3)
John Ortiz
Normalization
If a relation is not in BCNF or 3NF, we refine
it by decomposing it into two or more smaller
relation schemas that are in the normal form.
Decomposition has to be used carefully, since
there are potential problems.
What are desirable properties of a
decomposition, and how to test them?
How to obtain a decomposition with some
desirable properties?
Lecture 7
Logical Database Design (2)
2
Decomposition of a Relation
Let R be a relation schema. A decomposition
of R, demoted by D = {R1, R2, ..., Rn}, is a set
of relation schemas such that R = R1 ...
Rn.
If {R1, R2, ..., Rn} is a decomposition of R and
r is an instance of R, then
r R1(r)
R2(r)
...
Rn(r)
Information may be lost (i.e. wrong tuples
may be added by the natural join) due to a
decomposition.
Lecture 7
Logical Database Design (2)
3
An Example of Information Loss
Before SC SID Name Cno Grade Room
101 Bill
202 Mary
After
SR Name Room
Bill
Mary
SR
Lecture 7
SG
326
326
SG SID Cno Grade Room
326
326
SID
101
202
101
202
1000 B
2000 A
101 1000 B
202 2000 A
Name
Bill
Mary
Mary
Bill
Cno
1000
2000
1000
2000
Grade
B
A
B
A
Logical Database Design (2)
326
326
Room
326
326
326
326
4
Lossless Join Decomposition
Let R be a relation schema, and D = {R1, R2, ...,
Rn} be a decomposition of R. D is a lossless
(non-additive) join decomposition of R if for
every legal instance r of R, we have
r = R1(r)
R2(r)
...
Rn(r)
Theorem: Let F be a set of FDs over R, and D
= {R1, R2} be a decomposition of R. D is a
lossless-join decomposition if and only if
R1 R2 R1 - R2 is in F+; or
R1 R2 R2 - R1 is in F+.
Lecture 7
Logical Database Design (2)
5
Lossless Join: An Example
Consider F = {B AH, L CAt} over
Bank-Loans(Bank, Assets, Headquarter,
Loan#, Customer, Amount).
Let D = {Banks(B,A,H), Loans(B,L,C,At)}.
Since Banks Loans = B AH = Banks - Loans
is in F+ (since it is already in F), D is a losslessjoin decomposition.
What if the decomposition contains more than
two relations.
Lecture 7
Logical Database Design (2)
6
Test for Lossless Join *
Algorithm TestLJ (Chase)
Input: A relation schema R(A1, …, Am), a set of FDs
F, and a decomposition D = {R1, …, Rn}.
Output: Yes, if D is a Lossless join; no, otherwise.
Method:
1. Create an n m table T (labeled by Ai and Rj).
2. If Ri contains Aj, place aj at Ti,j. Otherwise,
place bij at Ti,j.
Lecture 7
Logical Database Design (2)
7
TestLJ (cont.) *
3. Repeat for each FD X Y in F do
For all rows with identical symbols on X do
make the symbols on Y identical.
(choose aj over bij whenever possible)
Until no more change can be made.
4. Return yes if there is a row of aj’s.
Otherwise, return no.
Lecture 7
Logical Database Design (2)
8
TestLJ: An Example
Continue with the previous example.
Set up the table T.
BAH
BLCAt
B A H
L C
At
a1 a2 a3 b14 b15 b16
a1 b22 b23 a4 a5 a6
Enforce B AH.
BAH
BLCAt
B A
a1 a2
a1 a2
H
a3
a3
L C
At
b14 b15 b16
a4 a5
a6
Need to repeat until no more changes.
Lecture 7
Logical Database Design (2)
9
Dependency-Preserving Decomposition
Let F be a set of FDs over R, and D = {R1, R2,
..., Rn} be a decomposition of R. D is a
dependency-preserving decomposition if
F+ = (R1(F) R2(F) . . . Rn(F))+
where for i = 1, …, n
Ri(F) = { X Y | X Y F and XY Ri}.
Restrict FDs to local relations. If all “global”
FDs can be derived from “local” FDs, all
dependencies are preserved.
Lecture 7
Logical Database Design (2)
10
Dependency Preservation: An Example
Consider F = {CS Z, Z C} over R(City,
Street, Zipcode), and D ={R1(S, Z), R2(C, Z)}.
Then R1(F) = {} and R2(F) = {Z C} (consider
non-trivial FDs only)
Since CS Z F+ but
CS Z (R1(F) R2(F))+,
D is not dependency-preserving.
Lecture 7
Logical Database Design (2)
11
Test for Dependency Preservation
Algorithm TestDP
Input: A relation schema R, A set of FDs F over
R, a decomposition D = {R1, R2, ..., Rn} of R.
Output: Yes, if D is dependency-preserving; no,
otherwise.
Method:
for every X Y F
if Ri such that XY Ri
then X Y is preserved;
Lecture 7
Logical Database Design (2)
12
TestDP (cont.)
else W := X;
repeat for i from 1 to n do
W := W ((W Ri)+ Ri);
until there is no change to W;
if Y W then X Y is preserved;
if every X Y is preserved
then return yes;
else return no.
Derive global FDs using only local FDs.
Lecture 7
Logical Database Design (2)
13
TestDP: An example
Consider F = {A B, B C, C D, D A } over
R(A, B, C, D), & D = {R1(A,B), R2(B,C), R3(C,D)}.
Is D a dependency-preserving decomposition?
Since AB R1, A B is preserved.
Since BC R2, B C is preserved.
Since CD R3, C D is preserved.
Since DA is not in any one of the three relations,
we need to compute W.
Lecture 7
Logical Database Design (2)
14
TestDP: An example (cont.) *
Initialization: W = D;
first iteration:
W = D ((D AB)+ AB) = D;
W = D ((D BC)+ BC) = D;
W = D ((D CD)+ CD)
= D (D+ CD)
= D (ABCD CD) = CD;
W changed from D to CD.
Lecture 7
Logical Database Design (2)
15
TestDP: An example (cont.) *
second iteration:
W = CD ((CD AB)+ AB) = CD;
W = CD ((CD BC)+ BC)
= CD (C+ BC) = BCD;
W = BCD ((BCD CD)+ CD)
= BCD;
W changed from CD to BCD.
Lecture 7
Logical Database Design (2)
16
Normalization
It is good to have BCNF relation schemas.
If a relation schema is not in BCNF, then
decompose it into a set of relation schemas:
every new schema is in BCNF;
it is lossless-join (can guarantee);
it is dependency-preserving (no guarantee).
If not possible to have all nice properties, be
happy with a lossless join, dependency
preserving 3NF decomposition (can guarantee)
Lecture 7
Logical Database Design (2)
18
Normalization to BCNF
Algorithm LLJD-BCNF
Input: R: A relation schema
F: A set of FDs satisfied by R.
Output: A lossless-join decomposition D = {R1, …,
Rn}, such that each Ri is in BCNF.
Lecture 7
Logical Database Design (2)
19
Normalization to BCNF (cont.)
Method:
D := {R};
while Ri D that is not in BCNF do
begin
Find an FD X Y such that
(1) Ri is not BCNF because of X Y, and
(2) XY Ri;
D := D - Ri {Ri - Y, XY}
end;
Lecture 7
Logical Database Design (2)
20
Normalization to BCNF (cont.) *
Theorem: Algorithm LLJD-BCNF is correct.
Proof (sketch):
Every schema in D is in BCNF because the
algorithm will not stop otherwise.
D is a lossless-join decomposition because in each
iteration, Ri is decomposed into 2 smaller
schemas (Ri - Y) and XY and they satisfy the
condition:
(Ri - Y) XY = X Y = (XY - (Ri - Y)).
Lecture 7
Logical Database Design (2)
21
Normalization to BCNF: An Example
Consider F = {B AH, L CAt} over
Bank-Loans(Bank, Assets, Headquarter, Loan#,
Customer, Amount), and a set of FDs,
Candidate keys: LB
Initialization: D = {BAHLCAt }
Lecture 7
Logical Database Design (2)
22
Normalize to BCNF: An Example *
1st iteration:
Ri = BAHLCAt is not in BCNF because B AH is
not a trivial FD and B is not a superkey.
Replace BAHLCAt by BAH and BLCAt.
Hence: D = {BAH, BLCAt}.
BAH is in BCNF because in BAH, B is a
candidate key.
Lecture 7
Logical Database Design (2)
23
Normalize to BCNF: An Example *
2nd iteration:
Ri = BLCAt is not in BCNF because L CAt is not a
trivial FD and L is not a superkey in BLCAt.
Replace BLCAt by CLAt and BL.
Hence, D = {BAH, CLAt, BL}.
CLAt is in BCNF because in CLAt, L is a candidate key.
BL is in BCNF (see theorem on next page).
Final result: D = {BAH, CLAt, BL}.
D happens to be dependency-preserving.
Any relation schema with exactly two attributes
is in BCNF.
Lecture 7
Logical Database Design (2)
24
Normalize to BCNF: Another Ex. *
Consider R(City, Street, Zipcode), and
F = {CS Z, Z C }.
Candidate keys: CS, ZS.
Initialization: D = {CSZ};
1st iteration:
R = CSZ is not in BCNF because Z C is not a
trivial FD and Z is not a superkey.
D = {ZC, ZS}.
D is not dependency-preserving because CS Z
is not preserved.
Lecture 7
Logical Database Design (2)
25
Equivalence of FD Sets
Let F and G be two sets of FDs satisfied by R. F
and G are equivalent, denoted by F G, if F+ =
G+.
Example: F = {B CD, AD E, B A} and G =
{B CDE, B ABC, AD E} are equivalent.
Check to see that every FD in F is also in G+
and that every FD in G is also in F+
Lecture 7
Logical Database Design (2)
26
Extraneous Attributes
Let F be a set of FDs.
F contains an extraneous attribute A if there
is an FD X Y in F, such that
either A X, and
[F - {X Y} {X - {A} Y}] F;
or A Y, and
[F - {X Y} {X Y - {A} }] F.
This is a “useless” attribute either at the
left side or at the right side of an FD.
Lecture 7
Logical Database Design (2)
27
Summary
A good schema should have three properties:
BCNF (or 3NF if BCNF can not be obtained)
Lossless join
Dependency preserving
Lossless join BCNF decomposition is
guaranteed, need to check for dependency
preservation
Lossless join, dependency preserving 3NF
decomposition is guaranteed (need to find the
minimal cover)
Lecture 7
Logical Database Design (2)
28
Look Ahead
Next topic: SQL Overview & DDL
Read textbook:
Chapter 8, 10.1-10.6
Lecture 7
Logical Database Design (2)
29