Transcript - bYTEBoss

Logical Database Design (3 of 3)
John Ortiz
Normalization
 If a relation is not in BCNF or 3NF, we refine
it by decomposing it into two or more smaller
relation schemas that are in the normal form.
 Decomposition has to be used carefully, since
there are potential problems.
 What are desirable properties of a
decomposition, and how to test them?
 How to obtain a decomposition with some
desirable properties?
Lecture 7
Logical Database Design (2)
2
Decomposition of a Relation
 Let R be a relation schema. A decomposition
of R, demoted by D = {R1, R2, ..., Rn}, is a set
of relation schemas such that R = R1  ... 
Rn.
 If {R1, R2, ..., Rn} is a decomposition of R and
r is an instance of R, then
r  R1(r)
R2(r)
...
Rn(r)
 Information may be lost (i.e. wrong tuples
may be added by the natural join) due to a
decomposition.
Lecture 7
Logical Database Design (2)
3
An Example of Information Loss
 Before SC SID Name Cno Grade Room
101 Bill
202 Mary
 After
SR Name Room
Bill
Mary
SR
Lecture 7
SG
326
326
SG SID Cno Grade Room
326
326
SID
101
202
101
202
1000 B
2000 A
101 1000 B
202 2000 A
Name
Bill
Mary
Mary
Bill
Cno
1000
2000
1000
2000
Grade
B
A
B
A
Logical Database Design (2)
326
326
Room
326
326
326
326
4
Lossless Join Decomposition
 Let R be a relation schema, and D = {R1, R2, ...,
Rn} be a decomposition of R. D is a lossless
(non-additive) join decomposition of R if for
every legal instance r of R, we have
r = R1(r)
R2(r)
...
Rn(r)
 Theorem: Let F be a set of FDs over R, and D
= {R1, R2} be a decomposition of R. D is a
lossless-join decomposition if and only if
R1  R2  R1 - R2 is in F+; or
R1  R2  R2 - R1 is in F+.
Lecture 7
Logical Database Design (2)
5
Lossless Join: An Example
Consider F = {B  AH, L  CAt} over
Bank-Loans(Bank, Assets, Headquarter,
Loan#, Customer, Amount).
Let D = {Banks(B,A,H), Loans(B,L,C,At)}.
Since Banks  Loans = B  AH = Banks - Loans
is in F+ (since it is already in F), D is a losslessjoin decomposition.
What if the decomposition contains more than
two relations.
Lecture 7
Logical Database Design (2)
6
Test for Lossless Join *
Algorithm TestLJ (Chase)
Input: A relation schema R(A1, …, Am), a set of FDs
F, and a decomposition D = {R1, …, Rn}.
Output: Yes, if D is a Lossless join; no, otherwise.
Method:
1. Create an n  m table T (labeled by Ai and Rj).
2. If Ri contains Aj, place aj at Ti,j. Otherwise,
place bij at Ti,j.
Lecture 7
Logical Database Design (2)
7
TestLJ (cont.) *
3. Repeat for each FD X  Y in F do
For all rows with identical symbols on X do
make the symbols on Y identical.
(choose aj over bij whenever possible)
Until no more change can be made.
4. Return yes if there is a row of aj’s.
Otherwise, return no.
Lecture 7
Logical Database Design (2)
8
TestLJ: An Example
Continue with the previous example.
 Set up the table T.
BAH
BLCAt
B A H
L C
At
a1 a2 a3 b14 b15 b16
a1 b22 b23 a4 a5 a6
 Enforce B  AH.
BAH
BLCAt
B A
a1 a2
a1 a2
H
a3
a3
L C
At
b14 b15 b16
a4 a5
a6
 Need to repeat until no more changes.
Lecture 7
Logical Database Design (2)
9
Dependency-Preserving Decomposition
 Let F be a set of FDs over R, and D = {R1, R2,
..., Rn} be a decomposition of R. D is a
dependency-preserving decomposition if
F+ = (R1(F)  R2(F)  . . .  Rn(F))+
where for i = 1, …, n
Ri(F) = { X  Y | X  Y  F and XY  Ri}.
Restrict FDs to local relations. If all “global”
FDs can be derived from “local” FDs, all
dependencies are preserved.
Lecture 7
Logical Database Design (2)
10
Dependency Preservation: An Example
Consider F = {CS  Z, Z  C} over R(City,
Street, Zipcode), and D ={R1(S, Z), R2(C, Z)}.
Then R1(F) = {} and R2(F) = {Z  C} (consider
non-trivial FDs only)
Since CS  Z  F+ but
CS  Z  (R1(F)  R2(F))+,
D is not dependency-preserving.
Lecture 7
Logical Database Design (2)
11
Test for Dependency Preservation
Algorithm TestDP
Input: A relation schema R, A set of FDs F over
R, a decomposition D = {R1, R2, ..., Rn} of R.
Output: Yes, if D is dependency-preserving; no,
otherwise.
Method:
for every X  Y  F
if  Ri such that XY  Ri
then X  Y is preserved;
Lecture 7
Logical Database Design (2)
12
TestDP (cont.)
else W := X;
repeat for i from 1 to n do
W := W  ((W  Ri)+  Ri);
until there is no change to W;
if Y  W then X  Y is preserved;
if every X  Y is preserved
then return yes;
else return no.
Derive global FDs using only local FDs.
Lecture 7
Logical Database Design (2)
13
TestDP: An example
Consider F = {A  B, B  C, C  D, D  A } over
R(A, B, C, D), & D = {R1(A,B), R2(B,C), R3(C,D)}.
Is D a dependency-preserving decomposition?
Since AB  R1, A  B is preserved.
Since BC  R2, B  C is preserved.
Since CD  R3, C  D is preserved.
Since DA is not in any one of the three relations,
we need to compute W.
Lecture 7
Logical Database Design (2)
14
TestDP: An example (cont.) *
Initialization: W = D;
first iteration:
W = D  ((D  AB)+  AB) = D;
W = D  ((D  BC)+  BC) = D;
W = D  ((D  CD)+  CD)
= D  (D+  CD)
= D  (ABCD  CD) = CD;
W changed from D to CD.
Lecture 7
Logical Database Design (2)
15
TestDP: An example (cont.) *
second iteration:
W = CD  ((CD  AB)+  AB) = CD;
W = CD  ((CD  BC)+  BC)
= CD  (C+  BC) = BCD;
W = BCD  ((BCD  CD)+  CD)
= BCD;
W changed from CD to BCD.
Lecture 7
Logical Database Design (2)
16
Normalization
 It is good to have BCNF relation schemas.
 If a relation schema is not in BCNF, then
decompose it into a set of relation schemas:
every new schema is in BCNF;
it is lossless-join (can guarantee);
it is dependency-preserving (no guarantee).
 If not possible to have all nice properties, be
happy with a lossless join, dependency
preserving 3NF decomposition (can guarantee)
Lecture 7
Logical Database Design (2)
18
Normalization to BCNF
Algorithm LLJD-BCNF
Input: R: A relation schema
F: A set of FDs satisfied by R.
Output: A lossless-join decomposition D = {R1, …,
Rn}, such that each Ri is in BCNF.
Lecture 7
Logical Database Design (2)
19
Normalization to BCNF (cont.)
Method:
D := {R};
while  Ri  D that is not in BCNF do
begin
Find an FD X  Y such that
(1) Ri is not BCNF because of X  Y, and
(2) XY  Ri;
D := D - Ri  {Ri - Y, XY}
end;
Lecture 7
Logical Database Design (2)
20
Normalization to BCNF (cont.) *
Theorem: Algorithm LLJD-BCNF is correct.
Proof (sketch):
 Every schema in D is in BCNF because the
algorithm will not stop otherwise.
 D is a lossless-join decomposition because in each
iteration, Ri is decomposed into 2 smaller
schemas (Ri - Y) and XY and they satisfy the
condition:
(Ri - Y)  XY = X  Y = (XY - (Ri - Y)).
Lecture 7
Logical Database Design (2)
21
Normalization to BCNF: An Example
Consider F = {B  AH, L  CAt} over
Bank-Loans(Bank, Assets, Headquarter, Loan#,
Customer, Amount), and a set of FDs,
Candidate keys: LB
Initialization: D = {BAHLCAt }
Lecture 7
Logical Database Design (2)
22
Normalize to BCNF: An Example *
1st iteration:
 Ri = BAHLCAt is not in BCNF because B  AH is
not a trivial FD and B is not a superkey.
 Replace BAHLCAt by BAH and BLCAt.
Hence: D = {BAH, BLCAt}.
BAH is in BCNF because in BAH, B is a
candidate key.
Lecture 7
Logical Database Design (2)
23
Normalize to BCNF: An Example *
2nd iteration:
 Ri = BLCAt is not in BCNF because L  CAt is not a
trivial FD and L is not a superkey in BLCAt.
 Replace BLCAt by CLAt and BL.
Hence, D = {BAH, CLAt, BL}.
CLAt is in BCNF because in CLAt, L is a candidate key.
BL is in BCNF (see theorem on next page).
Final result: D = {BAH, CLAt, BL}.
 D happens to be dependency-preserving.
Any relation schema with exactly two attributes
is in BCNF.
Lecture 7
Logical Database Design (2)
24
Normalize to BCNF: Another Ex. *
Consider R(City, Street, Zipcode), and
F = {CS  Z, Z  C }.
Candidate keys: CS, ZS.
Initialization: D = {CSZ};
1st iteration:
 R = CSZ is not in BCNF because Z  C is not a
trivial FD and Z is not a superkey.
 D = {ZC, ZS}.
D is not dependency-preserving because CS  Z
is not preserved.
Lecture 7
Logical Database Design (2)
25
Equivalence of FD Sets
Let F and G be two sets of FDs satisfied by R. F
and G are equivalent, denoted by F  G, if F+ =
G+.
Example: F = {B  CD, AD  E, B  A} and G =
{B  CDE, B  ABC, AD  E} are equivalent.
Check to see that every FD in F is also in G+
and that every FD in G is also in F+
Lecture 7
Logical Database Design (2)
26
Extraneous Attributes
Let F be a set of FDs.
F contains an extraneous attribute A if there
is an FD X  Y in F, such that
 either A  X, and
[F - {X  Y}  {X - {A}  Y}]  F;
 or A  Y, and
[F - {X  Y}  {X  Y - {A} }]  F.
This is a “useless” attribute either at the
left side or at the right side of an FD.
Lecture 7
Logical Database Design (2)
27
Summary
 A good schema should have three properties:
 BCNF (or 3NF if BCNF can not be obtained)
Lossless join
Dependency preserving
 Lossless join BCNF decomposition is
guaranteed, need to check for dependency
preservation
 Lossless join, dependency preserving 3NF
decomposition is guaranteed (need to find the
minimal cover)
Lecture 7
Logical Database Design (2)
28
Look Ahead
 Next topic: SQL Overview & DDL
 Read textbook:
Chapter 8, 10.1-10.6
Lecture 7
Logical Database Design (2)
29