A, F - COW :: Ceng
Download
Report
Transcript A, F - COW :: Ceng
Relational Normalization
Theory
4/5/2017
CENG352 Database Management Systems
1
Redundancy
• Dependencies between attributes cause
redundancy
– Eg. All addresses in the same town have the
same zip code
SSN Name Town
1234 Joe
Stony Brook
4321 Mary Stony Brook
5454 Tom Stony Brook
………………….
4/5/2017
Zip
11790
11790
11790
CENG352 Database Management Systems
Redundancy
2
Redundancy and Other Problems
• Set valued attributes in the E-R diagram result in
multiple rows in corresponding table
• Example: Person (SSN, Name, Address, Hobbies)
– A person entity with multiple hobbies yields multiple
rows in table Person
• Hence, the association between Name and Address for the
same person is stored redundantly
– SSN is key of entity set, but (SSN, Hobby) is key of
corresponding relation
• The relation Person can’t describe people without hobbies
4/5/2017
CENG352 Database Management Systems
3
Example
ER Model
SSN
Name
1111 Joe
Address
Hobby
123 Main {biking, hiking}
Relational Model
SSN
Name
Address
1111 Joe
123 Main
1111 Joe
123 Main
…………….
Hobby
biking
hiking
Redundancy
4/5/2017
CENG352 Database Management Systems
4
Anomalies
• Redundancy leads to anomalies:
– Update anomaly: A change in Address must be
made in several places
– Deletion anomaly: Suppose a person gives up
all hobbies. Do we:
• Set Hobby attribute to null? No, since Hobby is part
of key
• Delete the entire row? No, since we lose other
information in the row
– Insertion anomaly: Hobby value must be
supplied for any inserted row since Hobby is
part of key
4/5/2017
CENG352 Database Management Systems
5
Decomposition
• Solution: use two relations to store Person
information
– Person1 (SSN, Name, Address)
– Hobbies (SSN, Hobby)
• The decomposition is more general: people
with hobbies can now be described
• No update anomalies:
– Name and address stored once
– A hobby can be separately supplied or
deleted
4/5/2017
CENG352 Database Management Systems
6
Normalization Theory
• Result of E-R analysis need further
refinement
• Appropriate decomposition can solve
problems
• The underlying theory is referred to as
normalization theory and is based on
functional dependencies (and other kinds,
like multi-valued dependencies)
4/5/2017
CENG352 Database Management Systems
7
Functional Dependencies
• Definition: A functional dependency (FD) on a
relation schema R is a constraint X Y, where X
and Y are subsets of attributes of R.
• An FD X Y is satisfied in an instance r of R if
for every pair of tuples, t and s: if t and s agree on
all attributes in X then they must agree on all
attributes in Y
– Key constraint is a special kind of functional
dependency: all attributes of relation occur on the
right-hand side of the FD:
• SSN SSN, Name, Address
4/5/2017
CENG352 Database Management Systems
8
Functional Dependencies
(Examples)
• Address ZipCode
– Stony Brook’s ZIP is 11733
• ArtistName BirthYear
– Picasso was born in 1881
• Autobrand Manufacturer, Engine type
– Pontiac is built by General Motors with gasoline engine
• Author, Title PublDate
– Shakespeare’s Hamlet published in 1600
4/5/2017
CENG352 Database Management Systems
9
Functional Dependency - Example
• Brokerage firm allows multiple clients to share an
account, but each account is managed from a single
office and a client can have no more than one account in
an office
– HasAccount (AcctNum, ClientId, OfficeId)
• keys are (ClientId, OfficeId), (AcctNum, ClientId)
– Client, OfficeId AcctNum
– AcctNum OfficeId
• Thus, attribute values need not depend only on key values
Write an SQL assertion that prevents a violation of this
constraint.
4/5/2017
CENG352 Database Management Systems
10
Entailment, Closure, Equivalence
• Definition: If F is a set of FDs on schema R and f is
another FD on R, then F entails f if every instance r of
R that satisfies every FD in F also satisfies f
– Ex: F = {A B, B C} and f is A C
• If Streetaddr Town and Town Zip then Streetaddr Zip
• Definition: The closure of F, denoted F+, is the set of
all FDs entailed by F
• Definition: F and G are equivalent if F entails G and G
entails F
4/5/2017
CENG352 Database Management Systems
11
Entailment (cont’d)
• Closure, entailment, and equivalence are semantic
concepts – defined in terms of the actual relations in the
“real world.”
– They define what these notions are, not how to compute them
• How to check if F entails f or if F and G are equivalent?
Solution: find algorithmic, syntactic ways to compute these
notions
• Important: The syntactic solution must be “correct” with respect to the
semantic definitions
• Correctness has two aspects: soundness and completeness
4/5/2017
CENG352 Database Management Systems
12
Armstrong’s Axioms for FDs
• This is the syntactic way of computing/testing
the various properties of FDs
• Reflexivity: If Y X then X Y (trivial FD)
– Name, Address Name
• Augmentation: If X Y then X Z YZ
– If Town Zip then Town, Name Zip, Name
• Transitivity: If X Y and Y Z then X Z
4/5/2017
CENG352 Database Management Systems
13
Soundness
• Axioms are sound: If an FD f: X Y can be derived
from a set of FDs F using the axioms, then f holds in
every relation that satisfies every FD in F.
• Example: Given X Y and X Z then
X XY
YX YZ
X YZ
Augmentation by X
Augmentation by Y
Transitivity
– Thus, X Y Z is satisfied in every relation where both X Y
and X Y are satisfied
• Therefore, we have derived the union rule for FDs: we can take the
union of the RHSs of FDs that have the same LHS
4/5/2017
CENG352 Database Management Systems
14
Derived inference rules
• Union: if XY and XZ, then XYZ.
• Decomposition: if XYZ, then XY and XZ.
• Pseudotransitivity: if XY and WYZ, then
WXZ.
• These additional rules are not essential; their soundness
can be proved using Armstrong’s Axioms.
• Exercise: Prove rules Decomposition and
Pseudotransitivity using A.A.
4/5/2017
CENG352 Database Management Systems
15
Completeness
• Axioms are complete: If F entails f , then f
can be derived from F using the axioms
• A consequence of completeness is the
following (naïve) algorithm to determining
if F entails f:
– Algorithm: Use the axioms in all possible ways
to generate F+ (the set of possible FD’s is finite
so this can be done) and see if f is in F+
4/5/2017
CENG352 Database Management Systems
16
Generating
+
F
F
AB C
union AB BCD
decomp
aug
trans AB BCDE
A D
AB BD
AB CDE
D E
aug
BCD BCDE
Thus, AB BD, AB BCD, AB BCDE, and AB CDE
are all elements of F+
4/5/2017
CENG352 Database Management Systems
17
Attribute Closure
• Calculating attribute closure leads to a more
efficient way of checking entailment
• The attribute closure of a set of attributes, X,
with respect to a set of functional dependencies,
F, (denoted X+F) is the set of all attributes, A,
such that X A
– X +F1 is not necessarily the same as X +F2 if F1 F2
• Attribute closure and entailment:
– Algorithm: Given a set of FDs, F, then X Y if and
only if X+F Y
4/5/2017
CENG352 Database Management Systems
18
Example - Computing Attribute Closure
XF+
X
F: AB C
AD
DE
AC B
A
AB
{A, D, E}
{A, B, C, D, E}
(Hence AB is a key)
Is AB E entailed by F?
Yes
Is D C entailed by F?
No
B
D
{B}
{D, E}
Result: XF+ allows us to determine FDs
of the form X Y entailed by F
4/5/2017
CENG352 Database Management Systems
19
Computation of Attribute Closure X+F
closure := X;
// since X X+F
repeat
old := closure;
if there is an FD Z V in F such that
Z closure and V closure
then closure := closure V
until old = closure
– If T closure then X T is entailed by F
4/5/2017
CENG352 Database Management Systems
20
Example: Computation of Attribute
Closure
Problem: Compute the attribute closure of AB with
respect to the set of FDs : AB C (a)
AD
(b)
D E (c)
AC B (d)
Solution:
Initially closure = {AB}
Using (a) closure = {ABC}
Using (b) closure = {ABCD}
Using (c) closure = {ABCDE}
4/5/2017
CENG352 Database Management Systems
21
Normal Forms
• Each normal form is a set of conditions on a schema
that guarantees certain properties (relating to
redundancy and update anomalies)
• First normal form (1NF) is the same as the definition
of relational model (relations = sets of tuples; each
tuple = sequence of atomic values)
• Second normal form (2NF) – a research lab accident;
has no practical or theoretical value – won’t discuss
• The two commonly used normal forms are third
normal form (3NF) and Boyce-Codd normal form
(BCNF)
4/5/2017
CENG352 Database Management Systems
22
BCNF
• Definition: A relation schema R is in BCNF if
for every FD X Y associated with R either
– Y X (i.e., the FD is trivial) or
– X is a superkey of R
• Example: Person1(SSN, Name, Address)
– The only FD is SSN Name, Address
– Since SSN is a key, Person1 is in BCNF
4/5/2017
CENG352 Database Management Systems
23
(non) BCNF Examples
• Person (SSN, Name, Address, Hobby)
– The FD SSN Name, Address does not satisfy
requirements of BCNF
• since the key is (SSN, Hobby)
• HasAccount (AccountNumber, ClientId, OfficeId)
– The FD AcctNum OfficeId does not satisfy BCNF
requirements
• since keys are (ClientId, OfficeId) and (AcctNum, ClientId)
4/5/2017
CENG352 Database Management Systems
24
Redundancy
• Suppose R has a FD A B. If an instance has 2 rows with
same value in A, they must also have same value in B (=>
redundancy, if the A-value repeats twice)
SSN
redundancy
SSN
Name
1111 Joe
1111 Joe
Name, Address
Address
Hobby
123 Main stamps
123 Main coins
• If A is a superkey, there cannot be two rows with same
value of A
– Hence, BCNF eliminates redundancy
4/5/2017
CENG352 Database Management Systems
25
Third Normal Form
• A relational schema R is in 3NF if for
every FD X Y associated with R either:
– Y X (i.e., the FD is trivial); or
– X is a superkey of R; or
– Every A Y is part of some key of R
• 3NF is weaker than BCNF (every schema
that is in BCNF is also in 3NF)
4/5/2017
CENG352 Database Management Systems
BCNF
conditions
26
3NF Example
• HasAccount (AcctNum, ClientId, OfficeId)
– ClientId, OfficeId AcctNum
• OK since LHS contains a key
– AcctNum OfficeId
• OK since RHS is part of a key
• HasAccount is in 3NF but it might still contain
redundant information due to AcctNum OfficeId
(which is not allowed by BCNF)
4/5/2017
CENG352 Database Management Systems
27
3NF Example
• HasAccount might store redundant data:
ClientId
1111
2222
3333
OfficeId
Stony Brook
Stony Brook
Stony Brook
AcctNum
28315
28315
28315
3NF: OfficeId part of key
FD: AcctNum OfficeId
redundancy
• Decompose to eliminate redundancy:
ClientId
AcctNum
1111
2222
3333
28315
28315
28315
BCNF (only trivial FDs)
4/5/2017
OfficeId
Stony Brook
AcctNum
28315
BCNF: AcctNum is key
FD: AcctNum OfficeId
CENG352 Database Management Systems
28
(Non) 3NF Example
• Person (SSN, Name, Address, Hobby)
– (SSN, Hobby) is the only key.
– SSN Name violates 3NF conditions
since Name is not part of a key and SSN
is not a superkey
4/5/2017
CENG352 Database Management Systems
29
Decompositions
• Goal: Eliminate redundancy by
decomposing a relation into several
relations in a higher normal form
• Decomposition must be lossless: it must be
possible to reconstruct the original relation
from the relations in the decomposition
• We will see why
4/5/2017
CENG352 Database Management Systems
30
Decomposition
• Schema R = (R, F)
– R is a set of attributes
– F is a set of functional dependencies over R
• The decomposition of schema R is a collection of
schemas Ri = (Ri, Fi) where
– R = i Ri for all i (no new attributes)
– Fi is a set of functional dependences involving only
attributes of Ri
– F entails Fi for all i (no new FDs)
• The decomposition of an instance, r, of R is a set
of relations ri = Ri(r) for all i
4/5/2017
CENG352 Database Management Systems
31
Example Decomposition
Schema (R, F) where
R = {SSN, Name, Address, Hobby}
F = {SSN Name, Address}
can be decomposed into
R1 = {SSN, Name, Address}
F1 = {SSN Name, Address}
and
R2 = {SSN, Hobby}
F2 = { }
4/5/2017
CENG352 Database Management Systems
32
Lossless Schema Decomposition
• A decomposition should not lose information
• A decomposition (R1,…,Rn) of a schema, R, is
lossless if every valid instance, r, of R can be
reconstructed from its components:
r = r1
r2
……
rn
where each ri = Ri(r)
4/5/2017
CENG352 Database Management Systems
33
Lossy Decomposition
The following is always the case (Think why?):
r r1
r2
rn
...
But the following is not always true:
r r1
Example:
SSN
r2
...
r
rn
r1
r2
Name
Address
SSN Name
Name
Address
1111 Joe
2222 Alice
3333 Alice
1 Pine
2 Oak
3 Pine
1111 Joe
2222 Alice
3333 Alice
Joe
Alice
Alice
1 Pine
2 Oak
3 Pine
The tuples (2222, Alice, 3 Pine) and (3333, Alice, 2 Oak) are in the join,
but not in the original
4/5/2017
CENG352 Database Management Systems
34
Lossy Decompositions:
What is Actually Lost?
• In the previous example, the tuples (2222, Alice, 3
Pine) and (3333, Alice, 2 Oak) were gained, not lost!
– Why do we say that the decomposition was lossy?
• What was lost is information:
– That 2222 lives at 2 Oak: In the decomposition, 2222 can
live at either 2 Oak or 3 Pine
– That 3333 lives at 3 Pine: In the decomposition, 3333 can
live at either 2 Oak or 3 Pine
4/5/2017
CENG352 Database Management Systems
35
Testing for Losslessness
• A (binary) decomposition of R = (R, F)
into R1 = (R1, F1) and R2 = (R2, F2) is
lossless if and only if :
– either the FD
• (R1 R2 ) R1 is in F+
– or the FD
• (R1 R2 ) R2 is in F+
4/5/2017
CENG352 Database Management Systems
36
Example
Schema (R, F) where
R = {SSN, Name, Address, Hobby}
F = {SSN Name, Address}
can be decomposed into
R1 = {SSN, Name, Address}
F1 = {SSN Name, Address}
and
R2 = {SSN, Hobby}
F2 = { }
Since R1 R2 = SSN and SSN R1 the
decomposition is lossless
4/5/2017
CENG352 Database Management Systems
37
Intuition Behind the Test for Losslessness
• Suppose R1 R2 R2 . Then a row of r1
can combine with exactly one row of r2 in
the natural join (since in r2 a particular set
of values for the attributes in R1 R2
defines a unique row)
R1 R2
…………. a
………… a
………… b
………… c
r1
4/5/2017
R1 R2
a ………...
b ………….
c ………….
r2
CENG352 Database Management Systems
38
Proof of Lossless Condition
• r r1
r2
– this is true for any decomposition
r2
• r r1
If R1 R2 R2 then
r2) = card (r1)
card (r1
(since each row of r1 joins with exactly one row of r2)
But card (r) card (r1) (since r1 is a projection of
and therefore card (r) card (r1
r 2)
Hence r = r1
4/5/2017
r2
CENG352 Database Management Systems
39
r)
Dependency Preservation
• Consider a decomposition of R = (R, F) into R1 = (R1,
F1) and R2 = (R2, F2)
– An FD X Y of F is in Fi iff X Y Ri
– An FD, f F may be in neither F1, nor F2, nor even
(F1 F2)+
• Checking that f is true in r1 or r2 is (relatively) easy
• Checking f in r1 r2 is harder – requires a join
• Ideally: want to check FDs locally, in r1 and r2, and have
a guarantee that every f F holds in r1
r2
• The decomposition is dependency preserving iff the sets
F and F1 F2 are equivalent: F+ = (F1 F2)+
– Then checking all FDs in F, as r1 and r2 are updated, can be
done by checking F1 in r1 and F2 in r2
4/5/2017
CENG352 Database Management Systems
40
Dependency Preservation
• If f is an FD in F, but f is not in F1 F2,
there are two possibilities:
– f (F1 F2)+
• If the constraints in F1 and F2 are maintained, f
will be maintained automatically.
– f (F1 F2)+
• f can be checked only by first taking the join of r1
and r2. This is costly.
4/5/2017
CENG352 Database Management Systems
41
Example
Schema (R, F) where
R = {SSN, Name, Address, Hobby}
F = {SSN Name, Address}
can be decomposed into
R1 = {SSN, Name, Address}
F1 = {SSN Name, Address}
and
R2 = {SSN, Hobby}
F2 = { }
Since F = F1 F2 the decomposition is
dependency preserving
4/5/2017
CENG352 Database Management Systems
42
Example
• Schema: R= (ABC; F) , F = {A B, B C, C B}
• Decomposition:
– (AC, F1), F1 = {AC}
• Note: AC F, but in F+
– (BC, F2), F2 = {B C, C B}
• A B (F1 F2), but A B (F1 F2)+.
– So F+ = (F1 F2)+ and thus the decompositions is still
dependency preserving
4/5/2017
CENG352 Database Management Systems
43
Example
• HasAccount (AccountNumber, ClientId, OfficeId)
f1: AccountNumber OfficeId
f2: ClientId, OfficeId AccountNumber
• Decomposition:
AcctOffice = (AccountNumber, OfficeId; {AccountNumber OfficeId})
AcctClient = (AccountNumber, ClientId; {})
• Decomposition is lossless: R1 R2= {AccountNumber} and
AccountNumber OfficeId
• In BCNF
• Not dependency preserving: f2 (F1 F2)+
• HasAccount does not have BCNF decompositions that are both
lossless and dependency preserving! (Check, eg, by enumeration)
• Hence: BCNF+lossless+dependency preserving decompositions
are not always achievable!
4/5/2017
CENG352 Database Management Systems
44
BCNF Decomposition Algorithm
Input: R = (R; F)
Decomp := R
while there is S = (S; F’) Decomp and S not in BCNF do
Find X Y F’ that violates BCNF // X isn’t a superkey in S
Replace S in Decomp with S1 = (XY; F1), S2 = (S - (Y - X); F2)
// F1 = all FDs of F’ involving only attributes of XY
// F2 = all FDs of F’ involving only attributes of S - (Y - X)
end
return Decomp
4/5/2017
CENG352 Database Management Systems
45
Example
Given: R = (R; T) where R = ABCDEFGH and
T = {ABH C, A DE, BGH F, F ADH, BH GE}
step 1: Find a FD that violates BCNF
Not ABH C since (ABH)+ includes all attributes
(BH is a key)
A DE violates BCNF since A is not a superkey (A+ =ADE)
step 2: Split R into:
R1 = (ADE, {A DE })
R2 = (ABCFGH; {ABH C, BGH F, F AH , BH G})
Note 1: R1 is in BCNF
Note 2: Decomposition is lossless since A is a key of R1.
Note 3: FDs F D and BH E are not in T1 or T2. But
both can be derived from T1 T2
(E.g., F A and A D implies F D)
Hence, decomposition is dependency preserving.
4/5/2017
CENG352 Database Management Systems
46
Example (con’t)
Given: R2 = (ABCFGH; {ABHC, BGHF, FAH, BHG})
step 1: Find a FD that violates BCNF.
Not ABH C or BGH F, since BH is a key of R2
F AH violates BCNF since F is not a superkey (F+ =AH)
step 2: Split R2 into:
R21 = (FAH, {F AH})
R22 = (BCFG; {})
Note 1: Both R21 and R22 are in BCNF.
Note 2: The decomposition is lossless (since F is a key of R21)
Note 3: FDs ABH C, BGH F, BH G are not in T21
or T22 , and they can’t be derived from T1 T21 T22 .
Hence the decomposition is not dependency-preserving
4/5/2017
CENG352 Database Management Systems
47
Properties of BCNF Decomposition
Algorithm
• A BCNF decomposition is not necessarily
dependency preserving
• But always lossless
• BCNF+lossless+dependency preserving is
sometimes unachievable (recall HasAccount)
4/5/2017
CENG352 Database Management Systems
48
Third Normal Form
• Compromise – Not all redundancy
removed, but dependency preserving
decompositions are always possible (and, of
course, lossless)
• 3NF decomposition is based on a minimal
cover
4/5/2017
CENG352 Database Management Systems
49
Minimal Cover
• A minimal cover of a set of dependencies, T, is a set of
dependencies, U, such that:
– U is equivalent to T (T+ = U+)
– All FDs in U have the form X A where A is a single
attribute
– It is not possible to make U smaller (while preserving
equivalence) by
• Deleting an FD
• Deleting an attribute from an FD (either from LHS or RHS)
– FDs and attributes that can be deleted in this way are called
redundant
4/5/2017
CENG352 Database Management Systems
50
Computing Minimal Cover
• Example: T = {ABH CK, A D, C E,
BGH F, F AD, E F, BH E}
• step 1: Make RHS of each FD into a single attribute
– Algorithm: Use the decomposition inference rule for FDs
– Example: F AD replaced by F A, F D ; ABH CK by
ABH C, ABH K
• step 2: Eliminate redundant attributes from LHS.
– Algorithm: If FD XB A T (where B is a single attribute)
and X A is entailed by T, then B was unnecessary
– Example: Can an attribute be deleted from ABH C ?
• Compute AB+T, AH+T, BH+T.
• Since C (BH)+T , BH C is entailed by T and A is redundant in
ABH C.
4/5/2017
CENG352 Database Management Systems
51
Computing Minimal Cover (con’t)
• step 3: Delete redundant FDs from T
– Algorithm: If T – {f} entails f, then f is redundant
• If f is X A then check if A X+T-{f}
– Example: BH F is entailed by E F, BH E, so
it is redundant
• Note: The order of steps 2 and 3 cannot be
interchanged!!
4/5/2017
CENG352 Database Management Systems
52
3NF Decomposition
Starting with a schema R = (R, T)
• step 1: Compute a minimal cover, U, of T. The
decomposition is based on U, but since U+ = T+
the same functional dependencies will hold
– A minimal cover for
T={ABHCK, AD, CE, BGHF, FAD,
E F, BH E}
is
U={BHC, BHK, AD, CE, FA, EF}
4/5/2017
CENG352 Database Management Systems
53
3NF Decomposition (con’t)
• step 2: Partition U into sets U1, U2, … Un
such that the LHS of all elements of Ui are the
same
– U1 = {BH C, BH K}, U2 = {A D},
U3 = {C E}, U4 = {F A}, U5 = {E F}
4/5/2017
CENG352 Database Management Systems
54
3NF Decomposition (con’t)
• step 3: For each Ui form schema Ri = (Ri, Ui),
where Ri is the set of all attributes mentioned in Ui
– Each FD of U will be in some Ri. Hence the
decomposition is dependency preserving
– R1 = (BHCK; BH C, BH K),
R2 = (AD; A D),
R3 = (CE; C E),
R4 = (FA; F A),
R5 = (EF; E F)
4/5/2017
CENG352 Database Management Systems
55
3NF Decomposition (con’t)
• step 4: If no Ri is a superkey of R, add schema R0 =
(R0,{}) where R0 is a key of R.
– R0 = (BGH, {})
• R0 might be needed when not all attributes are necessarily
contained in R1R2 …Rn
– A missing attribute, A, must be part of all keys
(since it’s not in any FD of U, deriving a key constraint from U
involves the augmentation axiom)
• R0 might be needed even if all attributes are accounted for in
R1R2 …Rn
– Example: (ABCD; {AB, CD}). Step 3 decomposition:
R1 = (AB; {AB}), R2 = (CD; {CD}). Lossy! Need to add
(AC; { }), for losslessness
– Step 4 guarantees lossless decomposition.
4/5/2017
CENG352 Database Management Systems
56
BCNF Design Strategy
• The resulting decomposition, R0, R1, … Rn , is
– Dependency preserving (since every FD in U is a FD of
some schema)
– Lossless (although this is not obvious)
– In 3NF (although this is not obvious)
• Strategy for decomposing a relation
– Use 3NF decomposition first to get lossless,
dependency preserving decomposition
– If any resulting schema is not in BCNF, split it using
the BCNF algorithm (but this may yield a nondependency preserving result)
4/5/2017
CENG352 Database Management Systems
57
Multivalued Dependencies
Fourth Normal Form
Reasoning About FD’s + MVD’s
58
Definition of MVD
• A multivalued dependency (MVD) on R,
X ->->Y , says that if two tuples of R
agree on all the attributes of X, then their
components in Y may be swapped, and the
result will be two tuples that are also in the
relation.
• i.e., for each value of X, the values of Y
are independent of the values of R-X-Y.
59
Example: MVD
Drinkers(name, addr, phones, beersLiked)
• A drinker’s phones are independent of the
beers they like.
– name->->phones and name ->->beersLiked.
• Thus, each of a drinker’s phones appears with
each of the beers they like in all
combinations.
• This repetition is unlike FD redundancy.
– name->addr is the only FD.
60
Tuples Implied by name->->phones
If we have tuples:
name
sue
sue
sue
sue
addr
a
a
a
a
phones
p1
p2
p2
p1
beersLiked
b1
b2
b1
b2
Then these tuples must also be in the relation.
61
MVD Rules
• Every FD is an MVD (promotion ).
– If X ->Y, then swapping Y ’s between two tuples that
agree on X doesn’t change the tuples.
– Therefore, the “new” tuples are surely in the
relation, and we know X ->->Y.
• Complementation : If X ->->Y, and Z is all the
other attributes, then X ->->Z.
62
Splitting Doesn’t Hold
• Like FD’s, we cannot generally split the left
side of an MVD.
• But unlike FD’s, we cannot split the right
side either --- sometimes you have to leave
several attributes on the right side.
63
Example: Multiattribute Right Sides
Drinkers(name, areaCode, phone, beersLiked,
manf)
• A drinker can have several phones, with the
number divided between areaCode and
phone (last 7 digits).
• A drinker can like several beers, each with
its own manufacturer.
64
Example Continued
• Since the areaCode-phone combinations for
a drinker are independent of the beersLikedmanf combinations, we expect that the
following MVD’s hold:
name ->-> areaCode phone
name ->-> beersLiked manf
65
Example Data
Here is possible data satisfying these MVD’s:
name
Sue
Sue
Sue
Sue
areaCode
650
650
415
415
phone
555-1111
555-1111
555-9999
555-9999
beersLiked
Bud
WickedAle
Bud
WickedAle
manf
A.B.
Pete’s
A.B.
Pete’s
But we cannot swap area codes or phones by themselves.
That is, neither name->->areaCode nor name->->phone
holds for this relation.
66
Fourth Normal Form
• The redundancy that comes from MVD’s is
not removable by putting the database
schema in BCNF.
• There is a stronger normal form, called
4NF, that (intuitively) treats MVD’s as
FD’s when it comes to decomposition, but
not when determining keys of the relation.
67
4NF Definition
•
A relation R is in 4NF if: whenever
X ->->Y is a nontrivial MVD, then X
is a superkey.
–
Nontrivial MVD means that:
1. Y is not a subset of X, and
2. X and Y are not, together, all the attributes.
–
Note that the definition of “superkey”
still depends on FD’s only.
68
BCNF Versus 4NF
• Remember that every FD X ->Y is also an
MVD, X ->->Y.
• Thus, if R is in 4NF, it is certainly in
BCNF.
– Because any BCNF violation is a 4NF violation
(after conversion to an MVD).
• But R could be in BCNF and not 4NF,
because MVD’s are “invisible” to BCNF.
69
Decomposition and 4NF
•
If X ->->Y is a 4NF violation for relation
R, we can decompose R using the same
technique as for BCNF.
1. XY is one of the decomposed relations.
2. All but Y – X is the other.
70
Example: 4NF Decomposition
Drinkers(name, addr, phones, beersLiked)
FD:
name -> addr
MVD’s: name ->-> phones
name ->-> beersLiked
• Key is {name, phones, beersLiked}.
• All dependencies violate 4NF.
71
Example Continued
• Decompose using name -> addr:
1. Drinkers1(name, addr)
In 4NF; only dependency is name -> addr.
2. Drinkers2(name, phones, beersLiked)
Not in 4NF. MVD’s name ->-> phones and
name ->-> beersLiked apply. No FD’s, so all
three attributes form the key.
72
Example: Decompose Drinkers2
• Either MVD name ->-> phones or name ->> beersLiked tells us to decompose to:
– Drinkers3(name, phones)
– Drinkers4(name, beersLiked)
73
Normalization Drawbacks
• By limiting redundancy, normalization helps
maintain consistency and saves space
• But performance of querying can suffer because
related information that was stored in a single
relation is now distributed among several
• Example: A join is required to get the names and
grades of all students taking CS305 in S2002.
SELECT S.Name, T.Grade
FROM Student S, Transcript T
WHERE S.Id = T.StudId AND
T.CrsCode = ‘CS305’
4/5/2017
AND
T.Semester = ‘S2002’
CENG352 Database Management Systems
74
Denormalization
• Tradeoff: Judiciously introduce redundancy to improve
performance of certain queries
• Example: Add attribute Name to Transcript
SELECT T.Name, T.Grade
FROM Transcript’ T
WHERE T.CrsCode = ‘CS305’
AND
T.Semester = ‘S2002’
– Join is avoided
– If queries are asked more frequently than Transcript
is modified, added redundancy might improve
average performance
– But, Transcript’ is no longer in BCNF since key is
(StudId, CrsCode, Semester) and StudId Name
4/5/2017
CENG352 Database Management Systems
75
Exercises
1) Consider the following table.
• Give an example of update anomaly, an example of deletion
anomaly and an example of insertion anomaly knowing that
– A product has many suppliers and can have many other products as a
substitute (i.e. a product can be replaced by its substitute).
– The purchase price is determined by a supplier for a product, while
the sale price is for a given product regardless of the supplier.
– The quantity is for a given product, again regardless of the supplier.
76
Solution
Update Anomaly:
- Changing the quantity of a product implies updating the
quantity for as many suppliers and substitutes there is for
the product.
Deletion Anomaly:
- By deleting the only substitute of a product, the whole
product entry needs to be removed.
Insertion Anomaly:
- We can’t add a substitute of a product if we do not know the
supplier of the product.
77
Exercises (cont.)
2) Give a schema of a decomposition that
avoids such anomalies.
• Solution:
Product(ProductID, Quantity, SalePrice)
Suppliers( ProductID, SupplierID, PurchasePrice)
Substitutes( ProductID, Substitute)
78
3) Assume we have a relation schema R= (player, salary, team, city).
An example relation instance:
player
salary
Jeter
15,600,000
Garciaparra 10,500,000
team
Yankees New
Red Sox
city
York
Boston
We expect the following functional dependencies to hold:
•
•
•
•
player →salary,
player → team, team →city
Argue that R is currently not in BCNF.
Decompose R into BCNF. Argue that the decomposition is lossless-join, and
that it preserves dependencies.
Find an alternative decomposition of R into BCNF which is still lossless-join,
but which not preserve dependencies. (State which dependency it does not
preserve.)
Show, by means of an example, that a decomposition into (player, salary, city)
and (team, city) is not lossless-join.
79