Zvi`s changes, if any, are marked in green, they are not copyrighted

Download Report

Transcript Zvi`s changes, if any, are marked in green, they are not copyrighted

Zvi’s changes, if any, are marked in
green, they are not copyrighted by the
authors, and the authors are not
responsible for them
!Chapter 7: Relational Database Design
 Our goal in this chapter is to formalize your intuition from the
entity-relationship design approach. It’s actually kind of fun.
 If you are into the intellectual aesthetic of what we are doing,
notice that with ER, we captured notions that are common to all
database applications: 1-1, 1-many, many-to-many relationships,
ISA relationships, etc.
 When I design a database application I draw those entities and
relationships, but when I get down to the details, I use
“normalization theory” (aka database design theory) to be sure I
get this right.
Database System Concepts
7.2
©Silberschatz, Korth and Sudarshan
First Normal Form
 Domain is atomic if its elements are considered to be indivisible
units
• Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken up into
parts
 A relational schema R is in first normal form if the domains of all
attributes of R are atomic
 Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
• E.g. Set of accounts stored with each customer, and set of owners
stored with each account
• We assume all relations are in first normal form
Database System Concepts
7.3
©Silberschatz, Korth and Sudarshan
First Normal Form (Contd.)
 Atomicity is actually a property of how the elements of the domain
are used.
• E.g. Strings would normally be considered indivisible
• Suppose that students are given course numbers which are strings
of the form CS0012 or EE1127
• If the first two characters are extracted to find the department, the
domain of course numbers is not atomic.
• Doing so is a bad idea: leads to encoding of information in application
program rather than in the database.
 In practice people do it to answer many queries and that is why
SQL DML allows it, but we are in relational algebra…
 Story of a certain French bank….
Database System Concepts
7.4
©Silberschatz, Korth and Sudarshan
Pitfalls in Relational Database Design
 Relational database design requires that we find a
“good” collection of relation schemas. A bad design
may lead to
• Repetition of Information.
• Inability to represent certain information.
 Design Goals:
• Avoid redundant data
• Ensure that relationships among attributes are
represented
• Facilitate the checking of updates for violation of
database integrity constraints.
Database System Concepts
7.5
©Silberschatz, Korth and Sudarshan
A canonical example
 We first consider the relation schema R=R(E,G,S), where
• E:
Employee number
• G:
Grade (as in rank within a company)
• S:
Salary
 The users specified for us a set of semantic constraints:
• Each value of E has a single value of G associated with it.
• Each value of G has a single value of S associated with it.
 For simplicity, we will sometimes refer to relations schemes by
listing their attributes; thus we may write EGS instead of R
above.
 We will spend a lot of time discussing this example, if we
understand it, we understand more than half of normalization
theory needed in practice!
Database System Concepts
7.6
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
A sample instance
 Consider a sample instance of EGS:
• EGS:
E
a
b
g
d
G
A
B
A
C
S
1
2
1
1
 By examining the above sample instance, we will note that the
design suffers from various deficiencies.
Database System Concepts
7.7
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Deficiencies of the design
 Redundancy: The fact (or more technically, the constraint) that G = A
implies S = 1 is stored twice in the relation.
 General problems:
• Inability to store partial information: If the salary schedule is augmented by a
new grade, G = D, with corresponding salary of S = 3, this fact cannot be
stored in the relation, unless E has the value of null.
(But it is quite clear that in this relation E is the only possibly primary key,so
we cannot have “several” tuples potentially with the value of null in this
column.)
• More difficult to enforce consistent updating.
• Information may disappear: If employee b leaves, we forget what salary is
given to grade B.
Database System Concepts
7.8
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Decomposition
 The above problems can be solved by decomposing the relation
EGS into two relations:
• R1(EG)=pEG (R):
• R2(GS)=pGS(R):
SELECT E, G
SELECT G, S
FROM R;
FROM R;
 Obtaining,
• R1:
E
a
b
g
d
• R2:
G
A
B
C
Database System Concepts
G
A
B
A
C
S
1
2
1
7.9
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Reconstructing the original relation
 How to reconstruct R from R1, R2?
 It is easy to see that it is enough to compute R = R1  R2 (the
natural join of R1 and R2, written using easily displayed fonts),
that is:
SELECT E, G, S
FROM R1, R2
WHERE R1.G = R2.G;
and indeed we obtain.
•
E
a
b
g
d
G
A
B
A
C
S
1
2
1
1
 Will this be true of other decompositions?
Database System Concepts
7.10
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
A second decomposition
 Let us try decomposing EGS into EG and ES. We obtain:
• ES:
E
a
b
g
d
S
1
2
1
1
• EG:
E
a
b
g
d
G
A
B
A
C
 The natural join gives the original relation back: EG  ES = EGS.
Database System Concepts
7.11
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
A third decomposition
 Let us attempt another “reasonable'' decomposition: EGS into ES
and GS. We obtain:
• ES:
E
a
b
g
d
S
1
2
1
1
• GS:
G
A
B
C
S
1
2
1
Database System Concepts
7.12
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The natural join of the “small” relations
 Let us compute the natural join of ES and GS. We obtain:
•
E
a
a
b
g
g
d
d
G
A
C
B
A
C
A
C
S
1
1
2
1
1
1
1
 This relation is not equal to the original EGS relation
Database System Concepts
7.13
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Decompositions vs. reconstructions
 By examining the 3 decompositions we observe that:
• Some decompositions allow us to reconstruct the original relation.
• Some decompositions do not allow us to reconstruct the original
relation
• We want a decomposition that allows us to reconstruct the original
relation (otherwise we get false facts).
Database System Concepts
7.14
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Lossless join decompositions
 We say that some decomposition of a relation R into relations
R1, ..., Rm, where each Ri is the projection of R on some
attributes is a lossless join decomposition, if and only if:
• Each attribute of R appears in some Ri
• R is the natural join of R1, ..., Rm
 We will also use the term “valid'' for lossless join
Database System Concepts
7.15
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Example
 Consider the relation schema:
Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
 Redundancy:
• Data for branch-name, branch-city, assets are repeated for each loan that a
branch makes
• Wastes space
• Complicates updating, introducing possibility of inconsistency of assets value
 Null values
• Cannot store information about a branch if no loans exist
• Can use null values, but they are difficult to handle.
Database System Concepts
7.16
©Silberschatz, Korth and Sudarshan
Decomposition
 Decompose the relation schema Lending-schema into:
Branch-schema = (branch-name, branch-city,assets)
Loan-info-schema = (customer-name, loan-number,
branch-name, amount)
 All attributes of an original schema (R) must appear in
the decomposition (R1, R2):
R = R1  R2
 Lossless-join decomposition.
For all possible relations r on schema R
r = R1 (r)
R2 (r)
A very important property: If R = R1  R2, always:
r  R1 (r) R2 (r) Therefore lossless means: you do not
gain spurious tuples by joining, which seems the only
reasonable way to reconstruct the original relation.
Database System Concepts
7.17
©Silberschatz, Korth and Sudarshan
Example of Non Lossless-Join Decomposition
 Decomposition of R = (A, B)
R2 = (A)
A B
A
B
a
a
b
a
b
1
2
A(r)
B(r)
1
2
1
r
A (r)
Database System Concepts
R2 = (B)
B (r)
A
B
a
a
b
b
1
2
1
2
7.18
©Silberschatz, Korth and Sudarshan
Goal — Devise a Theory for the Following
 Decide whether a particular relation R is in “good” form.
 In the case that a relation R is not in “good” form, decompose it
into a set of relations {R1, R2, ..., Rn} such that
• each relation is in good form
• the decomposition is a lossless-join decomposition
• And dependencies are preserved (more about this later)
 Our theory is based on:
• functional dependencies
• multivalued dependencies (optional) Dennis doesn’t like these
because he has never seen them hold in practice. But look up in the
book if you like.
Database System Concepts
7.19
©Silberschatz, Korth and Sudarshan
Functional Dependencies
 Constraints on the set of legal relations.
 Require that the value for a certain set of attributes determines
uniquely the value for another set of attributes.
 A functional dependency is a generalization of the notion of a
key.
Database System Concepts
7.20
©Silberschatz, Korth and Sudarshan
Functional Dependencies (Cont.)
 Let R be a relation schema
a  R and b  R
 The functional dependency
ab
holds on R if and only if for any legal relations r(R), whenever any
two tuples t1 and t2 of r agree on the attributes a, they also agree
on the attributes b. That is,
t1[a] = t2 [a]  t1[b ] = t2 [b ]
 Example: Consider r(A,B) with the following instance of r.
1
1
3
4
5
7
 On this instance, A  B does NOT hold, but B  A does hold.
Database System Concepts
7.21
©Silberschatz, Korth and Sudarshan
An example
 Although functional dependencies are properties of a schema, and not
only of the individual instances, we will practice, and check if certain
FDs are satisfied by some sample instance:
•
A
a1
a2
a2
a1
a1
a2
•
•
•
•
•
AC
B
b1
b1
b2
b1
b2
b3
D
d1
d2
d3
d1
d2
d2
E
e1
e2
e3
e1
e4
e5
F
f1
f2
f3
f4
f5
f6
No
AB  C
Yes
E  CD
Yes
DB
No
F  ABC
Yes
Database System Concepts
C
c1
c1
c3
c1
c2
c3
7.22
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Relative power of some FDs
 An important note: A  C is always at least as powerful as AB 
C
that is
 If a relation satisfies A  C it must satisfy AB  C
 What we are really saying is that if C = f(A), then of course C =
f(A,B)
 If not obvious to you, then look at it this way.
A  C says that any two rows that agree on their A values also
agree on their C values.
AB  C says that any two rows that agree on both their A and B
values also agree on their C values.
Well, if they agree on both their A and B values, then surely they
agree on their A values, so if AC held, we would be done. The
reverse does not hold.
Database System Concepts
7.23
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Functional Dependencies (Cont.)
 K is a superkey for relation schema R if and only if K  R
 K is a candidate key for R if and only if
• K  R, and
• for no proper subset a  K, a  R
 Functional dependencies allow us to express constraints that
cannot be expressed using superkeys. Consider the schema:
Loan-info-schema = (customer-name, loan-number,
branch-name, amount).
We expect this set of functional dependencies to hold:
loan-number  amount
loan-number  branch-name
but would not expect the following to hold:
loan-number  customer-name
Database System Concepts
7.24
©Silberschatz, Korth and Sudarshan
Use of Functional Dependencies
 We use functional dependencies to:
• test relations to see if they are legal under a given set of functional
dependencies.
•
If a relation r is legal under a set F of functional dependencies, we
say that r satisfies F.
• specify constraints on the set of legal relations
• We say that F holds on R if all legal relations on R satisfy the set of
functional dependencies F.
 Note: A specific instance of a relation schema may satisfy a
functional dependency even if the functional dependency does not
hold on all legal instances. For example, a specific instance of
Loan-schema may, by chance, satisfy
loan-number  customer-name.
Database System Concepts
7.25
©Silberschatz, Korth and Sudarshan
Trivial FDs
 An FD X Y, where X and Y are sets of attributes is trivial
if and only if
Y X.
(Such FD gives no constraints, as it is always satisfied.)
 Example: AB A is trivial.
Database System Concepts
7.26
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Decomposition and union of some FDs
 An FD X  A1 A2 ... Am, where Ai's are individual attributes
is equivalent to
the set of FDs:
X  A1,
X  A2,
...,
X Am.
 Example: ABC  DE is equivalent to ABC  D and ABC  E.
Database System Concepts
7.27
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Logical implications of FDs
 It will be important to us to determine if a given set of FDs forces other
FDs to be true (necessarily implies that additional FDs must hold).
 Consider again the EGS relation. Which FDs are satisfied?
 E  G, G  S, E  S are all true in the real world.
 If the real world tells you only:
E  G and G  S
 Can you deduce on your own, without understanding the application that
E  S?
Database System Concepts
7.28
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Logical implications of FDs (cont.)
 Yes, by simple logical argument: transitivity.
 Take any (set of) tuples that are equal on E. Then given E G
we know that they are equal on G. Then given G S we know
that they are equal on S. So we have shown that E S must
hold
 We say that E G, G S logically imply E S and we write
E G, G S |= E S.
Database System Concepts
7.29
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Logical implications of FDs (cont.)
 If the real world tells you only:
• E G and E S,
 Can you deduce on your own, without understanding the
application that
• G S
 No, because of a counterexample:
•
E
a
b
G
A
A
S
1
2
 This relation satisfies E G and E S, but violates G S.
Database System Concepts
7.30
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Closure of a Set of Functional
Dependencies
 Given a set F set of functional dependencies, there are certain
other functional dependencies that are logically implied by F.
• E.g. If A  B and B  C, then we can infer that A  C
 The set of all functional dependencies logically implied by F is the
closure of F.
 We denote the closure of F by F+.
Database System Concepts
7.31
©Silberschatz, Korth and Sudarshan
Closure of Attribute Sets
 Given a set of attributes a, define the closure of a under F
(denoted by a+) as the set of attributes that are functionally
determined by a under F:
a  b is in F+  b  a+
 Algorithm to compute a+, the closure of a under F (guilt by
association approach)
result := a;
while (changes to result) do
for each b  g in F do
begin
if b  result then result := result  g
end
Database System Concepts
7.32
©Silberschatz, Korth and Sudarshan
Example of Attribute Set Closure
 R = (A, B, C, G, H, I)
 F = {A  B
AC
CG  H
CG  I
B  H}
 (AG)+
1. result = AG
2. result = ABCG
(A  C and A  B)
3. result = ABCGH
(CG  H and CG  AGBC)
4. result = ABCGHI
(CG  I and CG  AGBCH)
 Is AG a candidate key?
1. Is AG a super key?
1. Does AG  R? Or equivalently: (AG)+ = R?
2. Is any subset of AG a superkey?
1. Does A  R? or equivalently: A+ = R?
2. Does G  R? or equivalently: G+ = R?
Database System Concepts
7.33
©Silberschatz, Korth and Sudarshan
!Example (order of FDs slightly changed)
 Let R = ABCDEFGHIJ
 Given FDs:
• F  BG
• A  DE
• BD
• CF
• IJ
 Then
ABC+ = ABCDEFG
and ABC  Z (for a set of attributes Z), if and only if Z  ABCDEFG
 Therefore:
• ABC FC is true
• ABC IC is false
Database System Concepts
7.34
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
EGS again
 We can easily show that
•
{E  G, G  S}+ contains E  S
• {E  G, E  S}+ does not contain G  S
So what we did in an ad-hoc manner previously can be
converted to an algorithmic procedure.
Database System Concepts
7.35
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
 Testing for superkey:
• To test if a is a superkey, we compute a+, and check if a+ contains all
attributes of R.
 Testing functional dependencies
• To check if a functional dependency a  b holds (or, in other words,
is in F+), just check if b  a+.
• That is, we compute a+ by using attribute closure, and then check if
it contains b.
• Is a simple and cheap test, and very useful
 Computing closure of F
• For each g  R, we find the closure g+, and for each S  g+, we
output a functional dependency g  S.
Database System Concepts
7.36
©Silberschatz, Korth and Sudarshan
Keys of relations
 The notion of an FD allows us to formally define keys.
 Given R, satisfying a set of FDs, a set of attributes X of R is a
key, if and only if:
• X+ = R.
• For every Y  X such that Y X, we have Y+ R.
 Note that if R does not satisfy any (nontrivial) FDs, then R is the
only key of R.
 In our example E  G, G  S, E  S

E was the only key of EGS.
Database System Concepts
7.37
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Review of EGS
 Let us review the relation EGS. The “new relations” were:
• EG, with the key E and a non-trivial FD E  G.
• GS, with the key G and a non-trivial FD G  S.
• ES, with the key E and a non-trivial FD E  S.
 Each of those relations was “good,”
• I.e., there were no redundancies in any of them, each nontrivial FD
contained a key on the left side
 It turns out tha EG, GS is better than EG, ES as a
decomposition. Can you see why?
Database System Concepts
7.41
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
!Third Normal Form
 A relation schema R is in third normal form (3NF) if for all:
a  b in F+
at least one of the following holds:
• a  b is trivial (i.e., b  a)
• a is a superkey for R, that is check whether a+ = R
• Each attribute A in b – a is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
 We now proceed to show how to obtain third normal form with a
lossless join decomposition.
Database System Concepts
7.47
©Silberschatz, Korth and Sudarshan
Canonical Cover
 Sets of functional dependencies may have redundant
dependencies that can be inferred from the others
• Eg: A  C is redundant in: {A  B, B  C, A  C}
 Parts of a functional dependency may be redundant
{A  B, B  C, A  CD} can be simplified to
{A  B, B  C, A  D}
Note, this is really a redundant dependency issue, as we can say
that in {A  B, B  C, A  C, A  D} , A  C is redundant
• E.g. on RHS:
• E.g. on LHS:
{A  B, B  C, AC  D} can be simplified to
{A  B, B  C, A  D}
This is NOT redundant dependency issue, but unnecessarily
weak description of dependencies
 Intuitively, a canonical cover of F is a “minimal” set of functional
dependencies equivalent to F, with no redundant dependencies
or having redundant parts of dependencies
Database System Concepts
7.48
©Silberschatz, Korth and Sudarshan
Extraneous Attributes
 Consider a set F of functional dependencies and the functional
dependency a  b in F.
• Attribute A is extraneous in a if A  a
and F logically implies (F – {a  b})  {(a – A)  b}.
• Attribute A is extraneous in b if A  b
and the set of functional dependencies
(F – {a  b})  {a (b – A)} logically implies F.
 Note: implication in the opposite direction is trivial in each of
the cases above, since a “stronger” functional dependency
always implies a weaker one
 Example: Given F = {A  B, AB  C }
• B is extraneous in AB  C because A  C is implied by F
(equivalently C is in A+)
 Example: Given F = {A  C, AB  CD}
• C is extraneous in AB  CD since AB  C can be inferred even
after deleting C
Database System Concepts
7.49
©Silberschatz, Korth and Sudarshan
Testing if an Attribute is Extraneous
 Consider a set F of functional dependencies and the functional
dependency a  b in F.
• To test if attribute A  a (left side) is extraneous in a
1. compute the attribute closure ( {a} – A)+ using the dependencies
in F including a  b,
2. check that ( {a} – A)+ contains b; if it does, A is extraneous
• To test if attribute A  b (right side) is extraneous in b
1. compute a+ using only the dependencies in
F’ = (F – {a  b})  {a (b – A)},
2. check that a+ contains A; if it does, A is extraneous
Database System Concepts
7.50
©Silberschatz, Korth and Sudarshan
Canonical Cover Algorithm (IMPORTANT)
 A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc, and
• Fc logically implies all dependencies in F, and
• No functional dependency in Fc contains an extraneous attribute, and
• Each left side of functional dependency in Fc is unique.
 To compute a canonical cover for F:
repeat
Use the union rule to replace any dependencies in F
a1  b1 and a1  b2 with a1  b1 b2
Find a functional dependency a  b with an
extraneous attribute either in a or in b
If an extraneous attribute is found, delete it from a  b
until F does not change
 Note: Union rule need not be applied until the end. Dennis prefers delaying
the union because it makes it easier to spot redundant relations. But the
book uses this approach.
Database System Concepts
7.51
©Silberschatz, Korth and Sudarshan
Canonical Cover Algorithm (Dennis Preference)
 To compute a canonical cover for F:
 Convert functional dependencies so right hand side of every
functional dependency has just one attribute
 repeat
Remove redundant functional dependencies
Find extraneous attributes from the left side of functional
dependencies and remove them
until F does not change
 Apply the union rule.
 You may use either algorithm. They are close but not the same.
Database System Concepts
7.52
©Silberschatz, Korth and Sudarshan
The EmToPrHoSkLoRo relation
 The relation deals with employees who use tools on projects and work a
certain number of hours per week. An employee may work in various
locations and has a variety of skills. All employees having a certain skill
and working in a certain location meet in a specified room once a week.
 The attributes of the relation are:
• Em:
Employee
• To:
Tool
• Pr:
Project
• Ho:
Hours per week
• Sk:
Skill
• Lo:
Location
• Ro:
Room for meeting
Database System Concepts
7.53
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The FDs of the relation
 The relation satisfies the following FDs:
• Each employee uses a single tool: Em  To
• Each employee works on a single project: Em  Pr.
• Each tool can be used on a single project only: To  Pr.
• An employee uses each tool for the same number of hours each
week: EmTo  Ho.
• All the employees working in a location having a certain skill always
meet in the same room: SkLo  Ro.
• Each room is in one location only: Ro  Lo.
Database System Concepts
7.54
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Our FDs
1. Em  To
2. Em  Pr
3. To Pr
4. EmTo  Ho
5. SkLo  Ro
6. Ro  Lo
Database System Concepts
7.55
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Run the algorithm
 No RHS can be combined, so we check whether there are any
redundant attributes.
 We start with FD 1.
• We check whether we can remove To from Em  To. This is
possible if we can derive Em  To using
Em  Pr
To  Pr
EmTo  Ho
SkLo  Ro
Ro  Lo
Computing the closure of Em using the above FDs gives us only EmPr,
so the attribute To must be kept.
Database System Concepts
7.56
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Run the algorithm
• We check whether we can remove Pr (from FD2). This is possible if
we can derive Em  Pr using
Em  To
To  Pr
EmTo  Ho
SkLo  Ro
Ro  Lo
Computing the closure of Em using the above FDs gives us
EmToPrHo, so FD 2 can be removed..
Database System Concepts
7.57
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Run the algorithm
 We now have
1. Em  To
2. To  Pr
3. EmTo  Ho
4. SkLo  Ro
5. Ro  Lo
 The next one is 3.
• We check if Em can be removed from the left. This is possible if we
can derive To  Ho using all the FDs. Computing the closure of To
using the FDs gives ToPr, and therefore Em cannot be removed
• We check if To can be removed. This is possible if we can derive
Em  Ho using all the FDs. Computing the closure of Em using the
FDs gives EmToPrHo, and therefore To can be removed
Database System Concepts
7.58
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Run the algorithm
 We now have
1. Em  To
2. To  Pr
3. Em  Ho
4. SkLo  Ro
5. Ro  Lo
Database System Concepts
7.59
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Run the algorithm
 We now attempt to remove an attribute from the LHS of 4, and





an entire FD, but neither is possible. We try again, but we cannot
change anything.
Therefore we are done
To form the tables, simply combine the left hand and right hand
sides (after combining tables FDs having a common left hand
side via the union rule). Then remove any relation whose
attributes are subsets of some other relation.
In our example:
EmToHo ToPr SkLoRo
If any of these tables contains a superkey of all attributes, we
have a lossless join decomposition. That wasn’t the case so we
add EmSkLo. (Check that attribute cover gives all attributes and
this is minimal.)
Database System Concepts
7.60
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Why is it necessary to store a global key
 Why is it necessary to store the global key?
 Consider the relation: LnFnVaSa:
• Ln:
Last Name
• Fn
First Name
• Va
Vacation days per year
• Sa
Salary
 The functional dependencies are:
• Ln  Va
• Fn  Sa
 The key is
LnFn
 The relation is not in 3NF
Database System Concepts
7.61
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Why is it necessary to store . . . (cont.)
 Our algorithm (without the key being included) will produce the
decomposition
• LnVa
• FnSa
 This is not a lossless-join decomposition
• In fact we do not know who the employees are (what are the valid
pairs of LnFn)
 So we decompose
• LnVa
• FnSa
• LnFn
Database System Concepts
7.62
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
How about EGS
 Applying to algorithm to
1. E  G
2. G  S
3. E  S
 The only thing we can do is to find that in FD 3, attribute S is
redundant, and therefore we get as canonical cover:
1. E  G
2. G  S
Database System Concepts
7.63
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Decomposition of EGS
 Applying the algorithm to EGS, we get our desired
decomposition:
• EG
• GS
Database System Concepts
7.64
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Back to SQL DDL
 How are we going to express in SQL what we have learned?
 We need to express:
• keys
• functional dependencies
 Expressing keys is very easy, we use the PRIMARY KEY and
UNIQUE keywords
 Expressing functional dependencies is possible also by means of
a CHECK condition
• What we need to say for the relation SkLoRo is that each tuple
satisfies the following condition
There are no tuples in the relation with the same value of Ro and
different values of Lo
Database System Concepts
7.65
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Back to SQL DDL (cont.)
 CREATE TABLE SkLoRo
(Sk …,
Lo …,
Ro…,
PRIMARY KEY (Sk,Lo),
CHECK (NOT EXISTS SELECT *
FROM SkLoRo AS copy
WHERE (SkLoRo.Ro = copy.Ro
AND NOT SkLoRo.Lo = copy.Lo));
Clever but I’ve never seen this in practice.
Database System Concepts
7.66
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
!Design Goals
 Goal for a relational database design is:
• Third Normal Form.
• Lossless join.
• Dependency preservation.
Database System Concepts
7.67
©Silberschatz, Korth and Sudarshan
Denormalization for Performance
 May want to use non-normalized schema for performance
 E.g. displaying customer-name along with account-number and
balance requires join of account with depositor
 Alternative 1: Use denormalized relation containing attributes of
account as well as depositor with all above attributes
• faster lookup
• Extra space and extra execution time for updates
• extra coding work for programmer and possibility of error in extra code
 Alternative 2: use a materialized view defined as
account
depositor
• Benefits and drawbacks same as above, except no extra coding work
for programmer and avoids possible errors
Database System Concepts
7.68
©Silberschatz, Korth and Sudarshan
Other Design Issues
 Some aspects of database design are not caught by normalization
 Sometimes performance dictates combining several rows into one:
Instead of earnings(company-id, year, amount), use
• earnings-2000, earnings-2001, earnings-2002, etc., all on the schema
(company-id, earnings).
• Above are in third normal form (in fact a slightly stronger optional form
called BCNF), but make querying across years difficult and needs new
table each year
• company-year(company-id, earnings-2000, earnings-2001, earnings-2002)
• Also in BCNF, but also makes querying across years difficult and
requires new attribute each year.
• Is an example of a crosstab, where values for one attribute become
column names
• Used in spreadsheets, and in data analysis tools
Database System Concepts
7.69
©Silberschatz, Korth and Sudarshan
Vertical Partitioning
 Can you see any reason why
 Acctbal(acctid, bal), acctaddress(acctid, street, city, zip)
might be better than
acct(acctid, bal, street, city, zip)?
 Both are normalized
Database System Concepts
7.70
©Silberschatz, Korth and Sudarshan