Relational Database Design - State University of New York

Download Report

Transcript Relational Database Design - State University of New York

Relational Database Design III
Database Principles
Normalization:
• A table is good if it is about one thing.
• A table is good if the only FDs are of the form
key  attribute
• Consider the table below with the following FDs. We
know that its CKs are {A,F}, {B,F} and {C,F}.
R
A B C
D
B, E -> C
A, F -> B
C -> A, D
B -> E
E F
• So R is a bad table with several bad FDs.
Database Principles
bad
good
bad
bad
Exercise:
• Label the FDs of the following table as good or bad. The
only CK is {B}.
R
A B C
D
E F
D -> C, A
B -> F
F -> C, E
B -> D
Database Principles
bad
good
bad
good
Exercise:
• Label the FDs of the following table as good or bad. The
CKs are {B,E}, {D,E} and {F,E}.
R
A B C
D
E F
A, B -> C
D, E -> B
F -> D
B, E -> F
D -> A
Database Principles
bad
good
bad
good
bad
Prime and Non-prime Attributes
• Attributes that belong to some CK are called prime.
• Other attributes are called non-prime.
• The CKs of R are {A,F}, {B,F} and {C,F}.
R
A B C
D
B, E -> C
A, F -> B
C -> A, D
B -> E
E F
• Prime Attributes: A, B, C, F
• Non-prime Attributes: D, E
Database Principles
Prime and Non-prime Attributes
• The CK of R is {B}.
R
A B C
D
D -> C, A
B -> F
F -> C, E
B -> D
E F
• Prime Attributes: B
• Non-prime Attributes: A, C, D, E, F
Database Principles
Prime and Non-prime Attributes
• The CKs are {B,E}, {D,E} and {F,E}.
R
A B C
D
A, B -> C
D, E -> B
F -> D
B, E -> F
D -> A
E F
• Prime Attributes: B, D, F, E
• Non-prime Attributes: A, C
Database Principles
The Normal Forms:
• The process of turning bad tables into good ones is
called normalization.
• The process of taking good tables and combining them
in such a way as to create a bad table (typically done for
the sake of efficiency) is called denormalization.
unnormalized
tables
BCNF
4NF
...
3NF
Database Principles
2NF
1NF
Basic Idea:
• Start with a table in one of the outer rings of the diagram
and decompose it into two tables that belong to an inner
ring.
BCNF
4NF
...
3NF
2NF
1NF
• NOTE: The closer to the center the “better” the tables.
Database Principles
Decomposing Tables:
• Decomposing a table means breaking it up into two or
more tables using the relational algebra project operator.
R
A
B
C
D
E
S
A
T
B
C
S = π A,B,C ( R )
C
D
E
T = π C,D,E ( R )
Exercise: Write these relational algebra operations in SQL.
Database Principles
Beware: Not all Table Decompositions are Safe:
• Consider the table
Student
SID
SName
DOB
Major
s1
s2
s3
s4
John
Mary
Alfredo
Tracey
1982
1983
1975
1982
Hist
Eng
Math
CS
• Suppose we decompose the table into two tables
S = π sid,sname,dob ( Student )
SID
s1
s2
s3
s4
T = π dob,major ( Student )
SName
DOB
DOB
John
Mary
Alfredo
Tracey
1982
1983
1975
1982
1982
1983
1975
1982
Database Principles
Major
Hist
Eng
Math
CS
Question:
• Do S and T contain the same information as Student?
• Examples:
What is Alfredo’s major?
Math
What year was Mary born in? 1983
What is John’s major?
History or Computer Science
So you can see that S and T do NOT contain the same
info as Student.
• The easy test is to try to recreate Student from S and T
using JOIN.
Database Principles
Answer:
• Join(S,T)
SID
SName
DOB
Major
s1
s1
s2
s3
s4
s4
John
John
Mary
Alfredo
Tracey
Tracey
1982
1982
1983
1975
1982
1982
Hist
CS
Eng
Math
Hist
CS
• We notice that the join table contains noise rows.
• This means we can not reconstruct Student exactly so
we have lost info using this decomposition.
• It is called a lossy-join decomposition but we might call it
a noisy-join decomposition.
Database Principles
Lossless-Join Decomposition
• A decomposition of a table R into two tables, S and T, is
called a lossless-join decomposition if R = join(S,T), with
no new rows and no rows lost.
• We might call this a noiseless-join decomposition.
• If a decomposition of R into two tables, S and T, is a
lossless-join decomposition then no information found in
the original table R is lost.
Database Principles
Main Theorem:
•
If R is a table and we decompose R into two tables, S
and T, such that
– union(S ,T)= R and
intersect( S , T ) is a key of either S or T
then this is a lossless-join decomposition of R and
R = join(S,T).
NOTE: What makes our original decomposition a lossy-join
∩ S , T ) is
decomposition is that {DOB} = intersect(
neither a key to S nor to T.
Database Principles
Theorem Proof:
• If R is a table and we decompose R into two tables, S
and T, such that
• union(S ,T) = R and
• intersect( S , T ) is a key of T.
Let |R| denote the number of rows in R.
|S| <= |R| and |T| <= |R| since both S and T are
produced by projecting over R.
R subset_of join(S,T) so |R| <= |join(S,T)|
Since intersect( S , T ) is a key to T, each value of
intersect( S , T ) appears once and only once in T. Thus
each row in S will join to only one row of T and so
|join(S,T)| <= |S|.
Since |S| <= |R| we conclude |join(S,T)| == |R| and so
the two tables Database
are equal.
Principles
Lossless-join Example:
• Consider the table
Student
SID
SName
DOB
Major
s1
s2
s3
s4
John
Mary
Alfredo
Tracey
1982
1983
1975
1982
Hist
Eng
Math
CS
• Suppose we decompose the table into two tables
S = π sid,sname,dob ( Student )
SID
s1
s2
s3
s4
T = π sid,dob,major ( Student )
SName
DOB
SID
DOB
Major
John
Mary
Alfredo
Tracey
1982
1983
1975
1982
s1
s2
s3
s4
1982
1983
1975
1982
Hist
Eng
Math
CS
Database Principles
Question:
• Do S and T contain the same information as Student?
• Examples:
What is Alfredo’s major?
Math
What year was Mary born in? 1983
What is John’s major?
History
So you can see that S and T do contain the same info as
Student.
• Try to recreate Student from S and T using JOIN.
Database Principles
Answer:
• Join(S,T)
SID
SName
DOB
Major
s1
s2
s3
s4
John
Mary
Alfredo
Tracey
1982
1983
1975
1982
Hist
Eng
Math
CS
• We notice that the join table contain no noise rows.
• This means we can reconstruct Student exactly so we
have not lost info using this decomposition.
• For that reason it is called a lossless-join decomposition
but we might call it a noiseless-join decomposition.
Database Principles
We Use Lossless-join Decompositions.
• In all of the work that follows, the only decompositions we
will be applying are lossless-join decompositions.
Database Principles
Normalization (1NF):
• First Normal Form: A table is in First Normal Form
(1NF) if all of its values are atomic.
Enrollment
SID
CrsID
s1
c1,c2,c3
NOT 1NF
Enrollment
SID
s1
s1
s1
CrsID
c1
c2
c3
NOTE: All our tables will be assumed 1NF
Database Principles
1NF
Normalization (2NF):
• Second Normal Form: A table is in Second Normal
Form (2NF) if it is in 1NF and there are no bad FDs of
the form:
prime (proper subset of candidate key) -> non-prime
• Note 1: This situation can not occur if all CKs are
singletons. In that case all prime attributes are also keys.
• Note 2: Suppose the CKs for a table are {A,E} and {B, E}
and suppose A, B --> C. This is not a 2NF violation since
{A,B} is not a subset of a CK. We will deal with this
situation alter.
Database Principles
2NF Violation Example:
• Suppose we have an alternative BOOK table whose CKs
are {ISBN} and {author,title}.
Book1
ISBN author title pub_name pub_date c_price a_addr
CKs: {ISBN}, {author,title}
FD:
author --> a_addr
-- 2NF violation
• Explanation: The prime attributes are {ISBN, author, title}
and a_addr is non-prime so the FD violates the 2NF rule.
Database Principles
2NF Violation Resolution
• To remove an FD that is a 2NF violation from a table
perform the following decomposition:
– create a new table containing all the attributes of the
bad FD
Author
author a_addr
CK: {author}
FD: author --> a_addr
-- good
– remove the attributes from the right-hand-side of the
bad FD from the original table
Book
ISBN author title pub_name pub_date c_price
Database Principles
CK: {ISBN}
FDs: all good
Did We Lose Information?
intersect(Book , Author ) = {author}
• {author} is the intersection of the two table schemas and is
the key to one of the tables (Author) so this is a losslessjoin decomposition.
• In other words:
join(Book, Author) = Book1
Database Principles
2NF Exercise:
• Consider the table below with the following FDs and
whose CKs are {A,F}, {B,F} and {C,F}.
R
A B C
D
B, E -> C
A, F -> B
C -> A, D
B -> E
E F
• Step 1: Identify the prime and non-prime attributes
prime: A, B, C, F
non-prime: D, E
• Step 2: Identify the 2NF violating FDs.
C --> D
and
B --> E
Database Principles
2NF Exercise:
• Step 3: Decompose the original table R, removing the
bad FDs
R
A B C
D
E F
decomposes to
R’
A B C F
S
C D
CKs: {A, F}, {B, F},
PK: C
{C,F}
FD: C -> D -- good
FDs: B -> C
-- bad
A, F -> B -- good
C -> A -- bad
Database Principles
T
B E
PK: B
FD: B -> E -- good
Exercise:
• In the previous example we combined the two FDs
as
B, E -> C
B -> E
-- FD 1
-- FD 2
B -> C
• Prove this is so.
(i) B -> B, E
(ii) B -> C
-- FD1, ident(B), aug+
-- (i), FD1, trans
Database Principles
Normalization (3NF):
• Third Normal Form: A table is in Third Normal Form
(3NF) if it is in 2NF and there are no bad FDs of the
form:
non-prime -> non-prime
Database Principles
3NF Violation Example:
• Suppose we have the original Cardholder table created
from the ER Diagram.
Cardholder
borrowerid
b_name
b_addr
b_status
loan_limit
pk
prime: borrowerid
non-prime: b_name, b_addr, b_status, loan_limit
borrowerid -> b_name
borrowerid -> b_addr
borrowerid -> b_status
borrowerid -> loan_limit
b_status -> loan_limit
-- good
-- good
-- good
-- good
-- 3NF violation
Database Principles
3NF Violation Resolution
• To remove an FD that is a 3NF violation from a table
perform the following decomposition:
– create a new table containing all the attributes of the
bad FD
Status
b_status loan_limit
CK: {b_status}
FD: b_status --> loan_limit
-- good
– remove the attributes from the right-hand-side of the
bad FD from the original table
Cardholder
borrowerid b_name b_addr b_status
Database Principles
CK: {borrowerid}
FDs: all good
3NF Exercise:
• The following table has only one CK and that is {B}.
Decompose the table.
R
A B C
D
D -> C, A
B -> F
F -> C, E
B -> D
E F
bad
good
bad
good
• Step 1: List the prime and non-prime attributes
prime: B
non-prime: A, C, D, E, F
Step 2: Identify the 3NF violating FDs.
D -> C, A
and
F -> C, E
Exercise: Prove that a table with only singleton CKs has no 2NF violations.
Database Principles
3NF Exercise:
• Step 3: Decompose the original table R, removing the
bad FDs
R
A B C
D
D -> C, A
B -> F
F -> C, E
B -> D
E F
decomposes to
R’
B D F
S
D C A
PK: B
FDs: B -> D,F -- good
PK: D
FD: D -> C,A -- good
T
F E
PK: F
FD: F -> E -- good
NOTE: The column C is missing from the table T because
we removed it from R when we created S so it was not
available when we created T.
Database Principles
Boyce-Codd Normal Form (BCNF):
• Boyce-Codd Normal Form: A table is in Boyce-Codd
Normal Form (BCNF) if it is in 3NF and there are no
remaining bad FDs of any kind.
• Any remaining bad FDs must be of one of the following
forms:
prime(not a key) -> prime
prime(not a subset of a candidate key) -> non- prime
prime(not subset of a candidate key), non-prime -> prime
prime(not subset of a candidate key), non-prime -> non-prime
Exercise: Prove non-prime

prime is impossible.
Database Principles
BCNF Violation Example:
• The table below describes the assignment of a police
officer (o) to drive a certain vehicle (v) during a certain
shift (s) on a certain day (d). The assignment is given a
certain readiness level (r ) that depends on the skills of
the officer and the equipment in the vehicle. The weather
(w) is a function of the date and the shift.
• The only CK is {VIN,Shift,Date}.
ShiftAssignment
Officer VIN Shift Date R_Level Weather
o
v
s
d
r
a
Database Principles
w
BCNF Violation Example:
ShiftAssignment
Officer VIN Shift Date R_Level Weather
o
v
s
d
r
w
a
Prime: VIN, Shift, Date
Non-prime: Officer, R_Level, Weather
FDs: VIN Shift, Date -> Officer
VIN, Officer -> R_Level
Shift, Date -> Weather
-- good
-- BCNF violation
-- BCNF violation
Database Principles
BCNF Violation Resolution
• To remove an FD that is a BCNF violation from a table
perform the following decomposition:
– create a new table containing all the attributes of the
bad FD
Readiness
VIN Officer R_Level
Weather
Shift Date Weather
CK: {VIN, Officer}
FD: VIN, Officer --> R_Level
CK: {Shift, Date}
FD: Shift, Date --> Weather
Database Principles
-- good
-- good
BCNF Violation Resolution
• To remove an FD that is a BCNF violation from a table
perform the following decomposition:
– remove the attributes from the right-hand-side of the
bad FDs from the original table
ShiftAssignment
Officer VIN Shift Date
PK: {VIN, Shift, Date}
FD: VIN, Shift, Date --> Officer
pk
Database Principles
-- good
Exercise:
R
A
B
C
D
FDs: A -> D
D -> B
A, C -> E
E -> A
E
CKs: (A, C} {E, C}
Prime: A, E, C
Non-prime: B, D
Violations:
A -> D
D -> B
A, C -> E
E -> A
-- 2NF viol
-- 3NF viol
-- good
-- BCNF viol
Database Principles
Exercise:
• Remove 2NF violations
R’
A
B
C
S
A
E
D
FD: A -> D
-- good
pk
A, C -> E
E -> A
-- good
-- BCNF viol
• Remove 3NF violations:
R’
A
B
C
E
D -> B seems to have disappeared??
But did it?
A -> D together with D -> B imply
A -> B
This new FD is another 2NF violation
Database Principles
Exercise:
• Remove 2NF violations (second try)
R’
A
B
C
S
A
E
D
B
pk
FD: A -> D,B
D -> B
-- good
-- 3NF viol
• Remove 3NF violations:
R”
A C
S
A
E
D
pk
A, C -> E -- good
E -> A
-- BCNF viol
A -> D -- good
Database Principles
S
D
B
pk
D -> B
-- good
Exercise:
• Remove BCNF violation
R’”
C E
S
A
pk
pk
D
A -> D -- good
S
D
B
pk
D -> B
-- good
U
A
E
pk
E -> A -- good
NOTE: The FD A, C -> E seems to have
disappeared but performing a join of R’” and U
will reproduce R”
R” = join(R’”,U)
and so recover the original FD (and its
associated Enterprise Rule).
Database Principles
Fourth Normal Form (4NF):
• By this time there are no more bad FDs.
• This makes the tables pretty good but sometimes it is still
possible to improve them.
• 4NF violations can come about from Enterprise Rules
that can not be expressed as an FD.
• 4NF violations typically come about when the two relationships, r1 and r2, described below are converted into a
single table.
E1
e1
r1
(1,n)
E2
(1,n)
e2
r2
(1,n)
Database Principles
E3
(1,n)
e3
4NF Violation Example:
• From the ERD
CAR
type
MAKER
name
sells
(1,n)
(1,n)
COUNTRY
name
sold_in
(1,n)
(1,n)
• We produce the table
sells_sold_in
car_type maker_name
country_name
• Instead of
sells
car_type maker_name
sells_in
maker_name
Database Principles
country_name
4NF violation example
What rows would the table sells_sold_in contain?
One possibility is
sells_sold_in = join(sells,sold_in)
This will mean that in sells_sold_in every car_type that a
particular car maker produces will be matched with
every country that the same car maker sells cars in.
If this is the case, then the sells_sold_in table contains a
4NF violation.
Database Principles
How to Recognize a 4NF Violation:
• Suppose we have a car maker that sells two models in
two countries.
sells
car_type maker_name
c1
c2
m1
m1
sells_in
maker_name
m1
m1
country_name
crty1
crty2
• In the single sells_sold_in table this would appear as
sells_sold_in
car_type maker_name
c1
c1
c2
c2
country_name
m1
ctry1
m1
crty2
m1
crty1
m2
crty2
Database Principles
How to Recognize a 4NF Violation:
• Within this table there may be a hidden Enterprise Rule
make_sells_in
car_type maker_name
c1
c1
c2
c2
m1
m1
m1
m2
country_name
ctry1
crty2
crty1
crty2
Enterprise Rule: If a car maker sells one car type in a country
then it sells all its car types in the same country.
Exercise: Verify that the insert, update and delete anomalies exist
in the table above and they do not exist in the tables below
sells
car_type maker_name
c1
c2
sells_in
maker_name
m1
m1
m1
m1
Database Principles
country_name
crty1
crty2
How to Recognize a 4NF Violation:
A table is a 4NF violation only if the permanent Enterprise
Rule exists and not whether or not, on a particular day, it
happens to contain all the rows of a join of two other
tables.
The sells_sold_in table has a 4NF violation only if
sells_sold_in = join(sells, sold_in)
every time you might ever construct this join.
NOTE: intersect(sells, sold_in ) is not a key in either sells or sold_in so as a
decomposition, this is not guaranteed to be a lossless-join decomposition.
It is lossless-join if sells_sold_in in a 4NF violation however. prove this.
Database Principles
How to Recognize a 4NF Violation looking at Rows:
• sells_sold_in is a 4NF violation if join(sells, sells_in) is
always equal to sells_sold_in
join(sells, sells_in) = maker_sells_in
sells
car_type maker_name
c1
c2
car_type maker_name
m1
m1
=
join
sells_in
maker_name
m1
m1
maker_sells_in
c1
c1
c2
c2
country_name
crty1
crty2
Database Principles
m1
m1
m1
m2
country_name
ctry1
crty2
crty1
crty2
4NF Violation Resolution
Decompose the table with a 4NF violation into two tables.
Since the 4NF-violation table is always equal to the join
the two decomposition tables this decomposition will be
a lossless-join decomposition.
Database Principles
Will We Ever Encounter a 4NF violation?
• Anyone who has taken this course should know not to
create a single table from the two relationships in the
original ER diagram so should not create a table
containing a 4NF violation.
• It is the presence of the Enterprise Rule mentioned
earlier in a single table to produces a 4NF violation.
• So if you do create the single table but the Enterprise
Rule is not true then the single table is not a violation of
the 4NF Rule.
Database Principles
Abstractly:
Suppose R is a table with two attribute sets X and Y.
Further suppose
union(X, Y) = R
intersect(X, Y) != 
intersect(X, Y) is not a key to either S = X(R) or T = Y(R)
Then R contains a 4NF violation if and only if this
decomposition is always lossless-join.
This is a good thing. it tells us that if there is a problem
there is always a solution.
Database Principles