The Relational Model

Download Report

Transcript The Relational Model

Closure
• The closure of {B1…Bk} under the set of FDs
S, denoted by {B1…Bk}+, is defined as
follows:
{B1…Bk}+ = {B | any relation satisfies S
will also satisfies B1…BkB}
Computing the closure
•
•
•
•
Given: the set S and {A1,…,An}
Compute: {A1,…,An}+- denote this set by X
Step 1: X = {A1,…,An}
Step 2: find a FD B1…BkB in S such that
{B1,…,Bk}  X and B X, then X=X{B}
• Step 3: repeat step 2 until nothing more can
be added to X, then go to step 4
• Step 4: return X
Example
•
•
•
•
•
S = {ABC, BCAD, DE, CFB}
Compute {A,B}+
Step 1: X = {A,B}
Step 2: X = X {C}={A,B,C} because ABC
Step 3 back to step 2: X = X {D} because
BCAD
• Step 3 back to step 2: X = X {E} because DE
• Step 3 back to step 2: nothing more
• Step 3 go to step 4: return {A,B,C,D,E}
Correctness of closure algorithm
• It computes true functional dependencies
– proof: show that if B belongs to {A1,…,An}+ then A1…AnB
holds. By induction over the number of steps (n) used in adding an
attribute B into the set X
• n=0 then B belongs to {A1,…,An} and so A1…AnB is a trivial
functional dependency
• n  n+1: if B is added to X in the step n+1, then A1…AnBj for all j
by inductive hypothesis; this, together with B1…BkB, implies that
A1…AnB
• It computes all functional dependencies
– proof: show that if B does not belong to {A1,…,An}+ then
A1…AnB does not hold. By constructing an instance I of the
relation R such that the FD does not hold.
A in the closure
Others
111 … 111
111 … 111
000 … 000
111 … 111
Simple questions
• What is {A1,…,An}+ if {A1,…,An} is a key
of the relation?
• Can {A1,…,An}+ ={A1,…,An}?
• Does {B1,…,Bm} {A1,…,An} imply
{B1,…,Bm}+ {A1,…,An}+?
Transitive Rules
• Given
• then
A1…An  B1…Bm
B1…Bm  C1…Ck
A1…An  C1…Ck
Closing sets of FDs
• Given a set of FDs we can derive some
other FDs using the rules about FDs (e.g.
combining, splitting, and transitive)
• For a relation R, a set of FD is called a basis
for R if all other FDs of R can be derived
form it.
• A basis is minimal if none of its proper
subsets is a basis.
Projecting FDs
B’s
• Given:
R
S
– R with a set of FDs F
– S (a new relation) is obtained by removing the attributes {B1,…,Bm}
from R
• Questions: What are the FDs of S?
• Answer: if A1…AnC1…Ck is a FD of R and none of the Bs appears
on the left or right side ({B1,…,Bm}{A1,…,An,C1,…,Ck}=) is a
FD of S
Projecting - Example
• Given R(A,B,C,D) with the FDs AB,
BC, and CD.
• Remove the attribute B from R, we obtain a
new relation S(A,C,D).
• What are the FDs of S?
– AC?
– AD?
– CD?
We can compute this by:
•Compute all the closure of every subset of {A,C,D}
by using the FDs of R that do not contain B.
Homework
3.5.1 Consider a relation with schema R(A,B,C,D) and
FD’s ABC, CD, and DA.
– What are all the nontrivial FD’s that follow from the given
FD’s? List only the FDs with one attribute on the right? (5pt)
– What are the keys of R?
(5pt)
– What are the superkeys but not keys?
(5pt)
3.5.3 Show that the following rule holds:
if A1…AnB1…Bm and C1…CkD1…Dt hold
then A1…AnC1…Ck B1…BmD1…Dt also holds.
(5pt)
For those whole like fun:
3.5.4 Does the following hold:
– if AB then BA
– if ABC and AC then BC
3.5.8 A set of attributes is closed if X+=X. What are
the FDs of a relation R(A,B,C,D) if
– all sets of four attributes are closed
– the only closed sets are {} and {A,B,C,D}
– the closed sets are {}, {A,B}, {A,B,C,D}
(note: the cases are considered separate)
Stars: try the exercises with stars.
Design of Relational Database
Schema
title
year
length
studioName
starName
filmType
Star Wars
1977
124
Fox
Mark Hamill
color
Star Wars
1977
124
Fox
Harrison Ford
color
Star Wars
1977
124
Fox
Carrie Fisher
color
Mighty Ducks
1991
104
Disney
Emilio Estevez
color
Wayne’s World
1992
95
Paramount
Dana Carvey
color
Wayne’s World
1992
95
Paramount
Mike Meyers
color
Some observations:
• value of studioName is the same in several tuples
• value of filmType is also repeated
What wrong with it?
• redundancy
 store the same value unnecessary several time
• update anormalies  an update might require several changes
• deletion anormalies  losing information if delete a value
Possible ways to avoid anormalies
(Intuition)
• The bad way: start again (Oh, no!)
• The natural way: try to decompose the
given relation into two or more relations
that
– contain the same information
– avoid the anormalies
Example
title
year
length
studioName
starName
filmType
Star Wars
1977
124
Fox
Mark Hamill
color
Star Wars
1977
124
Fox
Harrison Ford
color
Star Wars
1977
124
Fox
Carrie Fisher
color
Mighty Ducks
1991
104
Disney
Emilio Estevez
color
Wayne’s World
1992
95
Paramount
Dana Carvey
color
Wayne’s World
1992
95
Paramount
Mike Meyers
color
title
year
length
studioName
filmType
Star Wars
1977 124
Fox
color
Mighty
Ducks
1991 104
Disney
color
Wayne’s
World
1992 95
Paramount
color
title
year
starName
Star Wars
1977
Mark Hamill
Star Wars
1977
Harrison Ford
Star Wars
1977
Carrie Fisher
Mighty Ducks
1991
Emilio Estevez
Wayne’s World
1992
Dana Carvey
Wayne’s World
1992
Mike Meyers
MovieStudioStar(title, year, length, studioName, starName, filmType)
is decomposed into 2 relations
MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)
Decomposition
Given a relation R with schema {A1,…,An}. A
decomposition of R into two relations S and T
with schemas {B1,…,Bm} and {C1,…,Ck},
respectively, such that
1. {A1,…,An} = {B1,…,Bm}  {C1,…,Ck}
2. The tuples in S are the projections onto
{B1,…,Bm} of all the tuples in R.
3. The tuples in T are the projections onto
{C1,…,Ck} of all the tuples in R.
Example – Projections
title
year
length
studioName
starName
filmType
Star Wars
1977
124
Fox
Mark Hamill
color
Star Wars
1977
124
Fox
Harrison Ford
color
Star Wars
1977
124
Fox
Carrie Fisher
color
Mighty Ducks
1991
104
Disney
Emilio Estevez
color
Wayne’s World
1992
95
Paramount
Dana Carvey
color
Wayne’s World
1992
95
Paramount
Mike Meyers
color
How do we come up with this decomposition?
title
year
length
studioName
filmType
Star Wars
1977 124
Fox
color
Mighty
Ducks
1991 104
Disney
color
Wayne’s
World
1992 95
Paramount
color
title
year
starName
Star Wars
1977
Mark Hamill
Star Wars
1977
Harrison Ford
Star Wars
1977
Carrie Fisher
Mighty Ducks
1991
Emilio Estevez
Wayne’s World
1992
Dana Carvey
Wayne’s World
1992
Mike Meyers
MovieStudioStar(title, year, length, studioName, starName, filmType)
is decomposed into 2 relations
MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)
Boyce-Codd Normal Form (BCNF)
• BCNF: a relation R is in BCNF iff:
whenever there is a nontrivial FD
A1…AnB for R, it is the case that
{A1,…,An} is a superkey for R.
• Why this definition? Answer: if a relation is
in BCNF then there is no anormaly.
• Example:
MovieStudioStar(title, year, length, studioName, starName, filmType): not in BCNF
MovieStudio(title, year, length, studioName, filmType): in BCNF
StarsIn(title, year, starName): in BCNF
Decomposition into BCNF
• Suppose that we decompose a relation R
into two relations S and T which are in
BCNF. The requirements for S and T:
– S and T is a decomposition of R
– it is possible to reconstruct R from S and T
• Will every decomposition of R satisfy these
two conditions?
• What are the FDs of the new relations?
Algorithm
• Given a relation R with the attributes {A1,…,An}.
• Step 1: For every nontrivial FD B1…BmB if {B1,…,Bm}
is a superkey then returns R (no decomposition is needed)
• Step 2: Takes a nontrivial FD B1…BmB such that
{B1,…,Bm} is not a superkey, then decomposes R into two
relations S and T with the following schema:
– S’s schema: {B1,…,Bm}+
– T’s schema: {B1,…,Bm}  ({A1,…,An}\{B1,…,Bm}+)
• Repeat Step 1&2 for S and T until no decomposition is
needed for every new relation; return the set of new
relations as the result
Example
The ‘new’ movie relation with the following attributes:
{title,year,studioName,president,presAddress} (we call this set ALL)
with the FDs: {title yearstudioName, studioNamepresident, presidentpresAddress}
Only one key: {title,year}
studioNamepresident violated BCNF
Step 2: takes studioNamepresident, decomposes into
– S with the schema {studioName}+={studioName,president,presAddress}
– T with the schema
{studioName,title,year}={studioName}  (ALL\ {studioName}+)
Check:
{studioName,title,year} is in BCNF (the first two FDs)
{studioName,president,presAddress} is not in BCNF
Continue with the decomposition of S using presidentpresAddress and we get the
following two relation schemas: {president,presAddress} and {president,studioName}
both are in BCNF.
The final result: {studioName,title,year}, {president,presAddress},{president,studioName}
Recovering information from a
decomposition
• Suppose that R with the schema {A1,…,An}
is decomposed into two relations S and T
according to the algorithm whose attributes
are {B1,…,Bm}+ and {B1,…,Bm}
({A1,…,An}\{B1,…,Bm}+)
• The tuples of R can be obtained by joining
all possible pairs of S and T where
{B1,…,Bm} have the same values.
Recovering …
the rest of the closure
the B’s
others
t’ (S)
Join
Projection
t (R)
t’’ (T)
{B1,…,Bm}
{B1,…,Bm}+\ {B1,…,Bm}
{A1,…,An}\{B1,…,Bm}+
Example – Decomposition and Recovering
title
year
length
studioName
starName
filmType
Star Wars
1977
124
Fox
Mark Hamill
color
Star Wars
1977
124
Fox
Harrison Ford
color
Star Wars
1977
124
Fox
Carrie Fisher
color
Mighty Ducks
1991
104
Disney
Emilio Estevez
color
Wayne’s World
1992
95
Paramount
Dana Carvey
color
Wayne’s World
1992
95
Paramount
Mike Meyers
color
title
year
length
studioName
filmType
Star Wars
1977 124
Fox
color
Mighty
Ducks
1991 104
Disney
color
Wayne’s
World
1992 95
Paramount
color
title
year
starName
Star Wars
1977
Mark Hamill
Star Wars
1977
Harrison Ford
Star Wars
1977
Carrie Fisher
Mighty Ducks
1991
Emilio Estevez
Wayne’s World
1992
Dana Carvey
Wayne’s World
1992
Mike Meyers
MovieStudioStar(title, year, length, studioName, starName, filmType) is not in BCNF
is decomposed into 2 relations that are in BCNF:
MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)
Some remarks
•
•
•
•
The algorithm will stop and output a set of BCNF relations.
Not every decomposition according to the algorithm is good
The FD’s for the new relations are determined by ‘projecting’.
If a decomposition is based on FDs (according to the algorithm)
then the recovering process will give us exactly the original
relation.
• If a decomposition is not based on FDs then we might not be
able to recover the original relation from the new ones:
– Example: R(A,B,C) with AB and we decompose it into S(A,B) and
T(B,C):
A
1
4
B
2
2
C
3
5
A
1
B
2
B
2
C
3
4
2
2
5
A
1
B
2
C
3
1
4
4
2
2
2
5
3
5
Third Normal Form (3NF)
• So far: if a relation is not in BCNF then
anormalies arise.
• Given a relation Bookings with the attributes:
– title: name of the movie
– theater: name of the theater where the movie is being
shown
– city: the city where the theater is located
(a tuple (m,t,c): represents the fact that movie m is
shown at theater t in city c)
Bookings(title,theater,city)
• The FDs of the relations:
– theater  city
– title city  theater
• theatercity violates the BCNF condition, why?
• decomposition yields: {theater,city} and {theater,title}
• Consider the relations:
Possible relations according to the FDs
of each schema
recovering
Violate the FD title citytheater
theater
Guild
city
Menlo
theater title
Guild Net
theater
Guild
title
Net
city
Menlo
Park
Menlo
Park
Park
Net
Menlo
Net
3NF
• A relaxation of the BCNF condition: a
relation R is in 3NF if: whenever there is a
nontrivial FD A1…AnB, either
{A1,…,An} is a superkey or B is a member
of some key.
• Bookings(title,theater,city) is in 3NF
Checking BCNF and 3NF
• Given R(A,B,C,D) with FDs ABC, CD, DA.
• Question: Indicate the BCNF violations and 3NF violations.
• Steps in answering the question:
– Step 1: compute all nontrivial FDs (right side: one att)
– Step 2: find all keys
– Step 3: find all the violations
• Step 1: ABC, CD, DA, ABD, CA, DBC, ACD
• Step 2: Keys – {A,B}, {C,B}, and {D,B}
• Step 3:
– BCNF violation: CD, DA, CA, ACD and their trivial extensions
(e.g. CDD, DAA,…)
– 3NF violation: none