Norm in detail
Download
Report
Transcript Norm in detail
Relational Database Design
Relational Database Design
RDBMS design issues – Pitfalls and Normalization.
Overview of Normal Forms
Pitfalls in Relational Database Design
Functional Dependencies
Decomposition
Boyce-Codd Normal Form
Third Normal Form
Multivalued Dependencies and Fourth Normal Form
Overall Database Design Process
Database System Concepts
7.2
©Silberschatz, Korth and Sudarshan
The Banking Schema
branch = (branch_name, branch_city, assets)
customer = (customer_id, customer_name, customer_street, customer_city)
loan = (loan_number, amount)
account = (account_number, balance)
employee = (employee_id. employee_name, telephone_number, start_date)
dependent_name = (employee_id, dname)
account_branch = (account_number, branch_name)
loan_branch = (loan_number, branch_name)
borrower = (customer_id, loan_number)
depositor = (customer_id, account_number)
cust_banker = (customer_id, employee_id, type)
works_for = (worker_employee_id, manager_employee_id)
payment = (loan_number, payment_number, payment_date, payment_amount)
savings_account = (account_number, interest_rate)
checking_account = (account_number, overdraft_amount)
Database System Concepts
7.3
©Silberschatz, Korth and Sudarshan
Combine Schemas?
Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number, amount )
Result is possible repetition of information (L-100 in example below)
Database System Concepts
7.4
©Silberschatz, Korth and Sudarshan
A Combined Schema Without Repetition
Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount, branch_name)
No repetition (as suggested by example below)
Database System Concepts
7.5
©Silberschatz, Korth and Sudarshan
What About Smaller Schemas?
Suppose we had started with bor_loan. How would we know to split up
(decompose) it into borrower and loan?
Write a rule “if there were a schema (loan_number, amount), then
loan_number would be a candidate key”
Denote as a functional dependency:
loan_number amount
In bor_loan, because loan_number is not a candidate key, the amount of a
loan may have to be repeated. This indicates the need to decompose
bor_loan.
Not all decompositions are good. Suppose we decompose employee into
employee1 = (employee_id, employee_name)
employee2 = (employee_name, telephone_number, start_date)
The next slide shows how we lose information -- we cannot reconstruct the
original employee relation -- and so, this is a lossy decomposition.
Database System Concepts
7.6
©Silberschatz, Korth and Sudarshan
A Lossy Decomposition
Database System Concepts
7.7
©Silberschatz, Korth and Sudarshan
Pitfalls in Relational Database Design
Relational database design requires that we find a
“good” collection of relation schemas. A bad design
may lead to
Repetition of Information.
Inability to represent certain information.
Design Goals:
Avoid redundant data
Ensure that relationships among attributes are
represented
Facilitate the checking of updates for violation of
database integrity constraints.
Database System Concepts
7.8
©Silberschatz, Korth and Sudarshan
RDBMS Design issues
So far we have assumed that attributes are grouped to form a
relation schema by using the common sense of database
designer or by mapping a schema defined by ER model.
We still need some formal measure of why one grouping of
attributes into a relation schema may be better than another.
Unsatisfactory relation schemas that do not meet certain
conditions – the normal form tests – are decomposed into
smaller relation schemas that meet the tests and hence
possess the desirable properties.
Thus, the normalization procedure provides database
designers with;
A formal framework for analyzing relation schemas based on
their keys and on the functional dependencies among their
attributes.
A series of normal form tests that can be carried out on
individual relation schemas so that the relational database can
be normalized to any desired degree.
Database System Concepts
7.9
©Silberschatz, Korth and Sudarshan
First Normal Form
A relational database table that adheres to 1NF is one that meets
a certain minimum set of criteria.
These criteria are basically concerned with ensuring that the
table is a faithful representation of a relation and that it is free of
repeating groups.
Some definitions of 1NF, most notably that of Edgar F. Codd,
make reference to the concept of atomicity.
Codd states that the "values in the domains on which each
relation is defined are required to be atomic with respect to the
DBMS."
Codd defines an atomic value as one that "cannot be
decomposed into smaller pieces by the DBMS (excluding certain
special functions).“
Meaning a field should not be divided into parts with more than
one kind of data in it such that what one part means to the DBMS
depends on another part of the same field.
Database System Concepts
7.10
©Silberschatz, Korth and Sudarshan
First Normal Form
Domain is atomic if its elements are considered to be indivisible
units
Examples of non-atomic domains:
Set of names, composite attributes
Identification numbers like CS101 that can be broken up into
parts
A relational schema R is in first normal form if the domains of all
attributes of R are atomic
Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
Example: Set of accounts stored with each customer, and set
of owners stored with each account
We assume all relations are in first normal form (and revisit this
again!)
Database System Concepts
7.11
©Silberschatz, Korth and Sudarshan
First Normal Form (Cont’d)
Atomicity is actually a property of how the elements of the
domain are used.
Example: Strings would normally be considered indivisible
Suppose that students are given roll numbers which are
strings of the form CS0012 or EE1127
If the first two characters are extracted to find the
department, the domain of roll numbers is not atomic.
Doing so is a bad idea: leads to encoding of information in
application program rather than in the database.
Database System Concepts
7.12
©Silberschatz, Korth and Sudarshan
Goal — Devise a Theory for the Following
Decide whether a particular relation R is in “good” form.
In the case that a relation R is not in “good” form, decompose it
into a set of relations {R1, R2, ..., Rn} such that
each relation is in good form
the decomposition is a lossless-join decomposition
Our theory is based on:
functional dependencies
multivalued dependencies
Database System Concepts
7.13
©Silberschatz, Korth and Sudarshan
Overview of Normal Forms
1NF ( First Normal Form)
To understand
2NF
3NF
Concept of FD’s ( Functional Dependency) required
BCNF
To understand
4NF
5NF
Concept of MVD (Multi Valued Dependency) is required
Database System Concepts
7.14
©Silberschatz, Korth and Sudarshan
Normal Forms & FDs: Review
Unnormalized – There are multivalued attributes or repeating
groups
1 NF – No composite attributes or repeating groups,
2 NF – 1 NF plus no partial dependencies
3 NF – 2 NF plus no transitive dependencies
Database System Concepts
7.15
©Silberschatz, Korth and Sudarshan
Normalization
The basic objective of normalization is to reduce the various
anomalies in the database.
Normalization can be looked upon as a process of analyzing the
given relation schemas based on their FDs and primary keys to
achieve the desirable properties of ;
Minimizing redundancy
Minimizing the insertion, deletion, and update anomalies.
Unsatisfactory relation schemas that do not meet certain conditions –
the normal form tests – are decomposed into smaller relation schemas
that meet the tests and hence possess the desirable properties.
Thus, the normalization procedure provides database designers with;
A formal framework for analyzing relation schemas based on their keys and
on the functional dependencies among their attributes.
A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any desired
degree.
Database System Concepts
7.16
©Silberschatz, Korth and Sudarshan
Normalization…
The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which
it has been normalized.
Normal forms when considered in isolation from other factors, do
not guarantee a good database design.
It is generally not sufficient to check separately that each relation
schema in the database is, say, in BCNF or 3NF.
Rather, the process of normalization through decomposition
must also confirm the existence of additional properties that the
relation schemas, taken together should possess;
The Lossless join,
The dependency preservation property, which ensures that each
functional dependency is represented in some individual relations
resulting after decomposition.
Database System Concepts
7.17
©Silberschatz, Korth and Sudarshan
RDBMS design
RDBMS design involves checking the current design through
Normal Form Test.
If the design is not in desired Normal form, then
Decompose the Relations (Tables) into smaller ones
Fulfill the properties of decomposition.
Properties of decomposition
Functional dependency preservation
– Identify the FDs in the given Relation schema
– Apply Armstrong Axioms to find set of all FDs (Closure)
Database System Concepts
7.18
©Silberschatz, Korth and Sudarshan
Overview
To understand
2NF
3NF
FD’s required
BCNF
FD’s & Closure of FDs required
To understand
4NF
5NF
Database System Concepts
MVD is required
7.19
©Silberschatz, Korth and Sudarshan
First Normal Form
Domain is atomic if its elements are considered to be indivisible
units
Examples of non-atomic domains:
Set of names, composite attributes
Identification numbers like CS101 that can be broken up into
parts
A relational schema R is in first normal form if the domains of all
attributes of R are atomic
Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
Database System Concepts
7.20
©Silberschatz, Korth and Sudarshan
Keys & Functional Dependency
A Key Attributes is special attribute which can
uniquely identify the record for Entity
A super key of an entity set is a set of one or more
attributes whose values uniquely determine each
entity.
Ex. Super key : ( customer-id, Customer-name)
A candidate key of an entity set is a minimal super key
Customer-id is candidate key of customer
account-number is candidate key of account
Although several candidate keys may exist, one of the
candidate keys is selected to be the primary key.
Functional Dependencies shows us
Inter – relation
among attributes
21
Database System Concepts
7.21
©Silberschatz, Korth and Sudarshan
Functional Dependencies and Keys
A FD is a generalization of the notion of a key.
How ?
If a given set of values for each attribute in X uniquely determines
each of the values of the attributes in Y
X - Y
For Example in Relation schema
Student (sid, name, supervisor_id, specialization),
we write the functional dependency:
{sid} {name, supervisor_id, specialization}
The sid determines all attributes (i.e., the entire record)
If two tuples in the relation student have the same sid, then they
must have the same values on all attributes.
In other words they must be the same tuple (since the relational
modes does not allow duplicate
records)
Database System Concepts
7.22
©Silberschatz, Korth and Sudarshan
Functional Dependency
Functional Dependency :
Inter – relation among attributes of an entity.
Let R be a relation on the relation scheme R, then R satisfies the
functional – dependency X - Y
If a given set of values for each attribute in X uniquely determines
each of the values of the attributes in Y.
( X determines Y
Y functionally dependent on X )
FD’s can be used to group the attributes into Relation-scheme,
which is in a particular Normal forms
Approach of Normalization :
Step I
: Test the given Relation scheme for any Normal form
Step II
: If the given R-scheme is not in any N.F. (2NF/3NF)
Step III
: Decompose the Relation-scheme into small Relationschema.
Step IV
: Test the decomposed schema for Normal forms.
Database System Concepts
7.23
©Silberschatz, Korth and Sudarshan
FDs example
Database System Concepts
7.24
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.25
©Silberschatz, Korth and Sudarshan
Functional Dependency
Inter – relation among attributes of an entity.
Let R be a relation on the relation scheme R, then R
satisfies the functional – dependency X - Y
If a given set of values for each attribute in X uniquely
determines each of the values of the attributes in Y.
X determines Y
Y functionally dependent on X
FDs allows us to express constraints that we can
not express with Superkeys !
FD’s can be used to group the attributes into Relationscheme, which is in a particular Normal forms
Database System Concepts
7.26
©Silberschatz, Korth and Sudarshan
FDs
Note: A specific instance of a relation schema may satisfy a functional
dependency even if the functional dependency does not hold on all legal
instances.
For example, a specific instance of Loan may, by chance, satisfy
amount customer_name.
Functional dependencies allow us to express constraints that
cannot be expressed using superkeys. Consider the schema:
bor_loan = (customer_id, loan_number, amount ,
customer_name).
We expect this functional dependency to hold:
loan_number amount
but would not expect the following to hold:
amount customer_name
Database System Concepts
7.27
©Silberschatz, Korth and Sudarshan
How to identify FDs
Identification of correct FDs are key to efficient DBMS
Design .
There are many commonly committed errors in
identifying FDs.
R satisfies the functional – dependency X Y
If a given set of values for each attribute in X
uniquely determines each of the values of the
attributes in Y.
X determines Y
While identifying FDs we should ensure that each
attribute in X uniquely determines and not that
some tuples satisfy it !
Database System Concepts
7.28
©Silberschatz, Korth and Sudarshan
Common pitfalls in identifying FDs
Design Goals & Normalization
Design Anomalies
Identifying FDs : When & How ?
Procedure of FDs identifications ?
Pitfalls in FDs identification
Limitations with Normal Forms
Decompose the Relation Scheme
FDs to be taken care of while decomposition
Dependency preservation, Lossless join.
Database System Concepts
7.29
©Silberschatz, Korth and Sudarshan
Procedure to identify FDs
How to Identify the FDs in Scheme.
We start finding any two tuples with the
same X value and then Y values in these
tuples must be same.
Repeat these procedure until all such
pairs of tuples with the same X values are
examined with Y values.
Then verify that whether R in real world
scenario satisfies X Y or not ?
Should FDs be identified only by observing the
tuples/data ?
Analyse the FDs and see whether it will work ?
Database System Concepts
7.30
©Silberschatz, Korth and Sudarshan
Common pitfalls in FDs in the Schema
Whether the FDs
Course Professor
Professor Course
Satisfied in the following Schema ? Why so ?
If a given set of values for each attribute in X uniquely determines each of the
values of the attributes in Y.
Then verify that whether R in real world scenario satisfies Professor ->
Course or not ?
Can same value for Professor attribute may have more than one values for
Course in real world scenario?
DBMS Design & Normalization :
Step I : Test the given Relation scheme for any
Normal form
Step II : Identify & Analyse the FDs of the given
Scheme.
Step III If the given R-scheme is not in any N.F.
(2NF/3NF) then Step IV.
Step IV: Decompose the Relation-scheme into small
Relation-schema.
Step V : Test the decomposed schema for Normal
forms.
Database System Concepts
7.32
©Silberschatz, Korth and Sudarshan
Normalization Using Functional Dependencies
When we decompose a relation schema R with a set of
functional dependencies F into R1, R2,.., Rn we want
Lossless-join decomposition: Otherwise decomposition would result in
information loss.
No redundancy: The relations Ri preferably should be in either BoyceCodd Normal Form or Third Normal Form.
Dependency preservation: Let Fi be the set of dependencies F+ that
include only attributes in Ri.
Preferably the decomposition should be dependency preserving,
that is,
(F1 F2 … Fn)+ = F+
Otherwise, checking updates for violation of functional
dependencies may require computing joins, which is expensive.
Database System Concepts
7.33
©Silberschatz, Korth and Sudarshan
Functional Dependencies (Cont.)
A functional dependency is trivial if it is satisfied by all instances of a
relation
Example:
customer_name, loan_number customer_name
customer_name customer_name
In general, is trivial if
Loss less Join : so there is no information loss after decomposition
Dependency Preservation : Ensures that each functional dependency
is represented in some relation after decomposition.
Database System Concepts
7.34
©Silberschatz, Korth and Sudarshan
Closure of a Set of Functional
Dependencies
Given a set F set of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
E.g. If A B and B C, then we can infer that A C
The set of all functional dependencies logically implied by F is the
closure of F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong’s Axioms:
if , then
(reflexivity)
if , then
(augmentation)
if , and , then (transitivity)
These rules are
sound (generate only functional dependencies that actually hold) and
complete (generate all functional dependencies that hold).
Database System Concepts
7.35
©Silberschatz, Korth and Sudarshan
Example
R = (A, B, C, G, H, I)
F={ AB
AC
CG H
CG I
B H}
some members of F+
AH
by transitivity from A B and B H
AG I
by augmenting A C with G, to get AG CG
and then transitivity with CG I
CG HI
from CG H and CG I : “union rule” can be inferred from
– definition of functional dependencies, or
– Augmentation of CG I to infer CG CGI, augmentation of
CG H to infer CGI HI, and then transitivity
Database System Concepts
7.36
©Silberschatz, Korth and Sudarshan
Functional Dependencies (Cont.)
K is a superkey for relation schema R if and only if K R
K is a candidate key for R if and only if
K R, and
for no K, R
Functional dependencies allow us to express constraints that cannot be
expressed using superkeys. Consider the schema:
bor_loan = (customer_id, loan_number, amount ).
We expect this functional dependency to hold:
loan_number amount
but would not expect the following to hold:
amount customer_name
Database System Concepts
7.37
©Silberschatz, Korth and Sudarshan
1NF
1NF :- A Relation scheme is said to be in 1NF .if there
are no composite attributes, and every attribute is
having atomic or indivisible values.
2NF :- A Relation is said to be in 2NF;
1. IF it is in 1NF and
2. Non –key attribute are functionally-dependent
on the key – attributes.
further , if the key has more than one attribute, then no
non key attributes should be functionally dependent
upon a part of the key attributes.
Database System Concepts
7.38
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.39
©Silberschatz, Korth and Sudarshan
Design Goals & Pitfalls with FDs
The basic objective of normalization is to reduce the various
anomalies in the database.
Normalization can be looked upon as a process of analyzing
the given relation schemas based on their FDs and primary
keys to achieve the desirable properties of ;
Minimizing redundancy
Minimizing the insertion, deletion, and update anomalies.
How to identify FDs and prime / nonprime attributes?
An Attribute A in a relation is called a prime attribute if A is a part of
any candidate key of the relation.
There are some pitfalls in FDs identifying !
Database System Concepts
7.40
©Silberschatz, Korth and Sudarshan
Decomposition
The bad design suggests that we should decompose that table.
It is generally not sufficient to check separately that each
relation schema in the database is, say, in 2NF or 3NF.
Rather, the process of normalization through
decomposition must also confirm the existence of
additional properties that the relation schemas, taken
together should possess;
The Lossless join,
The dependency preservation property, which ensures that
each functional dependency is represented in some individual
relations resulting after decomposition.
Database System Concepts
7.41
©Silberschatz, Korth and Sudarshan
How to identify Normal Forms
Unnormalized – There are multivalued attributes or repeating
groups, No Key attributes
1NF
2NF
3NF
BCNF
Concept of FD’s ( Functional Dependency) required
What are the Key Attributes ?
What are the Non-Key Attributes ?
Find out Functional Dependencies ?
Database System Concepts
7.42
©Silberschatz, Korth and Sudarshan
Second Normal Form
2NF :- A Relation is said to be in 2NF;
1. IF it is in 1NF and
2. Non –key attribute are functionally-dependent
on the key – attributes.
further ,if the key has more than one attribute ,then no non key
attributes should be functionally dependent upon a part of the
key attributes.
Database System Concepts
7.43
©Silberschatz, Korth and Sudarshan
2 NF Cont…
Key attribute
B
C
A
D
E
Entity
Nonkey
attribute
F
G
Partial Dependency is not allowed
Why ? What is the effect of allowing
partial dependency ?
Is Second Normal Form capable to give
good Database implementation ?
Database System Concepts
7.44
©Silberschatz, Korth and Sudarshan
First Normal Form
A relational database table that adheres to 1NF is one that meets
a certain minimum set of criteria.
These criteria are basically concerned with ensuring that the
table is a faithful representation of a relation and that it is free of
repeating groups.
Some definitions of 1NF, most notably that of Edgar F. Codd,
make reference to the concept of atomicity.
Codd states that the "values in the domains on which each
relation is defined are required to be atomic with respect to the
DBMS."
First Normal Form Scheme should be such that in a given table
there should be
No Composite Attributes
No Repeating groups and
All the attributes can be uniqueky identified by Key Attributes
Database System Concepts
7.45
©Silberschatz, Korth and Sudarshan
Normal Forms & FDs: Review
Unnormalized – There are multivalued attributes or repeating
groups
1 NF – No composite attributes or repeating groups,
2 NF – 1 NF plus no partial dependencies
3 NF – 2 NF plus no transitive dependencies
Database System Concepts
7.46
©Silberschatz, Korth and Sudarshan
Normal Form Test for TEACHES
Let us consider Teaches Relation for testing whether it fulfills
any Normal Form ?
What are the FDs in this scheme ?
TEACHES contains attributes Professor, Course, Room,
Room_Cap, Enrol_Lmt (Enrolment Limit).
The relation scheme for the relation TEACHES is (Prof, Course,
Room, Room_Cap,Enrol_lmt)
The domain the attribute Prof is all the faulty members of the
university.
The domain of the attribute Course is the courses offered by
the university.
The domain of Room is the rooms in the buildings of the
university.
The domain of Room_Cap is an integer value indicating the
seating capacity of the room.
Database System Concepts
7.47
©Silberschatz, Korth and Sudarshan
Design - Analysis
Go for Normal Form Tests ?
The Teaches relation is in First Normal Form or not ?
Identify the FDs in Scheme.
Should FDs be identified only by observing the tuples/data ?
Analyse the FDs and see how will it work ?
Go for higher Normal Form Tests ?
What is the remedy of the problem ?
Database System Concepts
7.48
©Silberschatz, Korth and Sudarshan
Teaches Relation
First Normal Form
What are the Key Attributes ?
What are the Non-Key Attributes ?
Find out Functional Dependencies ?
Are Non –key attribute functionally-dependent
on the key – attributes.?
Database System Concepts
7.49
©Silberschatz, Korth and Sudarshan
FDs in Scheme
The domain of the Enrlo_Lmt is also integer value and should be
less than or equal to the corresponding value for Room_Cap.
The TEACHES relation is in first normal form since it’s attributes
contain only atomic values and there are key attributes uniquely
identifying record.
FDs ?
The course is scheduled in a given room and each Course
uniquely identifies Room
Course Room
since the room has the given maximum number of available
seats, there is a functional dependency
Room Room_Cap
And hence from transitivity
Course Room Room_Cap
Thus the functional dependencies in this relation are
{Course (Prof, Room, Room_Cap, Enrol_Lmt),
Room Room_Cap}
Database System Concepts
7.50
©Silberschatz, Korth and Sudarshan
FDs Analysis
Database System Concepts
7.51
©Silberschatz, Korth and Sudarshan
Analysis of Scheme
There is an other transitive dependency (Here we assume that
Enrol_Lmt is the upper limit on registration for a course and is based
solely on the room capacity)
Room Room_Cap Enrol_Lmt
The presence of these transitive dependency in TEACHES will
cause the following problems:
The capacity of a room cannot be entered in the
database unless a course is scheduled in the room (
Insert Anamoly)and
The capacity of a room in which only one course is
scheduled will be deleted if the only course
scheduled in that room is deleted. ( Delete Anamoly)
Because the same room can appear more than once
in the database, there could be inconsistencies
between the multiple occurrences of the attribute
pair Room and Room_Cap.
Database System Concepts
7.52
©Silberschatz, Korth and Sudarshan
Why Anomalies?
TEACHES ( Professor, Course, Room, Room_Cap, Enrol_Lmt )
Anamolies occur in the given TEACHES scheme because of
partial dependencies;
which means Non-key attributes are NOT
functionally dependent on Key attributes !
This violates the condition of Second
Normal form !
Thus the given relation is in First Normal
Form only.
We have to Decompose the relation .
Database System Concepts
7.53
©Silberschatz, Korth and Sudarshan
Second Normal Form
2NF :- A Relation is said to be in 2NF;
1. IF it is in 1NF and
2. Non –key attribute are functionally-dependent
on the key – attributes.
further ,if the key has more than one attribute ,then no non key
attributes should be functionally dependent upon a part of the
key attributes.
Database System Concepts
7.54
©Silberschatz, Korth and Sudarshan
Normal Forms & Decomposition
Case Study of TEACHES Relation
First Normal Form?
Anomalies in 1NF
Decomposition for 2NF
Is 2NF without anomalies?
Decomposition for 3NF
Decomposition principles
Definition of 3NF
Need for Closure
Armstrong Axiom for Closure
Database System Concepts
7.55
©Silberschatz, Korth and Sudarshan
Decomposition
The bad design suggests that we should decompose that table.
It is generally not sufficient to check separately that each
relation schema in the database is, say, in 2NF or 3NF.
Rather, the process of normalization through
decomposition must also confirm the existence of
additional properties that the relation schemas, taken
together should possess;
The Lossless join,
The dependency preservation property, which ensures that
each functional dependency is represented in some individual
relations resulting after decomposition.
Database System Concepts
7.56
©Silberschatz, Korth and Sudarshan
First Normal Form
Database System Concepts
7.57
©Silberschatz, Korth and Sudarshan
Procedure to identify FDs
How to Identify the FDs in Scheme.
We start finding any two tuples with the
same X value and then Y values in these
tuples must be same.
Repeat these procedure until all such
pairs of tuples with the same X values are
examined with Y values.
Then verify that whether R in real world
scenario X Y satisfies or not ?
Should FDs be identified only by observing the
tuples/data ?
Analyse the FDs and see whether it will work ?
Database System Concepts
7.58
©Silberschatz, Korth and Sudarshan
Decomposition into 2NF
The Decomposed Relation does not have Partial Dependency.
Thus the Schema is now in Second Normal Form .
Is Second Normal Form capable to remove all anomalies
and give good Database implementation ?
Database System Concepts
7.59
©Silberschatz, Korth and Sudarshan
No Anomaly ?
Although 2 NF does not give any Partial dependency but there
can be other possibilities of dependencies,
There is a interrelation join dependency between
COURSE_DETAILS and ROOM_DETAILS to enforce the
constraint that the Enrol_Lmt be less than or equal to the
Room_Cap.
Relation COURSE_DETAILS has a Transitive dependency
Course Room Enrol_Lmt
We can not insert a Room value in the relation alongwith an
Enrol_Lmt value unless we have a Course value to go alonwith
Room value.
Non-key attribute is functionally dependent on another non
key attribute.!
Database System Concepts
7.60
©Silberschatz, Korth and Sudarshan
Further decomposition
Database System Concepts
7.61
©Silberschatz, Korth and Sudarshan
Third Normal Form :
Third Normal Form :
A relation scheme is said to be in 3 NF if ;
It is in 2 NF.
No non-key attribute is functionally-dependent upon any Non-
key attribute.
Thus, there should be no transitive-dependency of a non-key attribute
on the primary key.
3 NF Scheme does not allow partial dependencies like in 2NF but
additionally 3NF does not allow any transitive dependencies.
A relation scheme R < S, F > is in 3 NF if for all non-trivial Functional
dependencies in F + (closure of F) of the form X A, either X
contains a key (i.e. X is a super-key) or A is a prime attribute
Database System Concepts
7.62
©Silberschatz, Korth and Sudarshan
Normal Forms & FDs: Review
Unnormalized – There are multivalued attributes or repeating
groups and record can not be uniquely identified by Primary key
1 NF – No composite attributes or repeating groups,
2 NF – 1 NF plus no partial dependencies
3 NF – 2 NF plus no transitive dependencies
Database System Concepts
7.63
©Silberschatz, Korth and Sudarshan
Dependencies
Multivalued Attributes (or repeating groups): non-key
attributes or groups of non-key attributes the values of which
are not uniquely identified by (directly or indirectly) (not
functionally dependent on) the value of the Primary Key (or its
part).
As we can see Course_ID and Units (credits) are not
uniquely identified by Stud_ID and Name
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.64
©Silberschatz, Korth and Sudarshan
Dependencies Type
Partial Dependency – when an non-key attribute is
determined by a part, but not the whole, of a COMPOSITE
primary key.
Name is non-key attribute which is determined by Cust_ID
and does not require the other part of primary key namely
Order_ID
Partial
Dependency
CUSTOMER
Database System Concepts
Cust_ID
Name
Order_ID
101
AT&T
1234
101
AT&T
156
125
Cisco
1250
7.65
©Silberschatz, Korth and Sudarshan
Dependencies types
Transitive Dependency – when a non-key attribute
determines another non-key attribute.
Employee is the relation in which Emp_ID is the key attribute
Dept_ID is a non-key attribute and also the Dept_Name is
non-key attribute.
Transitive
Dependency
EMPLOYEE
Emp_ID
F_Name
L_Name
Dept_ID
Dept_Name
111
Mary
Jones
1
Acct
122
Sarah
Smith
2
Mktg
Database System Concepts
7.66
©Silberschatz, Korth and Sudarshan
Example 1: Determine NF with FDs
There is no composite
attributes. All attributes are
directly or indirectly
determined by the primary
key;, thus the relation is at
least in 1 NF
ISBN Title
ISBN Publisher
Publisher Address
BOOK
ISBN
Database System Concepts
Title
Publisher
7.67
Address
©Silberschatz, Korth and Sudarshan
Example 1: Determine NF
ISBN Title
ISBN Publisher
Publisher Address
The relation is at least in 1NF.
There is no COMPOSITE
primary key, therefore there
can’t be partial dependencies.
Therefore, the relation is at
least in 2NF
BOOK
ISBN
Database System Concepts
Title
Publisher
7.68
Address
©Silberschatz, Korth and Sudarshan
Example 1: Determine NF
ISBN Title
ISBN Publisher
Publisher Address
Publisher is a non-key attribute,
and it determines Address,
another non-key attribute.
Therefore, there is a transitive
dependency, which means that
the relation is NOT in 3 NF.
BOOK
ISBN
Database System Concepts
Title
Publisher
7.69
Address
©Silberschatz, Korth and Sudarshan
Example 1: Determine NF
ISBN Title
ISBN Publisher
Publisher Address
We know that the relation is at
least in 2NF, and it is not in 3
NF. Therefore, we conclude
that the relation is in 2NF.
BOOK
ISBN
Database System Concepts
Title
Publisher
7.70
Address
©Silberschatz, Korth and Sudarshan
Example 1: Determine NF
ISBN Title
ISBN Publisher
Publisher
Address
In our solution we will write the
following justification:
1) No Composite attributes,
therefore at least 1NF
2) No partial dependencies,
therefore at least 2NF
3) There is a transitive dependency
(Publisher Address), therefore,
not 3NF
Conclusion: The relation is in 2NF
BOOK
ISBN
Database System Concepts
Title
Publisher
7.71
Address
©Silberschatz, Korth and Sudarshan
Example 2: Determine NF
Product_ID
Description
All attributes are directly or
indirectly determined by the
primary key; therefore, the relation
is at least in 1 NF
ORDER
Order_No
Database System Concepts
Product_ID
7.72
Description
©Silberschatz, Korth and Sudarshan
Example 2: Determine
NF
Product_ID
Description
The relation is at least in 1NF.
There is a COMPOSITE Primary Key (PK) (Order_No,
Product_ID), therefore there can be partial
dependencies. Product_ID, which is a part of PK,
determines Description; hence, there is a partial
dependency. Therefore, the relation is not 2NF.
No sense to check for transitive dependencies!
ORDER
Order_No
Database System Concepts
Product_ID
7.73
Description
©Silberschatz, Korth and Sudarshan
Example 2: Determine
NF
Product_ID
Description
We know that the relation is at least
in 1NF, and it is not in 2 NF.
Therefore, we conclude that the
relation is in 1 NF.
ORDER
Order_No
Database System Concepts
Product_ID
7.74
Description
©Silberschatz, Korth and Sudarshan
Example 2: Determine
NF
Product_ID
Description
In your solution you will write the
following justification:
1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency
(Product_ID Description), therefore not
in 2NF
Conclusion: The relation is in 1NF
ORDER
Order_No
Database System Concepts
Product_ID
7.75
Description
©Silberschatz, Korth and Sudarshan
Example 3: Determine NF
Comp_ID and No are not
determined by the primary
key; therefore, the relation
is NOT in 1 NF.
No sense in looking at
partial or transitive
dependencies.
Part_ID Description
Part_ID Price
Part_ID, Comp_ID No
PART
Part_ID
Database System Concepts
Descr
Price
7.76
Comp_ID
No
©Silberschatz, Korth and Sudarshan
Example 3: Determine NF
Part_ID Description
Part_ID Price
Part_ID, Comp_ID No
In your solution you will write
the following justification:
1) There are M/V attributes;
therefore, not 1NF
Conclusion: The relation is not
normalized.
PART
Part_ID
Database System Concepts
Descr
Price
7.77
Comp_ID
No
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 1NF
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.78
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 1NF
Option 1: Make a determinant of the repeating group (or the
multivalued attribute) a part of the primary key.
Composite
Primary Key
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.79
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 1NF
Option 2: Remove the entire repeating group from the relation.
Create another relation which would contain all the attributes of
the repeating group, plus the primary key from the first relation.
In this new relation, the primary key from the original relation
and the determinant of the repeating group will comprise a
primary key.
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.80
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 1NF
STUDENT
Stud_ID
Name
101
Lennon
125
Jonson
STUDENT_COURSE
Stud_ID
Course
Units
101
MSI 250
3
101
MSI 415
3
125
MSI 331
3
Database System Concepts
7.81
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 2NF
Composite
Primary Key
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.82
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 2NF
Goal: Remove Partial Dependencies
Partial
Dependencies
Composite
Primary Key
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.83
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 2NF
Remove attributes that are dependent from the part
but not the whole of the primary key from the original
relation. For each partial dependency, create a new
relation, with the corresponding part of the primary
key from the original as the primary key.
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.84
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 2NF
CUSTOMER
STUDENT_COURSE
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Stud_ID
Course_ID
101
MSI 250
101
MSI 415
125
MSI 331
COURSE
STUDENT
Stud_ID
Name
Course_ID
Units
101
Lennon
MSI 250
3.00
101
Lennon
MSI 415
3.00
125
Johnson
MSI 331
3.00
Database System Concepts
7.85
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 3NF
Goal: Get rid of transitive dependencies.
Transitive
Dependency
EMPLOYEE
Emp_ID
F_Name
L_Name
111
Mary
Jones
1
Acct
122
Sarah
Smith
2
Mktg
Database System Concepts
7.86
Dept_ID Dept_Name
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 3NF
Remove the attributes, which are dependent on a
non-key attribute, from the original relation. For each
transitive dependency, create a new relation with the
non-key attribute which is a determinant in the
transitive dependency as a primary key, and the
dependent non-key attribute as a dependent.
EMPLOYEE
Emp_ID
F_Name
L_Name
111
Mary
Jones
1
Acct
122
Sarah
Smith
2
Mktg
Database System Concepts
7.87
Dept_ID Dept_Name
©Silberschatz, Korth and Sudarshan
Bringing a Relation to 3NF
EMPLOYEE
Emp_ID
F_Name
L_Name
Dept_ID Dept_Name
111
Mary
Jones
1
Acct
122
Sarah
Smith
2
Mktg
EMPLOYEE
Emp_ID
F_Name
L_Name
Dept_ID
111
Mary
Jones
1
122
Sarah
Smith
2
DEPARTMENT
Dept_ID Dept_Name
Database System Concepts
1
Acct
2
Mktg
7.88
©Silberschatz, Korth and Sudarshan
Decomposition
The bad design suggests that we should decompose that table.
It is generally not sufficient to check separately that each
relation schema in the database is, say, in 2NF or 3NF.
Rather, the process of normalization through
decomposition must also confirm the existence of
additional properties that the relation schemas, taken
together should possess;
The Lossless join,
The dependency preservation property, which ensures that
each functional dependency is represented in some individual
relations resulting after decomposition.
Database System Concepts
7.89
©Silberschatz, Korth and Sudarshan
Boyce-Codd Normal Form
A relation scheme R < S, F > is in Boyce Codd Normal Form
if for every non-trivial FD in F+ of the form X
A
Where X subset S
A belongs S
X is Superkey of R
“ BCNF is stronger than 3 NF ”
How ?
If a relation is in BCNF then it must be in 3NF,
but it is not necessary that if it is in BCNF then it should be in 3 NF
as well.
Database System Concepts
7.90
©Silberschatz, Korth and Sudarshan
Example
R = (A, B, C, G, H, I)
F={ AB
AC
CG H
CG I
B H}
some members of F+
AH
by transitivity from A B and B H
AG I
by augmenting A C with G, to get AG CG
and then transitivity with CG I
CG HI
from CG H and CG I : “union rule” can be inferred from
– definition of functional dependencies, or
– Augmentation of CG I to infer CG CGI, augmentation of
CG H to infer CGI HI, and then transitivity
Database System Concepts
7.91
©Silberschatz, Korth and Sudarshan
Boyce-Codd Normal Form
A relation scheme R < S, F > is in Boyce Codd Normal Form
if for every non-trivial FD in F+ of the form X
A
Where X subset S
A belongs S
X is Superkey of R
“ BCNF is stronger than 3 NF ”
How ?
If a relation is in BCNF then it must be in 3NF,
but it is not necessary that if it is in BCNF then it should be in 3 NF
as well.
Database System Concepts
7.92
©Silberschatz, Korth and Sudarshan
BCNF
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
, where R and R, at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Boyce-Codd normal form (or BCNF) requires that
there are no non-trivial functional dependencies of
attributes on something else than a superset of a
candidate key. At this stage, all attributes are
dependent on a key, a whole key and nothing but a
key (excluding trivial dependencies, like A->A).
Database System Concepts
7.93
©Silberschatz, Korth and Sudarshan
Overview
To understand
2NF
3NF
BCNF
FD’s required
To understand
4NF
5NF
Database System Concepts
MVD is required
7.94
©Silberschatz, Korth and Sudarshan
Normalization Using Functional Dependencies
When we decompose a relation schema R with a set of
functional dependencies F into R1, R2,.., Rn we want
Lossless-join decomposition: Otherwise decomposition would result in
information loss.
No redundancy: The relations Ri preferably should be in either BoyceCodd Normal Form or Third Normal Form.
Dependency preservation: Let Fi be the set of dependencies F+ that
include only attributes in Ri.
Preferably the decomposition should be dependency preserving,
that is,
(F1 F2 … Fn)+ = F+
Otherwise, checking updates for violation of functional
dependencies may require computing joins, which is expensive.
Database System Concepts
7.95
©Silberschatz, Korth and Sudarshan
Example
R = (A, B, C)
F = {A B, B C)
R1 = (A, B), R2 = (B, C)
Lossless-join decomposition:
R1 R2 = {B} and B BC
Dependency preserving
R1 = (A, B), R2 = (A, C)
Lossless-join decomposition:
R1 R2 = {A} and A AB
Not dependency preserving
(cannot check B C without computing R1
Database System Concepts
7.96
R2)
©Silberschatz, Korth and Sudarshan
Testing for Dependency Preservation
To check if a dependency is preserved in a decomposition of
R into R1, R2, …, Rn we apply the following simplified test (with
attribute closure done w.r.t. F)
result =
while (changes to result) do
for each Ri in the decomposition
t = (result Ri)+ Ri
result = result t
If result contains all attributes in , then the functional dependency
is preserved.
We apply the test on all dependencies in F to check if a
decomposition is dependency preserving
This procedure takes polynomial time, instead of the exponential
time required to compute F+ and (F1 F2 … Fn)+
Database System Concepts
7.97
©Silberschatz, Korth and Sudarshan
Design Objectives & FDs Preservation
The basic objective of normalization is to reduce the various
anomalies in the database.
Normalization can be looked upon as a process of analyzing
the given relation schemas based on their FDs and primary
keys to achieve the desirable properties.
How to identify FDs and prime / nonprime attributes?
An Attribute A in a relation is called a prime attribute if A is a part of
any candidate key of the relation.
Normalization goals: Dependency preservation
Trivial Dependency :defn.
3 Normal Form & BCNF!
Database System Concepts
7.98
©Silberschatz, Korth and Sudarshan
Normalization Using Functional Dependencies
When we decompose a relation schema R with a set of
functional dependencies F into R1, R2,.., Rn we want
Lossless-join decomposition: Otherwise decomposition would result in
information loss.
No redundancy: The relations Ri preferably should be in either BoyceCodd Normal Form or Third Normal Form.
Dependency preservation: Let Fi be the set of dependencies F+ that
include only attributes in Ri.
Preferably the decomposition should be dependency preserving,
that is,
(F1 F2 … Fn)+ = F+
Otherwise, checking updates for violation of functional
dependencies may require computing joins, which is expensive.
Database System Concepts
7.99
©Silberschatz, Korth and Sudarshan
Functional Dependencies (Cont.)
A functional dependency is trivial (obvious) if it is satisfied by all
instances of a relation
E.g.
customer-name, loan-number customer-name
customer-name customer-name
In general, is trivial if
Database System Concepts
7.100
©Silberschatz, Korth and Sudarshan
Design Goals
Goal for a relational database design is:
BCNF.
Lossless join.
Dependency preservation.
If we cannot achieve this, we accept one of
Lack of dependency preservation
Redundancy due to use of 3NF
Interestingly, SQL does not provide a direct way of specifying functional
dependencies other than superkeys.
Can specify FDs using assertions, but they are expensive to test
Even if we had a dependency preserving decomposition, using SQL we
would not be able to efficiently test a functional dependency whose left
hand side is not a key.
Database System Concepts
7.101
©Silberschatz, Korth and Sudarshan
Multivalued Dependencies (MVDs)
For Achieving Higher Normal Forms like 4 NF and 5 NF, we
have to understand MVDs.
FDs rule out certain tuples from being in a relation,
if A B then we can not have two tuples with same A value
but different B values.
MVDs do not rule out existence of certain tuples as there can
be MultiValues Dependency !
Let R be a relation schema and let R and R. The
multivalued dependency
holds on R if in any legal relation r(R), for all pairs for tuples t1
and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in
r such that:
Database System Concepts
t1[] = t2 [] = t3 [] = t4 []
t3[]
= t1 []
t3[R – ] = t2[R – ]
t4 []
= t2[]
t4[R – ] = t1[R7.102
– ]
©Silberschatz, Korth and Sudarshan
Decomposition Properties
Decompose the relation schema
Lending-schema (branch-name, branch-city, assets, customer-name, loannumber, amount)
into:
Branch-schema = (branch-name, branch-city,assets)
Loan-info-schema = (customer-name, loan-number,
branch-name, amount)
All attributes of an original schema (R) must appear in the
decomposition (R1, R2):
R = R1 R2
Lossless-join decomposition.
For all possible relations r on schema R
r = R1 (r) R2 (r)
A decomposition of R into R1 and R2 is lossless join if and only if at least
one of the following dependencies is in F+:
R1 R2 R1
R1 R2 R2
Database System Concepts
7.103
©Silberschatz, Korth and Sudarshan
Dependency Preservation
Let Fi be the set of dependencies F + that include only attributes in
Ri.
A decomposition is dependency preserving, if
(F1 F2 … Fn )+ = F +
If it is not, then checking updates for violation of functional
dependencies may require computing joins, which is
expensive.
Database System Concepts
7.104
©Silberschatz, Korth and Sudarshan
Third Normal Form
A relation schema R is in third normal form (3NF) if for all:
in F+
at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Each attribute A in – is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
Third condition is a minimal relaxation of BCNF to ensure dependency
preservation (will see why later).
Database System Concepts
7.105
©Silberschatz, Korth and Sudarshan
Closure of a Set of Functional
Dependencies
We can find all of F+ by applying Armstrong’s Axioms:
if , then
(reflexivity)
if , then
(augmentation)
if , and , then (transitivity)
These rules are
sound (generate only functional dependencies that actually hold) and
complete (generate all functional dependencies that hold).
Database System Concepts
7.106
©Silberschatz, Korth and Sudarshan
Third Normal Form: Motivation
There are some situations where
BCNF is not dependency preserving, and
efficient checking for FD violation on updates is important
Solution: define a weaker normal form, called Third Normal Form.
Allows some redundancy (with resultant problems; we will see
examples later)
But FDs can be checked on individual relations without computing a
join.
There is always a lossless-join, dependency-preserving decomposition
into 3NF.
Database System Concepts
7.107
©Silberschatz, Korth and Sudarshan
Boyce–Codd Normal Form (BCNF)
Based on functional dependencies that take into account all
candidate keys in a relation, however BCNF also has additional
constraints compared with general definition of 3NF.
BCNF - A relation is in BCNF if and only if every determinant is a
candidate key.
Database System Concepts
7.108
©Silberschatz, Korth and Sudarshan
Third Normal Form : Closure
Third Normal Form :
A relation scheme is said to be in 3 NF if ;
It is in 2 NF.
No non-key attribute is functionally-dependent upon any Non-
key attribute.
Thus, there should be no transitive-dependency of a non-key attribute
on the primary key.
3 NF Scheme does not allow partial dependencies like in 2NF but
additionally 3NF does not allow any transitive dependencies.
A relation scheme R < S, F > is in 3 NF if for all non-trivial Functional
dependencies in F + (closure of F) of the form X A, either X
contains a key (i.e. X is a super-key) or A is a prime attribute
Database System Concepts
7.109
©Silberschatz, Korth and Sudarshan
Boyce–Codd normal form (BCNF)
Difference between 3NF and BCNF is that for
a functional dependency A B,
3NF allows this dependency in a relation if B is
a primary-key attribute and A is not a
candidate key.
Whereas, BCNF insists that for this
dependency to remain in a relation, A must be
a candidate key.
Every relation in BCNF is also in 3NF.
However, relation in 3NF may not be in BCNF.
Database System Concepts
7.110
©Silberschatz, Korth and Sudarshan
Boyce–Codd normal form (BCNF)
Violation of BCNF is quite rare.
Potential to violate BCNF may occur in a relation that:
contains two (or more) composite candidate
keys;
the candidate keys overlap (i.e. have at least
one attribute in common).
Database System Concepts
7.111
©Silberschatz, Korth and Sudarshan
General Definitions of 2NF and 3NF
Second normal form (2NF)
A relation that is in 1NF and every nonprimary-key attribute is fully functionally
dependent on any candidate key.
Third normal form (3NF)
A relation that is in 1NF and 2NF and in
which no non-primary-key attribute is
transitively dependent on any candidate key.
Database System Concepts
7.112
©Silberschatz, Korth and Sudarshan
Third Normal Form
A relation schema R is in third normal form (3NF) if for all:
in F+
at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Each attribute A in – is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first
two conditions above must hold).
Third condition is a minimal relaxation of BCNF to ensure
dependency preservation (will see why later).
Database System Concepts
7.113
©Silberschatz, Korth and Sudarshan
Third Normal Form: Motivation
There are some situations where
BCNF is not dependency preserving, and
efficient checking for FD violation on updates is important
Solution: define a weaker normal form, called Third Normal Form.
Allows some redundancy (with resultant problems; we will see
examples later)
But FDs can be checked on individual relations without computing a
join.
There is always a lossless-join, dependency-preserving decomposition
into 3NF.
Database System Concepts
7.114
©Silberschatz, Korth and Sudarshan
Third Normal Form
A relation schema R is in third normal form (3NF) if for all:
in F+
at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Each attribute A in – is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first
two conditions above must hold).
Third condition is a minimal relaxation of BCNF to ensure
dependency preservation (will see why later).
Database System Concepts
7.115
©Silberschatz, Korth and Sudarshan
3NF
A Relation scheme is said to be in 3nf if;
IT is 2NF
No non-key attribute is functionaly dependent upon upon any
non –key attribute .
Thus ,there should be no Transitive-dependencey of a non-key
attribute on the primary key
Database System Concepts
7.116
©Silberschatz, Korth and Sudarshan
3NF (Cont.)
Example
R = (J, K, L)
F = {JK L, L K}
Two candidate keys: JK and JL
R is in 3NF
JK L
LK
JK is a superkey
K is contained in a candidate key
BCNF decomposition has (JL) and (LK)
Testing for JK L requires a join
There is some redundancy in this schema
Equivalent to example in book:
Banker-schema = (branch-name, customer-name, banker-name)
banker-name branch name
branch name customer-name banker-name
Database System Concepts
7.117
©Silberschatz, Korth and Sudarshan
Testing for 3NF
Optimization: Need to check only FDs in F, need not check all
FDs in F+.
Use attribute closure to check, for each dependency , if
is a superkey.
If is not a superkey, we have to verify if each attribute in is
contained in a candidate key of R
this test is rather more expensive, since it involve finding candidate
keys
testing for 3NF has been shown to be NP-hard
Interestingly, decomposition into third normal form (described
shortly) can be done in polynomial time
Database System Concepts
7.118
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.119
©Silberschatz, Korth and Sudarshan
Overview and Higher Normal Form
Overview of Normal Forms
Third Normal Form: Weaker & Better ?
Why BCNF & Why not BCNF ?
Closure Approach
FDs Analysis
MVDs & 4 NF
Performance & Denormalization ?
Database System Concepts
7.120
©Silberschatz, Korth and Sudarshan
Relationship Between Normal Forms
Database System Concepts
7.121
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.122
©Silberschatz, Korth and Sudarshan
Normal Forms & FDs: Review
UNF (Unnormalized) – There are multivalued attributes or
repeating groups, Record can not be uniquely identified.
1 NF – No composite attributes or repeating groups,
2 NF – 1 NF plus no partial dependencies
3 NF – 2 NF plus no transitive dependencies
BCNF - 3 NF plus A relation is in BCNF if and only if every
determinant is a candidate key.
Database System Concepts
Determinant: If A B is a FD and if B is not
functionally dependent upon any subset of A ,
then A is said to be the determinant of B.
7.123
©Silberschatz, Korth and Sudarshan
3 NF & BCNF
3 NF can also be defined in another way, using Closure Concept
of FDs.
Given a set F set of functional dependencies, there
are certain other functional dependencies that are
logically implied by F.
E.g. If A B and B C, then we can infer that A C
The set of all functional dependencies logically implied by F is
the closure of F. We denote the closure of F by F+.
The Closure F+ can be used to give alternate definition of 3 NF.
Database System Concepts
7.124
©Silberschatz, Korth and Sudarshan
Third Normal Form :
Third Normal Form :
A relation scheme is said to be in 3 NF if ;
There should be no transitive-dependency of a non-key attribute on
the primary key.
3 NF Scheme does not allow partial dependencies like in 2NF but
additionally 3NF does not allow any transitive dependencies.
A relation scheme R < S, F > is in 3 NF if for all non-trivial Functional
dependencies in F + (closure of F) of the form X A, either X
contains a key (i.e. X is a super-key) or A is a prime attribute
Database System Concepts
7.125
©Silberschatz, Korth and Sudarshan
Third Normal Form
A relation schema R is in third normal form (3NF) if for all:
in F+
at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Each attribute A in – is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
Third condition is a minimal relaxation of BCNF to ensure dependency
preservation (will see why later).
Database System Concepts
7.126
©Silberschatz, Korth and Sudarshan
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
, where R and R, at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Boyce-Codd normal form (or BCNF) requires that
there are no non-trivial functional dependencies of
attributes on something else than a superset of a
candidate key.
At this stage, all attributes are dependent on a key, a
whole key and nothing but a key (excluding trivial
dependencies, like A->A).
Database System Concepts
7.127
©Silberschatz, Korth and Sudarshan
Fourth Normal Form
A relation schema R is in 4NF with respect to a set D of functional
and multivalued dependencies if for all multivalued dependencies
in D+ of the form , where R and R, at least one of the
following hold:
is trivial (i.e., or = R)
is a superkey for schema R
If a relation is in 4NF it is in BCNF
Database System Concepts
7.128
©Silberschatz, Korth and Sudarshan
When to use BCNF
Although BCNF is stricter than 3 NF band it is not neccesary that
if some scheme is in 3 NF then it must be in BCNF but In most
cases 3 NF is also BCNF .
3 NF is NOT BCNF only when a relation has composite
candidate keys that have atleast one attribute in common.
The BCNF is implemented in cases where the table has
(a) Multiple candidate keys
(b) Composite candidate keys
(c) Candidate keys that overlap
Database System Concepts
7.129
©Silberschatz, Korth and Sudarshan
Why we require BCNF
Assume that a relation has more than one possible key, Assume
further that the composite keys have a common attribute.
IF an attribute of a composite key is dependent on an attribute of
the other composite key, then this ,normalization called BCNF is
needed
For example:- professor ( professor code, dept, head of dept,
percent time)
Now it is assumed that
1. A professor can work in more than one department.
2 .The percentage of the time he spends in each department is
given.
3. Each department has only one head of department.
Database System Concepts
7.130
©Silberschatz, Korth and Sudarshan
3NF relation
Professor
code
Department
Head of
Department
Percent
P1
Physics
Ghosh
50
P1
mathmematics
Krishnan
50
P2
Chemistry
Rao
25
p2
physics
ghosh
75
p3
mathematics
krishnan
100
Here the names of dept, and head are duplicated,
if professor p2 resigns rows 3 and row 4 are deleted .
We lose the information that rao is the head of the department of
chemistry.
So the Normalization of the relation is done by creating a new relation for
department of chemistry
Database System Concepts
7.131
©Silberschatz, Korth and Sudarshan
The Relationship diagram
The two possible composite keys are
professor code & dept or professor code & head of dept
Observe that department as well as head of dept are not non-key
attribute. They are part of a composite key.
Head of the department
department
Professor code
Percent time
department
Head of department
Department
Head of department
Professor code
Database System Concepts
Percent time
7.132
©Silberschatz, Korth and Sudarshan
Now Relation is in BCNF
department
Percent time
Professor code
Department
Head of department
Prof code
department
percent
P1
Physics
50
P1
Maths
50
P2
Chemistry
25
P2
Physics
75
p3
Maths
100
Database System Concepts
7.133
department
Head of dept
Physics
Ghosh
Maths
Krishnan
chemistry
rao
©Silberschatz, Korth and Sudarshan
3NF Table Not in BCNF
Figure 4.7
Database System Concepts
7.134
©Silberschatz, Korth and Sudarshan
Decomposition of Table
Structure to Meet BCNF
Database System Concepts
7.135
©Silberschatz, Korth and Sudarshan
BCNF Conversion Results
Database System Concepts
7.136
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.137
©Silberschatz, Korth and Sudarshan
DBMS Design
MVDs & Higher Normal Forms
Why MVDs
MVDs Rules
Splitting in MVDs
DBMS Design & Role of Normalization
Limitation of Normalization
Database System Concepts
7.138
©Silberschatz, Korth and Sudarshan
Multivalued Dependencies (MVDs)
FDs rule out certain tuples from being in a relation,
if A B then we can not have two tuples with same A value
but different B values.
MVDs do not rule out existence of certain tuples as there can
be MultiValued Dependency !
Let R be a relation schema and let R and R. The
multivalued dependency
which can be read as α multidetermines β
holds on R if in any legal relation r(R), for all pairs for tuples t1
and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in
r such that:
t1[] = t2 [] = t3 [] = t4 []
t3[]
= t1 []
t3[R – ] = t2[R – ]
t4 []
= t2[]
t4[R – ] = t1[R – ]
Database System Concepts
7.139
©Silberschatz, Korth and Sudarshan
MVDs
Multivalued dependencies are also referred to as tuple
generating dependencies.
After the Boyce -Codd normal form the results may be devoid of
any functional dependencies but it may encounter multivalued
dependencies as the multivalued dependencies also cause
redundancy of data.
For eg: If there are 3 attributes involved in a Relation A,B, and C..
Then for every value of A we will have respective values for B
and C..
But it is a necessary in the 4th normal form that both B and C
values are independent of each other. This is represented by .,,
A->>B
A->>C..
MVD or Multivalued Dependency is a dependency where one
attribute value is potentially a "multivalued fact" about another
and the attributes must be independent of each other.
Database System Concepts
7.140
©Silberschatz, Korth and Sudarshan
Theory of MVDs
From the definition of multivalued dependency, we can derive the
following rule:
If , then
That is, every functional dependency is also a multivalued dependency
The closure D+ of D is the set of all functional and multivalued
dependencies logically implied by D.
We can compute D+ from D, using the formal definitions of functional
dependencies and multivalued dependencies.
We can manage with such reasoning for very simple multivalued
dependencies, which seem to be most common in practice
For complex dependencies, it is better to reason about sets of
dependencies using a system of inference rules .
Database System Concepts
7.141
©Silberschatz, Korth and Sudarshan
MVDs & Higher Normal Forms
For Achieving Higher Normal Forms like 4 NF, we have to understand MVDs.
Consider following Scheme of a database of teaching courses, the books
recommended for the course, and the lecturers who will be teaching the
course:
Because the lecturers attached to
the course and the books attached to
the course are independent of each
other, this database design has a
Multi-Valued Dependency;
Course
Book
Lecturer
AHA
Silberschatz
John D
AHA
Nederpelt
William M
AHA
Silberschatz
William M
AHA
Nederpelt
John D
AHA
Silberschatz
Christian G
AHA
Nederpelt
Christian G
OSO
Silberschatz
John D
OSO
Silberschatz
William M
Database System Concepts
If we were to add a new book to the
AHA course, we would have to add
one record for each of the lecturers on
that course, and vice versa.
7.142
There are two multivalued
dependencies in this relation:
{course} {book} & equivalently
{course} {lecturer}
©Silberschatz, Korth and Sudarshan
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
, where R and R, at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Boyce-Codd normal form (or BCNF) requires that
there are no non-trivial functional dependencies of
attributes on something else than a superset of a
candidate key.
At this stage, all attributes are dependent on a key, a
whole key and nothing but a key (excluding trivial
dependencies, like A->A).
Database System Concepts
7.143
©Silberschatz, Korth and Sudarshan
Fourth Normal Form
A relation schema R is in 4NF with respect to a set D of functional
and multivalued dependencies if for all multivalued dependencies
in D+ of the form , where R and R, at least one of the
following hold:
is trivial (i.e., or = R)
is a superkey for schema R
If a relation is in 4NF it is in BCNF
Database System Concepts
7.144
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.145
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.146
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.147
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.148
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.149
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.150
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.151
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.152
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.153
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.154
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.155
©Silberschatz, Korth and Sudarshan
Database System Concepts
7.156
©Silberschatz, Korth and Sudarshan
Further Normal Forms
Join dependencies generalize multivalued dependencies
lead to project-join normal form (PJNF) (also called fifth normal
form)
A class of even more general constraints, leads to a normal form
called domain-key normal form.
Problem with these generalized constraints: are hard to reason with,
and no set of sound and complete set of inference rules exists.
Hence rarely used
Database System Concepts
7.157
©Silberschatz, Korth and Sudarshan
5 NF
Fifth normal form (5NF), also known as Project-join normal
form (PJ/NF) is a level of database normalization designed to
reduce redundancy in relational databases recording multivalued facts by isolating semantically related multiple
relationships.
A table is said to be in the 5NF if and only if every join
dependency in it is implied by the candidate keys
A join dependency *{A, B, … Z} on R is implied by the candidate
key(s) of R if and only if each of A, B, …, Z is a superkey for R
Database System Concepts
7.158
©Silberschatz, Korth and Sudarshan
Normalization and Database design
While we have tried to present the normal forms in a simple and
understandable way, we are by no means suggesting that the
data design process is correspondingly simple.
The design process involves many complexities which are quite
beyond the scope of general discussion.
In the first place, an initial set of data elements and records has
to be developed, as candidates for normalization. Then the
factors affecting normalization have to be assessed:
Single-valued vs. multi-valued dependencies.
Dependency on the entire key.
Independent vs. dependent facts.
The presence of mutual constraints.
The presence of non-unique or non-singular representations.
And, finally, the desirability of normalization has to be assessed, in
terms of its performance impact on retrieval applications.
Database System Concepts
7.159
©Silberschatz, Korth and Sudarshan
DBMS: Normalization & Beyond
Normalization :
» Advantages & Disadvantages
Normalization : Sufficiency
»Does it ensure best design
Recent developments in DBMS and Data &
Knowledge Engineering
Database System Concepts
7.160
©Silberschatz, Korth and Sudarshan
ER Model and Normalization
When an E-R diagram is carefully designed, identifying all entities
correctly, the tables generated from the E-R diagram should not need
further normalization.
However, in a real (imperfect) design, there can be functional
dependencies from non-key attributes of an entity to other attributes of
the entity
Example: an employee entity with attributes department_number
and department_address, and a functional dependency
department_number department_address
Good design would have made department an entity
Functional dependencies from non-key attributes of a relationship set
possible, but rare --- most relationships are binary
Database System Concepts
7.161
©Silberschatz, Korth and Sudarshan
Denormalization for Performance
May want to use non-normalized schema for performance
Normalization imposes conditions for Check thus causing
computations and slowing down the operations.
For example, displaying customer_name along with
account_number and balance requires join of account with depositor
Alternative 1: Use denormalized relation containing attributes of
account as well as depositor with all above attributes
faster lookup
extra space and extra execution time for updates
extra coding work for programmer and possibility of error in
extra code
Alternative 2: use a materialized view defined as
account
depositor
Benefits and drawbacks same as above, except no extra coding
work for programmer and avoids possible errors
Database System Concepts
7.162
©Silberschatz, Korth and Sudarshan
Other Design Issues
Some aspects of database design are not caught by normalization
Examples of bad database design, to be avoided:
Instead of earnings (company_id, year, amount ), use
earnings_2004, earnings_2005, earnings_2006, etc., all on the
schema (company_id, earnings).
Above are in BCNF, but make querying across years difficult and
needs new table each year
company_year(company_id, earnings_2004, earnings_2005,
earnings_2006)
Also in BCNF, but also makes querying across years difficult and
requires new attribute each year.
Is an example of a crosstab, where values for one attribute
become column names
Used in spreadsheets, and in data analysis tools
Database System Concepts
7.163
©Silberschatz, Korth and Sudarshan
Advantages of Normalization
1) Avoids data modification (INSERT/DELETE/UPDATE)
anomalies as each data item lives in One place
2) Normalization is conceptually cleaner and easier to maintain
and change as your needs change
3) Fewer null values and less opportunity for inconsistency
4) A better handle on database security
5) The normalization process helps maximize the use of
clustered indexes, which is the most powerful and useful type of
index available. As more data is separated into multiple
tables because of normalization the more clustered indexes
become available to help
speed up data access.
Database System Concepts
7.164
©Silberschatz, Korth and Sudarshan
Disadvantages
1) Requires much more CPU, memory, and I/O to process thus
normalized data gives reduced database performance
2) Requires more joins to get the desired result. A poorly-written
query can bring the database down
3) Maintenance overhead. The higher the level of normalization,
the greater the number of tables in the database
Database System Concepts
7.165
©Silberschatz, Korth and Sudarshan
Adequacy of Normalization
Is Normalization alone is enough for a good design ?
Does all the design anomalies are removed by Normalization to
highest Normal Forms ?
Is normalization always lead to best design?
Do we sometime need to De-normalized the Data ?
When do we need to De-normalize ?
Database System Concepts
7.166
©Silberschatz, Korth and Sudarshan