Normalization of Database Tables
Download
Report
Transcript Normalization of Database Tables
Chapter 5
Normalization of Database Tables
Database Systems: Design, Implementation, and Management
Peter Rob & Carlos Coronel
In this chapter, you will learn:
What normalization is and what role it plays in
database design
About the normal forms 1NF, 2NF, 3NF, BCNF, and
4NF
How normal forms can be transformed from lower
normal forms to higher normal forms
That normalization and E-R modeling are used
concurrently to produce a good database design
That some situations require denormalization to
generate information efficiently
Database Tables and Normalization
Normalization
Process
for evaluating and correcting table structures to
minimize data redundancies
process for assigning attributes to tables. It
reduces data redundancies
helps eliminate data anomalies.
produces controlled redundancies to link tables
Normalization works through a series of
stages called normal forms:
First
normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Fourth normal form (4NF)
Database Tables and Normalization
Normalization
2NF
is better than 1NF;
3NF is better than 2NF
For most business database design purposes,
3NF is as high as we need to go in normalization process
The highest level of normalization is not always
most desirable.
Database Tables and Normalization
The Need for Normalization
Case
of a Construction Company
Building project -- Project number, Name, Employees assigned
to the project.
Employee -- Employee number, Name, Job classification
The company charges its clients by billing the hours spent on
each project. The hourly billing rate is dependent on the
employee’s position.
Periodically, a report is generated. Table 5.1
The easiest way to generate the required report might seem to
be a table whose contents correspond to the reporting
requirements. Figure 5.1
Database Tables and Normalization
Need for Normalization:
Problems with the Figure 5.1
The
project number is intended to be a primary key,
but it contains nulls.
The
table displays data redundancies.
The
table entries invite data inconsistencies.
The
data redundancies yield the following anomalies:
Update anomalies. (modify JOB_CLASS for Employee 105)
Insertion anomalies. (a new Employee not yet assigned)
Deletion anomalies. ( Employee 103 quits)
The Normalization Process
Database Tables and Normalization
Conversion to 1NF
Repeating groups – a group of multiple entries can exist for any
single key attribute occurrence.
Repeating groups must be eliminated
Any project number (PROJ_NUM) can have a group of several
data entries.
A relational table must not contain repeating groups.
Step 1: Eliminate the Repeating Groups –
Repeating groups can be eliminated by adding the
appropriate entry in at least the primary key column(s).
Step 2: Identify the Primary Key
•
Uniquely identifies attribute values (rows)
•
Combination of PROJ_NUM and EMP_NUM
Database Tables and Normalization
Step 3: Identify all Dependencies
Dependency Diagram
The primary key components are bold, underlined, and
shaded in a different color.
The arrows above entities indicate
all desirable dependencies ( dependencies based on PK )
The arrows below the dependency diagram indicate less
desirable dependencies –
– partial dependencies
( dependencies based on only a part of PK )
– transitive dependencies
( nonprime attribute → nonprime attribute )
Prime attribute = Key attribute
Nonprime attribute = Nonkey attribute
Database Tables and Normalization
EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR
PROJ_NUM → PROJ_NAME
JOB_CLASS → CHG_HOUR
Database Tables and Normalization
1NF Definition
The
term first normal form (1NF) describes the
tabular format in which:
All the key attributes are defined.
There are no repeating groups in the table.
Each row/col intersection can contain one and only one
value, not set of values.
All attributes are dependent on the primary key.
All relational tables satisfy the 1NF requirements.
1NF Drawback
Partial
dependencies (EMP_NUM → EMP_NAME, JOB_CLASS_,
CHG_HOUR)
→→ data redundancies
→→ data anomalies
Database Tables and Normalization
Conversion to 2NF
Step
1: Identify All Key Components
Writing each key component on a separate line, and then
writing the original key on the last line and
PROJ_NUM
EMP_NUM
PROJ_NUM, EMP_NUM
Step
2: Identify the Dependent Attributes
Writing the dependent attributes after each new key.
PROJECT ( PROJ_NUM, PROJ_NAME)
EMPLOYEE ( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN ( PROJ_NUM, EMP_NUM, HOURS)
Database Tables and Normalization
2NF Definition
A table is in 2NF if:
It is in 1NF and
It includes no partial dependencies;
that is, no attribute is dependent on only portion of primary key.
A table whose primary key is not composite
must automatically be in 2NF.
BECAUSE a partial dependency can exist only if
a table has a composite primary key
It is still possible for a table in 2NF to exhibit transitive dependency;
that is, one or more attributes may be functionally dependent on nonkey
attributes.
2NF Drawback
Transitive dependencies (JOB_CLASS →
→→ data redundancies
→→ data anomalies
CHG_HOUR)
Database Tables and Normalization
Conversion to 3NF
Create
a separate table with attributes in a transitive
functional dependence relationship.
Step 1: Identify Each New Determinant
JOB_CLASS
Step 2: Identify the Dependent Attributes
JOB_CLASS → CHG_HOUR
Step 3: Remove the Dependent Attributes from Transitive
Dependencies
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
Database Tables and Normalization
3NF Definition
A
table is in 3NF if:
It is in 2NF and
It contains no transitive dependencies.
Improving the Design
Table structures are cleaned up to eliminate the
troublesome initial partial and transitive
dependencies
Normalization
cannot, by itself, be relied on to make good designs
It is valuable
because its use helps eliminate data redundancies
Improving the Design
Issues to address in order to produce a good
normalized set of tables:
Evaluate
PK Assignments
Evaluate Naming Conventions
Refine Attribute Atomicity
Identify New Attributes
Identify New Relationships
Refine Primary Keys as Required for Data Granularity
Maintain Historical Accuracy
Evaluate Using Derived Attributes
Improving the Design
Adding
relationships
Project’s
manager
EMP_NUM as a FK in PROJECT.
3NF
Project
manager
Improving the Design
PK
assignment
JOB_CODE
Naming
conventions
JOB_CHG_HOUR
JOB_CLASS
>> JOB_DESCRIPTION
2NF
Improving the Design
3NF
atomicity
EMP_NAME >> EMP_LNAME,EMP_FNAME,EMP_INITIAL
Adding attributes
JOB_CLASS
EMP_HIREDATE
Attribute
Improving the Design
Refining PKs
(EMP_NUM+PROJ_NUM) >> ASSIGN_NUM
Maintaining historical accuracy
3NF
ASSIGN_CHG_HOUR <<>> JOB_CHG_HOUR
Using derived attributes
ASSIGN_CHARGE = ASSIGN_HOURS × ASSIGN_CHG_HOUR
Surrogate Key Considerations
When primary key is considered to be unsuitable,
designers use system-defined surrogate keys
The DBMS can be used to have the system assign
the PK values (JOB_CODE)
>> to ensure entity integrity
Limitations on system-defined surrogate keys
Data
entries in Table 5.3 are inappropriate
because they duplicate existing records
However, it does not prevent us from making the entries
shown in Table 5.3.
>> Multiple duplicate records problem
We still must ensure the uniqueness in
JOB_DESCRIPTION through the use of a unique index.
Database Tables and Normalization
Boyce-Codd Normal Form (BCNF)
A table is in Boyce-Codd normal form (BCNF)
if every determinant in the table is a candidate key.
(A determinant is any attribute whose value determines other
values with a row.)
If a table contains only one candidate key,
the 3NF and the BCNF are equivalent.
BCNF can be violated
only if the table contains more than one candidate key
BCNF is a special case of 3NF.
Figure 5.7 illustrates a table that is in 3NF but not in BCNF.
Figure 5.8 shows how the table can be decomposed to conform
to the BCNF form.
A Table That Is In 3NF But Not In BCNF
A + B → C, D
C → B : Not transitive dependencies
(A nonkey attribute is the determinant of a key attribute)
→ → 3NF
C
: Not candidate key → → Not In BCNF
The Decomposition of a Table Structure to meet BCNF Requirements
A + B → C, D
C→B
Change the PK
to A+C
A + C → B, D
C→B
A+C→D
C→B
The Boyce-Codd Normal Form (BCNF)
STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE
CLASS_CODE → STAFF_ID
Decomposition into BCNF
Figure 5.9
Normalization and Database Design
Normalization should be part of the design process
E-R Diagram provides macro view
Normalization provides micro view of entities
Focuses on characteristics of specific entities
A micro view of the entities within the ER diagram
Difficult to separate normalization from E-R diagramming
Two techniques should be used concurrently
Normalization and Database Design
Database Design and Normalization Example:
(Construction Company)
Summary of Operations:
The company manages many projects.
Each project requires the services of many employees.
An employee may be assigned to several different projects.
Some employees are not assigned to a project and perform duties
not specifically related to a project.
Some employees are part of a labor pool, to be shared by all project
teams.
Each employee has a (single) primary job classification. This job
classification determines the hourly billing rate.
Many employees can have the same job classification.
Normalization and Database Design
Two Initial Entities:
[ 3NF ]
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
JOB_DESCRIPTION, JOB_CHG_HOUR)
No partial dep.
Transitive dep. JOB_DESCRIPTION → JOB_CHG_HOUR [ 2NF ]
Normalization and Database Design
Normalization and Database Design
Three Entities After Transitive Dependency Removed
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE
( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)
EMPLOYEE
JOB
( JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
The Modified ERD For A Contracting Company
Because the normalization process yields an additional
entity (JOB),
we modify the initial ERD.
M
Is held by
Normalization and Database Design
Normalization and Database Design
Normalization and Database Design
Creation of the Composite Entity ASSIGNMENT
ASSIGNMENT
Is held by
The Final ( Implementable) ERD for the Contracting Company
Normalization and Database Design
Normalization and Database Design
Attribute ASSIGN_HOUR is assigned to the composite
entity ASSIGN.
“Manages” relationship is created between EMPLOYEE
and PROJECT.
Manages
PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)
ASSIGNMENT (ASSIGN_NUM,
ASSIGN_DATE , ASSIGN_HOURS, ASSIGN_CHG_HOUR , ASSIGN_CHARGE,
EMP_NUM,
PROJ_NUM)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
EMP_HIREDATE, JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
Normalization and Database Design
Higher-Level Normal Forms
In some databases,
multiple multivalued attributes exist
4NF Definition
A
table is in 4NF
if it is in 3NF and
has no multiple sets of multivalued dependencies.
Higher-Level Normal Forms
An employee can have multiple assignments and can also be
involved in multiple service organization.
EMP_SERVICE (volunteer work) and EMP_ASSIGN (assigned
project)
Independent
each may have many different values.
The table contain two sets of multivalued dependencies.
1
A Set of Tables in 4NF
The solution is to eliminate the problems caused by
independent multivalued dependencies.
ORGANIZATION
M
Service
N
EMPLOYEE
M
Assign
N
PROJECT
Denormalization
Creation of normalized relations is important database
design goal
Processing requirements should also be a goal
If tables decomposed to conform to normalization
requirements
Number of database tables expands
Joining larger number of tables
takes additional disk input/output (I/O) operations and
processing logic
Reduces system speed
Denormalization
Normalization is only one of many database design
goals.
Normalized (decomposed) tables require additional
processing, ( Join : additional I/O operations )
reduce system speed.
CUSTOMER( CUS_NUM, CUS_NAME, … , ZIP_CODE, CITY)
transitive dep.
Is it really practical?
ZIP(ZIP_CODE, CITY)
Some degree of denormalization
→→ increase processing speed
Denormalization
Normalization purity is often difficult to sustain in the modern
database environment.
The conflict between design efficiency, information requirements,
and processing speed are often resolved through compromises that
include denormalization.
In fact, we used a 2NF structure in the JOB table (Fig.5.6) to
decrease the likelihood of referential integrity violations.
JOB table
Original PK: JOB_CLASS
New PK: JOB_CODE
EMPLOYEE table
Original FK: JOB_CLASS
New FK: JOB_CODE
Use denormalization cautiously.
The Initial 1NF Structure
Identifying the Possible PK Attributes
Table Structures Based On The Selected PKs
Foreign key
Summary
Summary