Transcript 5 5

5
Orange Coast College
Business Division
CS/CIS Department
Fall 2004
CIS 182
Introduction to Database Concepts
Instructor
Dr. Martha Malaty
Text & Original Presentations
Database Systems: Design, Implementation, and
Management, Sixth Edition, Rob and Coronel, 2004
1
5
Chapter 5
Normalization of Database Tables
Database Systems:
Design, Implementation, and Management,
Sixth Edition, Rob and Coronel
2
In this chapter, you will learn:
5
• What normalization is and what role it plays in the
database design process
• About the normal forms 1NF, 2NF, 3NF, BCNF,
and 4NF
• How normal forms can be transformed from lower
normal forms to higher normal forms
• That normalization and ER modeling are used
concurrently to produce a good database design
• That some situations require denormalization to
generate information efficiently
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
3
Database Tables & Normalization
5
• Good table structure is essential for good DB design
• Normalization is the way to reach good table
structure
• Normalization is the process for assigning attributes
to entities through evaluating and correcting table
structures
–
–
–
–
Reduces data redundancies
Helps eliminate data anomalies
Produces controlled redundancies to link tables
Works through sequence of stages (normal forms)
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
4
Database Tables & Normalization
5
• Normalization works through a series of stages called
normal forms:
•
•
•
•
•
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Fourth normal form (4NF)
...
• 2NF is better than 1NF; 3NF is better than 2NF; …
• For most business database design purposes, 3NF is
highest we need to go in the normalization process
• Highest level of normalization is not always most
desirable
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
Tutorials Websites On Normalization
5
• What do you need to know about
• Webopedia
• Rules of data normalization
• 3NF Tutorial
• ServerWatch
• VBMySQL.com
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
6
Unnormalized vs. Denormalized
5
• Unnormalized data:
– No ER model is developed when creating the
database
• Denormalized data
– Starting with normalized data, adding redundancy for
better performance
• We cannot denormalize unnormalized data
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
7
The Need for Normalization
5
• Example: A company manages building projects
– Charges its clients by billing hours spent on each
contract
– Hourly billing rate is dependent on employee’s
position
– Periodically, a report is generated that contains
information displayed in Table 5.1
– See RPT_FORMAT table in ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
8
A Sample Report Layout
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
9
A Table in the Report Format
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
10
Example: Construction company
5
• Three major problems:
– PROJ_NUM contains repeating groups
– Table entries invite data inconsistencies
– Table displays data anomalies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
11
Construction Company Problems
5
• PROJ_NUM contains “repeating groups”
– Several data entries with NULL entries indicating that all entries
belong to the same PROJ_NUM
– PRO_NUM is intended to be primary key
– Primary key can’t be NULL
• Relational table can’t contain repeating groups
Repeating
groups
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
12
5
Construction Company Problems
• Table entries invite data inconsistencies
– Elect. Engineer could mistakenly be entered in another
abbreviated form (e.g. Elect.Eng, EE, …)
Possible
inconsistency
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
13
Construction Company Problems
5
• Table displays data anomalies
– Update anomalies
• Modifying JOB_CLASS needs several alterations
– Insertion anomalies
• New employee must be assigned project
– Deletion anomalies
• If employee is deleted, other vital data might be lost
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
14
Construction Company Problems
5
• Structure of data set in Fig. 5.1 (ConstructCo.mdb)
does not handle data very well
• The table structure appears to work; report is
generated with ease
• Unfortunately, the report may yield different results,
depending on what data anomaly has occurred
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
15
Levels of Normalizations
5
Un-normalized)
1NF
2NF
3NF
BCNF
4NF
5NF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
16
Conversion to First Normal Form
5
• Repeating group
– Derives its name from the fact that a group of
multiple (related) entries can exist for any
single key attribute occurrence
• Relational table must not contain repeating
groups
• Normalizing the table structure will reduce
these data redundancies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
17
Conversion to First Normal Form
5
• Normalization is three-step procedure
1. Eliminate Repeating Groups
2. Identify the Primary Key
3. Identify all Dependencies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
18
Step 1: Eliminate Repeating Groups
5
• Present data in a tabular format, where each
cell has a single value and there are no
repeating groups
• Eliminate repeating groups by eliminating
nulls, making sure that each repeating group
attribute contains an appropriate data value
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
19
Data Organization: 1NF
5
• See DATA_ORG_INF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
20
Step 2: Identify the Primary Key
5
• Primary key must uniquely identify attribute
values
• New key must be composed
• What would be the primary key for the
DATA_ORG_INF table?
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
21
Step 3: Identify all Dependencies
5
• Dependencies can be depicted with the help
of a “Dependency Diagram”
– Depicts all dependencies found within a given
table structure
– Helpful in getting bird’s-eye view of all
relationships among a table’s attributes
– Makes it much less likely that an important
dependency will be overlooked
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
22
Dependency Diagram: 1NF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
23
First Normal Form
5
• Tabular format in which:
– All key attributes are defined
– There are no repeating groups in the table
– All attributes are dependent on primary key
• All relational tables satisfy 1NF requirements
• Some tables contain “Partial dependencies”
– Dependencies based on only part of the
primary key
– Sometimes used for performance reasons, but
should be used with caution
– Still subject to data redundancies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
24
Conversion to Second Normal Form
5
• Relational database design can be
improved by converting the database into
second normal form (2NF)
• Two steps
1. Identify All key components
2. Identify the dependent attributes
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
25
Step 1: Identify All Key Components
5
• Write each key component on separate line,
and then write the original (composite) key
on the last line
• Each component will become the key in a
new table
• Example:
– Proj_Num
– Emp_Num
– Proj_Num, Emp_Num
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
26
Step 2: Identify Dependent Attributes
5
• Determine which attributes are dependent on
which other attributes
• At this point, most anomalies have been
eliminated
• Example:
– Proj_Num  Proj_name
– Emp_Num  Emp_Name, Job_Class, Chg_Hour
– Proj_Num, Emp_Num  Hours
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
27
Second Normal Form (2NF)
Conversion Results
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
28
Second Normal Form
5
• Table is in second normal form (2NF) if:
– It is in 1NF and
– It includes no partial dependencies:
• No attribute is dependent on only a portion of
the primary key
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
29
Conversion to Third Normal Form
5
• Three steps
1. Identify each new determinant
2. Identify the dependent attributes
3. Remove the dependent attributes from transitive
dependencies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
30
Step 1: Identify New Determinants
5
• Determinant
– Any attribute whose value determines other
values within a row
• For every transitive dependency, write its
determinant as a PK for a new table
• Example
– Job_Class
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
31
Step 2: Identify Dependent Attributes
5
• Identify the attributes dependent on each
determinant identified in Step 1 and identify
the dependency
• Name the table to reflect its contents and
function
• Example
– Job_Class  Chg_Hour
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
32
Step 3: Remove Dependent Attributes
from Transitive Dependencies
5
• Eliminate all dependent attributes in transitive relationship(s)
from each table that has such a transitive relationship
• Draw a new dependency diagram to show all tables defined
in Steps 1–3
• Check new tables and modified tables from Step 3 to make
sure that each has a determinant and does not contain
inappropriate dependencies
• Example:
–
–
–
–
Proj_Num  Proj_name
Emp_Num  Emp_Name, Job_Class
Proj_Num, Emp_Num  Hours
Job_Class  Chg_Hour
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
33
Third Normal Form (3NF)
Conversion Results
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
34
Third Normal Form
5
• A table is in third normal form (3NF) if:
– It is in 2NF and
– It contains no transitive dependencies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
35
Improving the Design
5
• Table structures are cleaned up to eliminate
the troublesome initial partial and transitive
dependencies
• Normalization cannot, by itself, be relied on
to make good designs
• It is valuable because its use helps eliminate
data redundancies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
36
Improving the Design (continued)
5
• The following changes were (or should be)
made:
– PK assignment
– Naming conventions
– Attribute atomicity
– Adding attributes
– Adding relationships
– Refining PKs
– Maintaining historical accuracy
– Using derived attributes
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
37
PK assignment
5
• Addition of JOB_CODE attribute greatly decreases the
likelihood of referential integrity violation
• Example:
– One would write a job class as “DB Designer” and
another as “Database Designer” although both refer to
the same class
• Problem: The new key produces transitive
dependency
– JOB_CODE -> JOB_CLASS, CHG_HOUR
– JOB_CLASS -> CHG_HOUR
• The effect of transitive dependency is less problematic
than integrity violation
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
38
Naming conventions
5
• Adhere to naming convention by having the table
name as the first prefix of the attribute name
• Example:
– CHG_HOUR is changed to JOB_CHG_HOUR
– JOB_CLASS should be changed to
JOB_DESCRIPTION to better describe the entity
– HOURS is changed to ASSIGNED_HOURS to
associate it with the ASSIGN table
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
39
Attribute atomicity
5
• Atomic attribute:
– Attribute that cannot be further divided
– Gains querying flexibility
• Example: EMP_NAME is not atomic
– Use EMP_LNAME, EMP_FNAME, EMP_INITIAL
instead
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
40
Adding Attributes
5
• Adding attributes that are needed in the real-world
environment
• Example: EMP_HIREDATE can be used to track
employee’s “job longevity” to award bonuses to longterm employees & other morale enhancing measures
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
41
Adding Relationships
5
• To avoid unnecessary data duplication
• Example:
– To be able to supply detailed information about each
project’s manager, use EMP_NUM as a FK in PROJECT
table
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
42
Refining PKs
5
• If an employee makes two or more entries for the same project,
replace EMP_NUM & PROJ_NUM as a combined key, by a
system-assigned, surrogate key, ASIGN_NUM, in the ASSIGN
table
• Surrogate key:
– At the implementation level, a system-defined attribute that
is created and managed by the DBMS.
– It is automatically incremented for each row.
– In Oracle for example we use the "sequence" object for
surrogate keys.
• EMP_NUM & PROJ_NUM can still be used as FK’s
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
43
Maintaining historical accuracy
5
• Writing the job charge per hour in ASSIGN table is
crucial to maintain historical accuracy, in case that the
job charge per hour changes over time
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
44
Using derived attributes
5
• Derived attributes can be calculated when needed
• Storing a derived attribute makes it easier to write
application SW to produce the desired results and
saves reporting time
• ASSIGN_CHARGE is the result of multiplying
ASSIGN_HOURS by ASSIGN_CHG_HOUR
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
45
The Completed Database
5
• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
46
The Completed Database (continued)
5
• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
47
The Completed Database (continued)
5
• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
48
Limitations on System-Assigned Keys
5
• System-assigned (surrogate) primary key may not
prevent confusing entries
• Example:
– JOB_CODE attribute was designed to be a PK of JOB
table
– This does not prevent duplicating existing records with
different PK values
– Data entries in Table 5.2 are inappropriate because they
duplicate existing records
– Yet there has been no violation of either entity integrity or
referential integrity
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
49
The Boyce-Codd Normal Form (BCNF)
5
• Google search
• About.com
– A relation is in Boyce-Codd Normal Form (BCNF) if every
determinant is a candidate key.
• Candidate key
– Has same characteristics as primary key, but for some reason,
not chosen to be primary key
• If a table contains only one candidate key, the 3NF and the
BCNF are equivalent
• BCNF can be violated only if the table contains more than one
candidate key
• Most designers consider the BCNF as a special case of 3NF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
50
The Boyce-Codd Normal Form (BCNF)
(continued)
5
• A table is in 3NF if it is in 2NF and there are no
transitive dependencies
• A table can be in 3NF and not be in BCNF
– A transitive dependency exists when one nonprime
attribute is dependent on another nonprime attribute
– A nonkey attribute is the determinant of a key attribute
• How to convert a 3NF into BCNF?
– Decompose the tables according to the functional
dependencies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
51
A Table That is in 3NF but not in BCNF
5
• Here:
– No partial dependencies
– No transitive dependencies
– But a nonkey attribute determines a key attribute
• A + B  C, D
• C
B
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
52
Decomposition to BCNF
5
• To convert the table into BCNF
– First change the PK into A + C; partial dependency is
generated
– Next Decompose the tables to remove partial
dependency
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
53
Sample Data for a BCNF Conversion
5
• Each CLASS_CODE identifies a class uniquely
• A student can take many classes
• A staff member can teach many classes
– STU_ID + STAFF_ID  CLASS_CODE, ENROLL_GRADE
– CLASS_CODE  STAFF_ID
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
54
Sample Data for a BCNF Conversion
5
• STU_ID + STAFF_ID  CLASS_CODE, ENROLL_GRADE
• CLASS_CODE  STAFF_ID
• Already in 3NF but has Some anomalies
– Update anomalies
• If a different staff member is assigned to teach class 32456, 2 rows will
require updating
– Deletion anomalies
• If student 135 drops class 28458, we loose information about who taught
that class
• Solution
– Decompose table structure
– Resulting tables are in 3NF & BCNF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
55
Another BCNF Decomposition
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
56
Normalization and Database Design
5
• Normalization should be part of design process
• Make sure that proposed entities meet required
normal form before table structures are created
• Many real-world databases have been improperly
designed or burdened with anomalies if improperly
modified during course of time
• You may be asked to redesign and modify existing
databases
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
57
Normalization and Database Design
(continued)
5
• ER diagram
– Provides the big picture, or macro view, of an
organization’s data requirements and operations
– Created through an iterative process
• Identifying relevant entities, their attributes and their
relationship
• Use results to identify additional entities and attributes
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
58
Normalization and Database Design
5
(continued)
• Normalization procedures
– Focus on the characteristics of specific entities
– A micro view of the entities within the ER diagram
– May yield additional entities/attributes to be added to the
ERD
• Difficult to separate normalization process from ER
modeling process
• Two techniques should be used concurrently
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
59
Example: Contracting Company
5
• Operations/Business Rules:
–
–
–
–
Company manages many projects
Each project requires services of many employees
An employee may be assigned many projects
Some employees might not have any assigned
projects
– Every employee has a job classification that
determines the hourly billing rate
– Many employees can have the same job
classification
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
60
Example: Contracting Company
5
• The Initial ERD
– PROJECT is in 3NF
– EMPLOYEE: Contains transitive dependency
• JOB_DESCRIPTION defines the job’s charge per hour
• Classifications determine billing rate
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
61
Example: Contracting Company
5
• The Modified ERD; removing transitive dependency
• Question:
– Which type of relationship exist between EMPLOYEE & PROJECT?
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
62
The Incorrect Representation
of a M:N Relationship
5
• M:N between EMPLOYEE & PROJECT cannot be
implemented
– Need further table decomposition
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
63
Final (Implementable) ERD for a
Contracting Company
5
• After table decomposition
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
64
The Implemented Database for the
Contracting Company
5
• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
65
Higher-Level Normal Forms
5
• In some databases, multiple multivalued attributes
exist
• Fourth Normal Form (4NF)
– Table is in 3NF
– No multiple sets of multivalued dependencies
• Fifth Normal Form (5NF, Domain Key NF)
– Not likely in business environment
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
66
Tables with Multivalued Dependencies
5
• The tables contain two sets
of independent multi-valued
dependencies
– ORG_CODE &
ASSIGN_NUM may have
different values for the same
employee
– One employee can have
many service entries and
many assignment to different
organizations
– There is no viable candidate
key
• EMP_NUM values are
not unique, nor any other
combination, since some
of them contain nulls
• Version 3 is in 3NF, but
has many redundancies
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
67
Fourth Normal Form
5
• Table is in fourth normal form (4NF) if
– It is in 3NF or BCNF
– Has no multiple sets of multivalued dependencies
• 4NF is largely academic if tables conform to the
following two rules:
– All attributes are dependent on primary key but
independent of each other
– No row contains two or more multivalued facts about
an entity
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
68
A Set of Tables in 4NF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
69
Denormalization
5
• Creation of normalized relations is important
database design goal
• Processing requirements should also be a goal
• If tables decomposed to conform to normalization
requirements
– Number of database tables expands
• Joining larger number of tables takes additional disk
input/output (I/O) operations and processing logic
– Reduces system speed
• Conflicts among design efficiency, information
requirements, and processing speed are often
resolved through compromises that may include
denormalization
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
70
Denormalization (continued)
5
• Unnormalized tables in a production database
tend to have these defects:
– Data updates are less efficient because
programs that read and update tables must
deal with larger tables
– Indexing is much more cumbersome
– Unnormalized tables yield no simple strategies
for creating virtual tables known as views
• Use denormalization cautiously
• Understand why—under some
circumstances—unnormalized tables are a
better choice
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
71
Summary
5
• Normalization is a table design technique aimed
at minimizing data redundancies
• First three normal forms (1NF, 2NF, and 3NF)
are most commonly encountered
• Normalization is an important part—but only a
part—of the design process
• Continue the iterative ER process until all
entities and their attributes are defined and all
equivalent tables are in 3NF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
72
Summary (continued)
5
• A table in 3NF may contain multivalued
dependencies that produce either numerous
null values or redundant data
• It may be necessary to convert a 3NF table to
the fourth normal form (4NF) by
– splitting such a table to remove multivalued
dependencies
• Tables are sometimes denormalized to yield
less I/O which increases processing speed
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
73