46-942 - Andrew.cmu.edu

Download Report

Transcript 46-942 - Andrew.cmu.edu

Information Resources
Management
March 13, 2001
Agenda
Administrivia
 Normalization
 Homework #7
 Mid-Term #2

Administrivia
Homework #4
 Homework #5
 Homework #6
 Quiz 2
 Mid-Term #1 Keys
 Mid-Term Grades

Regrade Requests HW 5 & 6
Create Database
 Enter query(s) as submitted
 Submit to me
 Database (electronic)
 Graded homework (paper)
 Reserve the right to change test data
and reexecute query

Normalization
Why & What
 1st Normal Form
 2nd Normal Form
 3rd Normal Form
 Boyce-Codd Normal Form
 4th Normal Form

Normalization - Why
Eliminate anomalies
 Avoid duplication
 Increase flexibility and stability
 Reduce maintenance

Normalization - What?!?
Analysis of functional dependencies
between attributes
 Building several smaller tables from
larger ones
 Decomposing relations with anomalies
to produce smaller, well-structured
relations
 Reducing complexity & increasing
stability

Normalization - What (2)

Series of Steps
 Recipe for constructing a “good”
physical model of a database from a
logical model
 Applied
to all existing tables, including
ones produced by earlier
normalization steps
Example
Sales
(Order#, Date, CustID, Name, Address,
City, State, Zip, {Product#,
ProductDesc, Price, QuantityOrdered},
Subtotal, Tax, S&H, Total)

What are the problems with using a
single table for all order information?
Problems
Implementing Repeating Groups
 Duplication of Data (customer name &
address)
 Unnecessary Data (subtotal, total, tax)
 Others

Normalization is a process to eliminate
these problems.
1st Normal Form
Eliminate Repeating Groups
 1st Normal Form has no repeating
groups


Create definition with all other attributes,
remove the repeat {}, and change the
primary key to include the “key” for the
repeating group.
Example
Sales
(Order#, Date, CustID, Name, Address,
City, State, Zip, Product#, ProductDesc,
Price, QuantityOrdered, Subtotal, Tax,
S&H, Total)

Why is this better?
1st NF Improvements
Implementation is possible
 Querying is possible

2nd Normal Form
Remove all partial functional
dependencies
 2nd Normal Form has no partial
functional dependencies and is in 1st
Normal Form


Partial dependencies get their own
tables -- original table gets a foreign key
Partial Functional
Dependencies

An attribute is only dependent on part of
the primary key
 must be composite key
 single attribute key is 2nd NF

Functional dependencies can be specified
explicitly but usually come from the E-R
model, user specifications, and common
sense
key  non-key attributes
Example - Functional
Dependencies
Order#  Date, CustID, Name, Address, City, State,
Zip, Subtotal, Tax, S&H, Total
Order#, Product#  ProductDesc, Price,
QuantityOrdered
CustID  Name, Address, City, State, Zip
Product#  ProductDesc, Price
Which are partial functional
dependencies?
Example
Sales (Order#, Date, CustID, Name,
Address, City, State, Zip, Subtotal, Tax,
S&H, Total)
OrderLine (Order#, Product#,
ProductDesc, Price, QuantityOrdered)

Is this 2nd NF?
Example
Sales (Order#, Date, CustID, Name, Address,
City, State, Zip, Subtotal, Tax, S&H, Total)
OrderLine (Order#, Product#, QuantityOrdered)
Product (Product#, ProductDesc, Price)

Is this 2nd NF? Why is this better than 1st
NF?
2nd NF Improvements
Elimination of Duplicate Data
 No Loss

3rd Normal Form
Eliminate transitive functional
dependencies
 3rd Normal Form has no transitive
depencencies and is in 2nd Normal
Form


Transitive dependencies get their own
tables -- original table gets a foreign key
Transitive Functional
Dependencies
Attribute is dependent on another, nonkey attribute or attributes
 Attribute is the result of a calculation

CustID  Name, Address, City, State, Zip
Example
Sales (Order#, Date, CustID, Subtotal, Tax, S&H, Total)
OrderLine (Order#, Product#, QuantityOrdered)
Product (Product#, ProductDesc, Price)
Customer (CustID, Name, Address, City, State, Zip)

Is this 3rd NF? Why is this better than 2nd
NF?
Example
Sales (Order#, Date, CustID)
OrderLine (Order#, Product#, QuantityOrdered)
Product (Product#, ProductDesc, Price)
Customer (CustID, Name, Address, City, State, Zip)

Is this 3rd NF? Why is this better than 2nd
NF?
3rd NF Improvements
Elimination of Duplicate Data
 No Loss
 Data is Well-grouped

Beyond 3rd Normal Form

Assume we also want to track
information about products, builders,
and finishes

The following are the functional
dependencies:
 Product, Finish  Builder
 Builder  Finish
Beyond 3rd Normal Form
ProdFinish (Product#, {Finish, Builder})
becomes
ProdFinish (Product#, Finish, Builder)
Is this 3rd NF?
What’s wrong with 3rd NF?
Product
Finish
Builder
1
Cherry
Stan
1
Oak
Sue
2
Oak
Vera
3
Pine
Marv
3
Cherry
Stan
5
Oak
Vera
Product, Finish  Builder
Builder  Finish
What’s Wrong with 3rd NF?
Product
1
1
2
3
3
5
Finish
Cherry
Oak
Oak
Pine
Cherry
Oak
What happens when:
1. Vera is replaced by Vern?
Builder
Stan
Sue
Vera
Marv
Stan
Vera
2. Vera is rehired to work with Oak?
3. Product #3 in pine is discontinued?
What’s Wrong with 3rd NF?

Problems
1. Multiple changes need to be made
2. Can’t assign a builder without a
product
3. Lose information that Marv works in
Pine
Problem & Solution
Problem:
 Builder  Finish
 Builder is not a key
Solution:
 Boyce-Codd Normal Form
Boyce-Codd Normal Form
(BCNF)

Every determinant in a relation (LHS of
the FD’s) is a candidate key and 3rd NF

Make determinant part of the key and
that which is dependent on it an
attribute and renormalize
Example
ProductFinish (Product#, Builder, Finish)
Is this BCNF?
Hint: Is it 3rd NF?
Example
ProductFinish (Product#, Builder)
Builder (Builder, Finish)
Is there anything wrong with this?
Example
ProductBuilder (Product#, Builder)
Builder (Builder, Finish)
Normalization often results in the need
to rename tables so the table name
matches the actual contents.
Beyond BCNF

Normalization with separate repeating
groups can result in other anomalies
CustService (State, {SalesPerson},
{Delivery})
Beyond BCNF
CustService (State, SalesPerson, Delivery)
State
Sales
Delivery
State
Sales
Delivery
PA
George
UPS
NJ
Mike
UPS
PA
George
RPS
NJ
Mike
Truck
PA
Sue
UPS
NJ
Valerie
UPS
PA
Sue
RPS
NJ
Valerie
Truck
Is this BCNF?
Beyond BCNF
Everything is in the key -- must be
BCNF
 Still problems with duplication


Multivalued Dependencies
Multivalued Dependency
At least three attributes (A, B, C)
 A  B and A  C
 B and C are independent of each other
(they really shouldn’t be in the same
table)

4th Normal Form

No multivalued dependencies and
BCNF

Create separate tables for each
separate functional dependency
Example
SalesForce (State, SalesPerson)
Delivery (State, Delivery)
State
Sales
State
Delivery
PA
George
PA
UPS
PA
Sue
PA
RPS
NJ
Mike
NJ
UPS
NJ
Valerie
NJ
Truck
Beyond 4th Normal Form

5th Normal Form
 Project-Join Normal Form

Domain Key Normal Form (DKNF)
User View
Remove partial
functional dependencies
1st NF
Remove
repeating
groups
2nd NF
Remove remaining
functional dependency
anomalies
3rd NF
Remove transitive
functional dependencies
BCNF
Remove multivalued
dependencies
4th NF
In-Class Exercises

Identify the current normal form

If not 4th NF, transform to 4th NF
Homework #7
Normalization
 Database schema from HW #3
 Earlier due date - post key?

Mid-Term #2
Next week, 3/20
 Topics
 Converting an E-R Diagram to a
physical database schema
 Normalizing that schema (3NF)
 SQL
 Identification of BCNF, 4NF problems
