- Courses - University of California, Berkeley

Download Report

Transcript - Courses - University of California, Berkeley

Database Design: From
Conceptual Design to Physical
Implementation
University of California, Berkeley
School of Information Management and
Systems
SIMS 257 – Database Management
9/21/2000
SIMS 257 – Database Management
Review
• Database Design Process
• Normalization
9/21/2000
SIMS 257 – Database Management
Database Design Process
Application 1
External
Model
Application 2
Application 3
Application 4
External
Model
External
Model
External
Model
Application 1
Conceptual
requirements
Application 2
Conceptual
requirements
Application 3
Conceptual
requirements
Conceptual
Model
Logical
Model
Application 4
Conceptual
requirements
9/21/2000
SIMS 257 – Database Management
Internal
Model
Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets of
relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base: Structured Techniques for
Design, Performance, and Management.
9/21/2000
SIMS 257 – Database Management
Normal Forms
•
•
•
•
•
•
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
9/21/2000
SIMS 257 – Database Management
Normalization
No transitive
dependency
between
nonkey
attributes
All
determinants
are candidate
keys - Single
multivalued
dependency
9/21/2000
BoyceCodd and
Higher
SIMS 257 – Database Management
Functional
dependencyof
nonkey
attributes on
the primary
key - Atomic
values only
Full
Functional
dependencyof
nonkey
attributes on
the primary
key
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column
9/21/2000
SIMS 257 – Database Management
Unnormalized Relation
Patient #
Surgeon #
145
1111 311
Surg. date
Patient Name
Jan 1,
1995; June
12, 1995
John White
Patient Addr Surgeon
15 New St.
New York,
NY
243
1234 467
2345 189
Jan 8,
1996
Charles Brown
4876 145
Nov 5,
1995
Hal Kane
5123 145
May 10,
1995
Paul Kosher
Charles
Field
10 Main St. Patricia
Rye, NY
Gold
Dogwood
Lane
Harrison,
David
NY
Rosen
55 Boston
Post Road,
Chester,
CN
Beth Little
Blind Brook
Mamaronec
k, NY
Beth Little
6845 243
Apr 5,
1994 Dec
15, 1984
Ann Hood
Hilton Road
Larchmont, Charles
NY
Field
9/21/2000
Postop drug
Drug side effects
Gallstone
s removal;
Beth Little Kidney
Michael
stones
Penicillin,
Diamond removal
none-
Apr 5,
1994 May
10, 1995
Mary Jones
Surgery
SIMS 257 – Database Management
rash
none
Eye
Cataract
removal
Thrombos Tetracyclin Fever
is removal e none
none
Open
Heart
Surgery
Cholecyst
ectomy
Gallstone
s
Removal
Eye
Cornea
Replacem
ent Eye
cataract
removal
Cephalosp
orin
none
Demicillin
none
none
none
Tetracyclin
e
Fever
First Normal Form
Patient #
Surgeon # Surgery DatePatient Name Patient Addr Surgeon Name
1111
145
01-Jan-95 John White
1111
311
12-Jun-95 John White
15 New St.
New York,
NY
15 New St.
New York,
NY
1234
243
05-Apr-94 Mary Jones
10 Main St.
Rye, NY
1234
467
10-May-95 Mary Jones
2345
4876
5123
6845
6845
9/21/2000
189
145
145
243
243
Charles
08-Jan-96 Brown
10 Main St.
Rye, NY
Dogwood
Lane
Harrison,
NY
05-Nov-95 Hal Kane
55 Boston
Post Road,
Chester,
CN
05-Apr-94 Ann Hood
15-Dec-84 Ann Hood
Hilton Road
Larchmont,
NY
Drug adminSide Effects
Charles Field
Gallstone
s removal
Kidney
stones
removal
Eye
Cataract
removal
Patricia Gold
Thrombos
is removal none
none
David Rosen
Open
Heart
Surgery
none
Beth Little
Cholecyst
ectomy
Demicillin
Beth Little
Michael
Diamond
Blind Brook
Mamaronec
10-May-95 Paul Kosher k, NY
Beth Little
Hilton Road
Larchmont,
NY
Surgery
Penicillin
rash
none
none
Tetracyclin
e
Fever
Cephalosp
orin
Charles Field
Gallstone
s
Removal
none
Eye
Cornea
Replacem Tetracyclin
ent
e
Charles Field
Eye
cataract
removal
SIMS 257 – Database Management
none
none
none
Fever
none
Second Normal Form
Patient #
1111
1234
2345
4876
5123
6845
9/21/2000
Patient Name Patient Address
15 New St. New
John White York, NY
10 Main St. Rye,
Mary Jones NY
Charles
Dogwood Lane
Brown
Harrison, NY
55 Boston Post
Hal Kane
Road, Chester,
Blind Brook
Paul Kosher Mamaroneck, NY
Hilton Road
Ann Hood
Larchmont, NY
SIMS 257 – Database Management
Second Normal Form
Surgeon #
Surgeon Name
145 Beth Little
189 David Rosen
243 Charles Field
311 Michael Diamond
467 Patricia Gold
9/21/2000
SIMS 257 – Database Management
Second Normal Form
Patient # Surgeon # Surgery Date
1111
1111
1234
1234
2345
4876
9/21/2000
Surgery
Drug Admin Side Effects
145
Gallstones
01-Jan-95 removal
Kidney
Penicillin
rash
311
stones
12-Jun-95 removal
none
none
243
Eye Cataract
05-Apr-94 removal
Tetracycline Fever
467
Thrombosis
10-May-95 removal
189
Open Heart
08-Jan-96 Surgery
Cephalospori
n
none
145
Cholecystect
05-Nov-95 omy
Demicillin
none
none
none
none
none
5123
145
6845
243
6845
243
Gallstones
10-May-95 Removal
Eye cataract
15-Dec-84 removal
Eye Cornea
05-Apr-94 Replacement
SIMS 257 – Database Management
none
none
Tetracycline Fever
Third Normal Form
Patient # Surgeon # Surgery Date
9/21/2000
Surgery
Drug Admin
1111
145
1111
311
01-Jan-95 Gallstones removal
Kidney stones
12-Jun-95 removal
1234
243
05-Apr-94 Eye Cataract removal Tetracycline
1234
467
10-May-95 Thrombosis removal
2345
189
08-Jan-96 Open Heart Surgery
Cephalosporin
4876
145
05-Nov-95 Cholecystectomy
Demicillin
5123
145
10-May-95 Gallstones Removal
none
6845
243
none
6845
243
15-Dec-84 Eye cataract removal
Eye Cornea
05-Apr-94 Replacement
SIMS 257 – Database Management
Penicillin
none
none
Tetracycline
Third Normal Form
Drug Admin
9/21/2000
Side Effects
Cephalosporin
none
Demicillin
none
none
none
Penicillin
rash
Tetracycline
Fever
SIMS 257 – Database Management
Most 3NF Relations are also
BCNF
Patient #
1111
1234
2345
4876
5123
6845
9/21/2000
Patient Name Patient Address
15 New St. New
John White York, NY
10 Main St. Rye,
Mary Jones NY
Charles
Dogwood Lane
Brown
Harrison, NY
55 Boston Post
Hal Kane
Road, Chester,
Blind Brook
Paul Kosher Mamaroneck, NY
Hilton Road
Ann Hood
Larchmont, NY
SIMS 257 – Database Management
Fourth Normal Form
• Any relation is in Fourth Normal Form if it
is BCNF and any multivalued dependencies
are trivial
• Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables
9/21/2000
SIMS 257 – Database Management
Fifth Normal Form
• A relation is in 5NF if every join
dependency in the relation is implied by the
keys of the relation
• Implies that relations that have been
decomposed in previous NF can be
recombined via natural joins to recreate the
original relation.
9/21/2000
SIMS 257 – Database Management
Normalization
• Normalization is performed to reduce or
eliminate Insertion, Deletion or Update
anomalies.
• However, a completely normalized database
may not be the most efficient or effective
implementation.
• “Denormalization” is sometimes used to
improve efficiency.
9/21/2000
SIMS 257 – Database Management
Denormalization
• Usually driven by the need to improve
query speed
• Query speed is improved at the expense of
more complex or problematic DML (Data
manipulation language) for updates,
deletions and insertions.
9/21/2000
SIMS 257 – Database Management
Downward Denormalization
Before:
Customer
ID
Address
Name
Telephone
After:
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
9/21/2000
Customer
ID
Address
Name
Telephone
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
SIMS 257 – Database Management
Upward Denormalization
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Price
Order Item
Order No
Item No
Item Price
Num Ordered
9/21/2000
SIMS 257 – Database Management
Order Item
Order No
Item No
Item Price
Num Ordered
Today: New Design
• Today we will build the COOKIE database
from needs (rough) through the conceptual
model, logical model and finally physical
implementation in Access.
9/21/2000
SIMS 257 – Database Management
ER Diagram Symbols
Primary
key
Attribute
Entity
Relationship
9/21/2000
Ovals are used to indicate the attributes associated with an
entity or relationship (That is, the pieces of information
recorded in the database about the entity or relationship)
An underlined name indicates that the attribute is a
primary key (That is, it can uniquely identify the entity)
Rectangles are used to indicate entities (That is, the
representatives or records describing persons, things, or
events in the database)
Diamonds are used to indicate relationships between
entities. (That is, some association between the data
records of different entities)
SIMS 257 – Database Management
Cookie Requirements
• Cookie is a bibliographic database that contains
information about a hypothetical union catalog of
several libraries.
• Need to record which books are held by which
libraries
• Need to search on bibliographic information
– Author, title, subject, call number for a given library,
etc.
• Need to know who publishes the books for
ordering, etc.
9/21/2000
SIMS 257 – Database Management
Cookie Database
• There are currently 5 main types of entities
in the database
–
–
–
–
–
–
9/21/2000
Books (bibfile)
Local Call numbers (callfile)
Libraries (libfile)
Publishers (pubfile)
Subject headings (subfile)
Links between subject and books (indxfile)
SIMS 257 – Database Management
BIBFILE
• Books (BIBFILE) contains information
about particular books. It includes one
record for each book. The attributes are:
–
–
–
–
–
–
–
–
–
9/21/2000
accno -- an “accession” or serial number
author -- The author’s name (not realistic -- one author per book)
title -- The title of the book
loc -- Location of publication (where published)
date -- Date of publication
price -- Price of the book
pagination -- Number of pages
ill -- What type of illustrations (maps, etc) if any
height -- Height of the book in centimeters
SIMS 257 – Database Management
Books/BIBFILE
Author
Title
accno
Loc
Books
Price
Date
Pagination
Ill
Height
9/21/2000
SIMS 257 – Database Management
CALLFILE
• CALLFILE contains call numbers and
holdings information linking particular
books with particular libraries. Its attributes
are:
– accno -- the book accession number
– libid -- the id of the holding library
– callno -- the call number of the book in the particular
library
– copies -- the number of copies held by the particular
library
9/21/2000
SIMS 257 – Database Management
LocalInfo/CALLFILE
libid
Callno
accno
Copies
CALLFILE
9/21/2000
SIMS 257 – Database Management
LIBFILE
• LIBFILE contain information about the
libraries participating in this union catalog.
Its attributes include:
–
–
–
–
–
–
–
–
9/21/2000
libid -- Library id number
library -- Name of the library
laddress -- Street address for the library
lcity -- City name
lstate -- State code (postal abbreviation)
lzip -- zip code
lphone -- Phone number
mop - suncl -- Library opening and closing times for each day of
the week.
SIMS 257 – Database Management
Libraries/LIBFILE
lstate
laddress
lcity
lzip
Library
lphone
Libid
SatCl
LIBFILE
SunOp
SatOp
Suncl
FCl
MOp
FOp
ThCl
Mcl
TuOp
9/21/2000
TuCl
WOp
SIMS 257 – Database Management
WCl
ThOp
PUBFILE
• PUBFILE contain information about the
publishers of books. Its attributes include
–
–
–
–
–
–
–
–
9/21/2000
pubid -- The publisher’s id number
publisher -- Publisher name
paddress -- Publisher street address
pcity -- Publisher city
pstate -- Publisher state
pzip -- Publisher zip code
pphone -- Publisher phone number
ship -- standard shipping time in days
SIMS 257 – Database Management
Publisher/PUBFILE
paddress
Publisher
pcity
PUBFILE
pubid
pzip
Ship
pphone
9/21/2000
pstate
SIMS 257 – Database Management
SUBFILE
• SUBFILE contains each unique subject
heading that can be assigned to books. Its
attributes are
– subcode -- Subject identification number
– subject -- the subject heading/description
9/21/2000
SIMS 257 – Database Management
Subjects/SUBFILE
subid
Subject
SUBFILE
9/21/2000
SIMS 257 – Database Management
INDXFILE
• INDXFILE provides a way to allow manyto-many mapping of subject headings to
books. Its attributes consist entirely of links
to other tables
– subcode -- link to subject id
– accno -- link to book accession number
9/21/2000
SIMS 257 – Database Management
Linking Subjects and Books
subid
accno
INDXFILE
9/21/2000
SIMS 257 – Database Management
Some examples of Cookie
Searches
• Who wrote Microcosmographia Academica?
• How many pages long is Alfred Whitehead’s The Aims of Education
and Other Essays?
• Which branches in Berkeley’s public library system are open on
Sunday?
• What is the call number of Moffitt Library’s copy of Abraham
Flexner’s book Universities: American, English, German?
• What books on the subject of higher education are among the holdings
of Berkeley (both UC and City) libraries?
• Print a list of the Mechanics Library holdings, in descending order by
height.
• What would it cost to replace every copy of each book that contains
illustrations (including graphs, maps, portraits, etc.)?
• Which library closes earliest on Friday night?
9/21/2000
SIMS 257 – Database Management
pubid
Cookie ER diagram
accno
BIBFILE
Has call
CALLFILE
accno
Has copy
LIBFILE
libid
libid
publishes
Has index
INDXFILE
accno
9/21/2000
PUBFILE
pubid
Has subject
subcode
SIMS 257 – Database Management
SUBFILE
subcode
Note: diagram
contains only
attributes used
for linking
What Problems?
• What sorts of problems and missing
features arise given the previous ER
diagram?
9/21/2000
SIMS 257 – Database Management
Problems Identified
• Field sizes inappropriate
• Author doesn’t allow
multiple authors (editors,
etc).
• Subtitles, parallel titles
• Edition information
• Series information
• lending status
• material type designation
• Genre, class information
• Better codes (ISBN?)
9/21/2000
• Missing information
(ISBN)
• Authority control for
authors
• Missing/incomplete data
• Data entry problems
• Ordering information
• Illustrations
• Subfield separation (such
as last_name, first_name)
• Separate personal and
corporate authors
SIMS 257 – Database Management
Problems (Cont.)
• Location field
inconsistent
• No notes field
• No language field
• Zipcode doesn’t support
plus-4
• No publisher shipping
addresses
9/21/2000
• No (indexable) keyword
search capability
• No support for
multivolume works
• No support for URLs
– to online version
– to libraries
– to publishers
SIMS 257 – Database Management
pubid
Original Cookie ER diagram
accno
BIBFILE
CALLFILE
Has call
accno
Libid
Callno
publishes
Has index
accno
9/21/2000
PUBFILE
INDXFILE
LIBFILE
Has copy
libid
pubid
Has subject
subid
SIMS 257 – Database Management
Address, etc
SUBFILE
subid
Library
subject
Cookie2: Separate Name
Authorities
pubid
accno
BIBFILE
CALLFILE
accno
accno
AUTHBIB
LIBFILE
libid
libid
nameid
PUBFILE
authtype
INDXFILE
pubid
SUBFILE
AUTHFILE
accno
nameid
9/21/2000
name
subcode
SIMS 257 – Database Management
subcode
Cookie3: Keywords
termid
accno termid
pubid
accno
BIBFILE
TERMS
CALLFILE
LIBFILE
accno
accno
AUTHBIB
KEYMAP
libid
libid
nameid
PUBFILE
authtype
INDXFILE
pubid
SUBFILE
AUTHFILE
accno
nameid
9/21/2000
name
subcode
SIMS 257 – Database Management
subcode
ser_title
SERIES
seriesid
Cookie 4: Series
termid
accno termid
seriesid
pubid
accno
BIBFILE
TERMS
CALLFILE
LIBFILE
accno
accno
AUTHBIB
KEYMAP
libid
libid
nameid
PUBFILE
authtype
INDXFILE
pubid
SUBFILE
AUTHFILE
accno
nameid
9/21/2000
name
subcode
SIMS 257 – Database Management
subcode
ser_title
SERIES
Cookie 5: Circulation
seriesid
termid
accno termid
seriesid
pubid
accno
BIBFILE
TERMS
CALLFILE
LIBFILE
accno
accno
AUTHBIB
KEYMAP
libid
circid
libid
nameid
PUBFILE
authtype
pubid
PATRON
INDXFILE
SUBFILE
CIRC
AUTHFILE
accno
nameid
9/21/2000
name
subcode
subcode
SIMS 257 – Database Management
copynumpatronid circid
Mapping to Relations
• Take each entity
–
–
–
–
–
–
BIBFILE
LIBFILE
CALLFILE
SUBFILE
PUBFILE
INDXFILE
• And make it a table...
9/21/2000
SIMS 257 – Database Management
Implementing the Physical
Database...
• For each of the entities, we will build a
table…
• Start up access…
• Use “New” in Tables…
• Loading data
• Entering data
• Data entry forms
9/21/2000
SIMS 257 – Database Management