Information Organization and Retrieval - Courses


Database Design: Object-Oriented Modeling, Logical Design and Normalization
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
9/20/2000
Review
• New Personal Database assignment
• Database Design Process
• Basics of ER Diagrams
DiveShop ER Diagram
[Figure: DiveShop ER diagram with the entities DiveCust, DiveOrds, DiveItem, DiveStok, ShipVia, Dest, Sites, BioSite, BioLife, and ShipWrck, connected through the key attributes Customer No, Order No, Item No, ShipVia, Destination No, Site No, and Species No, with 1, n, and 1/n cardinalities]
Today
• Object Oriented Modeling and UML
• Logical Database Design
• Normalization
• (Most UML examples are based on McFadden, “Modern Database Management”, 5th edition.)
Object-Oriented Modeling
• Becoming increasingly important as
– Object-Oriented and Object-Relational DBMS
continue to proliferate
– Databases become more complex and have
more complex relationships than are easily
captured in ER or EER diagrams
Object Benefits
• Encapsulate both data and behavior
• Object-oriented modeling methods can be used for both database design and process design
– Real-world applications involve more than just the data in the database; they also involve the processes, calculations, etc., performed on that data to get real tasks done
– OOM can be used for more challenging and complex problems
Unified Modeling Language (UML)
• Combined three competing methods (Booch, Rumbaugh's OMT, and Jacobson's OOSE)
• Can be used for graphically depicting
– Software designs and interaction
– Databases
– Processes
CLASS
• A class is a named description of a set of objects
that share the same attributes, operations,
relationships, and semantics.
– An object is an instance of a class that encapsulates
state and behavior.
• These objects can represent real-world things or conceptual
things.
– An attribute is a named property of a class that
describes a range of values that instances of that class
might hold.
– An operation is a named specification of a service that
can be requested from any of a class's objects to affect
behavior in some way or to return a value without
affecting behavior
UML Relationships
• A relationship is a connection between or among model elements.
• The UML defines four basic kinds of relationships:
– Association
– Dependency
– Generalization
– Realization
UML Diagrams
• The UML defines nine types of diagrams:
– activity diagram
– class diagram
• Describes the data and some behavior (operations) of a system
– collaboration diagram
– component diagram
– deployment diagram
– object diagram
– sequence diagram
– statechart diagram
– use case diagram
Class Diagrams
• A class diagram is a diagram that shows a
set of classes, interfaces, and/or
collaborations and the relationships among
these elements.
UML Class Diagram
Class name: DIVEORDS
List of attributes:
Order No
Customer No
Sale Date
Shipvia
PaymentMethod
CCNumber
No of People
Depart Date
Return Date
Destination
Vacation Cost
List of operations:
CalcTotalInvoice()
CalcEquipment()
Object Diagrams
307:DIVEORDS
Order No = 307
Customer No = 1480
Sale Date = 9/1/99
Ship Via = UPS
PaymentMethod = Visa
CCNumber = 12345 678 90
CCExpDate = 1/1/01
No of People = 2
Depart Date = 11/8/00
Return Date = 11/15/00
Destination = Fiji
Vacation Cost = 10000
Differences from Entities in ER
• Entities can be represented by Class
diagrams
• But Classes of objects also have additional
operations associated with them
Operations
• Three basic types for database
– Constructor
– Query
– Update
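The three operation types can be sketched in Python using a few DIVEORDS attributes and the values of object 307 from the object diagram. This is only a minimal sketch. The class name `DiveOrder` and the methods `trip_nights()` and `change_destination()` are hypothetical illustrations, not operations taken from the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DiveOrder:
    """Hypothetical Python rendering of the DIVEORDS class (subset of attributes)."""
    order_no: int
    customer_no: int
    destination: str
    depart_date: date
    return_date: date
    vacation_cost: int

    # Query operation: returns a value without changing the object's state.
    def trip_nights(self) -> int:
        return (self.return_date - self.depart_date).days

    # Update operation: changes the object's state.
    def change_destination(self, new_destination: str, new_cost: int) -> None:
        self.destination = new_destination
        self.vacation_cost = new_cost


# Constructor operation: the generated __init__ creates a new object (order 307).
order = DiveOrder(307, 1480, "Fiji", date(2000, 11, 8), date(2000, 11, 15), 10000)
print(order.trip_nights())  # 7
```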
Associations
• An association is a relationship that describes a set
of links between or among objects.
• An association can have a name that describes the
nature of this relationship. You can put a triangle
next to this name to indicate the direction in which
the name should be read.
• An association contains an ordered list of association ends.
– An association with exactly two association ends is called a binary association.
– An association with more than two ends is called an n-ary association.
Associations: Unary relationships
• Person Is-married-to Person (0..1 at each end)
• Employee manages Employee (0..1 manager to * employees)
Associations: Binary Relationship
• One-to-one: Employee (0..1) Is-assigned (0..1) Parking Place
• One-to-many: Product Line (1) contains (*) Product
• Many-to-many: Student (*) Registers-for (*) Course
Associations: Ternary Relationships
• Supplies: a single association linking Vendor (*), Part (*), and Warehouse (*)
Association Classes
• Student (*) Registers-for (*) Course, with the association class Registration attached to Registers-for
Registration
________________
Term
Grade
________________
CheckEligibility()
Computer Account
________________
acctID
Password
ServerSpace
• Registration issues Computer Account (multiplicities * and 0..1)
Derived Attributes, Associations, and Roles
Student
____________
name
ssn
dateOfBirth
/age (derived attribute)
Course Offering
____________
term
section
time
location
Course
____________
crseCode
crseTitle
creditHrs
• Student (*) Registers-for (*) Course Offering
• Course Offering (*) Scheduled-for (1) Course
• Derived attribute: /age {age = currentDate – dateOfBirth}
• Derived association: /Takes, linking Student (*) and Course (*)
• Derived role: /participant
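Derived elements are computed on demand rather than stored, and the following minimal Python sketch illustrates this. The names `Student.age()` and `takes()` are assumptions made for illustration, as are the whole-years rounding of /age and the sample student and course codes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    name: str
    ssn: str
    date_of_birth: date

    def age(self, current_date: date) -> int:
        """Derived attribute /age: computed from dateOfBirth on demand, never stored."""
        years = current_date.year - self.date_of_birth.year
        # Not yet had this year's birthday: subtract one.
        if (current_date.month, current_date.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


def takes(registrations, scheduled_for):
    """Derived association /Takes (Student -> Course): follow Registers-for
    (Student -> Course Offerings), then Scheduled-for (Offering -> Course)."""
    return {student: {scheduled_for[offering] for offering in offerings}
            for student, offerings in registrations.items()}


student = Student("Pat Doe", "000-00-0000", date(1980, 9, 21))
print(student.age(date(2000, 9, 20)))  # 19
print(takes({"Pat Doe": ["IS202-F00-1"]}, {"IS202-F00-1": "IS202"}))  # {'Pat Doe': {'IS202'}}
```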
Generalization
Employee
____________
empName
empNumber
address
dateHired
____________
printLabel()
Subclasses of Employee:
Hourly Employee
_______________
HourlyRate
_______________
computeWages()
Salaried Employee
_______________
Annual Sal
stockoption
_______________
Contributepension()
Consultant
_______________
contractNumber
billingRate
_______________
computeFees()
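One way to express this generalization is with Python subclassing. Each subclass inherits the Employee attributes and `print_label()` and then adds its own attributes and operations. The slide names these operations but does not define them, so the wage, pension, and fee formulas below are assumptions made for illustration.

```python
class Employee:
    """Superclass: attributes and operations shared by every kind of employee."""

    def __init__(self, emp_name: str, emp_number: int, address: str, date_hired: str):
        self.emp_name = emp_name
        self.emp_number = emp_number
        self.address = address
        self.date_hired = date_hired

    def print_label(self) -> str:
        """Inherited by all subclasses; prints a mailing label and returns its text."""
        label = f"{self.emp_name}\n{self.address}"
        print(label)
        return label


class HourlyEmployee(Employee):
    def __init__(self, emp_name, emp_number, address, date_hired, hourly_rate: float):
        super().__init__(emp_name, emp_number, address, date_hired)
        self.hourly_rate = hourly_rate

    def compute_wages(self, hours: float) -> float:
        # Assumed rule: rate times hours worked, with no overtime.
        return self.hourly_rate * hours


class SalariedEmployee(Employee):
    def __init__(self, emp_name, emp_number, address, date_hired,
                 annual_salary: float, stock_option: bool):
        super().__init__(emp_name, emp_number, address, date_hired)
        self.annual_salary = annual_salary
        self.stock_option = stock_option

    def contribute_pension(self, rate: float) -> float:
        # Assumed rule: a fixed fraction of the annual salary.
        return self.annual_salary * rate


class Consultant(Employee):
    def __init__(self, emp_name, emp_number, address, date_hired,
                 contract_number: str, billing_rate: float):
        super().__init__(emp_name, emp_number, address, date_hired)
        self.contract_number = contract_number
        self.billing_rate = billing_rate

    def compute_fees(self, hours: float) -> float:
        # Assumed rule: billing rate times hours billed.
        return self.billing_rate * hours
```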
Other Diagramming methods
• SOM (Semantic Object Model)
• Object Definition Language (ODL)
– Not really diagramming
– See Text chapter 3
• Access relationships display
• Hybrids
Application of SOM to Diveshop
[Figure: semantic object diagrams for the DiveShop]
DIVECUST: Name, Address (Street, City, StateProvince, ZIPPostalCode, Country), Phone, FirstContact, DIVEORDS (attribute cardinalities shown as 1.1; the DIVEORDS link as 1.N)
DIVEORDS: OrderNo, SaleDate, DIVECUST, SHIPVIA, DESTINATION, DIVEITEM, PaymentMethod, CCNumber, CCExpDate, NoOfPeople, DepartDate, ReturnDate, VacationCost
Entities
• Customer
• Dive Order
• Line item
• Shipping information
• Dive Equipment Stock/Inventory
• Dive Locations
• Dive Sites
• Sea Life
• Shipwrecks
Logical Design: Mapping to a
Relational Model
• Each entity in the ER Diagram becomes a relation.
• A properly normalized ER diagram will indicate
where intersection relations for many-to-many
mappings are needed.
• Relationships are indicated by common columns
(or domains) in tables that are related.
• We will examine the tables for the Diveshop
derived from the ER diagram
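The following minimal sketch shows this mapping with Python's built-in `sqlite3` module. It assumes simplified DIVECUST, DIVEORDS, SITE, BIOLIFE, and BIOSITE tables with only a few columns each, and the snake_case column names are assumptions. The one-to-many link between customers and orders is a common column backed by a foreign key. The many-to-many link between species and sites is the intersection relation BIOSITE. The sample rows come from the DiveShop tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE divecust (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- One-to-many: the common column customer_no links each order to one customer.
CREATE TABLE diveords (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES divecust(customer_no)
);
CREATE TABLE site (
    site_no   INTEGER PRIMARY KEY,
    site_name TEXT
);
CREATE TABLE biolife (
    species_no  INTEGER PRIMARY KEY,
    common_name TEXT
);
-- Many-to-many: the intersection relation holds one row per (species, site) pair.
CREATE TABLE biosite (
    species_no INTEGER REFERENCES biolife(species_no),
    site_no    INTEGER REFERENCES site(site_no),
    PRIMARY KEY (species_no, site_no)
);
""")
conn.execute("INSERT INTO divecust VALUES (1480, 'Louis Jazdzewski')")
conn.execute("INSERT INTO diveords VALUES (307, 1480)")
conn.executemany("INSERT INTO site VALUES (?, ?)", [(2001, "Heron Island"), (2002, "Cod Hole")])
conn.execute("INSERT INTO biolife VALUES (90020, 'Clown Triggerfish')")
conn.executemany("INSERT INTO biosite VALUES (?, ?)", [(90020, 2001), (90020, 2002)])

# Sites where the clown triggerfish can be seen: join through the intersection relation.
rows = conn.execute("""
    SELECT s.site_name
    FROM biolife b
    JOIN biosite bs ON bs.species_no = b.species_no
    JOIN site s ON s.site_no = bs.site_no
    WHERE b.common_name = 'Clown Triggerfish'
    ORDER BY s.site_no
""").fetchall()
print(rows)  # [('Heron Island',), ('Cod Hole',)]
```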
Customer = DIVECUST
Customer No | Name | Street | City | State/Prov | Zip/Postal Code | Country | Phone | First Contact
1480 | Louis Jazdzewski | 2501 O'Connor | New Orleans | LA | 60332 | U.S.A. | (902) 555-8888 | 1/29/95
1481 | Barbara Wright | 6344 W. Freeway | San Francisco | CA | 95031 | U.S.A. | (415) 555-4321 | 2/2/93
1909 | Stephen Bredenburg | 559 N.E. 167 Place | Indianapolis | IN | 46241 | U.S.A. | (317) 555-3644 | 1/5/93
1913 | Phillip Davoust | 123 First Street | Berkeley | CA | 94704 | U.S.A. | (415) 555-9184 | 3/9/98
1969 | David Burgett | 320 Montgomery Street | Seattle | WA | 98105 | U.S.A. | (206) 555-7580 | 3/12/99
2001 | Mary Rioux | 1701 Gateway Blvd. #385 | Pueblo | CO | 81002 | U.S.A. | (719) 555-2010 | 3/15/97
2306 | Kim Lopez | 14134 Nottingham Lane | Honolulu | HI | 96826 | U.S.A. | (808) 555-5050 | 1/29/99
2589 | Hiram Marley | 7233 Mill Run Drive | San Francisco | CA | 94123 | U.S.A. | (415) 555-6430 | 2/18/99
3154 | Tanya Kulesa | 505 S. Flower, Mail Stop 48943 | New York | NY | 10032 | U.S.A. | (212) 555-6750 | 1/30/99
3333 | Charles Sekaron | 110 East Park Avenue, Box 8 | Miller | SD | 57362 | U.S.A. | (613) 555-4333 | 3/16/98
3684 | Lowell Lutz | 915 E. Fesler | Dallas | TX | 75043 | U.S.A. | (214) 555-2722 | 2/15/99
4158 | Keith Lucas | 56 South Euclid | Chicago | IL | 60542 | U.S.A. | (312) 555-4310 | 3/17/98
4175 | Karen Ng | 2134 Elmhill Pike | Klamath Falls | OR | 97603 | U.S.A. | (503) 555-4700 | 3/20/99
5510 | Ken Soule | 58 Sansome Street | Aurora | CO | 89022 | U.S.A. | (303) 555-6695 | 2/5/99
Dive Order = DIVEORDS
Order No | Customer No | Sale Date | Ship Via | PaymentMethod | CcNumber | CcExpDate | No Of People | Depart Date | Return Date | Destination | VacationCost
307 | 1480 | 9/1/99 | UPS | Visa | 12345 678 90 | 1/1/01 | 2 | 11/8/00 | 11/15/00 | Fiji | 10000
310 | 1481 | 9/1/99 | FedEx | Check | | | 1 | 4/4/00 | 4/18/00 | Santa Barbara | 6000
313 | 1909 | 9/1/99 | Walk In | Visa | 456456456 | 9/11/00 | 4 | 6/27/00 | 7/11/00 | Cozumel | 8000
314 | 1913 | 9/1/99 | FedEx | Check | | | 3 | 2/7/00 | 2/14/00 | Monterey | 6000
317 | 1969 | 9/1/99 | FedEx | AmEx | 432432432 | 12/31/02 | 4 | 5/9/00 | 5/16/00 | Fiji | 20000
320 | 2001 | 9/1/99 | Walk In | Cash | | | 1 | 10/10/00 | 10/17/00 | Santa Barbara | 3000
321 | 2306 | 9/1/99 | Emery | Master Card | 1112223334 | 8/12/00 | 1 | 3/15/00 | 4/12/00 | New Jersey | 8000
325 | 2589 | 9/1/99 | Emery | AmEx | 332332332 | 12/10/99 | 1 | 3/15/00 | 4/12/00 | New Jersey | 8000
326 | 3333 | 9/1/99 | FedEx | Money Order | | | 2 | 2/10/00 | 2/17/00 | Monterey | 4000
327 | 3684 | 9/1/99 | DHL | Master Card | 122122321 | 11/9/99 | 4 | 3/10/00 | 3/23/00 | Florida | 24000
329 | 4158 | 9/1/99 | Walk In | Cash | | | 1 | 5/4/00 | 5/15/00 | Cozumel | 1571
330 | 4175 | 9/1/99 | FedEx | Check | | | 2 | 7/3/00 | 7/10/00 | Florida | 6000
331 | 5510 | 9/1/99 | FedEx | Money Order | | | 6 | 6/20/00 | 6/30/00 | Santa Barbara | 36000
333 | 5926 | 9/1/99 | DHL | Discover | 123123123 | 12/21/00 | 2 | 6/10/00 | 6/17/00 | Fiji | 10000
336 | 5719 | 9/1/99 | FedEx | Cash | | | 10 | 4/2/00 | 4/24/00 | Great Barrier Reef | 200000
Line item = DIVEITEM
Order No | Item No | Rental/Sale | Qty
307 | 90010 | Rental | 4
307 | 90020 | Rental | 1
307 | 90021 | Rental | 1
307 | 90030 | Rental | 2
307 | 90051 | Rental | 2
310 | 90011 | Rental | 1
310 | 90045 | Rental | 1
310 | 90059 | Rental | 1
310 | 90074 | Rental | 1
310 | 90078 | Rental | 1
313 | 90127 | Sale | 1
314 | 90072 | Rental | 3
314 | 90094 | Rental | 3
314 | 90100 | Rental | 3
317 | 90012 | Sale | 2
Line Note values include:
“This is our most popular mask.”
“These are our best selling fins.”
“A good weight belt for beginners”
“Holds 10 cubic feet of cargo.”
Shipping information = SHIPVIA
Ship Via | Ship Cost
DHL | 8
Emery | 11
FedEx | 12
UPS | 10
US Mail | 6
Dive Equipment Stock/Inventory = DIVESTOK
Item No | Description | Equipment Class | On Hand | Reorder Point | Cost | Sale Price | Rental Price
90010 | Shotgun 2 - Clear | Snorkel | 12 | 2 | $18.00 | $30.00 | $2.00
90011 | Shotgun 2 - Red | Snorkel | 12 | 2 | $18.00 | $30.00 | $2.00
90012 | Shotgun 2 - Teal | Snorkel | 11 | 2 | $18.00 | $30.00 | $2.00
90020 | Tri-Vent Mask - Clear | Mask | 14 | 2 | $62.50 | $100.00 | $5.00
90021 | Tri-Vent Mask - Red | Mask | 10 | 2 | $62.50 | $100.00 | $5.00
90022 | Tri-Vent Mask - Teal | Mask | 14 | 2 | $62.50 | $100.00 | $7.00
90023 | Quad Vision Mask - Clear | Mask | 11 | 2 | $48.25 | $80.00 | $7.00
90024 | Quad Vision Mask - Red | Mask | 13 | 2 | $48.25 | $80.00 | $7.00
90025 | Quad Vision Mask - Teal | Mask | 10 | 2 | $48.25 | $80.00 | $10.00
90030 | Sea Wing Fins - Clear | Fins | 12 | 2 | $60.00 | $100.00 | $12.00
90031 | Sea Wing Fins - Red | Fins | 11 | 2 | $60.00 | $100.00 | $12.00
90032 | Sea Wing Fins - Teal | Fins | 12 | 2 | $60.00 | $100.00 | $12.00
90033 | Jet Fin - Black | Fins | 14 | 2 | $30.00 | $60.00 | $10.00
90040 | D350 Second Stage | Regulator | 11 | 1 | $162.50 | $270.00 | $20.00
90041 | G250 Second Stage | Regulator | 13 | 1 | $144.50 | $240.00 | $20.00
90042 | G200 Second Stage | Regulator | 12 | 1 | $105.25 | $175.00 | $20.00
Dive Locations = DEST
Destination No | Destination Name | Avg Temp (F) | Avg Temp (C) | Spring Temp (F) | Spring Temp (C) | Summer Temp (F) | Summer Temp (C) | Fall Temp (F) | Fall Temp (C) | Winter Temp (F) | Winter Temp (C) | Accommodations | Night Life | Body of Water | Travel Cost
1 | Cozumel | 78 | 25.556 | 76 | 24.444 | 84 | 28.889 | 78 | 25.556 | 74 | 23.333 | Cheap | Sleepy | Caribbean | 1000
2 | Great Barrier Reef | 80 | 26.667 | 76 | 24.444 | 84 | 28.889 | 78 | 25.556 | 76 | 24.444 | Moderate | Pleasant | Coral Sea | 5000
3 | Monterey | 60 | 15.556 | 62 | 16.667 | 64 | 17.778 | 64 | 17.778 | 58 | 14.444 | Expensive | Wild | Pacific | 2000
4 | Santa Barbara | 75 | 23.889 | 73 | 22.777 | 78 | 25.556 | 72 | 22.222 | 70 | 21.111 | Expensive | Wild | Pacific | 3000
5 | Florida | 77 | 25 | 75 | 23.889 | 85 | 29.444 | 78 | 25.556 | 70 | 21.111 | Moderate | Pleasant | Caribbean | 3000
6 | Fiji | 75 | 23.889 | 76 | 24.444 | 80 | 26.667 | 74 | 23.333 | 70 | 21.111 | Expensive | Sleepy | South Pacific | 5000
7 | New Jersey | 57 | 13.889 | 57 | 13.89 | 60 | 15.556 | 58 | 14.444 | 53 | 11.667 | Expensive | Pleasant | Atlantic | 2000
Dive Sites = SITE
Site No | Destination No | Site Name | Site Highlight | Site Notes | Distance from Town (mi) | Distance from Town (km) | Depth (ft) | Depth (m) | Visibility (ft) | Visibility (m) | Current | Skill Level
1001 | 1 | Palancar Reef | Reef | | 10 | 16.09 | 100 | 30.48 | 150 | 45.72 | Strong | Intermediate
1002 | 1 | Santa Rosa Reef | Reef | | 8 | 12.87 | 80 | 24.384 | 150 | 45.72 | Strong | Intermediate
1003 | 1 | Chancanab Reef | Reef | | 4 | 6.437 | 60 | 18.288 | 100 | 30.48 | Mild | Beginning
1004 | 1 | Punta Sur | Reef | | 13 | 20.92 | 120 | 36.576 | 175 | 53.34 | Strong | Advanced
1005 | 1 | Yocab Reef | Reef | | 6 | 9.656 | 50 | 15.24 | 100 | 30.48 | Mild | Beginning
2001 | 2 | Heron Island | Reef | | 50 | 80.47 | 90 | 27.432 | 150 | 45.72 | Mild | Intermediate
2002 | 2 | Cod Hole | Fish | | 45 | 72.42 | 50 | 15.24 | 150 | 45.72 | Mild | Beginning
2003 | 2 | Butterfly Bay | Caves | | 20 | 32.19 | 70 | 21.336 | 70 | 21.336 | None | Advanced
2004 | 2 | Wheeler Reef | Marine Life | | 30 | 48.28 | 50 | 15.24 | 125 | 38.1 | Mild | Beginning
2005 | 2 | Watanabe | Marine Life | | 130 | 209.2 | 150 | 45.72 | 200 | 60.96 | None | Intermediate
3001 | 3 | Point Lobos | Marine Life | | 3 | 4.828 | 60 | 18.288 | 75 | 22.86 | None | Beginning
3002 | 3 | Macabee Beach | Marine Life | | 0.1 | 0.161 | 40 | 12.192 | 40 | 12.192 | None | Beginning
3003 | 3 | Pinnacles | Pinnacle | | 1 | 1.609 | 60 | 18.288 | 50 | 15.24 | Mild | Beginning
3004 | 3 | Monastery Beach | Marine Life | | 3 | 4.828 | 50 | 15.24 | 40 | 12.192 | Surge | Beginning
Sea Life = BIOLIFE
Species No | Category | Common Name | Species Name | Length (cm) | Length (in) | Notes | Graphic
90020 | Triggerfish | Clown Triggerfish | Ballistoides conspicillum | 50 | 19.685 | |
90030 | Snapper | Red Emperor | Lutjanus sebae | 60 | 23.622 | |
90050 | Wrasse | Giant Maori Wrasse | Cheilinus undulatus | 229 | 90.157 | |
90070 | Angelfish | Blue Angelfish | Pomacanthus nauarchus | 30 | 11.811 | |
90080 | Cod | Lunartail Rockcod | Variola louti | 80 | 31.496 | |
90090 | Scorpionfish | Firefish | Pterois volitans | 38 | 14.961 | |
90100 | Butterflyfish | Ornate Butterflyfish | Chaetodon Ornatissimus | 19 | 7.4803 | |
90110 | Shark | Swell Shark | Cephaloscyllium ventriosum | 102 | 40.157 | |
90120 | Ray | Bat Ray | Myliobatis californica | 56 | 22.047 | |
90130 | Eel | California Moray | Gymnothorax mordax | 150 | 59.055 | |
90140 | Cod | Lingcod | Ophiodon elongatus | 150 | 59.055 | |
BIOSITE -- linking relation
Species No | Site No
90010 | 2001
90010 | 2002
90010 | 2003
90010 | 2004
90010 | 2005
90010 | 6001
90010 | 6003
90010 | 6004
90010 | 6005
90020 | 2001
90020 | 2002
Shipwrecks = SHIPWRK
Ship Name | Site No | Category | Type | Interest | Tonnage | Length (ft) | Length (m) | Beam (ft) | Beam (m)
Delaware | 7007 | Commercial | Steam Freighter | Treasure | 1646 | 252 | 76.8096 | 37 | 11.2776
F.S.Loop | 4004 | Commercial | Steam Schooner | Machinery | 794 | 193 | 58.8264 | 39 | 11.8872
Gosford | 4001 | Commercial | Barque Rigged Sail | Fixture | 2250 | 280 | 85.344 | 42 | 12.8016
Great Isaac | 7002 | Commercial | Seagoing Tug | Fixture | 1117 | 185 | 56.388 | 37 | 11.2776
Lizzie D | 7001 | Commercial | Tug/Rumrunner | Treasure | 122 | 84 | 25.6032 | 21 | 6.4008
Mohawk | 7004 | Passenger | Ocean Liner | Treasure | 8140 | 402 | 122.5296 | 54 | 16.4592
R.P. Resor | 7006 | Commercial | Oil Tanker | Treasure | 7450 | 435 | 132.588 | 66.8 | 20.36064
Star of Scotland | 4002 | Passenger | British Q-Boat | Treasure | 1250 | 263 | 80.1624 | 35 | 10.668
Tolten | 7008 | Commercial | Freighter | Fixture | 1858 | 280 | 85.344 | 43 | 13.1064
USS Moody | 4006 | Military | WWI Destroyer | Treasure | 1308 | 314 | 95.7072 | 31 | 9.4488
Valiant | 4003 | Passenger | Luxury Motor Yacht | Treasure | 444 | 162.4 | 49.49952 | 26 | 7.9248
Ship Name | Cause | Date Sunk | Comments | Passengers/Crew | Survivors | Condition | Graph
Delaware | Fire | | | 66 | 66 | Broken |
F.S.Loop | Deliberate | 1/1/47 | | 0 | | Scattered |
Gosford | Fire | | | | | Intact |
Great Isaac | Collision | 4/16/47 | | 27 | 27 | Intact |
Lizzie D | Unknown | 10/19/22 | | 8 | 0 | Intact |
Mohawk | Collision | 1/25/35 | | 163 | 118 | Scattered |
R.P. Resor | Military | 2/28/42 | | 50 | 2 | Broken |
Star of Scotland | Weather | 1/22/42 | | 5 | 4 | Broken |
Tolten | Military | 3/13/42 | | 28 | 1 | Intact |
USS Moody | Deliberate | 1/1/33 | | 0 | | Intact |
Valiant | Fire | 12/17/30 | | 25 | 25 | Intact |
Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets of
relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base: Structured Techniques for
Design, Performance, and Management.
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
Normalization
• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey attributes on the primary key
• Third Normal Form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column
Unnormalized Relation
Patient # | Patient Name | Patient Addr | Surgeon # | Surgeon | Surg. date | Surgery | Postop drug | Drug side effects
1111 | John White | 15 New St., New York, NY | 145; 311 | Beth Little; Michael Diamond | Jan 1, 1995; June 12, 1995 | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | Mary Jones | 10 Main St., Rye, NY | 243; 467 | Charles Field; Patricia Gold | Apr 5, 1994; May 10, 1995 | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | Charles Brown | Dogwood Lane, Harrison, NY | 189 | David Rosen | Jan 8, 1996 | Open Heart Surgery | Cephalosporin | none
4876 | Hal Kane | 55 Boston Post Road, Chester, CN | 145 | Beth Little | Nov 5, 1995 | Cholecystectomy | Demicillin | none
5123 | Paul Kosher | Blind Brook, Mamaroneck, NY | 145 | Beth Little | May 10, 1995 | Gallstones Removal | none | none
6845 | Ann Hood | Hilton Road, Larchmont, NY | 243 | Charles Field | Apr 5, 1994; Dec 15, 1984 | Eye Cornea Replacement; Eye cataract removal | Tetracycline; none | Fever; none
First Normal Form
• To move to First Normal Form a relation
must contain only atomic values at each row
and column.
– No repeating groups
– A column or set of columns is called a Candidate Key when its values uniquely identify each row in the relation and no proper subset of those columns does so.
First Normal Form
Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St., New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St., New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St., Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St., Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye cataract removal | none | none
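The step from the unnormalized relation to this 1NF table can be sketched in Python. Each element of a repeating group becomes its own row, and the single-valued patient columns are copied into every row. The sketch uses only a few of the columns and patients, and the function name `to_first_normal_form` is hypothetical.

```python
# Unnormalized records: the surgery-related columns form a repeating group (a list).
unnormalized = [
    {"patient_no": 1111, "patient_name": "John White",
     "surgeries": [(145, "01-Jan-95", "Gallstones removal"),
                   (311, "12-Jun-95", "Kidney stones removal")]},
    {"patient_no": 2345, "patient_name": "Charles Brown",
     "surgeries": [(189, "08-Jan-96", "Open Heart Surgery")]},
]


def to_first_normal_form(records):
    """Emit one flat row of atomic values per element of the repeating group,
    copying the single-valued patient columns into every row."""
    rows = []
    for rec in records:
        for surgeon_no, surgery_date, surgery in rec["surgeries"]:
            rows.append((rec["patient_no"], surgeon_no, surgery_date,
                         rec["patient_name"], surgery))
    return rows


for row in to_first_normal_form(unnormalized):
    print(row)
```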
1NF Storage Anomalies
• Insertion: A new patient who has not yet undergone surgery has no surgeon #; since surgeon # is part of the key, we can’t insert the patient.
• Insertion: If a surgeon is newly hired and hasn’t operated yet, there is no way to include that person in the database.
• Update: If a patient comes in for a new procedure and has moved, we need to change multiple address entries.
• Deletion (type 1): Deleting a patient record may also delete all information about a surgeon.
• Deletion (type 2): When there are functional dependencies between nonkey attributes (like drug and side effects), changing one item eliminates other information.
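The first insertion anomaly can be reproduced with Python's `sqlite3` module. This sketch assumes the 1NF key is (Patient #, Surgeon #, Surgery Date), because Ann Hood has two surgeries with surgeon 243. It also uses a hypothetical new patient 7000. SQLite rejects NULL in a composite primary key only when the key columns are declared `NOT NULL`, so the sketch declares them that way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1NF relation keyed on (patient_no, surgeon_no, surgery_date).
conn.execute("""
    CREATE TABLE surgery_1nf (
        patient_no   INTEGER NOT NULL,
        surgeon_no   INTEGER NOT NULL,
        surgery_date TEXT    NOT NULL,
        patient_name TEXT,
        patient_addr TEXT,
        surgeon_name TEXT,
        PRIMARY KEY (patient_no, surgeon_no, surgery_date)
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO surgery_1nf VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# A patient with no surgery yet has no surgeon # or date, so the key is incomplete.
print(try_insert((7000, None, None, "New Patient", "Somewhere", None)))  # False
print(try_insert((1111, 145, "01-Jan-95", "John White",
                  "15 New St., New York, NY", "Beth Little")))  # True
```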
Second Normal Form
• A relation is said to be in Second Normal Form when it is in First Normal Form and every nonkey attribute is fully functionally dependent on the primary key.
– That is, every nonkey attribute needs the full primary key for unique identification
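In the 1NF table, Patient Name depends only on Patient #, and Surgeon Name depends only on Surgeon #. Each of these is just part of the key, so 2NF moves them into relations of their own. The following Python sketch shows relational projection (keep some columns, drop duplicate tuples) applied to a few 1NF rows. The helper name `project` is hypothetical.

```python
# Subset of the 1NF rows: (patient_no, surgeon_no, surgery_date, patient_name, surgeon_name)
first_nf = [
    (1111, 145, "01-Jan-95", "John White", "Beth Little"),
    (1111, 311, "12-Jun-95", "John White", "Michael Diamond"),
    (4876, 145, "05-Nov-95", "Hal Kane", "Beth Little"),
    (5123, 145, "10-May-95", "Paul Kosher", "Beth Little"),
]
COLUMNS = ("patient_no", "surgeon_no", "surgery_date", "patient_name", "surgeon_name")


def project(rows, columns, keep):
    """Relational projection: keep the named columns and drop duplicate tuples."""
    idx = [columns.index(c) for c in keep]
    return sorted({tuple(row[i] for i in idx) for row in rows})


patient = project(first_nf, COLUMNS, ("patient_no", "patient_name"))
surgeon = project(first_nf, COLUMNS, ("surgeon_no", "surgeon_name"))
surgery = project(first_nf, COLUMNS, ("patient_no", "surgeon_no", "surgery_date"))
print(patient)  # [(1111, 'John White'), (4876, 'Hal Kane'), (5123, 'Paul Kosher')]
print(surgeon)  # [(145, 'Beth Little'), (311, 'Michael Diamond')]
```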
Second Normal Form
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St., New York, NY
1234 | Mary Jones | 10 Main St., Rye, NY
2345 | Charles Brown | Dogwood Lane, Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook, Mamaroneck, NY
6845 | Ann Hood | Hilton Road, Larchmont, NY
Second Normal Form
Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold
Second Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever
1NF Storage Anomalies
Removed
• Insertion: Can now enter new patients without
surgery.
• Insertion: Can now enter Surgeons who haven’t
operated.
• Deletion (type 1): If Charles Brown dies the
corresponding tuples from Patient and Surgery
tables can be deleted without losing information
on David Rosen.
• Update: If John White comes in for third time, and
has moved, we only need to change the Patient
table
2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular
drug has a particular side effect unless it is given
to a patient.
• Deletion: If John White receives some other drug
because of the penicillin rash, and a new drug and
side effect are entered, we lose the information
that penicillin can cause a rash
• Update: If drug side effects change (a new
formula) we have to update multiple occurrences
of side effects.
Third Normal Form
• A relation is said to be in Third Normal Form if it is in Second Normal Form and there is no transitive functional dependency between nonkey attributes
– When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency.
• The side effect column in the Surgery table is determined by the drug administered
– Side effect is functionally dependent on drug, and therefore transitively dependent on the primary key, so Surgery is not in 3NF
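A relation instance can refute a functional dependency but can never prove one, because an FD is a statement about all permitted data. With that caveat, a small Python check on the Surgery rows shows that every drug appears with exactly one side effect, while the reverse dependency does not hold. The helper name `fd_holds` is hypothetical.

```python
from collections import defaultdict

# 2NF Surgery relation: (patient_no, surgeon_no, surgery_date, drug_admin, side_effects)
surgery_2nf = [
    (1111, 145, "01-Jan-95", "Penicillin", "rash"),
    (1111, 311, "12-Jun-95", "none", "none"),
    (1234, 243, "05-Apr-94", "Tetracycline", "Fever"),
    (1234, 467, "10-May-95", "none", "none"),
    (2345, 189, "08-Jan-96", "Cephalosporin", "none"),
    (4876, 145, "05-Nov-95", "Demicillin", "none"),
    (5123, 145, "10-May-95", "none", "none"),
    (6845, 243, "15-Dec-84", "none", "none"),
    (6845, 243, "05-Apr-94", "Tetracycline", "Fever"),
]
COLS = ("patient_no", "surgeon_no", "surgery_date", "drug_admin", "side_effects")


def fd_holds(rows, columns, lhs, rhs):
    """True if, in this instance, equal lhs values always come with equal rhs values."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[columns.index(c)] for c in lhs)
        seen[key].add(tuple(row[columns.index(c)] for c in rhs))
    return all(len(values) == 1 for values in seen.values())


print(fd_holds(surgery_2nf, COLS, ["drug_admin"], ["side_effects"]))  # True
print(fd_holds(surgery_2nf, COLS, ["side_effects"], ["drug_admin"]))  # False
```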
Third Normal Form
Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline
Third Normal Form
Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever
2NF Storage Anomalies
Removed
• Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation.
• Deletion: If John White receives some other drug as a result of the rash from penicillin, the information on penicillin and rash is still maintained.
• Update: The side effects for each drug appear only once.
Boyce-Codd Normal Form
• Most 3NF relations are also BCNF relations.
• A 3NF relation can fail to be in BCNF only if all of the following hold:
– The candidate keys in the relation are composite keys (they are not single attributes)
– There is more than one candidate key in the relation, and
– The keys are not disjoint, that is, some attributes in the keys are common
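These conditions can be checked mechanically with the attribute-closure algorithm. An FD violates BCNF when its left-hand side is not a superkey, that is, when its closure does not cover the whole relation. The relation (student, course, instructor) and its two FDs are a hypothetical example, not taken from the slides. It has two overlapping composite candidate keys, {student, course} and {student, instructor}. It is therefore in 3NF but not in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(relation)]


relation = {"student", "course", "instructor"}
fds = [({"student", "course"}, {"instructor"}),
       ({"instructor"}, {"course"})]
print(closure({"student", "instructor"}, fds) == relation)  # True: a second candidate key
print(bcnf_violations(relation, fds))  # [({'instructor'}, {'course'})]
```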
Most 3NF Relations are also BCNF – Is this one?
Patient # | Patient Name | Patient Address
1111 | John White | 15 New St., New York, NY
1234 | Mary Jones | 10 Main St., Rye, NY
2345 | Charles Brown | Dogwood Lane, Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook, Mamaroneck, NY
6845 | Ann Hood | Hilton Road, Larchmont, NY
BCNF Relations
Patient # | Patient Name
1111 | John White
1234 | Mary Jones
2345 | Charles Brown
4876 | Hal Kane
5123 | Paul Kosher
6845 | Ann Hood
Patient # | Patient Address
1111 | 15 New St., New York, NY
1234 | 10 Main St., Rye, NY
2345 | Dogwood Lane, Harrison, NY
4876 | 55 Boston Post Road, Chester, CN
5123 | Blind Brook, Mamaroneck, NY
6845 | Hilton Road, Larchmont, NY
Fourth Normal Form
• A relation is in Fourth Normal Form if it is in BCNF and every non-trivial multivalued dependency in it has a superkey as its determinant
• Eliminate non-trivial multivalued dependencies by projecting into simpler tables
Fifth Normal Form
• A relation is in 5NF if every join dependency in the relation is implied by the candidate keys of the relation
• This implies that relations decomposed in the previous normal forms can be recombined via natural joins to recreate the original relation.
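The recombination property can be checked on the hospital data. Joining the 3NF Surgery and Drug relations on Drug Admin rebuilds the 2NF Surgery relation exactly, because drug determines side effect. For contrast, the sketch also splits the relation on side effect, which does not determine the drug, and the join then produces spurious tuples. Only a subset of the rows is used, and a single instance illustrates losslessness but does not prove it.

```python
# Subset of the 2NF Surgery relation:
# (patient_no, surgeon_no, surgery_date, drug_admin, side_effects)
original = {
    (1111, 145, "01-Jan-95", "Penicillin", "rash"),
    (1111, 311, "12-Jun-95", "none", "none"),
    (1234, 243, "05-Apr-94", "Tetracycline", "Fever"),
    (2345, 189, "08-Jan-96", "Cephalosporin", "none"),
    (6845, 243, "05-Apr-94", "Tetracycline", "Fever"),
}

# The 3NF decomposition: Surgery(patient, surgeon, date, drug) and Drug(drug, side_effects).
surgery = {(p, s, d, dr) for p, s, d, dr, _ in original}
drug = {(dr, effect) for _, _, _, dr, effect in original}

# Natural join on the shared attribute drug_admin.
rejoined = {(p, s, d, dr, effect)
            for p, s, d, dr in surgery
            for dr2, effect in drug
            if dr == dr2}
print(rejoined == original)  # True: no tuples lost, none invented

# A lossy alternative for contrast: split on side_effects, which does not determine the drug.
by_effect = {(p, s, d, effect) for p, s, d, _, effect in original}
effect_drug = {(effect, dr) for _, _, _, dr, effect in original}
spurious = {(p, s, d, dr, effect)
            for p, s, d, effect in by_effect
            for effect2, dr in effect_drug
            if effect == effect2}
print(len(original), len(spurious))  # 5 7
```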
Effectiveness and Efficiency
Issues for DBMS
• Focus on the relational model
• Any column in a relational database can be searched for values.
• To improve efficiency, indexes using storage structures such as B-trees and hashing are used
• But many useful functions are not indexable and require complete scans of the database
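This can be illustrated with SQLite, whose indexes are B-trees (hash indexes are offered by some other systems). `EXPLAIN QUERY PLAN` shows the same query changing from a full scan to an index search once an index exists. The table is a cut-down DIVECUST, and the index name is hypothetical. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE divecust (customer_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO divecust VALUES (?, ?, ?)", [
    (1480, "Louis Jazdzewski", "New Orleans"),
    (1481, "Barbara Wright", "San Francisco"),
    (2589, "Hiram Marley", "San Francisco"),
])


def plan(sql):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM divecust WHERE city = 'San Francisco'"
before = plan(query)  # no index on city: SQLite must scan every row
conn.execute("CREATE INDEX idx_divecust_city ON divecust(city)")
after = plan(query)   # the B-tree index lets SQLite search directly
print(before)
print(after)
```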
Example: Text Fields
• In a conventional RDBMS, when a text field is indexed, only exact matching of the text field contents (or greater-than and less-than comparisons) is supported.
– Can search for individual words using pattern matching, but a full scan is required.
• Text searching is still done best (and fastest) by specialized text search programs (Search Engines) that we will look at more later.
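The same SQLite check can be applied to a text column. An exact match on the whole field can use the index. Searching for a word inside the field with `LIKE '%fins%'` falls back to a full scan, because a leading wildcard cannot use the index. The note texts come from the DIVEITEM slide, but the order numbers paired with them and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diveitem (order_no INTEGER, line_note TEXT)")
conn.execute("CREATE INDEX idx_line_note ON diveitem(line_note)")
# Note texts from the DIVEITEM slide; the order numbers paired with them are hypothetical.
conn.executemany("INSERT INTO diveitem VALUES (?, ?)", [
    (307, "This is our most popular mask."),
    (310, "These are our best selling fins."),
    (314, "A good weight belt for beginners"),
])


def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


exact = "SELECT order_no FROM diveitem WHERE line_note = 'A good weight belt for beginners'"
word = "SELECT order_no FROM diveitem WHERE line_note LIKE '%fins%'"
print(plan(exact))  # expected: a SEARCH using idx_line_note (whole-field match)
print(plan(word))   # expected: a SCAN of diveitem (leading wildcard defeats the index)
print(conn.execute(word).fetchall())  # [(310,)]
```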
Normalizing to death
• Normalization splits database information
across multiple tables.
• To retrieve complete information from a
normalized database, the JOIN operation
must be used.
• JOIN tends to be expensive in terms of
processing time, and very large joins are
very expensive.
Advantages of RDBMS
• Possible to design complex data storage and retrieval systems with ease (and without conventional programming).
• Support for ACID transactions
– Atomic
– Consistent
– Isolated
– Durable
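Atomicity can be illustrated with `sqlite3`: the stock updates in one call either all take effect or none do. The on-hand values for items 90010 and 90040 come from DIVESTOK. The `CHECK` rule and the `rent` function are assumptions made for illustration. The connection object, used as a context manager, commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraint: stock on hand may never go negative (a consistency rule).
conn.execute("CREATE TABLE divestok (item_no INTEGER PRIMARY KEY, "
             "on_hand INTEGER CHECK (on_hand >= 0))")
conn.executemany("INSERT INTO divestok VALUES (?, ?)", [(90010, 12), (90040, 11)])
conn.commit()


def rent(items):
    """Decrement on_hand for every (item_no, qty) pair as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            for item_no, qty in items:
                conn.execute("UPDATE divestok SET on_hand = on_hand - ? WHERE item_no = ?",
                             (qty, item_no))
        return True
    except sqlite3.IntegrityError:
        return False


print(rent([(90010, 2), (90040, 50)]))  # False: 90040 would drop below zero
# The first UPDATE was rolled back too, so both items keep their original stock.
print(conn.execute("SELECT on_hand FROM divestok ORDER BY item_no").fetchall())  # [(12,), (11,)]
```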
Advantages of RDBMS
• Support for very large databases
• Automatic optimization of searching (when
possible)
• RDBMS have a simple view of the
database that conforms to much of the data
used in businesses.
• Standard query language (SQL)
Disadvantages of RDBMS
• Until recently, no support for complex objects such as documents, video, images, spatial or time-series data. (ORDBMS are adding support for these.)
• Often poor support for storage of complex objects. (Disassembling the car to park it in the garage)
• Still no efficient and effective integrated support for things like text searching within fields.
Assignment 2
• The following information should be turned in for the preliminary design of your personal database project.
1. A general description of the data you will be using for the database, and what uses you might expect the database to have (should be expanded from the previous assignment).
2. A preliminary data dictionary for the files and data elements of the database. You should have at least 5 files with some logical connections between them. The data dictionary consists of all of the attributes that you have identified for each entity, along with an indication of whether the attribute is a primary key (or part of a primary key), and what format the data will be (e.g.: text, decimal number, integer, etc.)
3. Produce an entity-relationship diagram of the database OR a UML diagram.
• These will be preliminary design specifications, so do not feel that you must follow everything that you describe here in the final database design.