Transcript ppt

Database Design
Relational Database
Relational Database

Before
File system
• organized data
Hierarchical and Network database
• data + metadata + data structure  database
• addressed limitations of file system
• tied to complex physical structure.

After
Conceptual simplicity
• store a collection of related entities in a “relational” table
Focus on logical representation (human view of data)
• how data are physically stored is no longer an issue
Database  RDBMS  application
• conducive to more effective design strategies
Database System
2
Logical View of Data

Entity
a person, place, event, or thing about which data is collected.
• e.g. a student

Entity Set
a collection of entities that share common characteristics
named to reflect its content
• e.g. STUDENT

Attributes
characteristics of the entity.
• e.g. student number, name, birthdate
named to reflect its content
• e.g. STU_NUM, STU_NAME, STU_DOB

Tables
contains a group of related entities or entity set
2-dimensional structure composed of rows and columns
also called relations
Database System
3
Relational DB Table: Characteristics

2-dimensional structure with rows & columns
Rows (tuples)
•
•
•
Represent attributes
Have a specific range of values
→

column 열 (attribute, field)
Represent single entity occurrence
Columns
2차원 구조
row 행
(tuple, record)
attribute domain
Each table must have a primary key 기본키
Primary key is an attribute (or a combination of attributes) that
uniquely identify each row

Relational database vs. File system terminology
Rows == Records, Columns == Fields, Tables == Files
Database Design
4
Table Characteristics

Table and Column names
Max. 8 & 10 characters in older DBMS
Cannot use special charcters (e.g. */.)
Use descriptive names (e.g. STUDENT, STU_DOB)

Column characteristics
Data type
• number, character, date, logical (Boolean)
Format
• 999.99, Xxxxxx, mm-dd-yy, Yes/No
Range
• 0-4, 35-65, {A,B,C,D}
Database System
5
Relational DB Table: Example


8 rows & 9 columns
Row = single entity occurrence


row 1 describes a student named Jone Doe
Column = an attribute

has specific characteristics (data type, format, value range)




stClass: char(2), {fr,jr,so,sr}
all values adhere to the attribute characteristics
Each row/column intersection contains a single data value
Primary key = stID
Database Design
6
Table: Keys

Consists of one or more attributes that determine other attributes
 Given the value of a key, you can look up (determine) the value of other attributes
e.g., student_ID  student’s name, major, status, grade, etc.
Composite key: Composed of more than one attribute
e.g., building name + room number  location, size, function/purpose, etc.
Key attribute: Any attribute that is part of a key
e.g., building name, room number

Superkey
any key that uniquely identifies each row

Candidate key
후보키
Any key that uniquely identifies each row (without redundancies)

Primary Key (PK)
기본키
The candidate key selected as the unique identifier

Foreign Key (FK)
외래키
An attribute whose values match the primary key values in a related table
Joins tables to derive information

Secondary Key
facilitates querying of the database
restrictive secondary key  narrow search result (e.g. STU_LNAME vs. STU_DOB)
Database Design
7
Table: Keys

Superkey

attribute(s) that uniquely identifies each row


Candidate Key

minimal superkey


candidate key selected as the unique identifier

STU_ID
Foreign Key

primary key from another table


STU_ID; STU_SSN; STU_DOB + STU_LNAME + STU_FNAME?
Primary Key


STU_ID; STU_SSN; STU_ID + any; STU_SSN + any; STU_DOB + STU_LNAME + STU_FNAME?
DEPT_CODE
Secondary Key

attribute(s) used for data retrieval

STU_LNAME + STU_DOB
STU_ID
STU_SSN
STU_DOB
STU_LNAME
STU_FNAME
DEPT_CODE
DEPT_CODE
12345
111-11-1111
12/12/1985
Doe
John
245
243
Astronomy
12346
222-22-2222
10/10/1985
Dew
John
243
245
Computer Science
12348
123-45-6789
11/11/1982
Dew
Jane
423
423
Sociology
Database Design
DEPT_NAME
8
Integrity Rules

Entity Integrity
개체 무결성
Ensures uniqueness of entities
• Primary key values must be unique and not empty
 e.g., no department can have duplicate or null DEPT_CODE

Referential Integrity 참조 무결성
Prevents invalid data entry
Foreign key value is null or matches primary key values in related table
 i.e., foreign key cannot contain values that does not exist in the related table.

Most RDBMS enforce integrity rules automatically.
STU_ID
STU_LNAME
STU_FNAME
DEPT_CODE
DEPT_CODE
DEPT_NAME
12345
Doe
John
245
243
Astronomy
12346
Dew
John
243
244
Computer Science
22134
Dew
James
245
Sociology
23456
Doe
Jane
243 246
Physics
Database Design
249
9
Example: Simple RDB
Database Systems: Design, Implementation, & Management: Rob & Coronel
Database System
10
Relationships in RDB

Representation of relationships among entities
개체간의 관계 표현
By shared attributes between tables (RDB model)
•
primary key  foreign key
E-R model provides a simplified picture

EMPLOYEE
One-to-One (1:1)
Could be due to improper data modeling
•
PILOT
MECHANIC
e.g. PILOT (id, name, dob) to EMPLOYEE (id, name, dob)
Commonly used to represent entity with uncommon attributes
•

e.g. PILOT (id, license) & MECHANIC (id, certificate) to EMPLOYEE (id, name, dob, title)
One-to-Many (1:M)
Most common relationship in RDB
Primary key of the One should be the foreign key in the Many

Many-to-Many (M:N)
Should not be accommodated in RDB directly
Implement by breaking it into a set of 1:M relationships
•
Database Design
Create a composite/bridge entity
11
M:N to 1:M Conversion
Database Systems: Design, Implementation, & Management: Rob & Coronel
Database Design
12
M:N to 1:M Conversion
CLASS
STUDENT
CLS_ID
CLS_ID
CRS_Name
Room
STU_ID
M
IT-s16
IT-s16
Web Authoring
403
1234
John Doe
M
DB-s16
IT-s16
Web Authoring
403
2345
2345
Jane Doe
F
IT-s16
DB-s16
Database
421
1234
2345
Jane Doe
F
DB-s16
DB-s16
Database
421
2345
3456
GI Joe
M
DB-s16
DB-s16
Database
421
3456
STU_ID
STU_Name
1234
John Doe
1234
Sex
STU_ID
CLS_ID
grade
CLS_ID
CRS_Name
Room
M
1234
IT-s16
B
IT-s16
Web Authoring
403
Jane Doe
F
1234
DB-s16
C
DB-s16
Database
421
GI Joe
M
2345
IT-s16
A
2345
DB-s16
A
3456
DB-s16
A
STU_ID
STU_Name
1234
John Doe
2345
3456
Sex
STUDENT
CLASS
ENROLL
Composite Table:
• Must contain at least the primary keys of original tables
Database Design
 Contains multiple occurrences of the foreign key values
• Additional attributes may be assigned as needed
13
Data Redundancy

Uncontrolled Redundancy

Kiduk Yang’s
Account Balance?
불필요한 중복
Unnecessary duplication of data
Repeated attribute values  Normalize (e.g., M:N to 1:M conversion)
 Derived attributes  Compute as needed


Controlled Redundancy


CUST_ID = KY123
필요한 중복
Shared attributes in multiple tables

CUSTOMER
INVOICE
Makes RDB work (e.g. foreign key)
For information requirements or transaction speed

e.g. INV_Price records historical product price
 e.g. Account Balance = account receivable - payments
PAYMENT
15/11/01 $280
15/11/15 $120
15/12/24 $ 80
16/01/01 $100
280 + 120 + 80 - 100 = $380
PRODUCT
INVOICE
PRD_ID
PRD_Name
C1234
Chainsaw
H2341
Hammer
Database Design
PRD_Price
INV_ID
PRD_ID
Date
INV_Price
CUST_ID
$100
121
C1234
2015/12/24
$80
KY123
$10
122
H2341
2015/12/25
$5
JJ122
123
C1234
2016/01/11
$100
SH002
14
Data Integrity

Nulls

No data entry

a “not applicable” condition



an unknown attribute value




uncollected data
e.g., date of hospitalization, cause of death
Can create problems


non-obtainable data
e.g., birthdate of John Doe
a known, but missing, attribute value


non-existing data
e.g., middle initial, fax number
when functions such as COUNT, AVERAGE, and SUM are used
Not permitted in primary key

Database System
should be avoided in other attributes
15
Indexes

Composed of an index key and a set of pointers
Points to data location (e.g. table rows)
Makes retrieval of data faster
each index is associated with only one table
MOVIE_ID
MOVIE_NAME
ACTOR_ID
1
231
Rebel without Cause
12
23
2
352
Twelve Angry Men
23
34
3
455
Godfather 2
34
4
460
Godfather II
34
5
625
On Golden Pond
23
ACTOR_NAME
ACTOR_ID
James Dean
12
Henry Fonda
Robert DeNiro
index key
(ACTOR_ID)
pointers
12
1
23
2, 5
34
3, 4
Database System
16
Data Dictionary & Schema

Data Dictionary
Detailed description of a data model
• for each table in a database
→
list all the attributes & their characteristics
e.g. name, data type, format, range
→
identify primary and foreign keys
Human view of entities, attributes, and relationships
• Blueprint & documentation of a database
→

design & communication tool
Relational Schema
Specification of the overall structure/organization of a database
• e.g. visualization of a structure
Shows all the entities and relationships among them
• tables w/ attributes
• relationships (linked attributes)
→
primary key  foreign key
• relationship type
→
Database System
1:M, M:N, 1:1
17
Data Dictionary

Lists attribute names and characteristics for each table in the database
record of design decisions and blueprint for implementation
Database
Systems: Design, Implementation, & Management: Rob & Coronel
Database
System
18
Relational Schema

A diagram of linked tables w/ attributes
Database Systems: Design, Implementation, & Management: Rob & Coronel
Database System
19
Exercises
Data Modeling Exercises
1.
Draw an E-R Diagram of the data model described by the business rules below.

A sales rep can write many invoices. Each invoice is written by a single sales rep.
A customer can generate many invoices. Each invoice is generated by a single customer.
Each invoice includes one or more invoice lines. Each invoice line is associated with one invoice.
- i.e., Each item purchased is recorded in an invoice line.
Each invoice line records a single product. Each product can be recorded in many invoice lines.
A customer can make many payments. Each payment is made by a single customer.
A vendor supplies many products. A product is supplied by many vendors.





M
1
M
generates
INVOICE
1
M
makes
CUSTOMER
PAYMENT
1
writes
1
SALESREP
Database Design
includes
M
INV_LINE
M
1
1
is in
PRODUCT
M
M
SUPPLY
1
VENDOR
21
Database Design Exercises
2.
Identify the business rules and draw an E-R diagram of the company described below.
Yanghoo is a consulting company with multiple departments and many employees.
Each department, run by a manager, has several employees who work on multiple projects.
Business Rules
 A department employs many employees.

Each employee works in one department.
 An employee may work on many projects.
 A project may have many employees working on it.
 A department is managed by one employee.
 An employee manages one department.
DEPARTMENT
1
1
employs
manages
M
M
1
EMPLOYEE
Database Design
N
works on
PROJECT
22
Database Design Exercises
3.
What problem would you encounter if you wanted to produce a listing by city using the table
below? How would you solve this problem by altering the file structure?.
The city names are contained within the address attribute
→ Decomposing this field at the application level is inefficient (slows down DB execution)
address
Street
City
State
Country
777 Bonham Ct., Durham, NC, USA
777 Bonham Ct
Durham
NC
USA
21 Jump St., Boston, MA
21 Jump St.
Boston
MA
132 Queen St., London, England
132 Queen St.
London
England
1431-C Broad Ave., Berlin, Germany
1431-C Broad Ave.
Berlin
Germany
3333 Tao St., Shanghai, China
3333 Tao St.
Shanghai
China
Database Design
23
Database Design Exercises
4.
Below is an example of the CLASS entity implemented in MS Access.
Do you see a problem? If so, how would you refine your data model?
Sample DB
There is a data redundancy problem (e.g., Class Name, Code, etc.)
→ Can lead to data anomalies (i.e., Update/Insertion/Deletion Anomaly)
M
PROFESSOR
N
teaches
1
COURSE
M
has
CLASS
N
ERD Refinement #1
Sample DB
takes
 Separate the course information into a COURSE entity.
 A professor can teach many courses. A course can be taught by many professors
 A course can consist of many classes. A class belongs to one course
 A student can enroll in many classes. Each class can have many students.
Database Design
M
STUDENT
24
Database Design Exercises
4.
Below is an example of the CLASS entity implemented in MS Access.
Do you see a problem? If so, how would you refine your data model?
ERD Refinement #2
 Decompose many-to-many relationship
M
N
PROFESSOR
M
1
COURSE
teaches
TEACH
1
Sample DB
M
1
M
has
CLASS
N
1
M
takes
ENROLL
M
M
1
STUDENT
Database Design
25
Database Design Exercises
4.
The E-R diagram below, which models the course enrollment in a college, has a serious flaw.
What is the problem and how can it be fixed? Show a revised E-R diagram.
PROFESSOR
1
M
teaches
TEACH
M
1
1
M
COURSE
has
CLASS
1
M
ERD misses the relationship between PROFESSOR and CLASS
→ does not record who taught the specific classes
takes
ENROLL
M
ERD Refinement #3
 Relate PROFESSOR to CLASS
Sample DB
1
STUDENT
COURSE
1
has
M
CLASS
M
teach
1
PROFESSOR
1
M
ENROLL
M
1
STUDENT
Database Design
26