Transcript Slide 1
Overview of Database Design
By
Nazife Dimililer
Database Management System
A DBMS is a data storage and retrieval system
which permits data to be stored nonredundantly while making it appear to the
user as if the data is well-integrated.
Database Management System
Application
#1
Application
#2
Application
#3
DBMS
DBMS manages data
resources like an operating
system manages hardware
resources
Database
containing
centralized
shared data
Advantages of Database Approach
• Program-Data Independence
– Metadata stored in DBMS, so applications don’t need to worry
about data formats
– Data queries/updates managed by DBMS so programs don’t
need to process data access routines
– Results in: increased application development and maintenance
productivity
• Minimal Data Redundancy
– Leads to increased data integrity/consistency
Advantages of Database Approach
• Improved Data Sharing
– Different users get different views of the data
• Enforcement of Standards
– All data access is done in the same way
• Improved Data Quality
– Constraints, data validation rules
• Better Data Accessibility/ Responsiveness
– Use of standard data query language (SQL)
• Security, Backup/Recovery, Concurrency
– Disaster recovery is easier
Costs and Risks of the
Database Approach
• Up-front costs:
– Installation Management Cost and Complexity
– Conversion Costs
• Ongoing Costs
– Requires New, Specialized Personnel
– Need for Explicit Backup and Recovery
• Organizational Conflict
– Old habits die hard
The Range of
Database Applications
• Personal Database – standalone desktop
database
• Workgroup Database – local area network
(<25 users)
• Department Database – local area network
(25-100 users)
• Enterprise Database – wide-area network
(hundreds or thousands of users)
Evolution of DB Systems
•
•
•
•
•
•
•
•
Flat files - 1960s - 1980s
Hierarchical – 1970s - 1990s
Network – 1970s - 1990s
Relational – 1980s - present
Object-oriented – 1990s - present
Object-relational – 1990s - present
Data warehousing – 1980s - present
Web-enabled – 1990s - present
Database Design Phases
• Conceptual Design
Model the data without any physical considerations for each
user view.
• Logical Design
Choose the data model that will be used and modify the
conceptual data model to fit the data model without any
other physical considerations. Validate the model using
normalization and transaction requirements.
• Physical Design
Choose the actual DBMS and implement the data model
efficiently. Performance, security and reliability are key issues.
Systems Development Life Cycle
Project Identification
and Selection
Project Initiation
and Planning
Analysis
Logical Design
Physical Design
Implementation
Maintenance
Systems Development Life Cycle
Project Identification
and Selection
Purpose --preliminary understanding
Deliverable –request for project
Project Initiation
and Planning
Analysis
Database activity –
enterprise modeling
Logical Design
First step in database development
Specifies scope and general content
Overall picture of organizational data, not
specific design
Entity-relationship diagram
Descriptions of entity types
Relationships between entities
Business rules
Physical Design
Implementation
Maintenance
Systems Development Life Cycle
Project Identification
and Selection
Purpose – state business situation and solution
Deliverable – request for analysis
Project Initiation
and Planning
Analysis
Logical Design
Physical Design
Database activity –
conceptual data modeling
Implementation
Maintenance
Systems Development Life Cycle
Project Identification
and Selection
Purpose –thorough analysis
Deliverable – functional system specifications
Project Initiation
and Planning
Analysis
Logical Design
Physical Design
Database activity –
conceptual data modeling
Implementation
Maintenance
Systems Development Life Cycle
Project Identification
and Selection
Project Initiation
and Planning
Purpose –information requirements structure
Deliverable – detailed design specifications
Analysis
Logical Design
Physical Design
Database activity –
logical database design
Implementation
Maintenance
Systems Development Life Cycle
Purpose –develop technology specs
Deliverable – program/data
structures, technology purchases,
organization redesigns
Project Identification
and Selection
Project Initiation
and Planning
Analysis
Logical Design
Physical Design
Database activity –
physical database design
Implementation
Maintenance
Systems Development Life Cycle
Purpose –programming, testing, training,
installation, documenting
Deliverable – operational programs,
documentation, training materials
Project Identification
and Selection
Project Initiation
and Planning
Analysis
Logical Design
Physical Design
Database activity –
database implementation
Implementation
Maintenance
Systems Development Life Cycle
Project Identification
and Selection
Purpose –monitor, repair, enhance
Deliverable – periodic audits
Project Initiation
and Planning
Analysis
Logical Design
Physical Design
Database activity –
database maintenance
Implementation
Maintenance
Simplified Database Development
Procedure
Start
Draw ERD
Convert to
Relational
Schema
Validate using
Normalization
Validate against
user
transactions
Stop
Documentation
Entity Document
Entity Name
Description
Aliases
Occurrence
Name of entity
A short
Description of
entity
Other names the
users used to
refer to this
entity
A common
Situation
where this
entity can be
found
Instructor
Employees
teaching courses
Lecturer,
professor
Instructors work
in departments
Documentation
Relationship Document
Entity Type
Relationship Type
Entity Type
Cardinality
Participation
(Optionality)
Name Of
participating
Entity : Entity
A
Name of
Name of
participating
Entity : Entity
B
Cardinality
from
Entity A to
Entity B
1:1
1:M
M:1
Participation constraints
on the relationship
from Entity A to
Entity B
(Optionalities)
Full (F) : Manadatory
Entity (min>0)
Partial (P) : Optional
Entity (min=0)
Instructor
workFor
Department
M:1
P:F
Relationship
Documentation
Attribute Document
Entity
Names of
Attributes
Description
Data type and
length
Constraint
Name of Entity
List of all
attributes of
the entity
Description of
each
attribute
Data type of
each attribute.
It is possible to
use domain
names you
have described
in the domain
document
Primary ,
Unique and
Secondary Key.
(Secondary
Keys are used
to search for
the entity)
Student
Student Id
Uniquely
identifies a
student.
6 fixed character
Primary Key
Name
Full name of
student
50 variable
character
Secondary Index
Gender
Gender of
student
1 fixed character
Documentation
Attribute Document Continued
Names of
Attributes
Default Value
Alias
Null Value?
(Yes or No)
Derived?
List of all
attributes
of the
entity
Default value
for attributes
Other
names, the
users used
for the
attribute
Yes : Null values are
allowed
No: Null values are
not allowed
Yes: It is derived
No: It is not
derived
Student Id
No
Name
No
Gender
cgpa
‘F’
Sex
Yes
Cumulative
grade
Yes
Yes
Documentation
Attribute Domain Document
Domain Name
Domain Characteristics
Examples of allowed values
Name of Domain for attributes
Description of domain
Illustrative examples
Cgpa domain
3 digit floating point
between 0.00 and 4.00
3.33, 4.00
Gender
1 character string (‘F’ or
‘M’)
M, F
APPLY NORMALIZATION
If you normalized your database design, you can be more
confident that there will be no redundancy in the final
database.
There are three basic checks that most relational database
designers carry out.
•1NF REMOVE REPEATING GROUPS
•2NF REMOVE PARTIAL DEPENDENCIES
•3NF REMOVE NON-KEY DEPENDENCIES
USEFUL DESIGN PRINCIPLES
1-Faitfulness
Entity sets, their attributes and relationships should
reflect reality
•Don’t attach pointless attributes.
Employee
Number-of-legs
•Relationships between attributes depend on the policy of
organization.
employee
employee
works
works
department
department
USEFUL DESIGN PRINCIPLES
2- Avoid Redundancy
id
Student
name
Advisor
advisorname
name
3- Simplicity counts
Avoid introducing more elements into your design
Than is absolutely necessary.
id
Student
name
kindergarten
highschool
id
USEFUL DESIGN PRINCIPLES
4- Choosing the right relationship
Adding to our design every possible relationship is
not often a good idea.
• It can cause to redundancy.
•Resulting database could require more space to
store redundant elements.
•Modifying the database could become more
complex.
USEFUL DESIGN PRINCIPLES
5- Picking the right kind of element.
title
studioname
movie
studioaddress
year
movie
title
year
Repeat studio name and address
for each movie
If studio doesn't have any movie,
we lost the address of the movie
studio
filmedBy
studioname
studioaddress
USEFUL DESIGN PRINCIPLES
5- Picking the right kind of element.
Example 1: Using
Multiway relationship
Example 1: If entity set has only one
attribute and relationship is 1:N
relationship.
f
E
E
K
F
e
e
K
e
k
k
K
k
e
f
k
F
K
Helpful pointers
• Transform “complex” attributes to entities.
ServiceRecord
ServiceRecord
ServiceDate
EquipmentNo
EmployeeName
Description
ServiceDate
EquipmentNo
EmployeeNo
Description
Employee
performedBy
EmployeeId
FirstName
LastName
Hiredate
Helpful pointers
• Use lookup entities(tables) for frequently used data.
s tude nt
Id
Name
Country
Em ploye e
Id
Name
Salary
Country
student
Id
Name
CountryCode
Em ployee
Id
Name
Salary
CountryCode
sresident
country
code
Name
eresident
Helpful pointers
• Split compound attributes
custom er
custom er
customerId
Name
Address
customerId
Name
StreetAddress
City
Country
Helpful pointers
• Transform weak entities to strong entities
Invoice
InvoiceNo
InvoiceDate
CustomerId
payments
Invoice
InvoiceNo
InvoiceDate
CustomerId
payments
Ins tallm e nt
Ins tallm e nt
InvoiceNo
InstallmentDate
Amount
InstallmentId
InvoiceNo
InstallmentDate
Amount
Helpful pointers
• Add History
Instructor
id
Name
Instructor
id
Name
Title
TitleChanges
InstructorTitle
ChangeDate
Title
Some helpful pointers
• Use consistent naming rules for all
entities,relationships and attributes
• Choose primary keys intelligently.
Primary keys should NOT change over time.
• Choose appropriate data types for attributes
Intelligent vs Surrogate Keys
• A surrogate key is an artificial or synthetic key that is used as a
substitute for a natural key aka intelligent key.
• "Surrogate key" may also be known as "System-generated
key", "Database Sequence number", "Synthetic key",
Data
Models unique
that useidentifier".
surrogate keys
"Technical key" or an
"Arbitrary,
have more normalization
• primary keys are hardusually
to change.
errors.because not only are
• Intelligent keys suffer from this problem
they used as primary and foreign keys but they also have
some business meaning associated with them
• The biggest advantage for intelligent keys is that users
understand what they mean whereas surrogate keys don't
make any business sense.
Surrogate vs. Intelligent Keys
Natural keys:
• are more logical
• can sometimes can mean
fewer joins
• help to encourage good
modeling
• are traditional/user friendly
• make snooping around in
the data easier
Surrogate keys:
• are shorter
• are easier to join
• take less storage
• enable natural key fields to
be easily changed
• are what Object Oriented
(and object relational)
databases use
Some helpful pointers : Physical Database
Design
• Purpose - translate the logical description of
data into the technical specifications for
storing and retrieving data
• Goal - create a design for storing data that will
provide adequate performance and insure
database integrity, security and recoverability
Some helpful pointers : Physical
Design Process
Inputs
Normalized
Volume
Decisions
relations
Attribute data types
estimates
Physical record descriptions
Attribute definitions
Response time
Data
expectations
security needs
Backup/recovery needs
Integrity expectations
DBMS
(doesn’t always match logical
design)
technology used
Leads to
File
organizations
Indexes and
database
architectures
Query optimization
Some helpful pointers : Designing
Fields
• Field: smallest unit of data in
database
• Field design
–Choosing data type
–Coding, compression, encryption
–Controlling data integrity
Some helpful pointers : Field Data
Integrity
•
•
•
•
Default value - assumed value if no explicit value
Range control – allowable value limitations
(constraints or validation rules)
Null value control – allowing or prohibiting empty
fields
Referential integrity – range control (and null value
allowances) for foreign-key to primary-key matchups
Some Helpful Pointers :
Denormalization
• Transforming normalized relations into unnormalized physical
record specifications
• Benefits:
– Can improve performance (speed) be reducing number of table lookups
(i.e reduce number of necessary join queries)
• Costs (due to data duplication)
– Wasted storage space
– Data integrity/consistency threats
• Common denormalization opportunities
– One-to-one relationship
– Many-to-many relationship with attributes
– Reference data (1:N relationship where 1-side has data not used in any
other relationship)
Common Design problems
•
•
•
•
Misplaced relationships
Incorrect Cardinalities
Missing Relationships
Overuse of specialized data modeling tools
(ex: Inheritance, multiway relationships)
• Redundant Relationships
Goals of Database Development
•
•
•
•
Develop a Common Vocabulary
Define the meaning of Data
Ensure Data Quality
Find an Efficient Implementation
Final Word
• Remember that the goal of the DB
development is to produce a DB that provides
an important resource for an organization.
• The DB should be designed so that it can serve
the customers and other team members
efficiently.