Transcript (tables).

Chapter 3
Database and Data Warehouse
Module B: Designing and Building a
Relational Database
I. The Hierarchy of Data
•
•
•
•
Database
p.84-85
File (for Entity Class, Table p.97)
Record (for instance of entity class, Row p.97)
Field (Attribute, Column)
– Data Types (only one type for a field)
• Primary Key
p.97
– Atomic primary key
– Composite primary key
2
II. Database and DBMS
•Database definition
p.86
A collection of information that you organize
and access according to the logical structure of
that information:
- centralizing and logically relating data,
- minimizing redundancy, and
- being independent from application programs.
3
• Important Characteristics of a Database p.87-89
– A collection of information: logically related tables.
– A logical structure: It’s recorded in data dictionary,
e.g., the Design View for Table in Access. A data
dictionary is a file that stores definitions of data
elements, such as field name and record that
contains this field, data type, format, description,
ownership, security and so on.
– Logical Tie among Information: Relationships
among tables. Two table are connected by common
field(s) that are the primary key of one of those
tables. Primary key is a field or field(s) that
uniquely identify a record in a table. Foreign key is
a primary key of one table that also appears in
another table as common field(s).
(continued)
4
• Important Characteristics of a Database (continued)
– Data Integrity Constraints: rules that help ensure
the quality of the information. For example,
– Data entry error prevention: Validation Rule in
Design View of Access Table.
– Referential integrity: Enforce referential integrity
for Relationships of tables in Access database.
For common fields between two tables, new
value(s) must be first assigned to the primary
key and then can be used in another table as a
value(s) of foreign key (p.89). You have to delete
records in the reverse order.
5
• Five Components of a DBMS
p.90 Fig. 3.6
– DBMS engine: It accepts logical requests from other
DBMS subsystems and converts them into physical
equivalent to actually accesses database.
– Data definition language: specify logical structure
of database, e.g., Design View of Table in Access..
– Data manipulation language: an interface helps you
add, change, delete and query information in a
database. For example, Datasheet View of Table,
Report generator, and QBE (Query-by-Example)
and SQL (Structured Query Language) for
extracting information to answer certain questions.
(continued)
6
• Five Components of a DBMS
(continued)
– Application generator: It enables you to create
applications without writing any programs. For
Access database applications, you can create tables,
queries, reports, forms (p.94 Fig. 3.12), even macros
without writing any programming commands.
– Data administration subsystem:
p.94
– Backup and recovery
– Security control
– Query optimization:
minimize response times
– Reorganization: maintain statistics to increase speed and
performance of physical access.
– Concurrency control
– Change management: see the impact of structural change.
For example, relationships of tables in Access.
7
What are Differences between Spreadsheets
and DBMS?
• Linkage between elements
– Spreadsheet between cells in same table
– DBMS between elements in different tables
• Orientation
– Spreadsheet is toward calculations
– DBMS is toward organization and linkage of data
elements
• Control
– DBMS has stronger control for data security and
integrity.
8
– Spreadsheet is limited.
III. Database Models
p.96
• Four database models: hierarchical, network,
relational, and object-oriented database model.
• Relational Database Model: Most popular database
model. Data are stored in multiple two-dimensional
tables.
– A table represents information of for an entity class.
– A record stores information of an instance.
– A piece of information from one table can be related
to a piece of information from another table if they
have common field(s) (primary key and foreign key,
p. 98 Fig. 3.14).
9
What are the four important goals for relational database
design?
• Have the capability of storing all pertinent data in the database.
– All pertinent data can be found in the database.
– First determination of all the attributes (fields).
– Then, how many relations (tables).
– How many databases.
• Eliminate redundant data.
– Splitting relations (tables) is a standard design procedure to eliminate
redundantly duplicated data.
• Keep the number of relations (tables) to a minimum.
– Too many relations (tables) in the database may make it
cumbersome to use from a user’s standpoint.
• Have the relations (tables) normalized.
– Splitting relations (tables) following specific procedure to eliminate
deletion and update problems.
*: Glenn A. Jackson, 1988, “Relational Database Design with Microcomputer
Applications”
10
III. Database Models
(continued)
• Object Oriented Model: (p.99)
– An object combines information (attributes) of an
entity class and procedures (methods) that can act on
the information.
P.99
– Coupling (P.99 Fig. 4.15): Object “Customer” is
coupled to object “Video Rental” if and only if
object “Customer” sends a message to object “Video
Rental”.
– Emphasis on reuse:
p.100
– quality
– productivity
– shift from “creating” software to “assembling”
11
software
IV. Develop Your Own Relational Database
1. The Four-Step Procedure for Database Design p.118
• Defining entity classes and primary keys: Tables
• Defining relationships among entity classes: (Fig. B.1)
– One-to-Many (1:M): “An employee must be
assigned to one and only one department while one
department can have more than one employee.”
Employee is on “Many side” and Department is on
“One side”.
– Many-to-Many (M:M): An employee can be
assigned to more than one job while one job can be
assigned to more than one employee.” It should be
split into two one-to-many relationships (Fig. 4.18).
– Drop E-R diagram, use Standard Notation (see
project demonstration).
12
1. The Four-Step Procedure for Database Design
(continued)
• Defining data fields for each table: Design View of
Access table.
– What information is for an entity class?
– Specify properties for error prevention.
• Using a data definition language to create the
database: from paper (standard notation) to Access
table.
13
2. Three Normalization Rules
p.120
Purpose: to reduce data redundancy.
• Rule 1. Eliminating repeating groups or many-to-many
relationship (Design steps 1 & 2)
– Split table: one table for one entity
– Split many-to-many relationship into two one-tomany relationships.
• Rule 2. Each field in a table depends only on the
primary key and entire primary key (composite
primary key p.97) (Design step 3)
• Rule 3. Remove all derived fields (you can later use
calculated fields in Access) p.123
14
3. Design a query
(Access Skill Set 1 & Exercise)
• QBE (Query By Example) vs. SQL:
– QBE is part of application generator. It enables
you to generate queries without writing any
programming commands
– SQL is a query language for programming.
– QBE is easier to use. SQL is more flexible and
powerful. Managers may prefer to use QBE.
15
•
To design a query, you need to answer four
questions:
– What output do you want to see? (fields to
show, how to sort)
– What tables are involved?
– What constraints (criteria) are given?
– How are the tables joined together?
(common field(s))
Exercise:
Work on
16
4. Standard Notation (Drop E-R diagram)
(Access Database Project Demonstration)
Department(Dept, Dept Sup)
1

Employee(Emp ID, Name, Dept)
1

Employee-Job(Emp ID, Job, Hours)

1
Job(Job, Job Name)
17
V. Microsoft Access
• Seven objects of Access database: table, query,
report, form, page, macro, VB module and Page.
– Table: stores data for an entity (e.g., customer,
order, loan, etc.) in rows (records) and columns
(fields). A record includes all information
about a specific entity. A field is a piece of
information in a record.
– Query: answers questions by retrieving certain
fields of certain records from one or more
tables that satisfy a set of criteria.
18
– Form: A form provides a user-friendly interface to
enter, maintain and view records, and it is based
on a table or a query.
– Report: It presents data from tables or queries in
the way that you want.
– Macro: a set of Access macro commands that
automates tasks.
– Module: a program written in Visual Basic to
automate tasks.
– Page: publish database table or query as a Web
page.
19
VI. Data Warehouse and Data Mining p.100
Three Key Features of Data Warehouse
• Data warehouses combine information from
different databases (compatibility p.101 Fig. 3.16).
• Data warehouses are multidimensional (Hypercube
p.102 Fig. 3.17) while the relational database is
represented in a series of two-dimensional tables.
• Data warehouses support decision making: Unlike
most databases, data warehouses are not
transaction-oriented. Data warehouses support
online analytical processing (OLAP vs. OLTP p.102).
20
What is Data Mining?
• Data mining uses various tools and algorithms to
search large data sets for “hidden” or “buried”
relationship not yet identified by management.
• Data Mining Tools
–
–
–
–
Regression for relationship
Cluster analysis for distinct statistical segments
Neural network for a predictive pattern.
More
* Indiana University
21