Data and Database Design

Download Report

Transcript Data and Database Design

Organizing Data and
Information for Use in
Decision Making
(MIS)
Views of Data
• The Physical View
– Concerned with
• The Logical View
– Concerned with how data is represented so
that it will be meaningful to users.
FILE
in Access
TABLE
in Access
Bit: a single zero or one
• E
(usually called simply Entity)
– It is a concept that relates to the items about which you wish to
store information--these “items” can include conceptual ideas such
as “philosophers’ thoughts,” people such as “student,” things such
as “refrigerators” or places such as “city.”
–G
– EMPLOYEES Table
• Instance
– An
of an Entity Class is called an
of that class. Manhattan is an instance of the
entity class city.
– A More
– A record in Access
• Attribute
–C
• Employee #, Last Name, First Name, Hire Date, Dept. Number
– The columns in Access
• A PRIMARY KEY is a field in a database file
that
– ID Number
Social Security Number
Four main database models
• Hierarchical
• Network
• Relational (the
• Object-Oriented
)
Relational Database Model
• Uses a series of two-dimensional
called
to store information
relating to a
• Tables = Relation: A specific entity class
• Rows =
or individual file folders.
• Columns = Attributes or
• The relational database model is the widely
used database model at the present time.
Spreadsheet Data Used to create a database
Steps to Create a Relational
Database
• Define entity classes and primary keys
• Define relationships among entity classes
• Define attributes (fields) for each relation
• Use a data definition language to create the
database.
Step One:
Need a database that keeps track of the following information
about your health club:
The first name, last name, and id number for each of your members
The address (street, city, state) where your members live
The gender of each member and what class(es) they are taking
The course name, number, and time that each of course meets
The first name, last name, and id number for each of your
employees.
The home phone number for each employee, as well as their salary
amount and the date that they were hired.
Data Assumptions/Traits
• Member Number is unique. Each member will have only one
Member Number. There is one Member for each Member Number.
• At this time, each member can enroll in only one course (at a time)
that is offered by your health club. Therefore, each member will have
only one class number recorded at any time. (At some point in your
assignment, you will make it possible for your members to take more than one
course at a time)
• Class Number is the unique identifier for each class that your health
club offers. There is only one course for each class number.
• Employee Number is the unique identifier for each employee that
works at your health club. There is only one employee for each
employee number.
Data Assumptions/Traits
•
•
•
•
•
A member can take only one course at a time.
Only one instructor can teach a particular course.
Only one section of a course is taught at any one time.
A course may be taken by many different students.
An instructor can teach more than one course at any one
time.
• THEREFORE WE HAVE THE FOLLOWING
ENTITY CLASSES:
– MEMBERSHIP - primary key is Member Number
– COURSES - primary key is Class Number
– EMPLOYEES – primary key is Employee Number
Step 2: Define
between entity classes.
Using an Entity Relationship (E-R) Model Diagram,
we can represent the entity classes and their
relationships:
MEMBERSHIP
1:M
Key: Member Number
COURSES
Key: Class Number
COURSES
Key: Class Number
1:M
EMPLOYEES
Key: Employee Number
STEP 3: Define the attributes in each Entity
which has been converted into a
Normalization
• A technique used to make complex
databases more efficient
• Break one large
– Eliminate all repeating groups in records
– Eliminate
– Assure that each field in the relation depends
only on the primary key of that relation
Normalized Data:
together by a
in
are
found
A
of this current example is that each
member can
. That is
highly inefficient and it is not a good way to run your
business.
Creating a
allows you to store
information in that
table related to all
of the different
courses that your
members are
enrolling in.
•Class Number
Enrollment table.
from the Membership table and placed in the
•Neither Member Number nor Class Number is unique by itself in the
Enrollment table. However, you can create a
key that
together, and that can be your unique identifier.
Step #4: Use a Data Definition
Language to create the database
You need a database management
system (DBMS) to do this
Microsoft
is a software
(database management system)
that
a
that
follows the principles found in the
Model
What Is a Database
Management System
An
that
provides all the necessary capabilities for
b
files,
ex
required for making
decisions, and formatting the information
into structured reports.
DIFFERENT FROM A DATABASE
DATABASE
a collection of information that you
A database is actually composed of two parts:
1. The information itself / the files that are logically
associated
2. The logical structure of the information which is called
the data dictionary.
The data dictionary contains the logical properties that
describe information in a database.
DATABASE MANAGEMENT
SYSTEM (DBMS)
the software you use to specify the logical
organization for a database and access it.
A DBMS contains 5 software components:
Database Management System
Engine
accepts logical requests from the various other
DBMS subsystems, converts them to their
physical equivalent, and actually accesses the
database and data dictionary as they exist on a
storage device.
You can work with the information
DATA DEFINITION SUBSYSTEM
helps you create and maintain the data
dictionary and define the structure of the files in
a database.
• A data dictionary serves the
.
as an
• You use this subsystem to define the information
logical
when you first create a
database.
• Once you’ve created a database, you use this
subsystem to
fields,
fields, or
field properties.
What might be found in a data
dictionary in Access?
• F
and their
, as well as
the
• I
are rules that help
assure the quality of the information in a
database. (Validation Rules in Access)
– Setting a
for a field
– A birth date can be now or in the past: not in the
future.
– A registration database at your school includes
integrity constraints concerning
for
certain classes.
– Validation Text:
seen in Access.
What might be found in a data
dictionary in Access?
• Input Masks and Field Sizes
• Formats, Default Values, and Captions
• With Referential Integrity, you create a feature
that will
– When two different tables are linked/joined by a
relationship where referential integrity is enforced, you
cannot type data into a linked field if that same data does
not already exist in the original table.
– You
DATA MANIPULATION SUBSYSTEM
helps you add, change, and delete information in
a database and mine it for valuable information.
• This subsystem is most often the
between you as a user and the
information contained in a database.
• Tools in this subsystem include views, report
generators, query-by-example tools, and
structured query language.
DATA MANIPULATION TOOLS
•
- allows you to see the content of
a database file, make whatever changes you
want, perform simple sorting, and search to
find the location of specific information.
– Add records.
– Delete records
– Sort records.
– Search for records.
DATA MANIPULATION TOOLS
• REPORT GENERATOR - helps you quickly
define formats of reports and what information
you want to see in a report. You can specify
exactly what you want to see and where you
want to see it.
– Report Wizard
– Report Design Screen
DATA MANIPULATION TOOLS
• QUERY-BY-EXAMPLE (QBE) TOOL - helps
you graphically design the answer to a
question. You create this query
• You specify
that you want
the database searched for, and the query tool
will return records that match the conditions or
criteria that you specified.
• Queries are used to
useful
Query Design screen (above)
and Query Result (below)
DATA MANIPULATION TOOLS
• STRUCTURED QUERY LANGUAGE (SQL) a standardized fourth-generation language
found in most database environments. SQL is
the same as QBE, except that you perform a
query by creating a statement instead of
pointing, clicking, dragging.
– SQL is a
– Uses
• SELECT
• FROM
• WHERE
APPLICATION GENERATION
SUBSYSTEM
contains facilities to help you develop
transaction-intensive applications. This
subsystem includes:
• Tools for creating
• Application buttons that are used to execute a
certain task.
• Uses a programming language specific to the
database management system that you are
using.
DATA ADMINISTRATION
SUBSYSTEM
helps you manage the overall database
environment by providing facilities for:
• B
• Security management
–W
–W
• Concurrency control ensures the validity of
database updates.
• Change management allows you to assess
the impact of proposed structural changes.
One of the goals of a database
management system is to provide
easy
,
while at the same time
. If you have
multiple files, each containing the
same information, then you will
most likely find redundant
elements and erroneous data.
Relational Database Model
• Uses a series of two-dimensional
• Connects or relates data in different files
through the use of a
• Based on mathematical principles which
allow for more logical manipulation of data.
• Most flexible type of organization.
THE CONCEPT OF KEYS
• A KEY is a field or combination of fields used to
identify records so they can be easily retrieved and
processed.
• A PRIMARY KEY is a field in a database file
that.
–I
Social
• You can create relationships between
tables/files through
/file.
Student and Department
Student Entity
Student Number
Address
.
.
Department ID
Department Entity
Department ID
Address
.
.
This way, if we wanted to know all the students that belong
to a particular department, we could get the listing by joining
the two entities on the Department ID value.
Relational Database Model
State DMV Database
Driver’s
license
file/table
Street City State Zip 1XYZ234
Driver’s Expiration
Name address
J.Doe
license no.
date
Car owner
Car
Model Car
A405261
file/table
License No. year make
Street City St. Zip
J.
Doe address
Name
Moving
violation
citation
file/table
Moving Date
Citation violation
Fines paid/
1XYZ234
Driver’s
License
No.
number
cited
not paid
type
Parking
violation
citation
file/table
Parking Date
Citation violation
Fines paid/
Car
License
Number
A405261
number
cited
not paid
(MIS)
type
Creating
between database files.
–C
/file.
– You might have to physically draw the connection
between two tables/files. This is usually achieved
by drawing a connecting line between two fields.
INFORMATION STORED SEPARATELY BUT CAN
BE RELATED THROUGH KEY JOINS
Another example: which movies are provided by
which distributors?
Recent Database Developments
• The Data Warehouse and Data Mining
• Object-Oriented Database Management
Systems
Organizational Databases
• Organizational databases are constantly changing as
the organizational
systems are constantly updating and changing the
contents of the organizational database to reflect
current business activities.
– Organizational databases support
. Constantly being updated and changed.
– Not very useful for decision making involving the
analysis of
• Organizational databases rarely maintain historical data, just
information related to current operations.
• A logical collection of information gathered
from
operational databases.
– Contains historical data that has been extracted from
many different operational databases.
– Historical data is used for decision making.
• Operational databases support transaction
processing (OLTP).
• Data warehouses support
(OLAP) because historical data is analyzed in
order to make
• D
are the software tools used
to query information in data warehouse.
– Use advanced statistical techniques to search for
patterns and anomalies in the data.
– Attempts to find answers to questions the user did
not even think to ask.
• A relational database stores information in a series of
.
• Data warehouses are
, containing
Each dimension is an
of information.
D
perform
analysis in data warehouses
• C
– common term for the representation of multidimensional information (
)
Applications of Data Mining
• M
: identifying common
characteristics of customers who buy the same products
from your company.
• Customer Churn: predicting which customers are
likely to leave your company and go to a competitor.
• F
: identifying which transactions are
most likely to be fraudulent.
• Direct Marketing: identifying the best prospects in
order to obtain the highest response rate.
• Market
: trying to understand
which products are commonly purchased together.
• T
: trying to reveal differences between
one period and another period.
• Info in an Excel spreadsheet and a relational
database (Access) appears in the form of a two
dimensional table of rows and columns.
• By adding a
add
and columns and layers).
, you can
: 3-D (rows
– Creating a 3-dimensional Pivot Table in Excel is a means
of conceptually building a data warehouse. Page fields
represent the depth layer
• Pivot Tables can help you see relationships in the
data
• The following are summary slides
that I don’t plan on covering in
class, but I am making available
for you to review, if you wish.
• The Interpreting SQL slides will
be seen at a later date (after
everyone has made queries in lab)
TO SUMMARIZE
• How we view information:
– The physical view of information deals with how
information is physically arranged, stored, and accessed
on some type of secondary storage device.
– The logical view of information focuses on how you
need to arrange and access information to meet your
particular business needs.
• A database is a collection of information that
you organize and access according to the
logical structure of that information.
• The data dictionary contains the logical
structure of information in a database.
Database Models
• Relational Database Model
– All the data is arranged in a series of related tables.
Student, Class, Instructor
– Tables are linked together by common fields.
• Object-Oriented Database Model
– Allows you to store the data and the procedures
used to manipulate that data together.
– Can be used to store text, sound, video, and images.
– Stores data, computes GPA, and creates a transcript.
Normalization
• Used to create tables in a relational
database.
• Break one large table into several smaller
tables
– Eliminate redundant data (duplicate copies)
– With redundancy removed, you only need to
make a change once and all linkages are
automatically updated.
Proper Database Design
• Field: Name: Brian Kovar
• Field: Address:
– 123 North Main, Manhattan, KS 66502
•Field: Last Name: Kovar
•Field: First Name: Brian
•Field: Address: 123 North Main
•Field: City: Manhattan
•Field: State: KS
•Field: Zip Code: 66502
Interpreting SQL Code
SQL (The language used to query a database)
• S
is used to specify the
• F
is used to specify the
selected fields are coming from.
you want to include.
the
• W
is used to
used to narrow
down the data prior to being displayed.
• O
is used to specify how the records
(in ascending or descending
order), as well as the sort order.
• I
the
specifies that two or more tables are
and it also specifies the field that forms
between the
tables.
• S
the following fields from the Employees
table: Last Name, Title and Salary.
• The
used to narrow down the records is
where the title is “Sales Representative.”
• The results should be
in descending
order (high to low number order) based on salary.
• All of the fields in the query come from the
Employees table.
SELECT Max(Employees.Salary) AS MaxOfSalary,
Min(Employees.Salary) AS MinOfSalary,
Avg(Employees.Salary) AS AvgOfSalary,
StDev(Employees.Salary) AS StDevOfSalary
FROM Employees;
Select the Salary field from the Employees table (4
times).
Find the maximum salary, the minimum salary, the
average of salaries and the standard deviation of
salaries.
• Select the following fields from the Customers table:
Company Name, City and Country.
• The criteria used to narrow down the records is where
the country begins with U (it does not matter what
comes after the U as long as the first letter of the
country is a U).
• All of the fields in the query come from the Customers
table.
All of the fields in the query come from either the Customers
table or the Orders table.
The Customers and Orders tables are joined together by the
common field of Customer ID (Inner Join signifies this).
Select the following fields from the Customers table: Customer
ID and Company Name.
Select the following fields from the Orders table:
Order Date and Shipped Date.
The criteria used to narrow down the records is that the
company name should match “B’s Beverages”.
• O
is used to specify how the records
should be sorted (in ascending or descending order), as
well as the sort order.
• F
uses the city field and the records are sorted
in alphabetical order by city (order by defaults to
ascending order)
• If there happen to be multiple entries from the same
city, a
(descending order by last
name)