databasesx - Kansas State University

Download Report

Transcript databasesx - Kansas State University

Organizing Data and
Information for Use in
Decision Making
(MIS)
Views of Data
• The Physical View
– Concerned with where things are stored at
and what they are stored on.
• The Logical View
– Concerned with how data is represented
so that it will be meaningful to users.
– What are the relationships between various
data elements?
FILE
in Access
TABLE
in Access
Bit: a single zero or one
• Entity class (usually called simply Entity)
– It is a concept that relates to the items about which you wish to
store information--these “items” can include conceptual ideas such
as “philosophers’ thoughts,” people such as “student,” things such
as “refrigerators” or places such as “city.”
– Generic In Nature
– EMPLOYEES Table
• Instance
– An occurrence of an Entity Class is called an Instance of
that class. Manhattan is an instance of the entity class
city.
– A More Specific Example
– A record in Access
• Attribute
– Characteristics of our records
• Employee #, Last Name, First Name, Hire Date, Dept. Number
– The columns in Access
• A PRIMARY KEY is a field in a database file
that uniquely describes each record.
– ID Number
Social Security Number
Four database models
• Hierarchical
– XML data is stored in a hierarchical format that
you can expand and collapse.
– The XML instance document can be thought of
as a “database”
• Network
• Relational (the focus of our class)
• Object-Oriented
Relational Database Model
• Uses a series of two-dimensional tables called
relations to store information relating to a particular
entity class.
– Connects or relates data in different tables through the
use of a key field or common elements.
• Tables = Relation: A specific entity class
• Rows = Records or individual file folders.
• Columns = Attributes or field names
• The relational database model is the widely used
database model at the present time.
Spreadsheet Data Used to create a database
Steps to Create a Relational
Database
• Define entity classes and primary keys
• Define relationships among entity classes
• Define attributes (fields) for each relation
• Use a data definition language to create the
database.
Step One: Define Entity Classes
and Primary Keys
Need a database that keeps track of the following information
about your health club:
The first name, last name, and id number for each of your members
The address (street, city, state) where your members live
The gender of each member and what class(es) they are taking
The course name, number, and time that each of course meets
The first name, last name, and id number for each of your
employees.
The home phone number for each employee, as well as their salary
amount and the date that they were hired.
Data Assumptions/Traits
• Member Number is unique. Each member will have only one
Member Number. There is one Member for each Member Number.
• At this time, each member can enroll in only one course (at a time)
that is offered by your health club. Therefore, each member will have
only one class number recorded at any time. (At some point in your
assignment, you will make it possible for your members to take more than one
course at a time)
• Class Number is the unique identifier for each class that your health
club offers. There is only one course for each class number.
• Employee Number is the unique identifier for each employee that
works at your health club. There is only one employee for each
employee number.
Data Assumptions/Traits
•
•
•
•
•
A member can take only one course at a time.
Only one instructor can teach a particular course.
Only one section of a course is taught at any one time.
A course may be taken by many different students.
An instructor can teach more than one course at any one time.
• THEREFORE WE HAVE THE FOLLOWING ENTITY
CLASSES:
– MEMBERSHIP - primary key is Member Number
– COURSES - primary key is Class Number
– EMPLOYEES – primary key is Employee Number
Step 2: Define relationships between entity classes.
Using an Entity Relationship (E-R) Model Diagram,
we can represent the entity classes and their
relationships:
MEMBERSHIP
1:M
Key: Member Number
COURSES
Key: Class Number
COURSES
Key: Class Number
1:M
EMPLOYEES
Key: Employee Number
STEP 3: Define the attributes in each Entity
which has been converted into a relation.
Normalization
• A technique used to make complex
databases more efficient
• Break one large table into several smaller
tables
– Eliminate all repeating groups in records
– Eliminate redundant data (duplicate copies)
– Assure that each field in the relation depends
only on the primary key of that relation
Normalized Data: tables are linked
together by a common field found in
more than one table.
A limitation of this current example is that each
member can enroll in only one course at a time. That
is highly inefficient and it is not a good way to run
your business.
Creating a 4th table
allows you to store
information in that
table related to all
of the different
courses that your
members are
enrolling in.
•Class Number is removed from the Membership table and placed in the
Enrollment table.
•Neither Member Number nor Class Number is unique by itself in the
Enrollment table. However, you can create a composite primary key that
joins those two fields together, and that can be your unique identifier.
Step #4: Use a Data Definition
Language to create the database
You need a database management
system (DBMS) to do this
Microsoft Access is a software
program (database management system)
that creates a database that follows
the principles found in the
Relational Database Model
What Is a Database
Management System
An integrated set of programs that
provides all the necessary capabilities for
building and maintaining database files,
extracting information required for
making decisions, and formatting the
information into structured reports.
DIFFERENT FROM A DATABASE
DATABASE
a collection of information that you organize and
access according to the logical structure of that
information.
A database is actually composed of two parts:
1. The information itself / the files that are logically
associated
2. The logical structure of the information which is
called the data dictionary. The DATABASE
STRUCTURE
– The data dictionary contains the logical properties
that describe information in a database.
DATABASE MANAGEMENT
SYSTEM (DBMS)
the software you use to specify the logical
organization for a database and access it.
A DBMS contains 5 software components:
Database Management System
Engine
accepts logical requests from the various other
DBMS subsystems, converts them to their
physical equivalent, and actually accesses the
database and data dictionary as they exist on a
storage device.
You can work with the information logically
without having to worry about where it is
physically stored or what it is stored on.
DATA DEFINITION SUBSYSTEM
helps you create and maintain the data
dictionary and define the structure of the files in
a database.
• A data dictionary serves the same purpose as an
XML schema.
• You use this subsystem to define the information
logical structure when you first create a database.
• Once you’ve created a database, you use this
subsystem to define new fields, delete fields, or
change field properties.
What might be found in a data
dictionary in Access?
• Field Names and their data types, as well as
the primary key.
• INTEGRITY CONSTRAINTS are rules that
help assure the quality of the information in a
database. (Validation Rules in Access)
– Setting a maximum or minimum value for a field
– A birth date can be now or in the past: not in the
future.
– A registration database at your school includes
integrity constraints concerning prerequisites for
certain classes.
– Validation Text: error message seen in Access.
What might be found in a data
dictionary in Access?
• Input Masks and Field Sizes
• Formats, Default Values, and Captions
• With Referential Integrity, you create a feature
that will prevent you from typing invalid data into
your table.
– When two different tables are linked/joined by a
relationship where referential integrity is enforced, you
cannot type data into a linked field if that same data does
not already exist in the original table.
– You can’t assign something to an item that doesn’t exist.
DATA MANIPULATION SUBSYSTEM
helps you add, change, and delete information in
a database and mine it for valuable information.
• This subsystem is most often the primary
interface between you as a user and the
information contained in a database.
• Tools in this subsystem include views, report
generators, query-by-example tools, and
structured query language.
DATA MANIPULATION TOOLS
• VIEW - allows you to see the content of a
database file, make whatever changes you
want, perform simple sorting, and search to
find the location of specific information.
– Add records.
– Delete records
– Sort records.
– Search for records.
DATA MANIPULATION TOOLS
• REPORT GENERATOR - helps you quickly
define formats of reports and what information
you want to see in a report. You can specify
exactly what you want to see and where you
want to see it.
– Report Wizard
– Report Design Screen
DATA MANIPULATION TOOLS
• QUERY-BY-EXAMPLE (QBE) TOOL - helps
you graphically design the answer to a
question. You create this query by pointing,
clicking, and dragging.
• You specify the conditions or criteria that you
want the database searched for, and the query
tool will return records that match the
conditions or criteria that you specified.
• Queries are used to extract useful information
from a database.
Query Design screen (above)
and Query Result (below)
DATA MANIPULATION TOOLS
• STRUCTURED QUERY LANGUAGE (SQL) a standardized fourth-generation language
found in most database environments. SQL is
the same as QBE, except that you perform a
query by creating a statement instead of
pointing, clicking, dragging.
– SQL is a form of computer programming.
– Uses
• SELECT
• FROM
• WHERE
APPLICATION GENERATION
SUBSYSTEM
contains facilities to help you develop
transaction-intensive applications. This
subsystem includes:
• Tools for creating data entry screens.
• Application buttons that are used to execute a
certain task.
• Uses a programming language specific to the
database management system that you are
using.
DATA ADMINISTRATION
SUBSYSTEM
helps you manage the overall database
environment by providing facilities for:
• Backup and recovery
• Security management
– Who has access to what?
– What type of access do you have?
• Concurrency control ensures the validity of
database updates.
• Change management allows you to assess the
impact of proposed structural changes.
One of the goals of a database
management system is to provide
easy and logical access to data,
while at the same time avoiding
redundancy. If you have multiple
files, each containing the same
information, then you will most
likely find redundant elements
and erroneous data.
Relational Database Model
• Uses a series of two-dimensional tables
called relations.
• Connects or relates data in different files
through the use of a key field or common
elements.
• Based on mathematical principles which
allow for more logical manipulation of data.
• Most flexible type of organization.
THE CONCEPT OF KEYS
• A KEY is a field or combination of fields used to
identify records so they can be easily retrieved and
processed.
• A PRIMARY KEY is a field in a database file
that uniquely describes each record.
– ID Number
Social Security Number
• You can create relationships between
tables/files through common fields that are
found in more than one table/file.
Student and Department
Student Entity
Student Number
Address
.
.
Department ID
Department Entity
Department ID
Address
.
.
This way, if we wanted to know all the students that belong
to a particular department, we could get the listing by joining
the two entities on the Department ID value.
Relational Database Model
State DMV Database
Driver’s
license
file/table
Street City State Zip 1XYZ234
Driver’s Expiration
Name address
J.Doe
license no.
date
Car owner
Car
Model Car
A405261
file/table
License No. year make
Street City St. Zip
J.
Doe address
Name
Moving
violation
citation
file/table
Moving Date
Citation violation
Fines paid/
1XYZ234
Driver’s
License
No.
number
cited
not paid
type
Parking
violation
citation
file/table
Parking Date
Citation violation
Fines paid/
Car
License
Number
A405261
number
cited
not paid
(MIS)
type
Creating relationships between database files.
– Common field found in more than one table/file.
– You might have to physically draw the connection
between two tables/files. This is usually achieved
by drawing a connecting line between two fields.
INFORMATION STORED SEPARATELY BUT CAN
BE RELATED THROUGH KEY JOINS
Another example: which movies are provided by
which distributors?
Organizational Databases
• Organizational databases are constantly changing
as the organizational transaction processing
systems are constantly updating and changing the
contents of the organizational database to reflect
current business activities.
– Organizational databases support day-to-day
operations. Constantly being updated and changed.
– Not very useful for decision making involving the
analysis of historical data.
• Organizational databases rarely maintain historical data, just
information related to current operations.
Data Warehouses
• A logical collection of information gathered
from many different operational databases.
– Contains historical data that has been extracted from
many different operational databases.
– Historical data is used for decision making.
• Operational databases support transaction
processing (OLTP).
• Data warehouses support analytical processing
(OLAP) because historical data is analyzed in
order to make a decision.
• Data Mining tools are the software tools used to
query information in data warehouse.
– Use advanced statistical techniques to search for
patterns and anomalies in the data.
– Attempts to find answers to questions the user did
not even think to ask.
• A relational database stores information in a series of two
dimensional tables.
• Data warehouses are multidimensional, containing layers of
rows and columns. Each dimension is an attribute of
information.
Data-mining agents perform
multidimensional analysis in data
warehouses
• Cube – common term for the representation of multidimensional information (layers, rows, columns)
Applications of Data Mining
• Market Segmentation: identifying common
characteristics of customers who buy the same products
from your company.
• Customer Churn: predicting which customers are
likely to leave your company and go to a competitor.
• Fraud Detection: identifying which transactions are
most likely to be fraudulent.
• Direct Marketing: identifying the best prospects in
order to obtain the highest response rate.
• Market Basket Analysis: trying to understand which
products are commonly purchased together.
• Trend Analysis: trying to reveal differences between
one period and another period.
• Info in an Excel spreadsheet and a relational
database (Access) appears in the form of a two
dimensional table of rows and columns.
• By adding a Report Filter or Slicer to a Pivot Table,
you can add another dimension of information: 3-D
(rows and columns and layers).
– Creating a 3-dimensional Pivot Table in Excel is a means
of conceptually building a data warehouse. Report
Filters/Slicers represent the depth layer
• Pivot Tables can help you see relationships in the
data
The End
Recent Database Developments
• The Data Warehouse and Data Mining
• Object-Oriented Database Management
Systems
OBJECT-ORIENTED
DATABASE MODEL
• A database model that brings together, stores, and
allows you to work with both information and the
procedures that act on that information.
• An object is a software module containing
– information that describes an entity class
– along with a list of procedures that can act on the
information describing the entity class.
– Audio, video, and graphics can be stored in an object.
OBJECT-ORIENTED
DATABASE MODEL
• Basic principles of Object-Oriented are
different:
– Hierarchical, network, and especially Relational
Database Structures try to separate the data from the
programs which manipulate them.
– Object-Oriented model does the opposite: they store
the Entity together with the Procedures (code) that
work on the Entity data.
Object-Oriented
Database Model
Bank Account Object
Attributes
•Name
•Acct. #’s
•Address
•Phone
Operations
•Get Accounts
Checking Acct. Object
Attributes
•Acct. #’s
•Balance
•Credit Line
•Monthly Stmt.
Operations
•Calc. Int. Owed
•Print Mthly. Stmt.
Savings Acct. Object
Attributes
•Account #
•Balance
Operations
•Calc. Int. Paid
•Print Qtrly. Stmt.
The ObjectOriented database
model is gaining
in popularity and
some day may be
more popular than
our current
relational database
model.
• Object-Oriented Databases Technologies
Are Becoming More Popular because we /
businesses need to store things that are not just
words.
• The Object-Oriented Model supports unique and
complex data types (because it does not limit data
storage to two dimensional tables). You can store
and manipulate different aspects of:
– graphics
– video
– text
drawings
sound
music
photographs
TO SUMMARIZE
• How we view information:
– The physical view of information deals with how
information is physically arranged, stored, and accessed
on some type of secondary storage device.
– The logical view of information focuses on how you
need to arrange and access information to meet your
particular business needs.
• A database is a collection of information that
you organize and access according to the
logical structure of that information.
• The data dictionary contains the logical
structure of information in a database.
Database Models
• Relational Database Model
– All the data is arranged in a series of related tables.
Student, Class, Instructor
– Tables are linked together by common fields.
• Object-Oriented Database Model
– Allows you to store the data and the procedures
used to manipulate that data together.
– Can be used to store text, sound, video, and images.
– Stores data, computes GPA, and creates a transcript.
Normalization
• Used to create tables in a relational
database.
• Break one large table into several smaller
tables
– Eliminate redundant data (duplicate copies)
– With redundancy removed, you only need to
make a change once and all linkages are
automatically updated.
Proper Database Design
• Field: Name: Brian Kovar
• Field: Address:
– 123 North Main, Manhattan, KS 66502
•Field: Last Name: Kovar
•Field: First Name: Brian
•Field: Address: 123 North Main
•Field: City: Manhattan
•Field: State: KS
•Field: Zip Code: 66502
Hierarchical Database Model
• The oldest database structure
– Started by IBM in the mid 1960s
• Its structure resembles an Organizational Chart
structure applied to entities in a database structure.
• Parent/child relationship between entities.
– A “parent entity” can have many “children entities” but a
“child entity” can have only one “parent entity”
– Allows 1:1 and 1:M relationships only.
• Only one access path to any particular data element.
• Very rigid structure. Very difficult to add new fields to
a database.
Kansas State University
College of
Business
Accounting
MANGT 366
College of Arts
& Science
MIS
Marketing
MANGT 367
George W. Bush
Bill Clinton
George Bush
Ronald Reagan
College of
Engineering
Finance
MANGT 656
Management
MANGT 670
The Hierarchical
Database Model
NETWORK DATABASE MODEL
• The second oldest model
• Characteristics
– It allows a “parent” entity to have many “children
entities”
– A “child entity” is allowed to have many “parent
entities”
• Can represent 1:1, 1:M, and M:M relationships.
• Different branches of data can have different
relationships. Can access data by more than
one path.
Network Database Model
A college class scheduling system
Courses
Instructors
Students
Journalism 101
Film 200
TV 210
D. Barry
R. DeNiro
D. Rather
Student A Student B Student C Student D Student E
Database Models
• Hierarchical Database Model
– Structure Resembles an Organizational Chart
– Tree with Branches: Must travel up and down set
paths in order to reach a destination.
• Network Database Model
– Allows for numerous interconnecting relationships.
One student can have many teachers. One teacher
can have many different students. One teacher can
teach many different classes. One class can have
many different students.
Traditional Approach to Data
Management
Traditional Approach to Data
Management
• Separate data files are created by or for different
application programs. Seen from a file perspective.
• Weaknesses of this approach
– Data redundancy: the same data can be found in more
than one location.
– Data redundancy conflicts with the integrity of the
overall data.
– Program-data dependence: programs and data that are
developed for one application are incompatible with
programs and data developed for another application.
– Inflexible: Very difficult to make changes
The Database Approach
• All of the data
used by all of
the applications
is stored in
one location:
the database
– Removes
data
redundancy
– Helps insure
data integrity.
Hierarchical Database Model
A cruise ship reservations system
Ports of
Departure
Miami
Los Angeles
New York
Names
of Ships
QE 2
The Love Boat
The Oriana
Sailing
Dates
April 15
May 30
July 15
Cabin
Numbers
A-1
A-2
A-3
Organizing Data and
Information: Chapter 3
(MIS)
• What view of data is of most interest to most
people in business (non-computer types of
occupations such as accountants, marketers,
managers, and people in finance)?
– The Logical View of Data
– Concerned with how data is represented so that
it will be meaningful to users. What are the
relationships between various data elements?
• Primary Key
• Attribute
Instance
Entity Class
• The wording of “Cities in Kansas” is
– An entity class
• Manhattan would be an _____________
– Instance of the “Cities in Kansas” Entity Class
• An attribute of Manhattan is ____________
• Give an example of a good primary key
Organizing Data and
Information: Chapter 3
(MIS)
• The Relationship line is based on a common
field found in more than one table
(everything about it must be identical).
– What is the purpose of the relationship line?
– It is used to link together your tables (also
referred to as the “join line.”)