Database Management - Northern Illinois University

Download Report

Transcript Database Management - Northern Illinois University

Database Management
Character, file, field, record, database???
What’s “File Processing”?
•
•
The “old” way of doing things; still often
used in practice.
Separate information stored on separate
files.
File Processing Example:
Sales
Knows how
many of
Products A,
B, and C have
been sold.
File stores
Prod. Name,
Production
Schedule,
and Sales.
Production
Knows how
much of
Products A,
B, and C have
been produced.
File stores
Prod. Name,
Production
Schedule, and
Number Produced.
Marketing
Knows the
price of
Products A,
B, and C.
File stores
Prod. Name
and Product
Price.
Any problems here?
•
•
•
•
Duplication (redundancy).
Inconsistency.
Does anyone know how much money we
made? No integration.
Set format. Data dependence. Y2K!!
Database Management
Database Management System (DBMS)
•
•
•
•
Provides one integrated repository for data
to be stored and queried.
Standards for data can be defined and
enforced.
Reports and queries are easy (er).
SQL, etc.
Database Management Ex.:
Sales
Production
Marketing
(App. Progs)
DBMS
Database
Prod. Name
Production Schedule
Sales
Number Produced
Product Price
DATABASE MANAGEMENT SYSTEMS
• Four components of a DBMS
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc. All rights reserved.
BUT...
•
•
•
Expensive.
Difficult.
Slow / inefficient.
Another Look
(thanks to John Gallaugher, Boston College)
Server - responds to client requests
request
response
Client - makes requests
of the DBMS server
• Database
DBMS - the program.
Manages interaction with
databases.
database - the collection of data.
Created and defined to meet the
needs of the organization.
• a collection of related data. Usually organized according to topics:
e.g. customer info, products, transactions
• Database Management System (DBMS)
– a program for creating & managing databases; ex. Oracle, MSAccess, Sybase
A Simple Database
CUSTID
2001
2002
2003
2004
2005
FIRST
John
Abby
Warren
Peter
Charles
LAST
Gallaugher
Johnson
Buffet
Lynch
Schwab
CITY
Newton
Boston
Omaha
Marblehead
San Francisco
STATE
MA
MA
NE
MA
CA
• File/Table
• Customers
• Field/Column
• 5 shown: CUSTID, FIRST, LAST, CITY, STATE
• Record/Row
• 5 shown: one for each customer
…
…
…
…
…
…
A More Complex Example
LAST
Gallaugher
Gallaugher
Gallaugher
Johnson
Johnson
Buffet
Buffet
Buffet
Lynch
Lynch
Schwab
Schwab
CITY
Newton
Newton
Newton
Boston
Boston
Omaha
Omaha
Omaha
Marblehead
Marblehead
San Francisco
San Francisco
STATE
MA
MA
MA
MA
MA
NE
NE
NE
MA
MA
CA
CA
BUY/SELLSTOCK SHARES
Buy
MSFT
1000
Buy
INTC
2400
Sell
IBM
3000
Sell
IBM
3000
Sell
INTC
2000
Buy
INTC
1500
Buy
IBM
1700
Sell
AAPL
1900
Buy
AAPL
2000
Sell
AAPL
10000
Buy
MSFT
4500
Buy
INTC
17000
PRICE DATE
90 1/4 12/24/98
80 1/8
7/3/99
114 3/8
7/1/99
110 1/8 6/30/99
94 7/8 8/30/99
90 3/8
7/2/99
101 7/8
1/4/99
18 1/2 2/14/99
19 2/14/99
21 7/8 3/15/99
101 1/8 1/15/99
80 1/8
7/2/99
TIME
CUSTID
12:01 PM
2001
10:51 AM
2001
9:03 AM
2001
4:53 PM
2002
3:15 PM
2002
11:27 AM
2003
2:02 PM
2003
5:00 PM
2003
5:30 PM
2004
11:44 AM
2004
12:38 AM
2005
4:53 PM
2005
• Entry & Maintenance is complicated
• redundant data exists, increases chance of error,
complicates updates/changes, takes up space
Normalize Data:
Remove Redundancy
CUSTID
2001
2002
2003
2004
2005
FIRST
John
Abby
Warren
Peter
Charles
LAST
Gallaugher
Johnson
Buffet
Lynch
Schwab
CITY
Newton
Boston
Omaha
Marblehead
San Francisco
STATE
MA
MA
NE
MA
CA
Customer Table
Transaction Table
One
Many
CUSTID
2001
2001
2001
2002
2002
2003
2003
2003
2004
2004
2005
2005
BUY/SELLSTOCK SHARES
Buy
MSFT
1000
Buy
INTC
2400
Sell
IBM
3000
Sell
IBM
3000
Sell
INTC
2000
Buy
INTC
1500
Buy
IBM
1700
Sell
AAPL
1900
Buy
AAPL
2000
Sell
AAPL
10000
Buy
MSFT
4500
Buy
INTC
17000
PRICE DATE
90 1/4 12/24/98
80 1/8
7/3/99
114 3/8
7/1/99
110 1/8 6/30/99
94 7/8 8/30/99
90 3/8
7/2/99
101 7/8
1/4/99
18 1/2 2/14/99
19 2/14/99
21 7/8 3/15/99
101 1/8 1/15/99
80 1/8
7/2/99
TIME
12:01 PM
10:51 AM
9:03 AM
4:53 PM
3:15 PM
11:27 AM
2:02 PM
5:00 PM
5:30 PM
11:44 AM
12:38 AM
4:53 PM
Key Terms
• Relational DBMS
• manages databases as a collection of files/tables in which all
data relationships are represented by common values in related
tables (referred to as keys).
• a relational system has the flexibility to take multiple files and
generate a new file from the records that meet the matching
criteria (join).
• SQL - Structured Query Language
• Most popular relational database standard. Includes a
language for creating & manipulating data.
Using SQL for Querying
•
SQL (Structured Query Language)
Data language
English-like, nonprocedural, very user friendly
language
Free format
Example:
SELECT
FROM
WHERE
Name, Salary
Employees
Salary >2000
THE VALUE OF QUALITY INFORMATION
• Five common characteristics of high-quality
information
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc. All rights reserved.
THE VALUE OF QUALITY INFORMATION
• Low-quality information example
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc. All rights reserved.
THE VALUE OF QUALITY INFORMATION
•
The four primary sources of low-quality
information include:
1. Online customers intentionally enter inaccurate
information to protect their privacy
2. Information from different systems that have different
information entry standards and formats
3. Call center operators enter abbreviated or erroneous
information by accident or to save time
4. Third party and external information contains
inconsistencies, inaccuracies, and errors
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc. All rights reserved.
Understanding the Costs of Low-quality
Information
• Potential business effects resulting from low-quality
information
– Inability to accurately track customers
– Difficulty identifying valuable customers
– Inability to identify selling opportunities
– Marketing to nonexistent customers
– Difficulty tracking revenue due to inaccurate invoices
– Inability to build strong customer relationships – which
increases buyer power
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc. All rights reserved.
Structures
•
•
•
•
Hierarchical: The old way. “Tree”.
• Access elements by moving down tree.
• One-to-many.
Network: Criss-cross patterns.
• Many-to-many.
Relational: a common element relates “tables” to
one another. Permits “ad hoc”.
Object-oriented: “objects” have data, processes,
and properties “encapsulated” in them.
Database Structures
Hierarchical Structure
Network Structure
Relation
Relational Structure
Dept
Empno
Dept
A
1
A
B
2
B
C
3
C
Pros and Cons
Ad Hoc Flexibility ==>
Obj.
Relat.
Net.
Hier.
Speed ==>
Data Dictionaries
The Data Dictionary
• A reference work of data about data (metadata) compiled by the systems
analyst to guide analysis and design.
• As a document, the data dictionary collects, coordinates, and confirms the
meaning of data terms to various users throughout the organization.
Uses of the Data Dictionary
• Documentation, Elimination of data redundancy
• Validate the data flow diagram for completeness and accuracy
• Provide a starting point for developing screens and reports
• Determine contents of data stored in files
• Develop the logic for data flow diagram processes
Data Flow Diagrams (“DFD”)
Data Flow
Process
File or Data Store
Source or Entity
DFD Example: Apartment Rental
Copy of lease
Lease
Tenant Info
D1 Tenant File
1
New
Tenant
Process
Receipt
Tenant
Bank
Deposit
Bank
2
Notice
Payments
Collection
Process
Cash Report
Unpaid
Charges
Tenant
Info
D1 Tenant File
Delinquencies
Ext.
Mgr
3
Delinquent
Process
Delinquency
Report
Entity Relationship Diagrams
Dept. Projects
* Project
Deadline
Resources
Dept. Employee
works on
“zero” “one”
* Name
Title
Address
“many”
New Names, Same Ideas
•
•
Data Mining, OLAP
Data Warehousing
Data Mining
• automated information discovery process,
uncovers important patterns in existing data
• can use neural networks or other approaches.
Requires ‘clean’, reliable, consistent data.
Historical data must reflect the current
environment.
• e.g. “What are the characteristics that identify
when we are likely to lose a customer?”
• OLAP is user-driven discovery
Warehouses & Marts
• Data Warehouse
• a database designed to support decision-making in an organization.
It is batch-updated and structured for fast online queries and
exploration. Data warehouses may aggregate enormous amounts
of data from many different operational systems.
• Data Mart
– a database focused on addressing the concerns of a specific
problem or business unit (e.g. Marketing, Engineering). Size
doesn’t define data marts, but they tend to be smaller than data
warehouses.
Data Warehouses & Data Marts
3rd party data
Data Mart
(Marketing)
TPS
& other
operational
systems
Data
Warehouse
= operational clients
= query, OLAP, mining, etc.
Data Mart
(Engineering)