Chapter 1: Database Overview
Download
Report
Transcript Chapter 1: Database Overview
Introduction to databases
© 2007 by Prentice
Hall
1
Objectives
•
•
•
•
•
•
•
•
•
•
2
Define terms
Show limitations of conventional file processing
Show advantages of databases
Identify costs and risks of databases
List components of database environment
Describe evolution of database systems
Describe database system development life cycle
Explain prototyping and agile development approaches
Explain roles of individuals
Explain the three-schema architecture for databases
Definitions
• Database: organized collection of logically related
data
• Data: stored representations of meaningful objects
and events
– Structured: numbers, text, dates
– Unstructured: images, video, documents
• Information: data processed to increase knowledge
in the person using the data
3
Data in context
Context helps users understand data
4
Summarized data
Graphical displays turn data into useful
information that managers can use for
decision making and interpretation
5
Metadata: Descriptions of the properties or
characteristics of the data, including data types, field
sizes, allowable values, and data context
6
Old file processing systems at Pine Valley Furniture
Company
Duplicate Data
7
Disadvantages of File Processing
• Program-Data Dependence
– All programs maintain metadata for each file they use
• Duplication of Data (data redundancy)
– Different systems/programs have separate copies of the same data
• Limited Data Sharing
– No centralized control of data
• Lengthy Development Times
– Programmers must design their own file formats
• Excessive Program Maintenance
– 80% of information systems budget
8
Problems with Data Dependency
•
Each application program typically maintains its own
data.
–
–
•
Each application program includes code for the
metadata of each file (e.g., “record layouts” appear in
each cobol program).
–
9
One application’s data might be duplicated in another
application.
(Then the users must do “double data entry” or there needs
to be an interface program that copies data from one
application to the other.)
So, if you want to add a new field to a table (e.g., birth-date
added to each employee), every single program must be
modified and retested.
Problems with Data Dependency
• Each application program has its own
processing routines for reading, inserting,
updating, and deleting data.
– Too many lines of code makes it hard to
maintain (modify while keeping out the bugs)
• Lack of coordination and central control
• Non-standard file formats
10
Problems with Data Redundancy
• Data inconsistency: Data changes in one file
could cause inconsistencies (if not changed in
other copies of that data – which one is the
latest/correct version?)
• Compromises in data integrity: As example of
“compromised data integrity” would be
allowing records (that are referenced by other
data) to be deleted.
11
SOLUTION:
The DATABASE Approach
• Central repository of data -- accessed by all
applications. (so, no data redundancy)
• The data is managed (protected) by a software
called a Database Management System (no
programs can access the database except thru
the DBMS) – so, no data inconsistencies.
12
Database Management System
A software system that is used to create, maintain, and provide
controlled access to user databases
Order Filing
System
Invoicing
System
Payroll
System
DBMS
Central database
Contains employee,
order, inventory,
pricing, and
customer data
DBMS manages data resources like an operating system manages hardware resources
13
Advantages of the Database Approach
•
•
•
•
•
•
•
•
•
•
14
Program-data independence
Less data redundancy
Improved data consistency
Improved data sharing
Increased application development productivity
Enforcement of standards
Improved data quality
Improved data accessibility and responsiveness
Reduced program maintenance
Improved decision support
Costs and Risks of the
Database Approach
• Installation and conversion costs
• management cost and complexity - specialized
personnel are needed to support the Database and
DBMS.
• Organizational conflict – (user groups may not agree
about the details of the shared data, or who should
be allowed to access/update it)
15
Definitions
•
Relational Database: a database that contains related tables and little/no redundancy. Typically
accessed by business applications.
•
Data model: Graphical representation (picture) of a database, showing it’s tables and how they are
related.
•
Database Management System (DBMS) : software for managing the database. No program can see
or modify the database unless it “passes through” the DBMS which acts guards to ensure
– Everyone sees only what they are allowed to see and modifies only what they are allowed to
modify
•
•
– Data integrity is maintained (e.g., you can’t delete data that is being referenced by other data.
Repository: A database management system can work with ANY data model. How? It stores the
details about the data model (table names, field names and types etc). These details aer called
“METADATA”.
Database Application:
– A business application program that supports business functions and accesses a database, e.g.,
add new data, update data, delete data, read data, summarize/report data)
– Usually called a Web application (if used in browser, run over the internet) or a windows app (if
installed directly on a PC and run over a local area network).
16
One customer may
place many orders,
but each order is
placed by a single
customer
One-to-many
relationship
17
One order has
many order lines;
each order line is
associated with a
single order
One-to-many
relationship
18
One product can
be in many
order lines, each
order line refers
to a single
product
One-to-many
relationship
19
Therefore, one
order involves
many products
and one product is
involved in many
orders
Many-to-many
relationship
20
Enterprise Database Applications
• Enterprise Resource Planning (ERP)
– Integrate all enterprise functions (manufacturing,
finance, sales, marketing, inventory, accounting,
human resources)
• Data Warehouse
– Integrated decision support system derived from
various operational databases
21
Enterprise Data Warehouse
– A data warehouse is an Integrated decision support
system derived from various operational databases.
The data warehouse contains historical copies of
transactions of those operational systems.
• By getting all that data into one place, users can then
ask questions of the warehouse , such as which
customers buy the most, when, how often.
22
Enterprise Data Warehouse
• Each day data is cleaned/reconciled (one system may call
something one way and another system may call it another
way) so that it can be loaded into the warehouse.
• While the data is being loaded, it is also being summarized.
This is to make it more efficient to access. Otherwise there is
too much detailed data that needs to be summarized on the
fly and performance suffers.
• During the load process, the warehouse is not available for
users (because the totals wont be correct until the load is
complete).
23
Enterprise Data Warehouse
• It is interesting to note that good design principles for
relational databases (no data redundancy, many constraints to
maintain data integrity) are BAD design for data warehouse
• Data Warehouses (because they load lots of data from
different sources) use lots of data redundancy (e.g., precalculated totals) to make queries run faster.
• Data Warehouses have few or no constraints so that the load
process goes faster (so that the warehouse becomes available
for users as soon as possible).
• We’ll learn more about warehouses later…
24
An enterprise data warehouse
• Data is extracted from multiple systems within a corporation.
• This data is transformed (as necessary) and merged possibly
adding data from external sources) into the data warehouse.
Evolution of DB Systems
26
Summary
• Define terms
• Show limitations of conventional file
processing
• Show advantages of databases
• Identify costs and risks of databases
• Data Models
• Define Enterprise Data Warehouse
• Describe evolution of database systems