Transcript Chapter 3

Chapter 3
Databases and Data Warehouses
McGraw-Hill/Irwin
© 2008 The McGraw-Hill Companies,
All Rights Reserved
STUDENT LEARNING OUTCOMES
1. Describe business intelligence and its role in
an organization.
2. Differentiate between databases and data
warehouses with respect to their focus on
OLTP and OLAP.
3. List and describe the key characteristics of a
relational database.
3-2
STUDENT LEARNING OUTCOMES
4. Define the five software components of a
database management system.
5. List and describe the key characteristics of a
data warehouse.
6. Define the four major types of data-mining
tools in a data warehouse environment.
7. List key considerations in information
ownership in an organization.
3-3
Can Companies Keep Your Personal
Information Secure and Private?
• Databases and data warehouses are
organizational repositories of information
• Much of the information is personal
• It must be secure
• If hackers get your personal information, you
can suffer from identity theft
3-4
Can Companies Keep Your Personal
Information Secure and Private?
• Top-10 incidents of personal information loss
by organizations
• Could affect over 53 million people
• CardSystems lost information on 40 million
customers
• Many others
3-5
Can Companies Keep Your Personal
Information Secure and Private?
• Have you been a victim of identity theft?
– What happened?
– What did you do to recover?
– How long did it take?
3-6
INTRODUCTION
• Businesses need business intelligence (BI)
• Business intelligence – knowledge about your
customers, competitors, business partners,
environment, and internal operations
– Enables effective decision making
– Information on steroids
3-7
INTRODUCTION
• IT tools help process information to create
business intelligence according to…
– OLTP (online transaction processing)
– OLAP (online analytical processing)
3-8
INTRODUCTION
• OLTP – gathering and processing transaction
information and updating existing information
to reflect transaction
– Databases support OLTP
– Operational database – database that supports
OLTP
3-9
INTRODUCTION
• OLAP – manipulation of information to
support decision making
– Databases can help some
– Data warehouses support only OLAP, not OLTP
– Data warehouses – special forms of databases that
support decision making
3-10
INTRODUCTION
3-11
INTRODUCTION
• This chapter – database and data warehouse
concepts
• Along with some privacy and security
considerations
3-12
RELATIONAL DATABASE MODEL
• Database – logical collection of information
you organize and access according to the
logical structure of the information
• Relational database – uses a series of twodimensional tables or files to store information
in the form of a database
3-13
Databases Are…
•
•
•
•
Collections of information
Created with logical structures
With logical ties within the information
With built-in integrity constraints
3-14
Databases – Collections of Information
• Databases have many tables
• Solomon Enterprises as a concrete provider.
Tables include:
–
–
–
–
–
Order
Customer
Concrete Type
Employee
Truck
3-15
Databases – Collections of Information
3-16
Databases – Created with Logical
Structures
• In databases, row numbers are irrelevant
• In databases, columns have logical names such
as Order Date and Customer Name
• Data dictionary – contains the logical structure
of the information in a database
3-17
Databases – Logical Ties within the
Information
• Logical ties must exist between the tables
• Logical ties are created with primary and
foreign keys
• Primary key – field (or group of fields in some
cases) that uniquely describe each record
3-18
Databases – Logical Ties within the
Information
• Foreign key – primary key of one file that
appears in another file
• Foreign keys help create relationships among
tables
• Table = file = relation (don’t confuse yourself)
3-19
Databases – Logical Ties within the
Information
3-20
Databases – Built-in Integrity
Constraints
• Integrity constraint – rule that helps ensure
the quality of information
• Examples
–
–
–
–
Primary keys must be unique
Foreign keys cannot be blank
Sales price cannot be negative
Phone numbers must have an area code
3-21
DBMS TOOLS
• Database management system (DBMS) –
helps you specify the logical organization for a
database and access and use the information
within a database
– Word processing software = document
– Spreadsheet software = workbook
– DBMS software = database
3-22
DBMS TOOLS
• 5 software components
1.
2.
3.
4.
5.
DBMS engine
Data definition subsystem
Data manipulation subsystem
Application generation subsystem
Data administration subsystem
3-23
DBMS TOOLS
3-24
DBMS Engine
• DBMS engine – accepts logical requests,
converts them into their physical equivalent,
and accesses the database and data dictionary
• DBMS engine separates the logical from the
physical
3-25
DBMS Engine
• Physical view – how information is arranged,
stored, and accessed on a storage device
• Logical view – how you (knowledge worker)
need to arrange and access information
• Databases – you work only with logical views
3-26
Data Definition Subsystem
• Data definition subsystem – helps you create
and maintain the data dictionary and define the
structure of the files in a database
• Must create data dictionary for a database
before entering any information
3-27
Data Manipulation Subsystem
• Data manipulation subsystem – helps you
add, change, and delete information
• Primary interface between you and a database
–
–
–
–
Views
Report generators
QBE tools
SQL
3-28
Views
• View – allows you to see the contents of a
database file
• Similar to a spreadsheet view
– Make changes
– Sort
– Query
3-29
Views
3-30
Report Generators
• Report generator – helps you quickly define
formats of reports and what information you
want to see in a report
• Save report formats to use later
• Uses a wizard interface
3-31
Report Generators
Specify the fields you want
in a report
Specify the layout of
the report
3-32
Report Generators
3-33
QBE Tools
• Query-by-example (QBE) tool – helps you
graphically design the answer to a question
• “What driver most often delivers concrete to
Triple A Homes?”
3-34
QBE Tools
3-35
SQL
• Structured query language (SQL) –
standardized fourth-generation language found
in most DBMSs
• Performs same task as QBE
• Uses sentence structure instead
• Mostly used by IT people
3-36
Application Generation Subsystem
• Application generation subsystem – contains
facilities to help you develop transactionintensive applications
– Data entry screens (called forms in Access)
– Programming languages
• Mostly used by IT people
3-37
Data Administration Subsystem
• Data administration subsystem – helps you
manage the overall database environment
–
–
–
–
–
Backup and recovery
Security management
Query optimization
Concurrency control
Change management
3-38
Data Administration Subsystem
• Backup and recovery
– Periodically back up information
– Recover a database after a failure
• Security management
– Who has access to what information
– Who can perform CRUD tasks on information
3-39
Data Administration Subsystem
• Query optimization
– Restructure physical view to optimize response
times to queries
• Concurrency control
– What happens if two people simultaneously try to
change the same information?
3-40
Data Administration Subsystem
• Change management
– What is the effect of structural changes to a
database?
– What if you add a new column?
– What happens if you delete a column?
– What happens if you change a column’s attributes?
3-41
DATA WAREHOUSES & DATA
MINING
• Data warehouses support OLAP and decision
making
• Data warehouses do not support OLTP
• Data-mining tools are tools for working with
data warehouse information
– DBMS software = database
– Data-mining tools = data warehouse
3-42
What Is a Data Warehouse?
• Data warehouse – logical collection of
information – gathered from operational
databases – used to create business intelligence
that supports business analysis activities and
decision-making tasks
3-43
What Is a Data Warehouse?
3-44
What Is a Data Warehouse?
•
•
•
•
•
Multidimensional
Rows and columns
Also layers
Many times called hypercubes
What are the dimensions in Figure 3.8 on page
97?
3-45
What Are Data-Mining Tools?
• Data-mining tools – software tools that you
use to query information in a data warehouse
–
–
–
–
Query-and-reporting tools
Intelligent agents
Multidimensional analysis tools
Statistical tools
3-46
What Are Data-Mining Tools?
3-47
Query-and-Reporting Tools
• Query-and-reporting tools – similar to QBE
tools, SQL, and report generators in the typical
database environment
– Also similar to pivot tables in Excel
3-48
Intelligent Agents
• Use various AI tools such as neural networks
and fuzzy logic to form the basis for
“information discovery” and building BI
• Help you find hidden patterns in information
• Chapter 4 focuses on these
3-49
Multidimensional Analysis Tools
• Multidimensional analysis (MDA) tools –
slice-and-dice techniques that allow you to
view multidimensional information from
different perspectives
– Bring new layers to the front
– Reorganize rows and columns
3-50
Statistical Tools
• Help you apply various mathematical models
to the information stored in a data warehouse
to discover new information
– Regression
– Analysis of variance
– And so on
3-51
Data Marts
• Data warehouses are organizationwide
• Data marts have subsets of an
organizationwide data warehouse
• Data mart – subset of a data warehouse in
which only a focused portion of the data
warehouse information is kept
3-52
Data Marts
3-53
Data Mining as a Career Opportunity
• Knowledge of data mining can be a substantial
career opportunity for you
–
–
–
–
–
Business Objects
SAS
Cognos
Informatica
Many others
3-54
Considerations in Using a Data
Warehouse
• Do you need a data warehouse?
– DBMS may offer all you need
• Do all employees need the entire data
warehouse?
– Consider a data mart
• How up-to-date must information be?
– “Snapshot” concept
• What data-mining tools do you need?
– Training can be expensive
3-55
INFORMATION OWNERSHIP
• Strategic management support
• The sharing of information with responsibility
• Information cleanliness
3-56
Strategic Management Support
• Chief privacy officer (CPO) – ensuring that
information is used in an ethical way
• Chief security officer (CSO) – ensuring
security of information (e.g., firewalls)
• Chief information officer (CIO) – oversees
every aspect of an organization’s information
resource
3-57
Strategic Management Support
• Data administration – plans for, oversees the
development of, and monitors the information
resource
• Database administration – responsible for the
more technical aspects and operational aspects
of managing information
• Both often report to the CIO
3-58
The Sharing of Information with
Responsibility
• If you create it, you “own” it
• You will also share it with others
• Because you “own” it, you are responsible for
its quality
3-59
Information Cleanliness
• Database and data warehouse information
must be “clean”
– No errors
– No duplicates
3-60
Information Cleanliness
• Extraction, transformation, and loading
(ETL) – what information you want from each
database, how the information is associated,
and what rules to follow in consolidating the
information to ensure its cleanliness in a data
warehouse
3-61
CAN YOU…
1. Describe business intelligence and its role in
an organization.
2. Differentiate between databases and data
warehouses with respect to their focus on
OLTP and OLAP.
3. List and describe the key characteristics of a
relational database.
3-62
CAN YOU…
4. Define the five software components of a
database management system.
5. List and describe the key characteristics of a
data warehouse.
6. Define the four major types of data-mining
tools in a data warehouse environment.
7. List key considerations in information
ownership in an organization.
3-63