Transcript Slide 1

CHAPTER 6
DATABASES
AND DATA
WAREHOUSES
Opening Case
Searching for
Revenue - Google
McGraw-Hill/Irwin
©2008 The McGraw-Hill Companies, All Rights Reserved
6-2
Chapter Six Overview
• SECTION 6.1 – DATABASE FUNDAMENTALS
– Understanding Information
– Database Fundamentals
– Database Advantages
– Relational Database Fundamentals
– Database Management Systems
– Integrating Data Among Multiple Databases
• SECTION 6.2 – DATA WAREHOUSE FUNDAMENTALS
– Accessing Organizational Information
– History of Data Warehousing
– Data Warehouse Fundamentals
– Business Intelligence
– Data Mining
SECTION 6.1
DATABASE
FUNDAMENTALS
6-4
LEARNING OUTCOMES
1. List, describe, and provide an example of
each of the five characteristics of high quality
information
2. Define the relationship between a database
and a database management system
3. Describe the advantages an organization can
gain by using a database.
6-5
LEARNING OUTCOMES
4. Define the fundamental concepts of the
relational database model
5. Describe the role and purpose of a database
management system and list the four
components of a database management
system
6. Describe the two primary methods for
integrating information across multiple
databases
6-6
UNDERSTANDING INFORMATION
• Information is everywhere in an organization
• Employees must be able to obtain and analyze
the many different levels, formats, and
granularities of organizational information to
make decisions
• Successfully collecting, compiling, sorting, and
analyzing information can provide tremendous
insight into how an organization is performing
6-7
UNDERSTANDING INFORMATION
• Information granularity – refers to the
extent of detail within the information (fine
and detailed or coarse and abstract)
– Levels
– Formats
– Granularities
6-8
Information Quality
• Business decisions are only as good as the
quality of the information used to make the
decisions
• Characteristics of high quality information include:
– Accuracy
– Completeness
– Consistency
– Uniqueness
– Timeliness
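Some of these characteristics can be turned into simple automated checks. The sketch below covers completeness, uniqueness, and timeliness only; the field names, phone numbers, dates, and the 30-day threshold are assumptions made up for illustration.

```python
from datetime import date

# Hypothetical customer records; fields and values are invented for illustration.
customers = [
    {"id": 1, "name": "Dave's Sub Shop", "phone": "555-0101", "updated": date(2008, 1, 15)},
    {"id": 2, "name": "Pizza Palace", "phone": "", "updated": date(2008, 1, 20)},
    {"id": 2, "name": "Pizza Palace", "phone": "", "updated": date(2008, 1, 20)},
]

def incomplete(records, required=("name", "phone")):
    """Completeness: return records with a missing required value."""
    return [r for r in records if any(not r.get(f) for f in required)]

def duplicate_ids(records):
    """Uniqueness: return ids that appear more than once."""
    seen, dupes = set(), set()
    for r in records:
        if r["id"] in seen:
            dupes.add(r["id"])
        seen.add(r["id"])
    return sorted(dupes)

def stale(records, as_of, max_age_days=30):
    """Timeliness: return records not updated within max_age_days of as_of."""
    return [r for r in records if (as_of - r["updated"]).days > max_age_days]
```

Accuracy and consistency usually need a trusted reference source to compare against, so they are left out of this sketch.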
6-9
Information Quality
• Low quality information example
6-10
Understanding the Costs of
Poor Information
• The four primary sources of low quality
information include:
1. Online customers intentionally enter inaccurate
information to protect their privacy
2. Information from different systems has
different entry standards and formats
3. Call center operators enter abbreviated or
erroneous information by accident or to save
time
4. Third party and external information contains
inconsistencies, inaccuracies, and errors
6-11
Understanding the Costs of
Poor Information
• Potential business effects resulting from
low quality information include:
– Inability to accurately track customers
– Difficulty identifying valuable customers
– Inability to identify selling opportunities
– Marketing to nonexistent customers
– Difficulty tracking revenue due to inaccurate
invoices
– Inability to build strong customer relationships
6-12
Understanding the Benefits of
Good Information
• High quality information can significantly
improve the chances of making a good
decision
• Good decisions can directly impact an
organization's bottom line
6-13
DATABASE FUNDAMENTALS
• Information is everywhere in an
organization
• Information is stored in databases
– Database – maintains information about
various types of objects (inventory), events
(transactions), people (employees), and
places (warehouses)
6-14
DATABASE FUNDAMENTALS
• Database models include:
– Hierarchical database model – information is
organized into a tree-like structure (using
parent/child relationships) in such a way that it
cannot have too many relationships
– Network database model – a flexible way of
representing objects and their relationships
– Relational database model – stores information
in the form of logically related two-dimensional
tables
6-15
DATABASE ADVANTAGES
• Database advantages from a business
perspective include
– Increased flexibility
– Increased scalability and performance
– Reduced information redundancy
– Increased information integrity (quality)
– Increased information security
6-16
Increased Flexibility
• A well-designed database should:
– Handle changes quickly and easily
– Provide users with different views
– Have only one physical view
• Physical view – deals with the physical storage of
information on a storage device
– Have multiple logical views
• Logical view – focuses on how users logically
access information
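One way to picture a single physical view with multiple logical views is a table with SQL views defined over it. This is a minimal sketch using Python's built-in sqlite3 module; the employee table, its columns, and the view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a single physical store
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000);
    INSERT INTO employee VALUES (2, 'Raj', 'IT', 61000);
    -- Two logical views over the same physical table.
    CREATE VIEW directory_view AS SELECT name, dept FROM employee;
    CREATE VIEW payroll_view AS SELECT id, salary FROM employee;
""")

# Each group of users queries only the view that matches its needs.
directory = conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall()
payroll = conn.execute("SELECT * FROM payroll_view ORDER BY id").fetchall()
```

The directory users never see salaries and the payroll users never see departments, yet the information is stored only once.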
6-17
Increased Scalability and
Performance
• A database must scale to meet increased
demand, while maintaining acceptable
performance levels
– Scalability – refers to how well a system can
adapt to increased demands
– Performance – measures how quickly a
system performs a certain process or
transaction
6-18
Reduced Redundancy
• Databases reduce information
redundancy
– Redundancy – the duplication of information
or storing the same information in multiple
places
• Inconsistency is one of the primary
problems with redundant information
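The link between redundancy and inconsistency can be shown with a small sketch. The order rows and phone numbers below are invented: the customer's phone is stored in every order row, and an update reached only one of them.

```python
# Hypothetical redundant design: the customer's phone is repeated in each order row.
orders = [
    {"order_id": 1, "customer": "Pizza Palace", "phone": "555-0102"},
    {"order_id": 2, "customer": "Pizza Palace", "phone": "555-0199"},  # updated in one place only
    {"order_id": 3, "customer": "Dave's Sub Shop", "phone": "555-0101"},
]

def inconsistent_customers(rows):
    """Return customers whose redundant phone values disagree."""
    phones = {}
    for row in rows:
        phones.setdefault(row["customer"], set()).add(row["phone"])
    return sorted(name for name, values in phones.items() if len(values) > 1)
```

Storing the phone once, in a customer table, would remove the chance of this disagreement.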
6-19
Increased Integrity (Quality)
• Information integrity – measures the quality of
information
• Integrity constraint – rules that help ensure the
quality of information
– Relational integrity constraint – rule that enforces
basic and fundamental information-based constraints
– Business-critical integrity constraint – rule that
enforces business rules vital to an organization’s success
and often requires more insight and knowledge than
relational integrity constraints
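Both kinds of constraint can be sketched with Python's built-in sqlite3 module. The tables and the "no negative totals" business rule are assumptions chosen for illustration, not rules taken from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        -- Relational integrity constraint: an order must reference an existing customer.
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        -- Business-critical integrity constraint (hypothetical rule): no negative totals.
        total REAL NOT NULL CHECK (total >= 0)
    );
    INSERT INTO customer VALUES (1, 'Pizza Palace');
""")

def try_insert(sql):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert("INSERT INTO orders VALUES (1, 1, 25.00)")
bad_customer = try_insert("INSERT INTO orders VALUES (2, 99, 10.00)")  # no customer 99
bad_total = try_insert("INSERT INTO orders VALUES (3, 1, -5.00)")      # negative total
```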
6-20
Increased Security
• Information is an organizational asset and must
be protected
• Databases offer several security features
including:
– Password – provides authentication of the user
– Access level – determines who has access to the
different types of information
– Access control – determines types of user access,
such as read-only access
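The three features can be sketched in plain Python. The user, password, access levels, and permission table below are all hypothetical, and a real DBMS provides these features itself rather than through application code like this.

```python
import hashlib
import hmac
import os

def hash_password(password, salt):
    """Derive a password hash so the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# Hypothetical user with an access level of "clerk".
_salt = os.urandom(16)
USERS = {"ana": {"salt": _salt, "hash": hash_password("s3cret", _salt), "level": "clerk"}}

# Access level -> table -> allowed kinds of access (access control).
PERMISSIONS = {
    "clerk": {"orders": {"read"}},
    "manager": {"orders": {"read", "write"}, "payroll": {"read"}},
}

def authenticate(user, password):
    """Password: confirm the user is who they claim to be."""
    entry = USERS.get(user)
    if entry is None:
        return False
    return hmac.compare_digest(entry["hash"], hash_password(password, entry["salt"]))

def is_allowed(user, table, action):
    """Access level and access control: check what this user may do with a table."""
    level = USERS[user]["level"]
    return action in PERMISSIONS.get(level, {}).get(table, set())
```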
6-21
RELATIONAL DATABASE
FUNDAMENTALS
• Entity – a person, place, thing, transaction, or
event about which information is stored
– The rows in each table contain the entities
– In Figure 6.5 CUSTOMER includes Dave’s Sub Shop
and Pizza Palace entities
• Entity class (table) – a collection of similar
entities
– In Figure 6.5 CUSTOMER, ORDER, ORDER LINE,
DISTRIBUTOR, and PRODUCT entity classes
6-22
RELATIONAL DATABASE
FUNDAMENTALS
• Attributes (fields, columns) – characteristics
or properties of an entity class
– The columns in each table contain the attributes
– In Figure 6.5 attributes for CUSTOMER include:
• Customer ID
• Customer Name
• Contact Name
• Phone
6-23
RELATIONAL DATABASE
FUNDAMENTALS
• Primary keys and foreign keys identify the
various entity classes (tables) in the
database
– Primary key – a field (or group of fields) that
uniquely identifies a given entity in a table
– Foreign key – a primary key of one table that
appears as an attribute in another table and acts
to provide a logical relationship between the
two tables
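The terms above can be sketched with Python's built-in sqlite3 module. The CUSTOMER attributes follow the list given for Figure 6.5; the ORDER columns, contact names, phone numbers, and order values are assumptions, since the figure itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity class (table) with its attributes (columns).
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,  -- primary key
        customer_name TEXT,
        contact_name  TEXT,
        phone         TEXT
    );
    -- "order" is quoted because ORDER is a reserved word in SQL.
    CREATE TABLE "order" (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT,                                     -- assumed attribute
        customer_id INTEGER REFERENCES customer(customer_id)  -- foreign key
    );
    -- Entities (rows); contact names and phones are made up.
    INSERT INTO customer VALUES (1, 'Dave''s Sub Shop', 'Dave', '555-0101');
    INSERT INTO customer VALUES (2, 'Pizza Palace', 'Pat', '555-0102');
    INSERT INTO "order" VALUES (100, '2008-01-15', 2);
""")

# The foreign key lets a query relate each order to its customer.
rows = conn.execute("""
    SELECT o.order_id, c.customer_name
    FROM "order" AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
""").fetchall()
```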
6-24
Potential relational
database for Coca-Cola
6-25
DATABASE MANAGEMENT SYSTEMS
• Database management systems (DBMS) –
software through which users and application
programs interact with a database
6-26
DATABASE MANAGEMENT SYSTEMS
• Four components of a DBMS
6-27
Data Definition Component
• Data definition component – creates and
maintains the data dictionary and the structure
of the database
• The data definition component includes the data
dictionary
– Data dictionary – a file that stores definitions of
information types, identifies the primary and foreign
keys, and maintains the relationships among the
tables
6-28
Data Definition Component
• Data dictionary essentially defines the logical properties of
the information that the database contains
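Many DBMSs let users query the data dictionary directly. As a rough sketch, SQLite exposes this kind of information through its sqlite_master table and PRAGMA statements; the DISTRIBUTOR and PRODUCT columns below are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE distributor (distributor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (
        product_id     INTEGER PRIMARY KEY,
        description    TEXT,
        distributor_id INTEGER REFERENCES distributor(distributor_id)
    );
""")

# Which tables exist?
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Column name, declared type, and whether it is part of the primary key.
columns = [(r[1], r[2], bool(r[5])) for r in conn.execute("PRAGMA table_info(product)")]

# Foreign keys as (column, referenced table, referenced column).
foreign_keys = [(r[3], r[2], r[4]) for r in conn.execute("PRAGMA foreign_key_list(product)")]
```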
6-29
Data Manipulation Component
• Data manipulation component – allows users to
create, read, update, and delete information in a
database
• A DBMS contains several data manipulation tools:
– View – allows users to see, change, sort, and query the
database content
– Report generator – users can define report formats
– Query-by-example (QBE) – users can graphically
design the answers to specific questions
– Structured query language (SQL) – query language
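The four operations (create, read, update, delete) map onto the SQL statements INSERT, SELECT, UPDATE, and DELETE. A minimal sketch with Python's built-in sqlite3 module follows; the product table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT, price REAL)")

# Create
conn.execute("INSERT INTO product VALUES (?, ?, ?)", (1, "Cola 12-pack", 4.50))

# Read
before = conn.execute(
    "SELECT description, price FROM product WHERE product_id = ?", (1,)).fetchone()

# Update
conn.execute("UPDATE product SET price = ? WHERE product_id = ?", (4.75, 1))
after = conn.execute("SELECT price FROM product WHERE product_id = ?", (1,)).fetchone()[0]

# Delete
conn.execute("DELETE FROM product WHERE product_id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```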
6-30
Data Manipulation Component
• Sample report using Microsoft Access Report Generator
6-31
Data Manipulation Component
• Sample report using Access Query-By-Example (QBE) tool
6-32
Data Manipulation Component
• Results from the query in Figure 6.10
6-33
Data Manipulation Component
• SQL version of the QBE Query in Figure 6.10
6-34
Application Generation and Data
Administration Components
• Application generation component – includes
tools for creating visually appealing and easy-to-use applications
• Data administration component – provides
tools for managing the overall database
environment by providing facilities for backup,
recovery, security, and performance
• IT specialists primarily use these components
6-35
INTEGRATING DATA AMONG MULTIPLE
DATABASES
• Integration – allows separate systems to
communicate directly with each other
– Forward integration – takes information
entered into a given system and sends it
automatically to all downstream systems and
processes
– Backward integration – takes information
entered into a given system and sends it
automatically to all upstream systems and
processes
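The two directions can be sketched as a simple propagation rule. The chain of systems below is hypothetical, and each system's database is stood in for by a plain list.

```python
# Hypothetical systems, listed from most upstream to most downstream.
SYSTEMS = ["sales", "order_entry", "fulfillment", "billing"]

def integration_targets(source, forward=True, backward=False):
    """Return the systems that should receive a record entered in `source`."""
    i = SYSTEMS.index(source)
    targets = []
    if backward:
        targets += SYSTEMS[:i]       # upstream systems
    if forward:
        targets += SYSTEMS[i + 1:]   # downstream systems
    return targets

def enter_record(stores, source, record, **direction):
    """Store the record in the source system and copy it to integrated systems."""
    stores[source].append(record)
    for name in integration_targets(source, **direction):
        stores[name].append(record)

stores = {name: [] for name in SYSTEMS}
enter_record(stores, "order_entry", {"order_id": 100})  # forward integration only
```

With forward integration alone, the sales system never sees the order; adding `backward=True` would send it upstream as well.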
6-36
INTEGRATING DATA
AMONG MULTIPLE DATABASES
• Forward and backward integration
6-37
INTEGRATING DATA
AMONG MULTIPLE DATABASES
• Building a central repository specifically
for integrated information
6-38
OPENING CASE QUESTIONS
Google
1. How did the Web site RateMyProfessors.com
solve its problem of low-quality information?
2. Review the five common characteristics of high-quality information and rank them in order of
importance to Google’s business
3. What would be the ramifications to Google’s
business if the search information it presented to
its customers was of low quality?
6-39
OPENING CASE QUESTIONS
Google
4. Describe the different types of databases.
Why should Google use a relational
database?
5. Identify the different types of entity, entity
classes, attributes, keys, and relationships
that might be stored in Google’s AdWords
relational database
SECTION 6.2
DATA
WAREHOUSE
FUNDAMENTALS
6-41
LEARNING OUTCOMES
7. Describe the roles and purposes of data
warehouses and data marts in an
organization
8. Compare the multidimensional nature of
data warehouses (and data marts) with
the two-dimensional nature of databases
6-42
LEARNING OUTCOMES
9. Identify the importance of ensuring the
cleanliness of information throughout an
organization
10. Explain the relationship between
business intelligence and a data
warehouse
6-43
HISTORY OF DATA WAREHOUSING
• Data warehouses extend the transformation of
data into information
• In the 1990s, executives became less
concerned with the day-to-day business
operations and more concerned with overall
business functions
• The data warehouse provided the ability to
support decision making without disrupting the
day-to-day operations
6-44
DATA WAREHOUSE FUNDAMENTALS
• Data warehouse – a logical collection of
information – gathered from many different
operational databases – that supports business
analysis activities and decision-making tasks
• The primary purpose of a data warehouse is to
aggregate information throughout an
organization into a single repository for
decision-making purposes
6-45
DATA WAREHOUSE FUNDAMENTALS
• Extraction, transformation, and loading
(ETL) – a process that extracts information from
internal and external databases, transforms the
information using a common set of enterprise
definitions, and loads the information into a data
warehouse
• Data mart – contains a subset of data
warehouse information
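A small ETL run might look like the sketch below. The two source extracts, their field names, and the "common definitions" (upper-case names, ISO dates, numeric amounts) are assumptions for illustration; an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical rows from two operational systems with different conventions.
east_rows = [{"cust": "Pizza Palace", "amt_usd": "120.50", "date": "01/15/2008"}]
west_rows = [{"customer_name": "DAVE'S SUB SHOP", "amount": 80, "day": "2008-01-16"}]

# Transform: map each source onto one common definition (name, amount, ISO date).
def transform_east(row):
    day = datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat()
    return (row["cust"].strip().upper(), float(row["amt_usd"]), day)

def transform_west(row):
    return (row["customer_name"].strip().upper(), float(row["amount"]), row["day"])

# Load: write the standardized rows into the warehouse table.
def load(rows, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (customer TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
clean_rows = [transform_east(r) for r in east_rows] + [transform_west(r) for r in west_rows]
load(clean_rows, warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount, sale_date FROM sales_fact ORDER BY sale_date").fetchall()
```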
6-46
DATA WAREHOUSE FUNDAMENTALS
6-47
Multidimensional Analysis
• Databases contain information in a series
of two-dimensional tables
• In a data warehouse and data mart,
information is multidimensional: it contains
layers of columns and rows
– Dimension – a particular attribute of
information
6-48
Multidimensional Analysis
• Cube – common term for the representation
of multidimensional information
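A cube can be approximated in plain Python as facts keyed by several dimensions. The dimensions (product, region, quarter) and the unit counts below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units).
DIMENSIONS = ("product", "region", "quarter")
facts = [
    ("Cola", "East", "Q1", 10),
    ("Cola", "West", "Q1", 7),
    ("Diet", "East", "Q1", 4),
    ("Cola", "East", "Q2", 12),
]

def roll_up(facts, keep):
    """Total the units, keeping only the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, units in facts:
        totals[tuple(dims[i] for i in idx)] += units
    return dict(totals)

def slice_cube(facts, dimension, value):
    """Return only the facts where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [f for f in facts if f[i] == value]

by_product = roll_up(facts, ("product",))
by_region_quarter = roll_up(facts, ("region", "quarter"))
```

The same facts answer different questions depending on which dimensions are kept, which is the point of multidimensional analysis.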
6-49
Multidimensional Analysis
• Data mining – the process of analyzing data to
extract information not offered by the raw data
alone
• To perform data mining users need data-mining
tools
– Data-mining tool – uses a variety of techniques to
find patterns and relationships in large volumes of
information and infers rules that predict future
behavior and guide decision making
6-50
Information Cleansing or Scrubbing
• An organization must maintain high-quality data in the data warehouse
• Information cleansing or scrubbing – a
process that weeds out and fixes or
discards inconsistent, incorrect, or
incomplete information
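A cleansing step might standardize values, discard records it cannot fix, and drop duplicates. In this sketch the records, the upper-case name convention, and the 10-digit phone rule are all assumptions.

```python
import re

# Hypothetical raw contact records from operational systems.
raw = [
    {"name": "  pizza   palace ", "phone": "(555) 010-0102"},
    {"name": "Pizza Palace", "phone": "555-010-0102"},   # same customer, other format
    {"name": "Dave's Sub Shop", "phone": "0101"},         # incomplete phone
]

def clean_record(rec):
    """Fix what can be fixed; return None when the record must be discarded."""
    name = " ".join(rec.get("name", "").split()).upper()  # collapse spaces, one case
    phone = re.sub(r"\D", "", rec.get("phone", ""))       # keep digits only
    if not name or len(phone) != 10:
        return None
    return {"name": name, "phone": phone}

def cleanse(records):
    """Standardize records, then drop unusable ones and duplicates."""
    seen, result = set(), []
    for rec in records:
        fixed = clean_record(rec)
        if fixed is None:
            continue
        key = (fixed["name"], fixed["phone"])
        if key not in seen:
            seen.add(key)
            result.append(fixed)
    return result

cleaned = cleanse(raw)
```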
6-51
Information Cleansing or Scrubbing
• Contact information in an operational system
6-52
Information Cleansing or Scrubbing
• Standardizing Customer name from Operational Systems
6-53
Information Cleansing or Scrubbing
6-54
Information Cleansing or Scrubbing
• Accurate and complete information
6-55
BUSINESS INTELLIGENCE
• Business intelligence – information that
people use to support their decision-making efforts
• Principal BI enablers include:
– Technology
– People
– Culture
6-56
DATA MINING
• Data-mining software includes many forms of AI such
as neural networks and expert systems
6-57
DATA MINING
• Common forms of data-mining analysis
capabilities include:
– Cluster analysis
– Association detection
– Statistical analysis
6-58
Cluster Analysis
• Cluster analysis – a technique used to divide
an information set into mutually exclusive
groups such that the members of each group
are as close together as possible to one
another and the different groups are as far
apart as possible
• CRM systems depend on cluster analysis to
segment customer information and identify
behavioral traits
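The slides do not name a specific clustering algorithm. As one common example, the sketch below uses k-means on a single variable; the customer spending figures and starting centers are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move centers to group means."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 90, 95, 88]
centers, groups = k_means_1d(spend, centers=[0.0, 100.0])
```

Each value lands in exactly one group (mutually exclusive), with low spenders close together and far from the high spenders.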
6-59
Association Detection
• Association detection – reveals the
degree to which variables are related
and the nature and frequency of these
relationships in the information
– Market basket analysis – analyzes such
items as Web sites and checkout scanner
information to detect customers’ buying
behavior and predict future behavior by
identifying affinities among customers’
choices of products and services
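A very small market basket analysis can count how often pairs of products are bought together. The baskets below are invented, and support and confidence are standard measures used here for illustration rather than terms from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets.
baskets = [
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"chips", "soda"},
    {"bread", "soda"},
]

def pair_support(baskets):
    """Support: fraction of all baskets that contain both items of a pair."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, a, b):
    """Confidence: fraction of baskets containing `a` that also contain `b`."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return 0.0
    return sum(b in basket for basket in with_a) / len(with_a)

support = pair_support(baskets)
```

Here every basket with salsa also contains chips, an affinity a retailer might act on.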
6-60
Statistical Analysis
• Statistical analysis – performs such
functions as information correlations,
distributions, calculations, and variance
analysis
– Forecast – predictions made on the basis
of time-series information
– Time-series information – time-stamped
information collected at a particular
frequency
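One simple way to forecast from time-series information is a moving average, sketched below. The monthly sales figures are made up, and the moving average is only one of many forecasting techniques, not one prescribed by the slides.

```python
from statistics import mean

# Hypothetical time-series information: monthly sales, oldest first.
monthly_sales = [100, 110, 105, 120, 125, 130]

def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("not enough observations for the chosen window")
    return mean(series[-window:])

forecast = moving_average_forecast(monthly_sales)
```

A larger window smooths out short-term swings, while a smaller one reacts faster to recent change.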
6-61
OPENING CASE QUESTIONS
Google
6. How could Google use a data warehouse to
improve its business operations?
7. Why would Google need to scrub and cleanse
the information in its data warehouse?
8. Identify a data mart that Google’s marketing
and sales department might use to track and
analyze its AdWords revenue
6-62
CLOSING CASE ONE
Fishing for Quality
1. Explain the importance of high-quality information
for the Alaska Department of Fish and Game
2. Review the five common characteristics of high
quality information and rank them in order of
importance for the Alaska Department of Fish
and Game
3. How could data warehouses and data marts be
used to help the Alaska Department of Fish and
Game improve the efficiency and effectiveness of
its operations?
6-63
CLOSING CASE ONE
Fishing for Quality
4. What two data marts might the Alaska
Department of Fish and Game want to build to
help it analyze its operational performance?
5. Do the managers at the Alaska Department of
Fish and Game actually have all of the
information they require to make an accurate
decision? Explain the statement “it is never
possible to have all of the information required
to make the best decision possible”
6-64
CLOSING CASE TWO
Mining the Data Warehouse
1. Explain how Ben & Jerry’s is using
business intelligence tools to remain
successful and competitive in a
saturated market
2. Identify why information cleansing and
scrubbing is critical to California Pizza
Kitchen’s business intelligence tool’s
success
6-65
CLOSING CASE TWO
Mining the Data Warehouse
3. Illustrate why 100 percent accurate and
complete information is impossible for
Noodles & Company to obtain
4. Describe how each of the companies above is
using BI from their data warehouse to gain a
competitive advantage
6-66
CLOSING CASE THREE
Harrah’s
1. Identify the effects poor information might have
on Harrah’s service-oriented business strategy
2. How does Harrah’s use database
technologies to implement its service-oriented
strategy?
3. Harrah’s was one of the first casino companies
to find value in offering rewards to customers
who visit multiple Harrah’s locations. Describe
the effects on the company if it did not build
any integrations among the databases located
at each of its casinos
6-67
CLOSING CASE THREE
Harrah’s
4. Estimate the potential impact to Harrah’s
business if there is a security breach in
its customer information
5. Identify three different types of data
marts Harrah’s might want to build to
help it analyze its operational
performance
6-68
CLOSING CASE THREE
Harrah’s
6. What might occur if Harrah’s fails to clean or
scrub its information before loading it into its
data warehouse?
7. Describe cluster analysis, association detection,
and statistical analysis and explain how Harrah’s
could use each one to gain insights into its
business