Transcript Chapter 13
13
Chapter 13
The Data Warehouse
Database Systems: Design, Implementation, and
Management, Fifth Edition, Rob and Coronel
13
In this chapter, you will learn:
• How operational data and decision support differ
• What a data warehouse is and how its data are
prepared
• What star schemas are and how they are
constructed
• What steps are required to implement a data
warehouse successfully
• What data mining is and what role it plays in
decision support
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
2
13
The Need for Data Analysis
• External and internal forces require tactical and
strategic decisions
• Search for competitive advantage
• Business environments are dynamic
• Decision-making cycle time is reduced
• Different managers require different decision
support systems (DSS)
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
3
13
Decision
Support Systems
• Decision Support
– Is a methodology
– Extracts information from data
– Uses information as basis for decision making
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
4
13
Decision
Support Systems
• Decision support system (DSS)
– Arrangement of computerized tools
– Used to assist managerial decision
– Extensive data “massaging” to produce
information
– Used at all levels in organization
– Tailored to focus on specific areas and needs
– Interactive
– Provides ad hoc query tools
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
5
13
DSS Components
Figure 13.1
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
6
13
Operational vs. Decision Support Data
• Operational data
– Relational, normalized database
– Optimized to support transactions
– Real time updates
• DSS
– Snapshot of operational data
– Summarized
– Large amounts of data
• Data analyst viewpoint
– Timespan
– Granularity
– Dimensionality
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
7
13
The DSS Database Requirements
• Database schema
– Support complex (non-normalized) data
– Extract multidimensional time slices
• Data extraction and filtering
• End-user analytical interface
• Database size
– Very large databases (VLDBs)
– Contains redundant and duplicated data
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
8
13
Data Warehouse
• Integrated
– Centralized
– Holds data retrieved from entire organization
• Subject-Oriented
– Optimized to give answers to diverse questions
– Used by all functional areas
• Time Variant
– Flow of data through time
– Projected data
• Non-Volatile
– Data never removed
– Always growing
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
9
13
Creating a Data Warehouse
Figure 13.3
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
10
13
Data Marts
• Single-subject data warehouse subset
• Decision support to small group
• Can be test for exploring potential benefits of
Data warehouses
• Address local or departmental problems
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
11
13
DSS Architectural Styles
•
•
•
•
Traditional mainframe-based OLTP
Managerial information system (MIS) with 3GL
First-generation departmental DSS
First-generation enterprise data warehouse using
RDMS
• Second-generation data warehouse using
MDBMS
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
12
13
Twelve Data Warehouse Rules
1. Separated from operational environment
2. Data are integrated
3. Contains historical data over long time horizon
4. Snapshot data captured at given time
5. Subject-oriented data
6. Mainly read-only data with periodic batch
updates from operational source, no online
updates
7. Development life cycle differs from classical
one, data driven not process driven
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
13
13
Twelve Data Warehouse Rules (Con’t.)
8. Contains different levels of data detail
–
–
Current and old detail
Lightly and highly summarized
9. Characterized by read-only transactions to large
data sets
10. Environment has system to trace data resources,
transformation, and storage
11. Metadata critical components
–
–
Identify and define data elements
Provide the source, transformation, integration, storage,
usage, relationships, and history of data elements
12. Contains charge-back mechanism for usage
–
Enforces optimal use of data
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
14
13
Online Analytical
Processing (OLAP)
• Advanced data analysis environment
• Supports decision making, business modeling,
and operations research activities
• Characteristics of OLAP
–
–
–
–
Use multidimensional data analysis techniques
Provide advanced database support
Provide easy-to-use end-user interfaces
Support client/server architecture
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
15
13
OLAP Client/Server Architecture
Figure 13.6
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
16
13
OLAP Server Arrangement
Figure 13.7
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
17
13
OLAP Server with Multidimensional
Data Store Arrangement
Figure 13.8
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
18
13
OLAP Server with
Local Mini-Data-Marts
Figure 13.9
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
19
13
Relational OLAP (ROLAP)
• OLAP functionality
• Uses relational DB query tools
• Extensions to RDBMS
– Multidimensional data schema support
– Data access language and query performance
optimized for multidimensional data
– Support for very large databases (VLDBs)
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
20
13
Typical ROLAP
Client/Server Architecture
Figure 13.10
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
21
13
Multidimensional OLAP (MOLAP)
• OLAP functionality to multidimensional
databases (MDBMS)
• Stored data in multidimensional data cube
• N-dimensional cubes called hypercubes
• Cube cache memory speeds processing
• Affected by how the database system
handles density of data cube called sparsity
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
22
13
MOLAP Client/Server Architecture
Figure 13.11
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
23
13
Star Schema
• Data-modeling technique
• Maps multidimensional decision support into
relational database
• Yield model for multidimensional data analysis
while preserving relational structure of
operational DB
• Four Components:
–
–
–
–
Facts
Dimensions
Attributes
Attribute hierarchies
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
24
13
Simple Star Schema
Figure 13.12
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
25
13
Slice and Dice View of Sales
Figure 13.14
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
26
13
Star Schema Representation
• Facts and dimensions represented by physical
tables in data warehouse DB
• Fact table related to each dimension table (M:1)
• Fact and dimension tables related by foreign keys
• Subject to the primary/foreign key constraints
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
27
13
Star Schema for Sales
Figure 13.17
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
28
13
Performance-Improving Techniques for
Star Schema
• Normalization of dimensional tables
• Multiple fact tables representing different
aggregation levels
• Denormalization of the fact tables
• Table partitioning and replication
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
29
13
Data Warehouse
Implementation Road Map
Figure 13.21
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
30
13
Data Mining
• Seeks to discover unknown data characteristics
• Automatically searches data for anomalies and
relationships
• Data mining tools
–
–
–
–
–
Analyze data
Uncover problems or opportunities
Form computer models based on findings
Predict business behavior with models
Require minimal end-user intervention
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
31
13
Extraction of Knowledge from Data
Figure 13.22
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
32
13
Data Mining Process
Figure 13.23
Database Systems: Design, Implementation, & Management, 5th Edition, Rob & Coronel
33