Chapter 15 - Personal.kent.edu

Download Report

Transcript Chapter 15 - Personal.kent.edu

Database Processing
Business
Intelligence &
Data
Warehousing
Chapter 15
David M. Kroenke
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-1
Business Intelligence (BI) Systems
• Business Intelligence (BI) systems are
information systems that assist managers
and other professionals:
– To analyze current and past activities, and
– To predict future events.
• Two broad categories:
– Reporting
– Data mining
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-2
The Relationship of
Operational and BI Applications
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-3
Reporting Applications
• Reporting system applications:
– Filter, Sort, Group, Simple Calculations using
SQL
– Classify entities (customers, products,
employees, etc.)
• RFM Analysis
– Deal with critical report delivery
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-4
Data Mining Applications
• Data mining applications are used to:
– Perform what-if analysis
– Make predictions
– Facilitate decision making
• Data mining applications use sophisticated
statistical and mathematical techniques.
• Report delivery is not as critical.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-5
Need for Data Warehousing
• Integrated, company-wide view of high-quality
information (from disparate databases)
• Separation of operational and informational systems
and data (for improved performance)
Comparison of Operational and Informational Systems
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-6
Definitions
• Data Warehouse:
– A subject-oriented, integrated, time-variant, non-updatable
collection of data used in support of management decisionmaking processes
– Subject-oriented: e.g. customers, patients, students,
products
– Integrated: Consistent naming conventions, formats,
encoding structures; from multiple data sources
– Time-variant: Can study trends and changes
– Non-updatable: Read-only, periodically refreshed
• Data Mart:
– A data warehouse that is limited in scope
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-7
Data Warehouse vs. Data Mart
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-8
Components of a Data Warehouse
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-9
Data Warehouse and Data Marts
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-10
Data Warehouses and Data Marts:
Problems of Using Transaction Data for BI
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-11
The ETL Process
• Extract/Capture
– Static vs. Incremental
• Transform
– Scrub or data cleansing
– Data selection, joining, aggregation
• Load and Index
– Refresh vs. Update
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-12
Data Warehouse ETL Sequence
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-13
Components of a Star Schema
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-14
Star Schema Example
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-15
Issues Regarding Star Schema
• Dimension table keys should be surrogate:
– Keys may change over time
– Length/format consistency
• Granularity of Fact Table – what level of detail?
– Transactional grain – finest level
– Aggregated grain – more summarized
– Finer grain: better market basket analysis capability, but much more
data (more dimension tables, more rows in fact table)
• Duration of the database – how much history should be kept?
– Natural duration – 13 months or 5 quarters
– Financial institutions may need longer duration
– Older data is more difficult to source and cleanse
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-16
Data Warehousing at Wal-Mart
• As of 2000 (Foote and Krishnamurthi, 2001)
– Held 7.5 TB, with plans to reach 24 TB (1TB = 250M
pages of text)
– Kept 65 weeks of data
– Had invested $4 Billion
– Power users generated $12,000/query
• As of 2005 (Wall Street Journal, December 3-4, 2005)
– Held 570 TB (more than Internet’s fixed pages)
– Predicted Hurricane Ivan would spur demand for easy
breakfasts
• Stocked Florida stores with Pop-Tarts
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-17
Reporting Systems:
RFM Analysis
• RFM Analysis analyzes and ranks customers
according to purchasing patterns:
– R = Recent (most recent order)
– F = Frequent (how often an order is made)
– M = Money (dollar amount of orders)
• Customers are sorted into five groups, each
containing 20% of the customers.
• Each group is given a numerical value:
– 1 = Top 20%
– 2, 3, 4 = Each 20% in between top and bottom 20%
– 5 = Bottom 20%
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-18
Reporting Systems:
RFM Analysis (Continued)
Ajax ordered recently
(1), orders often (1) but
does not order the most
expensive items (3) –
Try to sell Ajax more
expensive goods!
Bloominghams has not ordered
recently (5), but has ordered often
(1) and purchased the most
expensive items (1).
This customer may be looking for
a different vendor – better call!
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-19
Reporting Systems:
OnLine Analytical Processing [OLAP]
• An OLAP report has measures and dimensions:
– Measure — A data item of interest.
– Dimension — A characteristic of a measure.
• OLAP cube — A presentation of a measure with
associated dimensions.
– An OLAP cube can have any number of axes.
– The terms OLAP cube and OLAP report are
synonymous.
• OLAP allows drill-down — a further division of
the data into more detail.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-20
Reporting Systems:
OLAP Drill Down:
Product Family by Store Type
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-21
Reporting Systems:
OLAP Drill Down:
Product Family and Store Location by Store Type
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-22
Reporting Systems:
OLAP Drill Down:
Store Location and Product Family by Store Type
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-23
Data Mining Applications
• Data mining applications use sophisticated
statistical and mathematical techniques to find
patterns and relationships that can be used to
classify and predict.
– Unsupervised data mining — Statistical techniques
are used to identify groups of entities with similar
characteristics.
• Cluster Analysis
– Supervised data mining:
• A model is developed.
• Statistical techniques are used to estimate parameter values
of the model.
– Regression analysis
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-24
Data Mining Applications:
The Convergence of the Disciplines
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-25
Data Mining Applications:
Three Popular Data Mining Techniques
• Decision tree analysis — Classifies
entities into groups based on past history.
• Logistic regression — Produces
equations that offer probabilities that
certain events will occur.
• Neural Networks — Complex statistical
prediction techniques
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-26
Data Mining Applications:
Market Basket Analysis
• Market Basket Analysis — Determines
patterns of associated buying behavior.
– Support — The probability that two items will be
purchased together.
– Confidence — The probability that an item will be
purchased given the fact that the customer has
already purchased another particular item.
– Lift — the ratio of confidence to the basic probability
that a particular item will be purchased.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-27
Data Mining Applications:
Market Basket Analysis Example
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
15-28