Transcript Chapter 5

Chapter 9
Business Intelligence and
Information Systems for
Decision Making
Study Questions
Q1: How big is an exabyte, and why does it matter?
Q2: How do business intelligence (BI) systems provide
competitive advantages?
Q3: What problems do operational data pose for BI
systems?
Q4: What are the purpose and components of a data
warehouse?
Q5: What is a data mart, and how does it differ from a data
warehouse?
Q6: What are the characteristics of data-mining systems?
Q7: What are OLAP reports?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-2
Q1: How Big Is an Exabyte?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-3
Q1: Why Does an Exabyte Matter?
• Businesses collect massive amounts of data
– Storage capacity is increasing as cost is decreasing
– Storage capacity is becoming almost unlimited, so
businesses collect more at little extra cost
• Buried in that data are important patterns of
relationships that can yield valuable
information to help businesses make better
decisions
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-4
Q2: How Do Business Intelligence (BI) Systems
Provide Competitive Advantages?
• Primary BI systems:
1. Reporting systems
• Integrate data from multiple systems
• Sorting, grouping, summing, averaging, comparing data
2. Data-mining systems
• Use sophisticated statistical techniques, regression
analysis, and decision tree analysis
• Used to discover hidden patterns and relationships
• Market-basket analysis –purchasing patterns
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-5
Q2: How Do Business Intelligence (BI) Systems
Provide Competitive Advantages? (cont’d)
3. Knowledge management systems (KMs)
• Create value by collecting and sharing human knowledge
about products, products uses, best practices, other
critical knowledge
• Used by employees, managers, customers, suppliers,
others who need access to company knowledge
4. Expert systems (ES)
• Encapsulates knowledge in form of “If/Then” rules
– If Patient_Temp > 103, Then start High_Fever_Procedure
• ES can improve diagnostic and decision quality of nonexperts
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-6
Q3: What Problems Do Operational
Data Pose for BI Systems?
• Dirty data.
–
–
–
–
Values may be missing
Inconsistent data
Non-integrated data
Wrong granularity (Coarse vs. Fine)
• Too much data causes:
1.
Curse of dimensionality
•
2.
Problem caused by the exponential increase in
volume associated with adding extra dimensions to a
(mathematical) space.
Too many rows or data points
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-7
Q4: What are the Purpose and
Components of a Data Warehouse?
• Purpose:
– To extract and clean data from various
operational systems and other sources
– To store and catalog data for BI processing
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-8
Q4: What are the Purpose and
Components of a Data Warehouse?(cont.)
Data warehouse architecture consists of the following
layers:
• Operational database layer - The source data for the data
warehouse - An organization's Enterprise Resource Planning
systems fall into this layer.
• Data access layer - The interface between the operational and
informational access layer - Tools to extract, transform, load
(ETL) data into the warehouse fall into this layer.
• Metadata layer - The data directory - This is usually more
detailed than an operational system data directory.
• Informational access layer - The reporting and analyzing tools.
Business intelligence tools fall into this layer.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-9
Components of a Data Warehouse
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-10
Q5: What Is a Data Mart, and How Does It
Differ from a Data Warehouse?
• Created to address particular needs
– Business function
– Problem
– Opportunity
• Smaller than data warehouse
• Data extracted from data warehouse for a functional
area
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-11
Components of a Data Mart
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-12
Q6: What Are the Characteristics of
Data-Mining Systems?
• Data mining—application of statistical
techniques to find patterns and relationships
in body of data for purpose of classifying and
predicting
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-13
Unsupervised Data Mining
• Analysts do not create model before running
analysis
• Apply data-mining technique and observe
results
• Hypotheses created after analysis as
explanation for results
• Common statistical technique used:
– Cluster analysis to identify groups with similar
characteristics
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-14
Supervised Data Mining
• Model developed before analysis
• Statistical techniques used to estimate
parameters
• Examples:
– Regression analysis—measures impact of
set of variables on one another
– Used for making predictions
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-15
Supervised Data Mining (cont’d)
Neural networks
• Used for predicting values and making
classifications
• Complicated set of nonlinear equations
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-16
Q7: What are OLAP reports?
• OnLine Analytical Processing
– is an approach to quickly answer multidimensional analytical queries
– Dynamic online view based on
• Measures
– Data item to be manipulated – total sales, average cost
• Dimensions
– Characteristic of measure – purchase date, customer type,
location, sales region
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE17-17
Q7: What are OLAP reports? (cont.)
• OLAP cube
– Presentation of measure with associated
dimensions (a.k.a. OLAP report)
• Users can alter format
• Users can drill down into data
– Divide data into more detail
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE17-18
Role of OLAP Server and OLAP
Database
• Figure CE17-12
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE17-19
Active Review
Q1: How big is an exabyte, and why does it matter?
Q2: How do business intelligence (BI) systems provide
competitive advantages?
Q3: What problems do operational data pose for BI
systems?
Q4: What are the purpose and components of a data
warehouse?
Q5: What is a data mart, and how does it differ from a data
warehouse?
Q6: What are the characteristics of data-mining systems?
Q7: What are OLAP reports?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
9-20
Chapter
Extension 16
Database Marketing
Study Questions
Q1: What is a database marketing opportunity?
Q2: How does RFM analysis classify
customers?
Q3: How does market-basket analysis identify
cross-selling opportunities?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-22
Q1: What Is a Database Marketing
Opportunity?
• Database marketing
– Application of business intelligence systems for
planning and executing marketing programs
– Databases are a key component
– Data-mining techniques also important
• Process of sorting through large amounts of data and
picking out relevant information
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-23
Q2: How Does RFM Analysis
Classify Customers?
RFM
–
–
–
–
RFM program analyzes and ranks customers according to their
purchase patterns
How recently (R) a customer has ordered?
How frequently (F) a customer has ordered?
How much money (M) a customer has spent per order?
Divides customers into five groups and assigns a score
• R score 1 = top 20% in most recent orders
• R score 5 = bottom 20% (longest since last order)
• F score 1 = top 20% in most frequent orders
• F score 5 = bottom 20% least frequent orders
• M score 1 = top 20% in most money spent
• M score 5 = bottom 20% in amount of money spent
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-24
Example of RFM Score Data
• Figure CE16-1
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-25
Q3: How Does Market-Basket Analysis
Identify Cross-Selling Opportunities?
• Market-basket analysis is a data-mining technique for
determining sales patterns
– Uses statistical methods to identify sales patterns in large
volumes of data
– Shows which products customers tend to buy together
– Helps identify cross-selling opportunities
• "Customers who bought book X also bought book Y”
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-26
Market-Basket Example
• Figure CE16-2
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-27
Market-Basket Terminology
• Support
Probability that two items will be bought
together
– Fins and masks purchases together 150 times,
thus support for fins and a mask is 150/1,000, or
15%
– Support for fins and weights is 60/1,000, or 6%
– Support for fins along with a second pair of fins is
10/1,000, or 1%
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-28
Market-Basket Terminology (cont’d)
• Confidence
What proportion of the customers who bought a
mask also bought fins?
– Conditional probability estimate
– Example:
» Probability of buying fins = 28% (280/1000)
» Probability of buying swim mask = 27% (270/1000)
– After buying Mask,
» Probability of buying Fins = 150/270 or 55.56%
 Likelihood that a customer will also buy fins almost doubles,
from 28% to 55.56%. Thus, all sales personnel should try to sell
fins to anyone buying a mask
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-29
Market-Basket Terminology (cont’d)
• Lift
 Ratio of confidence to base probability of buying
item
– Shows how much base probability increases or decreases
when other products are purchased
• Example:
– Lift of fins and a mask is confidence of fins given a mask,
divided by the base probability of fins.
– Lift of fins and a mask is .5556/.28 = 1.98
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-30
Decision Tree for Loan Evaluation
• Common business application
–
–
–
–
–
Classify loan applications by likelihood of default
Rules identify loans for bank approval
Identify market segment
Structure marketing campaign
Predict problems
• Too bad the Banks didn’t use decision trees!!
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-31
Active Review
Q1: What is a database marketing opportunity?
Q2: How does RFM analysis classify
customers?
Q3: How does market-basket analysis identify
cross-selling opportunities?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
CE16-32
Chapter 9
Ch Ext.16
Business Intelligence and
Information Systems for
Decision Making