Business Intelligence/ Decision Models
Download
Report
Transcript Business Intelligence/ Decision Models
Business Intelligence/
Decision Models
Week 3
Data Preparation
and Transformation
Last Week
OLTP, data warehouse repository
and data mart structures (flat and
relational files)
Data integrity and normalization
DB interrogation (SQL) for:
OLAP and Reporting
Migration into data mining suites
Time/
Cost
Cumulated Productivity
Learning by association
or problem solving
This Week
CRISP
(Cross Industry Standard Procedure for Data Mining)
Data preparation (import, aggregate and merge)
Data transformation (for analytics)
CRISP-DM Phases
Source SPSS Inc. 2008
Case Study
A large telecom (XYZ PHONE) has discovered that it is losing
customers at a much higher rate than in previous years.
Reporting through the corporate dashboard (OLAP)has shown
churn rates growing by a large margin last year.
Source SPSS Inc. 2008
Define Business Objectives
Strategic objective definition
Related business goal identification
Decrease customer churn by 1%
Cost-benefit analysis
Retain high value customers
Identify process problems that need to be changed
Clear success factor (metric)
Increase revenues by retaining more customers
Increase revenues by $750,000
Actionable BI objectives
XYZ wants to retain more customers by identifying likely
churners 2 months prior and putting an action in place to
retain them
Source SPSS Inc. 2008
Timeline Example
XYZ’s project: 13 weeks
8 weeks a) business understanding and b) data preparation
• Involved line of business manager and data expert
• Included better defining high-value and churner definition
2 weeks data understanding
• Heavy reliance on data expert and database administrator
2 weeks modeling and evaluation
• Models developed by data miner and results evaluated by line of business
manager
1 week deployment ?
• Heavy involvement of database administrator
Model deployment entailed setting up a data model for monthly
scoring of customer base with resulting reports feeding a mail
offer
Source SPSS Inc. 2008
Time Allocation
Generally accepted industry timeline standards
50 to 70 percent data preparation
20 to 30 percent data understanding
10 to 20 percent modeling, evaluation, and business
understanding
5 to 10 percent deployment
Source PSS Inc. 2008
Data Import and
Transformation
Lab Objectives
Extract data from
Customer file
Transactional file
Transform data into information
Data preparation
Aggregate data from transactional file
Merge aggregate data & customer file
Data Import Step by Step
Import files from Access or Excel
Customer and Transaction files
Document variables labels and value labels using
the data dictionary
Aggregate the transaction file by cust_id with
summary data and key variables
Merge Customer and aggregated transaction file
using cust_id as a common key
Aggregating Transaction
File
Order
_id
Date
Cust_
id
Prod_
num
Amt
Cust_
id
Freq
Date1
Date2
Amt_
sum
4433
10/21
1011
231
120
1011
4
10/21
11/13
696
4434
10/30
2234
143
240
2234
2
10/30
11/07
840
4435
11/05
2876
432
175
2876
3
11/05
11/13
796
4436
11/05
3454
143
240
3454
2
11/05
11/12
380
4437
11/07
2234
223
600
4438
11/08
1011
254
211
4439
11/08
2876
534
300
4440
11/08
1011
143
240
4441
11/12
3454
322
150
4442
11/13
2876
512
321
4443
11/13
1011
412
125
Lab Objectives (Cont)
Data transformation
Compute
Compute
Compute
Compute
Compute
customers’ length on file
recency of last purchase
frequency of purchases
amount spent
customer status
Purpose
CLV (Week4)
RFM (Week5)
Data Transformation Step
by Step
Revisit measurement variables (nominal, ord,
scale)
Define date formats
Auto recode nominal string variables
Define missing values
Calculate length on file or tenure
Calculate time since last purchase
(Date last purchase – Date first purchase) tenure
(Date of current file – Date last purchase)
Define customer status (active or lapsed)
Merging Customer and
Transaction Summary Files
Cust_
id
Name
Address
Type
CC
Freq
Date1
Date2
Amt_
sum
1011
Jean
NY
1
Visa
4
10/21
11/13
696
2234
John
OH
1
MC
2
10/30
11/07
840
2876
Janet
CA
2
Visa
3
11/05
11/13
796
3454
Jane
NY
3
Amex
2
11/05
11/12
380
Data Transformation
Cust
_
ids
Name
Address
Type
CC
Freq
Dte1
Dte2
Amt
Days
Recency
1011
Jean
1/NY
1/Res
1/Visa
4
10/21
11/13
696
23
17
2234
John
2/OH
1/Res
2/MC
2
10/30
11/07
840
8
23
2876
Janet
3/CA
2/Bus
1/Visa
3
11/05
11/13
796
8
17
3454
Jane
1/NY
3/DNK
3/Amx
2
11/05
11/12
380
7
18
Purpose of this exercise?
Prepare data for next two weeks:
Lifetime Customer Value
RFM Analysis
…