Business Intelligence/ Decision Models

Download Report

Transcript Business Intelligence/ Decision Models

Business Intelligence/
Decision Models
Week 3
Data Preparation
and Transformation
Last Week
OLTP, data warehouse repository
and data mart structures (flat and
relational files)
 Data integrity and normalization
 DB interrogation (SQL) for:

OLAP and Reporting
 Migration into data mining suites

Time/
Cost
Cumulated Productivity
Learning by association
or problem solving
This Week

CRISP

(Cross Industry Standard Procedure for Data Mining)

Data preparation (import, aggregate and merge)

Data transformation (for analytics)
CRISP-DM Phases
Source SPSS Inc. 2008
Case Study

A large telecom (XYZ PHONE) has discovered that it is losing
customers at a much higher rate than in previous years.

Reporting through the corporate dashboard (OLAP)has shown
churn rates growing by a large margin last year.
Source SPSS Inc. 2008
Define Business Objectives

Strategic objective definition


Related business goal identification



Decrease customer churn by 1%
Cost-benefit analysis


Retain high value customers
Identify process problems that need to be changed
Clear success factor (metric)


Increase revenues by retaining more customers
Increase revenues by $750,000
Actionable BI objectives

XYZ wants to retain more customers by identifying likely
churners 2 months prior and putting an action in place to
retain them
Source SPSS Inc. 2008
Timeline Example

XYZ’s project: 13 weeks

8 weeks a) business understanding and b) data preparation
• Involved line of business manager and data expert
• Included better defining high-value and churner definition

2 weeks data understanding
• Heavy reliance on data expert and database administrator

2 weeks modeling and evaluation
• Models developed by data miner and results evaluated by line of business
manager

1 week deployment ?
• Heavy involvement of database administrator

Model deployment entailed setting up a data model for monthly
scoring of customer base with resulting reports feeding a mail
offer
Source SPSS Inc. 2008
Time Allocation

Generally accepted industry timeline standards




50 to 70 percent data preparation
20 to 30 percent data understanding
10 to 20 percent modeling, evaluation, and business
understanding
5 to 10 percent deployment
Source PSS Inc. 2008
Data Import and
Transformation
Lab Objectives

Extract data from
Customer file
 Transactional file
 Transform data into information


Data preparation
Aggregate data from transactional file
 Merge aggregate data & customer file

Data Import Step by Step

Import files from Access or Excel

Customer and Transaction files

Document variables labels and value labels using
the data dictionary

Aggregate the transaction file by cust_id with
summary data and key variables

Merge Customer and aggregated transaction file
using cust_id as a common key
Aggregating Transaction
File
Order
_id
Date
Cust_
id
Prod_
num
Amt
Cust_
id
Freq
Date1
Date2
Amt_
sum
4433
10/21
1011
231
120
1011
4
10/21
11/13
696
4434
10/30
2234
143
240
2234
2
10/30
11/07
840
4435
11/05
2876
432
175
2876
3
11/05
11/13
796
4436
11/05
3454
143
240
3454
2
11/05
11/12
380
4437
11/07
2234
223
600
4438
11/08
1011
254
211
4439
11/08
2876
534
300
4440
11/08
1011
143
240
4441
11/12
3454
322
150
4442
11/13
2876
512
321
4443
11/13
1011
412
125
Lab Objectives (Cont)

Data transformation
Compute
 Compute
 Compute
 Compute
 Compute


customers’ length on file
recency of last purchase
frequency of purchases
amount spent
customer status
Purpose
CLV (Week4)
 RFM (Week5)

Data Transformation Step
by Step





Revisit measurement variables (nominal, ord,
scale)
Define date formats
Auto recode nominal string variables
Define missing values
Calculate length on file or tenure


Calculate time since last purchase


(Date last purchase – Date first purchase) tenure
(Date of current file – Date last purchase)
Define customer status (active or lapsed)
Merging Customer and
Transaction Summary Files
Cust_
id
Name
Address
Type
CC
Freq
Date1
Date2
Amt_
sum
1011
Jean
NY
1
Visa
4
10/21
11/13
696
2234
John
OH
1
MC
2
10/30
11/07
840
2876
Janet
CA
2
Visa
3
11/05
11/13
796
3454
Jane
NY
3
Amex
2
11/05
11/12
380
Data Transformation
Cust
_
ids
Name
Address
Type
CC
Freq
Dte1
Dte2
Amt
Days
Recency
1011
Jean
1/NY
1/Res
1/Visa
4
10/21
11/13
696
23
17
2234
John
2/OH
1/Res
2/MC
2
10/30
11/07
840
8
23
2876
Janet
3/CA
2/Bus
1/Visa
3
11/05
11/13
796
8
17
3454
Jane
1/NY
3/DNK
3/Amx
2
11/05
11/12
380
7
18
Purpose of this exercise?

Prepare data for next two weeks:
Lifetime Customer Value
 RFM Analysis
 …
