When you have a problem….

National Tsing Hua University
Department of Industrial Engineering and
Engineering Management
Dynamic customer segment analysis and
behavior prediction using data mining
Group 1:
Margaret Dlamini
Saumen Bhaumick
Daniel Chen
Ricky Huang
July Panoso
Abstract
CRM aims mainly to understand customers well by:
• Studying the differences between customers through customer segmentation
• Tracking customers' shifts from segment to segment
• Discovering customer segment knowledge
• Predicting customer segments' behavior patterns
CRM
We believe keeping and managing customers is most important:
• Attractive, personalized services that satisfy customer needs
• CRM: closer and deeper relationships with customers
Understanding Customers.
• Analyze customer information
• Differentiate customers through segmentation
• Increase customer loyalty through customized products
• Predict customers' purchase behavior
Contact and Serve Customers
Through Channels
To understand customers, it is essential to integrate the data collected through:
• Web browsing
• Purchase behavior
• Complaints
• Demographics
THE DATA
• The customer segments and related knowledge discovered from multiple data sources change as the customer base changes, so they are valid only for a particular period
• Most existing prediction methods are fundamentally based on numerical and historical data patterns (using simple regression or neural network techniques)
FLUCTUATIONS
Segments can fluctuate considerably because of:
• Promotions
• New product launches
• Customer support policies
Customer Segment
• This study tracks customer shifts among customer segments
• Monitors changes over time
• Discovers customer segment knowledge
• Predicts customers' segment behavior patterns
Prediction of Customer Behavior
By studying the segment shifts each customer might make:
• Build a career path for each customer
• By aggregating each customer's career path, derive the dominant career paths (those the majority of customers follow)
Process to Segment Customers
• Choose a basis of segmentation, with appropriate variables (demographic or behavioral)
• Use a multivariate analysis to group together or split customers
• Evaluate and validate the outputs
• Analyze the results in economic terms
Segmentation Design Schemes
• Measure used for segmentation
• Number of resulting segments
• View of change over time
• Segmentation techniques used
• Number of customers selected
Segmentation Measures
The segmentation variables consist of one or a combination of the following:
• Demographic
• Geographic
• Psychographic or behavioral
The behavioral purchase pattern can be described by:
• RFM (Recency, Frequency and Monetary)
• FRAT (Frequency, Recency, Amount and Type)
Number of Resulting Segments
• Minimize the combined direct and opportunity cost of the segmentation as the criterion for the optimum number of segments
• Allow the derivation of equal-sized segments
• Make a judgmental decision about the number of segments
View of Change over Time
• One way is an occasion-based design, which assumes that people's needs vary across occasions of product purchase
• The other way is to consider time-segmented customers through repeated measurements of the same customer at different points in time
Segmentation Techniques
Statistical methods
• K-means algorithm
• Discriminant analysis
• Logistic regression
Machine learning techniques
• Neural networks (normally considered more accurate than statistical methods)
Number of Customers
• The customer segmentation can incorporate all the customers or can be limited to a sample of them
• If the segmentation is based on a sample, it is important to predict how many customers fall in each group (via inferential statistics)
Profitability
• Predict changes in a segment to derive the static characteristics of the segment
• Changes in a segment relate closely to an increase or decrease in the profitability obtained from that segment
Research Overview
• This study focuses on behavioral variables, including customers' product usage
• Recency, Frequency, Monetary (RFM) analysis
• Self-Organizing Map (SOM): a neural clustering method used to divide the retailer's customers into numerous groups
Cont.
• This paper collects data from July 2001 to September 2002
• Customers are segmented five times during the fifteen months
• One quarter is the time window for creating each new segmentation
Cont.
• Individual career path: presents a single customer's history of shifts
• Dominant career path: a descriptive pattern that explains the common histories most customers might follow
• One dominant path leads to a loyal segment and the other leads to a vulnerable segment
• This study also provides an analytical method for predicting the time-variant segment movement a customer might show
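A minimal sketch of how individual career paths could be aggregated into dominant career paths. The customer IDs, the segment labels and the helper names `career_path` and `dominant_paths` are hypothetical, and computing the share among customers who end in the same segment is an assumption, not necessarily the definition used in the study.

```python
from collections import Counter

# Hypothetical segment histories: customer ID -> segment label per quarter.
histories = {
    "c1": ["loyal", "loyal", "loyal"],
    "c2": ["newcomer", "vulnerable", "loyal"],
    "c3": ["loyal", "loyal", "loyal"],
    "c4": ["vulnerable", "vulnerable", "vulnerable"],
}

def career_path(history):
    """Join one customer's segment history into a single path string."""
    return " -> ".join(history)

def dominant_paths(histories, final_segment, top=3):
    """Most common paths ending in final_segment, with count and share (%)."""
    paths = [career_path(h) for h in histories.values() if h[-1] == final_segment]
    counts = Counter(paths)
    total = len(paths)
    return [(path, n, 100.0 * n / total) for path, n in counts.most_common(top)]

result = dominant_paths(histories, "loyal")
```

Each entry of `result` holds a path, the number of customers who followed it, and its share in percent.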
SEGMENTING CUSTOMERS
We should use a clustering analysis of product usage or purchases.
Purchase transactions have four features:
• Customer number or customer ID
• Recency value
• Frequency value
• Monetary value
Data Preparation for the Segmentation
We have three situations:
• Newcomers (no purchases before period t)
• Old customers who made a purchase during period t
• Old customers who made no purchase during period t
How can we calculate RFM?
Newcomers (no purchase history before period t):
r_t = how long since they made a purchase
f_t = how frequently they make purchases
m_t = how much money they spend
Old customers who made a purchase during period t:
R_{t-1} - r_t = Recency value for period t
F_{t-1} - f_t = Frequency value for period t
M_{t-1} - m_t = Monetary value for period t
Note: R_{t-1}, F_{t-1}, M_{t-1} stand for the cumulative values up to period t-1.
Old customers who made no purchase during period t:
R_{t-1} + 3 months = Recency value for period t
F_{t-1} + 3 months = Frequency value for period t
M_{t-1} + 3 months = Monetary value for period t
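A minimal sketch of how the per-period values r_t, f_t and m_t could be computed from raw purchase transactions. The transaction records, the function name `period_rfm` and the choice to measure recency in days up to the end of the period are assumptions made for illustration.

```python
from datetime import date

# Hypothetical transactions inside one quarter: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2001, 7, 5), 120.0),
    ("c1", date(2001, 9, 20), 80.0),
    ("c2", date(2001, 8, 1), 40.0),
]

def period_rfm(transactions, period_end):
    """Compute per-customer (r, f, m) for purchases made within one period.

    r: days between the most recent purchase and the end of the period
    f: number of purchases in the period
    m: total amount spent in the period
    """
    rfm = {}
    for customer, day, amount in transactions:
        r, f, m = rfm.get(customer, (None, 0, 0.0))
        days_ago = (period_end - day).days
        r = days_ago if r is None else min(r, days_ago)
        rfm[customer] = (r, f + 1, m + amount)
    return rfm

rfm = period_rfm(transactions, period_end=date(2001, 9, 30))
```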
Self-Organization of Customers
The SOM does unsupervised clustering:
• Records within a group or cluster tend to be similar to each other
• Records in different groups are dissimilar
The SOM will end up with a few output units:
• Strong units
• Weak units
The strong units represent probable cluster centers.
Segmentation Results
Two techniques speed up the SOM:
• One is to vary the size of the neighborhoods, from large to small
• The other is to have the winning neuron use a larger learning rate than the neighboring neurons
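A minimal sketch of a one-dimensional SOM in plain Python that applies both techniques: the neighborhood radius shrinks from large to small, and a Gaussian neighborhood gives the winning unit a larger update than its neighbors. The function names, the parameter values and the sample data (three values per customer, scaled to the range 0 to 1) are assumptions, not the configuration used in the study.

```python
import math
import random

def best_unit(units, x):
    """Index of the unit whose weights are closest to x (squared distance)."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)))

def train_som(data, n_units=3, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM and return the list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = n_units / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood: large -> small
        for x in rng.sample(data, len(data)):       # shuffled pass over the data
            winner = best_unit(units, x)
            for j, w in enumerate(units):
                # Winner (distance 0) gets h = 1; neighbors get a smaller rate.
                h = math.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units

# Hypothetical scaled (R, F, M) vectors for four customers.
data = [[0.1, 0.2, 0.1], [0.15, 0.1, 0.2], [0.9, 0.8, 0.9], [0.85, 0.9, 0.8]]
units = train_som(data)
segment_of_first = best_unit(units, data[0])
```

After training, `best_unit` assigns each customer to the unit (segment) whose weights are nearest.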
Summary of customer statistics
per quarter
Summary of customer segment
characteristics for the third quarter
of 2001
Loyal
Vulnerable
Newcomer
Result of the successive five-time
segmentation
Discovering individual career path
and dominant career path


Five-time segmentation makes it
possible to combine segment shift
histories into a career path.
Natural life cycle
Migration


External factors
Changes in segments
Over successive quarters there are
changes in the number of customers in
a segment indicating certain strategies
that management should review for the
CRM
Segment shifts of customers from Q3 2001 to Q4 2001

From Q3 2001 \ To Q4 2001    R↓F↑M↑    R↓F↓M↓    R↑F↓M↓    Customers before shifts
R↓F↑M↑                       24,577     2,267     3,952    30,796
R↓F↓M↓                        5,472    16,181    14,563    36,216
R↑F↓M↓                        2,778     9,387    17,788    29,953
R↑F↑M↑                          461       148       282       891
Customers after shifts       33,288    27,983    36,585    97,856
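A minimal sketch of how the shift counts can be turned into shift probabilities by dividing each row by its total. The counts are taken from the table; the function name `transition_probabilities` is hypothetical.

```python
# Columns: segment in Q4 2001. Rows: segment in Q3 2001.
segments_to = ["R↓F↑M↑", "R↓F↓M↓", "R↑F↓M↓"]
shifts = {
    "R↓F↑M↑": [24577, 2267, 3952],
    "R↓F↓M↓": [5472, 16181, 14563],
    "R↑F↓M↓": [2778, 9387, 17788],
    "R↑F↑M↑": [461, 148, 282],
}

def transition_probabilities(shifts):
    """Divide each row by its total: P(segment in Q4 | segment in Q3)."""
    probs = {}
    for origin, counts in shifts.items():
        total = sum(counts)
        probs[origin] = [c / total for c in counts]
    return probs

probs = transition_probabilities(shifts)
stay_prob = probs["R↓F↑M↑"][0]  # share of R↓F↑M↑ customers who stay in R↓F↑M↑
```

Here `stay_prob` is 24,577 / 30,796, or roughly 80%.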
Dominant career paths of length 3 that lead to segment R↓F↑M↑

Path                          No. of customers    Probability (%)
R↓F↑M↑ → R↓F↑M↑ → R↓F↑M↑      20,495              42.0
R↓F↓M↓ → R↑F↓M↓ → R↓F↑M↑       5,658              11.6
R↑F↓M↓ → R↓F↑M↑ → R↓F↑M↑       3,386               6.9
R↑F↓M↓ → R↓F↑M↑ → R↓F↑M↑       2,999               6.1
R↓F↓M↑ → R↓F↑M↑ → R↓F↑M↑       2,457               5.0
Dominant career paths of length 5 that lead to segment R↑F↓M↓

Path                                            No. of customers    Probability (%)
R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓      8,645               20.90
R↓F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓      5,460               13.20
R↓F↓M↓ → R↓F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓      2,010                4.86
R↑F↓M↓ → R↓F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓      1,924                4.65
R↓F↑M↑ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓ → R↑F↓M↓        875                2.11
Predicting Career Paths
• From the data mining perspective, predicting a customer's segment shifts can be treated as a classification task
• This case study uses a decision tree induction technique and chooses C5.0 to predict the time-variant career paths
Decision Tree Induction Technique
• The C5.0 algorithm has a special method for improving its accuracy rate, called boosting
• Boosting works by building multiple models in a sequence
• The next tree is used to modify and improve the previous one
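A minimal sketch of the boosting idea: models are built in sequence, and each new one concentrates on the cases the previous ones got wrong. This is an AdaBoost-style illustration with one-level trees (stumps) on a single numeric feature, swapped in for clarity; it is not C5.0's own boosting procedure. The data and the function names are hypothetical.

```python
import math

def fit_stump(xs, ys, weights):
    """Pick the threshold and sign with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def boost(xs, ys, rounds=3):
    """Build stumps in sequence, reweighting the cases earlier stumps missed."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, thr, sign = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, sign))
        # Misclassified cases gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * (sign if x >= thr else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(a * (s if x >= t else -s) for a, t, s in model)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can classify perfectly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = boost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

On this toy data a single stump misclassifies at least two of the six cases, while the three-stump vote classifies all six correctly.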
Data Preparation for the Prediction
• The case study generates six models for categorical prediction
• The best model is the one with the highest accuracy
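A minimal sketch of how one training set could be prepared: segment membership in earlier quarters becomes the attributes, and membership in a later quarter becomes the class. The sample histories, the function name `build_training_set` and the particular choice of quarters are assumptions made for illustration.

```python
def build_training_set(histories, attribute_quarters, class_quarter):
    """Turn segment histories into (attributes, class) rows for a classifier."""
    rows = []
    for history in histories.values():
        attributes = tuple(history[q] for q in attribute_quarters)
        rows.append((attributes, history[class_quarter]))
    return rows

# Hypothetical segment membership per customer and quarter.
histories = {
    "c1": {"Q3 2001": "R↓F↑M↑", "Q4 2001": "R↓F↑M↑", "Q1 2002": "R↓F↑M↑"},
    "c2": {"Q3 2001": "R↓F↓M↓", "Q4 2001": "R↑F↓M↓", "Q1 2002": "R↑F↓M↓"},
}

# Two attribute quarters and one class quarter (an assumed configuration).
rows = build_training_set(histories, ["Q3 2001", "Q4 2001"], "Q1 2002")
```

Changing the attribute and class quarters yields a different training set, which is one way several competing models could be produced from the same histories.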
Training six prediction models
[Table: rows are the models PMa, PMb, PMc, PMd, PMe and PMf; columns are the quarters Q3 2001, Q4 2001, Q1 2002, Q2 2002 and Q3 2002. For each model, the quarters used as inputs are marked Attribute and the quarter to be predicted is marked Class.]
Summary of the Prediction Accuracy of C5.0 Models

Model    No. of attributes    Pruning severity    Prediction accuracy (%)
PMa      2                    75                  59.74
PMb      2                    80                  61.68
PMc      2                    70                  71.27
PMd      3                    94                  62.28
PMe      3                    78                  71.38
PMf      4                    75                  71.13
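Selecting the best model is a simple comparison of the reported accuracies. The sketch below uses the figures from the summary; the variable names are hypothetical.

```python
# Prediction accuracy (%) per model, as reported in the summary.
accuracy = {
    "PMa": 59.74,
    "PMb": 61.68,
    "PMc": 71.27,
    "PMd": 62.28,
    "PMe": 71.38,
    "PMf": 71.13,
}

# The best model is the one with the highest accuracy.
best_model = max(accuracy, key=accuracy.get)
```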
Prediction Accuracy Statistics for Best Model, PMe

Actual values at Q4 2002 \ Predicted values at Q4 2002    R↓F↑M↑    R↑F↓M↓    Total
R↓F↓M↓                                                      2102      2009     4111
R↓F↓M↑                                                      3907      3071     6978
R↓F↑M↑                                                     36286      9165    45451
R↑F↓M↓                                                      8183     33133    41316
Total                                                      50478     47378    97856
Newcomer Segment
Prediction Accuracy Statistics for Best Model, PMe
• Total prediction accuracy: (36286 + 33133) / 97856 × 100% = 71%
• R↓F↑M↑ prediction accuracy: 36286 / 50478 × 100% = 72%
• R↑F↓M↓ prediction accuracy: 33133 / 47378 × 100% = 70%
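A minimal sketch that reproduces the accuracy arithmetic from the PMe prediction counts. The counts are those reported for the model; the variable names are hypothetical.

```python
# Rows: actual segment. Columns: predicted segment (two segments are predicted).
predicted = ["R↓F↑M↑", "R↑F↓M↓"]
confusion = {
    "R↓F↓M↓": [2102, 2009],
    "R↓F↓M↑": [3907, 3071],
    "R↓F↑M↑": [36286, 9165],
    "R↑F↓M↓": [8183, 33133],
}

# Overall accuracy: correctly predicted customers over all customers.
total = sum(sum(row) for row in confusion.values())
correct = sum(confusion[seg][i] for i, seg in enumerate(predicted))
overall = 100.0 * correct / total

# Accuracy per predicted segment: correct predictions over the column total.
per_prediction = {}
for i, seg in enumerate(predicted):
    column_total = sum(row[i] for row in confusion.values())
    per_prediction[seg] = 100.0 * confusion[seg][i] / column_total
```

Rounded to whole percentages, `overall` is 71, and the two per-segment values are 72 and 70.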
Performance Evaluation of PMe Model
• Because the training set contains only a few cases from the newcomer segments (7.8%), the model PMe could hardly learn their patterns
• Accurate predictions for rare categories would earn a higher performance evaluation
Conclusion
• This paper has proposed a segment-based knowledge discovery method for deriving descriptive patterns and predicting the path along which a customer will shift
• It tries to resolve two fundamental problems: the changing characteristics of the customers in a segment and the change in the segment's composition
Cont.
Further research
Improve the prediction accuracy by:
• Using a neural network
• Building a separate classifier for each segment and combining the results from multiple classifiers