Slides 11 (CRM)


CRM Segmentation
Knowing Your Customers
How to Conduct a Fruitful CRM Segmentation
 Understand the business case being studied
 Know the data well
 Have segmentation skills
 Know the principles of data mining
 Know the software tool (SAS EM 5.2)
Techniques for CRM Segmentation
 Clustering – previously learned
 K-means and SOM
 RFM Cell-Based Segmentation – Chapter 4
 Use RFM to create cells of the dataset
 Use the primary target variable to explore the relation of each RFM cell to the target with a Decision Tree model
 Using a Decision Tree to Create Cluster Segments – Chapter 5
 Set RFM as the target to classify observations into classes that serve as the clusters
Issues in CRM Segmentation
 Clustering of Multi-Attributes – Chapter 6
 Updating Clustering Segments – Chapter 7
 Using Segments in Predictive Models – Chapter 8
 Handling Missing Data in Segmentation – Chapter 9
 Product Affinity – Chapter 10
 Different Clustering Techniques – Chapter 11
 Segmentation of Textual Data – Chapter 12
Chapter 4
 Cell-based segmentation
 Cell groups – RFM
 Example development of RFM cells
 Tree segmentation using RFM
 Demonstrations/exercises:
 Data set: BUYTEST
 SAS code: RFM.SAS, RFM_FORMAT.SAS
 Develop a combined RFM input from several inputs
 Study the effects of RFM in the classification model
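Building RFM cells amounts to binning each customer's Recency, Frequency, and Monetary values (here into quintiles) and concatenating the bin codes into a cell label. The course does this with RFM.SAS on the BUYTEST data; the following is only a language-neutral Python sketch with hypothetical field values, not the course code.

```python
def quintile_score(values, reverse=False):
    """Assign each value a 1-5 score by quintile rank (5 = best).

    reverse=True treats smaller values as better (e.g. recency in days).
    """
    order = sorted(values, reverse=reverse)
    n = len(order)

    def score(v):
        rank = order.index(v)            # position among sorted values
        return min(5, rank * 5 // n + 1)

    return [score(v) for v in values]

def rfm_cells(recency_days, frequency, monetary):
    """Combine per-dimension quintile scores into 3-digit RFM cell labels."""
    r = quintile_score(recency_days, reverse=True)  # recent buyers score high
    f = quintile_score(frequency)
    m = quintile_score(monetary)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
```

Each distinct label (e.g. "555" for the best customers on all three dimensions) defines one RFM cell, which can then be cross-examined against the target variable in a decision tree.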
Chapter 5
 Processing a dataset of over 100,000 observations
 Using a decision tree to create cluster segments
 Demonstrations/exercises:
 Data sets: CUSTOMERS, REVISED_CUSTOMER
 SAS code: in the next slide
 Key configurations:
 Set PURCHLST and PURCHFST to ordinal
 Set PUBLIC_SECTOR to binary
 Transform EST_SPEND, LOC_EMPLOYEE
 Keep a few inputs (as shown in Fig 5.6)
 Use the Filter node
 In the decision tree, RFM_NEW is set as the target variable
Code for REVISED_CUSTOMER

data sampsio.revised_customer;
   set crm.customers;
   /* Collapse the original RFM codes into five broader groups */
   if rfm in ('A' 'B') then rfm_new = 'A'; else
   if rfm in ('C' 'D') then rfm_new = 'B'; else
   if rfm in ('E' 'F') then rfm_new = 'C'; else
   if rfm in ('G' 'H') then rfm_new = 'D'; else
   if rfm in ('J' 'K') then rfm_new = 'E';
run;
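The DATA step above is a simple recode from ten letter codes to five groups (note that the letter I is skipped in the original coding). Outside SAS, the same collapse is just a lookup table; a hypothetical Python sketch:

```python
# Collapse the original RFM codes (A-K, skipping I) into five broader
# groups, mirroring the REVISED_CUSTOMER DATA step.
COLLAPSE = {
    'A': 'A', 'B': 'A',
    'C': 'B', 'D': 'B',
    'E': 'C', 'F': 'C',
    'G': 'D', 'H': 'D',
    'J': 'E', 'K': 'E',
}

def collapse_rfm(code):
    """Return the collapsed group, or None for codes outside A-K."""
    return COLLAPSE.get(code)
```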
Chapter 6
 Understanding customers of many attributes
 Data assay and profiling
 Understanding what the cluster segmentation found
 Clustering on very large data sets
 Variable reduction
 Demonstrations/exercises:
 Data set: NYTOWNS
 SAS code: SOFTMAX.SAS, SOFTMAX_FORMAT.SAS
SELECTED SLIDES FROM CHAPTER 3,
“EFFECTIVE WEB MINING”, SAS WEB MINING
COURSE NOTES
Section 3.1
Web Site Statistics for Evaluating Visitors
Overview
 Introduce descriptive and exploratory statistics using base SAS software.
 Propose statistics for evaluating visitors and site usage.
 Demonstrate an analysis of the financial services data.
Some Common Web Log Statistics
 Most popular pages
 Frequency of referring sites
 Page count statistics: means, percentiles, variation
 Session count statistics
 Frequency of Web browser usage
 Frequency of operating systems
 Frequency of error types
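Statistics such as page popularity are tallied directly from the server log. A minimal Python sketch, assuming Common Log Format lines with hypothetical paths (the course itself computes these statistics with base SAS):

```python
from collections import Counter

def page_counts(log_lines):
    """Count requests per page from Common Log Format lines.

    The request is the quoted field, e.g. "GET /home.html HTTP/1.0".
    """
    counts = Counter()
    for line in log_lines:
        try:
            request = line.split('"')[1]   # text inside the first quotes
            page = request.split()[1]      # the requested path
        except IndexError:
            continue                       # skip malformed lines
        counts[page] += 1
    return counts

logs = [
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /home.html HTTP/1.0" 200 2326',
    '1.2.3.5 - - [10/Oct/2000:13:55:40 -0700] "GET /rates.html HTTP/1.0" 200 512',
    '1.2.3.4 - - [10/Oct/2000:13:56:02 -0700] "GET /home.html HTTP/1.0" 200 2326',
]
```

The same pass over the log, keyed on the referrer, browser, or status-code field instead, yields the other frequency statistics listed above.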
Problems and Pitfalls in Statistical Analysis
 Most results are comparative, so a baseline, control group, or comparison group needs to be identified so that results are not interpreted in a vacuum.
 Simpson's paradox should be understood when looking at percentages and values that are relative to counts in a sub-group.
 Correlation does not imply causation. Spurious correlation results when two unrelated variables are statistically significantly correlated.
 Statistical significance does not imply practical significance.
 All statistical models are wrong, but some are useful.
Baselines and Comparisons
Which statement is more informative?
 Our Web server recorded 11,000 page views yesterday.
 Our Web server recorded an increase of 1,000 page views yesterday compared to the previous day.
 Our Web server recorded a 10% increase in page views yesterday compared to the previous day.
Simpson's Paradox

Visitor Type   Total Visitors   Visitors Who Register   Percentage Registered
L              380              115                     30.3%
H              165              70                      42.4%
Simpson's Paradox

Visitors from Referring Site Alpha
Visitor Type   Total Visitors   Visitors Who Register   Percentage Registered
L              30               15                      50.0%
H              125              60                      48.0%

Visitors from Referring Site Bravo
Visitor Type   Total Visitors   Visitors Who Register   Percentage Registered
L              350              100                     28.6%
H              40               10                      25.0%
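These tables are a numeric instance of the paradox: visitor type L registers at a higher rate than type H within each referring site, yet at a lower rate overall. A quick check of the arithmetic:

```python
def rate(registered, total):
    """Registration rate as a fraction."""
    return registered / total

# Within each referring site, L registers at a higher rate than H ...
assert rate(15, 30) > rate(60, 125)     # Alpha: 50.0% vs 48.0%
assert rate(100, 350) > rate(10, 40)    # Bravo: 28.6% vs 25.0%

# ... but when the sites are pooled, the comparison reverses,
# because most L visitors come from low-registering site Bravo.
l_total = (30 + 350, 15 + 100)          # (380 visitors, 115 registered)
h_total = (125 + 40, 60 + 10)           # (165 visitors, 70 registered)
assert rate(115, 380) < rate(70, 165)   # 30.3% vs 42.4%
```

The reversal happens because the sub-group sizes are unbalanced: 350 of the 380 type-L visitors come through Bravo, where everyone registers at a lower rate.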
Correlation and Causation
[Figure: scatter plot of Brick-and-Mortar sales vs. Internet sales, forecast values, r = 0.917]

Correlation and Causation
[Figure: the same scatter plot, now showing actual in addition to forecast values, r = 0.917]
Analysis of the Financial Services Data
This demonstration illustrates the analysis of the financial services data.
Section 3.2
Introduction to Clustering and Segmentation
Overview
 Introduce the segmentation problem with three general methods of solution.
 Relate clustering and segmentation to dimensionality reduction.
 Examine methods for partitioning higher-dimensional data into groups.
 Discuss strategies for working with higher-dimensional data.
– Understand and interpret the data.
– Preprocess the data for input into prediction and decisioning tools.
Web Applications of Clustering and Segmentation
 Can visitors/buyers be divided into meaningful groups with respect to business goals?
– Personalization
– Target marketing
 If distinct groups exist, then finding them can
– reduce dimensionality in deriving predictive models
– help explain outcomes
– suggest strategies for improving Web design and enhancing marketing efforts.
Customer Segmentation?
 A Priori
– not based on data analysis
 Unsupervised Classification
– alike with respect to several attributes
 Supervised Classification
– alike with respect to a target, defined by a set of inputs
Dimensionality Reduction
[Figure: page categories (Travel, Financial, Contests, Auctions, Search) mapped onto two axes, Dimension 1 and Dimension 2]
Unsupervised Learning Methods
 Projection methods
– Principal components
– Principal curves
– Projection pursuit
 Distance/dissimilarity methods
– Multidimensional scaling
– Correspondence analysis
Unsupervised Learning Methods
 Clustering methods
– Partitioning methods: k-means, trees
– Parametric mixture separation
– Hierarchical clustering
 Self-organizing maps
– Batch SOM
– Kohonen SOM
– Kohonen Vector Quantization
Distance/Dissimilarity Methods
Distances between points in higher-dimensional space can be preserved when moving to lower-dimensional space. For five coordinates (x, y, z, u, v):

d = √((x₁−x₂)² + (y₁−y₂)² + (z₁−z₂)² + (u₁−u₂)² + (v₁−v₂)²)
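The five-dimensional Euclidean distance above generalizes to any number of coordinates; a plain-Python sketch (illustrative only, not the course's SAS code):

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```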
Unsupervised Learning for Clustering
 A new data point is evaluated by calculating its distance from each cluster seed; the point is then assigned to the cluster with the closest seed.
 The cubic clustering criterion provides a methodology for selecting a "best" number of clusters when the number of clusters is not specified in advance.
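The nearest-seed assignment rule can be sketched in a few lines (illustrative Python with made-up seed coordinates; SAS EM's Cluster node performs this internally):

```python
from math import dist  # Euclidean distance, Python 3.8+

def assign_cluster(point, seeds):
    """Return the index of the seed closest to the point."""
    return min(range(len(seeds)), key=lambda i: dist(point, seeds[i]))

# Three hypothetical 2-D cluster seeds
seeds = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
```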
Unsupervised Learning Limitations
 Uniformly distributed data can be divided into clusters, but the clusters may have no business value.
 Unsupervised learning algorithms often do not handle categorical variables well.
 Different methods can produce substantially different clusters.
 Some unsupervised learning methods, for example hierarchical clustering, do not scale well.
Unsupervised Learning with Categorical Variables
Three binary variables (o.w. = otherwise):

Variable    Dummy Variable
Education   E = 1 if College Grad, 0 o.w.
Gender      G = 1 if Female, 0 o.w.
Voter       V = 1 if Registered Voter, 0 o.w.
Binary Data: Data Has Exactly 8 Perfect Clusters!
[Figure: unit cube with axes E, G, V; each of the eight corners is a cluster (one cluster is hidden from view)]
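Three binary dummies can take only 2³ = 8 value patterns, so every observation sits exactly on one corner of the cube and the "clusters" are trivially perfect. A quick illustration:

```python
from itertools import product

# Every possible (E, G, V) pattern is a corner of the unit cube.
corners = list(product((0, 1), repeat=3))

def cluster_of(e, g, v):
    """Each distinct dummy-variable pattern is its own 'perfect' cluster."""
    return corners.index((e, g, v))
```

This is why distance-based clustering adds little for purely categorical data: the groups are already determined by the value combinations themselves.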
Two Types of Unsupervised Learning Methods
Variable Level: Dimension reduction through variable mapping, for example, (x, y, z) → (u, v)
– Principal components, factor analysis
Observation Level: Cluster observations based on values of variables; categorical cluster segments replace the original variables
– Clustering, SOMs, VQ
Clustering Visitors to the Financial Services Web Site
This demonstration illustrates how to cluster visitors to the Financial Services Web site.
Section 3.3
Customer Profiling
Overview
 Identify target populations for profiling.
 Consider business reasons for profiling, with caveats.
 Exercise simple profiling methods using descriptive statistics and graphical techniques.
 Use model-based profiling methods.
 Demonstrate Web site visitor profiling using Enterprise Miner.
Who to Profile
 Web site visitors
 Registered users
 Customers
 Prospects
Why Profile
 Web site personalization
– Content
– Banner ads
– Up-sell/cross-sell
 Target marketing
 Customer retention
– Loyalty
– Anti-churn promotions
Simple Profiling Methods
 Descriptive statistics
 Visualization
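The descriptive-statistics approach is simply per-segment summaries of a measured behavior. A sketch with hypothetical session-duration records (segment names borrowed from the a-priori age profiling on the next slide):

```python
from statistics import mean, median

# Hypothetical (segment, session_duration_minutes) records.
sessions = [
    ("Teenager", 35), ("Teenager", 42), ("Young Adult", 18),
    ("Young Adult", 22), ("Senior Citizen", 8), ("Senior Citizen", 12),
]

def profile(records):
    """Per-segment mean and median of the measured value."""
    groups = {}
    for segment, value in records:
        groups.setdefault(segment, []).append(value)
    return {s: {"mean": mean(v), "median": median(v)}
            for s, v in groups.items()}
```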
Early Customer Profiling (A Priori Profiling)
[Figure: the AGE axis partitioned into a-priori segments: Teenager, Young Adult, Generation X, Baby Boomer, Middle Age, Senior Citizen]
A Profile Lattice
Cells are numbered by the quartile of Field 1 (rows) crossed with the quartile of Field 2 (columns):

Field 1 \ Field 2    Q1    Q2    Q3    Q4
Q1                    1     2     3     4
Q2                    5     6     7     8
Q3                    9    10    11    12
Q4                   13    14    15    16
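The cell numbering follows row-major order over the two quartile indices; a small sketch (1-based quartiles, as in the lattice):

```python
def lattice_cell(q1, q2, n=4):
    """Map Field 1 quartile (row) and Field 2 quartile (column) to a cell 1..n*n."""
    if not (1 <= q1 <= n and 1 <= q2 <= n):
        raise ValueError("quartile indices must be in 1..n")
    return (q1 - 1) * n + q2
```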
Visualization of the Profiles
[Figure: plot of the profiles on Frequency and Duration axes]
Model-Based Profiling Methods
 Dimension reduction techniques
 Clustering methods
– EM Cluster node
– EM Self-Organizing Maps (SOM)
– EM Kohonen Vector Quantization
RFD Profiles
[Figure: Recency, Frequency, and Duration axes with a principal curve and a normalized profile]

RFD Blimp
[Figure: blimp-shaped point cloud with its major axis and minor axes marked]
Profile Analysis of ACME.com Data
This demonstration illustrates a profile analysis of ACME.com data.
Demonstration: Profile Analysis
[Figure: high-dimensional data summarized by principal curves of grouped attributes to form customer profiles]
Unsupervised Learning Rules-of-Thumb
 K-means clustering is preferred when there are few missing values and when clusters are not expected to fall within a rectangular lattice.
 SOMs are appropriate when the problem formulation supports a lattice of clusters.
 Kohonen VQ may work better than k-means clustering when missing values are present.
ACME Customer Profiles
 Segment 1: Game boy
 Segment 2: Information hungry
 Segment 3: Web addicts
 Segment 4: Short attention span
 Segment 5: Music lovers
 Segment 6: Celebrity crazy (teenagers?)
 Segment 7: Couch potatoes