Data Mining & Visualization


MGS 4020
Business Intelligence
Data Mining and Data Visualization
Apr 16, 2013
Georgia State University - Confidential
MGS4020_10.ppt/Apr 16, 2013/Page 1
Agenda
• Data Mining
• Marketing Analytics
• Example
What is Data Mining?
• A set of activities used to find new, hidden, or unexpected patterns in data
• Verification versus Discovery
• Accuracy in predicting consumer behavior
OLAP – Online Analytical Processing
• MOLAP – Multidimensional OLAP
• ROLAP – Relational OLAP
[Diagram labels: Data Warehouse / Data Mart, RDBMS]
Techniques and Technologies
• Techniques Used to Mine the Data
  • Classification
  • Association
  • Sequence
  • Cluster
• Data Mining Technologies
  • Statistical Analysis
  • Neural Networks, Genetic Algorithms and Fuzzy Logic
  • Decision Trees
Market Basket Analysis
• Market Basket Analysis
  • Most common and useful in Marketing
  • Identifies which products customers purchase together
    Example: Diapers and Beer sell well on Thursday nights
• Benefits
  • Better target marketing
  • Product positioning within stores (virtual stores)
  • Inventory management
• Limitations
  • Large volume of real transactions needed
  • Difficult to correlate frequently purchased items with infrequently purchased items
  • Results of previous transactions could have been affected by other marketing promotions
Market Basket Analysis
Association Rules for Market Basket Analysis
• All associations are unidirectional and take the following form:
  Left-hand side rule IMPLIES Right-hand side rule
• The left-hand and right-hand sides can both contain multiple items (Multidimensional Market Analysis)
• Examples:
  Steak IMPLIES Red Wine
  Hunting Magazines IMPLIES Smokeless Tobacco
Market Basket Analysis
3 Measures of Market Basket Analysis
• Support – the percentage of baskets in the analysis where the rule is true
  • Of 100 baskets, 11 contained both steak and red wine.
  • 11% support
• Confidence – the percentage of baskets containing the left-hand side items that also contain the right-hand side items
  • Of the 17 baskets that contained steak, 11 contained red wine.
  • 65% confidence
• Lift – compares the rule's confidence with the likelihood of finding the right-hand side item in any random basket
  • Also referred to as Improvement
  • Lift of less than 1 means the rule is less predictive than random choice
  • If Confidence is 35%, but the right-hand side item is in 40% of the baskets, the rule offers no Improvement over random selection (lift = 35% / 40% ≈ 0.88).
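The three measures can be computed directly from basket counts. The sketch below is illustrative only: it uses the slide's Steak IMPLIES Red Wine counts (100 baskets, 17 with steak, 11 with both), but the number of baskets containing red wine is not given on the slide, so the value 25 is an assumption made purely to show the lift calculation. The function name rule_measures is also hypothetical.

```python
def rule_measures(n_baskets, n_lhs, n_rhs, n_both):
    """Return (support, confidence, lift) for the rule LHS IMPLIES RHS."""
    support = n_both / n_baskets      # share of all baskets where the rule holds
    confidence = n_both / n_lhs       # share of LHS baskets that also contain RHS
    rhs_rate = n_rhs / n_baskets      # chance of finding RHS in a random basket
    lift = confidence / rhs_rate      # > 1 means better than random selection
    return support, confidence, lift


# Steak IMPLIES Red Wine. n_rhs=25 is an ASSUMED count, not from the slide.
support, confidence, lift = rule_measures(n_baskets=100, n_lhs=17, n_rhs=25, n_both=11)
print(f"support={support:.0%} confidence={confidence:.0%} lift={lift:.2f}")

# The slide's lift example: 35% confidence against a 40% base rate.
weak_lift = 0.35 / 0.40
print(f"lift={weak_lift:.3f} (below 1, so no improvement over random selection)")
```

Lift is computed here as confidence divided by the right-hand side's base rate, which is the usual definition and matches the slide's 35% versus 40% comparison.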
Market Basket Analysis
Market Basket Analysis results can be:
• Trivial
  • Hot Dogs IMPLIES Hot Dog Buns
  • TV IMPLIES TV Warranty
• Inexplicable
Virtual Items – Including non-items or other attributes (for example, “New Customer”) in the correlation study
Limitations of Data Mining
• All relevant data items / attributes may not be collected by the operational systems
• Data noise or missing values (data quality)
• Large database requirements and multi-dimensionality
Agenda
• Data Mining
• Marketing Analytics
• Example
Why use Analytics?
Some Benefits Are Quantifiable
• 15% to 51%+ increase in net sales
• For one product over a 3 yr period, $650mm in cost savings & over $350mm in incremental contribution
• >50% more accurate targeting of likely residential movers
• ROI of over 2500%
• Annual incremental revenue of > $178mm
• 24% reduction in churn rate from modeling/targeting likely churners
Other Benefits Not So Easily Quantified
• Decisions based on exhibited behaviors
• Makes data actionable
• Easier to measure results
• Validate instincts and opinions
• Enhanced what-if analysis & planning
• Less guesswork, more facts
• Built-in process improvement
Advanced analytics can help to answer the following questions …
• How do I determine which offers to make to my customers?
• What do my best customers look like, and where can I find more of them?
• What is the return on my marketing investment? How might my marketing plans be tweaked to optimize investment?
• Who are my most valuable customers? What are my key value drivers?
• Which of my customers have the greatest potential for growth – and which have little or no potential?
• Which of my customers are most vulnerable? What are the triggers causing them to leave or churn?
• Where should I employ my assets to meet customer demand?
Marketing Analytics Landscape
Strategy & Tactics: Guiding the business & helping to make numbers
  Business Planning, Forecasting, Corp Strategy, Financial Metrics, Profitability Analysis
Acquisition – Where can I find new customers?
Growth – Where can I find more revenue & profit from my current customers?
Acquisition and Growth techniques:
• Customer Acquisition
• Propensity to buy & response modeling
• Prospect profiling
• Marketing Optimization
• Event driven marketing
• Market Basket Analysis
• Online and Retail Channels
Retention – Which of my customers are at risk and how can I keep them?
• Customer and product churn modeling
• Retentive stickiness of key products
Reacquisition – Which customers do I want to win back?
• Customer reacquisition
• Customer profitability analysis
• Prediction of key events (e.g., residential movers)
Customer Knowledge – Who are my customers?
  Segmentation & Profiles, External Data, Mkt Share/Wallet Share, Channel Preference Modeling
Direct Marketing Campaign Platform
[Diagram: customer lifecycle from ACQUIRE through ACTIVATION and RETAIN to REACTIVATE]
ACQUIRE (different channels)
• POS
• Partners
• Store
• Advertising
ACTIVATION
• Purchased promotion
• E-mail address
RETAIN (customers ranked from highest value to lowest value; PURCHASE / NO PURCHASE)
Vehicles:
• Statements
• Newsletters
• Inserts
• Direct mail
• Personalized kits
• E-mail
• Telephone
Triggered Promotions (for example)
• Days since last purchase = X
  • X = 30 days for PTNM
  • X = 60 days for GOLD
  • X = 120 days for CLUB
• Test Area
• Downgrade trigger
REACTIVATE (< 1 purchase in last 12 mo)
Vehicles:
• Direct Mail
• E-mail
• Statements
Decision rules:
• If: Vc ≥ Cost to reactivate
• If: Vc < Cost to reactivate
• If: Time since inactive = X, and Point balance > X
“FIRE”
Ugly Postcard???
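The triggered-promotion rules on this slide amount to a small lookup table keyed by membership tier. The following is a minimal sketch, not the platform's actual logic: the function names are hypothetical, "Days since last purchase = X" is read as "at least X days", and Vc is assumed to stand for the value of the customer.

```python
# Days-since-last-purchase thresholds per tier, as listed on the slide.
TRIGGER_DAYS = {"PTNM": 30, "GOLD": 60, "CLUB": 120}


def promotion_due(tier, days_since_last_purchase):
    """True once a member has gone at least X days without a purchase.

    Reading "= X" as "at least X" is an assumption.
    """
    return days_since_last_purchase >= TRIGGER_DAYS[tier]


def worth_reactivating(customer_value, cost_to_reactivate):
    """Apply the slide's legible rule: do not reactivate when Vc < cost."""
    return not (customer_value < cost_to_reactivate)


print(promotion_due("PTNM", 30))   # Platinum member, 30 days idle
print(promotion_due("GOLD", 45))   # Gold member, 45 days idle
print(worth_reactivating(customer_value=50, cost_to_reactivate=20))
```

Keeping the thresholds in one dictionary makes the tier-specific values easy to test and adjust, which suits the slide's "Test Area" idea of trying different triggers.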
General Data Mining Methods
Classification:
• Predicting which customers will purchase, based on demographics, psychographics, firmographics, service history, transactions, credit history, etc. Statistical algorithms and decision trees are used for these problems with much success.
Association:
• Market Basket Analysis: which customers who purchase an additional telephone line are also likely to purchase dialup internet service? Pattern matching works well: associative rules, fuzzy logic, neural networks.
Sequencing:
• Which types of activities precede each other; e.g., do customer hospitality and gaming activities show patterns or sequences? We use a combination of statistical modeling and simulations to identify these trigger points for action, and to estimate the marginal value of each.
Clustering:
• Clustering is useful for determining similar groups based on how closely they resemble each other. A multitude of clustering techniques exist, with the primary difference being in how they define what is “close”. Clustering can be very useful for marketing messaging and advertising, strategy development and implementation, and channel development.
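The point that clustering techniques differ mainly in how they define “close” can be seen with two common distance measures. This is a hedged sketch: the two cluster centers, the customer point, and all function names are made up for illustration, and Euclidean and Manhattan distance are only two of many possible definitions.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences along each attribute."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(point, centers, distance):
    """Name of the cluster center closest to point under the given distance."""
    return min(centers, key=lambda name: distance(point, centers[name]))


# Hypothetical cluster centers on two standardized attributes.
centers = {"P": (3.0, 3.0), "Q": (0.0, 5.0)}
customer = (0.0, 0.0)

print(nearest(customer, centers, euclidean))
print(nearest(customer, centers, manhattan))
```

With these numbers the same customer is assigned to different clusters depending on the distance used, which is why the choice of measure deserves attention before interpreting segments.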
Analytics Process
[Diagram: five-phase analytics process]
DISCOVERY
• Identifying opportunities
• Scoping effort
• Objective setting
• Developing hypotheses
DATA PREPARATION
• Data warehouse
• External data append
• Data extraction
• Data validation
KNOWLEDGE DEVELOPMENT
• Hypothesis testing
• Segmentation
• Offer optimization
• Statistical modeling
• Customer behavior scoring
LEVERAGING ANALYTICS
• Direct mail
• Telemarketing
• Email
• Loyalty campaign
POST ANALYSIS FEEDBACK
• Results decomposition
• Feedback for refining analytics
Summary
• Analytics allow quantifiable, intelligent decision making
• Analytics can be leveraged across all areas of a business
• Different analytical methods apply to different situations
• Modeling enables you to combine potentially hundreds of factors into a single decision metric (or a few key scores/clusters)
• Analytics are more powerful when tied to bottom line profitability
Agenda
• Data Mining
• Marketing Analytics
• Example
InterContinental Brand Reactivation Promotion
• Frequent travelers (points collectors) who had 1+ stays at InterContinental hotels in the US between Jan 1, 2001 and Jun 30, 2002.
• Frequent travelers (points collectors) who had 0 stays at InterContinental hotels in the US between Jul 1, 2002 and Dec 31, 2003.
• A set of activities used to find new, hidden, or unexpected patterns in data
• Accuracy in predicting these consumers' behavior and reactivating them
SQL
-- Lapsed InterContinental guests: 1+ US stays from Jan 1, 2001 to Jun 30, 2002
-- and no US stays from Jul 1, 2002 to Dec 31, 2003.
SELECT
    MBR.MEMBERSHIP_ID,
    MBR.FIRST_NAME,
    MBR.LAST_NAME,
    MBR.ADDR_LINE_1,
    MBR.ADDR_LINE_2,
    MBR.ADDR_LINE_3,
    MBR.ADDR_LINE_4,
    MBR.ADDR_LINE_5,
    MBR.CITY,
    MBR.STATE_DESTINATION,
    MBR.ZIP_CODE,
    MBR.TYPE,
    -- Count stays in each window with conditional aggregation.
    SUM (CASE WHEN EVENT.CHECK_OUT_DATE BETWEEN '01-01-2001' AND '06-30-2002'
         THEN 1 ELSE 0 END) AS ONE_PLUS_STAYS,
    SUM (CASE WHEN EVENT.CHECK_OUT_DATE BETWEEN '07-01-2002' AND '12-31-2003'
         THEN 1 ELSE 0 END) AS ZERO_STAYS
FROM
    MBR,
    EVENT,
    PROPERTY,
    XREF
WHERE
    -- Join conditions
    ( MBR.MEMBERSHIP_ID=XREF.MEMBERSHIP_ID )
    AND ( PROPERTY.PROPERTY_ID=EVENT.PROPERTY_ID )
    AND ( EVENT.MEMBERSHIP_ID=XREF.MEMBERSHIP_ID )
    -- Member and property filters
    AND (
        MBR.MARKET_REGION_CODE = '05388'
        AND MBR.TYPE IN ('BASE','GOLD','PLTNM')
        AND MBR.PREF_ALLIANCE_CODE = 'POINT'
        AND PROPERTY.BRAND_MAJOR_CODE = 'INTERCONTINENTAL'
        AND PROPERTY.MARKET_REGION = 'US'
    )
GROUP BY
    MBR.MEMBERSHIP_ID,
    MBR.FIRST_NAME,
    MBR.LAST_NAME,
    MBR.ADDR_LINE_1,
    MBR.ADDR_LINE_2,
    MBR.ADDR_LINE_3,
    MBR.ADDR_LINE_4,
    MBR.ADDR_LINE_5,
    MBR.CITY,
    MBR.STATE_DESTINATION,
    MBR.ZIP_CODE,
    MBR.TYPE
HAVING
    -- Some SQL dialects do not accept column aliases here and need the SUM(...) expressions repeated.
    ONE_PLUS_STAYS >= 1 AND
    ZERO_STAYS = 0
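The core of the query above is conditional aggregation: count stays per member in two date windows, then keep members with at least one stay in the first window and none in the second. The sketch below reproduces only that pattern on a toy table using Python's built-in sqlite3 module. The table layout, the sample rows, and the ISO date format are assumptions for illustration, and the HAVING clause repeats the SUM expressions rather than using aliases so that it does not depend on dialect behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (membership_id INTEGER, check_out_date TEXT);
INSERT INTO event VALUES
  (1, '2001-03-15'),   -- member 1: stayed only in the first window
  (2, '2001-09-01'),   -- member 2: stayed in both windows
  (2, '2003-02-10'),
  (3, '2002-11-20');   -- member 3: stayed only in the second window
""")

FIRST = "SUM(CASE WHEN check_out_date BETWEEN '2001-01-01' AND '2002-06-30' THEN 1 ELSE 0 END)"
SECOND = "SUM(CASE WHEN check_out_date BETWEEN '2002-07-01' AND '2003-12-31' THEN 1 ELSE 0 END)"

query = f"""
SELECT membership_id, {FIRST} AS one_plus_stays, {SECOND} AS zero_stays
FROM event
GROUP BY membership_id
HAVING {FIRST} >= 1 AND {SECOND} = 0
ORDER BY membership_id
"""

# Members who stayed in the first window and then lapsed.
lapsed = [row[0] for row in conn.execute(query)]
print(lapsed)
```

Only the member who stayed in the first window and not the second is returned, which is the reactivation target group described on the promotion slide.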
Cluster Analysis
• Definition: The identification and grouping of consumers that share similar characteristics
• Yields: better understanding of prospects/customers
• Translates into: improved business results through revised strategies
attributes
• Process:
  • Data Selection
  • Missing Values
  • Standardization
  • Removal of Outliers
  • Cluster Analysis Considerations
Cluster Analysis
• Only want a small subset of variables for clustering
• Weed out undesirable variables
• Can use PROC FACTOR, PROC CORR
• Can use expert system
• Consideration for observations, weighting
• Probably done with factor analysis
• If not, then two options
  • Set Missing to Mean of data
  • Set Missing to Value of Equivalent Performance
• No right or wrong answer
• Might do both - depending on variables
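Two of the preparation steps named for cluster analysis, filling missing values with the mean and standardization, are simple enough to sketch with the standard library. This is an illustration under assumptions: the income values are hypothetical, the function names are made up, and z-score scaling is one common way to standardize, not necessarily the one used in the course.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical household incomes in thousands, with one missing value.
income = [48, None, 102, 73, 71]
filled = impute_mean(income)
z = standardize(filled)

print(filled)
print([round(v, 2) for v in z])
```

Standardizing before clustering keeps a variable measured in large units, such as income, from dominating the distance calculation over variables measured in small units, such as number of stays.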
Clustering
[Diagram: Prospect Base divided into clusters]
• Midscale / Business Traveler
• Midscale / Leisure Traveler
• Upscale / Leisure Traveler
• Country Club / Resort Set
• Upscale / Business Traveler – Prosperous Traveler
• Upscale / Business Traveler – Loan Dependent
• Other
Cluster Analysis
Attribute (by Cluster Name)                              A     B     C     D     E (ALL)
Age of Head of Household                                38    62    48    44    52    43
Length of Residence in high income group zip codes       7    12     9     6     7     7
Household Income (,000)                                 48    45   102    73    71    72
Weekday Check in                                        13     1     3     6     2     3
Weekend Check in                                        69     6    29    51     7    30
No. Stays (resort)                                       0     5     6     5     3     2
No. Stays (mid properties)                              11    55    21    15    32    16
No. Stays (upscale properties)                          24     2    10    15     8     7
Stay counts cover Jan 1, 2001 to Jun 30, 2002.
Cluster Analysis
Cluster   Population %   Resp. Index   Avg. Profit
A                    6           250          (75)
B                   16            30             5
C                    5           110            48
D                    8           175            86
E                    7            80           (5)
...                ...           ...           ...
All                100           100            35
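The cluster table above shows why response alone is a poor selection criterion: cluster A has the highest response index but a negative average profit. The sketch below ranks clusters for mailing by average profit. It is only one possible selection rule, the function name is hypothetical, and reading the parenthesized values (75) and (5) as negative numbers is an assumption based on accounting convention.

```python
# Figures from the cluster table; parenthesized profits are treated as negative.
clusters = {
    "A": {"population_pct": 6, "resp_index": 250, "avg_profit": -75},
    "B": {"population_pct": 16, "resp_index": 30, "avg_profit": 5},
    "C": {"population_pct": 5, "resp_index": 110, "avg_profit": 48},
    "D": {"population_pct": 8, "resp_index": 175, "avg_profit": 86},
    "E": {"population_pct": 7, "resp_index": 80, "avg_profit": -5},
}


def select_for_mail(clusters, min_profit=0):
    """Clusters whose average profit exceeds min_profit, most profitable first."""
    chosen = [name for name, c in clusters.items() if c["avg_profit"] > min_profit]
    return sorted(chosen, key=lambda name: clusters[name]["avg_profit"], reverse=True)


mail_list = select_for_mail(clusters)
best_responder = max(clusters, key=lambda name: clusters[name]["resp_index"])

print(mail_list)
print(best_responder, best_responder in mail_list)
```

A fuller version would weigh risk against return per offer and per cluster; this sketch covers only the profit screen.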
Cluster Analysis
[Diagram: mail selection process by cluster and offer]
Inputs: Cluster 1, Cluster 1, ..., Cluster 1 crossed with DM/Offer 1, DM/Offer 2, ..., DM/Offer N
Steps:
• Calculate Scores (ROI, Response, Utilization)
• Overlay Profitability Estimate
• Evaluate Risk-Return Tradeoff (by Offer and by Cluster)
• Make Final Selections
Chart: RISK (Low to High) against RETURN (Low to High), split into Mail and No-Mail regions