Transcript Slide 1
Market Basket Analysis & Neural Networks
(chaps 7 & 11)
Retail Checkout Data
11-2
MARKET BASKET ANALYSIS
• INPUT: list of purchases by purchaser
– do not have names
• Identify purchase patterns
– what items tend to be purchased together
• obvious: steak-potatoes; beer-pretzels
– what items are purchased sequentially
• obvious: house-furniture; car-tires
– what items tend to be purchased by season
McGraw-Hill/Irwin
©2007 The McGraw-Hill Companies, Inc. All rights reserved
11-3
Market Basket Analysis
• Categorize customer purchase behavior
• Identify actionable information
– purchase profiles
– profitability of each purchase profile
– use for marketing
• layout or catalogs
• select products for promotion
• space allocation, product placement
11-4
Market Basket Analysis
• Affinity Positioning
– coffee, coffee makers in close proximity
• Cross-Selling
– cold medicines, tissue, orange juice
– Monday Night Football kiosks on Monday p.m.
11-5
Possible Market Baskets
Customer 1: beer, pretzels, potato chips, aspirin
Customer 2: diapers, baby lotion, grapefruit juice, baby food, milk
Customer 3: soda, potato chips, milk
Customer 4: soup, beer, milk, ice cream
Customer 5: soda, coffee, milk, bread
Customer 6: beer, potato chips
11-6
Co-occurrence Table
             Beer  Pot. Chips  Milk  Diapers  Soda
Beer            3           2     1        0     0
Pot. Chips      2           3     1        0     1
Milk            1           1     4        1     2
Diapers         0           0     1        1     0
Soda            0           1     2        0     2
beer & potato chips - makes sense
milk & soda - probably noise
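The co-occurrence counts can be reproduced from the six baskets on the Possible Market Baskets slide. A minimal Python sketch, assuming each basket is treated as a set of item names; the function and variable names are illustrative and not from the text:

```python
from itertools import combinations_with_replacement

# The six baskets from the "Possible Market Baskets" slide.
baskets = [
    {"beer", "pretzels", "potato chips", "aspirin"},
    {"diapers", "baby lotion", "grapefruit juice", "baby food", "milk"},
    {"soda", "potato chips", "milk"},
    {"soup", "beer", "milk", "ice cream"},
    {"soda", "coffee", "milk", "bread"},
    {"beer", "potato chips"},
]
items = ["beer", "potato chips", "milk", "diapers", "soda"]


def cooccurrence(baskets, items):
    """Count the baskets that contain both items of each pair."""
    counts = {}
    for a, b in combinations_with_replacement(items, 2):
        n = sum(1 for basket in baskets if a in basket and b in basket)
        counts[(a, b)] = counts[(b, a)] = n   # the table is symmetric
    return counts


counts = cooccurrence(baskets, items)
for row in items:
    print(f"{row:<13}" + "".join(f"{counts[(row, col)]:>4}" for col in items))
```

The diagonal holds the number of baskets containing each item on its own, so the same dictionary also supplies the per-item totals.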
11-7
Jaccard Coefficient
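Ratio of cases together over total cases
(values shown: baskets containing both items divided by the sum of the two items’ basket counts, e.g. beer & potato chips = 2 / (3 + 3) = 0.333)
          Beer   PotChip  Milk   Diapers
PotChip   0.333
Milk      0.143  0.143
Diapers   0      0        0.200
Soda      0      0.200    0.333  0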
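The coefficients on this slide equal the joint count divided by the sum of the two items' individual counts. The set-theoretic Jaccard coefficient divides instead by the number of baskets containing either item, which gives larger values (0.5 rather than 0.333 for beer and potato chips). A sketch of both calculations, using counts from the six example baskets; the function names are illustrative:

```python
# Basket counts per item and per pair, from the six example baskets.
item_count = {"beer": 3, "potato chips": 3, "milk": 4, "diapers": 1, "soda": 2}
pair_count = {
    ("beer", "potato chips"): 2,
    ("beer", "milk"): 1,
    ("potato chips", "milk"): 1,
    ("milk", "diapers"): 1,
    ("potato chips", "soda"): 1,
    ("milk", "soda"): 2,
}


def both(a, b):
    """Baskets containing both items; pairs not listed never co-occur."""
    return pair_count.get((a, b), pair_count.get((b, a), 0))


def slide_ratio(a, b):
    """Ratio behind the slide's values: both / (count of a + count of b)."""
    return both(a, b) / (item_count[a] + item_count[b])


def jaccard(a, b):
    """Set-theoretic Jaccard: both / (baskets containing a or b)."""
    return both(a, b) / (item_count[a] + item_count[b] - both(a, b))


for a, b in pair_count:
    print(f"{a} & {b}: slide {slide_ratio(a, b):.3f}, Jaccard {jaccard(a, b):.3f}")
```

Both measures rank the pairs similarly here; they differ only in whether the shared baskets are counted once or twice in the denominator.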
11-8
Market Basket Analysis
• Steve Schmidt - president of ACNielsen U.S.
• Market Basket Benefits
– selection of promotions, merchandising
strategy
• sensitive to price: Italian entrees, pizza, pies,
Oriental entrees, orange juice
– uncover consumer spending patterns
• correlations: orange juice & waffles
– joint promotional opportunities
11-9
Market Basket Analysis
• Retail outlets
• Telecommunications
• Banks
• Insurance
– link analysis for fraud
• Medical
– symptom analysis
11-10
Market Basket Analysis
• Chain Store Age Executive (1995)
1) Associate products by category
2) What % of each category was in each market
basket
• Customers shop on personal needs, not
on product groupings
11-11
Purchase Profiles
Beauty conscious
Kids’ play
Smoker
Health conscious
Casual drinker
Pet lover
Sports conscious
New family
Gardener
Men’s image conscious
Casual reader
Hobbyist
Convenience food
Sentimental
Illness (OTC)
Home handyman
Automotive
Illness (prescription)
TV/stereo enthusiast
Photographer
Personal care
Seasonal/traditional
Homemaker
Men’s fashion
Student/home office
Home Comfort
Kids’ fashion
Fashion footwear
Women’s fashion
11-12
Purchase Profiles
• Beauty conscious
– cotton balls
– hair dye
– cologne
– nail polish
11-13
Purchase Profile Use
• Each profile has an average profit per basket
Kids’ fashion         $15.24   Push these
Men’s fashion         $13.41   Push these
….
Smoker                 $2.88   Don’t push these
Student/home office    $2.55   Don’t push these
11-14
Market Basket Analysis
• LIMITATIONS
– takes over 18 months to implement
– market basket analysis only identifies
hypotheses, which need to be tested
• neural network, regression, decision tree analyses
– measurement of impact needed
– difficult to identify product groupings
– complexity grows exponentially
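To see why complexity grows exponentially: with n distinct products there are 2^n - 1 non-empty item combinations that could be counted. A short illustration using Python's standard library; the product counts are arbitrary examples:

```python
from math import comb


def itemsets_of_size(n_products, k):
    """Number of distinct k-item combinations among n_products."""
    return comb(n_products, k)


def all_itemsets(n_products):
    """Number of non-empty item combinations of any size: 2^n - 1."""
    return 2 ** n_products - 1


# Columns: products, pairs, triples, all non-empty combinations.
for n in (5, 10, 20, 100):
    print(n, itemsets_of_size(n, 2), itemsets_of_size(n, 3), all_itemsets(n))
```

Pairs alone grow only quadratically, which is one reason a pairwise co-occurrence table stays manageable while larger groupings do not.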
11-15
Market Basket Analysis
• BENEFITS:
– simple computations
– can be undirected (don’t have to have
hypotheses before analysis)
– different data forms can be analyzed
11-16
Market Basket Software
• Market Basket Analysis is highly
unstructured
• Most popular data mining software doesn’t support it
– Clementine does
• Specialty software market for this specific
purpose
– DataSage Customer Analysis
– Xaffinity
Neural Networks
Automatic Model Building (Machine Learning)
Artificial Intelligence
11-18
High-Growth Product
• Used for classifying data
– target customers
– bank loan approval
– hiring
– stock purchase
– trading electricity
– DATA MINING
• Used for prediction
11-19
Description
• Use network of connected nodes (in
layers)
• Network connects input, output
(categorical)
– inputs like independent variable values in
regression
– outputs: {buy, don’t} {paid, didn’t}
{red, green, blue, purple}
{character recognition - alphabetic characters}
11-20
Perceptron
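Figure: perceptron diagram. Inputs I1, I2, I3, ..., In are each scaled by a synaptic weight W1, W2, W3, ..., Wn and enter the neuron together with a bias; the neuron applies F(x) to produce output O.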
Basic building block
Composed of Synaptic Weights and Neuron
Weights scale the input values
Combination of weights and transfer function F(x) transforms inputs to needed output O
Trained by changing weights until desired output is achieved
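A minimal sketch of a single perceptron in Python. The slide does not name a transfer function or a training rule, so this assumes a step function for F(x) and the classic perceptron learning rule; all function names and the learning rate are illustrative:

```python
def step(x):
    """Assumed transfer function F(x): output 1 when x is non-negative."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Scale each input by its weight, add the bias, then apply F(x)."""
    total = bias + sum(w * i for w, i in zip(weights, inputs))
    return step(total)


def train(samples, n_inputs, rate=0.1, max_epochs=100):
    """Change the weights until every sample yields its desired output."""
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in samples:
            delta = desired - predict(weights, bias, inputs)
            if delta != 0:
                errors += 1
                bias += rate * delta
                weights = [w + rate * delta * i for w, i in zip(weights, inputs)]
        if errors == 0:   # desired output achieved on all samples
            break
    return weights, bias


# Logical OR: a small, linearly separable toy problem.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(or_samples, n_inputs=2)
print([predict(weights, bias, x) for x, _ in or_samples])
```

A single perceptron can only separate classes with a straight line (or plane), which is why the networks on the following slides stack several layers.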
11-21
Network
Figure: network diagram showing the Input Layer, Hidden Layers, and Output Layer, with outputs Good and Bad.
11-22
Operation
• Randomly generate weights on model
– based on brain neurons
• input electrical charge transformed by neuron
• passed on to another neuron
– weight input values, pass on to next layer
– predict which of the categorical outputs is true
• Measure fit
– fine tune around best fit
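The operation above can be sketched in a few lines of Python: generate random weights, pass weighted inputs from layer to layer, measure fit, and fine tune around the best fit found so far. This is an illustration under stated assumptions, not how any particular package works: the sigmoid transfer function, the squared-error fit measure, the random-perturbation tuning (commercial tools typically use backpropagation), and the toy data are all choices made for this example.

```python
import math
import random


def sigmoid(x):
    """Assumed transfer function; the clamp avoids overflow in math.exp."""
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def random_weights(sizes, rng):
    """Per layer, one weight list per node: bias first, then input weights."""
    return [
        [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(weights, inputs):
    """Weight the input values and pass the results on, layer by layer."""
    values = list(inputs)
    for layer in weights:
        values = [
            sigmoid(node[0] + sum(w * v for w, v in zip(node[1:], values)))
            for node in layer
        ]
    return values


def misfit(weights, samples):
    """Measure fit: sum of squared errors over all samples (lower is better)."""
    return sum(
        (out - want) ** 2
        for inputs, target in samples
        for out, want in zip(forward(weights, inputs), target)
    )


def fine_tune(weights, samples, rng, steps=2000, scale=0.5):
    """Perturb the best weights so far; keep a change only if fit improves."""
    best, best_err = weights, misfit(weights, samples)
    for _ in range(steps):
        trial = [
            [[w + rng.gauss(0, scale) for w in node] for node in layer]
            for layer in best
        ]
        err = misfit(trial, samples)
        if err < best_err:
            best, best_err = trial, err
    return best, best_err


rng = random.Random(0)
# Hypothetical toy data: two inputs, one output (an XOR pattern).
samples = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
start = random_weights([2, 3, 1], rng)   # 2 inputs, 3 hidden nodes, 1 output
tuned, err = fine_tune(start, samples, rng)
print(f"misfit before: {misfit(start, samples):.3f}, after: {err:.3f}")
```

Because a change is kept only when it improves the fit, the misfit after tuning can never be worse than the starting value, although it is not guaranteed to reach zero.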
11-23
Operation
• Useful for PATTERN RECOGNITION
• Can sometimes substitute for
REGRESSION
– works better than regression if relationships
nonlinear
– MAJOR RELATIVE ADVANTAGE OF NEURAL
NETWORKS:
YOU DON’T HAVE TO UNDERSTAND THE MODEL
11-24
Neural Network Testing
• Usually train on part of available data
– package tries weights until it successfully categorizes
a selected proportion of the training data
• When trained, test model on part of data
– if given proportion successfully categorized, quits
– if not, works some more to get better fit
• The “model” is internal to the package
• Model can be applied to new data
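The train-then-test routine can be sketched as follows. This is a simplified stand-in for what a package does internally: it assumes a single perceptron as the model, a 70/30 split, a 0.9 target proportion, and synthetic data with a deliberate gap between the two classes. None of these choices come from the text.

```python
import random


def classify(weights, row):
    """Return 1 if bias plus weighted inputs is non-negative, else 0."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1 if total >= 0 else 0


def accuracy(weights, data):
    """Proportion of (row, label) pairs categorized correctly."""
    return sum(classify(weights, row) == label for row, label in data) / len(data)


def train_until(train, target, max_passes=1000, rate=0.1):
    """Try weights until `target` proportion of the training data is correct."""
    weights = [0.0] * (len(train[0][0]) + 1)
    for _ in range(max_passes):
        if accuracy(weights, train) >= target:
            break
        for row, label in train:
            delta = label - classify(weights, row)   # 0 when already correct
            weights[0] += rate * delta
            for j, x in enumerate(row):
                weights[j + 1] += rate * delta * x
    return weights


rng = random.Random(1)
# Hypothetical data: label is 1 when the two inputs sum to more than 1.
rows = []
while len(rows) < 200:
    x, y = rng.random(), rng.random()
    if abs(x + y - 1) >= 0.1:          # keep a gap between the two classes
        rows.append((x, y))
data = [(row, 1 if row[0] + row[1] > 1 else 0) for row in rows]
train, test = data[:140], data[140:]   # train on part, hold out the rest

target = 0.9
weights = train_until(train, target)
test_ok = accuracy(weights, test) >= target   # if False, train some more
print(f"train: {accuracy(weights, train):.2f}, "
      f"test: {accuracy(weights, test):.2f}, meets target: {test_ok}")
```

The held-out rows play no part in setting the weights, so the test proportion is the more honest estimate of how the model will behave on new data.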
11-25
Business Application
• Best in classifying data
mortgage underwriting
bond rating
commodity trading
asset allocation
fraud prevention
• Predicting interest rate, inventory
firm failure
bank failure
takeover vulnerability
stock price
corporate merger profitability
11-26
Neural Network Process
1. Collect data
2. Separate into training, test sets
3. Transform data to appropriate units
• Categorical works better, but not necessary
4. Select, train, & test the network
• Can set number of hidden layers
• Can set number of nodes per layer
• A number of algorithmic options
5. Apply (need to use system on which built)
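Step 3 is often the least obvious. One common way to put data in appropriate units, offered here as an assumption rather than the text's prescription, is to rescale numeric fields to the range 0 to 1 and to encode categorical fields as 0/1 indicator columns. The field names and values below are hypothetical:

```python
def min_max_scale(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector, one position per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical records: (income, occupation).
incomes = [20000, 50000, 80000]
occupations = ["clerk", "engineer", "clerk"]

scaled = min_max_scale(incomes)
encoded = one_hot(occupations)
inputs = [[s] + e for s, e in zip(scaled, encoded)]   # one input row per record
print(inputs)
```

The same scaling limits and category list must be reused when the trained network is applied to new data, otherwise the inputs will not mean what they meant during training.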
11-27
Marketing Applications
• Direct marketing
– database of prospective customers
• age, sex, income, occupation, education, location
• predict positive response to mail solicitations
• THIS IS HOW DATA MINING CAN BE
USED IN MICROMARKETING
11-28
Neural Nets to Predict Bankruptcy
Wilson & Sharda (1994)
Monitor firm financial performance
Useful to identify internal problems, investment evaluation, auditing
Predict bankruptcy - multivariate discriminant analysis of financial ratios
(develop formula of weights over independent variables)
Neural network - inputs were 5 financial ratios - data from Moody’s
Industrial Manuals (129 firms, 1975-1982; 65 went bankrupt)
Tested against discriminant analysis
Neural network significantly better
11-29
CASE: Support CRM
Drew et al. (2001), Journal of Service Research
• Identify customers to target
• Customer hazard function:
– Likelihood of leaving to a competitor
(CHURN)
• Gain in Lifetime Value (GLTV)
– NPV: weight EV by prob{staying}
– GLTV: quantified potential financial effects of
company actions to retain customers
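One plausible reading of these definitions, sketched in Python: lifetime value is the net present value of expected margin, with each period weighted by the probability the customer is still there, and GLTV is the difference a retention action makes. The formula details, the discount rate, the margin, and the hazard values are assumptions for illustration and are not taken from Drew et al.

```python
def lifetime_value(margin, hazards, rate):
    """NPV of expected margin, each period weighted by P(still a customer)."""
    value, staying = 0.0, 1.0
    for t, hazard in enumerate(hazards, start=1):
        staying *= 1.0 - hazard          # survive this period without churning
        value += margin * staying / (1.0 + rate) ** t
    return value


def gain_in_lifetime_value(margin, hazards, reduced_hazards, rate):
    """Lifetime value after a retention action minus the baseline value."""
    return (lifetime_value(margin, reduced_hazards, rate)
            - lifetime_value(margin, hazards, rate))


baseline = [0.05] * 12   # hypothetical monthly churn hazard
retained = [0.03] * 12   # hypothetical hazard after a retention offer
gain = gain_in_lifetime_value(40.0, baseline, retained, 0.01)
print(round(gain, 2))
```

Ranking customers by this gain, rather than by churn likelihood alone, targets those for whom a retention action is expected to pay off most.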
11-30
Systems
A great many products
• general NN products
$59 to $2,000
@Brain BrainMaker
Discover-It
• components
DATA MINING along with megadatabases
other products
• specialty products
construction bidding, stock trading, electricity trading
11-31
Potential Value
• THEY BUILD THEMSELVES
– humans pick the data, variables, set test limits
• CAN DEAL WITH FAST-MOVING
SITUATIONS
– stock market
• CAN DEAL WITH MASSIVE DATA
– data mining
• Problem - speed unpredictable