Data Models & Technologies


Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing
Bhagi Narahari
Outline of Lecture

What and Why of Data Mining and KDD?



How ?
Personalization



Importance and Applications to E-commerce
personalized one-to-one business on the internet
Part I: Overview of Personalization
Part II: The Data Mining Process
Predictive Modelling

A “black box” that makes predictions about
the future based on information from the
past and present
Inputs: Age, balance, income
Model (Crystal ball?)
Output: How much will customer spend on next catalog order?
What is Data Mining?

It is the exploration and analysis by
automatic or semiautomatic means, of large
quantities of data in order to discover
meaningful patterns and rules.
Why now? (A historical
perspective)




Because data is now available (wasn’t
always)
Distributed sources
Technology evolution
Competition (do what you can to outdo)
Why DM?

CRM (Customer Relationship Management) important success factor in E-commerce
price differentiation no longer enough
 customer service more important



Links with suppliers already exist (B2B) - JIT,
joint forecasting, planning, procurement
Current emphasis on links with customers: feedback, input in design, etc.
CRM



Identifying profitable customers
Better service for more valued customers
Retaining profitable customers
Getting a new customer costs a lot more than
retaining an existing one
 acquiring a new customer costs about 5X as much as retaining one
(Peppers & Rogers)
 An increase from 75% to 80% in retention reduces
costs by about 10%


Larger share of customer pool
CRM

Product differentiations based on “price”
and “quality” are increasingly difficult


need to differentiate based on relationships
Increasingly sophisticated mass marketing
increases probability of success

cost of mass marketing is driven down by
internet (reach)
CRM

Goal: Positively interact with your customers
and prospects
define customer segments
 lights out execution of campaigns against
segments
 attribution and evaluation of responses

Personalization in Ecommerce

Positive:
much better chance of personalization
customer identification
tracking across visits and within visit
 ability to do ‘what if’ experiments


Negative:
cost of switching is much less
 is web-based shopping good for ‘touchy-feely’ things?
 price differentiation across geographies not easy

Personalization
Customer Chain: Product Discovery, Product Evaluation, Terms Negotiation, Order Placement, Order Payment, Customer Service & Support
Producer Chain: Market Research, Market Stimulation/Education, Terms Negotiations, Order Receipt, Order billing and payment management, Customer Service & Support
B2C Personalization Objectives

Know the customer


profile - registration, cookies
Determine what the customer wants
Ask: Questionnaires
what is the incentive for truthfulness?
 Deduce: click streams, history, collaborative
filtering (Amazon!!)


Deliver
Customize the look and feel
 offer special promotions
 offer customized products (Holy Grail)
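The "deduce" route via collaborative filtering can be illustrated with a minimal sketch. It assumes purchase histories are available as sets of items per customer; the customer names, items, and the choice of Jaccard similarity are all hypothetical and not taken from the lecture or from any real site's method.

```python
def jaccard(a, b):
    """Similarity of two purchase histories: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, histories):
    """Suggest items bought by similar customers that the target lacks."""
    mine = histories[target]
    scores = {}
    for other, items in histories.items():
        if other == target:
            continue
        similarity = jaccard(mine, items)
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Keep only items backed by at least one similar peer, best first.
    ranked = [item for item in scores if scores[item] > 0]
    return sorted(ranked, key=lambda item: (-scores[item], item))

# Hypothetical purchase histories.
histories = {
    "ann": {"ties", "shirts"},
    "bob": {"ties", "shirts", "cufflinks"},
    "cat": {"toys", "gadgets"},
}
suggestions = recommend("ann", histories)
```

Here "ann" resembles "bob" and not "cat", so only items from "bob" are suggested.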

Use of Personalization

In addition to storing and retrieving
information on the individual’s profile “on
the fly”

can also use mining software to analyze the
information in the database to make
recommendations or comments specific to the
individual
Impact of Personalization


Customer relationship
Learn more about customers


learn and understand why and how they
prefer to do business with your organization
In tandem with tracking, provides you with a
tool to monitor your website

what works, what doesn’t, what makes your
audience “click”
Security and Privacy as
Barrier to Personalization





Large number of customers concerned
about personalization (DoubleClick!)
will they pay more to preserve privacy?
Some falsify info to preserve privacy
customers give more info to trusted site
need secure site with clear privacy policies
stated at site
Personalization
Know the Customer: Identify (Login, Credit Card#), Questionnaires, Past history, Click Streams
Predicting the wants: Profile, Mapping to “peers”, Extrapolation from past, Extrapolation from peers (firefly.com)
Give the customer his/her wants: Look & feel, Product selection & promotions, New Product
Know the customer

Cookies


OPS: Open Profiling Standard


combined with eTrust certification
Registration


backlash (users do not trust them)
User certificates: logons
Key Question:
how do you know that this customer is the same
one who goes to your storefront?
 need standard warehouse techniques like
address resolution, credit card resolution, etc.

Know the Customer:OPS

Two drivers
users should not have to retype basic info again & again
 users must be able to trust how data is used (not leaked,
other data not seen, etc.)


Two parts
Common data
demographics (country,zip,age,gender)
Contact (name, address, CreditCard…)
User agent preferences
 Per-site Sections (can be shared across sites, if
user allows)

What if no profile???

Deduce
collect information: history of purchases, time
spent on pages
 ask questions (offer rewards)
 combine with database marketing data


Predict behaviour
buy probabilities
 build customer relationship


mining is key!
Personalization: Actions to
take- Look and feel

Personalized pages

specific data
 specific presentation and design
 sent through various mediums
Manage Customers not products: 1-1 marketing

Strategy.com

deliver personalized pages
eg: stock portfolio, personal info including alarm,
travel reservations
 use different mediums
WAP-enabled phones (e.g.: Sprint PCS Web)

Storefront Personalization

Customers visit Store Website
Howard buys ties
 Rob buys Baby Products
 Ray buys toys
 Amy buys clothes


Provide a view of the store to these customers

present them with what they are likely to buy?
Howard: ties, and men’s formal wear
Ray: Toys and gadgets
Rob: Infant, Toddler section
Amy: Women’s Clothes section
More Actions: Product
Presentations & Promotions
Basic Storefront Product Hierarchy
Clothes
  Men’s: Shirts, Pants
  Women’s: Casuals, Evening
  Children’s: Infants, Kids
(The diagram marks two personalized views of this hierarchy: John’s View and Mary’s View.)
BroadVision.com

BroadVision One-to-One application
allows businesses to develop and manage
personalized web sites
 interactively profile each visitor and dynamically
match info based on their profile and business
rules specified by providers of site & services
users do not go through hoops finding relevant
data

DM Terminology
Rule Based Systems
OLAP
Data Marts
ROLAP
SQL
Data Warehouse
Data Stores
Genetic Algorithms
Neural Networks
Data Mining
How?



Determine probability of buying as a
function of customer attributes such as age,
income, past buying patterns, ..
Target customers by ranking from highest to
lowest probabilities
Other techniques: Decision Trees, Neural
Networks, ….
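A minimal sketch of the scoring-and-ranking idea, assuming a logistic model. The attribute names, coefficients, and customers below are hypothetical; in practice the coefficients would be fitted to historical purchase data rather than written by hand.

```python
import math

# Hypothetical coefficients, for illustration only.
WEIGHTS = {"age": 0.02, "income_k": 0.03, "past_orders": 0.5}
BIAS = -4.0

def buy_probability(customer):
    """Logistic function of a weighted sum of customer attributes."""
    z = BIAS + sum(weight * customer[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_customers(customers):
    """Order customers from highest to lowest buy probability."""
    return sorted(customers, key=buy_probability, reverse=True)

customers = [
    {"id": "c1", "age": 30, "income_k": 40, "past_orders": 0},
    {"id": "c2", "age": 45, "income_k": 80, "past_orders": 5},
    {"id": "c3", "age": 60, "income_k": 50, "past_orders": 2},
]
ranked_ids = [c["id"] for c in rank_customers(customers)]
```

A campaign would then target the top of the ranked list, down to whatever cut-off the budget allows.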
KDD



Knowledge Discovery in Databases
It is the process of identifying valid, novel,
potentially useful, and understandable
patterns in data (Fayyad, Piatetsky-Shapiro,
and Smyth)
It involves data preparation, pattern
extraction, knowledge evaluation, and
refinement, in iteration
KDD


Data mining is a step in the KDD process
that involves the application of certain
algorithms to extract patterns
Steps in the KDD process:
Select Data
Data Cleansing and Pre-processing
Data Mining
Results interpretation
Implementation
Pre-processing in KDD


80-90% of the effort in the KDD process is spent here
Why?
Operational data is incomplete, inconsistent, in
different formats across systems
DM techniques might require data in a specific
format
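To make the pre-processing burden concrete, here is a small sketch that brings records from two hypothetical operational systems into one format. The field names and the two date formats are assumptions chosen for illustration.

```python
from datetime import datetime

# Assumed date formats used by two hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(text):
    """Return a date for any known format, or None if missing/unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def parse_income(value):
    """Strip currency formatting; return None for missing or invalid values."""
    text = str(value).replace("$", "").replace(",", "").strip()
    try:
        return float(text)
    except ValueError:
        return None

def clean_record(raw):
    """Normalize whitespace and case, and parse income and signup date."""
    return {
        "name": " ".join(raw.get("name", "").split()).title(),
        "income": parse_income(raw.get("income", "")),
        "signup": parse_date(raw.get("signup", "")),
    }

record_a = clean_record({"name": "  jane   DOE ", "income": "$52,000", "signup": "03/15/1999"})
record_b = clean_record({"name": "Bob Ray", "signup": "1999-03-15"})
```

Missing values are kept as None rather than guessed, so a later mining step can decide how to treat them.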
Data Mining Problems

Classification/Segmentation
Binary (Yes/No)
 Multiple Category (Large/Medium/Small)




Forecasting (how much)
Association Rule extraction (market basket
analysis)
Sequence detection

balance increase -> missed payment -> default
Typical DM tasks

Prediction and Classification
Directed
 Decision trees, Neural networks, memory based
reasoning, logistic regression
 Examples:
How many units will be sold on a given day?
What will be the stock price on a given day?
Will a customer buy the product or not?

DM tasks

Affinity grouping
Undirected
 Which products go together naturally?
 The beer-diaper syndrome?
 Market basket analysis
 Examples:
Which products peak in demand simultaneously?

DM tasks

Clustering task
Undirected
 Segmenting into similar clusters
 Different from classification
 Examples
Customers with similar buying profiles
Products with similar demand patterns

DM success factors




Integration with data warehouses and DSS
Users should develop a good understanding
of techniques
Recognize that these tools cannot
automatically find patterns without being
told what to do
Most methods now used are extensions of
analytical methods that have been around
for decades
Legal and Ethical Issues

Privacy concerns
becoming more important
 will impact the way that data can be used and
analyzed
 ownership issues
 European data laws have implications on US


Often data included in the data warehouse
cannot legally be used in decision making
process


Race, Gender, Age
Data contamination will become critical
Making Decisions
Data (multiple sources) -> Data Warehouse? -> Models -> Decisions
Data Warehouse


Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management decisions.”
A data warehouse is managed data that is situated after and
outside the operational systems
Data Warehousing

Increasing need to find, summarize, and
interpret large amounts of data effectively


Especially when data is distributed across many
different databases
Transaction processing systems not easily
accessible to other systems

Plus TP systems have time constraints
Enter the Data Warehouse




To deliver decision data to decision makers
by integrating data from various TPS to a
single storage which can then
feed a range of decision support
applications
through an OLAP interface!
Data Complications



Noise
Missing data
Transformation
numeric data
 text


Need to differentiate between variables you
can control and those you cannot
Actionable: size of discount, number of offers etc.
 Non-actionable: age, income ..

Data Mining Techniques








Market Basket Analysis
Memory Based Reasoning
Cluster Detection
Link Analysis
Decision Trees and Rule Induction
Neural Networks
Genetic Algorithms
OLAP
OLAP: On Line Analytical
Processing



While a data warehouse brings data
together, OLAP lets you look at the data and
manipulate it interactively
OLAP allows users to “slice and dice” data
Allows user to drill-down into detail data
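A rough sketch of "slice and dice" and drill-down, assuming the data is a small list of fact rows held in memory. The dimensions and sales figures are hypothetical, and a real OLAP server would pre-compute and index these aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
FACTS = [
    ("East", "Nuts", "Q1", 100),
    ("East", "Bolts", "Q1", 50),
    ("East", "Nuts", "Q2", 120),
    ("West", "Nuts", "Q1", 80),
    ("West", "Bolts", "Q2", 70),
]
DIMS = ("region", "product", "quarter")

def aggregate(facts, by, where=None):
    """Sum sales grouped by the dimensions in `by`, filtered by `where`."""
    where = where or {}
    totals = defaultdict(int)
    for row in facts:
        record = dict(zip(DIMS, row[:3]))
        if all(record[dim] == value for dim, value in where.items()):
            totals[tuple(record[dim] for dim in by)] += row[3]
    return dict(totals)

# Summary view: one number per region.
by_region = aggregate(FACTS, by=("region",))
# Drill down: break the East total into product and quarter detail.
east_detail = aggregate(FACTS, by=("product", "quarter"), where={"region": "East"})
# Slice: fix one dimension (quarter = Q1) and view the rest.
q1_slice = aggregate(FACTS, by=("region", "product"), where={"quarter": "Q1"})
```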
Relational vs Multidimensional
Consolidations
Multidimensional Terminology

East, West, Central are input members of the Region
dimension. Total Region is an output member of the Region
dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total
are members of the Product dimension.

Variables are typically numerical measures like Sales, Costs,
Profits, Expenses, and so forth.

Dimensions are roughly equivalent to Fields in a relational
database. Cells are roughly equivalent to Records.
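The distinction between input and output members can be sketched as a consolidation step: output members such as Total Region are computed from the input members. The member names come from the slide; the Sales figures are made up for illustration.

```python
REGIONS = ["East", "West", "Central"]
PRODUCTS = ["Nuts", "Screws", "Bolts", "Washers"]

# Hypothetical Sales values for each (region, product) input cell.
sales = {
    ("East", "Nuts"): 10, ("East", "Screws"): 20, ("East", "Bolts"): 30, ("East", "Washers"): 40,
    ("West", "Nuts"): 5, ("West", "Screws"): 15, ("West", "Bolts"): 25, ("West", "Washers"): 35,
    ("Central", "Nuts"): 1, ("Central", "Screws"): 2, ("Central", "Bolts"): 3, ("Central", "Washers"): 4,
}

def consolidate(cells, regions, products):
    """Add the output members 'Total Region' and 'Total' to the input cells."""
    cube = dict(cells)
    for product in products:
        cube[("Total Region", product)] = sum(cells[(r, product)] for r in regions)
    for region in regions + ["Total Region"]:
        cube[(region, "Total")] = sum(cube[(region, p)] for p in products)
    return cube

cube = consolidate(sales, REGIONS, PRODUCTS)
```

Each cell is addressed by one member from every dimension, so the grand total lives at ("Total Region", "Total").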
Steps in DW and OLAP
Data (multiple sources) -> Data Loader -> Data Converter -> Data Scrubber -> Data Transformer -> Data Warehouse -> OLAP Server -> OLAP Interface
Cluster Detection




Undirected data mining
Finds records that are similar to each other
(clusters)
Clusters are found using geometric
methods, statistical methods, and neural
networks
Good way to start any analysis
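One geometric method for finding clusters is k-means, sketched below in plain Python. The lecture does not name a specific algorithm, so k-means is an assumption here, as are the customer data points (orders per year, average spend) and the starting centroids.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per year, average spend per order).
points = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]
centroids, clusters = kmeans(points, centroids=[(1, 20), (10, 200)])
```

No target variable is supplied, which is what makes this undirected: the two groups emerge from the distances alone.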
Market Basket Analysis



Form of clustering used for finding items
that occur together (in a transaction or
market basket)
Expresses the likelihood of different products being
purchased together as rules
Planning store layouts, limiting specials to
one of the products in a set,...
Transaction data
Customer   Products
1          Milk, Soda
2          Milk, Beer, Diapers
3          Milk, Cleaner
4          Beer, Diapers, Soda
5          Beer, Soda
Co-occurrence matrix
          Beer  Cleaner  Milk  Soda  Diapers
Beer       3      0       1     2      2
Cleaner    0      1       1     0      0
Milk       1      1       3     1      1
Soda       2      0       1     3      1
Diapers    2      0       1     1      2
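The counts in a co-occurrence matrix can be derived mechanically from the baskets. A minimal sketch using the five sample baskets from the transaction data; the function name `cooccurrence` is illustrative and not from the lecture.

```python
from collections import Counter
from itertools import combinations

# The five sample baskets from the transaction data.
TRANSACTIONS = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]

def cooccurrence(transactions):
    """Count, for every pair of items, the baskets containing both.
    The diagonal (item, item) holds the number of baskets with that item."""
    counts = Counter()
    for basket in transactions:
        for item in basket:
            counts[(item, item)] += 1
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # the matrix is symmetric
    return counts

matrix = cooccurrence(TRANSACTIONS)
```

Pairs that never share a basket are simply absent from the counter and read as 0.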
Support and confidence




For a rule that says: If A then B
Support is defined as the ratio of number of
transactions that include both A and B to
total number of transactions
Confidence is defined by the ratio of the
number of transactions that include both A
and B to the number of transactions that
include A.
How do you specify ‘significant’ support and
confidence ?
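The two definitions translate directly into code. A minimal sketch for a rule "If A then B" with single items on each side, using the five sample baskets; the function names are illustrative.

```python
# The five sample baskets from the transaction data.
TRANSACTIONS = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]

def support(transactions, a, b):
    """Transactions containing both A and B / all transactions."""
    both = sum(1 for basket in transactions if a in basket and b in basket)
    return both / len(transactions)

def confidence(transactions, a, b):
    """Transactions containing both A and B / transactions containing A."""
    with_a = sum(1 for basket in transactions if a in basket)
    both = sum(1 for basket in transactions if a in basket and b in basket)
    return both / with_a if with_a else 0.0

beer_diapers_support = support(TRANSACTIONS, "Beer", "Diapers")
beer_diapers_confidence = confidence(TRANSACTIONS, "Beer", "Diapers")
```

Support is symmetric in A and B, while confidence is not: "if diapers then beer" has a different denominator than "if beer then diapers".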
Algorithm for Finding
Association Rules


Input is Min-Support and Min-Confidence
Find all sets of items with Min-Support
(frequent itemsets)

Frequent Itemsets Property: Every subset of a
frequent itemset must also be a frequent itemset
iterative algorithm: start with frequent itemsets
with one item, and construct larger itemsets using
only smaller frequent itemsets.
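The iterative procedure above can be sketched as follows. This is a compact, unoptimized version in the style of the Apriori algorithm (the lecture does not name it), run on the five sample baskets; the function names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Milk", "Soda"},
    {"Milk", "Beer", "Diapers"},
    {"Milk", "Cleaner"},
    {"Beer", "Diapers", "Soda"},
    {"Beer", "Soda"},
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update({s: sup(s) for s in current})
        k += 1
        # Build size-k candidates only from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if sup(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Generate (lhs, rhs, confidence) rules from the frequent itemsets."""
    found = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                conf = s / frequent[lhs]  # lhs is frequent by the subset property
                if conf >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), conf))
    return found

frequent = frequent_itemsets(TRANSACTIONS, min_support=0.25)
found_rules = rules(frequent, min_confidence=0.5)
```

The pruning step is where the frequent itemsets property pays off: a candidate with any infrequent subset is discarded without counting it against the data.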
MBA example

Using the sample data, create a co-occurrence table

Let relevant Support = 25% and Confidence =
50%:


Beer and Diapers appear together in 2/5 = 40% of transactions (Beer alone appears in 3/5 = 60%)

If beer then diapers has confidence of 2/3 = 67%

Thus, “If customer buys beer then customer buys
diapers” satisfies 25% support & 50% confidence
Conclusion drawn by mining system:

Customers who buy beer also buy diapers
Applying MBA Results

Is the relationship useful ?
Beer and Diapers may not be of use
 Victoria’s Secret transaction mining led to
specific apparel being sent to specific stores (MicroStrategy software)


Who defines “usefulness”?
only as good as rules specified by
humans/marketing workforce
 NBA mining: designers of the software did not include
height mismatches at first… coaches made the
correction

Data Mining Algorithms

Four algorithms commonly cited
Association Rule (used in over 90% of the cases!)
 Nearest Neighbor
quick and easy but models get large
 Decision Tree
 Neural Network
difficult to interpret and time-consuming

Decision Trees

Series of if/then rules

easy to understand, complexity in implementation
[Figure: example decision tree. Root split: Balance < 10K vs. Balance > 10K; second split: Age < 48 vs. Age > 48; leaves labelled yes / No / yes.]
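A decision tree read as a series of if/then rules might look like the sketch below. The exact layout of the slide's diagram is not recoverable from the text, so the assignment of outcomes to branches is an assumption: balance above 10K gives "yes" directly, otherwise the age test decides. The treatment of values exactly at a threshold is also assumed.

```python
def classify(balance, age):
    """Walk the tree from the root; each `if` is one split.
    Branch outcomes are assumed, not confirmed by the source."""
    if balance > 10_000:
        return "yes"
    # Balance at or below 10K: fall through to the age split.
    if age > 48:
        return "yes"
    return "no"

decisions = [classify(15_000, 30), classify(5_000, 60), classify(5_000, 30)]
```

The rules are easy to read off, which is the main appeal of trees; the complexity lies in learning good splits from data, which this sketch does not attempt.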
CRM and Data Mining

Recall: customer segmentation is key in CRM
data mining can help improve understanding of
customer behaviour
helps locate meaningful segments from customer
data
 users want to turn that understanding into
automated interactions with their customers

Integrating Data Mining &
CRM



Data mining application owns the modelling
process
CRM application owns the campaign
execution process
Goals:
minimize pain involved with using models in
campaigns
 score records only when and where necessary

Integrating Mining & CRM

Step 1:
analytic user creates model using mining system
 model is then exported into campaign
management system


Step 2:
Marketing user creates campaign that includes
predictive models
 when campaign executes, data mining engine
scores customers dynamically
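The two steps can be sketched as follows, under the assumption that the exported model is just a scoring function handed to the campaign system. The model format, field names, and threshold are hypothetical and do not describe any particular product.

```python
import math

def make_model(weights, bias):
    """Step 1 stand-in: a model built in the mining system and exported
    as a scoring function (hypothetical format)."""
    def score(customer):
        z = bias + sum(w * customer[field] for field, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))
    return score

def run_campaign(customers, in_segment, model, threshold):
    """Step 2 stand-in: score only customers in the campaign segment,
    at execution time, and return the targets plus how many were scored."""
    targets, scored = [], 0
    for customer in customers:
        if not in_segment(customer):
            continue  # outside the segment: never scored
        scored += 1
        if model(customer) >= threshold:
            targets.append(customer["id"])
    return targets, scored

model = make_model({"past_orders": 1.0}, bias=-2.0)
customers = [
    {"id": "a", "region": "East", "past_orders": 4},
    {"id": "b", "region": "East", "past_orders": 0},
    {"id": "c", "region": "West", "past_orders": 5},
]
targets, scored = run_campaign(customers, lambda c: c["region"] == "East", model, 0.5)
```

Only two of the three customers are scored, which is the point of scoring records only when and where necessary.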

Benefits of Integration


Pre-generated model selection
Score defined segments “on the fly”
eliminates need to score entire database
 improve efficiency of campaigns



Reduces manual intervention and error
Accelerates the market cycle
increases likelihood of reaching customers
before competitors
 improves campaign results and lowers costs

Summary


“Using the new media of the one-to-one
future, you will be able to communicate
directly with customers individually…” - Don Peppers & Martha Rogers (One-to-One
Future)
“What are you afraid of?… Even if you’re
not afraid of these things, the beauty is, with
proper marketing, we can make you afraid” - Michael Saylor, CEO, MicroStrategy.