Statistics for Marketing and Consumer Research

Further advanced methods
Chapter 17
Statistics for Marketing & Consumer Research
Copyright © 2008 - Mario Mazzocchi
1
Data mining
• Data mining is “the exploration of a large
set of data with the aim of uncovering
relationships between variables” (Oxford
Dictionary of Statistics)
• Also known as Knowledge Discovery in
Databases (KDD)
• Making extensive use of information
technology, through the automation of data
analysis procedures
Statistics and data mining
• Statistics is also exploited, but it is adapted to deal
with (very) large data sets
• The statistical approaches favored are those that
exploit computer-intensive methods
• Data mining merges statistics with other
disciplines:
• Computer science
• Machine learning
• Artificial intelligence
• Database technology
• Pattern recognition
Data warehousing
• The common denominator among the techniques is always
the use of very large databases
• These databases are the outcome of data warehousing,
which
– organizes all of the data available to a company into a common format
– allows integration of different data types
– allows analysis through data mining
• The organization of company information in data
warehouses requires recognition of
• linkages of data which relate to the same objects
• the time dimension (to monitor changes)
Marketing applications
• A typical application is market basket
analysis
• customer purchasing patterns are discovered by
looking at the databases of transactions in one
or more stores of the same chain (e.g. through
loyalty cards)
• the contents of the trolley are analyzed to
detect repeated purchases and brand switching
behaviors
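As a minimal sketch of the counting behind market basket analysis, the snippet below tallies how often pairs of products appear in the same basket; the transactions are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets from a loyalty-card database (invented data)
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count how often each pair of products is bought together
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = share of baskets containing both items
support = {p: n / len(baskets) for p, n in pair_counts.items()}
print(support[("bread", "butter")])  # 0.75: bread & butter co-occur in 3 of 4 baskets
```

Real market basket software works on millions of transactions with algorithms such as Apriori, but the underlying co-occurrence counting is the same.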
Problems with data mining
• Data mining is a complex and automated
process, which faces many risks:
• Data-sets may be contaminated (affected by
error)
• Data may be affected by selection biases and
non-independent observations
• Automated data analysis could find spurious
relationships (as in spurious regression)
Steps for successful data mining
1. data warehousing
2. target data selection
3. data cleaning
4. preprocessing
5. transformation and reduction
6. data mining
7. model selection (or combination)
8. evaluation and interpretation
9. consolidation and use of the extracted knowledge
Frequentist vs. Bayesian statistics – the
Frequentist paradigm
• Assumption: true and fixed population parameters exist, albeit unknown
• Statistics can exploit sampling to estimate these unknown parameters
• Observations are associated with probabilities: the probability of a
given outcome for a random event can be proxied by the frequency of
that outcome
• The larger the sample, the closer the estimated probability is to the
true probability
• Example: a linear regression model tries to estimate the true
coefficients which link the explanatory variables and the dependent
variable using a sample of observations
• A key concept of the frequentist approach is the confidence interval
where a range of values contains the true and fixed value with a
confidence level
• The confidence level is simply the frequency with which such an
interval contains the true and fixed value across different random
samples.
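The frequency interpretation of the confidence level can be illustrated by simulation: draw many random samples from a population with a known mean, build a 95% interval from each, and count how often the true mean is covered. This is a sketch with arbitrary population values, and the population standard deviation is assumed known for simplicity:

```python
import math
import random

random.seed(42)
TRUE_MEAN, SD, N, REPS = 10.0, 2.0, 100, 1000

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = sum(sample) / N
    se = SD / math.sqrt(N)               # known sigma, so a simple z-interval
    lo, hi = m - 1.96 * se, m + 1.96 * se
    if lo <= TRUE_MEAN <= hi:
        covered += 1

print(covered / REPS)  # close to the nominal 0.95
```

The observed coverage fluctuates around 95%, which is exactly what the frequentist definition of a confidence level claims.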
The Bayesian approach
• The unknown parameters in the population are not
fixed, but treated as a random variable with their
own probability distribution
• One is allowed to exploit knowledge or beliefs
about the shape of the probability distribution
which existed prior to estimation
• Once data are collected, Bayesian methods use them to update
the prior; the final outcome is a posterior distribution which
depends on both the data and the prior knowledge
Bayes rule
• The estimation of the posterior distribution opens the way to Bayesian
statistical operations and is based on the Bayes rule which relates the
probability of the outcomes of two random events in the following way
P(A|B) = P(A,B) / P(B) = P(B|A)·P(A) / P(B)
P(A|B) is the probability that the first random event generates the
outcome A when the second random event has generated the outcome B;
it is the probability of A conditional on B
P(A,B) is the joint probability that both outcomes A and B occur
P(B) is the unconditional probability of the outcome B
• The Bayes theorem shows that P(A,B) can also be expressed as the
product P(B|A)·P(A), that is, the product of the probability that B
occurs conditional on A and the probability of A
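A quick numerical check of the rule, using invented probabilities for the two events:

```python
# Invented probabilities for two events with outcomes A and B
p_a = 0.3            # P(A)
p_b_given_a = 0.5    # P(B|A)
p_b = 0.25           # P(B), unconditional

p_ab = p_b_given_a * p_a      # joint: P(A,B) = P(B|A)·P(A) = 0.15
p_a_given_b = p_ab / p_b      # Bayes rule: P(A|B) = P(A,B) / P(B)
print(p_a_given_b)  # 0.6
```

Note how observing B raises the probability of A from the unconditional 0.3 to the conditional 0.6, because B is more likely when A has occurred.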
Bayes estimation
• To understand the use of the Bayes rule the two random
events could be
– the value of the unknown parameter (A), which in Bayesian statistics
is determined by a random variable
– the available data (B), which are also the outcome of a random
variable since they were obtained through sampling
• The Bayes theorem says that the probability of obtaining the
parameter estimate A given the observed sample B (the
posterior probability) can be computed through the Bayes
rule, as a function of the probability of observing sample B
when the parameter estimate is A and the unconditional
probability of the parameter estimate A
• The unconditional probability of the parameter estimate A
is the prior probability
Use of the Bayes rule
• The Bayes rule is very helpful when it is easier to
estimate P(B|A) than P(A|B)
• If
– the probability of having the sample B conditional on the unknown
parameter A can be computed,
– some prior information on the probability of the parameter A is
available, and
– the unconditional probability of the sample B is known,
then it becomes possible to find the probability distribution of the
parameter A conditional on the data, which is the final objective of
estimation
Unconditional probability
• The denominator of the Bayes rule can be
rewritten as:
P(B) = Σi P(B|Ai)·P(Ai)
which means that the unconditional probability of
the sample B can be seen as the sum of the
probabilities of the sample B conditional on each of
the possible estimates Ai, weighted by the
probability of each estimate Ai
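This decomposition can be checked numerically; the discrete prior and likelihood values below are invented for illustration:

```python
# Invented prior over three candidate parameter values A_i
prior = [0.2, 0.5, 0.3]        # P(A_i), sums to 1
lik = [0.10, 0.40, 0.05]       # P(B|A_i) for the observed sample B

# Unconditional probability of the sample: P(B) = sum_i P(B|A_i)·P(A_i)
p_b = sum(l * p for l, p in zip(lik, prior))
print(p_b)  # 0.02 + 0.20 + 0.015 ≈ 0.235
```

This is the law of total probability: P(B) averages the conditional probabilities of the sample over every possible parameter value, weighted by the prior.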
Estimation
• Two elements have to be considered
1) P(B|A) is the likelihood function of A, that is, the
probability of a given set of observations depending on a
set of parameters, and it is generally known (frequentists
use it in maximum likelihood methods as well)
2) the denominator of the Bayes rule is a constant and it is
generally not necessary to estimate it so that estimation
can be based on the following result
P(A|B) ∝ P(B|A)·P(A)
where the ∝ sign, which replaces the equals sign, means
that the left-hand side is proportional to the right-hand
side
Example
• Estimation of a single regression coefficient in a
bivariate regression
• Caviar expenditure (c) as a function of income (i)
• Data come from a random sample which generates
a set of observations included in the vectors (c)
(for simplicity consider (i) as the observations of a
fixed exogenous variable).
• The equation is
c = b·i
Frequentist estimation of the
regression coefficient
• Start from some assumptions on the probability
distribution of the data and the error term
– E.g. normal distribution
• Get point estimates that are the most likely given
the observed sample
– E.g. maximum likelihood estimates
• Since the sample is random, confidence intervals
can be built for the coefficient estimate
Bayesian estimation
• Start with the assumption that caviar expenditure c is normally
distributed around its mean, which is equal to b·i
• Second, assume a Normal prior distribution for the coefficient b,
with a given standard deviation, e.g. a standard deviation of 0.02
• Consider the value b=0.05
• If b=0.05 holds, c should be normally distributed around 0.05·i
• Now it becomes necessary to evaluate the probability of getting
the observed sample c given that b=0.05
• Generate c* by multiplying i by 0.05
Bayesian estimation
• Considering that c is a random sample from a normal distribution, one
can get the likelihood of c* conditional on b=0.05 using the known
likelihood function
• The unconditional (prior) probability that b=0.05 is also known, given
that we have assumed a Normal prior for b, with mean 0.05 and a
standard deviation of 0.02
– On a discrete grid of values for b (e.g. with step 0.01), this implies
that the prior probability of b=0.05 is about 20% (the Normal density
at the mean, ≈19.95, times the step)
• With a computer and given the prior distribution of b, one can compute
the unconditional probabilities for all possible values of b and the
probabilities of all possible values of c*
• Using a slightly different notation of the Bayes rule, which defines
L(b|c) as the likelihood function of the sample c:
P(b|c) ∝ L(b|c)·P(b)
where the left-hand side is the (unknown) posterior probability of b.
Posterior distribution
• As mentioned, for any fixed value of b it is possible to compute
– the likelihood function
– the unconditional probability using the prior
• Suppose that for b=0.05 the likelihood of observing the collected data
set is 10%. Then, one may compute
P(0.05|c) ∝ 0.10 × 0.20 = 0.02
The above result does not mean that the probability is 2%, since there is a
proportionality relationship (not an equality one)
• However, repeating the experiment for the whole range of values for b
allows one to compute the probability distribution for b conditional on
the observed sample (the posterior distribution)
• This ultimately allows one to determine the most likely estimate for b.
• This estimate will be different from 0.05 unless we had an excellent
prior.
Final output
• The posterior distribution might also differ from the normal
distribution (although not in this case)
• From the posterior distribution it is possible to compute the
percentiles (see appendix); thus a 95% Bayesian confidence
interval can be obtained by considering the values of b
corresponding to the 2.5th percentile and the 97.5th one
from the posterior distribution
• The final result depends on the quality of the prior
• However, Bayesian statistics has extended these founding concepts
considerably, and there are many ways to relax the reliance on the
prior and check the robustness of the results
• For example, there are non-informative priors which do not assume
particular knowledge of the parameters, as they are uniformly
distributed over the whole range of possible values
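The whole worked example (a grid of candidate values for b, the Normal(0.05, 0.02) prior, the Normal likelihood, and percentiles of the posterior) can be sketched as follows; the data, noise level, and grid are invented for illustration:

```python
import math
import random

random.seed(1)

def normal_pdf(x, mu, sd):
    """Normal density, used for both the prior and the likelihood."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Invented data: caviar expenditure c = b_true * income + Normal noise
b_true, sigma = 0.06, 0.5
income = [random.uniform(10, 50) for _ in range(40)]
caviar = [b_true * x + random.gauss(0, sigma) for x in income]

# Grid of candidate values for b, with the Normal(0.05, 0.02) prior of the text
grid = [k * 0.001 for k in range(121)]          # b from 0.00 to 0.12
prior = [normal_pdf(b, 0.05, 0.02) for b in grid]

def loglik(b):
    # Log-likelihood of the sample when c is Normal around b * income
    return sum(math.log(normal_pdf(c, b * x, sigma))
               for x, c in zip(income, caviar))

# Posterior ∝ likelihood × prior; subtract the max log-likelihood for stability
lls = [loglik(b) for b in grid]
m = max(lls)
post = [math.exp(ll - m) * p for ll, p in zip(lls, prior)]
total = sum(post)
post = [p / total for p in post]

# Most likely value of b, and a 95% Bayesian interval from the 2.5th and
# 97.5th percentiles of the posterior
b_map = grid[post.index(max(post))]
cum, lo, hi = 0.0, None, None
for b, p in zip(grid, post):
    cum += p
    if lo is None and cum >= 0.025:
        lo = b
    if hi is None and cum >= 0.975:
        hi = b
print("MAP:", b_map, "95% interval:", (lo, hi))
```

With 40 observations the likelihood dominates the prior, so the posterior mode lands near the true 0.06 rather than the prior mean 0.05, illustrating how the data update the prior belief.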
Why Bayesian statistics are becoming so
popular
• One of the reasons for the Bayesian statistics comeback in the 21st century is
the fact that the Bayes rule can be applied iteratively
• This means that the prior distribution can be updated
• The progress in automated computing power has led to excellent results in
estimating complex models through Bayesian methods
• For example, modern Bayesian methods exploit the posterior distribution to
generate a large number of draws from which estimates are actually
computed
• Bayesian statistics and marketing
• In a recent article, Rossi and Allenby (2003) explored the major role that
Bayesian methods can play in marketing, including a long and annotated list
of applications:
– hypothesis testing with scanner data
– extensions of conjoint analysis
– Bayesian multidimensional scaling
– the multinomial probit
– many other Bayesian alternatives to frequentist multivariate statistics