Statistics for Marketing and Consumer Research


Discrete choice models
Chapter 16
Statistics for Marketing & Consumer Research
Copyright © 2008 - Mario Mazzocchi
Choice models and preferences
• Choice modeling is the preferred model for studies on
consumer preferences
• Choice models are closely related to stated preference theory
• Stated preference survey: consumers state their choices among a
potential set of alternatives (e.g. different brands, different
product characteristics, different stores)
• Options can include both real and hypothetical market alternatives
• Choice models start from stated preferences to go back to their
determinants
• The alternative to stated preference is revealed preference
• where consumers are not asked directly what they prefer or choose
but their actual choices and determinants are observed indirectly,
for example considering what they purchase in different situations
Stated vs. revealed preference
• Example
• A customer finds that the price of her favourite washing powder in
her usual supermarket has doubled
• Will she buy less washing powder, like a smaller pack?
• Or would she move to a different brand?
• Would she go back home without buying washing powder at
all?
• It could be difficult to define a model which explains
choices using revealed preference, i.e. observing behaviors
at the checkout till
– if the customer decides not to buy washing powder at all, how
would it be possible to infer this choice simply from a look at the
products in her shopping trolley?
– if the customer buys an alternative brand with exactly the same size
and price as before the price increase would a revealed preference
model capture that consumer decision?
Stated vs. revealed preference
• Revealed preference allows one to model these
behaviors, but only after an expensive collection of
information on the frequency, quantity and brands
of washing powder purchases
• Stated preference alternative
• a survey where the consumer is asked to choose
between a set of alternative choices which differ by
brand, pack size and price
• Provided that the survey is designed in an appropriate
way (not necessarily easy) the collected data open the
way to a more effective model
Choice models
• Consumer models are usually targeted on the average
behaviour
• With revealed preference one might apply a regression
model, where the purchased quantity is the dependent
variable and price and other explanatory variables are on
the right-hand side
• With stated preference models a discrete choice variable
is on the left-hand side of the equation
• Example
• the choice whether to purchase washing powder or not (binary
dependent variable)
• the choice among a set of alternative brands (categorical DV)
Why regression does not work
• With binary or categorical dependent variables standard
regression analysis is not appropriate
• Example
• binary dependent variable y coded to be zero for non-purchases and one for
purchases
• x is a continuous metric variable
$$y = a + b x + \varepsilon, \qquad y = \begin{cases} 0 & \text{for non-purchases} \\ 1 & \text{for purchases} \end{cases}$$
• Problems
• After least squares estimation, predictions of y using the values of x would
produce many values other than zero and one, including values below zero
and values above one
• Different coding for the binary dependent variable (e.g. one and two, or
zero and ten) would lead to very different estimates for the a and b
coefficients, which makes the interpretation of the regression parameters
difficult
• The above model does not meet the assumptions of the regression model,
since normality of the dependent variable for any value of the
explanatory variables is violated
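The first two problems can be illustrated with a small sketch. The data below are hypothetical (they are not taken from the Trust data-set) and the helper `ols` is introduced only for this illustration.

```python
# Hypothetical data: x is a metric variable, y is 0 (no purchase) or 1 (purchase).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols(xs, ys):
    """Least squares estimates (a, b) of the model y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


a, b = ols(x, y)
fitted = [a + b * xi for xi in x]

# Problem 1: some predictions fall outside the [0, 1] interval.
outside = [round(f, 3) for f in fitted if f < 0 or f > 1]
print("predictions outside [0, 1]:", outside)

# Problem 2: recoding y as 0/10 changes both estimates (here by a factor of 10).
a10, b10 = ols(x, [10 * yi for yi in y])
print("0/1 coding: ", round(a, 3), round(b, 3))
print("0/10 coding:", round(a10, 3), round(b10, 3))
```

With these data the fitted line dips below zero at the smallest x and exceeds one at the largest x, which is the behaviour described in the first problem.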
Discrete choice models
$$y_i = a + b x_i + \varepsilon_i$$
• Discrete choice models generalize the regression model for
the situations where y is a non-metric variable
– a binary (0-1) variable or
– an ordinal variable (like a questionnaire item assuming the values
completely disagree, disagree, neither, agree, completely agree) or
– a categorical variable (for example a nominal variable recording the
preferred holiday destination).
• The right-hand side variable is generally assumed to be metric
• Binary and categorical variables on the right-hand side can be
translated into dummies and used as explanatory variables
like in regression analysis
• Non-metric dependent variables violate the normality and the
homoskedasticity assumptions of regression; an alternative
approach is used to estimate discrete choice models
Binary choice model
• y can assume the discrete values zero or one
• To model y as a function of x one can exploit a latent
variable (as for SEM)
– y assumes either value zero or one depending on the threshold value
d of a metric and continuous latent variable z
• The regression model is rewritten as
$$y_i = \begin{cases} 0 & \text{if } z_i \le d \\ 1 & \text{if } z_i > d \end{cases}$$
• the dependent variable y is one when a latent continuous
variable z is above the threshold d and zero otherwise
• The model is completed by a regression equation linking
the latent variable to the explanatory variable
$$z_i = a + b x_i + \varepsilon_i$$
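A minimal simulation may help to see how the latent variable generates binary outcomes. The parameter values (a = 0, b = 1, d = 0), the logistic distribution of the error and the helper names are all assumptions made for this sketch.

```python
import math
import random

random.seed(0)

# Hypothetical parameter values, chosen only for illustration.
a, b, d = 0.0, 1.0, 0.0


def draw_y(x):
    """Draw one binary outcome: y = 1 when the latent z exceeds the threshold d."""
    u = random.uniform(1e-12, 1 - 1e-12)
    eps = math.log(u / (1 - u))   # error term drawn from a logistic distribution (assumed)
    z = a + b * x + eps           # latent variable, never observed in practice
    return 1 if z > d else 0


def share_of_ones(x, n=5000):
    """Proportion of y = 1 among n simulated observations at a given x."""
    return sum(draw_y(x) for _ in range(n)) / n


low, high = share_of_ones(-2.0), share_of_ones(2.0)
print("share of y=1 at x=-2:", low)
print("share of y=1 at x=+2:", high)
```

Only y would be observed in real data; the share of ones rising with x is the information used to recover a and b.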
The auxiliary regression
$$z_i = a + b x_i + \varepsilon_i$$
• The above model has a metric and continuous dependent variable
• After some assumptions the distribution of $\varepsilon$ is known
• Problems:
1) z is not observed and
2) d is unknown
• Problem 2 is easily resolved: as long as the intercept a appears in
the regression equation, one may arbitrarily choose d (the easiest
way is to fix it at zero) and the only result which will change is the
estimate of the intercept a
• Problem 1 requires one to create z for each observation as a
function of y, taking into account the information which we have,
that is the proportions of zero and one for the y variable
• It is necessary to make an assumption on the probability
distribution for this latent variable and how it is linked to y, i.e. a
link function between y and z must be specified
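The reason why the threshold can be fixed arbitrarily follows in one step from the two equations of the binary choice model. The symbol $a^{*}$ is introduced here only for this derivation.

```latex
P(y_i = 1) = P(z_i > d)
           = P(a + b x_i + \varepsilon_i > d)
           = P\big((a - d) + b x_i + \varepsilon_i > 0\big)
```

A model with threshold $d$ and intercept $a$ is therefore indistinguishable from a model with threshold zero and intercept $a^{*} = a - d$; the slope $b$ is unaffected.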
The link functions
• The link function specifies the relationship
between z and y through the expected value of the
appropriate distribution function for the generic
observation yi
• For example, with binary data, one can assume
that the probabilities of each observation yi follow
a binomial distribution
• there are a number of transformations of y which
create a z variable compatible with the binomial
distribution
The logistic transformation
• Probabilities that y=1 (on the vertical axis) concentrate around zero for values of x
below a certain threshold, then go quickly towards 1 when x is above the threshold.
• The function fits well with the need for approximating the probabilities of a binary
outcome as a function of the explanatory variable.
• The logistic transformation of y into z is obtained by applying the logit link function to
the expected value of y.
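For reference, the logit link and its inverse can be written out explicitly. The symbol $p_i = E(y_i) = P(y_i = 1)$ is introduced here for convenience and does not appear on the slides.

```latex
\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = a + b x_i
\quad\Longleftrightarrow\quad
p_i = \frac{1}{1 + e^{-(a + b x_i)}} = \frac{e^{a + b x_i}}{1 + e^{a + b x_i}}
```

The right-hand expression is the S-shaped curve described above: it stays close to zero for low values of $a + b x_i$ and approaches one for high values.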
Logistic regression
• The logit transformation is the link function for logistic
regression
• The logit transformation is the log of the odds that y=1
relative to y=0
• The logit link allows one to transform the binary variable y into a
continuous variable z
• The final equation is a regression model with a continuous
variable on the left-hand side
• The only difference from the standard regression model is
that the distribution of the error is not normal but logistic.
• Estimation of a and b can be obtained by maximum likelihood,
which works with any known probability distribution of the
errors and returns the maximum likelihood estimates (the
parameter values under which the observed data are most likely)
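As an illustration of maximum likelihood estimation, the sketch below fits a logistic regression with one explanatory variable by Newton-Raphson iterations. The data are hypothetical, the function name `fit_logit` is an assumption, and statistical packages use more careful implementations (convergence checks, standard errors).

```python
import math

# Hypothetical data: x is metric, y is binary.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit_logit(xs, ys, iterations=25):
    """Maximum likelihood estimates (a, b) for P(y=1) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in xs]
        # Gradient of the log-likelihood with respect to a and b.
        g0 = sum(yi - pi for yi, pi in zip(ys, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(xs, ys, p))
        # Information matrix (negative Hessian), 2x2 and symmetric.
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xs))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_hat, b_hat = fit_logit(x, y)
p_hat = [1 / (1 + math.exp(-(a_hat + b_hat * xi))) for xi in x]
print("a =", round(a_hat, 3), "b =", round(b_hat, 3))
```

At the maximum the gradient is zero, so the predicted probabilities sum to the observed number of ones; this is a quick check that the iterations have converged.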
Types of discrete choice models
• Logistic regression: at least one of the explanatory
variables is metric and continuous
• Logit model: all of the variables on the right-hand side are
non-metric (binary or categorical)
• This is a conventional distinction; often the two terms are
used interchangeably
• In a logit model with a categorical or binary x variable, the
coefficient b is mathematically related to the odds ratio
(with respect to the baseline category of x) of having a
positive outcome
– For example, if the dependent variable is one when the consumer
buys a specific brand and x measures whether the consumer has kids
or not, one can compute with $e^b$ the odds ratio of buying the brand
for consumers with kids as compared to consumers without kids.
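The link between b and the odds ratio can be checked on a two-by-two table. The counts below are invented for illustration; with a single binary explanatory variable the maximum likelihood estimate of b equals the log of the sample odds ratio.

```python
import math

# Hypothetical counts: (buyers, non-buyers) of the brand by presence of kids.
with_kids = (30, 70)
without_kids = (20, 80)

odds_with = with_kids[0] / with_kids[1]            # odds of buying, consumers with kids
odds_without = without_kids[0] / without_kids[1]   # odds of buying, baseline category
odds_ratio = odds_with / odds_without

# In a logit model with this single binary x, b is the log of the odds ratio.
b = math.log(odds_ratio)

print("odds ratio:", round(odds_ratio, 3))
print("b = ln(odds ratio):", round(b, 3))
print("exp(b):", round(math.exp(b), 3))
```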
Probit model
• The Probit model is also applied to binary dependent variables but
with different assumptions on the link function and the error
distribution
• The link function (called probit) is the inverse of the standard normal
cumulative distribution function
• This link function guarantees that the distribution of the model which is
finally estimated is still normal
• The choice between the probit and the logit distribution depends on
the type of dependent variable
• if the dependent variable can be reasonably assumed to be a proxy for a
true underlying variable which is normally distributed then the probit model
should be chosen
• if the dependent variable is considered to be a truly qualitative and
binomial character then logit modelling should be preferred
• generally the two models lead to very similar results, unless cases are
concentrated in the tails of the distributions, in which case the logit link
function should be chosen
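The similarity of the two models, and their difference in the tails, can be seen by comparing the two cumulative distribution functions. The rescaling constant 1.702 is a commonly used approximation factor and is an assumption of this sketch, not something stated in the slides.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function (inverse of the probit link)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def logistic_cdf(z):
    """Standard logistic cumulative distribution function (inverse of the logit link)."""
    return 1 / (1 + math.exp(-z))


SCALE = 1.702  # assumed rescaling so that the two curves are comparable

grid = [i / 100 for i in range(-500, 501)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print("largest gap between the two curves:", round(max_gap, 4))

# In the tails the logistic curve puts more probability than the normal one.
z_tail = -4.0
print("normal tail:  ", normal_cdf(z_tail))
print("logistic tail:", logistic_cdf(SCALE * z_tail))
```

The curves almost coincide in the central range, while the logistic distribution has heavier tails.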
Generalizations
• ordered logit (ordered probit) models
• the dependent variable is not binary but categorical and
the categories are ordered
• multinomial logit (multinomial probit)
• The dependent variable is categorical but categories
cannot be ordered
• multivariate logit (multivariate probit)
• Several discrete choice models are estimated
simultaneously (there are multiple dependent variables)
Discrete choice models in SPSS
• Trust data-set
• Binary logistic regression
• Example application (as for discriminant analysis)
• Dependent variable: buying chicken at the butcher’s
shop
• Explanatory variables:
• weekly expenditure on chicken
• age
• safety of butcher’s chicken
• trust in supermarkets
Binary logistic regression
[Screenshot: SPSS binary logistic regression dialog box, with the following callouts]
• Dependent variable
• Explanatory variables
• It is possible to opt for step-wise selection of explanatory variables
• Declare categorical variables
• Additional statistics
• Save predicted values or residuals
Additional statistics
• The Hosmer and Lemeshow test compares the expected frequencies with
those actually observed after dividing the subjects into ten equal groups
according to their predicted probabilities
• In logistic regression, the exponentials of the coefficients are odds
ratios; this option provides confidence intervals
Output
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      467.079a            .157                   .217
a. Estimation terminated at iteration number 5 because
parameter estimates changed by less than .001.
These are goodness-of-fit measures similar to the regression R square

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      3.030        8    .932
The hypothesis of equality between observed and predicted frequencies is not
rejected

Classification Table(a)
                               Predicted
                               Butcher          Percentage
Observed                       no      yes      Correct
Step 1   Butcher        no     243     34       87.7
                        yes    89      54       37.8
         Overall Percentage                     70.7
a The cut value is .500
The classification table shows the frequencies of correctly predicted
observations
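The figures in this output can be reproduced from the reported counts and test statistic. The sketch below recomputes the percentages of correct classification and the significance level of the Hosmer and Lemeshow test; the helper `chi2_sf_even_df` is an assumption of this sketch and relies on a closed form that holds only for an even number of degrees of freedom.

```python
import math

# Counts copied from the classification table: table[observed][predicted].
table = {"no": {"no": 243, "yes": 34}, "yes": {"no": 89, "yes": 54}}

pct_no = 100 * table["no"]["no"] / sum(table["no"].values())
pct_yes = 100 * table["yes"]["yes"] / sum(table["yes"].values())
total = sum(sum(row.values()) for row in table.values())
pct_overall = 100 * (table["no"]["no"] + table["yes"]["yes"]) / total


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_hl = chi2_sf_even_df(3.030, 8)

print("correct 'no':", round(pct_no, 1))
print("correct 'yes':", round(pct_yes, 1))
print("overall:", round(pct_overall, 1))
print("Hosmer-Lemeshow Sig.:", round(p_hl, 3))
```

The model classifies non-buyers far better than buyers, which the overall percentage alone would hide.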
Coefficient estimates
The Exp(B) values are odds ratios and are interpreted as follows: a one-year
increase in age (q51) leads to a 2.2% increase in the odds of
purchasing chicken at the butcher’s shop (i.e. the ratio between the
probability of doing it and the probability of not doing it).
A unit increase in trust in supermarkets (q43b) decreases the same odds by 23.6%.

Variables in the Equation
                                                            95.0% C.I. for EXP(B)
Step 1(a)   B        S.E.   Wald     df   Sig.   Exp(B)     Lower     Upper
q51         .022     .007   8.988    1    .003   1.022      1.008     1.037
q43b        -.269    .074   13.327   1    .000   .764       .661      .883
q21d        .441     .077   32.888   1    .000   1.554      1.337     1.807
q5          .085     .028   8.975    1    .003   1.088      1.030     1.150
Constant    -3.169   .615   26.539   1    .000   .042
a. Variable(s) entered on step 1: q51, q43b, q21d, q5.
As requested, 95% confidence intervals for the odds ratio are shown
All coefficients are significantly different from zero
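The odds ratios, their confidence intervals and the percentage changes quoted above follow directly from B and S.E. The sketch below recomputes them from the rounded values in the table, so differences in the third decimal are to be expected; the use of 1.96 as the normal critical value is an assumption about how the intervals are built.

```python
import math

# (B, S.E.) copied from the "Variables in the Equation" table.
estimates = {
    "q51": (0.022, 0.007),
    "q43b": (-0.269, 0.074),
    "q21d": (0.441, 0.077),
    "q5": (0.085, 0.028),
}

results = {}
for name, (b, se) in estimates.items():
    odds_ratio = math.exp(b)
    results[name] = {
        "odds_ratio": odds_ratio,
        "lower": math.exp(b - 1.96 * se),     # lower bound of the 95% interval
        "upper": math.exp(b + 1.96 * se),     # upper bound of the 95% interval
        "pct_change": 100 * (odds_ratio - 1), # % change in the odds per unit of x
    }

for name, r in results.items():
    print(name, {key: round(value, 3) for key, value in r.items()})
```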
Logit and probit models
• Logit and probit models (all explanatory
variables are categorical) can also be
estimated using this alternative menu
• However, SPSS data need to be structured
as counts of “success” cases (response
frequency), with an additional column for the
total number of cases
Data for logit/probit models
• This can be easily accomplished: one can create the total observed
variable by creating a new variable of ones (for example tot). If one does
that, one can repeat the above analysis by selecting q8d as the
response frequency and tot as the total observed variable.
• q8d is the original binary variable
• tot is the artificial variable of ones
Logit model estimation
[Screenshot: SPSS logit/probit dialog box, with the following callouts]
• Binary variable
• Artificial variable
• Covariates
• Model choice
• Results are very similar to those obtained from logistic regression
The Generalized Linear Model (GLM)
• The GLM is a comprehensive modeling procedure which includes logistic
regression, logit and probit (among others) as special cases
GLM
• It is a comprehensive modeling approach for discrete
choice modeling where one or more dependent
categorical variables are modeled as the outcome
of one or more explanatory variables which can be
metric or non-metric.
• Depending on the type of link function the GLM
collapses into:
• logistic regression
• logit or probit models
• multinomial or multivariate logistic regression, logit or
probit models
GLMs
[Screenshot: SPSS Generalized Linear Models dialog box, with the following callouts]
• A binary dependent variable here leads to discrete choice models
• It is possible to choose the dependent variable distribution and the link
function
Defining discrete choice models through GLMs
• Here the explanatory variables are selected
• Here more model options (e.g. interaction) are defined. Note that this
procedure can also be used for log-linear analysis
• Provide some details on how to estimate parameters
• Many additional statistics can be requested
Predictors
• Factors are categorical variables
• Covariates are treated as metric variables
• If only covariates are considered, then the model is a logistic regression
Model
• It is necessary to specify how
the predictors enter the model
• They need to be included as
main effects
• If desired, interactions (also
higher than two-way ones) may
be introduced (see loglinear
analysis)
Output
• As expected, results are identical to logistic regression in size and
significance; the signs of the coefficients are reversed, which suggests
that the other category of the dependent variable is the one being modeled here

Parameter Estimates
                                     95% Wald Confidence Interval   Hypothesis Test
Parameter     B        Std. Error    Lower      Upper               Wald Chi-Square   df   Sig.
(Intercept)   3.169    .6152         1.963      4.375               26.539            1    .000
q5            -.085    .0283         -.140      -.029               8.975             1    .003
q51           -.022    .0072         -.036      -.008               8.988             1    .003
q21d          -.441    .0769         -.592      -.290               32.888            1    .000
q43b          .269     .0738         .125       .414                13.327            1    .000
(Scale)       1a
Dependent Variable: Butcher
Model: (Intercept), q5, q51, q21d, q43b
a. Fixed at the displayed value.
Ordinal logit
[Screenshot: SPSS ordinal regression dialog box, with the following callouts]
• The dependent variable is ordered
• Both factors and covariates can enter as predictors
Output
Warnings
There are 975 (77.0%) cells (i.e., dependent variable levels by combinations of predictor
variable values) with zero frequencies.
A large proportion of empty cells may lead to invalid goodness-of-fit measures

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1004.627
Final            987.352             17.276       7    .016
Link function: Logit.
A significant Chi-square statistic indicates that the ordered logit model is better
than an intercept only model

Goodness-of-Fit
           Chi-Square   df     Sig.
Pearson    1088.968     1073   .360
Deviance   816.931      1073   1.000
Link function: Logit.
The Pearson’s Chi-square indicates a good fit, intended as the similarity between
the predicted and observed data (but it is sensitive to the large number of empty cells)

Pseudo R-Square
Cox and Snell   .051
Nagelkerke      .052
McFadden        .014
Link function: Logit.
The Pseudo R-square statistics are quite low, suggesting that the model could be
improved by the inclusion of other covariates and factors.
Parameter estimates
Parameter Estimates
                                                                   95% Confidence Interval
                         Estimate   Std. Error   Wald    df   Sig.   Lower Bound   Upper Bound
Threshold   [q43j = 1]   -2.861     .941         9.250   1    .002   -4.705        -1.017
            [q43j = 2]   -2.389     .935         6.528   1    .011   -4.222        -.556
            [q43j = 3]   -1.738     .930         3.492   1    .062   -3.560        .085
            [q43j = 4]   -.672      .925         .527    1    .468   -2.486        1.142
            [q43j = 5]   .051       .925         .003    1    .956   -1.761        1.864
            [q43j = 6]   1.302      .930         1.960   1    .162   -.521         3.126
Location    q51          -.009      .006         1.961   1    .161   -.021         .004
            [q60=0]      -.845      .905         .873    1    .350   -2.618        .928
            [q60=1]      -.158      .898         .031    1    .861   -1.917        1.602
            [q60=2]      -.069      .912         .006    1    .940   -1.857        1.719
            [q60=3]      .121       .918         .017    1    .895   -1.678        1.920
            [q60=4]      -.081      .952         .007    1    .933   -1.947        1.786
            [q60=5]      .774       1.111        .486    1    .486   -1.403        2.952
            [q60=6]      0a         .            .       0    .      .             .
Link function: Logit.
a. This parameter is set to zero because it is redundant.

• The location parameters translate the predictors into a value for the latent variable.
• The thresholds determine the cut-off points for allocating an observation to a
given value of the dependent variable, according to the value of the latent variable.
• The Wald test (corresponding to the t-test in regression) shows that the predictors
do not actually contribute significantly. This is consistent with the poor Pseudo R
square statistics.
Marginal effects
• What could be interesting (at least for a
model with a better fit) is the computation
of the marginal effects
• They represent the change in the probability
of an observation being classified in each
specific category of the dependent variable
following a change in the values of the predictors
• Unfortunately SPSS does not provide
marginal effects
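Marginal effects can be computed by hand from the ordinal logit estimates. The sketch below does so for age (q51), under the assumption that the model is parameterized as logit P(Y ≤ j) = threshold_j − (location effects), and for a respondent in the reference category of q60 (whose coefficient is zero). Function names and the age value are choices made for this illustration.

```python
import math

# Estimates copied from the ordinal logit output.
thresholds = [-2.861, -2.389, -1.738, -0.672, 0.051, 1.302]  # [q43j = 1] ... [q43j = 6]
beta_age = -0.009                                            # location coefficient of q51


def logistic(v):
    return 1 / (1 + math.exp(-v))


def category_probs(age, q60_effect=0.0):
    """P(Y = j) for the seven categories, assuming logit P(Y <= j) = threshold_j - eta."""
    eta = beta_age * age + q60_effect
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]


def marginal_effects(age, q60_effect=0.0):
    """Change in each P(Y = j) for a small increase in age (analytic derivative)."""
    eta = beta_age * age + q60_effect
    cdf = [logistic(t - eta) for t in thresholds]
    density = [0.0] + [c * (1 - c) for c in cdf] + [0.0]
    return [-beta_age * (density[j + 1] - density[j])
            for j in range(len(thresholds) + 1)]


probs = category_probs(40)
effects = marginal_effects(40)
for j, (p, m) in enumerate(zip(probs, effects), start=1):
    print(f"category {j}: probability {p:.3f}, marginal effect of age {m:+.5f}")
```

The marginal effects sum to zero across categories, since an increase in the probability of some categories must be offset by a decrease in others. Given the poor fit of this model, the numbers only illustrate the mechanics.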
Multinomial logit/probit
The process is similar to the one
leading to the estimation of
ordered logistic regression and
the output should also be
interpreted accordingly
Statistical packages for discrete choice
models
• SPSS lacks some useful features like the estimation of marginal effects
• In SAS discrete choice models can be estimated with several procedures in
SAS/STAT
– CATMOD is employed for estimating logistic regression when the data are structured
as a frequency table
– Binary and ordinal logistic regression can be obtained through the procedure
LOGISTIC
– The same models can be estimated with the PROBIT procedure which also enables
estimation of probit models
– GENMOD allows one to specify a variety of link functions for generalized linear
models
• LIMDEP was specifically created for the estimation of limited dependent
variable models, which include discrete choice models
– It is extremely flexible and contains all the required features and the most
up-to-date diagnostics
• STATA estimates discrete choice models with marginal effects
• Econometric Views (EViews) allows estimation of discrete choice models, but the
availability of diagnostics is rather limited when compared to LIMDEP and no
marginal effects are displayed.
Conjoint analysis
• Very popular research technique in marketing
closely associated with stated preference analysis
• Mainly exploited for the development of new
products and the modification of product
characteristics
• Conjoint analysis is not a model or an estimation
technique but rather a methodology for
constructing the data collection instrument when
the final objective is choice modeling
Marketing applications of conjoint
analysis
• The most common application in consumer research is the analysis of
consumer evaluations of different combinations of product attributes
• Example
• A car manufacturer needs to decide which options to
provide for car configuration
• range of colours
• model of car stereo
• presence of air conditioning, etc.
• Rather than asking consumers about their evaluation of these attributes on
a one-by-one basis, conjoint analysis starts by creating potential
combinations of the product attributes
• E.g.
• Combination 1: red car, with an mp3 stereo player and no air-conditioning,
• Combination 2: red car, but with a standard CD player and air-conditioning, etc.
• Respondents choose among these alternative potential products defined by
the combination of attributes
• From the final choice, conjoint analysis elicits the relevance of each
attribute
Conjoint analysis
• When several attributes are considered
simultaneously the number of potential
combinations is quite high
• Conjoint analysis creates many different choice
sets each one containing a limited number of
options
• Conjoint analysis is based on the statistical control
of
• the way choices are allocated in the sample
• the distribution of attributes
• Hence, the collected data enable inference on
preferences and evaluations for the individual
attributes
Theoretical basis for conjoint analysis
• The underlying theory for conjoint analysis is based on the
economic concept of utility
• each individual has a specific set of preferences for bundles of
products (and attributes)
• individuals take decisions so as to maximize the level of
satisfaction from consumption (the utility level)
• By observing many individuals it is possible to go back from
stated choices to preferences
• Conjoint analysis is inspired by scientific experimental
designs and the terminology reflects this association
• Attributes are called factors (e.g. car colour)
• The different values factors can assume are the levels (red, blue,
yellow, etc.)
Factors and choice sets
• An additional factor could be the price of the car
• By including price levels in the choice set it becomes possible to
evaluate how much consumers would be willing to pay for the car they
prefer
• Among the potential set of choices there are some nonsense choices,
e.g. including all car options but setting a very low price
• Nonsense choices can be excluded by the researcher, who has control over
the overall choice set
• Questionnaire
• Respondents must choose the preferred combination of attributes or
• Respondents must rank all possible choices according to their preferences
• Conjoint analysis is a decompositional method (recall multidimensional
scaling techniques), as it starts from an overall evaluation to infer
preferences for the individual product attributes
Theories linking attributes and choice
• The experimental design and the modeling of
preferences depend on theories which link the
evaluation of single attributes to the final choice
• part-worth model: assumes that total utility of a choice
is equal to the sum of utilities of the attributes of that
specific choice
• vector linear model: applicable when all attributes are
measured on a metric (continuous) scale, assumes a
linear relationship between the utility of individual
attributes and total utility
• ideal point model: assumes that the consumer has an
ideal level for all factors and the total utility depends
upon the distance between the actual levels and the
ideal levels
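The part-worth model can be sketched in a few lines: the total utility of a product profile is the sum of the utilities attached to its attribute levels. The part-worth values below are invented for illustration; in practice they are estimated from the respondents' choices.

```python
# Hypothetical part-worths (utilities of each level of each factor).
part_worths = {
    "colour": {"red": 0.4, "blue": 0.1},
    "stereo": {"mp3": 0.6, "cd": 0.2},
    "air_conditioning": {"yes": 0.9, "no": 0.0},
}


def total_utility(profile):
    """Part-worth model: sum the utilities of the levels that define the profile."""
    return sum(part_worths[factor][level] for factor, level in profile.items())


combination_1 = {"colour": "red", "stereo": "mp3", "air_conditioning": "no"}
combination_2 = {"colour": "red", "stereo": "cd", "air_conditioning": "yes"}

u1 = total_utility(combination_1)
u2 = total_utility(combination_2)
print("utility of combination 1:", round(u1, 2))
print("utility of combination 2:", round(u2, 2))
```

With these invented values the second combination is preferred, because air conditioning carries the largest part-worth.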
Experimental design
• The key problem of conjoint analysis is the large number of
alternative combinations of attributes which arise when
there are many factors and levels
• E.g. a product with six attributes, each with three levels potentially
allows for 729 different combinations
• It would be unrealistic to assume that respondents are able
to choose among so many alternatives
• This problem can be solved by an appropriate experimental
design
• Objective: understand the relationship between the factors and the
potential choice with a number of observations as small as possible
• The experimental design sets the criteria to obtain the preference
information from an aggregation of respondents
• full factorial designs: all potential products are compared (729 in the
example)
• fractional factorial designs: exploit the experimental design to reduce
the number of choices, still guaranteeing that the sample will produce
meaningful aggregate results
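The size of a full factorial design, and the idea behind a fractional one, can be illustrated as follows. The first part enumerates the 729 profiles of the example. The second part builds a nine-profile orthogonal array for four three-level factors; this smaller case is an assumption chosen to keep the sketch short and is not the design for the six-factor example.

```python
from collections import Counter
from itertools import combinations, product

levels = range(3)

# Full factorial: six factors with three levels each.
full_factorial = list(product(levels, repeat=6))
print("full factorial profiles:", len(full_factorial))

# Fractional factorial sketch: 9 profiles instead of 3**4 = 81 for four factors.
fractional = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in levels for b in levels]


def is_balanced(design, n_levels=3):
    """True if, for every pair of factors, each pair of levels appears equally often."""
    n_factors = len(design[0])
    for i, j in combinations(range(n_factors), 2):
        counts = Counter((row[i], row[j]) for row in design)
        if len(counts) != n_levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True


print("fractional profiles:", len(fractional))
print("balanced across all pairs of factors:", is_balanced(fractional))
```

Because every pair of levels appears equally often for every pair of factors, the main effect of each factor can still be estimated from the reduced set of profiles.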
Types of conjoint analyses
• Traditional conjoint analysis
– each respondent is faced with the whole set of attributes
– it requires either a full factorial design or a fractional factorial
design
– all attributes appear in the choice set of each respondent (although
not for all levels)
– becomes inapplicable as the number of factors or levels increases
• Adaptive conjoint analysis
– these design issues are dealt with
– each respondent only deals with a sub-set of potential choices
– these sub-sets can be defined in different ways. For example:
respondents could be asked to rank the factors first, then the
ranking is exploited to adapt data collection
– Computer software learns from the earlier responses and builds the
data-sets accordingly
Choice-based conjoint (1)
• The decomposition of the observed choices into weights
and preferences for single attributes is generally obtained
for an aggregate of consumers or for homogeneous groups
of consumers
• Several techniques can be employed for this purpose
• The evolution of discrete choice models has given
relevance to a specific type of adaptive conjoint analysis,
choice-based conjoint
• Choice-based conjoint gives the respondent the possibility
of evaluating all attributes, not in a single (often too
complex) choice, but rather within a sequence of smaller
choice sets where the possibility of choosing none of the
alternatives is also given
Example
• Example
• Car colour: red or blue
• Air conditioning: yes or no
• Single choice set
• red with air conditioning (AC)
• red without AC
• blue with AC
• blue without AC
• Choice-based conjoint
• first choose among
• red with AC
• blue without AC
• none of them
• then choose between
• blue with AC
• blue without AC
• none of them
• These choices are related and with a smaller set of choices it is possible to
compare all attributes
Choice-based conjoint (2)
• The advantages of choice-based conjoint are apparent with
complex cases
• Respondents do not need to compare too many stimuli at
once,
• They face a more realistic choice among a limited set of
alternatives
• With many factors and levels, each respondent can be
asked to face a limited number of choice sets
• The sufficient condition is that a homogeneous group of
respondents (i.e. respondents that are similar in terms of
characteristics that can influence the choice) is confronted
with the whole range of alternatives; the estimation
technique will then do the rest
Estimation and models
• The experimental design is at the core of a successful choice-based
conjoint
• There is an evolving research effort to guarantee the quality of the
analysis
• Once the data has been collected the natural estimation technique is
the multinomial logit
• choices represent the categorical dependent variable and the attribute
levels are the explanatory variables
• There are computer packages specifically developed for conjoint
analysis
• SPSS Conjoint module
• deals with the experimental design
• provides estimates based on an orthogonal decomposition of the design
matrix
• In SAS/STAT, the TRANSREG procedure is a useful support to define the
experimental design
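In a multinomial logit the probability of choosing each alternative in a choice set depends on the utilities of all the alternatives in that set. The sketch below computes these probabilities for one choice set that includes the "none" option; the utility values are invented for illustration and the function name is an assumption.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit: P(alternative) = exp(utility) / sum of exp(utility) over the set."""
    highest = max(utilities.values())
    # Subtracting the highest utility avoids numerical overflow and leaves the result unchanged.
    exp_u = {name: math.exp(u - highest) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: value / total for name, value in exp_u.items()}


# Hypothetical utilities for one choice set; "none" is the baseline with utility zero.
choice_set = {"red with AC": 1.3, "blue without AC": 0.1, "none": 0.0}

probabilities = choice_probabilities(choice_set)
for name, p in probabilities.items():
    print(f"{name}: {p:.3f}")
```

Estimation works in the opposite direction: the observed choices across many choice sets are used to recover the utilities attached to the attribute levels.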