Choice modelling: an introduction
Class experiment
Fact: everybody loves chocolate.
Of the following choices:
White, Chewy, No Nuts
Dark, Chewy, No Nuts
White, Soft, No Nuts
Dark, Soft, No Nuts
White, Chewy, Nuts
Dark, Chewy, Nuts
White, Soft, Nuts
Dark, Soft, Nuts
Which one do you choose?
Introduction
When we wish to launch new products or change features of current products, we would like to
ascertain the impact this will have in the marketplace.
We need to be able to model the effect of each change on
a range of similar products.
The way we do this is via choice modelling,
whereby each respondent examines a number of market
scenarios and chooses which product they would
purchase.
We need to be able to answer the marketing questions.
We need to be able to model this appropriately.
Introduction…
There are 3 components to choice modelling:
Design of experiment
Analysis of data
Presentation of results
Firstly an example
Multinomial logit
We begin with a very simple example. In this example, each of ten subjects was presented
with eight different chocolate candies and asked to choose one.
The eight candies consist of the 2^3 = 8 combinations of dark or milk chocolate, soft or chewy
centre, and nuts or no nuts. Each subject saw all eight candies and made one choice.
There are m = 8 attribute vectors in this example, one for each alternative. Let
x = (Dark/Milk, Soft/Chewy, Nuts/No Nuts), where
Dark/Milk = (1 = Dark, 0 = Milk),
Soft/Chewy = (1 = Soft, 0 = Chewy),
Nuts/No Nuts = (1 = Nuts, 0 = No Nuts).
The eight attribute vectors are
x1 = (0 0 0) (Milk, Chewy, No Nuts)
x2 = (0 0 1) (Milk, Chewy, Nuts )
x3 = (0 1 0) (Milk, Soft, No Nuts)
x4 = (0 1 1) (Milk, Soft, Nuts )
x5 = (1 0 0) (Dark, Chewy, No Nuts)
x6 = (1 0 1) (Dark, Chewy, Nuts )
x7 = (1 1 0) (Dark, Soft, No Nuts)
x8 = (1 1 1) (Dark, Soft, Nuts )
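The eight attribute vectors are simply the full 2^3 factorial of the three binary attributes. As a minimal sketch (Python is used here only for illustration; the names `LABELS`, `attribute_vectors` and `describe` are assumptions and not part of the SAS analysis), they can be enumerated in the same order as x1 to x8:

```python
from itertools import product

# Level labels for each binary attribute: index 0 = coded 0, index 1 = coded 1.
LABELS = [("Milk", "Dark"), ("Chewy", "Soft"), ("No Nuts", "Nuts")]

def attribute_vectors():
    """Return all 2^3 (dark, soft, nuts) vectors in the order x1..x8."""
    return list(product((0, 1), repeat=3))

def describe(x):
    """Translate a coded vector into its attribute labels."""
    return tuple(LABELS[k][level] for k, level in enumerate(x))

if __name__ == "__main__":
    for i, x in enumerate(attribute_vectors(), start=1):
        print(f"x{i} = {x} {describe(x)}")
```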
Multinomial logit model
Experimental choice data such as these are typically analyzed with a
multinomial logit model.
• The Multinomial Logit Model
The multinomial logit model assumes that the probability that an
individual will choose one of the m alternatives, c_i, from choice set C is
\[ p(c_i \mid C) = \frac{\exp(U(c_i))}{\sum_{j=1}^{m} \exp(U(c_j))} = \frac{\exp(x_i b)}{\sum_{j=1}^{m} \exp(x_j b)} \]
where x_i is a vector of alternative attributes and b is a vector of unknown
parameters. U(c_i) = x_i b is the utility for alternative c_i, which is a
linear function of the attributes.
The probability that an individual will choose one of the m alternatives,
c_i, from choice set C is the exponential of the utility of the alternative
divided by the sum of all of the exponentiated utilities.
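The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the SAS implementation: the function name `choice_probabilities` is hypothetical, and the parameter vector used in the demonstration is made up, not estimated from the study.

```python
import math

def choice_probabilities(X, b):
    """Multinomial logit: p_i = exp(x_i . b) / sum_j exp(x_j . b)."""
    utilities = [sum(xk * bk for xk, bk in zip(x, b)) for x in X]
    # Subtract the maximum utility before exponentiating; this leaves the
    # probabilities unchanged but avoids overflow for large utilities.
    u_max = max(utilities)
    expu = [math.exp(u - u_max) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

if __name__ == "__main__":
    # Hypothetical part-worths for (dark, soft, nuts); NOT estimates from the study.
    b = [1.0, -2.0, 0.5]
    X = [(d, s, n) for d in (0, 1) for s in (0, 1) for n in (0, 1)]
    for x, p in zip(X, choice_probabilities(X, b)):
        print(x, round(p, 4))
```

With all parameters set to zero every alternative has the same utility, so each of the eight candies gets probability 1/8.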
Hypothetical calculations
[Figure: probability of choice as a function of utility]
Note: for pricing data this is usually a negative relationship.
The input data
8 alternatives, 10 persons, so 80 observations
Typically, two variables are used to identify the choice sets,
subject ID and choice set within subject (for larger studies we
aggregate this over ID)
The variable Subj is the subject number, and Set identifies the
choice set within subject. The chosen alternative is indicated
by c=1, which means first choice.
All second and subsequent choices are unobserved, so the
unchosen alternatives are indicated by c=2.
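The layout described above, one row per subject per alternative with c=1 for the chosen alternative and c=2 otherwise, can be sketched as follows. The helper `build_rows` and the column name `alt` are assumptions made for illustration, and the choices fed to it are hypothetical, not the study data.

```python
def build_rows(choices, n_alternatives=8):
    """Build one row per subject per alternative.

    choices maps subject ID -> index (1..n_alternatives) of the chosen
    alternative. c = 1 marks the chosen alternative, c = 2 every unchosen one.
    """
    rows = []
    for subj in sorted(choices):
        for alt in range(1, n_alternatives + 1):
            rows.append({
                "subj": subj,
                "set": 1,          # each subject saw a single choice set
                "alt": alt,
                "c": 1 if alt == choices[subj] else 2,
            })
    return rows

if __name__ == "__main__":
    # Hypothetical choices for ten subjects; NOT the data from the study.
    hypothetical = {s: (s % 8) + 1 for s in range(1, 11)}
    rows = build_rows(hypothetical)
    print(len(rows), "rows;", sum(r["c"] == 1 for r in rows), "chosen")
```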
The data
Fitting the Multinomial Logit Model
The data are now in the right form for analysis. In the SAS
System, the multinomial logit model is fit with the SAS/STAT
procedure PHREG (proportional hazards regression), with the
ties=breslow option.
The likelihood function of the multinomial logit model has the
same form as a survival analysis model fit by PROC
PHREG. See Statistics 764 Survival Analysis notes – Chapter 7
The code
proc phreg data=chocs outest=betas;
   /* one stratum per subject and choice set */
   strata subj set;
   /* c=2 marks the unchosen (censored) alternatives */
   model c*c(2) = dark soft nuts / ties=breslow;
   label dark = 'Dark Chocolate' soft = 'Soft Centre' nuts = 'With Nuts';
run;
The data= option specifies the input data set. The outest= option
requests an output data set called BETAS
with the parameter estimates.
The strata statement specifies that each combination of the
variables Set and Subj forms a set from which a choice was
made. Each term in the likelihood function is a stratum.
There is one term or stratum per choice set per subject, and
each is composed of information about the chosen and all the
unchosen alternatives.
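To make the "one term per stratum" idea concrete, here is a minimal sketch of the multinomial logit log-likelihood written stratum by stratum. It is not how PROC PHREG is implemented; the function name `neg2_log_likelihood` and the chosen indices in the demonstration are assumptions. With all parameters at zero, every alternative has probability 1/8 regardless of what was chosen, so ten strata give 2 x 10 x ln(8), which agrees with the -2 LOG L Without Covariates value of 41.589 quoted in the interpretation of the output.

```python
import math

def neg2_log_likelihood(strata, b):
    """-2 log L for a multinomial logit model.

    strata is a list of (X, chosen) pairs: X holds the attribute vectors of
    all alternatives in one choice set, chosen is the index of the selected
    one. Each stratum contributes exp(x_chosen . b) / sum_j exp(x_j . b).
    """
    log_l = 0.0
    for X, chosen in strata:
        utilities = [sum(xk * bk for xk, bk in zip(x, b)) for x in X]
        log_denominator = math.log(sum(math.exp(u) for u in utilities))
        log_l += utilities[chosen] - log_denominator
    return -2.0 * log_l

if __name__ == "__main__":
    X = [(d, s, n) for d in (0, 1) for s in (0, 1) for n in (0, 1)]
    # Ten strata (one per subject); the chosen indices are hypothetical.
    strata = [(X, i % 8) for i in range(10)]
    print(round(neg2_log_likelihood(strata, [0.0, 0.0, 0.0]), 3))
```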
SAS output
Interpretation
“Model Fit Statistics” and “Testing Global Null Hypothesis:
BETA=0” contain the overall fit of the model.
The -2 LOG L statistic under “With Covariates” is 28.727 and the
Chi-Square statistic is 12.8618 with 3 df (p=0.0049),
which is used to test the null hypothesis that the attributes do not
influence choice.
Note that 41.589 (-2 LOG L Without Covariates, which is -2 LOG
L for a model with no explanatory variables) minus
28.727 (-2 LOG L With Covariates, which is -2 LOG L for a model
with all explanatory variables) equals 12.8618
(Model Chi-Square, which is used to test the effects of the
explanatory variables).
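The arithmetic of this likelihood ratio test can be checked by hand. The sketch below uses only the figures quoted above; the helper `chi2_sf_3df` is a hypothetical name and relies on the closed-form upper-tail probability of a chi-square variate with exactly 3 degrees of freedom, so it does not generalise to other df.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom.

    Closed form for 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

neg2logl_without = 41.589   # -2 LOG L Without Covariates (from the output)
neg2logl_with = 28.727      # -2 LOG L With Covariates (from the output)

# The difference of the rounded values reproduces the Model Chi-Square to 3 dp.
lr_chi_square = neg2logl_without - neg2logl_with
# p-value for the Chi-Square value as reported in the output.
p_value = chi2_sf_3df(12.8618)

if __name__ == "__main__":
    print(round(lr_chi_square, 3), round(p_value, 4))
```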
Probability of choice
The parameter estimates are used next to construct the
estimated probability that each alternative will be chosen.
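The DATA step program uses the following formula to
create the choice probabilities:
\[ \hat{p}_j = \frac{\exp(x_j \hat{b})}{\sum_{k=1}^{m} \exp(x_k \hat{b})} \]
where \hat{b} holds the parameter estimates from PROC PHREG.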
Probabilities
Fabric Softener Example
The study involves four fictitious fabric softener brands:
Sploosh, Plumbbob, Platter, and Moosey.
Each choice set consists of these four brands and a
constant alternative, Another.
Each of the brands is available
at three prices: $1.49, $1.99, and $2.49. Another is only
offered at $1.99.
There are 50 subjects, each of whom will see the same choice
sets.
Designing the experiment
In order to do any choice model we need to construct an
experimental design
Using SAS
We can use the %MKTRUNS autocall macro to help us choose the
number of choice sets. All of the autocall macros used in this
report are documented starting on page 261. To use this macro,
you specify the number of levels for each of the factors. With four
brands each with three prices, you specify four 3’s.
title 'Choice of Fabric Softener';
%mktruns( 3 3 3 3 )
Output
The design
In this problem, the %MKTRUNS macro reports ten different
sizes with no violations. Ideally, we would like to have a
manageable number of choice sets for people to evaluate and
a design that is both orthogonal and balanced.
When violations are reported, orthogonal and balanced
designs are not possible. While orthogonality and balance are
not required, they are nice properties to have. With 4 three-level
factors, the number of choice sets in all orthogonal and
balanced designs must be divisible by 3 x 3 = 9.
In this example we would go for 18 runs.
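The divisibility argument can be sketched as a small check. This is a rough illustration of the reasoning only and is not the algorithm used by %MKTRUNS; the function names `violations` and `candidate_sizes` are assumptions. Balance requires the number of runs to be divisible by each factor's number of levels, and orthogonality requires divisibility by the product of levels for each pair of factors.

```python
from itertools import combinations

def violations(n, levels):
    """Count divisibility violations for a candidate number of runs n.

    Balance needs n divisible by each factor's number of levels; orthogonality
    needs n divisible by the product of levels for each pair of factors.
    """
    single = sum(1 for k in levels if n % k != 0)
    pairs = sum(1 for a, b in combinations(levels, 2) if n % (a * b) != 0)
    return single + pairs

def candidate_sizes(levels, max_runs):
    """Sizes with no violations, starting at the saturated main-effects size."""
    n_params = 1 + sum(k - 1 for k in levels)   # intercept + main-effect terms
    return [n for n in range(n_params, max_runs + 1)
            if violations(n, levels) == 0]

if __name__ == "__main__":
    print(candidate_sizes([3, 3, 3, 3], 40))
```

For four three-level factors the smallest candidates are the multiples of 9, and 18 is the second of them.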
The design ….
In the next steps, an efficient experimental design is created. We
will use an autocall macro %MKTDES to create most of our
designs.
When you invoke the %MKTDES macro for a simple problem,
you only need to specify the factors, number of levels,
and number of runs. The macro does the rest.
For just main effects we simply type (note no second order
effects are asked for here – usually we ask for them):
%let n = 18; /* n choice sets */
%mktdes(factors=x1-x4=3, n=&n)
proc print;
run;
Design output
Design
For now, notice that the macro found a perfect,
orthogonal and balanced, 100% efficient design consisting of
three-level factors, x1-x4. The levels are the
integers 1 to 3.
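What "orthogonal and balanced" means can be checked directly on a design matrix. The sketch below is an assumption-laden illustration: `is_balanced_and_orthogonal` is a hypothetical helper, and `illustrative_design` builds a stand-in 18-run design (a 9-run orthogonal array repeated twice), which is NOT the design that %MKTDES produces.

```python
from collections import Counter
from itertools import combinations, product

def is_balanced_and_orthogonal(design, n_levels=3):
    """Check level balance per factor and pairwise balance per factor pair."""
    n = len(design)
    n_factors = len(design[0])
    # Balance: every level of every factor appears n / n_levels times.
    for f in range(n_factors):
        counts = Counter(row[f] for row in design)
        if any(counts[lv] * n_levels != n for lv in range(1, n_levels + 1)):
            return False
    # Orthogonality: every pair of levels appears n / n_levels^2 times.
    for f, g in combinations(range(n_factors), 2):
        counts = Counter((row[f], row[g]) for row in design)
        for pair in product(range(1, n_levels + 1), repeat=2):
            if counts[pair] * n_levels * n_levels != n:
                return False
    return True

def illustrative_design():
    """An 18-run stand-in: a 9-run orthogonal array for four three-level
    factors, repeated twice. This is NOT the design %MKTDES produces."""
    nine = [(a + 1, b + 1, (a + b) % 3 + 1, (a + 2 * b) % 3 + 1)
            for a in range(3) for b in range(3)]
    return nine + nine

if __name__ == "__main__":
    print(is_balanced_and_orthogonal(illustrative_design()))
```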
Note that we would need to randomise the order of these
choice sets before eventually presenting this design to consumers.
Design ..
What consumers see
Etc…
The Data
How the data needs to be formatted
Person #1 first 3 scenarios – note this assumes the
price effect is the same for each brand – usually it’s different.
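A minimal sketch of this layout follows. It assumes that design levels 1, 2 and 3 map to $1.49, $1.99 and $2.49, and it uses a single shared Price column, which is what "the price effect is the same for each brand" implies. The helper `code_choice_set` is hypothetical, and the design row and choice in the demonstration are made up.

```python
BRANDS = ["Sploosh", "Plumbbob", "Platter", "Moosey"]
PRICES = {1: 1.49, 2: 1.99, 3: 2.49}   # assumed mapping: design level -> price
ANOTHER_PRICE = 1.99                   # the constant alternative has a fixed price

def code_choice_set(subj, set_no, levels, chosen):
    """Return five rows (four brands plus Another) for one choice set.

    levels holds the design levels x1-x4 for the four brands; chosen is the
    name of the alternative the subject picked. c = 1 chosen, c = 2 unchosen.
    """
    alternatives = [(b, PRICES[lv]) for b, lv in zip(BRANDS, levels)]
    alternatives.append(("Another", ANOTHER_PRICE))
    rows = []
    for name, price in alternatives:
        row = {"subj": subj, "set": set_no}
        # One 0/1 indicator per alternative, plus a single shared price column.
        for b in BRANDS + ["Another"]:
            row[b] = 1 if b == name else 0
        row["Price"] = price
        row["c"] = 1 if name == chosen else 2
        rows.append(row)
    return rows

if __name__ == "__main__":
    # Hypothetical design row and choice, for illustration only.
    for row in code_choice_set(1, 1, (1, 3, 2, 2), "Plumbbob"):
        print(row)
```

Letting the price effect differ by brand would instead need one price column per brand, each zero on the rows of the other alternatives.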
The analysis
proc phreg data=coded outest=betas;
   title2 'Discrete Choice Model';
   model c*c(2) = Sploosh Plumbbob Platter Moosey Another Price / ties=breslow;
   strata subj set;
run;
proc phreg data=coded outest=betas;
   title2 'Discrete Choice Model';
   model c*c(2) = / ties=breslow;
   strata subj set;
run;
The analysis