Handout 14-2

Transcript Handout 14-2

Research Method
Lecture 14-2
Multinomial logit
(Maddala Ch 12.2)
Multinomial logit as a
random utility model
It is useful to understand the multinomial
logit model as a random utility model
whose error terms follow the type I
extreme value distribution.
First, we consider a random utility model
with two choices. This turns out to be the
same as the logit model.
This can be naturally extended to more
than 2 choices, which becomes the
multinomial logit model.
Random utility model with
two choices
Consider the labor force participation of
married women.
The woman decides whether to
participate or not in the labor force by
comparing the utility from participation
and non-participation.
Suppose that the utility from participation, U1, and the utility from
non-participation, U2, are given by:

Utility of participation: $U_1 = x\beta_1 + \varepsilon_1$
Utility of non-participation: $U_2 = x\beta_2 + \varepsilon_2$

where we are using the following vector notation:

$\beta_1 = (\beta_{10}, \beta_{11}, \ldots, \beta_{1k})^T$
$\beta_2 = (\beta_{20}, \beta_{21}, \ldots, \beta_{2k})^T$
$x = (1, x_1, \ldots, x_k)$

The transpose T makes β1 and β2 column vectors, while x is a row vector.
It is important to understand what the
above vector notation means:
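$U_1 = x\beta_1 + \varepsilon_1 = \beta_{10} + \beta_{11}x_1 + \ldots + \beta_{1k}x_k + \varepsilon_1$
$U_2 = x\beta_2 + \varepsilon_2 = \beta_{20} + \beta_{21}x_1 + \ldots + \beta_{2k}x_k + \varepsilon_2$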
The assumptions
We make the following assumptions.
The two error terms ε1 and ε2 are
independent within a person.
These error terms are independent across
persons.
Both ε1 and ε2 follow the type I extreme value
distribution, whose density and cumulative
distribution functions are given by

$f(\varepsilon) = e^{-\varepsilon} e^{-e^{-\varepsilon}}$
$F(\varepsilon) = e^{-e^{-\varepsilon}}$
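To get a feel for the type I extreme value distribution, the following is a minimal Python sketch (not part of the original handout) that draws from it by inverting F and compares simulated shares with F. The function names, the seed and the evaluation points are choices made only for this illustration.

```python
import math
import random

def ev1_pdf(e):
    """Density f(e) = exp(-e) * exp(-exp(-e)) of the type I extreme value distribution."""
    return math.exp(-e) * math.exp(-math.exp(-e))

def ev1_cdf(e):
    """Cumulative distribution function F(e) = exp(-exp(-e))."""
    return math.exp(-math.exp(-e))

def ev1_draw(rng):
    """Draw one error term by inverting F: e = -ln(-ln(u)) with u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return -math.log(-math.log(u))

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [ev1_draw(rng) for _ in range(100_000)]

# Compare the share of draws below a point with F at that point.
for point in (-1.0, 0.0, 1.0, 2.0):
    share = sum(d <= point for d in draws) / len(draws)
    print(f"F({point:+.1f}) = {ev1_cdf(point):.4f}   simulated share = {share:.4f}")
```

The simulated shares should be close to F at each point, up to sampling noise.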
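Given these assumptions, the probability
of participation is given by:

$$
\begin{aligned}
P_1 &= P(U_1 > U_2) = P(x\beta_1 + \varepsilon_1 > x\beta_2 + \varepsilon_2) \\
&= P(\varepsilon_2 < x(\beta_1 - \beta_2) + \varepsilon_1) \\
&= \int_{-\infty}^{\infty} P(\varepsilon_2 < x(\beta_1 - \beta_2) + \varepsilon_1 \mid \varepsilon_1)\, f(\varepsilon_1)\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} F(x(\beta_1 - \beta_2) + \varepsilon_1)\, f(\varepsilon_1)\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} e^{-e^{-[x(\beta_1 - \beta_2) + \varepsilon_1]}}\, e^{-e^{-\varepsilon_1}}\, e^{-\varepsilon_1}\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} e^{-e^{-\varepsilon_1}[1 + e^{-x(\beta_1 - \beta_2)}]}\, e^{-\varepsilon_1}\, d\varepsilon_1 \\
&= \frac{1}{1 + e^{-x(\beta_1 - \beta_2)}} \\
&= \frac{e^{x\beta_1}}{e^{x\beta_1} + e^{x\beta_2}}
\end{aligned}
$$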
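The step from the last integral to the closed form can be filled in with a change of variable. The sketch below is one way to do it; the symbols c and t are introduced here only for this step and do not appear in the original handout.

```latex
\begin{aligned}
c &= 1 + e^{-x(\beta_1 - \beta_2)}, \qquad
t = e^{-\varepsilon_1}, \qquad
dt = -e^{-\varepsilon_1}\, d\varepsilon_1 \\
\int_{-\infty}^{\infty} e^{-c\, e^{-\varepsilon_1}}\, e^{-\varepsilon_1}\, d\varepsilon_1
&= \int_{0}^{\infty} e^{-c t}\, dt
= \frac{1}{c}
= \frac{1}{1 + e^{-x(\beta_1 - \beta_2)}}
\end{aligned}
```

As ε1 runs from minus infinity to plus infinity, t runs from infinity down to 0, and the minus sign in dt reverses the limits.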
In a similar way, you can compute the probability of non-participation:

$$
P_2 = P(U_2 > U_1)
= \frac{e^{-x(\beta_1 - \beta_2)}}{1 + e^{-x(\beta_1 - \beta_2)}}
= \frac{e^{x\beta_2}}{e^{x\beta_1} + e^{x\beta_2}}
$$
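The closed form can also be checked by simulation. The following Python sketch (an illustration, not part of the handout) draws both error terms from the type I extreme value distribution and compares the share of draws with U1 > U2 against the formula. The index values xb1 and xb2, the seed and the number of draws are hypothetical.

```python
import math
import random

def ev1_draw(rng):
    """Draw from the type I extreme value distribution by inverting F."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def logit_p1(xb1, xb2):
    """P1 = exp(x*beta1) / (exp(x*beta1) + exp(x*beta2))."""
    return math.exp(xb1) / (math.exp(xb1) + math.exp(xb2))

xb1, xb2 = 0.8, 0.3  # hypothetical values of x*beta1 and x*beta2
rng = random.Random(1)
n = 100_000

wins = 0
for _ in range(n):
    u1 = xb1 + ev1_draw(rng)  # utility of participation
    u2 = xb2 + ev1_draw(rng)  # utility of non-participation
    if u1 > u2:
        wins += 1

simulated_p1 = wins / n
formula_p1 = logit_p1(xb1, xb2)
print(f"simulated P1 = {simulated_p1:.4f}, formula P1 = {formula_p1:.4f}")
```

The two numbers should agree up to simulation noise, which shrinks as n grows.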
To summarize,

Participation: $P_1 = P(U_1 > U_2) = \frac{1}{1 + e^{-x(\beta_1 - \beta_2)}} = \frac{e^{x\beta_1}}{e^{x\beta_1} + e^{x\beta_2}}$ (1)

Non-participation: $P_2 = P(U_2 > U_1) = \frac{e^{-x(\beta_1 - \beta_2)}}{1 + e^{-x(\beta_1 - \beta_2)}} = \frac{e^{x\beta_2}}{e^{x\beta_1} + e^{x\beta_2}}$ (2)

These probabilities are the basis for the
estimation. If a person is working, then
the likelihood contribution is P1. If the
person is not working, the likelihood
contribution is P2.
One important thing to note is that you
cannot estimate β1 and β2 separately. You
can only estimate the difference (β1 - β2).
This is fine since our purpose is to
investigate the participation probability,
and it is determined by the difference.
For estimation, we normalize either β1 or
β2 to be zero.
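A small numerical check makes the identification point concrete. In the Python sketch below (an illustration with hypothetical coefficient values, not estimates from the handout), adding the same vector to β1 and β2 leaves P1 unchanged, and so does replacing (β1, β2) by (β1 - β2, 0).

```python
import math

def p1(x, beta1, beta2):
    """Participation probability exp(x*beta1) / (exp(x*beta1) + exp(x*beta2))."""
    xb1 = sum(xi * b for xi, b in zip(x, beta1))
    xb2 = sum(xi * b for xi, b in zip(x, beta2))
    return math.exp(xb1) / (math.exp(xb1) + math.exp(xb2))

x = (1.0, 40.0)        # hypothetical person: constant and age
beta1 = (1.5, -0.03)   # hypothetical parameters of equation 1
beta2 = (0.4, -0.01)   # hypothetical parameters of equation 2

# Shift both parameter vectors by the same arbitrary amount.
shift = (2.0, 0.05)
beta1_shifted = tuple(b + s for b, s in zip(beta1, shift))
beta2_shifted = tuple(b + s for b, s in zip(beta2, shift))

# Normalize equation 2 to zero and keep only the difference in equation 1.
beta1_normalized = tuple(b1 - b2 for b1, b2 in zip(beta1, beta2))
beta2_normalized = (0.0, 0.0)

p_original = p1(x, beta1, beta2)
p_shifted = p1(x, beta1_shifted, beta2_shifted)
p_normalized = p1(x, beta1_normalized, beta2_normalized)
print(p_original, p_shifted, p_normalized)
```

All three probabilities coincide up to floating-point rounding, so the data cannot tell these parameter sets apart.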
So, let us normalize β2 to be zero: β2 = 0.
With a single explanatory variable, this means
β2 = (β20, β21)^T = (0, 0)^T. In this case,
equation 2 is the base equation.
When we set equation 2 to be the base
equation, P1 and P2 can be written as:

Participation: $P_1 = P(U_1 > U_2) = \frac{e^{x\beta_1}}{1 + e^{x\beta_1}}$

Non-participation (base eq): $P_2 = P(U_2 > U_1) = \frac{1}{1 + e^{x\beta_1}}$

Notice that these are identical to the logit
model. Thus, when there are only two
choices, the random utility model is
identical to the logit model.
The important things to remember are
that
1. You have to normalize the parameters of
one equation to be zero.
2. When one equation is normalized in the
two-choice model, it is identical to the logit
model.
This model can be extended to more than
two choices. Such a model is called the
multinomial logit model.
Exercise
Using Mroz.dta, estimate the following
model using the Stata mlogit command.
This command estimates the multinomial
logit model.

Utility of participation: $U_1 = \beta_{10} + \beta_{11}\,\text{age} + \varepsilon_1$
Utility of non-participation: $U_2 = \beta_{20} + \beta_{21}\,\text{age} + \varepsilon_2$

Set the non-participation outcome as the
base outcome.
Answer
. mlogit inlf age, baseoutcome(0)

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -512.43119
Iteration 2:   log likelihood = -512.43089

Multinomial logistic regression                   Number of obs   =        753
                                                  LR chi2(1)      =       4.88
                                                  Prob > chi2     =     0.0271
Log likelihood = -512.43089                       Pseudo R2       =     0.0047

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
         age |  -.0202008   .0091652    -2.20   0.028    -.0381643   -.0022374
       _cons |    1.13636   .3983383     2.85   0.004     .3556314    1.917089
------------------------------------------------------------------------------
(inlf==0 is the base outcome)
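. logit inlf age

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -512.43119
Iteration 2:   log likelihood = -512.43089

Logistic regression                               Number of obs   =        753
                                                  LR chi2(1)      =       4.88
                                                  Prob > chi2     =     0.0271
Log likelihood = -512.43089                       Pseudo R2       =     0.0047

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0202008   .0091652    -2.20   0.028    -.0381643   -.0022374
       _cons |    1.13636   .3983383     2.85   0.004     .3556314    1.917089
------------------------------------------------------------------------------

Check that the mlogit results are identical to those of the logit model.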
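To see what the estimates imply, the following Python sketch (an illustration, not part of the handout) plugs the reported coefficients into the two-choice formula with equation 2 as the base. The ages 30, 40 and 50 are hypothetical evaluation points chosen only for this example.

```python
import math

# Coefficients copied from the mlogit / logit output (non-participation is the base).
B_CONS = 1.13636
B_AGE = -0.0202008

def participation_probability(age):
    """P1 = exp(x*beta1) / (1 + exp(x*beta1)) with x = (1, age)."""
    xb1 = B_CONS + B_AGE * age
    return math.exp(xb1) / (1.0 + math.exp(xb1))

for age in (30, 40, 50):  # hypothetical ages
    p1 = participation_probability(age)
    print(f"age {age}: P1 = {p1:.4f}, P2 = {1.0 - p1:.4f}")
```

Because the age coefficient is negative, the predicted participation probability falls as age rises.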
Multinomial logit model
(Random utility model with 3
choices)
When we extend the model to 3 or more
choices, the model is called the
multinomial logit model.
For simplicity, I only explain the case
where we have 3 choices.
As an example, consider the choice of
working in (1) a private university,
(2) a national university or (3) a public
university.
Then, the random utility model is written
as:

Private univ: $U_1 = x\beta_1 + \varepsilon_1$
National univ: $U_2 = x\beta_2 + \varepsilon_2$
Public univ: $U_3 = x\beta_3 + \varepsilon_3$

We assume that ε1, ε2 and ε3 are
independent and follow the type I extreme
value distribution.
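Then, the probability that a person is
working in a private university is written as

$$
\begin{aligned}
P_1 &= P(U_1 > U_2,\ U_1 > U_3) \\
&= P(x\beta_1 + \varepsilon_1 > x\beta_2 + \varepsilon_2,\ x\beta_1 + \varepsilon_1 > x\beta_3 + \varepsilon_3) \\
&= P(\varepsilon_2 < x(\beta_1 - \beta_2) + \varepsilon_1,\ \varepsilon_3 < x(\beta_1 - \beta_3) + \varepsilon_1) \\
&= \int_{-\infty}^{\infty} P(\varepsilon_2 < x(\beta_1 - \beta_2) + \varepsilon_1,\ \varepsilon_3 < x(\beta_1 - \beta_3) + \varepsilon_1 \mid \varepsilon_1)\, f(\varepsilon_1)\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} F(x(\beta_1 - \beta_2) + \varepsilon_1)\, F(x(\beta_1 - \beta_3) + \varepsilon_1)\, f(\varepsilon_1)\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} e^{-e^{-[x(\beta_1 - \beta_2) + \varepsilon_1]}}\, e^{-e^{-[x(\beta_1 - \beta_3) + \varepsilon_1]}}\, e^{-e^{-\varepsilon_1}}\, e^{-\varepsilon_1}\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} e^{-e^{-\varepsilon_1}[1 + e^{-x(\beta_1 - \beta_2)} + e^{-x(\beta_1 - \beta_3)}]}\, e^{-\varepsilon_1}\, d\varepsilon_1 \\
&= \frac{1}{1 + e^{-x(\beta_1 - \beta_2)} + e^{-x(\beta_1 - \beta_3)}} \\
&= \frac{e^{x\beta_1}}{e^{x\beta_1} + e^{x\beta_2} + e^{x\beta_3}}
\end{aligned}
$$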
In a similar way, we can compute the
probabilities of working in a national
university and in a public university.
The three probabilities are:

Private univ: $P_1 = P(U_1 > U_2,\ U_1 > U_3) = \frac{1}{1 + e^{-x(\beta_1 - \beta_2)} + e^{-x(\beta_1 - \beta_3)}} = \frac{e^{x\beta_1}}{e^{x\beta_1} + e^{x\beta_2} + e^{x\beta_3}}$

National univ: $P_2 = P(U_2 > U_1,\ U_2 > U_3) = \frac{1}{1 + e^{-x(\beta_2 - \beta_1)} + e^{-x(\beta_2 - \beta_3)}} = \frac{e^{x\beta_2}}{e^{x\beta_1} + e^{x\beta_2} + e^{x\beta_3}}$

Public univ: $P_3 = P(U_3 > U_1,\ U_3 > U_2) = \frac{1}{1 + e^{-x(\beta_3 - \beta_1)} + e^{-x(\beta_3 - \beta_2)}} = \frac{e^{x\beta_3}}{e^{x\beta_1} + e^{x\beta_2} + e^{x\beta_3}}$
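The three-choice formulas can be checked by simulation in the same spirit as the two-choice case. The Python sketch below (an illustration, not part of the handout) adds independent type I extreme value errors to three index values and counts how often each alternative has the highest utility. The index values in xb, the seed and the number of draws are hypothetical.

```python
import math
import random

def ev1_draw(rng):
    """Draw from the type I extreme value distribution by inverting F."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def mnl_probabilities(xb):
    """P_j = exp(xb_j) / sum_m exp(xb_m); the max is subtracted only for numerical stability."""
    m = max(xb)
    weights = [math.exp(v - m) for v in xb]
    total = sum(weights)
    return [w / total for w in weights]

xb = [0.9, 0.4, 0.0]  # hypothetical x*beta1, x*beta2, x*beta3
rng = random.Random(2)
n = 100_000

counts = [0, 0, 0]
for _ in range(n):
    utilities = [v + ev1_draw(rng) for v in xb]
    counts[utilities.index(max(utilities))] += 1  # alternative with the highest utility

simulated = [c / n for c in counts]
formula = mnl_probabilities(xb)
for j in range(3):
    print(f"alternative {j + 1}: simulated = {simulated[j]:.4f}, formula = {formula[j]:.4f}")
```

Subtracting the maximum inside mnl_probabilities does not change the probabilities, which is the same property that forces one equation to be normalized in estimation.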
As in the two-choice case,
we have to normalize the parameters of
one equation to be zero. This is because
we can estimate only the differences: (β1 - β2) and (β2 - β3).
So, let us set equation 3 to be the base
equation, that is, set the parameters of the
third equation to be zero (β3 = 0).
Then, the probabilities are written as:

Private univ: $P_1 = P(U_1 > U_2,\ U_1 > U_3) = \frac{e^{x\beta_1}}{1 + e^{x\beta_1} + e^{x\beta_2}}$

National univ: $P_2 = P(U_2 > U_1,\ U_2 > U_3) = \frac{e^{x\beta_2}}{1 + e^{x\beta_1} + e^{x\beta_2}}$

Public univ (base eq): $P_3 = P(U_3 > U_1,\ U_3 > U_2) = \frac{1}{1 + e^{x\beta_1} + e^{x\beta_2}}$
If the person is working in a private
university, the likelihood contribution of
that person is P1. If the person is working
in a national university, the likelihood
contribution is P2. If the person is working
in a public university, the likelihood
contribution is P3.
This model is called the multinomial logit
model.
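The likelihood contributions described above can be turned into a log likelihood by summing the log of the probability of each person's observed choice. The following Python sketch is a minimal illustration under stated assumptions: the four observations in data are made up, and the function names are hypothetical, not part of any library.

```python
import math

def mnl_probabilities(x, beta1, beta2):
    """Return [P1, P2, P3] with equation 3 as the base (beta3 = 0)."""
    xb1 = sum(xi * b for xi, b in zip(x, beta1))
    xb2 = sum(xi * b for xi, b in zip(x, beta2))
    denom = 1.0 + math.exp(xb1) + math.exp(xb2)
    return [math.exp(xb1) / denom, math.exp(xb2) / denom, 1.0 / denom]

def log_likelihood(data, beta1, beta2):
    """Sum over persons of log(P_j), where j is the alternative the person chose."""
    total = 0.0
    for x, choice in data:  # choice is 1, 2 or 3
        total += math.log(mnl_probabilities(x, beta1, beta2)[choice - 1])
    return total

# Hypothetical data: x = (1, experience) and the chosen alternative.
data = [((1.0, 5.0), 1), ((1.0, 12.0), 2), ((1.0, 20.0), 1), ((1.0, 8.0), 3)]

ll_zero = log_likelihood(data, (0.0, 0.0), (0.0, 0.0))
ll_other = log_likelihood(data, (0.5, 0.02), (0.2, 0.01))
print(ll_zero, ll_other)
```

With all parameters at zero every alternative has probability 1/3, so the log likelihood equals the number of observations times log(1/3). Maximum likelihood estimation searches for the parameter values that make this sum as large as possible.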
Exercise
Estimate a multinomial logit model of the
choice among working in a private,
national or public university, with
experience as the only explanatory
variable. Set public university as the base
equation. Use univtype.dta.
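. mlogit univtype exper, baseoutcome(3)

Iteration 0:   log likelihood = -318.70745
Iteration 1:   log likelihood = -314.45652
Iteration 2:   log likelihood = -314.37852
Iteration 3:   log likelihood = -314.37833

Multinomial logistic regression                   Number of obs   =        360
                                                  LR chi2(2)      =       8.66
                                                  Prob > chi2     =     0.0132
Log likelihood = -314.37833                       Pseudo R2       =     0.0136

------------------------------------------------------------------------------
    univtype |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
       exper |   .0472459   .0194769     2.43   0.015     .0090719      .08542
       _cons |   1.247718   .3268847     3.82   0.000     .6070354      1.8884
-------------+----------------------------------------------------------------
2            |
       exper |   .0279146   .0201504     1.39   0.166    -.0115794    .0674087
       _cons |     1.0401   .3375024     3.08   0.002     .3786071    1.701592
------------------------------------------------------------------------------
(univtype==3 is the base outcome)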
The estimated probability
After estimating the multinomial logit
model, you can estimate the probability of
working in a private, national or
public university, given the values of the
explanatory variables, as:

Private univ: $P_1 = P(U_1 > U_2,\ U_1 > U_3) = \frac{e^{x\beta_1}}{1 + e^{x\beta_1} + e^{x\beta_2}}$

National univ: $P_2 = P(U_2 > U_1,\ U_2 > U_3) = \frac{e^{x\beta_2}}{1 + e^{x\beta_1} + e^{x\beta_2}}$

Public univ (base eq): $P_3 = P(U_3 > U_1,\ U_3 > U_2) = \frac{1}{1 + e^{x\beta_1} + e^{x\beta_2}}$
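As a worked example, the Python sketch below (an illustration, not part of the handout) plugs the coefficients reported in the mlogit output for univtype.dta into these formulas. The evaluation point 17.0732 is the value of exper reported in the X column of the mfx output later in the handout; it is taken here as the sample mean of experience.

```python
import math

# Coefficients copied from the mlogit output (public university, outcome 3, is the base).
B1_CONS, B1_EXPER = 1.247718, 0.0472459   # equation 1: private university
B2_CONS, B2_EXPER = 1.0401, 0.0279146     # equation 2: national university

def predicted_probabilities(exper):
    """Return (P1, P2, P3) for a given level of experience."""
    e1 = math.exp(B1_CONS + B1_EXPER * exper)
    e2 = math.exp(B2_CONS + B2_EXPER * exper)
    denom = 1.0 + e1 + e2
    return e1 / denom, e2 / denom, 1.0 / denom

p1, p2, p3 = predicted_probabilities(17.0732)
print(f"private: {p1:.4f}, national: {p2:.4f}, public: {p3:.4f}")
```

The private-university probability comes out at about .584, in line with the value y = .5840146 that Stata reports in the mfx output.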
The partial effects
The partial effect shows how much the
probability of choosing one alternative
increases if the explanatory variable
increases by one unit.
Let us take the probability of choosing
private university as an example. In our
example x = (1, x1), where x1 = experience.
Then, the partial effect is given as:

$$
\frac{\partial P_1}{\partial x_1}
= \frac{e^{x\beta_1}\left(\beta_{11} + e^{x\beta_2}\beta_{11} - e^{x\beta_2}\beta_{21}\right)}{\left(1 + e^{x\beta_1} + e^{x\beta_2}\right)^2}
$$

You have to note the use of the vector
notation, where β1 = (β10, β11)^T and β2 = (β20, β21)^T.
Thus, β11 is the experience coefficient for the first
equation, and β21 is the experience coefficient for
the second equation.
The partial effects for the other alternatives are
computed by taking derivatives in a
similar way.
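The partial effect formula can be checked numerically. The Python sketch below (an illustration, not part of the handout) evaluates the formula with the coefficients from the mlogit output for univtype.dta and compares it with a central finite difference of P1. The evaluation point 17.0732 is the exper value shown in the X column of the mfx output; the step size h is an arbitrary choice.

```python
import math

# Coefficients (constant, exper) from the mlogit output; outcome 3 is the base.
B1 = (1.247718, 0.0472459)   # private university
B2 = (1.0401, 0.0279146)     # national university
MEAN_EXPER = 17.0732         # exper value from the X column of the mfx output

def exp_indices(exper):
    """Return exp(x*beta1) and exp(x*beta2) for x = (1, exper)."""
    return math.exp(B1[0] + B1[1] * exper), math.exp(B2[0] + B2[1] * exper)

def p1(exper):
    e1, e2 = exp_indices(exper)
    return e1 / (1.0 + e1 + e2)

def partial_effect_p1(exper):
    """Analytic derivative of P1 with respect to experience."""
    e1, e2 = exp_indices(exper)
    return e1 * (B1[1] + e2 * B1[1] - e2 * B2[1]) / (1.0 + e1 + e2) ** 2

h = 1e-5
numeric = (p1(MEAN_EXPER + h) - p1(MEAN_EXPER - h)) / (2.0 * h)
analytic = partial_effect_p1(MEAN_EXPER)
print(f"analytic = {analytic:.7f}, finite difference = {numeric:.7f}")
```

Both values are about .0059, which agrees with the partial effect of .0059167 reported in the Stata output later in the handout.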
The partial effect at the average is computed by
plugging in the average values of the
explanatory variables, and it is computed
automatically by Stata.
It is extremely important to note that the sign of
the partial effect depends not only on the
parameters of that equation but also on the
parameters of the other equations.
In some cases, you may have a negative parameter
in one equation while the partial effect is positive.
Thus you always have to check the partial effect
before interpreting the results.
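The univtype estimates in this handout already show a sign mismatch, in the reverse direction. The Python sketch below (an illustration, not part of the handout) computes the partial effect of experience for all three alternatives using the compact form ∂Pj/∂x1 = Pj(βj1 - Σm Pm βm1). This form is not written in the handout; it is algebraically equivalent to the formula given for P1 and is used here as an assumption-labelled shortcut. The evaluation point 17.0732 is the exper value from the X column of the mfx output.

```python
import math

# (constant, exper) per outcome from the mlogit output; the base outcome 3 is all zeros.
BETA = {1: (1.247718, 0.0472459), 2: (1.0401, 0.0279146), 3: (0.0, 0.0)}
MEAN_EXPER = 17.0732

def probabilities(exper):
    """Return {outcome: P_j} at the given experience."""
    weights = {j: math.exp(b0 + b1 * exper) for j, (b0, b1) in BETA.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def partial_effects(exper):
    """dP_j/d(exper) = P_j * (beta_j1 - sum_m P_m * beta_m1)."""
    p = probabilities(exper)
    average_slope = sum(p[j] * BETA[j][1] for j in BETA)
    return {j: p[j] * (BETA[j][1] - average_slope) for j in BETA}

effects = partial_effects(MEAN_EXPER)
for j in sorted(effects):
    print(f"outcome {j}: coefficient = {BETA[j][1]:+.7f}, partial effect = {effects[j]:+.7f}")
```

For national university (outcome 2) the experience coefficient is positive, yet the partial effect at this point is negative, because the private-university probability rises faster. The three partial effects sum to zero since the probabilities always sum to one.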
Exercise
Estimate a multinomial logit model of the
choice among working in a private,
national or public university, with
experience as the only explanatory
variable. Set public university as the base
equation. Use univtype.dta. Then,
compute the effect of experience on
the probability of working in a private university.
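. mlogit univtype exper, baseoutcome(3)

Iteration 0:   log likelihood = -318.70745
Iteration 1:   log likelihood = -314.45652
Iteration 2:   log likelihood = -314.37852
Iteration 3:   log likelihood = -314.37833

Multinomial logistic regression                   Number of obs   =        360
                                                  LR chi2(2)      =       8.66
                                                  Prob > chi2     =     0.0132
Log likelihood = -314.37833                       Pseudo R2       =     0.0136

------------------------------------------------------------------------------
    univtype |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
       exper |   .0472459   .0194769     2.43   0.015     .0090719      .08542
       _cons |   1.247718   .3268847     3.82   0.000     .6070354      1.8884
-------------+----------------------------------------------------------------
2            |
       exper |   .0279146   .0201504     1.39   0.166    -.0115794    .0674087
       _cons |     1.0401   .3375024     3.08   0.002     .3786071    1.701592
------------------------------------------------------------------------------
(univtype==3 is the base outcome)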
Computing the partial effect manually

. egen avexper=mean(exper)
. gen eb1=exp([1]_b[_cons]+[1]_b[exper]*avexper)
. gen eb2=exp([2]_b[_cons]+[2]_b[exper]*avexper)
. gen partial =eb1*([1]_b[exper]+eb2*[1]_b[exper]-eb2*[2]_b[exper])/((1+eb1+eb2)^2)
. su partial

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     partial |       360    .0059167           0   .0059167   .0059167
Computing the partial effect automatically
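. mfx, varlist(exper) predict(p outcome(1))

Marginal effects after mlogit
      y  = Pr(univtype==1) (predict, p outcome(1))
         =  .5840146
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   exper |   .0059167      .00231    2.56   0.010   .00139  .010444   17.0732
------------------------------------------------------------------------------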
