Chapter 16: Qualitative and Limited Dependent Variable Models

ECON 4551
Econometrics II
Memorial University of Newfoundland
Adapted from Vera Tabakova’s notes

16.1 Models with Binary Dependent Variables

16.2 The Logit Model for Binary Choice

16.3 Multinomial Logit

16.4 Conditional Logit

16.5 Ordered Choice Models

16.6 Models for Count Data

16.7 Limited Dependent Variables

Examples:
- An economic model explaining why some individuals take a second, or third, job and engage in "moonlighting."
- An economic model of why the federal government awards development grants to some large cities and not others.
- An economic model explaining why someone is in the labour force or not.
- An economic model explaining why some loan applications are accepted and others not at a large metropolitan bank.
- An economic model explaining why some individuals vote "yes" for increased spending in a school board election and others vote "no."
- An economic model explaining why some female college students decide to study engineering and others do not.
As long as these exhaust the possible (mutually exclusive) options:

    y = 1   if the individual drives to work
    y = 0   if the individual takes the bus to work                  (16.1)

If the probability that an individual drives to work is p, then P[y = 1] = p. It follows that the probability that a person uses public transportation is P[y = 0] = 1 − p.

    f(y) = p^y (1 − p)^(1−y),   y = 0, 1                             (16.2)

    E(y) = p;   var(y) = p(1 − p)
    y = E(y) + e = p + e                                             (16.3)

    E(y) = p = β1 + β2x                                              (16.4)

    y = E(y) + e = β1 + β2x + e                                      (16.5)
One problem with the linear probability model is that the error term is
heteroskedastic; the variance of the error term e varies from one
observation to another.
For the two values of y:

    y value     e value             Probability
    1           1 − (β1 + β2x)      p = β1 + β2x
    0           −(β1 + β2x)         1 − p = 1 − β1 − β2x

    var(e) = (β1 + β2x)(1 − β1 − β2x)
Using generalized least squares, the estimated variance is

    σ̂i² = var(ei) = (b1 + b2 xi)(1 − b1 − b2 xi)                     (16.6)

Defining yi* = yi/σ̂i and xi* = xi/σ̂i gives the transformed model

    yi* = β1 σ̂i⁻¹ + β2 xi* + ei*

So the problem of heteroskedasticity is not insurmountable…
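To make this concrete, here is a minimal Stata sketch of the feasible WLS procedure, assuming transport.dta (with variables auto and dtime) is loaded; the variable names p and w are illustrative, not from the text.

* feasible WLS for the linear probability model (a sketch)
regress auto dtime                      // LPM estimated by OLS
predict p, xb                           // fitted values b1 + b2*x
generate w = p*(1 - p)                  // estimated var(e_i), as in (16.6)
replace w = . if w <= 0                 // drop unusable (non-positive) variances
regress auto dtime [aweight = 1/w]      // WLS, weighting by the inverse variance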
    p̂ = b1 + b2x                                                     (16.7)

    dp/dx = β2                                                       (16.8)
Problems:
- We can easily obtain values of p̂ that are less than 0 or greater than 1.
- Some of the estimated variances in (16.6) may be negative, so the WLS would not work.
- Of course, the errors are not distributed normally.
- R² is usually very poor and a questionable guide for goodness of fit.
Figure 16.1: (a) Standard normal cumulative distribution function; (b) standard normal probability density function.
    φ(z) = (1/√(2π)) e^(−0.5z²)

    Φ(z) = P[Z ≤ z] = ∫₋∞ᶻ (1/√(2π)) e^(−0.5u²) du                   (16.9)

    p = P[Z ≤ β1 + β2x] = Φ(β1 + β2x)                                (16.10)
Here Φ is the cumulative distribution function and φ is the density. The marginal effect is

    dp/dx = (dΦ(t)/dt)(dt/dx) = φ(β1 + β2x)β2                        (16.11)

where t = β1 + β2x and φ(β1 + β2x) is the standard normal probability density function evaluated at β1 + β2x.

Note that this is clearly a nonlinear model: the marginal effect varies depending on where you measure it.
Equation (16.11) has the following implications:

1. Since φ(β1 + β2x) is a probability density function, its value is always positive. Consequently the sign of dp/dx is determined by the sign of β2. In the transportation problem we expect β2 to be positive, so that dp/dx > 0: as x increases, we expect p to increase.
2. As x changes, the value of the density φ(β1 + β2x) changes. The standard normal probability density function reaches its maximum when z = 0, that is, when β1 + β2x = 0. In this case p = Φ(0) = .5 and an individual is equally likely to choose car or bus transportation. The slope of the probit function p = Φ(z) is at its maximum when z = 0, the borderline case.
3. On the other hand, if β1 + β2x is large, say near 3, then the probability that the individual chooses to drive is very large and close to 1. In this case a change in x will have relatively little effect, since φ(β1 + β2x) will be nearly 0. The same is true if β1 + β2x is a large negative value, say near −3. These results are consistent with the notion that if an individual is "set" in their ways, with p near 0 or 1, the effect of a small change in commuting time will be negligible.
Predicting the probability that an individual chooses the alternative y = 1:

    p̂ = Φ(b1 + b2x)                                                  (16.12)

    ŷ = 1 if p̂ > 0.5;   ŷ = 0 if p̂ ≤ 0.5

Although you have to be careful with this interpretation!
    f(yi) = [Φ(β1 + β2xi)]^yi [1 − Φ(β1 + β2xi)]^(1−yi),   yi = 0, 1   (16.13)

    f(y1, y2, y3) = f(y1) f(y2) f(y3)

Suppose that y1 = 1, y2 = 1 and y3 = 0, and that the values of x, in minutes, are x1 = 15, x2 = 20 and x3 = 5. Then

    P[y1 = 1, y2 = 1, y3 = 0] = f(1, 1, 0) = f(1) f(1) f(0)
        = Φ[β1 + β2(15)] × Φ[β1 + β2(20)] × {1 − Φ[β1 + β2(5)]}      (16.14)
In large samples the maximum likelihood estimator is normally
distributed, consistent and best, in the sense that no competing
estimator has smaller variance.
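As a check on how the likelihood in (16.13) is built up, here is a small Stata sketch, assuming transport.dta is loaded; phat and llik are illustrative names.

quietly probit auto dtime
predict phat, pr                                   // Phi(b1 + b2*dtime)
generate llik = auto*ln(phat) + (1 - auto)*ln(1 - phat)
quietly summarize llik
display "log likelihood = " r(sum)                 // should match -6.1651585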
1  2 DTIMEi  .0644  .0299 DTIMEi
(se)
(16.15)
(.3992) (.0103)
dp
 (1  2 DTIME )2  (0.0644  0.0299  20)(0.0299)
dDTIME
 (.5355)(0.0299)  0.3456  0.0299  0.0104
Marginal effect of DT
Measured at
DTIME = 20
Principles of Econometrics, 3rd Edition
Slide16-21
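After estimation this calculation can be reproduced with nlcom, which also delivers a delta-method standard error; a sketch, assuming the probit has just been fit:

quietly probit auto dtime
nlcom (normalden(_b[_cons] + _b[dtime]*20)*_b[dtime])     // dp/dDTIME at DTIME = 20
display normalden(-.0644338 + .029999*20)*.029999         // the same, by hand: ≈ .0104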
If it takes someone 30 minutes longer to take public transportation than to drive to work, the estimated probability that auto transportation will be selected is

    p̂ = Φ(β̂1 + β̂2 DTIME) = Φ(−0.0644 + 0.0299 × 30) = .798

Since this estimated probability is 0.798, which is greater than 0.5, we may want to "predict" that when public transportation takes 30 minutes longer than driving to work, the individual will choose to drive. But again, use this cautiously!
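A one-line check of this probability in Stata, using the reported coefficients:

display normal(-.0644338 + .029999*30)     // ≈ .798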
In STATA:

use transport.dta

. sum

    Variable |  Obs        Mean    Std. Dev.        Min        Max
    autotime |   21    49.34762    32.43491          .2       99.1
     bustime |   21    48.12381    34.63082         1.6       91.5
       dtime |   21   -1.223809    56.91037       -90.7         91
        auto |   21    .4761905    .5117663           0          1
[Scatter plot: auto (= 1 if auto chosen) on the vertical axis against dtime (bus time − auto time, roughly −100 to 100) on the horizontal axis, with a fitted line overlaid. Linear fit???]
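The figure can be reproduced along these lines (a sketch, assuming transport.dta is loaded):

twoway (scatter auto dtime) (lfit auto dtime), ///
    xtitle("bus time - auto time") ytitle("= 1 if auto chosen")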
Annotations on the probit output below:
- The test statistics use the NORMAL distribution, not the t distribution, because the properties of the probit estimator are asymptotic.
- You can choose to read significance straight off the reported p-values.
- Understand, but do not use, the pseudo-R2 reported here!
- What is the meaning of the reported LR test?
. probit auto dtime

Iteration 0:   log likelihood = -14.532272
Iteration 1:   log likelihood = -6.2074806
Iteration 2:   log likelihood =  -6.165583
Iteration 3:   log likelihood = -6.1651585
Iteration 4:   log likelihood = -6.1651585

Probit regression                        Number of obs   =         21
                                         LR chi2(1)      =      16.73
                                         Prob > chi2     =     0.0000
Log likelihood = -6.1651585              Pseudo R2       =     0.5758

        auto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       dtime |    .029999   .0102867     2.92   0.004     .0098374    .0501606
       _cons |  -.0644338   .3992438    -0.16   0.872    -.8469372    .7180696

. mfx compute

Marginal effects after probit
      y  = Pr(auto) (predict)
         =  .45971697

    variable |     dy/dx   Std. Err.      z    P>|z|  [    95% C.I.   ]        X
       dtime |  .0119068      .0041     2.90   0.004   .003871  .019942   -1.22381

(mfx evaluates at the means by default, too.)
The same probit output is shown again in the slides; you can request these iteration logs in GRETL too.

Estimating the model with only a constant gives:

. probit auto

Iteration 0:   log likelihood = -14.532272
Iteration 1:   log likelihood = -14.532272

Probit regression                        Number of obs   =         21
                                         LR chi2(0)      =      -0.00
                                         Prob > chi2     =          .
Log likelihood = -14.532272              Pseudo R2       =    -0.0000

        auto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       _cons |  -.0597171   .2736728    -0.22   0.827     -.596106    .4766718

What yields cnorm(-0.0597171)??? This is a probability: Φ(−.0597171) is just the sample proportion choosing auto (.4762).
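A quick check (the mean of auto was .4761905 in the sum output above):

display normal(-.0597171)     // ≈ .4762
display 10/21                 // the sample proportion with auto = 1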
IN STATA:

* marginal effects
mfx
mfx, at(dtime=20)

* direct calculation at DTIME = 30: the marginal effect
nlcom (normalden(_b[_cons]+_b[dtime]*30)*_b[dtime])

* and the predicted probability
nlcom (normal(_b[_cons]+_b[dtime]*30))
The logistic random variable L has probability density function

    λ(l) = e^(−l) / (1 + e^(−l))²,   −∞ < l < ∞                      (16.16)

and cumulative distribution function

    Λ(l) = P[L ≤ l] = 1 / (1 + e^(−l))                               (16.17)

    p = P[L ≤ β1 + β2x] = Λ(β1 + β2x) = 1 / (1 + e^(−(β1 + β2x)))    (16.18)
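In Stata, Λ(·) in (16.17) is available directly as invlogit(), so values of (16.18) can be checked by hand:

display invlogit(0)     // Lambda(0)  = .5
display invlogit(2)     // Lambda(2) ≈ .88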
    p = 1 / (1 + e^(−(β1 + β2x))) = exp(β1 + β2x) / [1 + exp(β1 + β2x)]

    1 − p = 1 / [1 + exp(β1 + β2x)]

so the odds ratio is

    p / (1 − p) = exp(β1 + β2x)

and taking logs,

    ln[p / (1 − p)] = β1 + β2x

So the "logit", the log-odds, is actually a fully linear function of x.
1. As the probability goes from 0 to 1, the logit goes from −∞ to +∞.
2. The logit is linear, but the probability is not.
3. The explanatory variables are individual specific, but do not change across alternatives.
4. The slope coefficient tells us by how much the log-odds changes with a unit change in the variable.
1. In principle this model can be estimated with WLS (due to the heteroskedasticity in the error term) if we have grouped data (glogit in STATA, while blogit will run ML logit on grouped data); a sketch follows below. IN GRETL: if you want to use logit for the analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present or not), you should not use the logit command, but rather construct the logit variable yourself, as in genr lgt_p = log(p/(1 - p)).

2. Otherwise we use MLE on individual data.
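A sketch of those grouped-data commands; the variable names are hypothetical (pos = number of "successes" in each cell, n = cell size, x = a covariate):

glogit pos n x     // WLS logit on the empirical log-odds
blogit pos n x     // ML logit on the same grouped data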
Goodness of fit:
- McFadden's pseudo-R² (remember that it does not have any natural interpretation for values between 0 and 1)
- Count R² (% of correct predictions) (dodgy but common!)
- Etc.
- Measures of goodness of fit are of secondary importance. What counts is the sign of the regression coefficients and their statistical and practical significance.

Inference:
- We are using MLE, a large-sample method, so the estimated errors are asymptotic and we use Z test statistics (based on the normal distribution) instead of t statistics.
- A likelihood ratio test (with a test statistic distributed as chi-square with df = number of regressors) is the equivalent of the F test.

How do you obtain the table of fit measures below?
Measures of Fit for probit of auto

Log-Lik Intercept Only:      -14.532   Log-Lik Full Model:           -6.165
D(19):                        12.330   LR(1):                        16.734
                                       Prob > LR:                     0.000
McFadden's R2:                 0.576   McFadden's Adj R2:             0.438
ML (Cox-Snell) R2:             0.549   Cragg-Uhler(Nagelkerke) R2:    0.733
McKelvey & Zavoina's R2:       0.745   Efron's R2:                    0.649
Variance of y*:                3.915   Variance of error:             1.000
Count R2:                      0.905   Adj Count R2:                  0.800
AIC:                           0.778   AIC*n:                        16.330
BIC:                         -45.516   BIC':                        -13.690
BIC used by Stata:            18.419   AIC used by Stata:            16.330

See http://www.soziologie.uni-halle.de/langer/logitreg/books/long/stbfitstat.pdf
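This table is produced by the user-written fitstat command from Long and Freese's SPost package, not by base Stata (findit spost will locate the package). Once installed:

quietly probit auto dtime
fitstat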
. lstat

But be very careful with these measures!

Probit model for auto

              -------- True --------
Classified |         D           ~D |      Total
     +     |         9            1 |         10
     -     |         1           10 |         11
   Total   |        10           11 |         21

Classified + if predicted Pr(D) >= .5
True D defined as auto != 0

Sensitivity                     Pr( +| D)   90.00%
Specificity                     Pr( -|~D)   90.91%
Positive predictive value       Pr( D| +)   90.00%
Negative predictive value       Pr(~D| -)   90.91%

False + rate for true ~D        Pr( +|~D)    9.09%
False - rate for true D         Pr( -| D)   10.00%
False + rate for classified +   Pr(~D| +)   10.00%
False - rate for classified -   Pr( D| -)    9.09%

Correctly classified                        90.48%
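The headline Count R² is easy to reproduce by hand; a sketch (phat2 and correct are illustrative names):

quietly probit auto dtime
predict phat2, pr
generate correct = ((phat2 >= .5) == auto)   // 1 if classified correctly
quietly summarize correct
display r(mean)                              // ≈ .9048, as in lstat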
So in STATA the "ones" do not really have to be actual ones: any non-zero value of the dependent variable is treated as a success.

IN GRETL, if you do not have a binary dependent variable, it is assumed to be ordered unless you specify multinomial; if the variable is not discrete, you get an error.
- To compute the deviance of the residuals: predict "newname", deviance
- The deviance for a logit model is like the RSS in OLS: the smaller the deviance, the better the fit.
- And (logit only) to combine this with information about leverage: predict "newnamedelta", ddeviance
- (A recommended cut-off value for the ddeviance is 4.)
. logit auto dtime, nolog

Logistic regression                      Number of obs   =         21
                                         LR chi2(1)      =      16.73
                                         Prob > chi2     =     0.0000
Log likelihood = -6.1660422              Pseudo R2       =     0.5757

        auto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       dtime |   .0531098   .0206423     2.57   0.010     .0126517     .093568
       _cons |  -.2375754   .7504766    -0.32   0.752    -1.708483    1.233332

. predict pred, p
. predict dev, deviance
. predict delta, ddeviance
. list pred if delta>4

           pred
 13.   .0708038
Comparing probit and logit estimates (auto on dtime and bustime):

    Variable |   probit     logit
       dtime |   -.0052    -.0044
     bustime |    .103      .184
       _cons |   -4.73     -8.15

        chi2 |    24.7      24.5
          df |       2         2
           N |      21        21
         aic |    10.3      10.5
         bic |    13.5      13.7

Why does the rule of thumb not work for dtime?
Probit or logit?
- A matter of taste nowadays, since we all have good computers.
- The underlying distributions share a mean of zero but have different variances: π²/3 for the logistic, 1 for the standard normal.
- So estimated slope coefficients differ by a factor of about 1.8 (≈ π/√3); the logit ones are bigger.
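A quick check with the transport estimates reported above:

display .0531098/.029999     // logit/probit slope ratio ≈ 1.77
display _pi/sqrt(3)          // the theoretical factor  ≈ 1.81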
Watch out for "perfect predictions"!
- Luckily STATA will flag them for you and drop the culprit observations.
- Gretl has a mechanism for preventing the algorithm from iterating endlessly in search of a nonexistent maximum. One sub-case of interest is when the perfect prediction problem arises because of a single binary explanatory variable. In this case, the offending variable is dropped from the model and estimation proceeds with the reduced specification. However, it may happen that no single "perfect classifier" exists among the regressors, in which case estimation is simply impossible and the algorithm stops with an error.
- If this happens, unless your model is trivially mis-specified (like predicting whether a country is an oil exporter on the basis of oil revenues), it is normally a small-sample problem: you probably just don't have enough data to estimate your model. You may want to drop some of your explanatory variables.
- Learn about the test (Wald tests based on chi2) and lrtest (LR tests) commands, so you can test hypotheses as we did with t-tests and F-tests in OLS; a sketch follows below.
- The two are asymptotically equivalent but can differ in small samples.
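A minimal sketch of both commands, assuming transport.dta is loaded (bustime is added as a second regressor purely for illustration):

quietly logit auto dtime bustime
test dtime bustime               // Wald test that both slopes are zero
estimates store full
quietly logit auto               // intercept-only model
lrtest full .                    // LR test of the same restriction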
- Learn about the many extra STATA capabilities, if you use it, that will make your post-estimation life much easier. Long and Freese's book is a great resource.
- GRETL is more limited, but doing things by hand for now will actually be a good thing!
. logit auto dtime, nolog
(output as above)

For example:

. listcoef, help

logit (N=21): Factor Change in Odds

Odds of: 1 vs 0

    auto |        b         z    P>|z|      e^b   e^bStdX     SDofX
   dtime |  0.05311     2.573    0.010   1.0545   20.5426   56.9104

       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X
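The factor-change columns can be verified by hand from the coefficient and the standard deviation of dtime:

display exp(.0531098)             // e^b      ≈ 1.0545
display exp(.0531098*56.9104)     // e^bStdX  ≈ 20.5426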
Another example (dependent variable honcomp, regressor female; N = 200):

. logit honcomp female

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                      Number of obs   =        200
                                         LR chi2(1)      =       3.94
                                         Prob > chi2     =     0.0473
Log likelihood = -113.6769               Pseudo R2       =     0.0170

     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      female |   .6513706   .3336752     1.95   0.051    -.0026207    1.305362
       _cons |  -1.400088   .2631619    -5.32   0.000    -1.915875   -.8842998

For example, the same model reported as odds ratios:

. logit honcomp female, or

Logistic regression                      Number of obs   =        200
                                         LR chi2(1)      =       3.94
                                         Prob > chi2     =     0.0473
Log likelihood = -113.6769               Pseudo R2       =     0.0170

     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
Stata users? Go through a couple of the examples available online, with your own STATA session connected to the internet. Examples:

- http://www.ats.ucla.edu/stat/stata/dae/probit.htm
- http://www.ats.ucla.edu/stat/stata/dae/logit.htm
- http://www.ats.ucla.edu/stat/stata/output/old/lognoframe.htm
- http://www.ats.ucla.edu/stat/stata/output/stata_logistic.htm
Keywords: binary choice models; censored data; conditional logit; count data models; feasible generalized least squares; Heckit; identification problem; independence of irrelevant alternatives (IIA); index models; individual and alternative specific variables; individual specific variables; latent variables; likelihood function; limited dependent variables; linear probability model; logistic random variable; logit; log-likelihood function; marginal effect; maximum likelihood estimation; multinomial choice models; multinomial logit; odds ratio; ordered choice models; ordered probit; ordinal variables; Poisson random variable; Poisson regression model; probit; selection bias; tobit model; truncated data.
Reference: Long, S. and J. Freese, for all topics (available on Google!).

Next: Multinomial Logit and Conditional Logit.