Qualitative and Limited Dependent Variable Models

ECON 6002
Econometrics
Memorial University of Newfoundland

Adapted from Vera Tabakova's notes

16.1 Models with Binary Dependent Variables

16.2 The Logit Model for Binary Choice

16.3 Multinomial Logit

16.4 Conditional Logit

16.5 Ordered Choice Models

16.6 Models for Count Data

16.7 Limited Dependent Variables
Principles of Econometrics, 3rd Edition
Slide 16-2

Examples:
- An economic model explaining why some individuals take a second, or third, job and engage in "moonlighting."
- An economic model of why the federal government awards development grants to some large cities and not others.
- An economic model explaining why someone is in the labour force or not.
- An economic model explaining why some loan applications are accepted and others not at a large metropolitan bank.
- An economic model explaining why some individuals vote "yes" for increased spending in a school board election and others vote "no."
- An economic model explaining why some female college students decide to study engineering and others do not.
As long as these exhaust the possible (mutually exclusive) options:

y = 1 if the individual drives to work
y = 0 if the individual takes the bus to work  (16.1)

If the probability that an individual drives to work is p, then P[y = 1] = p. It follows that the probability that a person uses public transportation is P[y = 0] = 1 − p.

f(y) = p^y (1 − p)^(1−y),  y = 0, 1  (16.2)

E(y) = p;  var(y) = p(1 − p)
y  E ( y)  e  p  e
(16.3)
E ( y )  p  1  2 x
(16.4)
y  E ( y )  e  1  2 x  e
Principles of Econometrics, 3rd Edition
(16.5)
Slide16-6
One problem with the linear probability model is that the error term is heteroskedastic; the variance of the error term e varies from one observation to another.

y value |      e value      |      Probability
   1    |  1 − (β1 + β2x)   |  p = β1 + β2x
   0    |  −(β1 + β2x)      |  1 − p = 1 − (β1 + β2x)
var  e   1  2 x 1  1  2 x 
Using generalized least squares, the estimated variance is:
ˆ i2  var  ei    b1  b2 xi 1  b1  b2 xi 
yi*  yi ˆ i
xi*  xi ˆ i
(16.6)
So the problem of heteroskedasticity
is not insurmountable…
yi*  1ˆ i1  2 xi*  ei*
Principles of Econometrics, 3rd Edition
Slide16-8
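The feasible GLS transformation above can be sketched in Python. This is an illustrative helper (not from the slides), assuming first-stage OLS estimates b1, b2 are already in hand; it applies eq. (16.6) and drops observations whose estimated variance is non-positive, one common way of handling that problem.

```python
import math

def lpm_wls_transform(y, x, b1, b2):
    """Feasible GLS step for the linear probability model (illustrative).
    b1, b2 are first-stage OLS estimates; returns transformed data for WLS."""
    y_star, x_star, const_star = [], [], []
    for yi, xi in zip(y, x):
        var_i = (b1 + b2 * xi) * (1 - (b1 + b2 * xi))  # eq. (16.6)
        if var_i <= 0:
            continue  # estimated variance non-positive: drop the observation
        s = math.sqrt(var_i)
        y_star.append(yi / s)        # yi* = yi / sigma_hat_i
        x_star.append(xi / s)        # xi* = xi / sigma_hat_i
        const_star.append(1.0 / s)   # transformed intercept regressor
    return y_star, x_star, const_star
```

Running OLS of y* on the transformed intercept and x* then gives the WLS estimates.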
p̂  b1  b2 x
dp
 2
dx
Principles of Econometrics, 3rd Edition
(16.7)
(16.8)
Slide16-9
Problems:
- We can easily obtain values of p̂ that are less than 0 or greater than 1.
- Some of the estimated variances in (16.6) may be negative, so WLS would not work.
- Of course, the errors are not distributed normally.
- R² is usually very poor and a questionable guide for goodness of fit.
Figure 16.1 (a) Standard normal cumulative distribution function (b) Standard normal probability density function
φ(z) = (1/√(2π)) e^(−0.5z²)

Φ(z) = P[Z ≤ z] = ∫ from −∞ to z of (1/√(2π)) e^(−0.5u²) du  (16.9)

p = P[Z ≤ β1 + β2x] = Φ(β1 + β2x)  (16.10)
dp d (t ) dt

  (1  2 x)2
dx
dt dx
(16.11)
where t  1  2 x and (1  2 x) is the standard normal probability
density function evaluated at 1  2 x.
Note that this is clearly a nonlinear model: the marginal effect varies depending
on where you measure it
Principles of Econometrics, 3rd Edition
Slide16-13
Equation (16.11) has the following implications:

1. Since φ(β1 + β2x) is a probability density function, its value is always positive. Consequently the sign of dp/dx is determined by the sign of β2. In the transportation problem we expect β2 to be positive, so that dp/dx > 0; as x increases we expect p to increase.

2. As x changes, the value of the function Φ(β1 + β2x) changes. The standard normal probability density function reaches its maximum when z = 0, or when β1 + β2x = 0. In this case p = Φ(0) = 0.5 and an individual is equally likely to choose car or bus transportation. The slope of the probit function p = Φ(z) is at its maximum when z = 0, the borderline case.
3. On the other hand, if β1 + β2x is large, say near 3, then the probability that the individual chooses to drive is very large and close to 1. In this case a change in x will have relatively little effect, since φ(β1 + β2x) will be nearly 0. The same is true if β1 + β2x is a large negative value, say near −3. These results are consistent with the notion that if an individual is "set" in their ways, with p near 0 or 1, the effect of a small change in commuting time will be negligible.
Predicting the probability that an individual chooses the alternative y = 1:

p̂ = Φ(b1 + b2x)

ŷ = 1 if p̂ > 0.5
ŷ = 0 if p̂ ≤ 0.5  (16.12)

Although you have to be careful with this interpretation!
f(yi) = [Φ(β1 + β2xi)]^yi [1 − Φ(β1 + β2xi)]^(1−yi),  yi = 0, 1  (16.13)

f(y1, y2, y3) = f(y1) f(y2) f(y3)

Suppose that y1 = 1, y2 = 1 and y3 = 0. Suppose that the values of x, in minutes, are x1 = 15, x2 = 20 and x3 = 5.

P[y1 = 1, y2 = 1, y3 = 0] = f(1, 1, 0) = f(1) f(1) f(0)

P[y1 = 1, y2 = 1, y3 = 0] = Φ(β1 + β2(15)) × Φ(β1 + β2(20)) × [1 − Φ(β1 + β2(5))]  (16.14)

In large samples the maximum likelihood estimator is normally distributed, consistent and best, in the sense that no competing estimator has smaller variance.
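As a quick illustration (not in the original slides), the joint probability in (16.14) can be evaluated in Python for trial coefficient values; maximum likelihood estimation chooses β1, β2 to make this probability as large as possible.

```python
import math

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_sample(b1, b2):
    """Joint probability in eq. (16.14): y = (1, 1, 0) at x = (15, 20, 5) minutes."""
    return Phi(b1 + b2 * 15) * Phi(b1 + b2 * 20) * (1 - Phi(b1 + b2 * 5))

# Trial values below are illustrative, not the ML estimates;
# MLE would search over (b1, b2) to maximize this expression.
p_trial = prob_sample(-0.06, 0.03)
```

In practice the maximization is done on the log-likelihood, which turns the product into a sum.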
β̂1 + β̂2 DTIMEi = −0.0644 + 0.0299 DTIMEi  (16.15)
(se)               (0.3992)  (0.0103)

dp/dDTIME = φ(β̂1 + β̂2 DTIME)β̂2 = φ(−0.0644 + 0.0299 × 20)(0.0299)
          = φ(0.5355)(0.0299) = 0.3456 × 0.0299 = 0.0104

Measured at DTIME = 20.
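The marginal-effect calculation above can be reproduced in Python using only the reported estimates (a check, not part of the slides; small rounding differences aside it matches the reported 0.0104):

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

b1, b2 = -0.0644, 0.0299   # probit estimates from eq. (16.15)
z20 = b1 + b2 * 20         # index evaluated at DTIME = 20
me_20 = phi(z20) * b2      # marginal effect, eq. (16.11): about 0.010
```

So at DTIME = 20, one extra minute of relative bus time raises the probability of driving by roughly 0.01.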
If it takes someone 30 minutes longer to take public transportation than to drive to work, the estimated probability that auto transportation will be selected is

p̂ = Φ(β̂1 + β̂2 DTIME) = Φ(−0.0644 + 0.0299 × 30) = 0.798

Since this estimated probability is 0.798, which is greater than 0.5, we may want to "predict" that when public transportation takes 30 minutes longer than driving to work, the individual will choose to drive. But again, use this cautiously!
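The predicted probability can likewise be checked in Python from the reported estimates (not part of the slides):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

b1, b2 = -0.0644, 0.0299    # probit estimates from eq. (16.15)
p_30 = Phi(b1 + b2 * 30)    # predicted P(auto) when DTIME = 30: about 0.798
```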
In STATA:

use transport.dta

. sum

    Variable |  Obs        Mean    Std. Dev.       Min        Max
    autotime |   21    49.34762    32.43491         .2       99.1
     bustime |   21    48.12381    34.63082        1.6       91.5
       dtime |   21   -1.223809    56.91037      -90.7         91
        auto |   21    .4761905    .5117663          0          1
[Figure: scatter of auto (= 1 if auto chosen) against dtime (bus time − auto time, from −100 to 100), with a linear fit superimposed. Linear fit???]
. probit auto dtime

Iteration 0:   log likelihood = -14.532272
Iteration 1:   log likelihood = -6.2074806
Iteration 2:   log likelihood =  -6.165583
Iteration 3:   log likelihood = -6.1651585
Iteration 4:   log likelihood = -6.1651585

Probit regression                             Number of obs   =         21
                                              LR chi2(1)      =      16.73
                                              Prob > chi2     =     0.0000
Log likelihood = -6.1651585                   Pseudo R2       =     0.5758

        auto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       dtime |    .029999   .0102867     2.92   0.004     .0098374    .0501606
       _cons |  -.0644338   .3992438    -0.16   0.872    -.8469372    .7180696

. probit auto

Iteration 0:   log likelihood = -14.532272
Iteration 1:   log likelihood = -14.532272

Probit regression                             Number of obs   =         21
                                              LR chi2(0)      =      -0.00
                                              Prob > chi2     =          .
Log likelihood = -14.532272                   Pseudo R2       =    -0.0000

        auto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       _cons |  -.0597171   .2736728    -0.22   0.827     -.596106    .4766718
* marginal effects
mfx
mfx, at(dtime=30)

* direct calculation
nlcom (normalden(_b[_cons]+_b[dtime]*30)*_b[dtime])
and
nlcom (normal(_b[_cons]+_b[dtime]*30))
λ(l) = e^(−l) / [1 + e^(−l)]²,  −∞ < l < ∞  (16.16)

Λ(l) = P[L ≤ l] = 1 / (1 + e^(−l))  (16.17)

p = P[L ≤ β1 + β2x] = Λ(β1 + β2x) = 1 / (1 + e^(−(β1 + β2x)))  (16.18)
p = 1 / (1 + e^(−(β1 + β2x))) = exp(β1 + β2x) / [1 + exp(β1 + β2x)]

1 − p = 1 / [1 + exp(β1 + β2x)]

so

Pi / (1 − Pi) = odds ratio = exp(β1 + β2X)
so

Pi / (1 − Pi) = odds ratio = exp(β1 + β2X)

ln[Pi / (1 − Pi)] = β1 + β2X

So the "logit", the log-odds, is actually a fully linear function of X.
1. As the probability goes from 0 to 1, the logit goes from −∞ to +∞.
2. The logit is linear, but the probability is not.
3. The explanatory variables are individual-specific, but do not change across alternatives.
4. The slope coefficient tells us by how much the log-odds change with a unit change in the variable.
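The first two properties can be verified numerically. This is an illustrative Python check (not from the slides), using the logit estimates for the transport example: the probability is nonlinear in x, but applying the log-odds transform recovers the linear index exactly.

```python
import math

def logistic(z):
    """Logistic CDF Λ(z), eq. (16.18)."""
    return 1 / (1 + math.exp(-z))

def log_odds(p):
    """The logit: ln[p / (1 - p)]."""
    return math.log(p / (1 - p))

# logit estimates for the transport example
b1, b2 = -0.2375754, 0.0531098

p20 = logistic(b1 + b2 * 20)       # probability at dtime = 20 (nonlinear in x)
recovered_index = log_odds(p20)    # equals b1 + b2*20 exactly (linear in x)
```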
1. This model can in principle be estimated with WLS (due to the heteroskedasticity in the error term) if we have grouped data (glogit in STATA, while blogit will run ML logit on grouped data).
2. Otherwise we use MLE on individual data.
- McFadden's pseudo R²
- Count R² (% of correct predictions)
- Etc.
- Measures of goodness of fit are of secondary importance.
- What counts is the sign of the regression coefficients and their statistical and practical significance.
- Using MLE, a large-sample method:
  => estimated errors are asymptotic
  => we use Z test statistics (based on the normal distribution) instead of t statistics
- A likelihood ratio test (with a test statistic distributed as chi-square with df = number of regressors) is equivalent to the F test.
Measures of Fit for probit of auto

Log-Lik Intercept Only:      -14.532     Log-Lik Full Model:           -6.165
D(19):                        12.330     LR(1):                        16.734
                                         Prob > LR:                     0.000
McFadden's R2:                 0.576     McFadden's Adj R2:             0.438
ML (Cox-Snell) R2:             0.549     Cragg-Uhler(Nagelkerke) R2:    0.733
McKelvey & Zavoina's R2:       0.745     Efron's R2:                    0.649
Variance of y*:                3.915     Variance of error:             1.000
Count R2:                      0.905     Adj Count R2:                  0.800
AIC:                           0.778     AIC*n:                        16.330
BIC:                         -45.516     BIC':                        -13.690
BIC used by Stata:            18.419     AIC used by Stata:            16.330

See http://www.soziologie.uni-halle.de/langer/logitreg/books/long/stbfitstat.pdf
. lstat

But be very careful with these measures!

Probit model for auto

              -------- True --------
Classified |       D           ~D   |     Total
-----------+------------------------+----------
     +     |       9            1   |        10
     -     |       1           10   |        11
-----------+------------------------+----------
     Total |      10           11   |        21

Classified + if predicted Pr(D) >= .5
True D defined as auto != 0

Sensitivity                     Pr( +| D)   90.00%
Specificity                     Pr( -|~D)   90.91%
Positive predictive value       Pr( D| +)   90.00%
Negative predictive value       Pr(~D| -)   90.91%

False + rate for true ~D        Pr( +|~D)    9.09%
False - rate for true D         Pr( -| D)   10.00%
False + rate for classified +   Pr(~D| +)   10.00%
False - rate for classified -   Pr( D| -)    9.09%

Correctly classified                        90.48%

So in STATA the "ones" do not really have to be actual ones, just non-zeros.
- To compute the deviance of the residuals:
  predict "newname", deviance
- The deviance for a logit model is like the RSS in OLS. The smaller the deviance, the better the fit.
- And (logit only) to combine with information about leverage:
  predict "newnamedelta", ddeviance
  (A recommended cut-off value for the ddeviance is 4.)
. logit auto dtime, nolog

Logistic regression                           Number of obs   =         21
                                              LR chi2(1)      =      16.73
                                              Prob > chi2     =     0.0000
Log likelihood = -6.1660422                   Pseudo R2       =     0.5757

        auto |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       dtime |   .0531098   .0206423     2.57   0.010     .0126517     .093568
       _cons |  -.2375754   .7504766    -0.32   0.752    -1.708483    1.233332

. predict pred, p
. predict dev, deviance
. predict delta, ddeviance
. list pred if delta>4

          pred
 13.  .0708038
    Variable |   probit     logit
       dtime |   -.0052    -.0044
     bustime |     .103      .184
       _cons |    -4.73     -8.15
        chi2 |     24.7      24.5
          df |
           N |       21        21
         aic |     10.3      10.5
         bic |     13.5      13.7
- A matter of taste nowadays, since we all have good computers.
- The underlying distributions share a mean of zero but have different variances: logit π²/3, normal 1.
- So estimated slope coefficients differ by a factor of about 1.8 (≈ π/√3). Logit ones are bigger.
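Both the theoretical factor and the empirical one can be checked in Python (not part of the slides), using the slope estimates reported for the transport example above:

```python
import math

scale = math.pi / math.sqrt(3)    # sd of the standard logistic distribution ≈ 1.81
ratio = 0.0531098 / 0.029999      # logit slope / probit slope from the outputs above
# ratio ≈ 1.77, in line with the theoretical factor of about 1.8
```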
- Watch out for "perfect predictions". Luckily, STATA will flag them for you and drop the culprit observations.
- Learn about the test (Wald tests, based on chi2) and lrtest (LR tests) commands, so you can test hypotheses as we did with t-tests and F-tests in OLS. They are asymptotically equivalent but can differ in small samples.
- Learn about the many extra STATA capabilities that will make your post-estimation life much easier.
- Long and Freese's book is a great resource.
For example:

. listcoef, help

logit (N=21): Factor Change in Odds

Odds of: 1 vs 0

        auto |        b         z     P>|z|      e^b    e^bStdX      SDofX
       dtime |  0.05311     2.573     0.010   1.0545    20.5426    56.9104

        b = raw coefficient
        z = z-score for test of b=0
    P>|z| = p-value for z-test
      e^b = exp(b) = factor change in odds for unit increase in X
  e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
    SDofX = standard deviation of X
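The last two columns can be reproduced from b and SDofX (a quick Python check, not part of the slides):

```python
import math

b, sd_x = 0.05311, 56.9104     # slope and SD of dtime from the listcoef output
odds_unit = math.exp(b)        # e^b:     factor change in odds per extra minute
odds_sd = math.exp(b * sd_x)   # e^bStdX: factor change per SD increase in dtime
# odds_unit ≈ 1.0545 and odds_sd ≈ 20.54, matching the reported columns
```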
For example:

. logit honcomp female

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -113.68907
Iteration 2:   log likelihood = -113.67691
Iteration 3:   log likelihood =  -113.6769

Logistic regression                           Number of obs   =        200
                                              LR chi2(1)      =       3.94
                                              Prob > chi2     =     0.0473
Log likelihood = -113.6769                    Pseudo R2       =     0.0170

     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      female |   .6513706   .3336752     1.95   0.051    -.0026207    1.305362
       _cons |  -1.400088   .2631619    -5.32   0.000    -1.915875   -.8842998

. logit honcomp female, or

Logistic regression                           Number of obs   =        200
                                              LR chi2(1)      =       3.94
                                              Prob > chi2     =     0.0473
Log likelihood = -113.6769                    Pseudo R2       =     0.0170

     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
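The odds ratio is just the exponential of the coefficient. A quick Python check (not part of the slides), also converting the estimates into fitted probabilities for each group:

```python
import math

b_cons, b_female = -1.400088, 0.6513706   # logit estimates from the output above

or_female = math.exp(b_female)            # odds ratio reported with the `or` option
p_female = 1 / (1 + math.exp(-(b_cons + b_female)))  # fitted P(honcomp=1 | female)
p_male = 1 / (1 + math.exp(-b_cons))                 # fitted P(honcomp=1 | male)
# or_female ≈ 1.918: the odds of honcomp = 1 are about 1.9 times higher
# for females; fitted probabilities are ≈ 0.32 (female) vs ≈ 0.20 (male)
```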
- Go through a couple of examples available online, with your own STATA session connected to the internet. Examples:
  - http://www.ats.ucla.edu/stat/stata/dae/probit.htm
  - http://www.ats.ucla.edu/stat/stata/dae/logit.htm
  - http://www.ats.ucla.edu/stat/stata/output/old/lognoframe.htm
  - http://www.ats.ucla.edu/stat/stata/output/stata_logistic.htm
Keywords:

- binary choice models
- censored data
- conditional logit
- count data models
- feasible generalized least squares
- Heckit
- identification problem
- independence of irrelevant alternatives (IIA)
- index models
- individual and alternative specific variables
- individual specific variables
- latent variables
- likelihood function
- limited dependent variables
- linear probability model
- logistic random variable
- logit
- log-likelihood function
- marginal effect
- maximum likelihood estimation
- multinomial choice models
- multinomial logit
- odds ratio
- ordered choice models
- ordered probit
- ordinal variables
- Poisson random variable
- Poisson regression model
- probit
- selection bias
- tobit model
- truncated data
- Long, S. and J. Freese for all topics (available on Google!)

Next: Multinomial Logit and Conditional Logit