Linear Probability and Logit Models
(Qualitative Response Regression Models)
SA Quimbo
September 2012
Dummy dependent variables
• Dependent variable is qualitative in nature
• For example, the dependent variable takes only two possible values, 0 or 1
• Examples:
  - Labor force participation
  - Insurance decision
  - Voter's choice
  - School enrollment decision
  - Union membership
  - Home ownership
• Predicted dependent variable ~ estimated probability
Dummy dependent (2)
• Discrete choice models
  - Agent chooses among discrete choices: {commute, walk}
  - Utility-maximizing choice is that which solves: Max [U(commute), U(walk)]
• Utility levels are not observed, but choices are
• Use a dummy variable for the actual choice
• Estimate a demand function for public transportation, where
  Y = 1 if the individual chose to commute
    = 0 otherwise
Binary Dependent Variables (cont.)
• Suppose we were to predict whether NFL football teams win individual games, using the reported point spread from sports gambling authorities.
• For example, if the Packers have a spread of 6 against the Dolphins, the gambling authorities expect the Packers to lose by no more than 6 points.
Copyright © 2006 Pearson Addison-Wesley. All rights reserved.
Binary Dependent Variables (cont.)
• Using the techniques we have developed so far, we might regress
  D_i^Win = β0 + β1·Spread_i + ε_i
  where i indexes games
• How would we interpret the coefficients and predicted values from such a model?
Binary Dependent Variables (cont.)
  D_i^Win = β0 + β1·Spread_i + ε_i
• D_i^Win is either 0 or 1. It does not make sense to say that a 1-point increase in the spread increases D_i^Win by β1. D_i^Win can change only from 0 to 1 or from 1 to 0.
• Instead of predicting D_i^Win itself, we predict the probability that D_i^Win = 1.
Binary Dependent Variables (cont.)
  D_i^Win = β0 + β1·Spread_i + ε_i
• It can make sense to say that a 1-point increase in the spread increases the probability of winning by β1.
• Our predicted values of D_i^Win are the probability of winning.
Linear Probability Model (LPM)
• Consider the ff. model:
  Yi = β1 + β2Xi + ui
  where i ~ families
        Yi = 1 if family owns a house
           = 0 otherwise
        Xi = family income
• The dichotomous variable is a linear function of Xi
LPM (2)
• The predicted values of Yi can be interpreted as the estimated probability of owning a house, conditional on income, i.e.,
  E(Yi|Xi) = Pr(Yi=1|Xi)
LPM (3)
• Let Pi = probability that Yi = 1
• Probability that Yi = 0 is 1 − Pi
• E(Yi)?
  E(Yi) = (1)(Pi) + (0)(1 − Pi) = Pi
• Yi = β1 + β2Xi + ui is the Linear Probability Model
LPM (4)
• Assuming E(ui) = 0
• Then E(Yi|Xi) = β1 + β2Xi
• Or Pi = β1 + β2Xi
• where 0 ≤ Pi ≤ 1
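The LPM above can be fitted by ordinary OLS. A minimal sketch (the income and ownership figures below are made up for illustration) shows why the assumed bounds on Pi are fragile: the fitted "probabilities" at the extremes of X fall outside [0, 1].

```python
# Fit the LPM Y = b1 + b2*X by simple OLS on a tiny hypothetical sample.
X = [4, 8, 12, 16, 20, 24]   # hypothetical family incomes
Y = [0, 0, 0, 1, 1, 1]       # 1 = owns a house, 0 = otherwise

xbar = sum(X) / len(X)
ybar = sum(Y) / len(Y)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar

# Fitted values are the estimated probabilities Pr(Y=1|X).
phat = [b1 + b2 * x for x in X]
print(phat[0], phat[-1])  # lowest fitted value is below 0, highest is above 1
```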
Problems in Estimating the LPM
• Non-normality of disturbances:
  Yi = β1 + β2Xi + ui
  ui = Yi − β1 − β2Xi
  If Yi = 1: ui = 1 − β1 − β2Xi
  If Yi = 0: ui = −β1 − β2Xi
  * The ui's are binomially distributed
  -> OLS estimates are still unbiased;
  -> as the sample size increases, the ui's will tend to normality
Problems (2)
• Heteroskedastic disturbances:
  var(ui) = E(ui − E(ui))² = E(ui²)
          = (1 − β1 − β2Xi)²(Pi) + (−β1 − β2Xi)²(1 − Pi)
          = (1 − β1 − β2Xi)²(β1 + β2Xi) + (−β1 − β2Xi)²(1 − β1 − β2Xi)
          = (β1 + β2Xi)(1 − β1 − β2Xi)
          = Pi(1 − Pi)
  * var(ui) will vary with Xi
Problems (3)
Transform the model in such a way that the transformed disturbances are not heteroskedastic.
Let wi = Pi(1 − Pi). Dividing through by √wi:
  Yi/√wi = β1/√wi + β2Xi/√wi + ui/√wi
Then
  var(ui/√wi) = (1/wi)·var(ui) = wi/wi = 1
Problems (4)
• R² may not be a good measure of model fit
[Figure: scatter of home ownership (0/1) against income, with the SRF cutting between the two bands of points]
Problems (5)
• The assumed bounds (0 ≤ E(Yi|Xi) ≤ 1) could be violated
• Example, Gujarati (see next slide): six estimated values are negative and six values are in excess of one
Example
• Hypothetical data on home ownership (Gujarati, p. 588)
• STATA results (OLS):

      Source |       SS   df        MS        Number of obs =     40
  -----------+------------------------------  F(1, 38)      = 156.63
       Model | 8.027495    1  8.027495        Prob > F      =      0
    Residual | 1.947505   38  0.051251        R-squared     = 0.8048
  -----------+------------------------------  Adj R-squared = 0.7996
       Total |    9.975   39  0.255769        Root MSE      = 0.22638

           y |     Coef.  Std. Err.      t    P>|t|    [95% Conf. Interval]
  -----------+-------------------------------------------------------------
           x |  0.102131   0.008161   12.52   0.000    0.085611    0.118651
       _cons | -0.945686   0.122842   -7.70   0.000   -1.194366   -0.697007
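Plugging the estimated coefficients from the slide into the fitted LPM shows the bounds problem directly: fitted "probabilities" can be negative at low incomes and exceed one at high incomes. A quick check (income values 8 and 20 chosen for illustration):

```python
# Fitted LPM from the slide: Yhat = -0.945686 + 0.102131*X
b1, b2 = -0.945686, 0.102131

def phat(x):
    """Fitted 'probability' of owning a home at income x."""
    return b1 + b2 * x

print(phat(8))   # negative fitted probability
print(phat(20))  # fitted probability above one
```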
Linear vs. Non-linear Probability Models
[Figure: P (from 0 to 1) plotted against X; S-shaped CDFs of logistically or normally distributed RVs compared with the straight-line SRF from the LPM example (constant = −0.94, slope = 0.10)]
Binary Dependent Variables (cont.)
• We need a procedure to translate our linear regression results into true probabilities.
• We need a function that takes a value from −∞ to +∞ and returns a value from 0 to 1.
Binary Dependent Variables (cont.)
• We want a translator such that:
  - The closer to −∞ is the value from our linear regression model, the closer to 0 is our predicted probability.
  - The closer to +∞ is the value from our linear regression model, the closer to 1 is our predicted probability.
  - No predicted probabilities are less than 0 or greater than 1.
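One function with exactly these properties is the logistic CDF (the logit "translator" introduced below); a minimal sketch of how it maps any real value into (0, 1):

```python
import math

def F(z):
    """Logistic CDF: maps any real z into (0, 1), monotonically increasing."""
    return 1.0 / (1.0 + math.exp(-z))

# Very negative z -> near 0; z = 0 -> 0.5; very positive z -> near 1.
print(F(-10.0), F(0.0), F(10.0))
```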
Figure 19.2 A Graph of Probability of Success and X
Binary Dependent Variables
• How can we construct such a translator?
• How can we estimate it?
Probit/Logit Models (Chapter 19.2)
• In common practice, econometricians use TWO such "translators":
  - probit
  - logit
• The differences between the two models are subtle.
• For present purposes there is no practical difference between the two models.
Probit/Logit Models
• Both the Probit and Logit models have the same basic structure.
  1. Estimate a latent variable Z using a linear model. Z ranges from negative infinity to positive infinity.
  2. Use a non-linear function to transform Z into a predicted Y value between 0 and 1.
Probit/Logit Model (cont.)
• Suppose there is some unobserved continuous variable Z that can take on values from negative infinity to infinity.
• The higher E(Z) is, the more probable it is that a team will win, or a student will graduate, or a consumer will purchase a particular brand.
Probit/Logit Model (cont.)
• We call an unobserved variable, Z, that we use for intermediate calculations, a latent variable.
Deriving Probit/Logit (cont.)
We assume Yi acts "as if" determined by the latent variable Z:
  Zi = β0 + β1X1i + … + βKXKi + εi
  Yi = 1 if Zi ≥ 0
  Yi = 0 if Zi < 0
Deriving Probit/Logit (cont.)
• Note: the assumption that the breakpoint falls at 0 is arbitrary.
• β0 can adjust for whichever breakpoint you might choose to set.
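The latent-variable story above can be sketched by simulation (all parameter values and the logistic error choice here are illustrative assumptions): Z is generated by a linear model plus noise, only the sign of Z is observed as Y, and higher X makes Y = 1 more likely.

```python
import math
import random

# Illustrative latent-variable data-generating process:
# Z_i = b0 + b1*X_i + u_i with logistic errors; Y_i = 1 if Z_i >= 0.
random.seed(0)
b0, b1 = 0.0, 2.0  # made-up "true" parameters

def logistic_draw():
    """Draw a logistic error by inverse-CDF sampling."""
    u = random.random()
    return math.log(u / (1.0 - u))

X = [random.uniform(-2, 2) for _ in range(500)]
Y = [1 if b0 + b1 * x + logistic_draw() >= 0 else 0 for x in X]

# Only X and the 0/1 outcomes Y are "observed", never Z itself.
hi = sum(y for x, y in zip(X, Y) if x > 0) / sum(1 for x in X if x > 0)
lo = sum(y for x, y in zip(X, Y) if x <= 0) / sum(1 for x in X if x <= 0)
print(hi, lo)  # share of Y=1 is higher where X is higher
```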
Deriving Probit/Logit (cont.)
• We assume we know the distribution of ui.
• In the probit model, we assume ui is distributed by the standard normal.
• In the logit model, we assume ui is distributed by the logistic.
Probit model (one explanatory variable: Zi = β1 + β2Xi + ui)

  Pr(Yi = 1) = Pr(Zi > 0) = P(ui > −β1 − β2Xi)
             = (1/√(2π)) ∫ from −E(Zi) to ∞ of e^(−t²/2) dt
             = (1/√(2π)) ∫ from −(β1 + β2Xi) to ∞ of e^(−t²/2) dt
             = 1 − F(−β1 − β2Xi)                              (13)

where
  F(−β1 − β2Xi) = (1/√(2π)) ∫ from −∞ to −(β1 + β2Xi) of e^(−t²/2) dt

Here t is a standardised normal variable, t ~ N(0, 1).
Probit model
Hence
  Pr(Yi = 0) = Pr(Zi ≤ 0) = P(ui ≤ −β1 − β2Xi)
             = (1/√(2π)) ∫ from −∞ to −(β1 + β2Xi) of e^(−t²/2) dt
             = F(−β1 − β2Xi)

Here t is a standardised normal variable, t ~ N(0, 1).
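The probit probabilities above can be evaluated without numerical integration, since the standard normal CDF is available through the error function; by the symmetry of N(0,1), Pr(Yi = 1) = 1 − F(−β1 − β2Xi) = Φ(β1 + β2Xi). A sketch with hypothetical coefficient values:

```python
import math

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Pr(Yi = 1) = 1 - F(-b1 - b2*Xi) = Phi(b1 + b2*Xi); b1, b2 are made up here.
b1, b2 = -1.0, 0.1
for x in (5, 10, 20):
    print(x, Phi(b1 + b2 * x))  # probability rises with x when b2 > 0
```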
Steps for a probit model
1. The probability depends upon an unobserved utility index Zi, which depends upon observable variables such as income. There is a threshold Zi* of this index after which the family starts owning a house: Yi = 1 if Zi ≥ Zi*.
   Pi = F(Zi)
[Figure: Pi (from 0 to 1) plotted against the index Ii]
Estimating a Probit/Logit Model (Chapter 19.2)
• In practice, how do we implement a probit or logit model?
• Either model is estimated using a statistical method called the method of maximum likelihood.
Alternative Estimation Method
• Ungrouped / individual data
• Maximum Likelihood Estimation
• Choose the values of the unknown parameters (β1, β2) such that the probability of observing the given Y's is the highest possible
MLE
• Recall:
  Pi = E(Yi = 1 | Xi) = 1/(1 + e^−(β1 + β2Xi)) = 1/(1 + e^−Zi)
• P's are not observed but Y's are:
  Pr(Y=1) = Pi
  Pr(Y=0) = 1 − Pi
• Joint probability of observing n Y values:
  f(Y1,…,Yn) = Π(i=1,…,n) Pi^Yi (1 − Pi)^(1−Yi)
MLE (Gujarati, page 634)
Danao (2013), page 485: "Under standard regularity conditions, maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient. In other words, in large samples, maximum likelihood estimators are consistent, normal, and best."
MLE (2)
• Taking its natural logarithm, the log likelihood function is obtained:
  ln f(Y1,…,Yn) = Σ Yi(β1 + β2Xi) − Σ ln[1 + exp(β1 + β2Xi)]
• Maximize the log likelihood function by choosing (β1, β2)
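The maximization above is what the iterative routine in the next slide's output is doing. A minimal sketch on made-up data: code the log likelihood exactly as written and climb it with plain gradient ascent (real packages use Newton-type steps, which is why Stata reports "iterations").

```python
import math

# Toy data (made up): binary Y, single regressor X.
X = [1, 2, 3, 4, 5, 6]
Y = [0, 0, 1, 0, 1, 1]

def loglik(b1, b2):
    # ln f = sum Yi*(b1 + b2*Xi) - sum ln(1 + exp(b1 + b2*Xi))
    return sum(y * (b1 + b2 * x) - math.log(1 + math.exp(b1 + b2 * x))
               for x, y in zip(X, Y))

# Crude maximization by gradient ascent; gradient is sum of (Yi - Pi) terms.
b1 = b2 = 0.0
for _ in range(5000):
    P = [1 / (1 + math.exp(-(b1 + b2 * x))) for x in X]
    b1 += 0.01 * sum(y - p for y, p in zip(Y, P))
    b2 += 0.01 * sum(x * (y - p) for x, y, p in zip(X, Y, P))

print(b1, b2, loglik(b1, b2))  # likelihood is higher than at (0, 0)
```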
Example
• Individual data

  Iteration 0:  log likelihood = -27.27418
  Iteration 1:  log likelihood = -16.41239
  Iteration 2:  log likelihood = -15.49205
  Iteration 3:  log likelihood = -15.43501
  Iteration 4:  log likelihood = -15.43463

  Logit estimates                    Number of obs =     40
                                     LR chi2(1)    =  23.68
                                     Prob > chi2   =      0
  Log likelihood = -15.43463         Pseudo R2     = 0.4341

           y |     Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]
  -----------+-------------------------------------------------------------
           x |  0.494246   0.139327   3.55    0.000    0.2211705   0.767322
       _cons | -6.582815   1.951325  -3.37    0.001   -10.40734   -2.758289
Interpreting the results
• Iterative procedure to get at the maximum of the log likelihood function
• Use Z (standard normal variable) instead of t
• Pseudo R² – more meaningful alternative to R²; or, use the count R²
• The LR statistic is equivalent to the F ratio computed in testing the overall significance of the model
• The estimated slope coefficient measures the estimated change in the logit for a unit change in X
• The predicted probability (at the mean income) of owning a home is 0.63
• Or, every unit increase in income increases the odds of owning a home by 11 percent
Pseudo R2
Danao, page 487, citing Gujarati (2008): "In binary regressand models, goodness of fit is of secondary importance. What matters are the expected signs of the regression coefficients and their statistical and practical significance."
Logit Model
• Consider the home ownership model:
  Yi = β1 + β2Xi + ui
  where i ~ families
        Yi = 1 if family owns a house
           = 0 otherwise
        Xi = family income
Logistic Probability Distribution
• PDF: f(x) = exp(x)/[1 + exp(x)]²
• CDF: F(a) = exp(a)/[1 + exp(a)]
• Symmetric, unimodal distribution
• Looks a lot like the normal
• Incredibly easy to evaluate the CDF and PDF
• Mean of zero, variance > 1 (more variance than the normal)
The Logistic Distribution Function
• Assume that owning a home is a random event, and the probability that this random event occurs is given by:
  Pi = E(Yi = 1 | Xi) = 1/(1 + e^−(β1 + β2Xi)) = 1/(1 + e^−Zi)
  where Zi = β1 + β2Xi
• (i) 0 ≤ Pi ≤ 1 and (ii) Pi is a nonlinear function of Zi
  -> OLS is not appropriate
The Odds Ratio (1)
  Pi = 1/(1 + e^−Zi)

  1 − Pi = 1 − 1/(1 + e^−Zi) = e^−Zi/(1 + e^−Zi) = 1/(1 + e^Zi)

  Pi/(1 − Pi) = [1/(1 + e^−Zi)] / [1/(1 + e^Zi)] = (1 + e^Zi)/(1 + e^−Zi) = e^Zi
The Odds Ratio (2)
• If the probability of owning a home is 10 percent, then the odds ratio is 0.10/(1 − 0.10), or the odds are 1 to 9 in favor of owning a home
Logit
• ln(Pi/(1 − Pi)) = ln(e^Zi) = Zi
• ln(Pi/(1 − Pi)) = β1 + β2Xi
  - The log of the odds ratio is a linear function of X and the parameters β1 and β2
  - Pi ∈ [0,1] but ln(Pi/(1 − Pi)) ∈ (−∞, ∞)
  - Li = ln(Pi/(1 − Pi)), Li ~ "logit"
The Logit Model
• Li = ln(Pi/(1 − Pi)) = β1 + β2Xi + ui
  - Although P ∈ [0,1], logits are unbounded.
  - Logits are linear in X but P is not linear in X
  - L < 0 if the odds ratio < 1 and L > 0 if the odds ratio > 1
  - β2 measures the change in L ("log-odds") as X changes by one unit
Estimating the Logit Model
• Problem with individual households/units: ln(1/0) and ln(0/1) are undefined
• Solution: estimate Pi/(1 − Pi) from the data, where
  Pi = relative frequency = ni/Ni
  Ni = number of families for a specific level of Xi (say, income)
  ni = number of families owning a home
Example using Grouped Data
Estimate the home ownership model using grouped data and OLS:
  Yi = β1 + β2Xi + ui

   x    tothh   numhomeown
   6      40        8
   8      50       12
  10      60       18
  13      80       28
  15     100       45
  20      70       36
  25      65       39
  30      50       33
  35      40       30
  40      25       20
Estimating (2)
• Problem: heteroskedastic disturbances
• If the proportion of families owning a home follows a binomial distribution, then
  ui ~ N(0, 1/(Ni·Pi·(1 − Pi)))
Estimating (3)
• Solution: transform the model such that the new disturbance term is homoskedastic
• Consider wi = Ni·Pi·(1 − Pi):
  √wi·Li = √wi·β1 + √wi·β2·Xi + √wi·ui
  var(√wi·ui) = wi·var(ui) = wi·(1/wi) = 1
Estimating (4)
Estimate the ff. by OLS:
  √wi·Li = √wi·β1 + √wi·β2·Xi + √wi·ui
where
  ŵi = Ni·P̂i·(1 − P̂i)
  P̂i = ni/Ni
  L̂i = ln(P̂i/(1 − P̂i))
Note: the regression model has two regressors and no constant.
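The recipe above can be run end-to-end on the grouped data from the earlier slide. A minimal sketch (pure Python, solving the two-regressor, no-constant normal equations by hand rather than calling a regression routine) should land close to the STATA estimates reported on the next slide:

```python
import math

# Grouped home-ownership data from the slides:
# income x, total households N, number owning a home n.
x = [6, 8, 10, 13, 15, 20, 25, 30, 35, 40]
N = [40, 50, 60, 80, 100, 70, 65, 50, 40, 25]
n = [8, 12, 18, 28, 45, 36, 39, 33, 30, 20]

P = [ni / Ni for ni, Ni in zip(n, N)]            # relative frequencies
L = [math.log(p / (1 - p)) for p in P]           # sample logits
w = [Ni * p * (1 - p) for Ni, p in zip(N, P)]    # weights w_i = N_i P_i (1-P_i)

# Transformed regression: sqrt(w)*L on sqrt(w) and sqrt(w)*x, no constant.
a = [math.sqrt(wi) for wi in w]              # regressor for beta_1 ("sqrtw")
b = [ai * xi for ai, xi in zip(a, x)]        # regressor for beta_2 ("xstar")
y = [ai * Li for ai, Li in zip(a, L)]        # dependent variable ("lstar")

# Two-equation normal system, solved by Cramer's rule.
Saa = sum(ai * ai for ai in a)
Sab = sum(ai * bi for ai, bi in zip(a, b))
Sbb = sum(bi * bi for bi in b)
Say = sum(ai * yi for ai, yi in zip(a, y))
Sby = sum(bi * yi for bi, yi in zip(b, y))
det = Saa * Sbb - Sab * Sab
beta1 = (Say * Sbb - Sby * Sab) / det
beta2 = (Saa * Sby - Sab * Say) / det
print(beta1, beta2)  # close to the slide's -1.59 and 0.0787
```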
Example (2)
• STATA results:

      Source |       SS   df        MS        Number of obs =     10
  -----------+------------------------------  F(2, 8)       = 108.52
       Model | 63.39469    2  31.6973         Prob > F      =      0
    Residual | 2.336665    8  0.292083        R-squared     = 0.9645
  -----------+------------------------------  Adj R-squared = 0.9556
       Total | 65.73135   10  6.5731          Root MSE      = 0.54045

       lstar |     Coef.  Std. Err.      t    P>|t|    [95% Conf. Interval]
  -----------+-------------------------------------------------------------
       xstar |  0.078669   0.005448   14.44   0.000    0.0661066   0.091231
       sqrtw | -1.593238   0.111494  -14.29   0.000   -1.850344   -1.336131
Interpreting the results
  √wi·L̂i = √wi·β̂1 + √wi·Xi·β̂2
  √wi·L̂i = −1.593238·√wi + 0.0786686·√wi·Xi
A unit increase in weighted income (= sqrt(w)·X) increases the weighted log-odds (= sqrt(w)·L) by 0.0786
Interpreting (2)
• (antilog of the estimated coefficient of weighted X − 1) × 100 = percent change in the odds in favor of owning a house for every unit increase in weighted X
• Predicted probabilities: p̂ = e^V / (1 + e^V), where V is the predicted logit (= predicted lstar divided by sqrt(w))
• How does a unit increase in X impact on predicted probabilities?
  -> varies with X
  -> dP/dX = β2·Pi·(1 − Pi)
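The last formula makes the X-dependence concrete: using the grouped-data slope estimate from the slide (β2 = 0.0786686), the marginal effect of income on the probability peaks at P = 0.5 and shrinks in the tails.

```python
# Marginal effect of X on the predicted probability: dP/dX = b2 * P * (1 - P),
# with b2 taken from the slide's grouped-data estimate.
b2 = 0.0786686

def marginal_effect(p):
    return b2 * p * (1 - p)

# Largest at P = 0.5, smaller near 0 or 1 -> the effect varies with X.
print(marginal_effect(0.5), marginal_effect(0.9))
```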
Estimating a Probit/Logit Model (cont.)
• The computer then calculates the β's that assign the highest probability to the outcomes that were observed.
• The computer can calculate the β's for you. You must know how to interpret them.
TABLE 19.3 What Point Spreads Say About the Probability of Winning in the NFL: III
Estimating a Probit/Logit Model (cont.)
• In a linear regression, we look to coefficients for three elements:
1. Statistical significance: You can still read statistical significance from the slope dZ/dX. The z-statistic reported for probit or logit is analogous to OLS's t-statistic.
2. Sign: If dZ/dX is positive, then dProb(Y)/dX is also positive.
Estimating a Probit/Logit Model (cont.)
• The z-statistic on the point spread is −7.22, well exceeding the 5% critical value of 1.96. The point spread is a statistically significant explanator of winning NFL games.
• The sign of the coefficient is negative. A higher point spread predicts a lower chance of winning.
Estimating a Probit/Logit Model (cont.)
3. Magnitude: the magnitude of dZ/dX has no particular interpretation. We care about the magnitude of dProb(Y)/dX.
• From the computer output for a probit or logit estimation, you can interpret the statistical significance and sign of each coefficient directly. Assessing magnitude is trickier.
Probit/Logit (cont.)
• To predict the Prob(Y) for a given X value, begin by calculating the fitted Z value from the predicted linear coefficients.
• For example, if there is only one explanator X:
  E(Z) = Ẑi = β̂0 + β̂1·Xi
Probit/Logit Model (cont.)
• Then use the nonlinear function to translate the fitted Z value into a Prob(Y):
  Prob(Y) = F(Ẑ)
Estimating a Probit/Logit Model (cont.)
• Problems in Interpreting Magnitude:
1. The estimated coefficient relates X to Z. We care about the relationship between X and Prob(Y = 1).
2. The effect of X on Prob(Y = 1) varies depending on Z.
Estimating a Probit/Logit Model (cont.)
• There are two basic approaches to assessing the magnitude of the estimated coefficient.
• One approach is to predict Prob(Y) for different values of X, to see how the probability changes as X changes.
Estimating a Probit/Logit Model (cont.)
• Note Well: the effect of a 1-unit change in X varies greatly, depending on the initial value of E(Z).
• E(Z) depends on the values of all explanators.
Estimating a Probit/Logit Model (cont.)
• For example, let's consider the effect of a 1-point change in the point spread, when we start 1 standard deviation above the mean, at SPREAD = 5.88 points.
• Note: In this example, there is only one explanator, SPREAD. If we had other explanators, we would have to specify their values for this calculation, as well.
Estimating a Probit/Logit Model (cont.)
• Step One: Calculate the E(Z) values for X = 5.88 and X = 6.88, using the fitted values.
• Step Two: Plug the E(Z) values into the formula for the logistic density function.
Estimating a Probit/Logit Model (cont.)
  Ẑ(5.88) = β̂0 − 0.1098 × 5.88 = −0.6456
  Ẑ(6.88) = β̂0 − 0.1098 × 6.88 = −0.7554
  For the logit, F(Ẑ) = exp(Ẑ)/(1 + exp(Ẑ))
  F(−0.7554) − F(−0.6456) = 0.320 − 0.344 = −0.024
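The two-step calculation above can be reproduced directly (the slide's arithmetic implies β̂0 is taken as 0 here):

```python
import math

def F(z):
    """Logistic CDF."""
    return math.exp(z) / (1.0 + math.exp(z))

beta0, beta1 = 0.0, -0.1098   # beta0 = 0 per the slide's arithmetic
z1 = beta0 + beta1 * 5.88     # fitted Z at SPREAD = 5.88, about -0.6456
z2 = beta0 + beta1 * 6.88     # fitted Z at SPREAD = 6.88, about -0.7554
print(F(z2) - F(z1))          # roughly -0.024, i.e. a 2.4-point drop
```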
Estimating a Probit/Logit Model (cont.)
• Changing the point spread from 5.88 to 6.88 predicts a 2.4 percentage point decrease in the team's chance of victory.
• Note that changing the point spread from 8.88 to 9.88 predicts only a 2.1 percentage point decrease.
Estimating a Probit/Logit Model (cont.)
• The other approach is to use calculus:
  dProb(Y)/dX1 = [dProb(Y)/dẐ]·[dẐ/dX1] = (dF/dẐ)·β̂1
Estimating a Probit/Logit Model (cont.)
  dProb(Y)/dX1 = [dProb(Y)/dẐ]·[dẐ/dX1] = (dF/dẐ)·β̂1
Unfortunately, dF/dẐ varies, depending on Ẑ. However, a sample value can be calculated for a representative Ẑ value. Typically, we use the Ẑ calculated at the mean values for each X.
Estimating a Probit/Logit Model (cont.)
• Some econometrics software packages can calculate such "pseudo-slopes" for you.
• In STATA, the command is "dprobit."
• EViews does NOT have this function.
Tobit Model
It is an extension of the probit model, named after Tobin. We observe the variable only if the event occurs: e.g., the amount spent if someone buys a house. We do not observe the dependent variable for people who have not bought a house. The observed sample is censored: it contains observations only for those who buy a house.
  Yt = β0 + β1Xt + ut   if the event occurs
     = 0                otherwise
It is unscientific to estimate the equation with only the observed sample, without worrying about the remaining observations in the truncated distribution. The Tobit model tries to correct this bias.
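The bias the Tobit model corrects can be seen by simulation (all numbers below are illustrative): generate a censored outcome, then run naive OLS on only the uncensored observations. The estimated slope is attenuated relative to the true slope.

```python
import random

# Sketch (simulated data): y* = 1 + 2x + u, but y is only observed when the
# "event occurs" (y* > 0); otherwise y is recorded as 0 (censoring).
random.seed(1)
true_b0, true_b1 = 1.0, 2.0
data = []
for _ in range(2000):
    x = random.uniform(-2, 2)
    ystar = true_b0 + true_b1 * x + random.gauss(0, 1)
    data.append((x, ystar if ystar > 0 else 0.0))

# Naive OLS slope using only the uncensored observations:
sub = [(x, y) for x, y in data if y > 0]
xbar = sum(x for x, _ in sub) / len(sub)
ybar = sum(y for _, y in sub) / len(sub)
b1 = sum((x - xbar) * (y - ybar) for x, y in sub) / \
     sum((x - xbar) ** 2 for x, _ in sub)
print(b1)  # biased toward zero relative to the true slope of 2
```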
Censored Regression Model

Truncated Regression Model