Gra6020-2-2007spring

GRA 6020
Multivariate Statistics: The Linear Probability Model
and the Logit Model (Probit)
Ulf H. Olsson
Professor of Statistics
Statistical Models
• Statistical models are mathematical representations of
population behavior; they describe salient features of the
hypothesized process of interest among individuals in the
target population. When you use a particular statistical
model to analyze a particular set of data, you implicitly
declare that this population model gave rise to these sample
data.
Regression Analysis
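\[ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i \]
\[ \hat{y}_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \dots + \hat{\beta}_k x_{ki} \]
\[ e_i = y_i - \hat{y}_i \]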
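As a minimal sketch of the fitted values and residuals defined above, the following fits the one-regressor case (k = 1) by OLS in plain Python. The data are made up for illustration and are not from the lecture; the variable names are assumptions.

```python
# Illustrative data only (not from the lecture): one regressor, five observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for y_i = alpha + beta * x_i + eps_i (the k = 1 case).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values y_hat_i and residuals e_i = y_i - y_hat_i.
y_hat = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

print(alpha_hat, beta_hat)
print(residuals)
```

With an intercept in the model, the OLS residuals sum to zero and are uncorrelated with the regressor in the sample, which is a quick sanity check on any implementation.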
Regression analysis
• OLS
• Regression parameter
• Std. error
• t-value
• P-value
• Confidence interval
• R-sq
• R-sq.adj
• F-value
• The error term
Regression Analysis
• The error term has constant variance
• The error term follows a normal distribution with expectation equal to zero
• The x-variables are independent of the error term
• The x-variables are linearly independent
• The dependent variable is normally distributed (conditional on the x-variables)
OLS example (affairs)
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,363(a)    ,132       ,120                3,09703

a. Predictors: (Constant), happines, gender, religiou, children, educatio, age, occupati, years

ANOVA(b)

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   860,240          8     107,530       11,211   ,000(a)
Residual     5668,634         591   9,592
Total        6528,873         599

a. Predictors: (Constant), happines, gender, religiou, children, educatio, age, occupati, years
b. Dependent Variable: affairs
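The summary statistics in this output all follow from the ANOVA sums of squares and degrees of freedom. A short sketch of that arithmetic, using the figures reported above with decimal commas written as points (the variable names are my own):

```python
import math

# Figures copied from the ANOVA table (decimal commas written as points).
ss_regression = 860.240
ss_residual = 5668.634
df_regression = 8      # k, the number of predictors
df_residual = 591      # n - k - 1

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

r_squared = ss_regression / ss_total
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
adj_r_squared = 1.0 - ms_residual / (ss_total / df_total)
f_value = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(r_squared, adj_r_squared, f_value, std_error_estimate)
```

Up to rounding, these reproduce the reported R Square (,132), Adjusted R Square (,120), F (11,211) and Std. Error of the Estimate (3,09703).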
OLS example (affairs)
Coefficients(a)

Model 1      B       Std. Error   Beta    t        Sig.
(Constant)   5,876   1,139                5,161    ,000
gender       ,054    ,301         ,008    ,181     ,856
age          -,051   ,023         -,144   -2,258   ,024
years        ,170    ,041         ,287    4,109    ,000
children     -,143   ,351         -,020   -,409    ,683
religiou     -,478   ,112         -,169   -4,274   ,000
educatio     -,014   ,064         -,010   -,215    ,830
occupati     ,104    ,089         ,057    1,168    ,243
happines     -,711   ,120         -,237   -5,908   ,000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: affairs
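The t-value is the estimate divided by its standard error, and an approximate 95% confidence interval is the estimate plus or minus about two standard errors. The sketch below redoes this for four of the rows above. It assumes a normal approximation to the t distribution (reasonable with 591 residual degrees of freedom), and because it starts from the rounded B and Std. Error values, the t-values differ slightly from the SPSS output.

```python
import math

# (B, Std. Error) pairs copied from the coefficients table above.
estimates = {
    "age": (-0.051, 0.023),
    "years": (0.170, 0.041),
    "religiou": (-0.478, 0.112),
    "happines": (-0.711, 0.120),
}

Z_975 = 1.96  # normal approximation to the t critical value; df = 591 is large

def summarize(b, se):
    t = b / se
    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(t) / math.sqrt(2.0))
    ci = (b - Z_975 * se, b + Z_975 * se)
    return t, p, ci

results = {name: summarize(b, se) for name, (b, se) in estimates.items()}
for name, (t, p, ci) in results.items():
    print(f"{name:9s} t={t:7.3f} p={p:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```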
Klein's Model (OLS)
• CT = 16.237 + 0.193*PT + 0.0899*PT_1 + 0.796*WT
• Standard errors: (1.303) (0.0912) (0.0906) (0.0399)
• t-values: 12.464, 2.115, 0.992, 19.933
• R² = 0.981
Binary Response Models
y is a binary response variable.
\( x' = (x_1, x_2, \dots, x_k) \) is the full set of explanatory variables.
\[ \operatorname{Prob}(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k) = G(\beta_0 + x\beta) \]
• The goal is to estimate the parameters.
The Linear Probability Model
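\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \]
\[ y = 1 \ \text{or} \ y = 0 \]
\[ P_i = \operatorname{Prob}(y_i = 1) \ \Rightarrow \ 1 - P_i = \operatorname{Prob}(y_i = 0) \]
\[ E(y_i) = P_i = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \]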
The Linear Probability Model
• The model has a number of problems:
• The predicted value can lie outside the interval [0, 1]
• The error term is not normally distributed
• Heteroscedasticity => inefficient estimates
• The t-test is not reliable
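Two of these problems can be seen with a few lines of arithmetic. The sketch below uses a hypothetical linear probability model (the coefficients are invented for illustration, not estimated from any data in this lecture). It shows fitted "probabilities" falling outside [0, 1], and the error variance P(1 - P) changing with x, which is the source of the heteroscedasticity.

```python
# Hypothetical linear probability model, chosen only for illustration:
# P(y = 1 | x) = b0 + b1 * x
b0, b1 = -0.10, 0.15

def lpm_probability(x):
    return b0 + b1 * x

def error_variance(p):
    # For binary y, Var(u | x) = P * (1 - P), which changes with x.
    return p * (1.0 - p)

xs = range(0, 9)
fitted = [lpm_probability(x) for x in xs]
out_of_range = [(x, p) for x, p in zip(xs, fitted) if p < 0.0 or p > 1.0]
variances = [(x, error_variance(p)) for x, p in zip(xs, fitted) if 0.0 <= p <= 1.0]

print(out_of_range)
print(variances)
```

The error term is also non-normal by construction: for a given x it can take only two values, 1 - P (when y = 1) and -P (when y = 0).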
The Logit Model
\[ G(z) = \frac{e^{z}}{1 + e^{z}} \]
• The logistic function
The Probit Model
\[ G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(u)\,du \]
where \( \phi \) is the standard normal density and \( \Phi \) is the standard normal distribution function.
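Both choices of G, the logistic function and the standard normal distribution function, can be evaluated with the Python standard library. This is a minimal sketch; the function names are my own.

```python
import math

def logistic_cdf(z):
    # G(z) = e^z / (1 + e^z), written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def normal_cdf(z):
    # Phi(z), expressed through the error function from the standard library.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-2.0, 0.0, 2.0):
    print(z, logistic_cdf(z), normal_cdf(z))
```

Both functions map the real line into (0, 1), equal 0.5 at z = 0, and are strictly increasing; the logistic curve has somewhat heavier tails than the normal one.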
The Logistic Curve G (The Cumulative Normal Distribution)
The Logit Model
\[ G(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} \]
Logit Model for Pi
\[ y = 1 \ \text{or} \ y = 0 \]
\[ P_i = \operatorname{Prob}(y_i = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} \]
\[ \ln\!\left( \frac{P_i}{1 - P_i} \right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \]
The Logit Model
• Non-linear model => non-linear estimation => maximum likelihood (ML)
• How can the estimates of the linear probability model and the logit model be compared?
• Amemiya (1981) proposes:
• Multiply the logit estimates by 0.25, and then add 0.5 to the constant term, to make them comparable with the linear probability model estimates.
• The model can be tested, but R-sq. does not apply. Several pseudo R-sq. measures have been proposed.
The Logit Model (example)
• Dependent variable: emp = 1 if a person has a job, emp = 0 if a person is unemployed.
• Independent variables: (x1) edu = years at a university; (x2) score = score in a dancing contest.
• Estimate a model to predict the probability that a person has a job, given years at a university and score in the dancing contest (for the data, see the SPSS file Binomgra1.sav).
The Logit Model (example)
Coefficients(a)

Model 1      B       Std. Error   Beta   t       Sig.
(Constant)   -,144   ,241                -,598   ,558
edu          ,124    ,065         ,402   1,907   ,074
score        ,050    ,034         ,312   1,478   ,158

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: emp

Variables in the Equation

Step 1(a)   B        S.E.    Wald    df   Sig.   Exp(B)
edu         ,703     ,413    2,903   1    ,088   2,020
score       ,282     ,196    2,060   1    ,151   1,325
Constant    -3,640   1,765   4,252   1    ,039   ,026

a. Variable(s) entered on step 1: edu, score.
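The two sets of estimates above can be put side by side. The sketch below assumes that the first table is the linear probability model fitted by OLS and the second is the logit model, which is how the two SPSS outputs read. It computes predicted probabilities for one hypothetical person, reproduces Exp(B) as exp(B), and applies Amemiya's rule of thumb to the logit estimates.

```python
import math

# Estimates copied from the two tables above (decimal commas written as points).
# Assumption: the first table is the linear probability model fitted by OLS,
# the second is the logit model fitted by maximum likelihood.
lpm = {"const": -0.144, "edu": 0.124, "score": 0.050}
logit = {"const": -3.640, "edu": 0.703, "score": 0.282}

def lpm_probability(edu, score):
    return lpm["const"] + lpm["edu"] * edu + lpm["score"] * score

def logit_probability(edu, score):
    eta = logit["const"] + logit["edu"] * edu + logit["score"] * score
    return 1.0 / (1.0 + math.exp(-eta))

# Odds ratios: Exp(B) in the SPSS output.
odds_ratios = {name: math.exp(b) for name, b in logit.items()}

# Amemiya's rule of thumb: slopes times 0.25, constant times 0.25 plus 0.5.
amemiya = {name: 0.25 * b + (0.5 if name == "const" else 0.0)
           for name, b in logit.items()}

# Hypothetical person: 0 years at university, score 0 in the contest.
print(lpm_probability(0, 0), logit_probability(0, 0))
print(odds_ratios)
print(amemiya)
```

For the hypothetical person with edu = 0 and score = 0, the linear probability model gives a negative "probability" (the constant, -0.144), while the logit prediction stays inside (0, 1). The rescaled logit estimates are only roughly comparable with the OLS ones, as expected from a rule of thumb.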
The Latent Variable Model
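\[ y^* = \beta_0 + x\beta + \varepsilon \]
y = 1 when y* > 0 and y = 0 when y* ≤ 0
\[
\begin{aligned}
P(y = 1 \mid x) &= P(y^* > 0 \mid x) = P(\varepsilon > -(\beta_0 + x\beta) \mid x) \\
&= 1 - P(\varepsilon \le -(\beta_0 + x\beta) \mid x) = 1 - G(-(\beta_0 + x\beta)) \\
&= G(\beta_0 + x\beta)
\end{aligned}
\]
Here G is the distribution function of \( \varepsilon \). The last step uses the symmetry of that distribution about zero, which gives 1 - G(-z) = G(z) for both the logistic and the standard normal distribution.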
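The latent variable argument can be checked by simulation: draw the error from a standard logistic distribution, set y = 1 whenever y* > 0, and compare the share of ones with G evaluated at the index. The parameter values, the covariate value and the number of draws below are hypothetical, chosen only for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical parameter values and covariate, for illustration only.
beta_0, beta_1 = -1.0, 0.8
x = 2.0
index = beta_0 + beta_1 * x

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def draw_logistic_error():
    # Inverse-CDF draw from the standard logistic distribution.
    u = random.random()
    while u == 0.0:  # guard against log(0)
        u = random.random()
    return math.log(u / (1.0 - u))

n_draws = 200_000
# y = 1 exactly when the latent variable y* = index + eps is positive.
hits = sum(1 for _ in range(n_draws) if index + draw_logistic_error() > 0.0)
simulated = hits / n_draws
theoretical = logistic_cdf(index)
print(simulated, theoretical)
```

Replacing the logistic draw with a standard normal draw (for example `random.gauss(0.0, 1.0)`) and the logistic function with the normal distribution function would give the probit counterpart.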
The Latent Variable Model
\[ P(y = 1 \mid x) = P(y^* > 0 \mid x) \]
Binary Response Models
• The magnitude of each effect \( \beta_j \) is not especially useful, since y* rarely has a well-defined unit of measurement.
• But it is possible to find the partial effects on the probabilities by taking partial derivatives.
• We are interested in significance and direction (positive or negative).
• To find the partial effect of a roughly continuous variable on the response probability:
\[ \frac{\partial p(x)}{\partial x_j} = g(\beta_0 + x\beta)\,\beta_j, \quad \text{where } g(z) = \frac{dG(z)}{dz} \]
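A minimal sketch of this partial-effect formula for the logit case, checked against a numerical derivative. The coefficients and the evaluation point are hypothetical; for the logistic function, g(z) = G(z)(1 - G(z)).

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    # g(z) = dG/dz = G(z) * (1 - G(z)) for the logistic function
    g = logistic_cdf(z)
    return g * (1.0 - g)

# Hypothetical coefficients (beta_0, beta_1, beta_2) and evaluation point.
beta = [-1.0, 0.5, 0.3]
x = [1.0, 2.0]

def response_probability(point):
    return logistic_cdf(beta[0] + sum(b * xj for b, xj in zip(beta[1:], point)))

index = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
analytic = [logistic_pdf(index) * b for b in beta[1:]]

# Central finite differences as a check on the analytic formula.
h = 1e-6
numeric = []
for j in range(len(x)):
    up = x[:]
    up[j] += h
    down = x[:]
    down[j] -= h
    numeric.append((response_probability(up) - response_probability(down)) / (2 * h))

print(analytic)
print(numeric)
```

Because the partial effect depends on the index, it differs from person to person; it is usually reported at a chosen point such as the sample means of the x-variables.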
Binary Response Models
• The partial effects will always have the same sign as \( \beta_j \), since g(z) > 0 for all z.
Typically, the largest effects occur at \( \beta_0 + x\beta = 0 \):
\[ g(0) = \phi(0) \approx 0.40 \ \text{in the probit case} \]
\[ g(0) = 0.25 \ \text{in the logit case} \]
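The two values of g(0) follow directly from the densities. As a short worked derivation (standard results, written out here for reference):

```latex
\begin{aligned}
\text{Probit:}\quad g(z) &= \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}
  &&\Rightarrow\quad g(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399 \\
\text{Logit:}\quad g(z) &= \frac{e^{z}}{(1+e^{z})^{2}} = G(z)\,\bigl(1 - G(z)\bigr)
  &&\Rightarrow\quad g(0) = \tfrac{1}{2}\cdot\tfrac{1}{2} = 0.25
\end{aligned}
```

Both densities are symmetric and peak at z = 0, which is why the partial effects are largest where the index is zero, that is, where the response probability equals 0.5.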