Gra6020-2-2007spring
GRA 6020
Multivariate Statistics: The Linear Probability Model and the Logit Model (Probit)
Ulf H. Olsson
Professor of Statistics
Statistical Models
• Statistical models are mathematical representations of
population behavior; they describe salient features of the
hypothesized process of interest among individuals in the
target population. When you use a particular statistical
model to analyze a particular set of data, you implicitly
declare that this population model gave rise to these sample
data.
Regression Analysis
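$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$
$$\hat{y}_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \dots + \hat{\beta}_k x_{ki}$$
$$e_i = y_i - \hat{y}_i$$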
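As a minimal sketch of the fitted values and residuals defined above, the following Python fragment estimates a regression with a single x-variable by ordinary least squares. The data and the helper name `ols_simple` are hypothetical and chosen only for illustration.

```python
# Toy data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_simple(x, y)

# Fitted values y_hat_i and residuals e_i = y_i - y_hat_i.
y_hat = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

print("alpha_hat:", alpha_hat, "beta_hat:", beta_hat)
print("residuals:", residuals)
```

With an intercept in the model, the OLS residuals sum to zero and are uncorrelated with x in the sample, which is a quick way to check the fit.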
Regression analysis
• OLS
• Regression parameter
• Standard error
• T-value
• P-value
• Confidence interval
• R-sq
• R-sq.adj
• F-value
• The error term
Regression Analysis
• The error term has constant variance
• The error term follows a normal distribution with expectation equal to zero
• The x-variables are independent of the error term
• The x-variables are linearly independent
• The dependent variable is normally distributed (conditional on the x-variables)
OLS example (affairs)
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .363a   .132       .120                3.09703

a. Predictors: (Constant), happines, gender, religiou, children, educatio, age, occupati, years

ANOVAb

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   860.240          8     107.530       11.211   .000a
Residual     5668.634         591   9.592
Total        6528.873         599

a. Predictors: (Constant), happines, gender, religiou, children, educatio, age, occupati, years
b. Dependent Variable: affairs
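The Model Summary figures follow from the ANOVA table by simple arithmetic. The sketch below recomputes them from the sums of squares and degrees of freedom reported above; the variable names are only illustrative, and small differences from the printed output are due to rounding in the table.

```python
import math

# Values copied from the ANOVA table above.
ss_regression = 860.240
ss_residual = 5668.634
df_regression = 8      # number of x-variables, k
df_residual = 591      # n - k - 1

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1   # sample size

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

# The "Std. Error of the Estimate" is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print("R-sq:", round(r_squared, 3))
print("R-sq.adj:", round(adj_r_squared, 3))
print("F:", round(f_value, 3))
print("Std. error of the estimate:", round(std_error_estimate, 5))
```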
OLS example (affairs)
Coefficientsa

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B        Std. Error           Beta                        t        Sig.
(Constant)   5.876    1.139                                            5.161    .000
gender       .054     .301                 .008                        .181     .856
age          -.051    .023                 -.144                       -2.258   .024
years        .170     .041                 .287                        4.109    .000
children     -.143    .351                 -.020                       -.409    .683
religiou     -.478    .112                 -.169                       -4.274   .000
educatio     -.014    .064                 -.010                       -.215    .830
occupati     .104     .089                 .057                        1.168    .243
happines     -.711    .120                 -.237                       -5.908   .000

a. Dependent Variable: affairs
Klein's model (OLS)
• CT = 16.237 + 0.193*PT + 0.0899*PT_1 + 0.796*WT
• Standard errors: (1.303) (0.0912) (0.0906) (0.0399)
• t-values: 12.464 2.115 0.992 19.933
• R² = 0.981
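Each t-value in the Klein example is the coefficient divided by its standard error. The short sketch below recomputes the ratios from the rounded figures on the slide, so the results agree with the reported t-values only up to rounding. The dictionary keys are illustrative labels.

```python
# Coefficients and standard errors copied from the slide.
coefficients = {"const": 16.237, "PT": 0.193, "PT_1": 0.0899, "WT": 0.796}
std_errors = {"const": 1.303, "PT": 0.0912, "PT_1": 0.0906, "WT": 0.0399}
reported_t = {"const": 12.464, "PT": 2.115, "PT_1": 0.992, "WT": 19.933}

# t-value = estimate / standard error.
t_values = {name: coefficients[name] / std_errors[name] for name in coefficients}

for name, t in t_values.items():
    print(f"{name:6s} t = {t:7.3f} (reported {reported_t[name]})")
```

The lagged variable PT_1 has a t-value below 1 in absolute value, so its coefficient is not clearly distinguishable from zero in this regression.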
Binary Response Models
$y$ is a binary response variable.
$x' = (x_1, x_2, \dots, x_k)$ is the full set of explanatory variables.
$$\Pr(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k) = G(\beta_0 + x\beta)$$
• The goal is to estimate the parameters.
The Linear Probability Model
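$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
$$y = 1 \text{ or } y = 0$$
$$P_i = \Pr(y_i = 1) \;\Rightarrow\; 1 - P_i = \Pr(y_i = 0)$$
$$E(y_i) = P_i = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$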
The Linear Probability Model
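• The linear probability model has a number of problems:
• The predicted value can fall outside the interval [0, 1].
• The error term is not normally distributed: for a given x it can take only two values.
• The error variance depends on x => heteroscedasticity => inefficient estimates.
• The t-test is not reliable.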
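Two of the problems listed above can be illustrated with a few lines of Python. The coefficients below are hypothetical, not estimates from any data set in these slides. Since y is binary with Pr(y = 1 | x) = p, the error u equals 1 - p with probability p and -p with probability 1 - p, so Var(u | x) = p(1 - p), which changes with x.

```python
def lpm_prediction(beta0, beta1, x):
    """Predicted 'probability' from a linear probability model with one regressor."""
    return beta0 + beta1 * x


def lpm_error_variance(p):
    """Var(u | x) = p(1 - p) when y is binary with Pr(y = 1 | x) = p."""
    return p * (1.0 - p)


# Hypothetical coefficients, chosen only to show the out-of-range problem.
beta0, beta1 = 0.1, 0.2

predictions = [lpm_prediction(beta0, beta1, x) for x in range(0, 7)]
out_of_range = [p for p in predictions if p < 0.0 or p > 1.0]

# The error variance is not constant: it depends on the probability level.
variances = [lpm_error_variance(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]

print("predictions:", predictions)
print("outside [0, 1]:", out_of_range)
print("error variances:", variances)
```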
The Logit Model
$$G(z) = \frac{e^{z}}{1 + e^{z}}$$
• The logistic function
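A minimal Python sketch of the logistic function follows. The branch on the sign of z is an implementation choice made here to avoid overflow for large |z|; it is algebraically the same as e^z / (1 + e^z).

```python
import math


def logistic(z):
    """G(z) = e^z / (1 + e^z), evaluated in a numerically stable way."""
    if z >= 0:
        # Equivalent form 1 / (1 + e^(-z)); e^(-z) cannot overflow here.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so e^z is at most 1
    return ez / (1.0 + ez)


for z in (-4, -2, 0, 2, 4):
    print(z, logistic(z))
```

The function maps any real z into the interval (0, 1) and satisfies G(-z) = 1 - G(z).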
The Probit Model
$$G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(u)\,du$$
where $\phi$ is the standard normal density, so $\Phi$ is the standard normal cumulative distribution function.
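The probit link can be sketched in Python with the standard library alone. The closed form uses the error function `math.erf`; the second function approximates the integral of the density directly with the trapezoidal rule, as a check on the definition. The lower limit of -10 and the number of steps are arbitrary choices for this illustration.

```python
import math


def normal_pdf(u):
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cdf Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf_by_integration(z, lower=-10.0, steps=20000):
    """Trapezoidal approximation of the integral of phi(u) from `lower` to z."""
    h = (z - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(z))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h)
    return total * h


print(normal_cdf(0.0), normal_cdf(1.96))
print(normal_cdf_by_integration(1.0), normal_cdf(1.0))
```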
The Logistic Curve G (The Cumulative Normal Distribution)
The Logit Model
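$$G(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$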
Logit Model for Pi
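$$y = 1 \text{ or } y = 0$$
$$P_i = \Pr(y_i = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$
$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$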
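The two expressions above are inverses of each other: the log-odds of P_i recover the linear index. The sketch below checks this numerically for hypothetical coefficients and x-values (none of them come from the slides).

```python
import math


def logit_probability(index):
    """P = 1 / (1 + e^(-index)), where index = beta0 + beta1*x1 + ... + betak*xk."""
    return 1.0 / (1.0 + math.exp(-index))


def log_odds(p):
    """ln(P / (1 - P)), the log-odds of a probability P."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients (beta0, beta1, beta2) and x-values (x1, x2).
beta = (-1.0, 0.5, 0.25)
x = (2.0, 4.0)

index = beta[0] + beta[1] * x[0] + beta[2] * x[1]
p = logit_probability(index)

print("linear index:", index)
print("probability:", p)
print("log-odds:", log_odds(p))
```

This is why the logit model is linear in the log-odds even though it is non-linear in the probability itself.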
The Logit Model
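• The model is non-linear in the parameters => non-linear estimation => maximum likelihood (ML).
• How do the estimates of the linear probability model compare with those of the logit model?
• Amemiya (1981) proposes a rule of thumb:
• Multiply the logit estimates by 0.25 and then add 0.5 to the constant term; the results are roughly comparable to the linear probability model estimates.
• The model can be tested, but the usual R-sq. does not apply. Several pseudo R-sq. measures have been proposed.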
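To make the ML step concrete, here is a minimal sketch of Newton-Raphson estimation of a logit model with one regressor, written in plain Python. The data are hypothetical toy values, and the pseudo R-sq. shown is McFadden's measure (1 minus the ratio of the fitted to the intercept-only log-likelihood), which is one of several proposals and is an assumption here, not necessarily the one intended on the slide.

```python
import math

# Hypothetical toy data: one regressor x and a binary outcome y.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def G(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1):
    """Sum over observations of y*ln(p) + (1 - y)*ln(1 - p)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = G(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(iterations=25):
    """Newton-Raphson: b <- b + I^{-1} * score, with a 2x2 information matrix."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = G(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score with respect to b0
            g1 += (yi - p) * xi     # score with respect to b1
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0_hat, b1_hat = fit_logit()
ll_model = log_likelihood(b0_hat, b1_hat)

# Intercept-only model: the ML estimate of the constant is the log-odds of mean(y).
y_bar = sum(y) / len(y)
ll_null = log_likelihood(math.log(y_bar / (1.0 - y_bar)), 0.0)

pseudo_r2 = 1.0 - ll_model / ll_null

print("b0:", b0_hat, "b1:", b1_hat)
print("log-likelihood:", ll_model, "pseudo R-sq.:", pseudo_r2)
```

A fixed number of iterations is used for brevity; a production routine would stop when the score is close to zero and guard against perfectly separated data.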
The Logit Model (example)
• Dependent variable: emp = 1 if a person has a job, emp = 0 if a person is unemployed.
• Independent variables: (x1) edu = years at a university; (x2) score = score in a dancing contest.
• Estimate a model to predict the probability that a person has a job, given years at a university and score in the dancing contest (for the data, see the SPSS file Binomgra1.sav).
The Logit Model (example)
Coefficientsa

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B        Std. Error           Beta                        t       Sig.
(Constant)   -.144    .241                                             -.598   .558
edu          .124     .065                 .402                        1.907   .074
score        .050     .034                 .312                        1.478   .158

a. Dependent Variable: emp

Variables in the Equation

Step 1a    B        S.E.    Wald    df   Sig.   Exp(B)
edu        .703     .413    2.903   1    .088   2.020
score      .282     .196    2.060   1    .151   1.325
Constant   -3.640   1.765   4.252   1    .039   .026

a. Variable(s) entered on step 1: edu, score.
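The columns of the logistic regression output are linked by simple formulas: Exp(B) is e^B and the Wald statistic is (B / S.E.)². The sketch below recomputes both from the rounded estimates above and also applies the Amemiya (1981) rule of thumb mentioned earlier, so that the rescaled logit estimates can be set beside the linear probability model estimates. Agreement with the printed output is only up to rounding, and the rule of thumb gives a rough comparison, not an exact match.

```python
import math

# (B, S.E.) from "Variables in the Equation" (logit model).
logit = {"Constant": (-3.640, 1.765), "edu": (0.703, 0.413), "score": (0.282, 0.196)}

# B from the "Coefficients" table (linear probability model estimated by OLS).
lpm = {"Constant": -0.144, "edu": 0.124, "score": 0.050}

# Exp(B): multiplicative change in the odds for a one-unit increase in the variable.
odds_ratios = {name: math.exp(b) for name, (b, se) in logit.items()}

# Wald statistic: squared ratio of the estimate to its standard error.
wald = {name: (b / se) ** 2 for name, (b, se) in logit.items()}

# Amemiya rule of thumb: multiply by 0.25, then add 0.5 to the constant term.
amemiya = {
    name: 0.25 * b + (0.5 if name == "Constant" else 0.0)
    for name, (b, se) in logit.items()
}

for name in logit:
    print(
        f"{name:9s} Exp(B)={odds_ratios[name]:.3f}  Wald={wald[name]:.3f}  "
        f"rescaled logit={amemiya[name]:.3f}  LPM={lpm[name]:.3f}"
    )
```

In this small example the rescaled logit estimates have the same signs as the linear probability model estimates and are of a similar order of magnitude.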
The Latent Variable Model
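$$y^* = \beta_0 + x\beta + \varepsilon$$
$$y = 1 \text{ when } y^* > 0 \text{ and } y = 0 \text{ when } y^* \le 0$$
$$P(y = 1 \mid x) = P(y^* > 0 \mid x) = P(\varepsilon > -(\beta_0 + x\beta) \mid x)$$
$$= 1 - P(\varepsilon \le -(\beta_0 + x\beta) \mid x) = 1 - G(-(\beta_0 + x\beta))$$
$$= G(\beta_0 + x\beta)$$
Here $G$ is the cumulative distribution function of $\varepsilon$; the last equality uses the symmetry of the distribution of $\varepsilon$ around zero.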
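The latent variable argument can be checked by simulation. The sketch below draws logistic errors by inverting the logistic cdf, forms y* for a fixed value of the index β0 + xβ, and compares the share of y = 1 with G(β0 + xβ). The index value, the number of draws and the seed are arbitrary choices for illustration.

```python
import math
import random


def G(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_logistic_error(rng):
    """Draw from the standard logistic distribution via the inverse cdf ln(u / (1 - u))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_share_of_ones(index, draws=100_000, seed=0):
    """Share of draws with y* = index + error > 0, i.e. with y = 1."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(draws) if index + draw_logistic_error(rng) > 0)
    return ones / draws


index = 0.8  # hypothetical value of beta0 + x*beta
share = simulate_share_of_ones(index)

print("simulated P(y = 1 | x):", share)
print("G(index):", G(index))
```

Replacing the logistic draws with standard normal draws would give the probit model in the same way.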
The Latent Variable Model
$$P(y = 1 \mid x) = P(y^* > 0 \mid x)$$
Binary Response Models
• The magnitude of each coefficient $\beta_j$ is not especially useful in itself, since y* rarely has a well-defined unit of measurement.
• But it is possible to find the partial effects on the probabilities by taking partial derivatives.
• We are interested in significance and direction (positive or negative).
• To find the partial effect of a roughly continuous variable on the response probability:
$$\frac{\partial p(x)}{\partial x_j} = g(\beta_0 + x\beta)\,\beta_j, \quad \text{where } g(z) = \frac{dG(z)}{dz}$$
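As a sketch of the partial-effect formula in the logit case, where g(z) = G(z)(1 - G(z)), the code below evaluates g(β0 + xβ)·βj at one point and compares it with a numerical derivative of the response probability. The coefficients and the evaluation point are hypothetical.

```python
import math


def G(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))


def g(z):
    """Logistic density, the derivative of G: g(z) = G(z) * (1 - G(z))."""
    p = G(z)
    return p * (1.0 - p)


def response_probability(beta0, beta, x):
    """p(x) = G(beta0 + x*beta)."""
    return G(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def partial_effect(beta0, beta, x, j):
    """Analytic partial effect of x_j: g(beta0 + x*beta) * beta_j."""
    index = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return g(index) * beta[j]


def numerical_partial_effect(beta0, beta, x, j, h=1e-6):
    """Central finite difference of p(x) with respect to x_j."""
    up = list(x)
    down = list(x)
    up[j] += h
    down[j] -= h
    return (response_probability(beta0, beta, up)
            - response_probability(beta0, beta, down)) / (2.0 * h)


# Hypothetical coefficients and evaluation point.
beta0, beta, x = -1.0, (0.5, -0.3), (2.0, 1.0)

for j in range(len(beta)):
    print(j, partial_effect(beta0, beta, x, j),
          numerical_partial_effect(beta0, beta, x, j))
```

Because the effect depends on the index, it must be reported at a chosen point, for example at the sample means of the x-variables.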
Binary Response Models
• The partial effects will always have the same sign as $\beta_j$, because $g(z) > 0$ for all $z$.
Typically, the largest effects occur where $\beta_0 + x\beta = 0$:
$g(0) = \phi(0) \approx 0.40$ in the probit case
$g(0) = 0.25$ in the logit case
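The two scale factors can be verified directly. The sketch below evaluates the logistic and standard normal densities at zero and confirms on a grid that both peak there; the grid range and spacing are arbitrary choices for this check.

```python
import math


def logistic_density(z):
    """g(z) = e^(-z) / (1 + e^(-z))^2, written with -|z| since g is symmetric."""
    ez = math.exp(-abs(z))
    return ez / (1.0 + ez) ** 2


def normal_density(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


grid = [i / 100.0 for i in range(-500, 501)]

print("logit  g(0):", logistic_density(0.0))
print("probit g(0):", normal_density(0.0))
print("argmax logit :", max(grid, key=logistic_density))
print("argmax probit:", max(grid, key=normal_density))
```

So the largest partial effect of x_j is about 0.25·βj in the logit model and about 0.40·βj in the probit model, which is one reason the raw coefficients of the two models are not directly comparable.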