Multiple Regression
The equation that describes how the dependent variable y is related to
the independent variables x1, x2, . . . , xp and the error term e is called
the multiple regression model:
y = β0 + β1x1 + β2x2 + . . . + βpxp + e
where:
β0, β1, β2, . . . , βp are parameters
e is a random variable called the error term
The equation that describes how the mean value of y is related to the p
independent variables is called the multiple regression equation:
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Multiple Regression
A simple random sample is used to compute sample statistics
b0, b1, b2, . . . , bp
that are used as the point estimators of the parameters
β0, β1, β2, . . . , βp
The equation that describes how the predicted value of y is related to the
p independent variables is called the estimated multiple regression
equation:
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
Specification
1. Formulate a research question:
How has welfare reform affected employment of low-income mothers?
Issue 1: How should welfare reform be defined?
Since we are talking about aspects of welfare reform that influence the
decision to work, we include the following variables:
• Welfare payments allow the head of household to work less.
  tanfben3 = real value (in 1983 $) of the welfare payment to a family
  of 3 (x1)
• The Republican-led Congress passed welfare reform twice, and both bills
  were vetoed by President Clinton. Clinton signed it into law after
  Congress passed it a third time in 1996. All states put their TANF
  programs in place by 2000.
  2000 = 1 if the year is 2000, 0 if it is 1994 (x2)
Specification
1. Formulate a research question:
How has welfare reform affected employment of low-income mothers?
Issue 1: How should welfare reform be defined? (continued)
• Families receive full sanctions if the head of household fails to adhere
  to a state's work requirement.
  fullsanction = 1 if state adopted the policy, 0 otherwise (x3)
Issue 2: How should employment be defined?
• One might use the employment-population ratio of Low-Income Single
  Mothers (LISM):
  epr = (number of LISM that are employed) / (number of LISM)
Specification
2. Use economic theory or intuition to determine what the true regression
model might look like.
Use economic graphs to derive testable hypotheses:
[Figure: consumption-leisure diagram with indifference curves U0 and U1.
Receiving the welfare check increases LISM's leisure, which decreases
hours worked.]
Economic theory suggests the following is not true: H0: β1 = 0
Specification
2. Use economic theory or intuition to determine what the true regression
model might look like.
Use a mathematical model to derive testable hypotheses:
max U(C, L) = C·L
s.t. H + L = 80
     C = P + wH
The solution of this problem is:
L* = P/(2w) + 40, with ∂L*/∂P = 1/(2w) > 0
H* = 40 - P/(2w), with ∂H*/∂P = -1/(2w) < 0
Economic theory suggests the following is not true:
H0: β1 = 0   (theory implies β1 < 0)
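A quick symbolic check of this solution (a sketch, not from the original
slides; P is the welfare payment and w is the wage):

```python
import sympy as sp

P, w, L = sp.symbols('P w L', positive=True)

# Substitute C = P + w*H and H = 80 - L into U(C, L) = C*L
U = (P + w * (80 - L)) * L

L_star = sp.solve(sp.diff(U, L), L)[0]   # (P + 80w)/(2w), i.e. P/(2w) + 40
H_star = sp.simplify(80 - L_star)        # 40 - P/(2w)

print(L_star, H_star)
print(sp.diff(H_star, P))                # -1/(2w) < 0: benefits reduce hours
```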
Specification
3. Compute means, standard deviations, minimums and maximums for the
variables.

state          year  epr    tanfben3  fullsanction  black  dropo  unemp
Alabama        1994  52.35  110.66    0             25.69  26.99  5.38
Alaska         1994  38.47  622.81    0             4.17   8.44   7.50
Arizona        1994  49.69  234.14    0             3.38   13.61  5.33
Arkansas       1994  48.17  137.65    0             16.02  25.36  7.50
...
West Virginia  2000  51.10  190.48    1             3.10   23.33  5.48
Wisconsin      2000  57.99  390.82    1             5.60   11.84  3.38
Wyoming        2000  58.34  197.44    1             0.63   11.14  3.81
Specification
3. Compute means, standard deviations, minimums and maximums for the
variables.

                        1994                             2000
              Mean    Std Dev  Min    Max      Mean    Std Dev  Min    Max     Diff
epr           46.73   8.58     28.98  65.64    53.74   7.73     40.79  74.72   7.01
tanfben3      265.79  105.02   80.97  622.81   234.29  90.99    95.24  536.00  -31.50
fullsanction  0.02    0.14     0.00   1.00     0.70    0.46     0.00   1.00    0.68
black         9.95    9.45     0.34   36.14    9.82    9.57     0.26   36.33   -0.13
dropo         17.95   5.20     8.44   28.49    14.17   4.09     6.88   23.33   -3.78
unemp         5.57    1.28     2.63   8.72     3.88    0.96     2.26   6.17    -1.69
Specification
4. Construct scatterplots of the variables (1994, 2000).
[Figure: scatterplots of epr against tanfben3, dropo, black, and unemp.]
Specification
5. Compute correlations for all pairs of variables. If | r | > .7 for a
pair of independent variables,
• multicollinearity may be a problem
• some say to avoid including independent variables that are highly
  correlated, but it is better to have multicollinearity than omitted
  variable bias.

              epr    fullsanction  black  dropo  unemp
tanfben3      -0.03  -0.24         -0.53  -0.50  0.10
unemp         -0.64  -0.51         0.16   0.47
dropo         -0.44  -0.25         0.51
black         -0.32  0.07
fullsanction  0.43
Estimation
Least Squares Criterion:
min Σ(ei)² = min Σ(yi - ŷi)²
Computation of Coefficient Values:
In simple regression:
b1 = cov(x1, y) / var(x1)
b0 = ȳ - b1·x̄1
In multiple regression, the vector of coefficients [b0, b1, . . . , bp]' is
b = (X'X)⁻¹X'y
You can use matrix algebra or computer software packages to compute the
coefficients.
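A minimal numpy sketch of the matrix formula (using the seven observations
shown on the next slide; a fit on only these rows will of course differ
from the slides' 100-observation estimates):

```python
import numpy as np

x1 = np.array([110.66, 622.81, 234.14, 137.65, 190.48, 390.82, 197.44])
y = np.array([52.35, 38.47, 49.69, 48.17, 51.10, 57.99, 58.34])

X = np.column_stack([np.ones_like(x1), x1])   # design matrix [1, x1]
b = np.linalg.solve(X.T @ X, X.T @ y)         # solves (X'X) b = X'y
print(b)                                      # [b0, b1]
```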
Simple Regression

state          year  epr (y)  tanfben3 (x1)
Alabama        1994  52.35    110.66
Alaska         1994  38.47    622.81
Arizona        1994  49.69    234.14
Arkansas       1994  48.17    137.65
...
West Virginia  2000  51.10    190.48
Wisconsin      2000  57.99    390.82
Wyoming        2000  58.34    197.44
Mean                 50.23    250.04
Std dev              8.86     99.03
Covariance           -24.70

b1 = cov(x1, y)/var(x1) = -24.70/99.03² = -0.0025
b0 = ȳ - b1·x̄1 = 50.23 - (-.0025)(250.04) = 50.869
ŷ = 50.869 - .0025x1
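The same arithmetic in Python, using the sample moments reported above (a
sketch; the slides' 50.869 comes from unrounded full-sample moments):

```python
cov_xy, sd_x = -24.70, 99.03
x1bar, ybar = 250.04, 50.23

b1 = cov_xy / sd_x**2    # about -0.0025
b0 = ybar - b1 * x1bar   # about 50.86
print(b1, b0)
```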
Simple Regression
ŷ = 50.869 - .0025x1, x̄1 = 250.04, ȳ = 50.23, n = 100 (sums below are over
all 100 squared residuals)

epr (y)  tanfben3 (x1)  ŷ      (x1-x̄1)²   (y-ȳ)²   (y-ŷ)²   (ŷ-ȳ)²
52.35    110.66         50.59  19425.76    4.47     3.10     0.13
38.47    622.81         49.28  138957.04   138.45   117.04   0.90
49.69    234.14         50.27  252.64      0.29     0.34     0.00
48.17    137.65         50.52  12630.56    4.25     5.51     0.08
51.10    190.48         50.38  3547.56     0.75     0.51     0.02
57.99    390.82         49.87  19820.99    60.19    65.87    0.13
58.34    197.44         50.37  2766.00     65.79    63.64    0.02
Sum                            970931.62   7764.76  7758.48  6.28
                               Σ(x-x̄)²     SST      SSE      SSR
Simple Regression
Test for Significance at the 5% level (α = 0.05)
s² = SSE/(n - 2) = 7758.48/(100 - 2) = 79.17
s = 8.898
s_b1 = s/√Σ(xi - x̄)² = 8.898/√970931.62 = 0.0090
t-stat = b1/s_b1 = -.0025/.0090 = -.277
α = .05, α/2 = .025, df = 100 - 2 = 98, ±t.025 = ±1.984
Since |-.277| < 1.984, we cannot reject H0: β1 = 0
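A scipy check of the critical value and test statistic (a sketch, not part
of the slides):

```python
from scipy import stats

n, b1, se_b1 = 100, -0.0025, 0.0090
t_stat = b1 / se_b1                           # about -0.28
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # 1.984 for df = 98
print(abs(t_stat) > t_crit)                   # False -> cannot reject H0
```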
Simple Regression
• If estimated coefficient b1 were statistically significant, we would
  interpret its value as follows:
  b1 = Δy/Δx1 = -.0025 = (-.0025 × 100)/(+1 × 100) = -.25/+100
  Increasing monthly benefit levels for a family of three by $100
  lowers the epr of LISM by 0.25 percentage points.
• However, since estimated coefficient b1 is statistically insignificant,
  we interpret its value as follows:
  Increasing monthly benefit levels for a family of three
  has no effect on the epr of LISM.
Our theory suggests that this estimate is biased towards zero.
Simple Regression

Regression Statistics
Multiple R         0.0284
R Square           0.0008
Adjusted R Square  -0.0094
Standard Error     8.8977
Observations       100

R²·100% = .08% of the variability in epr of LISM can be explained by the model.

ANOVA
            df  SS        MS      F
Regression  1   6.281     6.281   0.079
Residual    98  7758.483  79.168
Total       99  7764.764

            Coefficients  Standard Error  t Stat   P-value
Intercept   50.8687       2.427           20.961   0.000
tanfben3    -0.0025       0.009           -0.282   0.779
Simple Regression

state          year  epr (y)  tanfben3_ln (ln x1)
Alabama        1994  52.35    4.71
Alaska         1994  38.47    6.43
Arizona        1994  49.69    5.46
Arkansas       1994  48.17    4.92
...
West Virginia  2000  51.10    5.25
Wisconsin      2000  57.99    5.97
Wyoming        2000  58.34    5.29
Mean                 50.23    5.44
Std dev              8.86     0.41
Covariance           0.10

b1 = cov(ln x1, y)/var(ln x1) = 0.10/0.41² = 0.6087
b0 = ȳ - b1·(mean of ln x1) = 50.23 - (.6087)(5.44) = 46.9192
ŷ = 46.9192 + .6087 ln x1
Simple Regression
ŷ = 46.9192 + .6087 ln x1, mean of ln x1 = 5.44, ȳ = 50.23, n = 100 (sums
below are over all 100 squared residuals)

epr (y)  ln x1  ŷ      (ln x1 - mean)²  (y-ȳ)²   (y-ŷ)²   (ŷ-ȳ)²
52.35    4.71   49.78  0.543            4.47     6.57     0.20
38.47    6.43   50.84  0.982            138.45   153.01   0.36
49.69    5.46   50.24  0.000            0.29     0.30     0.00
48.17    4.92   49.92  0.269            4.25     3.05     0.10
51.10    5.25   50.11  0.038            0.75     0.97     0.01
57.99    5.97   50.55  0.275            60.19    55.33    0.10
58.34    5.29   50.14  0.025            65.79    67.36    0.01
Sum                    16.28            7764.76  7758.73  6.03
                       Σ(ln x-mean)²    SST      SSE      SSR
Simple Regression
Test for Significance at the 5% level (α = 0.05)
s² = SSE/(n - 2) = 7758.73/(100 - 2) = 79.17
s = 8.898
s_b1 = s/√Σ(ln x1 - mean)² = 8.898/√16.28 = 2.2055
t-stat = b1/s_b1 = .6087/2.2055 = .2760
α = .05, α/2 = .025, df = 100 - 2 = 98, ±t.025 = ±1.984
Since .2760 < 1.984, we cannot reject H0: β1 = 0
Simple Regression
• If estimated coefficient b1 were statistically significant, we would
  interpret its value as follows:
  Suppose we want to know what happens to the epr of LISM if a state
  decides to increase its welfare payment by 10%.
  When we use a logged dollar-valued independent variable, we have to
  do the following first to interpret the coefficient:
  ŷ(292) = 46.9192 + .6087 ln(292)
  ŷ(266) = 46.9192 + .6087 ln(266)
  ŷ(292) - ŷ(266) = .6087 ln(292) - .6087 ln(266)
  Δŷ = .6087 [ln(292) - ln(266)] = .6087 ln(292/266) = .6087 ln(1.10)
Simple Regression
• If estimated coefficient b1 were statistically significant, we would
  interpret its value as follows:
  Δŷ = .6087 × ln(1.10) = .058
  Increasing monthly benefit levels for a family of three by 10%
  would result in a .058 percentage point increase in the average epr
  of LISM.
• However, since estimated coefficient b1 is statistically insignificant,
  we interpret its value as follows:
  Increasing monthly benefit levels for a family of three
  has no effect on the epr of LISM.
Our theory suggests that this estimate has the wrong sign and is biased
towards zero. This bias is called omitted variable bias.
Simple Regression

Regression Statistics
Multiple R         0.0279
R Square           0.0008
Adjusted R Square  -0.0094
Standard Error     8.8978
Observations       100

R²·100% = .08% of the variability in epr of LISM can be explained by the model.

ANOVA
             df  SS        MS      F
Regression   1   6.031     6.031   0.076
Residual     98  7758.733  79.171
Total        99  7764.764

             Coefficients  Standard Error  t Stat  P-value
Intercept    46.9192       12.038          3.897   0.000
tanfben3_ln  0.6087        2.206           0.276   0.783
Multiple Regression
Least Squares Criterion:
min Σ(ei)² = min Σ(yi - ŷi)²
In multiple regression the solution for the coefficient vector
[b0, b1, . . . , bp]' is:
b = (X'X)⁻¹X'y
You can use matrix algebra or computer software packages to compute the
coefficients.
Multiple Regression

R Square           0.166
Adjusted R Square  0.149
Standard Error     8.171
Observations       100

Adjusted R²·100% ≈ 15% of the variability in epr of LISM can be explained
by the model.

ANOVA
             df  SS        MS       F
Regression   2   1288.797  644.398  9.652
Residual     97  6475.967  66.763
Total        99  7764.764

             Coefficients  Standard Error  t Stat  P-value
Intercept    35.901        11.337          3.167   0.002
tanfben3_ln  1.967         2.049           0.960   0.339
2000         7.247         1.653           4.383   0.000
Multiple Regression

R Square           0.214
Adjusted R Square  0.190
Standard Error     7.971
Observations       100

Adjusted R²·100% ≈ 19% of the variability in epr of LISM can be explained
by the model.

ANOVA
             df  SS        MS       F
Regression   3   1664.635  554.878  8.732
Residual     96  6100.129  63.543
Total        99  7764.764

              Coefficients  Standard Error  t Stat  P-value
Intercept     31.544        11.204          2.815   0.006
tanfben3_ln   2.738         2.024           1.353   0.179
2000          3.401         2.259           1.506   0.135
fullsanction  5.793         2.382           2.432   0.017
Multiple Regression

R Square           0.517
Adjusted R Square  0.486
Standard Error     6.347
Observations       100

Adjusted R²·100% ≈ 49% of the variability in epr of LISM can be explained
by the model.

ANOVA
             df  SS        MS       F
Regression   6   4018.075  669.679  16.623
Residual     93  3746.689  40.287
Total        99  7764.764

              Coefficients  Standard Error  t Stat  P-value
Intercept     104.529       15.743          6.640   0.000
tanfben3_ln   -5.709        2.461           -2.320  0.023
2000          -2.821        2.029           -1.390  0.168
fullsanction  3.768         1.927           1.955   0.054
black         -0.291        0.089           -3.256  0.002
dropo         -0.374        0.202           -1.848  0.068
unemp         -3.023        0.618           -4.888  0.000
Multiple Regression
ŷ = 104.529 - 5.709 ln x1 - 2.821x2 + 3.768x3 - 0.291x4 - 0.374x5 - 3.023x6

              Coefficients     Standard Error  t Stat  P-value
Intercept     104.529          15.743          6.640   0.000
tanfben3_ln   -5.709 (ln x1)   2.461           -2.320  0.023
2000          -2.821 (x2)      2.029           -1.390  0.168
fullsanction  3.768 (x3)       1.927           1.955   0.054
black         -0.291 (x4)      0.089           -3.256  0.002
dropo         -0.374 (x5)      0.202           -1.848  0.068
unemp         -3.023 (x6)      0.618           -4.888  0.000
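For readers who want to reproduce the full model, here is a minimal
statsmodels sketch. The file name welfare.csv and the yr2000 rename are
assumptions (the slides' data are not distributed with this transcript):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('welfare.csv')                  # hypothetical file name
df['tanfben3_ln'] = np.log(df['tanfben3'])
df['yr2000'] = (df['year'] == 2000).astype(int)  # the slides' "2000" dummy

model = smf.ols('epr ~ tanfben3_ln + yr2000 + fullsanction'
                ' + black + dropo + unemp', data=df).fit()
print(model.summary())  # coefficients, t stats, F-stat, R-squared as above
```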
Validity
Recall from chapter 14 that t and F tests are valid if the error term's
assumptions are valid:
1. E(e) is equal to zero
2. Var(e) = σ² is constant for all values of x1…xp
3. Error e is normally distributed
4. The values of e are independent
5. The true model is linear:
   y = β0 + β1x1 + β2x2 + … + βpxp + e
These assumptions can be addressed by looking at the residuals:
ei = yi - ŷi
Validity
The residuals provide the best information about the errors.
1. E(e) is probably equal to zero since the residuals average to zero
   (ē = 0)
2. Var(e) = σ² is probably constant for all values of x1…xp if "spreads"
   in scatterplots of e versus ŷ, time, x1…xp appear to be constant, or
   White's squared residual regression model is statistically insignificant
3. Error e is probably normally distributed if the chapter 12 normality
   test indicates e is normally distributed
4. The values of e are probably independent if the autocorrelation
   residual plot or Durbin-Watson statistics with various orderings of the
   data (time, geography, etc.) indicate the values of e are independent
5. The true model is probably linear if the scatterplot of e versus ŷ is a
   horizontal, random band of points
Note: If the absolute value of the i-th standardized residual > 2, the i-th
observation is an outlier.
Zero Mean
E(e) is probably equal to zero since the residuals sum to zero.
ŷ = 104.529 - 5.709 ln x1 - 2.821x2 + 3.768x3 - 0.291x4 - 0.374x5 - 3.023x6

epr   tanfben3_ln  2000  fullsanction  black  dropo  unemp  epr hat  residual
(y)   (ln x1)      (x2)  (x3)          (x4)   (x5)   (x6)   (ŷ)      (e)
52.35  4.71        0     0             25.69  26.99  5.38   43.83    8.52
38.47  6.43        0     0             4.17   8.44   7.50   40.76    -2.29
49.69  5.46        0     0             3.38   13.61  5.33   51.19    -1.50
48.17  4.92        0     0             16.02  25.36  7.50   39.60    8.57
51.10  5.25        1     1             3.10   23.33  5.48   49.31    1.79
57.99  5.97        1     1             5.60   11.84  3.38   55.14    2.85
58.34  5.29        1     1             0.63   11.14  3.81   59.44    -1.10
...
Sum of residuals = 0
Constant Variance (homoscedasticity)
Var(e) = σ² is probably constant for all values of x1…xp if "spreads" in
scatterplots of e versus ŷ, t, x1…xp appear to be constant.
• The only assumed source of variation on the RHS of the regression model
  is in the errors (εj), and the residuals (ei) provide the best
  information about them.
• The means of e and ε are equal to zero.
• The variance of the residuals estimates σ²:
  s² = Σ(ei - 0)²/(n - p - 1) = SSE/(n - p - 1) ≈ σ² = Σ(εj - 0)²/N
• Non-constant variance of the errors is referred to as heteroscedasticity.
• If heteroscedasticity is a problem, the standard errors of the
  coefficients are wrong.
Constant Variance (homoscedasticity)
Heteroscedasticity is likely present if scatterplots of residuals versus t,
ŷ, x1, x2 … xp are not a random horizontal band of points.
[Figure: residual plots versus predicted epr, tanfben3_ln, black, dropo,
and unemp. Non-constant variance in black?]
Constant Variance (homoscedasticity)
To test for heteroscedasticity, perform White's squared residual
regression: regress e² on
x1, x2 … xp
x1x2, x1x3 … x1xp, x2x3, x2x4 … x2xp, … , xp-1xp
x1², x2² … xp²
If F-stat > F.05, we reject H0: no heteroscedasticity.
Here, F-stat = 1.24 < F.05 = 1.66, so σ² is probably constant.

R Square           0.296
Adjusted R Square  0.058
Standard Error     51.205
Observations       100

ANOVA
            df  SS      MS    F
Regression  25  81517   3261  1.24
Residual    74  194024  2622
Total       99  275541
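statsmodels can run White's test in one call rather than building the
squared-residual regression by hand; a minimal sketch, assuming model is
the fitted result from the earlier statsmodels sketch:

```python
# exog already includes the intercept column when the formula API is used.
from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pval, f_stat, f_pval = het_white(model.resid, model.model.exog)
print(f_stat, f_pval)   # an F near 1.24 with p > .05 matches the slides
```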
Constant Variance (homoscedasticity)
If heteroscedasticity is a problem,
• estimated coefficients aren't biased
• coefficient standard errors are wrong
• hypothesis testing (t-stat = b1/s_b1) is unreliable
In our example, heteroscedasticity does not seem to be a problem.
If heteroscedasticity is a problem, do one of the following:
• Use Weighted Least Squares with 1/xj or 1/xj^0.5 as weights, where xj is
  the variable causing the problem
• Compute "Huber-White standard errors"
Normality
Error e is probably normally distributed if the chapter 12 normality test
indicates e is normally distributed.
[Figure: histogram of residuals, bins from -20 to 20 in steps of 4.]
Normality
Error e is probably normally distributed if the chapter 12 normality test
indicates e is normally distributed.
H0: errors are normally distributed
Ha: errors are not normally distributed
The test statistic:
χ²-stat = Σ (fi - ei)²/ei, summed over i = 1, …, k
has a chi-square distribution if each expected frequency ei ≥ 5.
To ensure this, we divide the normal distribution into k intervals all
having the same expected frequency:
k = 100/5 = 20 equal intervals
The expected frequency: ei = 5
Normality
Standardized residuals: mean = 0, std dev = 1.
The z values that carve the standard normal into 20 equal-probability
intervals (each boundary leaves the given right-tail probability):

P(Z > z)  z
.05       1.645
.10       1.282
.15       1.036
.20       0.842
.25       0.674
.30       0.524
.35       0.385
.40       0.253
.45       0.126
.50       0.000

By symmetry, the negative boundaries are -1.645, -1.282, …, -0.126.
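These cutoffs come straight from the inverse normal CDF; a quick scipy
check (not part of the slides):

```python
import numpy as np
from scipy.stats import norm

edges = norm.ppf(np.arange(1, 20) / 20)   # 19 interior boundaries
print(np.round(edges, 3))                 # -1.645, -1.282, ..., 0, ..., 1.645
```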
Normality

Observation  Pred epr  Residuals  Std Res
1            54.372    -12.572    -2.044
2            55.768    -12.430    -2.021
3            55.926    -11.412    -1.855
4            54.930    -10.938    -1.778
5            62.215    -10.036    -1.631
6            59.195    -9.302     -1.512
7            54.432    -9.239     -1.502
8            37.269    -8.291     -1.348
9            48.513    -8.259     -1.343
10           44.446    -7.963     -1.294
11           43.918    -7.799     -1.268
...
99           50.148    15.492     2.518
100          58.459    16.259     2.643

Count the number of residuals in the FIRST interval, -∞ to -1.645: f1 = 4.
Count the number of residuals in the SECOND interval, -1.645 to -1.282:
f2 = 6.
Normality

LL      UL      f  e  f-e  (f-e)²/e
-∞      -1.645  4  5  -1   0.2
-1.645  -1.282  6  5  1    0.2
-1.282  -1.036  4  5  -1   0.2
-1.036  -0.842  4  5  -1   0.2
-0.842  -0.674  9  5  4    3.2
-0.674  -0.524  7  5  2    0.8
-0.524  -0.385  5  5  0    0
-0.385  -0.253  3  5  -2   0.8
-0.253  -0.126  4  5  -1   0.2
-0.126  0.000   7  5  2    0.8
0.000   0.126   2  5  -3   1.8
0.126   0.253   3  5  -2   0.8
0.253   0.385   7  5  2    0.8
0.385   0.524   5  5  0    0
0.524   0.674   7  5  2    0.8
0.674   0.842   5  5  0    0
0.842   1.036   5  5  0    0
1.036   1.282   3  5  -2   0.8
1.282   1.645   5  5  0    0
1.645   ∞       5  5  0    0

χ²-stat = 11.6
Normality
α = .05 (column), df = 20 - 3 = 17 (row): χ².05 = 27.587
Since χ²-stat = 11.6 < 27.587, we do not reject H0.
There is no reason to doubt the assumption that the errors are normally
distributed.
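The whole test is a few lines in Python; a sketch assuming std_res holds
the 100 standardized residuals:

```python
import numpy as np
from scipy.stats import norm, chi2

edges = np.concatenate(([-np.inf], norm.ppf(np.arange(1, 20) / 20), [np.inf]))
f_obs, _ = np.histogram(std_res, bins=edges)  # observed count per interval
e = len(std_res) / 20                         # expected count = 5
chi2_stat = np.sum((f_obs - e)**2 / e)        # 11.6 on the slides
crit = chi2.ppf(0.95, df=20 - 3)              # 27.587
print(chi2_stat > crit)                       # False -> do not reject H0
```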
Normality
The previous test of normally distributed residuals was used because it
was the test we conducted in chapter 12. There are a number of normality
tests one can choose.
• The Jarque-Bera test uses the skew and kurtosis of the residuals.
• The test statistic follows a chi-square distribution with 2 degrees of
  freedom:
  χ²-stat = (n/6)·[skew² + kurt²/4] = (100/6)·[.3276² + .0214²/4] = 1.791
Kurtosis measures "peakedness" of the probability distribution.
• High kurtosis → sharp peak, low kurtosis → flat peak.
• It involves raising standardized residuals to the 4th power.
• Excel: =KURT(A1:A100) → 0.0214
Skewness measures asymmetry of the distribution.
• Zero skew → symmetric distribution, negative skew → skewed left,
  positive skew → skewed right.
• It involves raising standardized residuals to the 3rd power.
• Excel: =SKEW(A1:A100) → 0.3276
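Plugging the slide's Excel values into the formula (note Excel's KURT()
already reports excess kurtosis, which is what the statistic needs):

```python
# Jarque-Bera statistic from the sample skewness and excess kurtosis.
n, skew, kurt = 100, 0.3276, 0.0214

jb = (n / 6) * (skew**2 + kurt**2 / 4)
print(round(jb, 3))   # 1.791
```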
Normality
α = .05 (column), df = 2 (row): χ².05 = 5.99
Since χ²-stat = 1.791 < 5.99, we do not reject H0.
There is no reason to doubt the assumption that the errors are normally
distributed.
Normality
If the errors are normally distributed,
• parameter estimates are normally distributed
• F and t significance tests are valid
If the errors are not normally distributed but the sample size is large,
• parameter estimates are approximately normally distributed (CLT)
• F and t significance tests are valid
If the errors are not normally distributed and the sample size is small,
• parameter estimates are not normally distributed
• F and t significance tests are not reliable
Independence
The values of e are probably independent if the autocorrelation residual
plot or the Durbin-Watson statistic (DW-stat) indicates the values of e
are independent:
• no autocorrelation if DW-stat = 2
• perfect "-" autocorrelation if DW-stat = 4
• perfect "+" autocorrelation if DW-stat = 0
The DW-stat varies when the data's order is altered.
• If you have cross-sectional data, you really don't have to worry about
  computing the DW-stat
• If you have time series data, compute the DW-stat after sorting by time
• If you have panel data, compute the DW-stat after sorting by state
  and then time
Independence

State          Year  Residuals  (ei - ei-1)²  ei²
Alabama        1994  8.522      -             72.63
Alabama        2000  -4.110     159.57        16.89
Alaska         1994  -2.290     -             5.24
Alaska         2000  14.835     293.25        220.08
Arizona        1994  -1.497     -             2.24
Arizona        2000  -4.081     6.68          16.65
Arkansas       1994  8.567      -             73.39
Arkansas       2000  6.910      2.75          47.74
California     1994  -0.558     -             0.31
California     2000  -4.801     18.01         23.05
Colorado       1994  3.393      -             11.51
Colorado       2000  -10.036    180.35        100.73
...
Wyoming        1994  6.573      -             43.20
Wyoming        2000  -1.096     58.81         1.20
Sum                             2889          3747

DW-stat = 2889/3747 = 0.77
The errors may not be independent.
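The same statistic can be reproduced in pandas; a sketch assuming model
and df come from the earlier statsmodels sketch. Note the slides difference
residuals only within each state, so the panel is sorted by state and then
year:

```python
import numpy as np

d = df.assign(resid=model.resid).sort_values(['state', 'year'])
diffs = d.groupby('state')['resid'].diff().dropna()  # within-state e_i - e_(i-1)
dw = np.sum(diffs**2) / np.sum(d['resid']**2)        # 2889/3747
print(dw)                                            # about 0.77
```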
Independence
[Figure: autocorrelation residual plot, residuals versus geographic order
(observations 0 through 100).]
Independence
If autocorrelation (or serial correlation) is a problem,
• estimated coefficients aren't biased, but
• their standard errors may be inflated
• hypothesis testing (t-stat = b1/s_b1) is unreliable
In our example, autocorrelation seems to be problematic.
If autocorrelation is a problem, do one of the following:
• Change the functional form
• Include an omitted variable
• Use Generalized Least Squares
• Compute "Newey-West standard errors" for the estimated coefficients
Linearity
The true model is probably linear if the scatterplot of e versus ŷ is a
horizontal, random band of points.
[Figure: residuals versus epr-hat, with fitted curve
e = -0.001ŷ² + 0.111ŷ - 2.706.]
Linearity
If you fit a linear model to data which are nonlinearly related,
• estimated coefficients are biased
• predictions are likely to be seriously in error
In our example, nonlinearity does not seem to be a problem.
If the data are nonlinearly related, do one of the following:
• Rethink the functional form
• Transform one or more of the variables
Since the autocorrelation assumption appears to be invalid, t and F tests
are unreliable.
Testing for Overall Significance
Overall Significance of the Model
• In simple linear regression, the F and t tests provide the same
  conclusion:
  (t-stat)² = F-stat and p-value_t = p-value_F
• In multiple regression, the F and t tests have different purposes.
  The F test is used to determine whether a significant relationship
  exists between the dependent variable and the set of all the
  independent variables; it is referred to as the test for overall
  significance.
• Hypotheses:
  H0: β1 = β2 = . . . = βp = 0
  Ha: At least one parameter is not equal to zero.
• Test Statistic: F-stat = MSR/MSE
• Reject H0 if F-stat > Fα
  (Fα is in column dfMSR and row dfMSE & α)
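The rejection rule is easy to check numerically; a scipy sketch using the
model's ANOVA values (F-stat = 16.623 with dfN = 6, dfD = 93, shown on the
next slides):

```python
from scipy.stats import f

f_stat = 16.623                        # MSR/MSE from the ANOVA table
f_crit = f.ppf(0.95, dfn=6, dfd=93)    # about 2.20
print(f_stat > f_crit)                 # True -> reject H0
```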
Testing for Overall Significance
ŷ = 104.529 - 5.709 ln x1 - 2.821x2 + 3.768x3 - 0.291x4 - 0.374x5 - 3.023x6
ȳ = 50.23 (sums below are over all 100 observations)

epr (y)  ŷ      (y-ȳ)²   (y-ŷ)²   (ŷ-ȳ)²
52.35    43.83  4.47     72.63    41.05
38.47    40.76  138.45   5.24     89.82
49.69    51.19  0.29     2.24     0.91
48.17    39.60  4.25     73.39    112.97
51.10    49.31  0.75     3.19     0.85
57.99    55.14  60.19    8.11     24.12
58.34    59.44  65.79    1.20     84.77
Sum             7764.76  3746.69  4018.07
                SST      SSE      SSR
Testing for Overall Significance

R Square           0.517
Adjusted R Square  0.486
Standard Error     6.347
Observations       100

Adjusted R²·100% ≈ 49% of the variability in epr of LISM can be explained
by the model.

ANOVA
             df  SS        MS       F
Regression   6   4018.075  669.679  16.623
Residual     93  3746.689  40.287
Total        99  7764.764

              Coefficients  Standard Error  t Stat  P-value
Intercept     104.529       15.743          6.640   0.000
tanfben3_ln   -5.709        2.461           -2.320  0.023
2000          -2.821        2.029           -1.390  0.168
fullsanction  3.768         1.927           1.955   0.054
black         -0.291        0.089           -3.256  0.002
dropo         -0.374        0.202           -1.848  0.068
unemp         -3.023        0.618           -4.888  0.000
Testing for Overall Significance
H0: β1 = β2 = . . . = βp = 0
dfN = 6 (column), dfD = 93 and α = .05 (row): F.05 ≈ 2.20
Since F-stat = 16.623 > 2.20, we reject H0.
There is sufficient evidence to conclude that the coefficients are not all
equal to zero simultaneously.
Testing for Coefficient Significance
H0: β1 = 0
α = .05, α/2 = .025 (column), df = 100 - 6 - 1 = 93 (row): ±t.025 = ±1.986
t-stat = b1/s_b1 = -5.709/2.461 = -2.32
Since -2.32 < -1.986, we reject H0 at a 5% level of significance.
I.e., TANF welfare payments influence the decision to work.
Testing for Coefficient Significance
H0: β2 = 0
α = .05, α/2 = .025 (column), df = 100 - 6 - 1 = 93 (row): ±t.025 = ±1.986
t-stat = b2/s_b2 = -2.821/2.029 = -1.39
Since -1.986 < -1.39 < 1.986, we cannot reject H0 at a 5% level of
significance.
I.e., welfare reform in general does not influence the decision to work.
Testing for Coefficient Significance
H0: β3 = 0
α = .05, α/2 = .025 (column), df = 100 - 6 - 1 = 93 (row): ±t.025 = ±1.986
t-stat = b3/s_b3 = 3.768/1.927 = 1.96
Although we cannot reject H0 at a 5% level of significance, we can at the
10% level (p-value = .054).
I.e., full sanctions for failure to comply with work rules influence the
decision to work.
Testing for Coefficient Significance
H0: β4 = 0
α = .05, α/2 = .025 (column), df = 100 - 6 - 1 = 93 (row): ±t.025 = ±1.986
t-stat = b4/s_b4 = -0.291/0.089 = -3.26
Since -3.26 < -1.986, we reject H0 at a 5% level of significance.
I.e., the share of the population that is black influences the decision
to work.
Testing for Coefficient Significance
H0: β5 = 0
α = .05, α/2 = .025 (column), df = 100 - 6 - 1 = 93 (row): ±t.025 = ±1.986
t-stat = b5/s_b5 = -0.374/0.202 = -1.85
Although we cannot reject H0 at a 5% level of significance, we can at the
10% level (p-value = .068).
I.e., the share of the population that is high school dropouts influences
the decision to work.
Testing for Coefficient Significance
H0: β6 = 0
α = .05, α/2 = .025 (column), df = 100 - 6 - 1 = 93 (row): ±t.025 = ±1.986
t-stat = b6/s_b6 = -3.023/0.618 = -4.89
Since -4.89 < -1.986, we reject H0 at a 5% level of significance.
I.e., the unemployment rate influences the decision to work.
Interpretation of Results
• Since the estimated coefficient b1 is statistically significant, we
  interpret its value as follows:
  Δŷ = -5.709 × ln(1.10) = -.54
  Increasing monthly benefit levels for a family of three by 10%
  would result in a .54 percentage point reduction in the average epr
  of LISM.
• Since estimated coefficient b2 is statistically insignificant (at levels
  greater than 15%), we interpret its value as follows:
  Welfare reform in general had no effect on the epr of LISM.
Interpretation of Results
• Since estimated coefficient b3 is statistically significant at the 10%
  level, we interpret its value as follows:
  b3 = Δy/Δx3 = +3.768/+1
  The epr of LISM is 3.768 percentage points higher in states that adopted
  full sanctions for families that fail to comply with work rules.
• Since estimated coefficient b4 is statistically significant at the 5%
  level, we interpret its value as follows:
  b4 = Δy/Δx4 = -0.291 = (-0.291 × 10)/(+1 × 10) = -2.91/+10
  Each 10 percentage point increase in the share of the black population
  in states is associated with a 2.91 percentage point decline in the epr
  of LISM.
Interpretation of Results
• Since estimated coefficient b5 is statistically significant at the 10%
  level, we interpret its value as follows:
  b5 = Δy/Δx5 = -0.374 = (-0.374 × 10)/(+1 × 10) = -3.74/+10
  Each 10 percentage point increase in the high school dropout rate is
  associated with a 3.74 percentage point decline in the epr of LISM.
• Since estimated coefficient b6 is statistically significant at the 5%
  level, we interpret its value as follows:
  b6 = Δy/Δx6 = -3.023/+1
  Each 1 percentage point increase in the unemployment rate is associated
  with a 3.023 percentage point decline in the epr of LISM.
Interpretation of Results
Substituting the means of black, dropo, and unemp into the predicted
equation
ŷ = 104.529 - 5.709 ln x1 - 2.821x2 + 3.768x3 - 0.291x4 - 0.374x5 - 3.023x6
yields:
ŷ = 81.37 - 2.821x2 + 3.768x3 - 5.709 ln x1
For states that did not have full sanctions in place after the sweeping
legislation was enacted (x2 = 1, x3 = 0), the predicted equation is
ŷ = 78.54 - 5.709 ln x1
For states that adopted full sanctions after the sweeping legislation was
enacted (x2 = 1, x3 = 1), the predicted equation is
ŷ = 82.31 - 5.709 ln x1
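The two intercepts can be reproduced from the coefficients and the pooled
sample means (a sketch; means are taken from the step-3 summary table):

```python
b0, b2, b3, b4, b5, b6 = 104.529, -2.821, 3.768, -0.291, -0.374, -3.023
black, dropo, unemp = (9.95 + 9.82) / 2, (17.95 + 14.17) / 2, (5.57 + 3.88) / 2

base = b0 + b4 * black + b5 * dropo + b6 * unemp
print(round(base, 2))            # about 81.37
print(round(base + b2, 2))       # x2 = 1, x3 = 0: about 78.54
print(round(base + b2 + b3, 2))  # x2 = 1, x3 = 1: about 82.31
```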
Interpretation of Results
[Figure: predicted epr versus ln x1 for the two equations
ŷ = 78.54 - 5.709 ln x1 and ŷ = 82.31 - 5.709 ln x1 (full sanctions),
plotted over ln x1 from 4.0 to 7.0.]