Transcript Appendix 4

Problems in Applying the
Linear Regression Model
Appendix 4A
• The assumptions of the linear regression model don’t always hold in the real world
• We now examine statistical problems, which are the central focus of the economic sub-field called econometrics:
1. Autocorrelation
2. Heteroscedasticity
3. Specification and Measurement Error
4. Multicollinearity
5. Simultaneous equation relationships and the identification problem
6. Nonlinearities
Slide 1
1. Autocorrelation
• also known as serial correlation
• Problem:
» Coefficients are unbiased
» but t-values are unreliable
• Symptoms:
» look at a scatter of the error terms to see if there is a pattern, or
» see if the Durbin-Watson statistic is far from 2.
• Cures:
1. Find more variables that explain these patterns
2. Take first differences of data: ΔQ = a + b•ΔP
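The Durbin-Watson symptom check can be sketched in a few lines of plain Python. This is only an illustration: the function name `durbin_watson` and both residual series are made up here and do not come from the text.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only.
runs = [1, 1, 1, -1, -1, -1]     # long runs of the same sign
alternating = [1, -1, 1, -1]     # sign flips every period

dw_runs = durbin_watson(runs)
dw_alternating = durbin_watson(alternating)
print(dw_runs, dw_alternating)
```

A value well below 2 (the first series) points to positive autocorrelation, and a value well above 2 (the second series) points to negative autocorrelation.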
Slide 2
Scatter of Error Terms: Positive Autocorrelation
Figure 4A.1 page 171
[Figure: scatter of error terms; axes labeled Y and X]
Slide 3
2. Heteroscedasticity
• Problem:
» Coefficients are unbiased
» t-values are unreliable
• Symptoms:
» different variances for different sub-samples
» scatter of error terms shows increasing or decreasing
dispersion
• Cures:
1. Transform data, e.g., transform them into logs
2. Take averages of each sub-sample and use weighted
least squares
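The first symptom, different variances for different sub-samples, can be checked roughly by comparing the residual variance in two halves of the sample. The sketch below is a simplified illustration, not a formal test; the helper `variance_ratio` and the residuals are hypothetical.

```python
from statistics import variance

def variance_ratio(residuals):
    """Split residuals (ordered by the explanatory variable) into two
    halves and return var(second half) / var(first half)."""
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])

# Hypothetical residuals ordered by X; dispersion grows with X.
residuals = [1, -1, 1, -1, 4, -4, 4, -4]
ratio = variance_ratio(residuals)
print(ratio)
```

A ratio far from 1 suggests that the error variance is not constant across the sample.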
Slide 4
Scatter of Error Terms: Heteroscedasticity
[Figure: scatter of Height against AGE]
alternative: log Ht = a + b•AGE
Slide 5
3. Specification & Measurement Error
• Salary = a + b (Strike Outs) in baseball
» b is positive !!!
• Why?
» Omitted variable, which is the number of Hits
• Salary = c + d (Strike Outs) + e (Hits)
» here d is negative and e is positive
Slide 6
Specification & Measurement Error
• Problem:
» Coefficients are biased – we can even have the wrong
sign as in the baseball example
» Even adding more observations will not cure this bias
• Symptoms:
» The results don’t make economic sense
• Cure:
» Think through the relationships and find the missing
variables in the specification
» See if the new specification improves the fit (higher R²) and makes economic sense.
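The wrong-sign result in the baseball example can be reproduced with a small sketch. All numbers below are made up for illustration: salary is constructed as exactly 10 × Hits − 2 × Strike Outs, and strike outs are built to rise with hits. The helper names are hypothetical.

```python
def centered(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def simple_slope(x, y):
    """Slope from regressing y on x alone (with an intercept)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def two_variable_slopes(x1, x2, y):
    """Slopes from regressing y on x1 and x2 (with an intercept),
    solving the 2x2 normal equations on centered data."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Made-up data (assumption): strike outs rise with hits.
hits = [100, 120, 140, 160, 180, 200]
strike_outs = [55, 58, 72, 78, 93, 100]
salary = [10 * h - 2 * s for h, s in zip(hits, strike_outs)]

b = simple_slope(strike_outs, salary)                   # Hits omitted
d, e = two_variable_slopes(strike_outs, hits, salary)   # Hits included
print(b, d, e)
```

With Hits omitted, the Strike Outs coefficient b comes out positive because it picks up the effect of Hits; with Hits included, d is negative and e is positive, as on the slide.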
Slide 7
4. Multicollinearity
• Sometimes independent variables aren’t independent.
EXAMPLE:
Let Q = Eggs sold
Q = a + b•Pd + c•Pg
where Pd is the price for a dozen eggs
and Pg is the price for a gross of eggs.
• Regression Output
Q = 22 - 7.8 Pd - 0.9 Pg
         (1.2)    (1.45)
R-square = .87
(t-values in parentheses)
N = 100 observations
• Notice that:
» R-square is 87%
» But that neither coefficient
is statistically significant.
Slide 8
Multicollinearity
• Problem:
» Coefficients are unbiased
» The t-values are small, often insignificant
• Symptoms:
» High R-squares but low t-values
• Cures:
1. Drop a variable. Usually the remaining
variable becomes significant.
2. Do nothing if forecasting, since the added R-square of more variables is worthwhile
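One quick way to spot the problem in the egg example is to check how closely the two price variables move together. The sketch below uses hypothetical prices in which the gross price stays near 12 times the dozen price; the data and the helper `correlation` are illustrative assumptions.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical prices: a gross is 12 dozen, so Pg tracks 12 * Pd closely.
price_dozen = [1.00, 1.10, 1.20, 1.30, 1.40, 1.50]
price_gross = [11.9, 13.3, 14.3, 15.7, 16.7, 18.1]

r = correlation(price_dozen, price_gross)
print(round(r, 4))
```

A correlation this close to 1 means the regression cannot separate the effect of one price from the other, which is why each t-value is small even though R-square is high.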
Slide 9
5. Identification Problem and the Simultaneity Problem
• Problem:
» Coefficients are biased
• Symptom:
» Independent variables are known to be part
of a system of equations
• Cure:
» Use as many independent variables as
possible
Slide 10
Graphical Explanation of the
Identification Problem
• Suppose we estimate
the following demand
curve Q = a + b P.
• Suppose Supply varies
and Demand is FIXED.
• All points lie on the
demand curve
• The demand curve is said
to be identified.
[Figure: supply curves S1, S2, S3 intersecting a single Demand curve; axes P and quantity]
Slide 11
Suppose instead that
SUPPLY is Fixed
• Let DEMAND shift while supply doesn’t change.
• All points are on the SUPPLY curve.
• We say that the SUPPLY curve is identified.
[Figure: demand curves D1, D2, D3 intersecting a single Supply curve; axes P and quantity]
Slide 12
When both Supply
and Demand Vary
• Often both supply and
demand vary.
• Equilibrium points are in
shaded region.
• A regression of
Q = a + b P will be
neither a demand nor a
supply curve.
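[Figure: supply curves S1, S2 and demand curves D1, D2, with a "?" marking the region of equilibrium points; axes P and quantity]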
Slide 13
Simultaneous Systems
1. Demand is Qd = a + b P + c Y + e1
2. Supply is Qs = d + e P + f W + e2
» Where P is price, Y is income, W is the wage rate, and each equation has an error term (e1, e2).
» Notice that P is in both the demand and supply functions. P is “endogenously” determined by both demand and supply.
» The simultaneity problem is that price is not independent, as it is determined by the whole system.
» The cure for this problem is usually to have as many independent variables as possible in the demand regression to make demand act like it is “fixed”.
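To see why price is not independent, set Qd = Qs in the two equations above and solve for P: P = (a − d + c·Y − f·W + e1 − e2) / (e − b). The sketch below evaluates that solution; every parameter value is a hypothetical choice made for illustration, and `equilibrium` is not a function from the text.

```python
def equilibrium(a, b, c, d, e, f, income, wage, e1=0.0, e2=0.0):
    """Solve Qd = Qs for price and quantity in the system
    Qd = a + b*P + c*Y + e1 and Qs = d + e*P + f*W + e2."""
    price = (a - d + c * income - f * wage + e1 - e2) / (e - b)
    quantity = a + b * price + c * income + e1
    return price, quantity

# Hypothetical parameter values (assumptions, not estimates).
params = dict(a=100, b=-2, c=0.5, d=10, e=3, f=-1, income=40, wage=10)

p0, q0 = equilibrium(**params)          # no shocks
p1, q1 = equilibrium(**params, e1=5)    # positive demand shock
print(p0, q0, p1, q1)
```

The demand shock e1 raises the equilibrium price, so P is correlated with the demand error term. That correlation is what biases a simple regression of Q on P.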
Slide 14
6. Nonlinear Forms
• Semi-logarithmic transformations. Sometimes taking the logarithm of the dependent variable or an independent variable improves the R².
Examples are:
• log Y = α + β·X, e.g., Ln Y = .01 + .05X
[Figure: plot of Ln Y = .01 + .05X; axes Y and X]
» Here, Y grows exponentially at rate β in X; that is, about 100·β percent growth per period (about 5 percent in the example).
• Y = α + β·log X. Here, the β·log X term doubles each time X is squared (so Y itself doubles only when α = 0).
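The growth-rate reading of the semi-log form can be checked numerically with the slide's example, Ln Y = .01 + .05X. This is a minimal sketch; the function name `fitted_y` and the choice of X = 10 and X = 11 are assumptions made for illustration.

```python
from math import exp

def fitted_y(x, alpha=0.01, beta=0.05):
    """Y implied by Ln Y = alpha + beta * X."""
    return exp(alpha + beta * x)

# Proportional change in Y when X rises by one unit.
growth = fitted_y(11) / fitted_y(10) - 1
print(round(100 * growth, 2))
```

The exact growth per period is exp(β) − 1, which is close to β when β is small, so a coefficient of .05 corresponds to roughly 5 percent growth per period.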
Slide 15
Reciprocal Transformations
• The relationship between variables may be
inverse. Sometimes taking the reciprocal of
a variable improves the fit of the regression
as in the example:
• Y = α + β·(1/X)
E.g., Y = 500 + 2 (1/X)
• shapes can be:
» declining slowly, if beta positive
» rising slowly, if beta negative
[Figure: plot of Y against X]
Slide 16
• Quadratic, cubic, and higher degree polynomial relationships are common in business and economics.
» Profit and revenue are cubic functions of output.
» Average cost is a quadratic function, as it is U-shaped
» Total cost is a cubic function, as it is S-shaped
• TC = α·Q + β·Q² + γ·Q³ is a cubic total cost function.
• If higher order polynomials improve the R-square, then
the added complexity may be worth it.
Slide 17
Multiplicative or Double Log
• With the double log form, the coefficients are
elasticities
• Q = A • P^b • Y^c • Ps^d
» multiplicative functional form
• So: Ln Q = a + b•Ln P + c•Ln Y+ d•Ln Ps
• Transform all variables into natural logs
• Called the double log, since logs are on the left and
the right hand sides. Ln and Log are used
interchangeably. We use only natural logs.
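That the exponents of the multiplicative form are elasticities can be verified with a short sketch: change price while holding the other variables fixed, and compare the change in Ln Q with the change in Ln P. The parameter values (A, b, c, d) and the prices and income used below are hypothetical.

```python
from math import log

def demand(price, income, price_sub, A=50.0, b=-1.5, c=0.8, d=0.4):
    """Multiplicative demand Q = A * P^b * Y^c * Ps^d (hypothetical values)."""
    return A * price ** b * income ** c * price_sub ** d

q_low = demand(2.00, 30.0, 1.5)
q_high = demand(2.02, 30.0, 1.5)   # price 1 percent higher, all else fixed

# Elasticity = change in Ln Q divided by change in Ln P.
elasticity = (log(q_high) - log(q_low)) / (log(2.02) - log(2.00))
print(round(elasticity, 6))
```

The computed price elasticity equals the exponent b, whatever values are chosen for price, income, and the price of the substitute.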
Slide 18
Soft Drink Case, pp. 167-168
a cross section of 50 states
Linear Specification
Cans = 515 - 242 Price + 1.19 Income + 2.91 Temp

Predictor    Coeff      StDev     T       P
Constant     514.8      113.2     4.55    0.000
Price        -241.80    43.65     -5.54   0.000
Income       1.195      1.688     0.71    0.483
Temp         2.9136     0.7071    4.12    0.000

R-Sq = 69.8%    R-Sq(adj) = 67.7%
The price elasticity in Wyoming = (ΔQ/ΔP)(P/Q) = -241.8(2.31/102) = -5.476
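The Wyoming calculation is a point elasticity: the estimated price coefficient times the ratio of price to quantity. A minimal sketch of the arithmetic follows, using the slide's numbers; the function name `point_elasticity` is an assumption.

```python
def point_elasticity(slope, price, quantity):
    """Point elasticity for a linear demand curve: slope * (P / Q)."""
    return slope * price / quantity

# Slide values: price coefficient -241.8, price 2.31, quantity 102 cans.
wyoming = point_elasticity(-241.8, 2.31, 102)
print(round(wyoming, 3))   # -5.476
```

In the linear specification this elasticity differs from state to state, because P/Q changes along the demand curve even though the slope is constant.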
Slide 19
Double Log Soft Drink Case
Ln Cans = 2.47 - 3.17 Ln Price + 0.202 Ln Income + 1.12 Ln Temp

Predictor    Coef       Std Dev   T       P
Constant     2.466      1.385     1.78    0.082
Ln Price     -3.1695    0.6485    -4.89   0.000
Ln Income    0.2020     0.1834    1.10    0.277
Ln Temp      1.1196     0.2611    4.29    0.000

R-Sq = 67.4%    R-Sq(adj) = 65.1%
Characterize the demand for soft drinks in the US.
Are soft drinks inelastic? Are they luxuries?
Which specification fits the data better?
Slide 20