Christopher Dougherty
EC220 - Introduction to econometrics
(chapter 4)
Slideshow: quadratic variables and higher-order polynomials
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 4). [Teaching Resource]
© 2012 The Author
This version available at: http://learningresources.lse.ac.uk/130/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows
the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user
credits the author and licenses their new creations under the identical terms.
http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
QUADRATIC EXPLANATORY VARIABLES
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2^2 + u$
We will now consider models with quadratic explanatory variables of the type shown. Such
a model can be fitted using OLS with no modification.
However, the usual interpretation of a parameter, that it represents the effect of a unit
change in its associated variable, holding all other variables constant, cannot be applied. It
is not possible for $X_2$ to change without $X_2^2$ also changing.
$\frac{dY}{dX_2} = \beta_2 + 2\beta_3 X_2$
Differentiating the equation with respect to $X_2$, one obtains the change in $Y$ per unit change
in $X_2$. Thus, the impact of a unit change in $X_2$ on $Y$, $(\beta_2 + 2\beta_3 X_2)$, is a function of $X_2$.
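A short worked step may help here. The derivative is an instantaneous rate of change, so it only approximates the effect of a discrete one-unit change in the variable. Ignoring the disturbance term, and using only the symbols of the model above, the exact change in Y when the variable rises by one unit can be sketched as:

```latex
\begin{aligned}
\Delta Y &= \left[\beta_1 + \beta_2 (X_2 + 1) + \beta_3 (X_2 + 1)^2\right]
          - \left[\beta_1 + \beta_2 X_2 + \beta_3 X_2^2\right] \\
         &= \beta_2 + \beta_3 (2 X_2 + 1) \\
         &= \left(\beta_2 + 2\beta_3 X_2\right) + \beta_3
\end{aligned}
```

The exact unit change therefore differs from the derivative by the constant $\beta_3$, and both depend on the starting value of $X_2$.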
This means that $\beta_2$ has an interpretation that is different from that in the ordinary linear
model where it is the unqualified effect of a unit change in $X_2$ on $Y$.
In this model, $\beta_2$ should be interpreted as the effect of a unit change in $X_2$ on $Y$ for the
special case where $X_2 = 0$. For nonzero values of $X_2$, the marginal effect will be different.
$Y = \beta_1 + (\beta_2 + \beta_3 X_2) X_2 + u$
$\beta_3$ also has a special interpretation. If we rewrite the model as shown, $\beta_3$ can be interpreted
as the rate of change of the coefficient of $X_2$, per unit change in $X_2$.
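As a supplementary sketch, not taken from the slides, it may help to separate two rates of change that are easy to confuse. The coefficient of $X_2$ in the rewritten model changes at rate $\beta_3$, whereas the marginal effect of $X_2$ on $Y$ changes at rate $2\beta_3$:

```latex
\frac{d}{dX_2}\left(\beta_2 + \beta_3 X_2\right) = \beta_3,
\qquad
\frac{d^2 Y}{dX_2^2} = \frac{d}{dX_2}\left(\beta_2 + 2\beta_3 X_2\right) = 2\beta_3
```

In either form the sign of $\beta_3$ determines the curvature: a negative value gives a marginal effect that diminishes as $X_2$ increases, and a positive value gives one that grows.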
Only $\beta_1$ has a conventional interpretation. As usual, it is the value of $Y$ (apart from the
random component) when $X_2 = 0$.
There is a further problem. We know that the estimate of the intercept may have no
sensible meaning if $X_2 = 0$ is outside the data range. If $X_2 = 0$ lies outside the data range, the
same type of distortion can happen with the estimate of $\beta_2$.
. gen SSQ = S*S
. reg EARNINGS S SSQ

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   59.69
       Model |  20372.4953     2  10186.2477           Prob > F      =  0.0000
    Residual |  91637.7357   537  170.647553           R-squared     =  0.1819
-------------+------------------------------           Adj R-squared =  0.1788
       Total |  112010.231   539  207.811189           Root MSE      =  13.063

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |  -2.772317   2.119128    -1.31   0.191    -6.935114    1.390481
         SSQ |   .1829731   .0737308     2.48   0.013     .0381369    .3278092
       _cons |   22.25089   14.92883     1.49   0.137    -7.075176    51.57695
------------------------------------------------------------------------------
We will illustrate this with the earnings function. The table gives the output of a quadratic
regression of earnings on schooling (SSQ is defined as the square of schooling).
The coefficient of S implies that, for an individual with no schooling, the impact of a year of
schooling is to decrease hourly earnings by $2.77.
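To see how the implied effect of schooling varies with S, the following minimal sketch evaluates the fitted derivative (coefficient of S plus twice the coefficient of SSQ times S) using the coefficients reported in the regression output. It is written in Python rather than Stata purely for illustration. The function name `marginal_effect` and the schooling levels chosen are assumptions, not part of the original slideshow.

```python
# Coefficients copied from the Stata output for: reg EARNINGS S SSQ
B_S = -2.772317    # coefficient of S
B_SSQ = 0.1829731  # coefficient of SSQ (S squared)


def marginal_effect(s: float) -> float:
    """Fitted derivative of EARNINGS with respect to S, evaluated at s."""
    return B_S + 2 * B_SSQ * s


# Illustrative schooling levels (hypothetical choices, not from the slides).
for s in (0, 8, 12, 16, 20):
    print(f"S = {s:2d}: dEARNINGS/dS = {marginal_effect(s):7.3f}")
```

At S = 0 the function returns the coefficient of S itself, which is the special case described above. For higher levels of schooling the fitted marginal effect is positive.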
The intercept also has no sensible interpretation. Literally, it implies that an individual with
no schooling would have hourly earnings of $22.25, which is implausibly high.
[Figure: scatter of hourly earnings ($) against years of schooling (highest grade completed), with the fitted quadratic curve. Fitted coefficients: S = -2.772317, SSQ = .1829731, _cons = 22.25089.]
The quadratic relationship is illustrated in the figure. Over the range of the actual data, it
fits the observations tolerably well. The fit is not dramatically different from those of the
linear and semilogarithmic specifications.
However, when one extrapolates beyond the data range, the quadratic function increases as
schooling decreases, giving rise to implausible estimates of both $\beta_1$ and $\beta_2$ for S = 0.
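The shape of the extrapolated curve can be checked directly from the reported coefficients. The sketch below, in Python for illustration only, computes the fitted value at a few schooling levels and the point at which the fitted quadratic reaches its minimum. The names `fitted_earnings` and `turning_point` and the evaluation points are assumptions introduced for this example.

```python
# Coefficients copied from the Stata output for: reg EARNINGS S SSQ
B_CONS = 22.25089
B_S = -2.772317
B_SSQ = 0.1829731


def fitted_earnings(s: float) -> float:
    """Fitted hourly earnings from the quadratic regression at schooling s."""
    return B_CONS + B_S * s + B_SSQ * s ** 2


# The fitted curve has its minimum where the derivative B_S + 2*B_SSQ*s is zero.
turning_point = -B_S / (2 * B_SSQ)

print(f"minimum of fitted curve at S = {turning_point:.2f}")
for s in (0, 4, 8, 12, 16, 20):
    print(f"S = {s:2d}: fitted EARNINGS = {fitted_earnings(s):6.2f}")
```

Below the turning point the fitted curve slopes upward as schooling falls, which is the source of the high fitted value at S = 0.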
In this example, we would prefer the semilogarithmic specification, as do all wage-equation
studies. The slope coefficient of the semilogarithmic specification has a simple
interpretation and the specification does not give rise to nonsensical predictions outside
the data range.
Average annual percentage growth rates

Country        Employment    GDP      Country          Employment    GDP
Australia         1.68       3.04     Korea               2.57       7.73
Austria           0.65       2.55     Luxembourg          3.02       5.64
Belgium           0.34       2.16     Netherlands         1.88       2.86
Canada            1.17       2.03     New Zealand         0.91       2.01
Denmark           0.02       2.02     Norway              0.36       2.98
Finland          –1.06       1.78     Portugal            0.33       2.79
France            0.28       2.08     Spain               0.89       2.60
Germany           0.08       2.71     Sweden             –0.94       1.17
Greece            0.87       2.08     Switzerland         0.79       1.15
Iceland          –0.13       1.54     Turkey              2.02       4.18
Ireland           2.16       6.40     United Kingdom      0.66       1.97
Italy            –0.30       1.68     United States       1.53       2.46
Japan             1.06       2.81
The data on employment growth rate, e, and GDP growth rate, g, for 25 OECD countries in
Exercise 1.4 provide a less problematic example of the use of a quadratic function.
. gen gsq = g*g
. reg e g gsq

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  2,    22) =   20.15
       Model |  15.9784642     2  7.98923212           Prob > F      =  0.0000
    Residual |   8.7235112    22  .396523236           R-squared     =  0.6468
-------------+------------------------------           Adj R-squared =  0.6147
       Total |  24.7019754    24  1.02924898           Root MSE      =   .6297

------------------------------------------------------------------------------
           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   1.200205   .3862226     3.11   0.005     .3992287    2.001182
         gsq |  -.0838408   .0445693    -1.88   0.073    -.1762719    .0085903
       _cons |  -1.678113   .6556641    -2.56   0.018    -3.037877   -.3183494
------------------------------------------------------------------------------
The output from a quadratic regression is shown. gsq has been defined as the square of g.
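For readers without Stata, the same two steps (generate the squared term, then regress e on g and gsq) can be sketched in plain Python. This is an illustrative reconstruction, not the author's code. The data are transcribed from the growth-rate table in this slideshow, and the helper names `solve` and `ols` are hypothetical. Because the table values are rounded to two decimal places, the coefficients should be close to, but need not exactly match, the Stata output.

```python
# Employment (e) and GDP (g) growth rates for the 25 OECD countries,
# transcribed from the table in this slideshow (same country order).
e = [1.68, 0.65, 0.34, 1.17, 0.02, -1.06, 0.28, 0.08, 0.87, -0.13, 2.16,
     -0.30, 1.06, 2.57, 3.02, 1.88, 0.91, 0.36, 0.33, 0.89, -0.94, 0.79,
     2.02, 0.66, 1.53]
g = [3.04, 2.55, 2.16, 2.03, 2.02, 1.78, 2.08, 2.71, 2.08, 1.54, 6.40,
     1.68, 2.81, 7.73, 5.64, 2.86, 2.01, 2.98, 2.79, 2.60, 1.17, 1.15,
     4.18, 1.97, 2.46]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(y, columns):
    """OLS coefficients of y on the regressor columns, via the normal equations."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


ones = [1.0] * len(g)
gsq = [v * v for v in g]                     # mirrors: gen gsq = g*g
b_cons, b_g, b_gsq = ols(e, [ones, g, gsq])  # mirrors: reg e g gsq

fitted = [b_cons + b_g * v + b_gsq * v * v for v in g]
resid = [y - f for y, f in zip(e, fitted)]
mean_e = sum(e) / len(e)
r2 = 1 - sum(r * r for r in resid) / sum((y - mean_e) ** 2 for y in e)

print(f"_cons = {b_cons:.4f}, g = {b_g:.4f}, gsq = {b_gsq:.4f}, R2 = {r2:.4f}")
```

Solving the normal equations directly is adequate for three regressors and 25 observations. For larger problems a dedicated least-squares routine would be the usual choice.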
[Figure: fitted quadratic and hyperbolic functions. Vertical axis: employment growth rate. Horizontal axis: GDP growth rate.]
The quadratic specification appears to be an improvement on the hyperbolic function fitted
in a previous slideshow. It is more satisfactory than the latter for low values of g, in that it
does not yield implausibly large negative predicted values of e. The only defect is that it
predicts that the fitted value of e starts to fall when g exceeds 7.
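The turning point can be worked out from the reported coefficients by setting the derivative of the fitted quadratic to zero. The notation $\hat{e}$ for the fitted value and $g^{*}$ for the turning point is introduced here for illustration:

```latex
\frac{d\hat{e}}{dg} = 1.200205 - 2(0.0838408)\,g = 0
\quad\Longrightarrow\quad
g^{*} = \frac{1.200205}{2 \times 0.0838408} \approx 7.16
```

At that point the fitted employment growth rate is about 2.62, and it declines for larger values of g.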
[Figure: cubic and quartic regressions of employment growth rate on GDP growth rate, shown with the quadratic regression.]
Why stop at a quadratic? Why not consider a cubic, or quartic, or a polynomial of even
higher order? There are usually several good reasons for not doing so.
Diminishing marginal effects are standard in economic theory, justifying quadratic
specifications, at least as an approximation, but economic theory seldom suggests that a
relationship might sensibly be represented by a cubic or higher-order polynomial.
The second reason follows from the first. There will be an improvement in fit as higher-order
terms are added, but because these terms are not theoretically justified, the
improvement will be sample-specific.
Third, unless the sample is very small, the fits of higher-order polynomials are unlikely to be
very different from those of a quadratic over the main part of the data range.
These points are illustrated by the figure, which shows cubic and quartic regressions with
the quadratic regression. Over the main data range, from g = 1.5 to g = 4, the fits of the
cubic and quartic are very similar to that of the quadratic.
$R^2$ for the quadratic specification is 0.647. For the cubic and quartic it is 0.651 and 0.658,
relatively small improvements.
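One way to put these improvements in perspective is to adjust for the extra terms. The slide text does not discuss adjusted R-squared, so its use here is an assumption about a reasonable yardstick. The sketch below, in Python for illustration, applies the standard adjustment to the three reported values with the sample size of 25 countries. The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N_OBS = 25  # number of countries in the sample

# (label, R-squared reported in the slideshow, number of polynomial terms)
specs = [("quadratic", 0.647, 2), ("cubic", 0.651, 3), ("quartic", 0.658, 4)]

for label, r2, k in specs:
    adj = adjusted_r2(r2, N_OBS, k)
    print(f"{label:9s}: R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")
```

On these figures the adjusted value falls as the cubic and quartic terms are added, which is consistent with the point that the gains in fit are small.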
Further, the cubic and quartic curves both exhibit implausible characteristics. The cubic
declines even more rapidly than the quadratic for high values of g, and the quartic has
strange twists at its extremities.
Copyright Christopher Dougherty 2011.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 4.3 of C. Dougherty,
Introduction to Econometrics, fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own and who feel that they might
benefit from participation in a formal course should consider the London School
of Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
20 Elements of Econometrics
www.londoninternational.ac.uk/lse.
11.07.25