Chapter Three: Statistical Analysis of Economic Relations
ECONOMICS AND ELECTING THE PRESIDENT
THE WORK OF RAY FAIR
http://fairmodel.econ.yale.edu/
http://fairmodel.econ.yale.edu/vote2008/index2.htm
http://fairmodel.econ.yale.edu/RAYFAIR/PDF/2006CHTM.HTM
Presidential Election Links
http://fairmodel.econ.yale.edu/
http://www.apsanet.org/content_58382.cfm
http://www.douglashibbs.com/Election2012/2012ElectionMainPage.htm
Economic Growth and the United States Presidency:
Can You Evaluate the Players Without a Scorecard?
David J. Berri
Department of Applied Economics
California State University – Bakersfield
Bakersfield, California 93311
661-654-2027
James Peach
P. O. Box 30001/ MSC 3CQ
Department of Economics
New Mexico State University
Las Cruces, NM 88003
505-646-2113
ABSTRACT
In several academic papers and a book, Ray Fair (1978,
1996, 2002) has demonstrated a link between the state
of the macroeconomy and the outcome of the
Presidential Election in the United States. Beginning
with the 1916 election, Fair’s model, based on such
factors as economic growth, inflation, and incumbency,
was able to accurately predict the winner in virtually
every election. The purpose of this research is to take
the Fair model back to the 19th century. The question
we address is as follows: Can a version of Fair’s model
accurately predict outcomes in an environment where economic
data were not available to the voter?
LOUIS BEAN (1948)
HOW TO PREDICT ELECTIONS
“Business depressions played a powerful role in throwing the
Republicans out of office in 1874, after 1908, and in 1932, and they
had exactly the same influence in ousting Democrats after the
panic of 1858 and during the economic setbacks of 1894 and 1920.”
“Harding in 1920, McKinley in 1896, and Cleveland in 1884 were
also depression-made presidents. Had the deciding electoral vote
been cast for the candidate who had the majority of the popular
vote in 1876, Tilden too, would have been a depression-made
President.”
THE WORK OF RAY FAIR
Fair, Ray C. 1978. “The Effect of Economic Events on Votes for
President.” The Review of Economics and Statistics (Vol. 60, No.
2): 159-173 (May 1978).
Fair, Ray C. 1982. “The Effect of Economic Events on Votes for
President: 1980 Results.” The Review of Economics and Statistics
(Vol. 64, No. 2): 322-325 (May 1982).
Fair, Ray C. 1996. “Econometrics and Presidential Elections.”
Journal of Economic Perspectives (Vol. 10, No. 3): 89-102 (Summer
1996).
Fair, Ray C. 2002. “The Effect of Economic Events on Votes for
President: 2000 Update.”
http://fairmodel.econ.yale.edu/RAYFAIR/PDF/2002DHTM.HTM
(downloaded Feb 2, 2006).
Fair, Ray C. 2002. Predicting Presidential Elections and Other Things.
Stanford: Stanford Business Books.
A FAIR MODEL
$$\mathrm{VOTE} = a_1 + a_2\,\mathrm{GROWTH} + a_3\,\mathrm{INFLATION} + a_4\,\mathrm{PARTY} + a_5\,\mathrm{PERSON} + a_6\,\mathrm{DURATION} + a_7\,\mathrm{GOODNEWS} + \varepsilon$$
DEFINING THE VARIABLES EMPLOYED
HTTP://FAIRMODEL.ECON.YALE.EDU/RAYFAIR/PDF/2002DHTM.HTM
VOTE = Incumbent share of the two-party presidential vote.
GROWTH = annual growth rate of real per capita GDP in the
first three quarters of the election year.
INFLATION = absolute value of the growth rate of the GDP
deflator in the first 15 quarters of the administration (annual rate)
except for 1920, 1944, and 1948, where the values are zero.
PARTY = 1 if Democrats are in power, = -1 if Republicans are in
power
PERSON = 1 if the president is running, = 0 otherwise
DURATION = 0 if the incumbent party has been in power for
one term, 1 if the incumbent party has been in power for two
consecutive terms, 1.25 if the incumbent party has been in power
for three consecutive terms, 1.50 for four consecutive terms, and
so on.
WAR = 1 for the elections of 1920, 1944, and 1948 and 0 otherwise
GOODNEWS = number of quarters in the first 15 quarters of
the administration in which the growth rate of real per capita
GDP is greater than 3.2 percent at an annual rate except for 1920,
1944, and 1948, where the values are zero.
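The DURATION coding rule above (0 after one term, 1 after two, then 0.25 more for each additional term) is easy to get wrong by hand, so here is a minimal sketch of it as a function. The helper name `duration_value` is hypothetical; it encodes only the rule as stated, not Fair's own code.

```python
def duration_value(consecutive_terms: int) -> float:
    """DURATION as defined above (hypothetical helper name).

    consecutive_terms: number of consecutive terms the incumbent party
    has held the presidency at the time of the election.
    """
    if consecutive_terms < 1:
        raise ValueError("the incumbent party must have held office for at least one term")
    if consecutive_terms == 1:
        return 0.0
    # Two terms -> 1.0, then 0.25 more for each additional term (1.25, 1.50, ...).
    return 1.0 + 0.25 * (consecutive_terms - 2)


for terms in range(1, 6):
    print(terms, duration_value(terms))
```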
Table One
The Accuracy of the Fair Model
1916-2000
Year  Incumbent     Challenger   Actual  Predicted  Error   Actual       Predicted
      party cand.                VOTE    VOTE               winner       winner
1916  Wilson        Hughes       51.7    50.9       -0.8    Wilson       Wilson
1920  Cox           Harding      36.1    39.2       3.1     Harding      Harding
1924  Coolidge      Davis        58.2    57.3       -1.0    Coolidge     Coolidge
1928  Hoover        Smith        58.8    57.6       -1.2    Hoover       Hoover
1932  Hoover        Roosevelt    40.8    38.8       -2.1    Roosevelt    Roosevelt
1936  Roosevelt     Landon       62.5    63.8       1.4     Roosevelt    Roosevelt
1940  Roosevelt     Willkie      55.0    55.7       0.7     Roosevelt    Roosevelt
1944  Roosevelt     Dewey        53.8    52.5       -1.2    Roosevelt    Roosevelt
1948  Truman        Dewey        52.4    50.5       -1.8    Truman       Truman
1952  Stevenson     Eisenhower   44.6    44.4       -0.2    Eisenhower   Eisenhower
1956  Eisenhower    Stevenson    57.8    57.3       -0.5    Eisenhower   Eisenhower
1960  Nixon         Kennedy      49.9    51.6       1.7     Kennedy      Nixon
1964  L. Johnson    Goldwater    61.3    61.1       -0.3    L. Johnson   L. Johnson
1968  Humphrey      Nixon        49.6    50.2       0.6     Nixon        Humphrey
1972  Nixon         McGovern     61.8    59.4       -2.4    Nixon        Nixon
1976  Ford          Carter       48.9    48.9       0.0     Carter       Carter
1980  Carter        Reagan       44.7    45.7       1.0     Reagan       Reagan
1984  Reagan        Mondale      59.2    62.0       2.9     Reagan       Reagan
1988  G. Bush       Dukakis      53.9    51.3       -2.6    G. Bush      G. Bush
1992  G. Bush       Clinton      46.5    51.7       5.1     Clinton      G. Bush
1996  Clinton       Dole         54.7    53.7       -1.0    Clinton      Clinton
2000  Gore          G.W. Bush    50.3    48.9       -1.3    G.W. Bush    G.W. Bush
VOTE is the incumbent party’s share of the two-party vote; Error is predicted VOTE minus actual VOTE.
Source: http://fairmodel.econ.yale.edu/RAYFAIR/PDF/2002DHTM.HTM
SUMMARIZING FAIR: 1916-2000
Only incorrect in three elections: 1960, 1968, 1992.
Average absolute error: 1.5 percentage points.
Results are driven by economic variables with no consideration of
a candidate’s appearance, debating talents, advertisements, or
general campaign skills.
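The summary can be checked directly against Table One. The sketch below copies the table's rows (the variable names are ours) and computes the average absolute error and the years in which the predicted winner differs from the actual winner. It also flags years where the model put the incumbent on the wrong side of 50 percent, which adds 2000: the incumbent party's candidate won the popular vote but not the presidency.

```python
# Rows copied from Table One: (year, actual VOTE, predicted VOTE, error,
# actual winner, predicted winner). VOTE is the incumbent party's share.
TABLE_ONE = [
    (1916, 51.7, 50.9, -0.8, "Wilson", "Wilson"),
    (1920, 36.1, 39.2, 3.1, "Harding", "Harding"),
    (1924, 58.2, 57.3, -1.0, "Coolidge", "Coolidge"),
    (1928, 58.8, 57.6, -1.2, "Hoover", "Hoover"),
    (1932, 40.8, 38.8, -2.1, "Roosevelt", "Roosevelt"),
    (1936, 62.5, 63.8, 1.4, "Roosevelt", "Roosevelt"),
    (1940, 55.0, 55.7, 0.7, "Roosevelt", "Roosevelt"),
    (1944, 53.8, 52.5, -1.2, "Roosevelt", "Roosevelt"),
    (1948, 52.4, 50.5, -1.8, "Truman", "Truman"),
    (1952, 44.6, 44.4, -0.2, "Eisenhower", "Eisenhower"),
    (1956, 57.8, 57.3, -0.5, "Eisenhower", "Eisenhower"),
    (1960, 49.9, 51.6, 1.7, "Kennedy", "Nixon"),
    (1964, 61.3, 61.1, -0.3, "L. Johnson", "L. Johnson"),
    (1968, 49.6, 50.2, 0.6, "Nixon", "Humphrey"),
    (1972, 61.8, 59.4, -2.4, "Nixon", "Nixon"),
    (1976, 48.9, 48.9, 0.0, "Carter", "Carter"),
    (1980, 44.7, 45.7, 1.0, "Reagan", "Reagan"),
    (1984, 59.2, 62.0, 2.9, "Reagan", "Reagan"),
    (1988, 53.9, 51.3, -2.6, "G. Bush", "G. Bush"),
    (1992, 46.5, 51.7, 5.1, "Clinton", "G. Bush"),
    (1996, 54.7, 53.7, -1.0, "Clinton", "Clinton"),
    (2000, 50.3, 48.9, -1.3, "G.W. Bush", "G.W. Bush"),
]

# Average of the absolute errors (percentage points).
mae = sum(abs(err) for _, _, _, err, _, _ in TABLE_ONE) / len(TABLE_ONE)

# Years in which the predicted winner is not the actual winner.
wrong_winner = [year for year, _, _, _, won, pred in TABLE_ONE if won != pred]

# Years in which the model puts the incumbent on the wrong side of 50 percent.
wrong_side = [year for year, act, pred, _, _, _ in TABLE_ONE if (act > 50) != (pred > 50)]

print(round(mae, 2))
print(wrong_winner)
print(wrong_side)
```

Run on the table's values, this gives an average absolute error of about 1.5 and wrong winners in 1960, 1968, and 1992; the 50 percent check adds 2000.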
TAKING FAIR BACK TO 1824
New measures of growth and inflation are needed.
Louis Johnston and Samuel H. Williamson, “The Annual Real
and Nominal GDP for the United States, 1789-Present.”
Economic History Services, April 2002, URL:
http://www.eh.net/hmit/gdp/
These data have been updated; the updated data did not change our
general findings.
THE MODELS TO BE ESTIMATED
Model 1
Original Fair Model (1916-2000)
Model 2
Fair Model with new measures of GROWTH and INFLATION
(1916-2000)
Model 3
Fair Model with new measures of GROWTH and INFLATION,
no GOODNEWS (1916-2000)
Model 4
Fair Model with new measures of GROWTH and INFLATION,
no GOODNEWS (1916-2004)
Model 5
Fair Model with new measures of GROWTH and INFLATION,
no GOODNEWS (1824-1912)
Model 6
Fair Model with new measures of GROWTH and INFLATION,
no GOODNEWS (1824-2004)
Table Three
Various Estimates of Fair’s Model
Dependent Variable is VOTE
White Heteroskedasticity-Consistent Standard Errors & Covariance
                   Model 1    Model 2    Model 3    Model 4    Model 5    Model 6
Sample:            1916-2000  1916-2000  1916-2000  1916-2004  1824-1912  1824-2004
GROWTH             0.691*     0.457*     0.409**    0.413**    -0.135     0.342
                   (6.169)    (3.526)    (2.569)    (2.649)    (-0.560)   (1.577)
INFLATION          -0.775*    -0.808*    -0.988*    -0.943*    0.438      -0.191
                   (3.915)    (4.132)    (5.405)    (5.051)    (0.511)    (0.731)
PARTY              -2.713*    -2.054**   -1.644***  -1.252     0.647      -0.594
                   (5.434)    (2.644)    (1.946)    (1.363)    (0.344)    (0.580)
PERSON             3.251***   2.145      1.701      1.578      -0.372     1.705
                   (1.837)    (1.007)    (0.698)    (0.663)    (0.083)    (0.625)
DURATION           -3.628**   -4.238**   -5.216**   -4.519***  1.414      -0.448
                   (2.517)    (2.410)    (2.315)    (1.968)    (0.611)    (0.208)
WAR                3.855      5.268      2.905      2.262      -1.881     -0.835
                   (1.102)    (0.832)    (0.102)    (0.214)    (1.213)    (1.369)
GOODNEWS           0.837**    0.539
                   (2.932)    (1.238)
INTERCEPT          49.607*    52.871*    57.769*    56.892     48.458     50.889
                   (19.550)   (15.512)   (18.169)   (17.736)   (10.618)   (15.184)
R-Square           0.923      0.845      0.828      0.769      0.093      0.116
Adjusted R-Square  0.885      0.767      0.759      0.683      -0.248     -0.020
F-Statistic        24.000*    10.888*    12.015*    8.883*     0.272      0.850
Observations       22         22         22         23         23         46
t-statistics in parentheses below each coefficient.
* - Significant at the 1% level
** - Significant at the 5% level
*** - Significant at the 10% level
Table Four
The Accuracy of the Fair Model
1824-1912
Forecast Based on Model 3
Year  Incumbent       Challenger    Actual  Predicted  Absolute  Actual        Predicted
      party cand.                   VOTE    VOTE       error     winner        winner
1824  Jackson         J.Q. Adams    57.2    39.8       17.4      J.Q. Adams*   J.Q. Adams
1828  J.Q. Adams      Jackson       43.8    56.9       13.1      Jackson       J.Q. Adams
1832  Jackson         Clay          59.2    58.0       1.2       Jackson       Jackson
1836  Van Buren       W. Harrison   58.1    43.7       14.4      Van Buren     W. Harrison
1840  Van Buren       W. Harrison   47.0    51.4       4.4       W. Harrison   Van Buren
1844  Clay            Polk          49.3    58.6       9.4       Polk          Clay
1848  Cass            Taylor        47.3    53.5       6.2       Taylor        Cass
1852  Scott           Pierce        46.3    62.0       15.7      Pierce        Scott
1856  Buchanan        Fremont       57.8    55.2       2.6       Buchanan      Buchanan
1860  Breckinridge    Lincoln       31.2    56.4       25.2      Lincoln       Breckinridge
1864  Lincoln         McClellan     55.0    42.9       12.1      Lincoln       McClellan
1868  Grant           Seymour       52.7    48.5       4.2       Grant         Seymour
1872  Grant           Greeley       55.9    55.9       0.0       Grant         Grant
1876  Hayes           Tilden        48.5    50.3       1.8       Hayes*        Hayes
1880  Garfield        Hancock       50.0    50.0       0.2       Garfield      Hancock
1884  Blaine          Cleveland     49.9    45.1       4.8       Cleveland     Cleveland
1888  Cleveland       B. Harrison   50.4    59.1       8.7       B. Harrison*  Cleveland
1892  B. Harrison     Cleveland     48.3    61.2       13.0      Cleveland     B. Harrison
1896  Bryan           McKinley      47.8    53.7       5.9       McKinley      Bryan
1900  McKinley        Bryan         53.2    59.5       6.3       McKinley      McKinley
1904  T. Roosevelt    Parker        60.0    49.9       10.1      T. Roosevelt  Parker
1908  Taft            Bryan         54.5    46.8       7.6       Taft          Bryan
1912  Taft/Roosevelt  Wilson        54.7    50.6       4.1       Wilson*       Taft/Roosevelt
* - did not win the popular vote
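The same check applied to Table Four shows how much worse the pre-1916 forecasts are. A sketch with the rows copied from the table (winner names without the * marker; the variable names are ours): the average absolute error comes out at about 8.2 points, more than five times the 1916-2000 figure, and the predicted winner matches the actual winner in only 7 of 23 elections.

```python
# Rows copied from Table Four: (year, absolute error, actual winner, predicted winner).
# The "*" markers (did not win the popular vote) are dropped here.
TABLE_FOUR = [
    (1824, 17.4, "J.Q. Adams", "J.Q. Adams"),
    (1828, 13.1, "Jackson", "J.Q. Adams"),
    (1832, 1.2, "Jackson", "Jackson"),
    (1836, 14.4, "Van Buren", "W. Harrison"),
    (1840, 4.4, "W. Harrison", "Van Buren"),
    (1844, 9.4, "Polk", "Clay"),
    (1848, 6.2, "Taylor", "Cass"),
    (1852, 15.7, "Pierce", "Scott"),
    (1856, 2.6, "Buchanan", "Buchanan"),
    (1860, 25.2, "Lincoln", "Breckinridge"),
    (1864, 12.1, "Lincoln", "McClellan"),
    (1868, 4.2, "Grant", "Seymour"),
    (1872, 0.0, "Grant", "Grant"),
    (1876, 1.8, "Hayes", "Hayes"),
    (1880, 0.2, "Garfield", "Hancock"),
    (1884, 4.8, "Cleveland", "Cleveland"),
    (1888, 8.7, "B. Harrison", "Cleveland"),
    (1892, 13.0, "Cleveland", "B. Harrison"),
    (1896, 5.9, "McKinley", "Bryan"),
    (1900, 6.3, "McKinley", "McKinley"),
    (1904, 10.1, "T. Roosevelt", "Parker"),
    (1908, 7.6, "Taft", "Bryan"),
    (1912, 4.1, "Wilson", "Taft/Roosevelt"),
]

# The table already reports absolute errors, so no abs() is needed.
mae = sum(err for _, err, _, _ in TABLE_FOUR) / len(TABLE_FOUR)
correct = [year for year, _, won, pred in TABLE_FOUR if won == pred]

print(round(mae, 1), len(correct), len(TABLE_FOUR))
print(correct)
```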
WHY DOES THE FAIR MODEL FARE POORLY BEFORE 1916?
Economic data were not available to voters.
The U.S. economy was not integrated.
The federal government was not held responsible for the
macroeconomy.
Non-economic issues were more important in the 19th century.
Econometrics and
Presidential Elections
Larry M. Bartels
OVERVIEW OF THE FAIR MODEL
One of the most interesting aspects of Fair's essay is the unusually
frank and detailed description it provides of the enormous amount of
exploratory research underlying published analyses of aggregate
election outcomes. What is the relevant sample period? Which
economic variables matter? Measured over what time span? What
does one do with third party votes, war years, or an unelected
incumbent? In fewer than a dozen pages, Fair raises and resolves
many such questions, as any data analyst must. In the process, he
makes clear how much of what Leamer (1978) has referred to as
“specification uncertainty” plagues this (or any other) statistical
analysis of presidential election outcomes.
CHOOSING A MODEL
… (Fair’s) choice of model specification seems to have been guided by
goodness-of-fit considerations rather than by a priori political or
economic considerations. His data set begins in 1916 because “some
experimentation . . . using observations prior to 1916” produced results
that “were not as good.” Gerald Ford is sometimes counted as an
incumbent and sometimes not, depending upon which treatment “improves
the fit of the equation.” Revised economic data produced significant
changes in several key coefficients, prompting renewed searching “to see
which set of economic variables led to the best fit,” and so on.
WHAT HAVE WE LEARNED?
What most electoral scholars really care about is what the
relationship between economic conditions and election outcomes tells
us about voting behavior and democratic accountability.
On that score, what have we learned, and what have we yet to learn?
The clearest and most significant implication of aggregate election
analyses is that objective economic conditions -- not clever television
ads, debate performances, or the other ephemera of day-to-day
campaigning -- are the single most important influence upon an
incumbent president's prospects for reelection.
Despite a good deal of uncertainty regarding the exact form of the
relationship, the relevant time horizon, and the relative importance of
specific economic indicators, there can be no doubt that presidential
elections are, in significant part, referenda on the state of the
economy.
WHY DOES THE ECONOMY MATTER?
Do economic conditions matter because people vote their own
pocketbooks, or because they respond to changes in the whole
nation's economic condition?
The work of Markus (1988) and others has demonstrated that
personal and national economic fortunes are both important. However,
this demonstration does almost nothing to resolve the related question
of whether voters' underlying motivations are selfish or altruistic.
(Selfish voters could rationally base forecasts of their own future
incomes on recent changes in the national economy, while altruistic
voters could rationally base expectations regarding their fellow
citizens' future economic fortunes on their own recent economic
experience.)
MORE ON “WHY”…
Can voters untangle the complex contributions of the president
and other actors to the making of government policy? Can they
untangle the even more complex contributions of government
policy, exogenous economic forces, and dumb luck to observed
levels of economic growth, wage changes, unemployment, or
inflation?
Alesina, Londregan, and Rosenthal (1993) attempted to
distinguish between “rational” and “naive” economic voting by
estimating separate electoral effects for economic “shocks” and
economic growth that was “predictable” on the basis of previous
growth, partisan effects, and military mobilization. They found no
significant difference between these potentially distinct effects, a
result they interpreted (1993, 23) as “consistent with the
hypothesis of naive retrospective voting.”
AND MORE ON “WHY”…
…we know remarkably little about why voters reward the
incumbent president for prosperity or punish him for economic
distress.
Do they have any rational basis for supposing that economic
conditions in the election year are indicative of future conditions if
the incumbent is reelected? (As far as I know, nobody has
demonstrated such a connection.)
Do they know or care what, if anything, the out party would do
differently? Or, as Ferejohn (1986) and others would have it, are
they simply holding up their end of a simple-minded implicit
contract intended to extract whatever effort a self-interested
incumbent may be able to exert on their behalf -- the only sort of
accountability feasible in a situation marked by massive uncertainty
and asymmetric information?
MY OWN THOUGHTS…
Three types of voters in the election…
Republicans (vote Republican)
Democrats (vote Democratic)
Independents
The only free agents are independents. These are voters who care so
little that they don’t join a party. And these are the voters that matter.
Why the economy? It is the one issue that matters to the independent.
POLITICS AND FOOTBALL…
Some football coaches believe the run sets up the pass. Others think
the pass sets up the run. Fans, though, don’t care. You win, you keep
your job. You lose, you lose your job.
Applied to politics… some people believe in smaller government
and low taxes. Others believe in more government to solve problems.
Independents, though, don’t care. The economy does well, you keep
your job. If not, you’re fired. What the politician believes is simply not
relevant.
Writing an academic paper
1. Introduction (tell cute story)
2. Literature Review
3. Describe Data
4. Create model:
   1. Identify dependent variable
   2. Identify independent variables
   3. State hypothesis
   4. Estimate model
DATA SUMMARY AND DESCRIPTION
Population Parameters – Summary and descriptive measures for the
population.
Sample Statistics – Summary and descriptive measures for a sample.
NOTE: We rarely have data for the population. Hence we need to be
able to draw inferences from a sample.
MEASURES OF CENTRAL TENDENCY
Mean – The average
Issue: You must note the distribution of the
sample. If it is unbalanced (skewed), the mean may be
misleading.
Median – “Middle” observation
SYMMETRY VS. SKEWNESS
Symmetrical – A balanced distribution. Median = Mean
Skewness – A lack of balance.
Skewed to the left: Median > Mean
Skewed to the right: Median < Mean
If skewness is observed one may wish to examine a sub-sample of the
data.
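Comparing the mean with the median is a quick first check for skewness. A minimal sketch using Python's standard `statistics` module; the samples are made-up numbers, and the mean-versus-median rule is only a rough indicator, not a formal test of skewness.

```python
import statistics


def describe_skew(data):
    """Return (mean, median, direction) using the mean-versus-median rule of thumb."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        direction = "skewed to the right"
    elif mean < median:
        direction = "skewed to the left"
    else:
        direction = "symmetrical"
    return mean, median, direction


# Hypothetical samples (made-up numbers).
print(describe_skew([20, 22, 25, 27, 30, 150]))  # one large value pulls the mean up
print(describe_skew([1, 80, 82, 85, 90]))        # one small value pulls the mean down
print(describe_skew([1, 2, 3, 4, 5]))
```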
MEASURES OF DISPERSION
Range – Difference between the largest and
smallest sample observations.
Only considers the extremes of the sample
Solution: Inter-quartile or percentile range
Variance and Standard Deviation
Sample Variance – Sum of squared deviations from the
sample mean, divided by n - 1 (roughly the average squared deviation).
Sample Standard Deviation – Square root of the
sample variance.
COEFFICIENT OF VARIATION
Coefficient of variation – Standard deviation divided by the mean.
A measure that does not rely on the size of the observations or the
unit of measurement.
This is used to compare relative dispersion across a variety of data.
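The dispersion measures above can be computed with Python's `statistics` module. A sketch on made-up data: note that `statistics.quantiles` uses its default "exclusive" method, and other software may use a different quartile convention, so the inter-quartile range can differ slightly between packages. The last line illustrates that the coefficient of variation does not change when the unit of measurement changes.

```python
import statistics


def dispersion_summary(data):
    """Range, inter-quartile range, sample variance, sample standard deviation and CV."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points (Python 3.8+)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # square root of the sample variance
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),  # divides by n - 1
        "std_dev": sd,
        "cv": sd / mean,  # only meaningful when the mean is positive
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
summary = dispersion_summary(sample)
for name, value in summary.items():
    print(name, round(value, 3))

# Same data in different units (e.g. dollars vs. cents): the CV is unchanged.
print(dispersion_summary([100 * v for v in sample])["cv"])
```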
Hypothesis Testing
Hypothesis Testing – Statistical experiment
used to measure the reasonableness of a
given theory or premise
NOTE: WE DO NOT “PROVE” A THEORY
Type I Error – Incorrect rejection of a ‘true’
hypothesis.
Type II Error – Failure to reject a ‘false’
hypothesis.
Regression Analysis
Definitions
Regression analysis – statistical method for
describing the relationship between a
dependent variable Y and independent
variable(s) X.
Deterministic Relation = An identity
A relationship that is known with certainty.
Statistical Relation – An inexact relation
Regression Analysis
Types of Data
Time series – A daily, weekly, monthly, or annual
sequence of data, e.g., GDP data for the United
States from 1950 to 2012.
Cross-section – Data from a common point in
time, e.g., GDP data for OECD nations in 1986.
Panel data – Data that combines both cross-section and time-series data, e.g., GDP data for
OECD nations from 1960 to 2012.
Steps in Regression Analysis
1. Specify the dependent and independent variable(s) to be analyzed.
2. Obtain reliable data.
3. Estimate the model.
4. Interpret the regression results.
Specifying the Regression Analysis
The choice of independent variables
Univariate analysis = Simple
regression model - A regression
model with only one independent
variable.
Issue: Cannot impose ceteris paribus
Multivariate analysis = Multiple
regression model - A regression
model with multiple independent
variables.
Univariate Analysis
Y = a + bX
Where
Y = The Dependent Variable, or what you
are trying to explain (or predict).
X = The Independent Variable, or what
you believe explains Y.
a = the y-intercept or constant term.
b = the slope or coefficient
The Least Squares Model
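Ordinary Least Squares: a statistical
method that chooses the regression
line by minimizing the sum of the squared
vertical distances between the data points and
the regression line.
Why not minimize the sum of the errors? Positive and
negative errors cancel: any line through the sample
means gives a sum of zero.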
Why not take the absolute value of the
errors? We wish to emphasize large
errors.
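A minimal sketch of the least squares formulas for the one-variable case (slope = sum of cross-deviations divided by sum of squared X-deviations; intercept = mean of Y minus slope times mean of X), on made-up data. It also shows why the sum of the errors is not a useful criterion: a different line through the sample means also has errors that sum to zero, but a much larger sum of squared errors.

```python
def ols_simple(x, y):
    """Least squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(a, b, x, y):
    """Errors (actual minus fitted) for the line Y = a + bX."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = ols_simple(x, y)
e_ols = residuals(a, b, x, y)

# Another line through the sample means (3, 6): its errors also sum to zero...
e_alt = residuals(3.0, 1.0, x, y)

print(a, b)
print(sum(e_ols), sum(e_alt))                                # both (essentially) zero
print(sum(e * e for e in e_ols), sum(e * e for e in e_alt))  # ...but OLS has far smaller squared error
```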
The slope coefficient
How do we interpret the slope coefficient?
Example:
Winning percentage = -0.830 + 0.014*(Points per game)
Each additional point per game is associated with a 0.014
(1.4 percentage point) increase in winning percentage.
How many wins is this? 0.014 × 82 ≈ 1.1 over an 82-game season.
Is this the ‘truth’? We never know the truth, we are
simply attempting to derive estimates.
Is this a ‘good’ estimate? Clearly points alone do not
explain wins.
The constant term
How do we interpret the constant term?
The constant term must be included in the regression, or else we
are forcing the regression line through the origin.
The constant term is used to impose a zero mean for the error
term, hence it acts as a garbage collector. In other words, it
captures all the factors not explicitly utilized in the equation.
The constant term is theoretically the value of Y when X is zero.
Frequently this is outside the range of possibility, and therefore
the constant term should not be interpreted.
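The arithmetic behind the slope and constant-term slides, using the winning-percentage equation above. The coefficients are the ones shown; the 100-points-per-game team is hypothetical, since the range of scoring in the estimation sample is not given. Multiplying the slope by 82 games converts it into wins, and evaluating the equation at zero points shows why the constant term should not be interpreted: it implies a negative winning percentage.

```python
INTERCEPT = -0.830   # constant term from the example equation
SLOPE = 0.014        # change in winning percentage per additional point per game
GAMES = 82           # games in a season


def predicted_win_pct(points_per_game):
    """Winning percentage (as a fraction) predicted by the example equation."""
    return INTERCEPT + SLOPE * points_per_game


extra_wins = SLOPE * GAMES       # wins associated with one more point per game
print(round(extra_wins, 1))      # 1.1

# Constant term: the predicted winning percentage at zero points per game.
print(predicted_win_pct(0))      # -0.83, an impossible winning percentage

# A hypothetical team scoring 100 points per game.
print(round(predicted_win_pct(100), 3))
```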
The Error Term
Error term (e) = random,
included because we do not
expect a perfect relationship.
Sources of error
1. Omitted variables
2. Measurement error
3. Incorrect functional form
Multivariate Analysis
Introducing the idea of ceteris paribus.
One cannot impose ceteris paribus
unless all relevant variables are included
in the model.
Coefficient of Determination
Coefficient of Determination –
Percentage of Y-variation
explained by the regression
model.
Also referred to as R².
$$R^2 = \frac{\text{Variation explained by regression}}{\text{Total variation in } Y}$$
R² ranges from 0 to 1.
TSS, ESS, RSS in words
Total Sum of Squares = TSS = How
much variation there is to explain.
Explained Sum of Squares = ESS =
How much variation you explained.
Residual Sum of Squares = RSS =
How much variation you did not
explain.
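A sketch of the three sums of squares on made-up data. For an OLS regression that includes a constant term, TSS = ESS + RSS, so R² can be computed either as ESS/TSS or as 1 - RSS/TSS; without a constant term that identity need not hold.

```python
def fit_line(x, y):
    """Least squares intercept and slope (same formulas as simple OLS)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b * x_bar, b


def sums_of_squares(y, fitted):
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                # variation to explain
    ess = sum((fi - y_bar) ** 2 for fi in fitted)           # variation explained
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variation not explained
    return tss, ess, rss


x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]

tss, ess, rss = sums_of_squares(y, fitted)
r2 = ess / tss
print(tss, ess + rss)          # equal (up to rounding) for OLS with a constant term
print(r2, 1 - rss / tss)       # two equivalent ways to compute R-squared
```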
Adjusted R-Squared
Adding any independent
variable will never lower R², and almost always raises it.
To combat this problem, we
often report the adjusted R², which penalizes each additional variable.
F Statistic
F-statistic tells us if the independent
variables as a group explain a
statistically significant share of the
variation in the dependent variable.
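The usual formulas for adjusted R² and the overall F-statistic need only R², the number of observations n, and the number of independent variables k. Applying them to the R², observations, and variable counts read off Table Three (7 variables in Model 1 including GOODNEWS, 6 in Models 3 and 5) approximately reproduces the reported adjusted R² and F values; small differences reflect rounding of the reported R². It also shows how a very weak fit, as in Model 5, produces a negative adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared; n = observations, k = independent variables (not counting the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """F-statistic for the null hypothesis that all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# (model, R-square, observations, independent variables) read from Table Three.
for model, r2, n, k in [
    ("Model 1", 0.923, 22, 7),
    ("Model 3", 0.828, 22, 6),
    ("Model 5", 0.093, 23, 6),
]:
    print(model, round(adjusted_r2(r2, n, k), 3), round(f_statistic(r2, n, k), 2))
```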
Judging the significance of a variable
The t-statistic: estimated coefficient /
standard error of the coefficient.
The t-statistic is used to test the null
hypothesis (H0) that the coefficient is
equal to zero. The alternative
hypothesis (HA) is that the coefficient
is different from zero.
Rule of thumb: if |t| > 2 we believe the
coefficient is statistically different from
zero. WHY?
Understand the difference between
statistical significance and economic
significance.
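One common answer to the WHY? above: in large samples the t-statistic is approximately standard normal when H0 is true, and |t| > 1.96 corresponds to a 5 percent two-sided test, so 2 is a convenient round number. In small samples the t-distribution's critical value is larger (about 2.145 with 14 degrees of freedom, i.e. 22 elections minus 8 estimated parameters in Model 1 of Table Three), so the rule of thumb is only approximate. The sketch below uses the normal approximation only and recovers the standard error implied by one Table Three entry.

```python
import math


def t_statistic(coefficient, standard_error):
    return coefficient / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal approximation (large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Table Three, Model 1, GROWTH: coefficient 0.691 with t = 6.169,
# so the implied standard error is about 0.112.
implied_se = 0.691 / 6.169
print(round(implied_se, 3))

print(round(two_sided_p_normal(1.96), 4))   # about 0.05
print(round(two_sided_p_normal(2.0), 4))    # a little under 0.05
```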
Multicollinearity
Multicollinearity - two or more
independent variables exhibit a high degree of linear
correlation.
Consequences
a. Standard errors will rise, t-stats will fall
b. Estimates will be sensitive to changes in
specification
c. Overall fit of regression will be unaffected
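A small simulation can illustrate consequences (a) and (c). This is a sketch with made-up data: the data-generating process, seed, and sample size are arbitrary assumptions. It fits y on x1 and x2 by OLS twice, once with x2 only loosely related to x1 and once with x2 almost a copy of x1. With near-collinearity the standard error on x1 is many times larger, while R² stays high. Only the standard library is used, so the two-regressor OLS formulas are written out explicitly.

```python
import math
import random


def ols_two_regressors(y, x1, x2):
    """OLS of y on a constant, x1 and x2 (formulas written out for two regressors).

    Returns the two slopes, their conventional standard errors, and R-squared.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12          # near zero when x1 and x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = rss / (n - 3)               # three estimated parameters
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    r2 = 1 - rss / sum(c * c for c in dy)
    return (b1, b2), (se1, se2), r2


def simulate(x2_noise, n=200, seed=1):
    """Hypothetical data: y = 1 + 2*x1 + 3*x2 + e, with x2 = x1 + noise."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [v + rng.gauss(0, x2_noise) for v in x1]   # smaller noise -> stronger collinearity
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return ols_two_regressors(y, x1, x2)


loose = simulate(x2_noise=3.0)    # x1 and x2 only moderately correlated
tight = simulate(x2_noise=0.05)   # x2 is almost a copy of x1

for label, (slopes, ses, r2) in [("loose", loose), ("tight", tight)]:
    print(label, "b1 =", round(slopes[0], 2), "se(b1) =", round(ses[0], 3), "R2 =", round(r2, 3))
```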
Other Econometric Issues
Omitted Variable Bias: You cannot impose
ceteris paribus if relevant independent variables
are not included in the model.
Small Sample Bias: You cannot adequately
assess a relationship with an inadequate
sample. Remember, we are trying to learn about
the underlying population.
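Omitted variable bias can also be seen in a simulation. A sketch under made-up assumptions (y = 1 + 2·x1 + 3·x2 + e, with x2 = 0.5·x1 + noise; seed and sample size arbitrary): regressing y on x1 alone gives a slope near 2 + 3 × 0.5 = 3.5, because x1 picks up part of the effect of the omitted x2, while including x2 recovers slopes near 2 and 3.

```python
import random


def slope_simple(y, x):
    """OLS slope of y on x (with a constant term)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_two(y, x1, x2):
    """OLS slopes of y on x1 and x2 (with a constant term)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(42)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]   # x2 is correlated with x1
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

short = slope_simple(y, x1)               # x2 omitted: expected near 3.5
long_b1, long_b2 = slopes_two(y, x1, x2)  # x2 included: expected near 2 and 3
print(round(short, 2), round(long_b1, 2), round(long_b2, 2))
```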
THE BIG WORDS: Heteroskedasticity and
Autocorrelation