Chapter 11
Running Your Own Regression Project
Copyright © 2011 Pearson Addison-Wesley.
All rights reserved.
Slides by Niels-Hugo Blunch
Washington and Lee University
Choosing Your Topic
• There are at least three keys to choosing a topic:
1. Try to pick a field that you find interesting and/or that you know
something about
2. Make sure that data are readily available with a reasonable
sample (we suggest at least 25 observations)
3. Make sure that there is some substance to your topic
– Avoid topics that are purely descriptive or virtually tautological in nature
– Instead, look for topics that address an inherently interesting economic or
behavioral question or choice
Choosing Your Topic (cont.)
• Places to look:
– your textbooks and notes from previous economics classes
– economics journals
• For example, Table 11.1 contains a list of the journals cited so far in this
textbook (in order of the frequency of citation)
– Master's or PhD research
Table 11.1a
Sources of Potential Topic Ideas
Table 11.1b
Sources of Potential Topic Ideas
Collecting Your Data
• Before any quantitative analysis can be done, the data must be:
– collected
– organized
– entered into a computer
• Usually, this is a time-consuming and frustrating task because of:
– the difficulty of finding data
– the existence of definitional differences between theoretical variables
and their empirical counterparts
– the high probability of data entry errors or data transmission errors
• But time spent thinking about and collecting the data is well spent, since a
researcher who knows the data sources and definitions is much less likely
to make mistakes using or interpreting regressions run on those data
• We will now discuss three data collection issues in a bit more detail
What Data to Look For
• Checking for data availability means deciding what specific variables you
want to study:
– dependent variable
– all relevant independent variables
• At least five issues to consider here:
1. Time periods:
– If the dependent variable is measured annually, the explanatory variables
should also be measured annually and not, say, monthly
2. Measuring quantity:
– If the market and/or quality of a given variable has changed over time, it makes
little sense to use quantity in units
– Example: TVs have changed so much over time that it makes more sense to use
quantity in terms of monetary equivalent: more comparable across time
What Data to Look For (cont.)
3. Nominal or real terms?
– Depends on theory; essentially: do we want to adjust (“clean”) for inflation?
– TVs, again: probably use real terms
4. Appropriate variable definitions depend on whether data are cross-sectional or time-series
– TVs, again: national advertising would be a good candidate for an
explanatory variable in a time-series model, while advertising in or near
each state (or city) would make sense in a cross-sectional model
5. Be careful when reading (and creating!) descriptions of data:
– Where did the data originate?
– Are prices and/or income measured in nominal or real terms?
– Are prices retail or wholesale?
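The nominal-versus-real choice in issue 3 can be sketched in a few lines. This is only an illustration: the sales figures and the price index below are invented, and the helper name `to_real` is hypothetical. The one rule used is real value = nominal value × base / price index.

```python
# Hypothetical illustration: convert a nominal series to real terms.
# All numbers are invented for this sketch; the base year has index 100.

def to_real(nominal, price_index, base=100.0):
    """Deflate a nominal series with a price index of the same length."""
    if len(nominal) != len(price_index):
        raise ValueError("series must have the same length")
    return [value * base / index for value, index in zip(nominal, price_index)]

years = [2001, 2002, 2003]
nominal_sales = [200.0, 220.0, 242.0]   # current-dollar values (made up)
price_index = [100.0, 110.0, 121.0]     # price index, 2001 = 100 (made up)

real_sales = to_real(nominal_sales, price_index)
for year, real in zip(years, real_sales):
    print(year, round(real, 2))
```

In this made-up series, nominal sales grow by 10 percent a year while real sales stay flat, which is the kind of difference the nominal-or-real decision is meant to expose.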
Where to Look for
Economic Data
• Although some researchers generate their own data through
surveys or other techniques (see Section 11.3), the vast majority
of regressions are run on publicly available data
• Good sources here include:
1. Government publications:
– Statistical Abstract of the U.S.
– the annual Economic Report of the President
– the Handbook of Labor Statistics
– Historical Statistics of the U.S. (published in 1975)
– Census Catalog and Guide
Where to Look for
Economic Data (cont.)
2. International data sources:
– U.N. Statistical Yearbook
– U.N. Yearbook of National Account Statistics
3. Internet resources:
– “Resources for Economists on the Internet”
– Economagic
– WebEC
– EconLit (www.econlit.org)
– “Dialog”
– Links to these sites and other good sources of data are on the
text’s Web site: www.pearsonhighered.com/studenmund
Missing Data
• Suppose the data aren’t there?
– What happens if you choose the perfect variable and
look in all the right sources and can’t find the data?
– The answer to this question depends on how much
data is missing:
1. A few observations:
– in a cross-section study:
• Can usually afford to drop these observations from the
sample
– in a time-series study:
• May interpolate value (taking the mean of adjacent values)
Missing Data (cont.)
2. No data at all available (for a theoretically relevant
variable!):
– From Chapter 6, we know that this is likely to cause
omitted variables bias
– A possible solution here is to use a proxy variable
– For example, the value of net investment is a variable
that is not measured directly in a number of countries
– Instead, might use the value of gross investment as a
proxy, the assumption being that the value of gross
investment is directly proportional to the value of net
investment
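The two fixes for a few missing observations (dropping them in a cross-section, interpolating in a time series) can be sketched as follows. The data are invented, `None` is simply this sketch's marker for a missing value, and the function names are hypothetical.

```python
# Sketch of the two fixes for a few missing observations.
# None marks a missing value; the data are invented.

def drop_missing(rows):
    """Cross-section: keep only observations with no missing values."""
    return [row for row in rows if None not in row]

def interpolate_adjacent(series):
    """Time series: fill an isolated gap with the mean of its two neighbors.

    Gaps at either end, or next to another gap, are left as None.
    """
    filled = list(series)
    for i in range(1, len(series) - 1):
        before, after = series[i - 1], series[i + 1]
        if series[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled

cross_section = [(10.0, 3.0), (12.0, None), (15.0, 4.5)]
print(drop_missing(cross_section))

time_series = [100.0, None, 110.0, 114.0]
print(interpolate_adjacent(time_series))
```

Here the second cross-sectional observation is dropped, and the gap in the time series is filled with 105.0, the mean of 100.0 and 110.0. Leaving longer gaps untouched is a design choice of this sketch; interpolating across several missing periods is a stronger assumption.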
Advanced Data Sources
• So far, all the data sets have been:
1. cross-sectional or time-series in nature
2. collected by observing the world around us, instead of being
created
• It turns out, however, that:
1. time-series and cross-sectional data can be pooled to form panel
data
2. data can be generated through surveys
• We will now briefly introduce these more advanced data
sources and explain why it probably doesn't make sense to
use these data sources on your first regression project
Surveys
• Surveys are everywhere in our society and are
used for many different purposes—examples
include:
– marketing firms using surveys to learn more about
products and competition
– political candidates using surveys to fine-tune their
campaign advertising or strategies
– governments using surveys for all sorts of purposes,
including keeping track of their citizens with instruments
like the U.S. Census
Surveys (cont.)
• While running your own survey might be tempting as a
way of obtaining data for your own project, running a survey
is not as easy as it might seem. Surveys:
– must be carefully thought through; it’s virtually impossible to go
back to the respondents and add another question later
– must be worded precisely (and pretested) to avoid confusing the
respondent or "leading" the respondent to a particular answer
– must have samples that are random and avoid the selection,
survivor, and nonresponse biases explained in Section 17.2
• As a result, we don't encourage beginning researchers to
run their own surveys...
Panel Data
• Again, panel data are formed when cross-sectional and
time-series data sets are pooled to create a single data
set
• Two main reasons for using panel data:
– To increase the sample size
– To provide an insight into an analytical question that can't be
obtained by using time-series or cross-sectional data alone
Panel Data (cont.)
• Example: suppose we’re interested in the relationship
between budget deficits and interest rates but have only 10
years of annual data to study
– But ten observations is too small a sample for a reasonable
regression!
– However, if we can find time-series data on the same economic
variables (interest rates and budget deficits) for the same ten years
for six different countries, we’ll end up with a sample of 10 × 6 = 60
observations, which is more than enough
– The result is a pooled cross-section time-series data set: a
panel data set!
– Panel data estimation methods are treated in Chapter 16
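The pooling in this example can be sketched as a "long" table with one row per country-year pair. Everything specific here is a placeholder: the country codes, the years, and the stand-in `load_series` function are assumptions of the sketch, not data or tools from the text.

```python
# Sketch: pool six country time series into one panel (long format).
# Country codes, years, and values are placeholders, not real data.

countries = ["A", "B", "C", "D", "E", "F"]
years = list(range(2001, 2011))          # ten annual observations

def load_series(country, year):
    """Stand-in for a real data lookup; returns (deficit, interest_rate)."""
    return (None, None)                  # values would come from your source

panel = []
for country in countries:
    for year in years:
        deficit, interest_rate = load_series(country, year)
        panel.append({"country": country, "year": year,
                      "deficit": deficit, "interest_rate": interest_rate})

print(len(panel))   # one row per country-year pair
```

Keeping the country and year identifiers on every row is what distinguishes a panel from 60 anonymous observations; panel estimation methods rely on those identifiers.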
Practical Advice for
Your Project
• We now move to a discussion of practical advice
about actually doing applied econometric work
• This discussion is structured in three parts:
1. The 10 Commandments of Applied Econometrics
(by Peter Kennedy)
2. What to check if you get an unexpected sign
3. A collection of a dozen practical tips, brought
together from other sections of this text that are worth
reiterating specifically in the context of actually doing
applied econometric work
The 10 Commandments of
Applied Econometrics
1. Use common sense and economic theory:
Example: match per capita variables with per capita variables, use real exchange rates to
explain real imports or exports, etc.
2. Ask the right questions:
Ask plenty of questions, even seemingly silly ones, to ensure that you fully
understand the goal of the research
3. Know the context:
Be familiar with the history, institutions, operating constraints, measurement
peculiarities, cultural customs, etc., underlying the object under study
4. Inspect the data:
a. This includes calculating summary statistics, drawing graphs, and cleaning the
data (including checking filters)
b. The objective is to get to know the data well
The 10 Commandments of
Applied Econometrics (cont.)
5. Keep it sensibly simple:
a. Begin with a simple model and only complicate it if it fails
b. This goes both for the specification (functional forms, etc.) and for the
estimation method
6. Look long and hard at your results:
a. Check that the results make sense, including signs and magnitudes
b. Apply the “laugh test”
7. Understand the costs and benefits of data mining:
a. “Bad” data mining: deliberately searching for a specification that “works”
(i.e. “torturing” the data)
b. “Good” data mining: experimenting with the data to discover empirical
regularities that can inform economic theory and be tested on a second data
set
The 10 Commandments of
Applied Econometrics (cont.)
8. Be prepared to compromise:
a. The Classical Assumptions are only rarely satisfied
b. Applied econometricians are therefore forced to compromise and adopt
suboptimal solutions, the characteristics and consequences of which are
not always known
c. Applied econometrics is necessarily ad hoc: we develop our analysis,
including responses to potential problems, as we go along…
9. Do not confuse statistical significance with meaningful magnitude:
a. If the sample size is large enough, the SEs become small enough that
virtually any (two-sided) hypothesis can be rejected
b. Substantive significance—i.e. “how large?”—is also important, not just
statistical significance
The 10 Commandments of
Applied Econometrics (cont.)
10. Report a sensitivity analysis:
a. Dimensions to examine:
i. sample period
ii. the functional form
iii. the set of explanatory variables
iv. the choice of proxies
b. If results are not robust across the examined dimensions, then
this casts doubt on the conclusions of the research
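One dimension of the sensitivity analysis in Commandment 10, the sample period, can be sketched with a simple regression re-estimated on different subsamples. The data are invented, the helper `ols_slope` is hypothetical, and a one-regressor model is used only to keep the sketch short.

```python
# Sketch of a sensitivity check: re-estimate one slope on different
# sample periods and compare. Data are invented for illustration.

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 15.8]

periods = {"full sample": slice(0, 8),
           "first half": slice(0, 4),
           "second half": slice(4, 8)}

slopes = {name: ols_slope(x[s], y[s]) for name, s in periods.items()}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.3f}")

spread = max(slopes.values()) - min(slopes.values())
print(f"spread across periods = {spread:.3f}")
```

A small spread, as in these made-up data, suggests the estimate is robust to the sample period; a large one would cast doubt on the conclusions. The same loop could be repeated over functional forms, sets of explanatory variables, or proxies.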
What to Check If You Get an
Unexpected Sign
1. Recheck the expected sign
Were dummy variables computed “upside down,” for example?
2. Check your data for input errors and/or outliers
3. Check for an omitted variable
The most frequent source of significant unexpected signs
4. Check for an irrelevant variable
Frequent source of insignificant unexpected signs
5. Check for multicollinearity
Multicollinearity increases the variances and standard errors of the
estimated coefficients, increasing the chance that a coefficient could
have an unexpected sign
What to Check If You Get an
Unexpected Sign (cont.)
6. Check for sample selection bias
An unexpected sign sometimes can be due to the fact that the
observations included in the data were not obtained randomly
7. Check your sample size
The smaller the sample size, the higher the variances and SEs of the
estimated coefficients
8. Check your theory
If nothing else is apparently wrong, only two possibilities remain:
the theory is wrong or the data is bad
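Check 3 (an omitted variable) can be illustrated with constructed data in which leaving out a relevant variable flips the sign of the remaining coefficient. This is a sketch under stated assumptions: the data are built by hand so that the true model is exactly y = 2 + 1.0·x1 − 3.0·x2 with no error term, and the function names are hypothetical.

```python
# Sketch: an omitted variable can flip the sign of a coefficient.
# Data are constructed (not real) so that the true model is exactly
#   y = 2 + 1.0*x1 - 3.0*x2, with x2 positively correlated with x1.

def demeaned_cross(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def slope_one_regressor(x, y):
    """OLS slope of y on x with an intercept."""
    return demeaned_cross(x, y) / demeaned_cross(x, x)

def slopes_two_regressors(x1, x2, y):
    """OLS slopes of y on x1 and x2 with an intercept."""
    s11, s22 = demeaned_cross(x1, x1), demeaned_cross(x2, x2)
    s12 = demeaned_cross(x1, x2)
    s1y, s2y = demeaned_cross(x1, y), demeaned_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]   # uncorrelated with x1 by design
x2 = [0.8 * a + e for a, e in zip(x1, noise)]
y = [2.0 + 1.0 * a - 3.0 * b for a, b in zip(x1, x2)]

b1_full, b2_full = slopes_two_regressors(x1, x2, y)
b1_short = slope_one_regressor(x1, y)      # x2 omitted

print(f"x1 coefficient with x2 included: {b1_full:.2f}")
print(f"x1 coefficient with x2 omitted:  {b1_short:.2f}")
```

With x2 included, the x1 coefficient is positive; with x2 omitted, x1 picks up part of the negative effect of x2 and its coefficient turns negative. The size of the flip follows the usual omitted-variable logic: the omitted coefficient (−3.0) times the slope of x2 on x1 (0.8 here).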
A Dozen Practical Tips Worth
Reiterating
1. Don’t attempt to maximize R² (Chapter 2)
2. Always review the literature and hypothesize the signs
of your coefficients before estimating a model (Chapter 3)
3. Inspect and clean your data before estimating a model.
Know that outliers should not be automatically omitted;
instead, they should be investigated to make sure that
they belong in the sample (Chapter 3)
4. Know the Classical Assumptions cold! (Chapter 4)
5. In general, use a one-sided t-test unless the expected
sign of the coefficient actually is in doubt (Chapter 5)
A Dozen Practical Tips Worth
Reiterating (cont.)
6. Don’t automatically discard a variable with an
insignificant t-score. In general, be willing to live with a
variable with a t-score lower than the critical value in order
to decrease the chance of omitting a relevant variable
(Chapter 6)
7. Know how to analyze the size and direction of the bias
caused by an omitted variable (Chapter 6)
8. Understand all the different functional form options and
their common uses, and remember to choose your
functional form primarily on the basis of theory, not fit
(Chapter 7)
A Dozen Practical Tips Worth
Reiterating (cont.)
9. Multicollinearity doesn’t create bias; the estimated
variances are large, but the estimated coefficients
themselves are unbiased. So, the most-used remedy for
multicollinearity is to do nothing (Chapter 8)
10. If you get a significant Durbin–Watson, Park, or White
test, remember to consider the possibility that a
specification error might be causing impure serial
correlation or heteroskedasticity. Don’t change your
estimation technique from OLS to GLS or use adjusted
standard errors until you have the best possible
specification. (Chapters 9 and 10)
A Dozen Practical Tips Worth
Reiterating (cont.)
11. Adjusted standard errors like Newey–West standard
errors or HC standard errors use the OLS coefficient
estimates. It’s the standard errors of the estimated
coefficients that change, not the estimated coefficients
themselves. (Chapters 9 and 10)
12. Finally, if in doubt, rely on common sense and
economic theory, not on statistical tests
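Tip 11 can be checked on a small example: the slope is estimated once by OLS, and only its standard error is computed in two ways. This is a sketch, not the text's procedure. It assumes a one-regressor model, uses the HC0 (White) form of the heteroskedasticity-consistent standard error as one representative of "HC standard errors," and runs on invented data.

```python
# Sketch: heteroskedasticity-consistent (HC0, White) standard errors reuse
# the OLS coefficient; only the standard error changes. Data are invented.
import math

def simple_ols(x, y):
    """Return the OLS slope plus its conventional and HC0 standard errors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dev = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional OLS standard error of the slope
    se_ols = math.sqrt(sum(e * e for e in resid) / (n - 2) / s_xx)
    # HC0 standard error of the slope (same residuals, different weighting)
    se_hc0 = math.sqrt(sum(d * d * e * e for d, e in zip(dev, resid))) / s_xx
    return slope, se_ols, se_hc0

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.1, 5.8, 8.4, 9.4, 12.9, 12.8, 17.5]   # spread grows with x

slope, se_ols, se_hc0 = simple_ols(x, y)
print(f"slope = {slope:.3f}")
print(f"conventional SE = {se_ols:.3f}, HC0 SE = {se_hc0:.3f}")
```

There is a single `slope` in the output because both standard errors are built from the same OLS residuals; switching to adjusted standard errors changes the t-scores, not the coefficient estimates.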
The Ethical Econometrician
• We think that there are two reasonable goals for
econometricians when estimating models:
1. Run as few different specifications as possible while
still attempting to avoid the major econometric problems
• The only exception is sensitivity analysis, described in
Section 6.4
2. Report honestly the number and type of different
specifications estimated so that readers of the
research can evaluate how much weight to give to your
results
Writing Your Research Report
• Most good research reports have a number of elements in
common:
– A brief introduction that defines the dependent variable and states
the goals of the research
– A short review of relevant previous literature and research
– An explanation of the specification of the equation (model):
• independent variables
• functional forms
• expected signs of (or other hypotheses about) the slope coefficients
– A description of the data:
• generated variables
• data sources
• data irregularities (if any)
Writing Your Research Report
(cont.)
• A presentation of each estimated specification, using our standard
documentation format
– If you estimate more than one specification, be sure to explain which one is
best (and why!)
• A careful analysis of the regression results:
– discussion of any econometric problems encountered
– complete documentation of all:
• equations estimated
• tests run
• A short summary/conclusion that includes any policy
recommendations or suggestions for further research
• A bibliography
• An appendix that includes all data, all regression runs, and all relevant
computer output