Chapter 8
Locating and Collecting Economic Data
Introduction
• In this chapter we focus on how data are constructed and where they may be found. Collecting and manipulating data is the key part of an empirical research project: a research project is nothing without adequate data and an original, testable hypothesis. As researchers, we have to be sure that there are enough data to test our hypothesis adequately. Otherwise, we may experience the frustration(!) of investing a great deal of time and effort only to find that the data needed to test our painstakingly developed hypothesis are not available.
Data Creation
• The vast majority of data are constructed rather than collected. For this reason, statistics are made up not only of facts but also of knowledge that is created.
Steps in data construction
Best (2001) identifies 3 steps in the construction of a data series:
• Defining the concept,
• Deciding how the concept will be measured, and
• Determining how to define the sample on which the data will be based.
• Every data series is constructed for a specific purpose. However, a given data series may not be defined or measured in the way that best matches your needs. So sometimes (or probably most of the time!) you may need to construct your own data.
Sample data
• Most social science statistics are based on samples rather than entire populations. For example, average family income is not the average income of ALL families; rather, it is the average income of the families in the sample.
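A minimal Python sketch of this distinction follows. The ten family incomes and the choice of which families were "surveyed" are hypothetical numbers invented for illustration; in real work the population mean is unknown and only the sample mean is reported.

```python
from statistics import mean

# Hypothetical incomes (TL) for an entire population of ten families.
population_incomes = [12_000, 15_000, 18_000, 20_000, 22_000,
                      25_000, 30_000, 35_000, 40_000, 60_000]

# Hypothetical survey: only the families at these positions were sampled.
sampled_positions = [0, 3, 5, 8]
sample_incomes = [population_incomes[i] for i in sampled_positions]

population_mean = mean(population_incomes)  # usually unknown in practice
sample_mean = mean(sample_incomes)          # what the published statistic reports

print(f"population mean: {population_mean:,.0f}")  # population mean: 27,700
print(f"sample mean:     {sample_mean:,.0f}")      # sample mean:     24,250
```

Here the reported "average family income" would be 24,250, an estimate of the population figure of 27,700 that the researcher never observes directly.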
The Structure of Economic Data
• It is important to distinguish between the organizations that collect or produce data and those that publish them.
Characteristics of data sets
• Data come in 3 forms: time-series, cross-section, and longitudinal (panel) data. Time-series data give different observations on the same variable at different points in time (e.g., Turkish GDP per capita over the period 1923-2009). Cross-section data, by contrast, give observations of a comparable variable for different units at the same point in time (e.g., average disposable personal income across different cities of Turkey in 2009). Longitudinal (panel) data take a cross-section sample and follow it over time (e.g., family income for the same 10 families over 5 years).
• Longitudinal data such as the family-income example are micro data, since the observations are of individual economic agents such as individuals, households, or firms. Macro data, by contrast, are compiled at the national level. Data sets also differ in frequency: you may find daily, weekly, monthly, quarterly, or annual data.
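The three forms can be seen in one small data set. The Python sketch below stores hypothetical panel data (income of the same three families over three years, keyed by family and year) and shows that fixing the family gives a time series, while fixing the year gives a cross-section. The family labels, years, and incomes are invented, and the helper names `time_series` and `cross_section` are illustrative only.

```python
# Hypothetical panel: income (TL) of the same three families in three years,
# keyed by (family, year).
panel = {
    ("A", 2007): 20_000, ("A", 2008): 21_000, ("A", 2009): 22_500,
    ("B", 2007): 15_000, ("B", 2008): 15_500, ("B", 2009): 16_000,
    ("C", 2007): 30_000, ("C", 2008): 29_000, ("C", 2009): 31_000,
}

def time_series(data, family):
    """Fix one unit and let time vary: a time-series slice, in year order."""
    return [income for (unit, year), income in sorted(data.items()) if unit == family]

def cross_section(data, year):
    """Fix one period and let units vary: a cross-section slice."""
    return {unit: income for (unit, y), income in sorted(data.items()) if y == year}

print(time_series(panel, "A"))     # [20000, 21000, 22500]
print(cross_section(panel, 2009))  # {'A': 22500, 'B': 16000, 'C': 31000}
```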
Organizations That Collect and Publish Data
• A number of US governmental, international, and private organizations gather economic and social statistics. US government sources include:
• Census Bureau (www.census.gov)
• Bureau of Economic Analysis (www.bea.doc.gov)
• Bureau of Labor Statistics (www.bls.gov)
• The Federal Reserve (www.federalreserve.gov)
International agencies include:
• International Monetary Fund
• World Bank
• OECD
• Eurostat
• Asian Development Bank
• Inter-American Development Bank
Turkish data sources include:
• Central Bank of the Republic of Turkey (www.tcmb.gov.tr)
• Turkish Statistical Institute (www.tuik.gov.tr)
• State Planning Organization of Turkey (www.dpt.gov.tr)
Major Primary Data Collections
• US National Income and Product Accounts (the official national accounts of the US)
• US Flow of Funds Accounts (data on financial flows across the US economy)
• US Balance of Payments Accounts and the International Investment Position of the US
• US Census of Population and Integrated Public Use Microdata Series
• Current Population Survey
• Current Employment Statistics
• Economic Census
• Annual Survey of Manufactures
• Current Industrial Reports
• American Housing Survey
• Consumer Expenditure Survey
• National Longitudinal Surveys
• Panel Study of Income Dynamics
• Surveys of Consumers
• Survey of Consumer Finances
Major Secondary Data Collections
These sources are usually more user-friendly than the primary sources.
• Economic Report of the President
• Economagic
• FRED II (Federal Reserve Economic Data), an excellent source of US macroeconomic and financial data
• Stat-USA/State of the Nation
• Inter-university Consortium for Political and Social Research
• International Financial Statistics (the principal data set of the IMF)
• World Economic Outlook database
• Penn World Tables
• Joint BIS-IMF-OECD-WB Statistics on External Debt
• Eurostat
• OECD Main Economic Indicators and National Accounts
Chapter 9
Putting Together Your Data Set
Introduction
• Empirical research can be divided into 2 types: experimental and survey (nonexperimental). In experimental research, the data come from the experiment itself, and collecting them is a major part of the study. In survey research, we use preexisting data, and researchers generally do not put the same care and effort into assembling them. This is undoubtedly a big mistake!
Developing a Search Strategy for Finding Your Data
It is a good idea to start with a search strategy, which has 2 steps:
Step 1: Before you search
You need a large sample size (large enough to obtain statistically valid empirical test results). The second issue is obtaining a random or representative sample, which is discussed in detail in Chapter 10. The third is obtaining data that correctly measure the concepts that your theory deems important.
Once you have determined your list of desired variables, the next step is to think about where those data are likely to be found. Step 1 can be summarized by the following questions:
• What are the desired variables?
• How should each variable be defined?
• What data frequency and sample period, or what level of analysis, is appropriate?
• What are the potential sources of data on each variable?
Step 2: As you search
As you begin to investigate each data source, you need to ask several questions:
• What data are in fact available?
• If the data are not ideal, are they good enough?
• If the data are not acceptable, is there an acceptable proxy available? (A proxy is a variable that should behave roughly the same as your theoretical variable.)
• If there is no adequate proxy, how can the hypothesis be reformulated to make it testable, given the data available?
Data Manipulation
Data for any variable may be found in various forms, some of which are listed below:
• Levels
• Per capita (per person)
• Changes
• Rates of change (growth rates)
• Annualized growth rates
• Proportions
• Nominal
• Real
• Index numbers
Level of variable
• This is the most basic form: the actual value or size of the variable being measured (e.g., the level of Turkish GDP per capita in 2009 is TL X). Researchers often use the per capita form of a variable, which is found by dividing the level of the variable by the appropriate population.
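The per capita calculation is a single division. The sketch below uses invented round figures rather than actual Turkish data, and the function name `per_capita` is a hypothetical helper; it also rejects a non-positive population.

```python
def per_capita(level, population):
    """Per capita value: the level of a variable divided by the relevant population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return level / population

# Hypothetical figures: GDP of TL 900 billion and a population of 75 million.
gdp_level = 900e9
population = 75e6
print(per_capita(gdp_level, population))  # 12000.0
```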
Change in variable
• Sometimes it is more useful to examine the change in a variable than its level. For example, say that the GDP of Turkey in 2008 and 2009 is X and Y, respectively (in TL). Then the change between 2008 and 2009 is (Y - X). A more meaningful comparison can be made by calculating the rate of change (the percentage change, or growth rate). In the example, the rate of change between 2008 and 2009 is calculated as follows:
• G = [(Y - X) / X] * 100
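The change and the growth rate can be computed as below. This is a minimal sketch; the GDP values for X and Y are invented, and the function names are illustrative.

```python
def change(x_old, x_new):
    """Absolute change between two periods, i.e. (Y - X)."""
    return x_new - x_old

def growth_rate(x_old, x_new):
    """Rate of change in percent: G = [(Y - X) / X] * 100."""
    return (x_new - x_old) / x_old * 100

# Hypothetical GDP values (TL billion): X for 2008 and Y for 2009.
X, Y = 950.0, 1000.0
print(change(X, Y))                 # 50.0
print(round(growth_rate(X, Y), 2))  # 5.26
```

Note that the earlier value X is the denominator, so the same absolute change of 50 gives a smaller growth rate when the starting level is larger.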
• For periods shorter than a year, an annualized growth rate is used. For example:
• Assume that the sales of a company grow by 10% each quarter and that the beginning value of sales is TL100. Then:
• 1.10*100 = TL110 for the 1st quarter.
• 1.10*110 = TL121 for the 2nd quarter.
• 1.10*121 = TL133.1 for the 3rd quarter.
• And finally, 1.10*133.1 = TL146.4 for the 4th quarter. So the annualized growth rate is 46.4%, which is 6.4 percentage points more than the rough approximation (10% * 4 = 40%).
The formula for this rate is:
• Gq = [(X1/X0)^4 - 1]*100 (X0: initial value; X1: next quarter's value)
• In our example: [(110/100)^4 - 1]*100 = 46.4%.
• If the data are monthly, the ratio of consecutive monthly values is raised to the 12th power instead.
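The general rule, raising the one-period growth ratio to the number of periods in a year, fits in one small function. In this sketch `periods_per_year` is a hypothetical parameter name: 4 reproduces the quarterly formula and 12 the monthly case. The monthly figures are invented.

```python
def annualized_growth(x0, x1, periods_per_year):
    """Compound a one-period growth ratio up to a full year, in percent.

    periods_per_year is 4 for quarterly data and 12 for monthly data.
    """
    return ((x1 / x0) ** periods_per_year - 1) * 100

# Quarterly example from the text: sales go from TL100 to TL110 in one quarter.
print(round(annualized_growth(100, 110, 4), 1))   # 46.4
# Rough (non-compounded) approximation for comparison: 10% * 4.
print(10 * 4)                                     # 40
# Hypothetical monthly example: 1% growth in one month.
print(round(annualized_growth(200, 202, 12), 1))  # 12.7
```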
• A form of data similar to the growth rate is the proportion, also called a share, a percentage, or a fraction. A numerical example:
• Suppose that GDP = TL 10446.2 = (C = 7303.7) + (I = 1593.2) + (G = 1972.9) + (X - M = -423.6)
• The share of consumption expenditures in GDP is calculated as follows:
• C/GDP = 7303.7/10446.2 = 0.699 = 69.9%.
• Similarly, the shares of the other components in GDP are:
• I/GDP = 1593.2/10446.2 = 0.153 = 15.3%.
• G/GDP = 1972.9/10446.2 = 0.189 = 18.9%.
• (X-M)/GDP = -423.6/10446.2 = -0.041 = -4.1%.
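The same shares can be computed in a few lines of Python, using the component values from the example. This is a sketch only; the dictionary keys are just labels chosen here. Because net exports are negative here, their share is negative, and all shares still add up to 1.

```python
# GDP components (TL) from the numerical example.
gdp_components = {"C": 7303.7, "I": 1593.2, "G": 1972.9, "X-M": -423.6}
gdp = sum(gdp_components.values())  # 10446.2, up to floating-point rounding

shares = {name: amount / gdp for name, amount in gdp_components.items()}
for name, share in shares.items():
    print(f"{name}/GDP = {share:.3f} = {share:.1%}")

# The shares of all components add up to 1 (100%), net exports included.
print(round(sum(shares.values()), 6))  # 1.0
```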
Real versus Nominal Magnitudes
• Recall the simple identity:
• V = P x Q, where V is the nominal value, P is the price, and Q is the real quantity (volume).
• Nominal data are measured using the actual market prices that existed during the time period in question. Real data, at the micro level, refer to the actual quantities employed by a firm (labor hours), produced by a firm (number of widgets), or sold by a firm (sales volume).
Index numbers
• Real GDP is an example of what economists call an index number, or more specifically, a quantity index. The other type is a price index. Index numbers, unlike most other statistics, have no units; they are designed for comparison purposes. For example, one could use an index number to compare the level of whatever the index is measuring with its level in an earlier time period, known as the base period.
• The formula is as follows:
• XT = (Xt/X0)*100, where
• Xt is the value of the raw variable in a given time period t of the series,
• X0 is the value of the raw variable in the base period, which is the period being compared against, and
• XT is the resulting index number. Note that in the base period Xt = X0, so the index equals 100.
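Converting a raw series into index numbers is a direct application of the formula. The years, values, and the function name `to_index` below are hypothetical; the sketch multiplies by 100 before dividing so that these simple cases come out exact in floating point.

```python
def to_index(values, base_position):
    """Return index numbers XT = (Xt / X0) * 100, with X0 = values[base_position]."""
    base_value = values[base_position]
    if base_value == 0:
        raise ValueError("base-period value must be non-zero")
    return [x * 100 / base_value for x in values]

years = [2005, 2006, 2007, 2008]       # hypothetical years
raw = [160.0, 200.0, 220.0, 250.0]     # hypothetical raw values
index = to_index(raw, base_position=years.index(2006))
print(dict(zip(years, index)))  # {2005: 80.0, 2006: 100.0, 2007: 110.0, 2008: 125.0}
```

The base period (2006 here) always maps to 100, and every other value reads as a percentage of it.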
Quantity indices versus real quantities
There are 2 ways to make real measurements:
1. Create a quantity index (a weighted average of the quantities).
2. Create a price index and divide the nominal value by this price index.
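Both routes can be compared on a toy economy. In this sketch the two goods, their prices, and their quantities are all invented, and base-period weights (the Laspeyres form) are assumed for both indices; other weighting schemes exist. The helper `value` simply sums price times quantity, as in V = P x Q.

```python
# Hypothetical two-good economy; all prices and quantities are invented.
p0 = {"apples": 1.0, "bread": 2.0}   # base-period prices
q0 = {"apples": 100, "bread": 50}    # base-period quantities
pt = {"apples": 1.5, "bread": 2.5}   # current-period prices
qt = {"apples": 110, "bread": 60}    # current-period quantities

def value(prices, quantities):
    """Sum of price * quantity over all goods (V = P x Q)."""
    return sum(prices[g] * quantities[g] for g in prices)

nominal_0 = value(p0, q0)   # 200.0
nominal_t = value(pt, qt)   # 315.0

# Way 1: quantity index with base-period price weights (base = 100),
# expressed in base-period money.
quantity_index = value(p0, qt) * 100 / value(p0, q0)
real_t_way1 = nominal_0 * quantity_index / 100

# Way 2: price index with base-period quantity weights (base = 100),
# then divide the nominal value by it.
price_index = value(pt, q0) * 100 / value(p0, q0)
real_t_way2 = nominal_t / (price_index / 100)

print(round(quantity_index, 1), round(real_t_way1, 2))  # 115.0 230.0
print(round(price_index, 1), round(real_t_way2, 2))     # 137.5 229.09
```

With these numbers the two approaches give 230.0 and 229.09: they measure the same thing with different weights, so they need not coincide exactly.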
Price indices versus implicit price deflators
• There are also 2 ways to make price measurements. One is to create a price index. The other is to create a quantity index and divide the nominal value by this quantity index; the result is called the implicit price deflator.
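Following the steps just described, the sketch below builds a quantity index with base-period price weights (an assumed weighting choice), converts it into a real value, and divides the nominal value by that real value. The two-good economy and all numbers are hypothetical.

```python
# Hypothetical two-good economy (all numbers invented for illustration).
p0 = {"apples": 1.0, "bread": 2.0}   # base-period prices
q0 = {"apples": 100, "bread": 50}    # base-period quantities
pt = {"apples": 1.5, "bread": 2.5}   # current-period prices
qt = {"apples": 110, "bread": 60}    # current-period quantities

def value(prices, quantities):
    """Sum of price * quantity over all goods."""
    return sum(prices[g] * quantities[g] for g in prices)

nominal_0 = value(p0, q0)
nominal_t = value(pt, qt)

# Step 1: quantity index (base-period price weights), base period = 100.
quantity_index = value(p0, qt) * 100 / nominal_0

# Step 2: express the quantity index in base-period money (the real value).
real_t = nominal_0 * quantity_index / 100

# Step 3: implicit price deflator = nominal value / real value * 100.
implicit_deflator = nominal_t * 100 / real_t
print(round(implicit_deflator, 2))  # 136.96
```

For these numbers a price index with base-period quantity weights would give 137.5; the deflator differs because it implicitly weights prices by current-period quantities.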
How inflation distorts nominal values
Because prices tend to increase over time, comparing nominal measurements across years would be misleading: part of any increase in a nominal value reflects higher prices rather than larger quantities. By valuing each year's actual quantities at base-year prices, real GDP excludes the effects of changing prices over time.
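A small numerical sketch makes the distortion visible. With hypothetical prices and quantities for two goods (2005 assumed as the base year), nominal spending grows much faster than real output, because most of the nominal increase is higher prices.

```python
# Hypothetical prices and quantities for two years; 2005 is the base year.
prices = {2005: {"apples": 1.0, "bread": 2.0}, 2009: {"apples": 1.5, "bread": 2.5}}
quantities = {2005: {"apples": 100, "bread": 50}, 2009: {"apples": 110, "bread": 60}}
BASE_YEAR = 2005

def value(p, q):
    """Sum of price * quantity over all goods."""
    return sum(p[g] * q[g] for g in p)

# Nominal values: each year's quantities at that year's prices.
nominal = {year: value(prices[year], quantities[year]) for year in prices}
# Real values: each year's quantities at base-year prices.
real = {year: value(prices[BASE_YEAR], quantities[year]) for year in prices}

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old

print(pct_change(nominal[2005], nominal[2009]))  # 57.5
print(pct_change(real[2005], real[2009]))        # 15.0
```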
Rebasing data series
• The base year is periodically changed to keep it recent, because we usually care more about recent economic changes than historical ones. A numerical example:
• Suppose that we have the following annual CPI data:
Year                        1991   1992   1993   1994   1995   1996   1997   1998
Base 1992                   0.95   1.00   1.25   1.30   1.40
Base 1997                                               0.85   0.95   1.00   1.15
Linked series (base 1997)   0.58   0.61   0.76   0.79   0.85   0.95   1.00   1.15
Now suppose that we want to complete the series with 1997 as the base year. We need to transform the values for the observations that are only available with a 1992 base year so that they correctly show the change in the CPI across both parts of the data. As the table shows, 1995 is the year of overlap; that is, we have 2 values for this year. The data with the earlier base year need to be reduced to link them to the data with the later base year. The amount of the reduction is given by the ratio of the two values for 1995, so the reduction factor is 0.85/1.40 = 0.607.
To obtain the linked series with a 1997 base year, each value with base year 1992 is multiplied by the reduction factor (0.607), as shown in the "Linked series" row of the table.
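The linking step can be sketched in Python. The CPI figures are those of the rebasing example, written with decimal points; `link_series` is a hypothetical helper, not a standard routine. Values that already exist on the new base are kept as published, and only the older observations are rescaled.

```python
# CPI values from the rebasing example.
cpi_base_1992 = {1991: 0.95, 1992: 1.00, 1993: 1.25, 1994: 1.30, 1995: 1.40}
cpi_base_1997 = {1995: 0.85, 1996: 0.95, 1997: 1.00, 1998: 1.15}

def link_series(old_base, new_base, overlap_year):
    """Rescale the old-base series onto the new base using the overlap year.

    Returns the linked series (sorted by year) and the linking factor.
    """
    factor = new_base[overlap_year] / old_base[overlap_year]
    linked = {year: v * factor for year, v in old_base.items() if year not in new_base}
    linked.update(new_base)  # new-base values are kept as published
    return dict(sorted(linked.items())), factor

linked, factor = link_series(cpi_base_1992, cpi_base_1997, overlap_year=1995)
print(round(factor, 3))  # 0.607
print({year: round(v, 2) for year, v in linked.items()})
# {1991: 0.58, 1992: 0.61, 1993: 0.76, 1994: 0.79, 1995: 0.85, 1996: 0.95, 1997: 1.0, 1998: 1.15}
```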
Data smoothing
• Volatile data are sometimes "smoothed" to better reveal the underlying trends. There are a variety of techniques for smoothing data; we will discuss only 2 of them: 1. moving averages and 2. seasonal adjustment.
• A moving average replaces the actual data point in each period with the average of that data point and the (n-1) data points preceding it. As a result, any abnormal observation becomes less important, since it is averaged with more normal ones.
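A trailing moving average of length n, as described above, can be sketched as follows. The series and the window length are hypothetical, and the function name is illustrative; centred moving averages, which average points on both sides, are a common alternative not shown here.

```python
def trailing_moving_average(values, n):
    """Replace each point with the mean of itself and its n-1 predecessors.

    The first n-1 periods lack enough predecessors, so they are None.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i < n - 1:
            smoothed.append(None)
        else:
            window = values[i - n + 1:i + 1]  # the nth point and the n-1 before it
            smoothed.append(sum(window) / n)
    return smoothed

# Hypothetical series with one abnormal observation (30) in the fourth period.
series = [10, 12, 11, 30, 12, 13]
result = trailing_moving_average(series, 3)
print([None if v is None else round(v, 2) for v in result])
# [None, None, 11.0, 17.67, 17.67, 18.33]
```

The spike of 30 no longer dominates a single period; its effect is spread, at one third of its weight, over the three averages that include it.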
• Some variables show seasonal patterns, whereby they change predictably in certain months, quarters, or seasons. Seasonally adjusted data are designed to remove these predictable movements so that the underlying changes are easier to see.
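To show the idea only, here is a deliberately crude additive adjustment, assumed for illustration: within each complete year, subtract the yearly mean, average those deviations by season to get seasonal factors, and subtract the factors. It is not the method statistical agencies use (they rely on procedures such as X-13ARIMA-SEATS), and it works poorly when there is a strong trend within the year. The function names and the sales figures are hypothetical.

```python
def seasonal_factors(values, periods_per_year=4):
    """Crude additive seasonal factors, one per season (quarter or month).

    For each complete year, subtract that year's mean from each observation,
    then average those deviations by season.
    """
    years = len(values) // periods_per_year
    if years == 0:
        raise ValueError("need at least one complete year of data")
    deviations = [[] for _ in range(periods_per_year)]
    for y in range(years):
        year_values = values[y * periods_per_year:(y + 1) * periods_per_year]
        year_mean = sum(year_values) / periods_per_year
        for season, v in enumerate(year_values):
            deviations[season].append(v - year_mean)
    return [sum(d) / len(d) for d in deviations]

def seasonally_adjust(values, periods_per_year=4):
    """Subtract the matching seasonal factor from every observation."""
    factors = seasonal_factors(values, periods_per_year)
    return [v - factors[i % periods_per_year] for i, v in enumerate(values)]

# Hypothetical quarterly sales: a level of 11 plus a fixed quarterly pattern.
sales = [10, 14, 8, 12] * 3
print(seasonal_factors(sales))   # [-1.0, 3.0, -3.0, 1.0]
print(seasonally_adjust(sales))  # every value becomes 11.0
```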
Constructing a data appendix to your research
• It is good scientific practice to make your data
available so that other researchers may
replicate your work. In explaining your sources
and methods, you should provide clear
citations for exactly which sources provided
the raw data, as well as a complete
explanation of how you manipulated the raw
data to transform it into the form you actually
used.