Chapter 8 Locating and Collecting Economic Data

In this chapter we focus on how data are constructed and where they may be found. Collecting and manipulating data is a key part of an empirical research project. A research project is nothing without adequate data and an original, testable hypothesis. As researchers, we have to be sure that there are enough data to test our hypothesis adequately. Otherwise, we risk the frustration of investing a great deal of time and effort only to find that the data needed to test our painstakingly developed hypothesis are not available.
• The vast majority of data are constructed rather than collected. For this reason, statistics are made up not only of facts but also of knowledge that is created.
• Best (2001) identifies 3 steps in the construction of a data series:
  • Defining the concept,
  • Deciding how the concept will be measured, and
  • Determining how to define the sample on which the data will be based.
• Every data series is constructed for a specific purpose. However, a given data series may not be defined or measured in a way that best matches your needs. So sometimes (probably most of the time) you may need to construct your own data.
• Most social science statistics are based on sample data rather than populations. For example, average family income is not the average income of ALL families; rather, it is the average income of the families in the sample.
• It is important to distinguish between those organizations that collect or produce data and those that publish them.
• Data come in 3 forms: time-series, cross-section, and longitudinal (panel) data. Time-series data give different observations (data points) on the same variable at different points across time (ex.: Turkish GDP per capita over the period 1923-2009). Cross-section data, by contrast, give different observations of a comparable variable at the same point in time (ex.: average disposable personal income across different cities of Turkey for 2009). Longitudinal (panel) data take a cross-section sample and follow it over time (ex.: family income for the same 10 families over 5 years).
• Longitudinal data are an example of a micro data set, since the data points or observations are of individual economic agents such as individuals, households, or firms. Macro data are compiled at the national level. The frequency of data varies as well: you may find daily, weekly, monthly, quarterly, or annual data.
A number of US governmental, international, and private organizations gather economic and social statistics. These include:
• Census Bureau (www.census.gov)
• Bureau of Economic Analysis (www.bea.doc.gov)
• Bureau of Labor Statistics (www.bls.gov)
• The Federal Reserve (www.federalreserve.gov)
International agencies:
• International Monetary Fund
• World Bank
• OECD
• Eurostat
• Asian Development Bank
• Inter-American Development Bank
Turkish data sources:
• Central Bank of the Republic of Turkey (www.tcmb.gov.tr)
• Turkish Statistical Institute (www.tuik.gov.tr)
• State Planning Organization of Turkey (www.dpt.gov.tr)
• US National Income and Product Accounts (official national accounts of the US).
• US Flow of Funds Accounts (data on financial flows across the US economy).
• US Balance of Payments Accounts and International Investment Position of the US.
• US Census of Population and Integrated Public Use Microdata Series.
• Current Population Survey.
• Current Employment Statistics.
• The Economic Census.
• Annual Survey of Manufactures.
• Current Industrial Reports.
• American Housing Survey.
• Consumer Expenditure Survey.
• National Longitudinal Surveys.
• Panel Study of Income Dynamics.
• Surveys of Consumers.
• Survey of Consumer Finances.
The following sources are usually more user-friendly than the primary sources:
• Economic Report of the President.
• Economagic.
• FRED II (Federal Reserve Economic Data), an excellent source for US macro and financial data.
• Stat-USA/State of the Nation.
• Inter-university Consortium for Political and Social Research.
• International Financial Statistics (principal data set of the IMF).
• World Economic Outlook database.
• Penn World Tables.
• Joint BIS-IMF-OECD-WB statistics on external debt.
• Eurostat.
• OECD Main Economic Indicators and National Accounts.
• Empirical research can be divided into 2 types: experimental and survey (nonexperimental). In experimental research, the data come from the experiment, and collecting the data is the major part of the study. In survey research, we use preexisting data, and researchers generally do not put the same care and effort into collecting them. This is undoubtedly a huge mistake!
It is a good idea to start with a search strategy. We have 2 steps. The first step raises three issues. First, you need a large sample size (large enough to obtain statistically valid empirical test results). The second issue is that of a random or representative sample, which will be discussed in detail in the 10th chapter. The third is obtaining data that correctly measure the concepts that your theory deems important.
Once you have determined your list of desired variables, the next step is to think about where those data are likely to be found. Step 1 can be summarized by the following questions:
• What are the desired variables?
• How should each variable be defined?
• What data frequency and sample period, or what level of analysis?
• What are potential sources for data on each variable?
As you begin to investigate each data source, you need to ask several questions:
• What data are in fact available?
• If the data are not ideal, are they good enough?
• If the data are not acceptable, is there an available proxy? (A proxy is a variable that should behave roughly the same as your theoretical variable.)
• If there is no adequate proxy, how can the hypothesis be reformulated to make it testable, given the data available?
Data for any variable may be found in various forms, some of which are listed below:
• Levels
• Per capita (per person)
• Changes
• Rates of change (growth rates)
• Annualized growth rates
• Proportions
• Nominal
• Real
• Index numbers
• The level is the most basic form. It is the actual value or size of the variable being measured (ex.: the level of Turkish GDP per capita in 2009 is TL X). Researchers often use the per capita form of a variable, which is found by dividing the level of the variable by the appropriate population.
• Sometimes it is more useful to examine the change in a variable than its level. Ex.: say that the GDP of Turkey in 2008 and 2009 is X and Y respectively (in TL). Then the change between 2008 and 2009 would be (Y - X).
• A more meaningful evaluation can be made by calculating the rate of change (the percentage change, or growth rate). In our example, the rate of change between 2008 and 2009 would be calculated as follows:
G = [(Y - X) / X]*100
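The change and the rate of change can be sketched in a few lines of Python. This is only an illustration: the function name `growth_rate` and the two values are hypothetical and are not actual GDP figures.

```python
def growth_rate(previous, current):
    """Percentage change from previous to current: G = [(Y - X) / X] * 100."""
    if previous == 0:
        raise ValueError("growth rate is undefined when the initial value is zero")
    return (current - previous) / previous * 100


# Hypothetical values for X and Y, chosen only for illustration
x = 950.0
y = 1000.0

print(f"change (Y - X): {y - x:.1f}")
print(f"growth rate G:  {growth_rate(x, y):.2f}%")
```

The change is expressed in the units of the variable, while the growth rate is unit-free, which is why it is easier to compare across variables or countries.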
• For periods of time shorter than a year, the annualized growth rate is used. Let us give an example:
• Assume that the sales of a company grow by 10% each quarter and that the beginning value for sales is TL100. Then,
• 1.10*100 = TL110 for the 1st Q.
• 1.10*110 = TL121 for the 2nd Q.
• 1.10*121 = TL133.1 for the 3rd Q.
• And finally, 1.10*133.1 = TL146.4 for the final Q.
So, the annualized growth rate would be 46.4%, which is 6.4 percentage points more than the rough approximation (10%*4 = 40%).
To formulate this rate:
• Gq = [(X1/X0)^4 - 1]*100 (X0: initial value; X1: next period's value)
• In our example: [(110/100)^4 - 1]*100 = 46.4%.
• If the data are monthly, then we should raise the ratio of monthly values to the 12th power.
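The quarterly sales example can be reproduced with a short Python sketch. The function name `annualized_growth` and its `periods_per_year` parameter are assumptions made for illustration; the formula is the one given above, with the exponent set to 4 for quarterly data and 12 for monthly data.

```python
def annualized_growth(initial, next_value, periods_per_year):
    """Annualized growth in percent: [(X1 / X0) ** n - 1] * 100."""
    return ((next_value / initial) ** periods_per_year - 1) * 100


# Compound the TL100 of sales at 10% per quarter, as in the example
sales = 100.0
for quarter in range(1, 5):
    sales *= 1.10
    print(f"Q{quarter}: TL{sales:.1f}")

# Annualize the first quarter's growth (TL100 to TL110)
print(f"annualized growth: {annualized_growth(100, 110, 4):.1f}%")
```

Compounding is what makes the annualized rate exceed the simple product of the quarterly rate and 4.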
• A form of data similar to the growth rate is the proportion. It is also called a share, a percentage, or a fraction. Let us give a numerical example:
• Suppose that GDP = TL 10446.2 = (C = 7303.7) + (I = 1593.2) + (G = 1972.9) + (X-M = -423.6)
• The proportion of consumption expenditures in GDP would be calculated as follows:
• C/GDP = 7303.7/10446.2 = 0.699 = 69.9%.
• Similarly, the shares of the other components in GDP would be calculated as:
• I/GDP = 1593.2/10446.2 = 0.153 = 15.3%.
• G/GDP = 1972.9/10446.2 = 0.189 = 18.9%.
• (X-M)/GDP = -423.6/10446.2 = -0.041 = -4.1%.
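As a quick check on the arithmetic, the shares can be computed in Python from the component values given in the example. The variable names are illustrative only; net exports enter with a negative sign, since the four components must sum to GDP.

```python
# Expenditure components from the example (in TL); net exports are negative
components = {"C": 7303.7, "I": 1593.2, "G": 1972.9, "X-M": -423.6}

gdp = sum(components.values())
shares = {name: value / gdp for name, value in components.items()}

print(f"GDP = {gdp:.1f}")
for name, share in shares.items():
    print(f"{name}/GDP = {share:.3f} = {share * 100:.1f}%")
```

Because the components add up to GDP, the four shares add up to 1 (100%).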
• Let's remember the simple identity below:
• V = P x Q, where V: nominal (or value); P: price; and Q: real (or volume).
• Nominal data are measured using the actual market prices that existed during the time period in question. Real data, at the micro level, refer to the actual quantities employed by a firm (labor hours), produced by a firm (number of widgets), or sold by a firm (sales volume).
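The identity V = P x Q can be rearranged to recover the real (volume) measure from a nominal value and a price. The sketch below assumes a hypothetical firm selling a single product; the function name `real_value` and all numbers are made up for illustration.

```python
def real_value(nominal, price):
    """Solve V = P x Q for Q, the real (volume) measure."""
    if price <= 0:
        raise ValueError("price must be positive")
    return nominal / price


# Hypothetical firm: sales revenue (V) and unit price (P) in two years
revenue = {1: 1000.0, 2: 1320.0}
price = {1: 10.0, 2: 12.0}

for year in (1, 2):
    quantity = real_value(revenue[year], price[year])
    print(f"year {year}: nominal = {revenue[year]:.0f}, real = {quantity:.0f} widgets")
```

In this made-up case nominal sales rise by 32%, but the number of widgets sold rises by only 10%; the rest of the increase reflects the higher price.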
• Index numbers, unlike most other statistics, have no units. They are designed for comparison purposes. For ex., one could use an index number to compare the level of whatever the index is measuring with its level in an earlier time period, known as the base period.
• The formula is as follows:
• XT = (Xt/X0)*100
• Xt is the value of the raw variable in a given time period t in the series,
• X0 is the value of the raw variable in the base period, which is the period to be compared against,
• XT is the resulting index number. Note that in the base period Xt = X0, so the index equals 100.
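The index formula can be applied to a whole series at once. The following Python sketch uses a hypothetical raw series and a hypothetical helper named `to_index`; neither comes from the original slides.

```python
def to_index(series, base_period):
    """Convert raw values to index numbers: XT = (Xt / X0) * 100."""
    base = series[base_period]  # X0, the raw value in the base period
    return {period: value / base * 100 for period, value in series.items()}


# Hypothetical raw series, chosen only for illustration
raw = {2007: 80.0, 2008: 100.0, 2009: 125.0}
index = to_index(raw, base_period=2008)

for period, value in index.items():
    print(f"{period}: {value:.1f}")
```

Whatever the units of the raw series, the base period always comes out as 100, and every other value reads as a percentage of the base-period level.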
Because prices tend to increase over time, it would be misleading to compare nominal measurements of GDP across years. By using base year prices and actual year quantities, real GDP excludes the effects of changing prices over time.
• The base year is generally changed to keep it "recent", because we care more about recent economic changes than historic ones. Let us give a numerical example:
• Suppose that we have the following annual CPI data:
Year            1991   1992   1993   1994   1995   1996   1997   1998
Base 1997          -      -      -      -   0.85   0.95   1.00   1.15
Base 1992       0.95   1.00   1.25   1.30   1.40      -      -      -
Linked series   0.58   0.61   0.76   0.79   0.85   0.95   1.00   1.15
Now, suppose that we want to complete the series with a 1997 base year.
We need to transform the values for the observations that are only available with a base year of 1992, so that they correctly show the change in CPI between both parts of the data.
Year 1995 is the year of overlap; that is, we have 2 values for this year.
The data with the earlier base year need to be reduced to link to the data with the later base year.
The amount of the reduction is given by the ratio of the two values for 1995. So, the reduction factor would be: 0.85/1.40 = 0.607.
To obtain the linked series with a 1997 base year, each value with base year 1992 would be multiplied by the reduction factor (0.607) (shown in the linked series row).
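The linking procedure can be sketched in Python using the CPI values from the example, as read from the table. The function name `link_series` is hypothetical, and the sketch assumes exactly one overlap year is used to compute the reduction factor.

```python
def link_series(old, new, overlap_year):
    """Rescale the old-base series to the new base using the overlap year."""
    factor = new[overlap_year] / old[overlap_year]
    linked = {year: value * factor for year, value in old.items()}
    linked.update(new)  # keep new-base values as they are
    return factor, dict(sorted(linked.items()))


# CPI values from the example: base year 1992 and base year 1997
base_1992 = {1991: 0.95, 1992: 1.00, 1993: 1.25, 1994: 1.30, 1995: 1.40}
base_1997 = {1995: 0.85, 1996: 0.95, 1997: 1.00, 1998: 1.15}

factor, linked = link_series(base_1992, base_1997, overlap_year=1995)

print(f"reduction factor: {factor:.3f}")
for year, value in linked.items():
    print(f"{year}: {value:.2f}")
```

Multiplying by a constant factor leaves the year-to-year growth rates of the older segment unchanged, which is the property the linked series has to preserve.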