Transcript Part 1
Course on Data Analysis and
Interpretation
Presented by B. Unmar
Sponsored by GGSU
PART 1
Date: 5 July 2011
SOME IMPORTANT NOTES
Data Collection Methods
Data collection is an important aspect of any type of research
study. Inaccurate data collection can impact the results of a
study and ultimately lead to invalid results.
Data collection methods for impact evaluation vary along a
continuum, with quantitative methods at one end and qualitative
methods at the other.
Census. A census is a study that obtains data from every member
of a population. In most studies, a census is not practical, because
of the cost and/or time required.
(Example: Housing & Population Census)
Sample survey. A sample survey is a study that obtains data from
a subset of a population, in order to estimate population attributes.
(Examples: HBS and CMPHS)
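As a rough illustration of the difference between a census and a sample survey, the sketch below builds a purely hypothetical population of household sizes (the numbers are simulated, not HBS or CMPHS data) and compares the census mean with the estimate from a simple random sample. The names `population` and `sample`, the seed and the sample size of 500 are all assumptions made for the example.

```python
import random
from statistics import mean

# Hypothetical population: household sizes for 10,000 households.
# These values are simulated for illustration only.
rng = random.Random(2011)
population = [rng.randint(1, 8) for _ in range(10_000)]

# Census: obtain data from every member of the population.
census_mean = mean(population)

# Sample survey: draw a simple random sample of 500 households
# (without replacement) and use it to estimate the population mean.
sample = rng.sample(population, 500)
sample_mean = mean(sample)

print(f"Census mean household size: {census_mean:.2f}")
print(f"Sample estimate (n=500):    {sample_mean:.2f}")
```

The sample estimate will usually be close to, but not exactly equal to, the census value; the gap is the sampling error.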
Data Collection Methods
Primary
Face-to-Face Interview
Telephone Interview
Secondary
Internet
Date: 24 November 2010
Training Course in Basic Statistics
In statistics, information can be collected as qualitative
or quantitative data.
Qualitative data, such as the eye colour of a group of individuals, cannot
be manipulated by arithmetic operations. The values are labels that indicate
which category or class an individual, object, or process falls into.
Variables of this kind are called categorical variables.
Quantitative data sets consist of measures that take numerical values
for which descriptions such as means and standard deviations are
meaningful. They can be put into an order and further divided into
two groups: discrete data or continuous data.
Data are called "primary type" data if the analyst has been involved in
collecting the data relevant to his/her investigation. Otherwise, they are
called "secondary type" data.
Data are measured on four scales: Nominal, Ordinal, Interval, and Ratio
(remember the French word NOIR, for the colour black).
Nominal items are usually categorical, in that they belong to a
definable category, such as 'employees'.
Items on an ordinal scale are set into some kind of order by their
position on the scale. We cannot do arithmetic with ordinal numbers; they
show sequence only. Example: The first, third and fifth person in
a race
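Interval data are measured along a scale on which adjacent positions are
equidistant from one another. Interval data can be added and subtracted, but
cannot meaningfully be multiplied or divided, because the scale has no true
zero. Example: Level of happiness, rated from 1 to 10 (treated as interval
only if the steps between ratings are assumed to be equal)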
On a ratio scale, numbers can be compared as multiples of one
another. Thus one person can be twice as tall as another person.
Importantly, the number zero has meaning: it denotes the absence of the
quantity being measured. Example: A person's weight
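The difference between interval and ratio scales can be sketched with a small calculation. The example below is an illustration only: the weights and temperatures are invented, and temperature in degrees Celsius is used as a standard textbook instance of an interval scale (it does not appear in the notes above).

```python
# Ratio scale: weight in kilograms has a true zero, so ratios are meaningful.
weight_a_kg, weight_b_kg = 80.0, 40.0
weight_ratio = weight_a_kg / weight_b_kg   # "twice as heavy" is a valid statement

# Interval scale: degrees Celsius has an arbitrary zero, so ratios are not meaningful.
temp_a_c, temp_b_c = 20.0, 10.0
naive_ratio = temp_a_c / temp_b_c          # 2.0, yet 20 degrees C is not "twice as hot"

# The same two temperatures on the Kelvin scale, which does have a true zero.
kelvin_ratio = (temp_a_c + 273.15) / (temp_b_c + 273.15)

# Differences, however, are meaningful on an interval scale.
temp_difference = temp_a_c - temp_b_c

print(weight_ratio, naive_ratio, round(kelvin_ratio, 3), temp_difference)
```

The ratio of the two temperatures changes when the zero point moves, which is why multiplication and division are not meaningful on an interval scale, while the difference of 10 degrees is unaffected.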
Measures of central tendency (average): 1. Mean 2. Median 3. Mode
WHY represent a set of data by means of a single number which, in its
way, is descriptive of the entire set?
To compare different sets of data (over time and space)
Examples
Distribution of population over time
Distribution of household size of poor and non-poor households
Output of industrial groups over time
Arithmetic mean versus Geometric mean
The arithmetic mean of a set of values is the sum of the values
divided by their number. The geometric mean of n numbers is the nth
root of the product of these numbers.
The geometric mean is relevant when the data to be averaged are
growth rates, indices, etc., i.e. percentage changes over time
for several time periods. TRY: AVERAGE GDP GROWTH RATE FOR
THE PERIOD 2005-2010 (+2.7, +5.6, +5.7, +5.5, +3.1, +4.4)
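One possible way to work the GDP exercise is sketched below. It assumes the six figures are annual percentage growth rates, converts each to a growth factor (1 + r/100) before taking the geometric mean, and compares the result with the arithmetic mean. The variable names are chosen for the example.

```python
from math import prod

# Annual GDP growth rates (%) for 2005-2010, as listed above.
growth_rates = [2.7, 5.6, 5.7, 5.5, 3.1, 4.4]

# Arithmetic mean: sum of the values divided by their number.
arithmetic_mean = sum(growth_rates) / len(growth_rates)

# Geometric mean: convert each rate to a growth factor (1 + r/100),
# take the nth root of the product, then convert back to a percentage.
factors = [1 + r / 100 for r in growth_rates]
geometric_mean = (prod(factors) ** (1 / len(factors)) - 1) * 100

print(f"Arithmetic mean growth: {arithmetic_mean:.2f}%")
print(f"Geometric mean growth:  {geometric_mean:.2f}%")
```

The geometric mean is the constant annual rate that would produce the same cumulative growth over the six years, and it never exceeds the arithmetic mean of the same rates.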
Median
The median is defined as the value of the middle term or the mean of
the values of the two middle terms when the data are arranged in
ascending or descending order of magnitude.
Mode
The mode is defined as the value that occurs with the highest
frequency.
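The two definitions can be checked with Python's standard `statistics` module. The data sets below are invented for illustration; one has an odd number of terms and the other an even number, to show both cases of the median.

```python
from statistics import median, mode

odd_data = [7, 3, 9, 3, 5]        # sorted: 3, 3, 5, 7, 9
even_data = [7, 3, 9, 3, 5, 11]   # sorted: 3, 3, 5, 7, 9, 11

median_odd = median(odd_data)     # value of the middle term
median_even = median(even_data)   # mean of the two middle terms
most_frequent = mode(odd_data)    # value with the highest frequency

print(median_odd, median_even, most_frequent)
```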
Measures of variation (dispersion):
Range
Quartile deviation
Mean deviation
Standard deviation (most common method used to measure variation;
SD = square root of the variance)
The standard deviation indicates the degree of scatter of the different
values about the central value.
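The four measures of dispersion listed above can be sketched as follows. The data are hypothetical, the population (rather than sample) versions of the variance and standard deviation are used, and the quartiles follow the "inclusive" method of `statistics.quantiles`; other quartile conventions give slightly different results.

```python
from statistics import mean, pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartile deviation: half the distance between the first and third quartiles.
# Quartile conventions differ between textbooks and packages; the
# "inclusive" method is just one choice.
q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average absolute distance from the mean.
centre = mean(data)
mean_deviation = sum(abs(x - centre) for x in data) / len(data)

# Standard deviation: square root of the variance (population versions here).
variance = pvariance(data)
standard_deviation = pstdev(data)

print(data_range, quartile_deviation, mean_deviation, variance, standard_deviation)
```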
Statistical Inference
We attempt to extrapolate the findings of the study to the population
from which the sample was drawn.
Assumptions are:
The sample is representative of the population
The sample is randomly drawn from the population
The variability of the sample and of the population is similar
Steps in Statistical Inference
Generating the NULL and ALTERNATIVE hypotheses
Considering Type I and Type II errors
Testing the hypotheses using appropriate statistical tests
Obtaining the ‘p’ value
Concluding from the p value
Type I error: Falsely concluding that there is a difference when actually
there is no difference. The probability of this error (α) is conventionally
set at 5% (α = 0.05) or lower. In medicine the probability of a false
positive conclusion should be kept low.
Type II error: Failing to detect a difference that actually exists.
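The steps in statistical inference can be sketched with a one-sample z-test. This is only one of many possible tests, chosen here because it needs nothing beyond the standard library; all figures are invented, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical study: all figures below are invented for illustration.
mu_0 = 50.0          # population mean under the null hypothesis
sigma = 10.0         # population standard deviation, assumed known
n = 100              # sample size
sample_mean = 52.0   # observed sample mean
alpha = 0.05         # accepted probability of a Type I error

# Test statistic: how many standard errors the sample mean lies from mu_0.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided p value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Conclusion: reject the null hypothesis if p is below the chosen alpha.
reject_null = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

With these made-up numbers the p value falls just below 0.05, so the null hypothesis of no difference would be rejected at the 5% level.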
Confidence interval
A 95% CI (sample mean ± 1.96 standard errors) tells you that,
if the study were repeated many times,
the interval calculated in this way
would contain the true population mean
about 95 out of 100 times.
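A minimal sketch of the calculation, assuming a sample large enough for the normal approximation (which is where the multiplier 1.96 for a 95% level comes from). The function name `confidence_interval_95` and the input figures are hypothetical.

```python
from math import sqrt

def confidence_interval_95(sample_mean, sample_sd, n):
    """Approximate 95% CI for a mean: mean plus or minus 1.96 standard errors."""
    standard_error = sample_sd / sqrt(n)
    margin = 1.96 * standard_error
    return sample_mean - margin, sample_mean + margin

# Invented figures: mean 52, standard deviation 10, sample of 100.
lower, upper = confidence_interval_95(sample_mean=52.0, sample_sd=10.0, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

For small samples a t-distribution multiplier would normally replace 1.96; that refinement is left out of this sketch.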
Bivariate analysis: Consider the following cross table
TABLE 1
                      Job satisfaction
Absenteeism           Yes     No      Row marginals
Yes                     4     11      15
No                     10      5      15
Column marginals       14     16      30
(a) Using the data from Table 1 calculate the row and column percentages
separately.
(b) Describe briefly what the row and column percentages would
emphasize.
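Part (a) can be worked along the following lines. The sketch assumes that the rows of Table 1 are absenteeism (Yes, No) and the columns are job satisfaction (Yes, No); if the table is read the other way round, row and column percentages simply swap roles.

```python
# Cell counts from Table 1. Assumption: rows are absenteeism (Yes, No)
# and columns are job satisfaction (Yes, No).
table = [
    [4, 11],   # absenteeism: Yes
    [10, 5],   # absenteeism: No
]

row_totals = [sum(row) for row in table]          # row marginals
col_totals = [sum(col) for col in zip(*table)]    # column marginals

# Row percentages: each cell as a share of its row total.
row_pct = [[100 * cell / total for cell in row]
           for row, total in zip(table, row_totals)]

# Column percentages: each cell as a share of its column total.
col_pct = [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
           for row in table]

for label, pct in (("Row %", row_pct), ("Column %", col_pct)):
    print(label)
    for row in pct:
        print("  ", [round(value, 1) for value in row])
```

Row percentages add up to 100 across each row, and column percentages add up to 100 down each column, so each set answers a different question about the same counts.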
Date: 10 December 2010
The question of whether to use row or column
percentages depends in part on which aspects of the
data one wants to highlight. It is sometimes suggested
that the decision depends on whether the independent
variable is across the top or along the side of the table:
if the former, column percentages should be used; if the
latter, row percentages should be employed. Typically,
the independent variable will go across the table, in
which case column percentages should be used.
However, this suggestion implies that there is a
straightforward means of identifying the independent
and dependent variables. This is not always the case,
and great caution should be exercised in making such
an inference.
Economic Statistics
Economic statistics is a branch of applied statistics focusing on the
collection, processing, compilation and dissemination of statistics
concerning the economy of a region, a country or a group of
countries. Economic statistics provide the empirical data needed in
economic research (econometrics) and are the basis for
decision making and economic policy making.
In Mauritius, official economic data are produced and disseminated
by the CSO and the Bank of Mauritius.
Economic indicators
An economic indicator (or business indicator) is a statistic about
the economy. Economic indicators allow analysis of economic
performance and predictions of future performance. Economic
indicators include various indices, earnings reports, and economic
summaries, such as unemployment, Consumer Price Index (CPI),
industrial production, Gross Domestic Product (GDP), retail sales,
stock market prices, and money supply changes.
Why Economic Data?
Good economic data is a precondition for effective
macroeconomic management. With the complexity of modern
economies and the lags inherent in macroeconomic policy
instruments, a country must have the capacity to promptly
identify any adverse trends in its economy and to apply the
appropriate corrective measures. This cannot be done without
economic data that is complete, accurate and timely.
Increasingly, the availability of good economic data is coming to
be seen by international markets as an indicator of a country that
is a promising destination for foreign investment. International
investors are aware that good economic data is necessary for a
country to effectively manage its affairs and, other things being
equal, will tend to avoid countries that do not publish such data.
The public availability of reliable and up-to-date economic data
also reassures international investors by allowing them to
monitor economic developments and to manage their investment
risk.
Social statistics is the use of statistical measurement systems to
study human behaviour in a social environment. This can be
accomplished through polling a particular group of people,
evaluating a particular subset of data obtained about a group of
people, or by observation and statistical analysis of a set of data
that relates to people and their behaviours.
Social indicators are defined as statistical measures relating to major
areas of social concern and/or individual well-being. Examples of
social indicators are projections, forecasts, outlook statements,
time-series statistics, and extrapolations related to topics such as
population, housing, social security, income, education, and
health. In Mauritius our main social indicators cover the following
areas:
Population and vital statistics
Health
Education
Crime
Social Security
Environment
In Mauritius, official social indicators are produced and disseminated by
the CSO and the Ministry of Health & Quality of Life.
Key Uses of Social Indicators
Description: to inform citizens and policy makers about the
circumstances of their society, to track trends and patterns, and
to identify areas of concern as well as positive outcomes.
Monitoring: to track outcomes that may or may not require
policy intervention of some kind. Most people are familiar with
using indicators for the purpose of monitoring in the public
health field.
Setting goals: to establish quantifiable thresholds to be met
within a specific time period.
Increasing accountability: to achieve positive or improved
outcomes.
Reflective practice: to inform practices of communities and
individual programs on an ongoing basis.
Date: 17 November 2010
The Proper Use of Social Indicators
Social indicators can be helpful tools for policy makers, practitioners,
and the public, but using them correctly requires attention to a
number of issues:
Social indicators need to be measured for the appropriate
population. For example, if a policy focuses on services for
low-income children, then the outcomes should be measured for
low-income children, not for middle-class children or all children.
Social indicators need to be measured at the appropriate
geographic level. Looking just at trends on the national level
may obscure how a policy is affecting individuals in their own
states and home communities.
Social indicators need to be well conceptualised. That is, social
indicators need to accurately reflect the concept that they are
intended to capture.
QUESTIONS AND ANSWERS