Introduction to elementary quantitative concepts and methods
Guest lecture
Carl Henrik Knutsen, 14/5-2008
Motivation
• Social sciences, and science in general: We are generally interested in:
– “How” questions
– “Why” questions.
• Social scientists seek descriptions of empirical phenomena and try to come up
with causal explanations. Both quantitative and qualitative methodology try to
respond to such questions.
• The nature of the problem question is important for the choice of methodology,
even if, in the real world of social science, researchers often choose a method
according to their knowledge and “taste”.
• Knowledge of different methodologies allows researchers and students to fit
the methodology to the problem question, which improves the analysis.
• Triangulation can often be a good idea: the use of different methodologies to
illuminate a problem in a more comprehensive fashion.
• Knowledge of elementary quantitative methods enables you to read
different types of research.
Causality and the control problem
• Independent of choice of methodology
• Theory and clever design needed
• Three causal structures that might lead to
correlation:
– X → Y (X causes Y)
– X ← Y (Y causes X)
– X ← Z → Y (a third factor, Z, causes both X and Y)
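A short simulation can illustrate the case where a third factor Z lies behind both X and Y. This is only a sketch under invented assumptions: the variables are standard normal draws, the seed and sample size are arbitrary, and `pearson_r` is a helper written for this example. X and Y never influence each other here, yet they end up clearly correlated.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Z drives both X and Y; X has no effect on Y and Y has no effect on X.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(f"correlation between X and Y: {r_xy:.2f}")
```

With this setup the correlation should come out near 0.5, which is why theory and design are needed to decide which causal structure produced it.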
Generalization
• The big advantage of quantitative methods
• Provides stringent criteria for when we can be
relatively certain that our generalizations hold
true and are not driven by coincidences.
• Remember that in the social sciences, we do
not face deterministic relationships between
factors. Quantitative methods take into account
the stochastic structure of social life.
Data
• There exists a vast number of data sources
constructed by different agencies or researchers:
You do not need to construct your own data for
many purposes. But: Know the data you use in
order to avoid the pitfalls.
• Sources on the web: World Development
Indicators, Penn World Table, Worldwide Governance
Indicators, Polity, Freedom House, OECD,
UNESCO, UNCTAD, etc.
Descriptive statistics
• Descriptive vs inferential statistics
• Descriptive statistics: Draw out
comprehensible information about the
structure of your data
• 1) Central tendencies, 2) variation, 3)
correlation
Central tendency of variable
• Mean
• Median
• Mode
Variation
• Range
• Variance (S^2 = (Σ(X-M)^2)/(N-1))
• Standard deviation
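The measures of central tendency and variation listed above can be computed with Python's standard `statistics` module. A minimal sketch; the income figures are made up for illustration.

```python
import statistics

# Hypothetical incomes for five respondents (invented for this example).
incomes = [100, 150, 150, 250, 300]

# Central tendency
mean = statistics.mean(incomes)      # arithmetic average
median = statistics.median(incomes)  # middle value of the sorted data
mode = statistics.mode(incomes)      # most frequent value

# Variation
value_range = max(incomes) - min(incomes)
variance = statistics.variance(incomes)  # sample variance, divides by N-1
std_dev = statistics.stdev(incomes)      # square root of the sample variance

print(mean, median, mode)
print(value_range, variance, round(std_dev, 2))
```

Note that the mean (190) lies above the median (150) here because the two high incomes pull it upwards.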
Correlation
• Covariance: cov(xy) = (Σ(X-Xm)(Y-Ym))/(N-1)
• Correlation coefficients
• Pearson’s r = cov(xy)/(S(x)*S(y)): Always
between -1 and 1. NB: Gives only degree of
linear relationship.
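The covariance and Pearson's r formulas can be written out directly. This is a sketch with invented data; the function names are chosen for this example only. The second data set shows why r captures only linear relationships: Y is fully determined by X, but the relationship is curved.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: sum of (X-Xm)(Y-Ym), divided by N-1."""
    xm, ym = mean(xs), mean(ys)
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std_dev(xs):
    """Sample standard deviation (the covariance of X with itself is S^2)."""
    return covariance(xs, xs) ** 0.5


def pearson_r(xs, ys):
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]  # exact linear relationship
y_curved = [v ** 2 for v in x]     # exact, but non-linear, relationship

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

The linear case gives r of (approximately) 1, while the curved case gives r of 0 even though X determines Y completely.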
Presentation of data
• Tables
• Histogram
• Bar and pie charts
• Scatter plots
• Important to think about the reader:
Comprehensible and informative. Need to
strike a balance on the amount of information
presented in a chart. Label charts.
Table
Mean income (N)
                        Male          Female
No higher education     150 (2000)    100 (2500)
University              300 (1000)    250 (700)
Scatter plot
[Figure: scatter plot of countries with labeled points (for example Denmark, Botswana, India, China, Burma). Both axes run from 0 to 100; one axis is labeled "Inverted and normalized FHI".]
Inferential statistics
• The aim is solid inference from an observed
sample to a larger (unobserved) universe.
Generalization about populations or about
effects.
• For effects: Can we say that trajectories we
observe are due to “real” effects or are they
likely only a product of chance?
• Law of large numbers...
– Population, samples,
– Estimates and underlying mean.
• Random selection? Selection bias ALWAYS a
possibility.
• Sampling techniques:
– Experiment
– Random draws
– Stratification
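The law of large numbers mentioned above can be illustrated with random draws from a known population. A minimal sketch: the population, seed and sample sizes are invented, and `sample_mean` is a helper defined for this example.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 values drawn once, then treated as fixed.
population = [rng.uniform(0, 100) for _ in range(100_000)]
population_mean = sum(population) / len(population)


def sample_mean(sample_size):
    """Mean of a simple random sample drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Compare the estimate with the underlying mean for growing sample sizes.
errors = {}
for size in (10, 1_000, 50_000):
    errors[size] = abs(sample_mean(size) - population_mean)
    print(f"n = {size:>6}: estimate misses the population mean by {errors[size]:.3f}")
```

Larger samples tend to give estimates closer to the underlying mean, although any single small sample can land close by chance. The sketch assumes truly random draws; it says nothing about selection bias.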
Hypothesis test
• Democracy and economic growth as example.
– H0: Democracy has no effect on growth
– Halt: Democracy has an effect on growth
• In general H0 is often a hypothesis which claims that there
is no effect. We often want to investigate whether we can
with relative certainty claim that Halt is valid.
• Burden of proof is on the alternative hypothesis.
Conservative bias: we have to have relatively strong results
to claim a relationship is not due to pure chance.
• The central limit theorem underlies the test. How do we know the
distribution given H0? Use the distribution implied by H0 to find out
what one is likely to arrive at by pure chance: the normal
distribution.
Central limit theorem
• “The central limit theorem is one of the most
remarkable results of the theory of probability. In its
simplest form, the theorem states that the sum of a
large number of independent observations from the
same distribution has, under certain general
conditions, an approximate normal distribution.
Moreover, the approximation steadily improves as the
number of observations increases. The theorem is
considered the heart of probability theory, although a
better name would be normal convergence theorem.”
http://davidmlane.com/hyperstat/normal_distribution.html (Berrie Zielman)
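The theorem can be checked informally by simulation. In this sketch, each observation is a uniform draw on (0, 1), which is far from normal; the number of draws, the number of repetitions and the seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable


def sum_of_draws(n_draws):
    """Sum of n_draws independent observations from a uniform(0, 1) distribution."""
    return sum(rng.random() for _ in range(n_draws))


n_draws = 50
sums = [sum_of_draws(n_draws) for _ in range(20_000)]

center = statistics.mean(sums)
spread = statistics.stdev(sums)

# For a normal distribution, about 68% of values lie within one standard
# deviation of the mean and about 95% within two.
within_1 = sum(abs(s - center) <= spread for s in sums) / len(sums)
within_2 = sum(abs(s - center) <= 2 * spread for s in sums) / len(sums)
print(f"within 1 sd: {within_1:.3f}, within 2 sd: {within_2:.3f}")
```

The shares should land close to the normal benchmarks of 68% and 95%, even though the individual draws are uniformly distributed.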
Significance levels and p-values
• Significance level: If we take H0 as true, we
want a critical level beyond which results are
unlikely to occur. For example 5%: there is only a
5% probability of seeing a relationship this
strong if H0 is true. Important to have a large
sample.
• P-value: The lowest significance level that will
give rejection of H0. If H0 is true: What is the
probability that we will see a result at least this extreme?
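As a worked illustration, a p-value can be computed for a simple test of whether a mean differs from zero. This is a sketch under stated assumptions: a two-sided test, a normal approximation justified by the central limit theorem, and invented numbers. The function names are chosen for this example.

```python
import math


def z_statistic(sample_mean, h0_mean, sample_sd, n):
    """Distance between the estimate and the H0 value, in standard errors."""
    return (sample_mean - h0_mean) / (sample_sd / math.sqrt(n))


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z: the probability of a result
    at least this extreme if H0 is true."""
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.05  # significance level

# Hypothetical numbers: estimated mean effect 0.8, standard deviation 4.0.
z_large = z_statistic(sample_mean=0.8, h0_mean=0.0, sample_sd=4.0, n=100)
p_large = two_sided_p_value(z_large)

# Same estimate and spread, but only 25 observations.
z_small = z_statistic(sample_mean=0.8, h0_mean=0.0, sample_sd=4.0, n=25)
p_small = two_sided_p_value(z_small)

print(f"n = 100: z = {z_large:.2f}, p = {p_large:.4f}, reject H0: {p_large < alpha}")
print(f"n =  25: z = {z_small:.2f}, p = {p_small:.4f}, reject H0: {p_small < alpha}")
```

With 100 observations the p-value falls just below 5% and H0 is rejected; with 25 observations the same estimate is not significant, which is one reason sample size matters.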
Models
• Stockburger: “A model is a representation
containing the essential structure of some
object or event in the real world.”
– 1. Models are necessarily incomplete
– (2. The model may be changed or manipulated
with relative ease.)
Regression analysis
• How to fit a straight line through a scatterplot!
• Best fit: one criterion is to minimize the sum of squared
residuals: Ordinary Least Squares (OLS)
• Bivariate regression equation: Y = a + bX + ε
• Regression analysis recognizes that the world is not
deterministic. The role of the error term: ε. Large error
terms in general imply large uncertainty
• Interpretation of a: Mean value of Y when X is equal to
zero. Often no substantive interpretation. Not so
interesting
• Interpretation of b: Increase in mean of Y when X
increases by one unit. Effect of X on Y?
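For the bivariate case, the OLS line has a closed-form solution: b is the sum of cross-deviations divided by the sum of squared deviations of X, and a follows from the two means. A minimal sketch with invented data; `ols_fit` is a name chosen for this example.

```python
def ols_fit(xs, ys):
    """Return (a, b) that minimize the sum of squared residuals of Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx             # slope: change in mean Y per one-unit increase in X
    a = y_mean - b * x_mean   # intercept: mean Y when X equals zero
    return a, b


# Hypothetical observations scattered around a straight line.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = ols_fit(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # estimates of the error term
print(f"a = {a:.2f}, b = {b:.2f}")
```

For these numbers the fit is a = 1.06 and b = 1.97, and the residuals sum to zero, as they always do when an intercept is included.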
Assumptions about the distribution of the
error term when using OLS:
• Homoskedastic
• No autocorrelation
• Normally distributed
Multivariate regression
• Y = a + b1X1 + b2X2 + b3X3 + ε
• New interpretation of b: The mean increase in Y when the
relevant X increases by one unit, given that all other
variables are held constant.
• R-squared: How much of the variation in Y is
“explained by the model” (A very imprecise
interpretation). Goes from 0 to 1.
• “Control variables”
• Extensions of regression analysis: Generalized Least
Squares, Systems of equations, Instrumental Variables,
Logit and Probit models and many more.
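With several X-variables, the OLS coefficients can be obtained by solving the normal equations (X'X)β = X'y. The sketch below does this in plain Python for two regressors and also computes R-squared. The data are invented and generated without any error term, so the fit is exact; the helper names are chosen for this example, and in practice a statistics package would normally be used.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * beta = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    rows = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (rows[i][n] - known) / rows[i][i]
    return beta


def ols_multivariate(x_rows, ys):
    """Return [a, b1, b2, ...] from the normal equations (X'X) beta = X'y."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 for the intercept a
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(x_rows, ys, beta):
    """1 - (sum of squared residuals) / (total sum of squares of Y)."""
    y_mean = sum(ys) / len(ys)
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in x_rows]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data generated exactly from Y = 1 + 2*X1 - 3*X2 (no error term).
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in x_rows]

beta = ols_multivariate(x_rows, ys)
r2 = r_squared(x_rows, ys, beta)
print([round(b, 6) for b in beta], round(r2, 6))
```

The recovered coefficients match the generating equation and R-squared is 1; with a non-zero error term, R-squared would fall below 1.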
Extensions
• Dummy variable
• Squared X
• Logarithmic specifications
• Splitting the sample
Problems
• 1) “Simultaneity bias”: Reverse causation.
Exogeneity vs endogeneity of X-variables.
• 2) “Omitted variable bias”
• 3) Measurement error.
– Reliability. Where does the data come from? GDP
in developing countries.
– Validity (TFP and technological change)
• Operationalization of variables: They have to be
observable, quantifiable and measurable.