Class 12 (Lecture 9) Measures of Association
Class 9
Where are we going from here?
Introduction to Measures of
Association and Multivariate
Analysis
The road map
1. Familiarize you with the terms and
concepts of measuring association
and testing causality.
2. You will carry out and interpret a
basic Ordinary Least Squares
regression equation.
3. You will cement what you learned in
the first part of this class by
completing your presentation and
project.
Going Back to Central Tendency
and the distribution of data
• Measures of Central Tendency
– mean, median, and mode
• Skew (lack of symmetry)
• Kurtosis (peakedness)
• Outliers
• Variability
• Normal distribution
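As a quick illustration (not part of the original slides), the Python sketch below computes several of these descriptive measures for a small made-up sample; the data values and the use of the scipy library are assumptions for demonstration only.

```python
# Hypothetical example: descriptive measures for a small made-up sample.
import statistics
from scipy import stats  # assumed available; used for skew and kurtosis

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 12]   # 12 acts as an outlier

print("mean:    ", statistics.mean(data))
print("median:  ", statistics.median(data))
print("mode:    ", statistics.mode(data))
print("skew:    ", stats.skew(data))      # positive: the outlier pulls the tail right
print("kurtosis:", stats.kurtosis(data))  # excess kurtosis relative to the normal curve
```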
Normal Distribution
• Normal distributions are symmetric with scores
more concentrated in the middle than in the tails.
They are defined by two parameters: the mean (μ)
and the standard deviation (σ). Many kinds of
behavioral data are approximated well by the
normal distribution. Many statistical tests assume
a normal distribution. Most of these tests work well
even if the distribution is only approximately
normal and in many cases as long as it does not
deviate greatly from normality.
• Source: http://davidmlane.com/hyperstat/normal_distribution.html
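As a rough sketch (not from the slides), the following Python snippet draws a large sample from a normal distribution with an arbitrary mean and standard deviation and confirms that the sample statistics recover those two defining parameters.

```python
# Hypothetical example: the normal distribution is defined by mu and sigma.
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 100.0, 15.0                       # arbitrary illustrative parameters
sample = rng.normal(mu, sigma, size=100_000)

print("sample mean:", sample.mean())          # close to 100
print("sample sd:  ", sample.std())           # close to 15
# Roughly 68% of a normal distribution lies within one sd of the mean.
print("share within 1 sd:", np.mean(np.abs(sample - mu) <= sigma))
```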
Measures of Dispersion
• Range - Sensitive to outliers
– Largest observation minus the smallest
• IQR (interquartile range) - Controls for outliers.
– Distance from the top of the lower quartile (the 25th percentile) to the bottom of the upper quartile (the 75th percentile) of a distribution.
• Standard Deviation and Variance
[Figure: Chart 5 — Effective average tax rates on FDI between OECD countries, 1996 and 2001. Source: Statistics Canada, www.statcan.ca/english/edu/power/ch12/plots.htm]
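To make the contrast concrete, here is a small Python sketch (not from the slides) computing the range and the IQR on a made-up sample; note how the single outlier inflates the range while barely moving the IQR.

```python
# Hypothetical example: range vs. IQR on a made-up sample with one outlier.
import numpy as np

data = np.array([4, 5, 5, 6, 7, 7, 8, 9, 10, 95])   # 95 is the outlier

data_range = data.max() - data.min()        # sensitive to the outlier
q1, q3 = np.percentile(data, [25, 75])      # quartile boundaries
iqr = q3 - q1                               # robust to the outlier

print("range:", data_range)                 # 91, dominated by the outlier
print("IQR:  ", iqr)
```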
Standard Deviation
• A measure of the “typical” distance
from the mean.
• It is the square root of the variance
(somewhat circularly, the variance is
the standard deviation squared)
Standard Deviation
• Variance:
  S² = Σ(Xᵢ − X̄)² / N
• So the standard deviation is:
  S = √[ Σ(Xᵢ − X̄)² / N ]
How to “follow” the formula
1. Subtract the mean from the value of each
individual observation to get “deviations”
from the mean.
2. Square each deviation.
3. Sum all the squared deviations.
4. Divide by the number of observations
   • Samples have slightly different rules (N-1)
   • This is the variance.
5. Take the square root and this is the
standard deviation.
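A short Python sketch (not from the lecture) that walks through the five steps above on made-up observations:

```python
# Hypothetical example: variance and standard deviation, step by step.
import math

data = [4, 8, 6, 5, 3, 7]                     # made-up observations
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]         # step 1: deviations from the mean
squared = [d ** 2 for d in deviations]        # step 2: square each deviation
total = sum(squared)                          # step 3: sum the squared deviations
variance = total / n                          # step 4: divide by N (use N - 1 for a sample)
std_dev = math.sqrt(variance)                 # step 5: square root gives S

print("variance:", variance, "standard deviation:", std_dev)
```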
How to interpret “S”
• If the means of two variables within a population are close but their standard deviations differ, then the one with the larger standard deviation is more dispersed.
• There are other deviation and
dispersion measures but Standard
Deviation has a special relationship to
the normal curve.
Standard Deviation and the
Normal Distribution
Normalizing Using St. Dev.
• Z-scores (an Index using St. Dev)
Z = (Xᵢ − X̄) / S
• Z = the number of standard deviation units an observation lies from the mean (its z-score)
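A minimal Python sketch (not from the lecture) applying the formula above to standardize a made-up set of scores into z-scores:

```python
# Hypothetical example: converting raw scores to z-scores.
import numpy as np

scores = np.array([55.0, 62.0, 70.0, 48.0, 85.0])   # made-up raw scores
mean = scores.mean()
s = scores.std()              # population standard deviation (divides by N)

z = (scores - mean) / s       # Z = (Xi - X-bar) / S
print(z)                      # distance of each score from the mean, in sd units
print(z.mean(), z.std())      # approximately 0 and 1 after standardizing
```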
Univariate, Bivariate and
Multivariate
Univariate – (Analysis of) one variable’s distribution.
– Measures of central tendency, freq. dist., st.dev, etc.
Bivariate and Multivariate – (Analysis of) two or more
variables.
• What is the relationship between variables?
– Are they statistically different from one another?
– Do they change together?
– In the same direction?
– Does one precede the other?
– Does one cause the other?
• Two questions to ask about the relationship:
1. What is the probability that a particular finding happened by chance? - Statistical significance
2. How strong is the relationship between two (or more)
variables? - Measures of association, correlation
coefficient, regression coefficient etc.
Measures of Association
• Chi-square
• T-test
• Correlation Coefficient
– Also known as Pearson's r, r, or the zero-order correlation coefficient.
– Varies from +1.0, indicating the two variables move perfectly together in the same direction, to -1.0, indicating they move perfectly in opposite directions.
– 0.00 indicates no linear relationship.
– For the social sciences, .4 to .6 is considered sufficient, though values as low as .3 may be worth looking at.
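A small Python sketch (not from the lecture) computing Pearson's r for two made-up variables with the standard formula, with numpy's built-in corrcoef as a check:

```python
# Hypothetical example: Pearson's correlation coefficient.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # made-up independent variable
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 7.0])    # made-up dependent variable

# r = covariance of x and y divided by the product of their standard deviations
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
print("r:", r)                               # close to +1: strong positive relationship
print("check:", np.corrcoef(x, y)[0, 1])     # numpy's built-in version agrees
```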
Statistical Significance
• Research and null hypotheses
  – The hypothesis states the relationship between two variables.
  – The null hypothesis states that there is NO (or a random) relationship between the two variables.
    • H: Democracies trade more with each other than with non-democracies.
    • H0: Status as a democracy is not related to trade volume.
  – You are testing to reject H0, not to accept H.
Types of Error
                               State of Nature
Decision based on sample       H0 true                        H0 untrue
Reject H0                      Type 1 error (false alarm)     Correct
Do not reject H0               Correct                        Type 2 error
Alpha level
• α = .05 means accepting a 5% chance of committing a Type 1 error, i.e. of rejecting the null hypothesis when it is actually true; put another way, the null is rejected only at the 95% confidence level.
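To tie alpha, the null hypothesis, and the rejection decision together, here is a hedged Python sketch (not from the lecture). The trade-volume figures are fabricated purely for illustration, and scipy's two-sample t-test stands in for whatever test a real analysis would use.

```python
# Hypothetical example: deciding whether to reject H0 at alpha = .05.
from scipy import stats

democracies     = [120, 135, 150, 142, 128, 160, 155, 138]   # made-up trade volumes
non_democracies = [110, 118, 125, 105, 130, 122, 115, 119]   # made-up trade volumes

t_stat, p_value = stats.ttest_ind(democracies, non_democracies)

alpha = 0.05
print("t =", round(float(t_stat), 2), "p =", round(float(p_value), 4))
if p_value < alpha:
    print("Reject H0: the difference is statistically significant at the .05 level.")
else:
    print("Do not reject H0.")
```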
Causality
• In establishing causality there is a
dependent variable, which you are
trying to explain, and one or more
independent variables that are
assumed to be factors in the variation
of the dependent variable.
• You need a logical model to “explain”
this relationship or causality
Thinking in Models (again)
• What is a model?
– Explains which elements relate to each
other and how.
– Describing Relationships in a model
• Covariation – the variables change together
– Direct or Positive
– Inverse or Negative
– Nonlinear
• False or spurious
– Control (confounding) variables
• Are you looking for the best model or
testing someone else’s?
Developing models
• Where does a model come from?
– From your own assessment and observation
of the problem, or from talking to others.
– From the literature.
• Elements others include or consider important
• Definitions of these elements
• Descriptions of the “expected” relationships
among variables
• Results and explanations
• Sources and strategies for data
• Suggestions of models or variations to be tested
in the future
Types of Models
1. Schematic
   [Diagram: Capital and Labor linked to Econ Growth]
2. Symbolic
a) Economic growth is a function of
changes to the amount of capital (K)
and changes to the amount of Labor (L).
b) G=f(K,L)
The basic linear model
(equation)
You can express many relationships as the linear
equation:
y = a + bx, where
• y is the dependent variable
• x is the independent variable
• a is a constant
• b is the slope of the line
• For every increase of 1 in x, y changes by an amount equal to b
A perfectly linear relationship is one where each one-unit change in x produces exactly the same change in y, e.g. a strict ad valorem tariff.
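Since the road map points toward an Ordinary Least Squares regression, here is a minimal Python sketch (not from the lecture) estimating a and b in y = a + bx from made-up data using the standard least-squares formulas:

```python
# Hypothetical example: fitting y = a + bx by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up independent variable
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])    # made-up dependent variable

# OLS estimates: b = cov(x, y) / var(x), a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print("a (constant):", a)
print("b (slope):   ", b)                  # a 1-unit increase in x changes y by about b
print("check:", np.polyfit(x, y, 1))       # numpy returns [slope, intercept]
```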