Issues in Interpreting Correlations:
Correlation does not imply causality:
X might cause Y.
Y might cause X.
Z (or a whole set of factors) might cause both X and Y.
Factors affecting the size of a correlation
coefficient:
1. Sample size and random variation:
The larger the sample, the more stable the correlation
coefficient. Correlations obtained with small samples
are unreliable.
Limits within which 80% of sample r's will fall,
when the true (population) correlation is 0:

Sample size     80% limits for r
  5             -0.69 to +0.69
 15             -0.35 to +0.35
 25             -0.26 to +0.26
 50             -0.18 to +0.18
100             -0.13 to +0.13
200             -0.09 to +0.09
Conclusion: you need a large sample before you can
be really sure that your sample r is an accurate
reflection of the population r.
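This sample-size effect is easy to check by simulation. The sketch below is not from the original slides; it assumes normally distributed, uncorrelated scores and uses numpy. It draws many samples of each size and reports the limits containing the middle 80% of the sample r's, approximately reproducing the table above.

```python
# Minimal simulation sketch: 80% limits for sample r when the true
# (population) correlation is 0. Sample sizes match the table above.
import numpy as np

rng = np.random.default_rng(0)

for n in [5, 15, 25, 50, 100, 200]:
    rs = [np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1]
          for _ in range(10_000)]
    lo, hi = np.percentile(rs, [10, 90])   # middle 80% of the sample r's
    print(f"n = {n:3d}: 80% limits roughly {lo:+.2f} to {hi:+.2f}")
```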
2. Linearity of the relationship:
Pearson’s r measures the strength of the linear
relationship between two variables; r will be misleading
if there is a strong but non-linear relationship, e.g. a
perfect U-shaped (curvilinear) relationship.
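A minimal sketch of this point (a hypothetical example, constructed so that the relationship is perfectly U-shaped):

```python
# Hypothetical example: a perfect quadratic relationship yields a
# Pearson's r near zero, because r captures only the linear component.
import numpy as np

x = np.linspace(-3, 3, 101)
y = x ** 2                      # strong, but entirely non-linear
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")           # approximately 0 despite perfect dependence
```

Here y is completely determined by x, yet r is about 0, because the positive and negative slopes of the curve cancel out.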
3. Range of talent (variability):
The smaller the amount of variability in X and/or Y, the
lower the apparent correlation. e.g. a narrow slice of the
data may show no linear trend if viewed in isolation, even
when there is a strong linear relationship overall.
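A simulation sketch of this restriction-of-range effect (the slope, noise level, and cut-offs below are assumed purely for illustration):

```python
# Simulation sketch: r over the full range vs. over a narrow slice of X.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 0.8 * x + 0.6 * rng.normal(size=5_000)      # built-in linear trend

r_full = np.corrcoef(x, y)[0, 1]
keep = (x > -0.25) & (x < 0.25)                 # restrict the range of X
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]
print(f"full range: r = {r_full:.2f}")          # about .80
print(f"restricted: r = {r_restricted:.2f}")    # much lower
```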
4. Homoscedasticity (equal variability):
r describes the average strength of the relationship
between X and Y. Hence scores should have a constant
amount of variability at all points in their distribution.
[Figure: scatterplot about a regression line; in one region Y is
highly variable about the line (large Y - Y' deviations), while in
another region Y varies little (small Y - Y' deviations).]
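A rough simulation of heteroscedastic data (all parameters assumed for illustration): the scatter of Y about the line grows with X, so a single r can only describe the average strength of the relationship.

```python
# Simulation sketch: Y's variability about the true line grows with X.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 5_000)
y = x + rng.normal(size=5_000) * (0.2 + 0.4 * x)   # noise grows with X

resid = y - x                                      # deviations from the true line
print(f"residual SD for X < 5:  {resid[x < 5].std():.2f}")   # small
print(f"residual SD for X >= 5: {resid[x >= 5].std():.2f}")  # much larger
print(f"overall r = {np.corrcoef(x, y)[0, 1]:.2f}")
```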
5. Effect of discontinuous distributions:
A few outliers can distort things considerably: even when there is
no real correlation between X and Y, a handful of extreme points
can produce a large r.
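A hypothetical illustration of this (the outlier positions are invented): fifty genuinely uncorrelated points give r near zero, but three distant points inflate it dramatically.

```python
# Hypothetical illustration: a few outliers manufacture a large r.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = rng.normal(size=50)                 # no real relationship
print(f"without outliers: r = {np.corrcoef(x, y)[0, 1]:.2f}")

x_out = np.append(x, [8.0, 9.0, 10.0])  # three extreme points
y_out = np.append(y, [8.0, 9.0, 10.0])
print(f"with 3 outliers:  r = {np.corrcoef(x_out, y_out)[0, 1]:.2f}")
```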
Deciding what is a "good" correlation:
A moderate correlation could be due to either
(a) sampling variation (and hence a "fluke"); or
(b) a genuine association between the variables
concerned.
How can we tell which of these is correct?
Distribution of r's obtained using samples drawn
from two uncorrelated populations of scores:
[Figure: the sampling distribution of r, centred on r = 0. Small
correlations are likely to occur by chance; large negative or large
positive correlations are unlikely to occur by chance.]
For an N of 20, 5 out of 100 random samples are likely to produce
an r of 0.44 or larger in absolute value, merely by chance (i.e.,
even though in the population there is no correlation at all!).
[Figure: the same distribution for N = 20, with 0.025 of its area
below r = -0.44 and 0.025 above r = +0.44.]
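This tail behaviour can be checked with a quick simulation (a sketch assuming normally distributed scores): draw many samples of 20 from two uncorrelated populations and count how often |r| reaches 0.44.

```python
# Simulation sketch: sampling distribution of r for N = 20 under a true
# population correlation of 0. About 5% of samples should give |r| >= 0.44.
import numpy as np

rng = np.random.default_rng(4)
rs = [np.corrcoef(rng.normal(size=20), rng.normal(size=20))[0, 1]
      for _ in range(20_000)]
prop = np.mean(np.abs(rs) >= 0.44)
print(f"proportion of samples with |r| >= 0.44: {prop:.3f}")   # roughly 0.05
```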
Thus we arbitrarily decide that:
(a) If our sample correlation is so large that it would
occur by chance only 5 times in a hundred, we will
assume that it reflects a genuine correlation in the
population from which the sample came.
(b) If a correlation like ours is likely to occur by
chance more often than this, we assume it has arisen
merely by chance, and that it is not evidence for a
correlation in the parent population.
How do we know how likely it is to obtain a sample
correlation as large as ours by chance?
Tables (in the back of many statistics books and on
my website) give this information for different
sample sizes.
An illustration of how to use these tables:
Suppose we take a sample of 20 people, and
measure their eye-separation and back hairiness.
Our sample r is .75. Does this reflect a true
correlation between eye-separation and hairiness
in the parent population, or has our r arisen merely
by chance (i.e. because we have a freaky sample)?
Step 1:
Calculate the "degrees of freedom" (df = the
number of pairs of scores, minus 2).
Here, we have 20 pairs of scores, so df = 18.
Step 2:
Find a table of "critical values for Pearson's r".
Part of a table of "critical values for Pearson's r":

        Level of significance (two-tailed)
df      .05       .01       .001
17      .4555     .5751     .6932
18      .4438     .5614     .6787
19      .4329     .5487     .6652
20      .4227     .5368     .6524
With 18 df, a correlation of .4438 or larger (in absolute value) will occur by
chance with a probability of 0.05: i.e., if we took 100
samples of 20 people, about 5 of those samples are
likely to produce an r of .4438 or larger (even though
there is actually no correlation in the population!)
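If no printed table is to hand, these critical values can be reproduced from Student's t-distribution, using the standard relation r_crit = t_crit / sqrt(df + t_crit^2). A sketch (assuming scipy is available):

```python
# Sketch: recompute the table of critical values for Pearson's r.
from math import sqrt
from scipy import stats

print("df     .05      .01      .001")
for df in [17, 18, 19, 20]:
    row = []
    for alpha in [0.05, 0.01, 0.001]:
        t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed cut-off
        row.append(t_crit / sqrt(df + t_crit ** 2))  # convert t to r
    print(f"{df}   " + "   ".join(f"{r:.4f}" for r in row))
```

Running this reproduces the tabled values above (e.g. .4438 for df = 18 at the .05 level).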
With 18 df, a correlation of .5614 or larger (in absolute value) will occur by
chance with a probability of 0.01: i.e., if we took 100
samples of 20 people, about 1 of those 100 samples is
likely to give an r of .5614 or larger (again, even though
there is actually no correlation in the population!)
With 18 df, a correlation of .6787 or larger (in absolute value) will occur by
chance with a probability of 0.001: i.e., if we took 1000
samples of 20 people, about 1 of those 1000 samples is
likely to give an r of .6787 or larger (again, even though
there is actually no correlation in the population!)
The table shows that an r of .6787 is likely to occur by
chance only once in a thousand times.
Our obtained r is .75. This is larger than .6787.
Hence our obtained r of .75 is likely to occur by chance
less than one time in a thousand (p<0.001).
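The same r-to-t conversion gives an exact p-value for the worked example, rather than just the bound from the table (a sketch assuming scipy):

```python
# Sketch: two-tailed p-value for the observed r = .75 with n = 20.
from math import sqrt
from scipy import stats

r, n = 0.75, 20
df = n - 2
t = r * sqrt(df / (1 - r ** 2))         # convert r to a t statistic
p = 2 * stats.t.sf(t, df)               # two-tailed p-value
print(f"t = {t:.2f}, p = {p:.5f}")      # p is well below 0.001
```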
Conclusion:
Any sample correlation could in principle occur due to
chance or because it reflects a true relationship in the
population from which the sample was taken.
Because our r of .75 is so unlikely to occur by chance,
we can safely assume that there really is a relationship
between eye-separation and back-hairiness.
Important point:
Do not confuse statistical significance with practical
importance.
We have just assessed "statistical significance" - the
likelihood that our obtained correlation has arisen merely
by chance. Our r of .75 is "highly significant" (i.e., highly
unlikely to have arisen by chance).
However, a weak correlation can be statistically
significant, if the sample size is large enough...
e.g. with 100 df, an r of .1946 is "significant" in the
sense that it is unlikely to have arisen by chance (r's
bigger than this will occur by chance only 5 times in 100).
The coefficient of determination (r²) shows that this is
not a strong relationship in a practical sense:
knowledge of one of the variables would account for
only 3.79% of the variance in the other - completely
useless for predictive purposes!
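The arithmetic behind that figure, as a quick check:

```python
# Coefficient of determination for the example above.
r = 0.1946
print(f"r squared = {r ** 2:.4f}, i.e. {r ** 2 * 100:.2f}% of variance accounted for")
```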