Chapter 16: Inference about correlation

[Figure: Hypothesis test flow chart. Start from the measurement scale (correlation, means, or frequency data); decision nodes ask the number of correlations, means, variables, or factors, whether samples are independent, and whether σ is known. Endpoints: Test H0: ρ = 0 (17.2, Table G); Test H0: ρ1 = ρ2 (17.4, Tables H and A); z-test (13.1, Table A); t-test (13.14, Table D); Test H0: μ1 = μ2 (15.6, Table D); Test H0: D = 0 (16.4, Table D); 1-way ANOVA (Ch 20, Table E); 2-way ANOVA (Ch 21, Table E); basic χ2 test (19.5, Table I); χ2 test for independence (19.9, Table I).]
Chapter 16: Inference about Correlation Coefficients
We will make inferences about population correlations in two ways:
1) To test if a population has a correlation different from zero.
2) To test if two correlations are different from each other.
We use the letter ‘r’ for sample correlations.
We use the Greek symbol ρ (rho) for population correlations.
1) Testing the null hypothesis that a single correlation is equal to zero.
The sampling distribution for r is very complicated and can be strongly skewed.
However, for testing whether a sample is being drawn from a population with zero
correlation, Ronald Fisher figured out an easy way to convert a sample correlation
from r to t, so we can run a standard t-test:
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
We use df = n - 2, where n is the number of pairs in the correlation.
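To see the conversion in action, here's a minimal Python sketch (standard library only; `r_to_t` is just a hypothetical helper name, not something from the book or the Excel calculator) that turns the heights example's r = .35 with n = 75 pairs into a t statistic with df = n - 2:

```python
import math

def r_to_t(r, n):
    """Convert a sample correlation r based on n pairs into a t statistic (df = n - 2)."""
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

# Heights example from these notes: r = .35 with n = 75 pairs
t, df = r_to_t(0.35, 75)
print(f"t({df}) = {t:.2f}")
```

That t would then be compared with a critical t from Table D at df = 73; Table G saves you that step by listing critical values of r directly.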
The book makes it even easier by giving us Table G, which provides critical values of r for given values of α and df. We don’t have to convert from r to t; we just look up the critical value of r in Table G.
I’ve provided a version of Table G and its corresponding calculator in Excel for you.
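Critical values of r like those in Table G can be reproduced by solving the t formula for r, which gives r_crit = t_crit / √(t_crit² + df). The Python sketch below uses only the standard library: it approximates the t distribution's tail area with Simpson's rule and finds t_crit by bisection. The helper names and the 2,000-step / 50-iteration settings are my own assumptions, not how the book built its table; if SciPy is available, `scipy.stats.t.ppf` gives t_crit directly.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area of the t density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3

def t_critical(alpha, df, tails=1):
    """Critical t leaving alpha / tails in the upper tail, found by bisection."""
    target = alpha / tails
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # tail area still too big: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2

def r_critical(alpha, df, tails=1):
    """Critical r obtained from critical t via r = t / sqrt(t**2 + df)."""
    t = t_critical(alpha, df, tails)
    return t / math.sqrt(t ** 2 + df)

print(f"{r_critical(0.01, 73, tails=1):.3f}")  # heights example: one-tailed, alpha = .01
print(f"{r_critical(0.05, 92, tails=2):.3f}")  # video-game example: two-tailed, alpha = .05
```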
Example: We found that the heights of the 75 women in our class correlate with the heights of their mothers with a value of r = 0.35. If we consider this a random sample of the population, does this suggest that there is a positive correlation between the heights of women and their mothers? (Use α = .01.)
Answer: We will be testing the null hypothesis using a one-tailed test:
H0: ρ = 0
HA: ρ > 0
All we have to do is look up the critical value of r for df = 75 - 2 = 73 and α = .01 for a one-tailed test in Table G:
Our critical value of r is 0.268.
Our observed value of r (.35) is greater than this critical value (.268), so we reject H0 and conclude that there is a very low probability of observing a correlation this high by chance if the null hypothesis were true. If we assume that the women in this class were randomly sampled from the general population, then we conclude that a significant positive correlation exists in the population.
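The decision rule can be written as a small function. This is a sketch with a hypothetical helper `decide` that takes the critical r you looked up in Table G; the `alternative` labels are my own naming, not the book's:

```python
def decide(r_obs, r_crit, alternative="two-sided"):
    """Return True if H0: rho = 0 is rejected, given a tabled critical r (e.g., from Table G).

    alternative: "greater" (HA: rho > 0), "less" (HA: rho < 0), or "two-sided" (HA: rho != 0).
    """
    if alternative == "greater":
        return r_obs >= r_crit        # reject only in the upper tail
    if alternative == "less":
        return r_obs <= -r_crit       # reject only in the lower tail
    if alternative == "two-sided":
        return abs(r_obs) >= r_crit   # reject in either tail
    raise ValueError(f"unknown alternative: {alternative!r}")

# Heights example: r = .35, one-tailed, critical r = .268 (Table G, df = 73, alpha = .01)
print(decide(0.35, 0.268, "greater"))
```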
Example: The hours of video games played by the 94 students in this class correlate with current GPA with a value of r = -.1044. Using α = .05, is this significantly different from zero?
Answer: We will be testing the null hypothesis using a two-tailed test:
H0: ρ = 0
HA: ρ ≠ 0
All we have to do is look up the critical value of r for df = 94 - 2 = 92 and α = .05 for a two-tailed test in Table G.
Our critical value of r is ±0.203. Our observed r of -.1044 lies between -0.203 and +0.203, so we fail to reject H0 (*whew*).
Using APA format:
There is not a significant correlation between GPA and video game playing,
r(92) = -.1044, p>.05.
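Instead of stopping at p > .05, you can also approximate the actual p-value for the video-game example. This sketch uses only the standard library; integrating the t density with Simpson's rule is an assumption made to avoid needing SciPy, and `t_two_tailed_p` is a hypothetical helper name. It converts r = -.1044 to t with df = 92 and adds up both tails:

```python
import math

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t|) for Student's t, via Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3            # P(0 <= T <= |t|)
    return 1 - 2 * area      # probability in both tails beyond |t|

# Video-game example: r = -.1044 with n = 94 students
r, n = -0.1044, 94
df = n - 2
t = r / math.sqrt((1 - r ** 2) / df)
p = t_two_tailed_p(t, df)
print(f"t({df}) = {t:.2f}, p = {p:.2f}")
```

With the raw scores in hand, `scipy.stats.pearsonr` reports r and a two-sided p-value in one call.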
2) Testing the null hypothesis that two independently sampled correlations are the same.
The sampling distribution of correlations becomes more and more skewed when the
correlation of the population gets closer to 1 or -1. So comparing whether two
correlations are different requires a more complicated conversion.
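A quick simulation makes the skew visible. The sketch below rests on assumptions of my own: bivariate-normal data built from two independent standard normals, n = 10 pairs per sample, 5,000 samples per value of ρ, and a fixed seed; the helper names are hypothetical. It prints the mean and skewness of the simulated r values, which should show skewness near 0 when ρ = 0 and an increasingly long left tail (negative skewness) as ρ moves toward 1.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_r(rho, n, rng):
    """Draw n pairs from a bivariate normal with correlation rho; return the sample r."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return pearson_r(xs, ys)

def skewness(values):
    """Sample skewness: third central moment divided by the SD cubed."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

rng = random.Random(16)  # fixed seed so the run is repeatable
skews = {}
for rho in (0.0, 0.5, 0.9):
    rs = [simulate_r(rho, 10, rng) for _ in range(5000)]
    skews[rho] = skewness(rs)
    print(f"rho = {rho}: mean r = {sum(rs) / len(rs):.3f}, skewness = {skews[rho]:.2f}")
```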
Comparing two correlation values is done by first converting each r to Fisher’s z′ (z-prime):
$$z' = \frac{1}{2}\left[\ln(1 + r) - \ln(1 - r)\right]$$
and then conducting a z-test:
$$z = \frac{z'_1 - z'_2}{s_{z'_1 - z'_2}}$$
where
$$s_{z'_1 - z'_2} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}$$
The book provides Table H for converting sample correlations r to z′, so you don’t have to use the formula. I’ve provided a version of Table H along with a calculator in Excel, which you can download.
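The conversion and the z-test are short enough to code directly. In this Python sketch (function names are hypothetical), `fisher_z` implements the z′ formula, which is the same quantity as `math.atanh(r)`, and `compare_correlations` applies the standard-error formula with n1 - 3 and n2 - 3. The call uses the values from the heights example that follows:

```python
import math

def fisher_z(r):
    """Fisher's z' = 0.5 * [ln(1 + r) - ln(1 - r)], identical to math.atanh(r)."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

def compare_correlations(r1, n1, r2, n2):
    """z-test of H0: rho1 = rho2 for two independently sampled correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z'1 - z'2
    z = (fisher_z(r1) - fisher_z(r2)) / se
    return z, se

# Men/fathers (r = .61, n = 21) vs. women/mothers (r = .35, n = 75)
z, se = compare_correlations(0.61, 21, 0.35, 75)
print(f"z'1 = {fisher_z(0.61):.4f}, z'2 = {fisher_z(0.35):.4f}, s = {se:.4f}, z = {z:.3f}")
```

Computing z′ directly gives 0.7089 for r = .61 and 0.3654 for r = .35, with z ≈ 1.303.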
Example: The heights of the 75 women in our class correlate with the heights of their mothers with a value of r = 0.35, and the correlation between the heights of the 21 men and their fathers is 0.61. If we assume that these students are randomly sampled from the general population, can we conclude that the correlation of heights between men and their fathers is different from the correlation of heights between women and their mothers? Use α = 0.05.
Answer: Let r1 = 0.61, n1 = 21, r2 = 0.35 and n2 = 75, and let
H0: ρ1 = ρ2 and HA: ρ1 ≠ ρ2
Step 1: Convert the correlations to z′. Looking these values up in Table H we find:
z′1 = 0.7089 and z′2 = 0.3654
Step 2: Calculate z from the equations:
$$s_{z'_1 - z'_2} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}} = \sqrt{\frac{1}{18} + \frac{1}{72}} = .2635$$
$$z = \frac{z'_1 - z'_2}{s_{z'_1 - z'_2}} = \frac{.7089 - .3654}{.2635} \approx 1.303$$
Step 3: Find the critical values for z.
We’re conducting a two-tailed test with α = .05. We want to find the value of z for which each tail contains a proportion of 0.025. Looking this up in Table A, column C gives:
zcrit = ±1.96
Our value of z = 1.303 lies between -1.96 and +1.96, outside the rejection region, so we fail to reject H0.
Using APA format:
“There is not a statistically significant difference between the correlation between women and their mothers (r = .35) and the correlation between men and their fathers (r = .61), z = 1.303, p > .05.”
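Table A's critical value and the p-value for this comparison can also be computed with Python's standard library. This sketch uses `statistics.NormalDist` (available in Python 3.8 and later) and plugs in z = 1.303 from Step 2; the variable names are my own:

```python
from statistics import NormalDist

std_normal = NormalDist()                     # mean 0, standard deviation 1
alpha = 0.05
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # two-tailed critical value
z_obs = 1.303                                 # observed z from Step 2
p = 2 * (1 - std_normal.cdf(abs(z_obs)))      # two-tailed p-value
reject = abs(z_obs) > z_crit
print(f"z_crit = ±{z_crit:.2f}, p = {p:.3f}, reject H0: {reject}")
```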
What does the probability distribution of correlations look like?
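For a sample of n pairs drawn from a bivariate normal population with correlation ρ, the probability density of the sample correlation r is

$$P(r) = \frac{(n-2)\,\Gamma(n-1)\,(1-\rho^2)^{\frac{n-1}{2}}\,(1-r^2)^{\frac{n-4}{2}}}{\sqrt{2\pi}\,\Gamma\!\left(n-\frac{1}{2}\right)(1-\rho r)^{n-\frac{3}{2}}}\left[1 + \frac{1}{4}\,\frac{\rho r+1}{2n-1} + \frac{9}{32}\,\frac{(\rho r+1)^2}{(2n-1)(2n+1)} + \cdots\right]$$

The bracketed series is the hypergeometric function ₂F₁(½, ½; n - ½; (ρr + 1)/2).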
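If you want to see this distribution rather than read the formula, here's a Python sketch that evaluates the exact density of r for bivariate-normal samples. Assumptions: n ≥ 4, the constant uses Γ(n - 1), and the bracketed series is summed as Gauss's hypergeometric function ₂F₁(½, ½; n - ½; (ρr + 1)/2) until the terms are negligible; all function names are hypothetical. Two sanity checks: with ρ = 0 and n = 4 the density should be flat at 0.5, and for any ρ it should integrate to 1 over -1 ≤ r ≤ 1.

```python
import math

def hyp2f1_half(n, x, tol=1e-16, max_terms=100000):
    """Series for 2F1(1/2, 1/2; n - 1/2; x), summed term by term (valid for 0 <= x < 1)."""
    a = b = 0.5
    c = n - 0.5
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x   # ratio of successive terms
        total += term
        if abs(term) < tol * total:
            break
    return total

def r_density(r, rho, n):
    """Exact density of the sample correlation r for n bivariate-normal pairs (n >= 4)."""
    log_const = (math.log(n - 2) + math.lgamma(n - 1)
                 - 0.5 * math.log(2 * math.pi) - math.lgamma(n - 0.5))
    return (math.exp(log_const)
            * (1 - rho * rho) ** ((n - 1) / 2)
            * (1 - r * r) ** ((n - 4) / 2)
            / (1 - rho * r) ** (n - 1.5)
            * hyp2f1_half(n, (rho * r + 1) / 2))

def integrate(f, a, b, steps=2000):
    """Composite Simpson's rule over [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

print(r_density(0.3, 0.0, 4))                              # flat case: rho = 0, n = 4
print(integrate(lambda r: r_density(r, 0.5, 10), -1, 1))  # total probability
```

Evaluating `r_density` over a grid of r values for ρ = 0.9 and a small n shows the long left tail that motivates Fisher's z′ conversion.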