Transcript Slide 1
t-Tests
Interval Estimation and the t Distribution
Large Sample z-Test
• Sometimes we have reason to test hypotheses involving specific
values for the mean.
– Example 1. Claim: On average, people sleep less than the often
recommended eight hours per night.
– Example 2. Claim: On average, people drink more than the
recommended 2 drinks per day.
– Example 3. Claim: On average, women take more than 4 hours to run
the marathon.
• However, it is rare that we have a specific hypothesis about the
standard deviation of the population under study.
• For these situations, we can use the sample standard deviation $s$ as
an estimator for the population standard deviation $\sigma$.
• If the sample size is large (e.g., n > 100), then this estimate is
accurate enough that we can just use the standard z test.
PSYC 6130, PROF. J. ELDER
Example: Canadian General Social Survey, Cycle 6 (1991)
But what if we don’t have such a large sample?
Student’s t Distribution
• Problem: for small n, $s$ is not a very accurate estimator
of $\sigma$.
• The result is that the computed z-score will not follow a
standard normal distribution.
• Instead, the standardized score will follow what has
become known as the Student’s t distribution.
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \quad \text{where } s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
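As a minimal sketch of how this standardized score could be computed, the function below divides the distance between the sample mean and the hypothesized mean by the estimated standard error $s/\sqrt{n}$. The function name `one_sample_t` and the `hours` data are hypothetical, made up for illustration; they are not taken from the survey.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t test of H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample SD (n - 1 in the denominator)
    se = s / math.sqrt(n)             # estimated standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical hours of sleep for 8 respondents (illustrative only).
hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 7.0, 6.5]
t, df = one_sample_t(hours, mu0=8.0)
print(f"t({df}) = {t:.2f}")
```

The resulting t would then be compared against a critical value from the t distribution with n - 1 degrees of freedom rather than the standard normal table.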
Student’s t Distribution
[Figure: probability densities of the normal distribution and of t distributions with n = 2 (df = 1), n = 10 (df = 9) and n = 30 (df = 29)]
How would you describe the difference between the normal and t distributions?
Student’s t distribution
• Student’s t distribution is leptokurtic
– More peaked
– Fatter tails
• What would happen if we were to ignore this difference,
and use the standard normal table for small samples?
Student’s t Distribution
• Critical t values decrease as df increases
• As df → ∞, critical t values → critical z values
• Using the standard normal table for small samples would
result in an inflated rate of Type I errors.
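The inflation of the Type I error rate can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it draws samples from a standard normal population for which the null hypothesis is true, computes the t statistic, and counts how often it exceeds the two-tailed z cutoff of 1.96. The function name, trial count and seed are arbitrary choices.

```python
import math
import random
import statistics

def type1_rate_with_z_cutoff(n, trials=5000, cutoff=1.96, seed=0):
    """Fraction of true-null samples whose t statistic exceeds the z cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true: mu = 0
        se = statistics.stdev(sample) / math.sqrt(n)
        t = statistics.mean(sample) / se
        if abs(t) > cutoff:
            rejections += 1
    return rejections / trials

rate_small = type1_rate_with_z_cutoff(n=5)
rate_large = type1_rate_with_z_cutoff(n=100)
print(f"n = 5:   rejection rate = {rate_small:.3f}")
print(f"n = 100: rejection rate = {rate_large:.3f}")
```

With n = 5 the rejection rate should come out well above the nominal .05, while with n = 100 it should sit close to .05.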
One-Sample t Test: Example
Reporting Results
• Respondents who report being very forgetful sleep, on
average, 7.11 hours/night, significantly less than the
recommended 8 hours/night, t(37) = 2.25, p < .05, two-tailed.
Confidence Intervals
• NHT allows us to test specific hypotheses about the
mean.
– e.g., is $\mu < 8$ hours?
• Sometimes it is just as valuable, or more valuable, to
know the range of plausible values.
• This range of plausible values is called a confidence
interval.
Confidence Intervals
• The confidence interval (CI) of
the mean is the interval of
values, centred on the sample
mean, that contains the
population mean with specified
probability.
• e.g., there is a 95% chance that
the 95% confidence interval
contains the population mean.
• NB: This assumes a flat prior
on the population mean (non-Bayesian).
[Figure: confidence interval centred on $\bar{X}$]
Confidence Intervals
[Figure: distribution $p(\mu)$ centred on $\bar{X}$ with dispersion $s_{\bar{X}}$; the 95% confidence interval leaves $p = .025$ in each tail]
Basic Procedure for Confidence Interval Estimation
1. Select the sample size (e.g., n = 38)
2. Select the level of confidence (e.g., 95%)
3. Select the sample and collect the data (Random sampling!)
4. Calculate the limits of the interval
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} \;\Rightarrow\; \mu = \bar{X} - s_{\bar{X}}\, t$$
$$\bar{X} - s_{\bar{X}}\, t_{\alpha/2} \le \mu \le \bar{X} + s_{\bar{X}}\, t_{\alpha/2}$$
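Step 4 can be sketched with the sleep example reported earlier (mean 7.11 hours, t(37) = 2.25 against 8 hours). Two assumptions are made here: the standard error is back-calculated from the reported mean and t value, so it is only approximate, and the critical value 2.026 for df = 37 is taken from a standard t table. The function name is hypothetical.

```python
def t_confidence_interval(mean, se, t_crit):
    """Return (lower, upper) limits: mean -/+ t_crit * se."""
    half_width = t_crit * se
    return mean - half_width, mean + half_width

sample_mean = 7.11      # reported mean hours of sleep
mu0 = 8.0               # recommended hours of sleep
t_reported = 2.25       # reported t(37), magnitude only

# Assumption: recover the standard error from the reported (rounded) values.
se = abs(sample_mean - mu0) / t_reported
# Assumption: two-tailed .05 critical t for df = 37, read from a t table.
t_crit = 2.026

low, high = t_confidence_interval(sample_mean, se, t_crit)
print(f"95% CI: {low:.2f} to {high:.2f} hours/night")
```

The interval stops just short of 8 hours, which agrees with the two-tailed rejection of the null hypothesis at the .05 level.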
End of Lecture 4
Oct 8, 2008
Selecting Sample Size
• Suppose that
1. You have a rough estimate $s$ of the standard deviation of the
population, and
2. You want to do an experiment to estimate the mean within some
95% confidence interval of size W.
Then the sample size n should be roughly
$$n \approx \left(\frac{4s}{W}\right)^2$$
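The rule of thumb follows from the width of a large-sample 95% interval, $W \approx 2 \times 1.96 \times s/\sqrt{n} \approx 4s/\sqrt{n}$, solved for n. A minimal sketch, with a hypothetical function name and illustrative inputs (a rough SD of 15 and a desired total width of 10):

```python
import math

def sample_size_for_ci_width(s, width):
    """Approximate n so that a 95% CI for the mean has total width `width`."""
    return math.ceil((4.0 * s / width) ** 2)

# Illustrative values only: rough SD estimate of 15, desired CI width of 10.
n = sample_size_for_ci_width(s=15.0, width=10.0)
print(n)
```

Because the approximation uses the normal value 1.96 rather than a t critical value, it is slightly optimistic when the resulting n is small.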
Assumptions Underlying Use of the t Distribution
for NHT and Interval Estimation
• Same as for z test:
– Random sampling
– Variable is normal
• CLT: Deviations from normality ok as long as sample is large.
– Dispersion of sampled population is the same as for the
comparison population
Sampling Distribution of the Variance
Sampling Distribution of the Variance
• We are sometimes interested in testing a hypothesis
about the variance of a population.
– e.g., is IQ more diverse in university students than in the general
population?
Suppose we measure the IQ of a random sample of 13 university students.
We then calculate the sample variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = 400$$
Suppose that we know the variance $\sigma_0^2$ of IQs in the general population:
$$\sigma_0^2 = 15^2 = 225$$
Can we conclude that student IQs are more diverse?
To solve this problem we need to know the range of plausible values for
the test statistic $s^2$ under the null hypothesis.
Sampling Distribution of the Variance
• What form does the sampling distribution of the variance
assume?
• If the variable of interest (e.g., IQ) is normal, the
sampling distribution of the variance takes the shape of a
$\chi^2$ (chi-squared) distribution.
[Figure: positively skewed density $p(s^2)$ starting at 0, with $E(s^2) = \sigma^2$ marked on the $s^2$ axis]
Sample Variances and the $\chi^2$ Distribution
We first standardize the sample variance statistic by multiplying by
the degrees of freedom and dividing by the population variance $\sigma^2$:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
The resulting variable $\chi^2$ follows a $\chi^2(\nu)$ distribution with $\nu = df = n - 1$.
[Figure: densities $p(\chi^2)$ for $\nu$ = 9, 29 and 99, plotted for $\chi^2$ from 0 to 150]
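Applied to the IQ example (13 students, sample variance 400, population variance 225), the standardization can be sketched as follows. The function name is hypothetical, and the critical value 21.03 is an assumption taken from a standard chi-square table (upper-tail .05, 12 df).

```python
def chi_square_variance_statistic(sample_variance, n, null_variance):
    """Standardize a sample variance: (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * sample_variance / null_variance

chi2 = chi_square_variance_statistic(sample_variance=400.0, n=13, null_variance=225.0)

# Assumption: upper-tail .05 critical value for 12 df, read from a chi-square table.
critical = 21.03

print(f"chi-square(12) = {chi2:.2f}, critical value = {critical}")
print("reject H0" if chi2 > critical else "retain H0")
```

The statistic only just exceeds the one-tailed critical value, so the evidence that student IQs are more diverse is marginal.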
Sample Variances and the $\chi^2$ Distribution
• The $\chi^2$ distribution is:
– strictly positive.
– positively skewed.
• Since the sample variance is an unbiased estimator of the
population variance: $E(s^2) = \sigma^2$
• Due to the positive skew, the mean of the distribution $E(s^2)$ is greater
than the mode.
• As the sample size increases, the distribution approaches a normal
distribution.
• If the original distribution is not normal and the sample size is not
large, the sampling distribution of the variance may be far from $\chi^2$, and tests based on this assumption may be flawed.
Example: Height of Female Psychology Graduate Students
Canadian Adult Female Population: $\mu$ = 63.937 in, $\sigma$ = 2.7165 in
Canadian Adult Male Population: $\mu$ = 69.252 in, $\sigma$ = 3.189 in
n = 131,110!
2005 PSYC 6130A Students (Female)
Source: Canadian Community Health Survey Cycle 3.1 (2005)
Caution: self report!
Properties of Estimators
• We have now met two statistical estimators:
$\bar{X}$ is an estimator for $\mu$.
$s^2$ is an estimator for $\sigma^2$.
Both of these estimators are:
Unbiased, i.e.,
$E(\bar{X}) = \mu$
$E(s^2) = \sigma^2$
Consistent, i.e.,
the quality of the estimate improves as the sample size increases.
Efficient, i.e.,
given a fixed sample size, the accuracy of these estimators is better than
competing estimators.
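Unbiasedness of $s^2$ can be checked empirically. The simulation below is a sketch under stated assumptions: a normal population with mean 100 and SD 15 (so $\sigma^2 = 225$), samples of size 5, and an arbitrary seed and trial count. It averages the n - 1 estimator and, for contrast, the estimator that divides by n.

```python
import random
import statistics

def average_variance_estimates(n, trials=10000, sigma=15.0, seed=1):
    """Average the (n - 1)-denominator and n-denominator variance estimates."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(100.0, sigma) for _ in range(n)]
        total_unbiased += statistics.variance(sample)    # divides by n - 1
        total_biased += statistics.pvariance(sample)     # divides by n
    return total_unbiased / trials, total_biased / trials

avg_unbiased, avg_biased = average_variance_estimates(n=5)
print(f"population variance: 225")
print(f"mean of s^2 (n - 1): {avg_unbiased:.1f}")
print(f"mean of SS / n:      {avg_biased:.1f}")
```

The n - 1 version should average close to 225, while dividing by n should underestimate it by a factor of about (n - 1)/n.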
NHT for Two Independent Sample Means
Conditions of Applicability
• Comparing two samples (treated differently)
• Don’t know means of either population
• Don’t know variances of either population
• Samples are independent of each other
Example: Height of Canadian Males by Income Category
(Canadian Community Health Survey, 2004)
Sampling Distribution
To solve this problem we need to know the sampling distribution for the difference of the means,
i.e., $\bar{X}_1 - \bar{X}_2$.
Under the null hypothesis, both samples come from the same distribution.
Suppose this distribution is normal.
Then we know that $\bar{X}_1$ and $\bar{X}_2$ are also normally distributed:
$$\bar{X}_1 \text{ is } N(\mu, \sigma_{\bar{X}_1}) = N\!\left(\mu, \frac{\sigma_1}{\sqrt{n_1}}\right)$$
$$\bar{X}_2 \text{ is } N(\mu, \sigma_{\bar{X}_2}) = N\!\left(\mu, \frac{\sigma_2}{\sqrt{n_2}}\right)$$
Sampling Distribution (cntd…)
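Major Theorem of Probability: Any linear combination of independent normal variables is itself normal.
Thus $\bar{X}_1 - \bar{X}_2$ is also normal:
$$\bar{X}_1 - \bar{X}_2 \text{ is } N(0, \sigma_{\bar{X}_1 - \bar{X}_2})$$
What is the dispersion $\sigma_{\bar{X}_1 - \bar{X}_2}$?
Basic principle for independent normal variables: variances add.
$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}$$
Knowing the standard error for the 2 distributions, we can calculate our sampling distribution.
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$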
NHT for Two Large Samples
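Recall: If the sample is large (e.g., n > 100), we can approximate the population variance by the sample variance:
$$\sigma^2_{\bar{X}_1} \approx s^2_{\bar{X}_1}, \qquad \sigma^2_{\bar{X}_2} \approx s^2_{\bar{X}_2}$$
And thus we can estimate
$$\sigma^2_{\bar{X}_1 - \bar{X}_2} \approx s^2_{\bar{X}_1} + s^2_{\bar{X}_2}$$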
Height of Canadian Males by Income Category
(Canadian Community Health Survey, 2004)
Sample 1: $\bar{X}$ = 69.87", $s$ = 2.63", $n$ = 7586
Sample 2: $\bar{X}$ = 69.01", $s$ = 2.85", $n$ = 7777
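With samples this large, the z statistic for the difference in mean height can be sketched directly from the summary statistics on this slide (69.87", 2.63", n = 7586 versus 69.01", 2.85", n = 7777). The function name is hypothetical; each squared standard error is estimated as $s^2/n$.

```python
import math

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Large-sample z for the difference of two independent means (H0: equal means)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # variances of the two means add
    return (mean1 - mean2) / se

z = two_sample_z(69.87, 2.63, 7586, 69.01, 2.85, 7777)
print(f"z = {z:.2f}")
```

A z this far beyond any conventional critical value shows how even a difference of under an inch becomes highly significant with thousands of respondents per group.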
NHT for Two Small Samples
Example: Social Factors in Psychological Well-Being
Canadian Community Health Survey, 2004
Social Factors in Psychological Well-Being (cntd…)
Canadian Community Health Survey, 2004
Social Factors in Psychological Well-Being (cntd…)
Canadian Community Health Survey, 2004:
Respondents who report never getting along with others
NHT for Two Small Independent Samples
By analogy with one-sample NHT, we might approximate
the standard errors $\sigma_{\bar{X}_1}$ and $\sigma_{\bar{X}_2}$ by the sample standard errors $s_{\bar{X}_1}$ and $s_{\bar{X}_2}$.
Unfortunately, the resulting sampling distribution of the difference of the means
is not straightforward to analyze.
So what do we do?
If we can assume homogeneity of variance (the two populations have the same variance),
then there is a statistic that follows the t distribution and is simple to analyze.
NHT for Two Small Independent Samples (cntd…)
If both populations have the same variance, we want to use both samples simultaneously
to get the best possible estimate of this variance.
In general, recall that
$$s^2 = \frac{SS}{n - 1}$$
Thus our formula for the pooled variance is
$$s_p^2 = \frac{SS_1 + SS_2}{(n_1 - 1) + (n_2 - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$
And the sample standard error is given by
$$s^2_{\bar{X}_1 - \bar{X}_2} = \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}$$
and
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom.
Pooled Variance
Pooled variance is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$$
Note that the pooled variance is a weighted average of the sample variances.
The weights are proportional to the degrees of freedom of each sample
(Bigger samples are more reliable estimators of the common variance)
[Figure: number line locating $s_p^2$ between $s_1^2$ and $s_2^2$, with segments labelled $df_2$ and $df_1$]
Social Factors in Psychological Well-Being (cntd…)
Canadian Community Health Survey, 2004:
Respondents who report never getting along with others
Women: $\bar{X}$ = 36.59, $s$ = 22.40, $n$ = 37
Men: $\bar{X}$ = 41.84, $s$ = 23.87, $n$ = 25
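The pooled-variance t test can be sketched from the summary statistics on this slide (means 36.59 and 41.84, SDs 22.40 and 23.87, n = 37 and 25). The function name is hypothetical; the computation weights each sample variance by its degrees of freedom and then forms the standard error of the difference.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled-variance t for two independent samples; returns (t, df)."""
    df1, df2 = n1 - 1, n2 - 1
    sp2 = (df1 * s1**2 + df2 * s2**2) / (df1 + df2)   # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)               # SE of the difference
    return (mean1 - mean2) / se, df1 + df2

# Larger mean entered first so that t comes out positive.
t, df = pooled_t(41.84, 23.87, 25, 36.59, 22.40, 37)
print(f"t({df}) = {t:.2f}")
```

The two sample SDs are similar here, so pooling them is a reasonable choice for this comparison.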
Reporting the Result
No significant difference was found between the psychological well-being
of men (M = 41.8, SD = 23.9) and women (M = 36.6, SD = 22.4)
who report never getting along with others, t(60) = 0.88, p = .38.
Confidence Intervals for the Difference Between Two Means
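$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) - t\, s_{\bar{X}_1 - \bar{X}_2}$$
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{crit}\, s_{\bar{X}_1 - \bar{X}_2}$$
[Figure: distribution $p(\mu_1 - \mu_2)$ centred on $\bar{X}_1 - \bar{X}_2$ with dispersion $s_{\bar{X}_1 - \bar{X}_2}$; the 95% confidence interval extends $t_{.025}$ standard errors to either side, leaving $p = .025$ in each tail]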
Underlying Assumptions
• Dependent variable measured on interval or ratio scale.
• Independent random sampling
– (independence within and between samples)
– In experimental work, often make do with random assignment.
• Normal distributions
– Moderate deviations ok due to CLT.
• Homogeneity of Variance
– Only critical when sample sizes are small and different.
End of Lecture 5
Oct 15, 2008
Social Factors in Psychological Well-Being (cntd…)
Canadian Community Health Survey, 2004:
Respondents who report never getting along with others
Women: $\bar{X}$ = 36.59, $s$ = 22.40, $n$ = 37
Men: $\bar{X}$ = 41.84, $s$ = 23.87, $n$ = 25
Separate Variances t Test
• If
– Population variances are different (suggested by substantially
different sample variances)
AND
– Samples are small
AND
– Sample sizes are substantially different
• Then
– Pooled variance t statistic will not be correct.
• In this case, use separate variances t test
Separate Variances t Test
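$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}, \quad \text{where } s^2_{\bar{X}_1 - \bar{X}_2} = s^2_{\bar{X}_1} + s^2_{\bar{X}_2}$$
• This statistic is well-approximated by a t distribution.
• Unfortunately, calculating the appropriate df is difficult.
• SPSS will calculate the Welch-Satterthwaite approximation for df as
part of a 2-sample t test:
$$df = \frac{\left(s^2_{\bar{X}_1} + s^2_{\bar{X}_2}\right)^2}{\dfrac{s^4_{\bar{X}_1}}{df_1} + \dfrac{s^4_{\bar{X}_2}}{df_2}}$$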
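The separate-variances statistic and the Welch-Satterthwaite df can be sketched for the well-being data used in this lecture (means 41.84 and 36.59, SDs 23.87 and 22.40, n = 25 and 37). The function name is hypothetical; `v1` and `v2` are the squared standard errors $s^2_{\bar{X}_1}$ and $s^2_{\bar{X}_2}$.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Separate-variances t and Welch-Satterthwaite df; returns (t, df)."""
    v1 = s1**2 / n1            # squared standard error of mean 1
    v2 = s2**2 / n2            # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = welch_t(41.84, 23.87, 25, 36.59, 22.40, 37)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The approximate df is generally not an integer and falls below the pooled value of $n_1 + n_2 - 2$, which makes the test somewhat more conservative.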
Social Factors in Psychological Well-Being (cntd…)
Canadian Community Health Survey, 2004:
Respondents who report never getting along with others
Women: $\bar{X}$ = 36.59, $s$ = 22.40, $n$ = 37
Men: $\bar{X}$ = 41.84, $s$ = 23.87, $n$ = 25
Summary: t-Tests for 2 Independent Sample Means
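$n_1, n_2 > 100$ | $n_1 \approx n_2$ | $\sigma_1 \approx \sigma_2$ | Test statistic | $s^2_{\bar{X}_1 - \bar{X}_2}$ | df
No | No | No | t | $s_1^2/n_1 + s_2^2/n_2$ | Welch-Satterthwaite
No | No | Yes | t | $s_p^2/n_1 + s_p^2/n_2$ | $n_1 + n_2 - 2$
No | Yes | No | t | $s_1^2/n_1 + s_2^2/n_2$ | $n_1 + n_2 - 2$
No | Yes | Yes | t | $s_1^2/n_1 + s_2^2/n_2$ | $n_1 + n_2 - 2$
Yes | No | No | z | $s_1^2/n_1 + s_2^2/n_2$ | NA
Yes | No | Yes | z | $s_1^2/n_1 + s_2^2/n_2$ | NA
Yes | Yes | No | z | $s_1^2/n_1 + s_2^2/n_2$ | NA
Yes | Yes | Yes | z | $s_1^2/n_1 + s_2^2/n_2$ | NA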
More on Homogeneity of Variance
• How do we decide if two sample variances are different enough to
suggest different population variances?
• Need NHT for homogeneity of variance.
– F-test
• Straightforward
• Sensitive to deviations from normality
– Levene’s test
• More robust to deviations from normality
• Computed by SPSS
Levene’s Test: Basic Idea
1. Replace each score $X_{1i}$, $X_{2i}$ with its absolute deviation from the sample mean:
$$d_{1i} = |X_{1i} - \bar{X}_1|, \qquad d_{2i} = |X_{2i} - \bar{X}_2|$$
2. Now run an independent samples t-test on $d_{1i}$ and $d_{2i}$:
$$t = \frac{\bar{d}_1 - \bar{d}_2}{s_{\bar{d}_1 - \bar{d}_2}}$$
SPSS reports an F-statistic for Levene’s test
• Allows the homogeneity of variance for two or more variables to be tested.
• We will introduce the F distribution later in the term.
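The two steps can be sketched as below. This is an illustration of the basic idea only, not SPSS's implementation: the function name and both data sets are hypothetical, and the t test on the absolute deviations is assumed to be the pooled-variance version.

```python
import math
import statistics

def levene_t(group1, group2):
    """t test on absolute deviations from each group's mean; returns (t, df)."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    d1 = [abs(x - m1) for x in group1]     # step 1: absolute deviations
    d2 = [abs(x - m2) for x in group2]
    n1, n2 = len(d1), len(d2)
    # Step 2: pooled-variance independent-samples t test on the deviations.
    sp2 = ((n1 - 1) * statistics.variance(d1)
           + (n2 - 1) * statistics.variance(d2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (statistics.mean(d1) - statistics.mean(d2)) / se, n1 + n2 - 2

# Hypothetical scores: same mean, visibly different spread.
spread = [40, 60, 35, 65, 45, 55]
tight = [50, 51, 49, 52, 48, 50]
t, df = levene_t(spread, tight)
print(f"t({df}) = {t:.2f}")
```

For two groups, the square of this t equals the F statistic obtained from a one-way analysis of variance on the same absolute deviations.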
The Matched t Test
Independent or Matched?
• Application of the Independent-Groups t test depended
on independence both within and between groups.
• There are many cases where it is wise, convenient or
necessary to use a matched design, in which there is a
1:1 correspondence between scores in the two samples.
• In this case, you cannot assume independence between
samples!
• Examples:
– Repeated-subject designs (same subjects in both samples).
– Matched-pairs designs (attempt to match possibly important
attributes of subjects in two samples)
Example: Assignment Marks
        A3    A4
        72    80
        70    80
        69    93
        83    88
        88    93
        88    93
        87    88
        88    93
        85   100
        85   100
        70    80
        72    90
        60    80
        83    75
        81    83
        68    75
        36    93
        80   100
        65    83
        65    83
        41    75
        73    88
        68    75
Mean    73    86
SD      14     8
n       23    23
[Figure: scatter plot of Assignment 2 Mark (%) against Assignment 1 Mark (%), both axes from 40 to 100]
These scores are not independent!
Better alternative:
The matched t-test using the direct difference method
        A3    A4    A4-A3
        72    80      8
        70    80     10
        69    93     24
        83    88      5
        88    93      5
        88    93      5
        87    88      1
        88    93      5
        85   100     15
        85   100     15
        70    80     10
        72    90     18
        60    80     20
        83    75     -8
        81    83      2
        68    75      7
        36    93     57
        80   100     20
        65    83     18
        65    83     18
        41    75     34
        73    88     15
        68    75      7
Mean    73    86     13
SD      14     8     13
n       23    23     23
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \;\rightarrow\; t = \frac{\bar{D} - \mu_0}{s_D / \sqrt{n}}$$
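The direct difference method amounts to a one-sample t test on the A4 - A3 differences against a null value of 0. A minimal sketch using the 23 pairs of marks from the table (the function name is hypothetical):

```python
import math
import statistics

# Assignment marks from the slide (23 students, paired by row).
a3 = [72, 70, 69, 83, 88, 88, 87, 88, 85, 85, 70, 72,
      60, 83, 81, 68, 36, 80, 65, 65, 41, 73, 68]
a4 = [80, 80, 93, 88, 93, 93, 88, 93, 100, 100, 80, 90,
      80, 75, 83, 75, 93, 100, 83, 83, 75, 88, 75]

def matched_t(first, second, mu0=0.0):
    """One-sample t test on the pairwise differences second - first."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # s_D / sqrt(n)
    return (mean_d - mu0) / se, n - 1

t, df = matched_t(a3, a4)
print(f"t({df}) = {t:.2f}")
```

Only the differences enter the calculation, so stable differences between students (some score high on both assignments, some low on both) do not inflate the standard error.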
Matched vs Independent t-test
• Why does a matched t-test yield a higher t-score than an
independent t-test in this example?
– The t-score is determined by the ratio of the difference between
the groups and the variance within the groups.
– The matched t-test factors out the portion of the within-group
variance due to differences between individuals.
The Matched t Test and Linear Correlation
• The degree to which the matched t value exceeds the independent-groups t value depends on how highly correlated the two samples
are.
• Alternate formula for matched standard error:
$$s^2_{\bar{D}} = \frac{s_1^2 + s_2^2}{n} - \frac{2 r s_1 s_2}{n}, \quad \text{where } r \text{ is the Pearson correlation}$$
[Figure: scatter plot of Assignment 4 Mark (%) against Assignment 3 Mark (%), both axes from 40 to 100; $r^2 = 0.18$]
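The alternate formula can be checked numerically against the direct difference method. The sketch below uses the 23 pairs of assignment marks from the earlier example slides; the variable names are illustrative, and the Pearson correlation is computed by hand from the sample covariance.

```python
import math
import statistics

# Assignment marks from the example slides (23 students, paired by position).
a3 = [72, 70, 69, 83, 88, 88, 87, 88, 85, 85, 70, 72,
      60, 83, 81, 68, 36, 80, 65, 65, 41, 73, 68]
a4 = [80, 80, 93, 88, 93, 93, 88, 93, 100, 100, 80, 90,
      80, 75, 83, 75, 93, 100, 83, 83, 75, 88, 75]

n = len(a3)
m1, m2 = statistics.mean(a3), statistics.mean(a4)
s1, s2 = statistics.stdev(a3), statistics.stdev(a4)
cov = sum((x - m1) * (y - m2) for x, y in zip(a3, a4)) / (n - 1)
r = cov / (s1 * s2)                                    # Pearson correlation

se2_independent = (s1**2 + s2**2) / n                  # ignores the pairing
se2_matched = (s1**2 + s2**2 - 2 * r * s1 * s2) / n    # alternate formula
se2_direct = statistics.variance([y - x for x, y in zip(a3, a4)]) / n

t_independent = (m2 - m1) / math.sqrt(se2_independent)
t_matched = (m2 - m1) / math.sqrt(se2_matched)
print(f"r = {r:.2f}")
print(f"independent t = {t_independent:.2f}, matched t = {t_matched:.2f}")
```

The alternate formula and the variance of the differences divided by n give the same squared standard error, and the positive correlation is what makes the matched t the larger of the two.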
Case 1: r = 0
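• Independent t-test
$$s^2_{\bar{X}_1 - \bar{X}_2} = \frac{1}{n}\left(s_1^2 + s_2^2\right)$$
• Matched t-test
$$s^2_{\bar{D}} = \frac{s_1^2 + s_2^2}{n} - \frac{2 r s_1 s_2}{n} = \frac{1}{n}\left(s_1^2 + s_2^2\right)$$
Thus the t-score will be the same.
But note that df = 2(n − 1) for the independent test and df = n − 1 for the matched test.
Thus the critical t-values will be larger for the matched test.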
Case 2: r > 0
• Independent t-test
$$s^2_{\bar{X}_1 - \bar{X}_2} = \frac{1}{n}\left(s_1^2 + s_2^2\right)$$
• Matched t-test
$$s^2_{\bar{D}} = \frac{s_1^2 + s_2^2}{n} - \frac{2 r s_1 s_2}{n}$$
Because r > 0, the matched standard error is now smaller, so the t-score will be larger for the matched test. Although the
critical t-values are larger, the net result is that the matched test will
often be more powerful.
Confidence Intervals
• Just as for one-sample t test:
$$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} \;\Rightarrow\; \mu_D = \bar{D} \pm t_{\alpha/2}\, s_{\bar{D}}$$
Repeated Measures Designs
• Many matched sample designs involve repeated
measures of the same individuals.
• This can result in carry-over effects, including learning
and fatigue.
• These effects can be minimized by counter-balancing
the ordering of conditions across participants.
Assumptions of the Matched t Test
• Normality
• Independent random sampling (within samples)