Transcript T-tests
T Tests: Comparison of Means
• Most t tests involve the comparison of two populations
with respect to the means of randomly drawn samples
from the respective populations.
• The two populations could be different groups or
experimental conditions, or they could be “within”
persons or units, such as a “before” and “after”
design, e.g., the population of people who were tested
before a treatment and the population of people who
were tested after it.
• If the obtained scores within a sample are reasonably
homogeneous (have low variability), and the
variances of the two groups are roughly equal, then a
difference of means test is an appropriate way to test
hypotheses about the differences between two
populations.
T-Test and The Null Hypothesis
• The null hypothesis, usually expressed as µ1
= µ2 , is what we ordinarily seek to reject
(but sometimes fail to reject) in statistical
hypothesis testing
• With respect to the difference of means test, the
null hypothesis is that any differences we observe
in the samples we draw from the two populations
were obtained by chance (due to sampling error),
and that the differences in the population means
are zero
• If the observed differences we obtain in our
samples are not sufficiently large (the test statistic
doesn’t fall within the predetermined critical region),
we can say that we have failed to reject the null
hypothesis or, alternatively, that we must retain the
null hypothesis
T Test and the Research Hypothesis
• The research hypothesis, µ1 ≠ µ2 , is that
the population means are unequal, i.e., that
there are differences between the
populations. When we get a result such
that we can reject the null hypothesis, we
then can certainly say that there is
evidence to support the research
hypothesis. Some researchers will state
this as “confirming” or “accepting” the
research hypothesis
Sampling Distribution of
Differences between Means
• Underlying the t statistic is the notion of a
sampling distribution of differences between
means
• In this distribution it is assumed that any
obtained differences between pairs of
samples (say, samples of males and females,
or befores and afters) are due to sampling
error and do not represent true population
differences
• The sampling distribution of differences
between means approximates a normal
distribution with a mean of zero for samples
over size 100
Formula for t
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$
• In this formula, the expression in the numerator is the
difference between the obtained sample means for the two
groups (treatments, etc.) we are comparing, and in the
denominator we have an estimate of the standard deviation
of the sampling distribution of the differences between
sample means. We estimate this denominator based on
sample values (more on how to calculate this in a minute).
Significance Levels
• The significance level (also called the alpha level, or the
rejection level for the null hypothesis) sets the size of the
critical region under the curve of the test statistic
distribution. The critical values are the points that bound
that region. The region contains such unlikely occurrences
under the null hypothesis that, when an obtained sample value
falls into it, one can reject the null hypothesis with
confidence. The confidence level is the complement of the
significance level (e.g., 95% when alpha is .05)
• It is conventional to set the significance level in
advance of performing the test to .05 (two-tailed,
which means that the obtained statistic has to fall into
one of the two regions which represent the upper and
lower .025 of the area under the curve) for noncritical
applications and to more stringent levels like
.001 or .0001 for medical or other critical applications
Setting Significance Levels
• The researcher will opt for a one-tailed test whenever s/he is able
to predict the direction of differences. This means that a result
must be obtained that falls within the upper 5% of the area under
the curve, or the lower 5%, depending upon the predicted
direction of the differences
• Most of the research you will do will not require that you set a
significance level more stringent than .05, one-tailed
• Much of the risk of Type II error (failing to reject the null
hypothesis when it is in fact false) can be avoided if you have
sufficiently large samples
• Distinction between P and alpha: P is the exact probability level
associated with an obtained statistic such as a t score and is
computed from the raw data. SPSS will give you this value. Alpha
level refers to the size of the critical region under the curve into
which our test statistic must fall, according to our requirements.
• SPSS will automatically calculate the actual probability for you. You
may report this value in a research report, but you will base your
decision about whether or not to reject the null hypothesis on
whether or not your obtained value of t is greater than the critical
value of t associated with the significance level you set in advance
Further Criteria for Setting
Significance Levels
• Plausibility of alternatives: if the research hypothesis is directly
counter to prevailing theory, it is better to set a more stringent
level than .05
• Sample size: when the sample size is small, the power to detect
an effect is less, so the critical region should be more generous
• Degree of control in experimental design: the greater the degree
of control (e.g., the more extraneous influences eliminated which
could account for the observed variation between conditions of the
experiment), the greater the freedom to use a larger critical region
to detect an effect
• Extent to which data do not meet assumptions of the statistical
test (independence of cases, random sampling, equality of
variances, etc.): when data do not meet the assumptions of the
statistical test, it is better to use a smaller error rate such as .025
and interpret it as a larger one
• Direction of hypothesis: you should choose a smaller critical region
if you are confident of the direction of the difference of means, etc.
• Testing vs. developing hypotheses: a larger critical region can
be used for pilot studies.
T test for Independent or
Unmatched Samples
• The purpose of the t test is to make a determination
with respect to two sample means whether or not
they were drawn from different populations. Another
way to put this is to decide if the means for the two
samples (two samples which differ on the “grouping
variable”) “differ significantly” on the variable of
interest (the “test variable”)
• There are several varieties of t test
  • Most generally, the t test assumes that the standard
  deviations σ1 and σ2 in the two populations are equal
  (we can call this Model A).
  • However, there are times when we would not make this
  assumption (we will call this Model B; σ1 ≠ σ2 ). (When
  conducting a t test in SPSS for independent samples,
  the program will conduct a test for homogeneity of
  variance and give you values of t assuming both models
  A and B)
T test for Independent or
Unmatched Samples, cont’d
• Use of t test assumes that the populations from
which the samples are drawn are normally distributed
with respect to the variables of interest
• Use of t test assumes interval level data (minimally)
and random sampling
• Sometimes referred to as a Z-test, since the t distribution
approaches the normal distribution for large samples. In fact,
for n > 120 it is OK to consult the Z table to obtain the probability
• The obtained value of t and its significance depend on
(1) the size of the mean differences, (2) the amount
of variability within each sample, and (3) the sample size
• Small variability and large sample size give us more
confidence in the results we obtain
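The large-sample shortcut described above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: it uses the standard library's `NormalDist` as a stand-in for the Z table, and the obtained statistic of 2.10 is a hypothetical value chosen for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution, standing in for the Z table

# Critical values for alpha = .05 under the large-sample (normal) approximation
alpha = 0.05
z_crit_one_tailed = z.inv_cdf(1 - alpha)      # all 5% of the area in one tail
z_crit_two_tailed = z.inv_cdf(1 - alpha / 2)  # 2.5% of the area in each tail

# Approximate p-values for a hypothetical obtained statistic (assumed value)
t_obtained = 2.10
p_one_tailed = 1 - z.cdf(t_obtained)
p_two_tailed = 2 * p_one_tailed

print(f"one-tailed critical value: {z_crit_one_tailed:.3f}")
print(f"two-tailed critical value: {z_crit_two_tailed:.3f}")
print(f"two-tailed p for t = {t_obtained}: {p_two_tailed:.4f}")
```

The one-tailed cutoff is smaller than the two-tailed one, which is why a predicted direction makes it easier to reach significance at the same alpha. For small samples the t table should be used instead of this approximation.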
Model A t test: Equal Population
Variances are Assumed
• Let’s consider an example of Model A, when we make the
assumption that the variances in the populations are equal. We
have the following problem:
• In a study of attitudes toward smoking, it was found that an
experimental group (N = 40, s = 6) who had visited a Web site
organized by the Tobacco Lobbyist’s League had a mean score on
the “smoking favorability” test of 40, while a control group (N = 22,
s = 4) had a mean score on the smoking favorability test of 35.
Higher scores on the test reflect greater favorability towards
smoking
• Our null hypothesis, H0, is that the two groups are from the same
population
• Our research hypothesis, H1, is that the two groups are from
different populations. Another way to put this is that we
hypothesize that the two groups differ significantly with respect
to the variable of interest, scores on the smoking favorability test.
Further, we anticipate that the differences will be such that the
experimental group will have a higher mean than the control group
on the smoking favorability test, so we have a predicted direction of
differences
Model A t test, Equal Variances,
cont’d
• To test the null hypothesis we will turn to the t test.
• We will make a decision that to reject the null
hypothesis we will require a value of t that falls into
the p < .05 critical region of the t distribution, and that
this will be a one-tailed test, since we have
hypothesized a particular direction of differences (that
the mean for the Experimental Group will be greater
than the mean for the Control Group). A smaller
value of t is required for the same level of significance
with a one-tailed test (e.g., t might be significant at
the .05 level with a one-tailed test, but only at the .10
level for a two-tailed test)
• Our DF to enter the t table is N1 + N2 - 2, or 60.
• To reject the null hypothesis with DF = 60 we need a
value of t of 1.671 for a one-tailed test (see next
slide)
Table of t for one-tailed and two-tailed tests
Calculation of Test Statistic for Pooled Variance
t Test, Model A (Equal Variances Assumed)
• How is t calculated when it is assumed that the population
variances for the two groups are equal?
• Recall that the experimental group (N = 40, s = 6) who had visited
a Web site organized by the Tobacco Lobbyist’s League had a mean
score on the “smoking favorability” test of 40, while a control
group (N = 22, s = 4) had a mean score on the smoking
favorability test of 35.
• The numerator in the “real” formula for t is the difference of the
two sample means minus the difference of the population means.
However, under the null hypothesis, the population means are
assumed to be equal and the second term (zero) drops out, so the
numerator of t is just the difference between the means of the two
groups. In our case, that is +5 (40 - 35).
• In calculating the denominator, we want to have some measure of
the variance of the sampling distribution of the differences in
sample means. Because of the assumption of equal population
variances, we are going to use a “pooled estimate.” To calculate the
denominator, we first have to find the “weighted average of
variances.” We will symbolize this pooled variance as $s_p^2$
Computing the Weighted Average of Variances
for the Denominator of the t Statistic, Model A
• To compute the pooled, weighted average of
variances, we need to assemble our sample data:
N1 = 40, N2 = 22, M1 = 40, M2 = 35, s1 = 6, s2 = 4.
The weighted average of variances, $s_p^2$, equals
$$s_p^2 = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$$
Inserting our sample data into the formula, we have
$$s_p^2 = \frac{(39)(36) + (21)(16)}{40 + 22 - 2} = \frac{1404 + 336}{60} = 29$$
Thus $s_p^2$ equals 29.
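The same arithmetic can be checked with a short Python sketch. The function name `pooled_variance` is introduced here purely for illustration; the input values are the sample figures from the smoking favorability example.

```python
def pooled_variance(n1, s1, n2, s2):
    """Weighted average of two sample variances, each weighted by N - 1."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Experimental group: N = 40, s = 6; control group: N = 22, s = 4
sp2 = pooled_variance(40, 6, 22, 4)
print(sp2)
```

Because the weights are N - 1, the larger experimental group pulls the pooled variance toward its own variance of 36 rather than toward the control group's 16.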
Calculation of t, Model A (Equal
Variances Assumed)
• Calculate t:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_p^2}{N_1} + \dfrac{s_p^2}{N_2}}}$$
The pooled estimate of the standard deviation of the sampling distribution of
differences in sample means is in the denominator; it is built from the $s_p^2$
we computed on the previous slide.
The numerator of t equals the mean of group 1 (40) minus the mean of group 2 (35), or 5.
This value, 5, is divided by the square root of (29/40 + 29/22), and t equals 3.498.
Can we reject the null hypothesis? In other words, how likely is it that we would
obtain a value of t as large as 3.498 if the experimental and control groups were from
the same population with respect to the variable of interest? Looking up in the table
we find that a t of 3.498 is significant (p < .005, one-tailed, DF = 60), so we can
reject the null hypothesis and say that the experimental and control groups
differ significantly.
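The full Model A calculation can be sketched as follows. The variable names are illustrative; the critical value 1.671 is the one-tailed .05 figure for DF = 60 quoted earlier in these slides.

```python
import math

# Sample data from the smoking favorability example
n1, mean1, s1 = 40, 40.0, 6.0   # experimental group
n2, mean2, s2 = 22, 35.0, 4.0   # control group

# Pooled (weighted average) variance, then the standard error of the difference
df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
se_diff = math.sqrt(sp2 / n1 + sp2 / n2)

t = (mean1 - mean2) / se_diff

# Decision rule: one-tailed test at the .05 level, critical t taken from the slides
t_critical = 1.671
reject_null = t > t_critical

print(f"t = {t:.3f}, DF = {df}, reject H0: {reject_null}")
```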
Model B, Equal Population Variances Not
Assumed (t-test for Unequal Variances)
• If we cannot assume equal variances in the populations from
which the samples are purportedly drawn, then we need a
different estimate of the standard error of the sampling
distribution of differences of means in the denominator
• In calculating t we use almost the same formula as in the
previous model, but we substitute the separate sample
variances for the pooled or weighted average of variances, $s_p^2$,
that we used in the first model
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$$
Some authorities, like Blalock, use N1 - 1 and N2 - 1
in the denominator for unequal variances.
So in this case, t would be equal to 5 divided by the
square root of (36/40 + 16/22), or about 3.92. This
statistic requires that you compute a different
DF before consulting the t distribution table
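A sketch of the Model B calculation follows. The slides do not say which formula to use for the "different DF"; the Welch-Satterthwaite approximation used here is a common choice and should be read as an assumption, as should the function name `unequal_variance_t`.

```python
import math

def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """t for unequal variances, with Welch-Satterthwaite degrees of freedom
    (the DF formula is an assumption; the slides do not specify one)."""
    v1 = s1**2 / n1   # variance of the mean, group 1
    v2 = s2**2 / n2   # variance of the mean, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Same smoking favorability data as in Model A
t, df = unequal_variance_t(40, 6, 40, 35, 4, 22)
print(f"t = {t:.3f}, approximate DF = {df:.1f}")
```

The approximate DF is not a whole number and is smaller than the N1 + N2 - 2 used in Model A, which makes the test slightly more conservative.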
Using SPSS to conduct a t Test for
Independent Samples, Assuming Equal
Population Variances
• Let’s use the data from the employment2.sav data file to
test the research hypothesis that males and females
differed with respect to how long they had been at their
current job at the time of data collection. The null
hypothesis would be that with respect to the variable
“months of experience at the current job” men and women
are from the same population
  • In SPSS go to Analyze/Compare Means/Independent-Samples
  T Test
  • Move the Previous Experience variable into the Test Variable
  box and move Gender into the Grouping Variable box. Click on the
  Define Groups button (if it is grayed out, highlight the variable
  name in the box above it) and define the first group as “1” and
  the second group as “2,” and click Continue
  • Under Options, set the confidence interval to 95%, click
  Continue and then OK
  • Compare your output to the next slide
SPSS Output, t Test for Independent Samples
with both Equal and Unequal Variances
Assumed
Group Statistics: Previous Experience (months)

Gender     N      Mean     Std. Deviation   Std. Error Mean
male      257    111.84       109.849            6.852
female    216     77.04        95.012            6.465

Independent Samples Test: Previous Experience (months)

Levene's Test for Equality of Variances: F = 2.676, Sig. = .103

t-test for Equality of Means
                                                 Sig.         Mean       Std. Error   95% Confidence Interval of the Difference
                                t       df     (2-tailed)   Difference   Difference      Lower        Upper
Equal variances assumed       3.648   471         .000        34.80        9.539        16.059       53.548
Equal variances not assumed   3.694   470.604     .000        34.80        9.420        16.292       53.315
Can we reject the null hypothesis that there are no differences
between males and females in months of previous experience?
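As a check on the output, both rows of the Independent Samples Test table can be approximately reproduced from the Group Statistics alone. The sketch below is an illustration under two assumptions: that the "equal variances not assumed" row uses the Welch-Satterthwaite DF, and that the rounding in the printed means and standard deviations is small enough to ignore.

```python
import math

# Summary statistics copied from the Group Statistics table
n_m, mean_m, sd_m = 257, 111.84, 109.849   # male
n_f, mean_f, sd_f = 216, 77.04, 95.012     # female

mean_diff = mean_m - mean_f

# Row 1: equal variances assumed (pooled variance)
df_pooled = n_m + n_f - 2
sp2 = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / df_pooled
se_pooled = math.sqrt(sp2 / n_m + sp2 / n_f)
t_pooled = mean_diff / se_pooled

# Row 2: equal variances not assumed (separate variances, approximate DF)
v_m, v_f = sd_m**2 / n_m, sd_f**2 / n_f
se_welch = math.sqrt(v_m + v_f)
t_welch = mean_diff / se_welch
df_welch = (v_m + v_f) ** 2 / (v_m**2 / (n_m - 1) + v_f**2 / (n_f - 1))

print(f"pooled:   t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.3f}")
print(f"separate: t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.3f}")
```

Levene's test is not significant (Sig. = .103), so the equal-variances row is the one to read. Its two-tailed Sig. of .000 is below .05, so the null hypothesis of no difference can be rejected.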
T test for Dependent or Matched
Samples
• In certain cases, for example in “before and after”
designs or when members of group A have been
matched with members of group B on all salient
characteristics except one, the variable of interest, an
alternative formula for computing t is used. For
example, you might want to find out if there have been
significant changes in brand preference among the same
persons following exposure to a commercial
• In this type of t test, we treat a “pair” of individuals as a
case, rather than the N1 + N2 individuals we ordinarily
treat as cases
• We test a hypothesis of the following form: the mean of
the pair-by-pair differences in the population, µD , is
zero; in this case, that there are no differences
attributable to exposure to the commercial
An Example of t-Test for Dependent
Samples
• Problem: Ten subjects are given a pre-test on attitudes toward downloading
of “hijacked” movie files. They heard a commercial from a union
representing technical people in the motion picture industry in which they
talked about having people “steal” the fruits of their labors. The ten people
then were re-administered the attitude measure. Given the pre- and
post-test scores below, can you conclude, at the p < .01 level, one-tailed, that the
commercial made a significant impact on attitudes toward movie
downloading? (Higher scores on the test mean more negative attitudes
toward downloading)
$$t = \frac{\bar{X}_D}{\sqrt{\dfrac{\sum (X_D - \bar{X}_D)^2}{N}} \Big/ \sqrt{N - 1}}$$
where $\bar{X}_D$ is the mean difference between pairs of scores, N is the
number of pairs of scores, and the $X_D$ are the differences between each of
the matched pairs of scores
Note: this computing formula
gives an equivalent result to
pp. 152-154 in Levin and Fox
Calculation of t for dependent
Samples
ID    Pre    Post    XD (Difference,    (XD – XD-bar)    (XD – XD-bar)2
                     Post – Pre)
 1     50     55           5                  0                 0
 2     45     52           7                  2                 4
 3     40     39          -1                 -6                36
 4     41     44           3                 -2                 4
 5     47     50           3                 -2                 4
 6     46     62          16                 11               121
 7     50     55           5                  0                 0
 8     38     52          14                  9                81
 9     37     40           3                 -2                 4
10     40     35          -5                -10               100
N = 10               XD-bar = 5                              ∑ = 354

Table: mean difference in attitude scores after hearing a commercial against pirating movie files

Calculate t for these data: XD-bar = 5; ∑(XD – XD-bar)2 = 354; N = 10; DF = N - 1 = 9
$$t = \frac{5}{\sqrt{354/10} \big/ \sqrt{9}} = \frac{5}{1.983} = 2.521$$
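The hand calculation can be reproduced from the raw pre and post scores with a short Python sketch. The variable names are illustrative; the scores are those in the table above, with each difference taken as post minus pre.

```python
import math

# Pre- and post-test scores for the ten subjects, in ID order
pre  = [50, 45, 40, 41, 47, 46, 50, 38, 37, 40]
post = [55, 52, 39, 44, 50, 62, 55, 52, 40, 35]

# One difference score per matched pair (post minus pre)
diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
mean_diff = sum(diffs) / n

# Sum of squared deviations of the differences from their mean
sum_sq = sum((d - mean_diff) ** 2 for d in diffs)

# Standard error of the mean difference, following the computing formula
se_diff = math.sqrt(sum_sq / n) / math.sqrt(n - 1)

t = mean_diff / se_diff
df = n - 1

print(f"mean difference = {mean_diff}, sum of squares = {sum_sq}")
print(f"t = {t:.3f}, DF = {df}")
```

Dividing the population-style standard deviation by the square root of N - 1 gives the same standard error as dividing the usual sample standard deviation (with N - 1 in its denominator) by the square root of N.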
T Test for Dependent Samples in
SPSS
• Now let’s try that in SPSS. Go here to
download the pre/post data set
  • In SPSS Data Editor, go to Analyze/Compare
  Means/Paired-Samples T Test
  • Put the Posttest and Pretest variables into the
  Paired Variables box; put Posttest in first if you
  expect posttest scores to be higher
  • Click Options and select the 95% confidence
  interval, and click Continue, then click OK
  • Compare your results to your hand calculations
Output for Paired Samples t Test
Paired Samples Statistics

Pair 1       Mean       N    Std. Deviation   Std. Error Mean
POSTTEST    48.4000    10       8.55310           2.70473
PRETEST     43.4000    10       4.81202           1.52169

Note that the mean is higher (in this case, a more negative attitude toward
downloading) after the commercial

Paired Samples Correlations

Pair 1                  N    Correlation   Sig.
POSTTEST & PRETEST     10       .692       .027

This correlation indicates that about 48% ((.692)2) of the variation
in post-test attitudes could be explained by pre-test attitudes.
Presumably the rest of the variation (about 52%, or 1 - (.692)2)
is explained by treatment plus error
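The correlation and the share of variance it implies can be checked from the raw scores. This sketch computes Pearson's r by hand so that it needs only the standard library; the function name `pearson_r` is introduced here for illustration.

```python
import math

pre  = [50, 45, 40, 41, 47, 46, 50, 38, 37, 40]
post = [55, 52, 39, 44, 50, 62, 55, 52, 40, 35]

def pearson_r(x, y):
    """Pearson correlation: co-deviation divided by the product of deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

r = pearson_r(pre, post)
explained = r ** 2          # share of post-test variation tied to pre-test scores
unexplained = 1 - r ** 2    # share left to treatment plus error

print(f"r = {r:.3f}, r squared = {explained:.3f}, 1 - r squared = {unexplained:.3f}")
```

The squared correlation is the explained share; one minus the squared correlation is the share left over.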
Paired Samples Test

Pair 1: POSTTEST - PRETEST

Paired Differences
                                        95% Confidence Interval of the Difference
Mean      Std. Deviation   Std. Error Mean      Lower        Upper         t      df   Sig. (2-tailed)
5.0000       6.27163           1.98326          .5135        9.4865      2.521     9       .033
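We have a significant value of t (p = .033, two-tailed), but look at that
confidence interval: it runs from about 0.5 to 9.5 points, so the size of the change
is estimated very imprecisely. Also, compare the means; does this seem like a major
change? And compare the standard deviations; the raw scores are widely spread both
before and after.
So at the .05 level we can reject the null hypothesis of no differences between pre
and post and conclude that our treatment increased negative attitudes towards
downloading. Note, however, that the obtained t of 2.521 falls short of 2.821, the
critical value for the p < .01, one-tailed test with DF = 9 that the problem
specified, so at that stricter level we must retain the null hypothesis.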
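The confidence interval in the Paired Samples Test output, and the decision at the stricter level named in the problem, can be sketched as follows. The two critical values are not given in the slides: 2.262 (two-tailed .05, DF = 9) and 2.821 (one-tailed .01, DF = 9) are taken from a standard t table and are labeled as assumptions in the code.

```python
# Values from the Paired Samples Test output
mean_diff = 5.0
se_diff = 1.98326
t_obtained = 2.521

# Critical values for DF = 9 from a standard t table (assumed, not in the slides)
t_crit_05_two_tailed = 2.262
t_crit_01_one_tailed = 2.821

# 95% confidence interval for the mean difference
margin = t_crit_05_two_tailed * se_diff
ci_lower = mean_diff - margin
ci_upper = mean_diff + margin

# Decisions under the two criteria
reject_at_05_two_tailed = t_obtained > t_crit_05_two_tailed
reject_at_01_one_tailed = t_obtained > t_crit_01_one_tailed

print(f"95% CI: {ci_lower:.3f} to {ci_upper:.3f}")
print(f"reject at .05 two-tailed: {reject_at_05_two_tailed}")
print(f"reject at .01 one-tailed: {reject_at_01_one_tailed}")
```

The interval excludes zero, which matches the two-tailed significance at .05, but its width shows how loosely ten pairs pin down the size of the effect.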