Comparing Two Populations
STAT 651
Lecture 9
Copyright (c) Bani K. Mallick
Topics in Lecture #9
Comparing two population means
Output: detailed look
The t-test
Book Sections Covered in Lecture #9
Chapter 6.2
Relevant SPSS Tutorials
Transformations of Data
2-sample t-test
Paired t-test
Lecture 8 Review: Comparing Two
Populations
There are two populations
Take a sample from each population
The sample sizes need not be the same
Population 1: $n_1$
Population 2: $n_2$
Lecture 8 Review: Comparing Two
Populations
Each will have a sample standard deviation
Population 1: $s_1$
Population 2: $s_2$
Lecture 8 Review: Comparing Two
Populations
Each sample will have a sample mean
Population 1: $\bar{X}_1$
Population 2: $\bar{X}_2$
Those are the statistics. What are the
parameters?
Lecture 8 Review: Comparing Two
Populations
Each population will have a population standard
deviation
Population 1: $\sigma_1$
Population 2: $\sigma_2$
Lecture 8 Review: Comparing Two
Populations
Each population will have a population mean
Population 1: $\mu_1$
Population 2: $\mu_2$
Lecture 8 Review: Comparing Two
Populations
How do we compare the population means
$\mu_1$ and $\mu_2$?
The usual way is to take their difference:
$\mu_1 - \mu_2$
If the population means are equal, what is
their difference?
Lecture 8 Review: Comparing Two
Populations
The usual way is to take their difference:
$\mu_1 - \mu_2$
If the population means are equal, their
difference = 0
Suppose we form a confidence interval for
the difference. Checking whether 0 is in the
confidence interval lets us make decisions
about the hypothesis that the means are equal
NHANES Comparison: Log(Saturated Fat)
Group Statistics
Health Status   N    Mean     Std. Deviation   Std. Error Mean
Healthy         60   2.9905   .6173            7.969E-02
Cancer          59   2.6969   .6423            8.362E-02
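The Std. Error Mean column can be checked by hand: it is the standard deviation divided by the square root of the sample size. A minimal Python sketch using the values from the Group Statistics table; the function name `std_error_mean` is illustrative and not part of SPSS.

```python
import math

def std_error_mean(sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)

# Values copied from the Group Statistics table
se_healthy = std_error_mean(0.6173, 60)
se_cancer = std_error_mean(0.6423, 59)

print(f"Healthy: {se_healthy:.5f}")  # table reports 7.969E-02
print(f"Cancer:  {se_cancer:.5f}")   # table reports 8.362E-02
```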
NHANES Comparison: what the output looks like
Independent Samples Test: Log(Saturated Fat)
Levene's Test for Equality of Variances: F = .186, Sig. = .667
t-test for Equality of Means (the last two columns are the 95% Confidence Interval of the Difference):
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower       Upper
Equal variances assumed       2.543   117       .012              .2937             .1155                   6.497E-02   .5223
Equal variances not assumed   2.542   116.627   .012              .2937             .1155                   6.488E-02   .5224
NHANES Comparison: the variable
The variable in the Independent Samples Test output is Log(Saturated Fat)
NHANES Comparison: The method. If you think the variances are wildly different, try a transformation
Levene's Test for Equality of Variances: F = .186, Sig. = .667
The output has one row for each method:
Equal variances assumed: t = 2.543, df = 117
Equal variances not assumed: t = 2.542, df = 116.627
NHANES Comparison: the p-value
Sig. (2-tailed) = .012, whether or not equal variances are assumed
NHANES Comparison: the difference in sample means
Mean Difference = .2937
NHANES Comparison: the standard error of difference in sample means
Std. Error Difference = .1155
NHANES Comparison: the 95% confidence interval
95% Confidence Interval of the Difference:
Equal variances assumed: Lower = 6.497E-02 (0.0650), Upper = .5223
Equal variances not assumed: Lower = 6.488E-02 (0.0649), Upper = .5224
NHANES Comparison
The “Mean Difference” is 0.2937. Since the
healthy cases had a higher mean, this is
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
What is this a CI for? The difference in
population mean log(saturated fat) intake
between cancer cases and healthy controls:
$\mu$(Healthy) – $\mu$(Cancer)
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
The null hypothesis of interest is that the
population means are equal, i.e.,
$\mu$(Healthy) – $\mu$(Cancer) = 0
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.05 or p > 0.05?
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
[Figure: number line showing the confidence interval from 0.0650 to 0.5223, with the hypothesized value 0 lying below its lower limit]
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.05 or p > 0.05?
Answer: p < 0.05 since the 95% CI does not
cover zero.
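The reasoning on this slide can be sketched in a few lines of Python: a two-sided test at level $\alpha$ rejects a hypothesized difference exactly when that value lies outside the $(1-\alpha)100\%$ CI. The function name `ci_excludes` is an illustrative assumption; the interval limits are taken from the SPSS output (6.497E-02 and .5223).

```python
def ci_excludes(lower, upper, hypothesized=0.0):
    """True if the hypothesized value lies outside the interval [lower, upper]."""
    return not (lower <= hypothesized <= upper)

# 95% CI for mu(Healthy) - mu(Cancer), from the SPSS output
lower, upper = 0.06497, 0.5223

# 0 is outside the 95% CI, so the two-sided p-value is below 0.05
reject_at_05 = ci_excludes(lower, upper)
print("Reject equality of means at the 0.05 level:", reject_at_05)
```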
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.01 or p > 0.01?
Answer: You cannot tell from a 95% CI.
However, from the SPSS output, p =
0.012 (see next slide), so p > 0.01.
NHANES Comparison: the 95% confidence interval
Independent Samples Test, equal variances assumed:
t = 2.543, df = 117, Sig. (2-tailed) = .012
95% Confidence Interval of the Difference: Lower = 6.497E-02 (0.0650), Upper = .5223
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
What do we conclude from this confidence
interval?
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
What do we conclude from this confidence
interval?
The population mean log(saturated fat)
intake is greater in the Healthy cases by
between 0.0650 and 0.5223
(exponentiate to get back to the scale of grams
of saturated fat, where the difference becomes
a ratio), with 95% confidence
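Because the analysis is on the log scale, exponentiating the CI limits turns the difference in mean logs into a ratio on the original scale. A hedged sketch, assuming natural logarithms were used for the transformation (the slides do not say which base):

```python
import math

# 95% CI limits for mu(Healthy) - mu(Cancer) on the log scale, from the SPSS output
log_lower, log_upper = 0.06497, 0.5223

# Assumption: natural log. A difference of logs is the log of a ratio,
# so exp() gives limits for a Healthy / Cancer ratio of geometric mean intakes.
ratio_lower = math.exp(log_lower)
ratio_upper = math.exp(log_upper)

print(f"Ratio (Healthy / Cancer): {ratio_lower:.3f} to {ratio_upper:.3f}")
```

A ratio interval that lies entirely above 1 corresponds to a log-scale interval that lies entirely above 0.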
Comparing Two Population Means:
the Formulas
The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$
The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$
The aim: CI for $\mu_1 - \mu_2$
Comparing Two Populations
Does it matter which one you call population
1 and which one you call population 2?
Not at all. The key is to interpret the
difference properly.
Comparing Two Populations
The aim: CI for $\mu_1 - \mu_2$
This is the difference in population means
The estimate of the difference in population
means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$
This is a random variable: it has sample to
sample variability
Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
“Population” mean from repeated sampling is $\mu_1 - \mu_2$
The s.d. from repeated sampling is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
The s.d. from repeated sampling is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
You need reasonably large samples from
BOTH populations
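The repeated-sampling claim can be checked by simulation. The sketch below draws many pairs of samples from two normal populations and compares the observed s.d. of the difference in sample means with the formula. All population values and sample sizes here are made up for illustration; they are not from the NHANES data.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical populations and sample sizes (illustration only)
mu1, sigma1, n1 = 3.0, 0.6, 30
mu2, sigma2, n2 = 2.7, 0.7, 40

def diff_of_sample_means():
    """Draw one sample from each population; return X1bar - X2bar."""
    x1 = [random.gauss(mu1, sigma1) for _ in range(n1)]
    x2 = [random.gauss(mu2, sigma2) for _ in range(n2)]
    return statistics.fmean(x1) - statistics.fmean(x2)

diffs = [diff_of_sample_means() for _ in range(5000)]

simulated_mean = statistics.fmean(diffs)
simulated_sd = statistics.stdev(diffs)
theoretical_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"mean of differences: {simulated_mean:.4f} (formula: {mu1 - mu2:.4f})")
print(f"s.d. of differences: {simulated_sd:.4f} (formula: {theoretical_sd:.4f})")
```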
Comparing Two Populations
If you can reasonably believe that the
population sd’s are nearly equal, it is
customary to pick the equal variance
assumption and estimate the common
standard deviation by
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
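As a check on the formula, the pooled standard deviation can be computed from the NHANES group statistics. A minimal sketch; `pooled_sd` is an illustrative helper name.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of a common standard deviation (equal-variance assumption)."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)

# Healthy: s1 = .6173, n1 = 60; Cancer: s2 = .6423, n2 = 59
sp = pooled_sd(0.6173, 60, 0.6423, 59)
print(f"s_p = {sp:.4f}")
```

The result falls between the two sample standard deviations, as expected for a weighted average of the two sample variances.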
Comparing Two Populations
The standard error of $\bar{X}_1 - \bar{X}_2$ is then
the value
$$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
The number of degrees of freedom is $n_1 + n_2 - 2$
Comparing Two Populations
A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is
$$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Note how the sample sizes determine the CI
length
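The CI formula can be applied to the NHANES numbers. A sketch under stated assumptions: the mean difference and its standard error are read off the SPSS output, and the critical value $t_{.025}(117) \approx 1.98$ is typed in by hand, since the Python standard library has no t-distribution quantile function.

```python
def two_sample_ci(mean_diff, std_error, t_crit):
    """CI for mu1 - mu2: (X1bar - X2bar) +/- t_crit * standard error."""
    margin = t_crit * std_error
    return mean_diff - margin, mean_diff + margin

# From the SPSS output: Mean Difference = .2937, Std. Error Difference = .1155
# t_crit = 1.98 is the approximate value of t_.025(117)
lower, upper = two_sample_ci(0.2937, 0.1155, 1.98)
print(f"95% CI: {lower:.4f} to {upper:.4f}")  # SPSS reports 6.497E-02 to .5223
```

Small differences from the SPSS limits in the last digit come from using rounded inputs.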
Comparing Two Populations
Generally, you should make your sample sizes
nearly equal, or at least not wildly unequal.
Consider a total sample size of 100
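$$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx 1$ if $n_1 = 1$, $n_2 = 99$
$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.20$ if $n_1 = 50$, $n_2 = 50$
Thus, in the former case, your CI would be about 5
times longer!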
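The effect of allocation on CI length can be tabulated. The sketch below evaluates $\sqrt{1/n_1 + 1/n_2}$ for several ways of splitting a total sample size of 100; the particular splits are chosen for illustration.

```python
import math

def length_factor(n1, n2):
    """The sample-size factor sqrt(1/n1 + 1/n2) in the CI half-width."""
    return math.sqrt(1 / n1 + 1 / n2)

total = 100
for n1 in (1, 10, 25, 50):
    n2 = total - n1
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: factor = {length_factor(n1, n2):.3f}")

# Ratio of the most unbalanced split to the balanced one
ratio = length_factor(1, 99) / length_factor(50, 50)
print(f"ratio = {ratio:.3f}")
```

This compares only the sample-size factor; $s_p$ and the t critical value are treated as fixed.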
Comparing Two Populations
The CI can of course be used to test
hypotheses
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
This is the same as
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
So we just need to check whether 0 is in the
interval, just as we have done
Comparing Two Populations: The t-test
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
There is something called a t-test, which
gives you the information as to whether 0 is
in the CI.
It does not tell you where the means lie
however, so it is of limited use. P-values
tell you the same thing.
Comparing Two Populations: The t-test
The t-statistic is defined by
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Comparing Two Populations: The t-test
You reject equality of means if
$$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$$
In this case, is $p < \alpha$ or is $p > \alpha$?
Comparing Two Populations: The t-test
You reject equality of means if
$$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$$
$p < \alpha$
NHANES Comparison: the t-test
$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$
Independent Samples Test, Log(Saturated Fat), equal variances assumed:
t = 2.543, df = 117, Sig. (2-tailed) = .012
t = 2.543 > $t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject
the hypothesis that the population means are equal,
for $\alpha = 0.05$
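The t-statistic in the output can be reproduced from the group statistics alone. A sketch, with the critical value 1.98 entered by hand from the slide; the function name `pooled_t_statistic` is illustrative.

```python
import math

def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t-statistic under the equal-variance assumption."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    std_error = sp * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / std_error

# Group statistics: Healthy (mean 2.9905, sd .6173, n 60),
#                   Cancer  (mean 2.6969, sd .6423, n 59)
t = pooled_t_statistic(2.9905, 0.6173, 60, 2.6969, 0.6423, 59)

t_crit = 1.98  # approximate t_.025(117), taken from the slide
reject = abs(t) > t_crit
print(f"t = {t:.3f}, reject equality of means at alpha = 0.05: {reject}")
```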
Comparing Two Populations
SPSS Demonstrations: bluebonnets and
Framingham Heart Disease and Blood
Pressure, as time permits