Comparing Two Populations

STAT 651
Lecture 9
Copyright (c) Bani K. Mallick

Topics in Lecture #9

Comparing two population means

Output: detailed look

The t-test

Book Sections Covered in Lecture #9

Chapter 6.2

Relevant SPSS Tutorials

Transformations of Data

2-sample t-test

Paired t-test

Lecture 8 Review: Comparing Two Populations

There are two populations

Take a sample from each population

The sample sizes need not be the same

Population 1: $n_1$

Population 2: $n_2$

Lecture 8 Review: Comparing Two Populations

Each will have a sample standard deviation

Population 1: $s_1$

Population 2: $s_2$

Lecture 8 Review: Comparing Two Populations

Each sample will have a sample mean

Population 1: $\bar{X}_1$

Population 2: $\bar{X}_2$

Those are the statistics. What are the parameters?

Lecture 8 Review: Comparing Two Populations

Each population will have a population standard deviation

Population 1: $\sigma_1$

Population 2: $\sigma_2$

Lecture 8 Review: Comparing Two Populations

Each population will have a population mean

Population 1: $\mu_1$

Population 2: $\mu_2$

Lecture 8 Review: Comparing Two Populations

How do we compare the population means $\mu_1$ and $\mu_2$?

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, what is their difference?

Lecture 8 Review: Comparing Two Populations

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, their difference = 0

Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis

NHANES Comparison

Group Statistics: Log(Saturated Fat)

Health Status    N     Mean      Std. Deviation    Std. Error Mean
Healthy          60    2.9905    .6173             7.969E-02
Cancer           59    2.6969    .6423             8.362E-02

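As a quick check on the Group Statistics table, the "Std. Error Mean" column can be reproduced from the standard deviation and sample size as s / sqrt(n). The sketch below is illustrative only; the function name `standard_error_of_mean` is our own and is not part of SPSS.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


# Standard deviations and sample sizes copied from the Group Statistics table.
se_healthy = standard_error_of_mean(0.6173, 60)
se_cancer = standard_error_of_mean(0.6423, 59)

print(f"Healthy: {se_healthy:.5f}   Cancer: {se_cancer:.5f}")
```

Both values should agree with the table (7.969E-02 and 8.362E-02) up to rounding.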
NHANES Comparison: what the output looks like

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means

                               t        df         Sig. (2-tailed)    Mean Difference    Std. Error Difference    95% CI Lower    95% CI Upper
Equal variances assumed        2.543    117        .012               .2937              .1155                    6.497E-02       .5223
Equal variances not assumed    2.542    116.627    .012               .2937              .1155                    6.488E-02       .5224

NHANES Comparison: the variable

In the Independent Samples Test output, the variable being compared is Log(Saturated Fat)

NHANES Comparison: The method. If you think the variances are wildly different, try a transformation

Rows of the Independent Samples Test output: "Equal variances assumed" and "Equal variances not assumed"

Levene's Test for Equality of Variances: F = .186, Sig. = .667

NHANES Comparison: the p-value.

Sig. (2-tailed) = .012 (equal variances assumed: t = 2.543, df = 117; equal variances not assumed: t = 2.542, df = 116.627)

NHANES Comparison: the difference in sample means

Mean Difference = .2937 (equal variances assumed and not assumed)

NHANES Comparison: the standard error of difference in sample means

Std. Error Difference = .1155 (equal variances assumed and not assumed)

NHANES Comparison: the 95% confidence interval

95% Confidence Interval of the Difference

Equal variances assumed: Lower = 6.497E-02 (that is, 0.0650), Upper = .5223

Equal variances not assumed: Lower = 6.488E-02, Upper = .5224

NHANES Comparison

The “Mean Difference” is 0.2937. Since the healthy cases had a higher mean, this is Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What is this a CI for? The difference in population mean log(saturated fat) intake between healthy controls and cancer cases: $\mu$(Healthy) - $\mu$(Cancer)

NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

The null hypothesis of interest is that the population means are equal, i.e., $\mu$(Healthy) - $\mu$(Cancer) = 0

NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?

NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

[Figure: number line showing the hypothesized value 0 and the confidence interval from 0.0650 to 0.5223; the hypothesized value lies outside the interval]

NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?

Answer: p < 0.05 since the 95% CI does not cover zero.

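The link between the confidence interval and the p-value can be written as a one-line check: a two-sided test at level alpha rejects exactly when the hypothesized value falls outside the (1 - alpha)100% interval. The helper name `rejects_at_matching_level` below is hypothetical, and the interval endpoints are taken from the SPSS output (equal variances assumed).

```python
def rejects_at_matching_level(ci_lower: float, ci_upper: float,
                              hypothesized: float = 0.0) -> bool:
    """True when the hypothesized value lies outside the confidence interval.

    For a 95% CI this corresponds to p < 0.05 in the two-sided test.
    """
    return not (ci_lower <= hypothesized <= ci_upper)


# 95% CI for mu(Healthy) - mu(Cancer), from the Independent Samples Test output.
nhanes_ci = (6.497e-02, 0.5223)
reject_equal_means = rejects_at_matching_level(*nhanes_ci)

print("Reject equal means at the 5% level:", reject_equal_means)
```

The check says nothing about other levels such as 0.01; for that, a 99% interval or the p-value itself is needed.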
NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.01 or p > 0.01?

Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012, so p > 0.01. (see next slide)

NHANES Comparison: the 95% confidence interval

Sig. (2-tailed) = .012

95% Confidence Interval of the Difference (equal variances assumed): Lower = 6.497E-02 (that is, 0.0650), Upper = .5223

NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

NHANES Comparison

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

The population mean log(saturated fat) intake is greater in the Healthy cases by between 0.0650 and 0.5223, with 95% confidence (exponentiate to express this as a ratio on the original scale of grams of saturated fat)

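To see what exponentiating does, the sketch below back-transforms the interval endpoints. Two points are assumptions on our part: that "log" means the natural logarithm, and that the result is read as a ratio. Because the interval is for a difference of mean logs, exponentiating gives a Healthy-to-Cancer ratio (of geometric means) rather than a difference in grams.

```python
import math

# 95% CI for mu(Healthy) - mu(Cancer) on the log scale, from the SPSS output.
ci_log = (6.497e-02, 0.5223)

# Assumption: natural log was used. exp(difference of logs) is a ratio,
# so these are bounds on the Healthy / Cancer ratio, not on grams.
ratio_lower, ratio_upper = (math.exp(bound) for bound in ci_log)

print(f"Ratio scale: {ratio_lower:.3f} to {ratio_upper:.3f}")
```

A ratio interval that excludes 1 corresponds to a log-scale interval that excludes 0.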
Comparing Two Population Means: the Formulas

The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$

The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$

The aim: CI for $\mu_1 - \mu_2$

Comparing Two Populations


Does it matter which one you call population 1 and which one you call population 2?

Not at all. The key is to interpret the difference properly.

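A small illustration of why the labeling does not matter: swapping the two populations flips the sign of the difference and mirrors the interval around zero, so the conclusion is unchanged. The helper `difference_ci` is hypothetical; the means and standard error come from the NHANES output, and the critical value 1.98 is the t value quoted later in the lecture.

```python
def difference_ci(mean_a: float, mean_b: float, margin: float) -> tuple:
    """Interval for (mean_a - mean_b) given a precomputed margin of error."""
    diff = mean_a - mean_b
    return (diff - margin, diff + margin)


# Margin = t critical value (1.98) times the standard error of the difference.
margin = 1.98 * 0.1155

healthy_minus_cancer = difference_ci(2.9905, 2.6969, margin)
cancer_minus_healthy = difference_ci(2.6969, 2.9905, margin)

print("Healthy - Cancer:", healthy_minus_cancer)
print("Cancer - Healthy:", cancer_minus_healthy)
```

One interval is entirely positive and the other entirely negative; both say the Healthy mean is larger.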
Comparing Two Populations

The aim: CI for $\mu_1 - \mu_2$

This is the difference in population means

The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$

This is a random variable: it has sample to sample variability

Comparing Two Populations

Difference of sample means: $\bar{X}_1 - \bar{X}_2$

“Population” mean from repeated sampling is $\mu_1 - \mu_2$

The s.d. from repeated sampling is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Comparing Two Populations

Difference of sample means: $\bar{X}_1 - \bar{X}_2$

The s.d. from repeated sampling is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

You need reasonably large samples from BOTH populations

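The "repeated sampling" idea can be checked with a small simulation: draw many pairs of samples, record the difference of sample means each time, and compare the spread of those differences with the formula. This is only a sketch; the population values below are hypothetical, normal populations are assumed for convenience, and `simulate_difference_of_means` is our own name.

```python
import math
import random
import statistics


def simulate_difference_of_means(mu1, sigma1, n1, mu2, sigma2, n2,
                                 reps=4000, seed=0):
    """Return (mean, sd) of X1bar - X2bar over repeated simulated samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Hypothetical population values, chosen only for illustration.
mu1, sigma1, n1 = 3.0, 0.6, 20
mu2, sigma2, n2 = 2.7, 0.8, 30

sim_mean, sim_sd = simulate_difference_of_means(mu1, sigma1, n1, mu2, sigma2, n2)
theory_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"simulated mean {sim_mean:.4f} vs mu1 - mu2 = {mu1 - mu2:.4f}")
print(f"simulated sd   {sim_sd:.4f} vs formula   = {theory_sd:.4f}")
```

The simulated mean and s.d. should land close to $\mu_1 - \mu_2$ and the square-root formula, with small differences due to simulation noise.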
Comparing Two Populations

If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Comparing Two Populations

The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value

$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

The number of degrees of freedom is $n_1 + n_2 - 2$

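The pooled standard deviation and the standard error of the difference can be computed directly from the summary statistics. The sketch below applies the two formulas to the NHANES group statistics; the function names are our own, and the inputs are the rounded values printed in the table, so the results match SPSS only up to rounding.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled estimate of the common standard deviation."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def se_difference(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of X1bar - X2bar under the equal variance assumption."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


# Healthy: s = .6173, n = 60.  Cancer: s = .6423, n = 59.
sp = pooled_sd(0.6173, 60, 0.6423, 59)
se = se_difference(0.6173, 60, 0.6423, 59)
df = 60 + 59 - 2

print(f"s_p = {sp:.4f}, standard error = {se:.4f}, df = {df}")
```

The standard error should agree with the "Std. Error Difference" of .1155 and the degrees of freedom with the 117 shown in the output.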
Comparing Two Populations

A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is

$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Note how the sample sizes determine the CI length

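Putting the pieces together, the interval can be sketched from summary statistics alone. The function `pooled_ci` is hypothetical, and the critical value is passed in rather than computed: here we use 1.98, the value of $t_{.025}(117)$ quoted later in the lecture, since the Python standard library has no t quantile function.

```python
import math


def pooled_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Equal-variance CI for mu1 - mu2 from summary statistics.

    t_crit is the critical value t_{alpha/2}(n1 + n2 - 2), supplied by the caller.
    """
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


# NHANES group statistics; 1.98 is the critical value quoted in the lecture.
lower, upper = pooled_ci(2.9905, 0.6173, 60, 2.6969, 0.6423, 59, t_crit=1.98)

print(f"95% CI for mu(Healthy) - mu(Cancer): {lower:.4f} to {upper:.4f}")
```

Up to rounding of the inputs, this should reproduce the SPSS interval of 6.497E-02 to .5223.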
Comparing Two Populations

Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100

$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx 1$ if $n_1 = 1$, $n_2 = 99$

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.20$ if $n_1 = 50$, $n_2 = 50$

Thus, in the former case, your CI would be about 5 times longer!

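The effect of the split can be tabulated for a fixed total of 100. The sketch below evaluates only the factor $\sqrt{1/n_1 + 1/n_2}$, holding $s_p$ and the t critical value fixed, which is the comparison the slide makes; `allocation_factor` is our own name.

```python
import math


def allocation_factor(n1: int, n2: int) -> float:
    """The sample-size factor sqrt(1/n1 + 1/n2) in the CI half-length."""
    return math.sqrt(1 / n1 + 1 / n2)


total = 100
for n1 in (1, 10, 25, 50):
    n2 = total - n1
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: factor = {allocation_factor(n1, n2):.3f}")

unbalanced = allocation_factor(1, 99)
balanced = allocation_factor(50, 50)
print(f"ratio of CI lengths: {unbalanced / balanced:.2f}")
```

The factor is smallest at the even split and grows as the allocation becomes more lopsided.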
Comparing Two Populations

The CI can of course be used to test hypotheses

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

This is the same as

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

So we just need to check whether 0 is in the interval, just as we have done

Comparing Two Populations: The t-test

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

There is something called a t-test, which tells you whether 0 is in the CI.

It does not tell you where the means lie, however, so it is of limited use. P-values tell you the same thing.

Comparing Two Populations: The t-test

The t-statistic is defined by

$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

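As a worked instance of the definition, the t-statistic for the NHANES comparison can be recomputed from the group statistics. This is a sketch with a function name of our choosing, and it uses the rounded summary values from the Group Statistics table.

```python
import math


def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2):
    """Equal-variance two-sample t-statistic from summary statistics."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se


# Healthy: mean 2.9905, sd .6173, n 60.  Cancer: mean 2.6969, sd .6423, n 59.
t_stat = pooled_t_statistic(2.9905, 0.6173, 60, 2.6969, 0.6423, 59)

print(f"t = {t_stat:.3f}")
```

The result should be close to the t = 2.543 reported by SPSS in the "Equal variances assumed" row.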
Comparing Two Populations: The t-test

You reject equality of means if

$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$

In this case, is $p < \alpha$ or is $p > \alpha$?

Comparing Two Populations: The t-test

You reject equality of means if

$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$

In this case, $p < \alpha$

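To connect the rejection rule with the p-value numerically, the two-sided p-value is the area in both tails of the t distribution beyond |t|. The sketch below obtains it by integrating the Student t density with Simpson's rule, using only the standard library. This is our own illustrative approach, not how SPSS computes the value, and the function names are hypothetical.

```python
import math


def t_density(x: float, df: float) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat: float, df: float, steps: int = 2000) -> float:
    """P(|T| > |t_stat|), via Simpson's rule on [0, |t_stat|] (steps is even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1.0 - central_area


# t and df from the "Equal variances assumed" row of the SPSS output.
p_value = two_sided_p_value(2.543, 117)
reject_at_05 = p_value < 0.05

print(f"p = {p_value:.4f}, reject at alpha = 0.05: {reject_at_05}")
```

The result should round to the Sig. (2-tailed) value of .012 in the output, which is below 0.05 but above 0.01.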
NHANES Comparison: the t-test

$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$

Independent Samples Test, Log(Saturated Fat), equal variances assumed: t = 2.543, df = 117, Sig. (2-tailed) = .012

$t = 2.543 > t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha = 0.05$

Comparing Two Populations

SPSS Demonstrations: bluebonnets and
Framingham Heart Disease and Blood
Pressure, as time permits