Two-Sample Testing: Small Samples

Problem 9.15: Bear gallbladder is used in Chinese medicine
to treat inflammation. Due to the difficulty of obtaining bear
gallbladder, researchers are searching for a more readily
available source of animal bile. A study was performed to
determine if pig gallbladder is an effective substitute for bear
gallbladder. Twenty male mice were divided randomly into
two groups: 10 were given a dosage of bear bile and 10 were
given a dosage of pig bile. All mice received an injection of
croton oil in the left ear lobe to induce inflammation. Four
hours later, both the left and right ear lobes were weighed,
with the difference (in milligrams) representing the degree of
swelling. Summary statistics are provided in the following
table.
Summary Statistics
                    Bear Bile    Pig Bile
Sample Size         10           10
Sample Mean         9.19         9.71
Sample Std. Dev.    4.17         3.33
Question: What conclusion can we make about whether
pig bile is an effective substitute for bear bile?
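One way to approach the question is a pooled-variance two-sample t-test computed directly from the summary statistics. The sketch below is only an illustration: it assumes equal population variances, a two-sided test at the 5% level, and the standard tabulated critical value 2.101 for 18 degrees of freedom. The function name pooled_t is hypothetical.

```python
import math

# Summary statistics from the table above (swelling in mg).
n_bear, mean_bear, sd_bear = 10, 9.19, 4.17
n_pig, mean_pig, sd_pig = 10, 9.71, 3.33


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Return (t statistic, degrees of freedom) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # standard error of the mean difference
    return (m1 - m2) / se, df


t_stat, df = pooled_t(n_bear, mean_bear, sd_bear, n_pig, mean_pig, sd_pig)

# Assumption: two-sided test at the 5% level; 2.101 is the tabulated t value for 18 df.
T_CRIT_18 = 2.101
reject_h0 = abs(t_stat) > T_CRIT_18

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the observed difference in mean swelling is not statistically significant. That is consistent with pig bile performing similarly to bear bile, but a non-significant result from 10 mice per group does not by itself prove the two are equivalent.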
Comparing Means of Several Populations
Problem 10.21: Studies conducted at the University of
Melbourne indicate that there may be a difference between
the pain thresholds of blondes and brunettes. Men and
women of various ages were divided into four categories
according to hair color: light blond, dark blond, light
brunette, and dark brunette. Each person in the experiment
was given a pain threshold score based on his/her
performance in a pain sensitivity test (higher scores mean
higher pain tolerance). The data is provided in the following
table.
Data from Experiment
Light Blond    Dark Blond    Light Brunette    Dark Brunette
62             63            42                32
60             57            50                39
71             52            41                51
55             41            37                30
48             43            -                 35
Question: Based on this data set, could we conclude that
there are differences in the mean pain threshold of blondes
and brunettes?
Descriptive Statistics
Variable    N    Mean     Median    StDev    SE Mean
LightBlo    5    59.20    60.00     8.53     3.81
DarkBlon    5    51.20    52.00     9.28     4.15
LightBru    4    42.50    41.50     5.45     2.72
DarkBrun    5    37.40    35.00     8.32     3.72
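The descriptive statistics can be reproduced from the raw scores. This is a minimal sketch using Python's standard statistics module. The assignment of scores to groups is reconstructed from the flattened data table (an assumption, though it matches the reported means and medians), and the helper name describe is hypothetical.

```python
import math
import statistics

# Pain threshold scores; the grouping is reconstructed from the data table (assumption).
groups = {
    "LightBlo": [62, 60, 71, 55, 48],
    "DarkBlon": [63, 57, 52, 41, 43],
    "LightBru": [42, 50, 41, 37],
    "DarkBrun": [32, 39, 51, 30, 35],
}


def describe(values):
    """Return N, mean, median, sample standard deviation, and standard error of the mean."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return {
        "N": n,
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "StDev": sd,
        "SE Mean": sd / math.sqrt(n),
    }


summary = {name: describe(values) for name, values in groups.items()}

for name, row in summary.items():
    print(f"{name:<9} {row['N']:>2} {row['Mean']:7.2f} {row['Median']:7.2f} "
          f"{row['StDev']:6.2f} {row['SE Mean']:6.2f}")
```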
Note: This summary would usually be accompanied by
comparative box plots, but with only a few observations
per group in this example, box plots are not very
appropriate here.
Comparative DotPlots of the Four Groups
[Figure: Dotplots of LightBlo - DarkBrun (group means are indicated by lines); score axis from 30 to 70.]
Computations of Sum of Squares
SS due to Treatment (SSTr):
$$\mathrm{SSTr} = \sum_{i} n_i \,(\mathrm{LMean}_i - \mathrm{OMean})^2$$
$$= 5(59.2 - 47.84)^2 + 5(51.2 - 47.84)^2 + 4(42.5 - 47.84)^2 + 5(37.4 - 47.84)^2 = 1360.7264$$
LMean_i = sample mean of the observations in sample i
OMean = overall sample mean of all observations
SS due to Error (SSE):
$$\mathrm{SSE} = \sum_{i} (n_i - 1)\, S_i^2$$
$$= (5-1)(8.53)^2 + (5-1)(9.28)^2 + (4-1)(5.45)^2 + (5-1)(8.32)^2 = 1001.5143$$
S_i = sample standard deviation of the observations in
sample i
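The two sums of squares can be checked numerically from the reported group sizes, means, and standard deviations. The sketch below assumes only those rounded summary values, so small differences from software output based on unrounded data are expected. The variable names are illustrative.

```python
# Per-group summary statistics as reported: (n_i, sample mean, sample SD).
summary = [
    (5, 59.20, 8.53),  # LightBlo
    (5, 51.20, 9.28),  # DarkBlon
    (4, 42.50, 5.45),  # LightBru
    (5, 37.40, 8.32),  # DarkBrun
]

n_total = sum(n for n, _, _ in summary)

# Overall mean is the sample-size-weighted average of the group means.
overall_mean = sum(n * m for n, m, _ in summary) / n_total

# SSTr: between-group variation around the overall mean.
sstr = sum(n * (m - overall_mean) ** 2 for n, m, _ in summary)

# SSE: within-group variation, rebuilt from each group's sample variance.
sse = sum((n - 1) * s ** 2 for n, _, s in summary)

print(f"overall mean = {overall_mean:.2f}, SSTr = {sstr:.4f}, SSE = {sse:.4f}")
```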
Test Procedure (ANOVA)
To test the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$ versus the
alternative $H_1$: at least two means are different, we use the F-test, which
rejects $H_0$ whenever
$$F_c = \frac{\mathrm{MSTr}}{\mathrm{MSE}} = \frac{\mathrm{SSTr}/(p-1)}{\mathrm{SSE}/(N-p)} \ge F_{\alpha;\,p-1,\,N-p}$$
where $F_{\alpha;\,p-1,\,N-p}$ is the tabular value from the F-distribution
with (p-1, N-p) degrees of freedom; p is the number of
groups, and $N = n_1 + n_2 + \cdots + n_p$ is the total number of
observations. Or, one may simply compare the p-value
(observed significance level) to the nominal level (usually
.05).
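As a rough check of the procedure, the F statistic can be computed from the hand-calculated sums of squares for the hair color data (p = 4 groups, N = 19 observations). This sketch assumes a 5% level and uses 3.29 as the tabulated F value for (3, 15) degrees of freedom. The function name f_statistic is hypothetical.

```python
def f_statistic(sstr, sse, p, n_total):
    """Return the one-way ANOVA F statistic MSTr / MSE."""
    mstr = sstr / (p - 1)        # mean square due to treatment
    mse = sse / (n_total - p)    # mean square due to error
    return mstr / mse


# Sums of squares from the hand computation (rounded summary statistics).
f_c = f_statistic(sstr=1360.7264, sse=1001.5143, p=4, n_total=19)

# Assumption: alpha = 0.05; 3.29 is the tabulated F value for (3, 15) df.
F_CRIT_3_15 = 3.29
reject_h0 = f_c >= F_CRIT_3_15

print(f"F = {f_c:.2f}, reject H0: {reject_h0}")
```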
Analysis of Variance
(Using Minitab)

One-way Analysis of Variance

Analysis of Variance
Source     DF        SS        MS        F        P
Factor      3    1360.7     453.6     6.79    0.004
Error      15    1001.8      66.8
Total      18    2362.5

(F is the test statistic; P is the p-value.)

                                Individual 95% CIs For Mean
                                Based on Pooled StDev
Level      N    Mean    StDev   ------+---------+---------+---------+
LightBlo   5  59.200    8.526                      (-----*------)
DarkBlon   5  51.200    9.284               (------*-----)
LightBru   4  42.500    5.447       (------*-------)
DarkBrun   5  37.400    8.325    (-----*------)
                                ------+---------+---------+---------+
Pooled StDev = 8.172                 36        48        60        72
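The individual intervals in the output can be approximated by hand as mean ± t × (pooled StDev) / sqrt(n). The sketch below assumes the pooled StDev of 8.172 from the output and the tabulated two-sided 95% t value of 2.131 for 15 error degrees of freedom. The helper name pooled_ci is hypothetical.

```python
import math

POOLED_SD = 8.172   # pooled StDev reported in the Minitab output
T_CRIT_15 = 2.131   # assumption: tabulated t value (two-sided 95%, 15 df)

# (n, mean) for each level, as reported in the output.
levels = {
    "LightBlo": (5, 59.2),
    "DarkBlon": (5, 51.2),
    "LightBru": (4, 42.5),
    "DarkBrun": (5, 37.4),
}


def pooled_ci(n, mean, pooled_sd=POOLED_SD, t_crit=T_CRIT_15):
    """Return the 95% confidence interval for one group mean using the pooled StDev."""
    half_width = t_crit * pooled_sd / math.sqrt(n)
    return mean - half_width, mean + half_width


intervals = {name: pooled_ci(n, mean) for name, (n, mean) in levels.items()}

for name, (low, high) in intervals.items():
    print(f"{name:<9} ({low:6.2f}, {high:6.2f})")
```

In this sketch the LightBlo and DarkBrun intervals do not overlap, which agrees with the picture in the output.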
Conclusion: Since the p-value (0.004) is well below the nominal
level of .05, we reject H0 and conclude that at least two of the
population means are different.
Interpretations
If the F-test (ANOVA) fails to reject H0, then you conclude
that the data do not provide sufficient evidence of differences
among the population means of the p populations.
If the F-test rejects H0, then you conclude that at least two
of the population means are different (but not necessarily all
of them). You then proceed to examine the individual
confidence intervals, or you could perform pairwise t-tests
to determine which population means are different. Running
many pairwise t-tests inflates the overall Type I error
rate, so a more appropriate analysis to detect which means
are different uses “multiple comparisons procedures.”
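The size of the Type I error inflation can be illustrated with simple arithmetic. The sketch below counts the pairwise comparisons among the four hair color groups and computes the chance of at least one false rejection if each test is run at .05. It assumes the tests are independent, which is only an approximation here. The Bonferroni adjustment shown is one simple example of a multiple comparisons procedure, not necessarily the one intended in the source.

```python
from itertools import combinations

groups = ["LightBlo", "DarkBlon", "LightBru", "DarkBrun"]
alpha = 0.05  # nominal level for each individual test

# All unordered pairs of groups that a pairwise analysis would compare.
pairs = list(combinations(groups, 2))
num_tests = len(pairs)

# Chance of at least one false rejection across all tests,
# assuming the tests are independent (an approximation).
familywise_error = 1 - (1 - alpha) ** num_tests

# Bonferroni adjustment: run each test at alpha / (number of tests).
bonferroni_alpha = alpha / num_tests

print(f"{num_tests} pairwise tests")
print(f"approximate familywise error rate: {familywise_error:.3f}")
print(f"Bonferroni per-test level: {bonferroni_alpha:.4f}")
```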