Lecture 12: Tues., Feb. 24 - Wharton Statistics Department

Lecture 13: Tues., Feb. 24
• Comparisons Among Several Groups – Introduction (Case Study 5.1.1)
• Comparing Any Two of the Several Means (Chapter 5.2)
• The One-Way Analysis of Variance F-test (Chapter 5.3)
• Robustness to Assumptions (5.5.1)
• Thursday: Multiple Comparisons (6.3-6.4)
Comparing Several Groups
• Chapters 5 and 6: Compare the means of I groups (I >= 2). Examples:
– Compare the effect of three different teaching methods on test scores.
– Compare the effect of four different therapies on how long a cancer patient lives.
– Compare the effect of using different amounts of fertilizer on the yield of a crop.
– Compare the amount of time that ten different tire brands last.
• As in Ch. 1-4, studies can seek either to compare treatments (causal inferences) or to compare population means.
Case Study 5.1.1
• Female mice randomly assigned to one of six treatment groups:
– NP: Mice in this group ate as much as they pleased of a nonpurified, standard diet
– N/N85: Fed normally both before and after weaning. After weaning, ration controlled at 85 kcal/wk
– N/R50: Fed normal diet before weaning and reduced-calorie diet of 50 kcal/wk after weaning
– R/R50: Fed reduced-calorie diet of 50 kcal/wk both before and after weaning
– N/R50 lopro: Fed normal diet before weaning, a restricted diet of 50 kcal/wk after weaning, and dietary protein content decreased with advancing age
– N/R40: Fed normally before weaning and given severely reduced diet of 40 kcal/wk after weaning.
Questions of Interest
• Specific comparisons of treatments, see Display 5.3 (Section 5.2)
• Are all of the treatments the same? (F-test, Section 5.3)
• Multiple comparisons (Chapter 6)
• Terminology for the several-group problem: one-way classification problem, one-way layout
• Setup in JMP: one column for the response (e.g., lifetime), a second column for the group label. (A sketch of the analogous layout in Python follows below.)
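To see what this setup looks like outside JMP, here is a minimal Python/pandas sketch of the same long-format layout (one column for the response, one for the group label); the column names and values are illustrative, not taken from the case-study file.

```python
import pandas as pd

# Hypothetical long-format layout mirroring the JMP setup:
# one column for the response (lifetime) and one column for the group label (diet).
df = pd.DataFrame({
    "lifetime": [35.5, 42.3, 38.1, 44.0, 40.2, 36.7],            # illustrative values only
    "diet": ["NP", "N/N85", "N/R50", "R/R50", "N/R50 lopro", "N/R40"],
})

# Group means by diet, analogous to what JMP's Fit Y by X platform reports
print(df.groupby("diet")["lifetime"].mean())
```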
Ideal Model for Several Samples
Ideal model:
– The populations 1, 2, ..., I have normal distributions with means $\mu_1, \mu_2, \ldots, \mu_I$
– Each population has the same standard deviation $\sigma$
– Observations within each sample are independent
– Observations in any one sample are independent of observations in other samples
• Sample sizes $n_1, \ldots, n_I$. Total sample size $n = \sum_{i=1}^{I} n_i$. (A small simulation sketch follows below.)
Randomized Experiments
• The terminology of samples from multiple populations is used, but the methods also apply to data from randomized experiments in which a response of $Y_1$ on treatment 1 would produce a response of $Y_1 + \delta_2$ on treatment 2, $Y_1 + \delta_3$ on treatment 3, etc.
• Can think of $\mu_2 - \mu_1$ as equivalent to $\delta_2$ and $\mu_3 - \mu_2$ as equivalent to $\delta_3 - \delta_2$ (the additive treatment effect of treatment 3 compared to treatment 2)
• Phrase concluding statements in terms of treatment effects or population means depending on the type of study.
Comparing any two of several means
• Compare the mean lifetime of mice on the N/R50 diet to the mean on the N/N85 diet, $\mu_3 - \mu_2$ (i.e., what is the additive treatment effect of the N/R50 diet relative to the N/N85 diet?)
• What's different from the two-group problem? We have additional information about the variability in the populations from the additional groups.
• We use this information to construct a more accurate estimate of the population variance.
Comparing any two means
• Comparison of $\mu_2$ and $\mu_3$
• Use the usual t-test but estimate $\sigma$ by the pooled standard deviation $s_p$ computed from all I groups, with df = n - I:
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \cdots + (n_I-1)s_I^2}{(n_1-1) + (n_2-1) + \cdots + (n_I-1)}$$
• 95% CI for $\mu_3 - \mu_2$: $\bar{Y}_3 - \bar{Y}_2 \pm t_{.975,\,n-I} \, s_p \sqrt{\tfrac{1}{n_2} + \tfrac{1}{n_3}}$
(Note: the multiplier for the degree of confidence equals $t_{.975,\,n-I}$, not $t_{.975,\,n_2+n_3-2}$, the multiplier if there were only two groups)
• See handout for implementation in JMP. (A Python sketch of the same computation follows below.)
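As a rough illustration of the pooled estimate $s_p$ and the confidence interval above, here is a Python sketch using scipy; the group summary numbers plugged in are assumed values, not the case-study statistics.

```python
import numpy as np
from scipy import stats

def pooled_sd(sds, ns):
    """Pooled SD across I groups: square root of the weighted average of sample variances."""
    sds, ns = np.asarray(sds, dtype=float), np.asarray(ns)
    return np.sqrt(np.sum((ns - 1) * sds**2) / np.sum(ns - 1))   # denominator = n - I

def ci_diff(ybar_a, ybar_b, n_a, n_b, sp, df, level=0.95):
    """CI for mu_a - mu_b using the pooled SD and df = n - I."""
    tcrit = stats.t.ppf(1 - (1 - level) / 2, df)
    se = sp * np.sqrt(1 / n_a + 1 / n_b)
    diff = ybar_a - ybar_b
    return diff - tcrit * se, diff + tcrit * se

# Illustrative group summaries (assumed, not the case-study values)
sds = [6.1, 6.7, 7.8, 6.7, 7.0, 6.7]
ns  = [57, 71, 56, 56, 60, 49]
sp  = pooled_sd(sds, ns)
df  = sum(ns) - len(ns)                      # n - I
print(ci_diff(42.3, 32.7, 71, 57, sp, df))   # e.g., N/R50 mean minus N/N85 mean (assumed means)
```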
Note about CIs and hyp. tests
• Suppose we form a 95% confidence interval for a parameter, e.g., $\mu_1 - \mu_2$.
• The 95% confidence interval will contain 0 if and only if the p-value of the two-sided test that the parameter equals 0 (e.g., $H_0: \mu_1 - \mu_2 = 0$ vs. $H_A: \mu_1 - \mu_2 \neq 0$) is >= 0.05.
• In other words, the test will give a "statistically significant" result only if the confidence interval does not contain 0. (A numerical check of this duality is sketched below.)
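A quick numerical check of this duality, sketched in Python for a pooled two-sample t-test on simulated data (the group parameters are assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y1 = rng.normal(40.0, 6.0, size=50)   # illustrative group 1
y2 = rng.normal(42.0, 6.0, size=50)   # illustrative group 2

# Two-sided pooled two-sample t-test of H0: mu1 - mu2 = 0
t_stat, p_value = stats.ttest_ind(y1, y2)

# Matching 95% CI for mu1 - mu2 based on the pooled SD
n1, n2 = len(y1), len(y2)
sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
tcrit = stats.t.ppf(0.975, n1 + n2 - 2)
diff = y1.mean() - y2.mean()
lo, hi = diff - tcrit * se, diff + tcrit * se

# Duality: p-value >= 0.05 exactly when the 95% CI contains 0
print(round(p_value, 4), (round(lo, 2), round(hi, 2)), (p_value >= 0.05) == (lo <= 0 <= hi))
```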
One-Way ANOVA F-test
• Basic question: Is there any difference between any of the means?
• $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$
• $H_A$: At least two of the means $\mu_i$ and $\mu_j$ are not equal
• Could do t-tests of all pairs of means, but this has difficulties (Chapter 6 – multiple comparisons) and is not the best test.
• Test statistic: Analysis of Variance F-test.
ANOVA F-test in JMP
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
DIET        5        12733.942       2546.79   57.1043     <.0001
Error     343        15297.415         44.60
C. Total  348        28031.357
• Convincing evidence that the means (treatment effects) are not all the same
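An analogous ANOVA table can be produced outside JMP. The sketch below uses Python with statsmodels on synthetic long-format data (the diet labels match the case study, but the lifetimes and sample sizes are invented), so its numbers will not reproduce the JMP output above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic long-format data standing in for the diet study
# (one response column, one group-label column; all values are illustrative).
rng = np.random.default_rng(2)
diets = ["NP", "N/N85", "N/R50", "R/R50", "N/R50 lopro", "N/R40"]
df = pd.DataFrame({
    "DIET": np.repeat(diets, 50),
    "LIFETIME": np.concatenate(
        [rng.normal(mu, 6.5, size=50) for mu in (27, 33, 42, 43, 40, 45)]
    ),
})

# One-way ANOVA: DF, Sum of Squares, Mean Square, F, Prob > F
model = smf.ols("LIFETIME ~ C(DIET)", data=df).fit()
print(anova_lm(model))
```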
Rationale behind the test statistic
• If the null hypothesis is true, we would expect all the sample means to be close to one another (and, as a result, close to the grand mean).
• If the alternative hypothesis is true, at least some of the sample means would differ.
• Thus, we measure variability between sample means.
• Large variability within the samples weakens the "ability" of the sample means to represent their corresponding population means.
• Therefore, even though sample means may markedly differ from one another, variability between sample means must be judged relative to the "within samples variability".
F Test Statistic
• Notation: $Y_{ij}$ = jth observation in the ith group, $\bar{Y}_i$ = sample mean of the ith group, $\bar{Y}$ = grand mean (sample mean of all observations)
• F-test statistic:
$$F = \frac{\sum_{i=1}^{I} n_i (\bar{Y}_i - \bar{Y})^2 / (I-1)}{\sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 / (n-I)}$$
• The test statistic is essentially (variability of the sample means)/(variability within samples).
• Large values of F are implausible under $H_0$.
• The F statistic follows an $F(I-1, n-I)$ distribution under $H_0$. Reject $H_0$ if $F > F(1-\alpha;\, I-1, n-I)$ [see Table A.4]. (A Python sketch of this computation follows below.)
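A minimal Python sketch of this formula, computing the between-group and within-group mean squares directly and checking the result against scipy.stats.f_oneway; the simulated groups are illustrative.

```python
import numpy as np
from scipy import stats

def one_way_F(groups):
    """One-way ANOVA F statistic and p-value from a list of 1-D arrays."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    I = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    # Between-group mean square: sum_i n_i (Ybar_i - Ybar)^2 / (I - 1)
    ms_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (I - 1)
    # Within-group mean square: sum_i sum_j (Y_ij - Ybar_i)^2 / (n - I)
    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - I)
    F = ms_between / ms_within
    p = stats.f.sf(F, I - 1, n - I)          # P(F(I-1, n-I) > observed F)
    return F, p

# Illustrative simulated groups (assumed, not the case-study measurements)
rng = np.random.default_rng(3)
groups = [rng.normal(mu, 6.5, size=50) for mu in (33, 42, 43)]
print(one_way_F(groups))
print(stats.f_oneway(*groups))               # should agree with the hand computation
```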
ANOVA F-test in JMP
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
DIET        5        12733.942       2546.79   57.1043     <.0001
Error     343        15297.415         44.60
C. Total  348        28031.357
• F = 57.1043, p-value < 0.0001
• Convincing evidence that the means (treatment effects) are not all the same
Robustness to Assumptions
• Robustness of the t-tests and F-tests for comparing several groups is similar to the robustness for the two-group problem.
– Normality is not critical. Extremely long-tailed or skewed distributions only cause problems if the sample size in a group is < 30.
– The assumptions of independence within and across groups are critical.
– The assumption of equal standard deviations in the populations is crucial. Rule of thumb: check whether the largest sample standard deviation divided by the smallest sample standard deviation is < 2. (A sketch of this check follows below.)
– The tools are not resistant to severely outlying observations. Use the outlier examination strategy in Display 3.6.
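A small Python sketch of the rule-of-thumb check (largest sample SD divided by smallest sample SD < 2); the simulated groups are illustrative.

```python
import numpy as np

def sd_ratio_ok(groups, threshold=2.0):
    """Rule of thumb: largest sample SD / smallest sample SD should be below the threshold."""
    sds = [np.std(g, ddof=1) for g in groups]
    ratio = max(sds) / min(sds)
    return ratio, ratio < threshold

# Illustrative simulated groups with different spreads (values assumed)
rng = np.random.default_rng(4)
groups = [rng.normal(40, s, size=60) for s in (5.0, 6.5, 8.0)]
print(sd_ratio_ok(groups))
```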
Distributions: Midterm Score
[Histogram of midterm scores; horizontal axis runs from 25 to 50]
Approximate grade guidelines:
Exam raw score   Grade
44+              A (includes A+, A, A-)
35+              B (includes B+, B, B-)
• Final grade is determined by 40% homework, 20% each midterm, and 20% final. The lower midterm score is replaced by the final score if the latter is higher. (A small sketch of this weighting follows below.)
Quantiles
75.0%   quartile   47.000
50.0%   median     42.750
25.0%   quartile   36.750
Moments
Mean      41.216667
Std Dev    6.4255462
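A minimal sketch of this weighting, assuming every component score has already been put on a common 0-100 scale (the function name and the example values are purely illustrative):

```python
def course_score(homework, midterm1, midterm2, final):
    """40% homework, 20% each midterm, 20% final; the lower midterm
    score is replaced by the final score if the final is higher."""
    low, high = sorted((midterm1, midterm2))
    low = max(low, final)              # replace the lower midterm if the final beats it
    return 0.4 * homework + 0.2 * high + 0.2 * low + 0.2 * final

# Illustrative scores (assumed, on a 0-100 scale)
print(course_score(homework=85, midterm1=70, midterm2=88, final=92))
```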