Tues, Oct 21 - Wharton Statistics Department

Lecture 13 – Tues, Oct 21
• Comparisons Among Several Groups –
Introduction (Case Study 5.1.1)
• Comparing Any Two of the Several Means
(Chapter 5.2)
• The One-Way Analysis of Variance F-test
(Chapter 5.3)
• Robustness to Assumptions (5.5.1)
• Thursday: Linear Combinations of Group Means
(6.2), Multiple Comparisons (6.3-6.4)
Rules of thumb for validity of t-tools
• Assumptions and rules of thumb for validity of t-tools in the face of
violations
– Normality: Look for gross skewness. Okay if both sample sizes
greater than 30.
– Equal spread: Validity okay if ratio of larger sample standard
deviation to smaller sample standard deviation is less than 3 and
ratio of larger group size to smaller group size is less than 2.
Consider transformations. Use Welch’s t-test otherwise.
– Outliers: Look for outliers in box plots, especially very extreme
points (more than 3 box-lengths away from box). Apply the
examination strategy in Display 3.6.
– Independence: If indep. not appropriate, apply matched pairs if
appropriate or other tools later in course.
Comparing Several Groups
• Chapter 5 and 6: Compare the means of I groups
(I>=2). Examples:
– Compare the effect of three different teaching methods
on test scores.
– Compare the effect of four different therapies on how
long a cancer patient lives.
– Compare the effect of using different amounts of
fertilizer on the yield of a crop.
– Compare the amount of time that ten different tire
brands last.
• As in Ch. 1-4, studies can either seek to compare
treatments (causal inferences) or population means
Case Study 5.1.1
• Female mice randomly assigned to one of six
treatment groups
– NP: Mice in this group ate as much as they pleased of nonpurified,
standard diet
– N/N85: Fed normally both before and after weaning. After
weaning, ration controlled at 85 kcal/wk
– N/R50: Fed normal diet before weaning and reduced calorie diet of
50 kcal/wk after weaning
– R/R50: Fed reduced calorie diet of 50 kcal/wk both before and
after weaning
– N/R50 lopro: Fed normal diet before weaning, a restricted diet of
50 kcal/wk after weaning and dietary protein content decreased
with advancing age
– N/R40: Fed normally before weaning and given severely reduced
diet of 40 kcal/wk after weaning.
Questions of Interest
• Specific comparisons of treatments, see Display
5.3 (section 5.2)
• Are all of the treatments the same? (F-test,
Section 5.3).
• Multiple comparisons (Chapter 6)
• Terminology for several group problem: one-way
classification problem, one-way layout
• Setup in JMP: One column for response (e.g.,
lifetime), a second column for group label.
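The same two-column layout (one row per observation) carries over to other tools. A minimal Python sketch, assuming hypothetical rows of (group label, response); the numbers are made-up placeholders, not the case study data:

```python
from collections import defaultdict

# Hypothetical long-format rows: (group label, response).
# The values are invented for illustration only.
rows = [
    ("NP", 27.4), ("NP", 30.1),
    ("N/N85", 32.7), ("N/N85", 35.0),
    ("N/R50", 42.3), ("N/R50", 44.9),
]

def split_by_group(rows):
    """Collect the responses belonging to each group label."""
    groups = defaultdict(list)
    for label, response in rows:
        groups[label].append(response)
    return dict(groups)

groups = split_by_group(rows)
```

Each key of `groups` is a group label and each value is that group's list of responses, which is the form the per-group summaries (means, standard deviations) need.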
Ideal Model for Several Samples
Ideal model:
– The populations 1, 2, …, I have normal distributions with
means $\mu_1, \mu_2, \ldots, \mu_I$
– Each population has the same standard deviation $\sigma$
– Observations within each sample are independent
– Observations in any one sample are independent of
observations in other samples
• Sample sizes $n_1, \ldots, n_I$. Total sample size
$n = \sum_{i=1}^{I} n_i$
Randomized Experiments
• Terminology of samples from multiple
populations is used, but the methods also apply to data
from randomized experiments in which a response
of $Y_1$ on treatment 1 would become
$Y_1 + \delta_2$ on treatment 2, $Y_1 + \delta_3$ on treatment 3, etc.
• Can think of $\mu_2 - \mu_1$ as equivalent to $\delta_2$ and $\mu_3 - \mu_2$
as equivalent to $\delta_3 - \delta_2$ (additive treatment effect
of treatment 3 compared to treatment 2)
• Phrase concluding statements in terms of
treatment effects or population means depending
on type of study.
Comparing any two of several means
• Compare mean of mice on N/R50 diet to mean of
N/N85 diet, $\mu_3 - \mu_2$ (i.e., what is the additive
treatment effect of the N/R50 diet relative to the N/N85 diet?)
• What’s different from the two group problem? We
have additional information about the variability
in the populations from the additional groups.
• We use this information in constructing a more
accurate estimate of the population variance.
Comparing any two means
• Comparison of $\mu_2$ and $\mu_3$
• Use the usual t-test, but estimate $\sigma$ by the pooled
standard deviation $s_p$, whose square is a weighted average
of the sample variances in all groups; use df = n − I.
• $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_I - 1) s_I^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_I - 1)}$
• See handout for implementation in JMP
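Outside JMP, the pooled estimate and the resulting t-statistic can be sketched in a few lines of Python. This is a minimal illustration, assuming the data are held as a list of per-group lists; the function names and the toy numbers are hypothetical, not from the case study:

```python
import math
import statistics

def pooled_sd(groups):
    """Pooled SD over all I groups; returns (s_p, df) with df = n - I."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return math.sqrt(numerator / df), df

def t_statistic(groups, i, j):
    """t-statistic for mean of group i minus mean of group j, using s_p."""
    s_p, df = pooled_sd(groups)
    diff = statistics.mean(groups[i]) - statistics.mean(groups[j])
    se = s_p * math.sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
    return diff / se, df

# Made-up toy data: three groups of three observations each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
s_p, df = pooled_sd(toy_groups)
t, _ = t_statistic(toy_groups, 2, 1)  # third group vs. second group
```

Note that all three groups contribute to `s_p` and to the degrees of freedom even though only two means are being compared.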
Note about CIs and hyp. tests
• Suppose we form a 95% confidence interval for a
parameter, e.g., $\mu_1 - \mu_2$
• The 95% confidence interval will contain 0 if and
only if the two-sided test that the
parameter equals 0 (e.g., $H_0: \mu_1 - \mu_2 = 0$
vs. $H_A: \mu_1 - \mu_2 \neq 0$) has p-value >= 0.05.
• In other words, the test will only give a
“statistically significant” result at the 5% level if the
confidence interval does not contain 0.
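A short sketch of why the two statements agree, written for a generic estimate $\hat{\theta}$ with standard error SE, where $t^{*}$ is the 97.5th percentile of the relevant t-distribution (these symbols are introduced here for illustration and are not from the slides):

```latex
0 \in \left[\hat{\theta} - t^{*}\,\mathrm{SE},\ \hat{\theta} + t^{*}\,\mathrm{SE}\right]
\iff |\hat{\theta}| \le t^{*}\,\mathrm{SE}
\iff |t| = \frac{|\hat{\theta}|}{\mathrm{SE}} \le t^{*}
\iff \text{two-sided p-value} \ge 0.05
```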
One-Way ANOVA F-test
• Basic Question: Is there any difference between
any of the means?
• $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$
• $H_A$: At least two of the means $\mu_i$ and $\mu_j$ are not
equal
• Could do t-tests of all pairs of means but this has
difficulties (Chapter 6 – multiple comparisons)
and is not the best test.
• Test statistic: Analysis of Variance F-test.
ANOVA F-test in JMP
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
DIET        5        12733.942       2546.79   57.1043     <.0001
Error     343        15297.415         44.60
C. Total  348        28031.357
• Convincing evidence that the means
(treatment effects) are not all the same
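The entries of the JMP table are linked by simple arithmetic, which can be checked directly. The sketch below uses only the degrees of freedom and sums of squares reported in the table; the variable names are illustrative:

```python
import math

# Values copied from the JMP Analysis of Variance table.
df_between, ss_between = 5, 12733.942   # DIET row
df_error, ss_error = 343, 15297.415     # Error row

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_error = ss_error / df_error

# F ratio = between-group mean square / error mean square.
f_ratio = ms_between / ms_error

# The error mean square is the pooled variance, so its root is s_p.
pooled_sd = math.sqrt(ms_error)

# df_between = I - 1 and df_error = n - I recover I and n.
num_groups = df_between + 1
total_n = df_error + num_groups
```

This reproduces the table's mean squares (about 2546.79 and 44.60) and its F ratio (about 57.10), and recovers I = 6 groups and n = 349 mice from the degrees of freedom.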
The rationale behind the test statistic – I
• If the null hypothesis is true, we would
expect all the sample means to be close to
one another (and as a result, close to the
grand mean).
• If the alternative hypothesis is true, at least
some of the sample means would differ.
• Thus, we measure variability between
sample means.
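To make the idea concrete, the variability between sample means can be compared with the variability within groups; their ratio of mean squares is the standard one-way ANOVA F statistic (the F Ratio in the JMP output). A minimal sketch, assuming per-group lists of observations; the function name and toy numbers are hypothetical:

```python
import statistics

def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_obs = [y for g in groups for y in g]
    grand_mean = statistics.mean(all_obs)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum(
        (y - statistics.mean(g)) ** 2 for g in groups for y in g
    )
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up toy data with identical within-group spread.
close_means = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
far_means = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [9.0, 10.0, 11.0]]

f_close = one_way_anova_f(close_means)
f_far = one_way_anova_f(far_means)
```

With the within-group spread held fixed, moving one group mean far from the others raises the F statistic (here from 3 to 57), which is the behavior the rationale describes.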
Robustness to Assumptions
• Robustness of t-tests and F-tests for comparing several
groups is similar to robustness for the two group problem.
– Normality is not critical. Extremely long-tailed or
skewed distributions only cause problems if sample
sizes in each group are <30
– The assumption of independence within and across
groups is critical.
– The assumption of equal standard deviations in the
population is crucial. Rule of thumb: Check if largest
sample standard deviation divided by smallest sample
standard deviation is <2
– Tools are not resistant to severely outlying
observations. Use outlier examination strategy in