Transcript Notes 21

Stat 112: Lecture 21 Notes
• Model Building (Brief Discussion)
• Chapter 9.1: One-way Analysis of Variance.
• Homework 6 is due Friday, Dec. 1st.
• I will be e-mailing you tonight or tomorrow some comments on your project ideas.
• I will have the quizzes graded by tomorrow’s office hours (Wed. 1:30-2:30); otherwise, I will return them to you next Tuesday.
Model Building
1. Among the potential explanatory variables,
think about which explanatory variables
address the question of interest.
2. For each explanatory variable, investigate
whether a transformation is needed for it either
because of curvature or crunching.
3. Consider adding polynomial terms for each
variable if there is remaining curvature for the
variable (use the procedure of adding higher
orders as long as the highest-order term has p-value < 0.05).
4. Consider interactions between the explanatory
variables, adding the interaction if the p-value
< 0.05 on the interaction term. (A sketch of steps 3 and 4 in Python follows this list.)
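Steps 3 and 4 can also be carried out outside of JMP. Below is a minimal sketch in Python (statsmodels), assuming a hypothetical data file data.csv with columns y, x1, and x2; it illustrates the p-value rule only and is not the course's JMP workflow.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("data.csv")                              # hypothetical data set
    quad = smf.ols("y ~ x1 + I(x1**2) + x2", data=df).fit()   # step 3: add a squared term for x1
    print(quad.pvalues)                                       # keep x1^2 only if its p-value < 0.05
    inter = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()     # step 4: add the x1*x2 interaction
    print(inter.pvalues)                                      # keep the interaction only if its p-value < 0.05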
Analysis of Variance
• The goal of analysis of variance is to
compare the means of several (many)
groups.
• Analysis of variance is regression with
only categorical variables
• One-way analysis of variance: Groups are
defined by one categorical variable.
• Two-way analysis of variance: Groups are
defined by two categorical variables.
Milgram’s Obedience Experiments
• Subjects recruited to take part in an
experiment on “memory and learning.”
• The subject is the teacher.
• The subject conducted a paired-associate learning task with the student.
• The subject was instructed by the experimenter to administer a shock to the student each time the student gave a wrong response. Moreover, the subject was instructed to “move one level higher on the shock generator each time the learner gives a wrong answer.” The subject was also instructed to announce the voltage level before administering a shock.
Four Experimental Conditions
1. Remote-Feedback condition: Student is placed
in a room where he cannot be seen by the
subject nor can his voice be heard; his
answers flash silently on signal box. However,
at 300 volts the laboratory walls resound as he
pounds in protest. After 315 volts, no further
answers appear, and the pounding ceases.
2. Voice-Feedback condition: Same as the remote-feedback condition except that vocal protests were introduced that could be heard clearly through the walls of the laboratory.
3. Proximity condition: Same as the voice-feedback condition except that the student was placed in the same room as the subject, a few feet from the subject. Thus, he was visible as well as audible.
4. Touch-Proximity condition: Same as the proximity condition except that the student received a shock only when his hand rested on a shock plate. At the 150-volt level, the student demanded to be let free and refused to place his hand on the shock plate. The experimenter ordered the subject to force the victim’s hand onto the plate.
Two Key Questions
1. Is there any difference among the mean
voltage levels of the four conditions?
2. If there are differences, what conditions
specifically are different?
[Figure: Oneway Analysis of Voltage Level By Condition. Side-by-side boxplots of Voltage Level (y-axis from 100 to 450) for the Proximity, Remote, Touch-Proximity, and Voice-Feedback conditions.]
Means and Std Deviations
Level             Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
Proximity         40       312.000   129.979   20.552         270.43      353.57
Remote            40       405.000   63.640    10.062         384.65      425.35
Touch-Proximity   40       268.125   131.874   20.851         225.95      310.30
Voice-Feedback    40       367.875   119.518   18.897         329.65      406.10
Multiple Regression Model for
Analysis of Variance
• To answer these questions, we can fit a multiple
regression model with voltage level as the response and
one categorical explanatory variable (condition).
• We obtain a sample from each level of the categorical
variable (group) and are interested in estimating the
population means of the groups based on these
samples.
• Assumptions of multiple regression model for one-way
analysis of variance:
– Linearity: automatically satisfied.
– Constant variance: Check if spread within each group is the
same.
– Normality: Check if distribution within each group is normally
distributed.
– Independence: Sample consists of independent observations.
Comparing the Groups
Expanded Estimates
Nominal factors expanded to all levels
Term                          Estimate   Std Error   t Ratio   Prob>|t|
Intercept                     338.25     9.067431    37.30     <.0001
Condition[Proximity]          -26.25     15.70525    -1.67     0.0966
Condition[Remote]             66.75      15.70525    4.25      <.0001
Condition[Touch-Proximity]    -70.125    15.70525    -4.47     <.0001
Condition[Voice-Feedback]     29.625     15.70525    1.89      0.0611
Ê(Y | Condition = Remote) = 338.25 + 66.75 = 405
Ê(Y | Condition = Voice-Feedback) = 338.25 + 29.625 = 367.875
Ê(Y | Condition = Proximity) = 338.25 - 26.25 = 312
Ê(Y | Condition = Touch-Proximity) = 338.25 - 70.125 = 268.125
• The coefficient on Condition[Proximity]=-26.25
means that proximity is estimated to have a
mean that is 26.25 less than the mean of the
means of all the conditions.
• Ê(Y | Condition = Proximity) = 312 = sample mean of the proximity group.
Means and Std Deviations
Level             Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
Proximity         40       312.000   129.979   20.552         270.43      353.57
Remote            40       405.000   63.640    10.062         384.65      425.35
Touch-Proximity   40       268.125   131.874   20.851         225.95      310.30
Voice-Feedback    40       367.875   119.518   18.897         329.65      406.10
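As a check on the effect coding: the intercept 338.25 equals the average of the four group means, (312.000 + 405.000 + 268.125 + 367.875)/4 = 338.25, and the four condition estimates sum to zero: -26.25 + 66.75 - 70.125 + 29.625 = 0.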
Response Voltage Level
Effect Tests
Source      Nparm   DF   Sum of Squares   F Ratio   Prob > F
Condition   3       3    437591.25        11.0881   <.0001
Expanded Estimates
Nominal factors expanded to all levels
Term                          Estimate   Std Error   t Ratio   Prob>|t|
Intercept                     338.25     9.067431    37.30     <.0001
Condition[Proximity]          -26.25     15.70525    -1.67     0.0966
Condition[Remote]             66.75      15.70525    4.25      <.0001
Condition[Touch-Proximity]    -70.125    15.70525    -4.47     <.0001
Condition[Voice-Feedback]     29.625     15.70525    1.89      0.0611
• The Effect Test tests the null hypothesis that the means of all four conditions are the same versus the alternative hypothesis that at least two of the conditions have different means.
• p-value of Effect Test < 0.0001. Strong evidence
that population means are not the same for all
four conditions.
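For comparison, here is a minimal sketch of the same F-test in Python (scipy), assuming the data are in a hypothetical file milgram.csv with columns voltage and condition; the F statistic and p-value should match the Effect Test's F Ratio and Prob > F.

    import pandas as pd
    from scipy import stats

    milgram = pd.read_csv("milgram.csv")   # hypothetical file name
    groups = [g["voltage"].values for _, g in milgram.groupby("condition")]
    f_stat, p_value = stats.f_oneway(*groups)   # one-way ANOVA F-test
    print(f_stat, p_value)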
JMP for One-way ANOVA
• One-way ANOVA can be carried out in JMP
either using Fit Model with a categorical
explanatory variable or Fit Y by X with the
categorical variable as the explanatory variable.
• After using the Fit Y by X command, click the red
triangle next to Oneway Analysis and then
Display Options, Boxplots to see side by side
boxplots and click Mean/ANOVA to see means
of the different groups and the test of whether all
groups have the same means. The p-value for this test is reported as Prob>F in the ANOVA table.
[Figure: Oneway Analysis of Voltage Level By Condition. The same side-by-side boxplots of Voltage Level by Condition as above.]
Oneway Anova
Summary of Fit
Rsquare                       0.175756
Adj Rsquare                   0.159906
Root Mean Square Error        114.6949
Mean of Response              338.25
Observations (or Sum Wgts)    160

Analysis of Variance
Source      DF    Sum of Squares   Mean Square   F Ratio   Prob > F
Condition   3     437591.3         145864        11.0881   <.0001
Error       156   2052168.8        13155
C. Total    159   2489760.0
Means for Oneway Anova
Level             Number   Mean      Std Error   Lower 95%   Upper 95%
Proximity         40       312.000   18.135      276.18      347.82
Remote            40       405.000   18.135      369.18      440.82
Touch-Proximity   40       268.125   18.135      232.30      303.95
Voice-Feedback    40       367.875   18.135      332.05      403.70
Prob>F is the p-value for the test that all groups have the same mean. It is the same as the p-value for the Effect Test in the Fit Model output.
Two Key Questions
1. Is there any difference among the mean
voltage levels of the four conditions?
Yes, there is strong evidence of a
difference. p-value of Effect Test <
0.0001.
2. If there are differences, what conditions
specifically are different?
Testing whether each of the groups
is different
• Naïve approach to deciding which groups have a mean that is different from the average of the means of all groups: do a t-test for each group and look for groups that have p-value < 0.05.
• Problem: Multiple comparisons.
Finding pairs that are significantly different:
Naive approach: Compare each pair of groups using a t-test and reject the null hypothesis that the means of the two groups in the pair are the same if the p-value of a two-sided t-test is less than 0.05. This can be done automatically in JMP by using Fit Y by X, then clicking Compare Means and clicking Each Pair, Student’s t.
Oneway Analysis of Voltage Level By Condition
Means Comparisons
Comparisons for each pair using Student's t

Level             Letters   Mean
Remote            A         405.00000
Voice-Feedback    A         367.87500
Proximity         B         312.00000
Touch-Proximity   B         268.12500
Levels not connected by same letter are significantly different

Level            - Level            Difference   Lower CL    Upper CL   p-Value
Remote           Touch-Proximity    136.8750     86.2157     187.5343   3.2771e-7
Voice-Feedback   Touch-Proximity    99.7500      49.0907     150.4093   0.0001484
Remote           Proximity          93.0000      42.3407     143.6593   0.0003890
Voice-Feedback   Proximity          55.8750      5.2157      106.5343   0.0308583
Proximity        Touch-Proximity    43.8750      -6.7843     94.5343    0.0891141
Remote           Voice-Feedback     37.1250      -13.5343    87.7843    0.1497462
Significantly different pairs: Remote and Proximity, Remote and Touch-Proximity,
Voice-Feedback and Proximity, Voice-Feedback and Touch-Proximity.
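A minimal sketch of this naive pairwise approach in Python (scipy), using the same hypothetical milgram.csv as above. Note that JMP's Each Pair, Student's t uses the pooled error variance from the ANOVA, so these unpooled p-values will be close to, but not exactly equal to, the ones in the table.

    from itertools import combinations
    import pandas as pd
    from scipy import stats

    milgram = pd.read_csv("milgram.csv")                 # hypothetical file name
    by_group = dict(list(milgram.groupby("condition")))  # condition -> sub-DataFrame
    for a, b in combinations(by_group, 2):
        t, p = stats.ttest_ind(by_group[a]["voltage"], by_group[b]["voltage"])
        print(a, "vs", b, "unadjusted p =", round(p, 4))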
Errors in Hypothesis Testing
                                 State of World
Decision Based on Data     Null Hypothesis True      Alternative Hypothesis True
Accept Null Hypothesis     Correct Decision          Type II error
Reject Null Hypothesis     Type I error              Correct Decision
When we do one hypothesis test and reject null hypothesis if p-value <0.05, then
the probability of making a Type I error when the null hypothesis is true is 0.05. We
protect against falsely rejecting a null hypothesis by making probability of Type I
error small.
Multiple Comparisons Problem
• Compound uncertainty: When doing more than one test, there is an increased chance of making a mistake.
• If we do multiple hypothesis tests and use
the rule of rejecting the null hypothesis in
each test if the p-value is <0.05, then if all
the null hypotheses are true, the
probability of falsely rejecting at least one
null hypothesis is >0.05.
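To see roughly how fast this probability grows, suppose the k tests were independent (pairwise comparisons of the same groups are not exactly independent, so this is only an approximation). Then the probability of at least one false rejection is

1 - (1 - 0.05)^k,

which is about 0.26 for k = 6 pairwise comparisons and about 0.64 for k = 20.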
Multiple Comparisons
Simulation
• In multiplecomp.JMP, 20 groups are compared with sample sizes of ten for each group.
• The observations for each group are simulated from a standard normal distribution. Thus, in fact, μ1 = μ2 = ... = μ20 = 0.
• Number of pairs found to have significantly different means using a t-test at level α = 0.05:

Iteration    1    2    3    4    5
# of Pairs
Multiple Comparison Simulation
• In multiplecomp.JMP, 20 groups are compared with sample sizes of ten for each group.
• The observations for each group are simulated from a standard normal distribution. Thus, in fact, μ1 = μ2 = ... = μ20 = 0.
• Number of groups found to have means different than the average using a t-test and rejecting if the p-value < 0.05:

Iteration     1    2    3    4    5
# of Groups
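A minimal sketch in Python (numpy/scipy), rather than multiplecomp.JMP, of the first simulation above: generate 20 groups of 10 standard normal observations, so all population means are truly equal, and count how many of the 190 pairwise t-tests reject at the 0.05 level.

    from itertools import combinations
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)                          # seed for reproducibility
    groups = [rng.standard_normal(10) for _ in range(20)]   # 20 groups, n = 10 each, all true means 0
    n_reject = sum(stats.ttest_ind(groups[i], groups[j]).pvalue < 0.05
                   for i, j in combinations(range(20), 2))
    print(n_reject, "of 190 pairs falsely declared significantly different")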
Individual vs. Familywise Error
Rate
• When several tests are considered
simultaneously, they constitute a family of tests.
• Individual Type I error rate: Probability for a
single test that the null hypothesis will be
rejected assuming that the null hypothesis is
true.
• Familywise Type I error rate: Probability for a family of tests that at least one null hypothesis will be rejected assuming that all of the null hypotheses are true.
• When we consider a family of tests, we want to
make the familywise error rate small, say 0.05,
to protect against falsely rejecting a null
hypothesis.
Bonferroni Method
• General method for doing multiple comparisons
for any family of k tests.
• Denote familywise type I error rate we want by
p*, say p*=0.05.
• Compute the p-values for each individual test: p1, ..., pk.
• Reject the null hypothesis for the ith test if pi ≤ p*/k.
• Guarantees that familywise type I error rate is at
most p*.
• Why Bonferroni works: If we do k tests and all null hypotheses are true, then using Bonferroni with p*=0.05, we have probability 0.05/k of making a Type I error for each test and expect to make k*(0.05/k)=0.05 errors in total.
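As an arithmetic illustration (not part of the original slides), here is a minimal Python sketch applying the Bonferroni rule to the six unadjusted pairwise p-values from the Student's t comparisons above; with p* = 0.05 and k = 6, only p-values at or below 0.05/6 ≈ 0.0083 lead to rejection.

    # Bonferroni rule applied to the six pairwise p-values reported above.
    p_values = {
        ("Remote", "Touch-Proximity"): 3.2771e-7,
        ("Voice-Feedback", "Touch-Proximity"): 0.0001484,
        ("Remote", "Proximity"): 0.0003890,
        ("Voice-Feedback", "Proximity"): 0.0308583,
        ("Proximity", "Touch-Proximity"): 0.0891141,
        ("Remote", "Voice-Feedback"): 0.1497462,
    }
    p_star, k = 0.05, len(p_values)            # familywise error rate and number of tests
    for pair, p in p_values.items():
        decision = "reject" if p <= p_star / k else "do not reject"
        print(pair, p, decision)

Here Bonferroni rejects only for the first three pairs, which agrees with the Tukey HSD results shown next.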
Tukey’s HSD
• Tukey’s HSD is a method that is
specifically designed to control the
familywise type I error rate (at 0.05) for
analysis of variance.
• After Fit Model, click the red triangle next
to the X variable and click LSMeans Tukey
HSD.
LSMeans Differences Tukey HSD
Alpha=0.050  Q=2.59695
LSMean[i] By LSMean[j]: each cell of JMP's matrix reports Mean[i]-Mean[j], Std Err Dif, Lower CL Dif, and Upper CL Dif. The unique pairs are:

Level            - Level            Mean[i]-Mean[j]   Std Err Dif   Lower CL Dif   Upper CL Dif
Remote           Touch-Proximity    136.875           25.6466       70.2722        203.478
Voice-Feedback   Touch-Proximity    99.750            25.6466       33.1472        166.353
Remote           Proximity          93.000            25.6466       26.3972        159.603
Voice-Feedback   Proximity          55.875            25.6466       -10.728        122.478
Proximity        Touch-Proximity    43.875            25.6466       -22.728        110.478
Remote           Voice-Feedback     37.125            25.6466       -29.478        103.728

Level             Letters   Least Sq Mean
Remote            A         405.00000
Voice-Feedback    A B       367.87500
Proximity           B C     312.00000
Touch-Proximity       C     268.12500
Levels not connected by same letter are significantly different
Comparisons shown in red in JMP (those whose confidence interval for the difference excludes zero) are the pairs for which the null hypothesis that the group means are the same is rejected using the Tukey HSD procedure, which controls the familywise Type I error rate at 0.05. The Lower CL Dif and Upper CL Dif values give a confidence interval for the difference in group means that adjusts for multiple comparisons.
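A minimal sketch of Tukey's HSD in Python (statsmodels), again assuming the hypothetical milgram.csv with columns voltage and condition; the summary reports adjusted confidence intervals and reject/fail-to-reject decisions analogous to the JMP output above.

    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    milgram = pd.read_csv("milgram.csv")               # hypothetical file name
    result = pairwise_tukeyhsd(endog=milgram["voltage"],
                               groups=milgram["condition"],
                               alpha=0.05)             # familywise Type I error rate 0.05
    print(result.summary())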
Assumptions in one-way ANOVA
• Assumptions needed for validity of oneway analysis of variance p-values and CIs:
– Linearity: automatically satisfied.
– Constant variance: Spread within each group
is the same.
– Normality: Distribution within each group is
normally distributed.
– Independence: Sample consists of
independent observations.
Rule of thumb for checking
constant variance
• Constant variance: Look at standard deviation of
different groups by using Fit Y by X and clicking Means
and Std Dev.
Means and Std Deviations
Level             Number   Mean      Std Dev   Std Err Mean
Proximity         40       312.000   129.979   20.552
Remote            40       405.000   63.640    10.062
Touch-Proximity   40       268.125   131.874   20.851
Voice-Feedback    40       367.875   119.518   18.897
• Rule of Thumb: Check whether (highest group standard deviation)/(lowest group standard deviation) is greater than 2. If it is greater than 2, then constant variance is not reasonable and a transformation should be considered. If it is less than 2, then constant variance is reasonable.
• (Highest group standard deviation)/(lowest group standard deviation) = (131.874/63.640) = 2.07. Thus, constant variance is not reasonable for Milgram’s data.
Transformations to correct for
nonconstant variance
• If the standard deviation is highest for groups with high means, try transforming Y to log Y or √Y. If the standard deviation is highest for groups with low means, try transforming Y to Y².
Means and Std Deviations
Level             Number   Mean      Std Dev   Std Err Mean
Proximity         40       312.000   129.979   20.552
Remote            40       405.000   63.640    10.062
Touch-Proximity   40       268.125   131.874   20.851
Voice-Feedback    40       367.875   119.518   18.897
• The SD is particularly low for the group with the highest mean. Try transforming to Y². To make the transformation, right-click in a new column, click New Column, then right-click again in the created column, click Formula, and enter the appropriate formula for the transformation.
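A minimal sketch in Python (pandas) of the transformation and the constant-variance check for the transformed data, again assuming the hypothetical milgram.csv with columns voltage and condition:

    import pandas as pd

    milgram = pd.read_csv("milgram.csv")              # hypothetical file name
    milgram["voltage_sq"] = milgram["voltage"] ** 2   # transform Y to Y^2
    sds = milgram.groupby("condition")["voltage_sq"].std()
    print(sds)
    print("SD ratio:", sds.max() / sds.min())         # rule of thumb: constant variance reasonable if < 2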
Transformation of Milgram’s data to
Squared Voltage Level
Means and Std Deviations
Level             Number   Mean     Std Dev   Std Err Mean
Proximity         40       113816   78920.2   12478
Remote            40       167974   48541.4   7675
Touch-Proximity   40       88847    79291.3   12537
Voice-Feedback    40       149259   74053.6   11709
• Check of constant variance for transformed data:
(Highest group standard deviation/lowest group
standard deviation) = 1.63. Constant variance
assumption is reasonable for voltage squared.
• Analysis of variance tests are approximately valid for the voltage squared data, so we reanalyze the data using voltage squared.
Analysis using Voltage Squared
Strong evidence that the group mean voltage squared levels are not all the same.
Response Voltage Squared
Effect Tests
Source      Nparm   DF   Sum of Squares   F Ratio   Prob > F
Condition   3       3    1.50737e11       9.8735    <.0001
Effect Test Gives Strong Evidence That Not All Conditions Have the Same Mean Voltage.
Oneway Analysis of Voltage Squared By Condition
Comparisons for all pairs using Tukey-Kramer HSD

Level             Letters   Mean
Remote            A         167973.75
Voice-Feedback    A B       149259.38
Proximity           B C     113816.25
Touch-Proximity       C     88846.88
Levels not connected by same letter are significantly different

Level            - Level            Difference   Lower CL    Upper CL
Remote           Touch-Proximity    79126.88     37701.9     120551.8
Voice-Feedback   Touch-Proximity    60412.50     18987.6     101837.4
Remote           Proximity          54157.50     12732.6     95582.4
Voice-Feedback   Proximity          35443.13     -5981.8     76868.1
Proximity        Touch-Proximity    24969.38     -16455.6    66394.3
Remote           Voice-Feedback     18714.38     -22710.6    60139.3
Strong evidence that remote has higher mean voltage squared level than proximity
and touch-proximity and that voice-feedback has higher mean voltage squared level
than touch-proximity, taking into account the multiple comparisons.
Rule of Thumb for Checking
Normality in ANOVA
• The normality assumption for ANOVA is that the
distribution in each group is normal. Can be checked by
looking at the boxplot, histogram and normal quantile
plot for each group.
• If there are more than 30 observations in each group, then the normality assumption is not important; ANOVA p-values and CIs will still be approximately valid even for nonnormal data.
• If there are fewer than 30 observations per group, then we
can check normality by clicking Analyze, Distribution and
then putting the Y variable in the Y, Columns box and the
categorical variable denoting the group in the By box.
We can then create normal quantile plots for each group
and check that for each group, the points in the normal
quantile plot are in the confidence bands. If there is
nonnormality, we can try to use a transformation such as
log Y and see if the transformed data is approximately
normally distributed in each group.
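A minimal sketch of per-group normal quantile plots in Python (scipy/matplotlib), again assuming the hypothetical milgram.csv; this plays the role of JMP's Analyze, Distribution with a By variable.

    import matplotlib.pyplot as plt
    import pandas as pd
    from scipy import stats

    milgram = pd.read_csv("milgram.csv")                       # hypothetical file name
    for name, g in milgram.groupby("condition"):
        stats.probplot(g["voltage"], dist="norm", plot=plt)    # normal quantile (probability) plot
        plt.title("Normal quantile plot: " + str(name))
        plt.show()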
One way Analysis of Variance:
Steps in Analysis
1. Check assumptions (constant variance,
normality, independence). If constant variance
is violated, try transformations.
2. Use the Effect Test (commonly called the F-test) to test whether all group means are the same.
3. If the Effect Test finds that at least two group means differ, use Tukey’s HSD procedure to investigate which groups are different, taking into account the fact that multiple comparisons are being done.