Transcript PPch12
Chapter 12
Analysis of Variance
1
Goals
1. List the characteristics of the F distribution
2. Conduct a test of hypothesis to determine
whether the variances of two populations
are equal
3. Discuss the general idea of analysis of
variance
4. Organize data into an ANOVA table
5. Conduct a test of hypothesis among three
or more treatment means
2
F Distribution
1. Used to test whether two
samples are from
populations having equal
variances
2. Applied when we want to
compare several
population means
simultaneously to
determine whether they came
from populations with equal means
(ANOVA = analysis of variance)
In both situations:
Populations must be
normally distributed
Data must be interval scale or higher
3
Characteristics Of The F Distribution
1. Family of F distributions
   Each is determined by:
   1. df in numerator (comes from pop. 1, which has larger sample variation)
   2. df in denominator (comes from pop. 2, which has smaller sample variation)
2. F distribution is continuous
   F value can assume an infinite number of values from 0 to ∞
3. Value for F distribution cannot be negative
   Smallest value = 0
4. Positively skewed
   1. Long tail is always to the right
   2. As the # of df increases in both the numerator and the denominator,
      the distribution approaches normal
5. Asymptotic
   1. As X increases, the F curve approaches the X-axis
4
Why Do We Want To Compare To See If
Two Populations Have Equal Variances?
What if two machines are making the same part
for an airplane?
Do we want the parts to be identical or nearly
identical? Yes!
We would test to see if the means are the same:
Chapter 10 & 11
We would test to see if the variation is the same
for the two machines: Chapter 12
What if two stocks have similar mean returns?
Would we like to test and see if one stock has more
variation than the other?
5
Why Do We Want To Compare To See If
Two Populations Have Equal Variances?
Remember Chapter 11: Assumptions for
small sample tests of means:
1. Sampled populations must follow the normal
distribution
2. Two samples must be from independent
(unrelated) populations
3. The variances & standard deviations of the
two populations are equal
6
Conduct A Test Of Hypothesis To Determine Whether
The Variances Of Two Populations Are Equal
To conduct a test:
Take two random samples
List population 1 as the sample with the larger
variance:
n1 = # of observations
s1^2 = sample variance
n1 – 1 = df1 = degrees of freedom (numerator for
critical value lookup)
List population 2 as the sample with the smaller
variance:
n2 = # of observations
s2^2 = sample variance
n2 – 1 = df2 = degrees of freedom (denominator for
critical value lookup)
Always list the sample with the larger sample variance as population 1
(allows us to use fewer tables)
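The setup above can be sketched in Python. This is only an illustration: the function name `f_ratio` and the two lists of part measurements are hypothetical, not data from the chapter. The sketch assumes both samples have nonzero variance.

```python
from statistics import variance

def f_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) for two samples.

    The sample with the larger variance is treated as population 1,
    so it goes in the numerator and F is at least 1.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances (n - 1 divisor)
    if var_a >= var_b:
        s1_sq, n1, s2_sq, n2 = var_a, len(sample_a), var_b, len(sample_b)
    else:
        s1_sq, n1, s2_sq, n2 = var_b, len(sample_b), var_a, len(sample_a)
    return s1_sq / s2_sq, n1 - 1, n2 - 1

# Hypothetical part lengths from two machines (made-up numbers)
machine_a = [10.1, 9.8, 10.3, 10.0, 9.9]
machine_b = [10.0, 10.6, 9.2, 10.9, 9.5, 10.2]

f_stat, df1, df2 = f_ratio(machine_a, machine_b)
print(f"F = {f_stat:.2f}, df numerator = {df1}, df denominator = {df2}")
```

Because the function swaps the samples when needed, the order in which they are passed does not matter.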
7
Step 1: State null and alternate hypotheses
• List the population with the suspected
larger variance as population 1
• Because we want to limit the number of F tables
we need to use to look up values, we always put
the larger variance in the numerator and the
smaller variance in the denominator
• This will force the F value to be at least 1
• We will only use the right tail of the F distribution
• Examples of Step 1:
Two-tailed test:  H0: σ1^2 = σ2^2    H1: σ1^2 ≠ σ2^2
One-tailed test:  H0: σ1^2 ≤ σ2^2    H1: σ1^2 > σ2^2
8
Step 2: Select a level of significance:
• Appendix G only lists significance levels .05
and .01
Two-tailed test (H0: σ1^2 = σ2^2, H1: σ1^2 ≠ σ2^2):
Significance level = .10
.10/2 = .05
Use .05 table in
Appendix G
One-tailed test (H0: σ1^2 ≤ σ2^2, H1: σ1^2 > σ2^2):
Significance level = .05
Use .05 table in
Appendix G
9
Step 3: Identify the test statistic (F), find
critical value and draw picture
• Look up Critical value in Appendix G and draw your picture
Critical values of F, Level of Significance 0.05
Columns: Degrees of Freedom for Numerator (From Pop 1)
Rows: Degrees of Freedom for Denominator (From Pop 2)
df      1      2      3      4      5      6      7
 1    161    200    216    225    230    234    237
 2   18.5   19.0   19.2   19.3   19.3   19.3   19.4
 3   10.1   9.55   9.28   9.12   9.01   8.94   8.89
 4    7.7   6.94   6.59   6.39   6.26   6.16   6.09
 5   6.61   5.79   5.41   5.19   5.05   4.95   4.88
 6   5.99   5.14   4.76   4.53   4.39   4.28   4.21
 7   5.59   4.74   4.35   4.12   3.97   3.87   3.79
 8   5.32   4.46   4.07   3.84   3.69   3.58    3.5
 9   5.12   4.26   3.86   3.63   3.48   3.37   3.29
10   4.96    4.1   3.71   3.48   3.33   3.22   3.14
11   4.84   3.98   3.59   3.36    3.2   3.09   3.01
12   4.75   3.89   3.49   3.26   3.11      3   2.91
13   4.67   3.81   3.41   3.18   3.03   2.92   2.83
14    4.6   3.74   3.34   3.11   2.96   2.85   2.76
15   4.54   3.68   3.29   3.06    2.9   2.79   2.71
16   4.49   3.63   3.24   3.01   2.85   2.74   2.66
17   4.45   3.59    3.2   2.96   2.81    2.7   2.61
If you have a df that is not listed in the border,
estimate your critical F as a value between the
two neighboring table values.
HW #5: df = 11, use a value between df = 10 & df = 12
Book says: (3.14 + 3.07)/2 = 3.105 ≈ 3.10
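The "value between two values" idea is linear interpolation. A minimal sketch follows; the helper name `interpolate_critical_value` is hypothetical, and the two table values are the ones quoted in the homework note.

```python
def interpolate_critical_value(df, df_low, f_low, df_high, f_high):
    """Linearly interpolate a critical F between two tabled df values."""
    fraction = (df - df_low) / (df_high - df_low)  # how far df sits between the two borders
    return f_low + fraction * (f_high - f_low)

# HW #5: df = 11 is not in the table, so use df = 10 (3.14) and df = 12 (3.07)
f_crit = interpolate_critical_value(11, 10, 3.14, 12, 3.07)
print(round(f_crit, 3))
```

When the missing df is exactly halfway between the two tabled values, this reduces to the simple average used in the note.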
10
Step 4
• Step 4: Formulate a decision rule:
• Example:
• If our calculated test statistic is greater than
3.87, reject H0 and accept H1; otherwise fail to
reject H0
11
Step 5
• Step 5: Take a random sample, compute the test
statistic, compare it to the critical value, and make the
decision to reject or not reject the null
hypothesis
Test Statistic F:
F = s1^2 / s2^2
Larger variance in numerator, always!!
• Example conclusion for a two-tail test:
• Fail to reject null
• “The evidence suggests that there is not a
difference in variation”
• Reject null and accept alternate
• “The evidence suggests that there is a difference
in variation”
Let’s look at Handout
12
Example 1
Colin, a stockbroker at Critical
Securities, reported that the mean
rate of return on a sample of 10
software stocks was 12.6 percent
with a standard deviation of 3.9
percent.
The mean rate of return on a sample
of 8 utility stocks was 10.9 percent
with a standard deviation of 3.5
percent. At the .05 significance level,
can Colin conclude that there is more
variation in the software stocks?
Example 1 continued
Step 1: The hypotheses are
H0: σI^2 ≤ σU^2
H1: σI^2 > σU^2
Step 2: The significance level is .05.
Step 3: The test statistic follows the F distribution.
Example 1 continued
Step 4: H0 is rejected
if F > 3.68 or if p < .05.
The degrees of
freedom are n1 - 1 or 9
in the numerator and
n2 - 1 or 7 in the
denominator.
Step 5: The value of F is
computed as follows.
F = (3.9)^2 / (3.5)^2 = 1.2416
H0 is not rejected. There is
insufficient evidence to show more
variation in the software stocks.
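The arithmetic in Example 1 can be checked with a few lines of Python. The variable names are illustrative only. The critical value 3.68 is copied from the slide, not computed.

```python
# Summary statistics from Example 1
n_software, sd_software = 10, 3.9   # larger standard deviation, so population 1
n_utility, sd_utility = 8, 3.5      # smaller standard deviation, so population 2

f_stat = sd_software ** 2 / sd_utility ** 2   # larger variance in the numerator
df_num = n_software - 1                       # 9
df_den = n_utility - 1                        # 7
critical_value = 3.68                         # from the slide (.05 level, df = 9 and 7)

reject_h0 = f_stat > critical_value
print(f"F = {f_stat:.4f}, df = ({df_num}, {df_den}), reject H0: {reject_h0}")
```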
ANOVA
Analysis Of Variance
Technique in which we compare three or more
population means to determine whether they
could be equal
Assumptions necessary:
Populations follow the normal distribution
Populations have equal standard deviations (σ)
Populations are independent
Why ANOVA?
Running repeated t tests on pairs of means leads to a buildup of Type I error
“Treatment” = the different populations being
examined
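To see how the Type I error builds up, here is a rough sketch. It assumes the pairwise t tests are independent, which is only an approximation, and the function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(k, alpha):
    """Approximate chance of at least one Type I error across all pairwise tests.

    Assumes the comb(k, 2) pairwise comparisons are independent.
    """
    pairs = comb(k, 2)                      # number of two-mean comparisons
    return pairs, 1 - (1 - alpha) ** pairs

pairs, error_rate = familywise_error(k=4, alpha=0.05)
print(f"{pairs} pairwise tests, overall Type I error about {error_rate:.4f}")
```

With four treatments there are six pairs to compare, so under this assumption the overall error rate ends up well above the .05 used for each single test.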
16
Case Where Treatment Means Are Different
17
Case Where Treatment Means Are The
Same
18
Example Of ANOVA Test To See If Four
Treatment Means Are Equal
22 students earned the following grades in Professor
Rad’s class. The grades are listed under the
classification the student gave to the instructor
Is there a difference in the mean score of the students in
each of the four categories?
Use significance level α = .01
Course grades by rating of instructor (4 "treatments")
Treatment:   1           2      3      4
Rating:      Excellent   Good   Fair   Poor
             94          75     70     68
             78          90     80     82
             81          77     76     72
             80          83     89     73
                         88     80     74
                                75     65
                                65
19
Conduct A Test Of Hypothesis
Among Four Treatment Means
Step 1: State H0 and H1
H0 : µ1 = µ2 = µ3 = µ4
H1 : The Mean scores are not all equal (at least
one treatment mean is different)
Step 2: Significance Level?
α = .01
20
Step 3: Determine Test Statistic
And Select Critical Value
Course grades by rating of instructor (4 "treatments")
Treatment:   1           2      3      4
Rating:      Excellent   Good   Fair   Poor
             94          75     70     68
             78          90     80     82
             81          77     76     72
             80          83     89     73
                         88     80     74
                                75     65
                                65
k = Number of treatments = 4
n = Total number of observations from all the treatments = 22
Degrees of Freedom in the numerator = k - 1 = 3
Degrees of Freedom in the denominator = n - k = 18
α = Level of significance = 0.01
F from appendix G ((df = 3, 18), α = 0.01) = 5.09
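The counts and degrees of freedom can be read straight from the data. A small sketch, where the dictionary name `grades` is just an illustrative choice:

```python
# Course grades grouped by the rating each student gave the instructor
grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}

k = len(grades)                                      # number of treatments
n = sum(len(scores) for scores in grades.values())   # total number of observations
df_numerator = k - 1
df_denominator = n - k
print(k, n, df_numerator, df_denominator)
```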
21
Step 4: State Decision Rule
If our calculated test statistic is greater
than 5.09, we reject H0 and accept
H1; otherwise we fail to reject H0
Now we move on to Step 5: Select the sample,
perform calculations, and make a decision…
Are you ready for a lot of procedures?!!
22
ANOVA Table
Source of Variation   Sum of Squares               Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST                          (k - 1)              SST/(k - 1) = MST                     MST/MSE
Error                 SSE                          (n - k)              SSE/(n - k) = MSE
Total                 SS Total (Total Variation)   (n - 1)
The idea is: we estimate the variation in two ways
and use one estimate (MST) in the numerator and the
other estimate (MSE) in the denominator:
If we divide and get 1 or close to 1, the treatment means
are assumed to be the same
If we get a number much larger than 1, we say that the means
are assumed to be different
The F critical value will determine whether we
are close to 1 or not
23
ANOVA Table So Far
ANOVA Table
Source of Variation   Sum of Squares               Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST                          3                    SST/3 = MST                           MST/MSE
Error                 SSE                          18                   SSE/18 = MSE
Total                 SS Total (Total Variation)   21
Let’s go calculate this!
24
Calculation 1: Treatment Means and Overall Mean
Treatment:        1           2       3       4
Rating:           Excellent   Good    Fair    Poor
Course grades     94          75      70      68
                  78          90      80      82
                  81          77      76      72
                  80          83      89      73
                              88      80      74
                                      75      65
                                      65
Treatment Mean    83.25       82.60   76.43   72.33
Grand Mean = Overall Mean for all the data = 77.95
Xbar_G = Overall Mean = Grand Mean
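Calculation 1 can be reproduced with the standard library. The names `grades`, `treatment_means` and `grand_mean` are illustrative.

```python
from statistics import mean

grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}

# One mean per treatment (column of the table)
treatment_means = {rating: mean(scores) for rating, scores in grades.items()}

# Grand mean: the mean of all 22 observations pooled together
all_grades = [x for scores in grades.values() for x in scores]
grand_mean = mean(all_grades)

for rating, m in treatment_means.items():
    print(f"{rating}: {m:.2f}")
print(f"Grand mean: {grand_mean:.2f}")
```

Note that the grand mean is the mean of all observations, not the mean of the four treatment means, because the treatments have different sample sizes.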
25
Calculation 2: Total Variation
SS Total = Σ(X - Xbar_G)^2
SS Total = Sum of Squares Total = Total Variation
X = A Particular Observation
Xbar_G = Overall Mean
26
Calculation 2: Total Variation
Example of 1st calculation: 94 - 77.95 = 16.05
Example of 2nd calculation: 16.05^2 = 257.6025
Excellent   (X - 77.95)   (X - 77.95)^2
94          16.05         257.6025
78          0.05          0.0025
81          3.05          9.3025
80          2.05          4.2025
Totals                    271.11
Good        (X - 77.95)   (X - 77.95)^2
75          -2.95         8.7025
90          12.05         145.2025
77          -0.95         0.9025
83          5.05          25.5025
88          10.05         101.0025
Totals                    281.3125
Fair        (X - 77.95)   (X - 77.95)^2
70          -7.95         63.2025
80          2.05          4.2025
76          -1.95         3.8025
89          11.05         122.1025
80          2.05          4.2025
75          -2.95         8.7025
65          -12.95        167.7025
Totals                    373.9175
Poor        (X - 77.95)   (X - 77.95)^2
68          -9.95         99.0025
82          4.05          16.4025
72          -5.95         35.4025
73          -4.95         24.5025
74          -3.95         15.6025
65          -12.95        167.7025
Totals                    358.615
Total Variation = SS Total = 271.11 + 281.3125 + 373.9175 + 358.615 = 1284.96
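A short sketch of Calculation 2 in Python (names are illustrative). The slides round the grand mean to 77.95 before squaring, which is why they show 1284.96. With the unrounded grand mean the total comes out a hair smaller, about 1284.95.

```python
from statistics import mean

grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}
all_grades = [x for scores in grades.values() for x in scores]

# SS Total with the unrounded grand mean
grand_mean = mean(all_grades)
ss_total = sum((x - grand_mean) ** 2 for x in all_grades)

# SS Total the way the slides do it, with the grand mean rounded to 77.95
ss_total_slides = sum((x - 77.95) ** 2 for x in all_grades)

print(f"SS Total (unrounded mean): {ss_total:.4f}")
print(f"SS Total (mean = 77.95):   {ss_total_slides:.4f}")
```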
27
ANOVA Table So Far
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST              3                    SST/3 = MST                           MST/MSE
Error                 SSE              18                   SSE/18 = MSE
Total                 1284.96          21
Let’s go calculate this!
28
Calculation 3: Random Variation
SSE = Σ(X - Xbar_C)^2
SSE = Sum of Squares Error = Random Variation
X = A Particular Observation
Xbar_C = Sample Mean for Treatment C
C = Particular Treatment
29
Calculation 3: Random Variation
Excellent   (X - 83.25)   (X - 83.25)^2
94          10.75         115.5625
78          -5.25         27.5625
81          -2.25         5.0625
80          -3.25         10.5625
Xbar = 83.25   Totals     158.75
Good        (X - 82.6)    (X - 82.6)^2
75          -7.6          57.76
90          7.4           54.76
77          -5.6          31.36
83          0.4           0.16
88          5.4           29.16
Xbar = 82.6    Totals     173.2
Fair        (X - 76.43)   (X - 76.43)^2
70          -6.43         41.3449
80          3.57          12.7449
76          -0.43         0.1849
89          12.57         158.0049
80          3.57          12.7449
75          -1.43         2.0449
65          -11.43        130.6449
Xbar = 76.43   Totals     357.7143
Poor        (X - 72.33)   (X - 72.33)^2
68          -4.33         18.7489
82          9.67          93.5089
72          -0.33         0.1089
73          0.67          0.4489
74          1.67          2.7889
65          -7.33         53.7289
Xbar = 72.33   Totals     169.3334
Sum of Squares Error = SSE = 158.75 + 173.2 + 357.7143 + 169.3334 = 859
30
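Calculation 3 (random variation) can be sketched the same way. Each observation is compared with its own treatment mean, not with the grand mean. Variable names are illustrative.

```python
from statistics import mean

grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}

sse_by_treatment = {}
for rating, scores in grades.items():
    treatment_mean = mean(scores)  # Xbar_C for this treatment
    sse_by_treatment[rating] = sum((x - treatment_mean) ** 2 for x in scores)

sse = sum(sse_by_treatment.values())
for rating, value in sse_by_treatment.items():
    print(f"{rating}: {value:.4f}")
print(f"SSE = {sse:.2f}")
```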
ANOVA Table So Far
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            SST              3                    SST/3 = MST                           MST/47.72
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21
Let’s go calculate this!
31
Calculation 4: Treatment Variation
SST = SS Total - SSE
SST = Sum of Squares Treatment = Treatment Variation = the sum of the squared
differences between each treatment mean and
the grand (overall) mean, each weighted by its treatment's sample size
SST = 1284.96 - 859 = 425.96
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            425.96           3                    425.96/3 = 141.99                     141.99/47.72 = 2.98
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21
Simple Subtraction!
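As a cross-check on the subtraction shortcut, SST can also be computed directly from the treatment means. This sketch, with illustrative names, computes it both ways to show that they agree.

```python
from statistics import mean

grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}
all_grades = [x for scores in grades.values() for x in scores]
grand_mean = mean(all_grades)

ss_total = sum((x - grand_mean) ** 2 for x in all_grades)
sse = sum((x - mean(scores)) ** 2 for scores in grades.values() for x in scores)

# Shortcut from the slide: simple subtraction
sst_by_subtraction = ss_total - sse

# Direct definition: squared distance of each treatment mean from the grand mean,
# weighted by the number of observations in that treatment
sst_direct = sum(len(scores) * (mean(scores) - grand_mean) ** 2
                 for scores in grades.values())

print(f"SST by subtraction: {sst_by_subtraction:.2f}")
print(f"SST directly:       {sst_direct:.2f}")
```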
32
Calculation 5: Mean Square (Estimate of Variation)
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            425.96           3                    425.96/3 = 141.99                     141.99/47.72 = 2.98
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21
33
Calculation 6: F
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square (Estimate of Variation)   F
Treatments            425.96           3                    425.96/3 = 141.99                     141.99/47.72 = 2.98
Error                 859              18                   859/18 = 47.72
Total                 1284.96          21
34
Step 5: Make A Decision
Because 2.98 is less than 5.09, we fail to
reject H0
The evidence does not show a difference in the mean scores of
the students among the four categories
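Putting Calculations 1 through 6 and the decision together, the whole test might be sketched as one function. The name `one_way_anova` and the dictionary keys are hypothetical. The critical value 5.09 is taken from the slides, not computed.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the ANOVA table entries for a dict of treatment -> observations."""
    all_obs = [x for obs in groups.values() for x in obs]
    k, n = len(groups), len(all_obs)
    grand_mean = mean(all_obs)

    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    sse = 0.0
    for obs in groups.values():
        treatment_mean = mean(obs)
        sse += sum((x - treatment_mean) ** 2 for x in obs)
    sst = ss_total - sse          # treatment variation by subtraction

    mst = sst / (k - 1)           # mean square for treatments
    mse = sse / (n - k)           # mean square error
    return {"SST": sst, "SSE": sse, "SS Total": ss_total,
            "df": (k - 1, n - k), "MST": mst, "MSE": mse, "F": mst / mse}

grades = {
    "Excellent": [94, 78, 81, 80],
    "Good": [75, 90, 77, 83, 88],
    "Fair": [70, 80, 76, 89, 80, 75, 65],
    "Poor": [68, 82, 72, 73, 74, 65],
}
table = one_way_anova(grades)
critical_value = 5.09             # Appendix G value quoted on the slides (df = 3, 18; α = .01)
reject_h0 = table["F"] > critical_value
print(f"F = {table['F']:.2f}, reject H0: {reject_h0}")
```

The function returns unrounded values, so rounding happens only when the numbers are displayed.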
35
Summarize Chapter 12
1. List the characteristics of the F distribution
2. Conduct a test of hypothesis to determine
whether the variances of two populations
are equal
3. Discuss the general idea of analysis of
variance
4. Organize data into an ANOVA table
5. Conduct a test of hypothesis among three
or more treatment means
36