Transcript Step 2.1

One-Way ANOVA
• ANOVA = Analysis of Variance
• This is a technique used to analyze the
results of an experiment when you have
more than two groups
Between and Within
Group Variability
• Two types of variability
• Between / Treatment
– the differences between the mean scores of
the three groups
– The more different these means are, the more
variability!
• Within / Error
– the variability of the scores within each group
Between and Within
Group Variability
F = between-group variability / within-group variability
  = (sampling error + effect of variable) / sampling error
Calculating this Variance Ratio
MSbetween = SSbetween / dfbetween
MSwithin = SSwithin / dfwithin
F = MSbetween / MSwithin
Degrees of Freedom
• dfbetween
• dfwithin
• dftotal
• dftotal = dfbetween + dfwithin
Degrees of Freedom
• dfbetween = k - 1   (k = number of groups)
• dfwithin = N - k    (N = total number of observations)
• dftotal = N - 1
• dftotal = dfbetween + dfwithin
Degrees of Freedom
• dfbetween = k - 1 = 3 - 1 = 2
• dfwithin = N - k = 21 - 3 = 18
• dftotal = N - 1 = 21 - 1 = 20
• dftotal = dfbetween + dfwithin: 20 = 2 + 18
Sum of Squares
• SSBetween
• SSWithin
• SStotal
• SStotal = SSBetween + SSWithin
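Sum of Squares
• SStotal = ΣX² - (ΣX)²/N
• SSWithin = ΣX² - ΣTj²/n
• SSBetween = ΣTj²/n - (ΣX)²/N
Sum of Squares
• Ingredients:
• ΣX (sum of all scores)
• ΣX² (sum of all squared scores)
• ΣTj² (sum of the squared group totals)
• N (total number of observations)
• n (number of observations in each group)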
To Calculate the SS
        Socio         Psych         Bio
        X     X²      X     X²      X     X²
        4     16      1     1       1     1
        3     9       2     4       1     1
        2     4       0     0       2     4
        3     9       2     4       0     0
        3     9       4     16      2     4
        4     16      3     9       0     0
        2     4       2     4       1     1
ΣX      21            14            7
ΣX²     67            38            11
T²      441           196           49
T² = (ΣX)² for each group
Ingredients
ΣX = 42
ΣX² = 116
ΣTj² = 686
N = 21
n = 7
Calculate SS
ΣX = 42, ΣX² = 116, ΣTj² = 686, N = 21, n = 7
• SStotal = ΣX² - (ΣX)²/N = 116 - 42²/21 = 116 - 84 = 32
• SSWithin = ΣX² - ΣTj²/n = 116 - 686/7 = 116 - 98 = 18
• SSBetween = ΣTj²/n - (ΣX)²/N = 686/7 - 42²/21 = 98 - 84 = 14
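The sums-of-squares arithmetic for the three-majors example can be checked with a short script. This is only a sketch: the function name `ss_from_ingredients` is made up for illustration, and it assumes every group has the same number of scores, as in the slides.

```python
# Raw scores from the three-majors example (seven scores per group).
groups = {
    "Socio": [4, 3, 2, 3, 3, 4, 2],
    "Psych": [1, 2, 0, 2, 4, 3, 2],
    "Bio":   [1, 1, 2, 0, 2, 0, 1],
}


def ss_from_ingredients(groups):
    """Return (SS_total, SS_within, SS_between) for equal-sized groups."""
    scores = [x for g in groups.values() for x in g]
    N = len(scores)                                      # total observations
    n = len(next(iter(groups.values())))                 # observations per group
    sum_x = sum(scores)                                  # sum of X
    sum_x2 = sum(x * x for x in scores)                  # sum of X squared
    sum_t2 = sum(sum(g) ** 2 for g in groups.values())   # sum of squared group totals
    ss_total = sum_x2 - sum_x ** 2 / N
    ss_within = sum_x2 - sum_t2 / n
    ss_between = sum_t2 / n - sum_x ** 2 / N
    return ss_total, ss_within, ss_between


ss_total, ss_within, ss_between = ss_from_ingredients(groups)
print(ss_total, ss_within, ss_between)  # 32.0 18.0 14.0
```

The three values also satisfy the check SStotal = SSBetween + SSWithin.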
Sum of Squares
• SSBetween = 14
• SSWithin = 18
• SStotal = 32
• SStotal = SSBetween + SSWithin: 32 = 14 + 18
Calculating the F value
• MSbetween = SSbetween / dfbetween = 14 / 2 = 7
• MSwithin = SSwithin / dfwithin = 18 / 18 = 1
• F = MSbetween / MSwithin = 7 / 1 = 7
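As a rough illustration, the step from sums of squares to the F value is two divisions followed by a ratio. The helper name `f_ratio` below is an assumption made for this sketch, not something from the slides.

```python
def f_ratio(ss_between, ss_within, k, N):
    """Return (MS_between, MS_within, F) for k groups and N total observations."""
    df_between = k - 1
    df_within = N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Values from the three-majors example: SSbetween = 14, SSwithin = 18.
ms_b, ms_w, F = f_ratio(ss_between=14, ss_within=18, k=3, N=21)
print(ms_b, ms_w, F)  # 7.0 1.0 7.0
```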
How to write it out
Source     SS    df    MS    F
Between    14    2     7     7
Within     18    18    1
Total      32    20
Significance
• Is an F value of 7.0 significant at the .05
level?
• To find out you need to know both df
Degrees of Freedom
• dfbetween = k - 1 = 3 - 1 = 2    (k = number of groups)
• dfwithin = N - k = 21 - 3 = 18   (N = total number of observations)
Use F table
dfbetween is the numerator df
dfwithin is the denominator df
Write this in the table
Critical F Value
• F(2,18) = 3.55
• The nice thing about the F distribution is
that everything is a one-tailed test
Decision
• Thus, if the F value is greater than F critical
– Reject H0, and accept H1
• If the F value is less than or equal to F critical
– Fail to reject H0
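The decision rule can be written as a one-line comparison. The sketch below uses a hypothetical helper named `decide`; the critical value still has to be looked up in an F table, here F(2,18) = 3.55 from the slide.

```python
def decide(f_value, f_critical):
    """Apply the decision rule: reject H0 only when F exceeds the critical value."""
    if f_value > f_critical:
        return "reject H0"
    return "fail to reject H0"


print(decide(7.00, 3.55))  # reject H0
```

An F value exactly equal to the critical value falls on the "fail to reject" side, matching the rule above.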
Current Example
• F value = 7.00
• F critical = 3.55
• Thus, reject H0, and accept H1
• Alternative hypothesis (H1)
H1: The three population means are not all
equal
– In other words, psychology, sociology, and
biology majors do not have equal IQs
– Notice: It does not say where the
difference lies!
How to write it out
Source     SS    df    MS    F
Between    14    2     7     7*
Within     18    18    1
Total      32    20
SPSS
ANOVA
DAYS
                  Sum of Squares    df    Mean Square    F        Sig.
Between Groups    14.000            2     7.000          7.000    .006
Within Groups     18.000            18    1.000
Total             32.000            20
Six Easy Steps for an ANOVA
• 1) State the hypothesis
• 2) Find the F-critical value
• 3) Calculate the F-value
• 4) Decision
• 5) Create the summary table
• 6) Put answer into words
Example
• Want to examine the effects of feedback
on self-esteem. Three different conditions,
each with five subjects:
• 1) Positive feedback
• 2) Negative feedback
• 3) Control
• Afterward, all complete a measure of self-esteem that can range from 0 to 10.
Example:
• Question: Is the type of feedback a
person receives significantly (.05) related
to their self-esteem?
Results
Positive Feedback    Negative Feedback    Control
8                    5                    2
7                    6                    4
9                    7                    5
10                   4                    3
6                    3                    6
Step 1: State the Hypothesis
• H1: The three population means are not all
equal
• H0: μpos = μneg = μcont
Step 2: Find F-Critical
• Step 2.1
• Need to first find dfbetween and dfwithin
• dfbetween = k - 1   (k = number of groups)
• dfwithin = N - k    (N = total number of observations)
• dftotal = N - 1
• Check yourself
• dftotal = dfbetween + dfwithin
Step 2: Find F-Critical
• Step 2.1
• Need to first find dfbetween and dfwithin
• dfbetween = k - 1 = 3 - 1 = 2
• dfwithin = N - k = 15 - 3 = 12
• dftotal = N - 1 = 15 - 1 = 14
• Check yourself
• 14 = 2 + 12
Step 2: Find F-Critical
• Step 2.2
• Look up F-critical using table F on pages
370 - 373.
• F (2,12) = 3.88
Step 3: Calculate the F-value
• Has 4 Sub-Steps
• 3.1) Calculate the needed ingredients
• 3.2) Calculate the SS
• 3.3) Calculate the MS
• 3.4) Calculate the F-value
Step 3.1: Ingredients
• ΣX
• ΣX²
• ΣTj²
• N
• n
Step 3.1: Ingredients
        Positive Feedback    Negative Feedback    Control
        X      X²            X      X²            X      X²
        8      64            5      25            2      4
        7      49            6      36            4      16
        9      81            7      49            5      25
        10     100           4      16            3      9
        6      36            3      9             6      36
ΣX      40                   25                   20
ΣX²     330                  135                  90
T²      1600                 625                  400
T² = (ΣX)² for each group
ΣX = 85
ΣX² = 555
ΣTj² = 2625
N = 15
n = 5
Step 3.2: Calculate SS
ΣX = 85, ΣX² = 555, ΣTj² = 2625, N = 15, n = 5
• SStotal = ΣX² - (ΣX)²/N = 555 - 85²/15 = 555 - 481.67 = 73.33
• SSWithin = ΣX² - ΣTj²/n = 555 - 2625/5 = 555 - 525 = 30
• SSBetween = ΣTj²/n - (ΣX)²/N = 2625/5 - 85²/15 = 525 - 481.67 = 43.33
Step 3.2: Calculate SS
• Check!
• SStotal = SSBetween + SSWithin
• 73.33 = 43.33 + 30
Step 3.3: Calculate MS
• MSbetween = SSbetween / dfbetween = 43.33 / 2 = 21.67
• MSwithin = SSwithin / dfwithin = 30 / 12 = 2.5
Step 3.4: Calculate the F value
• F = MSbetween / MSwithin = 21.67 / 2.5 = 8.67
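Steps 3.1 through 3.4 can be chained into one small function. The sketch below applies it to the feedback data; the name `one_way_anova` and the dictionary keys are illustrative choices, and the shortcut formulas assume equal group sizes.

```python
def one_way_anova(groups):
    """One-way ANOVA for equal-sized groups; returns the summary-table entries."""
    k = len(groups)                     # number of groups
    n = len(groups[0])                  # scores per group (assumed equal)
    scores = [x for g in groups for x in g]
    N = len(scores)                     # total number of observations
    correction = sum(scores) ** 2 / N                   # (sum of X) squared over N
    sum_x2 = sum(x * x for x in scores)                 # sum of X squared
    group_term = sum(sum(g) ** 2 for g in groups) / n   # sum of Tj squared over n
    ss_between = group_term - correction
    ss_within = sum_x2 - group_term
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return {
        "SS_between": ss_between,
        "SS_within": ss_within,
        "SS_total": sum_x2 - correction,
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


feedback = [
    [8, 7, 9, 10, 6],   # positive feedback
    [5, 6, 7, 4, 3],    # negative feedback
    [2, 4, 5, 3, 6],    # control
]
table = one_way_anova(feedback)
for name, value in table.items():
    print(name, round(value, 2))
```

Rounded to two decimals, the printed values should match the hand calculation (43.33, 30, 73.33, 21.67, 2.5, 8.67).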
Step 4: Decision
• If the F value is greater than F critical
– Reject H0, and accept H1
• If the F value is less than or equal to F critical
– Fail to reject H0
F value = 8.67
F crit = 3.88
Step 5: Create the Summary
Table
Source     SS       df    MS       F
Between    43.33    2     21.67    8.67*
Within     30.00    12    2.5
Total      73.33    14
Step 6: Put answer into words
• Question: Is the type of feedback a
person receives significantly (.05) related
to their self-esteem?
• H1: The three population means are not all
equal
• The type of feedback a person receives is
related to their self-esteem
SPSS
ANOVA
ESTEEM
                  Sum of Squares    df    Mean Square    F        Sig.
Between Groups    43.333            2     21.667         8.667    .005
Within Groups     30.000            12    2.500
Total             73.333            14
Practice
• You are interested in comparing the
performance of three models of cars.
Random samples of five owners of each
car were used. These owners were asked
how many times their car had undergone
major repairs in the last 2 years.
Results
VW Beetle    Ford Mustang    Geo Metro
2            5               9
1            4               6
2            3               3
3            4               7
2            4               5
Practice
• Is there a significant (.05) relationship
between the model of car and repair
records?
Step 1: State the Hypothesis
• H1: The three population means are not all
equal
• H0: μV = μF = μG
Step 2: Find F-Critical
• Step 2.1
• Need to first find dfbetween and dfwithin
• dfbetween = k - 1 = 3 - 1 = 2     (k = number of groups)
• dfwithin = N - k = 15 - 3 = 12    (N = total number of observations)
• dftotal = N - 1 = 15 - 1 = 14
• Check yourself
• 14 = 2 + 12
Step 2: Find F-Critical
• Step 2.2
• Look up F-critical using table F on pages
370 - 373.
• F (2,12) = 3.88
Step 3.1: Ingredients
• ΣX = 60
• ΣX² = 304
• ΣTj² = 1400
• N = 15
• n = 5
Step 3.2: Calculate SS
ΣX = 60, ΣX² = 304, ΣTj² = 1400, N = 15, n = 5
• SStotal = ΣX² - (ΣX)²/N = 304 - 60²/15 = 304 - 240 = 64
• SSWithin = ΣX² - ΣTj²/n = 304 - 1400/5 = 304 - 280 = 24
• SSBetween = ΣTj²/n - (ΣX)²/N = 1400/5 - 60²/15 = 280 - 240 = 40
Step 3.2: Calculate SS
• Check!
• SStotal = SSBetween + SSWithin
• 64 = 40 + 24
Step 3.3: Calculate MS
• MSbetween = SSbetween / dfbetween = 40 / 2 = 20
• MSwithin = SSwithin / dfwithin = 24 / 12 = 2
Step 3.4: Calculate the F value
• F = MSbetween / MSwithin = 20 / 2 = 10
Step 4: Decision
• If the F value is greater than F critical
– Reject H0, and accept H1
• If the F value is less than or equal to F critical
– Fail to reject H0
F value = 10
F crit = 3.88
Step 5: Create the Summary
Table
Source     SS    df    MS    F
Between    40    2     20    10*
Within     24    12    2
Total      64    14
Step 6: Put answer into words
• Question: Is there a significant (.05)
relationship between the model of car
and repair records?
• H1: The three population means are not
all equal
• There is a significant relationship
between the type of car a person drives
and how often the car is repaired
Practice
• 11.1
Practice
Source     SS        df    MS       F
Between    2100      2     1050     40.13*
Within     392.5     15    26.17
Total      2492.5    17
* p < .05
A way to think about ANOVA
• Make no assumption about H0
– The populations the data come from may or may not
have equal means
A way to think about ANOVA
        VW Beetle    Ford Mustang    Geo Metro
        2            5               9
        1            4               6
        2            3               3
        3            4               7
        2            4               5
Mean    2            4               6
A way to think about ANOVA
• The samples can be used to estimate the variance of
the population
σ²VW ≈ S²VW, σ²Ford ≈ S²Ford, σ²GEO ≈ S²GEO
• Assume that the populations the data are from have
the same variance
σ²VW = σ²Ford = σ²GEO
• It is possible to use the sample variances to estimate
the variance of the populations
σ²e = ΣS²j / k
        VW Beetle    Ford Mustang    Geo Metro
        2            5               9
        1            4               6
        2            3               3
        3            4               7
        2            4               5
S²      .50          .50             5.0
A way to think about ANOVA
σ²e = (.50 + .50 + 5.0) / 3 = 2.00
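A quick way to check this estimate is to compute each sample variance and average them. The sketch below uses Python's standard `statistics.variance` (which divides by n - 1); the variable names are illustrative, and the simple average is only appropriate because the three groups are the same size.

```python
from statistics import variance

# Repair counts from the car example, five owners per model.
cars = {
    "VW Beetle":    [2, 1, 2, 3, 2],
    "Ford Mustang": [5, 4, 3, 4, 4],
    "Geo Metro":    [9, 6, 3, 7, 5],
}

# Sample variance of each group (denominator n - 1).
sample_variances = {name: variance(scores) for name, scores in cars.items()}

# Averaging the k sample variances estimates the common population variance.
error_variance = sum(sample_variances.values()) / len(sample_variances)
print(sample_variances)
print(error_variance)
```

The result, 2.00, is the same number that appears as the Within Groups mean square in the ANOVA table for these data.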
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
A way to think about ANOVA
• Assume H0 is true
– The population means are not different from
each other
• They are three samples from the same
population
– All have the same variance and the same
mean
        VW Beetle    Ford Mustang    Geo Metro
        2            5               9
        1            4               6
        2            3               3
        3            4               7
        2            4               5
        Random A     Random B        Random C
        2            5               9
        1            4               6
        2            3               3
        3            4               7
        2            4               5
Mean    2            4               6
A way to think about ANOVA
Central Limit Theorem
For any population of scores, regardless of
form, the sampling distribution of the mean
will approach a normal distribution as N
(sample size) gets larger. Furthermore, the
sampling distribution of the mean will have
a mean equal to μ and a standard
deviation equal to σ/√N
A way to think about ANOVA
• Central Limit Theorem (remember)
• The variance of the means drawn from the
same population equals the variance of the
population divided by the sample size.
S²X̄ = σ²e / n
A way to think about ANOVA
S²X̄ = σ²e / n
Can estimate population variance from the sample means with the formula
σ²e = n(S²X̄)
*This only works if the means are from the same population
A way to think about ANOVA
        Random A    Random B    Random C
        2           5           9
        1           4           6
        2           3           3
        3           4           7
        2           4           5
Mean    2           4           6
S² of the three means = 4.00
A way to think about ANOVA
σ²e = n(S²X̄)
20 = 5(4.00)
*Estimates population variance only if the three means are
from the same population
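The second estimate of the population variance comes from the group means alone: take the variance of the three means and multiply by the group size. A minimal sketch using the standard `statistics` module follows; the variable names are made up for illustration.

```python
from statistics import mean, variance

# The same 15 scores, treated as three random samples of size 5.
samples = [
    [2, 1, 2, 3, 2],   # Random A
    [5, 4, 3, 4, 4],   # Random B
    [9, 6, 3, 7, 5],   # Random C
]

n = len(samples[0])                      # scores per sample
group_means = [mean(s) for s in samples]
variance_of_means = variance(group_means)

# n times the variance of the means estimates the population variance,
# but only if all the samples come from the same population.
between_estimate = n * variance_of_means
print(group_means, variance_of_means, between_estimate)
```

This gives 20, which is the Between Groups mean square in the ANOVA table.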
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
A way to think about ANOVA
*Estimates population variance regardless of whether the three means
are from the same population
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
What do all of these numbers
mean?
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
Why do we call it “sum of
squares”?
• SStotal
• SSbetween
• SSwithin
• Sum of squares is the sum of the squared
deviations about the mean
SS = Σ(X - X̄)²
Why do we use “sum of squares”?
SS = Σ(X - X̄)²
s²x = Σ(X - X̄)² / (n - 1)
SS are additive
Variances and MS are only additive if df are the same
Another way to think about ANOVA
• Think in “sums of squares”
SStotal = ΣΣ(Xij - X̄..)²
Represents the SS of all observations, regardless of the treatment.
Another way to think about ANOVA
VW Beetle            Ford Mustang         Geo Metro
X    (X - X̄..)²      X    (X - X̄..)²      X    (X - X̄..)²
2    4.00            5    1.00            9    25.00
1    9.00            4    .00             6    4.00
2    4.00            3    1.00            3    1.00
3    1.00            4    .00             7    9.00
2    4.00            4    .00             5    1.00
ΣΣ(Xij - X̄..)² = 64
Overall Mean = 4
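The total sum of squares can be computed straight from its definition: square each score's deviation from the overall mean and add them up. The sketch below does this for the car data; dividing the result by N - 1 gives the ordinary variance of all 15 scores. Variable names are illustrative.

```python
cars = [
    [2, 1, 2, 3, 2],   # VW Beetle
    [5, 4, 3, 4, 4],   # Ford Mustang
    [9, 6, 3, 7, 5],   # Geo Metro
]

scores = [x for group in cars for x in group]
grand_mean = sum(scores) / len(scores)

# Squared deviation of every observation from the grand mean, summed.
ss_total = sum((x - grand_mean) ** 2 for x in scores)

# SS_total divided by df_total is the sample variance of all scores.
variance_all = ss_total / (len(scores) - 1)
print(grand_mean, ss_total, round(variance_all, 3))  # 4.0 64.0 4.571
```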
Another way to think about
ANOVA
ΣΣ(Xij - X̄..)² = 64
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
Note: SStotal / dftotal = S²ij
Descriptive Statistics
                      N     Mean      Variance
VAR00001              15    4.0000    4.571
Valid N (listwise)    15
Another way to think about ANOVA
• Think in “sums of squares”
SSbetween = nΣ(X̄j - X̄..)²
Represents the SS deviations of the treatment means around the grand mean
It's multiplied by n to give an estimate of the population variance
(Central limit theorem)
nΣ(X̄j - X̄..)² = (5)8 = 40
        VW Beetle    Ford Mustang    Geo Metro
        2            5               9
        1            4               6
        2            3               3
        3            4               7
        2            4               5
Mean    2            4               6
Overall Mean = 4
Another way to think about
ANOVA
nΣ(X̄j - X̄..)² = (5)8 = 40
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
Another way to think about ANOVA
• Think in “sums of squares”
SSwithin = ΣΣ(Xij - X̄j)²
Represents the SS deviations of the observations within each group
SSwithin = ΣΣ(Xij - X̄j)² = 24
        VW Beetle           Ford Mustang        Geo Metro
        X    (X - X̄j)²      X    (X - X̄j)²      X    (X - X̄j)²
        2    0              5    1              9    9
        1    1              4    0              6    0
        2    0              3    1              3    9
        3    1              4    0              7    1
        2    0              4    0              5    1
Mean    2                   4                   6
Overall Mean = 4
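The definitional formulas for the between and within sums of squares can be checked against the shortcut results (40 and 24). The following is a sketch only; `ss_partition` is a hypothetical helper name.

```python
def ss_partition(groups):
    """Return (SS_between, SS_within, SS_total) from the definitional formulas."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    # Deviations of group means from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Deviations of each score from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Deviations of each score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    return ss_between, ss_within, ss_total


cars = [
    [2, 1, 2, 3, 2],   # VW Beetle
    [5, 4, 3, 4, 4],   # Ford Mustang
    [9, 6, 3, 7, 5],   # Geo Metro
]
ss_b, ss_w, ss_t = ss_partition(cars)
print(ss_b, ss_w, ss_t)  # 40.0 24.0 64.0
```

The between and within pieces add up to the total, which is the sense in which sums of squares are additive.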
Another way to think about
ANOVA
SSwithin = ΣΣ(Xij - X̄j)² = 24
ANOVA
REP
                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    40.000            2     20.000         10.000    .003
Within Groups     24.000            12    2.000
Total             64.000            14
Sum of Squares
• SStotal
– The total deviation in the observed scores
• SSbetween
– The total deviation in the scores caused by the
grouping variable and error
• SSwithin
– The total deviation in the scores not caused by
the grouping variable (error)
Conceptual Understanding
Source     SS     df    MS    F
Between    --     --    --    --
Within     152    --    --
Total      182    --
Complete the above table for an ANOVA having 3 levels of the independent
variable and n = 20. Test for significance at .05.
Conceptual Understanding
Source     SS     df    MS      F
Between    30     2     15      5.62*
Within     152    57    2.67
Total      182    59
Fcrit = 3.18
Fcrit (2, 57) = 3.15
Complete the above table for an ANOVA having 3 levels of the independent
variable and n = 20. Test for significance at .05.
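Filling in the table only needs the two given sums of squares plus k and n. The sketch below shows one way to do it; `complete_table` is a made-up name, and it assumes n is the number of observations in each of the k groups, so N = k × n = 60.

```python
def complete_table(ss_total, ss_within, k, n):
    """Fill in an ANOVA summary table from SS_total, SS_within, k groups of size n."""
    N = k * n                               # total observations
    ss_between = ss_total - ss_within       # SS are additive
    df_between = k - 1
    df_within = N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between,
        "df_between": df_between,
        "df_within": df_within,
        "df_total": N - 1,
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


table = complete_table(ss_total=182, ss_within=152, k=3, n=20)
for name, value in table.items():
    print(name, value)
```

Without intermediate rounding, F comes out at about 5.625; the 5.62 in the table results from dividing by the rounded MSwithin of 2.67.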
Conceptual Understanding
• Distinguish between: Between-group
variability and within-group variability
Conceptual Understanding
• Distinguish between: Between-group
variability and within-group variability
• Between concerns the differences
between the mean scores in various
groups
• Within concerns the variability of scores
within each group
Between and Within
Group Variability
F = between-group variability / within-group variability
Between and Within
Group Variability
F = (sampling error + effect of variable) / sampling error
Conceptual Understanding
• Under what circumstances will the F ratio,
over the long run, approach 1.00? Under
what circumstances will the F ratio be
greater than 1.00?
Conceptual Understanding
• Under what circumstances will the F ratio,
over the long run, approach 1.00? Under
what circumstances will the F ratio be
greater than 1.00?
• F ratio will approach 1.00 when the null
hypothesis is true
• F ratio will be greater than 1.00 when the
null hypothesis is not true
Conceptual Understanding
A    B    C
3    5    7
3    5    7
3    5    7
3    5    7
Without computing the SS within, what must its value be? Why?
Conceptual Understanding
A    B    C
3    5    7
3    5    7
3    5    7
3    5    7
The SS within is 0. All the scores within a group are the same (i.e.,
there is NO variability within groups)