Transcript a - 1

Factorial Experiments
Analysis of Variance (ANOVA)
Experimental Design
• Dependent variable Y
• k categorical independent variables A, B, C, … (the factors)
• Let
– a = the number of categories (levels) of A
– b = the number of categories (levels) of B
– c = the number of categories (levels) of C
– etc.
Random Effects and Fixed Effects Factors
• A factor is called a fixed effects factor if its levels are a fixed set of levels and the conclusions of any analysis relate only to these levels.
• If the levels have been selected at random from a population of levels, the factor is called a random effects factor.
• For a random effects factor the conclusions of the analysis are directed at the whole population of levels, not only the levels selected for the experiment.
Example - Random Effects
In this example a taxi company is interested in comparing the effects of a = 3 brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by the driver. The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, and mileage is measured each time.
Dependent
– Mileage
Independent
– Tire brand (A, B, C),
• Fixed effects factor
– Driver (1, 2, 3, 4),
• Random effects factor
Comments
• Whether the factors are fixed or random, the ANOVA table is the same with respect to Source, SS, df and MS.
• The differences occur in the denominators of the F ratios.
• The denominators of the F ratios are determined by evaluating the Expected Mean Squares (EMS) for each effect.
Example: 3 factors A, B and C fixed
Source   EMS   F
A       $\sigma^2 + \frac{nbc}{a-1}\sum_{i=1}^{a}\alpha_i^2$   MSA/MSError
B       $\sigma^2 + \frac{nac}{b-1}\sum_{j=1}^{b}\beta_j^2$   MSB/MSError
C       $\sigma^2 + \frac{nab}{c-1}\sum_{k=1}^{c}\gamma_k^2$   MSC/MSError
AB      $\sigma^2 + \frac{nc}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$   MSAB/MSError
AC      $\sigma^2 + \frac{nb}{(a-1)(c-1)}\sum_{i=1}^{a}\sum_{k=1}^{c}(\alpha\gamma)_{ik}^2$   MSAC/MSError
BC      $\sigma^2 + \frac{na}{(b-1)(c-1)}\sum_{j=1}^{b}\sum_{k=1}^{c}(\beta\gamma)_{jk}^2$   MSBC/MSError
ABC     $\sigma^2 + \frac{n}{(a-1)(b-1)(c-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\alpha\beta\gamma)_{ijk}^2$   MSABC/MSError
Error   $\sigma^2$
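Example: 3 factors A, B, C - all are random effects
Source   EMS   F
A       $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + nbc\sigma_A^2$   (no exact F test)
B       $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + na\sigma_{BC}^2 + nac\sigma_B^2$   (no exact F test)
C       $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2 + nb\sigma_{AC}^2 + nab\sigma_C^2$   (no exact F test)
AB      $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$   MSAB/MSABC
AC      $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$   MSAC/MSABC
BC      $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2$   MSBC/MSABC
ABC     $\sigma^2 + n\sigma_{ABC}^2$   MSABC/MSError
Error   $\sigma^2$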
Example: 3 factors A fixed, B, C random
Source   EMS   F
A       $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + \frac{nbc}{a-1}\sum_{i=1}^{a}\alpha_i^2$   (no exact F test)
B       $\sigma^2 + na\sigma_{BC}^2 + nac\sigma_B^2$   MSB/MSBC
C       $\sigma^2 + na\sigma_{BC}^2 + nab\sigma_C^2$   MSC/MSBC
AB      $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$   MSAB/MSABC
AC      $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$   MSAC/MSABC
BC      $\sigma^2 + na\sigma_{BC}^2$   MSBC/MSError
ABC     $\sigma^2 + n\sigma_{ABC}^2$   MSABC/MSError
Error   $\sigma^2$
Example: 3 factors A, B fixed, C random
Source   EMS   F
A       $\sigma^2 + nb\sigma_{AC}^2 + \frac{nbc}{a-1}\sum_{i=1}^{a}\alpha_i^2$   MSA/MSAC
B       $\sigma^2 + na\sigma_{BC}^2 + \frac{nac}{b-1}\sum_{j=1}^{b}\beta_j^2$   MSB/MSBC
C       $\sigma^2 + nab\sigma_C^2$   MSC/MSError
AB      $\sigma^2 + n\sigma_{ABC}^2 + \frac{nc}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2$   MSAB/MSABC
AC      $\sigma^2 + nb\sigma_{AC}^2$   MSAC/MSError
BC      $\sigma^2 + na\sigma_{BC}^2$   MSBC/MSError
ABC     $\sigma^2 + n\sigma_{ABC}^2$   MSABC/MSError
Error   $\sigma^2$
Rules for determining Expected Mean Squares (EMS) in an Anova Table
Both fixed and random effects
Formulated by Schultz [1]
[1] Schultz, E. F., Jr. “Rules of Thumb for Determining Expectations of Mean Squares in Analysis of Variance,” Biometrics, Vol. 11, 1955, 123-48.
1. The EMS for Error is $\sigma^2$.
2. The EMS for each ANOVA term contains two or more terms, the first of which is $\sigma^2$.
3. All other terms in each EMS contain both coefficients and subscripts (the total number of letters being one more than the number of factors; if the number of factors is k = 3, then the number of letters is 4).
4. The subscript of $\sigma^2$ in the last term of each EMS is the same as the treatment designation.
5. The subscripts of all $\sigma^2$ other than the first contain the treatment designation. These are written with the combination involving the most letters written first and ending with the treatment designation.
6. When a capital letter is omitted from a subscript, the corresponding small letter appears in the coefficient.
7. For each EMS in the table ignore the letter or letters that designate the effect. If any of the remaining letters designate a fixed effect, delete that term from the EMS.
8. Replace $\sigma^2$ whose subscripts are composed entirely of fixed effects by the appropriate sum:
$\sigma_A^2$ by $\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$
$\sigma_{AB}^2$ by $\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$
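The rules above are mechanical enough to sketch in code. The following is a minimal illustration for a fully crossed design with n replicates, not a reference implementation: the function names `ems_terms` and `f_denominator` and the string notation (`var(AB)` for a variance component, `fixed_sum(A)` for the sum that rule 8 substitutes) are assumptions made here for illustration. The leading $\sigma^2$ of rules 1 and 2 is left implicit.

```python
from itertools import combinations

def ems_terms(effect, factors, fixed):
    """EMS terms of `effect` after the leading sigma^2, following rules 3 to 8.

    effect  -- e.g. "AB"; factors -- e.g. "ABC"; fixed -- letters of fixed factors.
    """
    terms = []
    # Rule 5: subscripts contain the effect, longest combination first.
    for size in range(len(factors), len(effect) - 1, -1):
        for combo in combinations(factors, size):
            sub = "".join(combo)
            if not set(effect) <= set(sub):
                continue
            # Rule 7: ignore the effect's own letters; drop the term if any
            # remaining letter designates a fixed effect.
            if (set(sub) - set(effect)) & set(fixed):
                continue
            # Rule 6: omitted capital letters appear as small letters in the coefficient.
            coef = "n" + "".join(f.lower() for f in factors if f not in sub)
            # Rule 8: all-fixed subscripts become a sum of squared effects.
            kind = "fixed_sum" if set(sub) <= set(fixed) else "var"
            terms.append(f"{coef}*{kind}({sub})")
    return terms

def f_denominator(effect, factors, fixed):
    """Mean square whose EMS equals the effect's EMS without its last term."""
    target = ems_terms(effect, factors, fixed)[:-1]
    if not target:
        return "Error"
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            other = "".join(combo)
            if other != effect and ems_terms(other, factors, fixed) == target:
                return other
    return None  # no single mean square gives an exact F test

# A fixed, B and C random
for eff in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(eff, ems_terms(eff, "ABC", "A"), f_denominator(eff, "ABC", "A"))
```

Under these assumptions the output for B lists only the BC and B terms, so its divisor is MSBC, while no single mean square matches the fixed factor A.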
Example - Random Effects
In this example a taxi company is interested in comparing the effects of a = 3 brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by the driver. The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, and mileage is measured each time.
Dependent
– Mileage
Independent
– Tire brand (A, B, C),
• Fixed effects factor
– Driver (1, 2, 3, 4),
• Random effects factor
The Data
Driver  Tire  Mileage (n = 3 runs)
1       A     39.6  38.6  41.9
1       B     18.1  20.4  19.0
1       C     31.1  29.8  26.6
2       A     38.1  35.4  38.8
2       B     18.2  14.0  15.6
2       C     30.2  27.9  27.2
3       A     33.9  43.2  41.3
3       B     17.8  21.3  22.3
3       C     31.3  28.7  29.7
4       A     36.9  30.3  35.0
4       B     17.8  21.2  24.3
4       C     27.4  26.6  21.0
Asking SPSS to perform Univariate ANOVA
Select the dependent variable, fixed factors, random factors
The Output
Tests of Between-Subjects Effects
Dependent Variable: MILEAGE
Source                       Type III SS   df   Mean Square   F          Sig.
Intercept      Hypothesis    28928.340     1    28928.340     1270.836   .000
               Error         68.290        3    22.763 a
TIRE           Hypothesis    2072.931      2    1036.465      71.374     .000
               Error         87.129        6    14.522 b
DRIVER         Hypothesis    68.290        3    22.763        1.568      .292
               Error         87.129        6    14.522 b
TIRE * DRIVER  Hypothesis    87.129        6    14.522        2.039      .099
               Error         170.940       24   7.123 c
a. MS(DRIVER)
b. MS(TIRE * DRIVER)
c. MS(Error)
The divisor for both the fixed and the random main effect is MSAB
This is contrary to the advice of some texts
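As a check on the output above, the sums of squares can be recomputed directly from the taxi data. The sketch below uses only plain Python and the balanced-design formulas; the variable names are chosen here for illustration. The F ratios use MSAB as the divisor for both main effects, as SPSS does.

```python
# Mileage for each (driver, tire) cell, n = 3 runs per cell
mileage = {
    (1, "A"): [39.6, 38.6, 41.9], (1, "B"): [18.1, 20.4, 19.0], (1, "C"): [31.1, 29.8, 26.6],
    (2, "A"): [38.1, 35.4, 38.8], (2, "B"): [18.2, 14.0, 15.6], (2, "C"): [30.2, 27.9, 27.2],
    (3, "A"): [33.9, 43.2, 41.3], (3, "B"): [17.8, 21.3, 22.3], (3, "C"): [31.3, 28.7, 29.7],
    (4, "A"): [36.9, 30.3, 35.0], (4, "B"): [17.8, 21.2, 24.3], (4, "C"): [27.4, 26.6, 21.0],
}
tires, drivers, n = "ABC", [1, 2, 3, 4], 3
a, b = len(tires), len(drivers)

# Correction factor: (grand total)^2 / N, the SPSS "Intercept" sum of squares
cf = sum(sum(v) for v in mileage.values()) ** 2 / (a * b * n)

ss_tire = sum(sum(sum(mileage[d, t]) for d in drivers) ** 2 for t in tires) / (b * n) - cf
ss_driver = sum(sum(sum(mileage[d, t]) for t in tires) ** 2 for d in drivers) / (a * n) - cf
ss_cells = sum(sum(v) ** 2 for v in mileage.values()) / n - cf
ss_inter = ss_cells - ss_tire - ss_driver
ss_error = sum(y * y for v in mileage.values() for y in v) - cf - ss_cells

ms_tire = ss_tire / (a - 1)
ms_driver = ss_driver / (b - 1)
ms_inter = ss_inter / ((a - 1) * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

# Mixed model (tire fixed, driver random), SPSS convention for the divisors
f_tire = ms_tire / ms_inter
f_driver = ms_driver / ms_inter
f_inter = ms_inter / ms_error

print(round(ss_tire, 3), round(ss_driver, 3), round(ss_inter, 3), round(ss_error, 3))
print(round(f_tire, 3), round(f_driver, 3), round(f_inter, 3))
```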
The Anova table for the two factor model
(A – fixed, B - random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source   SS        df               MS        EMS   F
A        SSA       a - 1            MSA       $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$   MSA/MSAB
B        SSB       b - 1            MSB       $\sigma^2 + na\sigma_B^2$   MSB/MSError
AB       SSAB      (a - 1)(b - 1)   MSAB      $\sigma^2 + n\sigma_{AB}^2$   MSAB/MSError
Error    SSError   ab(n - 1)        MSError   $\sigma^2$
Note: The divisor for testing the main effects of A is no longer MSError but MSAB. The divisor for B is still MSError.
Reference: Guenther, W. C. “Analysis of Variance” Prentice Hall, 1964
The Anova table for the two factor model
(A – fixed, B - random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source   SS        df               MS        EMS   F
A        SSA       a - 1            MSA       $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1}\sum_{i=1}^{a}\alpha_i^2$   MSA/MSAB
B        SSB       b - 1            MSB       $\sigma^2 + n\sigma_{AB}^2 + na\sigma_B^2$   MSB/MSAB
AB       SSAB      (a - 1)(b - 1)   MSAB      $\sigma^2 + n\sigma_{AB}^2$   MSAB/MSError
Error    SSError   ab(n - 1)        MSError   $\sigma^2$
Note: In this case the divisor for testing the main effects of both A and B is MSAB. This is the approach used by SPSS.
Reference: Searle “Linear Models” John Wiley, 1964
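To see how much the choice of divisor matters, the two conventions can be compared on the mean squares reported by SPSS for the taxi example. This is only an arithmetic illustration; the names `f_guenther` and `f_searle` are labels chosen here for the two tables above.

```python
# Mean squares from the SPSS output for the taxi example
ms_driver = 22.763   # B, the random factor
ms_inter = 14.522    # TIRE * DRIVER (AB)
ms_error = 7.123

# EMS(B) = sigma^2 + n*a*var(B): divisor is MSError
f_guenther = ms_driver / ms_error
# EMS(B) = sigma^2 + n*var(AB) + n*a*var(B): divisor is MSAB (SPSS)
f_searle = ms_driver / ms_inter

print(round(f_guenther, 3), round(f_searle, 3))
```

The first ratio is roughly twice the second, so the two texts can lead to different conclusions about the driver effect from the same data.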
Crossed and Nested Factors
The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations.
[Diagram: Levels of A versus Levels of B]
Factor B is said to be nested within factor A if the levels of B differ for each level of A.
[Diagram: Levels of A, Levels of B]
Example: A company has a = 4 plants for
producing paper. Each plant has 6 machines for
producing the paper. The company is interested
in how paper strength (Y) differs from plant to
plant and from machine to machine within plant
[Diagram: Plants, Machines]
Machines (B) are nested within plants (A)
The model for a two factor experiment with B nested within A.
$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$
$\mu$ = overall mean
$\alpha_i$ = effect of factor A
$\beta_{j(i)}$ = effect of B within A
$\varepsilon_{ijk}$ = random error
The ANOVA table
Source   SS        df          MS        F                 p - value
A        SSA       a - 1       MSA       MSA/MSError
B(A)     SSB(A)    a(b - 1)    MSB(A)    MSB(A)/MSError
Error    SSError   ab(n - 1)   MSError
Note: SSB(A) = SSB + SSAB and a(b - 1) = (b - 1) + (a - 1)(b - 1)
Example: A company has a = 4 plants for producing paper. Each plant has b = 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within plant. We have n = 3 measurements of paper strength for each of the 24 machines (72 observations in all, matching the error degrees of freedom ab(n - 1) = 48 in the tables that follow).
The Data
Plant  Machine  Strength (3 measurements)
1      1        98.7   93.1  100.0
1      2        59.2   87.8   84.1
1      3        84.1   86.3   83.4
1      4        72.3  110.3   81.6
1      5        83.5   89.3   86.1
1      6        60.6   84.8   83.6
2      7        33.6   48.2   68.9
2      8        44.8   57.3   66.5
2      9        58.9   51.6   45.2
2      10       63.9   62.3   61.1
2      11       63.7   54.6   55.3
2      12       48.1   50.6   39.9
3      13       83.6   84.6   90.6
3      14       76.1   55.4   92.3
3      15       64.2   58.4   75.4
3      16       69.2   86.7   60.8
3      17       77.4   63.3   76.6
3      18       61.0   81.3   73.8
4      19       64.2   50.3   32.1
4      20       35.5   30.8   36.3
4      21       46.9   43.1   40.8
4      22       37.0   47.8   41.0
4      23       43.8   62.4   60.8
4      24       30.0   43.0   56.9
Anova Table Treating Factors (Plant, Machine) as crossed
Tests of Between-Subjects Effects
Dependent Variable: STRENGTH
Source            Type III SS   df   Mean Square   F          Sig.
Corrected Model   21031.065 a   23   914.394       7.972      .000
Intercept         298531.4      1    298531.4      2602.776   .000
PLANT             18174.761     3    6058.254      52.820     .000
MACHINE           1238.379      5    247.676       2.159      .074
PLANT * MACHINE   1617.925      15   107.862       .940       .528
Error             5505.469      48   114.697
Total             325067.9      72
Corrected Total   26536.534     71
a. R Squared = .793 (Adjusted R Squared = .693)
Anova Table: Two factor experiment B (machine) nested in A (plant)
Source           Sum of Squares   df   Mean Square   F           p - value
Plant            18174.76119      3    6058.253731   52.819506   0.00000
Machine(Plant)   2856.303672      20   142.8151836   1.2451488   0.26171
Error            5505.469467      48   114.6972806
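The nested table can be obtained from the crossed table by pooling, using SSB(A) = SSB + SSAB. The short sketch below only redoes that arithmetic with the values printed in the crossed SPSS table; the variable names are illustrative.

```python
# Values taken from the crossed (Plant x Machine) table
ss_machine, df_machine = 1238.379, 5
ss_plant_machine, df_plant_machine = 1617.925, 15
ss_error, df_error = 5505.469, 48

# Pool the Machine main effect and the Plant*Machine interaction
ss_nested = ss_machine + ss_plant_machine          # SSB(A) = SSB + SSAB
df_nested = df_machine + df_plant_machine          # a(b-1) = (b-1) + (a-1)(b-1)

ms_nested = ss_nested / df_nested
ms_error = ss_error / df_error
f_nested = ms_nested / ms_error                    # MSB(A) / MSError

print(round(ss_nested, 3), df_nested, round(ms_nested, 3), round(f_nested, 4))
```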
ANOVA Table for 3 factors crossed
Effect   SS        df
A        SSA       (a - 1)
B        SSB       (b - 1)
C        SSC       (c - 1)
AB       SSAB      (a - 1)(b - 1)
AC       SSAC      (a - 1)(c - 1)
BC       SSBC      (b - 1)(c - 1)
ABC      SSABC     (a - 1)(b - 1)(c - 1)
Error    SSError   abc(n - 1)
ANOVA Table for 3 nested factors
B nested in A, C nested in B
Effect   SS        df
A        SSA       (a - 1)
B(A)     SSB(A)    a(b - 1)
C(AB)    SSC(AB)   ab(c - 1)
Error    SSError   abc(n - 1)
Note:
SSB(A) = SSB + SSAB and a(b - 1) = (b - 1) + (a - 1)(b - 1)
Also
SSC(AB) = SSC + SSAC + SSBC + SSABC and
ab(c - 1) = (c - 1) + (a - 1)(c - 1) + (b - 1)(c - 1) + (a - 1)(b - 1)(c - 1)
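The two degrees-of-freedom identities in the note follow by factoring out the common term. A short worked derivation:

```latex
\begin{align*}
(b-1) + (a-1)(b-1) &= (b-1)\,[\,1 + (a-1)\,] = a(b-1) \\
(c-1) + (a-1)(c-1) + (b-1)(c-1) + (a-1)(b-1)(c-1)
  &= (c-1)\,[\,1 + (a-1)\,]\,[\,1 + (b-1)\,] = ab(c-1)
\end{align*}
```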
Also in nested designs
Factors may be fixed effect factors
(levels of the factor are a fixed set of levels)
or random effect factors
(levels of the factor are chosen at random from a population of levels).
This affects the divisor in the F ratio for testing the effect.
Other experimental designs
Randomized Block design
Latin Square design
Repeated Measures design
The Randomized Block Design
• Suppose a researcher is interested in how
several treatments affect a continuous
response variable (Y).
• The treatments may be the levels of a single
factor or they may be the combinations of
levels of several factors.
• Suppose we have available to us a total of
N = nt experimental units to which we are
going to apply the different treatments.
The Completely Randomized (CR) design
randomly divides the experimental units into t
groups of size n and randomly assigns a
treatment to each group.
The Randomized Block Design
• divides the group of experimental units into n homogeneous groups of size t.
• These homogeneous groups are called blocks.
• The treatments are then randomly assigned to the experimental units in each block, one treatment to a unit in each block.
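The randomization step of a randomized block design can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the function name `randomized_block_design` and the unit labels are hypothetical.

```python
import random

def randomized_block_design(units_by_block, treatments, seed=None):
    """Assign each treatment to exactly one unit within every block, at random."""
    rng = random.Random(seed)
    plan = {}
    for block, units in units_by_block.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)                 # independent shuffle for every block
        plan[block] = dict(zip(units, order))
    return plan

# Hypothetical layout: 3 blocks of 4 units, 4 treatments
blocks = {j: [f"unit{j}_{u}" for u in range(1, 5)] for j in range(1, 4)}
plan = randomized_block_design(blocks, ["A", "B", "C", "D"], seed=1)
for block, assignment in plan.items():
    print(block, assignment)
```

A completely randomized design would instead shuffle all N = nt units at once and ignore the blocks.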
Example 1:
• Suppose we are interested in how weight gain (Y) in rats is affected by Source of Protein (Beef, Cereal, and Pork) and by Level of Protein (High or Low).
• There are a total of t = 3 × 2 = 6 treatment combinations of the two factors (Beef-High Protein, Cereal-High Protein, Pork-High Protein, Beef-Low Protein, Cereal-Low Protein, and Pork-Low Protein).
• Suppose we have available to us a total of N = 60 experimental rats to which we are going to apply the different diets based on the t = 6 treatment combinations.
• Prior to the experimentation the rats were divided into n = 10 homogeneous groups of size 6.
• The grouping was based on factors that had previously been ignored (for example, initial weight, appetite, etc.).
• Within each of the 10 blocks, each rat is randomly assigned a treatment combination (diet).
• The weight gain after a fixed period is measured for each of the test animals and is tabulated below:
Randomized Block Design
Weight gain by block; the column headings (1) to (6) are the diet numbers
Block   (1)   (2)   (3)   (4)   (5)   (6)
1       107   96    112   83    87    90
2       102   72    100   82    70    94
3       102   76    102   85    95    86
4       93    70    93    63    71    63
5       111   79    101   72    75    81
6       128   89    104   85    84    89
7       56    70    72    64    62    63
8       97    91    92    80    72    82
9       80    63    87    82    81    63
10      103   102   112   83    93    81
Example 2:
• The following experiment compares the effects of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles.
• A strip of material, randomly selected from each bolt, is cut into four pieces (samples); the pieces are randomly assigned to receive one of the four chemical treatments.
• This process is replicated three times, producing a Randomized Block (RB) design.
• Moisture resistance (y) was measured for each of the samples. (Low readings indicate low moisture penetration.)
• The data are given in the diagram and table below.
Diagram: Blocks (Bolt Samples)
Block 1:   9.9 (C)   10.1 (A)   11.4 (B)   12.1 (D)
Block 2:   13.4 (D)  12.9 (B)   12.2 (A)   12.3 (C)
Block 3:   12.7 (B)  12.9 (D)   11.4 (C)   11.9 (A)
Table
           Blocks (Bolt Samples)
Chemical   1      2      3
A          10.1   12.2   11.9
B          11.4   12.9   12.7
C          9.9    12.3   11.4
D          12.1   13.4   12.9
The Model for a Randomized Block Experiment
$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$
i = 1, 2, …, t
j = 1, 2, …, b
$y_{ij}$ = the observation in the jth block receiving the ith treatment
$\mu$ = overall mean
$\tau_i$ = the effect of the ith treatment
$\beta_j$ = the effect of the jth block
$\varepsilon_{ij}$ = random error
The Anova Table for a Randomized Block Experiment
Source   S.S.   d.f.             M.S.   F         p-value
Treat    SST    t - 1            MST    MST/MSE
Block    SSB    b - 1            MSB    MSB/MSE
Error    SSE    (t - 1)(b - 1)   MSE
• A randomized block experiment is treated as a two-factor experiment.
• The factors are blocks and treatments.
• There is one observation per cell. It is assumed that there is no interaction between blocks and treatments.
• The degrees of freedom for the interaction are used to estimate error.
The Anova Table for Diet Experiment
Source   S.S         d.f.   M.S.        F           p-value
Block    5992.4167   9      665.82407   9.52        0.00000
Diet     4572.8833   5      914.57667   13.076659   0.00000
ERROR    3147.2833   45     69.93963
The Anova Table for Textile Experiment
SOURCE   SUM OF SQUARES   D.F.   MEAN SQUARE   F       TAIL PROB.
Blocks   7.17167          2      3.5858        40.21   0.0003
Chem     5.20000          3      1.7333        19.44   0.0017
ERROR    0.53500          6      0.0892
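The textile table can be reproduced from the twelve observations with the usual sums of squares for a randomized block design. The following is a plain-Python sketch; the variable names are illustrative, and the error sum of squares is obtained by subtraction because there is one observation per cell.

```python
# Moisture resistance: rows = chemical, columns = blocks (bolt samples) 1, 2, 3
resistance = {
    "A": [10.1, 12.2, 11.9],
    "B": [11.4, 12.9, 12.7],
    "C": [9.9, 12.3, 11.4],
    "D": [12.1, 13.4, 12.9],
}
t = len(resistance)   # treatments
b = 3                 # blocks

values = [y for row in resistance.values() for y in row]
cf = sum(values) ** 2 / (t * b)                      # correction factor

ss_total = sum(y * y for y in values) - cf
ss_chem = sum(sum(row) ** 2 for row in resistance.values()) / b - cf
ss_block = sum(sum(row[j] for row in resistance.values()) ** 2 for j in range(b)) / t - cf
ss_error = ss_total - ss_chem - ss_block             # plays the role of the interaction

ms_chem = ss_chem / (t - 1)
ms_block = ss_block / (b - 1)
ms_error = ss_error / ((t - 1) * (b - 1))

f_chem = ms_chem / ms_error
f_block = ms_block / ms_error
print(round(ss_block, 5), round(ss_chem, 5), round(ss_error, 5))
print(round(f_block, 2), round(f_chem, 2))
```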
• If the treatments are defined in terms
of two or more factors, the treatment
Sum of Squares can be split
(partitioned) into:
– Main Effects
– Interactions
The Anova Table for Diet Experiment
Source   S.S         d.f.   M.S.        F           p-value
Block    5992.4167   9      665.82407   9.52        0.00000
Diet     4572.8833   5      914.57667   13.076659   0.00000
ERROR    3147.2833   45     69.93963
The same table with the Diet line replaced by terms for the main effects and interactions between Level of Protein and Source of Protein (SL = Source × Level interaction)
Source   S.S         d.f.   M.S.        F       p-value
Block    5992.4167   9      665.82407   9.52    0.00000
Source   882.23333   2      441.11667   6.31    0.00380
Level    2680.0167   1      2680.0167   38.32   0.00000
SL       1010.6333   2      505.31667   7.23    0.00190
ERROR    3147.2833   45     69.93963
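The partition can be checked arithmetically: the three component sums of squares and degrees of freedom should add up to the Diet line, and each F ratio is the component mean square over the same error mean square. The sketch below uses only the figures printed in the tables; the dictionary layout is an illustrative choice.

```python
# Diet line and its partition, as printed in the tables
ss_diet, df_diet = 4572.8833, 5
parts = {
    "Source": (882.23333, 2),
    "Level": (2680.0167, 1),
    "SL": (1010.6333, 2),
}
ms_error = 69.93963

ss_sum = sum(ss for ss, _ in parts.values())
df_sum = sum(df for _, df in parts.values())

# F ratio for each component: (SS / df) / MSError
f_ratios = {name: (ss / df) / ms_error for name, (ss, df) in parts.items()}

print(round(ss_sum, 4), df_sum)
print({name: round(f, 2) for name, f in f_ratios.items()})
```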
Repeated Measures Designs
In a Repeated Measures Design
We have experimental units that
• may be grouped according to one or several
factors (the grouping factors)
Then on each experimental unit we have
• not a single measurement but a group of
measurements (the repeated measures)
• The repeated measures may be taken at
combinations of levels of one or several
factors (The repeated measures factors)
Example
In the following study the experimenter was
interested in how the level of a certain enzyme
changed in cardiac patients after open heart
surgery.
The enzyme was measured
• immediately after surgery (Day 0),
• one day (Day 1),
• two days (Day 2) and
• one week (Day 7) after surgery
for n = 15 cardiac surgical patients.
The data is given in the table below.
Table: The enzyme levels immediately after surgery (Day 0), one day (Day 1), two days (Day 2) and one week (Day 7) after surgery
Subject   Day 0   Day 1   Day 2   Day 7
1         108     63      45      42
2         112     75      56      52
3         114     75      51      46
4         129     87      69      69
5         115     71      52      54
6         122     80      68      68
7         105     71      52      54
8         117     77      54      61
9         106     65      49      49
10        110     70      46      47
11        120     85      60      62
12        118     78      51      56
13        110     65      46      47
14        132     92      73      63
15        127     90      73      68
• The subjects are not grouped (single group).
• There is one repeated measures factor, Time, with levels
– Day 0,
– Day 1,
– Day 2,
– Day 7
• This design is the same as a randomized block design with
– Blocks = subjects
The Anova Table for Enzyme Experiment
Source    SS          df   MS          F         p-value
Subject   4221.100    14   301.507     32.45     0.0000
Day       36282.267   3    12094.089   1301.66   0.0000
ERROR     390.233     42   9.291
The Subject Source of variability is modelling the
variability between subjects
The ERROR Source of variability is modelling the
variability within subjects
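Because this single-group repeated measures design is analysed as a randomized block design with subjects as blocks, the table can be reproduced with the same sums of squares. A plain-Python sketch, with illustrative variable names:

```python
# Enzyme levels: one row per subject, columns = Day 0, Day 1, Day 2, Day 7
enzyme = [
    (108, 63, 45, 42), (112, 75, 56, 52), (114, 75, 51, 46), (129, 87, 69, 69),
    (115, 71, 52, 54), (122, 80, 68, 68), (105, 71, 52, 54), (117, 77, 54, 61),
    (106, 65, 49, 49), (110, 70, 46, 47), (120, 85, 60, 62), (118, 78, 51, 56),
    (110, 65, 46, 47), (132, 92, 73, 63), (127, 90, 73, 68),
]
b = len(enzyme)      # subjects act as blocks
t = len(enzyme[0])   # time points act as treatments

cf = sum(map(sum, enzyme)) ** 2 / (b * t)
ss_total = sum(y * y for row in enzyme for y in row) - cf
ss_subject = sum(sum(row) ** 2 for row in enzyme) / t - cf                      # between subjects
ss_day = sum(sum(row[k] for row in enzyme) ** 2 for k in range(t)) / b - cf
ss_error = ss_total - ss_subject - ss_day                                       # within subjects

ms_error = ss_error / ((b - 1) * (t - 1))
f_subject = (ss_subject / (b - 1)) / ms_error
f_day = (ss_day / (t - 1)) / ms_error

print(round(ss_subject, 3), round(ss_day, 3), round(ss_error, 3))
print(round(f_subject, 2), round(f_day, 2))
```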
Example:
(Repeated Measures Design - Grouping Factor)
• In the following study, similar to example 3, the experimenter was interested in how the level of a certain enzyme changed in cardiac patients after open heart surgery.
• In addition the experimenter was interested in how two drug treatments (A and B) would also affect the level of the enzyme.
• The 24 patients were randomly divided into three groups of n = 8 patients.
• The first group of patients was left untreated as a control group, while
• the second and third groups were given drug treatments A and B respectively.
• Again the enzyme was measured immediately after surgery (Day 0), one day (Day 1), two days (Day 2) and one week (Day 7) after surgery for each of the cardiac surgical patients in the study.
Table: The enzyme levels immediately after surgery (Day 0), one day (Day 1), two days (Day 2) and one week (Day 7) after surgery for three treatment groups (Control, Drug A, Drug B). Each row is one patient.
Group     Day 0   Day 1   Day 2   Day 7
Control   122     87      68      58
Control   112     75      55      48
Control   129     80      66      64
Control   115     71      54      52
Control   126     89      70      71
Control   118     81      62      60
Control   115     73      56      49
Control   112     67      53      44
Drug A    93      56      36      37
Drug A    78      51      33      34
Drug A    109     73      58      49
Drug A    104     75      57      60
Drug A    108     71      57      65
Drug A    116     76      58      58
Drug A    108     64      54      47
Drug A    110     80      63      62
Drug B    86      46      30      31
Drug B    100     67      50      50
Drug B    122     97      80      72
Drug B    101     58      45      43
Drug B    112     78      67      66
Drug B    106     74      54      54
Drug B    90      59      43      38
Drug B    110     76      64      58
• The subjects are grouped by treatment
– Control,
– Drug A,
– Drug B
• There is one repeated measures factor, Time, with levels
– Day 0,
– Day 1,
– Day 2,
– Day 7
The Anova Table
Source        SS          df   MS          F         p-value
Drug          1745.396    2    872.698     1.78      0.1929
Error1        10287.844   21   489.897
Time          47067.031   3    15689.010   1479.58   0.0000
Time x Drug   357.688     6    59.615      5.62      0.0001
Error2        668.031     63   10.604
There are two sources of Error in a repeated
measures design:
The between subject error – Error1 and
the within subject error – Error2
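The two error terms can be seen by computing the table from the data: Drug is tested against the variability between subjects within groups (Error1), while Time and Time x Drug are tested against the within-subject residual (Error2). The following plain-Python sketch uses balanced-design formulas; variable names are illustrative.

```python
# One tuple per patient: (Day 0, Day 1, Day 2, Day 7)
groups = {
    "Control": [(122, 87, 68, 58), (112, 75, 55, 48), (129, 80, 66, 64), (115, 71, 54, 52),
                (126, 89, 70, 71), (118, 81, 62, 60), (115, 73, 56, 49), (112, 67, 53, 44)],
    "Drug A": [(93, 56, 36, 37), (78, 51, 33, 34), (109, 73, 58, 49), (104, 75, 57, 60),
               (108, 71, 57, 65), (116, 76, 58, 58), (108, 64, 54, 47), (110, 80, 63, 62)],
    "Drug B": [(86, 46, 30, 31), (100, 67, 50, 50), (122, 97, 80, 72), (101, 58, 45, 43),
               (112, 78, 67, 66), (106, 74, 54, 54), (90, 59, 43, 38), (110, 76, 64, 58)],
}
days = 4
subjects = [row for rows in groups.values() for row in rows]
g, s = len(groups), len(subjects)

cf = sum(map(sum, subjects)) ** 2 / (s * days)
ss_total = sum(y * y for row in subjects for y in row) - cf

# Between-subject part
ss_drug = sum(sum(map(sum, rows)) ** 2 / (len(rows) * days) for rows in groups.values()) - cf
ss_subjects = sum(sum(row) ** 2 for row in subjects) / days - cf
ss_error1 = ss_subjects - ss_drug                    # subjects within groups

# Within-subject part
ss_time = sum(sum(row[k] for row in subjects) ** 2 for k in range(days)) / s - cf
ss_cells = sum(sum(row[k] for row in rows) ** 2 / len(rows)
               for rows in groups.values() for k in range(days)) - cf
ss_inter = ss_cells - ss_drug - ss_time              # Time x Drug
ss_error2 = ss_total - ss_subjects - ss_time - ss_inter

ms_error1 = ss_error1 / (s - g)
ms_error2 = ss_error2 / ((s - g) * (days - 1))
f_drug = (ss_drug / (g - 1)) / ms_error1
f_time = (ss_time / (days - 1)) / ms_error2
f_inter = (ss_inter / ((g - 1) * (days - 1))) / ms_error2

print(round(f_drug, 2), round(f_time, 2), round(f_inter, 2))
```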
Tables of means
Drug      Day 0    Day 1   Day 2   Day 7   Overall
Control   118.63   77.88   60.50   55.75   78.19
A         103.25   68.25   52.00   51.50   68.75
B         103.38   69.38   54.13   51.50   69.59
Overall   108.42   71.83   55.54   52.92   72.18
[Figure: Time Profiles of Enzyme Levels. Enzyme Level (axis 40 to 120) against Day (axis 0 to 7), one profile each for Control, Drug A and Drug B]