Chap10: SUMMARIZING DATA
Ch12: Analysis of Variance
(ANOVA)
12.1: INTRO:
This is an extension of Chap 11 (2-sample
design) to more than two samples. The name ANOVA
can be misleading because the goal is to compare the
means of the data and not their variances.
Both parametric and nonparametric methods will be
applied to the One-Way and Two-Way Layouts.
12.2: The One-Way Layout
The One-Way Layout is an experimental design in which
independent measurements are made under each of
several (more than 2) treatments.
Thus, the One-Way Layout ANOVA focuses on
comparing more than 2 pop’ns or treatment means.
Terminology:
The characteristic that differentiates the pop’ns or
treatments from one another is called the factor under
study.
The different pop’ns or treatments are termed the
levels of the factor.
12.2.1: Normal Theory; the F-test
This section deals with ANOVA and the F-test for the case
of I samples (treatments or levels) of the same size, J.
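Notation:
Model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$,
where $\sum_{i=1}^{I} \alpha_i = 0$ and the $\varepsilon_{ij}$ are independent with $\varepsilon_{ij} \sim N(0, \sigma^2)$
$Y_{ij}$ = $j$th observation of the $i$th treatment
$\mu$ = overall mean level
$\alpha_i$ = differential effect of the $i$th treatment
$\varepsilon_{ij}$ = random error in the $j$th observation under the $i$th treatment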
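The model can be made concrete by simulating from it. The following is a minimal sketch, not part of the slides: the function name `simulate_one_way`, its parameters, and the numeric values are illustrative assumptions.

```python
import random


def simulate_one_way(mu, alpha, J, sigma, seed=0):
    """Draw Y_ij = mu + alpha_i + eps_ij with eps_ij ~ N(0, sigma^2).

    alpha holds the I treatment effects (they must sum to zero);
    J observations are drawn per treatment.
    """
    if abs(sum(alpha)) > 1e-12:
        raise ValueError("the treatment effects alpha_i must sum to zero")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [[mu + a + rng.gauss(0.0, sigma) for _ in range(J)] for a in alpha]


# Hypothetical values: I = 3 treatments, J = 5 observations each.
samples = simulate_one_way(mu=10.0, alpha=[-1.0, 0.0, 1.0], J=5, sigma=0.5)
for i, row in enumerate(samples, start=1):
    print(f"treatment {i}: mean = {sum(row) / len(row):.3f}")
```

With sigma set to 0 every observation equals mu + alpha_i exactly, which is the expected response under the model.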
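12.2.1 : (cont’d)
Response: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$; Expected response: $E(Y_{ij}) = \mu + \alpha_i$
Thus, $\alpha_i = 0$ for all $i$ implies $E(Y_{ij}) = \mu$, i.e. all treatments have the SAME expected response
Hypothesis of interest (to be done later!): $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_I = 0$ (equal means)
ANOVA is based on the following identity, $SS_{Total} = SS_{Within} + SS_{Between}$:
$\sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 + J\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2$
where $\bar{Y}_{i.} = \frac{1}{J}\sum_{j=1}^{J} Y_{ij}$ (average under the $i$th trt)
and $\bar{Y}_{..} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} Y_{ij}$ (overall average)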
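The sum-of-squares identity can be checked numerically. The sketch below assumes equal sample sizes J, as in this section; the function name and the data are made up for illustration.

```python
def one_way_sums_of_squares(groups):
    """Return (ss_total, ss_within, ss_between) for a balanced one-way layout.

    groups: list of I lists, each holding the J observations of one treatment.
    """
    I = len(groups)
    J = len(groups[0])
    if any(len(g) != J for g in groups):
        raise ValueError("this sketch assumes equal sample sizes J")

    group_means = [sum(g) / J for g in groups]           # Ybar_i.
    grand_mean = sum(sum(g) for g in groups) / (I * J)   # Ybar_..

    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    ss_between = J * sum((m - grand_mean) ** 2 for m in group_means)
    return ss_total, ss_within, ss_between


# Made-up data: I = 3 treatments, J = 4 observations each.
data = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 6.0, 7.0],
    [9.0, 10.0, 11.0, 10.0],
]
ss_total, ss_within, ss_between = one_way_sums_of_squares(data)
print(ss_total, ss_within + ss_between)  # the two numbers should agree
```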
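12.2.1 : (cont’d)
Lemma A: $E(X_i - \bar{X})^2 = (\mu_i - \bar{\mu})^2 + \frac{n-1}{n}\sigma^2$
where $X_i$ ($i = 1, \dots, n$) are independent r.v. with $E(X_i) = \mu_i$, $Var(X_i) = \sigma^2$, and $\bar{\mu} = \frac{1}{n}\sum_{i=1}^{n} \mu_i$
Theorem A: $E\left[\frac{SS_{Within}}{I(J-1)}\right] = \sigma^2$ [unbiasedness of $s^2_{pooled} = \frac{SS_{Within}}{I(J-1)}$] and $E(SS_{Between}) = J\sum_{i=1}^{I} \alpha_i^2 + (I-1)\sigma^2$
Theorem B: If $\varepsilon_{ij} \sim N(0, \sigma^2)$ independently, then $\frac{SS_{Within}}{\sigma^2} \sim \chi^2_{I(J-1)}$.
If, additionally, all $\alpha_i = 0$, then $\frac{SS_{Between}}{\sigma^2} \sim \chi^2_{I-1}$ and is independent of $SS_{Within}$
Theorem C: The null distribution of
$F = \frac{SS_{Between}/(I-1)}{SS_{Within}/[I(J-1)]} \sim F_{I-1,\, I(J-1)}$
if the errors are normally distributed by assumption.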
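The F statistic of Theorem C can be sketched in a few lines. This is an illustration under the equal-sample-size assumption of this section, with made-up data and a hypothetical function name. It stops at the statistic and its degrees of freedom; the p-value would come from an F table or a statistics library.

```python
def one_way_f_statistic(groups):
    """Return (F, df_between, df_within) for a balanced one-way layout."""
    I = len(groups)
    J = len(groups[0])
    if any(len(g) != J for g in groups):
        raise ValueError("this sketch assumes equal sample sizes J")

    group_means = [sum(g) / J for g in groups]
    grand_mean = sum(group_means) / I  # valid because all sizes equal J

    ss_between = J * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between = I - 1
    df_within = I * (J - 1)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: I = 3 treatments, J = 4 observations each.
data = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 6.0, 7.0],
    [9.0, 10.0, 11.0, 10.0],
]
f_stat, df1, df2 = one_way_f_statistic(data)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

A large F relative to the upper critical value of the F distribution with (I-1, I(J-1)) degrees of freedom leads to rejecting the null hypothesis of equal means.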
12.2.2: Multiple Comparisons
The F-test in Ex. A of Sec. 12.2.1 states that the
means of measurements from the 7 different
labs are NOT all equal, but how much do they
differ & which pairs are significantly different?
These questions will be addressed in this section.
Our main focus is to compare pairs or groups of
treatments & to estimate the treatment means
and their differences.
Naïve approach: compare all pairs thru t-tests (?)
New approaches: Tukey & Bonferroni methods
12.2.2.1: Tukey’s method
It is used to construct CIs for the differences of all
pairs of means in such a way (unlike the naïve
approach) that the intervals simultaneously
have a set coverage probability. Using the
duality between CI & HT, one can determine
which particular pairs are significantly different.
Tukey’s procedure depends on the so-called
studentized range dist’n (A14-A19, textbook)
characterized by 2 parameters: I=the number of
samples being compared & I(J-1)=the degrees
of freedom in the pooled sample std deviation.
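12.2.2.1: Tukey’s method (cont’d)
(1) Let the sample sizes be ALL EQUAL and the errors normally distributed
with a CONSTANT VARIANCE. Then the centered sample means $\bar{Y}_{i_1.} - \mu_{i_1}$ are independent
and $\bar{Y}_{i_1.} - \mu_{i_1} \sim N\left(0, \frac{\sigma^2}{J}\right)$, where $\mu_i = \mu + \alpha_i$ is the $i$th treatment mean.
Note that $\frac{s_p^2}{J}$ estimates $\frac{\sigma^2}{J}$.
(2) The probability dist'n of the random variable
$Q = \max_{i_1, i_2} \frac{\left| (\bar{Y}_{i_1.} - \mu_{i_1}) - (\bar{Y}_{i_2.} - \mu_{i_2}) \right|}{s_p / \sqrt{J}}$
is called the Studentized Range Dist'n (SRD) with parameters $I$ and $I(J-1)$
(3) With probability $(1 - \alpha)$, for every pair $(i_1, i_2)$ with $i_1 \neq i_2$, $i_1, i_2 = 1, \dots, I$,
$(\bar{Y}_{i_1.} - \bar{Y}_{i_2.}) \pm q_{I,\, I(J-1)}(\alpha) \frac{s_p}{\sqrt{J}}$
is a $100(1 - \alpha)\%$ set of simultaneous CIs for all differences $\mu_{i_1} - \mu_{i_2}$,
where $q_{I,\, I(J-1)}(\alpha)$ denotes the upper-tail critical value of the SRD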
Tukey’s method: (steps)
read example A on page 452
Step 1: Select $\alpha$ and find $q_{I,\, I(J-1)}(\alpha)$
from pages A15-A19 (Appendix B: Table 6)
Step 2: Calculate the quantity $w = q_{I,\, I(J-1)}(\alpha) \frac{s_p}{\sqrt{J}}$
where $s_p = \sqrt{MSE}$ (read from the ANOVA table)
Step 3: List the sample means in DECREASING order
and underline each pair that differs by less than $w$.
Pairs that are not underlined indicate that the
corresponding pop'n means differ significantly at level $\alpha$.
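The three steps can be sketched as follows. The function name, the treatment labels, the sample means and the MSE are hypothetical. The critical value `q_crit` is an input that the caller looks up in the studentized range table; the 3.95 used below is assumed to be the tabulated value for I = 3, 9 degrees of freedom and alpha = 0.05, and should be checked against the table.

```python
from itertools import combinations
from math import sqrt


def tukey_pairs(sample_means, mse, J, q_crit):
    """Return (w, significant_pairs) for equal sample sizes J.

    sample_means: dict mapping a treatment label to its sample mean.
    mse: mean square error from the ANOVA table, so s_p = sqrt(mse).
    q_crit: q_{I, I(J-1)}(alpha), looked up by the caller (Step 1).
    """
    # Step 2: half-width shared by all simultaneous intervals.
    w = q_crit * sqrt(mse) / sqrt(J)
    # Step 3: order the means decreasingly, then compare each pair with w.
    ordered = sorted(sample_means.items(), key=lambda kv: kv[1], reverse=True)
    significant = [
        (a, b)
        for (a, mean_a), (b, mean_b) in combinations(ordered, 2)
        if mean_a - mean_b >= w  # not "underlined": differs by at least w
    ]
    return w, significant


# Hypothetical summary: I = 3 treatments, J = 4 observations each, MSE = 1.0.
means = {"T1": 10.0, "T2": 7.0, "T3": 6.0}
w, pairs = tukey_pairs(means, mse=1.0, J=4, q_crit=3.95)
print(f"w = {w:.3f}; significantly different pairs: {pairs}")
```

In this made-up example T2 and T3 differ by less than w, so they would be underlined together, while T1 differs significantly from both.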
12.2.2.2: The Bonferroni method
This method was briefly introduced in Section 11.4.8.
If $k$ null hypotheses are to be tested, then a desired
overall type I error rate of at most $\alpha$ can be
guaranteed by testing each null hypothesis at level $\alpha/k$.
By duality between CI & HT, one can say that:
If $k$ confidence intervals are each formed to have
confidence level $100(1 - \alpha/k)\%$, then they all hold
simultaneously with confidence level at least $100(1 - \alpha)\%$.
Nice results are obtained for not too large $k$.
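A small sketch of the Bonferroni bookkeeping, using the 7 labs mentioned in Section 12.2.2 as the number of treatments. The function name and the choice alpha = 0.05 are illustrative assumptions.

```python
def bonferroni_levels(alpha, k):
    """Return the per-test level alpha/k and the per-interval confidence in %."""
    if k < 1:
        raise ValueError("k must be at least 1")
    per_test_level = alpha / k
    per_interval_confidence = 100 * (1 - per_test_level)
    return per_test_level, per_interval_confidence


# Comparing all pairs among I treatments gives k = I(I-1)/2 hypotheses.
I = 7
k = I * (I - 1) // 2
level, confidence = bonferroni_levels(0.05, k)
print(f"k = {k}, per-test level = {level:.5f}, "
      f"per-interval confidence = {confidence:.3f}%")
```

As k grows, the per-test level shrinks and the intervals widen, which is why the method works best when k is not too large.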
12.2.3: The Kruskal-Wallis Test:
(a nonparametric method)
The Kruskal-Wallis test is a generalization of the
Mann-Whitney test seen in Section 11.2.3.
Thus, the Kruskal-Wallis test makes no
normality assumption and has a wider range
of applicability than does the F test.
The Kruskal-Wallis test is especially useful for small-sample-size problems.
Data are replaced by their ranks, so outliers
have less influence in the Kruskal-Wallis test
(nonparametric) than they do on its counterpart,
the F test (parametric).
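The slide does not state the test statistic. The sketch below uses the standard rank-based form K = 12/(N(N+1)) * sum_i R_i^2/J_i - 3(N+1), where R_i is the rank sum of sample i and N is the total number of observations. Ties receive midranks and no tie correction is applied. The function name and the data are made up.

```python
def kruskal_wallis_statistic(groups):
    """Kruskal-Wallis K from pooled ranks (midranks for ties, no tie correction)."""
    pooled = sorted(y for g in groups for y in g)
    N = len(pooled)

    # Midrank of each distinct value: mean of the 1-based positions it occupies.
    rank_of = {}
    i = 0
    while i < N:
        j = i
        while j < N and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # positions i+1 .. j
        i = j

    # Sum over samples of (rank sum)^2 / sample size.
    total = sum(sum(rank_of[y] for y in g) ** 2 / len(g) for g in groups)
    return 12.0 / (N * (N + 1)) * total - 3 * (N + 1)


clean = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
with_outlier = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 900.0]]
print(kruskal_wallis_statistic(clean))
print(kruskal_wallis_statistic(with_outlier))  # same ranks, same statistic
```

Replacing 9 by 900 leaves the ranks unchanged, so the statistic does not move; this is the reduced influence of outliers described above. Under the null hypothesis K is approximately chi-square with I-1 degrees of freedom when the samples are not too small.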
12.3: The Two-Way Layout
Here the experimental design involves 2 factors (each
factor has 2 or more levels).
Goal: We would like to assess the effect of 2 factors
(Temperature & Humidity) on a variable of interest
(yield of a chemical reaction).
Cells (Temperature levels T1-T4 in columns, Humidity levels H1-H2 in rows):
      T1  T2  T3  T4
H1
H2
I=4 levels for factor 1
J=2 levels for factor 2
I*J=8 combinations (cells)
Take K independent obs. in each cell
Another situation: an agricultural scientist may be
interested in the corn yield using 3 different fertilizers
with 4 different types of soils. What are the effects?
12.3.1: Additive Parametrization
Additive Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
(NO INTERACTION; K = 1 obs per cell)
$\hat{Y}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j$ is the fitted or predicted value of $Y_{ij}$
$\hat{Y}_{i1} - \hat{Y}_{i2} = (\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_1) - (\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_2) = \hat{\beta}_1 - \hat{\beta}_2$
(same difference among ranges on all menu days)
read example on pg 455
If there exists an INTERACTION between menu days & range,
then differences among ranges on menu days will NOT be the same;
look at figure 12.3 on pg 457.
Interactions can be incorporated in the model to better fit the data.
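A minimal sketch of fitting the additive model with one observation per cell, using row and column averages as the estimates. The function name and the 3-by-2 table are made up; they are not the menu-day data of the textbook example.

```python
def fit_additive(table):
    """Fit Y_ij = mu + alpha_i + beta_j from a table with one obs per cell.

    table: list of I rows, each with J entries.
    Returns (mu, alpha, beta, fitted).
    """
    I, J = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (I * J)   # grand average
    alpha = [sum(row) / J - mu for row in table]    # row effects
    beta = [sum(table[i][j] for i in range(I)) / I - mu
            for j in range(J)]                      # column effects
    fitted = [[mu + alpha[i] + beta[j] for j in range(J)] for i in range(I)]
    return mu, alpha, beta, fitted


# Made-up table: I = 3 rows, J = 2 columns.
table = [
    [10.0, 12.0],
    [14.0, 15.0],
    [9.0, 13.0],
]
mu, alpha, beta, fitted = fit_additive(table)
for i, row in enumerate(fitted):
    print(f"row {i + 1}: fitted column difference = {row[0] - row[1]:.4f}")
print(f"beta_1 - beta_2 = {beta[0] - beta[1]:.4f}")
```

Every row shows the same fitted column difference, equal to the difference of the estimated column effects. The observed differences in the made-up table (-2, -1, -4) are not equal; the additive fit cannot reproduce that, which is what an interaction term would capture.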
12.3.2: Normal Theory
for the 2-Way Layout
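Assume the number of observations per cell K>1.
If K is the same for each cell, then the design is said to be balanced.
Model: $Y_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + \varepsilon_{ijk}$
$Y_{ijk}$ = $k$th observation in cell $(i, j)$
$E(Y_{ijk}) = \mu + \alpha_i + \beta_j + \delta_{ij}$ and $Var(Y_{ijk}) = \sigma^2$ because $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$
Assumption: $\sum_{i=1}^{I} \alpha_i = \sum_{j=1}^{J} \beta_j = \sum_{i=1}^{I} \delta_{ij} = \sum_{j=1}^{J} \delta_{ij} = 0$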
12.3.2: (cont’d)
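Theorem A: Under the assumption that $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$,
$E(SS_A) = (I-1)\sigma^2 + JK\sum_{i=1}^{I} \alpha_i^2$; $E(SS_B) = (J-1)\sigma^2 + IK\sum_{j=1}^{J} \beta_j^2$
$E(SS_{AB}) = (I-1)(J-1)\sigma^2 + K\sum_{i=1}^{I}\sum_{j=1}^{J} \delta_{ij}^2$ and $E(SS_E) = IJ(K-1)\sigma^2$
where $SS_A = JK\sum_{i=1}^{I} (\bar{Y}_{i..} - \bar{Y}_{...})^2$; $SS_B = IK\sum_{j=1}^{J} (\bar{Y}_{.j.} - \bar{Y}_{...})^2$
$SS_{AB} = K\sum_{i=1}^{I}\sum_{j=1}^{J} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2$ and $SS_E = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{ij.})^2$
Identity: $SS_A + SS_B + SS_{AB} + SS_E = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{...})^2 = SS_{TOT}$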
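The two-way identity can be verified numerically on a small balanced design. The function name and the data below are made up for illustration; the sketch assumes the same K in every cell.

```python
def two_way_sums_of_squares(y):
    """Return (ss_a, ss_b, ss_ab, ss_e, ss_tot) for a balanced two-way layout.

    y[i][j] is the list of K observations in cell (i, j).
    """
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / K for j in range(J)] for i in range(I)]   # Ybar_ij.
    row = [sum(cell[i]) / J for i in range(I)]                        # Ybar_i..
    col = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]   # Ybar_.j.
    grand = sum(row) / I                                              # Ybar_...

    ss_a = J * K * sum((r - grand) ** 2 for r in row)
    ss_b = I * K * sum((c - grand) ** 2 for c in col)
    ss_ab = K * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(I) for j in range(J))
    ss_e = sum((obs - cell[i][j]) ** 2
               for i in range(I) for j in range(J) for obs in y[i][j])
    ss_tot = sum((obs - grand) ** 2
                 for i in range(I) for j in range(J) for obs in y[i][j])
    return ss_a, ss_b, ss_ab, ss_e, ss_tot


# Made-up balanced design: I = 2, J = 2, K = 2.
y = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 4.0], [10.0, 12.0]],
]
ss_a, ss_b, ss_ab, ss_e, ss_tot = two_way_sums_of_squares(y)
print(ss_a + ss_b + ss_ab + ss_e, ss_tot)  # the two numbers should agree
```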
12.3.2: (cont’d)
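Theorem B:
Under the assumption $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$: $\frac{SS_E}{\sigma^2} \sim \chi^2_{IJ(K-1)}$
Under the null $H_A^0: \alpha_i = 0$, $i = 1, \dots, I$: $\frac{SS_A}{\sigma^2} \sim \chi^2_{I-1}$
Under the null $H_B^0: \beta_j = 0$, $j = 1, \dots, J$: $\frac{SS_B}{\sigma^2} \sim \chi^2_{J-1}$
Under the null $H_{AB}^0: \delta_{ij} = 0$, $i = 1, \dots, I$; $j = 1, \dots, J$: $\frac{SS_{AB}}{\sigma^2} \sim \chi^2_{(I-1)(J-1)}$
The SS (sums of squares) are INDEPENDENTLY distributed.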
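The slide stops at the chi-square results. By the same argument as Theorem C of Section 12.2.1, each effect can be tested with a ratio of mean squares; the sketch below assumes that standard construction. The function name and the sums of squares passed in are hypothetical.

```python
def two_way_f_statistics(ss_a, ss_b, ss_ab, ss_e, I, J, K):
    """Return {effect: (F, df_numerator, df_denominator)} for a balanced design."""
    if K < 2:
        raise ValueError("K > 1 observations per cell are needed to estimate error")
    df_a, df_b = I - 1, J - 1
    df_ab = (I - 1) * (J - 1)
    df_e = I * J * (K - 1)
    mse = ss_e / df_e  # error mean square, the common denominator
    return {
        "A": ((ss_a / df_a) / mse, df_a, df_e),
        "B": ((ss_b / df_b) / mse, df_b, df_e),
        "AB": ((ss_ab / df_ab) / mse, df_ab, df_e),
    }


# Hypothetical sums of squares for a design with I = 2, J = 2, K = 2.
results = two_way_f_statistics(18.0, 72.0, 8.0, 8.0, I=2, J=2, K=2)
for effect, (f_stat, df1, df2) in results.items():
    print(f"{effect}: F = {f_stat:.2f} on ({df1}, {df2}) df")
```

Each ratio is compared with the upper critical value of the F distribution with the listed degrees of freedom.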