Residual Analysis for ANOVA Models
KNNL – Chapter 18
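Residuals
Model Errors (unobserved): $\varepsilon_{ij} = Y_{ij} - \mu_i$
Residuals: $e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$
$$E\{e_{ij}\} = 0 \qquad \sigma^2\{e_{ij}\} = \sigma^2\,\frac{n_i - 1}{n_i} \qquad s^2\{e_{ij}\} = MSE\,\frac{n_i - 1}{n_i} \qquad s\{e_{ij}\} = \sqrt{MSE\,\frac{n_i - 1}{n_i}}$$
Semi-Studentized Residual (residual divided by the estimate of $\sigma$; trivial to compute):
$$e_{ij}^* = \frac{e_{ij}}{\sqrt{MSE}}$$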
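The residual definitions above can be checked numerically. This is a minimal sketch using only the Python standard library; the input layout (one inner list of observations per treatment) and the function name `residual_summary` are hypothetical choices for illustration.

```python
from math import sqrt

# Hypothetical helper; groups[i] holds the n_i observations for treatment i.
def residual_summary(groups):
    """Residuals, their estimated standard deviations, and semi-studentized residuals."""
    n_T = sum(len(g) for g in groups)
    r = len(groups)
    means = [sum(g) / len(g) for g in groups]                # fitted values Ybar_i.
    resid = [[y - m for y in g] for g, m in zip(groups, means)]
    sse = sum(e * e for row in resid for e in row)
    mse = sse / (n_T - r)
    # s{e_ij} = sqrt(MSE * (n_i - 1) / n_i) depends only on the treatment's sample size
    sd_e = [sqrt(mse * (len(g) - 1) / len(g)) for g in groups]
    semi = [[e / sqrt(mse) for e in row] for row in resid]   # e*_ij = e_ij / sqrt(MSE)
    return resid, mse, sd_e, semi

resid, mse, sd_e, semi = residual_summary([[4.0, 6.0, 5.0], [8.0, 9.0, 10.0]])
print(resid)  # [[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
print(mse)    # 1.0
```

Because $MSE = 1$ in this toy data set, the semi-studentized residuals coincide with the raw residuals.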
Studentized Residual (residual divided by its standard error; messier to compute):
$$r_{ij} = \frac{e_{ij}}{s\{e_{ij}\}} = \frac{e_{ij}}{\sqrt{MSE\,(n_i - 1)/n_i}}$$
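A minimal sketch of the studentized residual, assuming the same list-of-lists input layout; `studentized_residuals` is a hypothetical name. Unequal sample sizes are used so that the two treatments get different standard errors.

```python
from math import sqrt

# Hypothetical helper; groups[i] holds the observations for treatment i.
def studentized_residuals(groups):
    """r_ij = e_ij / s{e_ij}, with s{e_ij} = sqrt(MSE * (n_i - 1) / n_i)."""
    n_T = sum(len(g) for g in groups)
    r = len(groups)
    means = [sum(g) / len(g) for g in groups]
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_T - r)
    out = []
    for g, m in zip(groups, means):
        s_e = sqrt(mse * (len(g) - 1) / len(g))  # standard error depends on n_i
        out.append([(y - m) / s_e for y in g])
    return out

r_ij = studentized_residuals([[4.0, 6.0, 5.0], [8.0, 9.0, 10.0, 9.0]])
```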
Studentized Deleted Residual:
$$t_{ij} = e_{ij}\left[\frac{n_T - r - 1}{SSE\,(1 - 1/n_i) - e_{ij}^2}\right]^{1/2}$$
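The closed form for $t_{ij}$ avoids refitting the model once per observation. As a sanity check, the sketch below (standard library only; both function names are hypothetical) computes it from the formula and again by brute force: delete case $(i, j)$, refit, and divide the deleted residual $d_{ij} = Y_{ij} - \bar{Y}_{i(j)}$ by its estimated standard deviation $\sqrt{MSE_{(ij)}\,(1 + 1/(n_i - 1))}$, where $\bar{Y}_{i(j)}$ is the mean of the remaining cases in treatment $i$. The brute-force version assumes every treatment has at least two observations.

```python
from math import sqrt

# Hypothetical helpers; groups[i] holds the observations for treatment i.
def deleted_residuals_formula(groups):
    """t_ij = e_ij * [(n_T - r - 1) / (SSE * (1 - 1/n_i) - e_ij**2)] ** (1/2)."""
    n_T = sum(len(g) for g in groups)
    r = len(groups)
    means = [sum(g) / len(g) for g in groups]
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return [[(y - m) * sqrt((n_T - r - 1) / (sse * (1 - 1 / len(g)) - (y - m) ** 2))
             for y in g]
            for g, m in zip(groups, means)]

def deleted_residuals_refit(groups):
    """Same quantity by refitting the model without each case in turn."""
    out = []
    for i, g in enumerate(groups):
        row = []
        for j, y in enumerate(g):
            reduced = [list(h) for h in groups]
            del reduced[i][j]                       # requires n_i >= 2
            n_T = sum(len(h) for h in reduced)
            means = [sum(h) / len(h) for h in reduced]
            sse = sum((v - m) ** 2 for h, m in zip(reduced, means) for v in h)
            mse_del = sse / (n_T - len(reduced))    # error df = n_T - 1 - r
            d = y - means[i]                        # deleted residual
            # Var(d_ij) = sigma^2 * (1 + 1/(n_i - 1)); len(reduced[i]) == n_i - 1
            row.append(d / sqrt(mse_del * (1 + 1 / len(reduced[i]))))
        out.append(row)
    return out

data = [[4.0, 6.0, 5.0, 7.0], [8.0, 9.0, 10.0, 12.0]]
t_formula = deleted_residuals_formula(data)
t_refit = deleted_residuals_refit(data)
```

The two results agree to rounding error, which is the point of the closed form: a single fit yields every $t_{ij}$.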
Model Departures Detected With Residuals and Plots
• Errors have non-constant variance
• Errors are not independent
• Existence of Outlying Observations
• Omission of Important Predictors
• Non-normal Errors
Common Plots
• Residuals versus Treatment
• Residuals versus Treatment Mean
• Aligned Dot Plot (aka Strip Chart)
• Residuals versus Time
• Residuals versus Omitted Variables
• Box Plots, Histograms, Normal Probability Plots
Tests for Constant Variance $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_r^2$
Hartley's Test (assumes normal data, equal sample sizes):
$$H^* = \frac{\max_i s_i^2}{\min_i s_i^2}$$
Reject $H_0$ if $H^* > H(1-\alpha; r, n-1)$, where $n_1 = \cdots = n_r = n$.
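A minimal sketch of the Hartley statistic using `statistics.variance` (divisor $n_i - 1$); `hartley_h` is a hypothetical name. Only $H^*$ is computed; the critical value $H(1-\alpha; r, n-1)$ still has to come from a table of the $H$ distribution.

```python
from statistics import variance

# Hypothetical helper; groups[i] holds the observations for treatment i.
def hartley_h(groups):
    """H* = max_i s_i^2 / min_i s_i^2 using sample variances (divisor n_i - 1)."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's test assumes equal sample sizes")
    s2 = [variance(g) for g in groups]
    return max(s2) / min(s2)

h_star = hartley_h([[4.0, 6.0, 5.0], [8.0, 10.0, 12.0], [1.0, 2.0, 3.0]])
print(h_star)  # 4.0
```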
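Brown-Forsythe Test (robust to non-normality, allows unequal sample sizes):
$$d_{ij} = \left|Y_{ij} - \tilde{Y}_i\right| \qquad i = 1,\ldots,r \quad j = 1,\ldots,n_i \qquad \tilde{Y}_i = \operatorname{median}\{Y_{i1},\ldots,Y_{in_i}\}$$
$$\bar{d}_{i\cdot} = \frac{\sum_{j=1}^{n_i} d_{ij}}{n_i} \qquad \bar{d}_{\cdot\cdot} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} d_{ij}}{n_T}$$
$$F_{BF}^* = \frac{MSTR_{BF}}{MSE_{BF}} \qquad MSTR_{BF} = \frac{\sum_{i=1}^{r} n_i\left(\bar{d}_{i\cdot} - \bar{d}_{\cdot\cdot}\right)^2}{r-1} \qquad MSE_{BF} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(d_{ij} - \bar{d}_{i\cdot}\right)^2}{n_T - r}$$
Reject $H_0$ if $F_{BF}^* > F(1-\alpha; r-1, n_T - r)$.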
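The Brown-Forsythe statistic is simply the one-factor ANOVA $F$ statistic applied to the absolute deviations $d_{ij}$. A minimal sketch with the standard library (`brown_forsythe` is a hypothetical name); the $F$ critical value is not computed here.

```python
from statistics import median

# Hypothetical helper; groups[i] holds the observations for treatment i.
def brown_forsythe(groups):
    """F*_BF: the one-factor ANOVA F statistic computed on d_ij = |Y_ij - median_i|."""
    r = len(groups)
    meds = [median(g) for g in groups]
    d = [[abs(y - md) for y in g] for g, md in zip(groups, meds)]
    n_T = sum(len(row) for row in d)
    dbar_i = [sum(row) / len(row) for row in d]             # d-bar_i.
    dbar = sum(sum(row) for row in d) / n_T                 # d-bar..
    mstr = sum(len(row) * (m - dbar) ** 2 for row, m in zip(d, dbar_i)) / (r - 1)
    mse = sum((x - m) ** 2 for row, m in zip(d, dbar_i) for x in row) / (n_T - r)
    return mstr / mse

f_bf = brown_forsythe([[1.0, 2.0, 3.0], [10.0, 14.0, 18.0]])
```

Here $d = \{1, 0, 1\}$ and $\{4, 0, 4\}$, giving $MSTR_{BF} = 6$, $MSE_{BF} = 17/6$, and $F_{BF}^* = 36/17 \approx 2.12$.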
Remedial Measures
• Normally distributed, unequal variances – Use weighted least squares with weights $w_{ij} = 1/s_i^2$:
$$F_w^* = \frac{\left[SSE_w(R) - SSE_w(F)\right]/(r-1)}{SSE_w(F)/(n_T - r)}$$
Conclude means not all equal if $F_w^* > F(1-\alpha; r-1, n_T - r)$.
• Non-normal data (with possibly unequal variances) – Variance-stabilizing transformations and the Box-Cox transformation
– Variance proportional to mean: $Y' = \sqrt{Y}$
– Standard deviation proportional to mean: $Y' = \log Y$
– Standard deviation proportional to mean²: $Y' = 1/Y$
– Response is a (binomial) proportion: $Y' = 2\arcsin\sqrt{Y}$
• Non-parametric tests – F-test based on ranks and the Kruskal-Wallis test
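Two of the remedial measures above can be sketched in a few lines of standard-library Python; all function names and the $\lambda$ grid are hypothetical. `weighted_f` fits the full model (separate treatment means) and the reduced model (one common mean, estimated by weighted least squares) with weights $1/s_i^2$. Since $\sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2 / s_i^2 = n_i - 1$, $SSE_w(F)$ always equals $n_T - r$. `boxcox_sse` assumes the standardized Box-Cox variable KNNL uses for regression, $W = K_1(Y^\lambda - 1)$ for $\lambda \ne 0$ and $W = K_2 \ln Y$ for $\lambda = 0$, with $K_2$ the geometric mean of all $Y$ and $K_1 = 1/(\lambda K_2^{\lambda - 1})$, and picks the $\lambda$ on a grid that minimizes the ANOVA error sum of squares.

```python
from math import exp, log
from statistics import mean, variance

# Hypothetical helpers; groups[i] holds the observations for treatment i.
def weighted_f(groups):
    """F_w* with weights w_ij = 1 / s_i^2; returns (F_w*, SSE_w(F))."""
    r = len(groups)
    n_T = sum(len(g) for g in groups)
    w = [1 / variance(g) for g in groups]      # one weight per treatment
    ybar = [mean(g) for g in groups]
    # Full model: separate treatment means, fitted values Ybar_i.
    sse_full = sum(wi * (y - m) ** 2 for g, wi, m in zip(groups, w, ybar) for y in g)
    # Reduced model: a single common mean, estimated by weighted least squares.
    mu_w = (sum(wi * len(g) * m for g, wi, m in zip(groups, w, ybar))
            / sum(wi * len(g) for g, wi in zip(groups, w)))
    sse_red = sum(wi * (y - mu_w) ** 2 for g, wi in zip(groups, w) for y in g)
    f_w = ((sse_red - sse_full) / (r - 1)) / (sse_full / (n_T - r))
    return f_w, sse_full

def boxcox_sse(groups, lam):
    """SSE of the one-factor fit to the standardized Box-Cox variable W."""
    ys = [y for g in groups for y in g]
    if min(ys) <= 0:
        raise ValueError("Box-Cox requires positive responses")
    k2 = exp(sum(log(y) for y in ys) / len(ys))    # geometric mean of all Y
    def w(y):
        if lam == 0:
            return k2 * log(y)
        k1 = 1 / (lam * k2 ** (lam - 1))
        return k1 * (y ** lam - 1)
    sse = 0.0
    for g in groups:
        wg = [w(y) for y in g]
        m = sum(wg) / len(wg)
        sse += sum((v - m) ** 2 for v in wg)
    return sse

def boxcox_lambda(groups, grid=(-2, -1, -0.5, 0, 0.5, 1, 2)):
    """Choose the lambda on a (hypothetical) grid that minimizes SSE."""
    return min(grid, key=lambda lam: boxcox_sse(groups, lam))

data = [[4.0, 6.0, 5.0], [8.0, 10.0, 12.0]]
f_w, sse_w_full = weighted_f(data)
print(f_w)  # 15.0
```

At $\lambda = 1$ the standardized variable is just $Y - 1$, so `boxcox_sse(data, 1)` reproduces the ordinary ANOVA $SSE$; standardizing by $K_1$ and $K_2$ is what makes SSE values comparable across different $\lambda$.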
Effects of Model Departures
• Non-normal Data – Generally not a problem for the F-test, as long as the data are not too far from normal and the sample sizes are reasonably large
• Unequal Error Variances – Generally not a problem for the F-test, as long as the sample sizes are approximately equal
• Non-independence of error terms – Can cause serious problems with tests; use repeated measures ANOVA if the same subject receives each treatment
Nonparametric Tests
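Rank all observations across treatments from 1 to $n_T$, assigning average ranks when ties occur.
$$\bar{R}_{i\cdot} = \frac{\sum_{j=1}^{n_i} R_{ij}}{n_i} \qquad \bar{R}_{\cdot\cdot} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} R_{ij}}{n_T} = \frac{1 + \cdots + n_T}{n_T} = \frac{n_T(n_T+1)/2}{n_T} = \frac{n_T+1}{2}$$
$$SSTO_R = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(R_{ij} - \bar{R}_{\cdot\cdot}\right)^2$$
$$SSTR_R = \sum_{i=1}^{r} n_i\left(\bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot}\right)^2$$
$$SSE_R = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(R_{ij} - \bar{R}_{i\cdot}\right)^2$$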
(Approximate) F test:
$$F_R^* = \frac{SSTR_R/(r-1)}{SSE_R/(n_T - r)} = \frac{MSTR_R}{MSE_R}$$
Conclude means not all equal if $F_R^* > F(1-\alpha; r-1, n_T - r)$.
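A minimal sketch of the rank-based F test in standard-library Python; `average_ranks` and `rank_f_test` are hypothetical names. The pooled observations are ranked with tied values sharing their average rank, the ranks are split back by treatment, and the usual ANOVA sums of squares are computed on the ranks. The example includes one tie (two values of 3.4) to show that $\bar{R}_{\cdot\cdot} = (n_T+1)/2$ and $SSTO_R = SSTR_R + SSE_R$ still hold.

```python
# Hypothetical helpers; groups[i] holds the observations for treatment i.
def average_ranks(values):
    """Ranks 1..n_T over all values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end + 2) / 2            # mean of ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks

def rank_f_test(groups):
    """Return (F_R*, SSTR_R, SSE_R, SSTO_R, R-bar..) for the rank-based F test."""
    flat = [y for g in groups for y in g]
    ranks = average_ranks(flat)
    R, start = [], 0
    for g in groups:                         # split pooled ranks back by treatment
        R.append(ranks[start:start + len(g)])
        start += len(g)
    r, n_T = len(groups), len(flat)
    rbar_i = [sum(row) / len(row) for row in R]
    rbar = sum(ranks) / n_T                  # always (n_T + 1) / 2
    sstr = sum(len(row) * (m - rbar) ** 2 for row, m in zip(R, rbar_i))
    sse = sum((x - m) ** 2 for row, m in zip(R, rbar_i) for x in row)
    ssto = sum((x - rbar) ** 2 for x in ranks)
    f_r = (sstr / (r - 1)) / (sse / (n_T - r))
    return f_r, sstr, sse, ssto, rbar

f_r, sstr, sse, ssto, rbar = rank_f_test([[1.2, 3.4, 2.2], [5.0, 4.1, 3.4], [6.3, 7.0, 5.5]])
```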
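Simultaneous CIs for Differences in Mean Ranks:
$$\bar{R}_{i\cdot} - \bar{R}_{i'\cdot} \pm z\!\left(1 - \frac{\alpha}{2g}\right)\sqrt{\frac{n_T(n_T+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)}$$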
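A minimal sketch of these Bonferroni intervals, using `statistics.NormalDist().inv_cdf` for the $z$ percentile. The function name and the illustrative mean ranks are hypothetical, and the sketch assumes $g = r(r-1)/2$, i.e. every pair of treatments is compared; a smaller planned family would use a smaller $g$ and narrower intervals.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# Hypothetical helper: mean_ranks[i] is R-bar_i., sizes[i] is n_i.
def mean_rank_cis(mean_ranks, sizes, alpha=0.05):
    """Bonferroni intervals for all pairwise differences in mean ranks."""
    r = len(mean_ranks)
    n_T = sum(sizes)
    g = r * (r - 1) // 2                             # assumed: all pairwise comparisons
    z = NormalDist().inv_cdf(1 - alpha / (2 * g))    # z(1 - alpha/2g)
    cis = {}
    for i, k in combinations(range(r), 2):
        half = z * sqrt(n_T * (n_T + 1) / 12 * (1 / sizes[i] + 1 / sizes[k]))
        diff = mean_ranks[i] - mean_ranks[k]
        cis[(i, k)] = (diff - half, diff + half)
    return cis

cis = mean_rank_cis([13 / 6, 29 / 6, 8.0], [3, 3, 3])
```

With equal sample sizes every interval has the same half-width; only the centre $\bar{R}_{i\cdot} - \bar{R}_{i'\cdot}$ changes.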
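Kruskal-Wallis Test (directly computed in most software packages):
$$X^2_{KW} = \frac{SSTR_R}{SSTO_R/(n_T - 1)} = \left[\frac{12}{n_T(n_T+1)}\sum_{i=1}^{r}\frac{R_{i\cdot}^2}{n_i}\right] - 3(n_T+1)$$
where $R_{i\cdot} = \sum_{j=1}^{n_i} R_{ij}$ is the rank sum for treatment $i$; the second form holds when there are no ties.
Conclude means not all equal if $X^2_{KW} > \chi^2(1-\alpha; r-1)$.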
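A minimal sketch that computes $X^2_{KW}$ both ways in standard-library Python, to show the two forms agree when there are no ties; `kruskal_wallis` is a hypothetical name, and the sketch deliberately rejects tied data. Without ties, $SSTO_R/(n_T - 1) = n_T(n_T+1)/12$, which is where the shortcut form comes from. The $\chi^2$ critical value is not computed here.

```python
# Hypothetical helper; groups[i] holds the observations for treatment i.
def kruskal_wallis(groups):
    """Return X^2_KW computed both ways; this sketch assumes no tied observations."""
    flat = sorted(y for g in groups for y in g)
    if len(set(flat)) != len(flat):
        raise ValueError("this sketch assumes no ties")
    rank_of = {y: k + 1 for k, y in enumerate(flat)}           # ranks 1..n_T
    n_T = len(flat)
    rbar = (n_T + 1) / 2
    rank_sums = [sum(rank_of[y] for y in g) for g in groups]   # R_i.
    sstr = sum(len(g) * (s / len(g) - rbar) ** 2 for s, g in zip(rank_sums, groups))
    ssto = sum((k - rbar) ** 2 for k in range(1, n_T + 1))
    x2_ratio = sstr / (ssto / (n_T - 1))
    x2_short = (12 / (n_T * (n_T + 1))
                * sum(s ** 2 / len(g) for s, g in zip(rank_sums, groups))
                - 3 * (n_T + 1))
    return x2_ratio, x2_short

x2_ratio, x2_short = kruskal_wallis([[1.2, 3.1, 2.2], [5.0, 4.1, 3.4], [6.3, 7.0, 5.5]])
```

Here the rank sums are 6, 15, and 24, and both forms give $X^2_{KW} = 7.2$.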