Variance stabilizing transformation

• Occasionally, a residual plot shows a clear violation of the
constant variance assumption, typically the characteristic
“funnel” shape in which the spread of the residuals grows with
the fitted values. This can often be fixed with a variance
stabilizing transformation of the response.
• If the standard deviation of the response is proportional to
its mean, the logarithm of the response often works: regress
log(y) on the explanatory variables.
• If the variance of the response is proportional to its mean,
the square root of the response often works: regress sqrt(y)
on the explanatory variables.
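A short first-order (delta-method) argument, which these notes don't spell out, shows why these two choices work. For a smooth transformation h, a response Y with mean μ, and some constant c:

$$\operatorname{Var}[h(Y)] \approx h'(\mu)^2 \operatorname{Var}(Y)$$

$$\operatorname{SD}(Y) = c\mu,\; h(y) = \log y:\quad \operatorname{Var}[\log Y] \approx \left(\frac{1}{\mu}\right)^2 c^2\mu^2 = c^2$$

$$\operatorname{Var}(Y) = c\mu,\; h(y) = \sqrt{y}:\quad \operatorname{Var}[\sqrt{Y}] \approx \left(\frac{1}{2\sqrt{\mu}}\right)^2 c\mu = \frac{c}{4}$$

In both cases the approximate variance no longer depends on μ, which is what "variance stabilizing" means. The approximation is only first order, so treat it as a guide; the residual plots still have the final say.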
• In either case, apply the transformation to the response, refit
the regression, and check the residuals again; compare candidate
transformations and keep the one whose residual plots look best.
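Here is a minimal sketch of that workflow in R. It uses simulated data (the names x, y, fit_raw and fit_log are made up for illustration) in which the standard deviation of y is proportional to its mean, so the raw fit should show the funnel and the fit to log(y) should not.

```r
# Simulated (hypothetical) data: multiplicative error, so SD(y) is proportional to the mean
set.seed(1)
n  <- 200
x  <- runif(n, 0, 4)
mu <- exp(1 + 0.5 * x)                 # median of y; the mean is a constant multiple of it
y  <- mu * exp(rnorm(n, sd = 0.3))     # log(y) = 1 + 0.5 x + normal error

fit_raw <- lm(y ~ x)        # untransformed response
fit_log <- lm(log(y) ~ x)   # transform the response, then refit

# Residuals against fitted values, side by side: look for a funnel
op <- par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "y ~ x",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log), main = "log(y) ~ x",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)

# Rough numeric companion: slope of |residuals| on fitted values
# (a clearly positive slope suggests the spread grows with the mean)
spread_raw <- coef(lm(abs(resid(fit_raw)) ~ fitted(fit_raw)))[[2]]
spread_log <- coef(lm(abs(resid(fit_log)) ~ fitted(fit_log)))[[2]]
c(raw = spread_raw, log = spread_log)
```

The |residual| slope is only a rough companion to the plots, not a formal test.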
• Note that if you transform the response, you will probably need
to express predictions back on the original scale: if you fit
log(y) and get the fitted value $\widehat{\log y}$, the prediction
of y is $\hat{y} = \exp(\widehat{\log y})$. The regression
coefficients, though, have to be interpreted on the transformed
scale.
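A minimal sketch of back-transforming predictions from a log(y) fit, again on simulated data (all names are illustrative). Prediction intervals can be back-transformed endpoint by endpoint because exp() is increasing.

```r
# Simulated (hypothetical) positive response, so log(y) is defined
set.seed(2)
x <- runif(50, 0, 4)
y <- exp(1 + 0.5 * x + rnorm(50, sd = 0.3))
fit_log <- lm(log(y) ~ x)

new <- data.frame(x = c(1, 2, 3))
pred_log  <- predict(fit_log, newdata = new)   # predictions of log(y)
pred_orig <- exp(pred_log)                     # back on the original scale of y

# Prediction intervals, back-transformed endpoint by endpoint
pi_orig <- exp(predict(fit_log, newdata = new, interval = "prediction"))
pi_orig
```

One caveat: if the errors on the log scale are roughly normal, exp() of the fitted log(y) estimates the median of y at those x values rather than its mean.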
• For the log transform, though, we have a nice interpretation:
$$\log(\hat{y}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_p x_p$$
$$\hat{y} = e^{\hat{\beta}_0}\, e^{\hat{\beta}_1 x_1} \cdots e^{\hat{\beta}_p x_p}$$
• This implies that increasing $x_1$ by 1, with the other
explanatory variables held fixed, multiplies the predicted
response $\hat{y}$ by a factor of $e^{\hat{\beta}_1}$. The
coefficients can therefore be interpreted as multiplicative
effects instead of additive ones.
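To check the multiplicative interpretation numerically, the sketch below (simulated data, illustrative names) fits log(y) on two explanatory variables and compares back-transformed predictions at x1 = 2 and x1 = 3 with x2 held fixed; their ratio should equal exp of the x1 coefficient, up to rounding.

```r
# Simulated (hypothetical) data with multiplicative effects of x1 and x2
set.seed(3)
x1 <- runif(100, 0, 5)
x2 <- runif(100, 0, 5)
y  <- exp(0.5 + 0.2 * x1 - 0.1 * x2 + rnorm(100, sd = 0.2))
fit <- lm(log(y) ~ x1 + x2)

b1 <- coef(fit)[["x1"]]
exp(b1)              # estimated factor by which y changes per unit increase in x1
100 * (exp(b1) - 1)  # the same effect expressed as a percent change

# Predictions at x1 = 2 and x1 = 3 with x2 fixed at 1, back-transformed
p <- exp(predict(fit, newdata = data.frame(x1 = c(2, 3), x2 = c(1, 1))))
ratio <- p[[2]] / p[[1]]   # equals exp(b1) up to rounding
```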
• Let’s consider the Box-Cox method of determining a
transformation. It should be used with positive response
variables, and it finds the transformation within the following
family that gives the best fit. It uses the general formula
$$g(y) = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log(y), & \lambda = 0 \end{cases}$$
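The formula translates directly into a small R function. box_cox() is a hypothetical helper written for these notes (MASS's boxcox() finds λ but does not return transformed values). The checks show that λ = 1 only shifts y, λ = 0 gives log(y), small λ approaches log(y), and λ = -1 gives 1 - 1/y, which is why fitting with λ = -1 amounts to modelling 1/y.

```r
# Hypothetical helper: Box-Cox transform of a positive response for one lambda
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0), length(lambda) == 1)
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(0.5, 1, 2, 10)
box_cox(y, 1)     # y - 1: only a shift, so slopes and residuals are unchanged
box_cox(y, 0.5)   # a shifted and rescaled square root, 2 * (sqrt(y) - 1)
box_cox(y, 0)     # log(y)
box_cox(y, 1e-6)  # very close to log(y): the family is continuous at lambda = 0
box_cox(y, -1)    # 1 - 1/y, a linear function of 1/y
```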
• Using maximum likelihood, we can find the “best” value of
$\lambda$, and in practice a confidence interval for $\lambda$
rather than a single value; see the R code:

# Read in the gasconsumption data, then bring in the MASS library
# and apply the boxcox function to the simple linear model
attach(gasconsumption)
g <- lm(MPG ~ WT); summary(g)
library(MASS)
boxcox(g, plotit = TRUE)  # plot the log-likelihood against lambda
# and find the maximum. Notice that values between about -1.5 and .25
# are in the 95% confidence interval for lambda. Your authors chose -1
# and worked with GPM instead of MPG, since GPM = 1/MPG.
# If you want to find the exact maximizing lambda, try this:
l <- boxcox(g)
l$x[which.max(l$y)]  # note this is harder to interpret than its
# rounded value, -1
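To see roughly what boxcox() computes, here is a sketch that builds the profile log-likelihood for λ by hand on simulated data (the data and the names profile_ll, grid and ci are illustrative, and the model is assumed to include an intercept). Dividing y by its geometric mean makes the Jacobian term drop out, so the curve is -n/2 times the log of the residual sum of squares. The approximate 95% interval keeps every λ whose log-likelihood is within qchisq(0.95, 1)/2 ≈ 1.92 of the maximum, the cutoff boxcox() marks on its plot.

```r
library(MASS)

# Simulated (hypothetical) data in which 1/y is linear in x with constant variance
set.seed(4)
n <- 60
x <- runif(n, 1, 5)
y <- 1 / (0.02 + 0.01 * x + rnorm(n, sd = 0.002))

fit <- lm(y ~ x, y = TRUE)   # y = TRUE keeps the response in the fit for boxcox()
X <- model.matrix(fit)

# Profile log-likelihood of lambda, up to an additive constant (model has an intercept)
profile_ll <- function(lambda, y, X) {
  ys <- y / exp(mean(log(y)))   # divide by the geometric mean: Jacobian term drops out
  yt <- if (abs(lambda) < 1e-12) log(ys) else (ys^lambda - 1) / lambda
  -length(y) / 2 * log(sum(qr.resid(qr(X), yt)^2))
}

grid <- seq(-2, 2, by = 0.1)                 # same as boxcox()'s default grid
ll <- sapply(grid, profile_ll, y = y, X = X)
lambda_hat <- grid[which.max(ll)]

# Approximate 95% interval (assumes the retained lambdas form one interval)
ci <- range(grid[ll >= max(ll) - qchisq(0.95, 1) / 2])
c(lambda_hat = lambda_hat, lower = ci[1], upper = ci[2])

# MASS::boxcox on the same grid, without the plot, for comparison
bc <- boxcox(fit, lambda = grid, plotit = FALSE)
all.equal(ll - max(ll), bc$y - max(bc$y))
```

The two curves are compared after subtracting their maxima, since only differences in log-likelihood matter for the maximizer and the interval.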
• Now, for practice, load the faraway library and get the dataset
called prostate. Read the help file for the dataset, work through
the various diagnostics we’ve considered in this chapter, and find
the best model for predicting log(psa):
– Check the normality assumption on the errors. Are any
transformations required?
– Find large leverage points and look for outliers.
– See whether there are influential points.
– Is the constant variance assumption met?
• HW: Work on #6.1, 6.14, 6.15, 6.18, 6.20, 6.21, 6.23
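For the prostate exercise, a small helper that gathers the diagnostics listed above may save some typing. diagnose() is a hypothetical function written for these notes, and its cutoffs (leverage above 2p/n, a Bonferroni-corrected studentized-residual test, Cook's distance above 4/n) are common rules of thumb rather than fixed rules; the chapter's plots should still be looked at. It runs here on simulated data so the sketch is self-contained; the commented line shows how it might be applied to prostate with faraway installed.

```r
# Hypothetical helper collecting the chapter's diagnostics for an lm fit.
# The cutoffs are common rules of thumb, not the only reasonable choices.
diagnose <- function(fit, alpha = 0.05) {
  n  <- nobs(fit)
  p  <- length(coef(fit))
  h  <- hatvalues(fit)
  rs <- rstudent(fit)
  cd <- cooks.distance(fit)
  list(
    # Shapiro-Wilk test on the residuals (small p-value: doubt normality)
    normality_p   = shapiro.test(resid(fit))$p.value,
    # leverage above twice the average leverage p/n
    high_leverage = which(h > 2 * p / n),
    # Bonferroni-corrected test on the studentized residuals
    outliers      = which(abs(rs) > qt(1 - alpha / (2 * n), n - p - 1)),
    # Cook's distance above 4/n
    influential   = which(cd > 4 / n),
    # slope of |residuals| on fitted values: a rough check of constant variance
    spread_slope  = coef(lm(abs(resid(fit)) ~ fitted(fit)))[[2]]
  )
}

# With faraway installed, the exercise might start from:
# library(faraway); data(prostate); diagnose(lm(lpsa ~ ., data = prostate))
set.seed(6)
x1 <- rnorm(50)
x2 <- rnorm(50)
yy <- 1 + 2 * x1 - x2 + rnorm(50)
res <- diagnose(lm(yy ~ x1 + x2))
str(res)
```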