Ch3-3 Residuals & R-Squared
Residuals
Recall that the least-squares regression line makes the vertical distances from the points to the line as small as possible.
Because those vertical distances represent "leftover" variation in the response after fitting the regression line, they are called residuals.
In other words, a residual is the vertical distance from a point to the LSRL.
Calculating a Residual
One subject's NEA rose by 135 calories and he gained 2.7 kg of fat. The predicted gain for 135 calories from the regression equation is:

ŷ = 3.505 − 0.00344(135) = 3.04 kg

The residual for this subject is therefore:

residual = observed − predicted = y − ŷ = 2.7 − 3.04 = −0.34 kg
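As a quick check, here is a minimal Python sketch of that calculation. The function name predict_fat_gain is ours, not from the lecture; the coefficients come from the regression equation above.

```python
def predict_fat_gain(nea_change):
    """Predicted fat gain (kg) from the LSRL: y-hat = 3.505 - 0.00344x."""
    return 3.505 - 0.00344 * nea_change

observed = 2.7                      # this subject's actual fat gain (kg)
predicted = predict_fat_gain(135)   # 3.505 - 0.00344(135), about 3.04 kg
residual = observed - predicted     # observed - predicted, about -0.34 kg

print(f"predicted = {predicted:.2f} kg, residual = {residual:.2f} kg")
```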
Fat Gain & NEA (yet again!)
Here are the residuals for all 16 data values from the NEA experiment: [table of residuals shown on the slide]

Although residuals can be calculated from any model that is fitted to the data, the residuals from the least-squares line have a special property: the sum of the least-squares residuals is always zero. (Try adding the numbers above: they add up to zero!)
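The transcript does not reproduce the 16 residuals themselves, but the zero-sum property is easy to demonstrate on any least-squares fit. A minimal sketch with simulated stand-in data (not the actual NEA measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-100, 700, size=16)           # stand-in NEA changes (calories)
y = 3.5 - 0.0034 * x + rng.normal(0, 1, 16)   # stand-in fat gains (kg)

b, a = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
residuals = y - (a + b * x)

print(residuals.sum())           # essentially 0 (floating-point rounding)
```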
On the residual plot, the horizontal line y = 0 corresponds to the regression line and also marks the mean of the residuals.
The residual plot magnifies the deviations from the line to make patterns easier to see.
Residual Plots
What to look for when examining a residual plot (a plotting sketch follows this list):
1. The residual plot should show no pattern.
   - A curved pattern shows that the relationship may not be linear.
   - Increasing spread about the line as x increases indicates the prediction will be less accurate for larger x values. Similarly, decreasing spread indicates the prediction will be less accurate for smaller x values.
2. The residuals should be relatively small in size.
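Here is the plotting sketch referenced above: a basic residual plot with a reference line at residual = 0. The data are simulated for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 0.8, x.size)

b, a = np.polyfit(x, y, deg=1)   # fit the LSRL
residuals = y - (a + b * x)      # vertical distances from the line

plt.scatter(x, residuals)
plt.axhline(0, color="gray")     # y = 0: the regression line's position
plt.xlabel("x")
plt.ylabel("residual")
plt.title("Residual plot: look for no pattern")
plt.show()
```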
The role of r2 in regression
A residual plot is a graphical tool for evaluating
how well a linear model fits the data.
Look at the residual plot first to see if a linear
model is a good fit.
If the linear model is a good fit, then there is
also a numerical quantity that tells us how well
the LSRL does at predicting values of the
response variable y. It is r2, the coefficient of
determination.
r2 is actually the correlation squared, but there's more to the story...

The idea of r2 is this: how much better is the least-squares line at predicting responses y than if we just used the mean ȳ?
Picture two prediction lines drawn through the scatterplot: the horizontal line at the mean ȳ of our data, and the LSRL.
Is the LSRL better at predicting the data values than the mean? r2 tells us how much better.
Here's the formula:

r2 = (SST − SSE) / SST

where SST = Σ(y − ȳ)² measures the total variation of the y values about their mean, and SSE = Σ(y − ŷ)² is the sum of the squared residuals.

Note: Remember we defined the variance back when we talked about standard deviation. r2 compares the variance from the mean (the SST part of the equation) with the residuals (the SSE part of the equation).
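To make the formula concrete, here is a sketch that computes r2 two ways on simulated data: once from SST and SSE, and once by squaring the correlation. For a least-squares fit the two agree.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-100, 700, size=16)
y = 3.5 - 0.0034 * x + rng.normal(0, 1, 16)

b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)    # total variation about the mean
sse = np.sum((y - y_hat) ** 2)       # leftover variation: squared residuals

print((sst - sse) / sst)             # r^2 from the formula
print(np.corrcoef(x, y)[0, 1] ** 2)  # correlation squared -- same value
```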
For example, if r2=0.606 (as it does in the NEA
example), then about 61% of the variation in fat
gain among the individual subjects is due to the
straight-line relationship between fat gain and
NEA. The other 39% is individual variation
among subjects that is not explained by the
linear relationship.
When you report a regression, give r2 as a
measure of how successful the regression was
in explaining the response. When you see a
correlation, square it to get a better feel for the
strength of the linear relationship.
Review
Facts About Least-Squares Regression

The distinction between explanatory and response variables is essential in regression. In the regression setting you must know clearly which variable is explanatory!
There is a close connection between correlation and the slope of the LSRL. The slope is

b = r(s_y / s_x)

This equation says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
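A quick numerical check of this identity on simulated data (np.polyfit stands in here for whatever software computes the LSRL):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(50, 10, size=30)
y = 1.2 * x + rng.normal(0, 5, size=30)

b, a = np.polyfit(x, y, deg=1)           # least-squares slope and intercept
r = np.corrcoef(x, y)[0, 1]
s_x, s_y = np.std(x, ddof=1), np.std(y, ddof=1)

print(b, r * s_y / s_x)                  # the two slopes agree
```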
The least-squares regression line of y on x always passes through the point (x̄, ȳ), the mean of the x values and the mean of the y values.
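This fact is also easy to verify numerically: plug x̄ into the fitted equation and you get ȳ back (simulated data again):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 1, size=25)
y = 2.0 * x + rng.normal(0, 1, size=25)

b, a = np.polyfit(x, y, deg=1)
print(a + b * x.mean(), y.mean())   # identical, up to rounding
```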
The correlation r describes the strength of a
straight-line relationship. The square of the
correlation, r2, is the fraction of the variation in the
values of y that is explained by the least-squares
regression of y on x.