Transcript Powerpoint
PSY 307 – Statistics for the
Behavioral Sciences
Chapter 7 – Regression
Regression Line
A way of making a somewhat
precise prediction based upon the
relationship between two variables.
Predictor variable & criterion variable
The regression line is placed so that
it minimizes the predictive error.
When based upon the squared
predictive error, the line is called a
least squares regression line.
Demo
This demo from the textbook’s student
website shows how different lines result in
different MSEs (mean squared errors):
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
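The idea behind the demo can be sketched in a few lines of Python. The data and the candidate lines below are made up for illustration (they are not from the slides or the demo), and MSE is assumed here to be the sum of squared errors divided by n.

```python
# Made-up illustrative data (not from the slides or the demo)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def mean_squared_error(b, a, xs, ys):
    """Mean of the squared differences between observed Y and predicted Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Least squares slope and intercept for these data
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
     / sum((x - mean_x) ** 2 for x in X))
a = mean_y - b * mean_x

# Hypothetical competing lines, given as (slope, intercept)
candidates = {
    "least squares": (b, a),
    "flatter": (0.3, 3.1),
    "steeper": (1.0, 1.0),
    "mean only": (0.0, mean_y),
}
mse = {name: mean_squared_error(bb, aa, X, Y) for name, (bb, aa) in candidates.items()}
for name, value in mse.items():
    print(f"{name:>13}: MSE = {value:.3f}")
```

Of the lines tried, the least squares line has the smallest MSE, which is what "minimizes the predictive error" means.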
Least Squares Equation
Y’ = bX + a
To obtain Y’:
Solve for b and a using the data from
the correlation analysis
Substitute b and a into the regression
equation and solve for Y’.
To find points along the line,
substitute X values into the
regression equation and calculate Y’.
Formula for Regression Line
Solving for b: $b = r\sqrt{SS_y / SS_x}$
Solving for a: $a = \bar{Y} - b\bar{X}$
Then insert both into formula:
Y’ = bX + a
Plug in values of X and solve for Y’.
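A minimal sketch of these steps in Python, assuming the standard least squares results b = r·sqrt(SSy/SSx) and a = mean(Y) − b·mean(X). The data are made up for illustration and the name `predict` is hypothetical.

```python
import math

# Made-up illustrative data (not from the slides)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Quantities from the correlation analysis
ss_x = sum((x - mean_x) ** 2 for x in X)                       # sum of squares for X
ss_y = sum((y - mean_y) ** 2 for y in Y)                       # sum of squares for Y
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) # sum of products
r = sp_xy / math.sqrt(ss_x * ss_y)

# Solve for b and a, then insert both into Y' = bX + a
b = r * math.sqrt(ss_y / ss_x)
a = mean_y - b * mean_x

def predict(x):
    """Predicted value Y' for a given X."""
    return b * x + a

for x in X:
    print(f"X = {x}: Y' = {predict(x):.2f}")
```

The line always passes through the point (mean of X, mean of Y), so `predict(mean_x)` returns the mean of Y.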
Error Bars show the Standard Error of
the Estimate (Regression Line)
Predictive Error for a Value of X
[Figure labels: X = 50; Y’ = 137; error of Y’]
Standard Error of the Estimate
The average amount of predictive
error.
Average amount by which actual Y
values deviate from predicted Y’ values.
No predictive error when r = 1 or r = -1
Greatest predictive error when r = 0
Again, formulas vary.
Calculating Predictive Error
Definition Formula:
$$s_{y|x} = \sqrt{\frac{SS_{y|x}}{n-2}} = \sqrt{\frac{\sum (Y - Y')^2}{n-2}}$$
Computation Formula:
$$s_{y|x} = \sqrt{\frac{SS_y (1 - r^2)}{n-2}}$$
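As a check that the definition and computation formulas agree, here is a short Python sketch. The data are made up for illustration, and the function name `standard_error_of_estimate` is hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys):
    """Return the standard error of the estimate computed two ways:
    (definition formula, computation formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sp_xy / math.sqrt(ss_x * ss_y)

    # Least squares line Y' = bX + a
    b = sp_xy / ss_x
    a = mean_y - b * mean_x

    # Definition formula: squared deviations of actual Y from predicted Y'
    ss_y_given_x = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    by_definition = math.sqrt(ss_y_given_x / (n - 2))

    # Computation formula: uses only SS_y, r, and n
    by_computation = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))
    return by_definition, by_computation

# Made-up illustrative data (not from the slides)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
by_definition, by_computation = standard_error_of_estimate(X, Y)
print(f"definition:  {by_definition:.4f}")
print(f"computation: {by_computation:.4f}")
```

The two values match apart from rounding, so the computation formula can be used when only SSy and r are available.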
Kinds of Errors for ALEKS
Difference between the predictions
of the regression line and the mean
(used as a predictor).
Difference between the predictions
of the regression line and the
observed values (the predictive error).
The difference between these two
kinds of errors.
Comparing the Regression Line to the
Mean
[Figure label: Mean of Y]
Z Score Approach
Prediction using Z scores:
Zy’ = b(Zx) where b = r
b is called the standardized regression
coefficient because it is being used for
prediction with z-scores.
Prediction using raw scores:
Change the person’s raw score to a
z-score using the z-score formula.
Multiply by b, then change the resulting
z-score back to a raw score.
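The raw-score route through z-scores can be sketched as follows. The data are made up for illustration, the function names are hypothetical, and standard deviations are assumed to use n − 1 in the denominator (the prediction is the same with n, as long as both variables use the same choice).

```python
import math

# Made-up illustrative data (not from the slides)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
r = sp_xy / math.sqrt(ss_x * ss_y)

# Assumption: sample standard deviations with n - 1 in the denominator
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))

def predict_via_z(x):
    """Raw score -> z-score, multiply by b = r, z-score -> raw score."""
    z_x = (x - mean_x) / s_x
    z_y_predicted = r * z_x
    return mean_y + z_y_predicted * s_y

def predict_via_raw(x):
    """Ordinary raw-score regression equation Y' = bX + a, for comparison."""
    b = r * math.sqrt(ss_y / ss_x)
    a = mean_y - b * mean_x
    return b * x + a

print(f"z-score approach: {predict_via_z(5):.2f}")
print(f"raw-score line:   {predict_via_raw(5):.2f}")
```

Both routes give the same prediction; the z-score version just makes the role of r easier to see.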
Squared Correlation Coefficient
r²: the square of the correlation
coefficient
Also called coefficient of determination
Measures the proportion of variance of
one variable predictable from its
relationship with the other variable.
It is the variance of the errors from
repeatedly predicting the mean, minus the
error variance using least squares,
expressed as a proportion of the variance
of the errors from predicting the mean.
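A small numeric check of this description, using made-up data (not from the slides). Sums of squares are used in place of variances; dividing both by the same n would not change the proportion.

```python
import math

# Made-up illustrative data (not from the slides)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
ss_x = sum((x - mean_x) ** 2 for x in X)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Squared errors from repeatedly predicting the mean of Y
ss_y = sum((y - mean_y) ** 2 for y in Y)

# Squared errors from predicting with the least squares line
b = sp_xy / ss_x
a = mean_y - b * mean_x
ss_y_given_x = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

# Reduction in error, expressed as a proportion of the error from the mean
r_squared_from_errors = (ss_y - ss_y_given_x) / ss_y

# Direct route: square the correlation coefficient
r = sp_xy / math.sqrt(ss_x * ss_y)

print(f"proportion of error removed: {r_squared_from_errors:.3f}")
print(f"r squared:                   {r ** 2:.3f}")
```

The two numbers agree, which is why r² can be read as the proportion of predictable variance.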
Interpretation of r²
r², not r, is the true measure of
strength of association and the
proportion of a perfect relationship.
Large values of r² are unusual in
behavioral research.
Large values of r² do not indicate
causation.
“Explained variance” refers to
predictability not causality.
Regression Toward the Mean
The mean is a statistical default –
use the mean to predict when r is 0
or unknown.
Smaller values of r move the prediction
toward the mean.
The smaller r is, the greater the
predictive error, so the prediction is
hedged by moving toward the mean.
Chance results in a regression to
the mean with repeated measures.
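The pull toward the mean can be seen directly in the z-score prediction rule Zy’ = r(Zx). The sketch below uses a hypothetical person two standard deviations above the mean of X; the function name `predicted_z` is made up for illustration.

```python
def predicted_z(r, z_x):
    """Predicted z-score on Y for a given z-score on X: Zy' = r * Zx."""
    return r * z_x

z_x = 2.0  # hypothetical person two standard deviations above the mean of X
for r in (1.0, 0.5, 0.0):
    print(f"r = {r:.1f}: predicted Zy' = {predicted_z(r, z_x):.1f}")
```

With r = 0 the predicted z-score is 0, which is the mean of Y, so the mean is the prediction when X carries no information.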
Regression Fallacy
The statistical regression of extreme
values toward the mean occurs due
to chance.
Example: Israeli pilots praised for good
landings tend to do worse on the next landing.
It is a mistake (fallacy) to interpret
this regression as a real effect.
Praise did not cause the change in
landings.
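A small simulation can illustrate the point. Everything here is an assumption made for illustration: each landing score is modeled as a stable skill plus independent chance, praise has no effect at all, and the numbers are simulated rather than real pilot data.

```python
import random

random.seed(307)  # arbitrary seed so the run is repeatable

N_PILOTS = 10_000

# Assumed model: landing score = stable skill + chance, with no effect of praise
skill = [random.gauss(0, 1) for _ in range(N_PILOTS)]
landing_1 = [s + random.gauss(0, 1) for s in skill]
landing_2 = [s + random.gauss(0, 1) for s in skill]

# "Praised" pilots: roughly the top 10% on the first landing
cutoff = sorted(landing_1)[int(0.9 * N_PILOTS)]
praised = [i for i in range(N_PILOTS) if landing_1[i] >= cutoff]

mean_first = sum(landing_1[i] for i in praised) / len(praised)
mean_second = sum(landing_2[i] for i in praised) / len(praised)

print(f"praised pilots, first landing:  {mean_first:.2f}")
print(f"praised pilots, second landing: {mean_second:.2f}")
```

The praised group scores lower on the second landing even though praise does nothing in this model: part of an extreme first score is chance, and chance does not repeat.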
Testing for Regression Fallacy
Divide the group showing regression
into two groups: (1) manipulation,
(2) control without manipulation.
Underachievers could show
improvement due to regression
upward toward the mean.
Always include a control group for
regression to the mean.
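Why the control group matters can be sketched with another simulation. The assumptions are again illustrative only: test score = stable ability plus chance, underachievers are the bottom 10% on a first test, and the manipulation is given a true effect of zero (`TRUE_EFFECT` is a hypothetical parameter).

```python
import random

random.seed(7)  # arbitrary seed so the run is repeatable

N = 10_000
TRUE_EFFECT = 0.0  # assumption: the manipulation actually does nothing

# Assumed model: test score = stable ability + chance
ability = [random.gauss(0, 1) for _ in range(N)]
test_1 = [a + random.gauss(0, 1) for a in ability]

# Underachievers: the bottom 10% on the first test
cutoff = sorted(test_1)[N // 10]
underachievers = [i for i in range(N) if test_1[i] < cutoff]

# Randomly divide them into (1) manipulation and (2) control without manipulation
random.shuffle(underachievers)
half = len(underachievers) // 2
manipulation = underachievers[:half]
control = underachievers[half:]

def mean_gain(group, effect):
    """Average change from the first test to a second test for one group."""
    gains = [(ability[i] + random.gauss(0, 1) + effect) - test_1[i] for i in group]
    return sum(gains) / len(gains)

gain_manipulation = mean_gain(manipulation, TRUE_EFFECT)
gain_control = mean_gain(control, 0.0)

print(f"manipulation group gain: {gain_manipulation:.2f}")
print(f"control group gain:      {gain_control:.2f}")
```

Both groups improve by about the same amount. Without the control group, the manipulation group's gain would look like a real effect; with it, the gain is seen to be regression upward toward the mean.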