Transcript Regression

Kin 304
Regression
Linear Regression
Least Sum of Squares
Assumptions about the relationship between Y and X
Standard Error of Estimate
Multiple Regression
Standardized Regression
Prediction

Can we predict one variable from another?

Linear Regression Analysis

Y = mX + c
– m = slope; c = intercept
Linear Regression

Correlation Coefficient (r)
– how well the line fits

Standard Error of Estimate (S.E.E.)
– how well the line predicts
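
As a rough illustration of the correlation coefficient, the sketch below computes Pearson's r for paired X and Y scores. The function name `pearson_r` and the data are invented for this example and are not part of the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(xs, ys)
print(f"r = {r:+.2f}")
```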
Least Sum of Squares Curve Fitting

[Figure: scatter plot of Y against X with the fitted line, r = +0.71]

– the line is fitted so that Σd² is least
– d = Predicted - Observed (the residual)
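
The least-squares idea can be sketched in a few lines of Python. This is a minimal example, assuming the usual closed-form solution for a straight line Y = mX + c; the function names and the data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Return (m, c) for the line Y = mX + c with the least sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept: the line passes through the two means
    return m, c

def sum_sq_residuals(xs, ys, m, c):
    """Sum of d squared, where d = Predicted - Observed."""
    return sum(((m * x + c) - y) ** 2 for x, y in zip(xs, ys))

# Made-up scores, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, c = fit_line(xs, ys)
print(f"Y = {m:.2f}X + {c:.2f}, sum of d squared = {sum_sq_residuals(xs, ys, m, c):.2f}")
```

Any other slope or intercept gives a larger sum of squared residuals for the same data, which is what "least sum of squares" means.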
Assumptions about the relationship between Y and X

For each value of X there is a normal distribution of Y from which the sample value of Y is drawn
The population of values of Y corresponding to a selected X has a mean that lies on the straight line
In each population the standard deviation of Y about its mean has the same value
Standard Error of Estimate

Standard Error of Estimate
– a measure of how well the equation predicts Y
– has the same units as Y
– the true score lies within ±1 S.E.E. of the predicted score 68.26% of the time
– the standard deviation of the normal distribution of residuals
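
A minimal sketch of the calculation follows. It assumes the common convention of dividing the sum of squared residuals by n - 2 for a one-predictor equation (the slides do not state the divisor), and it uses made-up scores with a hypothetical line Y = 0.6X + 2.2.

```python
import math

def standard_error_of_estimate(observed, predicted):
    """S.E.E. for a one-predictor equation: sqrt(sum of d squared / (n - 2)).

    The n - 2 divisor is an assumption (two estimated values, m and c).
    """
    n = len(observed)
    sum_sq = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum_sq / (n - 2))

# Made-up scores and a hypothetical prediction line, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [0.6 * x + 2.2 for x in xs]

see = standard_error_of_estimate(observed, predicted)
print(f"S.E.E. = {see:.2f} (same units as Y)")
```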
Right Hand L. = 0.99 × Left Hand L. + 0.254
r = 0.94
S.E.E. = 0.38 cm

[Figure: scatter plot of Right side Hand Length (cm) against Left side Hand Length (cm), both axes 16 to 22 cm]
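
To show how the equation and its S.E.E. might be used together, the sketch below predicts a right hand length from a left hand length and reports the ±1 S.E.E. band. The slope, intercept and S.E.E. come from the slide; the 19.0 cm input and the function name are hypothetical.

```python
SLOPE = 0.99       # from the slide
INTERCEPT = 0.254  # from the slide
SEE_CM = 0.38      # from the slide

def predict_right_hand_length(left_cm):
    """Predicted right hand length (cm) from left hand length (cm)."""
    return SLOPE * left_cm + INTERCEPT

left = 19.0  # hypothetical measurement
predicted = predict_right_hand_length(left)
# About 68% of true scores are expected to fall within this band
low, high = predicted - SEE_CM, predicted + SEE_CM
print(f"Predicted: {predicted:.2f} cm (±1 S.E.E.: {low:.2f} to {high:.2f} cm)")
```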
Kin 304 Spring 2009
How good is my equation?


Regression equations are sample specific
Cross-validation Studies
– Test your equation on a different sample
Split sample studies
– Take a 50% random sample and develop your equation, then test it on the other 50% of the sample
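
A split sample study could be sketched roughly as below. The data are simulated (not real measurements), the helper names are invented, and the root-mean-square error on the held-out half is only one reasonable way to judge the equation.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = mX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def split_sample_check(xs, ys, seed=0):
    """Develop the equation on a random 50% of the sample, test it on the other 50%."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    develop, held_out = indices[:half], indices[half:]
    m, c = fit_line([xs[i] for i in develop], [ys[i] for i in develop])
    # d = Predicted - Observed, on cases the equation has never seen
    errors = [(m * xs[i] + c) - ys[i] for i in held_out]
    rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))
    return m, c, rms_error

# Simulated sample, for illustration only
rng = random.Random(1)
xs = [16.0 + 6.0 * i / 39 for i in range(40)]
ys = [0.99 * x + 0.254 + rng.gauss(0.0, 0.38) for x in xs]
m, c, rms_error = split_sample_check(xs, ys)
print(f"Y = {m:.2f}X + {c:.2f}; error on the held-out half = {rms_error:.2f}")
```

If the error on the held-out half is much larger than the S.E.E. found on the development half, the equation is probably too specific to the sample it was built on.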
Multiple Regression




More than one independent variable
Y = m1X1 + m2X2 + m3X3 …… + c
Same meaning for r and S.E.E., just more measures used to predict Y
Stepwise regression
– variables are entered into the equation based upon their relative importance
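
A minimal sketch of fitting an equation with more than one independent variable is given below. It assumes ordinary least squares solved through the normal equations with a small hand-written solver; the function names are invented, and the made-up data lie exactly on Y = 2X1 + 3X2 + 1 so the recovered values can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        known = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_multiple(rows, ys):
    """rows holds [X1, X2, ...] per case; returns ([m1, m2, ...], c)."""
    design = [list(row) + [1.0] for row in rows]  # last column carries the intercept c
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    coeffs = solve_linear_system(xtx, xty)
    return coeffs[:-1], coeffs[-1]

# Made-up cases, for illustration only
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
ys = [2.0 * x1 + 3.0 * x2 + 1.0 for x1, x2 in rows]
ms, c = fit_multiple(rows, ys)
print("m1, m2 =", [round(m, 3) for m in ms], "c =", round(c, 3))
```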
Building a multiple regression equation
[Figure: diagram of Y with X1, X2 and X3]

X1 has the highest correlation with Y, therefore it would be the first variable included in the equation.
X3 has a higher correlation with Y than X2.
However, X2 would be a better choice than X3 to include in an equation with X1 to predict Y.
X2 has a low correlation with X1 and explains some of the variance that X1 does not.
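
The reasoning can be checked numerically with the standard formula for the variance in Y explained by two predictors together, R² = (r_ya² + r_yb² - 2·r_ya·r_yb·r_ab) / (1 - r_ab²). The correlations below are hypothetical values chosen to match the situation described; they are not taken from the slide.

```python
def r_squared_two_predictors(r_ya, r_yb, r_ab):
    """Proportion of variance in Y explained by predictors A and B together."""
    return (r_ya**2 + r_yb**2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab**2)

# Hypothetical correlations, for illustration only
r_y1, r_y2, r_y3 = 0.80, 0.50, 0.70  # each X with Y: X1 highest, X3 above X2
r_12 = 0.10                          # X2 is nearly unrelated to X1
r_13 = 0.90                          # X3 largely overlaps with X1

with_x3 = r_squared_two_predictors(r_y1, r_y3, r_13)
with_x2 = r_squared_two_predictors(r_y1, r_y2, r_12)
print(f"X1 + X3 explains {with_x3:.2f} of the variance in Y")
print(f"X1 + X2 explains {with_x2:.2f} of the variance in Y")
```

With these values, adding X2 to X1 explains more of the variance in Y than adding X3, even though X3 has the higher correlation with Y on its own.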
Standardized Regression

The numerical value of each mn depends on the size (units and scale) of its independent variable
– Y = m1X1 + m2X2 + m3X3 …… + c

Variables are transformed into standard scores before the regression analysis, so the mean and standard deviation of every independent variable are 0 and 1 respectively

The numerical value of zmn now represents the relative importance of that independent variable to the prediction
– Y = zm1X1 + zm2X2 + zm3X3 …… + c
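
The effect of standardizing can be sketched with a single predictor. In this example the same lengths are recorded in centimetres and in millimetres: the raw slope changes with the units, while the slope fitted on standard scores does not. The data and names are made up, and standardizing Y as well as X is an assumption of this sketch.

```python
import math

def z_scores(values):
    """Transform values into standard scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def slope(xs, ys):
    """Least-squares slope of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up scores, for illustration only
length_cm = [16.0, 17.0, 18.0, 19.0, 20.0]
length_mm = [10.0 * v for v in length_cm]  # same lengths, different units
score = [50.0, 55.0, 53.0, 60.0, 62.0]

m_cm = slope(length_cm, score)  # raw slope depends on the units of X
m_mm = slope(length_mm, score)
zm_cm = slope(z_scores(length_cm), z_scores(score))  # standardized slope does not
zm_mm = slope(z_scores(length_mm), z_scores(score))
print(f"raw: {m_cm:.2f} per cm, {m_mm:.2f} per mm; standardized: {zm_cm:.2f}, {zm_mm:.2f}")
```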