Transcript Chapter 8

Chapter 8 – Regression 2: Basic review, estimating the standard error of the estimate, and shortcut problems and solutions.
You can use the regression equation when:
1. the relationship between X and Y is linear,
2. r falls outside the CI.95 around 0.000 and is therefore a statistically significant correlation, and
3. X is within the range of X scores observed in your sample.
The r table:

df      nonsignificant (CI.95)    .05      .01
1       -.996 to .996             .997     .9999
2       -.949 to .949             .950     .990
3       -.877 to .877             .878     .959
4       -.810 to .810             .811     .917
5       -.753 to .753             .754     .874
6       -.706 to .706             .707     .834
7       -.665 to .665             .666     .798
8       -.631 to .631             .632     .765
9       -.601 to .601             .602     .735
10      -.575 to .575             .576     .708
11      -.552 to .552             .553     .684
12      -.531 to .531             .532     .661
...
100     -.194 to .194             .195     .254
200     -.137 to .137             .138     .181
300     -.112 to .112             .113     .148
500     -.087 to .087             .088     .115
1000    -.061 to .061             .062     .081
2000    -.043 to .043             .044     .058
10000   -.019 to .019             .020     .026
Simple problems using the regression equation
tY' = r * tX
tY' = .150 * 0.40 = 0.06
tY' = .40 * -1.70 = -0.68
tY' = .40 * 1.70 = 0.68
Predictions from Raw Data
1. Calculate the t score for X: tX = (X - X-bar) / sX
2. Solve the regression equation: tY' = r(tX)
3. Transform the estimated t score for Y into a raw score: Y' = Y-bar + (tY')(sY)
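These three steps translate directly into code. Here is a minimal sketch (the function predict_y and its argument names are mine, not the lecture's); the worked example that follows uses exactly these steps:

```python
def predict_y(x, x_bar, s_x, r, y_bar, s_y):
    """Predict a raw Y score from a raw X score using the t-score form
    of the regression equation (assumes r is significant and x is within
    the range of X observed in the sample)."""
    t_x = (x - x_bar) / s_x          # Step 1: tX = (X - X-bar) / sX
    t_y_prime = r * t_x              # Step 2: tY' = r * tX
    return y_bar + t_y_prime * s_y   # Step 3: Y' = Y-bar + (tY')(sY)
```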
Problem: We look into the correlation between time spent studying and score on a midterm exam. The mean study time in a random sample is 560 minutes (range = 260-860). The estimated standard deviation of the study times, sX, was 216.02. The estimated mean on the midterm was 76.00 and the sample's estimated standard deviation, sY, was 7.98.
There were 10 pairs of tX,tY scores.
The estimated correlation coefficient is .851.
John studied for 400 minutes for the midterm. Predict his midterm score.
Can you use the regression equation?
[The r table from above is displayed again here.]
YES! You can use the regression equation.
r(8) = .851, p < .01
400 minutes is inside the range of X scores seen in the random sample (260-860 minutes).
Predicting from and to raw scores
1. Translate raw X to tX score.
   X = 400, X-bar = 560, sX = 216.02
   tX = (X - X-bar) / sX = (400 - 560)/216.02 = -0.74
2. Use the regression equation to find the value of tY'.
   r = .851
   tY' = r * tX = .851 * -0.74 = -0.63
3. Translate tY' to raw Y'.
   Y-bar = 76.00, sY = 7.98
   Y' = Y-bar + (tY' * sY) = 76.00 + (-0.63 * 7.98) = 70.97
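As a check, the predict_y sketch from the earlier aside reproduces this result without the intermediate rounding:

```python
y_prime = predict_y(x=400, x_bar=560, s_x=216.02, r=.851, y_bar=76.00, s_y=7.98)
print(round(y_prime, 2))  # 70.97
```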
Another reminder
Never assume that a correlation will stay linear outside of the range you originally observed. Therefore, never use the regression equation to make predictions from X values outside of the range you found in your sample.
Example: basing a prediction of the height of an 83-year-old adult on a study examining the correlation of age and height in a sample composed only of children aged 12 or less.
Correlation characteristics: which line best shows the relationship between age (X) and height (Y)? Linear vs. curvilinear.
[Figure: age-height scatterplot comparing a linear and a curvilinear fit.]
Reviewing the r table and reporting the results of calculating r from a random sample
How the r table is laid out: the important columns
Column 1 of the r table shows degrees of freedom for correlation and regression (dfREG = nP - 2).
Column 2 shows the CI.95 for varying degrees of freedom.
Column 3 shows the absolute value of the r that falls just outside the CI.95. Any r this far or further from 0.000 falsifies the hypothesis that rho = 0.000 and can be used in the regression equation to make predictions of Y scores for people who were not in the original sample but who were part of the population from which the sample is drawn.
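The critical values in Columns 3 and 4 are not magic; they can be reproduced from the t distribution via the standard identity r = t / sqrt(df + t^2) for testing rho = 0. This derivation is not in the lecture; here is a sketch using scipy (my choice of library, not the lecture's):

```python
import math
from scipy.stats import t

def critical_r(df, alpha):
    """Critical |r| for a two-tailed test of rho = 0 with df = nP - 2."""
    t_crit = t.ppf(1 - alpha / 2, df)            # two-tailed t critical value
    return t_crit / math.sqrt(df + t_crit ** 2)  # convert t to r

for df in (8, 10, 100):
    print(df, round(critical_r(df, .05), 3), round(critical_r(df, .01), 3))
# 8 0.632 0.765
# 10 0.576 0.708
# 100 0.195 0.254   (matches Columns 3 and 4 of the r table)
```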
[The r table from above is displayed again here.]
Notice how the interval in Column 2 gets narrower as df increase.
Why does this happen? Why is an r of .210 not significant with nP = 12 but significant with nP = 102?
[The r table from above is displayed again here.]
This happens because r is a consistent estimate of rho, as well as a least-squares, unbiased estimate.
As you increase df, r becomes a better and better estimate of rho. This is the same thing that happens when a sample mean gets closer to the population mean. Sample statistics get closer and closer to their population parameters as df increase.
As dfREG increase, r becomes a better estimate of rho.
If rho were .400, r should get closer and closer to .400 as df increase. Why does this happen?
Remember, r is really an index of how far from each other the ZX and ZY scores in the population are, on the average. An r of .400 means an estimated average squared difference between the Z scores of about 1.200 squared units (r = 1 - .5(1.200) = .400).
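That identity is not derived on the slide, but it is standard: Z (t) scores computed with the n - 1 formula satisfy sum(Z^2)/(nP - 1) = 1 and sum(ZX*ZY)/(nP - 1) = r, so the average squared difference expands as follows:

```latex
\frac{\sum (Z_X - Z_Y)^2}{n_P - 1}
= \underbrace{\frac{\sum Z_X^2}{n_P - 1}}_{=\,1}
- 2\,\underbrace{\frac{\sum Z_X Z_Y}{n_P - 1}}_{=\,r}
+ \underbrace{\frac{\sum Z_Y^2}{n_P - 1}}_{=\,1}
= 2(1 - r)
\quad\Longrightarrow\quad
r = 1 - \tfrac{1}{2}\cdot\frac{\sum (Z_X - Z_Y)^2}{n_P - 1}
```

Setting the average squared difference to 1.200 gives r = 1 - .5(1.200) = .400, as stated above.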
Each pair of scores tends to bring r closer to rho
Some of the pairs of tX,tY scores will be further from each other than average, some will be closer. Each randomly added pair of scores tends to make the estimated average squared difference between the Z scores closer to the average found in the population. That makes r closer and closer to rho.
Test scores: correlation between X & Y about .500
[Figure: frequency histogram of the squared differences between tX and tY scores (horizontal axis from 0.0 to 2.4), with the running mean of the squared differences converging toward the population average as pairs are added.]
According to the null, rho = 0.000
If the null is right, increasing df should result in r getting closer to the true value of rho, 0.000. If we had 100 random samples, all would tend to get closer to 0.000. Therefore, the interval around 0.000 in which we could expect 95 of the 100 samples to fall would get narrower and narrower. (That's what you see in Column 2 of the r table.) Of course, that brings the critical values in Columns 3 and 4 similarly closer to 0.000. So with larger samples, weaker correlations can be statistically significant.
Can we generalize to the population from the correlation in the sample?
A Type 1 error involves saying that there is a correlation in the population as a whole, when the correlation is actually 0.000 (and the null is true). We carefully guard against Type 1 error by using significance tests to try to falsify the null hypothesis.
Why is it important to avoid Type 1 error?
Using the regression equation when rho = 0.000 increases error beyond that which we would get if we predicted everyone will score precisely at the mean of Y. The scientist's rule is "First, don't increase error!" Such errors lead to unfair prejudgments rather than giving everyone an equal chance.
Example: Anchovy pizza and horror films, rho = 0.000 (scale 0-9)
H1: People who enjoy food with strong flavors also enjoy other strong sensations.
H0: There is no relationship between enjoying food with strong flavors and enjoying other strong sensations.

anchovies   horror films
7           7
7           9
3           8
3           6
0           9
8           6
4           5
1           2
1           1
1           6

Can we reject the null hypothesis?
Can we reject the null hypothesis?
[Scatterplot: anchovy pizza rating (vertical axis, "Pizza," 0-8) versus horror film rating (horizontal axis, "Horror films," 0-8).]
Can we reject the null hypothesis? We do the math and we find that:
r = .352
df = 8
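A quick check with numpy (my own verification, not part of the lecture) gives essentially the same value; the small difference from .352 comes from rounding the t scores in the hand calculation:

```python
import numpy as np

anchovies    = [7, 7, 3, 3, 0, 8, 4, 1, 1, 1]
horror_films = [7, 9, 8, 6, 9, 6, 5, 2, 1, 6]

r = np.corrcoef(anchovies, horror_films)[0, 1]  # Pearson correlation
print(round(r, 3))  # 0.348, reported as .352 in the slides
```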
[The r table from above is displayed again here.]
This finding falls within the CI.95 around 0.000
We call such findings "nonsignificant." Nonsignificant is abbreviated n.s. We would report this finding as follows: r(8) = 0.352, n.s.
Given that it fell inside the CI.95, we must assume that rho actually equals zero and that our sample r is .352 instead of 0.000 solely because of sampling fluctuation. We go back to predicting that everyone will score at the mean of Y.
In fact, the null hypothesis was correct; rho = 0.000
I made up that example using numbers randomly selected from a random number table. So there really was no relationship between the two sets of scores: rho really equaled 0.000. But samples don't give you an r of zero; they fluctuate around 0.000.
Significance testing is your protection against mistaking sampling fluctuation for a real correlation. Significance testing protects against Type 1 error.
We use significance testing to protect us from Type 1 error.
Our sample gave us an r of .352. Without the r table, we could have thought that r was far enough from zero to represent a true correlation in the population. In fact, 0.352 was the product only of sampling fluctuation.
Significance testing is your protection against mistaking sampling fluctuation for a real correlation. Significance testing protects against Type 1 error.
How to report a significant r
For example, let's say that you had a sample (nP = 30) and r = -.400. Looking under nP - 2 = 28 dfREG, we find the interval consistent with the null is between -.360 and +.360. So we are outside the CI.95 for rho = 0.000.
We would write that result as r(28) = -.400, p < .05. That tells you the dfREG, the value of r, and that you can expect an r that far from 0.000 five or fewer times in 100 when rho = 0.000.
[The r table from above is displayed again here.]
Then there is Column 4
Column 4 shows the values that lie outside a CI.99. (The CI.99 itself isn't shown like the CI.95 in Column 2 because it isn't important enough.) However, Column 4 gives you bragging rights. If your r is as far or further from 0.000 as the number in Column 4, you can say there is 1 or fewer chance in 100 of an r being this far from zero (p < .01).
For example, let's say that you had a sample (nP = 30) and r = -.525. The critical value at .01 is .463. You are further from 0.000 than that, so you can brag. You write that result as r(28) = -.525, p < .01.
[The r table from above is displayed again here.]
To summarize
If r falls inside the CI.95 around 0.000, it is nonsignificant (n.s.) and you can't use the regression equation (e.g., r(28) = .300, n.s.).
If r falls outside the CI.95, but not as far from 0.000 as the number in Column 4, you have a significant finding and can use the regression equation (e.g., r(28) = -.400, p < .05).
If r is as far or further from zero as the number in Column 4, you can use the regression equation and brag while doing it (e.g., r(28) = .525, p < .01).
Here is one in exam format
Jack was comparing the degree of severity of a back injury as ascertained by an MRI and the amount of pain reported by the patient. Back injuries were rated in severity on a scale from 0-9.
Jack randomly selected 26 patients for his sample. MRI scores ranged from 1-8. The mean severity was 4.70 with a standard deviation of 2.00 points.
Pain ratings were assessed on a scale that went from 1-20. Mean pain rating in the sample was 11.00 with a standard deviation of 3.00.
The sum of the squared differences between the tX and tY scores was 30.00.
1. Compute the correlation and report it in correct form.
2. Mrs. Green, who was part of the population but not part of your original sample, had a back injury rated as a 6. Predict her reported pain.
3. Mr. Jones, who was part of the population but not part of your original sample, had a back injury rated as a 9. Predict his reported pain.
Answer to Q. 1
r = 1 - .5(30/25) = .400
r(24) = .400, p < .05
Q. 2: First translate to tX
1. Translate raw X to tX score.
   X = 6, X-bar = 4.70, sX = 2.00
   tX = (X - X-bar)/sX = (6 - 4.70)/2.00 = 0.65
Q. 2: Use the regression equation
2. Find the value of tY'.
   r = .400
   tY' = r * tX = .400 * 0.65 = 0.26
Q. 2: Translate tY' to raw Y'
3. Y-bar = 11.00, sY = 3.00
   Y' = Y-bar + (tY' * sY) = 11.00 + (0.26 * 3.00) = 11.78
Mrs. Green was predicted to score a little more than a quarter of a standard deviation above the mean. With a mean pain rating of 11.00 and a standard deviation of 3.00 points, Mrs. Green's pain rating is predicted to be 11.78.
Q. 3: Mr. Jones' pain rating
It will turn out that Mr. Jones will have a lower predicted pain rating than Mrs. Green, though his severity score is much higher. Mr. Jones should have a pain rating of 11.00.
Why? Because his injury was rated 9, and we had a range in our random sample of 1-8. He was not like anyone we had seen before. You cannot use the regression equation. You must go back to predicting that he will score at your best estimate of the population mean of Y, Y-bar (11.00).
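This range check is easy to build into the earlier prediction sketch. A hedged variant (the names are mine; the fall-back-to-the-mean rule is the lecture's):

```python
def predict_y_guarded(x, x_bar, s_x, r, y_bar, s_y, x_min, x_max):
    """Regression prediction that falls back to Y-bar when X is outside
    the range observed in the random sample."""
    if not (x_min <= x <= x_max):
        return y_bar                  # out of range: predict the mean of Y
    t_x = (x - x_bar) / s_x
    return y_bar + (r * t_x) * s_y

# Mrs. Green (X = 6, inside the 1-8 range) vs. Mr. Jones (X = 9, outside):
print(round(predict_y_guarded(6, 4.70, 2.00, .400, 11.00, 3.00, 1, 8), 2))  # 11.78
print(round(predict_y_guarded(9, 4.70, 2.00, .400, 11.00, 3.00, 1, 8), 2))  # 11.0 (the mean of Y)
```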
How much better than the mean can we guess?
Improved prediction
If we can use the regression equation rather than the mean to make individualized estimates of Y scores, how much better are our estimates? We are making predictions about scores on the Y variable from our knowledge of the statistically significant correlation between X & Y and the fact that we know someone's X score.
The average unsquared error when we predict that everyone will score at the mean of Y equals sY, the ordinary standard deviation of Y. How much better than that can we do?
Estimating the standard error of the estimate the (very) long way
Calculate the correlation of X and Y from the raw data (which includes calculating s for Y). If the correlation is significant, you can use the regression equation to make individualized predictions of scores on the Y variable.
Then find the difference between each predicted and actual score in your sample, square the differences, add them up, divide by dfREG (nP - 2), and take a square root.
The average unsquared error of prediction when you do that is called the estimated standard error of the estimate. Its symbol is sEST.
Example for prediction error
A study was performed to investigate whether the quality of an image affects reading time. The experimental hypothesis was that reduced quality would slow down reading time. Quality was measured on a scale of 1 to 10. Reading time was in seconds.
Quality vs. reading time data: compute the correlation
Is there a relationship? Check for linearity. Compute r.

Quality        Reading time
(scale 1-10)   (seconds)
4.30           8.1
4.55           8.5
5.55           7.8
5.65           7.3
6.30           7.5
6.45           7.3
6.45           6.0
Calculate t scores for X

X       X - X-bar   (X - X-bar)²   tX = (X - X-bar)/sX
4.30    -1.31       1.71           -1.48
4.55    -1.06       1.12           -1.19
5.55    -0.06       0.00           -0.07
5.65     0.04       0.00            0.05
6.30     0.69       0.48            0.78
6.45     0.84       0.71            0.95
6.45     0.84       0.71            0.95

Sum of X = 39.25, n = 7, X-bar = 5.61
SSW = 4.73
MSW = 4.73/(7-1) = 0.79
sX = 0.89
Calculate t scores for Y

Y      Y - Y-bar   (Y - Y-bar)²   tY = (Y - Y-bar)/sY
8.1     0.60       0.36            0.76
8.5     1.00       1.00            1.26
7.8     0.30       0.09            0.38
7.3    -0.20       0.04           -0.25
7.5     0.00       0.00            0.00
7.3    -0.20       0.04           -0.25
6.0    -1.50       2.25           -1.89

Sum of Y = 52.5, n = 7, Y-bar = 7.50
SSW = 3.78
MSW = 3.78/(7-1) = 0.63
sY = 0.794
Plot the t scores

tX       tY
-1.48     0.76
-1.19     1.28
-0.07     0.39
 0.05    -0.25
 0.78     0.00
 0.95    -0.25
 0.95    -1.89
t score plot with best fitting line: linear? YES!
[Scatterplot: image quality (t score) on the horizontal axis versus reading time (t score) on the vertical axis, both running from -2.00 to 2.00, with the best-fitting line showing a negative linear trend.]
Calculate r

tX       tY       tX - tY   (tX - tY)²
-1.48     0.76    -2.24      5.02
-1.19     1.28    -2.47      6.10
-0.07     0.39    -0.46      0.21
 0.05    -0.25     0.30      0.09
 0.78     0.00     0.78      0.61
 0.95    -0.25     1.20      1.44
 0.95    -1.88     2.83      8.01

Sum of (tX - tY)² = 21.48
Sum of (tX - tY)² / (nP - 1) = 21.48/6 = 3.580
r = 1 - (1/2 * 3.580) = 1 - 1.79 = -0.790
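For reference, the whole calculation can be checked in a few lines. A minimal numpy sketch (my own, not from the lecture; variable names are mine), using the n - 1 convention for s just as the slides do:

```python
import numpy as np

quality = np.array([4.30, 4.55, 5.55, 5.65, 6.30, 6.45, 6.45])
seconds = np.array([8.1, 8.5, 7.8, 7.3, 7.5, 7.3, 6.0])

def t_scores(v):
    """Estimated z (t) scores, using the n - 1 standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

t_x, t_y = t_scores(quality), t_scores(seconds)
r = 1 - 0.5 * np.sum((t_x - t_y) ** 2) / (len(quality) - 1)
print(round(r, 3))  # -0.78; the slides' -0.790 reflects rounded t scores
```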
Check whether r is significant
r = -0.790
df = nP - 2 = 5
α = .05
Look in the r table: with 5 dfREG, the CI.95 goes from -.753 to +.753.
r(5) = -.790, p < .05. r is significant!
We can find the Y' for every raw X score by using the regression equation. Then we can compute how much error remains, subtracting the predicted score (Y') from the actual Y score obtained by each person.

X       Y'
4.30    8.42
4.55    8.23
5.55    7.54
5.65    7.47
6.30    7.01
6.45    6.91
6.45    6.91
Can we show mathematically that regression estimates are better than saying everyone will score precisely at the mean of Y? We expect, of course, that there will be less error if we use regression.

Y      Y-bar   Y'
8.1    7.5     8.42
8.5    7.5     8.23
7.8    7.5     7.54
7.3    7.5     7.47
7.5    7.5     7.01
7.3    7.5     6.91
6.0    7.5     6.91

To calculate the standard deviation, we take deviations of Y from the mean of Y, square them, add them up, divide by degrees of freedom, and then take the square root.
To calculate the estimated standard error of the estimate, sEST, we take the deviations of each raw Y score from its regression equation estimate, square them, add them up, divide by degrees of freedom, and take the square root.
Estimated standard error of the estimate

Y      Y'      Y - Y'   (Y - Y')²
8.1    8.42    -0.32     0.10
8.5    8.23     0.27     0.07
7.8    7.54     0.26     0.07
7.3    7.47    -0.17     0.03
7.5    7.01     0.49     0.24
7.3    6.91     0.39     0.15
6.0    6.91    -0.91     0.83

SSRES = 1.49
MSRES = 1.49/(7-2) = 0.298
sEST = 0.546
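The long way is a direct transcription into code. A self-contained sketch (numpy; variable names are mine):

```python
import numpy as np

quality = np.array([4.30, 4.55, 5.55, 5.65, 6.30, 6.45, 6.45])
seconds = np.array([8.1, 8.5, 7.8, 7.3, 7.5, 7.3, 6.0])

t_x = (quality - quality.mean()) / quality.std(ddof=1)   # tX scores
t_y = (seconds - seconds.mean()) / seconds.std(ddof=1)   # tY scores
r = 1 - 0.5 * np.sum((t_x - t_y) ** 2) / (len(quality) - 1)

y_pred = seconds.mean() + r * t_x * seconds.std(ddof=1)  # Y' for each X
ss_res = np.sum((seconds - y_pred) ** 2)                 # SSRES
s_est = np.sqrt(ss_res / (len(seconds) - 2))             # sqrt(MSRES)
print(round(s_est, 3))  # 0.544; the slides get 0.546 with rounded intermediates
```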
How much better?
MSY = 0.630
MSRES = 0.298
(.630 - .298)/.630 = .527, or about 53%
53% less squared error when we use the regression equation instead of the mean to predict Y scores.
How much better is the estimated standard error of the estimate than the estimated standard deviation?
sY = 0.794
sEST = 0.546
(.794 - .546)/.794 = .312, or about 31%
31% less error of prediction (using unsquared units) when we use the regression equation instead of the mean to predict.
Mathematical magic
There is usually an alternative formula for calculating statistics that is easier to perform. We went through a lot of extra steps to calculate sEST = 0.546. It is not necessary to calculate all of the estimated Y scores, find the difference between each actual Y score and Y', then square, sum, and divide by dfREG.
Another way to phrase it: how much error did we get rid of?
Treat it as a weight loss problem. If Jack is 30 pounds overweight and he loses 40% of it, how much is he still overweight?
He lost .400 x 30 pounds = 12 pounds. He has 30 - 12 = 18 pounds left to lose.
How did we solve that problem?
First we found how much weight Jack had gotten rid of. That equaled the percent he lost (expressed as a proportion) times the amount overweight he started with. He was 30 pounds overweight and lost 40% of it. 30 * .400 = 12.00. He lost 12 pounds.
Then we found how much he was still overweight.
He started off 30 pounds overweight. He lost 12.00 pounds. So he had 30 - 12 = 18 pounds of overweight left. To find what is left, subtract what you got rid of from the amount you started with.
So to compute how much of something is left after some is lost, you need to know how much there was to start with and what percentage was gotten rid of.
Percentage gotten rid of times original quantity = amount gotten rid of.
Original quantity minus amount gotten rid of = what's left.
Calculate how much is left.
C. Munster was 100 pounds overweight. He lost 35% of it. How much is he still overweight?
Mr. Hardy was 25.8 pounds overweight. He lost 60% of it. How much is he still overweight?
Cookie Munster is still 65 pounds overweight.
Mr. Hardy has earned some laurels. He only has 10.32 pounds left to lose.
SSY = error to start; r² = proportion of error lost
SSY is the total amount of error we start with when predicting scores on Y. It is the amount of error when everyone is predicted to score at the mean. It is analogous to the total amount overweight.
The proportion of error you get rid of using the regression equation as your predictor equals Pearson's correlation coefficient squared (r²)! It is analogous to the proportion of weight lost.
To get the total error left, find how much you got rid of, then subtract from what you started with.
Amount you started with: SSY
Amount you got rid of: SSY * r²
Amount left: SSRES = SSY - (SSY * r²)
Now you know how much error is left when you use the regression equation on your sample (SSRES).
To estimate the average amount of squared error you will have left if you use the regression equation to make predictions for the whole population, divide SSRES by dfREG to obtain MSRES.
To compute the estimated standard error of the estimate, sEST, just take the square root of MSRES.
Computing sEST the easier way!
In the problem for which we computed sEST the long way, we already knew that SSY = 3.78 and r = -0.790. Thus, r² = (-0.790)² = 0.624.
Here is the computation:
SSRES = SSY - (SSY * r²) = 3.78 - (3.78 * 0.624) = 1.42
MSRES = 1.42/(7-2) = 0.284
sEST = 0.533
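In code, the shortcut needs no predicted scores at all. A self-contained sketch using the slides' rounded r of -0.790 (names are mine):

```python
import numpy as np

seconds = np.array([8.1, 8.5, 7.8, 7.3, 7.5, 7.3, 6.0])
ss_y = np.sum((seconds - seconds.mean()) ** 2)   # SSY = 3.78
r = -0.790                                       # r from the slides

ss_res = ss_y - ss_y * r ** 2                    # SSRES = SSY - (SSY * r^2)
s_est = np.sqrt(ss_res / (len(seconds) - 2))     # sEST = sqrt(MSRES)
print(round(s_est, 3))  # 0.533, matching the shortcut computation
```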
Compare the two methods: estimated standard error of the estimate

Y      Y'      Y - Y'   (Y - Y')²
8.1    8.42    -0.32     0.10
8.5    8.23     0.27     0.07
7.8    7.54     0.26     0.07
7.3    7.47    -0.17     0.03
7.5    7.01     0.49     0.24
7.3    6.91     0.39     0.15
6.0    6.91    -0.91     0.83

SSRES = 1.49
MSRES = 1.49/(7-2) = 0.298
sEST = 0.546
The easier method is more accurate. Why? Because there is less rounding.
Summary on estimating the standard error of the estimate
In generic terms, as usual you divide the sum of squares (SSRES) by degrees of freedom (dfREG = nP - 2) to estimate the average amount of squared error that will be left. Then you take a square root to get the average amount of unsquared error you will make when you use your best prediction of Y scores, which, in this case, comes from the regression equation (tY', translated back to Y').
The only change is that you used the regression equation to get SSRES rather than using the mean to compute SSY.
So you divide SSRES by dfREG = nP - 2 to get MSRES, to estimate the average amount of squared error you will have left when you use the regression equation. Then take the square root of MSRES to get sEST. sEST is your best estimate of the average unsquared error of prediction when you properly use the regression equation to predict Y scores.
Remember, to properly use the regression equation, r must be significant and X within the range of X scores observed in your random sample.
Stating the (hopefully) obvious:
The estimated standard deviation (s) was the estimated average unsquared distance of scores in the population from mu. If we are looking at Y scores, it is the average unsquared difference of Y scores from the mean of Y.
When using the regression equation we are predicting Y scores other than the mean.
The estimated standard error of the estimate (sEST) is the estimated average unsquared distance of Y scores in the population from the regression-equation-based predicted Y scores.
Both s and sEST reflect error of prediction. Using the regression equation individualizes prediction. If r is significant, and prediction is restricted to values of X within the range of X scores seen in the random sample, using the regression equation leads to less error.
How much better is using the regression equation rather than the mean as your predictor?
sY = 0.794
sEST = 0.533
(.794 - .533)/.794 = .329, or about 33%
In this case, there was 33% less unsquared error when the regression equation was used instead of the mean to predict scores on the Y variable.
Note: the difference between 33% and the 31% we calculated the long way is mostly due to rounding error in the long calculation. 33% is more accurate.
Here are the formulae again:
Residual sum of squared error if the regression equation is used:
SSRES = SSY - (SSY * r²)
Estimated average amount of squared error left:
MSRES = SSRES/dfREG = SSRES/(nP - 2)
Estimated average amount of unsquared error left:
sEST = square root of MSRES
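Pulling the chapter together, here is a compact sketch of the whole workflow (function and variable names are mine; the significance check uses the t-based conversion shown in the r table aside above, via scipy):

```python
import math
from scipy.stats import t as t_dist

def s_est(ss_y, r, n_pairs):
    """Estimated standard error of the estimate via the shortcut formula."""
    ss_res = ss_y - ss_y * r ** 2             # SSRES = SSY - (SSY * r^2)
    return math.sqrt(ss_res / (n_pairs - 2))  # sqrt(MSRES), dfREG = nP - 2

def is_significant(r, n_pairs, alpha=.05):
    """Two-tailed test of rho = 0, mirroring the r table."""
    df = n_pairs - 2
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    return abs(r) >= t_crit / math.sqrt(df + t_crit ** 2)

print(round(s_est(3.78, -0.790, 7), 3))  # 0.533, as in the shortcut slide
print(is_significant(-0.790, 7))         # True: |r| exceeds .754 at 5 df
```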