Prediction concerning Y variable

Three different research
questions
• What is the mean response, E(Yh), for a
given level, Xh, of the predictor variable?
• What would one predict a new observation,
Yh(new), to be for a given level, Xh, of the
predictor variable?
• What would one predict the mean of m new
observations, Y-bar-h(new), to be for a given level,
Xh, of the predictor variable?
Example: Mortality and Latitude
• What is the expected (mean) mortality rate
for locations at 40° N latitude?
• What is the predicted mortality rate for a
new randomly selected location at 40° N?
• What is the predicted mortality rate for 10
new randomly selected locations at 40° N?
[Figure: Regression plot of Mortality (vertical axis) versus Latitude (horizontal axis). Fitted line: Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%.]
Point estimators
$$\hat{Y}_h = b_0 + b_1 X_h$$
is the best point estimator in each case.
That is, it is:
• the best guess of the mean response at Xh
• the best guess of a new observation at Xh
• the best guess of a mean of m new observations at Xh
But, as always, to be confident in the answer to our research
question, we should put an interval around our best guess.
Interval estimation of mean
response E(Yh)
Sampling distribution of Y-hat-h
Providing error terms εi are normally distributed:
Y-hat-h is normally distributed
with mean E(Yh)
and variance
$$\sigma^2(\hat{Y}_h) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
Implications on precision
• The greater the spread in the Xi values, the
smaller the variance of Y-hat-h, the more
precise the prediction of E(Yh).
• Given the same set of Xi values, the further
Xh is from the (sample) mean of the Xi, the
greater the variance of Y-hat-h, the less
precise the prediction of E(Yh).
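Both effects can be checked numerically. The following is a minimal sketch, assuming σ² = 1 and two made-up sets of X values with the same mean (neither comes from the mortality data); `var_fit` is an illustrative helper name, not something from the lecture.

```python
def var_fit(x_values, x_h, sigma2=1.0):
    """Variance of Y-hat-h: sigma^2 * (1/n + (x_h - xbar)^2 / Sxx)."""
    n = len(x_values)
    xbar = sum(x_values) / n
    sxx = sum((x - xbar) ** 2 for x in x_values)
    return sigma2 * (1.0 / n + (x_h - xbar) ** 2 / sxx)

# Hypothetical predictor values: same mean (5), different spread.
narrow = [4, 4.5, 5, 5.5, 6]
wide = [1, 3, 5, 7, 9]

# More spread in the X values gives a smaller variance at the same X_h.
print(var_fit(narrow, 6), var_fit(wide, 6))

# Moving X_h away from the sample mean gives a larger variance.
print(var_fit(wide, 5), var_fit(wide, 9))
```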
Estimate of the variance
Estimate the variance
$$\sigma^2(\hat{Y}_h) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
with
$$s^2(\hat{Y}_h) = \mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
Then, the estimated standard deviation is $s(\hat{Y}_h)$.
Confidence interval for E(Yh)
Sample estimate ± margin of error
$$\hat{Y}_h \pm t_{(1-\alpha/2,\; n-2)} \times s(\hat{Y}_h)$$
The estimation in Minitab
• Stat >> Regression >> Regression …
• Specify response and predictor(s).
• Select Options… In “Prediction intervals
for new observations” box, specify either
the X value or a column name containing
multiple X values. Specify confidence level.
• Click on OK. Click on OK.
• Results appear in session window.
Predicted Values for New Observations
New Obs     Fit  SE Fit        95.0% CI          95.0% PI
      1  150.08    2.75  (144.6, 155.6)  (111.2, 188.93)
      2  221.82    7.42  (206.9, 236.8)  (180.6, 263.07) X
X denotes a row with X values away from the center

Values of Predictors for New Observations
New Obs  Latitude
      1      40.0
      2      28.0
Minitab output
“Fit” is
$$\hat{Y}_h = b_0 + b_1 X_h = 389.19 - 5.9776(40) = 150.08$$
“SE Fit” is
$$s(\hat{Y}_h) = \sqrt{\mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]} = 2.75$$
$$t_{(0.975,\; 47)} = 2.0117$$
Therefore, the “95% CI” for E(Yh) is
150.08 ± (2.0117)(2.75)
150.08 ± 5.53
(144.55, 155.61)
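The same calculation can be sketched in Python from summary statistics alone. This is a hedged sketch, not Minitab's procedure: the values of n, X-bar, the sum of squared deviations and MSE are the ones listed in the lecture's table of computed values, the t multiplier 2.0117 is hard-coded from the text, and `mean_response_ci` is a hypothetical helper name. Because the inputs are rounded, the results agree with the Minitab output only up to rounding.

```python
import math

# Summary statistics from the lecture's mortality example.
n = 49
x_bar = 39.533
sxx = 1020.54              # sum of squared deviations of latitude
mse = 365.383
b0, b1 = 389.189, -5.97764
t_crit = 2.0117            # t(0.975; 47), taken from the text

def mean_response_ci(x_h):
    """Return (fit, se_fit, lower, upper) for the mean response at x_h."""
    fit = b0 + b1 * x_h
    se_fit = math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))
    margin = t_crit * se_fit
    return fit, se_fit, fit - margin, fit + margin

fit, se_fit, lower, upper = mean_response_ci(40.0)
print(f"fit={fit:.3f}  se_fit={se_fit:.3f}  CI=({lower:.3f}, {upper:.3f})")
```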
Difference in precision of estimates
• The mean of the 49 latitudes in the data set is
39.5° N.
• SE Fit for Xh=40 is 2.75.
• SE Fit for Xh=28 is 7.42 (larger as expected).
• The closer Xh is to the sample mean, the
narrower the confidence interval, the more
precise the estimate of E(Yh).
Comments on assumptions
• Xh is a value within the scope of the model, that is,
within the range of X values in the data set. It need
not be one of the observed X values.
• It is OK to use the formula for the confidence
interval for E(Yh) even if the error terms are only
approximately normally distributed.
• If you have a large sample, the error terms can even
deviate substantially from normality without greatly
affecting appropriateness of the confidence interval.
Prediction of a New Observation
Restatement of problem
• We previously estimated the mean response
E(Yh). That is, we estimated the mean of the
distribution of Y at a given Xh.
• Now, we want to predict a new response Yh(new).
That is, we predict an individual outcome Y at a
given Xh.
• Most outcomes Y deviate from the mean response
E(Yh). We must take this into account when we
predict Yh(new).
How to obtain a prediction interval if
distribution of Y is known
• If you know the distribution of Y, you know
– its shape (say, it’s normal)
– its mean (say, it’s μ (“mu”))
– its standard deviation (say, it’s σ (“sigma”))
• BASIC IDEA: Using the distribution,
determine a range in which most of the Y
observations will fall. Claim that the next
observation will fall there, too.
Example: High school GPA (X) and
College GPA (Y)
• Distribution of college GPA (Y) depends on high
school GPA (X) through intercept and slope
parameters.
• Suppose:
– Y is normally distributed
– Mean is E(Y) = 0.10 + 0.95 X
– Standard deviation σ (“Sigma”) = 0.12
• For students with X = 3.5 high school GPA:
– E(Y) = 0.10 + 0.95(3.5) = 3.425
Example: 99.7% prediction interval
for Yh(new)
• The probability that a randomly selected
high school student with a GPA of 3.5 will
have a college GPA between
– 3.425 - 3(0.12) = 3.065 and
– 3.425 + 3(0.12) = 3.785
is 0.997.
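The arithmetic of this example can be verified with Python's standard library. A minimal sketch, using the known parameters stated in the example (the variable names are illustrative):

```python
from statistics import NormalDist

# Known parameters from the GPA example.
beta0, beta1, sigma = 0.10, 0.95, 0.12
x = 3.5  # high school GPA

mean_y = beta0 + beta1 * x          # mean college GPA at X = 3.5
lower = mean_y - 3 * sigma          # three standard deviations below
upper = mean_y + 3 * sigma          # three standard deviations above

# Probability that a normal Y falls between the two limits.
dist = NormalDist(mu=mean_y, sigma=sigma)
coverage = dist.cdf(upper) - dist.cdf(lower)

print(round(lower, 3), round(upper, 3), round(coverage, 4))
```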
But we have a problem …
• The last calculation was possible because we
knew β0, β1, and σ. Hence, we knew the
mean and variance, E(Y) and σ²,
respectively, of the distribution of Y.
• We could consider estimating E(Y) and σ²
with Y-hat-h and MSE, respectively, and
applying the same method as before.
• But, it’s not quite right. Here’s why.
So …
• We cannot be certain of the location (mean)
of the distribution of Y.
• Prediction limits for Yh(new) must take into
account:
– variation in possible location (mean) of the
distribution of Y
– variation in the Y within the probability distribution
Variation of the prediction
The variation in the prediction of a new response depends
on two components: the variation due to estimating E(Yh)
with Y-hat-h and the variation in Y within the probability
distribution.
$$\sigma^2(\text{pred}) = \sigma^2(\hat{Y}_h) + \sigma^2$$
which is estimated by:
$$s^2(\text{pred}) = \mathrm{MSE} + \mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = \mathrm{MSE}\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
Prediction interval for Yh(new)
Providing error terms εi are normally distributed:
$$\hat{Y}_h \pm t_{(1-\alpha/2,\; n-2)} \times s(\text{pred})$$
The prediction in Minitab
• Stat >> Regression >> Regression …
• Specify response and predictor(s).
• Select Options… In “Prediction intervals
for new observations” box, specify either
the X value or a column name containing
multiple X values. Specify confidence level.
• Click on OK. Click on OK.
• Results appear in session window.
S = 19.12   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Predicted Values for New Observations
New Obs     Fit  SE Fit        95.0% CI          95.0% PI
      1  150.08    2.75  (144.6, 155.6)  (111.2, 188.93)
      2  221.82    7.42  (206.9, 236.8)  (180.6, 263.07) X
X denotes a row with X values away from the center

Values of Predictors for New Observations
New Obs  Latitude
      1      40.0
      2      28.0
Minitab output
“Fit” is
$$\hat{Y}_h = b_0 + b_1 X_h = 389.19 - 5.9776(40) = 150.08$$
$$s^2(\text{pred}) = \mathrm{MSE} + \mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = 19.12^2 + 2.75^2 = 373.1369$$
$$s(\text{pred}) = \sqrt{373.1369} = 19.32$$
$$t_{(0.975,\; 47)} = 2.0117$$
Therefore, the “95% PI” for Yh(new) is
150.08 ± (2.0117)(19.32)
150.08 ± 38.8595
(111.2, 188.93)
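The prediction interval can be reproduced from summary statistics in the same way as the confidence interval. A hedged sketch: the values of n, X-bar, the sum of squared deviations and MSE are taken from the lecture's table of computed values, the t multiplier is hard-coded from the text, and `prediction_interval` is a hypothetical helper name. Expect agreement with Minitab only up to rounding of the inputs.

```python
import math

# Summary statistics from the lecture's mortality example.
n, x_bar, sxx, mse = 49, 39.533, 1020.54, 365.383
b0, b1 = 389.189, -5.97764
t_crit = 2.0117  # t(0.975; 47), taken from the text

def prediction_interval(x_h):
    """Return (fit, s_pred, lower, upper) for one new observation at x_h."""
    fit = b0 + b1 * x_h
    # Variance from estimating the mean response with Y-hat-h.
    var_fit = mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx)
    # Add the variance of a single Y around its mean (estimated by MSE).
    s_pred = math.sqrt(mse + var_fit)
    margin = t_crit * s_pred
    return fit, s_pred, fit - margin, fit + margin

fit, s_pred, lower, upper = prediction_interval(40.0)
print(f"fit={fit:.3f}  s_pred={s_pred:.3f}  PI=({lower:.3f}, {upper:.3f})")
```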
As always, some comments…
• In general, prediction intervals are wider
than confidence intervals.
• Prediction intervals are (somewhat) wider
the further Xh is from the mean of the X
values.
• The formula for the prediction interval
depends strongly on the assumption that the
error terms are normally distributed.
Remember the distinction …
• A confidence interval concerns the
estimation of an unknown parameter. It is
an interval that is intended to cover the
value of the unknown parameter.
• A prediction interval, on the other hand, is
a statement about the value to be taken by a
random variable, here, the new observation
Yh(new).
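A small simulation can make the distinction concrete: across repeated samples, the 95% confidence interval should cover the fixed mean response about 95% of the time, and the 95% prediction interval should cover a fresh random observation about 95% of the time. This is only an illustrative sketch. The "true" parameters and the 49 evenly spaced X values are made up for the simulation, and the t multiplier 2.0117 for 47 degrees of freedom is taken from the text.

```python
import math
import random

random.seed(0)

# Hypothetical "true" model, chosen only for this simulation.
BETA0, BETA1, SIGMA = 389.0, -6.0, 19.0
X = [28 + 22 * i / 48 for i in range(49)]  # 49 made-up latitudes
X_H = 40.0
T_CRIT = 2.0117  # t(0.975; 47), taken from the text

def fit_line(x, y):
    """Least-squares fit; returns b0, b1, MSE, xbar, Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2), xbar, sxx

def simulate(reps=2000):
    """Return (CI coverage of E(Yh), PI coverage of Yh(new))."""
    true_mean = BETA0 + BETA1 * X_H
    ci_hits = pi_hits = 0
    for _ in range(reps):
        y = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in X]
        b0, b1, mse, xbar, sxx = fit_line(X, y)
        fit = b0 + b1 * X_H
        var_fit = mse * (1 / len(X) + (X_H - xbar) ** 2 / sxx)
        y_new = true_mean + random.gauss(0, SIGMA)  # a new observation
        # CI targets the fixed parameter E(Yh).
        ci_hits += abs(fit - true_mean) <= T_CRIT * math.sqrt(var_fit)
        # PI targets the random variable Yh(new).
        pi_hits += abs(fit - y_new) <= T_CRIT * math.sqrt(mse + var_fit)
    return ci_hits / reps, pi_hits / reps

ci_cov, pi_cov = simulate()
print(ci_cov, pi_cov)
```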
Getting a plot of the CI and PI in
Minitab
• Stat >> Regression >> Fitted line plot …
• Specify predictor and response.
• Under Options…, select Display confidence
bands. Select Display prediction bands.
Specify desired confidence level.
• Select OK. Select OK.
[Figure: Regression plot of Mortality (vertical axis) versus Latitude (horizontal axis), showing the fitted regression line with 95% CI and 95% PI bands. Fitted line: Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%.]
Row  Xh    Xbar   sumsqX   n      MSE    SD_EY  SD_Pred
  1  28  39.533  1020.54  49  365.383  7.42147  20.5052
  2  29  39.533  1020.54  49  365.383  6.86862  20.3116
  3  30  39.533  1020.54  49  365.383  6.32406  20.1340
  4  31  39.533  1020.54  49  365.383  5.79013  19.9727
  5  32  39.533  1020.54  49  365.383  5.27006  19.8282
  6  33  39.533  1020.54  49  365.383  4.76838  19.7008
  7  34  39.533  1020.54  49  365.383  4.29156  19.5908
  8  35  39.533  1020.54  49  365.383  3.84884  19.4986
  9  36  39.533  1020.54  49  365.383  3.45337  19.4244
 10  37  39.533  1020.54  49  365.383  3.12313  19.3685
 11  38  39.533  1020.54  49  365.383  2.88066  19.3308
 12  39  39.533  1020.54  49  365.383  2.74927  19.3117
 13  40  39.533  1020.54  49  365.383  2.74497  19.3111
 14  41  39.533  1020.54  49  365.383  2.86833  19.3290
 15  42  39.533  1020.54  49  365.383  3.10416  19.3654
 16  43  39.533  1020.54  49  365.383  3.42933  19.4202
 17  44  39.533  1020.54  49  365.383  3.82112  19.4932
 18  45  39.533  1020.54  49  365.383  4.26117  19.5842
 19  46  39.533  1020.54  49  365.383  4.73606  19.6930
 20  47  39.533  1020.54  49  365.383  5.23632  19.8192
 21  48  39.533  1020.54  49  365.383  5.75533  19.9626
 22  49  39.533  1020.54  49  365.383  6.28846  20.1228
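The SD_EY and SD_Pred columns follow directly from Xbar, sumsqX, n and MSE. A minimal sketch that recomputes them for Xh = 28, ..., 49, assuming SD_EY is the estimated standard deviation of Y-hat-h and SD_Pred is s(pred); the function names `sd_ey` and `sd_pred` are illustrative.

```python
import math

# Constants from the table: n, Xbar, sumsqX and MSE.
n, x_bar, sxx, mse = 49, 39.533, 1020.54, 365.383

def sd_ey(x_h):
    """Estimated standard deviation of Y-hat-h (the SD_EY column)."""
    return math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))

def sd_pred(x_h):
    """Estimated standard deviation of the prediction (the SD_Pred column)."""
    return math.sqrt(mse + sd_ey(x_h) ** 2)

for x_h in range(28, 50):
    print(f"{x_h:3d}  {sd_ey(x_h):8.5f}  {sd_pred(x_h):8.4f}")
```

Note how SD_EY more than doubles between Xh = 40 and Xh = 28, while SD_Pred changes by only about 6%: the MSE term dominates the prediction variance.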
Prediction of the mean of m new
observations for given Xh
Same thinking as before …just a
slight adjustment
• We cannot be certain of the location (mean)
of the distribution of the Y. The best
estimate is Y-hat-h.
• Prediction limits for the mean of m new
observations, Y-bar-h(new), must take into
account:
– variation in possible location (mean) of the
distribution of the Y
– variation in the mean of m new Y values within
the probability distribution
Variation of the prediction
The variation in the prediction of the mean of m new
responses depends on two components: the variation due to
estimating E(Yh) with Y-hat-h and the variation in the
sample means within the probability distribution.
$$\sigma^2(\text{predmean}) = \sigma^2(\hat{Y}_h) + \frac{\sigma^2}{m}$$
which is estimated by:
$$s^2(\text{predmean}) = \frac{\mathrm{MSE}}{m} + \mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = \mathrm{MSE}\left[\frac{1}{m} + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
Prediction interval for Y-bar-h(new)
Providing error terms εi are normally distributed:
$$\hat{Y}_h \pm t_{(1-\alpha/2,\; n-2)} \times s(\text{predmean})$$
Predict mean of m=10 new responses
“Fit” is
$$\hat{Y}_h = b_0 + b_1 X_h = 389.19 - 5.9776(40) = 150.08$$
$$s^2(\text{predmean}) = \frac{\mathrm{MSE}}{m} + \mathrm{MSE}\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = \frac{19.12^2}{10} + 2.75^2 = 44.12$$
$$s(\text{predmean}) = \sqrt{44.12} = 6.64$$
$$t_{(0.975,\; 47)} = 2.0117$$
Therefore, the “95% PI” for Y-bar-h(new) is
150.08 ± (2.0117)(6.64)
150.08 ± 13.358
(136.7, 163.4)
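A short sketch shows how the interval depends on m. It assumes the summary statistics from the lecture's table of computed values and the t multiplier from the text; `predmean_interval` is a hypothetical helper name. With m = 1 the formula reduces to the prediction interval for a single new observation, and as m grows the interval shrinks toward the confidence interval for E(Yh).

```python
import math

# Summary statistics from the lecture's mortality example.
n, x_bar, sxx, mse = 49, 39.533, 1020.54, 365.383
b0, b1 = 389.189, -5.97764
t_crit = 2.0117  # t(0.975; 47), taken from the text

def predmean_interval(x_h, m):
    """Return (fit, s_predmean, lower, upper) for the mean of m new responses."""
    fit = b0 + b1 * x_h
    var_fit = mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx)
    # The mean of m new observations has variance sigma^2 / m around E(Yh).
    s_predmean = math.sqrt(mse / m + var_fit)
    margin = t_crit * s_predmean
    return fit, s_predmean, fit - margin, fit + margin

for m in (1, 10, 1000):
    fit, s, lower, upper = predmean_interval(40.0, m)
    print(f"m={m:5d}  s={s:.3f}  interval=({lower:.2f}, {upper:.2f})")
```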