Class 22. Understanding Regression

Readings: Sections 1-3 and 7 of the Pfeifer regression note; part of EMBS 12.7.
What is the regression line?
• It is a line drawn through a cloud of points.
• It is the line that minimizes the sum of squared errors.
  – Errors are also known as residuals; predicted values are also known as fitted values.
  – Error = Actual – Predicted.
  – The error is the vertical distance from the point (actual) to the line (predicted).
  – Points above the line have positive errors.
• The average of the errors will always be zero.
• The regression line will always "go through" the point (average X, average Y).
Can you draw the regression line?
Which is the regression line?
[Chart: a cloud of points with six candidate lines, labeled A through F.]
Which is the regression line? Answer: D.
Which is the regression line?
[Chart: three data points (1,1), (2,7), (3,1); the regression line is the horizontal line through the fitted points (1,3), (2,3), (3,3).]
Error at (1,1): 1 – 3 = –2
Error at (2,7): 7 – 3 = 4
Error at (3,1): 1 – 3 = –2
The sum of the errors is 0!
SSE = (–2)² + 4² + (–2)² = 24, which is smaller than from any other line.
The line goes through (2,3), the average.
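The three-point example can be checked directly with the textbook least-squares formulas (slope = Sxy/Sxx, intercept = ȳ − slope·x̄) — a minimal sketch, not the course's Excel workflow:

```python
# Least-squares fit for the slide's three points: (1,1), (2,7), (3,1).
xs = [1, 2, 3]
ys = [1, 7, 1]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b = sxy / sxx          # slope
a = ybar - b * xbar    # intercept
errors = [y - (a + b * x) for x, y in zip(xs, ys)]  # actual - predicted
sse = sum(e ** 2 for e in errors)
print(a, b)          # 3.0 0.0 -> the horizontal line y = 3
print(sum(errors))   # 0.0    -> the errors always average to zero
print(sse)           # 24.0   -> smaller than for any other line
```

The fit confirms every claim on the slide: the line is y = 3, it passes through the average point (2,3), the residuals sum to zero, and SSE = 24.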
Draw in the regression line…
[Two scatterplots of X,Y point clouds (no line drawn); the reader is asked to sketch the least-squares line through each.]
Two Points determine a line…
….and regression can give you the equation.

Degrees C   Degrees F
    0           32
  100          212

[Chart: Degrees F (0 to 250) plotted against Degrees C (0 to 150) for the two points.]
Two Points determine a line…
….and regression can give you the equation.

y = 1.8x + 32

Degrees C   Degrees F
    0           32
  100          212

[Chart: the fitted line y = 1.8x + 32 drawn through the two points.]
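With only two points the least-squares "fit" is exact: the line passes through both. A quick sketch using the same slope/intercept formulas recovers the familiar conversion:

```python
# Fitting a line to the two Celsius/Fahrenheit points from the slide.
xs = [0.0, 100.0]    # Degrees C
ys = [32.0, 212.0]   # Degrees F
xbar, ybar = sum(xs) / 2, sum(ys) / 2
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
    / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar
print(f"y = {b}x + {a}")   # y = 1.8x + 32.0
```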
Four Sets of X,Y Data

Data Set A      Data Set B      Data Set C      Data Set D
 X     Y         X     Y         X     Y         X     Y
10    9.14      10    8.04      10    7.47      19   12.08
 8    8.14       8    6.95       8    6.47      19   11.26
13    8.74      13    7.58      13    8.97      19   13.21
 9    8.77       9    8.81       9    6.97      19   14.34
11    9.25      11    8.33      11   10.87      19   13.97
14    8.1       14    9.96      14    9.47      19   12.54
 6    6.13       6    7.24       6    5.47      19   10.75
 4    3.1        4    4.26       4    4.47       8    7.00
12    9.13      12   10.84      12    8.47      19   11.06
 7    7.26       7    4.82       7    8.87      19   13.41
 5    4.74       5    5.68       5    4.97      19   12.39
Four Sets of X,Y Data
[Four scatterplots, panels A, B, C, and D, one for each data set; the clouds have very different shapes.]
Four Sets of X,Y Data
Data Analysis/Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.8166
R Square              0.6669
Adjusted R Square     0.6299
Standard Error        1.2357
Observations          11

Identical Regression Output for A, B, C, and D!!!!!

ANOVA
             df      SS        MS        F       Significance F
Regression    1    27.5100   27.5100   18.0164       0.0022
Residual      9    13.7425    1.5269
Total        10    41.2525

           Coefficients  Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     2.9993         2.1532       1.3929   0.1971     -1.8716      7.8702
X             0.5001         0.1178       4.2446   0.0022      0.2336      0.7666

(The Lower 95.0% / Upper 95.0% columns repeat the Lower 95% / Upper 95% values.)
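The slide's punchline can be reproduced by fitting all four data sets with the least-squares formulas — a sketch outside the course's Excel workflow, using the table values above:

```python
# Regress Y on X for each of the four data sets: very different clouds,
# essentially the same regression line (intercept near 3.00, slope near 0.50).
sets = {
    "A": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [9.14, 8.14, 8.74, 8.77, 9.25, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "B": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "C": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [7.47, 6.47, 8.97, 6.97, 10.87, 9.47, 5.47, 4.47, 8.47, 8.87, 4.97]),
    "D": ([19, 19, 19, 19, 19, 19, 19, 8, 19, 19, 19],
          [12.08, 11.26, 13.21, 14.34, 13.97, 12.54, 10.75, 7.00, 11.06, 13.41, 12.39]),
}

def fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

for name, (xs, ys) in sets.items():
    a, b = fit(xs, ys)
    print(f"Set {name}: intercept = {a:.3f}, slope = {b:.3f}")
```

All four fits agree with the Excel output on the slide to rounding, which is exactly why the slide says to chart the cloud before trusting the numbers.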
Assumptions
• Y is normal and we sample n independent observations.
  – The sample mean Ȳ is the estimate of μ.
  – The sample standard deviation s is the estimate of σ.
  – We use Ȳ, s, and n to test hypotheses about μ
    • using the t-statistic and the t-distribution with n-1 dof.
  – We never forecast "the next Y".
    • Although, our point forecast for a new Y would be Ȳ.
Example: Section 4 IQs

IQ
Mean                 108.545   ← Ȳ
Standard Error         3.448
Median               110
Mode                 102
Standard Deviation    19.807   ← s
Sample Variance      392.318
Kurtosis               0.228
Skewness              -0.499
Range                 85
Minimum               57
Maximum              142
Sum                 3582
Count                 33       ← n

To test H0: μ = 100:

t = (Ȳ − 100) / (s/√n)

The CLT tells us this test works even if Y is not normal.
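Plugging the summary statistics from the table into the t formula — a small sketch using only the stdlib:

```python
import math

# One-sample t statistic for H0: mu = 100, using the slide's IQ summary
# statistics: Ybar = 108.545, s = 19.807, n = 33.
ybar, s, n = 108.545, 19.807, 33
t = (ybar - 100) / (s / math.sqrt(n))
print(round(t, 2))   # 2.48 -> compare to a t distribution with n-1 = 32 dof
```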
Regression Assumptions
• Y|X is normal with mean a+bX and standard deviation σ, and we sample n independent observations.
  – We use regression to estimate a, b, and σ.
    • â, b̂, and "standard error" are the appropriate estimates.
    • Our point forecast for a new observation is â + b̂(X).
      – (Plug X into the regression equation.)
    • At some point, we will learn how to use regression output to test interesting hypotheses.
• What about a probability forecast of the new Y|X?
EMBS (12.14)
Summary: The key assumption of linear regression…..

• Y ~ N(μ, σ) (no regression)
• Y|X ~ N(a+bX, σ) (with regression)
  – In other words, μ = a + b(X), or E(Y|X) = a + b(X).
    The mean of Y given X is a linear function of X.

In both cases, we use the t because we don't know σ.

Without regression, we used data to estimate and test hypotheses about the parameter μ. With regression, we use (x,y) data to estimate and test hypotheses about the parameters a and b.

With regression, we also want to use X to forecast a new Y.
Example: Assignment 22

Standard
  MSF     Hours
  26       2
  34.2     4.17
  29       4.42
  34.3     4.75
  85.9     4.83
 143.2     6.67
  85.5     7
 140.6     7.08
 140.6     7.17
  40.4     7.17
 101      10
 239.7    12
 179.3    12.5
 126.5    13.67
 140.8    15.08

Regression Statistics
Multiple R            0.72600331
R Square              0.527080806
Adjusted R Square     0.490702407
Standard Error        2.773595935   ← error
Observations         15             ← n

ANOVA
             df
Regression    1
Residual     13
Total        14

           Coefficients
Intercept  3.312316042   ← â
MSF        0.044489502   ← b̂
Forecasting Y|X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
  – The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions…….
  – GOOD METHOD
    Assumes â and b̂ and "standard error" are a, b, and σ.
  • Pr(Y<8) = NORMDIST(8,10.31,2.77,true) = 0.202
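The GOOD method maps directly onto the stdlib's `statistics.NormalDist`, which plays the role of Excel's NORMDIST here — a sketch using the coefficients from the regression output above:

```python
from statistics import NormalDist

# GOOD method: treat the fitted intercept, slope, and "standard error"
# as if they were the true a, b, and sigma, then use the normal.
a_hat, b_hat, sigma = 3.312316042, 0.044489502, 2.77  # from the Excel output

def point_forecast(x):
    return a_hat + b_hat * x

# Job A: X = 157.3 (the slide's example)
mean_a = point_forecast(157.3)
p_a = NormalDist(mean_a, sigma).cdf(8)   # Pr(Y < 8 | X = 157.3)
print(round(mean_a, 2), round(p_a, 3))   # 10.31 0.202

# Job B: X = 64.7
mean_b = point_forecast(64.7)
p_b = NormalDist(mean_b, sigma).cdf(8)
print(round(mean_b, 2), round(p_b, 3))   # 6.19 0.743
```

These match the Job A and Job B NORMDIST values on the slides to rounding.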
Example: Assignment 22

(Same Standard/MSF and Hours data and regression output as the previous slide: â = 3.312316042, b̂ = 0.044489502, "standard error" = 2.77, n = 15.)

Ŷ|(X=157.3):

                  Job A      Job B
Intercept           1          1
MSF               157.3       64.7
Point Forecast    10.3105     6.1908
sigma              2.77       2.77
X                  8          8
Normdist           0.2021     0.7432

This is the probability Y < 8 given X = 157.3 and we know a, b, σ.
Forecasting Y|X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
  – The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions…….
  – BETTER METHOD
    Assumes â and b̂ are a and b….but accounts for the fact that "standard error" is not σ.
  • t = (8-10.31)/2.77 = -0.83
  • Pr(Y<8) = 1-t.dist.rt(-0.83,13) = 0.210
    dof = n - 2
Forecasting Y|X=157.3
• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
  – The point forecast is the mean of the probability distribution forecast.
• Under Certain Assumptions…….
  – PERFECT METHOD
  • t = (8-10.31)/2.93 = -0.79
  • Pr(Y<8) = 1-t.dist.rt(-0.79,13) = 0.222
    dof = n - 2
To account for using â and b̂ to estimate a and b, we must increase the standard deviation used in the forecast. The "correct" standard deviation is called the "standard error of prediction"…which here is 2.93.
Probability Forecasting with Regression: summary
• Plug X into the regression equation to calculate the point forecast.
  – This becomes the mean.
• GOOD
  – Use the normal with "standard error" in place of σ.
• BETTER
  – Use the t (with n-2 dof) to account for using "standard error" to estimate σ.
• PERFECT
  – Use the t with the "standard error of prediction" to account for using â and b̂ to estimate a and b.
Probability Forecasting with Regression
• "Standard error of prediction" is larger than "standard error" and depends on
  – 1/n (the larger the n, the smaller is "standard error of prediction")
  – (X−X̄)² (the farther the X is from the average X, the larger is "standard error of prediction")
• As n gets big, the "standard error of prediction" approaches "standard error".
(EMBS 12.26)

standard error of prediction = standard error × √( 1 + 1/n + (X−X̄)² / Σ(Xi−X̄)² )

Here X is the value for which we predict Y, and the sum in the denominator runs over the n data points. The good and better methods ignore the 1/n and (X−X̄)² terms…okay the bigger the n.
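Applying the formula to the Assignment 22 data reproduces the 2.93 used by the PERFECT method — a sketch computing X̄ and Σ(Xi−X̄)² from the MSF column:

```python
import math

# "Standard error of prediction" for the Assignment 22 regression:
# sep = standard_error * sqrt(1 + 1/n + (X - Xbar)^2 / sum((Xi - Xbar)^2)).
msf = [26, 34.2, 29, 34.3, 85.9, 143.2, 85.5, 140.6,
       140.6, 40.4, 101, 239.7, 179.3, 126.5, 140.8]
std_err = 2.773595935          # "standard error" from the regression output
n = len(msf)                   # 15
xbar = sum(msf) / n
sxx = sum((x - xbar) ** 2 for x in msf)

def std_err_prediction(x):
    return std_err * math.sqrt(1 + 1 / n + (x - xbar) ** 2 / sxx)

print(round(std_err_prediction(157.3), 2))   # 2.93
```

Note how the extra terms inflate "standard error" only slightly here; for an X far outside the data range the inflation would be much larger.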
BOTTOM LINE
• You will be asked to use the BETTER METHOD.
  – Use the t with n-2 dof.
  – Just use "standard error".
• Know that "standard error" is smaller than the correct "standard error of prediction".
  – As a result, your probability distribution is a little too narrow.
• Know that the "standard error of prediction" depends on 1/n and (X−X̄)² … which means it approaches "standard error" as n gets big.
Much ado about nothing?
[Chart: 95% prediction intervals for Hours (y-axis) versus MSF (x-axis). The Perfect method's band is widest and curved, the Better band sits in between, and the Good band is straight and narrowest.]
TODAY
• Got a better idea of how the “least squares”
regression line goes through the cloud of points.
• Saw that several “clouds” can have exactly the
same regression line….so chart the cloud.
• Practiced using a regression equation to calculate
a point forecast (a mean)
• Saw three methods for creating a probability
distribution forecast of Y│X.
– We will use the better method.
– We will know that it understates the actual
uncertainty…..a problem that goes away as n gets big.
Next Class
• We will learn about "adjusted R square"
  – (p 9-10 Pfeifer note)
  – The most over-rated statistic of all time.
• We will learn the four assumptions required to use regression to make a probability forecast of Y|X.
  – (Section 5 Pfeifer note, 12.4 EMBS)
  – And how to check each of them.
• We will learn how to test H0: b=0.
  – (p 12-13 Pfeifer note, 12.5 EMBS)
  – And why this is such an important test.