Transcript Chapter 14

Chapter 14
Multiple Regression
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
Multiple Regression
14.1 The Multiple Regression Model and
the Least Squares Point Estimate
14.2 Model Assumptions and the Standard
Error
14.3 R2 and Adjusted R2
14.4 The Overall F Test
14.5 Testing the Significance of an
Independent Variable
14-2
Multiple Regression
Continued
14.6 Confidence and Prediction
Intervals
14.7 Using Dummy Variables to Model
Qualitative Independent Variables
14.8 The Partial F Test: Testing the
Significance of a Portion of a
Regression Model
14.9 Residual Analysis in Multiple
Regression
14-3
The Multiple Regression Model
• Simple linear regression used one independent
variable to explain the dependent variable
– Some relationships are too complex to be described
using a single independent variable
• Multiple regression uses two or more
independent variables to describe the dependent
variable
– This allows multiple regression models to handle more
complex situations
– In principle, there is no limit to the number of independent
variables a model can use
• Multiple regression has only one dependent
variable
14-4
The Multiple Regression Model
• The linear regression model relating y to x1, x2, …, xk is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• µy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk
• β0, β1, β2, …, βk are unknown regression parameters relating the mean value of y to x1, x2, …, xk
• ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk
14-5
Example 14.1 Fuel Consumption
Case
Week   Average Hourly Temperature, x1 (°F)   Chill Index, x2   Fuel Consumption, y (MMcf)
1      28.0                                  18                12.4
2      28.0                                  14                11.7
3      32.5                                  24                12.4
4      39.0                                  22                10.8
5      45.9                                  8                 9.4
6      57.8                                  16                9.5
7      58.1                                  1                 8.0
8      62.5                                  0                 7.5
14-6
Example 14.1: Fuel Consumption Versus
Average Hourly Temperature
14-7
Example 14.1: Weekly Fuel
Consumption Versus the Chill Index
14-8
Example 14.1: A Geometrical
Interpretation of the Regression Model
14-9
The Least Squares Point Estimates
• The estimation/prediction equation
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
• It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
• b0, b1, b2, …, bk are the least squares point estimates of the parameters β0, β1, β2, …, βk
• x01, x02, …, x0k are specified values of the independent variables x1, x2, …, xk
14-10
Calculating the Model
• A formula exists for computing the least
squares model for multiple regression
• This formula is written using matrix
algebra and is presented in Appendix G
of the CD-ROM
• In practice, the model can be easily
computed using Excel, MINITAB,
MegaStat or many other computer
packages
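As an illustration outside those packages, here is a minimal Python sketch (assuming NumPy is available; this is not how MINITAB or Excel implement the computation) that obtains the least squares point estimates for the fuel consumption data of Example 14.1:

```python
import numpy as np

# Fuel consumption data from Example 14.1
temp  = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])  # x1, degrees F
chill = np.array([18.0, 14.0, 24.0, 22.0, 8.0, 16.0, 1.0, 0.0])     # x2, chill index
fuel  = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])      # y, MMcf

# Design matrix: a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(temp), temp, chill])

# Least squares point estimates b0, b1, b2
b, *_ = np.linalg.lstsq(X, fuel, rcond=None)
print(b)  # roughly [13.109, -0.0900, 0.0825], matching the output on the next slides

# Point prediction for x1 = 40 and x2 = 10
y_hat = np.array([1.0, 40.0, 10.0]) @ b
print(y_hat)  # roughly 10.33
```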
14-11
Figure 14.4a: Fuel Consumption Case
MINITAB Output
14-12
Figure 14.4b: Fuel Consumption Case
Excel Output
14-13
Table 14.2: Point Predictions and
Residuals Using Least Squares Estimates
14-14
The Multiple Regression Model
y = β0 + β1x1 + β2x2 + … + βkxk + ε
1. μy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable
2. β0, β1, β2, …, βk are unknown regression parameters relating the mean value of y to x1, x2, …, xk
3. ε is an error term
14-15
Model Assumptions and the Standard
Error
• The model is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• Assumptions for multiple regression are stated about the model error terms, the ε's
14-16
The Regression Model Assumptions
Continued
• Mean of Zero Assumption
The mean of the error terms is equal to 0
• Constant Variance Assumption
The variance of the error terms, σ2, is the same for every combination of values of x1, x2, …, xk
• Normality Assumption
The error terms follow a normal distribution for every combination of values of x1, x2, …, xk
• Independence Assumption
The values of the error terms are statistically independent of each other
14-17
Sum of Squared Errors
SSE = Σei2 = Σ(yi - ŷi)2
14-18
Mean Square Error
• This is the point estimate of the error variance σ2
• SSE is from the previous slide
• This formula differs slightly from the one for simple regression: the divisor is n - (k + 1) rather than n - 2
s2 = MSE = SSE/[n - (k + 1)]
14-19
Standard Error
• This is the point estimate of the error standard deviation σ
• MSE is from the previous slide
• Again, the divisor is n - (k + 1) rather than the n - 2 used in simple regression
s = √MSE = √(SSE/[n - (k + 1)])
14-20
Fuel Consumption Case MINITAB
Output
14-21
R2 and Adjusted R2
1. Total variation is given by the formula
Σ(yi - ȳ)2
2. Explained variation is given by the formula
Σ(ŷi - ȳ)2
3. Unexplained variation is given by the
formula
Σ(yi - ŷi)2
4. Total variation is the sum of explained and
unexplained variation
14-22
R2 and Adjusted R2
5. The multiple coefficient of
determination is the ratio of explained
variation to total variation
6. R2 is the proportion of the total
variation that is explained by the
overall regression model
7. Multiple correlation coefficient R is the
square root of R2
14-23
What Does R2 Mean?
The multiple coefficient of determination, R2, is
the proportion of the total variation in the n
observed values of the dependent variable that
is explained by the multiple regression model
14-24
Multiple Correlation Coefficient R
• The multiple correlation coefficient R is just
the square root of R2
• With simple linear regression, r would take on
the sign of b1
• There are multiple bi’s with multiple
regression
• For this reason, R is always positive
• To interpret the direction of the relationship
between the x’s and y, you must look to the
sign of the appropriate bi coefficient
14-25
The Adjusted R2
• Adding an independent variable to multiple
regression will raise R2
• R2 will rise slightly even if the new variable
has no relationship to y
• The adjusted R2 corrects this tendency in R2
• As a result, it gives a better estimate of the
importance of the independent variables
14-26
Calculating The Adjusted R2
• The adjusted multiple coefficient of
determination is
Adjusted R2 = (R2 - k/(n - 1)) · ((n - 1)/(n - (k + 1)))
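A short sketch of both quantities, using the explained and unexplained variation reported for the fuel consumption case (the total is their sum, by the identity on the earlier slide):

```python
explained, unexplained = 24.875, 0.674  # from the fuel consumption case
total = explained + unexplained         # total variation

R2 = explained / total                  # roughly 0.974
n, k = 8, 2
adj_R2 = (R2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))  # roughly 0.963
```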
14-27
Fuel Consumption Case MINITAB
Output
14-28
A Problem With Adjusted R2
• In the rare case where R2 is less than
k/(n-1), Adjusted R2 will be negative
• For R2 to be less than k/(n-1), there must be
little or no relationship between the
independent variables and y
• When this happens, many statistical software
systems will set adjusted R2 to zero rather
than displaying a negative value
• However, Excel shows the negative value for
adjusted R2
14-29
The Overall F Test
• To test
H0: β1= β2 = …= βk = 0 versus
Ha: At least one of β1, β2,…, βk ≠ 0
• The test statistic is
F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n - (k + 1))]
• Reject H0 in favor of Ha if F(model) > Fα* or p-value < α
*Fα is based on k numerator and n - (k + 1) denominator degrees of freedom
14-30
Example 14.3 Fuel Consumption Case
Test statistic:
F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n - (k + 1))]
         = (24.875/2) / (0.674/(8 - 3)) = 92.30
F test at α = 0.05: reject H0 at the α level of significance, since
F(model) = 92.30 > 5.79 = F.05 and
p-value = 0.000 < 0.05 = α
F.05 is based on 2 numerator and 5 denominator degrees of freedom
14-31
What Next?
• The F test tells us that at least one
independent variable is significant
• The natural question is which ones?
• That question is addressed in the next
section
14-32
Testing the Significance of an
Independent Variable
• A variable in a multiple regression
model is not likely to be useful unless
there is a significant relationship
between it and y
• To test significance, we use the null
hypothesis H0: βj = 0
• Versus the alternative hypothesis
Ha : β j ≠ 0
14-33
Testing Significance of an
Independent Variable #2
If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α
14-34
Testing Significance of an Independent
Variable #3
Alternative     Reject H0 if     p-Value
Ha: βj > 0      t > tα           Area under t distribution to the right of t
Ha: βj < 0      t < -tα          Area under t distribution to the left of t
Ha: βj ≠ 0      |t| > tα/2*      Twice the area under t distribution to the right of |t|

*That is, t > tα/2 or t < -tα/2
14-35
Testing Significance of an Independent
Variable #4
• Test statistic:
t = bj/sbj
• 100(1 - α)% confidence interval for βj:
[bj ± tα/2 sbj]
• tα, tα/2 and p-values are based on n - (k + 1) degrees of freedom
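A minimal sketch of this test (assuming SciPy), using b1 and sb1 for the temperature variable in the fuel consumption case (the same values used in Example 14.6 below):

```python
from scipy import stats

b1, s_b1 = -0.09001, 0.01408  # estimate and standard error for temperature
df = 8 - (2 + 1)              # n - (k + 1) = 5 degrees of freedom

t_stat = b1 / s_b1                         # roughly -6.39
p_value = 2 * stats.t.sf(abs(t_stat), df)  # twice the right-tail area; roughly 0.001
```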
14-36
Testing Significance of an Independent
Variable #5
• It is customary to test the significance of every
independent variable in a regression model
• If we can reject H0: βj = 0 at the 0.05 level of
significance, we have strong evidence that the
independent variable xj is significantly related to y
• If we can reject H0: βj = 0 at the 0.01 level of
significance, we have very strong evidence that
the independent variable xj is significantly related
to y
• The smaller the significance level α at which H0
can be rejected, the stronger the evidence that xj is
significantly related to y
14-37
A Note on Significance Testing
• Whether the independent variable xj is
significantly related to y in a particular
regression model is dependent on what other
independent variables are included in the
model
• That is, changing independent variables can
cause a significant variable to become
insignificant or cause an insignificant variable
to become significant
• This issue is addressed in a later section on
multicollinearity
14-38
Fuel Consumption Case: Calculation of
the t Statistics
Chill is significant at the α = 0.05 level, but not at α = 0.01
tα, tα/2 and p-values are based on 5 degrees of freedom
14-39
Fuel Consumption Case: The MINITAB
and Excel Output
14-40
A Confidence Interval for the
Regression Parameter βj
• If the regression assumptions hold, a 100(1 - α)% confidence interval for βj is [bj ± tα/2 sbj]
• tα/2 is based on n - (k + 1) degrees of freedom
14-41
Example 14.6 Fuel Consumption Case
• We know b1 = -0.09001 and sb1 = 0.01408
• For a 95 percent interval, the tα/2 point based on n - (k + 1) = 5 degrees of freedom is 2.571
• This gives us the information we need to compute a confidence interval for β1:
[b1 ± tα/2 sb1] = [-0.09001 ± 2.571 · 0.01408] = [-0.1262, -0.0538]
• Thus, we can be 95 percent confident that β1 is between -0.1262 and -0.0538
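The same interval can be reproduced with a short sketch (assuming SciPy); stats.t.ppf supplies the 2.571 point used above:

```python
from scipy import stats

b1, s_b1, df = -0.09001, 0.01408, 5
t_half = stats.t.ppf(0.975, df)  # 2.571 for a 95 percent interval with 5 df
ci = (b1 - t_half * s_b1, b1 + t_half * s_b1)
print(ci)  # roughly (-0.1262, -0.0538)
```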
14-42
Confidence and Prediction Intervals
• The point on the regression line corresponding to particular values x01, x02, …, x0k of the independent variables is
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
• It is unlikely that this value will equal the
mean value of y for these x values
• Therefore, we need to place bounds on how
far the predicted value might be from the
actual value
• We can do this by calculating a confidence
interval for the mean value of y and a
prediction interval for an individual value of y
14-43
Distance Value
• Both the confidence interval for the mean
value of y and the prediction interval for an
individual value of y employ a quantity called
the distance value
• With simple regression, we were able to
calculate the distance value fairly easily
• However, for multiple regression, calculating
the distance value requires matrix algebra
• See Appendix G on CD-ROM for more detail
14-44
A Confidence Interval for a Mean
Value of y
• Assume the regression assumptions
hold
• The formula for a 100(1-) confidence
interval for the mean value of y is as
follows:
[ŷ  t /2 s( y  yˆ ) ] s( y  yˆ )  s Distance value
• This is based on n-(k+1) degrees of
freedom
14-45
A Prediction Interval for an Individual
Value of y
• Assume the regression assumptions
hold
• The formula for a 100(1-) prediction
interval for an individual value of y is as
follows:
[ŷ  t /2 s yˆ ], s yˆ  s 1 + Distance value
• This is based on n-(k+1) degrees of
freedom
14-46
Example 14.7 Fuel Consumption Case
• Recall from Example 14.1 that
ŷ = 13.1087 – 0.09001 x1 + 0.08249 x2
• For x1 = 40 and x2 = 10, ŷ = 10.333
• 95 percent confidence interval:
[ŷ ± tα/2 · s√(Distance value)] = [10.333 ± (2.571)(0.3671)√0.2144515] = [10.333 ± 0.438] = [9.895, 10.771]
• 95 percent prediction interval:
[ŷ ± tα/2 · s√(1 + Distance value)] = [10.333 ± (2.571)(0.3671)√(1 + 0.2144515)] = [10.333 ± 1.041] = [9.292, 11.374]
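Both intervals can be reproduced with a short sketch (assuming NumPy and SciPy), plugging in the ŷ, s, and distance value reported in Example 14.7:

```python
import numpy as np
from scipy import stats

y_hat, s, dist = 10.333, 0.3671, 0.2144515  # from Example 14.7
t_half = stats.t.ppf(0.975, 5)              # 2.571

ci_half = t_half * s * np.sqrt(dist)        # roughly 0.438
pi_half = t_half * s * np.sqrt(1 + dist)    # roughly 1.041
print(y_hat - ci_half, y_hat + ci_half)     # roughly 9.895, 10.771
print(y_hat - pi_half, y_hat + pi_half)     # roughly 9.292, 11.374
```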
14-47
Using Dummy Variables to Model
Qualitative Independent Variables
• So far, we have only looked at including
quantitative data in a regression model
• However, we may wish to include descriptive
qualitative data as well
– For example, we might want to include the gender of respondents
• We can model the effects of different levels of
a qualitative variable by using what are called
dummy variables
– Also known as indicator variables
14-48
How to Construct Dummy Variables
• A dummy variable always has a value of
either 0 or 1
• For example, to model sales at two locations, we would code the first location as 0 and the second as 1
– Operationally, it does not matter which is coded 0 and which is coded 1
14-49
Example 14.9: The Electronics World
Case #1
Store   Number of Households, x   Location   Location Dummy, DM   Sales Volume, y
1       161                       Street     0                    157.27
2       99                        Street     0                    93.28
3       135                       Street     0                    136.81
4       120                       Street     0                    123.79
5       164                       Street     0                    153.51
6       221                       Mall       1                    241.74
7       179                       Mall       1                    201.54
8       204                       Mall       1                    206.71
9       214                       Mall       1                    229.78
10      101                       Mall       1                    135.22

Location dummy variable: DM = 1 if a store is in a mall location, 0 otherwise
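A minimal NumPy sketch of fitting this dummy variable model to the Electronics World data; the coefficient on DM estimates the mall-versus-street shift in mean sales:

```python
import numpy as np

households = np.array([161.0, 99, 135, 120, 164, 221, 179, 204, 214, 101])
mall_dummy = np.array([0.0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # DM: 1 = mall, 0 = street
sales = np.array([157.27, 93.28, 136.81, 123.79, 153.51,
                  241.74, 201.54, 206.71, 229.78, 135.22])

X = np.column_stack([np.ones_like(sales), households, mall_dummy])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
# b[2] is the estimated difference in mean sales volume between mall and
# street locations, holding the number of households fixed
```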
14-50
Example 14.9: Plot of Sales Volume and
Geometrical Interpretation of the Model
14-51
Example 14.9: Excel Output of a
Regression Analysis
14-52
What If We Have More Than Two
Categories?
• Consider having three categories, say A, B
and C
• We cannot code this using one dummy variable
– Coding A = 0, B = 1, and C = 2 would be invalid
– It assumes the difference between A and B is the same as the difference between B and C
• We must use multiple dummy variables
– Specifically, k categories require k - 1 dummy variables
14-53
What If We Have More Than Two
Categories?
Continued
• For A, B, and C, we would need two dummy variables
– x1 is 1 for A, zero otherwise
– x2 is 1 for B, zero otherwise
– If x1 and x2 are both zero, the category must be C
• This is why a third dummy variable is not needed
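A quick sketch of this encoding (assuming pandas; the category names here are purely illustrative). With drop_first=True, only k - 1 = 2 dummies are kept, and the dropped category becomes the baseline:

```python
import pandas as pd

categories = pd.Series(["A", "B", "C", "A", "C"])
dummies = pd.get_dummies(categories, prefix="cat", drop_first=True)
print(dummies)  # columns cat_B and cat_C; a row of all zeros means category A
```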
14-54
Interaction Models
• So far, we have only considered dummy variables as stand-alone terms
– The model so far is y = β0 + β1x + β2D + ε
– where D is a dummy variable
• However, we can also model the interaction between a dummy variable and another variable
– That model takes the form
y = β0 + β1x + β2D + β3xD + ε
• With an interaction term, both the intercept and the slope can differ between the two levels of D
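A sketch of building the interaction column (the data here are made up purely for illustration): the xD term is simply the elementwise product of x and the dummy D:

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = np.array([2.1, 3.9, 6.2, 11.8, 14.1, 16.0])

X = np.column_stack([np.ones_like(x), x, D, x * D])  # intercept, x, D, xD
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[2] shifts the intercept when D = 1; b[3] shifts the slope when D = 1
```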
14-55
Other Uses
• So far, we have seen dummy variables used to code categorical variables
• They can also be used to flag unusual events that affect the dependent variable
• These can be one-time events
– The impact of a strike on sales
– The impact of a major sporting event coming to town
• Or they can be recurring events
– Hot temperatures on soft drink sales
– Cold temperatures on coat sales
14-56
The Partial F Test: Testing the Significance
of a Portion of a Regression Model
• So far, we have looked at testing single slope coefficients using the t test
• We have also looked at testing all the coefficients at once using the overall F test
• The partial F test allows us to test the significance of any set of independent variables in a regression model
14-57
The Partial F Test Model
• Complete Model
y = β0 + β1x1+…+βgxg + βg+1xg+1+…+βkxk + 
• Reduced Model
y = β0 + β1x1 + … + βgxg + 
14-58
The Partial F Test Model
Continued
• To test
H0: βg+1 = βg+2 = …= βk = 0 versus
Ha: At least one of the βg+1, βg+2,…, βk ≠ 0
F = [(SSER - SSEC)/(k - g)] / [SSEC/(n - (k + 1))]
• Reject H0 in favor of Ha if:
– F > Fα or
– p-value < α
• Fα is based on k - g numerator and n - (k + 1) denominator degrees of freedom
14-59
Example 14.10: Electronic World
• The model from Example 14.9 is:
y = β0 + β1x + β2DM + β3DD + ε
– DM and DD are dummy variables
– This is called the complete model
14-60
Example 14.10: Electronic World #2
• We will now look at just the reduced model:
y = β0 + β1x + ε
• This gives us the hypotheses:
H0: β2 = β3 = 0
Ha: At least one of β2 and β3 ≠ 0
• The SSE for the complete model is SSEC =
443.4650
• The SSE for the reduced model is SSER =
2,467.8067
14-61
Example 14.10: Electronic World #3
F = [(SSER - SSEC)/(k - g)] / [SSEC/(n - (k + 1))]
  = [(2,467.8067 - 443.4650)/2] / [443.4650/(15 - 4)]
  = 25.1066
14-62
Example 14.10: Electronic World #4
• We compare F with F.01 = 7.21
– Based on k – g = 2 numerator degrees of freedom
– And n – (k + 1) = 11 denominator degrees of
freedom
– Note that k – g denotes the number of regression
parameters set to 0
• Since F = 25.1066 > 7.21 we reject the null
hypothesis
• We conclude that at least two locations have
different effects on mean sales volume
14-63
Residual Analysis in Multiple
Regression
• For an observed value of yi, the
residual is
ei = yi - ŷi = yi - (b0 + b1xi1 + … + bkxik)
• If the regression assumptions hold, the
residuals should look like a random
sample from a normal distribution with
mean 0 and variance σ2
14-64
Example: MegaStat Residual Plots for
the Sales Territory Performance Model
14-65
Residual Plots
• Residuals versus each independent
variable
• Residuals versus predicted y’s
• Residuals in time order (if the response
is a time series)
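A minimal plotting sketch (assuming matplotlib, and refitting the fuel consumption model from Example 14.1 so the block stands alone) for the first two kinds of residual plots listed above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Refit the fuel consumption model from Example 14.1
temp  = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
chill = np.array([18.0, 14.0, 24.0, 22.0, 8.0, 16.0, 1.0, 0.0])
fuel  = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
X = np.column_stack([np.ones_like(temp), temp, chill])
b, *_ = np.linalg.lstsq(X, fuel, rcond=None)

fitted = X @ b
resid = fuel - fitted

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(X[:, 1], resid)  # residuals versus x1 (temperature)
axes[1].scatter(X[:, 2], resid)  # residuals versus x2 (chill index)
axes[2].scatter(fitted, resid)   # residuals versus predicted y
for ax in axes:
    ax.axhline(0.0, linestyle="--")  # residuals should scatter randomly about 0
plt.show()
```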
14-66