The Multiple Regression Model - McGraw Hill Higher Education


Chapter 14
Multiple Regression
McGraw-Hill/Irwin
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
Multiple Regression
14.1 The Multiple Regression Model and the
Least Squares Point Estimate
14.2 Model Assumptions and the Standard Error
14.3 R² and Adjusted R²
14.4 The Overall F Test
14.5 Testing the Significance of an Independent
Variable
14.6 Confidence and Prediction Intervals
14.7 The Sales Territory Performance Case
14.8 Using Dummy Variables to Model
Qualitative Independent Variables
14.9 The Partial F Test: Testing the
Significance of a Portion of a Regression
Model
14.10 Residual Analysis in Multiple Regression
LO 1: Explain the
multiple regression
model and the related
least squares point
estimates.

14.1 The Multiple Regression
Model and the Least Squares
Point Estimate
Simple linear regression used one independent
variable to explain the dependent variable
Some relationships are too complex to be described using
a single independent variable
Multiple regression uses two or more independent
variables to describe the dependent variable
This allows multiple regression models to handle more
complex situations
There is no fixed limit to the number of independent variables a
model can use, provided the number of observations n exceeds
the number of parameters k + 1
Multiple regression has only one dependent variable




The Multiple Regression
Model
The linear regression model relating y to x1, x2,…, xk is
y = β0 + β1x1 + β2x2 +…+ βkxk + ε
µy = β0 + β1x1 + β2x2 +…+ βkxk is the mean value
of the dependent variable y when the values of the
independent variables are x1, x2,…, xk
β0, β1, β2,…, βk are the unknown regression
parameters relating the mean value of y to x1, x2,…,
xk
ε is an error term that describes the effects on y of
all factors other than the independent variables x1,
x2,…, xk
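Section 14.1 refers to least squares point estimates b0, b1,…, bk of the parameters: the values that minimize the sum of squared residuals. The sketch below is one way to compute them, assuming the standard normal equations (X'X)b = X'y. The function name least_squares and the six-observation data set are made up for illustration and do not come from the text.

```python
def least_squares(xs, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    xs holds one row [x_i1, ..., x_ik] per observation.
    Solves the normal equations (X'X) b = X'y by Gaussian elimination.
    """
    X = [[1.0] + [float(v) for v in row] for row in xs]  # intercept column first
    p = len(X[0])  # number of parameters, k + 1
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row_idx: abs(A[row_idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the x variables are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, p):
            factor = A[row][col] / A[col][col]
            for c in range(col, p + 1):
                A[row][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


# Illustrative data (hypothetical): two independent variables, six observations.
xs = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [11, 11, 11, 17, 17, 17]
b = least_squares(xs, y)
print(b)  # approximately [10.0, 3.0, 2.0], i.e. y-hat = 10 + 3*x1 + 2*x2
```

In practice a statistics package would be used instead; the hand-written solver only keeps the example free of dependencies.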
LO 2: Explain the
assumptions behind
multiple regression and
calculate the standard
error.

14.2 Model Assumptions
and the Standard Error
The model is
y = β0 + β1x1 + β2x2 + … + βkxk + ε

Assumptions for multiple regression are
stated about the model error terms, the ε's
The Regression Model
Assumptions
1. Mean of Zero Assumption
The mean of the error terms is equal to 0
2. Constant Variance Assumption
The variance of the error terms, σ², is the same for
every combination of values of x1, x2,…, xk
3. Normality Assumption
The error terms follow a normal distribution for
every combination of values of x1, x2,…, xk
4. Independence Assumption
The values of the error terms are statistically
independent of each other
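Section 14.2 names the standard error but the slides do not show its formula. The sketch below assumes the usual definitions: SSE = Σ(yi - ŷi)², mean square error s² = SSE/(n - (k + 1)), and standard error s = √s², where s² is the point estimate of σ². The data and fitted values are illustrative (the fitted values are taken to come from a least squares fit with k = 2) and are not from the text.

```python
import math


def standard_error(y, y_hat, k):
    """Return (SSE, s^2, s) for a model with k independent variables."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    s2 = sse / (n - (k + 1))  # mean square error
    return sse, s2, math.sqrt(s2)


# Illustrative values (hypothetical).
y = [11, 11, 11, 17, 17, 17]
y_hat = [10, 12, 13, 15, 16, 18]  # assumed least squares fitted values
sse, s2, s = standard_error(y, y_hat, k=2)
print(sse, s2, s)  # 12 4.0 2.0
```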
LO 3: Calculate and
interpret the multiple
and adjusted multiple
coefficients of
determination.
14.3 R² and Adjusted R²
1. Total variation is given by the formula
Σ(yi - ȳ)²
2. Explained variation is given by the formula
Σ(ŷi - ȳ)²
3. Unexplained variation is given by the formula
Σ(yi - ŷi)²
4. Total variation is the sum of explained and
unexplained variation
This section can be read
anytime after reading Section
14.1
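The three variations lead directly to the coefficients named in the section title. The sketch below assumes the usual definitions: R² = (explained variation)/(total variation), and adjusted R² = (R² - k/(n - 1)) × (n - 1)/(n - (k + 1)). The data are illustrative, and the fitted values are assumed to come from a least squares fit with an intercept, which is what makes total = explained + unexplained hold.

```python
def r_squared(y, y_hat, k):
    """Return (total, explained, unexplained, R^2, adjusted R^2)."""
    n = len(y)
    y_bar = sum(y) / n
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((fi - y_bar) ** 2 for fi in y_hat)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r2 = explained / total
    adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))
    return total, explained, unexplained, r2, adj_r2


# Illustrative values (hypothetical).
y = [11, 11, 11, 17, 17, 17]
y_hat = [10, 12, 13, 15, 16, 18]  # assumed least squares fitted values, k = 2
total, explained, unexplained, r2, adj_r2 = r_squared(y, y_hat, k=2)
print(total, explained, unexplained)      # 54.0 42.0 12
print(round(r2, 4), round(adj_r2, 4))     # 0.7778 0.6296
```

Adjusted R² is smaller than R² here because it penalizes the two independent variables relative to only six observations.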
LO 4: Test the
significance of a
multiple regression
model by using an F
test.


14.4 The Overall F Test
To test
H0: β1 = β2 = … = βk = 0 versus
Ha: At least one of β1, β2,…, βk ≠ 0
The test statistic is
F(model) = [(Explained variation)/k] / [(Unexplained variation)/(n - (k + 1))]
Reject H0 in favor of Ha if F(model) > Fα or
p-value < α, where Fα is based on k numerator and
n - (k + 1) denominator degrees of freedom
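A minimal sketch of the overall F test as a calculation. The explained and unexplained variations are illustrative numbers (hypothetical, for n = 6 and k = 2), and the critical value 9.55 is the tabled F point for α = 0.05 with 2 and 3 degrees of freedom, entered by hand rather than computed.

```python
def overall_f(explained, unexplained, n, k):
    """F(model) = (explained / k) / (unexplained / (n - (k + 1)))."""
    return (explained / k) / (unexplained / (n - (k + 1)))


# Illustrative values (hypothetical).
f_model = overall_f(explained=42.0, unexplained=12.0, n=6, k=2)
f_crit = 9.55  # F table value, alpha = 0.05, 2 numerator and 3 denominator df
reject_h0 = f_model > f_crit
print(f_model, reject_h0)  # 5.25 False
```

With so few observations the denominator has only 3 degrees of freedom, so the critical value is large and H0 is not rejected even though the model explains most of the variation.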
LO 5: Test the
significance of a single
independent variable.



14.5 Testing the Significance
of an Independent Variable
A variable in a multiple regression model is
not likely to be useful unless there is a
significant relationship between it and y
To test significance, we use the null
hypothesis H0: βj = 0
versus the alternative hypothesis
Ha: βj ≠ 0
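The slides state the hypotheses but not the test statistic. The sketch below assumes the usual one, t = bj / s_bj, where s_bj is the standard error of the estimate bj, compared against a t point with n - (k + 1) degrees of freedom. The estimates and standard errors are illustrative (hypothetical), and the critical value 3.182 is the tabled t point for α/2 = 0.025 with 3 degrees of freedom.

```python
def t_statistic(b_j, se_b_j):
    """t = b_j / s_bj for testing H0: beta_j = 0."""
    return b_j / se_b_j


# Illustrative estimates (hypothetical): name -> (b_j, standard error of b_j).
estimates = {"x1": (3.0, 1.0), "x2": (2.0, 1.633)}
t_crit = 3.182  # t table value, alpha/2 = 0.025, n - (k + 1) = 3 df

results = {}
for name, (b_j, se) in estimates.items():
    t = t_statistic(b_j, se)
    results[name] = (t, abs(t) > t_crit)  # (statistic, reject H0?)
    print(name, round(t, 3), results[name][1])
# x1 3.0 False
# x2 1.225 False
```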
LO 6: Find and interpret
a confidence interval for
a mean value and a
prediction interval for an
individual value.

14.6 Confidence and
Prediction Intervals
The point on the regression line corresponding to
particular values x01, x02,…, x0k of the independent
variables is
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
It is unlikely that this value will equal the mean value
of y for these x values
Therefore, we need to place bounds on how far the
predicted value might be from the actual value
We can do this by calculating a confidence interval
for the mean value of y and a prediction interval for
an individual value of y
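The slides do not give the interval formulas. The sketch below assumes the usual forms: the confidence interval for the mean value is ŷ ± t × s × √(distance value), and the prediction interval for an individual value is ŷ ± t × s × √(1 + distance value). The distance value is x0'(X'X)⁻¹x0, which is normally read from software output; here it is simply supplied, along with illustrative (hypothetical) estimates, standard error, and a tabled t point.

```python
import math


def intervals(y_hat, s, t_crit, distance_value):
    """Return (confidence interval, prediction interval) as (low, high) pairs."""
    ci_half = t_crit * s * math.sqrt(distance_value)
    pi_half = t_crit * s * math.sqrt(1 + distance_value)
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Illustrative values (hypothetical).
b = [10.0, 3.0, 2.0]  # b0, b1, b2
x0 = [1.0, 1.0]       # x01, x02
y_hat = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x0))  # point estimate, 15.0
ci, pi = intervals(y_hat, s=2.0, t_crit=3.182, distance_value=1 / 3)
print([round(v, 2) for v in ci])  # [11.33, 18.67]
print([round(v, 2) for v in pi])  # [7.65, 22.35]
```

The prediction interval is always wider than the confidence interval because it also accounts for the error term of the individual observation.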
LO 7: Use dummy
variables to model
qualitative independent
variables.


14.8 Using Dummy Variables to
Model Qualitative Independent
Variables
So far, we have only looked at including quantitative
data in a regression model
However, we may wish to include descriptive
qualitative data as well
For example, we might want to include the gender of
respondents
We can model the effects of different levels of a
qualitative variable by using what are called dummy
variables
Also known as indicator variables
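A minimal sketch of how a qualitative variable can be coded as dummy variables. It assumes the common convention of one 0/1 column per level except a chosen baseline level, so a variable with m levels yields m - 1 dummies. The helper name dummy_columns and the sample values are hypothetical.

```python
def dummy_columns(values, baseline):
    """Return {level: 0/1 list} for every level except the baseline."""
    levels = sorted(set(values) - {baseline})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


# Two levels -> one dummy variable.
gender = ["F", "M", "M", "F", "M"]
gender_dummies = dummy_columns(gender, baseline="M")
print(gender_dummies)  # {'F': [1, 0, 0, 1, 0]}

# Three levels -> two dummy variables.
region = ["east", "west", "north", "east"]
region_dummies = dummy_columns(region, baseline="east")
print(region_dummies)  # {'north': [0, 0, 1, 0], 'west': [0, 1, 0, 0]}
```

Each dummy column then enters the model like any other x variable, and its coefficient measures the difference in mean y between that level and the baseline.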
LO 8: Test the
significance of a portion
of a regression model
by using an F test.



14.9 The Partial F Test: Testing the
Significance of a Portion of a
Regression Model
So far, we have looked at testing single slope
coefficients using the t test
We have also looked at testing all the slope
coefficients at once using the overall F test
The partial F test allows us to test the
significance of any set of independent
variables in a regression model
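The slides do not show the partial F statistic. The sketch below assumes the usual form: with a complete model using k independent variables and a reduced model using g of them, F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))], where SSE_R and SSE_C are the unexplained variations of the reduced and complete models. All numbers are illustrative (hypothetical), and the critical value 10.13 is the tabled F point for α = 0.05 with 1 and 3 degrees of freedom.

```python
def sse(y, y_hat):
    """Unexplained variation: sum of squared residuals."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def partial_f(sse_reduced, sse_complete, n, k, g):
    """Partial F statistic for dropping k - g variables from the complete model."""
    return ((sse_reduced - sse_complete) / (k - g)) / (sse_complete / (n - (k + 1)))


# Illustrative values (hypothetical): complete model uses x1 and x2, reduced uses x1 only.
y = [11, 11, 11, 17, 17, 17]
fitted_complete = [10, 12, 13, 15, 16, 18]  # assumed least squares fit, k = 2
fitted_reduced = [11, 11, 14, 14, 17, 17]   # assumed least squares fit, g = 1

f_partial = partial_f(sse(y, fitted_reduced), sse(y, fitted_complete), n=6, k=2, g=1)
f_crit = 10.13  # F table value, alpha = 0.05, 1 numerator and 3 denominator df
print(f_partial, f_partial > f_crit)  # 1.5 False
```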
LO 9: Use residual
analysis to check the
assumptions of multiple
regression.

14.10 Residual Analysis
in Multiple Regression
For an observed value yi, the residual is
ei = yi - ŷi = yi - (b0 + b1xi1 + … + bkxik)

If the regression assumptions hold, the
residuals should look like a random sample
from a normal distribution with mean 0 and
variance σ²
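A minimal sketch of computing residuals and making two quick numeric checks, using illustrative (hypothetical) data and fitted values that are assumed to come from a least squares fit with k = 2. In practice the residuals would also be plotted against the fitted values and each x variable; that step is omitted here.

```python
def residuals(y, y_hat):
    """e_i = y_i - y_hat_i for each observation."""
    return [yi - fi for yi, fi in zip(y, y_hat)]


# Illustrative values (hypothetical).
y = [11, 11, 11, 17, 17, 17]
y_hat = [10, 12, 13, 15, 16, 18]  # assumed least squares fitted values
k = 2

e = residuals(y, y_hat)
mean_e = sum(e) / len(e)                              # 0 for a least squares fit with an intercept
s2 = sum(ei ** 2 for ei in e) / (len(e) - (k + 1))    # point estimate of sigma^2
print(e)           # [1, -1, -2, 2, 1, -1]
print(mean_e, s2)  # 0.0 4.0
```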