Predictions from the Multiple Regression Models


Chapter 13
Multiple Regression
Multiple Regression Model
Multiple regression enables us to determine the simultaneous effect of several independent variables on a dependent variable using the least squares principle.
$Y = f(X_1, X_2, \ldots, X_K)$
Multiple Regression Objectives
Multiple regression provides two important results:
1. A linear equation that predicts the dependent variable, Y, as a function of K independent variables, $x_{ji}$, j = 1, . . ., K:
$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki}$
2. The marginal change in the dependent variable, Y, associated with a change in each independent variable, measured by the partial coefficients, the $b_j$'s. In multiple regression these partial coefficients depend on which other variables are included in the model. The coefficient $b_j$ indicates the change in Y given a unit change in $x_j$ while controlling for the simultaneous effect of the other independent variables. (In some problems both results are equally important; usually, however, one will predominate.)
Multiple Regression Model
(Example 11.1)
Year   Revenue   Number of Offices   Profit Margin
 1      3.92          7298               0.75
 2      3.61          6855               0.71
 3      3.32          6636               0.66
 4      3.07          6506               0.61
 5      3.06          6450               0.7
 6      3.11          6402               0.72
 7      3.21          6368               0.77
 8      3.26          6340               0.74
 9      3.42          6349               0.9
10      3.42          6352               0.82
11      3.45          6361               0.75
12      3.58          6369               0.77
13      3.66          6546               0.78
14      3.78          6672               0.84
15      3.82          6890               0.79
16      3.97          7115               0.7
17      4.07          7327               0.68
18      4.25          7546               0.72
19      4.41          7931               0.55
20      4.49          8097               0.63
21      4.7           8468               0.56
22      4.58          8717               0.41
23      4.69          8991               0.51
24      4.71          9179               0.47
25      4.78          9318               0.32
Multiple Regression Model
POPULATION MULTIPLE REGRESSION MODEL
The population multiple regression model defines the relationship between a dependent or endogenous variable, Y, and a set of independent or exogenous variables, $x_j$, j = 1, . . ., K. The $x_{ji}$'s are assumed to be fixed numbers and Y is a random variable, defined for each observation, i, where i = 1, . . ., n and n is the number of observations. The model is defined as
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
where the $\beta_j$'s are constant coefficients and the $\varepsilon_i$'s are random variables with mean 0 and variance $\sigma^2$.
Standard Multiple Regression
Assumptions
The population multiple regression model is
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
and we assume that n sets of observations are available. The
following standard assumptions are made for the model.
1. The x's are fixed numbers, or they are realizations of random variables, $X_{ji}$, that are independent of the error terms, the $\varepsilon_i$'s. In the latter case, inference is carried out conditionally on the observed values of the $x_{ji}$'s.
2. The error terms are random variables with mean 0 and the same variance, $\sigma^2$. The latter property is called homoscedasticity, or uniform variance:
$E[\varepsilon_i] = 0$ and $E[\varepsilon_i^2] = \sigma^2$ for $i = 1, \ldots, n$
Standard Multiple Regression
Assumptions
(continued)
3. The random error terms, $\varepsilon_i$, are not correlated with one another, so that
$E[\varepsilon_i \varepsilon_j] = 0$ for all $i \neq j$
4. It is not possible to find a set of numbers, $c_0, c_1, \ldots, c_K$, not all 0, such that
$c_0 + c_1 x_{1i} + c_2 x_{2i} + \cdots + c_K x_{Ki} = 0$
This is the property of no linear relation among the $X_j$'s.
Least Squares Estimation and the
Sample Multiple Regression
We begin with a sample of n observations $(x_{1i}, x_{2i}, \ldots, x_{Ki}, y_i)$, i = 1, . . ., n, measured for a process whose population multiple regression model is
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
The least-squares procedure obtains the estimates $b_0, b_1, \ldots, b_K$ of the coefficients $\beta_0, \beta_1, \ldots, \beta_K$ as the values for which the sum of squared deviations
$SSE = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - \cdots - b_K x_{Ki})^2$
is a minimum.
The resulting equation
$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki}$
is the sample multiple regression of Y on $X_1, X_2, \ldots, X_K$.
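As a rough illustration, the following Python sketch (assuming NumPy, which is not part of the slides; variable names are my own choices) fits the sample multiple regression for the Example 11.1 data by least squares.

```python
# A minimal sketch: least squares fit of profit margin on revenue and
# number of offices for the Example 11.1 data, using NumPy.
import numpy as np

revenue = np.array([3.92, 3.61, 3.32, 3.07, 3.06, 3.11, 3.21, 3.26, 3.42, 3.42,
                    3.45, 3.58, 3.66, 3.78, 3.82, 3.97, 4.07, 4.25, 4.41, 4.49,
                    4.70, 4.58, 4.69, 4.71, 4.78])
offices = np.array([7298, 6855, 6636, 6506, 6450, 6402, 6368, 6340, 6349, 6352,
                    6361, 6369, 6546, 6672, 6890, 7115, 7327, 7546, 7931, 8097,
                    8468, 8717, 8991, 9179, 9318], dtype=float)
margin = np.array([0.75, 0.71, 0.66, 0.61, 0.70, 0.72, 0.77, 0.74, 0.90, 0.82,
                   0.75, 0.77, 0.78, 0.84, 0.79, 0.70, 0.68, 0.72, 0.55, 0.63,
                   0.56, 0.41, 0.51, 0.47, 0.32])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(margin), revenue, offices])

# The least squares estimates b0, b1, b2 minimize SSE.
b, _, _, _ = np.linalg.lstsq(X, margin, rcond=None)
print(b)  # should be close to (1.5645, 0.2372, -0.000249), as in the slide output
```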
Multiple Regression Analysis for
Profit Margin Analysis
(Using Example 11.1)
The regression equation is:
Y Profit Margin = 1.56 + 0.237 X1 Revenue - 0.00025 X2 Number of Offices

Regression Statistics
  Multiple R           0.930212915
  R Square             0.865296068
  Adjusted R Square    0.853050256
  Standard Error       0.053302217
  Observations         25

ANOVA
                df    SS           MS            F             Significance F
  Regression     2    0.40151122   0.20075561    70.66057082   2.64962E-10
  Residual      22    0.06250478   0.002841126
  Total         24    0.464016

                            Coefficients    Standard Error   t Stat         P-value
  Intercept (b0)             1.564496771    0.079395981      19.70498685    1.81733E-15
  Revenue (b1)               0.237197475    0.055559366       4.269261695   0.000312567
  Number of Offices (b2)    -0.000249079    3.20485E-05      -7.771949195   9.50879E-08
Sum of Squares Decomposition and the
Coefficient of Determination
Given the multiple regression model fitted by least squares
$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_K x_{Ki} + e_i = \hat{y}_i + e_i$
where the $b_j$'s are the least squares estimates of the coefficients of the population regression model and the $e_i$'s are the residuals from the estimated regression model, the model variability can be partitioned into the components
$SST = SSR + SSE$
where the total sum of squares is
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Sum of Squares Decomposition and the
Coefficient of Determination
(continued)
Error sum of squares:
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$
Regression sum of squares:
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
This decomposition can be interpreted as
Total sample variability = Explained variability + Unexplained variability
Sum of Squares Decomposition and the
Coefficient of Determination
(continued)
The coefficient of determination, $R^2$, of the fitted regression is defined as the proportion of the total sample variability explained by the regression:
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
and it follows that
$0 \leq R^2 \leq 1$
Estimation of Error Variance
Given the population regression model
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
and the standard regression assumptions, let $\sigma^2$ denote the common variance of the error terms $\varepsilon_i$. Then an unbiased estimate of that variance is
$s_e^2 = \dfrac{SSE}{n - K - 1} = \dfrac{\sum_{i=1}^{n} e_i^2}{n - K - 1}$
The square root of the variance, $s_e$, is also called the standard error of the estimate.
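A quick check of this formula with the Example 11.1 values (n = 25, K = 2, SSE from the ANOVA table); plain Python arithmetic.

```python
# Error-variance estimate and standard error of the estimate for Example 11.1.
SSE = 0.06250478
n, K = 25, 2
s2 = SSE / (n - K - 1)   # unbiased estimate of sigma^2
se = s2 ** 0.5           # standard error of the estimate
print(round(se, 4))      # about 0.0533, matching the slide output
```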
Multiple Regression Analysis for
Profit Margin Analysis
(Using Example 11.1)
The regression equation is:
Y Profit Margin = 1.56 + 0.237 X1 Revenue - 0.00025 X2 Number of Offices
From the regression output (shown earlier): R² = 0.865296068 (R Square), standard error of the estimate sₑ = 0.053302217, SSR = 0.40151122 (Regression SS), and SSE = 0.06250478 (Residual SS).
Adjusted Coefficient of
Determination
The adjusted coefficient of determination, $\bar{R}^2$, is defined as
$\bar{R}^2 = 1 - \dfrac{SSE/(n - K - 1)}{SST/(n - 1)}$
We use this measure to correct for the fact that non-relevant independent variables will still produce some small reduction in the error sum of squares. Thus the adjusted $R^2$ provides a better comparison between multiple regression models with different numbers of independent variables.
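A quick check of the adjusted $R^2$ formula with the Example 11.1 values; plain Python arithmetic.

```python
# Adjusted R^2 for Example 11.1 (n = 25 observations, K = 2 regressors).
SSE, SST = 0.06250478, 0.464016
n, K = 25, 2
R2_adj = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))
print(round(R2_adj, 4))   # about 0.8531, as reported in the regression output
```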
Coefficient of Multiple Correlation
The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable,
$R = \mathrm{Corr}(\hat{Y}, Y) = \sqrt{R^2}$
and is equal to the square root of the multiple coefficient of determination. We use R as another measure of the strength of the linear relationship between the dependent variable and the independent variables. Thus it is comparable to the correlation between Y and X in simple regression.
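A one-line check with the Example 11.1 value of $R^2$ from the regression output.

```python
# Multiple correlation R as the square root of R^2 for Example 11.1.
R2 = 0.865296068
R = R2 ** 0.5
print(round(R, 4))   # about 0.9302, the "Multiple R" in the regression output
```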
Basis for Inference About the
Population Regression Parameters
Let the population regression model be
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
Let $b_0, b_1, \ldots, b_K$ be the least squares estimates of the population parameters and $s_{b_0}, s_{b_1}, \ldots, s_{b_K}$ be the estimated standard deviations of the least squares estimators. Then if the standard regression assumptions hold and if the error terms $\varepsilon_i$ are normally distributed, the random variables corresponding to
$t_{b_j} = \dfrac{b_j - \beta_j}{s_{b_j}} \quad (j = 1, 2, \ldots, K)$
are distributed as Student's t with (n - K - 1) degrees of freedom.
Confidence Intervals for Partial
Regression Coefficients
If the regression errors $\varepsilon_i$ are normally distributed and the standard regression assumptions hold, the 100(1 - α)% confidence intervals for the partial regression coefficients $\beta_j$ are given by
$b_j - t_{(n-K-1,\,\alpha/2)}\, s_{b_j} < \beta_j < b_j + t_{(n-K-1,\,\alpha/2)}\, s_{b_j}$
where $t_{(n-K-1,\,\alpha/2)}$ is the number for which
$P(t_{(n-K-1)} > t_{(n-K-1,\,\alpha/2)}) = \alpha/2$
and the random variable $t_{(n-K-1)}$ follows a Student's t distribution with (n - K - 1) degrees of freedom.
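A sketch of a 95% confidence interval for the Revenue coefficient in Example 11.1, using the reported estimate and standard error and assuming SciPy (not part of the slides) for the Student's t quantile.

```python
# 95% confidence interval for beta_1 (Revenue) in Example 11.1.
from scipy import stats

b1, sb1 = 0.237197475, 0.055559366
n, K, alpha = 25, 2, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K - 1)
lo, hi = b1 - t_crit * sb1, b1 + t_crit * sb1
print(round(lo, 4), round(hi, 4))   # roughly (0.122, 0.352)
```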
Multiple Regression Analysis for
Profit Margin Analysis
(Using Example 11.1)
The regression equation is:
Y Profit Margin = 1.56 + 0.237 X1 Revenue - 0.00025 X2 Number of Offices
From the regression output (shown earlier), the coefficient t statistics are $t_{b_1}$ = 4.269 for Revenue and $t_{b_2}$ = -7.772 for Number of Offices, with p-values 0.000312567 and 9.50879E-08.
Tests of Hypotheses for the Partial
Regression Coefficients
If the regression errors $\varepsilon_i$ are normally distributed and the standard least squares assumptions hold, the following tests have significance level α:
1. To test either null hypothesis
$H_0: \beta_j = \beta^*$ or $H_0: \beta_j \leq \beta^*$
against the alternative
$H_1: \beta_j > \beta^*$
the decision rule is
Reject $H_0$ if $\dfrac{b_j - \beta^*}{s_{b_j}} \geq t_{n-K-1,\,\alpha}$
Tests of Hypotheses for the Partial
Regression Coefficients
(continued)
2. To test either null hypothesis
$H_0: \beta_j = \beta^*$ or $H_0: \beta_j \geq \beta^*$
against the alternative
$H_1: \beta_j < \beta^*$
the decision rule is
Reject $H_0$ if $\dfrac{b_j - \beta^*}{s_{b_j}} \leq -t_{n-K-1,\,\alpha}$
Tests of Hypotheses for the Partial
Regression Coefficients
(continued)
3. To test the null hypothesis
$H_0: \beta_j = \beta^*$
against the two-sided alternative
$H_1: \beta_j \neq \beta^*$
the decision rule is
Reject $H_0$ if $\dfrac{b_j - \beta^*}{s_{b_j}} \geq t_{n-K-1,\,\alpha/2}$ or $\dfrac{b_j - \beta^*}{s_{b_j}} \leq -t_{n-K-1,\,\alpha/2}$
Test on All the Parameters of a
Regression Model
Consider the multiple regression model
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
To test the null hypothesis
$H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0$
against the alternative hypothesis
$H_1:$ at least one $\beta_j \neq 0$
at a significance level α we can use the decision rule
Reject $H_0$ if $F_{K,\,n-K-1} = \dfrac{SSR/K}{s_e^2} \geq F_{K,\,n-K-1,\,\alpha}$
where $F_{K,\,n-K-1,\,\alpha}$ is the critical value of F from Table 7 in the appendix, for which
$P(F_{K,\,n-K-1} > F_{K,\,n-K-1,\,\alpha}) = \alpha$
The computed $F_{K,\,n-K-1}$ follows an F distribution with numerator degrees of freedom K and denominator degrees of freedom (n - K - 1).
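A sketch of the overall F test for Example 11.1, assuming SciPy for the F distribution tail probability.

```python
# Overall F test for Example 11.1: H0: beta_1 = beta_2 = 0.
from scipy import stats

SSR, SSE = 0.40151122, 0.06250478
n, K = 25, 2
s2 = SSE / (n - K - 1)                    # error variance estimate
F = (SSR / K) / s2                        # about 70.7
p_value = stats.f.sf(F, K, n - K - 1)     # "Significance F" in the output
print(round(F, 2), p_value)
```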
Test on a Subset of the Regression
Parameters
Consider the multiple regression model
$Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \alpha_1 z_{1i} + \cdots + \alpha_r z_{ri} + \varepsilon_i$
To test the null hypothesis
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$
that a subset of regression parameters are simultaneously equal to 0, against the alternative hypothesis
$H_1:$ at least one $\alpha_j \neq 0 \ (j = 1, \ldots, r)$
Test on a Subset of the Regression
Parameters
(continued)
we compare the error sum of squares for the complete model with the error sum of squares for the restricted model. First run a regression for the complete model that includes all the independent variables and obtain SSE. Next run a restricted regression that excludes the Z variables, whose coefficients are the α's; the number of variables excluded is r. From this regression obtain the restricted error sum of squares SSE(r). Then compute the F statistic and apply the decision rule for a significance level α:
Reject $H_0$ if $F = \dfrac{(SSE(r) - SSE)/r}{s_e^2} \geq F_{r,\,n-K-r-1,\,\alpha}$
Predictions from the Multiple
Regression Models
Given that the population regression model
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + \varepsilon_i \quad (i = 1, 2, \ldots, n)$
holds and that the standard regression assumptions are valid, let $b_0, b_1, \ldots, b_K$ be the least squares estimates of the model coefficients $\beta_j$, j = 1, 2, . . ., K, based on the $(x_{1i}, x_{2i}, \ldots, x_{Ki}, y_i)$ (i = 1, 2, . . ., n) data points. Then, given a new observation of a data point, $(x_{1,n+1}, x_{2,n+1}, \ldots, x_{K,n+1})$, the best linear unbiased forecast of $Y_{n+1}$ is
$\hat{y}_{n+1} = b_0 + b_1 x_{1,n+1} + b_2 x_{2,n+1} + \cdots + b_K x_{K,n+1}$
It is very risky to obtain forecasts based on X values outside the range of the data used to estimate the model coefficients, because we do not have data evidence to support the linear model at those points.
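A sketch of a point forecast from the fitted Example 11.1 model for a hypothetical new observation; the new x values are my own choices, kept within the range of the estimation data.

```python
# Point forecast from the Example 11.1 coefficients for a hypothetical
# new observation (revenue = 4.5, offices = 7500).
b0, b1, b2 = 1.564496771, 0.237197475, -0.000249079
x1_new, x2_new = 4.5, 7500.0          # hypothetical new X values
y_hat = b0 + b1 * x1_new + b2 * x2_new
print(round(y_hat, 3))                # forecast profit margin, roughly 0.76
```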
Quadratic Model Transformations
The quadratic function
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \varepsilon$
can be transformed into a linear multiple regression model by defining new variables
$z_1 = x_1, \quad z_2 = x_1^2$
and then specifying the model as
$Y_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i} + \varepsilon_i$
which is linear in the transformed variables. Transformed quadratic variables can be combined with other variables in a multiple regression model. Thus we could fit a multiple quadratic regression using transformed variables.
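A minimal sketch of the quadratic transformation, assuming NumPy; the x and y arrays are made-up illustrative data.

```python
# Quadratic transformation: define z1 = x and z2 = x^2, then fit a linear
# regression in the transformed variables.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 7.2, 12.1, 18.8, 27.5])   # roughly quadratic in x

z1, z2 = x, x ** 2
Z = np.column_stack([np.ones_like(x), z1, z2])
b, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
print(b)   # estimates of beta0, beta1, beta2 in Y = beta0 + beta1*z1 + beta2*z2
```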
Exponential Model Transformations
Coefficients for exponential models of the form
$Y = \beta_0\, X_1^{\beta_1}\, X_2^{\beta_2}\, \varepsilon$
can be estimated by first taking the logarithm of both sides to obtain an equation that is linear in the logarithms of the variables:
$\log(Y) = \log(\beta_0) + \beta_1 \log(X_1) + \beta_2 \log(X_2) + \log(\varepsilon)$
Using this form we can regress the logarithm of Y on the logarithms of the two X variables and obtain estimates of the coefficients $\beta_1$ and $\beta_2$ directly from the regression analysis. Note that this estimation procedure requires that the random errors be multiplicative in the original exponential model. Thus the error term, $\varepsilon$, is expressed as a percentage increase or decrease instead of the addition or subtraction of a random error, as we have seen for linear regression models.
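A minimal sketch of the log-log regression, assuming NumPy; the data arrays are made-up illustrative values.

```python
# Log transformation for the exponential model Y = beta0 * X1^beta1 * X2^beta2 * eps:
# regress log(Y) on log(X1) and log(X2).
import numpy as np

x1 = np.array([1.2, 2.5, 3.1, 4.8, 6.0, 7.5])
x2 = np.array([10.0, 12.0, 9.0, 15.0, 20.0, 18.0])
y  = np.array([3.5, 6.1, 6.8, 11.0, 14.9, 15.8])

X = np.column_stack([np.ones_like(y), np.log(x1), np.log(x2)])
b, _, _, _ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(b)   # b[0] estimates log(beta0); b[1], b[2] estimate beta1, beta2 directly
```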
Dummy Variable Regression Analysis
The relationship between Y and X1
$Y = \beta_0 + \beta_1 X_1 + \varepsilon$
can shift in response to a changed condition. The shift effect can be estimated by using a dummy variable, which has values of 0 (condition not present) and 1 (condition present). All of the observations from one set of data have dummy variable $X_2 = 1$, and the observations for the other set of data have $X_2 = 0$. In these cases the relationship between Y and $X_1$ is specified by the regression model
$\hat{y} = b_0 + b_2 x_2 + b_1 x_1$
Dummy Variable Regression Analysis
(continued)
The functions for each set of points are
$\hat{Y} = b_0 + b_1 x_1$ when $X_2 = 0$
and
$\hat{Y} = (b_0 + b_2 x_2) + b_1 x_1$ when $X_2 = 1$
In the first function the constant is $b_0$, while in the second the constant is $b_0 + b_2$. Dummy variables are also called indicator variables.
Dummy Variable Regression for
Differences in Slope
To determine whether there are significant differences in the slope between two discrete conditions, we need to expand our regression model to a more complex form:
$\hat{y} = b_0 + b_2 x_2 + (b_1 + b_3 x_2) x_1$
Now we see that the slope coefficient of $x_1$ contains two components, $b_1$ and $b_3 x_2$. When $x_2$ equals 0, the slope estimate is the usual $b_1$. However, when $x_2$ equals 1, the slope is equal to the algebraic sum $b_1 + b_3$. To estimate the model we actually need to multiply the variables to create a new set of transformed variables that are linear. Therefore the model actually used for the estimation is
$\hat{y} = b_0 + b_2 x_2 + b_1 x_1 + b_3 x_1 x_2$
Dummy Variable Regression for
Differences in Slope
(continued)
The resulting regression model is now linear with three variables. The new variable $x_1 x_2$ is often called an interaction variable. Note that when the dummy variable $x_2 = 0$ this variable has a value of 0, but when $x_2 = 1$ this variable has the value of $x_1$. The coefficient $b_3$ is an estimate of the difference in the coefficient of $x_1$ when $x_2 = 1$ compared to when $x_2 = 0$. Thus the t statistic for $b_3$ can be used to test the hypotheses
$H_0: \beta_3 = 0 \mid \beta_1 \neq 0, \beta_2 \neq 0$
$H_1: \beta_3 \neq 0 \mid \beta_1 \neq 0, \beta_2 \neq 0$
If we reject the null hypothesis, we conclude that there is a difference in the slope coefficient for the two subgroups. In many cases we will be interested in both the difference in the constant and the difference in the slope and will test both of the hypotheses presented in this section.
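A minimal sketch of the interaction (slope-dummy) model, assuming NumPy; the data are made-up illustrative values.

```python
# Slope-dummy model: the regressor x1*x2 lets the slope of x1 differ between
# the two groups; b3 estimates that difference.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # dummy variable
y  = np.array([2.0, 3.1, 4.0, 5.2, 2.3, 4.4, 6.6, 8.5])

X = np.column_stack([np.ones_like(y), x2, x1, x1 * x2])   # interaction term last
b0, b2, b1, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
print(b1, b1 + b3)   # slope of x1 when x2 = 0 versus when x2 = 1
```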
Key Words
• Adjusted Coefficient of Determination
• Basis for Inference About the Population Regression Parameters
• Coefficient of Multiple Determination
• Confidence Intervals for Partial Regression Coefficients
• Dummy Variable Regression Analysis
• Dummy Variable Regression for Differences in Slope
• Estimation of Error Variance
• Least Squares Estimation and the Sample Multiple Regression
• Prediction from Multiple Regression Models
• Quadratic Model Transformations
Key Words
(continued)
• Regression Objectives
• Standard Error of the Estimate
• Standard Multiple Regression Assumptions
• Sum of Squares Decomposition and the Coefficient of Determination
• Test on a Subset of the Regression Parameters
• Test on All the Parameters of a Regression Model
• Tests of Hypotheses for the Partial Regression Coefficients
• The Population Multiple Regression Model