403: Quantitative Business Analysis for Decision Making


Quantitative Business Analysis for Decision Making
Multiple Linear Regression Analysis
Outline
• Multiple Regression Model
• Estimation
• Testing Significance of Predictors
• Multicollinearity
• Selection of Predictors
• Diagnostic Plots

403.8
Multiple Regression Model
Multiple linear regression model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
$\beta_1, \beta_2, \ldots, \beta_k$ are the slope coefficients of $X_1, X_2, \ldots, X_k$.
$\beta_i$ quantifies the amount of change in the response $Y$ for a unit change in $X_i$ when all other predictors are held fixed.
Multiple Regression Model (cont'd)
In the model,
$\mu_y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$
is the mean of $Y$.
– $\varepsilon$ contributes to the variation in $Y$ values from their mean $\mu_y$, and
– $\varepsilon$ is assumed normally distributed with mean 0 and standard deviation $\sigma$.
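The two parts of the model (the mean and the random error) can be made concrete by simulating data from it. The following is a minimal sketch using NumPy; the coefficient values, the value of sigma, the sample size, and the predictor range are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 200, 2                        # illustrative sample size and number of predictors
beta = np.array([5.0, 2.0, -1.0])    # hypothetical beta_0, beta_1, beta_2
sigma = 1.5                          # hypothetical standard deviation of epsilon

X = rng.uniform(0, 10, size=(n, k))      # predictor values X1, X2
mu_y = beta[0] + X @ beta[1:]            # mean of Y for each unit
eps = rng.normal(0.0, sigma, size=n)     # error term: normal, mean 0, sd sigma
Y = mu_y + eps                           # observed response
```

Each observed Y differs from its mean only by the error term, which is what the model states.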
Sampling
A random sample of n units is taken. Then, for each unit, k+1 measurements are made: $Y, X_1, X_2, \ldots, X_k$.
A multivariate sample of size n:
Unit   Response Y   Predictor X1   Predictor X2   ...   Predictor Xk
1      Y1           X11            X12            ...   X1k
2      Y2           X21            X22            ...   X2k
...    ...          ...            ...            ...   ...
n      Yn           Xn1            Xn2            ...   Xnk
Estimated Model
The estimated multiple regression model is:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
Expressions for the $b_i$ are cumbersome to write. $\hat{Y}$ is an estimate of $\mu_y$.
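Although the expressions for the coefficients are cumbersome by hand, software computes them directly. The sketch below assumes ordinary least squares as the estimation method and uses NumPy's `np.linalg.lstsq`; the helper name `fit_mlr` and the small data set are hypothetical. The response is built without noise so the recovered coefficients can be checked exactly.

```python
import numpy as np

def fit_mlr(X, y):
    """Return least-squares estimates (b0, b1, ..., bk)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])    # column of ones for the intercept b0
    b, *_ = np.linalg.lstsq(A, y, rcond=None)    # minimizes the sum of squared residuals
    return b

# Hypothetical data: 5 units, k = 2 predictors, noise-free response
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

b = fit_mlr(X, y)
y_hat = b[0] + X @ b[1:]    # fitted values, the estimates of the mean of Y
```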
Standard Error
The sample standard deviation around the mean (the estimated regression model) is:
$s = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - k - 1}}$
It is an estimate of $\sigma$.
The standard error of $\hat{Y}$ (for specified values of the predictors) is denoted by $s_{\hat{y}}$.
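As a small worked example of the formula for s, the sketch below computes it from observed and fitted values. The function name `residual_std_error` and the numbers are hypothetical; in particular, the fitted values are made up for the arithmetic and do not come from an actual fit.

```python
import numpy as np

def residual_std_error(y, y_hat, k):
    """s = sqrt( sum (Y - Y_hat)^2 / (n - k - 1) ), with k predictors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)      # sum of squared residuals
    return np.sqrt(sse / (n - k - 1))

# Hypothetical observed and fitted values for n = 6 units and k = 2 predictors
y = np.array([3.0, 5.0, 7.0, 10.0, 11.0, 14.0])
y_hat = np.array([3.5, 4.5, 7.5, 9.5, 11.5, 13.5])

s = residual_std_error(y, y_hat, k=2)   # SSE = 1.5, n - k - 1 = 3, so s = sqrt(0.5)
```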
Testing Significance of a Predictor
For comparing $\beta_i$ with a reference value $\beta_{i0}$, the test statistic is:
$t = \dfrac{b_i - \beta_{i0}}{s_{b_i}}$
and for estimating $\beta_i$ by a confidence interval, compute
$b_i \pm t \, s_{b_i}$
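The test statistic and interval can be sketched numerically. The slides do not give a formula for the standard errors of the coefficients, so this example assumes the standard least-squares result (the square roots of the diagonal of s^2 (A'A)^-1, where A is the predictor matrix with an intercept column). The data are simulated, the helper name is hypothetical, and the critical value 2.052 is the tabulated two-sided 5% t value for 27 degrees of freedom, which matches n - k - 1 here.

```python
import numpy as np

def fit_with_standard_errors(X, y):
    """Least-squares coefficients and their standard errors (assumed standard OLS formula)."""
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape                                   # p = k + 1
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    s2 = resid @ resid / (n - p)                     # s^2 with n - k - 1 degrees of freedom
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return b, se

rng = np.random.default_rng(4)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(size=n)   # X2 has no real effect

b, se = fit_with_standard_errors(X, y)
t_stats = (b - 0.0) / se            # reference value beta_i0 = 0 for every coefficient
t_crit = 2.052                      # t table value, 27 degrees of freedom, 95% confidence
ci_low = b - t_crit * se
ci_high = b + t_crit * se
```

A coefficient whose interval excludes 0 (equivalently, whose |t| exceeds the critical value) would be judged significant at the 5% level.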
Coefficient of Determination
The coefficient of determination $R^2$ quantifies the percentage of variation in the Y-distribution that is accounted for by the predictors in the model.
– If $R^2$ = 80%, then 20% of the variation in the Y-distribution is due to factors other than those in the model.
– $R^2$ never decreases as predictors are added to the model, but each addition comes at the cost of complicating it.
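The behavior of R^2 when predictors are added can be checked on simulated data. This sketch assumes the usual definition R^2 = 1 - SSE/SST for a least-squares fit with an intercept; the function name `r_squared` and the data are hypothetical. The second predictor is pure noise, yet R^2 does not go down when it is included.

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - SSE/SST for a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ b) ** 2)          # variation left unexplained
    sst = np.sum((y - y.mean()) ** 2)       # total variation in Y
    return 1.0 - sse / sst

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
noise_col = rng.normal(size=n)              # unrelated to y by construction
y = 2.0 + 3.0 * x1 + rng.normal(size=n)

r2_one = r_squared(x1.reshape(-1, 1), y)
r2_two = r_squared(np.column_stack([x1, noise_col]), y)   # at least as large as r2_one
```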
Testing the Model for Significance
Null hypothesis: the predictors in the relationship have no predictive power to explain the variation in the Y-distribution.
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ vs. $H_1$: at least one $\beta_i \neq 0$
Test statistic:
$F = \dfrac{(n - k - 1) R^2}{k (1 - R^2)}$
Under $H_0$, it has an F-distribution with k and (n-k-1) degrees of freedom for the numerator and denominator, respectively.
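The F statistic is simple arithmetic once R^2, n, and k are known. The sketch below uses hypothetical values (R^2 = 0.80, n = 25, k = 4), giving F = (20 x 0.80) / (4 x 0.20) = 20 with 4 and 20 degrees of freedom.

```python
def f_statistic(r2, n, k):
    """F = (n - k - 1) R^2 / (k (1 - R^2)) for the overall significance test."""
    return (n - k - 1) * r2 / (k * (1.0 - r2))

# Hypothetical values: R^2 = 0.80 from a model with k = 4 predictors and n = 25 units
F = f_statistic(r2=0.80, n=25, k=4)   # about 20, up to floating-point rounding
df_num, df_den = 4, 25 - 4 - 1        # numerator and denominator degrees of freedom
```

The computed F would then be compared with the tabulated F value for (4, 20) degrees of freedom at the chosen significance level.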
Multicollinearity and Selection of Predictors
• Multicollinearity occurs when predictors are highly correlated among themselves. In its presence $R^2$ may be high, but the individual coefficient estimates are less reliable.
• A screening process (e.g. stepwise regression) can reduce multicollinearity by selecting only those predictors that are not strongly correlated among themselves.
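One way to spot highly correlated predictors before screening is to inspect their correlation matrix. The sketch below also computes variance inflation factors (VIF), a common diagnostic that is not mentioned in the slides and is included here only as an assumed illustration. The data are simulated so that the second predictor is nearly a copy of the first; the function name `vif` is hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor 1 / (1 - R_j^2) for each predictor column."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.delete(X, j, axis=1)                 # all predictors except column j
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2_j = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)    # nearly a copy of x1
x3 = rng.normal(size=n)               # unrelated to the other two
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)   # pairwise correlations among the predictors
vifs = vif(X)                         # large for x1 and x2, near 1 for x3
```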
Diagnostic Plots
• Residuals $e_i = Y_i - \hat{Y}_i$ are used to diagnose the validity of the model assumptions.
• A scatter plot of the residuals $e_i$ against the predicted values $\hat{Y}_i$ can serve as a diagnostic tool.
• A diagnostic plot can identify outliers, unequal variability, and the need for a transformation to achieve homogeneity of variance.
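The quantities behind such a plot can be computed as follows. This is a minimal sketch on simulated data with one outlier planted deliberately; the cutoff of three residual standard deviations for flagging outliers is an assumed rule of thumb, not something the slides prescribe. Plotting `e` against `y_hat` with any plotting library would give the residual plot itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2
X = rng.uniform(0, 10, size=(n, k))
y = 1.0 + 0.5 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 1.0, size=n)
y[0] += 15.0                                  # plant one outlier in the first unit

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b                                 # predicted values (horizontal axis)
e = y - y_hat                                 # residuals (vertical axis)

s = np.sqrt(e @ e / (n - k - 1))              # residual standard deviation
flagged = np.flatnonzero(np.abs(e) > 3 * s)   # units with unusually large residuals
```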
Indicator Variables
• Indicator variables (also called dummy variables) are numerical codes that are used to represent qualitative variables.
• For example, 0 for men and 1 for women.
• For a qualitative variable with c categories, (c-1) indicator variables need to be defined.
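The (c-1) rule can be illustrated with a variable that has three categories. In this sketch the function name `indicator_columns`, the variable `regions`, and its category labels are all hypothetical; one category is chosen as the baseline and receives 0 in every indicator column.

```python
import numpy as np

def indicator_columns(values, baseline):
    """Code a qualitative variable with c categories as c - 1 indicator (0/1) columns."""
    levels = sorted(set(values))
    others = [lv for lv in levels if lv != baseline]    # the c - 1 non-baseline categories
    cols = np.array([[1 if v == lv else 0 for lv in others] for v in values])
    return others, cols

# Hypothetical qualitative variable with c = 3 categories
regions = ["north", "south", "west", "south", "north"]
names, D = indicator_columns(regions, baseline="north")
# names is ["south", "west"]; D has one row per unit and c - 1 = 2 columns
```

The columns of `D` can then be used as predictors in the regression alongside the quantitative ones.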