403: Quantitative Business Analysis for Decision Making
Simple Linear Regression
Lecture Outline
Scatter Plots
Correlation Analysis
Simple Linear Regression Model
Estimation and Significance Testing
Coefficient of Determination
Confidence and Prediction Intervals
Analysis of Residuals
403.7
What Is Regression Analysis?
Regression analysis is used for modeling the mean of a "response" variable Y as a function of "predictor" variables $X_1, X_2, \ldots, X_k$.
When k = 1, it is called simple regression analysis.
Random Sample
Y: Response variable
X: Predictor variable
For each unit in a random sample of size n, the pair (X, Y) is observed, resulting in the random sample $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Scatter Plot
A scatter plot is a graphical display of the sample $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ as n points in two dimensions.
It suggests whether there is a relationship between X and Y.
A Scatter Plot Showing Linear Trend
[Figure: Peoples ratings (PeopleM, vertical axis) vs. Nielsen ratings (horizontal axis), showing a linear trend]
A Scatter Plot Showing No Linear Trend
[Figure: yesterday's DJIA (vertical axis) vs. today's DJIA (horizontal axis), both from -1 to 1, showing no linear trend]
Modeling a Linear Trend
A perfect linear relationship between Y and X exists if $Y = \alpha + \beta X$.
The coefficient of X, $\beta$, is the slope: it quantifies the change in Y corresponding to a one-unit change in X.
There are no perfect linear relationships in the practical world.
Simple Linear Regression Model
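Model:
$Y = \alpha + \beta X + \varepsilon$
$\alpha + \beta X$ is a linear function (nonrandom).
$\varepsilon$ is a random error. It is assumed to be normally distributed with mean 0 and standard deviation $\sigma$. So the mean of Y at a given x is $\mu_{y|x} = \alpha + \beta x$.
$\alpha$, $\beta$, and $\sigma$ are the parameters of the model.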
Estimation
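Simple linear regression analysis estimates the mean of Y (the linear trend) $\mu_{y|x} = \alpha + \beta x$ by $\hat{y} = a + bx$, where
$a = \bar{y} - b\bar{x}$
and
$b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$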
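The formulas for a and b translate directly into code. This is a sketch in plain Python; the name `least_squares` and the five data points are hypothetical, chosen so the arithmetic is easy to check by hand ($\bar{x} = 3$, $\bar{y} = 4$, $\sum (x - \bar{x})(y - \bar{y}) = 6$, $\sum (x - \bar{x})^2 = 10$, so b = 0.6 and a = 2.2).

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b = sum (x - x_bar)(y - y_bar) / sum (x - x_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    return a, b


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(f"y_hat = {a:.2f} + {b:.2f}x")  # y_hat = 2.20 + 0.60x
```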
Standard Deviation
The standard deviation s of the n points in the scatter plot around the estimated regression line $\hat{y} = a + bx$ is:
$s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
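A sketch of s on hypothetical five-point data. The divisor $n - 2$ reflects the two estimated parameters a and b. The helper names `fit_line` and `residual_sd` are illustrative. Here the squared residuals sum to 2.4, so $s = \sqrt{2.4/3} \approx 0.894$.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residual_sd(x, y):
    """s = sqrt( sum (y - y_hat)^2 / (n - 2) )."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))  # n - 2: a and b were estimated


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(residual_sd(x, y), 4))  # 0.8944
```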
Testing the Slope of Linear Trend
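For testing
$H_0: \beta = 0$ vs. $H_a: \beta \neq 0$,
compute the t-statistic and its p-value:
$t = \dfrac{b - 0}{s_b}$, where $s_b = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}}$ is the standard error of b.
Under $H_0$, the t-statistic follows a t distribution with $n - 2$ degrees of freedom.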
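A sketch of the slope test on the same hypothetical data, using the standard error of the slope $s_b = s/\sqrt{\sum (x - \bar{x})^2}$. The Python standard library has no t distribution, so the p-value step is shown only as a comment, assuming SciPy's `scipy.stats.t.sf` is available. The function name `slope_t_test` is hypothetical.

```python
import math


def slope_t_test(x, y):
    """Return (b, s_b, t, df) for testing H0: beta = 0 vs Ha: beta != 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    s_b = s / math.sqrt(sxx)      # standard error of the slope
    t = (b - 0) / s_b             # hypothesized slope is 0 under H0
    return b, s_b, t, n - 2


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, s_b, t, df = slope_t_test(x, y)
print(f"b = {b:.3f}, s_b = {s_b:.4f}, t = {t:.3f}, df = {df}")
# b = 0.600, s_b = 0.2828, t = 2.121, df = 3
# If SciPy is available, the two-sided p-value is:
#   from scipy import stats; p = 2 * stats.t.sf(abs(t), df)
```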
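Coefficient of Determination: $R^2$
$R^2$ quantifies how well the estimated model $\hat{y} = a + bx$ fits the data: it is the proportion of the total variation in Y explained by the model,
$R^2 = 1 - \dfrac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$.
Rule of thumb:
$R^2 > 85\%$: the model is considered adequate
$R^2 < 85\%$: the model is perceived as inadequate
A low $R^2$ suggests a need for additional predictors for modeling the mean of Y.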
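$R^2$ can be computed as $1 - \text{SSE}/\text{SST}$, where $\text{SSE} = \sum (y - \hat{y})^2$ and $\text{SST} = \sum (y - \bar{y})^2$. A sketch on hypothetical five-point data (SSE = 2.4, SST = 6), applying the 85% rule of thumb from this slide; `r_squared` is an illustrative name.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"R^2 = {r2:.0%}")  # R^2 = 60%
verdict = "adequate" if r2 > 0.85 else "inadequate"  # 85% rule of thumb
print(verdict)  # inadequate
```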
Correlation Coefficient: r
The correlation coefficient r is the square root of $R^2$, taking the sign of the slope b. It is a number between -1 and 1.
– The closer r is to -1 or 1, the stronger the linear trend.
– Its sign is positive for an increasing trend (slope b is positive).
– Its sign is negative for a decreasing trend (slope b is negative).
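The correlation coefficient can also be computed directly as $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, which has the same sign as the slope b and satisfies $r^2 = R^2$. A sketch with two hypothetical data sets, one increasing and one decreasing; `correlation` is an illustrative name.

```python
import math


def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # increasing trend: positive slope, positive r
y_down = [5, 4, 5, 4, 2]  # decreasing trend: negative slope, negative r
for y in (y_up, y_down):
    r = correlation(x, y)
    print(f"r = {r:.4f}, r^2 = {r * r:.2f}")
# r = 0.7746, r^2 = 0.60
# r = -0.7746, r^2 = 0.60
```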
Confidence and Prediction Intervals
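To estimate $\mu_{y|x}$ by a confidence interval, or to predict the response Y, at the predictor value $x = x_0$:
– 1. Compute $\hat{y} = a + b x_0$.
– 2. Compute $\hat{y} \pm t_{\alpha/2,\,n-2} \cdot \text{s.e.}(\hat{y})$, where $t_{\alpha/2,\,n-2}$ is the critical value of the t distribution with $n - 2$ degrees of freedom.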
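What Is s.e.($\hat{y}$), the Standard Error of $\hat{y}$?
For estimating $\mu_{y|x}$:
$\text{s.e.}(\hat{y}) = s\sqrt{\dfrac{1}{n} + \dfrac{(\bar{x} - x_0)^2}{\sum (x - \bar{x})^2}}$
For predicting Y:
$\text{s.e.}(\hat{y}) = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(\bar{x} - x_0)^2}{\sum (x - \bar{x})^2}}$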
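Putting the two steps together, a sketch of both intervals. Because the Python standard library has no t distribution, the critical value is passed in by the caller; 3.182 is the tabulated two-sided 95% value for 3 degrees of freedom. The function `interval_at`, the data, and $x_0 = 4$ are hypothetical.

```python
import math


def interval_at(x, y, x0, t_crit, predict=False):
    """Return (y_hat, lower, upper) at x = x0.

    predict=False: confidence interval for the mean response mu_{y|x0}.
    predict=True:  prediction interval for a single new Y at x0.
    t_crit is the t critical value with n - 2 degrees of freedom,
    supplied by the caller (e.g. from a t table).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a + b * x0                      # step 1
    extra = 1.0 if predict else 0.0         # the extra "1 +" term for prediction
    se = s * math.sqrt(extra + 1 / n + (x_bar - x0) ** 2 / sxx)
    return y_hat, y_hat - t_crit * se, y_hat + t_crit * se  # step 2


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_crit = 3.182  # two-sided 95% critical value, df = 3 (from a t table)
print(interval_at(x, y, 4, t_crit))                # confidence interval
print(interval_at(x, y, 4, t_crit, predict=True))  # prediction interval
```

The prediction interval is always wider than the confidence interval at the same $x_0$, because predicting a single Y must also account for the random error $\varepsilon$.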
Analysis of Residuals
Residuals are defined as:
$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n$
Residual analysis is used to check the normality and homogeneity-of-variance assumptions for the random errors $\varepsilon$.
A histogram or box plot of the residuals helps to ascertain whether the errors are normally distributed.
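A sketch of computing the residuals $e_i$ and a rough text histogram of them, standing in for a plotted histogram. With only five hypothetical points the histogram says little about normality; in practice more data are needed. For a least-squares line with an intercept the residuals always sum to zero, which is a handy check. The names `residuals` and `text_histogram` are illustrative.

```python
def residuals(x, y):
    """e_i = y_i - y_hat_i for the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def text_histogram(values, n_bins=4):
    """Count values into equal-width bins and draw each count as a row of '*'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    for k, c in enumerate(counts):
        left = lo + k * width
        print(f"{left:6.2f} to {left + width:6.2f}: {'*' * c}")
    return counts


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = residuals(x, y)
print([round(v, 2) for v in e])
print(f"sum of residuals = {sum(e):.1e}")  # essentially zero
text_histogram(e)
```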
Analysis of Residuals (cont.)
A plot of the residuals $e_i$ against the observed predictor values $x_i$ helps to ascertain whether the homogeneity assumption holds.
– Random appearance: the homogeneity-of-variance assumption is reasonable.
– Non-random appearance: the homogeneity assumption is not valid, and the variance depends on the predictor values.
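A sketch of a residual-versus-x plot drawn in text, with rows sorted by x and a vertical bar marking e = 0. A pattern such as a funnel (spread widening or narrowing as x grows) is the kind of non-random appearance described above. The data and the names `fit_residuals` and `residual_plot` are hypothetical.

```python
def fit_residuals(x, y):
    """Least-squares residuals e_i = y_i - (a + b*x_i)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def residual_plot(x, e, half_width=10):
    """Print one row per observation (sorted by x), placing '*' left of
    the '|' zero line for negative residuals and right of it for positive ones."""
    scale = max(abs(v) for v in e) or 1.0  # avoid dividing by zero if all e are 0
    rows = []
    for xi, ei in sorted(zip(x, e)):
        line = [" "] * (2 * half_width + 1)
        line[half_width] = "|"                             # zero line
        pos = half_width + round(ei / scale * half_width)  # index in [0, 2*half_width]
        line[pos] = "*"
        rows.append(f"x = {xi:>5}  {''.join(line)}  e = {ei:+.2f}")
    print("\n".join(rows))
    return rows


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
residual_plot(x, fit_residuals(x, y))
```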