Transcript Slide 1

Chapter 10: Simple Linear Regression
A model in which a variable, X, explains
another variable, Y, using a linear structure,
with allowance for error, e—the unexplained
part of Y: Y = b + m*X + e
Regression Analysis
Assesses two sets of issues:
• How well does X explain Y? (Regression
Analysis)
• Do the regression “residuals” behave as
they theoretically should? (Residuals
Analysis)
Regression Analysis: 4 issues
1. R2: coefficient of determination. Evaluates the fit of the
regression line to the data. 0 ≤ R2 ≤ 1. Ideally, R2 ≈ 1.
2. SE: standard error of the regression. Measures the
dispersion of the actual data points around the
regression line. The SE is measured in units of Y,
and ideally, SE ≈ 0. Can also divide SE by
average(Y) to obtain a Coefficient of Variation and
assess the magnitude of SE.
3. ANOVA table: Significance F is the p-value for the test
of the null hypothesis that the regression line is
statistically insignificant (H0: m = 0 vs. Ha: m ≠ 0).
4. Coefficients table: reports the estimated intercept
and slope for the regression line, their respective
standard errors, test statistics, and p-values for
their statistical significance (Ha: m ≠ 0 vs. H0: m = 0
for the slope, and Ha: b ≠ 0 vs. H0: b = 0 for the
intercept).
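A minimal sketch of what the coefficients table reports, using a small made-up data set. The standard-error formulas are the usual textbook ones for simple linear regression; they are not spelled out on the slides, and the function name `coefficient_table` is hypothetical. P-values are left out because they need the t distribution with n-2 degrees of freedom.

```python
import math

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def coefficient_table(xs, ys):
    """Return (estimate, standard error, t statistic) for intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx            # estimated slope
    b = y_bar - m * x_bar    # estimated intercept
    sse = sum((y - (b + m * x)) ** 2 for x, y in zip(xs, ys))
    se_reg = math.sqrt(sse / (n - 2))                      # standard error of the regression
    se_m = se_reg / math.sqrt(sxx)                         # standard error of the slope
    se_b = se_reg * math.sqrt(1.0 / n + x_bar ** 2 / sxx)  # standard error of the intercept
    return {
        "intercept": (b, se_b, b / se_b),
        "slope": (m, se_m, m / se_m),
    }

table = coefficient_table(xs, ys)
for name, (est, se, t) in table.items():
    print(f"{name:9s} estimate={est:.4f} SE={se:.4f} t={t:.4f}")
```

With a single explanatory variable, the square of the slope's t statistic equals the F statistic from the ANOVA table, which is a quick consistency check between the two tables.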
Regression Statistics: Coefficient of Determination, r2, and Standard Error
$$r^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$$
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}, \qquad e_i = Y_i - \hat{Y}_i$$
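The two regression statistics can be computed directly from their definitions. The sketch below assumes a small made-up data set and a hypothetical helper `fit_statistics`; it also returns the Coefficient of Variation (standard error divided by average Y) mentioned earlier.

```python
import math

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_statistics(xs, ys):
    """Return (r2, standard error of the regression, coefficient of variation)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least-squares slope and intercept.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
    ssr = sst - sse                                      # explained variation
    r2 = ssr / sst
    s_yx = math.sqrt(sse / (n - 2))
    cv = s_yx / y_bar
    return r2, s_yx, cv

r2, s_yx, cv = fit_statistics(xs, ys)
print(f"r2={r2:.3f}  S_YX={s_yx:.4f}  CV={cv:.3f}")
```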
ANOVA
Source       df       SS    MS                   F         Significance F
Regression   k        SSR   MSR = SSR/k          MSR/MSE   P-value of the F test
Residuals    n-k-1    SSE   MSE = SSE/(n-k-1)
Total        n-1      SST
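As a rough illustration, the ANOVA entries follow from SSR, SSE, n, and k by simple arithmetic. The input values and the function name `anova_table` below are hypothetical. Significance F is not computed here because it needs the F distribution; one option, assuming SciPy is installed, would be `scipy.stats.f.sf(F, k, n - k - 1)`.

```python
def anova_table(ssr, sse, n, k=1):
    """Build the ANOVA rows from the sums of squares (k = number of X variables)."""
    msr = ssr / k              # mean square, regression
    mse = sse / (n - k - 1)    # mean square, residuals
    return {
        "Regression": {"df": k, "SS": ssr, "MS": msr, "F": msr / mse},
        "Residuals": {"df": n - k - 1, "SS": sse, "MS": mse},
        "Total": {"df": n - 1, "SS": ssr + sse},
    }

# Hypothetical inputs, made up for illustration only.
anova = anova_table(ssr=3.6, sse=2.4, n=5, k=1)
for row, values in anova.items():
    print(row, values)
```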
Estimates to Perform Regression Analysis Using Least Squares
Assumptions: 2 equations to solve for 2 unknowns, intercept b0 and slope b1
(b0 and b1 correspond to b and m in Y = b + m*X + e)
Unbiased explanation: $\sum e = 0$, which gives
$$\sum Y = n b_0 + b_1 \sum X$$
Explanatory factor, X, uncorrelated with e: $\sum X e = 0$, which gives
$$\sum XY = b_0 \sum X + b_1 \sum X^2$$
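The two equations can be solved by hand or in a few lines of code. The sketch below uses made-up data and a hypothetical function name, `fit_line`; it eliminates b0 to get the slope, then back-substitutes for the intercept.

```python
# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(xs, ys):
    """Solve the two least-squares equations for intercept b0 and slope b1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Eliminating b0 from the two equations gives the slope.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # The first equation then gives the intercept.
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"b0={b0:.3f}  b1={b1:.3f}")
# Both assumptions hold for the fitted line, up to rounding error.
print(f"sum(e)={sum(residuals):.2e}  "
      f"sum(X*e)={sum(x * e for x, e in zip(xs, residuals)):.2e}")
```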
Residuals Analysis: 3 issues
1. Normality of residuals requires that we
construct a histogram of the residuals, or a
Box-Whisker Plot of the residuals, or that we
construct a Normal Probability Plot of the
residuals with the assistance of MS Excel.
2. The residuals plot should show no pattern or
regularities in the scatterplot between X and e.
Otherwise, the linear model explains Y inconsistently
as a function of X, and a nonlinear
function of X would better explain Y.
3. Autocorrelation of the residuals can be tested
by using Excel to compute the Durbin-Watson
statistic from the residuals calculated by the
Regression process. 1.4 ≤ DW ≤ 2.6 indicates no
significant autocorrelation.
1. Checking for Normality of Residuals
2. Checking for Uniform Variation in Residuals
Relative to X
3. Checking for autocorrelation in residuals
Durbin-Watson Calculations
Sum of Squared Difference of Residuals    2123665.578
Sum of Squared Residuals                  870949.4547
Durbin-Watson Statistic                   2.438333897
Want the Durbin-Watson statistic to be “close” to 2.00
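The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below reproduces the ratio from the two sums reported above and applies the same formula to a short made-up residual series. The function names are hypothetical, and the 1.4 to 2.6 band is the rule of thumb from these slides.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    diff_sq = sum((residuals[t] - residuals[t - 1]) ** 2
                  for t in range(1, len(residuals)))
    resid_sq = sum(e ** 2 for e in residuals)
    return diff_sq / resid_sq

def no_significant_autocorrelation(dw, low=1.4, high=2.6):
    """Rule of thumb from the slides: DW between 1.4 and 2.6."""
    return low <= dw <= high

# Ratio of the two sums reported in the slide.
dw_slide = 2123665.578 / 870949.4547
print(f"DW from slide sums: {dw_slide:.6f}",
      no_significant_autocorrelation(dw_slide))

# Hypothetical residual series, made up for illustration only.
example_residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
dw_example = durbin_watson(example_residuals)
print(f"DW for example residuals: {dw_example:.4f}")
```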