Ch. 3: Regression
Simple Linear Regression
Start by exploring the data
Construct a scatterplot
Does a linear relationship between the variables exist?
Is the relationship strong?
How much variation can be explained by a linear relationship with the independent (explanatory) variable?
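A minimal sketch of this exploratory step in Python; the (beers, BAC) values below are hypothetical stand-ins, not the textbook sample:

```python
# Exploratory scatterplot: does the relationship look linear and strong?
import matplotlib.pyplot as plt

beers = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical x values
bac   = [0.01, 0.03, 0.04, 0.06, 0.08,
         0.09, 0.12, 0.13, 0.15]               # hypothetical y values

plt.scatter(beers, bac)
plt.xlabel("Beers")
plt.ylabel("BAC")
plt.title("Scatterplot of BAC vs. Beers")
plt.show()
```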
Beers and BAC
[Regression plot of BAC vs. Beers: BAC = -0.0127006 + 0.0179638 Beers, S = 0.0204410, R-Sq = 80.0%, R-Sq(adj) = 78.6%. Y-axis: BAC (0.0 to 0.2); x-axis: Beers (1 to 9).]
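The fitted equation and R-Sq shown in the plot come from an ordinary least-squares fit. A sketch of how such a fit could be reproduced with scipy, again using hypothetical stand-in data rather than the textbook sample:

```python
# Least-squares fit of y on x; with the textbook data this step gave
# BAC = -0.0127 + 0.0180 * Beers and R-Sq = 80%.
from scipy import stats

beers = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical data
bac   = [0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15]

fit = stats.linregress(beers, bac)
print(f"BAC = {fit.intercept:.4f} + {fit.slope:.4f} * Beers")
print(f"R-Sq = {fit.rvalue**2:.3f}")
```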
[Variance “candy bar” diagram: total variation in y split into explained and unexplained parts.]
•The R-sq value estimates the percentage of variation explained by a linear relationship with the independent (explanatory) variable. Unless this estimate is 100% (or very near), it is not sufficient on its own.
•The amounts of explained and unexplained information due to the model are measured by sums of squares.
Decomposition of information into
explained and unexplained parts
Residuals
A residual is the difference between an
observed value of the dependent variable
and the value predicted by the regression
line.
Residual = (observed y) − (predicted y) = y − ŷ
They help us assess the fit of a regression
line.
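A sketch of the residual calculation, using the fitted coefficients reported in the plot above together with a few hypothetical observations:

```python
# Residual = observed y - predicted y, computed for each observation.
b0, b1 = -0.0127006, 0.0179638           # intercept and slope from the slide
beers = [1, 3, 5, 7, 9]                   # hypothetical observations
bac   = [0.010, 0.045, 0.077, 0.115, 0.150]

for x, y in zip(beers, bac):
    y_hat = b0 + b1 * x                   # predicted value on the line
    print(f"x={x}  observed={y:.3f}  predicted={y_hat:.3f}  residual={y - y_hat:+.3f}")
```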
[Variance “candy bar” diagram repeated: explained vs. unexplained parts.]
Σ(ŷ − ȳ)² + Σ(y − ŷ)² = Σ(y − ȳ)²
SS explained by model + SS Error = SS Total
Systematic SS + Random SS = Total SS
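A quick numerical check of the decomposition on hypothetical data: the explained and unexplained sums of squares add up to the total sum of squares.

```python
# Verify SS Total = SS Model + SS Error for a least-squares fit.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data
y = np.array([0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15])

b1, b0 = np.polyfit(x, y, 1)              # least-squares slope and intercept
y_hat = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)        # Σ(y - ȳ)²
ss_model = np.sum((y_hat - y.mean()) ** 2)    # Σ(ŷ - ȳ)²  (explained)
ss_error = np.sum((y - y_hat) ** 2)           # Σ(y - ŷ)²  (unexplained)

print(ss_total, ss_model + ss_error)          # the two totals agree
```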
Model Assumptions about the residuals (ε)
The distribution is NORMAL
The mean is ZERO
The variance is CONSTANT for all values of x (σ²)
Errors associated with any two observations are
independent
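Two common informal checks of these assumptions are a normal Q-Q plot of the residuals and a plot of residuals against x. A sketch, assuming hypothetical data:

```python
# Residual diagnostics: Q-Q plot for normality, residuals vs. x for
# constant variance around zero.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data
y = np.array([0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
stats.probplot(residuals, dist="norm", plot=ax1)   # roughly straight if normal
ax2.scatter(x, residuals)
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("x")
ax2.set_ylabel("residual")                         # spread should look constant
plt.show()
```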
Assessing the utility of the model:
model variance
Variance is the variability of the random error (σ²)
The higher the variability of the random error, the greater the error of prediction
σ² is estimated with s² (often called the mean square for error, MSE)
Variance: s² = SSE / (n − 2), where n − 2 is the degrees of freedom
Standard error: s = √s²
This is like standard deviation; with standard error, we are
looking at deviation from the line
Approximately 95% of observed y values will lie within 2s of
their respective predicted values
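A sketch of these two quantities computed directly from the residuals, on hypothetical data:

```python
# s² = SSE / (n - 2) and s = sqrt(s²), the standard error of the regression.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data
y = np.array([0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15])

b1, b0 = np.polyfit(x, y, 1)
sse = np.sum((y - (b0 + b1 * x)) ** 2)    # sum of squared residuals
s2 = sse / (len(x) - 2)                    # MSE: divide by n - 2
s = np.sqrt(s2)                            # compare with S in software output

print(f"MSE = {s2:.6f},  s = {s:.4f}")
```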
Assessing the utility of the model: Slope
Does y change as x changes? Does x
contribute information for the prediction
of y?
H0: β1 = 0
Ha: β1 ≠ 0
Test this with the t-statistic t = b1 / SE(b1), or with the p-value (p < .05); these values are included in software output.
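A sketch of the slope test on hypothetical data; scipy's linregress reports the slope, its standard error, and the two-sided p-value, from which t = b1 / SE(b1):

```python
# Test H0: β1 = 0 against Ha: β1 ≠ 0 using the slope's t-statistic.
from scipy import stats

beers = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical data
bac   = [0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15]

fit = stats.linregress(beers, bac)
t_stat = fit.slope / fit.stderr                 # b1 / SE(b1)
print(f"t = {t_stat:.2f},  p-value = {fit.pvalue:.4g}")
```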
Assessing the utility of the model:
Correlation Coefficient r
Measure of the strength and direction of the
linear relationship between x and y
Always between -1 and +1
High correlation does not imply causality
Assessing the utility of the model:
Coefficient of Determination (r2)
The R squared value is the % of the variation in y
explained by the model.
r² = Explained sample variability / Total sample variability = (SSyy − SSE) / SSyy
For linear regression, the higher the value, the
better the model.
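A sketch of the same calculation from the sums of squares, on hypothetical data:

```python
# r² = (SS_yy - SSE) / SS_yy = 1 - SSE / SS_yy
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data
y = np.array([0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15])

b1, b0 = np.polyfit(x, y, 1)
ss_yy = np.sum((y - y.mean()) ** 2)        # total variability
sse = np.sum((y - (b0 + b1 * x)) ** 2)     # unexplained variability

r_sq = (ss_yy - sse) / ss_yy
print(f"r-squared = {r_sq:.3f}")           # fraction of variation in y explained
```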
Using the model for estimation and
prediction: Confidence interval for mean
response
For any specific value x* of x, the estimated mean response is ŷ = b0 + b1x*.
A confidence interval for the mean response adds to this estimate a margin of error based on the standard error SE(μ̂).
Confidence intervals widen as the value of x* moves further from the mean of x.
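A sketch of this interval computed from the usual formula, ŷ ± t*·s·√(1/n + (x* − x̄)²/Sxx), on hypothetical data (x* = 5 is an arbitrary choice):

```python
# 95% confidence interval for the mean response at a chosen x*.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data
y = np.array([0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15])
x_star = 5.0                                              # value of x of interest

b1, b0 = np.polyfit(x, y, 1)
n = len(x)
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # standard error of regression
sxx = np.sum((x - x.mean()) ** 2)

y_hat = b0 + b1 * x_star
se_mean = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)

print(f"{y_hat:.4f} +/- {t_crit * se_mean:.4f}")          # widens as x* moves from x̄
```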
Confidence interval for mean response
[Regression plot of BAC vs. Beers with 95% confidence band for the mean response: BAC = -0.0127006 + 0.0179638 Beers, S = 0.0204410, R-Sq = 80.0%, R-Sq(adj) = 78.6%. Y-axis: BAC (0.0 to 0.2); x-axis: Beers (1 to 9). Legend: Regression, 95% CI.]
Prediction interval for a future observation
Similar to confidence interval for mean
response
The standard error SE(ŷ) used in the prediction interval includes
Variability due to the fact that the least-squares line is not exactly equal to the true regression line
Variability of the future response variable y around the subpopulation mean.
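A sketch contrasting the two intervals at the same x*, using the standard formulas and hypothetical data; the prediction-interval standard error includes the extra "1" term for the variability of a single future response:

```python
# 95% confidence interval (mean response) vs. 95% prediction interval
# (single future observation) at the same x*.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)   # hypothetical data
y = np.array([0.01, 0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.13, 0.15])
x_star = 5.0

b1, b0 = np.polyfit(x, y, 1)
n = len(x)
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
sxx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

y_hat = b0 + b1 * x_star
se_ci = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / sxx)      # mean response
se_pi = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / sxx)  # future observation

print(f"CI: {y_hat:.4f} +/- {t_crit * se_ci:.4f}")
print(f"PI: {y_hat:.4f} +/- {t_crit * se_pi:.4f}")               # always wider
```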
Prediction interval for a future observation
[Regression plot of BAC vs. Beers with 95% confidence and prediction bands: BAC = -0.0127006 + 0.0179638 Beers, S = 0.0204410, R-Sq = 80.0%, R-Sq(adj) = 78.6%. Y-axis: BAC (0.0 to 0.2); x-axis: Beers (1 to 9). Legend: Regression, 95% CI, 95% PI.]
In the MINITAB regression window, you
might want to…
Set confidence levels in Options
Enter a value for prediction in Options
Store Residuals and Fits in Storage
Display full table of fits and residuals in
Results (select last bullet)
Beware of Extrapolation
Extrapolation is the use of a regression line
for prediction far outside the range of values
of the independent variable x that you used to
obtain the line. Such predictions are often not accurate.
Example from book: p. 138
How can we tell if it is reasonable to fit a
linear regression model?
Let’s run the analysis and interpret the results