Inference for the Regression Coefficient
• Recall, b0 and b1 are the estimates of the intercept β0 and slope β1 of the
population regression line.
• We can show that b0 and b1 are unbiased estimates of β0 and β1,
and furthermore that b0 and b1 are Normally distributed with means
β0 and β1 and standard deviations that can be estimated from the data.
• We use these facts to obtain confidence intervals and conduct
hypothesis tests about β0 and β1.
STA 286 week 13
1
CI for Regression Slope and Intercept
• A level 100(1−α)% confidence interval for the intercept β0 is
\[ b_0 \pm t_{n-2,\,\alpha/2}\, SE_{b_0} \]
where the standard error of the intercept is
\[ SE_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}} \]
• A level 100(1−α)% confidence interval for the slope β1 is
\[ b_1 \pm t_{n-2,\,\alpha/2}\, SE_{b_1} \]
where the standard error of the slope is
\[ SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} \]
• Example ….
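The two intervals above can be computed directly from the data. A minimal Python sketch; the small data set is illustrative (not from the course notes), and the critical value t(3, 0.025) = 3.182 is taken from a t table:

```python
import math

# Illustrative data (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least-squares estimates of slope and intercept
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# Residual standard deviation s, with n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard errors of the intercept and slope
se_b0 = s * math.sqrt(1 / n + xbar ** 2 / Sxx)
se_b1 = s / math.sqrt(Sxx)

# 95% confidence intervals; t_{n-2, 0.025} = t_{3, 0.025} = 3.182 (t table)
t_crit = 3.182
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

For these numbers the fitted line is ŷ = 2.2 + 0.6x; with only n = 5 points the intervals are wide.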
Significance Tests for Regression Slope
• To test the null hypothesis H0: β1 = 0 we compute the test statistic
\[ t = \frac{b_1}{SE_{b_1}} \]
• The above test statistic has a t distribution with n−2 degrees of
freedom. We can use this distribution to obtain the P-value for the
various possible alternative hypotheses.
• Note: testing the null hypothesis H0: β1 = 0 is equivalent to testing
the null hypothesis H0: ρ = 0 where ρ is the population correlation.
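The test statistic is a one-line computation once the standard error is in hand. A sketch with the same illustrative data as above, comparing |t| to the two-sided critical value t(3, 0.025) = 3.182 rather than computing an exact P-value:

```python
import math

# Illustrative data (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                  for xi, yi in zip(x, y)) / (n - 2))

# Test statistic for H0: beta1 = 0
se_b1 = s / math.sqrt(Sxx)
t_stat = b1 / se_b1

# Two-sided test at alpha = 0.05: compare |t| with t_{3, 0.025} = 3.182 (t table)
reject = abs(t_stat) > 3.182
```

Here t ≈ 2.12 < 3.182, so with only five observations H0: β1 = 0 is not rejected at the 5% level.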
Example
• Refer to the heart rate and oxygen example….
Confidence Interval for the Mean Response
• For any specific value of x, say x0, the mean of the response y in this
subpopulation is given by: μy = β0 + β1x0.
• We can estimate this mean from the sample by substituting the least-squares estimates of β0 and β1: \( \hat{\mu}_y = b_0 + b_1 x_0 \).
• A level 100(1−α)% confidence interval for the mean response μy
when x takes the value x0 is
\[ \hat{\mu}_y \pm t_{n-2,\,\alpha/2}\, SE_{\hat{\mu}} \]
where the standard error of \( \hat{\mu}_y \) is
\[ SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]
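A sketch of this interval in Python, again on illustrative data with t(3, 0.025) = 3.182 from a t table. Note that the standard error is smallest when x0 = x̄, as in this example:

```python
import math

# Illustrative data and x0 (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x0 = 3
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                  for xi, yi in zip(x, y)) / (n - 2))

# Estimated mean response at x0 and its standard error
mu_hat = b0 + b1 * x0
se_mu = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)

# 95% CI using t_{3, 0.025} = 3.182 (t table)
ci = (mu_hat - 3.182 * se_mu, mu_hat + 3.182 * se_mu)
```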
Example
• Data on the wages and length of service (LOS) in months for 60
women who work in Indiana banks.
• We are interested in how LOS relates to wages. The Minitab
output and commands are given in a separate file.
Prediction Interval
• The predicted response y for an individual case with a specific value
x0 of the explanatory variable x is \( \hat{y} = b_0 + b_1 x_0 \).
• A useful prediction should include a margin of error to indicate its
accuracy.
• The interval used to predict a future observation is called a
prediction interval.
• A level 100(1−α)% prediction interval for a future observation on the
response variable y from the subpopulation corresponding to x0 is
\[ \hat{y} \pm t_{n-2,\,\alpha/2}\, SE_{\hat{y}} \]
where the standard error of \( \hat{y} \) is
\[ SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]
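The only change from the confidence interval for the mean is the extra "1 +" under the square root, which accounts for the variation of a single future observation about its mean. A sketch on the same illustrative data, with t(3, 0.025) = 3.182 from a t table:

```python
import math

# Illustrative data and x0 (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x0 = 3
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                  for xi, yi in zip(x, y)) / (n - 2))

y_hat = b0 + b1 * x0
# The extra "1 +" makes the PI always wider than the CI for the mean response
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / Sxx)

# 95% PI using t_{3, 0.025} = 3.182 (t table)
pi = (y_hat - 3.182 * se_pred, y_hat + 3.182 * se_pred)
```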
Example
• Calculate a 95% PI for the wage of an employee with 3 years
experience (i.e. LOS=36).
• Calculate a 90% PI for the wage of an employee with 3 years
experience (i.e. LOS=36).
Analysis of Variance for Regression
• Analysis of variance, ANOVA, is essential for multiple regression and
for comparing several means.
• ANOVA summarizes information about the sources of variation in
the data. It is based on the framework of DATA = FIT + RESIDUAL.
• The total variation in the response y is expressed by the deviations
\( y_i - \bar{y} \).
• The overall deviation of any y observation from the mean of the y’s
can be split into two main sources of variation and expressed as
\[ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \]
Sum of Squares
• Sums of squares (SS) represent variation present in the responses.
They are calculated by summing squared deviations. Analysis of
variance partitions the total variation between two sources.
• The total variation in the data is expressed as SST = SSM + SSE.
• SST stands for the total sum of squares; it is given by \( SST = \sum (y_i - \bar{y})^2 \).
• SSM stands for the model sum of squares; it is given by \( SSM = \sum (\hat{y}_i - \bar{y})^2 \).
• SSE stands for the error sum of squares; it is given by \( SSE = \sum (y_i - \hat{y}_i)^2 \).
• Each of the above SS has degrees of freedom associated with it:
n − 1 for SST, 1 for SSM, and n − 2 for SSE.
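The partition SST = SSM + SSE can be checked numerically. A minimal sketch on an illustrative data set (not from the course notes):

```python
# Illustrative data (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

# The three sums of squares and their degrees of freedom
sst = sum((yi - ybar) ** 2 for yi in y)               # df = n - 1
ssm = sum((yh - ybar) ** 2 for yh in yhat)            # df = 1
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # df = n - 2

# The partition holds (up to rounding): SST = SSM + SSE
assert abs(sst - (ssm + sse)) < 1e-9
```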
Coefficient of Determination R2
• The coefficient of determination R2 is the fraction of variation in the
values of y that is explained by the least-squares regression. The SS
make this interpretation precise.
• We can show that
\[ R^2 = \frac{SSM}{SST} = 1 - \frac{SSE}{SST} \]
• This equation is the precise statement of the fact that R2 is the
fraction of variation in y explained by x.
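Both forms of the identity give the same number, as a quick sketch on the same illustrative data confirms:

```python
# Illustrative data (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)
ssm = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

# R^2 as explained fraction, and as 1 minus the unexplained fraction
r2_model = ssm / sst
r2_error = 1 - sse / sst
```

For these numbers both forms give R² = 0.6: 60% of the variation in y is explained by the regression on x.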
Mean Square
• For each source, the ratio of the SS to the degrees of freedom is
called the mean square (MS).
• To calculate mean squares, use the formula
\[ MS = \frac{\text{sum of squares}}{\text{degrees of freedom}} \]
ANOVA Table and F Test
• In the simple linear regression model, the hypotheses H0: β1 = 0 vs
H1: β1 ≠ 0 are tested by the F statistic.
• The F statistic is given by
\[ F = \frac{MSM}{MSE} \]
• The F statistic has an F(1, n-2) distribution which we can use to find
the P-value.
• Example…
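A sketch of the F statistic on the same illustrative data. In simple linear regression F equals the square of the slope t statistic, which the last line checks:

```python
import math

# Illustrative data (not from the course notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

ssm = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

# Mean squares: MS = SS / df, with df = 1 for the model and n - 2 for error
msm = ssm / 1
mse = sse / (n - 2)
f_stat = msm / mse

# F equals the square of the t statistic for H0: beta1 = 0
t_stat = b1 / (math.sqrt(mse) / math.sqrt(Sxx))
assert abs(f_stat - t_stat ** 2) < 1e-9
```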
Residual Analysis
• We will use residuals to examine the following six types of
departures from the model:
The regression function is nonlinear
The error terms do not have constant variance
The error terms are not independent
The model fits all but a few outlying observations
The error terms are not Normally distributed
One or more important explanatory variables have been omitted from the
model
Residual plots
• We will use residual plots to examine the aforementioned types of
departures. The plots that we will use are:
Residuals versus the fitted values
Residuals versus time (when the data are obtained in a time
sequence) or other variables
Normal probability plot of the residuals
Histograms, stemplots and boxplots of the residuals
Example
• Below are the residual plots from the model predicting GPA based
on SAT scores….