Chapter 12 Section 1
Inference for Linear Regression
Students will be able:
to check conditions for performing inference about the slope (beta) of the population (true) regression line.
to interpret computer output from a least-squares regression analysis.
to construct and interpret a confidence interval for the slope (beta) of the population (true) regression line.
to perform a significance test about the slope (beta) of a population (true) regression line.
Inference for Linear Regression
Observing the scatterplot on p. 739, the line that is drawn is known as the population (true) regression line because it uses all the observations in the population.
If we take a sample from the population, we still use an equation of the same form, y-hat = a + bx, for the sample regression line.
The slope b of the sample regression line will vary with your choice of sample. The pattern of variation in the slope b is described by its sampling distribution.
Sampling Distribution of b
Confidence intervals and significance tests about the slope of the population regression line are based on the sampling distribution of b, the slope of the sample regression line.
Sampling Distribution of b
Describing the approximate sampling distribution of b (a simulation sketch follows this list):
Shape – when the data show a strong linear pattern, the approximate sampling distribution of b is unimodal, roughly symmetric, and close to Normal.
Center – the mean of the sample slopes is close to beta, the slope of the population (true) regression line, so b is an unbiased estimator of beta.
Spread – the standard deviation of the sample slopes describes how far the slope b typically varies from sample to sample.
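A minimal simulation sketch of this idea in Python. The population values alpha = 5, beta = 2, and sigma = 3 are made-up assumptions (not from the text); the point is only to show the shape, center, and spread of the simulated slopes.

import numpy as np

rng = np.random.default_rng(42)
alpha_true, beta_true, sigma_true = 5.0, 2.0, 3.0   # assumed population parameters (not from the text)
n, reps = 20, 10000

slopes = []
for _ in range(reps):
    x = rng.uniform(0, 10, size=n)                               # explanatory values for one sample
    y = alpha_true + beta_true * x + rng.normal(0, sigma_true, size=n)
    slopes.append(np.polyfit(x, y, 1)[0])                        # least-squares slope b for this sample

slopes = np.array(slopes)
print("mean of sample slopes:", slopes.mean())       # close to beta_true, so b is roughly unbiased
print("sd of sample slopes:  ", slopes.std(ddof=1))  # spread of the sampling distribution of b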
Conditions for Regression Inference
Conditions (a graphical check is sketched after this list):
Linear – the actual relationship between x and y is linear. For any fixed value of x, the mean response mu_y falls on the population (true) regression line mu_y = alpha + beta*x. The slope beta and intercept alpha are usually unknown parameters.
Independent – individual observations are independent of each other (one does not affect another).
Normal – for any fixed value of x, the response y varies according to a Normal distribution.
Equal variance – the standard deviation of y (call it sigma) is the same for all values of x. The common standard deviation sigma is usually an unknown parameter.
Random – the data come from a well-designed random sample or randomized experiment.
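A rough Python sketch of how the Linear, Normal, and Equal-variance conditions can be checked graphically with residuals. The x and y values here are placeholders, not data from the book.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)            # placeholder data
y = np.array([2.1, 4.3, 5.8, 8.2, 9.9, 12.3, 13.8, 16.1])

b, a = np.polyfit(x, y, 1)          # sample slope and intercept
residuals = y - (a + b * x)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].scatter(x, residuals)       # Linear / Equal variance: look for no curved pattern
axes[0].axhline(0, color="gray")    # and roughly constant vertical spread about zero
axes[0].set_title("Residuals vs x")
axes[1].hist(residuals)             # Normal: look for a roughly symmetric, single-peaked shape
axes[1].set_title("Histogram of residuals")
plt.show()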
Conditions for Regression Inference
The regression model tells us: for each value of x, the linear regression gives a predicted (mean) value of y.
**** Remember to always check conditions before doing inference about the regression model.
Take a look at the example on pp. 743-744.
Estimating the Parameters
When the conditions are met, we can proceed to estimating the unknown parameters.
If we calculate the least-squares regression line, the slope b is an unbiased estimator of the true slope beta, and the y intercept a is an unbiased estimator of the true y intercept alpha.
The remaining parameter is the standard deviation sigma, which describes the variability of the response y about the population (true) regression line.
The residuals estimate how much y varies about the population line, so we estimate sigma with s, the standard deviation of the residuals, using the formula at the top of page 745 (a sketch of this calculation follows).
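A sketch of that estimate, assuming the formula referred to is the usual standard deviation of the residuals, s = sqrt(sum of squared residuals / (n - 2)):

import numpy as np

def residual_sd(x, y):
    """Estimate sigma with s, the standard deviation of the residuals."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)                      # least-squares slope and intercept
    resid = y - (a + b * x)                         # residuals about the fitted line
    return np.sqrt(np.sum(resid ** 2) / (n - 2))    # divide by n - 2, not n - 1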
Estimating the Parameters
Take a look at the example on p. 745.
It is possible to do inference about any of the three parameters. However, the slope beta is usually the most important parameter in a regression problem, so that is the one we focus on.
Sampling Distribution of b
For spread – since we do not know the standard deviation sigma, we estimate it with s, the standard deviation of the residuals. We then estimate the spread of the sampling distribution of b with the standard error of the slope, SE_b (formula on p. 746), as sketched below.
If we standardize b, we use the middle formula on p. 746, which translates to the last formula on that page (use that one). When calculating the degrees of freedom, take the n value and subtract 2 (we use n - 2 instead of n - 1; the explanation is deeper and more complicated than we need here).
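A sketch of the standard error calculation, assuming the formula on p. 746 is the usual SE_b = s / (s_x * sqrt(n - 1)), where s_x is the sample standard deviation of x:

import numpy as np

def slope_standard_error(x, y):
    """Estimate the spread of the sampling distribution of b with SE_b."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # standard deviation of the residuals
    s_x = np.std(x, ddof=1)                     # sample standard deviation of x
    return s / (s_x * np.sqrt(n - 1))           # SE_b; inference uses n - 2 degrees of freedom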
Constructing a Confidence Interval for the Slope
The slope beta is the rate of change of the mean response as the explanatory variable increases: mu_y = alpha + beta*x.
A confidence interval is more useful than the point estimate because it shows how precise the estimate b is likely to be.
The interval has the familiar form statistic ± (critical value) × (standard deviation of statistic), which here becomes b ± t* · SE_b, with t* from a t distribution with n - 2 degrees of freedom.
Take a look at the yellow box on p. 747 and the example on pp. 747-748; a sketch of the calculation follows.
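A minimal sketch of the interval b ± t* · SE_b using scipy's linregress, which reports the sample slope and its standard error. The x and y values are placeholders, not data from the book.

import numpy as np
from scipy import stats

x = np.array([3, 5, 7, 9, 11, 13], dtype=float)   # placeholder data
y = np.array([8, 11, 15, 18, 24, 26], dtype=float)

res = stats.linregress(x, y)                  # res.slope is b, res.stderr is SE_b
t_star = stats.t.ppf(0.975, df=len(x) - 2)    # critical value for 95% confidence, df = n - 2
lower = res.slope - t_star * res.stderr
upper = res.slope + t_star * res.stderr
print("95% confidence interval for beta:", (lower, upper))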
Performing a Significance Test for the Slope
The null hypothesis has the general form H0: beta = hypothesized value.
To do the test:
test statistic = (statistic - parameter) / (standard deviation of statistic)
t = (b - beta_0) / SE_b
To find the P-value, use a t distribution with n - 2 degrees of freedom (sketched below).
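A sketch of the test, assuming a two-sided alternative; beta_0 = 0 corresponds to the usual "no linear relationship" null hypothesis.

import numpy as np
from scipy import stats

def slope_test(x, y, beta_0=0.0):
    """Return the t statistic and two-sided P-value for H0: beta = beta_0."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))
    se_b = s / (np.std(x, ddof=1) * np.sqrt(n - 1))   # standard error of the slope
    t = (b - beta_0) / se_b                           # test statistic
    p_value = 2 * stats.t.sf(abs(t), df=n - 2)        # two-sided P-value, df = n - 2
    return t, p_value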
Take a look at the yellow box on p. 751.
Take a look at the remainder of the examples in this section for clarification.