Regression Inference
Simple Linear Regression and Correlation: Inferential Methods
Chapter 13
AP Statistics
Peck, Olsen and Devore
Topic 2: Summary of Bivariate Data
In Topic 2 we discussed summarizing
bivariate data
Specifically we were interested in
summarizing linear relationships between
two measurable characteristics
We summarized these linear relationships
by performing a linear regression using
the method of least squares
Least Squares Regression
Graphically display the data in a scatterplot (form, strength and direction)
Calculate Pearson’s correlation coefficient (the strength of the linear association)
Perform the least squares regression: ŷ = a + bx
Inspect the residual plot (no patterns) to determine if the model is appropriate
Determine the coefficient of determination (how good is the model as a prediction tool)
Use the model as a prediction tool
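As a rough illustration of these steps, here is a minimal Python sketch using scipy; the x and y arrays are made-up data, not from the slides.

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

# Least squares fit: yhat = a + b*x
fit = stats.linregress(x, y)
a, b = fit.intercept, fit.slope
print(f"yhat = {a:.3f} + {b:.3f}x")

# Pearson's correlation coefficient and the coefficient of determination
print(f"r   = {fit.rvalue:.3f}")
print(f"r^2 = {fit.rvalue ** 2:.3f}")

# Residuals: plot these against x and look for patterns
residuals = y - (a + b * x)
print(residuals)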
Interpretation
Pearson’s correlation coefficient
Coefficient of Determination
Variables in ŷ = a + bx
Standard deviation of the residuals
Minitab Output
Simple Linear Regression Model
‘Simple’ because we had only one independent variable: ŷ = a + bx
We interpreted ŷ as a predicted value of y given a specific
value of x
When y = f(x) we can describe this as a deterministic
model. That is, the value of y is completely determined by a
given value of x
That wasn’t really the case when we used our linear
regressions. The value of y was equal to our predicted value
+/- some amount. That is, y = a + bx + e
We call this a probabilistic model.
So, without e, the (x,y) pairs (observed points) would fall on
the regression line ŷ = a + bx.
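A small simulation can make the deterministic vs. probabilistic distinction concrete. This sketch assumes population values α = 2, β = 0.5 and σ = 1 purely for illustration; none of these numbers come from the slides.

import numpy as np

rng = np.random.default_rng(0)

alpha, beta, sigma = 2.0, 0.5, 1.0      # assumed population values, for illustration only
x = np.linspace(0, 10, 50)

# Deterministic model: y is completely determined by x
y_deterministic = alpha + beta * x

# Probabilistic model: y = alpha + beta*x + e, with e ~ N(0, sigma)
e = rng.normal(loc=0.0, scale=sigma, size=x.size)
y_observed = alpha + beta * x + e

print(y_deterministic[:5])   # points exactly on the line
print(y_observed[:5])        # same x values, scattered around the line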
Now consider this …
How did we calculate the coefficients in our
linear regression models?
We were actually estimating population
parameters using a sample. That is, the
simple linear regression y = a + bx + e is an
estimate for the population regression line
y = α + βx + e
We can consider a, b estimates for α, β
Basic Assumptions for the Simple
Linear Regression Model
The distribution of e at any particular value
of x has a mean value of 0. That is, μ_e = 0
The standard deviation of e is the same for
any value of x. It is always denoted by σ
The distribution of e at any value of x is
normal
The random deviations are independent.
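One informal way to examine these assumptions in practice is to look at the residuals from a fitted line. The sketch below uses made-up data; the half-and-half spread comparison and the Shapiro-Wilk normality check are illustrative choices, not methods prescribed by the slides.

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Constant spread across x: compare the spread of the residuals
# for the lower half of the x values with the upper half
print(f"spread, lower half of x: {residuals[:4].std(ddof=1):.3f}")
print(f"spread, upper half of x: {residuals[4:].std(ddof=1):.3f}")

# Normality of the deviations: Shapiro-Wilk test on the residuals
print(stats.shapiro(residuals))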
Another interpretation of ŷ
Consider y = α + βx + e, where the
coefficients are fixed and e is distributed
normally. Then the sum of a fixed number
(α + βx) and a normally distributed
variable (e) is normally distributed (Chapter 7).
So y is normally distributed.
Now the mean of y will be equal to α + βx
plus the mean of e, which is equal to 0
So another interpretation is the mean y
value for a given x value = α + βx
Distribution of y
Where y = α + βx + e we can now see that
y is distributed normally with a mean of α + βx
The variance for y is the same as the
variance of e -- which is σ²
An estimate for σ² is s_e²
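A minimal sketch of the estimate s_e, computed as √(SSE / (n − 2)) from the residuals; the data are made up for illustration.

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
n = x.size

sse = np.sum(residuals ** 2)       # sum of squared residuals
se_squared = sse / (n - 2)         # estimate of sigma^2
se = np.sqrt(se_squared)           # standard deviation of the residuals, s_e

print(f"s_e^2 = {se_squared:.4f}, s_e = {se:.4f}")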
Assumption
The major assumption behind all of this is that
the random deviation e is normally
distributed.
We’ll talk more later about how to check whether
this assumption is reasonable.
Inferences about the slope of the
population regression line
Now we are going to make some
inferences about the slope of the
regression line. Specifically, we’ll
construct a confidence interval and then
perform a hypothesis test – a model utility
test for simple linear regression
Just to repeat …
We said the population regression model is
y = α + βx + e
The coefficients of this model are fixed but
unknown (parameters) – so using the method
of least squares, we estimate these
parameters using a sample of data (statistics)
and we get
y = a + bx + e
Sampling distribution of b
We use b as an estimate for the
population coefficient β in the simple
regression model
b is therefore a statistic determined by a
random sample and it has a sampling
distribution
Sampling distribution of b
When the four assumptions of the linear
regression model are met
The mean value of the sampling distribution of b
is β. That is, μ_b = β
The standard deviation of the statistic b is
σ_b = σ / √Σ(x − x̄)²
The sampling distribution of b is normally
distributed.
Estimates for …
The estimate for the standard deviation of b is
s_b = s_e / √Σ(x − x̄)²
When we standardize b it has a t distribution
with n − 2 degrees of freedom:
t = (b − β) / s_b
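A sketch of these two formulas in code, using the same kind of made-up data as before; the last line prints scipy's reported standard error of the slope as a cross-check.

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
n = x.size

s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))      # std. dev. of the residuals
s_b = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))     # estimated std. dev. of b

beta_null = 0.0                                      # hypothesized slope
t = (fit.slope - beta_null) / s_b                    # t statistic with n - 2 df

print(f"s_b = {s_b:.4f}, t = {t:.3f}")
print(f"scipy's standard error of the slope: {fit.stderr:.4f}")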
Confidence Interval
Sample Statistic +/- Crit Value * Std Dev of Stat
b ± t* · s_b
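A sketch of the interval, assuming a 95% confidence level and made-up data; t* comes from the t distribution with n − 2 degrees of freedom.

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
n = x.size

s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))
s_b = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))

t_star = stats.t.ppf(0.975, df=n - 2)    # critical value for 95% confidence
lower = fit.slope - t_star * s_b
upper = fit.slope + t_star * s_b

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")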
Hypothesis Test
We’re normally interested in the null H₀: β = 0,
because if we reject the null, the data
suggests there is a useful linear relationship
between our two variables
We call this the ‘Model Utility Test for Simple
Linear Regression’
Summary of the Test
H₀: β = 0
Hₐ: β ≠ 0
Test Statistic: t = b / s_b
Assumptions are the same four as those for
the simple linear regression model.
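Putting the pieces together, a sketch of the model utility test with made-up data; the two-sided p-value uses the t distribution with n − 2 degrees of freedom, and scipy's own p-value is printed as a cross-check.

import numpy as np
from scipy import stats

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
n = x.size

s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))
s_b = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))

# H0: beta = 0 versus HA: beta != 0
t = fit.slope / s_b
p_value = 2 * stats.t.sf(abs(t), df=n - 2)

print(f"t = {t:.3f}, p-value = {p_value:.4g}")
print(f"scipy's p-value for comparison: {fit.pvalue:.4g}")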
Minitab Output