12.1a Inference about the Model


Chapter 12
Inference for Linear Regression
Target Goals:
I can make predictions using regression for normal
distributions.
I can check conditions for performing inference
about the slope β of the population (true) regression
line.
12.1a
h.w: pg 759: 1 – 11 odd
Inference about the Model
We can use the LSRL fitted to data to predict
y for a given value of x when both variables
are quantitative.
Now we will perform significance tests and construct
confidence intervals in this setting.

Pg. 752
Ex. Crying and IQ
Infants who cry easily may be more
easily stimulated than others, and this
may be a sign of higher IQ.
The researchers snapped a rubber band
on the sole of each infant's foot
to make the infant cry.
At age 3 years, they measured each child's IQ.

Step 1: Make a scatterplot of the
data.




Explanatory variable: Crying
Response variable: IQ
Enter “crying” data into L1 and “IQ” data into L2.
Plot and Interpret. STAT:CALC:LinReg(a+bx) L1,L2,Y1



Y1:(VARS:Y-VARS:FUNCT:Y1)
Scatterplot shows a roughly linear pattern.
The correlation r describes the direction and strength
of the relationship.
Step 2: Calculate the LSRL
Step 3: Identify outliers and
influential points
No extreme outliers or potentially influential
observations.
Step 4: Calculate the Correlation
(r value)

The correlation between crying and IQ is
r = 0.455.
The LSRL is ŷ = 91.27 + 1.493x.
ŷ reminds us that the regression
line gives predictions of IQ.
Interpret r² = 0.207:
only about 21% of the variation in IQ
scores (response variable) is explained by
the linear relationship with crying intensity.
r² is called the coefficient of determination.
Is prediction of IQ accurate with this
model? No.
It is interesting, though, that
behavior shortly after birth can
partly predict IQ.
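
As a quick check of the numbers above, here is a minimal Python sketch that uses the fitted line ŷ = 91.27 + 1.493x and r = 0.455 from this example. The function name predict_iq and the crying count of 20 are assumptions chosen for illustration; they do not come from the slides.

```python
# Values reported in the crying and IQ example.
A = 91.27   # y-intercept a of the LSRL
B = 1.493   # slope b of the LSRL
R = 0.455   # correlation r

def predict_iq(crying_peaks):
    """Predicted IQ (y-hat) for a crying count x, using y-hat = a + b*x."""
    return A + B * crying_peaks

# Hypothetical input: an infant with 20 crying peaks (illustrative value only).
print(round(predict_iq(20), 2))   # 121.13
print(round(R ** 2, 3))           # 0.207, the coefficient of determination
```

Squaring r reproduces r² = 0.207, so about 79% of the variation in IQ is left unexplained by the line, which is why individual predictions are not very accurate.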
Ex. Old Faithful: predict how long it will take before Old Faithful erupts again
based on the duration
of the previous eruption.
Conditions
for Regression
Inference
3 SRSs of 20 Old Faithful Eruptions
The values of the slope b for the 1000
sample regression lines are plotted.
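
The idea of plotting the slope b from many samples can be sketched with a small simulation. The population values below (ALPHA, BETA, SIGMA) and the range of eruption durations are assumptions made up for illustration; they are not the Old Faithful values from the textbook. The sketch draws 1000 samples of size 20 from an assumed true regression line and records each sample slope.

```python
import random
import statistics

# ASSUMED population model (illustrative values, not from the slides):
# mean response = ALPHA + BETA * x, with normal scatter of SD SIGMA.
ALPHA, BETA, SIGMA = 34.0, 10.0, 6.0

def slope(xs, ys):
    """Least-squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar = statistics.fmean(xs)
    ybar = statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n_samples=1000, n=20, seed=1):
    """Return the sample slopes from n_samples simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        # Assumed range of eruption durations, in minutes.
        xs = [rng.uniform(1.5, 5.0) for _ in range(n)]
        ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
        slopes.append(slope(xs, ys))
    return slopes

slopes = simulate_slopes()
print(statistics.fmean(slopes))   # center of the simulated slopes, close to BETA
print(statistics.stdev(slopes))   # how much b varies from sample to sample
```

Under these assumptions the simulated slopes center on the true slope, which is what it means for b to be an unbiased estimator of β.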
Pg. 742
Conditions for Regression
Inference
Our goal is to predict the behavior of y
for a given value of x. For any fixed value of x, the y responses
from repeated samples vary according to a normal
distribution.
1) Linear: The mean response μy has a straight-line
relationship with x.
The true regression line is written in the
form μy = α + βx,
where μy is the mean response,
α is the true y-intercept, and β is
the true slope.
2) Independent: The y responses are
independent of each other.
3) Normal: For any fixed value of x, the
observed response value y varies
according to a normal distribution having
mean μy.
4) Equal Variance: The standard
deviation σ about the true regression
line is the same (constant) for all values of x.
σ is usually an unknown parameter.
5) Random: The data come from a
well-designed random sample or
randomized experiment.
Linear
Independent
Normal
Equal Variance
Random


The LSRL: ŷ = a + bx, where b is an
unbiased estimator of the true slope β and a is
an unbiased estimator of the true intercept α.

The line μy = α + βx is the true regression line,
which shows how the mean response
μy changes as the explanatory
variable x changes.
Standard Deviation
σ determines whether the points fall close to
the true regression line (small σ) or are
widely scattered (large σ).
σ is also the size of a typical prediction
error if we use the least-squares regression
line to predict “how long it will take before Old
Faithful erupts again” based on the duration
of the previous eruption.

Ex: Slope and Intercept

The LSRL is ŷ = 91.27 + 1.493x,
where ŷ is the predicted IQ and x is the number of crying peaks.
The slope measures rate of change:
how much higher average IQ is for
children with one more peak in their
crying measurements.
b estimates the unknown β; we estimate that, on
average, IQ is about 1.5 points
higher for each additional crying peak.

Standard Deviation
σ describes the variability of the
response y about the true regression
line.
Recall that residuals estimate how
much y varies about the true line and
are the vertical deviations of the data
points from the least-squares line:
Residual = observed y – predicted y

Standard Error about the LSRL
Since σ is unknown, we estimate it with s, the standard
deviation of the residuals, which is also called the standard
error about the line (this is the key to inference about the
regression).

s = √( Σ resid² / (n – 2) )

Note: (n – 2) is the degrees of freedom for the
regression model.
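
The formula for s can be sketched in a few lines of Python. The data below are made up for illustration (they are NOT the crying and IQ data), and the function name fit_and_standard_error is an assumption. The sketch fits the LSRL, computes each residual as observed y minus predicted y, and then divides the sum of squared residuals by n – 2 before taking the square root.

```python
import math

def fit_and_standard_error(xs, ys):
    """Fit the LSRL y-hat = a + b*x and return (a, b, residuals, s)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # least-squares slope
    a = ybar - b * xbar                # least-squares intercept
    # Residual = observed y - predicted y
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # s = sqrt(sum of squared residuals / (n - 2))
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, residuals, s

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, residuals, s = fit_and_standard_error(xs, ys)
print(a, b, s)
```

Dividing by n – 2 rather than n reflects the two quantities (a and b) estimated from the data before the residuals are computed.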
Ex. Calculating Residuals and
Standard Error
The quickest way to do this is to (use ex 14.1 data):
Enter “crying” data into L1 and “IQ” data into
L2. (We already did this.)
Recall: LinReg(a+bx) automatically
calculates the residuals and stores them in
“Resid.”
Store “Resid” in L3.
STAT:CALC:1-Var Stats L3
Because the residuals are now in L3, the ∑x² reported by 1-Var Stats is ∑ resid².
To find s, first find s²:
Enter the value of ∑x² by hand or (VARS:5: : ∑x²)
and divide by (n – 2).
Take the square root to find s:
s = √( Σ resid² / (n – 2) )
A level C confidence interval for the
slope β of the true regression line is
b ± t* SEb
where t* is the critical value for the t distribution with n – 2 degrees of freedom and
SEb = s / √( Σ(x – x̄)² )
You will rarely have to calculate this
by hand.
Regression software gives you the
standard error SEb and b itself.
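
Although software normally reports SEb, the by-hand calculation can be sketched as follows. The data are made up for illustration (not the crying and IQ data), the function name slope_confidence_interval is an assumption, and the critical value t* is passed in by the caller as if read from a t table; 3.182 is the 95% value for df = 3, which matches the five made-up points.

```python
import math

def slope_confidence_interval(xs, ys, t_star):
    """Return (b, SEb, (low, high)) for the interval b +/- t* SEb."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    # Standard error about the line: s = sqrt(sum resid^2 / (n - 2))
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_resid / (n - 2))
    # Standard error of the slope: SEb = s / sqrt(sum (x - xbar)^2)
    se_b = s / math.sqrt(sxx)
    margin = t_star * se_b
    return b, se_b, (b - margin, b + margin)

# Made-up data for illustration only; n = 5, so df = n - 2 = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_STAR = 3.182   # 95% critical value for df = 3, taken from a t table
b, se_b, (low, high) = slope_confidence_interval(xs, ys, T_STAR)
print(b, se_b, low, high)
```

Note that SEb shrinks when the x values are more spread out (larger Σ(x – x̄)²) or when the points lie closer to the line (smaller s).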
Ex. Regression Output: Crying
and IQ
Statistic
Do
There are 38 data points, so
df = n – 2 = 36.
Find the critical value t*.
For a 95% C.I. for the true slope β, df = 36 is not listed in
Table C, so use the next smaller value, df = 30,
which gives t* = 2.042.

b ± t* SEb = 1.4929 ± (2.042)(0.4870)
= 0.4985 to 2.4873
Conclude
We are 95% confident that mean
IQ increases by between 0.5 and
2.5 points for each additional peak
in crying.
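
The interval arithmetic in this example can be reproduced with a short Python sketch. The values of b, SEb, and t* are the ones quoted above; the function name slope_interval is an assumption for illustration.

```python
# Values quoted in the crying and IQ example.
B = 1.4929       # sample slope b from the regression output
SE_B = 0.4870    # standard error of the slope, SEb
T_STAR = 2.042   # Table C critical value (95% confidence, df = 30)

def slope_interval(b, se_b, t_star):
    """Confidence interval for the true slope: b +/- t* SEb."""
    margin = t_star * se_b
    return b - margin, b + margin

low, high = slope_interval(B, SE_B, T_STAR)
print(f"{low:.2f} to {high:.2f}")   # 0.50 to 2.49
```

Rounded to one decimal place, this is the "between 0.5 and 2.5 points" stated in the conclusion.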
Interpret SEb
SEb estimates how much the slope of the sample
regression line typically varies from the slope of the
population (true) regression line if we repeat the
data production process many times.
If we repeated the experiment many times, the
slope of the sample regression line would
typically vary by about 0.4870 from the slope of the
true regression line for predicting IQ from the crying count
of infants.