Transcript PPT

Regression
y    x
Weight
What would
you
expect
How much
for
other
could
weigh
This She
distribution
would an
adult
heights?
varying
amounts
is normally
female
weigh
60
60
60
62
62
60
62
64
64
64
66
66
68
68
Height
– in other words,
distributed.
ifthere
sheiswere
5
a
(wedistribution
hope)
feet tall?
of
What
about
64
62 We
want
the 68
weights
for 66adult
the
standard
Where
would
are
66females
68 who
standard
deviations
deviations
of
expect
feet
tall.
of all5you
these
normal
these
theall
TRUE
distributions
to be
normal
LSRL
to be?
the same.
distributions?
Regression Model
• The mean response my has a straight-line
relationship with x:  y    x
– Where: slope b and intercept a are unknown
parameters
• For any fixed value of x, the response y
varies according to a normal distribution.
Repeated responses of y are independent
of each other.
• The standard deviation of y (sy) is the
same for all values of x. (sy is also an
unknown parameter)
Person #
Ht
Wt
1
64
130
10
64
175
15
64
150
19
64
125
21
64
145
40
64
186
47
64
121
60
64
137
63
64
143
68
64
120
70
64
112
78
64
108
83
64
160
Suppose we look at
part of a population
of adult women.
These women are all
64 inches tall.
What distribution
does their weight
have?
We use
yˆ  a  bx
to estimate  y    x
• The slope b of the LSRL is an unbiased
estimator of the true slope b.
• The intercept a of the LSRL is an unbiased
estimator of the true intercept a.
• The standard error s is an unbiased
estimator of the true standard deviation
of y (sy).

y  yˆ

s 

n  2
2
Note:
df = n-2
 residuals
n 2
2
Notes!
Student
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# of
beers
5
2
9
8
3
7
3
5
3
5
4
6
5
7
1
4
BAC
.10
.03
.19
.12
.04
.095
.07
.06
.02
.05
.07
.10
.085
.09
.01
.05
For a study on student drinking and blood alcohol
level, sixteen student volunteers at Ohio State
University drank a randomly assigned number of
cans of beer. Thirty minutes later, a police
officer measured their blood alcohol content
(BAC). The results are show below:
Use your calculator to find a regression equation
(ax + b) for this data. State your equation using
descriptive notation.
What does the value a represent in the context
of this problem?
We would like to create a confidence interval for the slope
of the regression line.
In other words,
we want to know
.
Conditions for regression inference
• For any fixed value of x, the response
variable y varies normally about the
true regression line.
– Check a histogram or boxplot of residuals
• The mean response, y ,has a straight
line relationship with x  y    x
– Check the scatter plot & residual plot
• The standard deviation of y is the
same for values of x.
– Check the scatter plot & residual plot
For problems involving inference for
regression, we use a
.
Weight
What is the slope
of a horizontal
line?
60
62
64
Height
Suppose the
LSRL has a
horizontal line
–would height
be useful in
predicting
A slope of zero weight?
that
66– means
68
there is NO
relationship
between x & y!
Formulas:
• Confidence Interval:
CI  statistic  critical value SD of statistic
b t *
SEb 
SEb 
sx
SEb 
= n -2
the standard error of thedf
least
squares slope, b
s
n 1

Because there are
two unknowns  &

Interpretation:
We are 95% confident that the mean change in
BAC per beer is between ___________ and
_____________
Back to our Example: For a study on
student drinking and blood alcohol level,
sixteen student volunteers at Ohio State
University drank a randomly assigned
number of cans of beer. Thirty minutes
later, a police officer measured their
blood alcohol content (BAC). The results
are show below:
a)Find the LSRL, correlation coefficient,
and coefficient of determination.
BAC = -.0127 + 0.018 (Beers)
r = 0.8943
r2 = 0.7998
b) Explain the meaning of slope in the
context of the problem.
There is approximately 1.8% increase
in BAC for every Beer
c) Explain the meaning of the coefficient
of determination in context.
Approximately 80% of the variation in
BAC can be explained by the
regression of BAC on number of Beers
drunk.
d) Estimate a, b, and s.
a = -.0127
b = .0180
s = .0204
s 
2
residuals

n 2
BAC
Residuals
e) Create a scatter plot, residual plot and box plot of
the residuals for the data.
Beers
Beers
Residuals
f) Give a 95% confidence interval for the true slope of the LSRL.
Assumptions:
•Have an SRS of students
•Since the residual plot is randomly scattered, BAC and # of
beers are linear
•Since the points are evenly spaced across the LSRL on the
scatterplot, sy is approximately equal for all values of BAC
•Since the boxplot of residual is approximately symmetrical,
the responses are approximately normally distributed.
Be sure to show
all graphs!
b  t * sb   0.018  2.145.0024 
(.0128, .0231)
df  14
We are 95% confident that the true slope of the LSRL
of weight & body fat is between 0.12 and 0.38.