3.2A - mcdonaldmath

Bellwork (Why do we want scattered
residual plots?): 10/2/15
• I feel like I didn’t explain this well, so this is residual plots redux!
• Copy down these two scatterplots
3.2B&C
LSRL: stdev of the Residuals & $r^2$
If she loves you
more each and
every day,
by linear regression
she hated you
before you met.
Objectives:
CALCULATE residuals 
CONSTRUCT and INTERPRET
residual plots 
DETERMINE how well a line fits
observed data
A scatterplot of home price in
thousands of dollars vs. home size
in thousands of square feet shows a
relatively linear, positive
association with r = 0.85. There are
no unusual points and the residual
plot shows random scatter.
1. If a home size is one stdev
above the mean home size, how
many stdev above the mean
would you expect the sale price
to be?
$\text{slope} = r \cdot \frac{s_y}{s_x}$
For a home size 1 stdev above the mean (a change of $1 \cdot s_x$):
$r \cdot \frac{s_y}{s_x} \cdot (1 \cdot s_x) = 0.85 \cdot 1 \cdot s_y = 0.85 \cdot s_y$
So, I would expect its sale price to be 0.85 stdev above the mean.
2. What would you predict about
the sale price of a home 2 SD below
the average size?
$\text{slope} = r \cdot \frac{s_y}{s_x}$
For a home size 2 SD below the mean (a change of $-2 \cdot s_x$):
$r \cdot \frac{s_y}{s_x} \cdot (-2 \cdot s_x) = 0.85 \cdot (-2) \cdot s_y = -1.70 \cdot s_y$
So, I would expect its sale price to be 1.70 stdev below the mean.
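The two answers above follow one rule: in standard units, the predicted response is r times the explanatory value. A minimal Python sketch of that rule, where the function name predicted_z_y is made up for illustration:

```python
def predicted_z_y(r, z_x):
    """Predicted number of stdevs from the mean of y for a point
    that is z_x stdevs from the mean of x (LSRL in standard units)."""
    return r * z_x


r = 0.85  # correlation from the home price vs. home size scatterplot

# Question 1: home size 1 stdev above the mean.
print(predicted_z_y(r, 1))

# Question 2: home size 2 stdevs below the mean.
print(predicted_z_y(r, -2))
```

Because |r| is at most 1, the predicted price is never more stdevs from its mean than the size is from its own mean.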
Consider the linear regression:
$\text{price} = 9.564 + 122.74(\text{size})$
3. What are the units of the slope?
4. Interpret the slope.
5. By how much would the value of my home increase after the addition of 500 sq ft?
6. How much would you expect to pay for a 3000 sq ft home?
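Questions 5 and 6 come down to arithmetic with the regression equation. The sketch below assumes the equation uses the same units as the scatterplot described earlier (price in thousands of dollars, size in thousands of square feet); the function names are made up for illustration.

```python
INTERCEPT = 9.564  # thousands of dollars (assumed units)
SLOPE = 122.74     # thousands of dollars per thousand sq ft (assumed units)


def predict_price(size_sqft):
    """Predicted price in thousands of dollars for a home of size_sqft."""
    return INTERCEPT + SLOPE * (size_sqft / 1000)


def predicted_change(extra_sqft):
    """Predicted change in price (thousands of dollars) for extra_sqft more space."""
    return SLOPE * (extra_sqft / 1000)


# Question 5: adding 500 sq ft is half a unit of size.
print(round(predicted_change(500), 3))   # 61.37 (thousand dollars)

# Question 6: a 3000 sq ft home is 3 units of size.
print(round(predict_price(3000), 3))     # 377.784 (thousand dollars)
```

The first step in both is converting square feet to thousands of square feet before multiplying by the slope.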
Homework Comments
• If form isn’t linear, what is it?
• Correlation vs Association
• Form, strength (r), direction, outliers
• Only use correlation/scatter plots for quantitative data
• r has no units
Let’s go back to the Handspan vs.
Height activity
$r^2$
Computer Output
BAC
Computer Output
Scatterplot and Residual Plot
Questions
• What is the equation of the least-squares regression line that
describes the relationship between beers consumed and
blood alcohol content?
• Interpret the slope of the regression line in context
• Find the correlation
• Is a line an appropriate model to use for these data? Explain
• What was the BAC reading for the person who consumed 9
beers?
• How would we interpret the $r^2$ value?
Definition:
If we use a least-squares regression line to predict the
values of a response variable y from an explanatory
variable x, the standard deviation of the residuals
(AKA s) is given by:
$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
(Whaaat?!)
The standard deviation of the
residuals gives us a numerical
estimate of the typical size of our
prediction errors (AKA residuals.)
There is another numerical
quantity that tells us how well the
least-squares regression line
predicts values of the response y.
Definition:
The coefficient of determination $r^2$ is the fraction of the
variation in the values of y that is accounted for by the least-squares
regression line of y on x. We can calculate $r^2$ using the
following formula:
$r^2 = 1 - \frac{SSE}{SST}$
Where $SSE = \sum \text{residual}^2$ and $SST = \sum (y_i - \bar{y})^2$
(Whaaat?!)
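Both definitions can be checked by hand on a tiny data set. The sketch below uses five made-up (x, y) points, not data from these slides, and hypothetical helper names; it fits the least-squares line and then applies the formulas for s and $r^2$ directly.

```python
from math import sqrt


def fit_lsrl(x, y):
    """Return (intercept, slope) of the least-squares regression line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def s_and_r2(x, y):
    """Return (s, r2): stdev of the residuals and coefficient of determination."""
    intercept, slope = fit_lsrl(x, y)
    y_bar = sum(y) / len(y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)      # sum of squared residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)  # squared error of the mean model
    return sqrt(sse / (len(x) - 2)), 1 - sse / sst


# Made-up illustration data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s, r2 = s_and_r2(x, y)
print(round(s, 3), round(r2, 3))  # 0.894 0.6
```

Here SSE = 2.4 and SST = 6, so the line accounts for 60% of the variation in y, and a typical prediction misses by a bit under 0.9 units.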
1993 NL Statistics for MLB
Team          Games Won   Runs Scored
Atlanta       104         767
Chic Cubs     84          738
Cincinnati    73          722
Colorado      67          758
Florida       64          581
Houston       85          716
LA            81          675
Montreal      94          732
NY Mets       59          672
Philly        97          877
Pittsburgh    75          707
San Diego     61          679
San Fran      103         808
St. Louis     87          758
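The 1993 NL data give a way to check the relationship slope = r · s_y / s_x used earlier. The sketch below assumes runs scored is the explanatory variable and games won is the response (the slide does not say which is which), and compares the slope built from r with the slope computed directly by least squares.

```python
from math import sqrt

# 1993 NL data copied from the table above, in team order.
games_won = [104, 84, 73, 67, 64, 85, 81, 94, 59, 97, 75, 61, 103, 87]
runs_scored = [767, 738, 722, 758, 581, 716, 675, 732, 672, 877, 707, 679, 808, 758]

# ASSUMPTION: x = runs scored (explanatory), y = games won (response).
x, y = runs_scored, games_won
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

s_x = sqrt(sxx / (n - 1))    # sample stdev of runs scored
s_y = sqrt(syy / (n - 1))    # sample stdev of games won
r = sxy / sqrt(sxx * syy)    # correlation (no units)

slope_from_r = r * s_y / s_x       # slope = r * s_y / s_x
slope_least_squares = sxy / sxx    # slope computed directly
intercept = y_bar - slope_from_r * x_bar

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"predicted wins = {intercept:.2f} + {slope_from_r:.4f}(runs)")
```

The two slope calculations agree up to floating-point rounding, which is the point of the formula: the LSRL slope is the correlation rescaled by the two standard deviations.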
Miles driven and the price of a used Honda CR-V
Miles Driven   Price
22000          17998
29000          16450
35000          14998
39000          13998
45000          14599
49000          14988
55000          13599
56000          14599
69000          11998
70000          14450
86000          10998
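With the CR-V data, the objectives (calculate residuals, judge how well a line fits) can be worked through in a few lines. This is a sketch that assumes prices are in dollars and miles driven is the explanatory variable; the residuals it prints are the values that would be plotted against miles in a residual plot.

```python
from math import sqrt

# Used Honda CR-V data copied from the table above.
miles = [22000, 29000, 35000, 39000, 45000, 49000,
         55000, 56000, 69000, 70000, 86000]
price = [17998, 16450, 14998, 13998, 14599, 14988,
         13599, 14599, 11998, 14450, 10998]

n = len(miles)
x_bar = sum(miles) / n
y_bar = sum(price) / n

# Least-squares slope and intercept of price on miles.
sxy = sum((m - x_bar) * (p - y_bar) for m, p in zip(miles, price))
sxx = sum((m - x_bar) ** 2 for m in miles)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# residual = observed price - predicted price
predicted = [intercept + slope * m for m in miles]
residuals = [p - p_hat for p, p_hat in zip(price, predicted)]

# Standard deviation of the residuals: typical size of a prediction error.
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"predicted price = {intercept:.0f} + ({slope:.5f})(miles)")
for m, p, e in zip(miles, price, residuals):
    print(f"{m:>6} {p:>6} {e:>9.1f}")
print(f"s = {s:.0f}")
```

The slope is negative (price drops as miles go up), and the residuals sum to zero up to rounding, as they always do for a least-squares line.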
[Figure: one data value y at a given x, compared with the mean model and with the new model's estimate. Error w.r.t. the mean model: call it 10 units! The new model eliminates 8 of those units.]
Proportion of error eliminated by new model for this data point = 8/10 = 0.8
$r^2$ is the proportion of error (variability) in the response variable (y) accounted for by the given model (w.r.t. the mean model).
Conceptually, if we computed a proportion in the same way for each data point and combined them sensibly, we would end up with $r^2$.
Bellwork: 10/5/15
Describe what each of these residual plots is telling us
about their linear regression
Homework Comments
• Define your variables.
• #37: “The slope is 1.109 which means that the typical highway
gas mileage increases on average by 1.109 mpg for each 1
mpg increase in city mileage.” “An increase in city mileage of 1
mpg is associated with a predicted increase of 1.109 mpg on
average in highway mileage”
• Reference residual plot
• CONTEXT!!
• Meaning!
FRQ Practice
• Sewing machines
• Study time