Transcript File

Linear Regression
One Double Whopper with cheese provides 53
grams of protein, 1020 calories, and 65 grams of fat.
The correlation between
Fat and Protein for 30 of
the items on the Burger
King menu is 0.83
The fat content and calories for a Double Whopper
are extreme, but are they an outlier?
The association between
protein and fat is a positive
linear association with a fairly
strong correlation of 0.83
If you want 25 grams of protein
from a BK item, how much fat
should you expect to
consume?
We can use a linear model to predict values.
Regression Line
𝑦 = 𝑏0 + 𝑏1 π‘₯
𝑏1 = π‘ π‘™π‘œπ‘π‘’; 𝑏0 = y-intercept
(Also known as the least squares line or line
of best fit)
Slope
b1 ο€½ r
sy
sx
y-intercept
b 0 ο€½ y ο€­ b1 x
β€œPutting a hat on it” is notation to
indicate that something has been
predicted by a model.
𝑦 𝑖𝑠 π‘Ÿπ‘’π‘Žπ‘‘ π‘Žπ‘  "𝑦 β„Žπ‘Žπ‘‘"
Before using a regression model, we need to check
the same conditions as we do for correlation:
οƒΌQuantitative variables
οƒΌStraight pattern
οƒΌCheck for outliers
This model says that our predictions follow a straight
line. The equation is given in slope-intercept form.
For the association of fat and protein of Burger King
items, the estimated linear model is:
π‘“π‘Žπ‘‘ = 6.8 + 0.97π‘π‘Ÿπ‘œπ‘‘π‘’π‘–π‘›
Example
π‘“π‘Žπ‘‘ = 6.8 + 0.97π‘π‘Ÿπ‘œπ‘‘π‘’π‘–π‘›
What is the predicted amount of fat for the BK
Broiler chicken sandwich, which has 30 grams of
protein?
π‘“π‘Žπ‘‘ = 6.8 + 0.97 30
= 35.9𝑔
If we convert the data into z-scores, the scatterplot shifts to
the origin. The origin is where both z-scores are 0. A zscore of 0 would happen at the mean.
When the variables are
standardized, the slope of the line
turns out to be r.
Moving one standard deviation
away from the mean in one
variable moves our estimate r
standard devations from the mean
in the other variable.
Example
A scatterplot of house price (in thousands of dollars) vs. house size (in
thousands of square feet) for houses sold recently in Saratoga, NY
shows a relationship that is straight, with only moderate scatter and no
outliers. The correlation between Price and Size is 0.77.
1)You go to an open house and find that the house is 1 standard deviation above
the mean in size. What would you guess about its price?
2)You read an ad for a house priced 2 standard deviations below the mean. What
would you guess about its size?
3)A friend tells you about a house whose size in square meters is 1.5 standard
deviations above the mean. What would you guess about its size in square feet?
Answers
1)You should expect the price to be 0.77 standard
deviations above the mean.
2)You should expect the size to be 2(0.77) = 1.54
standard deviations below the mean.
3)The home is 1.5 standard deviations above the
mean in size no matter how it is measured.
Residuals
We predict that a BK Broiler chicken sandwich with 30 grams of
protein should have 36 grams of fat, but it actually only has 25
grams of fat.
The difference between the
observed value and the
predicted value is called the
residual.
𝑦 βˆ’ 𝑦 = 25 βˆ’ 36 = βˆ’11 𝑔
 A negative residual means the
observed value is below the
prediction on the regression line.
 A positive residual means the
observed value is above the
prediction on the regression line.
If we keep the x-values and replace the y-values with the residuals,
the resulting scatterplot has no pattern or direction.
Example
The linear model for Saratoga homes uses the Size and Price:
π‘ƒπ‘Ÿπ‘–π‘π‘’ = βˆ’3.117 + 94.454𝑆𝑖𝑧𝑒.
1)What does the slope of 94.454 mean?
2)What are the units of the slope?
3)Your house is 2000 sq ft bigger than your neighbor’s house. How
much more do you expect it to be worth?
4)Is the y-intercept of -3.117 meaningful?
Answers
1)An increase in home size of 1000 sq ft is associated
with an increase in price of $94,454
2)Units are in thousands of dollars per thousand
square feet
3)About $188,908
4)No… You can’t have a zero square ft. house
Example
The linear model for Saratoga homes uses the Size and Price:
π‘ƒπ‘Ÿπ‘–π‘π‘’ = βˆ’3.117 + 94.454𝑆𝑖𝑧𝑒. Suppose you’re thinking of buying a
home there.
1)Would you prefer to find a home with a negative or positive
residual? Explain.
2)You plan to look for a home of about 3000 square feet. How much
should you expect to have to pay?
3)You find a nice home that size which is selling for $300,000.
What’s the residual?
Answers
1)Negative; that indicated it’s priced lower than a
typical home of its size.
2)$280,250
3)$19,755
Today’s Assignment
 Add to Homework #5: p. 192 #1-6, 11, 12