Bell Ringer
A random sample of records of sales of homes from Feb.
15 to Apr. 30, 1993, from the files maintained by the
Albuquerque Board of Realtors gives the Price and Size (in
square feet) of 117 homes. A regression to predict Price
(in thousands of dollars) from Size has r = 0.84. The
residuals plot indicated that a linear model is appropriate.
a) What are the variables and units in this regression?
b) What units does the slope have?
c) Do you think the slope is positive or negative?
Linear Regression
Recall that a residual is the difference between
an observed value and the predicted value.
$e = y - \hat{y}$
The standard deviation of the residuals gives us a
measure of how much the points spread around
the regression line.
$s_e = \sqrt{\dfrac{\sum e^2}{n - 2}}$
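As a minimal sketch of the two formulas above, the Python below fits a least-squares line to a small made-up data set and computes the standard deviation of the residuals. The data values and the helper names `fit_line` and `residual_sd` are assumptions for illustration, not part of the lesson.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_sd(xs, ys):
    """s_e = sqrt(sum(e^2) / (n - 2)), where each residual is e = y - y_hat."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

# Made-up illustrative data (an assumption, not from the lesson).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
s_e = residual_sd(xs, ys)
print(round(s_e, 3))
```

Dividing by n - 2 rather than n reflects that two quantities (the slope and the intercept) were estimated from the data.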
• r is the correlation coefficient
• If we square r, we get the portion of
the variation in "y" accounted for by
variation in "x"
$r^2$ = variation accounted for
AKA "coefficient of determination"
$1 - r^2$ = variation left in the residuals
Example
The correlation between a cereal's fiber and potassium contents is
r = 0.903. What fraction of the variability in potassium is accounted for
by the amount of fiber that servings contain?
Since $r^2 = (0.903)^2 \approx 0.815$, about 81.5% of the variability in potassium content is accounted for by the model.
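The arithmetic behind that answer can be checked with a short Python sketch. The variable names are illustrative; only the value r = 0.903 comes from the example.

```python
r = 0.903  # correlation between fiber and potassium, from the example

r_squared = r ** 2           # fraction of variation accounted for by the model
unexplained = 1 - r_squared  # fraction of variation left in the residuals

print(f"R-squared: {r_squared:.1%}")
print(f"Left in residuals: {unexplained:.1%}")
```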
The regression model for fiber (in grams) and potassium content (in mg)
based on 77 breakfast cereals is $\widehat{\text{Potassium}} = 38 + 27 \cdot \text{Fiber}$. What does
it mean if $s_e = 30.77$?
The true potassium contents of cereals vary from the predicted values with a standard
deviation of 30.77 milligrams.
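To see how the model and the residual standard deviation work together, the sketch below predicts potassium for a cereal with 3 grams of fiber. The 3 g serving and the function name `predict_potassium` are assumptions chosen for illustration; the coefficients 38 and 27 and the value 30.77 come from the example.

```python
def predict_potassium(fiber_grams):
    """Predicted potassium (mg) from the model: 38 + 27 * Fiber."""
    return 38 + 27 * fiber_grams

S_E = 30.77  # standard deviation of the residuals, in mg

fiber = 3  # hypothetical serving with 3 g of fiber (assumed for illustration)
predicted = predict_potassium(fiber)
print(f"Predicted potassium: {predicted} mg, give or take about {S_E} mg")
```

The prediction is the center of the estimate, and the residual standard deviation describes the typical size of the miss around it.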
The notation that is typically used is $R^2$.
We express $R^2$ as a percent between 0% and 100%.
A random sample of records of sales of homes from February 15 to April
30, 1993, from the files maintained by the Albuquerque Board of Realtors
gives the Price and Size (in square feet) of 117 homes. A regression to
predict Price (in thousands of dollars) from Size has an R-squared of
71.4%. The residuals plot indicated that a linear model is appropriate.
a) What are the variables and units in the regression?
The explanatory variable (x) is size, measured in square feet, and the response
variable (y) is price measured in thousands of dollars.
b) What units does the slope have?
The units of the slope are thousands of dollars per square foot.
c) Do you think the slope is positive or negative?
The slope of the regression line predicting price from size should be positive.
Bigger homes are expected to cost more.
From the bell ringer example: A regression to predict Price (in thousands
of dollars) from Size has an R-squared of 71.4%. The residuals plot
indicated that a linear model is appropriate.
a) What is the correlation between Size and Price?
The correlation between size and price is $r = \sqrt{0.714} \approx 0.845$. The positive value of the
square root is used, since the relationship is believed to be positive.
b) What would you predict about the Price of a home 1 standard deviation
above average in Size?
The price of a home that is one standard deviation above the mean size would be
predicted to be 0.845 standard deviations (in other words r standard deviations) above
the mean price.
c) What would you predict about the Price of a home 2 standard
deviations below average in Size?
The price of a home that is two standard deviations below the mean size would be
predicted to be 1.69 (or 2 × 0.845) standard deviations below the mean price.
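The three answers above all follow from one rule: in standardized units, the predicted response is r times the predictor's z-score. A minimal Python sketch, with hypothetical helper names, assuming the R-squared of 71.4% from the example and a positive slope:

```python
import math

def correlation_from_r_squared(r_squared, slope_is_positive=True):
    """Recover r from R-squared, taking the sign from the slope's direction."""
    r = math.sqrt(r_squared)
    return r if slope_is_positive else -r

def predicted_z_price(z_size, r):
    """Predicted z-score of the response: z_y_hat = r * z_x."""
    return r * z_size

r = correlation_from_r_squared(0.714)  # R-squared of 71.4% from the example
print(round(r, 3))                         # correlation
print(round(predicted_z_price(1, r), 3))   # home 1 SD above mean size
print(round(predicted_z_price(-2, r), 2))  # home 2 SDs below mean size
```

Because |r| is less than 1, each prediction sits closer to the mean (in standard deviations) than the size it was predicted from.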
Engine sizes (called
displacement) measure the
volume of the cylinders in
cubic inches. The regression
analysis of gasoline use and
displacement is shown.
The constant is the y-intercept of the
regression line.
The independent (explanatory) variable
is paired with the slope of the
regression line.
The equation of the regression line:
$\widehat{\text{Fuel economy}} = 34.9799 - 0.066196 \cdot \text{Engine size}$
The only other information we need at
this time: n and r-squared (take the
square root for r).
1) How many cars were included in
this analysis?
2) What is the correlation between
engine size and fuel economy?
3) A car you are thinking of buying
is available with two different
size engines, 190 cubic inches or
240 cubic inches. How much
difference might this make in
your gas mileage?
Answers:
1) 89
2) r = -0.78
3) 19.1 mpg for 240 cubic inches or 22.4 mpg for
190 cubic inches, a difference of 3.3 mpg
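Answer 3 can be reproduced by plugging both engine sizes into the regression equation (intercept 34.9799, slope -0.066196 mpg per cubic inch). The function name `predict_mpg` below is a hypothetical choice for this sketch.

```python
def predict_mpg(displacement_cubic_inches):
    """Predicted fuel economy (mpg) from the lesson's regression equation."""
    return 34.9799 - 0.066196 * displacement_cubic_inches

mpg_190 = predict_mpg(190)
mpg_240 = predict_mpg(240)
difference = mpg_190 - mpg_240  # same as 0.066196 * (240 - 190)

print(round(mpg_190, 1), round(mpg_240, 1), round(difference, 1))
```

The difference depends only on the slope and the 50 cubic inch gap, so the intercept cancels out.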
Today's Assignment:
• Be sure to read Chapter 8
• Add to HW: p. 192 #8, 10, 16, 18, 20, 22