y - danagoins


Chapter 5
LSRL
Least Squares
Regression Line
Bivariate data
• x-variable: the independent or explanatory variable
• y-variable: the dependent or response variable
• Use x to predict y
ŷ = a + bx
ŷ (y-hat) means the predicted y
Be sure to put the hat on the y
b – is the slope
– it is the approximate amount by which y increases when x increases by 1 unit
a – is the y-intercept
– it is the approximate height of the line when x = 0
– in some situations, the y-intercept has no meaning
Least Squares Regression Line
LSRL
• The line that gives the best fit to
the data set
• The line that minimizes the sum of
the squares of the deviations from
the line
Example: the line ŷ = .5x + 4 with the points (0,0), (3,10), and (6,2)
At x = 0: ŷ = .5(0) + 4 = 4, and y – ŷ = 0 – 4 = -4
At x = 3: ŷ = .5(3) + 4 = 5.5, and y – ŷ = 10 – 5.5 = 4.5
At x = 6: ŷ = .5(6) + 4 = 7, and y – ŷ = 2 – 7 = -5
Sum of the squares = 61.25
What is the sum of the deviations from the line? Will it always be zero?
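The deviations for the candidate line ŷ = .5x + 4 can be checked with a short Python sketch. The helper name `residuals` is an illustrative choice and does not come from the slides.

```python
# Three points from the slide and the candidate line y-hat = .5x + 4.
points = [(0, 0), (3, 10), (6, 2)]

def residuals(points, intercept, slope):
    """Return the deviations y - y-hat for each point."""
    return [y - (intercept + slope * x) for x, y in points]

res = residuals(points, intercept=4, slope=0.5)
sum_of_deviations = sum(res)
sum_of_squares = sum(r ** 2 for r in res)

print(res)                # [-4.0, 4.5, -5.0]
print(sum_of_deviations)  # -4.5, so the deviations do not sum to zero for this line
print(sum_of_squares)     # 61.25
```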
Use a calculator to find the line of best fit for (0,0), (3,10), and (6,2):
ŷ = (1/3)x + 3
Find y – ŷ:
(0,0): -3
(3,10): 6
(6,2): -3
Sum of the squares = 54
The line that minimizes the sum of the squares of the deviations from the line is the LSRL.
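As a rough sketch of what the calculator does for these three points, the standard least-squares formulas (slope = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)², intercept = ȳ – slope·x̄) can be applied directly. The function name `lsrl` is hypothetical.

```python
def lsrl(points):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

points = [(0, 0), (3, 10), (6, 2)]
a, b = lsrl(points)
res = [y - (a + b * x) for x, y in points]
sse = sum(r ** 2 for r in res)

print(a, b)      # about 3 and 1/3
print(sse)       # about 54, smaller than the 61.25 from the line .5x + 4
print(sum(res))  # about 0 for the LSRL
```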
Interpretations
Slope:
For each unit increase in x, there is an
approximate increase/decrease of b in y.
Correlation coefficient:
There is a (direction), (strength), linear
association between x and y.
The ages (in months) and heights (in
inches) of seven children are given.
x    16   24   42   60   75   102   120
y    24   30   35   40   48   56    60
Find the LSRL.
ŷ = 20.4 + .342x
r = .994
Interpret the slope and correlation
coefficient in the context of the problem.
Predicted height = 20.4 + .342(age in months)
Correlation coefficient:
There is a strong, positive, linear
association between the age and
height of children.
Slope:
For an increase in age of one month,
there is an approximate increase of .34
inches in the heights of children.
Predict the height of a child who is 4.5
years old.
4.5 years = 54 months: ŷ = 20.4 + .342(54) ≈ 38.9 inches
Predict the height of someone who is 20
years old.
20 years = 240 months: ŷ = 20.4 + .342(240) = 102.48 inches, or about 8.5 feet?
Interpolation (good):
• Using a regression line for
estimating predicted values
between known values.
Extrapolation (bad):
• The LSRL should not be used to
predict y for values of x outside the
data set.
• It is unknown whether the pattern
observed in the scatterplot
continues outside this range.
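The difference between the two cases can be sketched in Python by checking whether an age falls inside the observed range of x before trusting the prediction. The function names `predict_height` and `is_extrapolation` are hypothetical; the coefficients are the rounded ones from the slides.

```python
ages = [16, 24, 42, 60, 75, 102, 120]   # months, from the data set

def predict_height(age_months):
    """Predicted height in inches from the LSRL 20.4 + .342(age in months)."""
    return 20.4 + 0.342 * age_months

def is_extrapolation(age_months):
    """True when the age lies outside the observed range of x."""
    return not (min(ages) <= age_months <= max(ages))

for years in (4.5, 20):
    months = years * 12
    label = "extrapolation" if is_extrapolation(months) else "interpolation"
    print(f"{years} years: {predict_height(months):.2f} inches ({label})")
```

A 4.5-year-old (54 months) lies inside the 16 to 120 month range, while a 20-year-old (240 months) lies far outside it, which is why the second prediction is not believable.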
The ages (in months) and heights (in
inches) of seven children are given.
The LSRL is
ŷ = .342x + 20.404
Can this equation be used to estimate the age of a child who is 50 inches tall?
Calculate: LinReg L2,L1
ŷ = 2.889x – 58.198
Do you get the same LSRL?
For these data, this is the best equation to predict y from x. However, statisticians will always use this equation to predict x from y.
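Swapping the lists can be imitated in Python. The sketch below fits both lines with the same hypothetical helper `lsrl` and compares two ways of estimating the age of a 50-inch child: solving the height-from-age line for x, and using the refitted age-from-height line.

```python
ages = [16, 24, 42, 60, 75, 102, 120]    # months
heights = [24, 30, 35, 40, 48, 56, 60]   # inches

def lsrl(xs, ys):
    """Return (intercept, slope) for predicting ys from xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

a_hw, b_hw = lsrl(ages, heights)   # height predicted from age
a_ah, b_ah = lsrl(heights, ages)   # age predicted from height, as in LinReg L2,L1

# Solving the first line for x is not the same as fitting the second line.
age_by_inverting = (50 - a_hw) / b_hw
age_by_refitting = a_ah + b_ah * 50

print(round(a_ah, 3), round(b_ah, 3))   # -58.198 2.889
print(round(age_by_inverting, 1), round(age_by_refitting, 1))
```

The two estimates differ slightly because each line minimizes squared deviations in a different direction (vertical distances in y versus vertical distances in x once the roles are swapped).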
Calculate x̄ & ȳ and sx and sy:
x̄ = 62.71
ȳ = 41.86
sx = 38.93
sy = 13.40
Plot the point (x̄, ȳ) on the scatterplot.
Will this point always be on the LSRL?
The correlation coefficient
and the LSRL are both
non-resistant measures.
Formulas – on chart
ŷ = b0 + b1x
Predicted y = y-intercept + slope(x)
b1 = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)²
Slope formula
b0 = ȳ – b1x̄
Y-intercept = mean of y – slope (mean of x)
b1 = r(sy / sx)
Slope = correlation coefficient (st. dev. of y / st. dev. of x)
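The two slope formulas give the same value. A minimal Python sketch on the children's age and height data from earlier in the chapter illustrates this; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

ages = [16, 24, 42, 60, 75, 102, 120]    # x, months
heights = [24, 30, 35, 40, 48, 56, 60]   # y, inches

x_bar, y_bar = mean(ages), mean(heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, heights))
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in heights)

b1_from_sums = sxy / sxx                       # slope formula with the sums
r = sxy / sqrt(sxx * syy)                      # correlation coefficient
b1_from_r = r * stdev(heights) / stdev(ages)   # slope = r * (sy / sx)
b0 = y_bar - b1_from_sums * x_bar              # y-intercept

print(round(b1_from_sums, 3), round(b1_from_r, 3), round(b0, 3))  # 0.342 0.342 20.404
```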
The following statistics are found for the
variables posted speed limit and the
average number of accidents.
x̄ = 40, sx = 11.6, ȳ = 18, sy = 8.4, r = .9981
Find the LSRL & predict the number of
accidents for a posted speed limit of 50 mph.
For LSRL need slope and y-intercept:
b1 = r(sy / sx) = .9981(8.4 / 11.6) = .7228
b0 = ȳ – b1x̄ = 18 – .723(40) = -10.92
ŷ = .723x – 10.92
Predicted # of accidents =
.723(posted speed limit) – 10.92
ŷ = .723(50 mph) – 10.92 = 25.23 accidents
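The same calculation from summary statistics can be sketched in Python. The function name `lsrl_from_summary` is hypothetical; it applies b1 = r(sy / sx) and b0 = ȳ – b1x̄ without rounding the slope first.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept, slope) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = lsrl_from_summary(x_bar=40, s_x=11.6, y_bar=18, s_y=8.4, r=0.9981)
predicted = b0 + b1 * 50   # posted speed limit of 50 mph

print(round(b1, 4), round(b0, 2), round(predicted, 2))  # 0.7228 -10.91 25.23
```

Carrying full precision gives an intercept of about -10.91 rather than -10.92; the difference comes from rounding the slope to .723 before multiplying, and the prediction agrees to two decimals either way.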
Sleep   Score
8       25
9       28
7       21
10      26
8.5     18
6.5     16
5.5     25
11      28
9       29
7       19
6       26
7.5     23
8.5     24
6.5     27
8.5     21
10      19
9       20
Describe the relationship between the two variables:
(Score = a + b (Hours of Sleep))
What is the equation for the LSRL?
What is the slope of this line? Interpret the slope in the
context of the problem. What are the units?
If a student got 10 hours of sleep the night before the exam,
use the linear equation to approximate her score on the ACT.
Example: The average annual cost per person due to
traffic delays for 70 US cities in 2000 was $298.96 with
a standard deviation of $180.83. The peak period average
freeway speed is 54.34 mph with a standard deviation of
4.494 mph. The correlation between cost per person and
freeway speed is -0.90. Write a regression model to
estimate costs per person associated with traffic delays.
x̄ =
sx =
ȳ =
sy =
r =
Example: A scatterplot of house prices (in thousands of dollars) vs. house size (in
thousands of square feet) shows a relationship that is straight, with only
moderate scatter and no outliers. The correlation between house price and
house size is 0.85.
a) If a house is 1 SD above the mean in size (making it about 2170 sq
ft), how many SDs above the mean would you predict its sale price to
be?
b) What would you predict about the sale price of a house that’s 2 SDs
below average in size?
c) The regression model is Price = 9.564 + 122.74 size
What does the slope of 122.74 mean?
d) What are the units?
e) How much can a homeowner expect the value of his house to increase
if he builds on an additional 2000 sq ft?
f) How much would you expect to pay for a house of 3000 sq ft?
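One way to approach parts a), b), e), and f) is sketched below in Python. It assumes the standardized form of the LSRL, in which the predicted y is r standard deviations from its mean for each standard deviation x is from its mean (this follows from b1 = r(sy / sx) and the line passing through (x̄, ȳ)). The function names are hypothetical, and the code is a check on hand work rather than a definitive solution.

```python
r = 0.85  # correlation between house price and house size

def predicted_price_z(size_z):
    """Predicted price, in SDs from the mean, for a size given in SDs from the mean."""
    return r * size_z

def predicted_price(size_thousand_sqft):
    """Predicted price in thousands of dollars: Price = 9.564 + 122.74 size."""
    return 9.564 + 122.74 * size_thousand_sqft

print(predicted_price_z(1))    # 0.85 SDs above the mean
print(predicted_price_z(-2))   # -1.7, that is, 1.7 SDs below the mean
print(122.74 * 2)              # 245.48 thousand dollars for 2 more thousand sq ft
print(predicted_price(3))      # about 377.784 thousand dollars
```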