Least Squares Regression Line (LSRL)

Presentation 2-5
Introduction
• Many times the scatterplot shows some pattern in the data.
• For now, we will look at the analysis of data that falls in a straight-line pattern.
[Figure: scatterplot of Size of Diamond vs. Price]
Introduction
• When we see a straight-line pattern, we want to model the data with a linear equation.
• This will allow us to make predictions and actually use our data.
Linear Relations
• From algebra, we know that lines take the form y = mx + b, where m is the slope and b is the y-intercept.
• In statistics, we write the equation of a straight line as y = a + bx. Now a is the intercept and b is the slope.
• The slope (b) of the line is the amount by which y changes when x increases by 1 unit (an increase if b is positive, a decrease if b is negative).
– This interpretation is very important.
• The intercept (a), sometimes called the vertical intercept, is the height of the line when x = 0.
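To see why the slope has this interpretation, evaluate the line at some value x and at x + 1 and subtract: the intercept cancels, leaving only b. Setting x = 0 likewise leaves only a.

$$[a + b(x + 1)] - [a + bx] = b \qquad\qquad a + b \cdot 0 = a$$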
Example
• Consider the equation y = 7 + 3x.
– The slope is 3.
• For every increase of 1 in the x-variable, there will be an increase of 3 in the y-variable.
– The intercept is 7.
• When the x-variable is 0, the y-variable is 7.
[Figure: graph of y = 7 + 3x for x from 0 to 8, marking the intercept a = 7 and showing that y increases by b = 3 when x increases by 1]
Example
• Consider the equation y = 17 - 4x.
– The slope is -4.
• For every increase of 1 in the x-variable, there will be a decrease of 4 in the y-variable.
– The intercept is 17.
• When the x-variable is 0, the y-variable is 17.
[Figure: graph of y = 17 - 4x for x from 0 to 8, marking the intercept a = 17 and showing that y changes by b = -4 when x increases by 1]
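Both examples can be checked numerically. In this small sketch, `line` is a hypothetical helper that evaluates y = a + bx; the intercept is its value at x = 0, and the slope is the change in y when x goes up by 1.

```python
def line(a, b, x):
    """Return the height of the line y = a + bx at the given x."""
    return a + b * x

# Example 1: y = 7 + 3x (intercept 7, slope 3)
print(line(7, 3, 0))                    # value at x = 0 → 7
print(line(7, 3, 5) - line(7, 3, 4))    # change in y when x increases by 1 → 3

# Example 2: y = 17 - 4x (intercept 17, slope -4)
print(line(17, -4, 0))                  # → 17
print(line(17, -4, 5) - line(17, -4, 4))  # → -4
```

Any starting x gives the same difference, since the intercept cancels.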
Least Squares Line
• How can we find the line that best fits the data?
– We would like the points to be as close to the line as possible overall.
• This distance is measured vertically, from each point to the line.
– Go to the following applet and start plotting
points to see how this process works.
Make your own regression line
Least Squares Line
• A line appears as soon as you plot two points.
• When you plot a third point, green bars appear representing the errors (properly called residuals) of the line.
– These show how far off your line is for each of the points.
• The least squares line is the one that minimizes the sum of the squared lengths of the green bars, that is, the sum of the squared residuals.
Guess the best fit line
• Go to the following applet to practice your skills at estimating an LSRL.
– Plot a bunch of points.
– Then click the draw line button and draw what you think is the best fit line.
– Then, check the “show least squares line” checkbox.
To the applet
The details of the LSRL
• The mathematics involved in calculating the LSRL is a bit complicated.
Least Squares Line
The most widely used criterion for measuring the goodness of fit of a line y = a + bx to bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is the sum of the squared deviations about the line:

$$\sum \left[y - (a + bx)\right]^2 = \left[y_1 - (a + bx_1)\right]^2 + \cdots + \left[y_n - (a + bx_n)\right]^2$$

The line that gives the best fit to the data is the one that minimizes this sum; it is called the least squares line or sample regression line.
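To make the criterion concrete, here is a brute-force sketch. It is an illustration only, not how the LSRL is computed in practice: it tries every line on a coarse grid of intercepts and slopes and keeps the one with the smallest sum of squared deviations. The data points, the grid ranges, and the step size of 0.1 are all hypothetical choices.

```python
# Hypothetical bivariate data (x_i, y_i), for illustration only.
data = [(1, 2), (2, 3), (3, 7), (4, 8)]

def sum_sq_dev(a, b, pts):
    """Sum of squared deviations about the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

# Every line with a in [-2, 2] and b in [0, 4], in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(-20, 21) for j in range(0, 41)]

# Keep the candidate line with the smallest sum of squared deviations.
best_a, best_b = min(candidates, key=lambda ab: sum_sq_dev(ab[0], ab[1], data))
print(best_a, best_b)  # → -0.5 2.2
```

Here the grid happens to contain the exact minimizer; in general a grid search only approximates it, which is why closed-form formulas for a and b are used instead.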
Coefficients a and b
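The slope is:
$$b = r\,\frac{s_y}{s_x}$$
The intercept is:
$$a = \bar{y} - b\,\bar{x}$$
Here $s_y$ and $s_x$ are the sample standard deviations of y and x (so $s_y / s_x$ acts a bit like rise over run), r is the correlation coefficient, and $\bar{y}$ and $\bar{x}$ are the means of y and x, respectively.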
The equation of the least squares regression line is written as:
$$\hat{y} = a + bx$$
The little symbol above the y is a hat! The equation is read as “y-hat equals a plus b x.” The hat indicates that this is a regression line and that the model (equation) is to be used to make predictions: ŷ is the predicted value of y.
Three Important Questions
• To examine how useful or effective the line summarizing the relationship between x and y is, we consider the following three questions.
1. Is a line an appropriate way to summarize the relationship between the two variables?
2. Are there any unusual aspects of the data set that we need to consider before proceeding to use the regression line to make predictions?
3. If we decide that it is reasonable to use the regression line as a basis for prediction, how accurate can we expect predictions based on the regression line to be?
Example #1 - Finding the LSRL
• Consider the following data:

Shoe Size (men’s U.S.)   Height (in)
7                        64
10                       69
12                       71
8                        68
9.5                      71
10.5                     70
11                       72
12.5                     74
13.5                     77
10                       68

• With this data, find the LSRL.
• Start by entering this data into list 1 and list 2.
Example #1 - Finding the LSRL
• You should then see the results of the regression:
– a = 53.24
– b = 1.65
– r² = 0.8422
– r = 0.9177 (this is the correlation coefficient for the scatterplot)
$$\hat{y} = 53.24 + 1.65x$$
$$\text{Height} = 53.24 + 1.65\,(\text{shoe size})$$
Example #2 – Interpreting LSRL
$$\hat{y} = 53.24 + 1.65x$$
$$\text{Height} = 53.24 + 1.65\,(\text{shoe size})$$
• Interpreting the intercept
– When shoe size is 0, the predicted height is about 53.24 inches.
– Of course, this does not make much sense in the context of the problem: no one has a shoe size of 0, and 0 is far outside the shoe sizes in the data (7 to 13.5).
• Interpreting the slope
– For each increase of 1 in shoe size, we would expect height to increase by about 1.65 inches, on average.
Example #3 – Using LSRL
$$\hat{y} = 53.24 + 1.65x$$
$$\text{Height} = 53.24 + 1.65\,(\text{shoe size})$$
• Making predictions
– How tall might you expect someone to be who has a shoe size of 12.5?
– Just plug in 12.5 for the shoe size above, so…
– Height = 53.24 + 1.65(12.5) = 73.865 inches
• Of course, this is a prediction and is therefore not exact.
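The prediction step is just arithmetic with the rounded coefficients. In this sketch `predict_height` is a hypothetical helper; it also shows the slope interpretation as the predicted difference between shoe sizes 11 and 10. For comparison, the one person in the data with shoe size 12.5 is 74 inches tall, so the residual for that point is 74 - 73.865 = 0.135 inches.

```python
def predict_height(shoe_size, a=53.24, b=1.65):
    """Predicted height (in) from the rounded LSRL: a + b * (shoe size)."""
    return a + b * shoe_size

print(round(predict_height(12.5), 3))  # → 73.865

# Slope as a predicted difference: one shoe size larger, about 1.65 in taller.
print(round(predict_height(11) - predict_height(10), 2))  # → 1.65
```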
Least Squares Regression Line (LSRL)
• This concludes this presentation.