Mathematical Ideas

Chapter 13
Statistics
© 2008 Pearson Addison-Wesley.
All rights reserved
Chapter 13: Statistics
13.1 Visual Displays of Data
13.2 Measures of Central Tendency
13.3 Measures of Dispersion
13.4 Measures of Position
13.5 The Normal Distribution
13.6 Regression and Correlation
Chapter 13
Section 13-6
Regression and Correlation
Regression and Correlation
• Linear Regression
• Correlation
Regression
One important branch of inferential statistics, called
regression analysis, is used to compare quantities
or variables, to discover relationships that exist
between them, and to formulate those relationships
in useful ways.
Regression
Suppose that we wish to get an idea of how the
number of hours preparing for a final exam relates
to the score on the exam. Data are collected and
shown below.
Hours   1   2   3   4   5   6   7   8   9  10
Score  50  62  62  74  70  86  78  90  96  94
Linear Regression
The first step in analyzing these data is to graph the
results as shown in the scatter diagram on the next
slide.
Scatter Diagram
[Figure: scatter diagram of Exam Score (vertical axis, 0 to 120) against Hours Studying (horizontal axis, 0 to 15)]
Linear Regression
Once a scatter diagram has been produced, we can
draw a curve that best fits the pattern exhibited by
the sample points. The best-fitting curve for the
sample points is called an estimated regression
curve. If the points in the scatter diagram seem to
lie approximately along a straight line, the
relationship is assumed to be linear, and the line that
best fits the data points is called the estimated
regression line.
Estimated Regression Line
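[Figure: the scatter diagram of Exam Score (vertical axis, 0 to 120) against Hours Studying (horizontal axis, 0 to 15), with the estimated regression line drawn through the points]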
Linear Regression
If we let x denote hours studying and y denote exam
score in the data of the previous slide and assume that
the best-fitting curve is a line, then the equation of that
line will take the form
y = ax + b,
where a is the slope of the line and b is the
y-coordinate of the y-intercept. To identify the estimated
regression line, we must find the values of the
“regression coefficients” a and b.
Linear Regression
For each x-value in the data set, the corresponding
y-value usually differs from the value it would have
if the data point were exactly on the line. These
differences are shown in the figure by vertical line
segments. The most common procedure is to
choose the line where the sum of the squares of all
these differences is minimized. This is called the
method of least squares, and the resulting line is
called the least squares line.
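As a rough illustration of the least squares criterion, the sketch below sums the squared vertical differences for two candidate lines over the hours and exam score data. The first candidate uses the rounded coefficients 4.86 and 49.47 that the worked example later in this section arrives at; the second line (slope 5, intercept 50) and the function name `sum_of_squared_differences` are hypothetical, chosen only for comparison.

```python
# Hours studied (x) and exam scores (y) from the data table in this section.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [50, 62, 62, 74, 70, 86, 78, 90, 96, 94]


def sum_of_squared_differences(a, b, xs, ys):
    """Sum of squared vertical differences between each point and y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Candidate 1: the rounded least squares line from the worked example.
sse_least_squares = sum_of_squared_differences(4.86, 49.47, hours, scores)

# Candidate 2: a hypothetical "eyeballed" line, used only for comparison.
sse_eyeballed = sum_of_squared_differences(5.0, 50.0, hours, scores)

# The least squares line should give the smaller total.
print(sse_least_squares < sse_eyeballed)
```

Any other line tried in place of the second candidate should also give a total no smaller than that of the exact least squares line, which is what "minimized" means here.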
Regression Coefficient Formulas
The least squares line y’ = ax + b that
provides the best fit to the data points (x1, y1),
(x2, y2), …, (xn, yn) has
$$
a = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
\quad \text{and} \quad
b = \frac{\sum y - a(\sum x)}{n}.
$$
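The two coefficient formulas translate fairly directly into code. The following is a minimal sketch, not a library routine: the function name `least_squares_line` is an assumption made for illustration, and the small data set used to exercise it consists of points that lie exactly on y = 2x + 1.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y' = a*x + b.

    Follows the regression coefficient formulas term by term.
    Assumes the x-values are not all equal; otherwise the
    denominator of a is zero.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


# Points lying exactly on y = 2x + 1, so the fit should recover a = 2, b = 1.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # → 2.0 1.0
```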
Example: Computing a Least Squares
Line
Find the equation of the least squares line for the
hours and exam score data.
Hours   1   2   3   4   5   6   7   8   9  10
Score  50  62  62  74  70  86  78  90  96  94
Example: Computing a Least Squares
Line
Solution
$$
a = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
  = \frac{10(4592) - (55)(762)}{10(385) - (55)^2} \approx 4.86
$$
$$
b = \frac{\sum y - a(\sum x)}{n} = \frac{762 - (4.86)(55)}{10} \approx 49.47
$$
The equation is y’ = 4.86x + 49.47.
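The solution uses the totals 55, 762, 4592, and 385 without showing where they come from. As a sketch, the column sums can be rebuilt from the data table and fed into the coefficient formulas; the variable names below are illustrative only.

```python
# Data from the hours and exam score table.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [50, 62, 62, 74, 70, 86, 78, 90, 96, 94]

n = len(hours)
sum_x = sum(hours)                                     # sum of x
sum_y = sum(scores)                                    # sum of y
sum_xy = sum(x * y for x, y in zip(hours, scores))     # sum of x*y
sum_x2 = sum(x * x for x in hours)                     # sum of x squared

print(n, sum_x, sum_y, sum_xy, sum_x2)  # → 10 55 762 4592 385

# Plug the totals into the regression coefficient formulas.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(round(a, 2), round(b, 2))  # → 4.86 49.47
```

Here b is computed from the unrounded a; using the rounded value 4.86, as the slide does, gives the same result to two decimal places.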
Example: Predicting from a Least
Squares Line
Use the result from the previous example to
predict the exam score for a student who studied
6.5 hours.
Solution
Use the equation y’ = 4.86x + 49.47 and replace x
with 6.5.
y’ = 4.86(6.5) + 49.47 = 81.06
Based on the given data, the student would be expected to score
about 81%.
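The same substitution can be written as a small function, which makes it easy to try other values of x. This is only a sketch: `predict_score` is a hypothetical name, and the default coefficients are the rounded values from the previous example.

```python
def predict_score(hours, a=4.86, b=49.47):
    """Predicted exam score y' = a*x + b for a given number of study hours."""
    return a * hours + b


print(round(predict_score(6.5), 2))  # → 81.06
```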
Correlation
One common measure of the strength of the linear
relationship in the sample is called the sample
correlation coefficient, denoted r. It is calculated
from the sample data according to the formula on
the next slide.
Sample Correlation Coefficient Formula
In linear regression, the strength of the linear
relationship is measured by the correlation
coefficient
$$
r = \frac{n(\sum xy) - (\sum x)(\sum y)}
         {\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}.
$$
The value of r always lies between –1 and 1, inclusive.
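A minimal sketch of the correlation formula in code follows. The function name `sample_correlation` is an assumption for illustration; it expects that neither the x-values nor the y-values are all identical, since either case makes the denominator zero.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r, computed from the raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hours and exam score data used throughout this section.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [50, 62, 62, 74, 70, 86, 78, 90, 96, 94]
r = sample_correlation(hours, scores)
print(round(r, 3))  # → 0.956
```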
Correlation Coefficient
Values of exactly 1 or –1 indicate that the least squares
line goes exactly through all the data points. If r is
close to 1 or –1, but not exactly equal, then the line
comes “close,” and the linear correlation between x
and y is “strong.” If r is equal, or nearly equal, to 0,
there is no linear correlation or the correlation is weak.
If r is neither close to 0 nor close to 1 or –1, we might
describe the linear correlation as “moderate.”
Correlation Coefficient
A positive value of r indicates that the linear
relationship between x and y is direct; as x increases, y
also increases. A negative value of r indicates that there
is an inverse relationship between x and y; as x
increases, y decreases.
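The extreme cases can be checked on small made-up data sets: points lying exactly on a rising line and points lying exactly on a falling line. The data and the function name `sample_correlation` below are illustrative assumptions, not part of the original example.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r, computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The two square roots of the formula are combined under one root here.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [1, 2, 3, 4]
rising = [3, 5, 7, 9]    # points exactly on y = 2x + 1 (direct relationship)
falling = [9, 7, 5, 3]   # points exactly on y = -2x + 11 (inverse relationship)

r_rising = sample_correlation(xs, rising)
r_falling = sample_correlation(xs, falling)
print(round(r_rising, 6), round(r_falling, 6))
```

Up to floating-point rounding, the two values should come out as 1 and –1, matching the description of a perfect direct and a perfect inverse linear relationship.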
Example: Finding a Correlation
Coefficient
Find r for the data.
Hours   1   2   3   4   5   6   7   8   9  10
Score  50  62  62  74  70  86  78  90  96  94
Solution
$$
r = \frac{n(\sum xy) - (\sum x)(\sum y)}
         {\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}
$$
Example: Finding a Correlation
Coefficient
Solution (continued)
$$
r = \frac{10(4592) - (55)(762)}
         {\sqrt{10(385) - (55)^2}\,\sqrt{10(60196) - (762)^2}}
  \approx .956
$$
This value shows that hours studying and exam score
are highly correlated. As hours increase, so does the
exam score.
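For readers who want to check the value of r by hand, the intermediate arithmetic can be written out as follows. The sums are those of the example; only the intermediate steps are added here, with the denominator rounded to two decimal places.

$$
r = \frac{45920 - 41910}{\sqrt{3850 - 3025}\,\sqrt{601960 - 580644}}
  = \frac{4010}{\sqrt{825}\,\sqrt{21316}}
  \approx \frac{4010}{4193.53}
  \approx .956
$$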