Transcript Slide 1

Math 227 Elementary Statistics
Sullivan, 4th ed.
Copyright of the definitions and examples is
reserved to Pearson Education, Inc. In order to
use this PowerPoint presentation, the required
textbook for the class is Fundamentals of
Statistics: Informed Decisions Using Data,
Michael Sullivan, III, fourth edition.
CHAPTER 4
Describing the Relation between
Two Variables
Ch 4.1 & 4.2 Two dimensions concept
I. Scatter plot
x (# hours of sleep)    y (performance)
6                       3
8                       5
10                      4
2                       1
[Scatter plot of the data: C1 (x, hours of sleep) on the horizontal axis, C2 (y, performance) on the vertical axis.]
[Figure: scatter plot shapes suggesting quadratic regression, linear regression, power regression, and exponential regression.]
Linear Regression
[Figure: scatter plots showing perfect correlation; positive correlation (as x increases, y increases); negative correlation (as x increases, y decreases); and no correlation.]
Linear correlation coefficient
• The linear correlation coefficient or
Pearson product moment correlation
coefficient is a measure of the strength and
direction of the linear relation between two
quantitative variables. The Greek letter ρ (rho)
represents the population correlation
coefficient, and r represents the sample
correlation coefficient. We present only the
formula for the sample correlation coefficient.
Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always
between –1 and 1, inclusive. That is, –1 ≤ r ≤ 1.
2. If r = +1, then a perfect positive linear relation
exists between the two variables.
3. If r = –1, then a perfect negative linear relation
exists between the two variables.
4. The closer r is to +1, the stronger is the evidence of
positive association between the two variables.
5. The closer r is to –1, the stronger is the evidence of
negative association between the two variables.
6. If r is close to 0, then little or no evidence exists of
a linear relation between the two variables. So r
close to 0 does not imply no relation, just no
linear relation.
7. The linear correlation coefficient is a unitless
measure of association. So the unit of measure for
x and y plays no role in the interpretation of r.
8. The correlation coefficient is not resistant.
Therefore, an observation that does not follow
the overall pattern of the data could affect the
value of the linear correlation coefficient.
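Several of these properties can be checked numerically. The following is a minimal sketch, assuming the z-score form of the sample formula (sum of the products of the x and y z-scores, divided by n - 1). It uses the four sleep/performance pairs from the scatter plot section. The function name sample_correlation and the extra point (12, 1) are illustrative assumptions, not part of the slides.

```python
from statistics import mean, stdev

def sample_correlation(xs, ys):
    """Sample linear correlation coefficient r, computed from z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Sleep/performance pairs from the scatter plot section.
hours = [6, 8, 10, 2]
performance = [3, 5, 4, 1]

r_hours = sample_correlation(hours, performance)

# Property 7 (unitless): expressing sleep in minutes should leave r unchanged.
minutes = [60 * h for h in hours]
r_minutes = sample_correlation(minutes, performance)

# Property 8 (not resistant): one hypothetical point off the pattern shifts r.
r_with_outlier = sample_correlation(hours + [12], performance + [1])

print(round(r_hours, 3), round(r_minutes, 3), round(r_with_outlier, 3))
```

The first two printed values should agree, and the third should be noticeably smaller, which is what properties 7 and 8 describe.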
EXAMPLE
Determining the Linear Correlation Coefficient
Determine the linear
correlation coefficient
of the drilling data.
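[Table: for each observation, columns x, y, (x_i - \bar{x})/s_x, (y_i - \bar{y})/s_y, and the product of the two z-scores; the products sum to 8.501037.]
r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1} = \frac{8.501037}{12 - 1} \approx 0.773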
Testing for a Linear Relation
• Step 1 Determine the absolute value of the
correlation coefficient
• Step 2 Find the critical value in Table II from
Appendix A for the given sample size
• Step 3 If the absolute value of the correlation
coefficient is greater than the critical value,
we say a linear relation exists between the
two variables. Otherwise, no linear relation
exists.
EXAMPLE
Does a Linear Relation Exist?
Step 1 The linear correlation coefficient
between depth at which drilling begins and the
time to drill 5 feet is 0.773.
Step 2 Table II shows the critical value with n =
12 is 0.576.
Step 3 Since |0.773| > 0.576, we conclude a
positive association exists between depth at
which drilling begins and time to drill 5 feet.
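The three-step test reduces to one comparison, sketched below in Python. The function name linear_relation_exists is an illustrative assumption. The critical value still has to be looked up in Table II for the given sample size; the table is not reproduced here.

```python
def linear_relation_exists(r, critical_value):
    """Steps 1 and 3: compare |r| with the critical value from Table II."""
    return abs(r) > critical_value

# Drilling example from the slides: r = 0.773, critical value 0.576 for n = 12.
drilling_has_relation = linear_relation_exists(0.773, 0.576)
print(drilling_has_relation)
```

Because the test uses the absolute value, the sign of r is read separately to decide whether the association is positive or negative.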
4.2 Least-Squares Regression
EXAMPLE Finding an Equation that Describes Linearly
Related Data
Using the following sample data:
(a) Find a linear equation that relates x (the explanatory variable) and y (the
response variable) by selecting two points and finding the equation of the line
containing the points.
Using (2, 5.7) and (6, 1.9):
m = \frac{5.7 - 1.9}{2 - 6} = -0.95
y - y_1 = m(x - x_1)
y - 5.7 = -0.95(x - 2)
y - 5.7 = -0.95x + 1.9
y = -0.95x + 7.6
(b) Graph the equation on the scatter diagram.
[Figure: scatter diagram of the sample data with the line y = -0.95x + 7.6; both axes run from 0 to 7.]
(c) Use the equation to predict y if x = 3.
y = -0.95x + 7.6
= -0.95(3) + 7.6
= 4.75
The difference between the observed value of y and the
predicted value of y is the error, or residual.
Using the line from the last example, and the
predicted value at x = 3:
residual = observed y – predicted y
= 5.2 – 4.75 = 0.45
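The two-point line, the prediction at x = 3, and the residual can be reproduced with a short Python sketch. The helper name line_through is an illustrative assumption. The points (2, 5.7) and (6, 1.9) and the observed value 5.2 come from the example.

```python
def line_through(p1, p2):
    """Slope and y-intercept of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y1 - y2) / (x1 - x2)
    b = y1 - m * x1  # from y - y1 = m(x - x1)
    return m, b

slope, intercept = line_through((2, 5.7), (6, 1.9))

predicted = slope * 3 + intercept  # predicted y at x = 3
residual = 5.2 - predicted         # observed y minus predicted y

print(round(slope, 2), round(intercept, 2), round(predicted, 2), round(residual, 2))
```

The slope comes out negative (about -0.95) because y falls from 5.7 to 1.9 as x rises from 2 to 6.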
[Figure: the same scatter diagram, with the residual at the observed point (3, 5.2) marked as the vertical distance between the point and the line.]
Least-Squares Regression Criterion
The least-squares regression line is the line that
minimizes the sum of the squared errors (or
residuals). This line minimizes the sum of the
squared vertical distances between the observed
values of y and those predicted by the line, \hat{y}
(“y-hat”). We represent this as
“minimize Σ residuals²”.
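To see the criterion at work, the sketch below fits a line to the four sleep/performance pairs from the scatter plot section and checks that nudging the slope or intercept only increases the sum of squared residuals. It assumes the standard closed-form solution (slope = S_xy / S_xx), which is algebraically the same line as the r-based formula; the function names and nudge sizes are illustrative.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

xs = [6, 8, 10, 2]   # hours of sleep
ys = [3, 5, 4, 1]    # performance

b1, b0 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b1, b0)

# Every nearby line (slope and/or intercept nudged) should do worse.
nudged = [
    sum_squared_residuals(xs, ys, b1 + d_slope, b0 + d_int)
    for d_slope in (-0.1, 0.0, 0.1)
    for d_int in (-0.5, 0.0, 0.5)
    if (d_slope, d_int) != (0.0, 0.0)
]
print(best < min(nudged))
```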
The Least-Squares Regression Line
The equation of the least-squares regression line is
given by
\hat{y} = b_1 x + b_0
where
b_1 = r \cdot \frac{s_y}{s_x} is the slope of the least-squares regression line
and
b_0 = \bar{y} - b_1 \bar{x} is the y-intercept of the least-squares regression line.
Note: \bar{x} is the sample mean and s_x is the sample
standard deviation of the explanatory variable x;
\bar{y} is the sample mean and s_y is the sample
standard deviation of the response variable y.
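The slope and intercept formulas translate directly into code. A minimal Python sketch follows, applied to the four sleep/performance pairs from the scatter plot section because the drilling data table is not reproduced in this transcript. The function name least_squares_from_r is an illustrative assumption.

```python
from statistics import mean, stdev

def least_squares_from_r(xs, ys):
    """Slope b1 = r * s_y / s_x and intercept b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b1, b0

b1, b0 = least_squares_from_r([6, 8, 10, 2], [3, 5, 4, 1])
print(round(b1, 4), round(b0, 4))
```

For these four points the line works out to roughly \hat{y} = 0.4429x + 0.3714.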
EXAMPLE Finding the Least-squares Regression Line
Using the drilling data
(a) Find the least-squares
regression line.
(b) Predict the drilling time if
drilling starts at 130 feet.
(c) Is the observed drilling time at
130 feet above or below average?
(d) Draw the least-squares
regression line on the scatter
diagram of the data.
(a) We agree to round the estimates of
the slope and intercept to four
decimal places.
\hat{y} = 0.0116x + 5.5273
(b)
\hat{y} = 0.0116x + 5.5273
= 0.0116(130) + 5.5273
= 7.035
(c) The observed drilling time is 6.93 minutes.
The predicted drilling time is 7.035 minutes. The
drilling time of 6.93 minutes is below average.
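Parts (b) and (c) amount to evaluating the line and checking the sign of the residual, as in this Python sketch. The slope 0.0116, intercept 5.5273, and observed time 6.93 come from the example; the function name predict_drilling_time is an illustrative assumption.

```python
def predict_drilling_time(depth_ft):
    """Predicted time to drill 5 feet from the fitted drilling line."""
    return 0.0116 * depth_ft + 5.5273

predicted = predict_drilling_time(130)
observed = 6.93

# A negative residual means the observation falls below the line,
# i.e. below the average time predicted for that depth.
residual = observed - predicted
verdict = "above average" if residual > 0 else "below average"

print(round(predicted, 3), verdict)
```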
(d) [Figure: scatter diagram of the drilling data with the least-squares regression line; horizontal axis: Depth Drilling Begins (0 to 200); vertical axis: Time to Drill 5 Feet (5.5 to 8.5).]
Interpretation of Slope:
The slope of the regression line is 0.0116. For
each additional foot of depth we start drilling, the
time to drill five feet increases by 0.0116 minutes,
on average.
Interpretation of the y-Intercept: The y-intercept of the
regression line is 5.5273. To interpret the y-intercept, we
must first ask two questions:
1. Is 0 a reasonable value for the explanatory variable?
2. Do any observations near x = 0 exist in the data set?
A value of 0 is reasonable for the drilling data (this
indicates that drilling begins at the surface of Earth). The
smallest observation in the data set is x = 35 feet, which is
reasonably close to 0. So, interpretation of the y-intercept
is reasonable.
The time to drill five feet when we begin drilling at the
surface of Earth is 5.5273 minutes.
If the least-squares regression line is used to make
predictions based on values of the explanatory variable that
are much larger or much smaller than the observed values,
we say the researcher is working outside the scope of the
model. Never use a least-squares regression line to make
predictions outside the scope of the model because we can’t
be sure the linear relation continues to exist.
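One way to enforce this rule in code is to refuse predictions outside the observed range of x. The sketch below is a simplification: the slides say "much larger or much smaller", while this version rejects anything beyond the observed minimum and maximum. The function name and the example line (the approximate least-squares line for the four sleep/performance pairs from the scatter plot section, with x from 2 to 10) are illustrative assumptions.

```python
def predict_within_scope(x, slope, intercept, x_min, x_max):
    """Predict y only when x lies inside the observed range of x values."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the scope of the model [{x_min}, {x_max}]"
        )
    return slope * x + intercept

# Approximate least-squares line for the sleep data; observed x runs 2 to 10.
in_scope = predict_within_scope(7, 0.4429, 0.3714, x_min=2, x_max=10)

try:
    predict_within_scope(20, 0.4429, 0.3714, x_min=2, x_max=10)
    out_of_scope_rejected = False
except ValueError:
    out_of_scope_rejected = True

print(round(in_scope, 4), out_of_scope_rejected)
```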
Predictions When There is No Linear Relation:
When the correlation coefficient indicates no linear
relation between the explanatory and response
variables, and the scatter diagram indicates no relation
at all between the variables, then we use the mean
value of the response variable as the predicted value
so that \hat{y} = \bar{y}.
Summary:
1. Use StatCrunch to plot a scatter plot
2. Use StatCrunch to calculate r
3. Determine whether there is a positive/negative linear
correlation between X and Y.
4. If there is a linear correlation between X and Y, use
StatCrunch to find the least squares regression line.
Otherwise, do not find the least squares regression line.
5. When a value is assigned to X: if there is a correlation
between X and Y, use the least squares regression line to
find the best predicted Y.
When a value is assigned to X: if there is no correlation
between X and Y, use StatCrunch to find \bar{y};
the best predicted Y is \bar{y} for any X.
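The decision in step 5 can be sketched in Python as below. This is not StatCrunch output, only the logic of the summary. The function name best_predicted_y is an illustrative assumption, and the second call uses hypothetical values (r = 0.2 and a mean of 6.9) purely to exercise the no-correlation branch.

```python
def best_predicted_y(x, r, critical_value, slope, intercept, y_bar):
    """Use the regression line if a linear relation exists, else the mean of y."""
    if abs(r) > critical_value:
        return slope * x + intercept
    return y_bar

# Drilling example: |0.773| > 0.576, so the regression line is used.
# The sample mean of y is not given in this transcript and is not needed
# on this branch, so None is passed as a placeholder.
with_relation = best_predicted_y(130, 0.773, 0.576, 0.0116, 5.5273, y_bar=None)

# Hypothetical case with no linear relation: the prediction is y_bar for any x.
without_relation = best_predicted_y(130, 0.2, 0.576, 0.0116, 5.5273, y_bar=6.9)

print(round(with_relation, 3), without_relation)
```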