Chapter 4 Describing the Relation Between Two Variables

4.1 Scatter Diagrams; Correlation
Bivariate data is data in which two variables
are measured on an individual.
The response variable is the variable
whose value can be explained or
determined based upon the value of the
predictor variable.
A lurking variable is one that is related to
the response and/or predictor variable, but is
excluded from the analysis.
A scatter diagram shows the relationship
between two quantitative variables measured
on the same individual. Each individual in
the data set is represented by a point in the
scatter diagram. The predictor variable is
plotted on the horizontal axis and the
response variable is plotted on the vertical
axis. Do not connect the points when
drawing a scatter diagram.
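The plotting rules above can be sketched in code. This is a minimal character-grid illustration, not a substitute for graphing software. The function name text_scatter and the data values are hypothetical, and the sketch assumes the x values (and the y values) are not all identical.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return a character-grid scatter diagram.

    The predictor (x) runs left to right and the response (y) runs
    bottom to top. Each individual becomes one '*'; points are never
    connected.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column/row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Flip the row so that larger y values appear higher up.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Made-up illustrative values, not data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
plot = text_scatter(xs, ys)
print(plot)
```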
EXAMPLE
Drawing a Scatter Diagram
The following data are based on a study for
drilling rock. The researchers wanted to
determine whether the time it takes to dry drill
a distance of 5 feet in rock increases with the
depth at which the drilling begins. So, depth
at which drilling begins is the predictor
variable, x, and time (in minutes) to drill five
feet is the response variable, y. Draw a
scatter diagram of the data.
Source: Penner, R., and Watts, D.G. “Mining Information.” The American Statistician, Vol.
45, No. 1, Feb. 1991, p. 6.
Two variables that are linearly related are said to
be positively associated when above-average
values of one variable are associated with above-average
values of the other variable.
That is, two variables are positively associated
if, as the values of the predictor variable increase,
the values of the response variable also increase.
Two variables that are linearly related are said to
be negatively associated when above-average
values of one variable are associated with below-average
values of the other variable.
That is, two variables are negatively associated
if, as the values of the predictor variable increase,
the values of the response variable decrease.
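One way to make the "above average with above average" idea concrete is to add up the products of each individual's deviations from the two means. A positive total suggests positive association and a negative total suggests negative association. The sketch below assumes this summary is acceptable as a rough check; the function name association_direction and the data are hypothetical.

```python
from statistics import mean


def association_direction(xs, ys):
    """Classify association by the sign of the summed deviation products."""
    x_bar, y_bar = mean(xs), mean(ys)
    # A product is positive when both values sit on the same side of
    # their means, and negative when they sit on opposite sides.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Made-up illustrative values, not data from the text.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 6]
ys_down = [9, 7, 6, 4, 1]
print(association_direction(xs, ys_up))
print(association_direction(xs, ys_down))
```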
The linear correlation coefficient or Pearson
product moment correlation coefficient is a
measure of the strength of linear relation between
two quantitative variables. We use the Greek letter
ρ (rho) to represent the population correlation
coefficient and r to represent the sample correlation
coefficient. We shall only present the formula for
the sample correlation coefficient.
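The formula itself did not survive in this transcript. The standard textbook form of the sample correlation coefficient is given here as a reference, where \bar{x} and \bar{y} are the sample means, s_x and s_y are the sample standard deviations, and n is the number of individuals:

```latex
r = \frac{\displaystyle\sum_{i=1}^{n}
      \left(\frac{x_i - \bar{x}}{s_x}\right)
      \left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}
```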
Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always
between -1 and 1, inclusive. That is, -1 ≤ r ≤ 1.
2. If r = +1, there is a perfect positive linear relation
between the two variables.
3. If r = -1, there is a perfect negative linear relation
between the two variables.
4. The closer r is to +1, the stronger the evidence of
positive association between the two variables.
5. The closer r is to -1, the stronger the evidence of
negative association between the two variables.
6. If r is close to 0, there is evidence of no linear
relation between the two variables. Because the
linear correlation coefficient is a measure of
strength of linear relation, r close to 0 does not
imply no relation, just no linear relation.
7. It is a unitless measure of association. So, the
unit of measure for x and y plays no role in the
interpretation of r.
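Some of these properties can be checked numerically. The sketch below computes r from standardized values (each deviation divided by the sample standard deviation, summed, then divided by n - 1). It then checks that a perfect line gives r = 1 or r = -1 and that changing the units of x leaves r unchanged. The function name sample_correlation and all data values are hypothetical.

```python
from statistics import mean, stdev


def sample_correlation(xs, ys):
    """Sample linear correlation coefficient r from standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Made-up illustrative values, not data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

r = sample_correlation(xs, ys)
# Property 7: rescaling x (for example, feet to inches) does not change r.
r_rescaled = sample_correlation([12 * x for x in xs], ys)
# Properties 2 and 3: points exactly on a line give r = +1 or r = -1.
r_perfect_pos = sample_correlation(xs, [3 * x + 1 for x in xs])
r_perfect_neg = sample_correlation(xs, [10 - 2 * x for x in xs])

print(round(r, 3), round(r_rescaled, 3))
print(round(r_perfect_pos, 3), round(r_perfect_neg, 3))
```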
EXAMPLE Drawing a Scatter Diagram and
Computing the Correlation Coefficient
For the following data:
(a) Draw a scatter diagram and comment on the
type of relation that appears to exist between x
and y.
(b) By hand, compute the linear correlation
coefficient.
EXAMPLE
Determining the Linear
Correlation Coefficient
Determine the linear correlation coefficient
of the drilling data.
xi  x
sx
x
y
yi  y
sy
 xi  x   yi  y 


 
s
s
 x  y 
A linear correlation coefficient computed from
observational data that indicates a strong
positive or negative association does not
imply causation between the variables.
4.2 Least-squares Regression
EXAMPLE Finding an Equation that Describes
a Linear Relation
Using the following sample data:
(a) Find a linear equation that relates x (the
predictor variable) and y (the response variable)
by selecting two points and finding the equation
of the line containing the points.
(b) Graph the equation on the scatter diagram.
(c) Use the equation to predict y if x = 5.
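The two-point approach in parts (a) and (c) can be sketched as follows. The sample data for this example is not reproduced in the transcript, so the two points used here are hypothetical, as is the function name line_through.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line containing two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # rise over run
    intercept = y1 - slope * x1    # solve y1 = slope * x1 + intercept
    return slope, intercept


# Hypothetical points chosen for illustration only.
slope, intercept = line_through((2, 5), (6, 13))
predicted = slope * 5 + intercept  # prediction at x = 5
print(slope, intercept, predicted)
```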
The difference between the observed value
of y and the predicted value of y is the error
or residual. That is,
residual = observed - predicted
Compute the residual for the prediction
corresponding to x = 5.
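As a small worked illustration of the residual definition: the line y = 2x + 1 and the observed value below are hypothetical, since the example's data is not reproduced here.

```python
def residual(observed, predicted):
    """Residual = observed value of y minus predicted value of y."""
    return observed - predicted


# Hypothetical line y = 2x + 1 and a hypothetical observation at x = 5.
predicted = 2 * 5 + 1   # predicted y at x = 5
observed = 10           # made-up observed y at x = 5
res = residual(observed, predicted)
print(res)  # negative: the line overestimates this observation
```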
EXAMPLE Finding the Least-squares
Regression Line
Using the sample data:
(a) Find the least-squares regression line.
(b) Interpret the slope and intercept.
(c) Predict y if x = 5.
(d) Compute the residual for x = 5.
(e) Draw the least-squares regression line on the
scatter diagram of the data.
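Parts (a), (c), and (d) can be sketched in code. The sketch uses the standard least-squares formulas: the slope is the sum of deviation products divided by the sum of squared x deviations (algebraically the same as r times s_y / s_x), and the intercept is \bar{y} minus the slope times \bar{x}. The function name least_squares_line and the data are hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


# Made-up illustrative values, not data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

b1, b0 = least_squares_line(xs, ys)
y_hat = b1 * 5 + b0   # predicted y at x = 5
resid = 6 - y_hat     # observed y at x = 5 is 6 in this made-up data
print(round(b1, 3), round(b0, 3), round(y_hat, 3), round(resid, 3))
```

For the slope interpretation in part (b): with these made-up values, each one-unit increase in x is associated with an average change of b1 units in y.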
EXAMPLE Computing the Sum of Squared
Residuals
Compute the sum of squared residuals for
the line describing the relation between x
and y that was obtained using two points.
Compute the sum of squared residuals for
the least-squares regression line. Which is
smaller?
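The comparison can be sketched numerically. Here a line through two of the data points is compared with the least-squares line for the same made-up data. The function names and data values are hypothetical, not taken from the example.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - predicted)^2 over all individuals."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative values, not data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

# Line through the points (1, 2) and (5, 6): slope 1, intercept 1.
ssr_two_point = sum_squared_residuals(xs, ys, 1.0, 1.0)

b1, b0 = least_squares_line(xs, ys)
ssr_least_squares = sum_squared_residuals(xs, ys, b1, b0)

print(round(ssr_two_point, 3), round(ssr_least_squares, 3))
```

The least-squares line minimizes the sum of squared residuals, so its total can never exceed that of any other line drawn through the same data.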
EXAMPLE Finding the Least-squares
Regression Line
(a) Find the least-squares regression line
for the drilling data.
(b) Use the line to predict the drilling time
at x = 130 feet.
(c) Should the line be used to predict the
drilling time at x = 400 feet? Why?
(d) Interpret the slope and y-intercept.