Correlation - People Server at UNCW
Correlation
• We can often see the
strength of the
relationship between two
quantitative variables in a
scatterplot, but be careful.
The two figures here plot
the same data on different
scales. The second appears
to show a stronger
association, but only the
scale has changed.
• Here’s the formula (p.102) for Pearson’s correlation
coefficient:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
• This formula is not for computing r but for understanding
r. Notice that the first step in this formula standardizes
each x and y value (how many standard deviations it lies
above or below its mean) and then multiplies the two
standardized values together.
• When two variables x and y are positively associated
their standardized values tend to be both positive or both
negative (think of height and weight) so the product is
positive.
• When two variables are negatively associated, an x above
its mean tends to go with a y below its mean (and vice
versa), so the product is negative.
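The standardize-then-multiply reading of the formula can be sketched in a few lines of Python. This is only an illustration: the helper names `standardize` and `pearson_r` are hypothetical, and the height and weight values are made up for the example, not taken from the textbook.

```python
from statistics import mean, stdev

def standardize(values):
    """Express each value as a number of sample s.d.s above or below the mean."""
    m, s = mean(values), stdev(values)  # stdev uses the n - 1 divisor
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Sum the products of paired standardized values, then divide by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    products = [a * b for a, b in zip(zx, zy)]
    return sum(products) / (len(xs) - 1)

# Made-up illustrative data: heights (inches) and weights (pounds).
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 155, 160, 180]

r = pearson_r(heights, weights)
print(round(r, 3))
```

Because taller people in this made-up sample are also heavier, the paired standardized values share a sign, the products are positive, and r comes out close to +1.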
• The correlation coefficient, r, is a numerical
measure of the strength of the linear relationship
between two quantitative variables.
• It is always a number between -1 and +1.
– Positive r indicates a positive association
– Negative r indicates a negative association
• r=+1 implies a perfect positive relationship;
points falling exactly on a straight line with
positive slope
• r=-1 implies a perfect negative relationship;
points falling exactly on a straight line with
negative slope
• r ≈ 0 implies a very weak linear relationship
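As a quick check of the two extreme cases, the sketch below computes r for points that lie exactly on a line. The helper `pearson_r` is a hypothetical name and the lines y = 2x + 1 and y = 10 - 3x are arbitrary choices for illustration.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r: average product of standardized values (n - 1 divisor)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]

# Points exactly on a line with positive slope.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# Points exactly on a line with negative slope.
r_down = pearson_r(xs, [10 - 3 * x for x in xs])

print(r_up, r_down)
```

Up to floating-point rounding, the first value is +1 and the second is -1. The steepness of the line plays no role; only its direction and the fact that the points fall exactly on it.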
• Correlation makes no distinction between
explanatory & response variables; swapping
x and y leaves r unchanged.
• Both variables must be quantitative
• r uses standardized values of the observations,
so changing the units of measurement of either
variable, or both, doesn't affect the value of r.
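Both properties, no distinction between x and y and no dependence on units, can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` and made-up height and weight data, converted from inches and pounds to centimetres and kilograms.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r: average product of standardized values (n - 1 divisor)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up illustrative data.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# Same observations in different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_original = pearson_r(heights_in, weights_lb)
r_rescaled = pearson_r(heights_cm, weights_kg)  # units changed
r_swapped = pearson_r(weights_lb, heights_in)   # roles of x and y swapped

print(r_original, r_rescaled, r_swapped)
```

The three values agree up to floating-point rounding, because standardizing removes the units and the product of two standardized values does not depend on their order.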
• r measures the strength of the linear relationship
between the two variables. It does not measure
the strength of non-linear or curvilinear
relationships, no matter how strong the
relationship is…
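A small made-up example shows how r can miss a perfect but non-linear relationship. Here y is exactly x squared, with x values chosen symmetric about zero. The helper name `pearson_r` is hypothetical.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r: average product of standardized values (n - 1 divisor)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x, but not linearly

r_curve = pearson_r(xs, ys)
print(r_curve)
```

The result is 0 up to rounding: the positive products on the right half of the parabola cancel the negative products on the left half, even though y is perfectly predictable from x.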
• r is not resistant to outliers – be careful about
using r in the presence of outliers on either
variable
• To explore how extreme outlying observations
influence r, see the applet on Correlation and
Regression in the ebook...
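The effect of a single outlier can also be explored numerically. The sketch below is not the ebook applet; it uses made-up data and a hypothetical helper `pearson_r` to compare r with and without one extreme point.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r: average product of standardized values (n - 1 divisor)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Made-up data with essentially no linear pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 1.9, 2.2, 2.0, 1.8, 2.1]
r_without = pearson_r(xs, ys)

# Add one point that is extreme in both x and y.
r_with = pearson_r(xs + [20], ys + [10])

print(round(r_without, 2), round(r_with, 2))
```

Without the extra point, r is close to 0; with it, r jumps close to +1, because the outlier's large standardized values dominate the sum of products. An outlier placed elsewhere could just as easily weaken a strong correlation.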
• Homework:
– Read section 2.2
– Using technology to draw the scatterplots and do the
computations, work problems #2.29 - 2.32, 2.35, 2.39,
2.41, 2.43 & 2.46 (applet), 2.50, 2.51