Transcript Chapter 13

Chapter 14
Describing Relationships:
Scatterplots and Correlation
Correlation
Objective: Analyze a collection of paired data
(sometimes called bivariate data).
A correlation exists between two variables when
there is a relationship (or an association)
between them.
We will consider only linear relationships.
- when graphed, the points approximate a
straight-line pattern.
Scatterplot
A scatterplot is a graph in which paired (x, y) data (usually
collected on the same individuals) are plotted with one variable
represented on a horizontal (x-) axis and the other variable
represented on a vertical (y-) axis. Each individual pair (x, y) is
plotted as a single point.
Example:
Examining a Scatterplot
You can describe the overall pattern of a scatterplot by its:

Form – linear or non-linear (quadratic, exponential, no
correlation, etc.)

Direction – negative or positive.

Strength – very strong, strong, moderately strong,
weak, etc.

Look for outliers and how they affect the correlation.
Scatterplot
Example: Draw a scatterplot for the data below.
What is the nature of the relationship between
x and y?

x    1    2    3    4    5
y   -4   -2    1    0    2

[Figure: scatterplot of the five (x, y) points]
Answer: Strong, positive and linear.
Examining a Scatterplot
Two variables are positively correlated when high
values of the variables tend to occur together and
low values of the variables tend to occur together.
The scatterplot slopes upwards from left to right.
Two variables are negatively correlated when high
values of one of the variables tend to occur with low
values of the other and vice versa.
The scatterplot slopes downwards from left to right.

Types of Correlation
[Figure: four scatterplot panels, y versus x]
Negative Linear Correlation: as x increases, y tends to decrease.
Positive Linear Correlation: as x increases, y tends to increase.
No Correlation
Non-linear Correlation
Examples of Relationships
[Figure: four scatterplots: Health Status Measure versus Income;
Health Status Measure versus Age; Education Level versus Age;
Mental Health Score versus Physical Health Score]
Thought Question 1
What type of association would the following
pairs of variables have – positive, negative, or
none?
1. Temperature during the summer and electricity bills
2. Temperature during the winter and heating costs
3. Number of years of education and height
4. Frequency of brushing and number of cavities
5. Number of churches and number of bars in cities
6. Height of husband and height of wife
Thought Question 2
Consider the two scatterplots below. How does the
outlier impact the correlation for each plot?
– does the outlier increase the correlation, decrease
the correlation, or have no impact?
Measuring Strength & Direction
of a Linear Relationship
• How closely does a non-horizontal straight
line fit the points of a scatterplot?
• The correlation coefficient (often referred to
as just correlation): r
– measure of the strength of the relationship:
the stronger the relationship, the larger the
magnitude of r.
– measure of the direction of the relationship:
positive r indicates a positive relationship,
negative r indicates a negative relationship.
Correlation Coefficient

\[ r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum z_x z_y \]

\sum (Greek capital letter sigma, Σ) denotes
summation or addition.
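As a minimal sketch of the formula above, the following Python computes r as the sum of z-score products divided by n - 1. The function name `correlation` is an assumption made for this illustration, and the data are the five (x, y) pairs from the earlier scatterplot example.

```python
import math

def correlation(xs, ys):
    """Sample correlation: r = (1 / (n - 1)) * sum(z_x * z_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    # Sum the products of the z-scores, then divide by n - 1.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Data from the earlier scatterplot example.
x = [1, 2, 3, 4, 5]
y = [-4, -2, 1, 0, 2]
r = correlation(x, y)
print(round(r, 2))  # 0.92
```

A value of about 0.92 agrees with the description of that scatterplot as strong, positive and linear.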
Correlation Coefficient
• The range of the correlation coefficient is -1 to 1.
– If r = -1, there is a perfect negative correlation.
– If r is close to 0, there is no linear correlation.
– If r = 1, there is a perfect positive correlation.
Linear Correlation
[Figure: four scatterplot panels, y versus x]
Strong negative correlation: r = -0.91
Strong positive correlation: r = 0.88
Weak positive correlation: r = 0.42
Non-linear correlation: r = 0.07
Correlation Coefficient
• Special values for r:
– a perfect positive linear relationship would have r = +1
– a perfect negative linear relationship would have r = -1
– if there is no linear relationship, or if the scatterplot
points are best fit by a horizontal line, then r = 0
• Note: r must be between -1 and +1, inclusive
• r > 0: as one variable changes, the other variable
tends to change in the same direction
• r < 0: as one variable changes, the other variable
tends to change in the opposite direction
Examples of Correlations
• Husband's versus wife's ages: r = .94
• Husband's versus wife's heights: r = .36
• Professional golfer's putting success
(distance of putt in feet versus percent success): r = -.94
Correlation Coefficient

Because r uses the z-scores for the observations, it does not change
when we change the units of measurements of x , y or both.

Correlation ignores the distinction between explanatory and response
variables.

r measures the strength of only linear association between variables.

A large value of r does not necessarily mean that there is a strong
linear relationship between the variables – the relationship might not
be linear; always look at the scatterplot.

When r is close to 0, it does not mean that there is no relationship
between the variables; it means only that there is no linear relationship.

Outliers can inflate or deflate correlations.
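Several of the properties listed above can be checked numerically. The sketch below is illustrative only: it reuses the five-point example data, rescales x by 12 (as if converting years to months), swaps the roles of x and y, and then appends one invented outlier at (10, -10). The helper name `correlation` and the outlier value are assumptions made for this example.

```python
import math

def correlation(xs, ys):
    """Sample correlation r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [-4, -2, 1, 0, 2]

r_base = correlation(x, y)
# Changing the units of x (multiplying by 12) leaves r unchanged.
r_rescaled = correlation([12 * v for v in x], y)
# Swapping explanatory and response variables leaves r unchanged.
r_swapped = correlation(y, x)
# One invented outlier is enough to reverse the sign of r here.
r_outlier = correlation(x + [10], y + [-10])

print(round(r_base, 2), round(r_rescaled, 2),
      round(r_swapped, 2), round(r_outlier, 2))
# 0.92 0.92 0.92 -0.58
```

In this case the outlier deflates the correlation. A far-away point that happens to fall along the trend would instead pull r toward +1 or -1.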
Not all Relationships are Linear
Miles per Gallon versus Speed
• Curved relationship (r is misleading)
• Speed chosen for each subject varies from 20 mph to 60 mph
• MPG varies from trial to trial, even at the same speed
• Statistical relationship
[Figure: scatterplot of miles per gallon versus speed, r = -0.06]
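To see how a curved relationship can give r near 0, here is a small sketch with hypothetical numbers (not the data behind the slide's plot). The mpg values are generated from an invented inverted-U rule that peaks at 40 mph, so y is completely determined by x and yet the linear correlation vanishes.

```python
import math

def correlation(xs, ys):
    """Sample correlation r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical speeds (mph) and an invented inverted-U rule for mpg.
speed = [20, 30, 40, 50, 60]
mpg = [30 - (s - 40) ** 2 / 40 for s in speed]  # 20.0, 27.5, 30.0, 27.5, 20.0

r = correlation(speed, mpg)
print(abs(r) < 1e-12)  # True: no linear correlation despite an exact relationship
```

The rising half of the curve and the falling half cancel each other in the sum, which is why the scatterplot should always be inspected alongside r.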
Common Errors
Involving Correlation
1. Causation: It is wrong to conclude that
correlation implies causality.
2. Averages: Averages suppress individual
variation and may inflate the correlation
coefficient.
3. Linearity: There may be some relationship
between x and y even when there is no
linear correlation.
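The point about averages can be illustrated with a small invented data set. In this sketch, two individuals are measured at each x value, one well above and one well below the trend. Correlating the individual values gives a modest r, while correlating the group averages gives a perfect r. All numbers and names are assumptions made for the example.

```python
import math
from collections import defaultdict

def correlation(xs, ys):
    """Sample correlation r; the (n - 1) factors cancel in this form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented individual-level data: two y values for each x (x + 3 and x - 3).
x_individual = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_individual = [4, -2, 5, -1, 6, 0, 7, 1, 8, 2]

# Average y within each x group, which hides the individual variation.
groups = defaultdict(list)
for xv, yv in zip(x_individual, y_individual):
    groups[xv].append(yv)
x_avg = sorted(groups)
y_avg = [sum(groups[xv]) / len(groups[xv]) for xv in x_avg]

r_individual = correlation(x_individual, y_individual)
r_average = correlation(x_avg, y_avg)
print(round(r_individual, 2), round(r_average, 2))  # 0.43 1.0
```

The averaged data look perfectly linear only because the scatter of individuals around each group mean has been discarded.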
Correlation and Causation
• The fact that two variables are strongly correlated
does not in itself imply a cause-and-effect
relationship between the variables.
• If there is a significant correlation between two
variables, you should consider the following
possibilities.
1. Is there a direct cause-and-effect relationship
between the variables?
• Does x cause y?
Correlation and Causation
2. Is there a reverse cause-and-effect
relationship between the variables?
• Does y cause x?
3. Is it possible that the relationship between
the variables is caused by a third
variable or by a combination of several
other variables?
4. Is it possible that the relationship between
the two variables is a coincidence?
Example
A survey of the world's nations in 2004 shows a strong
positive correlation between the percentage of each country's
population using cell phones and life expectancy in years at birth.
a) Does this mean that cell phones are good for your health?
No. It simply means that in countries where cell phone use is
high, life expectancy tends to be high as well.
b) What might explain the strong correlation?
The economy could be a lurking variable. Richer countries
generally have more cell phone use and better health care.
Example
The correlation between Age and Income as measured on 100
people is r = 0.75. Explain whether or not each of these
conclusions is justified.
a) When Age increases, Income increases as well.
b) The form of the relationship between Age and Income is
linear.
c) There are no outliers in the scatterplot of Income vs. Age.
d) Whether we measure Age in years or months, the
correlation will still be 0.75.
Example
Explain the mistakes in the statements below:
a) "My correlation of -0.772 between GDP and Infant Mortality
Rate shows that there is almost no association between
GDP and Infant Mortality Rate."
b) "There was a correlation of 0.44 between GDP and
Continent."
c) "There was a very strong correlation of 1.22 between Life
Expectancy and GDP."
Warnings about
Statistical Significance
• "Statistical significance" does not imply the
relationship is strong enough to be considered
"practically important."
• Even weak relationships may be labeled
statistically significant if the sample size is very
large.
• Even very strong relationships may not be labeled
statistically significant if the sample size is very
small.
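The slides do not say which significance test is meant. As one possible illustration, the sketch below assumes the usual t statistic for testing whether a correlation differs from zero, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, and compares it with standard two-sided 5% critical values (about 1.96 for very large samples, 4.303 for 2 degrees of freedom). The r and n values are invented.

```python
import math

def t_statistic(r, n):
    """t statistic for testing zero correlation (assumed test, df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Weak relationship, very large sample (invented values).
t_large = t_statistic(0.05, 10_000)
# Very strong relationship, very small sample (invented values).
t_small = t_statistic(0.9, 4)

print(round(t_large, 2), t_large > 1.96)   # 5.01 True  -> labeled significant
print(round(t_small, 2), t_small > 4.303)  # 2.92 False -> not labeled significant
```

Under these assumptions, r = 0.05 counts as statistically significant with 10,000 observations, while r = 0.9 does not with only 4.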
Key Concepts
• Strength of Linear Relationship
• Direction of Linear Relationship
• Correlation Coefficient
• Problems with Correlations
• r can only be calculated for quantitative data.