Transcript correlation
Warm-up
Accidents
In 2001, Progressive Insurance
asked customers who had been
involved in auto accidents how
far they were from home when
the accident happened. The
data are summarized in the
table.
a) Create an appropriate graph of these data.
b) Do these data indicate that driving near home is
particularly dangerous? Explain.
Miles from home    % of Accidents
Less than 1        23
1 to 5             29
6 to 10            17
11 to 15           8
16 to 20           6
Over 20            17
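For part (a), one reasonable choice is a bar chart of the percentages. The sketch below prints a rough text version using only the values in the table; the names `bins` and `text_bar_chart` are made up for this illustration. Keep in mind that the classes are not equally wide (the first covers less than one mile, the last is open-ended), which matters when comparing bar heights.

```python
# Percent of accidents by distance from home, copied from the table.
bins = [
    ("Less than 1", 23),
    ("1 to 5", 29),
    ("6 to 10", 17),
    ("11 to 15", 8),
    ("16 to 20", 6),
    ("Over 20", 17),
]


def text_bar_chart(rows):
    """Return one line per class: label, a bar with one '#' per percent, value."""
    width = max(len(label) for label, _ in rows)
    return [f"{label:<{width}} | {'#' * pct} {pct}%" for label, pct in rows]


chart = text_bar_chart(bins)
total = sum(pct for _, pct in bins)  # sanity check: percentages should cover all accidents

for line in chart:
    print(line)
print("Total:", total, "%")
```

The chart describes where accidents happened, not how much driving was done at each distance, so it is only a starting point for part (b).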
Review questions
When describing a scatterplot, what four things should
you always mention?
Direction, form, strength, unusual features (such as outliers,
clusters)
What does correlation measure?
Measures the strength of the linear association between two
quantitative variables
Explain the difference between association and
correlation.
Association is a vague term describing the relationship between two
variables. Correlation is a precise term describing the strength and
direction of the linear relationship between quantitative variables
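As a minimal sketch of what "precise" means here, correlation can be computed as the sum of the products of z-scores divided by n - 1. The function name `pearson_r` and the hours/GPA numbers below are invented for illustration and do not come from any study.

```python
import math


def pearson_r(xs, ys):
    """Correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    z_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Hypothetical data, made up for this sketch: weekly hours worked and GPA.
hours = [0, 5, 10, 15, 20, 25]
gpa = [3.6, 3.4, 3.5, 3.0, 2.9, 2.6]

r = pearson_r(hours, gpa)
# Changing units (hours -> minutes) rescales x but leaves r unchanged,
# because z-scores have no units.
r_minutes = pearson_r([h * 60 for h in hours], gpa)
print(round(r, 3), round(r_minutes, 3))
```

The sign of r gives the direction, its size gives the strength, and both only describe the linear part of the relationship.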
What three conditions are necessary in order to use
correlation as a measure of association?
Quantitative Variables condition
Straight Enough condition
Outlier Condition
What does a correlation near zero indicate?
There is almost no linear association between the variables
Sketch an example of a scatterplot that shows two
variables with a strong association but a weak
correlation.
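One possible answer, sketched numerically rather than drawn: points that lie exactly on a parabola are perfectly associated, yet their correlation is zero because the pattern is not linear. The data and the helper `pearson_r` are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Correlation via sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y is completely determined by x (a strong association),
# but the relationship is curved and symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

Plotting these points gives a U shape: the falling left half and the rising right half cancel in the correlation.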
Is correlation resistant or nonresistant to outliers?
Explain.
A school board study found a moderately strong
negative association between the number of hours
high school seniors worked at part-time jobs after
school hours and the students’ grade point averages.
Explain in this context what “negative association”
means.
Students who worked more hours tended to have lower grades
Correlation…more to think about…
Hoping to improve student performance, the school
board passed a resolution urging parents to limit the
number of hours students be allowed to work. Do
you agree or disagree with the school board’s
reasoning? Explain.
They are mistakenly attributing the association to cause and
effect. “Association does not imply causation.” Maybe
students with low grades are more likely to seek jobs, or maybe
there is some other factor in their home life that leads both to
lower grades and to the desire or need to work (a lurking
variable).
Demo: Effect of individual points on correlation
http://bcs.whfreeman.com/ips4e/cat_010/applets/CorrelationRegression.html
Points near the center of the scatterplot have little
effect
Points that fit the pattern increase the strength (and
more so the farther the point is from the center)
Points that don’t fit the pattern decrease the correlation
(and can even reverse its sign)
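The three observations above can also be checked numerically without the applet. The sketch below starts from nine made-up points with a positive linear trend and adds one extra point at a time; the data, the helper `pearson_r`, and `r_with_extra_point` are all assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Correlation via sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up base data: roughly y = x, wobbling by one unit either way.
base_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
base_y = [2, 1, 4, 3, 6, 5, 8, 7, 10]


def r_with_extra_point(x, y):
    """Correlation of the base data plus one added point (x, y)."""
    return pearson_r(base_x + [x], base_y + [y])


r_base = pearson_r(base_x, base_y)
r_center = r_with_extra_point(5, 5)     # near the center of the cloud
r_fit = r_with_extra_point(20, 20)      # far away, but on the pattern
r_misfit = r_with_extra_point(20, -20)  # far away, off the pattern

print(round(r_base, 3), round(r_center, 3), round(r_fit, 3), round(r_misfit, 3))
```

Because a single point can move r this much, correlation is not resistant to outliers.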
Re-expressing data to make it linear
The variables year and U.S.
population (in millions of people)
are displayed. The association
between year and population is
strong, positive, and curved.
Population has been increasing
over the last 200 years.
Furthermore, the rate of
population growth has been
increasing. The U.S. population
has been growing faster in more
recent years. We will attempt to
straighten the scatterplot using
a logarithmic re-expression and
a square root re-expression.
Year    Population (millions)
1800    5
1850    23
1900    76
1950    151
2000    285
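As a rough sketch of the two re-expressions, the code below transforms the population values from the table and compares the correlation with year before and after. The helper `pearson_r` and the variable names are assumptions for this illustration; base-10 logs are used, though any base gives the same correlation.

```python
import math


def pearson_r(xs, ys):
    """Correlation via sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Values copied from the table.
years = [1800, 1850, 1900, 1950, 2000]
population = [5, 23, 76, 151, 285]  # millions

# Re-express the response variable only; year is left as is.
log_pop = [math.log10(p) for p in population]
sqrt_pop = [math.sqrt(p) for p in population]

r_raw = pearson_r(years, population)
r_log = pearson_r(years, log_pop)
r_sqrt = pearson_r(years, sqrt_pop)

print("raw :", round(r_raw, 3))
print("log :", round(r_log, 3))
print("sqrt:", round(r_sqrt, 3))
```

For these five points the square root re-expression gives the correlation closest to 1. A high r does not by itself show that a plot is straight, so the re-expressed scatterplot should still be checked for leftover curvature.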
Re-expressing data to make it linear
Ex 1. Gordon Moore, one of the founders of Intel
Corporation, predicted in 1965 that the number of
transistors on an integrated circuit chip would
double every 18 months. This is “Moore’s Law,” one
way to measure the revolution in computing. Here
are the data on the dates and numbers of transistors
for Intel microprocessors.
Processor      Date    Transistors
4004           1971    2,250
8008           1972    2,500
8080           1974    5,000
8086           1978    29,000
286            1982    120,000
386            1985    275,000
486 DX         1989    1,180,000
Pentium        1993    3,100,000
Pentium II     1997    7,500,000
Pentium III    1999    24,000,000
Pentium 4      2000    42,000,000
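Exponential growth becomes linear after a logarithmic re-expression, so one way to examine these data is to fit a least-squares line to log2(transistors) against year. The slope is then "doublings per year" and its reciprocal is an estimated doubling time. This is a minimal sketch, not a full analysis; the helper `least_squares_slope` and the variable names are assumptions, and the Pentium III is dated 1999 (its release year).

```python
import math

# (year, transistors) from the table; Pentium III taken as 1999.
chips = [
    (1971, 2_250),
    (1972, 2_500),
    (1974, 5_000),
    (1978, 29_000),
    (1982, 120_000),
    (1985, 275_000),
    (1989, 1_180_000),
    (1993, 3_100_000),
    (1997, 7_500_000),
    (1999, 24_000_000),
    (2000, 42_000_000),
]

years = [year for year, _ in chips]
# On a log2 scale, each increase of 1 means the count has doubled.
log2_counts = [math.log2(count) for _, count in chips]


def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = least_squares_slope(years, log2_counts)  # doublings per year
doubling_time_years = 1 / slope

print("doublings per year:", round(slope, 3))
print("estimated doubling time (years):", round(doubling_time_years, 2))
```

A doubling every 18 months would correspond to a slope of about 0.67 on this scale. For these data the fitted doubling time comes out at roughly two years, so comparing the two is one way to check the prediction.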