AP Statistics
Special Topics
Correlation
Correlation is a numerical summary that measures the
strength and direction of the linear relationship between
two variables (x, y). It is denoted “r”.
Formula:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
The correlation “r” is an average of the products of the
standardized x’s and standardized y’s (the sum is divided
by n - 1 rather than n).
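The formula can be sketched in Python as follows. This is a minimal illustration, not a calculator procedure: the function name `correlation` and the five data points are made up for the example.

```python
from math import sqrt

def correlation(xs, ys):
    """Return r: the sum of products of standardized x's and y's, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1), matching s_x and s_y.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Standardize each observation, multiply the pairs, then average.
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data, chosen only to make the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))  # 0.775
```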
Facts About Correlation
Standardizing removes the units, so we can
calculate “r” even when the two variables are
measured in different units.
It does not matter which variable you call x or y:
r is the same either way.
Both variables must be quantitative to calculate
correlation.
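The claim that r does not depend on which variable is called x can be checked with a short sketch. The helper `correlation` and the study-hours data below are assumptions made for this illustration.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data: hours studied and quiz score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [55, 60, 70, 72, 85, 90]

r_xy = correlation(hours, scores)   # hours as x, scores as y
r_yx = correlation(scores, hours)   # roles swapped
print(isclose(r_xy, r_yx))  # True
```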
How to Interpret Correlation
A positive “r” indicates a positive association
between variables.
A negative “r” indicates a negative association.
Correlation falls between -1 and 1.
Correlation values near zero indicate a very weak
linear relationship.
Because “r” uses the standardized values of the
observations, “r” does not change when we
change units.
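A small sketch of the unit-change fact: converting inches to centimeters and pounds to kilograms leaves r unchanged. The helper `correlation` and the height and weight values are hypothetical, chosen only for the demonstration.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical heights (inches) and weights (pounds).
inches = [60, 62, 65, 68, 70, 72]
pounds = [115, 120, 140, 155, 160, 180]

# The same observations in metric units.
centimeters = [h * 2.54 for h in inches]
kilograms = [w * 0.45359237 for w in pounds]

r_imperial = correlation(inches, pounds)
r_metric = correlation(centimeters, kilograms)
print(isclose(r_imperial, r_metric))  # True
```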
Correlation only measures the strength of a
linear relationship. It does not describe curved
relationships.
Like the mean and standard deviation,
correlation is strongly affected by outliers (it is
non-resistant).
“r” is not a complete description of two-variable
data.
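To see why “r” alone is not a complete description, consider a perfectly curved relationship. In the sketch below (made-up data and a hypothetical helper `correlation`), y is exactly x squared, yet r comes out essentially zero because the pattern is not linear.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# y depends on x perfectly, but along a parabola rather than a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
# r is approximately 0 (up to floating-point rounding),
# even though y is completely determined by x.
print(abs(r) < 1e-9)  # True
```

A scatterplot of these points would show the curve immediately, which is why the plot should always accompany the number.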
Connecting “r” to Scatterplots
Important Note!
Correlation is not measured on a linear scale.
Thus, a correlation of .8 does not indicate twice the linear
strength of a correlation of .4!
You can’t use proportional reasoning with this
numerical summary.
As a rule of thumb, a moderately strong to strong
correlation begins at about r = .8 (or r = -.8 for a
negative association).
Classroom Practice
Enter “Correct Calories” into L1 and “Guessed
Calories” into L2.
Run 2-Var Stats to get the mean and standard
deviation of each variable.
Plot the scatterplot on the calculator.
*Turn diagnostics on.
Run a LinReg to see diagnostics.
Remove 394 and 419 (outliers) and run the
diagnostics again.
What did you notice about the r-value? Do
outliers affect the correlation?
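The same before-and-after comparison can be sketched in Python. The numbers below are NOT the worksheet's calorie data; they are made-up points with one deliberate outlier, and `correlation` is a hypothetical helper written for this illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up data: five points close to a line, plus one outlier at (20, 3.0).
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 3.0]

r_with = correlation(xs, ys)

# Drop the last point (the outlier) and recompute.
r_without = correlation(xs[:-1], ys[:-1])

print(f"r with outlier:    {r_with:.2f}")
print(f"r without outlier: {r_without:.2f}")
```

In this made-up example a single point is enough to turn a correlation near 1 into a weak negative one, which is the behavior the classroom questions ask about.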
Homework
Worksheet