Correlation and Regression


Correlation and Regression
A brief overview
Correlation Coefficients
 Continuous IV & DV, or dichotomous variables (coded 0-1)
 for a dichotomous variable, the mean is interpreted as a proportion
 the Pearson product-moment correlation coefficient ranges from -1.0 to +1.0
Interpreting Correlations
 1.0 (+ or -) indicates a perfect relationship
 a 0 correlation indicates no association between the variables
 values in between indicate varying degrees of relatedness
 r² is the proportion of variance shared by the two variables
 which variable is X and which is Y doesn’t matter
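As a quick illustration of r² as shared variance, here is a minimal sketch (the .60 value is the height-weight correlation cited later in these slides):

```python
# r-squared: the proportion of variance shared by two variables.
# r = .60 is the height-weight correlation used as an example in the slides.
r = 0.60
r_squared = r ** 2
print(round(r_squared, 2))  # 0.36 -> the variables share 36% of their variance
```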

Positive Correlation
 the regression line is the line of best fit
 with a 1.0 correlation, all points fall exactly on the line
 a 1.0 correlation does not mean the values are identical
 only that the difference between them is identical across pairs

Negative Correlation
 if r = -1.0, all points fall directly on the regression line
 the line slopes downward from left to right
 the sign of the correlation tells us the direction of the relationship
 the number tells us the size or magnitude

Zero correlation
 no relationship between the variables
 a positive or negative correlation gives us predictive power

Direction and degree
 (scatterplots illustrating the direction and degree of correlation)
Correlation Coefficient
 r = Pearson Product-Moment Correlation Coefficient
 zx = z score for variable x
 zy = z score for variable y
 N = number of paired X-Y values

Definitional formula:
 r = Σ(zx·zy) / N

Raw score formula:
 r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
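A minimal Python sketch with made-up data, showing that the definitional (z-score) formula and the raw-score formula give the same r:

```python
from statistics import mean, pstdev

# Made-up paired data (hypothetical values for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

# Definitional formula: r = sum(zx * zy) / N (population SDs, since z scores use N)
mx, my = mean(x), mean(y)
sx, sy = pstdev(x), pstdev(y)
zx = [(v - mx) / sx for v in x]
zy = [(v - my) / sy for v in y]
r_def = sum(a * b for a, b in zip(zx, zy)) / n

# Raw-score formula: r = [N*SumXY - SumX*SumY] / sqrt([N*SumX2-(SumX)^2][N*SumY2-(SumY)^2])
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
r_raw = (n * sum_xy - sum_x * sum_y) / (
    ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
)

print(round(r_def, 4), round(r_raw, 4))  # both formulas give the same r (0.8 here)
```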
Interpreting correlation coefficients
 a comprehensive description of a relationship: direction and strength
 need an adequate number of pairs (more than 30 or so)
 the same statistic is used for a sample or a population
 the population parameter is rho (ρ)
 scatterplots and r: points more tightly clustered around the line = higher correlation

Examples of correlations
 -1.0: negative limit
 -.80: relationship between juvenile street crime and socioeconomic level
 .43: manual dexterity and assembly-line performance
 .60: height and weight
 1.0: positive limit

Describing r’s

Effect size index (Cohen’s guidelines):
 Small: r = .10, Medium: r = .30, Large: r = .50

Descriptive labels:
 Very high = .80 or more
 Strong = .60 - .80
 Moderate = .40 - .60
 Low = .20 - .40
 Very low = .20 or less

 small correlations can be very important
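The descriptive labels above can be sketched as a small helper function (a hypothetical utility, not from the slides; values falling exactly on a boundary are assigned to the higher category here):

```python
def describe_r(r):
    """Verbal label for |r| using the slides' descriptive ranges (not Cohen's)."""
    a = abs(r)  # the sign gives direction; the magnitude gives strength
    if a >= 0.80:
        return "very high"
    if a >= 0.60:
        return "strong"
    if a >= 0.40:
        return "moderate"
    if a >= 0.20:
        return "low"
    return "very low"

print(describe_r(-0.80))  # very high
print(describe_r(0.43))   # moderate
```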
Correlation as causation??
Nonlinearity and range restriction
if relationship doesn't follow a linear
pattern Pearson r useless
 r is based on a straight line function
 if variability of one or both variables is
restricted the maximum value of r
decreases
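A small sketch with made-up data of how restricting the range of X lowers r (the full sample spans X = 1 to 10; the restricted subsample keeps only X ≥ 6):

```python
def pearson_r(x, y):
    """Pearson r via the deviation-score (covariance) form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up data: y is x plus an alternating +1/-1 "noise" pattern
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_full = pearson_r(x, y)

# Restrict the range: keep only pairs with x >= 6
kept = [(a, b) for a, b in zip(x, y) if a >= 6]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(r_full, 2), round(r_restricted, 2))  # 0.94 vs 0.82: r shrinks
```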

Linear vs. curvilinear relationships
 (scatterplots contrasting linear and curvilinear patterns)

Range restriction
 (scatterplots illustrating the effect of range restriction on r)
Understanding r
Simple linear regression
 enables us to make a “best” prediction of the value of a variable given our knowledge of its relationship with another variable
 generates the line that minimizes the squared distances of the points in the plot from the line
 no other line will produce smaller residuals, or errors of estimation
 this is the least squares property
Regression line
The line will have the form Y'=A+BX
 Where: Y' = predicted value of Y
 A = Y intercept of the line
 B = slope of the line
 X = score of X we are using to predict Y
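A minimal sketch of using the prediction equation Y' = A + BX, with hypothetical coefficients assumed to be already computed (e.g., for predicting GPA from an exam score):

```python
# Hypothetical, already-fitted coefficients (not from the slides)
a = 0.5    # Y-intercept (A)
b = 0.002  # slope (B)

def predict(x):
    """Predicted Y (Y') for a given X score: Y' = A + BX."""
    return a + b * x

print(round(predict(1200), 2))  # 2.9
```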

Ordering of variables
which variable is designated as X and
which is Y makes a difference
 different coefficients result if we flip them
 generally if you can designate one as
the dependent on some logical grounds
that one is Y

Moving to prediction
statistically significant relationship
between college entrance exam scores
and GPA
 how can we use entrance scores to
predict GPA?

Best-fitting line
 (scatterplots showing the best-fitting regression line)
Calculating the slope (b)
 N = number of pairs of scores; the remaining terms are the sums of the X, Y, X², Y², and XY columns we’re already familiar with

 b = [NΣXY − (ΣX)(ΣY)] / [NΣX² − (ΣX)²]
Calculating the Y-intercept (a)
 b = slope of the regression line
 X̄ = the mean of the X values
 Ȳ = the mean of the Y values

 a = Ȳ − (b)X̄
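Putting the slope and intercept formulas together in Python (made-up data, same hypothetical values as the earlier correlation sketch):

```python
# Made-up paired data (hypothetical values for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

# Column sums needed for the raw-score slope formula
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(v * v for v in x)

# b = [N*SumXY - (SumX)(SumY)] / [N*SumX2 - (SumX)^2]
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# a = mean(Y) - b * mean(X)
a = sum_y / n - b * (sum_x / n)

print(b, a)  # slope 0.8, intercept 0.6
```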
Let’s make up a small example
 the SAT-GPA correlation
 how high is it generally?
 start with a scatter plot
 enter points that reflect the relationship we think exists
 translate them into values
 calculate r and the regression coefficients
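A worked version of these steps with made-up SAT-GPA pairs (hypothetical values chosen only to show a positive relationship):

```python
# Made-up SAT/GPA pairs (hypothetical data, not real scores)
sat = [900, 1000, 1100, 1200, 1300]
gpa = [2.0, 2.6, 2.4, 3.2, 3.4]
n = len(sat)

# Column sums for the raw-score formulas
sx, sy = sum(sat), sum(gpa)
sxy = sum(a * b for a, b in zip(sat, gpa))
sx2 = sum(v * v for v in sat)
sy2 = sum(v * v for v in gpa)

# Pearson r (raw-score formula)
r = (n * sxy - sx * sy) / (((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5)

# Regression coefficients: slope b, then intercept a = mean(Y) - b*mean(X)
b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a = sy / n - b * (sx / n)

print(round(r, 2), round(b, 4), round(a, 2))  # r, slope, intercept
print(round(a + b * 1150, 2))                 # predicted GPA for SAT = 1150
```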
