Transcript Document
Bivariate
Relationships
Adv. Experimental
Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of
Cognitive Science
PSYC 4310/6310
Advanced Experimental Methods and Statistics
© 2012, Michael Kalsher
Overview
• Back to variance ...
• Cross-product deviations and Covariance
• Characteristics of the correlation coefficient
• Types of correlation
– Bivariate vs. partial correlation
– Parametric vs. non-parametric
– Partial vs. Semi-partial (or part)
• Reporting correlation coefficients
Variance (s² or σ²)
The average squared difference from the mean.
It tells us--on average--how much a given data
point differs from the mean of all data points.
$$s^2 = \frac{SS}{N-1} = \frac{\sum (x_i - \bar{x})^2}{N-1}$$
Where: xi = a single data point
x̄ = the mean of the sample
N = number of observations
SS = Sum of Squares or more precisely, sum of squared deviations from the mean.
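As a minimal sketch of the variance formula, the following Python computes SS and divides by N-1. The sample values are hypothetical (they do not come from the slides), and the function name is illustrative only.

```python
from statistics import mean, variance

def sample_variance(data):
    """Sum of squared deviations from the mean (SS), divided by N - 1."""
    x_bar = mean(data)
    ss = sum((x - x_bar) ** 2 for x in data)  # SS: sum of squares
    return ss / (len(data) - 1)

scores = [2, 4, 4, 6, 9]  # hypothetical sample, not from the slides
s2 = sample_variance(scores)
print(s2)                 # 7.0
print(variance(scores))   # the standard library gives the same value
```

The standard library's `statistics.variance` also uses the N-1 denominator, so it serves as a cross-check.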
Linear Bivariate Relationships
Covariance:
Assessing association between two variables
– Measures extent to which corresponding
elements from two sets of ordered data move in
the same direction (see example next slide).
– Not based on standard scores (more on this later).
– Its value is influenced by:
• the strength of the linear relationship between X and Y
• the size of the standard deviations of X and Y (i.e., sx and sy)
Example: Imagine we expose five people to a specific number of advertisements
promoting a particular type of candy and then measure how many packages of the
candy each person purchases the following week.
[Figure: Ads watched (mean = 5.4) and packs of candy purchased (mean = 11.0) for each person, showing the deviation between each data point and the mean.]
What’s going
on here?
1. The pattern of deviations is similar for both variables.
2. But how do we quantify the level of similarity?
-- For a single variable, recall that we calculate the average squared deviation from the mean to
determine the level of dispersion in the data (i.e., the variance).
-- For two variables, we multiply the individual deviations for one variable by the corresponding
deviations for the second variable to obtain the cross-product deviations, then divide by N-1 to
compute the covariance.
Covariance: Conceptual Form
[Figure: diagram relating Var X, Var Y, and Cov XY.]
Covariance: Equation Form
$$\mathrm{cov}(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N-1}$$
$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$$
$$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$
Positive covariance = as one variable deviates from the mean, the other
variable deviates in the same direction.
Negative covariance = as one variable deviates from the mean, the other
variable deviates from the mean in the opposite direction.
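The covariance calculation for the advertising example can be sketched in Python. The raw scores below are not printed in the transcript; they are reconstructed here from the stated means (5.4 and 11.0) and the deviations in the worked equation, so treat them as an assumption.

```python
# Raw scores reconstructed from the slide's means and deviations (assumption).
ads = [5, 4, 4, 6, 8]         # ads watched, mean = 5.4
packs = [8, 9, 10, 13, 15]    # packs of candy purchased, mean = 11.0

def covariance(x, y):
    """Sum of cross-product deviations divided by N - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
    return sum(cross_products) / (n - 1)

cov_xy = covariance(ads, packs)
print(round(cov_xy, 2))  # 4.25
```

The result is positive, so people who watched more ads than average also tended to buy more candy than average.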
Standardized Covariance:
The Correlation Coefficient
Covariance is useful, but its value depends on the scales of
measurement used.
A more practical approach is to use a unit of
measurement into which any scale of measurement can
be converted--standard deviation units.
The standardized covariance is known as the
correlation coefficient:
$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
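A short sketch of standardizing the covariance, again using the advertising example. The raw scores are reconstructed from the slide's means and deviations (an assumption), and `statistics.stdev` is used because it applies the N-1 denominator.

```python
from statistics import stdev

# Raw scores reconstructed from the slide's means and deviations (assumption).
ads = [5, 4, 4, 6, 8]
packs = [8, 9, 10, 13, 15]

def covariance(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

r = pearson_r(ads, packs)
print(round(r, 2))  # 0.87
```

Rescaling either variable (for example, counting packs in dozens) would change the covariance but leave r unchanged.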
Standardized (z) scores
zx = (xi – x̄) / sx

            Indep Var (x)   Dep Var (y)   zx      zy
            0               4             -1.51   -0.92
            1               1             -1.21   -1.60
            2               9             -0.90    0.20
            3               5             -0.60   -0.70
            4               6             -0.30   -0.47
            5               5              0.00   -0.70
            6               12             0.30    0.88
            7               16             0.60    1.78
            8               8              0.90   -0.02
            9               10             1.21    0.43
            10              13             1.51    1.11
Mean        5.00            8.09           0.00    0.00
Std. Dev    3.32            4.44           1.00    1.00
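The z-score columns can be reproduced with a few lines of Python. This is a sketch using the x and y values from the slide; the function name is illustrative.

```python
from statistics import mean, stdev

x = list(range(11))                        # independent variable: 0 through 10
y = [4, 1, 9, 5, 6, 5, 12, 16, 8, 10, 13]  # dependent variable

def z_scores(data):
    """Convert each score to standard deviation units: (score - mean) / SD."""
    m, s = mean(data), stdev(data)
    return [(v - m) / s for v in data]

zx, zy = z_scores(x), z_scores(y)
print([round(z, 2) for z in zx[:3]])            # [-1.51, -1.21, -0.9]
print(round(mean(zy), 2), round(stdev(zy), 2))  # mean 0 and SD 1 after standardizing
```

Whatever the original scale, standardized scores have a mean of 0 and a standard deviation of 1, which is what makes them comparable across variables.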
Pearson Correlation Coefficient:
Some Characteristics
– The sample correlation coefficient (rXY) provides the best
sample estimate of the population correlation coefficient,
ρXY.
– Values vary between +1.0 and -1.0.
– rXY is a standardized measure of an observed effect.
– The square of the correlation coefficient, r²XY, is termed “r
squared” or the index of association and is defined as the
proportion of the variance of one variable that is shared by
another variable.
Calculating Correlation:
A Simple Example
A researcher wonders whether self-esteem is associated
with academic performance and collects the following data.
Calculating Correlations:
Using z-scores
$$r_{XY} = \frac{\sum z_X z_Y}{N - 1}$$
$$r_{XY} = \frac{0.1434 + 0.1216 + (-0.3128) + (-0.5300) + 1.4902}{4} = .2281$$
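The z-score arithmetic can be checked with a brief sketch. Only the five z-score products appear in the transcript (the underlying self-esteem and performance scores do not), so the code starts from those products; the function name is illustrative.

```python
def r_from_z_products(products):
    """Pearson r as the sum of z-score cross-products divided by N - 1."""
    return sum(products) / (len(products) - 1)

# Products of paired z scores (zX * zY), as listed on the slide.
z_products = [0.1434, 0.1216, -0.3128, -0.5300, 1.4902]

r_xy = r_from_z_products(z_products)
print(round(r_xy, 4))  # 0.2281
```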
Calculating Correlations:
Using the Computational Formula
This formula is useful because it does not involve computing
means, standard deviations, and z scores.
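The slide's formula is not reproduced in this transcript. A commonly used raw-score form, assumed here, is r = [NΣXY - (ΣX)(ΣY)] / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²]). The sketch below applies it to the advertising example from earlier in the lecture, with raw scores reconstructed from the slide's means and deviations.

```python
from math import sqrt

def pearson_r_computational(x, y):
    """Raw-score formula: needs only sums, sums of squares, and sum of products."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Raw scores reconstructed from the slide's means and deviations (assumption).
ads = [5, 4, 4, 6, 8]
packs = [8, 9, 10, 13, 15]

r = pearson_r_computational(ads, packs)
print(round(r, 2))  # 0.87
```

This gives the same value as dividing the covariance (4.25) by the product of the two standard deviations.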
Calculating Simple Correlation:
Using SPSS
[Screenshots: SPSS procedure, Steps 1 through 3.]
The results showed that students’ self-esteem was not significantly
related to their academic performance, r = .23, p > .05.
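One way to see why this correlation is not significant is to convert r to a t statistic with N - 2 degrees of freedom, t = r·sqrt(N - 2) / sqrt(1 - r²). The sketch below assumes N = 5 (inferred from the five z-score products in the worked example) and uses the two-tailed .05 critical value for df = 3 from a standard t table.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = 0.2281, 5       # r from the z-score example; N = 5 is an assumption
t = t_for_r(r, n)
critical_t = 3.182     # two-tailed .05 critical value for df = 3 (standard t table)

print(round(t, 2))             # 0.41
print(abs(t) > critical_t)     # False: not significant at the .05 level
```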
Types of Correlation
Bivariate correlation:
Used to assess the relationship between two variables.
Partial correlation:
When we do a partial correlation between two variables,
we control for the effects of a third variable. Specifically,
the effect that the third variable has on both variables in
the correlation is controlled. (Useful for isolating the unique
relationship between two variables when other variables are
ruled out).
Semi-partial (or part) correlation:
When we do a semi-partial correlation, we control for
the effect that the third variable has on only one of the
variables in the correlation. (Useful when trying to explain
the variance in one particular variable from a set of predictor
variables).
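For reference, both coefficients can be written in terms of the three zero-order correlations. These are the standard first-order formulas, stated here as a supplement rather than taken from the slides; A and B are the two variables of interest and C is the controlled variable.

Partial correlation (C removed from both A and B):
$$r_{AB \cdot C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{(1 - r_{AC}^2)(1 - r_{BC}^2)}}$$

Semi-partial correlation (C removed from B only):
$$r_{A(B \cdot C)} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{1 - r_{BC}^2}}$$

The numerators are identical; the two differ only in how much variance is removed in the denominator.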
Bivariate Correlation:
Exam Anxiety.sav
A psychologist is interested in the effects of exam stress and revision on exam
performance. Exam anxiety is assessed with a standardized measure (EAQ). Revision
is defined as the number of hours students spend studying for the exam.
Revise = Revision time
Exam = Exam performance
Anxiety = Exam anxiety
[Screenshots: SPSS procedure, Steps 1 through 3.]
Pearson’s r: Interesting Facts
• Data must be score-level.
• Significance testing requires that the data be normally distributed.
Bivariate Correlation: SPSS Output
Partial Correlation: Examining the relationship between two
variables when the effects of a third variable are held constant
Exam Performance and Exam Anxiety share 19.4% of
their variation.
r = .441; R² = .194 = 19.4% shared variance
Exam Performance and Revision Time share 15.7% of
their variation.
r = .397; R² = .157 = 15.7% shared variance
Exam Anxiety and Revision Time share 50.2% of their variation.
r = .709; R² = .502 = 50.2% shared variance
Two “chunks” of variance in Exam
Performance share unique variance with
Exam Anxiety and Revision Time.
Calculating Partial-Correlation:
Using SPSS – Exam Anxiety.sav
Let’s next try assessing the partial correlation between exam anxiety and exam performance while
controlling for the effect of revision time.
Zero-order correlations: R² = 19.4%
Partial correlation: R² = 6%
Note: The partial correlation is still statistically significant, but it is much
lower when the effect of Time Spent Revising is held constant.
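The drop from 19.4% to 6% can be checked by hand from the three zero-order correlations. The sketch below uses the standard first-order partial correlation formula (an addition, not shown on the slides) and the correlation values as printed in this transcript.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant for both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations as printed in the transcript.
r_exam_anxiety = 0.441
r_exam_revise = 0.397
r_anxiety_revise = 0.709

pr = partial_r(r_exam_anxiety, r_exam_revise, r_anxiety_revise)
print(round(pr, 3))       # 0.246
print(round(pr ** 2, 2))  # 0.06, i.e., about 6% shared variance
```

Because Exam Anxiety and Revision Time are strongly correlated with each other, removing Revision Time takes away much of the variance that Exam Anxiety appeared to share with Exam Performance.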
Correlation: Non-Parametric Alternatives
(See TheBiggestLiar dataset; choose Creativity and Position as variables)
Spearman’s Correlation Coefficient (rs):
Used when the data violate parametric assumptions (e.g.,
non-normally distributed data) or are ordinal.
Works by first ranking the data and then applying
Pearson’s equation to these ranks.
Kendall’s tau (τ):
Used instead of Spearman’s correlation coefficient when
the data set is small and has a large number of tied ranks.
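The "rank first, then apply Pearson" idea behind Spearman's coefficient can be sketched as follows. The data are hypothetical (they are not from TheBiggestLiar dataset), tied values are given the average of their ranks, and the function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rs(x, y):
    """Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]   # hypothetical scores
y = [5, 6, 7, 8, 7]   # hypothetical scores with one tie
rs = spearman_rs(x, y)
print(round(rs, 2))   # 0.82
```

Because only the ordering matters, any perfectly monotonic relationship (even a curved one) yields rs = 1.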
Non-Parametric Correlation: TheBiggestLiar.sav
A researcher gathers 68 past participants of The World’s Biggest Liar Competition. Each
person indicates his/her placement (1st, 2nd, etc.) and completes a creativity questionnaire
(max. score = 60). Is level of creativity related to a person’s ability to tell tall tales?
Non-Parametric Correlation: Output
Correlation: Applied to Dichotomous Variables
(see pbcorr.sav dataset)
Point-Biserial Correlation (rpb):
Used when one variable is a discrete dichotomy
(e.g., pregnancy).
Biserial Correlation (rb):
Used when one variable is a continuous dichotomy
(e.g., passing or failing an exam).
SPSS calculates rpb; use the following equation to calculate rb:
$$r_b = \frac{r_{pb}\sqrt{pq}}{y}$$
Where:
p = proportion of cases in largest category
q = proportion of cases in smallest category
y = ordinate of the standard normal distribution at the point that divides it into
proportions p and q (value obtained from Appendix A.1: Table of
the standard normal distribution)
Note: To obtain values for “p” and
“q”, go to Analyze, Descriptive
Statistics, and then Frequencies.
Select the dichotomous variables
of interest, and then run the
analysis.
(See Appendix A.1: Table of the standard normal distribution.)
Suppose we are interested in the relationship between the
gender of a cat and how much time it spends away from home.
How will you test that relationship?
Converting rpb to rb
(see pbcorr.sav dataset)
$$r_b = \frac{r_{pb}\sqrt{pq}}{y} = 0.475$$
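The conversion can also be done without a printed table. In the sketch below, y is taken as the height of the standard normal curve at the z score that splits the distribution into proportions p and q, computed with Python's `statistics.NormalDist`. The inputs are hypothetical: the transcript does not give the rpb, p, or q values behind the 0.475 result, so this does not attempt to reproduce it.

```python
from math import sqrt
from statistics import NormalDist

def biserial_from_point_biserial(r_pb, p):
    """Convert a point-biserial r to a biserial r: r_pb * sqrt(p * q) / y."""
    q = 1 - p
    z_cut = NormalDist().inv_cdf(p)  # z score that splits the curve into p and q
    y = NormalDist().pdf(z_cut)      # ordinate (height) of the normal curve there
    return r_pb * sqrt(p * q) / y

# Hypothetical inputs, not taken from pbcorr.sav.
r_b = biserial_from_point_biserial(r_pb=0.40, p=0.60)
print(round(r_b, 3))
```

Because the normal curve is symmetric, it does not matter which category is labeled p; the biserial value is always larger in magnitude than the point-biserial value it is derived from.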
Problem #1: Correlation
Using the ChickFlick.sav data, is there a
relationship between gender and arousal? Using
the same data, is there a relationship between the
film watched and arousal?
Problem #2: Correlation
Using the data collected in class, is there a relationship
between your age, gender, political perspective, and
your current level of satisfaction with President
Obama’s performance? Does there appear to be any
shared variance among any/all of the variables?
Codes:
Gender: 1 = Female; 2 = Male
Political Leaning: 1 = Liberal; 2 = Moderate; 3 = Conservative
President’s Performance Rating (0 to 6):
0 -------- 1 -------- 2 -------- 3 -------- 4 -------- 5 -------- 6
Anchors, from low to high: Not at All Satisfied, Somewhat Satisfied, Moderately Satisfied, Very Satisfied