Standard Scores (Z-scores)

Z-scores and Correlations
Lecture 6, Psych 350 - R. Chris Fraley
http://www.yourpersonality.net/psych350/fall2012/
Announcements
• No lab on Wed this week
• No lecture next week
• Email TA about zero-acquaintance data;
working with it on Friday
Answering Descriptive Questions in
Multivariate Research
• When we are studying more than one
variable, we are typically asking one (or
more) of the following two questions:
– How does a person’s score on the first variable compare to
his or her score on a second variable?
– How do scores on one variable vary as a function of scores
on a second variable?
Making Sense of Scores
• Let’s work with this first issue for a moment.
• Let’s assume we have Marc’s scores on his
first two Psych 350 exams.
• Marc has a score of 50 on his first exam and
a score of 50 on his second exam.
• On which exam did Marc do better?
Example 1
• In one case, Marc’s exam score is 10 points above the mean.
• In the other case, Marc’s exam score is 10 points below the mean.
• In an important sense, we must interpret Marc’s grade relative to the average performance of the class.
[Figure: distributions of Exam1 and Exam2 grades (x-axis: GRADE, 0 to 100). Mean Exam1 = 40; Mean Exam2 = 60.]
Example 2
• Both distributions have the same mean (40), but different standard deviations (5 vs. 20).
• In one case, Marc is performing better than about 98% of the class. In the other, he is performing better than about 69% of the class (assuming roughly normal distributions).
• Thus, how we evaluate Marc’s performance depends on how much spread or variability there is in the exam scores.
[Figure: distributions of Exam1 and Exam2 grades (x-axis: GRADE, 0 to 100).]
Standard Scores
• In short, what we would like to do is express
Marc’s score for any one exam with respect to
(a) how far he is from the average score in the
class and (b) the variability of the exam
scores.
– how far a person is from the mean:
• (X – M)
– variability in scores:
• SD
Standard Scores
• Standardized scores, or z-scores, provide a
way to express how far a person is from the
mean, relative to the variation of the scores.
Z = (X – M)/SD
• (1) Subtract the mean from the person’s
score. (2) Divide that difference by the
standard deviation.
** This tells us how far a person is from the mean, in the metric
of standard deviation units **
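The formula Z = (X – M)/SD can be sketched in a few lines of Python. This is only an illustration: the function name `z_score` is made up here, and the numbers are Marc's score of 50 with the class means and SDs used in the lecture examples.

```python
def z_score(x, mean, sd):
    """Return how far x lies from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


# Marc scored 50 on both exams; only the class mean differs.
exam1_z = z_score(50, mean=40, sd=10)  # 10 points above the mean
exam2_z = z_score(50, mean=60, sd=10)  # 10 points below the mean
print(exam1_z, exam2_z)  # 1.0 -1.0
```

The same raw score of 50 ends up one SD above the mean on one exam and one SD below it on the other.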
Example 1
Marc’s z-score on Exam1:
z = (50 - 40)/10 = 1 (one SD above the mean)
Marc’s z-score on Exam2:
z = (50 - 60)/10 = -1 (one SD below the mean)
[Figure: distributions of Exam1 and Exam2 grades (x-axis: GRADE, 0 to 100). Mean Exam1 = 40, SD = 10; Mean Exam2 = 60, SD = 10.]
Example 2
An example where the means are identical, but the two sets of scores have different spreads.
Marc’s Exam1 Z-score (SD = 5):
Z = (50-40)/5 = 2
Marc’s Exam2 Z-score (SD = 20):
Z = (50-40)/20 = .5
[Figure: distributions of Exam1 and Exam2 grades (x-axis: GRADE, 0 to 100).]
Some Useful Properties of Standard
Scores
(1) The mean of a set of z-scores is always zero
Why? If we subtract a constant, C, from each score, the mean
of the scores will be off by that amount (M – C). If we
subtract the mean from each score, then the mean will be off by
an amount equal to the mean (M – M = 0).
(2) The SD of a set of standardized scores is always 1
Why? Dividing every score by a constant divides the SD by that constant, so dividing by the SD gives SD/SD = 1.
Example: M = 50, SD = 10
if x = 60, z = (60 – 50)/10 = 10/10 = 1
x:   20   30   40   50   60   70   80
z:   -3   -2   -1    0    1    2    3
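Properties (1) and (2) can be checked numerically. The sketch below uses a small set of made-up scores (not data from the lecture) and assumes the population SD, which divides by N; the helper name `standardize` is illustrative.

```python
from statistics import fmean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores using the population SD (divides by N)."""
    m = fmean(scores)
    sd = pstdev(scores)
    return [(x - m) / sd for x in scores]


raw = [42, 55, 61, 48, 70, 36]  # hypothetical exam scores
z = standardize(raw)

# Up to floating-point rounding, the mean is 0 and the SD is 1.
print(round(fmean(z), 10), round(pstdev(z), 10))
```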
(3) The distribution of a set of standardized scores has
the same shape as the unstandardized (raw) scores
[Figure: two histograms, UNSTANDARDIZED and STANDARDIZED, with the same shape but different scales.]
The “normalization” (mis)interpretation
• Standardizing does not turn scores into a normal distribution; it changes only the mean and SD, not the shape.
[Figure: A “Normal” Distribution (x-axis: SCORE, -4 to 4).]
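One way to see property (3) is to compare a shape statistic before and after standardizing. The sketch below uses skewness (the average cubed z-score) on made-up, right-skewed scores; the data and function names are illustrative only.

```python
from statistics import fmean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores using the population SD."""
    m, sd = fmean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]


def skewness(scores):
    """Average cubed z-score: 0 for a symmetric set, positive for a right skew."""
    return fmean([z ** 3 for z in standardize(scores)])


raw = [1, 1, 2, 2, 2, 3, 4, 9]  # hypothetical right-skewed scores
z = standardize(raw)

# The skew is the same before and after: standardizing did not "normalize" it.
print(round(skewness(raw), 4), round(skewness(z), 4))
```

A skewed set of raw scores stays exactly as skewed once it is expressed as z-scores.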
Some Useful Properties of Standard
Scores
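(4) Standard scores can be used to easily compute
centile scores: the proportion of
people with scores less than or equal to a
particular score.
[Figure: The area under a normal curve (x-axis: SCORE, -4 to 4). 50% of the area lies on each side of the mean. Approximate areas by band: 2% below -2, 14% from -2 to -1, 34% from -1 to 0, 34% from 0 to 1, 14% from 1 to 2, 2% above 2.]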
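If the scores can be assumed to follow a normal distribution (an assumption, not something standardizing guarantees), a z-score can be turned into a centile with the standard normal cumulative distribution function. A minimal sketch using only Python's `math` module; the name `normal_centile` is made up for this example.

```python
import math


def normal_centile(z):
    """Proportion of a normal distribution at or below z (standard normal CDF)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for z in (-2, -1, 0, 1, 2):
    print(z, round(100 * normal_centile(z), 1))
```

For z = 1 this gives roughly 84%, which matches adding up the 50% below the mean and the 34% between 0 and 1 under the curve.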
Some Useful Properties of Standard
Scores
(5) Z-scores provide a way to “standardize”
different metrics (i.e., metrics that differ in
variation or meaning). Different variables
expressed as z-scores can be interpreted on
the same metric (the z-score metric). (Each
score comes from a distribution with the same
mean [zero] and the same standard deviation
[1].)
Person    Heart Rate   Complaints   Z-score (Heart Rate)   Z-score (Complaints)   Average
A         80           2            (80-100)/20 = -1       (2-2.5)/.5 = -1        -1
B         80           3            (80-100)/20 = -1       (3-2.5)/.5 = 1         0
C         120          2            (120-100)/20 = 1       (2-2.5)/.5 = -1        0
D         120          3            (120-100)/20 = 1       (3-2.5)/.5 = 1         1
Average   100          2.5          0                      0                      0
SD        20           .5           1                      1                      .71
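The heart rate and complaints table can be reproduced with a short sketch. The raw values for persons A to D come from the table; the helper name `standardize` and the use of the population SD (dividing by N) are assumptions chosen to match the table's means and SDs.

```python
from statistics import fmean, pstdev

people = ["A", "B", "C", "D"]
heart_rate = [80, 80, 120, 120]
complaints = [2, 3, 2, 3]


def standardize(scores):
    """Convert raw scores to z-scores using the population SD."""
    m, sd = fmean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]


z_heart = standardize(heart_rate)
z_complaints = standardize(complaints)

# Once both variables are on the z-score metric, they can be averaged.
composite = [fmean(pair) for pair in zip(z_heart, z_complaints)]
for person, score in zip(people, composite):
    print(person, score)
```

Averaging the raw values instead would let heart rate dominate, because its numbers are so much larger than the complaint counts.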
Correlations in Personality Research
• Many research questions that are addressed
in personality psychology are concerned with
the relationship between two or more
variables.
Some examples
• How does dating/marital satisfaction vary as a
function of personality traits, such as
emotional stability?
• Are people who are relatively sociable as
children also likely to be relatively sociable as
adults?
• What is the relationship between individual
differences in violent video game playing and
aggressive behavior in adolescents?
Graphic presentation
• Many of the relationships we’ll focus on in this course are of the linear variety.
• The relationship between two variables can be represented as a line.
[Figure: line graph; x-axis: violent video game playing (0 to 10); y-axis: 0 to 10.]
• Linear relationships can be negative or positive.
[Figure: two line graphs, one for each direction of relationship; x-axis of each: violent game playing (0 to 10); y-axis: 0 to 10.]
• How do we determine whether there is a
positive or negative relationship between two
variables?
Scatter plots
One way of determining the form of the relationship between two variables is to create a scatter plot or a scatter graph.
The form of the relationship (i.e., whether it is positive or negative) can often be seen by inspecting the graph.
[Figure: scatter plot; x-axis: violent game playing (7 to 13); y-axis: 16 to 22.]
How to create a scatter plot
Use one variable as the x-axis (the horizontal axis) and the other as the y-axis (the vertical axis).
Plot each person in this two-dimensional space as a set of (x, y) coordinates.
Person   Zx      Zy
A        1.55    1.39
B        0.15    0.28
C        -0.75   -1.44
D        0.48    0.64
E        -1.34   -0.69
F        0.08    -0.19
[Figure: scatter plot of persons A through F; x and y axes each run from -2 to 2.]
How to create a scatter plot in SPSS
• Select the two
variables of
interest.
• Click the “ok”
button.
[Figure: three scatter plots showing a negative relationship, no relationship, and a positive relationship.]
Quantifying the relationship
• How can we quantify the linear relationship
between two variables?
• One way to do so is with a commonly used
statistic called the correlation coefficient
(often denoted as r).
Some useful properties of the
correlation coefficient
(1) Correlation coefficients range between –1
and +1.
Note: In this respect, r is useful in the same way
that z-scores are useful: they both use a
standardized metric.
Some useful properties of the
correlation coefficient
(2) The value of the correlation conveys
information about the form of the relationship
between the two variables.
– When r > 0, the relationship between the two variables is
positive.
– When r < 0, the relationship between the two variables is
negative--an inverse relationship (higher scores on x
correspond to lower scores on y).
– When r = 0, there is no linear relationship between the two
variables.
[Figure: three scatter plots with r = -.80, r = 0, and r = .80.]
Some useful properties of the
correlation coefficient
(3) The correlation coefficient can be interpreted
as the slope of the line that maps the
relationship between two standardized
variables.
slope as rise over run
[Figure: line with r = .50 plotted on standardized x and y axes. Run: moving from 0 to 1 on x. Rise: takes you up .5 on y.]
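The slope interpretation can be checked with a small sketch. It assumes the line in question is the least-squares (best-fitting) line, which the slides do not state explicitly, and it uses made-up scores; `standardize`, `correlation`, and `ols_slope` are illustrative names.

```python
from statistics import fmean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores using the population SD."""
    m, sd = fmean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]


def correlation(xs, ys):
    """Average product of paired z-scores."""
    return fmean([a * b for a, b in zip(standardize(xs), standardize(ys))])


def ols_slope(xs, ys):
    """Slope (rise over run) of the least-squares line predicting ys from xs."""
    mx, my = fmean(xs), fmean(ys)
    rise = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    run = sum((x - mx) ** 2 for x in xs)
    return rise / run


x = [7, 8, 9, 10, 11, 12, 13]       # hypothetical scores on one variable
y = [17, 16, 19, 18, 21, 19, 22]    # hypothetical scores on the other

r = correlation(x, y)
slope_z = ols_slope(standardize(x), standardize(y))
print(round(r, 3), round(slope_z, 3))  # the two values agree
```

On the raw (unstandardized) scores the slope would generally differ from r, because it also depends on the two SDs.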
How do you compute a correlation
coefficient?
r = Σ(zX × zY) / N
• First, transform each variable to a
standardized form (i.e., z-scores).
• Multiply each person’s z-scores together.
• Finally, average those products across
people.
Example
Person    Violent game playing (z-scores): Zx   Aggressive behavior (z-scores): Zy   Zx × Zy
Adair     1                                     1                                    1
Antoine   1                                     1                                    1
Colby     -1                                    -1                                   1
Trotter   -1                                    -1                                   1
Average   0                                     0                                    1 = r
r = Σ(Zx × Zy) / N = 1
Why products? Important Note on 2 x 2
Matching z-scores via products
Person   Zx      Zy
A        1.55    1.39
B        0.15    0.28
C        -0.75   -1.44
D        0.48    0.64
E        -1.34   -0.69
F        0.08    -0.19
[Figure: scatter plot of persons A through F; x and y axes each run from -2 to 2.]
Important Note on 2 x 2
[Figure: scatter plot on standardized axes (x: -2 to 2; y: -2 to 3).]
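A short sketch of why products work, using the six z-score pairs for persons A to F. A product is positive when a person's two z-scores have the same sign (both above or both below the mean) and negative when the signs differ. Because the tabled z-scores are rounded, the average product here is only an approximation of r.

```python
# (Zx, Zy) pairs for persons A to F, as listed in the table.
people = {
    "A": (1.55, 1.39),
    "B": (0.15, 0.28),
    "C": (-0.75, -1.44),
    "D": (0.48, 0.64),
    "E": (-1.34, -0.69),
    "F": (0.08, -0.19),
}

# Multiply each person's two z-scores together.
products = {name: zx * zy for name, (zx, zy) in people.items()}
for name, product in products.items():
    sign = "same sign" if product > 0 else "opposite signs"
    print(name, round(product, 2), sign)

# Averaging the products gives an approximate r for these six people.
mean_product = sum(products.values()) / len(products)
print(round(mean_product, 2))
```

Only person F, who is slightly above the mean on x and slightly below it on y, contributes a negative product, so the average comes out strongly positive.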
Computing Correlations in SPSS
• Go to the
“Analyze” menu.
• Select
“Correlate”
• Select
“Bivariate…”
Computing Correlations in SPSS
• Select the
variables you want
to correlate
• Shoot them over
to the right-most
window
• Click on the “Ok”
button.
Magnitude of correlations
• When is a correlation “big” versus “small?”
• Cohen’s standards:
– .1 small
– .3 medium
– > .5 large
What are typical correlations in
personality psychology?
Typical sample sizes and effect sizes in studies conducted in personality psychology.
      Mdn    M      SD     Range
N     120    179    159    15 – 508
r     .21    .24    .17    0 – .96
Note. The absolute value of r was used in the calculations reported here. Data are based on
articles published in the 2004 volumes of JPSP:PPID and JP.
A selection of effect sizes from various
domains of research
Variables                                                                          r
Effect of sugar consumption on the behavior and cognitive process of children     .00
Chemotherapy and surviving breast cancer                                           .03
Coronary artery bypass surgery for stable heart disease and survival at 5 years   .08
Combat exposure in Vietnam and subsequent PTSD within 18 years                     .11
Self-disclosure and likeability                                                    .14
Post-high school grades and job performance                                        .16
Psychotherapy and subsequent well-being                                            .32
Social conformity under the Asch line judgment task                                .42
Attachment security of parent and quality of offspring attachment                  .47
Gender and height for U.S. Adults                                                  .67
Note. Table adapted from Table 1 of Meyer et al. (2001).
Magnitude of correlations
• “Real world” correlations rarely get larger
than .30.
• Why is this the case?
– Any one variable can be influenced by a hundred other
variables. To the degree that a variable is multi-determined, the correlation between it and any one variable
must be small.
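The multi-determination argument can be illustrated with a small simulation. It rests on a strong simplifying assumption that is not in the lecture: the outcome is the plain sum of k independent, equally important causes. Under that assumption the correlation between the outcome and any one cause is 1/√k, so 100 causes would give r = .10. All names and settings below are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def correlation(xs, ys):
    """Average product of paired z-scores (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


def simulate(k, n, seed=0):
    """Correlate one cause with an outcome that is the sum of k independent causes."""
    rng = random.Random(seed)
    first_cause, outcome = [], []
    for _ in range(n):
        causes = [rng.gauss(0, 1) for _ in range(k)]
        first_cause.append(causes[0])
        outcome.append(sum(causes))
    return correlation(first_cause, outcome)


k = 25
observed = simulate(k, n=5000)
expected = 1 / math.sqrt(k)  # 0.2 under the equal, independent causes assumption
print(round(observed, 2), round(expected, 2))
```

The simulated value should land close to the expected one, differing only by sampling error.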
Qualify
• For the purposes of this class, I want you to
describe the correlation: What is it
numerically? And, qualitatively speaking, is it
“zero or close to zero” (< .1), “small” (.1 to
.29), “medium” (.30 to .49), or “large” (> .50).
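The class labels can be written as a small helper. Two choices here are assumptions rather than part of the lecture: the size is judged on the absolute value of r (so the sign is ignored), and a correlation of exactly .50 is counted as “large”. The function name `describe_correlation` is made up.

```python
def describe_correlation(r):
    """Report r numerically and with the qualitative label used in this class."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # assumption: the label depends on magnitude, not sign
    if size < 0.10:
        label = "zero or close to zero"
    elif size < 0.30:
        label = "small"
    elif size < 0.50:
        label = "medium"
    else:  # assumption: exactly .50 counts as large
        label = "large"
    return f"r = {r:.2f} ({label})"


for value in (0.03, 0.21, -0.47, 0.67):
    print(describe_correlation(value))
```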