Transcript PowerPoint

Multivariate Descriptive Research
• In the previous lecture, we discussed ways to quantify the relationship between two variables when those variables are continuous.
• What do we do when one or more of the variables is categorical?
Categorical Variables
• Fortunately, this situation is much easier to deal with because we can use the same techniques that we’ve discussed already.
• Let’s consider a situation in which we are interested in how one continuous variable varies as a function of a categorical variable.
• Example: How does mood vary as a function of sex (male vs. female)?
• In this case, we want to know how the average woman’s score compares to the average man’s score.
• Males and females are the two levels of the categorical variable (sex).
Mood scores by sex (four participants per group)
Participant   Males   Females
A             4       5
B             3       4
C             4       5
D             3       4
M             3.5     4.5
SD            .5      .5
First, find the average score for each level of the categorical variable separately. (Also find the SD.)
Second, find the difference between the means of the two groups. This is called a mean difference (4.5 – 3.5 = 1.0).
Third, express this mean difference relative to the SD (here, both groups have SD = .5). This is called a standardized mean difference (1/.5 = 2).
In this example, women score 2 SDs higher than men.
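The three steps can be sketched in Python with only the standard library. This is an illustrative sketch, not part of the original lecture; it assumes the SD is computed by dividing by N (`statistics.pstdev`), which is what reproduces the SD of .5 in the table. Software that divides by N − 1 (`statistics.stdev`) would report about .58 instead.

```python
from statistics import mean, pstdev

# Mood scores from the example table (participants A-D in each group)
males = [4, 3, 4, 3]
females = [5, 4, 5, 4]

# Step 1: mean and SD for each level of the categorical variable.
# pstdev divides by N, matching the SD of .5 shown in the table.
m_male, sd_male = mean(males), pstdev(males)
m_female, sd_female = mean(females), pstdev(females)

# Step 2: the mean difference
mean_diff = m_female - m_male

# Step 3: the standardized mean difference.
# Both groups have the same SD here, so either one can serve as "the SD".
std_mean_diff = mean_diff / sd_male

print(m_male, sd_male, m_female, sd_female)
print(mean_diff, std_mean_diff)
```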
Mood scores by sex (four participants per group)
Participant   Males   Females
A             4       5
B             3       4
C             4       5
D             3       3
M             3.5     4.25
SD            .5      .83
Note: If the SDs for the two groups are different, you can simply average the two SDs. Here, the two SDs are .5 and .83. Averaged, these are (.5 + .83)/2 ≈ .665.
The standardized mean difference is (4.25 – 3.5)/.665 = .75/.665 ≈ 1.13.
Thus, on average, women score 1.13 SDs higher than men on this mood variable.
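The same calculation for the unequal-SD example, as a sketch that again assumes SDs computed by dividing by N. Carrying the unrounded female SD (about .829 rather than .83) through gives an average SD of about .665 and a standardized mean difference of about 1.13.

```python
from statistics import mean, pstdev

# Mood scores from the second example table
males = [4, 3, 4, 3]
females = [5, 4, 5, 3]

m_male, sd_male = mean(males), pstdev(males)          # 3.5 and .5
m_female, sd_female = mean(females), pstdev(females)  # 4.25 and about .83

# Simple average of the two SDs, as in the note
avg_sd = (sd_male + sd_female) / 2

std_mean_diff = (m_female - m_male) / avg_sd
print(f"average SD = {avg_sd:.3f}, standardized mean difference = {std_mean_diff:.2f}")
```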
Cohen’s d
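• If we divide the mean difference by the pooled SD of the two groups, we obtain a standardized mean difference, or Cohen’s d.
• The pooled SD is the square root of the average of the two groups’ variances (squared SDs). It equals the simple average of the two SDs when the SDs are equal and is slightly larger otherwise.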
$$ d = \frac{M_A - M_B}{\sqrt{\left(SD_A^2 + SD_B^2\right)/2}} $$
The denominator is the pooled standard deviation.
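A sketch of Cohen’s d using the pooled SD from the formula. The function name `cohens_d` is our own, and it assumes SDs computed by dividing by N, as in the tables. Applied to the second mood example, the pooled SD is about .685, so d comes out near 1.10, slightly smaller than the 1.13 obtained by simply averaging the two SDs.

```python
from math import sqrt
from statistics import mean, pstdev

def cohens_d(group_a, group_b):
    """Cohen's d: (M_A - M_B) divided by the pooled SD,
    where pooled SD = sqrt((SD_A^2 + SD_B^2) / 2).
    Assumes population SDs (divide by N), matching the worked examples."""
    sd_a, sd_b = pstdev(group_a), pstdev(group_b)
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

females = [5, 4, 5, 3]
males = [4, 3, 4, 3]
print(round(cohens_d(females, males), 2))
```

With equal SDs (the first example, females = [5, 4, 5, 4]), the pooled SD equals the average SD and both approaches give d = 2.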
Bar graph
[Figure: Mean mood (y-axis, scale 1 to 5) by sex (x-axis: Men, Women)]
Bar graph: More than one categorical variable
[Figure: Mean mood (y-axis, scale 1 to 7) by sex (x-axis: Men, Women), with separate bars for non-bereaved and bereaved participants]
Both variables are categorical
• When two variables are categorical, it is sometimes most useful to express the data as percentages.
• Example: Let’s assume that depression is a categorical variable, such that some people are depressed and others are not.
• What is the relationship between biological sex and depression?
Depression status by sex (counts)
Sex            Not Depressed   Depressed   Row total
Male           600             60          660
Female         40              300         340
Column total   640             360         1000
Depression status by sex (proportions of the total)
Sex            Not Depressed   Depressed   Row total
Male           .60             .06         .66
Female         .04             .30         .34
Column total   .64             .36         1.00
In this table, we’ve expressed each cell as a proportion of the total.
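Converting counts to proportions of the total takes one division per cell. A minimal sketch, storing the count table as a nested dictionary (a layout chosen for this illustration, not taken from the lecture):

```python
# Cell counts from the depression-by-sex table
counts = {
    "Male": {"Not Depressed": 600, "Depressed": 60},
    "Female": {"Not Depressed": 40, "Depressed": 300},
}

# Grand total of all cells
total = sum(sum(row.values()) for row in counts.values())

# Divide every cell by the grand total
proportions = {
    sex: {status: n / total for status, n in row.items()}
    for sex, row in counts.items()
}

for sex, row in proportions.items():
    print(sex, {status: round(p, 2) for status, p in row.items()})
```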
Depression status by sex (proportions of the total, with column proportions)
Sex                    Not Depressed    Depressed        Row total
Male                   .60              .06              .66
Female                 .04              .30              .34
Column total           .64              .36              1.00
Male share of column   .60/.64 = .94    .06/.36 = .17
Here, we’ve expressed the association with respect to sex by dividing each male cell by its column total. For example, we can see that 17% of people who are depressed are male, whereas 94% of people who are not depressed are male.
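Column proportions can be computed directly from the raw counts, which avoids rounding the intermediate proportions. In this sketch (same hypothetical nested-dictionary layout), each cell is divided by its column total, giving 600/640 ≈ .94 and 60/360 ≈ .17 for men.

```python
counts = {
    "Male": {"Not Depressed": 600, "Depressed": 60},
    "Female": {"Not Depressed": 40, "Depressed": 300},
}
statuses = ["Not Depressed", "Depressed"]

# Column totals: everyone in each depression-status group
col_totals = {s: sum(row[s] for row in counts.values()) for s in statuses}

# Each cell divided by its column total
col_props = {
    sex: {s: row[s] / col_totals[s] for s in statuses}
    for sex, row in counts.items()
}

for s in statuses:
    print(f"{s}: {col_props['Male'][s]:.0%} male")
```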
Depression status by sex (proportions of the total, with row proportions)
Sex            Not Depressed   Depressed   Row total   Share depressed
Male           .60             .06         .66         .06/.66 = .09
Female         .04             .30         .34         .30/.34 = .88
Column total   .64             .36         1.00
Here, we’ve expressed the association with respect to depression status by dividing each Depressed cell by its row total. For example, we can see that 9% of men are depressed and 88% of women are depressed.
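The row proportions follow the same pattern, dividing by row totals instead of column totals. A sketch using the same hypothetical dictionary layout; it reproduces 60/660 ≈ .09 for men and 300/340 ≈ .88 for women.

```python
counts = {
    "Male": {"Not Depressed": 600, "Depressed": 60},
    "Female": {"Not Depressed": 40, "Depressed": 300},
}

# Row totals: all men, all women
row_totals = {sex: sum(row.values()) for sex, row in counts.items()}

# Share of each sex that is depressed (Depressed cell / row total)
share_depressed = {sex: counts[sex]["Depressed"] / row_totals[sex] for sex in counts}

for sex, p in share_depressed.items():
    print(f"{sex}: {p:.0%} depressed")
```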
Phi
• It is possible to quantify the association between these variables using a correlation coefficient when the two variables are binary.
• This statistic is sometimes referred to as phi.
• (Phi is +.78 in this example.)
Generic 2 × 2 table (rows: Variable 1, columns: Variable 2)
Variable 1     Variable 2 = 0   Variable 2 = 1   Row total
0              a                b                n3
1              c                d                n4
Column total   n1               n2
$$ \phi = \frac{(a \cdot d) - (b \cdot c)}{\sqrt{n_1 n_2 n_3 n_4}} $$
Online calculator at:
http://www.quantitativeskills.com/sisa/statistics/twoby2.htm
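The phi formula can also be written as a small function. The name `phi_coefficient` is hypothetical, and the argument order follows the a, b, c, d cell labels of the generic 2 × 2 table. Plugging in the depression table (a = 600, b = 60, c = 40, d = 300) gives about .78, matching the value quoted for this example.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as
         a  b   (row total n3 = a + b)
         c  d   (row total n4 = c + d)
    with column totals n1 = a + c and n2 = b + d."""
    n1, n2 = a + c, b + d
    n3, n4 = a + b, c + d
    # Parentheses matter: the whole difference ad - bc is divided
    return (a * d - b * c) / sqrt(n1 * n2 * n3 * n4)

print(round(phi_coefficient(600, 60, 40, 300), 2))
```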