Single Factor Experimental Designs I
Lawrence R. Gordon
Psychology Research Methods I
Simplest Experimental Designs:
One “IV” with two levels
Independent groups
– Between-subjects, random assignment
Matched groups
– Between-subjects, block random assignment
Nonequivalent groups
– Between-subjects, selected person variables
Repeated measures
– Within-subjects, assignable levels or “time”
Stroop Experiment
[Figure: Stroop Experiment. Reading times (sec), 0 to 120, by reading instructions (NC vs. NCWd); sample stimuli: BLUE, RED, GREEN, BLACK]
Question:
BOWER “DROODLES” EXPERIMENT
[Figure: bar charts of mean number of Droodles recalled (number correct, number incorrect) by condition: Without Words vs. With Words]
Mean # Droodles recalled with no cue = 7.03 C (.91 I)
Mean # Droodles recalled with cue = 8.26 C (.33 I)
ARE THESE MEANS
“DIFFERENT”?
YES - why? Later...
Question:
CLASS “HAVING FUN” EXAMPLE
[Figure: bar chart of time estimates (Time Est), 0 to 14, for the More Fun and Less Fun groups]
MORE Fun mean time estimated = 8.6
LESS Fun mean time estimated = 12.5
ARE THESE MEANS
“DIFFERENT”?
YES - Why? Later...
ANSWERS
How do we make these “judgments”?
INFERENTIAL STATISTICS
– Null hypothesis significance test
– An instance of a very general decision scheme
NULL HYPOTHESIS
SIGNIFICANCE TESTING
ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)
ABSTRACT
Research question: Do teachers score higher on the GRE Verbal than the national average?
– Verbal null hypothesis: The teacher population GRE-V mean is equal to the national average of 476
– Symbolic null hypothesis: H0: μ_GRE-V = 476
– Verbal alternative hypothesis: The teacher population GRE-V mean is different from the national average of 476
– Symbolic alternative hypothesis: H1: μ_GRE-V ≠ 476
Research question: Do males or females tend to score better on the GRE-Verbal?
– Verbal null hypothesis: The male and female GRE-V population means are not different
– Symbolic null hypothesis: H0: μ_M = μ_F
– Verbal alternative hypothesis: The male and female GRE-V population means are different
– Symbolic alternative hypothesis: H1: μ_M ≠ μ_F
Research question: Is there a relationship between GPA (X) and starting salary (Y) for college grads?
– Verbal null hypothesis: The population correlation between GPA and starting salary is equal to zero
– Symbolic null hypothesis: H0: ρ_XY = 0
– Verbal alternative hypothesis: The population correlation between GPA and starting salary is not equal to zero
– Symbolic alternative hypothesis: H1: ρ_XY ≠ 0
Adapted from Johnson and Christensen (2000). Educational Research. Allyn & Bacon
NULL HYPOTHESIS
SIGNIFICANCE TESTING
ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)
ASSESS
– ASSUMING null is true, what is the “chance”
(probability) of obtaining the data we did?
ASSESS
Key question: “IF I assume that the null
hypothesis is true, is my sample statistic so
unlikely that it makes more sense to reject
the null hypothesis (and thereby accept the
alternative)?”
Key concept: the “Sig” or “p =” is the
answer to the question “how likely is my
result IF the null hypothesis is true?”
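One way to make the "how likely is my result IF the null hypothesis is true?" question concrete is a permutation test: if group membership really does not matter, the labels can be reshuffled freely, and the share of reshuffles that produce a difference at least as large as the observed one estimates p. The sketch below is only an illustration. The function name `permutation_p_value` and the recall scores are invented for this example and are not from the lecture, and SPSS reports a t-test p rather than a permutation p.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for the difference between two group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Under H0 the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical recall scores, invented for illustration only.
no_cue = [6, 7, 5, 8, 7, 6, 9, 7]
with_cue = [8, 9, 7, 10, 9, 8, 9, 10]
p = permutation_p_value(no_cue, with_cue)
print(f"estimated p = {p:.3f}")
```

A small p here means that chance relabelling rarely produces a gap as large as the observed one, which is the sense in which the result is "unlikely IF the null is true".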
NULL HYPOTHESIS
SIGNIFICANCE TESTING
ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)
ASSESS
– ASSUMING H0 is true, what is the probability
or “chance” of obtaining the data we did
DECIDE
– IF the chance is “small enough,” reject H0 and
INFER the “Effect” is real (what can go wrong?)
DECIDE
One way to assess:
Compare p (or Sig.) from SPSS to a preselected level of
“small enough” (“significance level”) and reject the null if
it is equal to or less than that
– REJECT NULL if p ≤ α (usually α = .05!)
– Example, reject null if p = .037 ≤ α = .05 (it is!)
– NOTE: you select α (usually .05); p is computed from your data!
BUT WHAT CAN GO WRONG? Errors…!
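The decision rule on this slide is mechanical enough to write down as a tiny function. This is a minimal sketch; the name `decide` and the returned strings are made up for illustration. It takes the p reported by the software and the α chosen in advance.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p is at or below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be a probability between 0 and 1")
    return "reject H0" if p_value <= alpha else "retain H0"

print(decide(0.037))  # the slide's example: p = .037 against alpha = .05
print(decide(0.20))   # a p this large does not meet the "small enough" bar
```

Note that the function only encodes the rule. It says nothing about whether the decision is actually right.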
THE DECISION SCHEME
Examples
A major problem is that NO decision is
ever “guaranteed” to be right
Some examples:
– Fire alarm
– Jury trial
– NHST
FIRE ALARM
The Decision Scheme
NULL IS TRUE (H0): NO FIRE
– DECIDE: RETAIN NULL (NO ALARM PULLED): CORRECT! No one bothered
– DECIDE: REJECT NULL (PULL ALARM): ERROR TYPE I, "False alarm"
ALT IS TRUE (H1): FIRE
– DECIDE: RETAIN NULL (NO ALARM PULLED): ERROR TYPE II, "Missed" fire
– DECIDE: REJECT NULL (PULL ALARM): CORRECT! LIVES SAVED!
TRIAL BY JURY
The Decision Scheme
NULL IS TRUE (H0): Defendant is "really" innocent (assumed!)
– DECIDE: RETAIN NULL (ACQUIT): CORRECT! Innocent person is freed
– DECIDE: REJECT NULL (CONVICT): ERROR TYPE I, Innocent person is convicted
ALT IS TRUE (H1): Defendant is "really" guilty
– DECIDE: RETAIN NULL (ACQUIT): ERROR TYPE II, Guilty person gets off
– DECIDE: REJECT NULL (CONVICT): CORRECT! Guilty person does the time
Null Hypothesis Significance Testing
The Decision Scheme
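NULL IS TRUE (H0): "Really" just CHANCE
– DECIDE: RETAIN NULL ("No effect found"): CORRECT! Probability 1 - α = .95
– DECIDE: REJECT NULL ("Effect found"): ERROR TYPE I, probability α = .05, the "Level of Significance"
ALT IS TRUE (H1): "Really" an EFFECT
– DECIDE: RETAIN NULL ("No effect found"): ERROR TYPE II, probability β
– DECIDE: REJECT NULL ("Effect found"): CORRECT! Probability 1 - β, the "Power"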
The Decision Scheme: Comments
If the reality is “chance”, we are correct by
NOT inferring an effect, or wrong if we do.
• TYPE I ERROR: Reject null when null is true
• Probability of a Type I error is α (alpha), the “level of significance”
If the reality is “effect”, we are correct BY
inferring an effect, or wrong if we do not.
• TYPE II ERROR: Retain null when null is false
• Probability of a Type II error is β (beta)
– More common to use Power = 1 - β
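The two error rates can be seen by simulating many experiments in which the "reality" is known. The sketch below rests on simplifying assumptions that are not from the lecture: scores are normal with a known standard deviation of 1, so a z-test stands in for the t-test, and the effect size (0.5), group size (30), and function name `rejection_rate` are all invented for illustration.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n=30, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated two-group experiments in which H0 is rejected.

    Simplifying assumption: scores are normal with a known SD of 1, so a
    z-test stands in for the t-test that SPSS would report.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    se = (2 / n) ** 0.5                           # SE of the mean difference
    rejections = 0
    for _ in range(n_sims):
        g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g2 = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        z = (sum(g2) / n - sum(g1) / n) / se
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / n_sims

type_1_rate = rejection_rate(true_diff=0.0)  # reality is "chance"
power = rejection_rate(true_diff=0.5)        # reality is an "effect"
print(f"Type I error rate ~ {type_1_rate:.3f}")
print(f"Power ~ {power:.3f}, so Type II error rate ~ {1 - power:.3f}")
```

When the null is true, the rejection rate should hover near the chosen α. When an effect exists, the rejection rate is the power, and whatever is left over is β.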
Question:
BOWER “DROODLES” EXPERIMENT
[Figure: bar charts of mean number of Droodles recalled (number correct, number incorrect) by condition: Without Words vs. With Words]
Mean # Droodles recalled with no cue = 7.03 C (.91 I)
Mean # Droodles recalled with cue = 8.26 C (.33 I)
ARE THESE MEANS
“DIFFERENT”?
YES - why?
ANSWERS REVISITED
Bower Experiment F’02
INFERENTIAL STATISTICS
Group Statistics
DV                condition       N    Mean     Std. Deviation   Std. Error Mean
Number Correct    Without Words   97   7.0309   2.35608          .23922
Number Correct    With Words      97   8.2577   2.40346          .24403
Number Incorrect  Without Words   97   .9072    1.07124          .10877
Number Incorrect  With Words      97   .3299    .80002           .08123
BOWER EXPERIMENT: Compare Groups on Each of Two DVs
t-test for Equality of Means
                               t        df    Sig. (2-tailed)
Number of Droodles Correct     -3.590   192   .000
Number of Droodles Incorrect    4.253   192   .000
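As a check on where these t values come from, the sketch below recomputes them from the group means (7.0309 vs. 8.2577 correct, .9072 vs. .3299 incorrect) and the "Std. Error Mean" values in the Group Statistics output. It divides the mean difference by the square root of the summed squared standard errors. With equal group sizes (97 and 97) this gives the same t as the pooled-variance formula. The function name is made up for illustration.

```python
from math import sqrt

def t_from_standard_errors(mean_1, se_1, mean_2, se_2):
    """t for two independent means, given each group's standard error."""
    return (mean_1 - mean_2) / sqrt(se_1 ** 2 + se_2 ** 2)

# Values copied from the Group Statistics output (Without Words, With Words).
t_correct = t_from_standard_errors(7.0309, 0.23922, 8.2577, 0.24403)
t_incorrect = t_from_standard_errors(0.9072, 0.10877, 0.3299, 0.08123)
df = 97 + 97 - 2  # N1 + N2 - 2

print(f"correct:   t({df}) = {t_correct:.3f}")
print(f"incorrect: t({df}) = {t_incorrect:.3f}")
```

The results should land close to the t values SPSS reports. The negative sign for "correct" only reflects that the Without Words mean was entered first.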
Question:
CLASS “HAVING FUN” EXAMPLE
[Figure: bar chart of time estimates (Time Est), 0 to 14, for the More Fun and Less Fun groups]
MORE Fun mean time estimated = 8.6
LESS Fun mean time estimated = 12.5
ARE THESE MEANS
“DIFFERENT”?
YES - Why?
ANSWERS REVISITED
“Having Fun” Example
Inferential Statistics
[SPSS output tables for the “Having Fun” example; values not recoverable from the transcript]
A New Example – from scratch:
Doob and Gross (1968). Status of frustrator as an inhibitor of horn-honking responses. J Social Psychology, 76, 213-218.
IV: Low vs. high status car
DV: Latency of following car to honk when light
turns green and car doesn’t move (seconds)
Results:
– Low-status -- N = 15, X̄_Lo = 7.12 (2.77) sec
– High-status -- N = 20, X̄_Hi = 9.23 (2.82) sec
SPSS output and interpretation
SPSS COMPUTATION
[SPSS data view: 35 cases (cases 1-15 low status, cases 16-35 high status) with latency to honk, followed by SPSS output tables; values not recoverable from the transcript]
NHST: Two Independent Means
ABSTRACT
– H0 “chance”: μ1 = μ2 OR μ1 - μ2 = 0
– H1 “effect”: μ1 ≠ μ2 OR μ1 - μ2 ≠ 0
ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of
the empirical sample outcome?
– Compute independent-sample t; note the “p = ”.
DECIDE
– IF the chance is “small enough,” reject H0; otherwise do not.
– If p(t|Null) ≤ α (= .05), reject null and interpret alternative.
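The ABSTRACT / ASSESS / DECIDE steps can be walked through with the Doob and Gross summary values quoted earlier (N = 15, 7.12 (2.77) sec; N = 20, 9.23 (2.82) sec). This is a rough sketch under stated assumptions. The parenthesised numbers are assumed to be standard deviations, equal population variances are assumed (pooled t), and the p-value is obtained by numerically integrating the t density rather than by whatever routine SPSS uses. The function names are invented for illustration.

```python
from math import gamma, pi, sqrt

def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-samples t with a pooled variance estimate; returns (t, df)."""
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / se_diff, df

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p for a t statistic, by Simpson's rule on the t density."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                  # steps must be even for Simpson's rule
    area = density(0.0) + density(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    area *= h / 3                      # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

# Summary values from the slide; the parenthesised numbers are ASSUMED to be SDs.
t, df = pooled_t(7.12, 2.77, 15, 9.23, 2.82, 20)
p = two_tailed_p(t, df)
alpha = 0.05
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
print("reject H0" if p <= alpha else "retain H0")
```

The gamma-function constant overflows for very large df, so this sketch is only meant for classroom-sized samples.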
Repeated-measures
Definitional Example
“Family therapy for anorexia” (1994)
Before and after family therapy weights
– Using SPSS
• t-test on paired samples
– Was there a change?
[SPSS data view: before and after weights, one row per case; values not recoverable from the transcript]
Repeated-measures
Definitional Example
“Family therapy for anorexia” (1994)
SPSS -- standard analysis for paired-samples:
[SPSS output: Paired Samples Statistics, Paired Samples Correlations, and Paired Samples Test tables; values not recoverable from the transcript]
Can a Machine Tickle?
Reference: Harris & Christenfeld (1999)
Psychonomic Bull. & Rev.
Problem: Why can’t you tickle yourself? Reflex
vs Interpersonal explanations. Review of long
history and more recent research.
Method: 21 female & 14 male UCSD undergrads, age
18-28. Tickled twice on foot by “machine” vs
“person” (both really a person). Measured by
both self-report and behavior rating of video.
Open answers to “was sensation produced by the
E different than [that] produced by the machine?”
Can a machine, cont…
Results: interrater reliability, descriptives in
table, inferential tests in text of article:
“There was no hint of any difference between tickle responses
produced by the experimenter and by the machine for behavior
[t(32)=0.27, n.s.] or self-report [t(32)=0.12, n.s.].”
Discussion:
– “generally favorable to the view that the tickle
response is some form of innate stereotyped motor
behavior, perhaps akin to a reflex”
– suggests “that ticklish laughter itself does not require
any belief that another human being is producing the
stimulation.”
NHST: Two Related Means
ABSTRACT
– H0 “chance”: μ1 - μ2 = 0 OR μ1 = μ2
– H1 “effect”: μ1 - μ2 ≠ 0 OR μ1 ≠ μ2
ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of
the empirical sample outcome?
– Compute related-sample t; note the “p = ”.
DECIDE
– IF the chance is “small enough,” reject H0; otherwise do not.
– If p(t|Null) ≤ α (= .05), reject null and interpret alternative.
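For related means, the t is computed on each person's difference score rather than on the two sets of scores separately. The sketch below shows that calculation. The before/after weights are hypothetical values invented for illustration (they are not the family therapy data), and the function name `paired_t` is likewise made up.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Related-samples t computed on the per-person difference scores."""
    if len(before) != len(after):
        raise ValueError("each person needs one score in each condition")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)     # standard error of the mean difference
    return mean(diffs) / se, n - 1  # (t, df)

# Hypothetical weights, invented for illustration; NOT the anorexia data.
before = [80, 85, 90, 78, 88]
after = [84, 86, 95, 80, 91]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")
```

Because each person serves as their own control, df is the number of pairs minus one, not the total number of scores minus two.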
REVIEW & SUMMARY
Two-level single factor (IV) designs:
– Independent groups *
– Nonequivalent groups *
– Matched groups **
– Repeated measures **
* use t-test for independent means
** use t-test for paired means