Lecture 3 - School of Psychology

Download Report

Transcript Lecture 3 - School of Psychology

Lecture 3:
Null Hypothesis Significance Testing
Continued
Laura McAvinue
School of Psychology
Trinity College Dublin
Previous Lectures
• Inferential Statistics
– Sample
Population
• Null Hypothesis Significance Testing
– Proceeds in series of steps
– Allows us to assess the statistical significance of our
results
– To reject or accept the Ho on the basis of the p value
Previous Lectures
• Misleading nature of statistical significance
– Results can be labelled as
• ‘Statistically significant’
• ‘Not statistically significant’
– People interpret results in a cut and dried fashion
• ‘Statistically significant result means there is a true effect in
the population’
• ‘Non-significant result means there is no true effect’
Previous Lectures
• NHST is not so straightforward
• Statistical significance is affected by
–
–
–
–
One or two tailed test
Significance level /  / probability of Type I error
Power / Probability of Type II error
Sample size
• These factors must be considered
– Research evaluation
– Research planning
Research Evaluation
• A result is statistically significant
– Implies a true effect exists in the population
– But is this effect clinically significant?
• How big if the effect?
• Real world relevance?
• Recall that a large enough sample size will make a small effect
statistically significant
Research Evaluation
• A result is not statistically significant
– Implies a true effect does not exist in the population
– Power
• Did the study have enough power to identify an effect as
statistically significant even if a true effect existed?
Research Planning
• Power
– Require enough power to obtain statistically significant
results if a true effect exists
• Sample Size
– Obtain an adequate sample size
Effect Size
• NHST
– Enables us to say whether or not a true effect exists in
the population
• Effect Size
– Provides an estimate of the size of this true effect
– A measure of the degree to which the Ho is false
– A measure of the discrepancy between Ho and H1
Small ES
0 - 1 = small
ES
0
1
Large ES
0- 1 = large
ES
0
1
Effect Size
• There is a different effect size measure for each
statistical test
• The difference between two independent group
means
– Cohen’s d
– 1 - 0
σ
– Standardised difference
– Express the difference between the means in terms of
the standard deviation
Effect Size
• To calculate Cohen’s d for a study in which you compared
two groups
Meantreat – Meancontrol
SDcontrol
• For example, I compared the effects of an exercise regime
and a control regime on physical fitness (rated /20) in two
groups and obtained the following results…
Effect Size
• Mean rating in exercise group was 17 (SD = 10)
• Mean rating in control group was 11 (SD = 10)
• Cohen’s d was
17 – 11
10
= .6
• The exercise group had a mean rating .6 SDs higher than the control
group
• You can use Cohen’s d to compare studies that have used different
measures
Comparing Studies
•
Four studies examined the effect of cognitive behavioural therapy on selfesteem but each study used a different scale to assess self-esteem.
•
•
Calculate the effect size for each of the following studies
Which study found the greatest effect?
Study
Treatment
Group Mean
Control Group
Mean
Mean
Difference
SD
A
17
11
6
10
B
225
215
10
100
C
12
9
3
9
D
31
23
8
20
d
Comparing Studies
•
Four studies examined the effect of cognitive behavioural therapy on selfesteem but each study used a different scale to assess self-esteem.
•
•
Calculate the effect size for each of the following studies
Which study found the greatest effect?
Study
Treatment
Group Mean
Control Group
Mean
Mean
Difference
SD
d
A
17
11
6
10
.6
B
225
215
10
100
.1
C
12
9
3
9
.33
D
31
23
8
20
.4
What is a big Effect Size?
• Cohen’s (1992) rules of thumb
• For independent t-tests comparing two means…
Cohen’s d
Small
Medium
Large
.2
.5
.8
Cohen, J. (1992). A power primer. Psychological Bulletin, 112 (1), 155-159.
Research Evaluation
• A statistically significant result
– Is it clinically significant?
– Real world relevance?
– Effect Size
• A non-significant result
– No true effect?
– Lack of power?
Calculating Power
• Recall that power is determined by a number of factors
• To calculate the power of an experiment you need to know
–
–
–
–
One or two-tailed test
Significance level 
Sample size
Effect size
• You calculate the power of an experiment to identify a
certain effect size as statistically significant, using a
one/two-tailed test with a certain  level and a certain
sample size
Example: The effects of therapy on
depression
Analysis 1
Analysis 2
Size of sample
20
200
Therapy mean score
5.5
5.5
Therapy standard
deviation
3.03
2.89
Control mean score
6.3
6.3
Control standard
deviation
2.75
2.62
Mean difference
-.8
-.8
T statistic
-.618
-2.051
Df
18
198
P-value
.54
.042
Study 1
Study 2
Independent samples
T-test
Independent samples
T-test
One or two-tailed
Two-tailed
Two-tailed
Significance Level
.05
.05
Size of each group
10
100
5.5 – 6.3
2.75
.29
.3
5.5 – 6.3
2.62
.305
.3
.1
.56
10% chance of finding
an ES of .3 as
statistically significant
at p < .05 using twotailed test
56% chance of finding
an ES of .3 as
statistically significant
at p < .05 using twotailed test
Test
Effect Size
Power
The difference in power for these two studies was due to sample size
Power
• Computer programmes can calculate power
– http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
– Free download of gpower3 package
• Research planning
– Rather than computing power post hoc, best to plan to have
adequate power to obtain statistically significant results if Ho is
false and a true effect exists
– Convention
•
•
•
•
Aim for power of .8
80% chance of obtaining significant results if Ho is false
.2 probability of Type II error
1 : 4 ratio of Type I (.05) to Type II (.2) errors
Power & Sample Size
• Main avenue for increasing power
– Increase sample size
• Common question
– How big a sample do I need?
• Answer depends
–
–
–
–
–
The power you want to have
Significance level you set
Effect size you expect to obtain
Statistical test you are running
One or two tailed prediction
Power & Sample Size
• The Real Question
– “What sample size do I need to have power of ____ to detect an
ES of ____ as being statistically significant at ____ level, when
doing a ____ statistical test and making a ____-tailed prediction?”
• Most of the gaps are easy to complete
–
–
–
–
–
Power

Test
Prediction
ES
=
=
=
=
=
.8
.05
depends on experimental design
depends on theory
?
• Need to estimate effect size
Estimate Effect Size
• Pilot Study
• Do analysis on small group to give idea of results
• Previous Research
• Calculate ES in previously published studies
• Theory
• Based on theory or understanding of research area, estimate
the ES or the smallest ES that would be of interest
• Cohen’s Standards
• Would you like to detect a small, medium or large effect?
• Difference between two groups
• Small (.2), Medium (.5), Large (.8)
Power & Sample Size
• Once you have decided on the following
– Statistical test, prediction, Power,  and ES
• You can calculate necessary sample size in two ways
– Computer package, such as gpower3
– Cohen’s tables
• Let’s try an example
– Turn to the handout showing Cohen’s table of required sample
size
• (note that this table refers to two-tailed predictions)
Calculating Required Sample Size
• I would like to investigate the difference between
clinically anxious and normal people in relation to
performance on an attention task
• “How many people do I need in each group to have
power of .8 to detect a large ES as being statistically
significant at .05 level, when doing an independent
samples t-test and making a two-tailed prediction?”
Cohen’s Table
N for Small, Medium, and large ES at power = 0.80 for  =
.01, .05 and .10
Sm
0.01
Med Lg
mean diff 586 95
38
Sm

0.05
Med Lg
393 64
26
Sm
0.10
Med Lg
310 50
20
• We need 26 people in each group to have a power of
0.80 to detect a large ES as statistically significant
at the 0.05 level
Some more
practice!
Sm
0.01
Med Lg
mean diff 586 95
38
Sm

0.05
Med Lg
393 64
26
Sm
0.10
Med Lg
310 50
20
– For a two group independent t-test, how many people do I need in each
group to detect…
•
•
•
•
•
Large ES as statistically significant at .10 level
Large ES as statistically significant at .05 level
Large ES as statistically significant at .01 level
Medium ES as statistically significant at .01 level
Small ES as statistically significant at .01 level
_________
_________
_________
_________
_________
– The smaller the alpha level, the _______________ the sample size
required to detect a given difference as being statistically significant
– The smaller the ES, the _______________ the sample size required to
detect a given difference as being statistically significant
Summary
• Factors affecting Statistical Significance
• Research Evaluation
• Effect size
• Power Calculations
• Research Planning
• Sample Size Calculations