PSYC2010 Lecture 4 - University of Queensland


Power
Winnifred Louis
15 July 2009
Overview of Workshop
• Review of the concept of power
• Review of antecedents of power
• Review of power analyses and effect size calculations
• DL and discussion of write-up guide
• Intro to G*Power 3
• Examples of G*Power 3 usage
Power

• Comes down to a "limitation" of the null hypothesis testing approach and a concern with decision errors.
• Recall: significant differences are defined with reference to a criterion (a controlled/acceptable rate) for committing Type 1 errors, typically .05
  • a Type 1 error is finding a significant difference in the sample when it doesn't exist in the population
  • the Type 1 error rate is denoted α
• However, relatively little attention has been paid to the Type 2 error
  • a Type 2 error is finding no significant difference in the sample when there is a difference in the population
  • the Type 2 error rate is denoted β
Reality vs Statistical Decisions

  Statistical Decision:   Reality: H0 true           Reality: H1 true
  Reject H0               "False alarm": α           Hit (correct decision):
                          (aka Type 1 error)         1 - β  (Power)
  Retain H0               Hit (correct decision):    "Miss": β
                          1 - α                      (aka Type 2 error)
power

• power is:
  • the probability of correctly rejecting a false null hypothesis
  • the probability that the study will yield significant results if the research hypothesis is true
  • the probability of correctly identifying a true alternative hypothesis
sampling distributions

• the distribution of a statistic that we would expect if we drew an infinite number of samples (of a given size) from the population
• sampling distributions have means and SDs
• you can have a sampling distribution for any statistic, but the most common is the sampling distribution of the mean
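The idea above can be seen in a short simulation: draw many samples of a fixed size, take each sample's mean, and inspect the resulting distribution. This is a minimal sketch with made-up population values (μ = 100, σ = 15, n = 25); the key check is that the SD of the sample means lands near σ/√n.

```python
import random
import statistics

# Simulate the sampling distribution of the mean: draw many samples
# of a fixed size from a population and collect each sample's mean.
# Hypothetical population: IQ-like scores, mu = 100, sigma = 15.
random.seed(1)
n = 25          # size of each sample
reps = 10_000   # number of samples drawn

sample_means = [
    statistics.fmean(random.gauss(100, 15) for _ in range(n))
    for _ in range(reps)
]

# The mean of the sampling distribution sits near mu, and its SD
# (the standard error) is close to sigma / sqrt(n) = 15 / 5 = 3.
print(round(statistics.fmean(sample_means), 1))
print(round(statistics.stdev(sample_means), 1))
```

With more samples the empirical SD converges on the theoretical standard error, which is exactly the quantity the power formulas below depend on.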
Recall: Estimating pop means from sample means

Here the null hypothesis is true (H0: μ1 = μ2): if our test tells us our sample difference between means falls into the shaded rejection regions (α/2 = .025 in each tail), we reject the null hypothesis. But 5% of the time we will do so incorrectly (Type 1 error).

Here the null hypothesis is false (H0: μ1 = μ2; H1: μ1 ≠ μ2): to the right of the critical line we reject the null hypothesis.

POWER: 1 - β

• Correct decision: rejection of H0, with probability 1 - β (power)
• Correct decision: acceptance of H0, with probability 1 - α
• Type 1 error: α
• Type 2 error: β

[Figures: overlapping H0 and H1 sampling distributions, with the rejection regions ("Reject H0" / "Don't Reject H0") shaded]
factors that influence power

1. α level
• remember the α level defines the probability of making a Type I error
• the α level is typically .05, but it might change depending on how worried the experimenter is about Type I and Type II errors
• the bigger the α, the more powerful the test (but the greater the risk of erroneously saying there's an effect when there's not ... Type I error)
• e.g., use a one-tailed test
factors that influence power: α level

[Figures: H0 (μ1 = μ2) and H1 (μ1 ≠ μ2) distributions. With a two-tailed test, α/2 = .025 sits in each tail; moving to a one-tailed test puts the full α = .05 in one tail, enlarging the rejection region on that side and with it the POWER region.]
factors that influence power

2. the size of the effect (d)
• the effect size is not something the experimenter can (usually) control: it represents how big the effect is in reality (the size of the relationship between the IV and the DV)
• independent of N (it is a population-level quantity)
• it stands to reason that with big effects you're going to have more power than with small, subtle effects
factors that influence power: d

[Figures: with a larger d, the H0 (μ1 = μ2) and H1 (μ1 ≠ μ2) distributions overlap less, so more of the H1 distribution falls beyond the critical value and power is greater.]
factors that influence power

3. sample size (N)
• the bigger your sample size, the more power you have
• a large sample size allows small effects to emerge
• or ... big samples can act as a magnifying glass that detects small effects
factors that influence power

3. sample size (N), continued
• you can see this when you look closely at the formulas:

  σX̄ = σ / √N        z = (X̄ - μ) / σX̄

• the standard error of the mean tells us how much, on average, we'd expect a sample mean to differ from a population mean just by chance. The bigger the N, the smaller the standard error, and smaller standard errors = bigger z scores.
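Plugging numbers into those two formulas makes the N effect concrete. A minimal sketch with made-up values (σ = 10, a fixed mean difference of 2):

```python
from math import sqrt

sigma = 10.0       # hypothetical population SD
mean_diff = 2.0    # hypothetical (sample mean - population mean)

# Quadrupling N halves the standard error and doubles z each time.
for n in (25, 100, 400):
    se = sigma / sqrt(n)   # standard error of the mean
    z = mean_diff / se     # bigger N -> smaller SE -> bigger z
    print(n, round(se, 2), round(z, 2))
```

The same observed mean difference goes from non-significant (z = 1.0 at n = 25) to clearly significant (z = 4.0 at n = 400) purely through the shrinking standard error.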
factors that influence power

4. smaller variance of scores in the population (σ²)
• small standard errors lead to more power; N is one thing that affects your standard error
• the other thing is the variance of the population (σ²)
• basically, the smaller the variance (spread) in scores, the smaller your standard error is going to be
factors that influence power: N and σ²

[Figures: larger N or smaller σ² narrows both sampling distributions (H0: μ1 = μ2; H1: μ1 ≠ μ2), reducing their overlap and increasing the POWER region.]
outcomes of interest

• power determination
• N determination
• α, effect size, N, and power are all related
Effect sizes

• Measures of group differences
  • Cohen's d (t-test)
  • Cohen's f (ANOVA)
• Measures of association
  • Partial eta-squared (ηp²)
  • Eta-squared (η²)
  • Omega-squared (ω²)
  • R-squared (R²)

Cohen's classic 1988 text is in the library.
Measures of difference - d

• When there are only two groups, d is the standardised difference between the two groups
• to calculate an effect size (d) you need to calculate the difference you expect to find between means and divide it by the expected standard deviation of the population
• conceptually, this tells us how many SDs apart we expect the populations (null and alternative) to be

  d = (μ1 - μ2) / σ        d̂ = (x̄1 - x̄2) / √MSerror
Cohen's conventions for d

  Effect    d     % overlap
  Small     .20   85
  Medium    .50   67
  Large     .80   53

[Figure: overlap of the H0 (μ1 = μ2) and H1 (μ1 ≠ μ2) distributions for small, medium, and large effects]
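The d formula and the conventions table combine into a two-line calculation. A sketch using the lecture's later BS t-test numbers (μ1 = 5, μ2 = 10, σ = 10):

```python
def cohens_d(m1, m2, sd):
    """Standardised difference between two group means."""
    return abs(m1 - m2) / sd

def label(d):
    """Cohen's (1988) conventions for d."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "trivial"

d = cohens_d(5, 10, 10)
print(d, label(d))   # 0.5 medium
```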
Measures of association - Eta-squared

• Eta-squared is the proportion of the total variance in the DV that is attributable to an effect:

  η² = SStreatment / SStotal

• Partial eta-squared is the proportion of the leftover variance in the DV (after all other IVs are accounted for) that is attributable to the effect:

  ηp² = SStreatment / (SStreatment + SSerror)

• This is what SPSS gives you, but it is dodgy (it overestimates the effect)
Measures of association - Omega-squared

• Omega-squared is an estimate of the dependent-variable population variability accounted for by the independent variable.
• For a one-way between-groups design:

  ω̂² = (p - 1)(F - 1) / [(p - 1)(F - 1) + np]

  equivalently, ω² = [SSeffect - (dfeffect)(MSerror)] / (SStotal + MSerror)

• p = number of levels of the treatment variable, F = the observed F value, and n = the number of participants per treatment level
Measures of difference - f

• Cohen's (1988) f for the one-way between-groups analysis of variance can be calculated as follows:

  f̂ = √[ω̂² / (1 - ω̂²)]

• Or you can use eta-squared instead of omega-squared
• It is an averaged standardised difference between the 3 or more levels of the IV (even though the above formula doesn't look like that)
• Small effect: f = 0.10; medium effect: f = 0.25; large effect: f = 0.40
Measures of association - R-squared

• R² is the proportion of variance explained by the model
• In general R² is given by:

  R² = SSmodel / SStotal

• Can be converted to the effect size f²:

  f² = R² / (1 - R²)

• Small effect: f² = 0.02; medium effect: f² = 0.15; large effect: f² = 0.35
Summary of effect size conventions

• From G*Power:
http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/user_manual/user_manual_02.html#input_val
estimating effect

• prior literature
• assessment of how great a difference is important
  • e.g., an effect on reading ability might only be worth the trouble if it increases scores by at least half a SD
• special conventions
side issues…

• recall the logic of calculating estimates of effect size (i.e., the criticisms of significance testing)
  • the tradition of significance testing is based upon an arbitrary rule leading to a yes/no decision
• power illustrates further some of the caveats with significance testing
  • with a high N you will have enough power to detect a very small effect
  • if you cannot keep error variance low, a large effect may still be non-significant
side issues…

• on the other hand…
  • sometimes very small effects are important
  • by employing strategies to increase power you have a better chance of detecting these small effects
power

Common constraints:
• Cell size too small
  • b/c the sample is difficult to recruit, or too little time / money
• Small effects are often a focus of theoretical interest (especially in social / clinical / org)
  • the DV is subject to multiple influences, so each IV has a small impact
  • "error" or residual variance is large, because many IVs unmeasured in the experiment / survey are influencing the DV
  • interactions are of interest, and interactions draw on smaller cell sizes (and thus lower power) than tests of main effects [cell means for an interaction are based on n observations, while main effects are based on n × the number of levels of the other factors, collapsed across]
determining power

• sometimes, for practical reasons, it's useful to try to calculate the power of your experiment before conducting it
• if the power is very low, then there's no point in conducting the experiment: basically, you want to make sure you have a reasonable shot at getting an effect (if one exists!)
• which is why grant reviewers want power analyses
Post hoc power calculations

• Generally useless / difficult to interpret from the point of view of stats
• Mandated within some fields
• Examples of post hoc power write-ups are online at http://www.psy.uq.edu.au/~wlouis
G*POWER

• G*POWER is a FREE program that can make the calculations a lot easier
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.

G*Power computes:
• power values for given sample sizes, effect sizes, and alpha levels (post hoc power analyses)
• sample sizes for given effect sizes, alpha levels, and power values (a priori power analyses)
• suitable for most fundamental statistical methods
• Note: some tests assume equal variance across groups and assume you are using population SDs (which are likely to be estimated from the sample)
Ok, let's do it: BS t-test

• two random samples of n = 25
• expect a difference between means of 5
  • μ1 = 5, μ2 = 10, σ = 10
• two-tailed test, α = .05

  d = |5 - 10| / 10 = .50
G*POWER: determining N

• So, with that expected effect size and n, we get power ≈ .41
• We have only a 41% probability of correctly rejecting the null hypothesis (if it is false)
• Is this good enough?
• convention dictates that researchers should be entering into an experiment with no less than an 80% chance of getting an effect (presuming it exists), i.e. power of at least .80

Determine n
• Calculate the effect size
• Use power of .80 (convention)
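The two calculations above (post hoc power, then the a priori n) can be sketched with the normal approximation to the two-sample t-test. G*Power's exact noncentral-t answers differ slightly (power ≈ .41 rather than .42, and n = 64 rather than 63 per group), so treat this as an approximation, not a replacement.

```python
from math import ceil, erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_bs(d, n, z_crit=1.959964):
    """Normal-approximation power for a two-tailed between-subjects
    t-test with n per group and alpha = .05."""
    delta = d * sqrt(n / 2)  # noncentrality
    return phi(delta - z_crit) + phi(-delta - z_crit)

def n_per_group(d, z_alpha=1.959964, z_beta=0.841621):
    """Smallest n per group for power = .80 (normal approximation)."""
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(round(power_bs(0.5, 25), 2))   # ~0.42 (G*Power: ~0.41)
print(n_per_group(0.5))              # 63 (G*Power: 64)
```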
WS t-test

• Within-subjects designs are more powerful than between-subjects designs (they control for individual differences)
• The WS t-test is not very difficult in G*Power, but it becomes trickier in ANOVA
• Need to know the correlation between timepoints (luckily the SPSS paired t-test gives this)
• Or you can use the mean and SD of the "difference" scores (also in the SPSS output)
[Screen clipping of SPSS paired-samples output]

Method 1: difference scores

  dz = Mean diff / SD diff = .0167 / .0718 = .233
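Method 1 is a one-line computation once the paired-samples output is in hand. This sketch just reproduces the slide's numbers; in practice you would read the mean and SD of the difference scores off your own SPSS output.

```python
# Effect size dz for a within-subjects (paired) t-test,
# computed from the difference scores on the slide.
mean_diff = 0.0167   # mean of the difference scores (from SPSS)
sd_diff = 0.0718     # SD of the difference scores (from SPSS)

dz = mean_diff / sd_diff
print(round(dz, 3))   # 0.233
```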
WS t-test

• I said before that WS designs are more powerful than the equivalent BS version
• Let's test this by using the same means and SDs in the Independent Samples t-test calculator in G*Power

[Screen clippings of the G*Power results]
• Between subjects: power = .18
• Within subjects: power = .07
Extension to 1-way anova…

• In PSYC3010 you used phi prime as the ANOVA equivalent of d, which is the same as Cohen's f
• G*Power uses Cohen's f
• Numerous methods:
  1) calculate ω² and then use the formula for f and enter it directly
  2) calculate ω² or η² and enter it into "Direct" under "Effect size from variances"
  3) use the means and "Effect size from means"

  ω̂² = (p - 1)(F - 1) / [(p - 1)(F - 1) + np]        f̂ = √[ω̂² / (1 - ω̂²)]
Calculating omega & f

ANOVA: PTSD Severity
                    SS        df    Mean Square    F       Sig.
  Between Groups     507.84    3    169.28         3.269   0.030
  Within Groups     2278.74   44     51.7895
  Total             2786.58   47

Given the above analysis (p = 4 groups, n = 12 per group):

  ω̂² = (p - 1)(F - 1) / [(p - 1)(F - 1) + np]
      = (4 - 1)(3.269 - 1) / [(4 - 1)(3.269 - 1) + (12)(4)]
      = 0.124

  f̂ = √[ω̂² / (1 - ω̂²)] = √[0.124 / (1 - 0.124)] ≈ 0.377

Not sure if this works with SPSS partial eta-squared: have had problems before, and omega is more conservative anyway.
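The ω² and f formulas above translate directly into code. This sketch reruns the PTSD-severity numbers (p = 4 levels, n = 12 per level, F = 3.269):

```python
from math import sqrt

def omega_sq(p, F, n):
    """Omega-squared for a one-way between-groups ANOVA:
    p = levels of the IV, F = observed F, n = participants per level."""
    num = (p - 1) * (F - 1)
    return num / (num + n * p)

def cohens_f(omega2):
    """Cohen's f from omega-squared (or eta-squared)."""
    return sqrt(omega2 / (1 - omega2))

# The lecture's PTSD-severity ANOVA.
w2 = omega_sq(4, 3.269, 12)
print(round(w2, 3))            # 0.124
print(round(cohens_f(w2), 3))  # 0.377
```

Either value can be entered into G*Power: ω² under "Direct" in "Effect size from variances", or f directly.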
Alternatively

• Alternatively, if you have the means (note: this is a different data set):

                   Coffee   Energy Drink   Water
  mean DV score    63.75    64.69          46.56     (grand mean μ = 58.33)
  n                16       16             16        (N = 48)

  MSerror = 125.21

• use the square root of MSerror as the SD within each group in G*Power
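For the means-based route, f is the SD of the group means (around the grand mean) divided by the within-group SD, which for equal n is what G*Power's "Effect size from means" dialog computes. A sketch with the caffeine example's numbers:

```python
from math import sqrt

# Cohen's f from equal-n group means:
# f = SD of group means (around the grand mean) / SD within groups.
means = [63.75, 64.69, 46.56]   # Coffee, Energy Drink, Water
ms_error = 125.21               # within-groups mean square

grand = sum(means) / len(means)
sd_means = sqrt(sum((m - grand) ** 2 for m in means) / len(means))
f = sd_means / sqrt(ms_error)

print(round(grand, 2))  # 58.33
print(round(f, 2))      # ~0.74, a large effect by Cohen's conventions
```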
how about 2-way factorial anova?

Need to test for 3 effects to estimate the power:
• Main effect of IV 1
• Main effect of IV 2
• Interaction effect (usually less power than the main effects due to smaller n in each cell)
See http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_07.html
Within subjects ANOVA

• Not only need to know the effect size, but also the correlation across time/variables
• Use a convention for estimating effect size (G*Power uses either lambda or Cohen's f)
• Calculate f² using the number of levels, the effect-size convention, and the correlation (e.g., test-retest)
• Calculate lambda (λ = f² × N)
• Use the Generic F test
Within Example

• 3 levels over time (m)
• 64 participants (n)
• Look for a small effect (f² = .01, i.e. f = .10)
• Test-retest correlation ρ = .79
• Calc adjusted f² = (m × f²) / (1 - ρ) = (3 × .01) / (1 - .79) = .143
• Calc λ = f² × n = .143 × 64 ≈ 9.15
• df1 = m - 1 = 2
• df2 = n × (m - 1) = 128

Note: can't do this a priori. If you need to estimate upfront, play with the denominator df (based on N).
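The λ calculation for the Generic F test can be sketched as below. One assumption to flag: the slide writes "f", but the arithmetic only works if the .01 is read as f² (a small Cohen's f of .10 squared), and the slide's 9.152 comes from rounding the adjusted f² to .143 before multiplying; unrounded, λ ≈ 9.14.

```python
# Noncentrality lambda for the within-subjects example.
m = 3        # levels over time
n = 64       # participants
f2 = 0.01    # small effect: f = .10, so f^2 = .01 (assumed reading)
rho = 0.79   # test-retest correlation

f2_adj = (m * f2) / (1 - rho)   # correlation-adjusted f^2
lam = f2_adj * n                # lambda for G*Power's Generic F test
df1 = m - 1
df2 = n * (m - 1)

print(round(f2_adj, 3))  # 0.143
print(round(lam, 2))     # 9.14
print(df1, df2)          # 2 128
```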
Within Example

• Refer to Karl Wuensch's website for more details re: RM:
http://core.ecu.edu/psyc/wuenschk/StatsLessons.htm
• And the G*Power manuals online, e.g.:
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/userguide-type_of_power_analysis
Regression analyses

• Effect size associated with R²:
  f² = R² / (1 - R²)
• For a semipartial (sr²):
  f² = sr² / (1 - R²full)
• f² = .02 (small); f² = .15 (medium); f² = .35 (large)
• Convert to variance accounted for: f² / (1 + f²)

R² example
• 3 predictor variables; R² for the full model = .22
• f² = .22 / (1 - .22) = .282
• N = 110

Change in R² (HMR) example
• 2 steps: 2 predictors in step 1, 3 in step 2
• R² for the full model = .10; change in R² for step 2 = .04
• f² = R²change / (1 - R²full) = .04 / (1 - .1) = .0444
• N = 95
• numerator df for step 2 = 3
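Both regression examples use the same ratio, with the full-model R² in the denominator either way. A sketch reproducing the slide's two f² values:

```python
def f_squared(r2_change, r2_full):
    """Cohen's f^2 for a (block of) predictor(s): the R^2 it accounts
    for, over the variance left unexplained by the full model."""
    return r2_change / (1 - r2_full)

# Full-model example: R^2 = .22 (the "change" is the whole model).
print(round(f_squared(0.22, 0.22), 3))   # 0.282

# Hierarchical step 2: change in R^2 = .04, full-model R^2 = .10.
print(round(f_squared(0.04, 0.10), 4))   # 0.0444
```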
Complex analyses

• G*POWER is useful for basic analyses
• For complex analyses (e.g., SEM, MLM, etc.) usually look to Monte Carlo studies

Additional Resources

• http://www.danielsoper.com/statcalc/
  • some other statistical calculators, including for power