t Test for a Single Group Mean (Part 4), Power


Psych 5500/6500
The t Test for a Single Group Mean (Part 4):
Power
Fall, 2008
Power
Power is the probability that you will be able to reject H0
when H0 is actually false. Another way of saying that
is that it is the probability you will be able to conclude
there was an effect in your study (that something caused
the population mean to differ from what was predicted
by H0) when there really was such an effect.
Beta is the probability that you won’t be able to reject H0
when H0 is actually false.
Power + beta = 1.00 (i.e. if H0 is false you will either
make the correct decision and reject it or you won’t)
Power (metaphorically)
Power can be thought of metaphorically as how good
of a search you do for an effect. If you don’t
know how carefully someone searches for
something it is hard to interpret what it means if
they failed to find it. That is why failure to reject
H0 is worded as ambiguously as it is: if someone
failed to find an effect (i.e. failed to reject H0), is
that because there was no effect (i.e. H0 is true) or
is that because they didn’t look very hard (i.e. they
had a low-power experiment)?
Metaphor Continued
Say you have a kitchen drawer full of miscellaneous
stuff and you wonder if the bottle opener is in the
drawer. A low power search would be to open the
drawer and take a quick glance. If you find the
bottle opener that’s great, but if you don’t, does
that mean it isn’t in the drawer? Say, however,
that you spend five minutes rummaging around in
the drawer and can’t find the bottle opener. That is
a more powerful search, and failure to find the
bottle opener might more reasonably be
interpreted as evidence that it wasn’t in the drawer.
Influences on Power
We will take a close look at four factors that affect the
power of an experiment (and briefly mention
two others).
1. Effect size.
2. Variability (i.e. variance/standard deviation).
3. Sample size.
4. One-tail test versus a two-tail test.
Increasing Power
There are things you can do to increase the power
of an experiment. The neat thing about these
ways of increasing power is that they increase
the chances of rejecting H0 only when H0 is
false (i.e. they help us to find an effect only
when an effect exists). If H0 is true, then these
manipulations have no effect on the outcome
of the experiment, thus you are not biasing the
results of the analysis.
Example
We have a math test whose mean score in the past
has been 50. We have a new way of teaching math
and we think it might lead to different test scores
than the old way. To test this theory we sample 10
students, teach them the new way, and then give
them the math test:
H0: μ = 50 (i.e. the mean is the same as it used to be)
Ha: μ ≠ 50 (i.e. the mean is no longer the same)
The Sampling Distribution
Assuming H0 is True
$\mu_{\bar{Y}} = \mu_Y = 50$
In the sample, est. $\sigma_Y = 4.74$ and $N = 10$,
so est. $\sigma_{\bar{Y}} = \frac{4.74}{\sqrt{10}} = 1.50$
$df = N - 1 = 9$
$t_c = \pm 2.262$
The Sampling Distribution
Assuming H0 is True
Upon this we will base our decision regarding whether or not
to reject H0.
A Specific Alternative
Hypothesis
To look at the power of an experiment we need a
specific value for what μ would equal if Ha is true.
Let’s examine the power of this test if in reality
the effect of the new teaching method is to
increase scores on the math test by an average of
five.
H0: μ = 50
Ha: μ = 55
The Sampling Distribution if Ha is True
If Ha is correct and μ = 55 then the curve below
represents the possible outcomes of the experiment.
Note that the standard error, which is based upon the
data and not upon a theory, hasn’t changed.
Power
The decision is (always) based upon assuming H0 is true,
but if Ha is true then its curve represents reality. In that
case the proportion of the Ha curve that is shaded (i.e. that
falls in the ‘reject H0’ region) is the probability that H0
will be rejected.
Note: a tiny bit of the Ha curve is in the other rejection region as well.
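The shaded-area idea can be sketched numerically. The following Python sketch is not part of the original slides. It follows the graphical approach described above: a t curve with 9 degrees of freedom is centred on μ = 55, the standard error from the sample is kept, and the share of that curve lying in either rejection region is added up. The helper names (`t_pdf`, `t_cdf`, `power_shifted_t`) are hypothetical, and Simpson's rule is used only to avoid third-party libraries. Power software such as GPower typically works from the noncentral t distribution instead, so its answer may differ slightly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on the density between 0 and x."""
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, so the integral is signed
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def power_shifted_t(mu0, mu_a, sd, n, t_crit):
    """Share of a t curve centred on mu_a that lies in the two-tailed rejection region."""
    se = sd / math.sqrt(n)
    df = n - 1
    upper = mu0 + t_crit * se  # reject H0 if the sample mean is above this
    lower = mu0 - t_crit * se  # ... or below this
    in_upper = 1 - t_cdf((upper - mu_a) / se, df)
    in_lower = t_cdf((lower - mu_a) / se, df)  # the "tiny bit" in the other tail
    return in_upper + in_lower

power = power_shifted_t(mu0=50, mu_a=55, sd=4.74, n=10, t_crit=2.262)
print(f"approximate power: {power:.3f}")
```

Beta under the same assumptions is simply `1 - power`.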
Power and Effect Size
The power of an experiment is affected by the size of
the effect being tested: the greater the effect size,
the more power in the experiment. In our example
the new teaching procedure raised math scores
by 5 on average. The following graphs show the
increase in power that would result if the new
procedure had a greater effect and raised math
scores by 6 on average.
For More Power: Increase Effect Size
The effect on power of moving from Ha: μ=55 to Ha: μ=56
It is easier to reject H0 when the effect size is
greater because it is more likely that the
sample mean will be further from what H0
proposes and thus more likely to fall within
the ‘reject H0’ region. In terms of our
‘search’ metaphor, it is easier to ‘find’ a big
effect than a small one.
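A simulation makes the same point. This sketch is not from the slides. It assumes test scores are normally distributed and treats 4.74 as the population standard deviation (in the example it is a sample estimate); the function name `simulate_power` is hypothetical. It runs many imaginary experiments with N = 10 and counts how often the two-tailed t test rejects H0: μ = 50.

```python
import math
import random

def simulate_power(mu_true, mu0=50.0, sd=4.74, n=10, t_crit=2.262,
                   reps=20000, seed=1):
    """Proportion of simulated experiments whose two-tailed t test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_true, sd) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((y - mean) ** 2 for y in sample) / (n - 1)  # unbiased estimate
        se = math.sqrt(var / n)                               # est. standard error
        t_obt = (mean - mu0) / se
        if abs(t_obt) > t_crit:
            rejections += 1
    return rejections / reps

rate_h0 = simulate_power(mu_true=50)   # H0 true: every rejection is a type 1 error
power_55 = simulate_power(mu_true=55)  # effect of 5 points
power_56 = simulate_power(mu_true=56)  # effect of 6 points
print(f"H0 true: {rate_h0:.3f}, mu=55: {power_55:.3f}, mu=56: {power_56:.3f}")
```

Under these assumptions the rejection rate should stay near alpha when H0 is true and should rise as the true mean moves farther from 50.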
Power and Variance
est. $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$
The standard error of the mean is affected by the
standard deviation of the population from which we
are sampling. Decreasing the variability of the
population will decrease the standard error of the
mean, which will give the test more power. In the
following curve the standard error has been reduced
from 1.50 to 0.95, which would occur if σY = 3.00 rather
than 4.74.
For More Power Decrease Variance
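$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{10}} = 1.50$
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{3.00}{\sqrt{10}} = 0.95$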
Effect on power of moving from σY=4.74 to σY=3.00
When the variance/standard deviation of the
population decreases the stability of the
sample mean increases, meaning that it is
less likely that you will get a weird sample
mean just due to chance, which in turn
makes it easier to draw inferences about the
population mean based upon the sample
mean.
How to Decrease Variance
1. Sample from a more homogeneous population,
but doing so limits the population to which you can
generalize the results.
2. Use a different experimental design that has a
decrease of variance built into it. We will see
our first example of that when we get to the ‘t
test for dependent groups’ (and it will also be a
major part of what we cover next semester).
Power and N
est. $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$
Increasing N will also decrease the standard error of
the mean. In addition, increasing N will
decrease the tc values and thus increase power.
In the following curve I increased N from 10 to 22.
I chose this value of N so that I could reuse the
previous slide (increasing N from 10 to 22 has
roughly the same effect as decreasing the standard
deviation from 4.74 to 3.00).
For More Power Increase N
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{10}} = 1.50$
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{4.74}{\sqrt{22}} = 1.01$
Note: if N = 22 then df = 21 and tc would be 2.080
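The standard-error arithmetic for the variance and sample-size slides can be checked directly. This short sketch (the function name `standard_error` is hypothetical) evaluates σ/√N for the three combinations discussed. It also solves for the N at which a standard deviation of 4.74 would give the same standard error as a standard deviation of 3.00 with N = 10.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

baseline = standard_error(4.74, 10)    # original example
smaller_sd = standard_error(3.00, 10)  # less variable population
larger_n = standard_error(4.74, 22)    # larger sample

# N needed for sd = 4.74 to reach the standard error obtained with sd = 3.00:
# solve 4.74 / sqrt(N) = 3.00 / sqrt(10) for N.
n_to_match = 10 * (4.74 / 3.00) ** 2

print(f"sd=4.74, N=10: {baseline:.2f}")
print(f"sd=3.00, N=10: {smaller_sd:.2f}")
print(f"sd=4.74, N=22: {larger_n:.2f}")
print(f"N to match sd=3.00: {n_to_match:.1f}")
```

With these inputs N = 22 gives a standard error of about 1.01, and an N of about 25 would be needed to bring it down to 0.95.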
As with decreasing variance, increasing N
will make the sample mean more stable and
thus make it easier to make inferences about
the true value of the population mean.
Important caveat here: it is very possible to
increase N to the point where even the most
trivial effect will be statistically significant.
Power and One-Tail Tests
If you perform a one-tail test and your
prediction about the direction of the effect
is correct, then you will
have a more powerful test. Of course you
must have a priori justification for making
the test one-tailed.
For More Power Use a 1-Tail Test
Other Influences on Power
5. Significance level (and thus alpha). Although I
didn’t create another graphic for it, it is obvious
if you look back at the graphic for the 1-tail vs.
2-tail test that if I had used a smaller
significance level (e.g. of .01 rather than of .05)
then less of the Ha curve would have fallen in
the reject H0 region, and thus power would have
been less. Making it harder to reject H0 hurts
power.
6. Violations of the assumptions underlying a test
can also influence power. We will cover that in
the lecture on assumptions.
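To see how the number of tails and the significance level move the rejection cutoff in the math-test example, here is a small sketch. Only the critical value 2.262 comes from the slides. The values 1.833 (one-tail, .05) and 3.250 (two-tail, .01) are standard t-table entries for df = 9, added here for illustration, and the variable names are hypothetical.

```python
import math

MU0 = 50.0
SE = 4.74 / math.sqrt(10)  # est. standard error from the example, about 1.50

# Critical t values for df = 9. Only 2.262 appears in the slides; the other
# two are standard t-table entries added here for illustration.
critical_values = {
    "two-tail, alpha = .05": 2.262,
    "one-tail, alpha = .05": 1.833,
    "two-tail, alpha = .01": 3.250,
}

# Upper cutoff: the sample mean must exceed this value to reject H0.
# (Two-tail tests also reject below the mirror-image lower cutoff.)
upper_cutoffs = {label: MU0 + t_c * SE for label, t_c in critical_values.items()}

for label, cutoff in upper_cutoffs.items():
    print(f"{label}: reject H0 if sample mean > {cutoff:.2f}")
```

The lower the cutoff, the more of an Ha curve centred on 55 falls in the ‘reject H0’ region. A correctly predicted one-tail test therefore gains power, and a stricter alpha loses it.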
Estimating Power
With the advent of better software (we will be using
GPower 3) it has become easier to estimate the
power of an experiment and this has taken on an
increasingly important role in experimental
psychology. We will look at two of the contexts
in which power estimates have become
important:
1. A priori estimates of power
2. Post hoc estimates of power.
A Priori Estimates of Power
A priori estimates of power are used to help design a
study that has a specific, desired, level of power.
This is useful for two reasons:
1. There is little reason to run an experiment that
has low power (what is the use of looking for an
effect if there is little chance you will find it
even if it is there?).
2. You will probably have to include an a priori
estimate of power to obtain a grant to fund your
research (what is the point of giving money to a
project that has little chance of finding an
effect?).
A Priori: GPower 3
With the GPower 3 ‘a priori’ function you enter:
1. The level of power you want.
2. The anticipated value for Cohen’s d.
3. Alpha.
4. Whether it is a 1-tail or a 2-tail test.
GPower then computes what N would give you that level
of power. It will also provide a graph showing the
relationship between N and power given the values
entered above, which can be useful in seeing how
much power would be gained or lost if you were to
increase or decrease N.
From GPower. Large effect size of d = 0.7: note that an N of around 18
would give you power = .80 (an often recommended level) and that you
gain little from making N greater than 40.
Much smaller effect size (d=.2), note N must be around 200 to
obtain a power of 0.80
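A rough version of the a priori calculation can be sketched without GPower. This is not GPower's algorithm. It uses the large-sample normal approximation N ≈ ((z for alpha + z for power) / d)², which ignores the extra uncertainty from estimating the standard deviation. It therefore tends to come out a little below the t-based values read from the graphs. The function name `approx_n_for_power` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_n_for_power(d, power=0.80, alpha=0.05, two_tailed=True):
    """Approximate N for a single-group test via the normal approximation."""
    z = NormalDist()  # standard normal distribution
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail_area)  # critical z for the chosen alpha
    z_power = z.inv_cdf(power)          # z corresponding to the desired power
    return math.ceil(((z_alpha + z_power) / d) ** 2)

print(approx_n_for_power(0.7))  # effect size from the first graph
print(approx_n_for_power(0.2))  # effect size from the second graph
```

For d = 0.7 and d = 0.2 this gives 17 and 197, close to the values of around 18 and 200 read from the graphs.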
Anticipated Value for d
How do you arrive at the anticipated value for Cohen’s
d in your upcoming study?
1. Look at prior, similar studies (your own pilot study,
similar experiments run by others, etc.). If those
studies report values for d then use them to
anticipate the value of d in your study. If those
studies don’t report d but do report the raw effect
size and the standard deviation of the variable then
you can plug those into GPower 3 and it will
compute d for you.
2. If prior, similar studies are not available then make
your best guess, based upon your knowledge of the
literature, about whether your study will find a ‘small’,
‘medium’, or ‘large’ effect, then use Cohen’s
guidelines for what value of d you would expect to
obtain: d=.2 would be a small effect size, d=.5 would
be a medium effect size, and d=.8 would be a large
effect size. This, in my view, is the most useful thing
you can do with Cohen’s guidelines.
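The first approach, computing d from a raw effect size and a standard deviation, is a one-line calculation. The sketch below applies it to the math-test example from earlier in this lecture (a 5-point effect with a standard deviation of 4.74) and labels the result using Cohen's guideline values. The function names are hypothetical, and labelling a value by the largest guideline value it reaches is an assumption made for illustration.

```python
def cohens_d(raw_effect, sd):
    """Cohen's d: the raw effect size expressed in standard-deviation units."""
    return raw_effect / sd

def cohen_label(d):
    """Label |d| by the largest of Cohen's guideline values it reaches."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d(raw_effect=55 - 50, sd=4.74)
print(f"d = {d:.2f} ({cohen_label(d)})")
```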
Post Hoc: GPower 3
The GPower 3 post hoc analysis is designed to help
you compute the power of an experiment after it
has been completed. You enter values for: the
effect size (d) that was found in your experiment
(or you can enter the raw effect size and standard
deviation and GPower 3 will compute d for you);
the N of your experiment; the alpha you used; and
whether it was a 1-tail or a 2-tail test, and GPower
will compute the power of your experiment.
Using Post Hoc Power Analysis
Knowing post hoc the power of your experiment can be
useful in interpreting a decision to not reject H0. If you
had a powerful search and failed to find an effect (i.e.
failed to reject H0) that probably means that the effect
wasn’t there (i.e. it probably means H0 is true).
If you don’t reject H0 you will still want to decide for
yourself whether it’s probably the case that H0 is true or
if it’s probably the case that H0 is false but you simply
failed to reject H0 in this experiment. This will
influence where you go next (replicate the same
experiment but with more power or change directions
because you think H0 is actually true).
When I first introduced null hypothesis testing I said
that you can either:
– ‘reject H0’, or ‘not reject H0’,
But that you can’t
– ‘accept H0’ (conclude that H0 is true)
A type 1 error is associated with rejecting H0, and
the probability of rejecting H0 when H0 is actually
true is equal to alpha=.05. We, as scientists, have
decided we can live with that.
A type 2 error is associated with not rejecting H0,
and the probability of not rejecting H0 when H0 is
actually false is equal to beta. If we don’t know
what beta is, or if we do and it is greater than .05,
then there is too much uncertainty to make a
strong statement such as ‘I conclude H0 is true’.
While I’ve never heard this said, it would seem to me
that if power ≥ .95, which would make beta ≤ .05,
then it would be reasonable to say the two possible
results of the experiment would be:
1. reject H0
2. accept H0
In other words, if power ≥ .95 and we don’t reject H0
then it is possible to state (within acceptable
probability of error) that we have shown that H0 is
true.