Statistical Power (Powerpoint)

Download Report

Transcript Statistical Power (Powerpoint)

Sample Size
Review: Base Rate Neglect
On HW 4, I asked you to find a fallacy on the
internet.
More than one of you found studies where
scientists took data from thousands of people
and found correlations.
Misunderstanding
You said, “this is the base rate neglect fallacy.
There are billions of people in the world, and
these studies only looked at thousands of
people. They are neglecting the base rate.”
But that is not the base rate neglect fallacy. And
this is important to know: you shouldn’t ignore
good science just because you’re confused
about what the base rate fallacy is.
Base Rate Neglect
First of all, the base rate neglect fallacy has
nothing at all to do with the number of people
there are in the world. Nothing.
It has to do with the probability of a variable
taking on a certain value, for instance, the
probability that someone’s height = 1.5m, the
probability that terrorist = true (someone is a
terrorist)…
Base Rates
This is the “base rate” of people who are 1.5m
tall, and the “base rate” of terrorists.
If 1 in 100 people are terrorists, then the rate of
terrorists is 1 in 100 and the probability that a
randomly selected person is a terrorist is 1 in
100.
Base Rates
We call this the base rate, because it is the
probability that someone is a terrorist when we
don’t know anything else about them.
It might be that the base rate of terrorists is 1 in
100, but the rate of terrorists among people
who are holding rocket launchers is 1 in 2, and
the rate of terrorists among retirees is 1 in 500.
Tests
The base rate neglect fallacy happens when we
have a test that is meant to detect the value of a
variable.
For example we might have a test that tells us
whether someone has AIDS or not, or whether
someone is driving over the speed limit, or
whether they are drunk.
Reliability of Tests
Here is the important, and crucial fact. Please
learn this:
As the base rate of X = x decreases, the # of
false positives on tests for X = x increases.
Tests are less reliable when the condition we are
testing for becomes rare (low base rate).
Base Rate Neglect Fallacy
The base rate neglect fallacy happens when:
1.
2.
3.
4.
There is a low base rate of some condition.
We have a test for that condition.
Someone tests positive.
We assume that means they have the
condition, ignoring the unreliability of tests
for conditions with low base rates.
Prosecutor’s Fallacy
The base rate neglect
fallacy is often called the
prosecutor’s fallacy, as I
shall explain.
Murder!
Let’s suppose that there
has been a murder.
There is almost no
evidence to go on except
that the police find one
hair at the crime scene.
You are the Suspect
If someone is the killer, there is a 100% chance
that their DNA will match the hair’s DNA.
The police have a database that contains the
DNA of everyone in Hong Kong.
They run the DNA in the hair through their
database and discover that you are a match!
Comprehension Question
If you have been following along you should be
able to answer this question:
What is the probability that you are the
murderer, given that you are a DNA match for
the hair?
Answer
If you said 100%, then you have just committed
the base rate neglect fallacy.
The correct answer is “Much lower, because the
base rate of people who committed this murder
out of the Hong Kong population as a whole is 1
in 7 million.”
Perfect Conditions for Fallacy
Here’s what we have:
1. A low base rate (only 1 person who
committed this murder in the world).
2. A test for whether someone is the murderer.
3. You, who’ve tested positive on this test.
4. And the police who think you did it!
Let’s Look at the Numbers
We know that if you are the murderer, then
there is a 100% chance of a DNA match.
But what is the false positive rate? How likely is
a randomly selected person will match the DNA?
False Results
Here’s a quote from “False result fear over DNA
tests,” Nick Paton Walsh, The Guardian:
“Researchers had asked the labs to match a
series of DNA samples. They knew which ones
were from the same person, but found that in
over 1 per cent of cases the labs falsely matched
samples, or failed to notice a match.”
Let’s assume that half of the cases where “labs
falsely matched samples, or failed to notice a
match.” were cases where they falsely matched
samples.
So the probability of a false positive is ½ x 1% =
0.5%, or 5 in 1,000.
Since there are 7 million people in Hong Kong,
we expect about 0.5% x 7 million = 35,000 of
them to match the hair’s DNA.
Actually, it’s 35,000 + 1, because the true killer is
a match, and not by accident.
So we expect that there are 35,001 DNA
matches in all of Hong Kong.
And only one of them is the murderer. So what
is the probability that you are the murderer?
1 in 35,001. That’s way less than 100%.
Important Things to Remember
There are three important things to remember:
1. If the test is more accurate (fewer false
positives), then it’s more reliable
2. If the base rate is higher, the test is more
reliable.
3. If the police have other reasons to suspect
you, the test is more reliable.
1. If the test is more reliable…
Theoretically, DNA tests only return a false
positive about 1 in 3 billion times.
In that case, we’d expect only .002 false
positives in all of Hong Kong.
So your chances of being guilty would be 1 in
1.002, or 99.8%. Still, that’s lower than 100%.
2. If there base rate is higher…
Maybe the person who died was stabbed 5,000
times, once each by 5,000 different people. So
there are 5,000 murderers.
Then with the previous false positive number at
35,000, you have a 5,000 in 40,000 chance of
being one of the killers, or 12.5%.
3. If the police have some other reason
to suspect you…
To figure out your chances of being guilty, we
looked at the probability that a randomly
selected person from HK would be a DNA match.
We were assuming you were randomly selected.
But what if you weren’t randomly selected?
What if the police tested you because you had a
reason to kill the victim?
Reason to Suspect You
Then we would have to look at not the
probability that a randomly selected person
would match, but the probability that a person
who had reason to kill the victim would match.
Suppose there are 5 people who had reasons to
kill the victim, and the killer is one of them.
Much Higher Chance
Then your chances are:
Let K = you’re the killer and M = you’re a match
P(K/ M)
= [P(K) x P(M/ K)] ÷ P(M)
= [(1/5) x 100%] ÷ P(M)
= 0.2 ÷ [(1 + 0.025) ÷ 5]
= 97.6%
SAMPLING
Now we know what the base rate neglect bias is
(hopefully).
But this still doesn’t answer our question: how
many people do we need in our scientific study
to reliably generalize the results to everyone?
For example, if I want to know whether
increased economic dependence in men is
correlated with increased infidelity, how many
people do I need to study?
Surely one is too few. Is 10 fine? Do I need 100?
A million?
Sample
In statistics, the people who we are studying are
called the sample. (Or if I’m studying the
outcomes of coin flips, my sample is the coin
flips that I’ve looked at. Or if I’m studying
penguins, it’s the penguins I’ve studied.)
Our question is then: what sample size is
needed for a result that applies to the
population?
Evaluating Evidence
Well, remember what we learned last class.
There are two measures of success for a study:
Statistical significance: how likely would my
results be if they were just due to random
chance? Does the study rule out the null
hypothesis?
Evaluating Evidence
Well, remember what we learned last class.
There are two measures of success for a study:
Effect size: If I find that A and B are positively
correlated, how much does the value of A affect
B? What’s the percentage difference in the
odds/probability of B as we vary the
odds/probability of A?
Two Questions
So there are really two questions we’re asking:
How many people do I need to study to obtain
statistically significant results?
How big should my sample be to accurately
estimate effect sizes in the population at large?
Law of Large Numbers
Luckily, we do know that more is always better.
The “Law of Large Numbers” says that if you
make a large number of observations, the
results should be close to the expected value.
(There is no “Law of Small Numbers”)
Average of Dice Rolls
Example
Let’s think about a particular problem.
Suppose we are having an election between
Mitt and Barack and we want to know how
many people in the population plan to vote for
Mitt.
How many people do we need to ask?
Non-Random Samples
The first thing we should realize is that it’s not
going to do us any good to ask a non-random
group of people.
Suppose everyone who goes to ILoveMitt.com is
voting for Mitt. If I ask them, it will seem like
100% of the population will vote for Mitt, even if
only 3% will really vote for him.
Internet Polls
(Important Critical Thinking Lesson:
Internet polls are not trustworthy. They are
biased toward people who have the internet,
people who visit the site that the poll is on, and
people who care enough to vote on a useless
internet poll.)
Representative Samples
The opposite of a biased sample is a
representative sample.
A perfectly representative sample is one where
if n% of the population is X, then n% of the
sample is X, for every X.
For example, if 10% of the population smokes,
10% of the sample smokes.
Random Sampling
One way to get a representative sample is to
randomly select people from the population, so
that each has a fair and equal chance of ending up
in the sample.
For example, when we randomize our experiments,
we randomly sample the participants to obtain our
experimental group. (Ideally our participants are
randomly sampled from the population at large.)
Problems with Random Sampling
Random sampling isn’t a cure-all, however.
For example, if I randomly select 10 people from
a (Western) country, on average I’ll get 5 men
and 5 women. On average.
But, on any particular occasion, I might select
(randomly) 7 men and 3 women, or 4 men and 6
women.
Stratified Sampling
One way to fix these problems would be to
randomly sample 5 women and randomly
sample 5 men. Then I would always have an
even split between men and women, and my
men would be randomly drawn from the group
of men, while my women were randomly drawn
from the group of women.
Example
Let’s continue with our example.
We’re convinced that we should randomly
sample n individuals from the population of
women and n from the population of men.
Still, what is that number n?
We know that, of the people in our sample, X%
will vote for Mitt.
We want to know, of the people in the
population, what percent will vote for Mitt?
We can never know that it is exactly X, unless we
ask everyone. But we can increase our
confidence.
Confidence Interval
What we can do is find out, based on our
sample, that we are Z% sure (confident) that the
number of people who will vote for Mitt is
between X% and Y%.
For example, we can be 90% confident that the
percentage of people who vote for Mitt is
between 44% and 48%.
Confidence Interval
This would mean we think there’s a 10% chance
that either less than 44% or more than 48% of
people vote for Mitt.
The very same data might warrant us in saying
that we are 95% confident that the percentage
of people who vote for Mitt is between 40% and
52%.
Sample Size Determination
So if we want to know how many people to look
at, we should determine:
1. What level of confidence we want
2. How big we want our confidence interval to
be.
Common Choices
Common choices for these numbers are:
1. We want to be 95% confident of our
estimation.
2. We want our confidence interval to be 6%
wide (e.g. between 42% and 48%).
Expected Value, Deviation
Each variable has an expected value (for
example, 3.5 is the expected value of a dice roll,
the average of all the sides of a die).
Each variable has an expected deviation from its
expected value: how far are all the dice values
(1, 2, 3, 4, 5, 6) from the expected value (3.5)–
the answer is 1.5 on average.
Variance
The variance is the expected squared deviation–
[(6 – 3.5)^2 + (5 – 3.5)^2 + (4 – 3.5)^2
+ (3.5 – 3)^2 + (3.5 – 2)^2 + (3.5 – 1)^2] ÷ 6
Or about 2.9 for a die. Don’t worry you don’t
need to know this.
Standard Deviation
The standard deviation is the square root of the
variance. So √2.9 for a die.
The important point is that we can use this
number, the standard deviation, to figure out
how many people we need in our sample.
Solving for Sample Size
If we want our confidence interval to be 6%
wide, then a 95% confidence interval of this
width will be:
4 x standard deviation = 6%
The standard deviation of any estimate of a
proportion will be √(0.25/n)
A Little Bit of Math
So,
4 x standard deviation = 6%
4 x √(0.25/n) = 0.06
√(0.25/n) = 0.06/4 = 0.015
(0.25/n) = 0.015^2 = 0.000225
n = 0.25/0.000225 = 1,111
The Important Point
What’s the point?
The point is that you need about 1,000 people
to be 95% sure that the vote counts you
estimate from the sample are within 6% of the
actual voting behavior of the population.
Things to Note
This doesn’t mean that studies with less than a
thousand people can’t tell us anything—
What they tell us will just be either less
confident than 95% or have greater error bars
than 6%.
If a confidence interval of 20% is fine, you only
need 100 people.
The Base Rate Still Matters
It also doesn’t mean that 100 or 1,000 people is
sufficient for any study.
When the value of the variable being studied is
rare in the population, you need more people.
For example, if it’s 1 in 1 million, then most
samples of 1,000 won’t contain it, but that
doesn’t mean it’s at 0 prevalence.