June 30th and July 1st: Statistical Inference


Transcript of June 30th and July 1st: Statistical Inference

Statistical Inference
June 30-July 1, 2004
Statistical Inference
The process of making guesses about the truth from a sample.
[Diagram: the Truth (not observable) vs. a Sample (observation); from the sample we make guesses about the whole population.]
FOR EXAMPLE: What’s the average weight of all medical students in the US?
1. We could go out and measure all US medical students (>65,000).
2. Or, we could take a sample and make inferences about the truth from our sample.
Using what we observe,
1. We can test an a priori guess (hypothesis testing).
2. We can estimate the true value (confidence intervals).
Statistical Inference is based
on Sampling Variability

Sample Statistic – we summarize a sample into one number; e.g., it could be a mean, a difference in means or proportions, or an odds ratio.
– E.g.: average blood pressure of a sample of 50 American men
– E.g.: the difference in average blood pressure between a sample of 50 men and a sample of 50 women
Sampling Variability – if we could repeat an experiment many, many times on different samples with the same number of subjects, the resultant sample statistic would not always be the same (because of chance!).
Standard Error – a measure of the sampling variability (a function of sample size).
Sampling Variability
The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs.
Random students (individual weights): 175.9 lbs, 189.3 lbs, 92.1 lbs, 152.3 lbs, 169.2 lbs, 110.3 lbs
Sampling Variability
The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs.
Random samples of 5 students (sample averages): 135.9 lbs, 139.3 lbs, 152.1 lbs, 158.3 lbs, 149.2 lbs, 170.3 lbs
Sampling Variability
The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs.
Samples of 50 students (sample averages): 146.9 lbs, 148.9 lbs, 150.0 lbs, 152.3 lbs, 147.2 lbs, 155.3 lbs
Sampling Variability
The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs.
Samples of 150 students (sample averages): 150.31 lbs, 150.02 lbs, 149.8 lbs, 149.95 lbs, 150.3 lbs, 150.9 lbs
The Central Limit Theorem:
how sample statistics vary

Many sample statistics (e.g., the sample average) follow a normal distribution
– centers around the true population value (e.g. the true mean weight)
– becomes less variable (by a predictable amount) as sample size increases:
Standard error of a sample statistic = standard deviation / square root of (sample size) (a quick check of this formula follows below)
Remember: standard deviation reflects the average variability of the characteristic in the population
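A quick check of this formula in SAS (a sketch only; the data set and variable names here are made up and are not from the lab):

* Standard error of the mean for SD = 15 lbs at several sample sizes;
data se_example;
   sd = 15;
   do n = 2, 10, 120;
      se = sd / sqrt(n);   /* SE = SD / sqrt(n): roughly 10.6, 4.7, and 1.4 lbs */
      output;
   end;
run;

proc print data=se_example; run;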
The Central Limit Theorem:
Illustration

I had SAS generate 1000 random observations from the following probability distributions:
– ~N(10,5)
– ~Exp(1)
– Uniform on [0,1]
– ~Bin(40, .05)
[Figure: histograms of the 1000 observations from each of the four distributions.]
The Central Limit Theorem:
Illustration

I then had SAS generate averages of 2, averages of 5, and averages of 100 random observations from each probability distribution (a minimal sketch of such a simulation follows below)…
(Refer to end of SAS LAB ONE, which we will implement next Wednesday, July 7)
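The lab code itself is not reproduced here, but a minimal sketch of this kind of simulation (assuming, for example, 1000 averages of 5 draws from Exp(1)) might look like:

* Simulate 1000 averages of n = 5 observations from an Exponential(1) distribution;
data averages;
   do rep = 1 to 1000;
      total = 0;
      do i = 1 to 5;
         total + rand('EXPONENTIAL');   /* one Exp(1) draw */
      end;
      avg = total / 5;                  /* the sample average */
      output;
   end;
   keep rep avg;
run;

proc univariate data=averages;
   var avg;
   histogram avg;   /* already looks roughly bell-shaped at n = 5 */
run;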
[Figure panels: for each of ~N(10,25), Uniform on [0,1], ~Exp(1), and ~Bin(40, .05): the original distribution (averages of 1), then 1000 averages of 2, 1000 averages of 5, and 1000 averages of 100.]
The Central Limit Theorem:
formally
If all possible random samples, each of size n, are taken from any population with mean μ and standard deviation σ, the sampling distribution of the sample means (averages) will:
1. have mean: μ_x̄ = μ
2. have standard deviation: σ_x̄ = σ/√n
3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n)
Example
Pretend that the mean weight of medical students was 128 lbs with a standard deviation of 15 lbs…
Hypothetical histogram of weights of US medical students (computer-generated)
mean = 128 lbs; standard deviation = 15 lbs
[Figure: histogram of weight in pounds (x-axis) vs. percent (y-axis). Note on figure: standard deviation reflects the natural variability of weights in the population.]
Average weights from 1000 samples of 2
standard error of the mean = 15/√2 ≈ 10.6 lbs
[Figure: histogram of the average weight of a pair of students (x-axis) vs. percent (y-axis).]
Average weights from 1000 samples of 10
standard error of the mean = 15/√10 ≈ 4.74 lbs
[Figure: histogram of the average weight of 10 students (x-axis) vs. percent (y-axis).]
Average weights from 1000 samples of 120
standard error of the mean = 15/√120 ≈ 1.37 lbs
[Figure: histogram of the average weight of 120 students (x-axis) vs. percent (y-axis).]
Using Sampling Variability

In reality, we only get to take one sample!!
But, since we have an idea about how sampling variability works, we can make inferences about the truth based on one sample.
Hypothesis Testing
Hypothesis Testing

The null hypothesis is the “straw man” that we are trying to shoot down.
Example 1: Possible null hypothesis: “mean weight of medical students = 128 lbs”
Let’s say we take one sample of 120 medical students and calculate their average weight…
Expected Sampling Variability for n=120 if the true weight is 128 (and SD=15)
What are we going to think if our 120-student sample has an average weight of 143??
[Figure: sampling distribution of the average weight of 120 students (x-axis) vs. percent (y-axis).]
“P-value” associated with this experiment
“P-value” (the probability of our sample average being 143 lbs or more IF the true average weight is 128) < .0001
Gives us evidence that 128 isn’t a good guess (see the sketch below).
[Figure: sampling distribution of the average weight of 120 students (x-axis) vs. percent (y-axis).]
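A minimal SAS sketch of this tail probability (an assumption on my part: it uses the normal approximation with SE = 15/√120; this exact code is not in the lecture):

* P(sample mean >= 143 lbs | true mean 128 lbs, SE = 15/sqrt(120));
data _null_;
   se   = 15 / sqrt(120);                   /* about 1.37 lbs          */
   pval = 1 - CDF('NORMAL', 143, 128, se);  /* upper-tail probability  */
   put pval;                                /* well below .0001        */
run;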
Estimation (a preview)
We’d estimate based on these data that the average weight is somewhere closer to 143 lbs. And we could state the precision of this estimate (a “confidence interval”—to come later).
[Figure: sampling distribution of the average weight of 120 students (x-axis) vs. percent (y-axis).]
Expected Sampling Variability for n=2
What are we going to think if our 2-student sample has an average weight of 143?
[Figure: sampling distribution of the average weight of a pair of students (x-axis) vs. percent (y-axis).]
Expected Sampling Variability for n=2
P-value = 11%
i.e., about 11 out of 100 “average of 2” experiments will yield values 143 or higher even if the true mean weight is only 128.
[Figure: sampling distribution of the average weight of a pair of students (x-axis) vs. percent (y-axis).]
The P-value
P-value is the probability that we would have seen our
data (or something more unexpected) just by chance if
the null hypothesis (null value) is true.
Small p-values mean the null value is unlikely given
our data.
The P-value

By convention, p-values of <.05 are often
accepted as “statistically significant” in the
medical literature; but this is an arbitrary
cut-off.

A cut-off of p<.05 means that in about 5 of
100 experiments, a result would appear
significant just by chance (“Type I error”).
What factors affect the p-value?
– The effect size
– Variability of the sample data
– Sample size
Statistical Power

Note that, though we found the same sample value (143 lbs) in our 120-student sample and our 2-student sample, we only rejected the null (and concluded that med students weigh more on average than 128 lbs) based on the 120-student sample.
Larger samples give us more statistical power… (a rough numeric sketch follows below)
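As a rough numeric illustration of this point (a sketch only: it assumes a one-sided z-test at the .05 level, with the true mean weight taken to be 143 lbs and SD 15 lbs; these calculations are not on the slides):

* Power sketch: chance of rejecting H0 (mean = 128) when the true mean is 143, SD = 15;
data power_sketch;
   sd = 15; mu0 = 128; mu1 = 143;
   do n = 2, 120;
      se     = sd / sqrt(n);
      cutoff = mu0 + quantile('NORMAL', 0.95) * se;  /* reject H0 if sample mean > cutoff */
      power  = 1 - CDF('NORMAL', cutoff, mu1, se);   /* roughly 0.4 for n=2, ~1 for n=120 */
      output;
   end;
run;

proc print data=power_sketch; run;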
Hypothesis Testing: example 2

Hypothesis: more babies born in November (9 months after Valentine’s Day).
Empirical evidence: Our researcher observed that 6/19 kids in one classroom had November birthdays.
Hypothesis Testing
Is a contest between… the Null Hypothesis and the Alternative Hypothesis.
– The null hypothesis (abbreviated H0) is usually the hypothesis of no difference.
Example: There are no more babies born in November (9 months after Valentine’s Day) than any other month.
– The alternative hypothesis (abbreviated Ha).
Example: There are more babies born in November (9 months after Valentine’s Day) than in other months.
The Steps

1. Define your null and alternative hypotheses:
– H0: P(being born in November) = 1/12
– Ha: P(being born in November) > 1/12 (a “one-sided” test)
The Steps

2. Figure out the “null distribution”:
– If I observe a class of 19 students and each student has a probability of 1/12th of being born in November…
– Sounds BINOMIAL!
– In MATH-SPEAK: Class ~ Binomial(19, 1/12)
***If the null is true, how many births should I expect to see?
– Expected November births = 19 × (1/12) ≈ 1.6 (why?)
– Reasonable variability = √(19 × (1/12) × (11/12)) ≈ 1.2 (why?)
If I see 0-3 November births, it seems reasonable that the null is true… anything else is suspicious… (a quick check of these two numbers follows below)
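These two numbers can be checked with a quick SAS step (a sketch; this is not part of the lab code):

* Null distribution Binomial(19, 1/12): expected number and SD of November births;
data _null_;
   n = 19;
   p = 1/12;
   expected = n*p;              /* about 1.6 */
   sd       = sqrt(n*p*(1-p));  /* about 1.2 */
   put expected= sd=;
run;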
The Steps

3. Observe (experimental data)

We see 6/19 babies were born in November in this
case.
The Steps

4. Calculate a “p-value” and compare to a
preset “significance level”
The Almighty P-Value
The P-value roughly translated is… “the probability of seeing something as extreme as you did due to chance alone.”
Example: The probability that we would have seen 6 or more November births out of 19 if the probability of a random child being born in November was only 1/12 (based on the null distribution).
Easy to Calculate in SAS:
data _null_;
   pval = 1 - CDF('BINOMIAL', 5, (1/12), 19);  /* P(X >= 6) = 1 - P(X <= 5), X ~ Binomial(19, 1/12) */
   put pval;
run;
(log output: 0.003502582)
The Steps

4a. Calculate a “p-value”
data _null_;
pval = 1- CDF('BINOMIAL',5, (1/12), 19);
put pval;
run;
0.003502582

b. and compare to a preset “significance level”…
.0035 < .05
(5% is often chosen due to convention/history)
The Steps

5. Reject or fail to reject (accept) H0.
In this case, reject H0.
Summary: The Underlying
Logic…
Follows this logic:
Assume A.
If A, then B.
Not B.
Therefore, Not A.
But throw in a bit of uncertainty…If A,
then probably B…
Summary: It goes something
like this…

The assumption: The probability of being born in November is 1/12th.
If the assumption is true, then it is highly likely that we will see fewer than 6 November births (since the probability of seeing 6 or more is .0035, or 3-4 times out of 1000).
We saw 6 November births.
Therefore, the assumption is likely to be wrong.
Example 3: the odds ratio
Null hypothesis: There is no association
between an exposure and a disease (odds
ratio=1.0).
Example 3: Sampling Variability of the null Odds Ratio (OR) (100 cases/100 controls/10% exposed)
[Figure: histogram of the observed odds ratio (x-axis, roughly 0.3 to 3.0) vs. percent (y-axis).]
The Sampling Variability of the natural log of the OR (lnOR) is more Gaussian
Sample values far from lnOR = 0 give us evidence of an association. These values are very unlikely if there’s no association in nature.
[Figure: histogram of lnOR (x-axis, centered at 0) vs. percent (y-axis).]
Statistical Power

Statistical power here is the probability of
concluding that there is an association between
exposure and disease if an association truly exists.
– The stronger the association, the more likely we are to
pick it up in our study.
– The more people we sample, the more likely we are to
conclude that there is an association if one exists
(because the sampling variability is reduced).
Error and Power

Type-I Error (false positive):
– Concluding that the observed effect is real when it’s
just due to chance.

Type-II Error (false negative):
– Missing a real effect.

POWER (the flip side of type-II error):
– The probability of detecting a real effect (in symbols, see below).
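In symbols (standard definitions, added here for reference rather than taken from the slides):

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}), \qquad \beta = P(\text{fail to reject } H_0 \mid H_0 \text{ false}), \qquad \text{Power} = 1 - \beta.$$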
Think of…
Pascal’s Wager
                 The TRUTH
Your Decision | God Exists          | God Doesn’t Exist
Reject God    | BIG MISTAKE         | Correct
Accept God    | Correct—Big Pay Off | MINOR MISTAKE
Type I and Type II Error in a box
                            True state of null hypothesis (H0)
Your Statistical Decision | H0 True           | H0 False
Reject H0                 | Type I error (α)  | Correct
Do not reject H0          | Correct           | Type II error (β)
Statistical vs. Clinical
Significance
Consider a hypothetical trial comparing death rates in 12,000 patients with multi-organ failure receiving a new inotrope with 12,000 patients receiving usual care.
If there were a 1% reduction in mortality in the treatment group (49% deaths versus 50% in the usual care group), this would be statistically significant (p<.05) because of the large sample size.
However, such a small difference in death rates may not be clinically important.
Confidence Intervals
(Estimation)
Confidence Intervals
(Estimation)

Confidence intervals don’t presuppose a null value.
A confidence interval shows our best guess at the plausible range of values for the population characteristic, based on our data.
The 95% confidence interval contains the true population value approximately 95% of the time (over repeated samples).
95% CI should contain true
value ~ 19/20 times
X = TRUE VALUE
(--------------------X-----------------)
(-------- X-------------------------)
(---------------------X----------------)
X
(-----------------------------------)
(-----------------X----------------)
(----------------------X----------------)
(----X---------------------------------)
Confidence Intervals
(sample statistic) ± (multiplier for how confident we want to be) × (standard error)
95% CI from a sample of 120:
143 ± 2 × (1.37) = 140.26 to 145.74 (see the sketch below)
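A minimal SAS sketch of this calculation (the slide’s rough multiplier of 2 is used; the exact 95% multiplier would be 1.96):

* 95% CI for the mean weight: sample mean 143 lbs, SD 15 lbs, n = 120;
data _null_;
   xbar  = 143;
   se    = 15 / sqrt(120);   /* about 1.37 lbs */
   lower = xbar - 2*se;      /* about 140.26   */
   upper = xbar + 2*se;      /* about 145.74   */
   put lower= upper=;
run;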
standard error of the mean = 15/√120 ≈ 1.37 lbs
[Figure: sampling distribution of the average weight of 120 students (x-axis) vs. percent (y-axis).]
95% CI from a sample of 10:
143 ± 2 × (4.74) = 133.52 to 152.48
standard error of the mean = 15/√10 ≈ 4.74 lbs
[Figure: sampling distribution of the average weight of 10 students (x-axis) vs. percent (y-axis).]
99.7% CI from a sample of 10:
143 ± 3 × (4.74) = 128.78 to 157.22
standard error of the mean = 15/√10 ≈ 4.74 lbs
[Figure: sampling distribution of the average weight of 10 students (x-axis) vs. percent (y-axis).]
What Confidence Intervals do
– They indicate the uncertainty (or certainty) about the size of a population characteristic or effect. Wider CIs indicate less certainty.
– Confidence intervals can also answer the question of whether or not an association exists or a treatment is beneficial or harmful (analogous to p-values…).
E.g., if the CI of an odds ratio includes the value 1.0, we cannot be confident that the exposure is associated with the disease (a sketch of this check follows below).
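As one concrete (and entirely hypothetical) sketch of that check: the step below builds a 95% CI for an odds ratio from made-up 2×2 counts, using the usual standard error of ln(OR), √(1/a + 1/b + 1/c + 1/d). Neither the counts nor this formula appear in the slides; they are shown only for illustration.

* Hypothetical 2x2 counts: a,b = exposed/unexposed cases; c,d = exposed/unexposed controls;
data _null_;
   a = 20; b = 80; c = 10; d = 90;
   oddsratio = (a*d) / (b*c);
   se_ln     = sqrt(1/a + 1/b + 1/c + 1/d);        /* SE of ln(OR)                                */
   lower     = exp(log(oddsratio) - 1.96*se_ln);
   upper     = exp(log(oddsratio) + 1.96*se_ln);
   put oddsratio= lower= upper=;                   /* a CI excluding 1.0 suggests an association  */
run;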