Moving From Consulting to Collaboration


Sample Size
Robert F. Woolson, Ph.D.
Department of Biostatistics,
Bioinformatics & Epidemiology
Joint Curriculum
NIH Study Section





Significance
Approach
Innovation
Environment
Investigators
Approach




Feasibility
Study Design: Controls,
Interventions
Study Size: Sample Size, Power
Data Analysis
Sample Size



# of Animals
# of Measurement Sites/Animal
# of Replications
Sample Size





What #s are Proposed?
Adequacy of #s?
Compelling Rationale for
Adequacy?
Do We need More?
Can We Answer Questions With
Fewer?
Sample Size


Simple Question to Ask
Answer May Involve:
• Assumptions
• Pilot Data
• Simplification of Overall Aims to a
Single Question
Simplification



What Is The Question?
What Is The Primary Outcome
Variable?
What Is The Principal
Hypothesis?
Pilot Data



Relationship To Question.
Relationship To Primary
Variable.
Relationship To Hypothesis.
Sample Size/Power
Freeware on Web:

http://www.stat.uiowa.edu/~rlenth/Power/


http://hedwig.mgh.harvard.edu/sample_size/size.html


http://www.bio.ri.ccf.org/power.html


http://www.dartmouth.edu/~chance/
Sample Size

Purchase Software
• http://www.powerandprecision.com/
• Nquery: www.statsolusa.com
Animal Studies





Differences usually large
Variability usually small
Small sample sizes
Many groups
Repeated measures
Sample Size (# Animals
Required)

Excerpts from the MUSC Vertebrate
Animal Review Application Form:
“A power analysis or other
statistical justification is required
where appropriate. Where the
number of animals required is
dictated by other than statistical
considerations… justify the number…
on this basis.”
Sample Size: Ethical
Issues in Animal Studies

Ethical Issues
• Study too large implies some
animals needlessly sacrificed
• Study too small implies potential
for misleading conclusions,
unnecessary experimentation

Mann MD, Crouse DA, Prentice ED. Appropriate animal
numbers in biomedical research in light of animal
welfare considerations. Laboratory Animal Science,
1991, 41:
Ethical Issues Cont.

Human studies: the same rationale
holds for studies that are too
large or too small.
Sample Size:
Specifying the Hypothesis

Specifying the hypothesis
• difference from control?
• differences among groups over
time?
• differences among groups at a
particular point in time?

A “non-hypothesis”
• Animals in Group A will do better
than animals in Group B
Sample Size:
Specifying the Hypothesis


Ho: Mean blood pressure on
drug A = mean blood pressure
on drug B measured six hours
after start of treatment.
Ha: Mean blood pressure on
drug A < mean blood pressure
on drug B measured six hours
after start of treatment.
Example
(SHR)




Animal blood pressures
measured at baseline
Animals randomly assigned to
placebo or minoxidil
Animals measured 6 hours post
treatment
Changes from baseline
calculated for each animal
Example
(Continued)



Placebo changes thought to be
centered at 0
Expect minoxidil to lower blood
pressure, we think by 10 mm Hg
Blood pressure changes have a
standard deviation of 5 mm Hg
If two blood pressure change distributions
1. have the same standard deviation σ (say 5 mm Hg), and
2. have different means μ1 (-10) and μ2 (0),
then
their distributions might look like the following:
Example
(Continued)


How many animals/group
needed to have 90 % power to
detect the 10 mm Hg mean
difference?
How would this sample size
change if the standard deviation
is 10 mm Hg rather than 5 mm
Hg?
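The question on this slide can be approximated with the normal-theory formula for comparing two means. The sketch below is not from the presentation: it assumes a one-sided test at α = 0.05 (the slide does not state α), uses a large-sample normal approximation, and the function name is illustrative. With groups this small, an exact t-based calculation from a power program will typically give somewhat larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.90, two_sided=False):
    """Approximate animals per group needed to compare two means.

    Normal approximation: n = 2 * (z_alpha + z_beta)^2 * sd^2 / delta^2.
    alpha and two_sided are assumptions; the slide gives only the power,
    the difference, and the standard deviation.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Minoxidil example: 10 mm Hg difference, SD of 5 versus 10 mm Hg.
n_sd5 = n_per_group_means(delta=10, sd=5)
n_sd10 = n_per_group_means(delta=10, sd=10)
print(n_sd5, n_sd10)
```

Doubling the standard deviation multiplies the unrounded sample size by four, because n is proportional to the variance.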
Example
(Continued)



Suppose we change the endpoint to:
did the animal achieve a reduction in
blood pressure of 10 mm Hg or more?
Then 50 % of those on minoxidil
would be expected to have a reduction
of 10 or more.
About 2.5 % of those on placebo
would have a reduction of 10 or more.
Example
(Continued)


How many animals/group are required to
have 90 % power to detect the 50 %
vs. 2.5 % difference?
Why the difference in sample sizes
for the same experiment? Comment
on:
• Assumptions
• Endpoint
• Specific hypothesis.
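A rough answer for the binary endpoint can be sketched with the usual normal approximation for comparing two proportions. This is an illustration rather than the presenter's calculation: α = 0.05 one-sided is assumed, the function name is hypothetical, and with response rates as extreme as 2.5 % an exact method would be preferable in practice.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.90):
    """Approximate animals per group for a one-sided test of two proportions.

    Normal approximation with a pooled variance under Ho.
    alpha = 0.05 one-sided is an assumption, not a value from the slide.
    """
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha), z(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# 50 % responders on minoxidil versus 2.5 % on placebo.
n_binary = n_per_group_proportions(p1=0.50, p2=0.025)
print(n_binary)
```

For comparison, the analogous normal approximation for the continuous endpoint (difference 10 mm Hg, SD 5 mm Hg, same α and power) gives about 5 animals per group, which illustrates how dichotomizing a continuous measurement costs information.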
Sample Size:
Distribution of Response

Nominal/binary (Binomial)
• dead, alive

Ordinal (Non-parametric)
• inflammation (mild, moderate,
severe)

Continuous (Normal*)
• blood pressure
* may require transformation
Sample Size:
Distribution of Response
• Binomial
• N is a function of probability of
response in control and probability of
response in treated animals
• Normal
• N is a function of difference in means
and standard deviation
Sample Size:
One Sample or 2-sample Test

One sample
• Change from baseline in one group
• Comparison to standard (historical
controls)

Two sample
• Two independent study groups
Sample Size:
One or Two sided test

One sided test :
• Ha: a > 0
• Ha: a < 0

Two sided test
• Ha: a ≠ 0
Sample Size: Choosing α

α = probability of Type I error
probability of rejecting Ho when Ho is
true
significance level usually 0.01 or
0.05
“calling an innocent person guilty”
“concluding two groups are different
when they are not”
Sample Size: Choosing α

Multiple testing inflates the overall
chance of a Type I (α) error.
If hypotheses are pre-specified, may
not need to adjust;
If all pairwise comparisons are
of interest, adjust α (α / # tests)
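The α / # tests adjustment (the Bonferroni correction) is simple arithmetic once the number of pairwise comparisons is counted. A minimal sketch, with an illustrative function name and four groups chosen only as an example:

```python
from math import comb


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison alpha when all pairwise group comparisons are tested."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return alpha / n_tests, n_tests


# Four groups give six pairwise comparisons.
adjusted, n_tests = bonferroni_alpha(n_groups=4)
print(n_tests, round(adjusted, 4))
```

The adjusted α is the value to carry into the sample size calculation, so planning for all pairwise comparisons raises the required N.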
Sample Size: Choosing β

β = probability of Type II error;
probability of failing to reject Ho
when a true difference exists.
“Calling a guilty person innocent”
“Missing a true difference”
Power = 1 - β
Large clinical trials use 0.9 or 0.95;
animal studies usually use 0.8 (80%
power).
Sample Size: Power



Concluding groups do not differ
when power is low is risky. True
difference may have been missed.
80% power implies a 20% chance of
missing a true difference.
40% power implies a 60% chance of
missing a true difference.
Sample Size: Calculation
Calculate N
• specify difference to be detected
• specify variability (continuous
data)
OR
Calculate detectable difference:
• specify N
• specify variability (continuous) or
control %

Sample Size:
Putting it all together
Continuous (Normal) Distribution
$$2n = \frac{4\,(Z_\alpha + Z_\beta)^2\,\sigma^2}{\delta^2}$$
Need all but one: α, β, σ², δ, N
• Z_α = 1.96 (2 sided, 0.05);
• Z_β = 1.645 (always one-sided, 0.05, 95%
power)
• δ = difference between means
• σ² = pooled variance
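Because the relation 2n = 4 (Z_α + Z_β)² σ² / δ² links five quantities, fixing any four determines the fifth. The sketch below solves it for the detectable difference δ and for power instead of for n. It is a normal-approximation illustration; the function names and the example inputs (64 per group, SD of 2, two-sided α = 0.05) are choices made here for demonstration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80, two_sided=True):
    """Solve 2n = 4 (Z_alpha + Z_beta)^2 sigma^2 / delta^2 for delta."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


def approximate_power(n_per_group, delta, sd, alpha=0.05, two_sided=True):
    """Solve the same relation for power (1 - beta), given n, delta and sigma."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = abs(delta) / (sd * sqrt(2 / n_per_group)) - z_alpha
    return _STD_NORMAL.cdf(z_beta)


delta = detectable_difference(n_per_group=64, sd=2)
power = approximate_power(n_per_group=64, delta=1, sd=2)
print(round(delta, 2), round(power, 2))
```

Solving for δ is the useful direction when the number of animals is fixed by budget or availability: it shows whether the difference the study can detect is biologically meaningful.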
[Figure: Power plotted against the difference P1 - P2 (0 to 0.45); α = 0.05, one-sided test, N per group = 100, P1 = 0.5.]
[Figure: Power plotted against sample size (2 to 90); α = 0.05, one-sided test, P1 = 0.5, P2 = 0.3.]
Nquery Advisor



About $700
Many more options than many
other programs
Available in student room in our
department
Nquery Advisor


Under “file” choose “New”
Choices
• means
• proportions
• agreement
• survival (time to event)
• regression
# groups (1, 2, >2)
testing, confidence intervals,
equivalence
Examples


Continuous response
Binary response
Sample Size: More Than
One Primary Response

Use largest sample size.
Sample Size: Food for
Thought



Is detectable difference
biologically meaningful?
Is sample size too small to be
believable?
N = 5 is a “rule of thumb”, but is it
valid for the experiment being
planned?
Sample Size:
Misunderstandings


“Larger the difference, smaller
the sample size” ignores the
contribution of variability
Failing to report power for a
negative study
• calculate based on hypothesized
difference and observed variability
Sample Size: Keeping It
Small

Study continuous rather than
binary outcome (if variability
does not increase)
• change in tumor size instead of
recurrence

Study surrogate outcome where
effect is large
• cholesterol reduction rather than
mortality
Examples Of Surrogate
Outcome Measures?






Bone density
Quality of life
Patency
Pain relief
Functional Status
Cholesterol
Sample Size: Keeping It
Small

Decrease variability
• Change from baseline or analysis
of covariance
• training
• equipment
• choice of animal model
Sample Size: Keeping It
Small





α = 0.05, 2-sided test
β = 0.2; power = 0.8 (80%)
Difference between two means
= 1
Standard deviation = 2; N =
64/group
Standard deviation = 1; N =
17/group
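The slide does not say how the two group sizes were computed. One way to arrive at them, sketched below as an assumption, is the normal-approximation formula for two means plus a common small-sample correction of z_α² / 4 that approximates the two-sample t-test result. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_t_approx(delta, sd, alpha=0.05, power=0.80):
    """Two-sided, two-sample sample size with a small-sample correction.

    Normal-approximation n plus z_alpha^2 / 4 (an assumed correction that
    approximates the t-test answer; it is not stated in the slides).
    """
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    n_normal = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n_normal + z_alpha ** 2 / 4)


n_sd2 = n_per_group_t_approx(delta=1, sd=2)
n_sd1 = n_per_group_t_approx(delta=1, sd=1)
print(n_sd2, n_sd1)
```

Halving the standard deviation cuts the required number of animals to roughly a quarter, which is the argument for reducing variability through training, equipment, and choice of animal model.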
Sample Size Estimation






Parameters are estimates
Estimate of relative effectiveness
based on other populations
Effectiveness overstated
Patients in trials do better
Assuming mathematical models
Compromise between available
resources/objectives
Sample Size: Pilot
Studies



No information on variability
No information on efficacy
Use effect size from similar
studies or gather pilot data for
estimation
Additional Comments
Pilot Studies

Complication rate
• P = 1 - (1 - r)^N
where P = probability of observing at least one complication
r = complication rate
N = sample size
• If you know the desired P and N, you can solve
for r
• If you know r and the desired P, you can solve
for N
Example to Work


Want to have 90% probability of
detecting at least one complication,
given a 25% complication rate. What
N do you need?
You are studying 25 people and want
80% probability of detecting at least
one complication. What is the
complication rate that would yield
this probability?
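Both exercises follow from rearranging P = 1 - (1 - r)^N. A minimal sketch of the two rearrangements, with illustrative function names:

```python
from math import ceil, log


def n_for_detection(rate, prob):
    """Smallest N with 1 - (1 - rate)^N >= prob."""
    return ceil(log(1 - prob) / log(1 - rate))


def rate_for_detection(n, prob):
    """Complication rate r satisfying 1 - (1 - r)^n = prob."""
    return 1 - (1 - prob) ** (1 / n)


# Exercise 1: 90% chance of seeing at least one complication at a 25% rate.
n_needed = n_for_detection(rate=0.25, prob=0.90)
# Exercise 2: 25 subjects, 80% chance of seeing at least one complication.
rate_needed = rate_for_detection(n=25, prob=0.80)
print(n_needed, round(rate_needed, 3))
```

The first result is rounded up because N must be a whole number and the detection probability must reach, not merely approach, the target.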
Pilot Studies

Use larger alpha (>0.05, e.g.
0.15 or 0.2) to compute sample
size
• If reject null hypothesis will test in
future study

Underlying concept – futility;
ensure new treatment not
worse than standard.
Pilot Studies

Can reformulate hypothesis
• Ho: new treatment = placebo
• Ha: new treatment < placebo
• Continue to larger study if fail to
reject Ho.
Avoid Data Driven
Comparisons
[Figure: Group 1 and Group 2 plotted at months 1 through 4 (vertical axis 0 to 80), with a "Test here" marker at one time point.]
Randomization: Bias Due
to Order of Observations




Learning effect
Change in laboratory
techniques
Different litters
Carry-over effects
• underestimate carry-over
• two treatments in the same animal (A, then
B): can only test the effect of B
after A
Randomization: Order
Effects Continued

System fatigue
• rabbit heart’s ability to function
after two different treatments
Randomization: Order
Effects Continued

Seasonal variability
• All rats male, same weight, same age,
media temperature and other incubation
conditions identical, housed in identical
conditions
• Outcome - unstimulated renin release
from kidneys (in vitro) samples at 30
minutes
• Outcome - Metastasis - winter 16%
(n=767); summer 8% (n=142); logistic
regression p<0.03 for season
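One way to guard against the order effects listed on the randomization slides is to randomize both the treatment assignment and the order in which animals are processed, so that learning, changes in technique, and litter differences are not confounded with treatment. The sketch below is an illustration only: the function name, animal IDs, and seed are hypothetical, and it does not address seasonal effects, which need the groups to be run concurrently.

```python
import random


def randomized_run_order(animal_ids, treatments, seed=None):
    """Assign treatments in balanced numbers and shuffle the processing order.

    animal_ids and treatments are hypothetical inputs for illustration.
    """
    rng = random.Random(seed)
    ids = list(animal_ids)
    # Repeat the treatment labels so the groups are as balanced as possible.
    labels = [treatments[i % len(treatments)] for i in range(len(ids))]
    rng.shuffle(labels)      # random assignment of treatment to animal
    schedule = list(zip(ids, labels))
    rng.shuffle(schedule)    # random order of observation
    return schedule


schedule = randomized_run_order(range(1, 9), ["placebo", "minoxidil"], seed=2024)
for animal, treatment in schedule:
    print(animal, treatment)
```

Fixing the seed makes the schedule reproducible, so the allocation can be documented in the protocol before the experiment starts.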