Pamela Minicozzi

Transcript Pamela Minicozzi

Sampling and power analysis in the High Resolution studies
Pamela Minicozzi
Descriptive Studies and Health Planning Unit,
Department of Preventive and Predictive Medicine,
Fondazione IRCCS Istituto Nazionale dei Tumori, Milan
High Resolution studies collected detailed data from patients’ clinical records, so that the influence of non-routinely collected factors (tumour molecular characteristics, diagnostic investigations, treatment, relapse) on survival and differences in standard care could be analysed.
Problem
- In each country, the population of incident cases for a particular cancer consists of N subjects
- N is large (so rare cancers are not considered here)
- Since N is large, not all cases can be investigated
Solution
- Use a representative sample to derive valid conclusions that are applicable to the entire original population
Two questions
1) What kind of probability sampling should we use?
2) What sample size should we use?
Sampling
Previous High Resolution studies
- Samples were representative of
  - 1-year incidence
  - a time interval (e.g. 6 months) within the study period, provided that incidence was complete
  - an administratively defined area covered by cancer registration
Present High Resolution studies
We want to eliminate variations in types of sampling between countries and within a single country
This implies more sophisticated sampling
Main types of probability sampling
Simple random sampling
- assign a unique number to each element of the study population
- determine the sample size
- randomly select the population elements using
  - a table of random numbers
  - a list of numbers generated randomly by a computer
Advantage:
- auxiliary information on subjects is not required
Disadvantage:
- if subgroups of the population are of particular interest, they may not be included in sufficient numbers in the sample
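The simple random sampling steps can be sketched in a few lines of Python. This is only an illustration: the population size (1000), the sample size (100), the seed and the function name `simple_random_sample` are assumptions, not values from the High Resolution studies.

```python
import random


def simple_random_sample(case_ids, n, seed=None):
    """Draw n distinct cases; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(case_ids), n)


# Hypothetical population: incident cases numbered 1..1000
case_ids = range(1, 1001)
sample = simple_random_sample(case_ids, n=100, seed=2024)
print(sorted(sample)[:10])  # first few selected case numbers
```

Recording the seed together with the numbered case list would allow the same selection to be reproduced later.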
Stratified sampling
- identify stratification variable(s) and determine the number of strata to be used (e.g. day and month of birth, year of diagnosis, cancer registry, etc.)
- divide the population into strata and determine the sample size of each stratum
- randomly select the population elements in each stratum
Advantage:
- a more representative sample is obtained
Disadvantage:
- requires information on the proportion of the total population belonging to each stratum
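A minimal sketch of stratified sampling, assuming proportional allocation (each stratum contributes in proportion to its share of the population). The slides do not say which allocation rule was used, so this rule, the three hypothetical registries "A", "B", "C" and the function name `stratified_sample` are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(cases, stratum_of, n, seed=None):
    """Sample about n cases in total, allocated proportionally to stratum size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)

    total = len(cases)
    selected = {}
    for key, members in strata.items():
        # Proportional allocation; rounding means the total may differ slightly from n
        k = min(len(members), max(1, round(n * len(members) / total)))
        selected[key] = rng.sample(members, k)
    return selected


# Hypothetical population: (case number, registry) pairs
cases = (
    [(k, "A") for k in range(600)]
    + [(600 + k, "B") for k in range(300)]
    + [(900 + k, "C") for k in range(100)]
)
by_registry = stratified_sample(cases, stratum_of=lambda c: c[1], n=100, seed=1)
for registry, chosen in sorted(by_registry.items()):
    print(registry, len(chosen))
```

If small strata are of particular interest, they could be oversampled instead, at the cost of needing weights in the analysis.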
Systematic sampling
- determine the sample size (n); thus the sampling interval “i” is N/n
- randomly select a number “r” from 1 to “i”
- select all the other subjects in the following positions: r, r + i, r + 2*i, etc., until the sample is complete
Advantage:
- eliminates the possibility of autocorrelation
Disadvantage:
- only the first element is selected on a probability basis → pseudo-random sampling
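A minimal sketch of systematic sampling, assuming the cases are held in an ordered list and the sampling interval is taken as the integer part of N/n (population size divided by sample size). The list of 1000 case numbers and the function name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(ordered_cases, n, seed=None):
    """Pick every i-th case after a random start r, with i = N // n."""
    N = len(ordered_cases)
    i = N // n  # sampling interval (integer part of N/n)
    if i < 1:
        raise ValueError("sample size n cannot exceed population size N")
    rng = random.Random(seed)
    r = rng.randint(1, i)  # random start between 1 and i, inclusive
    positions = range(r, r + n * i, i)  # r, r+i, ..., r+(n-1)*i
    return [ordered_cases[p - 1] for p in positions]  # positions are 1-based


# Hypothetical ordered list of N = 1000 case numbers, sample size n = 100
ordered_cases = list(range(1, 1001))
picked = systematic_sample(ordered_cases, n=100, seed=7)
print(picked[:5])
```

Only `r` is random here, so the ordering of the list matters: a periodic pattern in the list that matches the interval would bias the sample.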
How many subjects do we need?
The main elements
- Hypothesis test and significance level: the probability that a positive finding is due to chance alone (e.g. 1%, 5%)
- Statistical power: the probability that the difference will be detected (e.g. 80%, 90%)
- Previous pilot studies: they explored whether some variables can be measured with sufficient precision (or are available) and checked the study vision
Together, these are used to determine the minimum sample size required to get a significant result (or to detect a meaningful effect)
Previous High Resolution studies
Number of patients was defined based on:
- observed differences in survival and risk of death
- incidence of the cancer under study
- difficulties in collecting clinical information
- available economic resources
Notwithstanding that ...
we were able to identify statistically significant relative excess risks of death
- up to 1.60 among European countries
- up to 1.40 among Italian areas
for breast cancer, for which differences in survival are small.
→ Applicable to other cancers for which survival differences are larger
Example for breast cancer (diagnosis 1995-99)
Plot power as a function of hazard ratio for a 5% two-sided log-rank test with 80% power, over sample sizes ranging from 100 to 1000
Assume 75% survival as reference (the overall survival in Europe, range: 65-90%)
45%
Example for colorectal cancer (diagnosis 1995-99)
Plot power as a function of hazard ratio for a 5% two-sided log-rank test with 80% power, over sample sizes ranging from 100 to 1000
Assume 50% survival as reference (the overall survival in Europe, range: 30-70%)
32%
Example for lung cancer (diagnosis 1995-99)
Plot power as a function of hazard ratio for a 5% two-sided log-rank test with 80% power, over sample sizes ranging from 100 to 1000
Assume 10% survival as reference (the overall survival in Europe, range: 5-20%)
30%
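The kind of calculation behind such power curves can be sketched with Schoenfeld's approximation for the log-rank test. The slides do not state which software or formula produced the plots, so everything here is an assumption for illustration: two groups of equal size, proportional hazards (so survival in the comparison group is the reference survival raised to the power of the hazard ratio), and all patients followed to the same time point. The function name `logrank_power` and the grid of sample sizes and hazard ratios are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(n_total, hazard_ratio, ref_survival, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld formula).

    Assumes two equal-sized groups, proportional hazards and no censoring
    before the time point at which ref_survival is measured.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Survival in the comparison group under proportional hazards
    cmp_survival = ref_survival ** hazard_ratio
    # Expected number of deaths across both groups
    events = n_total * (1 - (ref_survival + cmp_survival) / 2)
    return std_normal.cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_alpha)


# Reference survival values taken from the three examples above
references = {"breast": 0.75, "colorectal": 0.50, "lung": 0.10}
for cancer, s0 in references.items():
    for n in (100, 500, 1000):
        powers = [logrank_power(n, hr, s0) for hr in (1.2, 1.4, 1.6)]
        print(cancer, n, [round(p, 2) for p in powers])
```

For a fixed hazard ratio, power rises with the expected number of deaths, so the same sample size gives more power when reference survival is lower.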
Present High Resolution studies
We want to analyse both differences in survival and adherence to standard care
Power analysis for both
- logistic regression analysis (to analyse the odds of receiving one type of care, typically standard care)
- relative survival analysis (to analyse differences in relative survival and relative excess risks of death)
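For the logistic regression part, a rough power calculation can be sketched for the simplest case: one binary factor (for example, two areas of equal size) and a Wald test on the log odds ratio. This is a simplified stand-in and not the method of the High Resolution studies; the function name `odds_ratio_power`, the 60% reference proportion receiving standard care and the odds ratio of 1.5 are assumptions for illustration.

```python
from math import log, sqrt
from statistics import NormalDist


def odds_ratio_power(n_per_group, odds_ratio, p_ref, alpha=0.05):
    """Approximate power of a two-sided Wald test on the log odds ratio.

    Assumes two equal-sized groups and a large-sample normal approximation.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Proportion receiving standard care in the comparison group
    odds_cmp = odds_ratio * p_ref / (1 - p_ref)
    p_cmp = odds_cmp / (1 + odds_cmp)
    # Standard error of the estimated log odds ratio
    se = sqrt(
        1 / (n_per_group * p_ref * (1 - p_ref))
        + 1 / (n_per_group * p_cmp * (1 - p_cmp))
    )
    return std_normal.cdf(abs(log(odds_ratio)) / se - z_alpha)


# Hypothetical scenario: 60% receive standard care in the reference area
for n in (50, 250, 500):
    print(n, round(odds_ratio_power(n, odds_ratio=1.5, p_ref=0.60), 2))
```

A model with several covariates would generally need more cases than this two-group approximation suggests, so the result is best read as a lower bound.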
Conclusions
Taking into account
- existing sampling and power methodology
- experience from previous studies
- different coverage of Cancer Registries
- available economic resources
We want to
- standardize the selection of data
- include a minimum number of cases that satisfies statistical considerations related to all aims of our studies
Prof. JS Long¹ (Regression Models for Categorical and Limited Dependent Variables, 1997) suggests that sample sizes of less than 100 cases should be avoided and that 500 observations should be adequate for almost any situation.
¹ Professor of Sociology and Statistics at Indiana University
