Statistical Considerations in Research Study Designs

Research Study Design
and Statistical Methods for
Cardiology
Nathan D. Wong, PhD, FACC
Professor and Director
Heart Disease Prevention Program
Division of Cardiology
University of California, Irvine
Why are papers rejected for
publication? (The Top 11 Reasons)
1. The study did not address an important scientific
issue
2. The study was not original
3. The study did not actually test the authors’
hypothesis
4. A different type of study should have been done
5. Practical difficulties led the authors to
compromise on the original study protocol (e.g.,
recruitment, procedures)
Greenhalgh T, BMJ 1997; 315: 243-6
Reasons 6-11 for Paper Rejection
6. The sample size was too small
7. The study was uncontrolled or inadequately
controlled
8. The statistical analysis was incorrect or
inappropriate
9. The authors drew unjustified conclusions from
the data
10. There is a significant conflict of interest among
authors
11. The paper is so badly written that it is
incomprehensible
Outline
• Elements of Designing a Research
Protocol
• Selecting a Study Design – Which is best
for answering your question?
• Selection and Classification of Study
Variables (e.g., predictors and outcomes)
• Sample size and power considerations
• Choice of statistical procedures for
different study designs
Nine Key Elements of a Research Study Protocol
• Background
• Hypotheses
• Clinical Relevance
• Specific Aims / Objectives
• Methodology
• Power / Sample Size
• Measures and Outcomes
• Data Management
• Statistical Methodology
(UCI-SOM Dean’s Scientific Review Committee:
http://www.rgs.uci.edu/ora/rp/hrpp/deansscientificreview.htm)
Background
• A brief review of the problem to be
studied and of related studies that
generated the rationale and the
central idea of the proposed study.
Several pertinent references should
be provided.
Was the study original?
• Few studies break entirely new ground
• Many studies add to the evidence base of
earlier studies which may have had other
or more limitations
• Meta-analyses depend on literature
containing multiple studies addressing a
question in a similar manner
Features Distinguishing New vs.
Previous Studies
• Is the study in question bigger in sample size, or
with longer follow-up (e.g., adding to meta-analyses of previous studies)?
• Is methodology more rigorous (e.g., having
addressed criticisms of previous ones)?
• Is the population studied different from that of
previous studies (ages, gender, ethnic groups)?
• Does the new study address a clinical issue of
sufficient importance so it is politically desirable
even if not scientifically necessary?
Greenhalgh T, BMJ 1997; 315: 305-8
Hypotheses
• The problem/s stated in the Background may
generate a primary hypothesis and possibly one
or two secondary hypotheses.
• A hypothesis is often stated in the null – e.g.,
"No difference between treatments A and B" is
anticipated, or "No association between X and Y
exists".
• Alternatively, it can be stated according to what
one expects e.g., “A will be more effective than B
in reducing levels or symptoms of C", or “X will
be associated with Y".
Clinical Relevance
• In the case of clinical studies, the potential
value in the understanding, diagnosis, or
management of a clinical condition or
pathological state should be stated.
Specific Aims / Objectives
• This states what the study is intended to study or
demonstrate and generally includes mention of
predictor and outcome (or endpoint) variables.
• For example: "The primary aim of the study is to
examine whether treatment A is more effective
than treatment B in reducing levels of C", or "in
finding out whether X is associated with Y", etc.
• There may be several specific aims in a given
study. The methods of study should address
each of them.
Elements of a Formulated Question
• Patient or Population: Who is the question
about? (e.g., pts with diabetes mellitus)
• Intervention or Exposure: What is being done or
what is happening to the patient/population?
(e.g., tight control)
• Outcome(s): How does the intervention affect
the patient/population (mortality, CHD incidence)
• Comparison(s): What could be done instead of
the intervention? (e.g., standard management)
Methodology
• Methodology should validate or not validate the
hypothesis and specific aims using procedures
consistent with sound scientific study design
including:
– the size and nature of the subjects studied
– recruitment, screening, and enrollment
procedures
– inclusion and exclusion criteria
– treatment schedules, and follow-up
procedures, if applicable. A chart of the
studies to be performed at each visit and the
time of each visit and test is needed.
Study Population Issues
• How were the subjects recruited? Is there
potential recruitment bias (e.g., from taking
respondents of advertisements), or is
survey done in a random (e.g., random
digit-dialing) or consecutive sample?
• Who was included? Many trials exclude
those who have co-morbidities, do not
speak English, or take other
medications—may provide scientifically
clean results, but may not be
representative of disease in question.
Study Population (cont.)
• Who was excluded? Study may exclude
those with more severe forms of disease,
therefore limiting generalizability
• Were subjects studied in “real-life”
circumstances? Are the consenting process
(describing the benefits/risks), access to
study staff, equipment available, etc.
similar to those in an ordinary practice
situation?
Power / Sample Size
• A power/sample size analysis should
include an estimate of minimum effect or
difference expected at a given level of
power when the sample size is fixed, or a
projection of the number of subjects
needed to achieve a clinically important
difference in what is being examined in the
hypotheses and the specific aims.
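As a rough illustration of the sample size projection described above, the sketch below uses the standard normal-approximation formula for comparing two independent proportions. The function name, the 25% and 20% event rates, and the alpha and power defaults are illustrative assumptions, not values taken from any study in these slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect p_control vs.
    p_treated with a two-sided test of two independent proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


# Hypothetical planning values: 25% event rate on control, 20% on treatment.
n = n_per_group(0.25, 0.20)
print(n)
```

A smaller expected difference or a higher required power increases the projected number of subjects, which is why the clinically important difference should be stated before the calculation is run.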
Measures and Outcomes
• Measures include both independent (predictor) and
dependent (outcome) variables.
• Outcomes include what the investigator is trying to
predict, e.g., new or recurrent onset of a disease state,
survival, or lowering of cholesterol as a result of a drug.
• The independent or predictor variables should always
include treatment status (e.g., active vs. placebo) in the
case of a clinical trial, or primary variables of interest
(such as age, gender, levels of X at baseline) for other
studies. In either case, there will often be possible
confounders or covariates to adjust for in the analysis of
the results.
• The measures and outcomes should be reasonably expected
to answer the proposed question and to reflect the importance of
the knowledge expected to result from the research.
Data Management
• Data Management includes how data is
captured for analysis and the tools that will
be utilized while capturing the data. This
includes:
– Case report forms for clinical trials
– Surveys, questionnaires, or interview
instruments
– Computerized spreadsheets or entry forms
– Methods for data entry, error checking, and
maintenance of study databases
Statistical Methods of Analysis
• Statistical analysis includes a description
of the statistical tests that will be performed
to examine the results obtained, e.g.,
– Student’s t-test will be used to compare levels
of A and B between treatment and placebo
groups
– Multiple logistic regression analysis will be
used to examine an independent treatment
effect on the likelihood of recurrent disease.
Hierarchy of Evidence
(for making decisions about clinical
interventions or proving causation)
1. Systematic reviews and meta-analyses
2. Randomized controlled trials with definitive and
clinically significant effects
3. Randomized controlled trials with nondefinitive results
4. Cohort studies
5. Case-control studies
6. Cross-sectional surveys
7. Case reports
Features Affecting Strength and
Generalizability of Study
• sample size
• selection of comparison group (control or
placebo)
• selection of study sample (is it representative of
population the study results are intended to
apply to?)
• length of time of follow-up
• outcome assessed (e.g., hard vs. soft or
surrogate endpoint)
• Measurement and ability to control for potential
confounders
Case Reports and Series
• Provide “anecdotal” evidence about a
treatment or adverse reaction
• Often with significant detail not available in other
study designs
• May generate hypotheses, help in designing a
clinical trial.
• Several reports forming a “case series” can help
establish efficacy of a drug, or through adverse
reports, cause its demise (example: cerivastatin,
fatal cases of rhabdomyolysis).
Observational Studies
• Cross-sectional, prospective, and case-control studies seldom can identify two
groups of subjects (exposed vs.
unexposed or cases vs. controls) that are
similar (e.g., in demographic or other risk
factors).
• Much of the controlling for baseline and/or
follow-up differences in subject
characteristics occurs in the analysis stage
(e.g., multivariable analysis as in
Framingham)
Observational Studies (cont.)
• While statistical procedures may be done
correctly, have we considered all possible
confounders?
• Some covariates may not have been
measured as accurately as possible, and
more often, may not be even known or
measured.
Observational, cross-sectional
• Examines association between two
factors (e.g, an exposure and a disease
state) assessed at a single point in time,
or when temporal relation is unknown
• Example: Prevalence of a known
condition, association of risk factors with
prevalent disease.
• Conclusions: Associations found may
suggest hypotheses to be further tested,
but are far from conclusive in proving causation.
Cross-Sectional Studies and
Surveys
• Examples: NHANES III, CHIS
(telephone), chart-review studies
• Surveys should include a representative,
ideally randomly-chosen (rather than a
small sample of approached subjects who
actually agree to be surveyed) sample.
• Data collected cannot assume any
directionality in exposure / disease.
• Can statistically adjust for confounders,
but difficult to establish the temporal
nature of exposure and disease.
Prevalence of CHD by the Metabolic Syndrome and
Diabetes in the NHANES Population Age 50+
Figure: bar chart of CHD prevalence (%) in four groups; bar values shown are 19.2%, 13.9%, 8.7%, and 7.5%.
Group (% of population): No MS/No DM (54.2%), MS/No DM (28.7%), DM/No MS (2.3%), DM/MS (14.8%).
Alexander CM et al. Diabetes 2003;52:1210-1214.
Odds of CVD Stratified by CRP Levels in U.S. Persons
(Malik and Wong et al., Diabetes Care, 2005)
Figure: bar chart of odds ratios (axis 0 to 6) for no disease, metabolic syndrome, and diabetes, each shown for high CRP and low CRP.
–*p<.05, **p<.01, **** p<.0001 compared to no disease, low CRP
–CRP categories: >3 mg/l (High) and <3 mg/L (Low)
–age, gender, and risk-factor adjusted logistic regression (n=6497)
Metabolic Syndrome Independently Associated with
Inducible Ischemia from SPECT
(Wong ND et al., Diabetes Care 2005; 28: 1445-50)
Predictor                       OR      95% CI       P value
Log coronary calcium (per SD)   4.11    2.60-6.51    <0.001
Chest Pain Symp                 2.94    1.69-5.09    <0.001
1-2 MetS risk factors           2.99    0.70-12.8    0.14
3 MetS risk factors             4.80    1.01-22.9    0.049
4-5 MetS risk factors           10.93   2.09-57.2    0.005
Diabetes                        4.55    0.98-21.1    0.053
*Estimates adjusted for age, gender, cholesterol and
smoking. Odds of ischemia for metabolic abnormalities
(yes vs. no) (separate model): 1.98 (1.20-3.98), p=0.008
Prospective (Cohort) Studies
• Cohort studies begin with identification of
a population, assessment of exposure
(e.g., lipid or BP levels)
• Follow-up to the occurrence of outcomes
(CHD events)-- temporal sequence to
events is known
Cohort Studies (cont.)
• Difficult to ascertain effect of exposure
because of many differences between
exposed and unexposed groups
(confounding factors).
• Statistical adjustment for known risk factor
differences can help, but unknown factors
that may differ between exposed and
unexposed groups will never be adjusted
for.
Duration of Follow-up
• Is the planned follow-up reasonable and
practical for the study question and
sample size utilized?
– effect of a new painkiller on degree of pain
relief may only require 48 hours
– effect of a cholesterol medication on mortality
may require 5 years
Prospective cohort studies
• Examples:
– Framingham Heart Study
– Cardiovascular Health Study (CHS)
– Multiethnic Study of Atherosclerosis
(MESA)
– Nurses Health Study
• Advantages:
– large sample size
– ability to follow persons from healthy to
diseased states
– temporal relation between risk factor
measures and development of disease
Prospective Studies (cont.)
• Disadvantages:
– expensive due to large sample size often
needed to accrue enough events
– many years to development of disease
– possible attrition
– causal inference not definitive as difficult to
consider all potential confounders
Prospective Cohort Example:
Framingham Heart Study
• Longest running epidemiologic study
• Began with 5209 persons aged 30-62 at
baseline in 1948, studied biennially to date
(most are deceased now)
• Risk factors measured at each examination,
some began later (e.g., HDL-C around 1970) or
done only at certain exams (echocardiography,
CRP)
• Event ascertainment/adjudication involves panel
of 3 physicians reviewing medical records
Low HDL-C Levels Increase CHD Risk
Even When Total-C Is Normal
(Framingham)
Figure: 14-y incidence rates (%) for CHD (axis 0 to 14) by HDL-C (mg/dL: < 40, 40–49, 50–59, ≥ 60) and total-C (< 200, 200–229, 230–259, ≥ 260).
Risk of CHD by HDL-C and Total-C levels; aged 48–83 y
Castelli WP et al. JAMA 1986;256:2835–2838
4-Year Progression To Hypertension:
The Framingham Heart Study
Figure: patients (%) progressing to hypertension (axis 0 to 50), participants age 36 and older, by baseline blood pressure category: Optimal (<120/80 mm Hg), Normal (130/85 mm Hg), High-Normal (130-139/85-89 mm Hg). Bar values shown: 37, 18, 5.
Vasan, et al. Lancet 2001;358:1682-86
CHD, CVD, and Total Mortality:
US Men and Women Ages 30-74
(age, gender, and risk-factor adjusted Cox regression) NHANES II
Follow-Up (n=6255) (Malik and Wong, et al., Circulation 2004; 110: 1245-1250)
Figure: bar chart (axis 0 to 7) for CHD mortality, CVD mortality, and total mortality, by group: None, MetS, Diabetes, CVD, CVD+Diabetes.
* p<.05, ** p<.01, **** p<.0001 compared to none
CV Event-Free 8-year Survival Using
Combined hs-CRP and LDL-C
Measurements
(n=27,939)
Figure: probability of event-free survival (axis 0.96 to 1.00) over years of follow-up (0 to 8) for four groups: low CRP-low LDL, low CRP-high LDL, high CRP-low LDL, high CRP-high LDL.
Median LDL 124 mg/dl; median CRP 1.5 mg/l
Ridker et al, N Engl J Med. 2002;347:1157-1165.
Case-control Studies
• Most frequent type of epidemiologic study, can be
carried out in a shorter time and require a smaller
sample size, so are less expensive
• Only practical approach for identifying risk factors
for rare diseases (where follow-up of a large sample
for occurrence of the condition would be
impractical)
• Selection of appropriately matched control group
(e.g., hospital vs. healthy community controls) and
consideration of possible confounders crucial
• Relies on historical information to obtain exposure
status (and information on confounders)
Case-Control Studies (cont.)
• Cannot determine for sure whether
exposure preceded development of
disease
• Also difficult to identify all differences
between cases and controls that can be
statistically adjusted for
Example of case-control study:
Folate and B6 intake and risk of MI
(Tavani et al. Eur J Clin Nutr 2004)
• Cases were 507 patients with a first episode
of nonfatal AMI, and controls were 478
patients admitted to hospital for acute
conditions
• Information was collected by interviewer-administered questionnaires
• Compared to patients in the lowest tertile of
intake, the ORs for those in the highest tertile
were 0.56 (95% CI 0.35-0.88) for folate and
0.34 (95% CI 0.19-0.60) for vitamin B6.
• Author conclusion: A high intake of folates,
vitamin B6 and their combination is inversely
associated with AMI risk
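To show how an odds ratio and its 95% confidence interval of the kind reported above are obtained from a 2x2 table, here is a minimal sketch using the usual log-scale (Woolf) standard error. The counts are hypothetical and are not the Tavani et al. data; the published estimates were also covariate-adjusted, which this crude calculation is not.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio of exposure in cases vs. controls, with a
    95% CI computed on the log scale."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    se_log_or = sqrt(1 / exposed_cases + 1 / unexposed_cases
                     + 1 / exposed_controls + 1 / unexposed_controls)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: "exposed" = highest tertile of intake,
# "unexposed" = lowest tertile.
or_, lower, upper = odds_ratio_ci(120, 200, 170, 160)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that lies entirely below 1.0, as in this made-up example, corresponds to an inverse association that is statistically significant at the 0.05 level.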
Potential sources of bias and error
in case control studies
• Information on the potential risk factor or
confounding variables may not be available
from records or subjects’ memories
• Cases may search for a cause of their
disease and be more likely to report an
exposure than controls (recall bias)
• Uncertainty as to whether agent caused
disease or whether occurrence of the disease
caused the person to be exposed to the
agent
• Difficulty in assembling a case group
representative of all cases, and/or
assembling an appropriate control group
Prospective, observational:
nested case-control
• In this design, one takes incident cases
(e.g., incident CVD) and a matched set of
controls to examine the association of a
risk factor measured sometime before
development of the outcome of interest
• Less costly than a true prospective design
where all subjects are included in analysis;
may not provide equivalent estimates
Prospective study of CRP and risk of future
CVD events among apparently healthy
women (Ridker et al., Circulation 1998) – a
nested case control study
• 122 female pts who suffered a first CVD
event and 244 age and smoking-matched
controls free of CVD
• Logistic regression estimated relative risks
and 95% CI’s, adjusted for BMI, diabetes,
HTN, hypercholesterolemia, exercise, family
hx, and trt
• Those who developed CVD events had
higher baseline CRP than controls; those in
the highest quartile of CRP had a 4.8-fold (4.1
adjusted) increased risk of any vascular
event. For MI or stroke, RR=7.3 (5.5
adjusted)
hs-CRP Adds to Predictive Value of TC:HDL
Ratio in Determining Risk of First MI
Figure: relative risk (axis 0.0 to 5.0) by hs-CRP level (low, medium, high) and total cholesterol:HDL ratio (low, medium, high).
Ridker et al, Circulation. 1998;97:2007–2011.
Examples where observational
studies have taken us down the
wrong path……
• Meta-analyses of observational studies have
shown a 50% lower risk of CHD among estrogen
users vs. non-users (groups that may have had many
unknown differences that were not adjusted for),
but recent randomized trials (HERS, WHI)
show no benefit
• Numerous prospective studies show a 25-50%
lower risk of CHD among those taking vitamin E
and other antioxidants vs. those not taking them, but recent
randomized trials (e.g., HOPE, HPS) show no
benefit.
Randomized Clinical Trial
• Considered the gold standard in proving
causation– e.g., by “reducing” putative risk
factor of interest
• Randomization “equalizes” known and
unknown confounders/covariates so that
results can be attributed to treatment with
reasonable confidence
• Inclusion and exclusion criteria can often be
strict (to maximize success of trial) and may
require screening numerous patients for each
patient randomized
Randomized Clinical Trials (2)
• Expensive, labor intensive, attrition from
loss to follow-up or poor compliance can
jeopardize results, esp. if it is larger than the
outcome difference between groups
• Conditions are highly controlled and
may not reflect clinical practice or the
real world
• Funding source of study and
commercial interests of investigators
can raise questions about conclusions
of study
Randomized Controlled Trials (3)
• Randomized controlled trial eliminates
systematic bias (in theory) by allocating
treatments among participants in a random
fashion
• The allocation process eliminates selection bias
in group characteristics (check comparability of
baseline characteristics such as age, gender,
severity of disease and covariate risk factors)
RCT’s (4)
• Need to check for any biases in treatments or
care provided between the groups (performance
bias)
• Need to check for differences in follow-up and
withdrawals between the groups– large
differences in loss to follow-up can compromise
validity of trial (exclusion bias)
• Need to check for any differences in how the
outcomes were ascertained between the groups
(detection bias)
Advantages of RCT’s
– Allows rigorous evaluation of a single
intervention in a well-defined population
– Prospective design (events occur after the
intervention)
– Presumably eradicates bias by comparing
two identical groups (but see below)
– Allows for meta-analysis
Disadvantages of RCT’s
• Expensive and time-consuming
• Often performed on too few patients, or
undertaken for too short a period
• Often funded by large research bodies or
pharmaceutical companies which dictate the
research agenda
• Often involves many inclusion and exclusion
criteria to recruit those who will respond to
intervention, thus limiting generalizability to a
more general patient population.
Completeness of Follow-up
–Conclusions of a study can be in
jeopardy if more subjects with unknown
outcomes are lost to follow-up than would
explain the differences in outcome.
–Ignoring those withdrawals will
often bias results in favor of the
intervention, so standard to analyze
results on an “intention-to-treat”
basis, including all who were
originally randomized.
Follow-up (cont.)
– Patient withdrawal may be caused by:
• Incorrect entry of patient into a trial
• Suspected adverse reaction to a drug
(although many drug AE’s are similar to
placebo AE’s)
• Loss of patient motivation
• Withdrawal by clinician for clinical reasons
• Loss to follow-up
• Death
Non-randomized Controlled Trials
• Treatment intervention may be applied in one
group of patients (hospitalized), and “control”
intervention in a separate group of patients from
another source (outpatient clinic)
• May be done when randomization is unethical or
inappropriate (e.g., trial examining exposure to
cigarette smoking)
• Need to check for any self-selection biases—are
there any baseline differences between the two
groups that could invalidate the effects of the
intervention? (e.g., treated group could have
more severe confounding risk factors)
Statistics and Statistical
Procedures for Cross-Sectional
and Case-Control Designs
– When both independent and dependent
variables are continuous: Pearson
correlation or linear/polynomial regression
– When dependent variable is continuous
and independent variables are categorical
(with or without continuous or categorical
covariates): Analysis of variance (Analysis of
covariance with covariates).
Analysis for Cross-Sectional and
Case Control Designs (cont.)
– When both independent and dependent
variables are categorical: Chi-square test of
proportions- prevalence odds ratio for
likelihood of factor Y in those with vs. w/o
factor X.
– When outcome is binary (e.g., survival) and
explanatory variables are categorical and/or
continuous:
• Student's t-test or Chi-square for initial analysis
• Logistic regression (multiple logistic regression for
covariate adjustment)
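The chi-square test of proportions and the prevalence odds ratio mentioned above can be sketched for a single 2x2 table as follows. The counts are hypothetical, the statistic is the Pearson chi-square without continuity correction, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p-value) with 1 df."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(statistic / 2))  # upper tail of chi-square, 1 df
    return statistic, p_value


# Hypothetical survey counts: rows = factor X present / absent,
# columns = factor Y present / absent.
stat, p = chi_square_2x2(30, 70, 15, 85)
prevalence_or = (30 * 85) / (70 * 15)
print(f"chi-square = {stat:.2f}, p = {p:.4f}, prevalence OR = {prevalence_or:.2f}")
```

This covers only the unadjusted comparison; adjustment for covariates would need logistic regression fitted in a statistical package.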
Wong et al. JACC 2003; 41: 1547-53.
Likelihood of CVD by Metabolic
Syndrome, Diabetes, and CRP
Levels
Malik and Wong et al., Diabetes Care 2005; 28: 690-3
Statistical Procedures for
Prospective Cohort Studies
• When outcome is continuous: Linear and/or
polynomial regression
• When outcome is binary: Relative risk (RR) for
incidence of disease in those with vs. without
risk factor of interest, adjusted for covariates and
considering follow-up time to event, from Cox
proportional hazards regression:
$HR(t, z_i) = HR_0(t)\,\exp(\alpha' z_i)$
• If follow-up time is not known, use logistic
regression:
$p(Y=1 \mid r_1, r_2, \ldots) = 1 / (1 + \exp[-a - b_1 r_1 - \ldots - b_n r_n])$
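The logistic regression formula above can be evaluated directly once coefficients are available. The sketch below assumes a hypothetical intercept a and coefficients b for age (per year) and diabetes (1 = yes, 0 = no); none of these numbers are estimated from any study cited here.

```python
from math import exp


def logistic_probability(intercept, coefficients, risk_factors):
    """p(Y=1 | r1, r2, ...) = 1 / (1 + exp[-a - b1*r1 - ... - bn*rn])."""
    linear = intercept + sum(b * r for b, r in zip(coefficients, risk_factors))
    return 1 / (1 + exp(-linear))


# Hypothetical model: intercept a, then coefficients for age and diabetes.
a = -5.0
b = [0.05, 0.7]

p_with_dm = logistic_probability(a, b, [60, 1])     # 60-year-old with diabetes
p_without_dm = logistic_probability(a, b, [60, 0])  # 60-year-old without diabetes

# The ratio of the two odds equals exp(b) for diabetes, i.e. the odds ratio.
odds_with = p_with_dm / (1 - p_with_dm)
odds_without = p_without_dm / (1 - p_without_dm)
print(round(p_with_dm, 3), round(p_without_dm, 3), round(odds_with / odds_without, 3))
```

The last line illustrates why exponentiated logistic coefficients are reported as covariate-adjusted odds ratios.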
Total Mortality Rates in U.S. Adults,
Age 30-75, with Metabolic Syndrome
(MetS), With and Without Diabetes
Mellitus and Pre-Existing CVD
NHANES II: 1976-80 Follow-up Study**
Figure: deaths per 1000 person-years (axis 0 to 50) for CHD mortality, CVD mortality, and total mortality, by group: No MetS or DM, MetS w/o DM, MetS w/DM, DM only, Prior CVD, Prior CVD and DM.
** Average of 13 years of follow-up.
Source: Malik and Wong et al., Circulation 2004;110:1245-50.
Malik and Wong et al., Circulation 2004; 110: 1239-44
Statistics and Statistical Procedures for
Randomized Clinical Trials
Relative risk (RR) of binary event
occurring in intervention vs. control
group:
- when follow-up time is known and
varies, use Cox PH regression, where
$RR = e^{\beta}$ for the trt var.
- when follow-up time is uniform or
unknown, use logistic regression
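As a small illustration of the relation between the regression coefficient for the treatment variable and the reported relative risk, the sketch below exponentiates a coefficient and its 95% confidence limits. The coefficient and standard error are hypothetical values chosen for the example, not output from any trial.

```python
from math import exp


def relative_risk_from_beta(beta, se, z=1.96):
    """Convert a Cox (or logistic) coefficient and its standard error into
    a ratio estimate with a 95% CI by exponentiating on the log scale."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical coefficient and standard error for the treatment variable.
rr, rr_lower, rr_upper = relative_risk_from_beta(-0.274, 0.030)
print(f"RR = {rr:.2f} (95% CI {rr_lower:.2f}-{rr_upper:.2f})")
```

A negative coefficient gives a ratio below 1.0, meaning fewer events in the intervention group than in the control group.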
Statistics and Statistical Procedures for
Randomized Clinical Trials (cont.)
For continuously measured outcomes,
(e.g., changes in blood pressure):
• Pre-post differences in a single group
examined by paired t-test
• Treatment vs. control differences examined
by Student’s T-test (ANCOVA used when
adjusting for covariates)
• repeated measures ANOVA / ANCOVA used
for multiple measures across a treatment
period and covariates
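The paired t-test for pre-post differences in a single group reduces to a one-sample test on the within-subject differences. The sketch below computes the t statistic and degrees of freedom only; the systolic blood pressure readings are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for pre-post measurements
    taken on the same subjects."""
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical systolic blood pressures (mm Hg) for six subjects.
sbp_before = [150, 142, 160, 155, 148, 152]
sbp_after = [140, 138, 150, 149, 141, 147]

t_stat, df = paired_t(sbp_before, sbp_after)
print(f"t = {t_stat:.2f} on {df} df")
```

The statistic would then be compared with the critical value of the t distribution for the given degrees of freedom (2.571 for 5 df at two-sided alpha = 0.05), or passed to a statistical package for an exact p-value.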
LaRosa et al., N Engl J Med 2005; 352
Questions to ask regarding study
results
• How large is the treatment effect (or likelihood of
outcome)?
– Relative risk reduction (may obscure comparative
absolute risks)
– Absolute risk reduction: is this clinically significant?
• How precise is the treatment effect (or likelihood
of outcome)?
– What are the confidence intervals?
– Do they exclude the null value?
(e.g., is the result statistically significant– magnitude
of Chi-square or F-value)
MRC/BHF Heart Protection Study
(HPS): Eligibility
• Age 40–80 years
• Increased risk of CHD death due to prior disease
– Myocardial infarction or other coronary heart
disease
– Occlusive disease of noncoronary arteries
– Diabetes mellitus or treated hypertension
• Total cholesterol > 3.5 mmol/L (> 135 mg/dL)
• Statin or vitamins not considered clearly indicated or
contraindicated by patient’s own doctors
Heart Protection Study Group. Lancet. 2002;360:7-22.
HPS: First Major Coronary
Event
Type of Major        Statin-Allocated   Placebo-Allocated   Ratio (95% CI)
Vascular Event       (n = 10269)        (n = 10267)
Coronary events
  Nonfatal MI        357 (3.5%)         574 (5.6%)
  Coronary death     587 (5.7%)         707 (6.9%)
  Subtotal: MCE      898 (8.7%)         1212 (11.8%)        0.73 (0.67-0.79), P < 0.0001
Revascularizations
  Coronary           513 (5.0%)         725 (7.1%)
  Noncoronary        450 (4.4%)         532 (5.2%)
  Subtotal: any RV   939 (9.1%)         1205 (11.7%)        0.76 (0.70-0.83), P < 0.0001
Any MVE              2033 (19.8%)       2585 (25.2%)        0.76 (0.72-0.81), P < 0.0001
Figure: forest plot of the ratios above (axis 0.4 to 1.4); values below 1.0 are labeled "Statin Better" and values above 1.0 "Placebo Better".
These results from the Heart Protection Study are frequently presented as a relative risk reduction of 24%
(relative risk of 0.76), but the absolute risk reduction associated with simvastatin treatment
was only 5.5%.
Heart Protection Study Collaborative Group. Lancet. 2002;360:7-22.
Relative vs. Absolute Risk: The
Example from The Women’s Health
Initiative
• Those randomized to estrogen/progestin,
compared to placebo, had statistically
significant increased risks:
– Breast cancer 26% (8/10,000 person
years)
– Total coronary heart disease 29%
(7/10,000 person years)
– Stroke 41% (8/10,000 person years)
– Pulmonary embolism 2.1 X (8/10,000
person years)
– Protective for colorectal cancer (37%
lower) and hip fracture (34% lower); no
effect on endometrial cancer or total mortality
Examining Magnitude of Effect: HPS Study
Example of Vascular Event Reduction
                          Event Yes    Event No
Simvastatin / Treatment   a = 2042     b = 8227
Placebo / Control         c = 2606     d = 7661
Control event rate (CER) = c/(c+d) = 2606/10267 = 0.254
Experimental event rate (EER) = a/(a+b) = 2042/10269 = 0.199
Relative Risk (RR) = EER/CER = 0.199/0.254 = 0.78
Relative Risk Reduction (RRR) = (CER-EER)/CER = (0.254-0.199)/0.254 = 0.22
Absolute Risk Reduction (ARR) = CER-EER = 0.254 - 0.199 = 0.055, or 5.5%
Number Needed to Treat = 1/ARR = 1/0.055 = 18.2 (or 55 events prevented
per 1000 treated)
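The calculations on this slide can be reproduced with a few lines of code. The sketch below uses the four cell counts a, b, c, and d from the 2x2 table above; the function and dictionary key names are illustrative.

```python
def effect_measures(a, b, c, d):
    """Effect measures from a 2x2 trial table:
    a, b = treated with / without event; c, d = control with / without event."""
    eer = a / (a + b)  # experimental event rate
    cer = c / (c + d)  # control event rate
    arr = cer - eer    # absolute risk reduction
    return {
        "CER": cer,
        "EER": eer,
        "RR": eer / cer,
        "RRR": arr / cer,
        "ARR": arr,
        "NNT": 1 / arr,
        "events_prevented_per_1000": 1000 * arr,
    }


# Cell counts from the HPS table above.
hps = effect_measures(2042, 8227, 2606, 7661)
for name, value in hps.items():
    print(f"{name}: {value:.3f}")
```

Reporting the absolute risk reduction and number needed to treat alongside the relative risk keeps a large relative effect from obscuring a modest absolute one.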
SUMMARY: Statistics and Statistical
Procedures
• Cross-sectional: Pearson correlation, Chi-square
test of proportions- prevalence odds ratio for
likelihood of factor Y in those with vs. w/o X
• Case-control: Odds ratio for likelihood of exposure
in diseased vs. non-diseased-- Chi-square test of
proportions / logistic regression
• Prospective: Relative risk (RR) for incidence of
disease in those with vs. without risk factor of
interest, adjusted for covariates and considering
follow-up time to event--Cox PH regression.
Correlations and linear/ transformed regression
used for continuous outcomes.
SUMMARY: Statistics and Statistical
Procedures (continued)
• Randomized clinical trial: Relative risk
(RR) of event occurring in intervention
vs. control group - Cox PH regression
– For continuously measured outcomes,
such as pre-post changes in risk factors
(lipids, blood pressure, etc.) initial
treatment vs. control differences examined
by Student’s T-test, repeated measures
ANOVA / ANCOVA used for multiple
measures across a treatment period and
covariates
Data Collection / Management
• Always have a clear plan on how to
collect data-- design and pilot
questionnaires, case report forms.
• The medical record should only
serve as source documentation to
back up what you have coded on
your forms
Data Collection / Management
(cont.)
• Use acceptable error checking data entry
screens or spreadsheet software (e.g.,
EXCEL) that is convertible into a statistical
package (SAS highly recommended and
avail via UCI site license)
• Carefully design the structure of your
database (e.g, one subject/ record, study
variables in columns, numeric coding of all
variables) so easily convertible for
statistical analysis
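The database structure and error checking described above (one subject per record, variables in columns, numeric coding) might look like the following sketch. The variable names, allowed ranges, and sample rows are all hypothetical; a real study would define them in its own codebook.

```python
import csv
import io

# Hypothetical codebook: variable name -> (minimum, maximum) allowed value.
CODEBOOK = {"age": (18, 100), "sex": (0, 1), "sbp": (70, 260)}


def check_records(rows):
    """Return a list of (subject id, variable, problem) for duplicate ids
    and for values that are missing, non-numeric, or out of range."""
    errors = []
    seen_ids = set()
    for row in rows:
        subject_id = row["id"]
        if subject_id in seen_ids:
            errors.append((subject_id, "id", "duplicate subject id"))
        seen_ids.add(subject_id)
        for variable, (low, high) in CODEBOOK.items():
            try:
                value = float(row.get(variable))
            except (TypeError, ValueError):
                errors.append((subject_id, variable, "missing or non-numeric"))
                continue
            if not low <= value <= high:
                errors.append((subject_id, variable, "out of range"))
    return errors


# Hypothetical data file: one subject per record, study variables in columns.
SAMPLE = "id,age,sex,sbp\n1,54,0,132\n2,61,1,410\n2,47,M,118\n"
errors = check_records(csv.DictReader(io.StringIO(SAMPLE)))
for problem in errors:
    print(problem)
```

Running checks like these at data entry, rather than at analysis time, makes it easier to go back to the source documentation while it is still at hand.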
Critical Appraisal
1. Why was the study done, and what
clinical question is being asked? (a brief
background, review of the literature, and
aim / hypothesis should be stated)
2. What type of study was done?
(experiment, clinical trial, observational
cohort or cross-sectional study, or
survey)
Critical Appraisal (cont.)
3. Was the design appropriate for the research?
• Clinical trial preferred to test efficacy of
treatments (e.g., HPS simvastatin trial)
• Cross-sectional study preferred for testing
validity of diagnostic/screening tests or risk
factor associations (e.g., NHANES III)
• Longitudinal cohort study preferred for
prognostic studies (e.g., Framingham)
• Case-control study best to examine effects
of a given agent in relation to occurrence of
an illness, esp. rare illnesses (e.g., cancer)
Questions to Ask Regarding Study
Design and Performance
• Was assignment of patients to treatments
randomized?
• Were all patients who entered the trial
accounted for?
• Was follow-up sufficiently long and complete?
• Were patients analyzed in the groups to which
they were randomized (intent to treat)?
• Were patients, health workers, and study
personnel “blinded” to treatment assignment?
Questions to Ask Regarding Study
Design and Performance (cont.)
• Were groups similar (or study sample
representative of population) at start of the
trial? (selection bias)
• Aside from experimental intervention, were
the groups treated equally? (performance
bias)
• Were objective and unbiased outcome
criteria used? (detection bias)
Questions to Ask Regarding
Statistical Analysis
• Was there sufficient power/sample size?
• Was the choice of statistical analysis
appropriate?
• Was the choice (and coding/classification) of
outcome and treatment variables appropriate?
• Is there an adequate description of magnitude
and precision of effect?
• Was there adjustment for potential confounders?
Will the results help me in caring
for my patients?
For a study evaluating therapy:
– Can the results be applied to my patient care?
(was the study or meta-analysis large enough
with adequate precision?)
– Were all clinically important treatment
outcomes considered? (were secondary
outcomes and adverse events assessed?)
– Are the likely treatment benefits worth the
potential harms and costs? (does the absolute
benefit outweigh the risk of adverse events
and cost of therapy?)
Will the results help me in caring
for my patients (cont.)?
For a study evaluating prognosis:
– Were the study patients similar to my own?
(demographically representative, stage of
disease)
– Will the results lead directly to selecting or
avoiding therapy? (useful to know clinical
course of pts.)
– Are the results useful for reassuring or
counseling patients? (a valid, precise result of
a good prognosis is useful in this case)
Measures of Precision of Effect
• The p-value, judged against the alpha (Type I) error level, most
commonly indicates the strength of evidence against the null
hypothesis, with a low p-value meaning the result is unlikely to be
due to chance alone.
• A t-statistic, Chi-square, or r-square value gives the relative
magnitude of a relation.
• An F-statistic (or multiple r-square) identifies the magnitude
of the variance in the dependent variable explained by the
treatment or explanatory variable(s)
• A Wald or Likelihood Ratio Chi-square statistic is frequently
used in logistic or Cox regression survival analysis.
• The higher the magnitude of the above statistics, the more
precise or stronger is the relationship between the
explanatory variable(s) and the outcome of interest.
Precision of Effect: The Confidence
Interval
• The estimate of where the true value of a result lies is
expressed as a 95% confidence interval, which will
contain the true relative risk or odds ratio 95% of the
time. A result is significant at 2-tailed alpha=0.05 when the
interval excludes the null value (e.g., RR=1.0 is excluded)
• 95% confidence intervals are the RR ± 1.96 × SE (since
SE is SD / sqrt(N), confidence intervals are narrowest, and
precision greatest, in larger studies).
• 95% CI of the ARR is ARR ± 1.96 × square root of
[CER × (1-CER) / # of control patients + EER × (1-EER) /
# of exp’l patients]
• 95% CI for NNT = 1 / [95% CI for ARR]
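As a rough illustration of the ARR and NNT interval formulas above, the following Python sketch applies them to hypothetical event rates. The function names and the example figures (CER 20%, EER 15%, 1000 patients per arm) are assumptions made for illustration, not data from any trial.

```python
from math import sqrt

def arr_confidence_interval(cer, eer, n_control, n_exp, z=1.96):
    """Return (ARR, lower limit, upper limit) using the normal approximation."""
    arr = cer - eer
    se = sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_exp)
    return arr, arr - z * se, arr + z * se

def nnt_interval(arr_low, arr_high):
    """Invert the ARR limits to obtain NNT limits.

    Only meaningful when the whole ARR interval lies above zero.
    """
    if arr_low <= 0:
        raise ValueError("ARR interval includes zero; NNT interval is not finite")
    return 1 / arr_high, 1 / arr_low

# Hypothetical example values (not from a real study)
arr, arr_low, arr_high = arr_confidence_interval(
    cer=0.20, eer=0.15, n_control=1000, n_exp=1000
)
nnt_low, nnt_high = nnt_interval(arr_low, arr_high)
print(f"ARR = {arr:.3f} (95% CI {arr_low:.3f} to {arr_high:.3f})")
print(f"NNT = {1 / arr:.0f} (95% CI {nnt_low:.0f} to {nnt_high:.0f})")
```

Note that the larger ARR limit gives the smaller NNT limit, so the two ends swap when the interval is inverted.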
Where to Go for Help
• Epidemiology and statistics books
• Statistical Consulting Center
• Dean’s Scientific Review Committee considers
appropriateness of research design, procedures, and
statistical considerations for UCI-COM
investigator-initiated studies
Sample Size Considerations
• What level of difference between the two
groups constitutes a clinically significant
effect one wishes to detect? (e.g.,
difference in mean SBP response,
difference in treatment vs. control
incidence rates of CHD, or relative risk; if
the outcome is continuous, know the mean and SD)
Guidelines for Sample Size /
Power Determination
• Necessary for any research grant application
• Need to estimate what “control group” rate of
disease or outcome is
• Need to state what is minimum difference (effect
size) you want to detect that is clinically
significant--e.g., difference in rates, or risk ratio
• Either power can be estimated for a fixed sample
size at fixed alpha (usually 0.05 two-tailed) for
different effect sizes, OR sample size can be
estimated for a given power (usually 0.80) for
different effect sizes
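One common way to carry out the second approach (sample size for a given power) is the normal-approximation formula for comparing two proportions. The sketch below is a minimal version under that assumption; the function name and the example event rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate n per group to detect p_control vs. p_treatment (two-tailed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment               # minimum difference to detect
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical example: 20% control event rate, with the effect of interest
# being a reduction to 15% or to 10% in the treated group.
for p_treat in (0.15, 0.10):
    n = sample_size_per_group(0.20, p_treat)
    print(f"control 20% vs. treated {p_treat:.0%}: about {n} per group")
```

Halving the detectable difference roughly quadruples the required sample size, which is why the minimum clinically significant effect has to be stated up front.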
Statistical significance and
power
• Statistical significance is based on the Type I
or Alpha error
– the probability of rejecting the null hypothesis
when it was true (saying there was a relationship
when there isn’t one)
– usually we accept being wrong <5% of the time, or
alpha=0.05
– Setting alpha depends on how important it is that
we not make a mistake in our conclusion.
• The Type II or Beta error is the probability of
accepting the null when it was false
– saying there is no relationship when there is one
– power is 1-beta, and 80% or 90% (beta error of 20%
or 10%) is conventional.
Power of a Test
• Power of a test is the probability of detecting a
true result or difference (rejecting the null
hypothesis of no difference when it is false), also
1-beta
• Beta error is the probability of accepting a false
null hypothesis (e.g., saying there is no
difference or relationship when there is one).
• For instance, if the null hypothesis is Mean group
A = Mean group B and A really is different from B,
beta error is the likelihood of concluding there is no
difference (accepting a false null hypothesis).
Ideally this should be <0.20, so that power (1-beta)
is at least 0.80.
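The relationship between beta, power, and sample size can be sketched with a normal approximation for comparing two group means. The function below is illustrative only: its name, the 5 mmHg difference, and the SD of 10 mmHg are assumed values, and it ignores the negligible chance of rejecting the null in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a true mean difference `delta` (two-tailed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # SE of the difference between two means
    return NormalDist().cdf(abs(delta) / se - z_alpha)

# Hypothetical example: true difference of 5 mmHg in SBP, SD of 10 mmHg
for n in (20, 64, 100):
    power = power_two_means(delta=5, sd=10, n_per_group=n)
    print(f"n = {n:3d} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```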
Fallacies in Presenting Results:
Statistically vs. Clinically Significant?
• Having a large sample size can virtually assure
statistically significant results even if the
correlation, odds ratio, or relative risk are low
• Conversely, an insufficient sample size can hide
(not significant) clinically important differences
(higher beta error or concluding no difference
when there is one)
• Statistical significance is directly related to sample
size and magnitude of difference, and inversely
related to variance in the measure
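Both fallacies can be seen in a small numerical sketch using a pooled two-proportion z-test with equal group sizes. The function name and the event rates are hypothetical, chosen only to show a trivial effect reaching significance in a very large sample and a large effect missing it in a small one.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-tailed p-value from a pooled z-test with equal group sizes."""
    pooled = (p1 + p2) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical event rates, for illustration only
small_effect_huge_n = two_proportion_p_value(0.105, 0.100, 200_000)  # RR = 1.05
large_effect_small_n = two_proportion_p_value(0.20, 0.10, 50)        # RR = 2.0
print(f"RR 1.05, n = 200,000 per group: p = {small_effect_huge_n:.2g}")
print(f"RR 2.0,  n = 50 per group:      p = {large_effect_small_n:.2g}")
```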
Variable Classification
• What is your outcome (Y) (dependent variable) of interest?
– Categorical (binary, 3 or more categories) examples:
survival, CHD incidence, achievement of BP control
(yes vs. no)
– Continuous: change in blood pressure
• What is the main explanatory or independent variable (X)
of interest?
– Categorical (binary, 3 or more categories) examples:
treatment status (active vs. placebo), JNC-7 blood
pressure category (normal, pre-HTN, Stage 1 HTN,
Stage 2 HTN)
– Continuous: baseline systolic / diastolic blood pressure
Covariates / Confounders
• The relationship between X and Y may be
partially or completely due to one or more
covariates (C1, C2, C3, etc.) if these
covariates are related to both X and Y
• A comparison of baseline treatment group
differences in all possible known
covariates is often done and presented
• Covariates / confounders are normally
equalized between groups only in
randomized clinical trial designs
Analyzing Effects of Confounders
• The effect of confounders can be
assessed by:
– Stratifying your analysis by levels of these
variables (e.g., examine relationship of X and
Y separately among levels of covariates C)
– Adjusting for covariates in a multivariable
analysis
– Considering interaction terms to test whether
effect of one factor (e.g., treatment) on
outcome varies by level of another factor
(e.g., gender)
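As one illustration of stratifying and adjusting, the sketch below compares a crude odds ratio with stratum-specific odds ratios and a Mantel-Haenszel pooled odds ratio. The Mantel-Haenszel estimator is one standard way to adjust for a categorical covariate and is an example chosen here, not a method named in the slides; the counts are invented so that an apparent association disappears after stratification.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed with/without outcome;
    c, d = unexposed with/without outcome."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR across strata, each given as a tuple (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts for two levels of a confounder C
strata = [
    (10, 90, 40, 360),    # C absent
    (120, 280, 30, 70),   # C present
]

# Crude table: add the strata together cell by cell, ignoring C
crude = odds_ratio(*(sum(cell) for cell in zip(*strata)))
print(f"Crude OR: {crude:.2f}")
for label, table in zip(("C absent", "C present"), strata):
    print(f"OR within stratum ({label}): {odds_ratio(*table):.2f}")
print(f"Mantel-Haenszel adjusted OR: {mantel_haenszel_or(strata):.2f}")
```

In these made-up data C is related to both exposure and outcome, so the crude odds ratio is well above 1 even though the odds ratio within each level of C is 1.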
Fallacies in Presenting Results:
Statistically vs. Clinically Significant?
• Having a large sample size can virtually assure
statistically significant results, but often with a
very low effect size or relative risk
• Conversely, an insufficient sample size can hide
(not significant) clinically important differences
where the effect size or relative risk may be
large.
• Statistical significance is directly related to
sample size and magnitude of effect or
difference, and inversely related to variance in
the measure.
Assessing Accuracy of a Test
                          TRUE DISEASE STATUS / TREATMENT DIFFERENCE
TEST RESULT               DISEASED / YES    NONDISEASED / NO    TOTAL
POSITIVE / reject null    a                 b                   a+b
NEGATIVE / accept null    c                 d                   c+d
TOTAL                     a+c               b+d                 a+b+c+d
SENSITIVITY = a / (a+c)
SPECIFICITY = d / (b+d)
Pos. Pred. Value = a / (a+b)
Neg. Pred. Value = d / (c+d)
False positive error (alpha, Type I) = b / (b+d)
False negative error (beta, Type II) = c/ (a+c)
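The formulas above translate directly into a few lines of Python. This is a minimal sketch: the function name and the cell counts are hypothetical, and the arguments follow the a, b, c, d layout of the table.

```python
def accuracy_measures(a, b, c, d):
    """Measures from the 2x2 table: a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
        "false_positive_error": b / (b + d),   # alpha, Type I
        "false_negative_error": c / (a + c),   # beta, Type II
    }

# Hypothetical counts for illustration only
results = accuracy_measures(a=90, b=50, c=10, d=850)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and the false negative error sum to 1, as do specificity and the false positive error, which mirrors the relationship between power and beta.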