Reading and interpretation of clinical studies
Medical statistics for
cardiovascular disease
Part 1
Giuseppe Biondi-Zoccai, MD
Sapienza University of Rome, Latina, Italy
[email protected]
[email protected]
Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods
Why do you need to know statistics?
CLINICIAN
RESEARCHER
A collection of methods
The EBM 3-step approach
How an article should be appraised, in 3 steps:
Step 1 – Are the results of the study (internally) valid?
Step 2 – What are the results?
Step 3 – How can I apply these results to patient care?
Guyatt and Rennie, Users’ guide to the medical literature, 2002
The Cochrane Collaboration Risk of Bias Tool
http://www.cochrane.org
The ultimate goal of any clinical or scientific
observation is the appraisal of causality
Bradford Hill causality criteria
• Strength:* precisely defined (p<0.05, weaker criterion) and
with a strong relative risk (≤0.83 or ≥1.20) in the absence
of multiplicity issues (stronger criterion)
• Consistency:* results in favor of the association must be
confirmed in other studies
• Temporality: exposure must precede the event in a
realistic fashion
• Coherence: the hypothetical cause-effect relationship is not
in contrast with other biologic or natural history findings
*statistics is important here
Mente et al, Arch Intern Med 2009
Bradford Hill causality criteria
• Biologic gradient:* exposure dose and risk of disease are
positively (or negatively) associated on a continuum
• Experimental: experimental evidence from laboratory
studies (weaker criterion) or randomized clinical trials
(stronger criterion)
• Specificity: exposure is associated with a single disease
(does not apply to multifactorial conditions)
• Plausibility: the hypothetical cause-effect relationship makes
sense from a biologic or clinical perspective (weaker criterion)
• Analogy: the hypothetical cause-effect relationship is based on
analogic reasoning (weaker criterion)
*statistics is important here
Mente et al, Arch Intern Med 2009
Randomization
• Is the technique which defines experimental
studies in humans (but not only in them), and
enables the correct application of statistical tests of
hypothesis in a frequentist framework (according to
Ronald Fisher's theory)
• Randomization means assigning at random a
patient (or a study unit) to one of the treatments
• Over large numbers, randomization minimizes the
risk of imbalances in patient or procedural
features, but this does not hold true for small
samples and for a large set of features
Any clinical or scientific comparison
can be viewed as…
A battle between an underlying hypothesis (null,
H0), stating that there is no meaningful difference
or association (beyond random variability) between
2 or more populations of interest (from which we
are sampling) and an alternative hypothesis (H1),
which implies that there is a non-random
difference between such populations.
Any statistical test is a test trying to convince us
that H0 is false (thus implying the working
truthfulness of H1).
Falsifiability
• Falsifiability or refutability of a statement,
hypothesis, or theory is an inherent possibility
to prove it to be false.
• A statement is called falsifiable if it is possible
to conceive an observation or an argument
which proves the statement in question to be
false.
• In this sense, falsify is synonymous with
nullify, meaning not "to commit fraud" but
"to show to be false".
Statistical or clinical significance?
• Statistical and clinical significance are 2 very
different concepts.
• A clinically significant difference, if demonstrated
beyond the play of chance, is clinically relevant
and thus merits subsequent action (provided costs and
tolerability issues are not prohibitive).
• A statistically significant difference is a
probabilistic concept and should be viewed in
light of the distance from the null hypothesis and
the chosen significance threshold.
Descriptive statistics
[Slide figure: observed values summarized by their AVERAGE]
Inferential statistics
If I become a scaffolder, how likely am I to eat well every day?
P values
Confidence intervals
Samples and populations
This is a sample
Samples and populations
And this is its
universal population
Samples and populations
This is another sample
Samples and populations
And this might be its
universal population
Samples and populations
But what if THIS is its
universal population?
Samples and populations
Any inference thus
depends on our confidence
in its likelihood
Alpha and type I error
Whenever I perform a test, there is thus a
risk of a FALSE POSITIVE result, ie
REJECTING A TRUE null hypothesis.
This error is called type I, is measured as
alpha and its unit is the p value.
The lower the p value, the lower the risk of
falling into a type I error (ie the HIGHER the
SPECIFICITY of the test).
Alpha and type I error
Type I error is
like a MIRAGE
Because I see something
that does NOT exist
Beta and type II error
Whenever I perform a test, there is also a risk
of a FALSE NEGATIVE result, ie NOT
REJECTING A FALSE null hypothesis.
This error is called type
II, is measured as
beta, and its unit is a probability.
The complementary of beta is called power.
The lower the beta, the lower the risk of
missing a true difference (ie the HIGHER the
SENSITIVITY of the test).
Beta and type II error
Type II error is
like being BLIND
Because I do NOT see
something that exists
Accuracy and precision
[Figure: measurements scattered around the true value, showing the measurement spread]
Accuracy measures the distance from the true value
Precision measures the spread in the measurements
Accuracy and precision test
Accuracy and precision
Thus:
• Precision expresses
the extent of
RANDOM ERROR
• Accuracy expresses
the extent of
SYSTEMATIC ERROR
(ie bias)
Validity
Internal validity entails both PRECISION
and ACCURACY (ie does a study provide a
truthful answer to the research question?)
External validity expresses the extent to
which the results can be applied to other
contexts and settings. It corresponds to the
distinction between SAMPLE and
POPULATION.
Intention-to-treat analysis
• Intention-to-treat (ITT) analysis is an
analysis based on the initial treatment
intent, irrespective of the treatment
eventually administered.
• ITT analysis is intended to avoid various types of
bias that can arise in intervention research,
especially procedural, compliance and survivor
bias.
• However, ITT dilutes the power to achieve
statistically and clinically significant differences,
especially as drop-in and drop-out rates rise.
Per-protocol analysis
• In contrast to the ITT analysis, the per-protocol
(PP) analysis includes only those patients who
complete the entire clinical trial or other particular
procedure(s), or have complete data.
• In PP analysis each patient is
categorized according to the actual
treatment received, and not according
to the originally intended treatment
assignment.
• PP analysis is largely prone to bias,
and is useful almost only in
equivalence or non-inferiority studies.
ITT vs PP
100 pts enrolled
RANDOMIZATION:
– 50 pts to group A (more toxic)
– 50 pts to group B (conventional Rx, less toxic)
ACTUAL THERAPY:
– 45 pts treated with A; 5 shifted to B because of poor global health (all 5 died)
– 50 pts treated with B (none died)
• ITT: 10% mortality in group A vs 0% in group
B, p=0.021 in favor of B
• PP: 0% (0/45) mortality in group A vs 9.1%
(5/55) in group B, p=0.038 in favor of A
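As an aside, a minimal Python sketch (assuming scipy is available) that approximately reproduces the two p-values above; the quoted figures are consistent with an uncorrected chi-square test on the corresponding 2x2 tables:

```python
from scipy.stats import chi2_contingency

# Intention-to-treat: groups as randomized (5/50 deaths in A vs 0/50 in B)
itt = [[5, 45], [0, 50]]
# Per-protocol: groups as actually treated (0/45 deaths in A vs 5/55 in B)
pp = [[0, 45], [5, 50]]

for label, table in [("ITT", itt), ("PP", pp)]:
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    print(f"{label}: chi2={chi2:.2f}, p={p:.3f}")
# ITT: p ≈ 0.02 in favor of B; PP: p ≈ 0.04 in favor of A
```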
Mean (arithmetic)
Characteristics:
– summarises information well
– discards a lot of information (dispersion??)

mean = Σx / N

Assumptions:
– data are not skewed
  (skew distorts the mean; outliers make the mean very different)
– measured on a measurement scale
  (cannot find the mean of a categorical measure:
  an 'average' stent diameter may be meaningless)
Median
What is it?
– The one in the middle
– Place values in order
– Median is central
Definition:
– The value with as many observations below it as above it
Used for:
– Ordinal data
– Skewed data / outliers
Standard deviation
Standard deviation (SD):
– approximates the population σ as N increases

Variance = Σ(x - x̄)² / (N - 1); SD = √variance

Advantages:
– with the mean, enables a powerful synthesis:
  mean ± 1*SD ≈ 68% of data
  mean ± 2*SD ≈ 95% of data (exactly 1.96 SD)
  mean ± 3*SD ≈ 99% of data (exactly 2.58 SD)
Disadvantages:
– is based on normal assumptions
Interquartile range
25th to 75th percentile,
or 1st to 3rd quartile
Example (lesion lengths tabulated below):
1st-3rd quartile = 16.5; 23.5
Interquartile range = 23.5 - 16.5 = 7.0
(median = 19)
Variable type: continuous

Patient ID    Lesion length
11            14
6             15
7             16
3             17
1             18
8             18
10            19
9             21
12            22
5             23
2             24
4             25
13            27
Coefficient of variation

CV = (standard deviation / mean) x 100

The coefficient of variation (CV) is an index of relative variability.
CV is dimensionless.
CV enables you to compare the data dispersion of variables with
different units of measurement.
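A minimal Python sketch (standard library only) computing these descriptive statistics for the lesion-length data tabulated above; the quartile method used here places cut points at p·(n+1), which matches the 16.5 and 23.5 quoted in the interquartile-range example:

```python
import statistics

lesion_length = [14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27]

mean = statistics.mean(lesion_length)
median = statistics.median(lesion_length)
sd = statistics.stdev(lesion_length)                     # sample SD, divides by N - 1
q1, q2, q3 = statistics.quantiles(lesion_length, n=4)    # default 'exclusive' method

print(f"mean={mean:.1f}, median={median}, SD={sd:.1f}")
print(f"Q1={q1}, Q3={q3}, IQR={q3 - q1}")                # Q1=16.5, Q3=23.5, IQR=7.0
print(f"CV={100 * sd / mean:.0f}%")                      # coefficient of variation
```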
Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods
Point estimation & confidence intervals
• Using summary statistics (mean and standard
deviation for normal variables, or proportion
for categorical variable) and factoring sample
size, we can build confidence intervals or test
hypotheses that we are sampling from a given
population or not
• This can be done by creating a powerful tool,
which weighs our dispersion measures by
means of the sample size: the standard error
First you need the SE
• We can easily build the standard error of a
proportion, according to the following
formula:
SE = √[ P * (1-P) / n ]

where the variance = P * (1-P) and n is the
sample size
Point estimation & confidence intervals
• We can then create a simple test to check
whether the summary estimate we have found
is compatible, allowing for random variation,
with the corresponding reference population
mean
• The Z test (when the population SD is known) and
the t test (when the population SD is only
estimated), are thus used, and both can be
viewed as a signal to noise ratio
Signal to noise ratio

Signal to noise ratio = Signal / Noise

From the Z test…

Z score = (absolute difference in summary estimates) / standard error
Results of z score correspond to a distinct tail probability of the Gaussian curve (eg
1.96 corresponds to a 0.025 one-tailed probability or 0.050 two-tailed probability)
…to confidence intervals
Standard error (SE or SEM) can be used to test a hypothesis or
create a confidence interval (CI) around a mean for a
continuous variable (eg mortality rate)
SE = SD / √n

95% CI = mean ± 2 SE

95% means that, if we repeat the study 20 times, 19 times out of
20 the interval will include the true population average
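A minimal Python sketch of these formulas (standard library only; the numbers are illustrative rather than taken from a real study): the standard error and 95% confidence interval of a mean and of a proportion:

```python
import math

# Continuous variable: illustrative mean, SD and sample size
mean, sd, n = 2.75, 0.45, 562
se_mean = sd / math.sqrt(n)
print(f"SE={se_mean:.3f}, 95% CI {mean - 1.96 * se_mean:.2f} to {mean + 1.96 * se_mean:.2f}")

# Categorical variable: illustrative 47 events out of 592 patients
p = 47 / 592
se_prop = math.sqrt(p * (1 - p) / 592)
print(f"SE={se_prop:.3f}, 95% CI {p - 1.96 * se_prop:.3f} to {p + 1.96 * se_prop:.3f}")
```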
Ps and confidence intervals
P values and confidence intervals
are strictly connected
Any hypothesis test providing a
significant result (eg p=0.045) means that
we can be 95.5% confident that the
population average difference differs
from zero (ie from the null hypothesis)
P values and confidence intervals
[Figure: confidence intervals plotted against the null hypothesis (H0) and against thresholds for trivial vs important differences; intervals excluding H0 correspond to a significant difference (p<0.05), intervals crossing H0 to a non-significant difference (p>0.05)]
Power and sample size
Whenever designing a study or analyzing a dataset, it is
important to estimate the sample size or the power of the
comparison.
SAMPLE SIZE
Setting a specific alpha and a specific beta, you calculate
the necessary sample size given the average inter-group
difference and its variation.
POWER
Given a specific sample size and alpha, in light of the
calculated average inter-group difference and its
variation, you obtain an estimate of the power (ie 1-beta).
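As an illustration, a minimal Python sketch (assuming scipy is available) of the standard normal-approximation sample-size formula for comparing two proportions; the event rates used are purely illustrative:

```python
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    # Patients per arm to compare two proportions (normal approximation)
    z_alpha = norm.ppf(1 - alpha / 2)     # two-sided alpha
    z_beta = norm.ppf(power)              # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Illustrative: detect a reduction in event rate from 15% to 8%
print(round(n_per_group(0.15, 0.08)))     # ≈ 322 patients per group
```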
Hierarchy of analysis
• A statistical analysis can be:
– Univariate (e.g. when describing a mean or
standard deviation)
– Bivariate (e.g. when comparing age in men and
women)
– Multivariable (e.g. when appraising how age
and gender impact on the risk of death)
– Multivariate (e.g. when appraising how age and
gender simultaneously impact on risk of death
and hospital costs)
Types of variables
Variables:
• PAIRED OR REPEATED MEASURES
eg blood pressure measured twice in the same patients at different times
• UNPAIRED OR INDEPENDENT MEASURES
eg blood pressure measured in several different groups of patients only once
Types of variables
Variables:
• CATEGORY
– nominal
– ordinal (ordered categories, ranks)
• QUANTITY
– discrete (counting)
– continuous (measuring)
Statistical tests
Are data categorical or continuous?
• Categorical data: compare proportions in groups
• Continuous data: compare means or medians in groups
How many groups?
• Two groups; normal data?
– Normal data: use t test
– Non-normal data: use Mann Whitney U test
• More than two groups; normal data?
– Normal data: use ANOVA
– Non-normal data: use Kruskal Wallis
Student t test
• It is used to test the null hypothesis that the means of
two normally distributed populations are equal
• Given two data sets (each with its mean, SD and number
of data points) the t test determines whether the means
are distinct, provided that the underlying distributions
can be assumed to be normal
• The Student t test should be used if the variances (not
known) of the two populations are also assumed to be
equal; the form of the test used when this assumption is
dropped is sometimes called Welch's t test
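A minimal Python sketch (assuming scipy is available; the late-loss values are invented) contrasting the Student and Welch forms of the test:

```python
from scipy.stats import ttest_ind

# Hypothetical late-loss measurements (mm) in two stent groups
group_a = [0.21, 0.35, 0.18, 0.40, 0.29, 0.33, 0.25]
group_b = [0.48, 0.52, 0.37, 0.61, 0.44, 0.50, 0.39]

t_student, p_student = ttest_ind(group_a, group_b, equal_var=True)   # Student t test
t_welch, p_welch = ttest_ind(group_a, group_b, equal_var=False)      # Welch's t test
print(f"Student: t={t_student:.2f}, p={p_student:.4f}")
print(f"Welch:   t={t_welch:.2f}, p={p_welch:.4f}")
```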
Mann Whitney rank sum U test
Ranks (late loss by stent type):
  A (cypher): N=267, mean rank 266.65, sum of ranks 71194.50
  B (taxus):  N=295, mean rank 294.94, sum of ranks 87008.50
  Total:      N=562
Test statistics (grouping variable: stent type):
  Mann-Whitney U = 35416.5; Wilcoxon W = 71194.5; Z = -2.063; asymptotic 2-tailed p = 0.032
Paired Student t test
EF at baseline and FU in patients treated with BMC for MI: 55.1% (7.4) and 48.7% (8.3)
Only 11 patients!
Significant increase in EF by paired t test, P=0.005
MAGIC, Lancet 2004
Wilcoxon signed rank test
Descriptive statistics (N=562 for each variable):
  mld post: minimum 1.50, maximum 4.40; 25th percentile 2.44, median 2.75, 75th percentile 3.10
  mld fu:   minimum 0.00, maximum 4.31; 25th percentile 1.87, median 2.40, 75th percentile 2.84
Ranks (mld fu - mld post):
  Negative ranks (mld fu < mld post): N=407, mean rank 322.51, sum of ranks 131263.00
  Positive ranks (mld fu > mld post): N=153, mean rank 168.74, sum of ranks 25817.00
  Ties (mld fu = mld post): N=2; Total N=562
Wilcoxon signed ranks test (based on positive ranks): Z = -13.764, asymptotic 2-tailed p < 0.001
1-way ANOVA
•As with the t-test, ANOVA is appropriate when
the data are continuous, when the groups are
assumed to have similar variances, and when the
data are normally distributed
•ANOVA is based upon a comparison of variance
attributable to the independent variable
(variability between groups or conditions)
relative to the variance within groups resulting
from random chance. In fact, the formula
involves dividing the between-group variance
estimate by the within-group variance estimate
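A minimal Python sketch (assuming scipy is available) of a one-way ANOVA; the raw values are invented so that their means, SDs, minima and maxima match the descriptives shown on the post-hoc slide that follows, since the actual raw data are not reported:

```python
from scipy.stats import f_oneway

# Hypothetical blood pressure (mmHg) 2 months after treatment in three arms
placebo = [87, 89, 89, 91, 94]   # mean 90, SD 2.65
drug_a = [74, 79, 79, 84]        # mean 79, SD 4.08
drug_b = [84, 86, 86, 88]        # mean 86, SD 1.63

f_stat, p_value = f_oneway(placebo, drug_a, drug_b)   # between- vs within-group variance
print(f"F={f_stat:.2f}, p={p_value:.4f}")
```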
Post-hoc test
Descriptives, blood pressure post 2 months:
  placebo: N=5,  mean 90.00, SD 2.646, SE 1.183, 95% CI 86.71 to 93.29, min 87, max 94
  A:       N=4,  mean 79.00, SD 4.082, SE 2.041, 95% CI 72.50 to 85.50, min 74, max 84
  B:       N=4,  mean 86.00, SD 1.633, SE 0.816, 95% CI 83.40 to 88.60, min 84, max 88
  Total:   N=13, mean 85.38, SD 5.455, SE 1.513, 95% CI 82.09 to 88.68, min 74, max 94
Multiple comparisons (Bonferroni), dependent variable: blood pressure post 2 months:
  placebo vs A: mean difference 11.00*, SE 1.967, p=0.001, 95% CI 5.35 to 16.65
  placebo vs B: mean difference 4.00,   SE 1.967, p=0.208, 95% CI -1.65 to 9.65
  A vs B:       mean difference -7.00*, SE 2.074, p=0.021, 95% CI -12.95 to -1.05
*The mean difference is significant at the .05 level.
Kruskal Wallis test
Ranks, blood pressure post 1 month by drug:
  placebo: N=5, mean rank 10.50
  A:       N=4, mean rank 3.13
  B:       N=4, mean rank 6.50
  Total:   N=13
Test statistics (Kruskal Wallis test, grouping variable: drug):
  Chi-square = 8.339, df = 2, asymptotic p = 0.015
Post-hoc analysis with Mann Whitney U and Bonferroni correction
Compare continuous variables
Three (or more) paired groups
Again ask yourself… Parametric or not?
If parametric: ANOVA for repeated measures
in SPSS… in the General Linear Model
If non-parametric: Friedman test
Friedman test
[SPSS output garbled in extraction; the recoverable results are summarized below]
Blood pressure measured pre, post 1 month and post 2 months, analyzed separately within each drug group.
Mean ranks (pre / post 1 month / post 2 months):
  placebo: 2.00 / 1.90 / 2.10
  drug A:  3.00 / 2.00 / 1.00
  drug B:  3.00 / 1.00 / 2.00
Friedman test statistics:
  placebo: N=5, Chi-square=0.111, df=2, asymptotic p=0.946
  drug A:  N=4, Chi-square=8.000, df=2, asymptotic p=0.018
  drug B:  N=4, Chi-square=8.000, df=2, asymptotic p=0.018
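A minimal Python sketch (assuming scipy is available; the repeated blood-pressure values are invented) of a Friedman test on three repeated measurements taken in the same patients:

```python
from scipy.stats import friedmanchisquare

# Hypothetical repeated measurements in the same 4 patients
bp_pre = [88, 93, 95, 86]
bp_1month = [83, 87, 90, 80]
bp_2months = [82, 85, 89, 78]

stat, p = friedmanchisquare(bp_pre, bp_1month, bp_2months)
print(f"Friedman chi-square={stat:.2f}, p={p:.3f}")
```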
2-way ANOVA
A mixed-design ANOVA is used to test for differences between independent
groups whilst subjecting participants to repeated measures. In a mixed-design
ANOVA model, one factor is a between-subjects variable (drug) and the other is
a within-subjects variable (BP)
CAMELOT, JAMA 2004
Binomial test
Is the percentage of diabetics in this sample
comparable with the known chronic AF population? We
assume the population rate is at 15%
Binomial Test, variable DIABETES:
  Group 1 (yes): N=5,  observed proportion 0.38
  Group 2 (no):  N=8,  observed proportion 0.62
  Total:         N=13, proportion 1.00
  Test proportion = 0.15; exact 1-tailed p = 0.034
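A minimal Python sketch (assuming scipy is available) reproducing this one-tailed exact binomial test of 5 diabetics out of 13 against the assumed population rate of 15%:

```python
from scipy.stats import binomtest

result = binomtest(k=5, n=13, p=0.15, alternative='greater')
print(f"observed proportion = {5 / 13:.2f}")
print(f"exact 1-tailed p = {result.pvalue:.3f}")   # ≈ 0.034
```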
Compare discrete variables
The second basis is the relation between "observed" and "expected" counts
Compare event rates
• Absolute Risk (AR)
7.9% (47/592) & 15.1% (89/591)
• Absolute Risk Reduction (ARR)
7.9% (47/592) – 15.1% (89/591) = -7.2%
• Relative Risk (RR)
7.9% (47/592) / 15.1% (89/591) = 0.52
(given an equivalence value of 1)
• Relative Risk Reduction (RRR)
1 – 0.52 = 0.48 or 48%
• Odds Ratio (OR)
8.6% (47/545) / 17.7% (89/502) = 0.49
(given an equivalence value of 1)
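A minimal Python sketch recomputing these event-rate measures from the raw counts quoted above (47/592 events vs 89/591 events):

```python
events_exp, n_exp = 47, 592   # experimental arm
events_ctl, n_ctl = 89, 591   # control arm

ar_exp = events_exp / n_exp                    # absolute risk, experimental
ar_ctl = events_ctl / n_ctl                    # absolute risk, control
arr = ar_exp - ar_ctl                          # absolute risk reduction
rr = ar_exp / ar_ctl                           # relative risk
rrr = 1 - rr                                   # relative risk reduction
odds_exp = events_exp / (n_exp - events_exp)   # 47/545
odds_ctl = events_ctl / (n_ctl - events_ctl)   # 89/502
odds_ratio = odds_exp / odds_ctl

print(f"AR {ar_exp:.1%} vs {ar_ctl:.1%}; ARR = {arr:.1%}")
print(f"RR = {rr:.2f}, RRR = {rrr:.2f}, OR = {odds_ratio:.2f}")
# RR ≈ 0.53 (0.52 if computed from the rounded percentages), OR ≈ 0.49
```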
Post-hoc groups
the chi-square test was used to determine differences between groups
with respect to the primary and secondary end points. Odds ratios and
their 95 percent confidence intervals were calculated. Comparisons of
patient characteristics and survival outcomes were tested with the
chi-square test, the chi-square test for trend, Fisher's exact test,
or Student's t-test, as appropriate.
This is a sub-group!
Bonferroni!
The significance threshold should be divided
by the number of tests performed…
or, equivalently, the computed p-value
multiplied by the number of tests… P=0.12 and not P=0.04!!
Wenzel et al, NEJM 2004
Fisher Exact test

            Exp   Ctrl
Event        a     b    r1
No event     c     d    r2
            s1    s2     N

P = (s1! * s2! * r1! * r2!) / (N! * a! * b! * c! * d!)
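A minimal Python sketch (assuming scipy is available; the counts are invented) applying the same exact test to a 2x2 table:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = event / no event, columns = Exp / Ctrl
table = [[3, 9],     # events:    a, b
         [27, 21]]   # no events: c, d

odds_ratio, p_value = fisher_exact(table, alternative='two-sided')
print(f"OR = {odds_ratio:.2f}, exact p = {p_value:.3f}")
```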
McNemar test
• The McNemar test is a hypothesis test to compare categorical
variables in related samples.
• For instance, how can I appraise the statistical significance of
change into symptom status (asymptomatic vs symptomatic) in
the same patients over time?
• The McNemar test exploits the discordant pairs to generate a p
value
                          Follow-up      Follow-up
                          symptomatic    asymptomatic
Baseline symptomatic          15              3
Baseline asymptomatic          5             17
p=0.72 at McNemar test

                          Follow-up      Follow-up
                          symptomatic    asymptomatic
Baseline symptomatic          15              0
Baseline asymptomatic          8             17
p=0.013 at McNemar test
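A minimal Python sketch (assuming statsmodels is available) reproducing the two McNemar p-values above; with the continuity-corrected chi-square form of the test, only the discordant pairs (3 vs 5 in the first table, 0 vs 8 in the second) drive the result:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Rows: baseline symptomatic / asymptomatic; columns: follow-up symptomatic / asymptomatic
table1 = [[15, 3], [5, 17]]
table2 = [[15, 0], [8, 17]]

for table in (table1, table2):
    result = mcnemar(table, exact=False, correction=True)
    print(f"chi2 = {result.statistic:.3f}, p = {result.pvalue:.3f}")
# First table: p ≈ 0.72; second table: p ≈ 0.013
```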
Take home messages
• Biostatistics is best seen as a set of different
tools and methods which are used according to the
problem at hand.
• Nobody is experienced in statistics at the beginning,
and only by facing everyday real-world problems
can you familiarize yourself with different
techniques and approaches.
• In general terms, it is also crucial to remember
that the easiest and simplest way to solve a
statistical problem, if appropriate, is also the
best and the one recommended by reviewers.
Many thanks for your attention!
For any query:
[email protected]
[email protected]
For these slides and similar slides:
http://www.metcardio.org/slides.html
Medical statistics for
cardiovascular disease
Part 2
Giuseppe Biondi-Zoccai, MD
Sapienza University of Rome, Latina, Italy
[email protected]
[email protected]
Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods
Linear regression
Which of these different possible lines
that I can graphically trace and compute
is the best regression line?
[Scatter plot: lesion length (x-axis, 0 to 60) vs time to restenosis in days (y-axis, 0 to 400), showing observed values (y) and estimated values (y') around candidate regression lines]
It can be intuitively understood that it is the line that minimizes the
differences between observed values (yi) and estimated values (yi’)
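A minimal Python sketch (assuming numpy is available; the data points are invented) of the least-squares line that minimizes those differences:

```python
import numpy as np

# Hypothetical data: lesion length (mm) and time to restenosis (days)
lesion_length = np.array([8, 12, 15, 20, 24, 30, 38, 45, 52])
time_to_restenosis = np.array([340, 320, 290, 260, 230, 200, 150, 110, 80])

slope, intercept = np.polyfit(lesion_length, time_to_restenosis, deg=1)  # least-squares fit
predicted = intercept + slope * lesion_length
residuals = time_to_restenosis - predicted        # observed (yi) minus estimated (yi')
print(f"y' = {intercept:.1f} + {slope:.2f} * x")
print(f"residual sum of squares = {np.sum(residuals ** 2):.0f}")
```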
Correlation
• The square root of the coefficient of
determination (R2) is the correlation
coefficient (R) and shows the degree
of linear association between 2
continuous variables, but disregards
K. Pearson
causation.
• Assumes values between -1.0 (negative
association), 0 (no association), and +1.0 (positive
association).
• It can be summarized as a point summary
estimate, with specific standard error, 95%
confidence interval, and p value.
Dangers of not plotting data
4 sets of data: all with the same R=0.81!*
*At linear regression analysis
What about non-linear associations?
Each number corresponds to the correlation
coefficient for linear association (R)!!!
Pearson vs Spearman
• Whenever the independent and dependent variables
can be assumed to belong to normal distributions, the
Pearson linear correlation method can be used,
maximizing statistical power and yield.
• Whenever the data are sparse, rare, and/or not
belonging to normal distributions, the non-parametric
Spearman correlation method should be used, which
yields the rank correlation coefficient (rho), but
not its R2.
C. Spearman
Bland Altman plot
[Plot: difference of A - B in each case (y-axis) against the mean of measurements A and B in each case (x-axis)]
Regression to the mean:
don’t bet on past rookies of the year!
Ecological fallacy
Logistic regression
• We model ln [p/(1-p)] instead of just p, and the
linear model is written :
ln [p/(1-p)] = ln(p) – ln(1-p) = β0 + β1*X
• Logistic regression is based on the logit which
transforms a dichotomous dependent variable
into a continuous one
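A minimal Python sketch (assuming statsmodels is available; the data are invented) fitting exactly this model, ln[p/(1-p)] = β0 + β1*X, for a dichotomous outcome:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: lesion length (mm) and restenosis (1 = yes, 0 = no)
lesion_length = np.array([8, 10, 12, 15, 18, 20, 24, 28, 32, 38, 45, 52])
restenosis = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

X = sm.add_constant(lesion_length)             # adds the intercept beta0
model = sm.Logit(restenosis, X).fit(disp=0)    # logit link: ln[p/(1-p)] = b0 + b1*X
print(model.params)                            # b0, b1 on the log-odds scale
print(np.exp(model.params[1]))                 # odds ratio per 1 mm of lesion length
```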
Generalized Linear Models
• All generalized linear models have three components :
– Random component identifies the response variable and
assumes a probability distribution for it
– Systematic component specifies the explanatory variables
used as predictors in the model (linear predictor).
– Link describes the functional relationship between the
systematic component and the expected value (mean) of
the random component.
• The GLM relates a function of that mean to the explanatory
variables through a prediction equation having linear form.
• The model formula states that: g(µ) = α + β1x1 + … + βkxk
Generalized Linear Models
• Through differing link functions, GLM
corresponds to other well known models
Distribution        Canonical link function
Normal              Identity
Exponential, Gamma  Inverse
Inverse Gaussian    Inverse squared
Poisson             Log
Binomial            Logit
Survival analysis
• Patients experiencing one or more events are called
responders
• Patients who, at the end of the observational period or
before such time, get out of the study without having
experienced any event, are called censored
Survival analysis
[Diagram: follow-up timelines (2 to 12 time units) for patients A to F; events are marked with x, while censored patients leave the study at study end, when lost to follow-up, or when withdrawn]
A and F: events
B, C, D and E: censored
Product limit (Kaplan-Meier) analysis
Kaplan-Meier curves and SE
Serruys et al, NEJM 2010
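A minimal Python sketch (assuming the lifelines package is installed; follow-up times and censoring indicators are invented) of a product-limit estimate:

```python
from lifelines import KaplanMeierFitter

# Hypothetical follow-up times (months) and event indicators (1 = event, 0 = censored)
durations = [2, 4, 5, 6, 6, 8, 9, 10, 11, 12]
events =    [1, 0, 1, 0, 1, 0, 1, 0,  0,  0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)
print(kmf.survival_function_)       # step-wise Kaplan-Meier survival estimates
print(kmf.confidence_interval_)     # pointwise 95% confidence bands
```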
Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods
Multivariable statistical methods
Goal is to explain the variation in the dependent variable by
other variables simultaneously.
Independent and dependent variables
• Independent variable (predictor, regressor, explanatory variable, prognostic factor, manipulated variable): has an effect on ...
• Dependent variable (response): is influenced by …
Bivariate statistical methods
One Dep. Var. ~ One Ind. Var.

  D.V. \ I.V.    Qualitative    Quantitative
  Qualitative    Chi²           Logistic Reg.
  Quantitative   Anova 1        Simple Regression
Multivariable statistical methods
One Dep. Var. ~ Several Ind. Var.

  D.V. \ I.V.    Qualitative    Quantitative
  Qualitative    Chi²           Logistic Reg.
  Quantitative   Anova 1        Simple Regression
Multivariable analysis
• The methods mentioned have specific application
domains depending on the nature of the variables
involved in the analysis.
• But conceptually, and in terms of calculation, there are a lot of
similarities between these techniques.
• Each of the multivariable methods evaluates the effect
of an independent variable on the dependent variable,
controlling for the effect of other independent variables.
• Methods such as multiple regression, multi-factor
ANOVA, analysis of covariance have the same
assumptions towards the distribution of the dependent
variable.
• We will learn more about the concepts of multivariable
analysis by reviewing the simple linear regression model.
Multiple linear regression
• Simple linear regression is a statistical model to predict the
value of one continuous variable Y (dependent, response)
from another continuous variable X (independent, predictor,
covariate, prognostic factor).
• Multiple linear regression is a natural extension of the simple
linear regression model
– We use it to investigate the effect on the response variable
of several predictor variables, simultaneously
– It is a hypothetical model of the relationship between
several independent variables and a response variable.
• Let’s start by reviewing the concepts of the simple linear
regression model.
Multiple regression models
• Model terms may be divided into the following categories
– Constant term
– Linear terms / main effects (e.g. X1)
– Interaction terms (e.g. X1X2)
– Quadratic terms (e.g. X1²)
– Cubic terms (e.g. X1³)
• Models are usually described by the highest term present
– Linear models have only linear terms
– Interaction models have linear and interaction terms
– Quadratic models have linear, quadratic and first order interaction
terms
– Cubic models have terms up to third order.
The model-building process
Source:
Applied Linear Statistical Models,
Neter, Kutner, Nachtsheim, Wasserman
AIC and BIC
AIC (Akaike Information Criterion) and BIC (Schwarz Information Criterion)
are two popular model selection methods. They not only reward goodness
of fit, but also include a penalty that is an increasing function of the
number of estimated parameters. This penalty discourages overfitting.
The preferred model is the one with the lowest value for AIC or for BIC.
These criteria attempt to find the model that best explains the data with
a minimum of free parameters. The AIC penalizes free parameters less
strongly than does the Schwarz criterion.
AIC = 2k + n [ln (SSError / n)]
BIC = n ln (SSError / n) + k ln(n)
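A minimal Python sketch (standard library only) applying the two formulas above to a hypothetical comparison of two fitted models (k parameters, n observations, SSError residual sum of squares); the numbers are invented:

```python
import math

def aic(k, n, ss_error):
    # AIC = 2k + n * ln(SSError / n)
    return 2 * k + n * math.log(ss_error / n)

def bic(k, n, ss_error):
    # BIC = n * ln(SSError / n) + k * ln(n)
    return n * math.log(ss_error / n) + k * math.log(n)

# Hypothetical comparison: 3-parameter vs 6-parameter model fitted to n=100 points
print(aic(3, 100, 420.0), bic(3, 100, 420.0))
print(aic(6, 100, 395.0), bic(6, 100, 395.0))   # the model with the lowest value is preferred
```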
Two-Factor ANOVA
Introduction
• A method for simultaneously analyzing two factors
affecting a response.
– Group effect: treatment group or dose level
– Blocking factor whose variation can be separated from
the error variation to give more precise group
comparisons: study center, gender, disease severity,
diagnostic group, …
• One of the most common ANOVA methods used
in clinical trial analysis.
• Similar assumptions as for single-factor ANOVA.
• Non-parametric alternative : Friedman test
Two-Factor ANOVA
The Model

Xijk = μ + αi + βj + (αβ)ij + εijk

where Xijk is the response score of subject k in column i and row j,
μ is the overall mean,
αi is the effect of the treatment factor (a levels, or i columns),
βj is the effect of the blocking factor (b levels, or j rows),
(αβ)ij is the interaction effect, and
εijk is the error, ie the effect of variables not measured.
Analysis of Covariance
ANCOVA
• Method for comparing response means among
two or more groups adjusted for a quantitative
concomitant variable, or “covariate”, thought to
influence the response.
• The response variable is explained by
independent quantitative variable(s) and
qualitative variable(s).
• Combination of ANOVA and regression.
• Increases the precision of comparison of the
group means by decreasing the error variance.
• Widely used in clinical trials
Analysis of Covariance
The model
• The covariance model for a single-factor with fixed
levels adds another term to the ANOVA model,
reflecting the relationship between the response
variable and the concomitant variable.
Yij = μ + αi + β(Xij - X̄) + εij
• The concomitant variable is centered around the
mean so that the constant µ represents the overall
mean in the model.
Repeated-Measures
Basic concepts
• ‘Repeated-measures’ are measurements taken from the same subject
(patient) at repeated time intervals.
• Many clinical studies require:
– multiple visits during the trial
– response measurements made at each visit
• A repeated measures study may involve several treatments or only a
single treatment.
• ‘Repeated-measures’ are used to characterize a response profile over
time.
• Main research question:
– Is the mean response profile for one treatment group the same as for
another treatment group or a placebo group ?
• Comparison of response profiles can be tested with a single F-test.
Repeated-Measures
Comparing profiles
Source:
Common Statistical Methods
for Clinical Research, 1997,
Glenn A. Walker
Repeated Measures ANOVA
Random Effects – Mixed Model
[JMP output garbled in extraction; the recoverable results are summarized below]
Response: miles; 24 observations; RSquare adj = 0.75
Tests with respect to random effects:
  subject[species]: SS 17.17, MS 4.29,  df 4, F 2.89,  p = 0.0588
  season:           SS 47.46, MS 15.82, df 3, F 10.64, p = 0.0005
  species:          SS 51.04, MS 51.04, df 1, F 11.89, p = 0.0261
[Parameter estimates table omitted]
What are your conclusions about the between-subjects species effect
and the within-subjects season effect?
Repeated Measures ANOVA
Correlated Measurements – Multivariate Model
[Response profiles plot omitted]
Multivariate F-tests (all between-subjects):
  Wilks' Lambda:    value 0.2517, exact F 11.89, df 1, 4, p = 0.0261
  Pillai's Trace:   value 0.7483, exact F 11.89, df 1, 4, p = 0.0261
  Hotelling-Lawley: value 2.9733, exact F 11.89, df 1, 4, p = 0.0261
  Roy's Max Root:   value 2.9733, exact F 11.89, df 1, 4, p = 0.0261
Logistic regression
Sangiorgi et al, AHJ 2008
Multiple Regression
SPSS Variable Selection Methods
• Enter. A procedure for variable selection in which all variables
in a block are entered in a single step.
• Forward Selection (Likelihood Ratio). Stepwise selection
method with entry testing based on the significance of the
score statistic, and removal testing based on the probability of
a likelihood-ratio statistic based on the maximum partial
likelihood estimates.
• Backward Elimination (Likelihood Ratio). Backward stepwise
selection. Removal testing is based on the probability of the
likelihood-ratio statistic based on the maximum partial
likelihood estimates.
Cox PH analysis
• Problem
– Can’t use ordinary linear regression because how do
we account for the censored data?
– Can’t use logistic regression without ignoring the time
component
• with a continuous outcome variable we use linear
regression
• with a dichotomous (binary) outcome variable we use
logistic regression
• where the time to an event is the outcome of interest,
Cox regression is the most popular regression technique
Cox PH analysis
[Plot: MACE-free survival (0.0 to 1.0) over time (0 to 400 days)]
Variables in the equation:
  Diabetes: B=0.710, SE=0.204, Wald=12.066, df=1, p=0.001, Exp(B)=2.034, 95.0% CI for Exp(B) 1.363 to 3.036
Cosgrave et al, AJC 2005
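A minimal Python sketch (assuming the lifelines package is installed; the dataset is invented) of a Cox proportional-hazards fit, where the exponentiated coefficient is the hazard ratio, analogous to the Exp(B) shown above:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical dataset: follow-up time (days), event (1 = MACE), diabetes status
df = pd.DataFrame({
    "time":     [30, 90, 120, 200, 250, 300, 330, 365, 400, 400],
    "event":    [1,  1,  0,   1,   0,   1,   0,   1,   0,   0],
    "diabetes": [1,  1,  0,   1,   0,   0,   1,   1,   0,   0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()   # coef, exp(coef) = hazard ratio, 95% CI, p value
```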
Harrell C index
Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods
Question: when there are many
confounding covariates to adjust for:
– Matching based on many covariates is not
practical
– Stratification is difficult: as the number of
covariates increases, the number of strata
grows exponentially:
• 1 covariate: 2 strata; 5 covariates: 32 (2^5) strata
– Regression adjustment may not be
possible; potential problem: over-fitting
Propensity score
• Replace the collection of confounding
covariates with one scalar function of
these covariates
Age, gender, ejection fraction, risk factors, lesion characteristics, …
-> 1 composite covariate: the propensity score (a balancing score)
Comparability
[Figure: distributions of the estimated propensity score (0.0 to 1.0) in control (Ctl) and treated (Trt) patients; where the distributions do not overlap, no comparison is possible…]
Compare treatments with
propensity score
• Three common methods of using the
propensity score to adjust results
(estimating the score itself is sketched below):
– Matching
– Stratification
– Regression adjustment
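The score itself is typically estimated with a logistic regression of treatment assignment on the confounders; a minimal Python sketch (assuming numpy and statsmodels are available; all variables are simulated):

```python
import numpy as np
import statsmodels.api as sm

# Simulated confounders and treatment indicator (1 = treated, 0 = control)
rng = np.random.default_rng(0)
age = rng.normal(65, 10, 200)
ejection_fraction = rng.normal(50, 8, 200)
logit = 0.04 * (age - 65) - 0.05 * (ejection_fraction - 50)
treated = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([age, ejection_fraction]))
ps_model = sm.Logit(treated, X).fit(disp=0)
propensity_score = ps_model.predict(X)   # one scalar per patient, replacing the covariate set
print(propensity_score[:5])              # usable for matching, stratification or regression adjustment
```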
Goal of a clinical trial is appraisal
of…
• Superiority: difference in biologic effect or
clinical effect
• Equivalence: lack of meaningful/clinically
relevant difference in biologic effect or
clinical effect
• Non-inferiority: lack of meaningful/clinically
relevant increase in adverse clinical event
Superiority RCT
• Possibly greatest medical invention ever
• Randomization of adequate number of subjects
ensures prognostically similar groups at study
beginning
• If thorough blinding is enforced, even later on
groups maintain similar prognosis (except for
effect of experiment)
• Sloppiness/cross-over makes arm more similar > traditional treatment is not discarded
• Per-protocol analysis almost always misleading
Equivalence/non-inferiority RCT
• Completely different paradigm
• Goal is to conclude new treatment is not
“meaningfully worse” than comparator
• Requires a subjective margin
• Sloppiness/cross-over makes arms more
similar -> traditional treatment is more likely
to be discarded
• Per-protocol analysis possibly useful to
analyze safety, but bulk of analysis still
based on intention-to-treat principle
Superiority, equivalence or noninferiority?
Vassiliades et al, JACC 2005
Possible outcomes in a non-inferiority trial
(observed difference & 95% CI)
[Figure: eight hypothetical confidence intervals (A to H) plotted against zero treatment difference and the non-inferiority margin delta; new treatment better to the left, new treatment worse to the right]
A: Superior
B: Non-inferior
C: Non-inferior
D: Tricky (& rare)
E: Inconclusive
F: Inconclusive
G: Inferior, but…
H: Inferior
Typical non-inferiority design
Hiro et al, JACC 2009
Cumulative meta-analysis
Antman et al, JAMA 1992
Meta-analysis of intervention studies
De Luca et al, EHJ 2009
Funnel plot
Review: Late percutaneous coronary intervention for infarct-related artery occlusion
Comparison: 01 Late percutaneous coronary intervention vs best medical therapy for infarct-related artery occlusion
Outcome: 01 Death
[Funnel plot: SE(log OR) on the vertical axis (0.0 to 1.6) against OR (fixed) on a log scale (0.1 to 10)]
Indirect and network meta-analyses
Indirect
Direct plus
indirect
(i.e. network)
Jansen et al, ISPOR 2008
Resampling
• Resampling refers to the use of the observed
data or of a data generating mechanism (such
as a die or computer-based simulation) to
produce new hypothetical samples, the
results of which can then be analyzed.
• The term computer-intensive methods also
is frequently used to refer to techniques such
as these…
Bootstrap
• The bootstrap is a modern, computer-intensive,
general purpose approach to statistical
inference, falling within a broader class of
resampling methods.
• Bootstrapping is the practice of estimating
properties of an estimator (such as its variance)
by measuring those properties when sampling
from an approximating distribution.
• One standard choice for an approximating
distribution is the empirical distribution of the
observed data.
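A minimal Python sketch (assuming numpy is available) of a non-parametric bootstrap, resampling the observed data with replacement; the sample reused here is the lesion-length series tabulated in Part 1:

```python
import numpy as np

rng = np.random.default_rng(42)
observed = np.array([14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27])  # lesion lengths

# 10,000 bootstrap resamples of the same size, drawn with replacement
boot_means = np.array([
    rng.choice(observed, size=observed.size, replace=True).mean()
    for _ in range(10_000)
])
print(f"bootstrap SE of the mean: {boot_means.std(ddof=1):.2f}")
print(f"95% percentile CI: {np.percentile(boot_means, [2.5, 97.5])}")
```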
Jackknife
• Jackknifing is a resampling method based
on the creation of several subsamples
by excluding a single case at a time.
• Thus, there are only N jackknife samples for
any given original sample with N cases.
• After the systematic recomputation of the
statistic of choice is completed,
a point estimate and an estimate for the
variance of the statistic can be calculated.
The Bayes theorem
The main feature of Bayesian
statistics is that it takes into account
prior knowledge of the hypothesis

Bayes theorem:

P(H | D) = P(D | H) * P(H) / P(D)

where P(H | D) is the posterior (or conditional) probability of hypothesis H,
P(D | H) is the likelihood of the hypothesis (the conditional probability of the data given H),
P(H) is the prior (or marginal) probability of the hypothesis, and
P(D) is the probability of the data (a normalizing constant).

Thus it relates the conditional and marginal probabilities of two random
events and it is often used to compute posterior probabilities given
observations.
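A minimal Python sketch of the theorem in the familiar diagnostic-test setting; all the numbers (prevalence, sensitivity, specificity) are invented for illustration:

```python
# Hypothetical figures for a diagnostic test
p_disease = 0.10        # P(H): prior probability of the hypothesis (disease present)
sensitivity = 0.90      # P(D | H): probability of a positive test if diseased
specificity = 0.80      # so P(positive | no disease) = 0.20

# P(D): total probability of a positive test (the normalizing constant)
p_positive = sensitivity * p_disease + (1 - specificity) * (1 - p_disease)

# Bayes theorem: P(H | D) = P(D | H) * P(H) / P(D)
posterior = sensitivity * p_disease / p_positive
print(f"P(disease | positive test) = {posterior:.2f}")   # 0.09 / 0.27 = 0.33
```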
Frequentists vs Bayesians
“Classical” statistical inference
vs Bayesian inference
Before the next module, a question
for you: who is a Bayesian?
A Bayesian is who, vaguely
expecting a horse, and catching a
glimpse of a donkey, strongly
believes he has seen a mule
JMP
Statistical Discovery Software
• JMP is a software package that was first developed by John Sall, co-founder of SAS,
to perform simple and complex statistical analyses. It dynamically links statistics
with graphics to interactively explore, understand, and visualize data. This allows
you to click on any point in a graph, and see the corresponding data point
highlighted in the data table and other graphs.
• JMP provides a comprehensive set of statistical tools as well as design of
experiments and statistical quality control in a single package.
• JMP allows for custom programming and script development via JSL, originally
known as "John's Scripting Language".
• An add-on, JMP Genomics, comes with over 100 analytic procedures to facilitate the
treatment of data involving genetics, microarrays or proteomics.
• Pros: very intuitive, lean package for design and analysis in research
• Cons: less complete and less flexible than the complete SAS system
• Price: €€€€.
R
• R is a programming language and software environment for
statistical computing and graphics, and it is an implementation of
the S programming language with lexical scoping semantics.
• R is widely used for statistical software development and data
analysis. Its source code is freely available under the GNU General
Public License, and pre-compiled binary versions are provided for
various operating systems. R uses a command line interface, though
several graphical user interfaces are available.
• Pro: flexibility and programming capabilities (eg for bootstrap),
sophisticated graphical capabilities.
• Cons: complex and user-unfriendly interface.
• Price: free.
S and S-Plus
• S-PLUS is a commercial package sold by TIBCO Software Inc.
with a focus on exploratory data analysis, graphics and
statistical modeling
• It is an implementation of the S programming language. It
features object-oriented programming capabilities and
advanced analytical algorithms (eg for robust regression,
repeated measurements, …)
• Pros: flexibility and programming capabilities (eg for
bootstrap), user-friendly graphical user interface
• Cons: complex matrix programming environment
• Price: €€€€-€€.
SAS
• SAS (originally Statistical Analysis System, 1968) is an integrated
suite of platform independent software modules provided by SAS
Institute (1976, Jim Goodnight and Co).
• The functionality of the system is very complete and built around
four major tasks: data access, data management, data analysis and
data presentation.
• Applications of the SAS system include: statistical analysis, data
mining, forecasting; report writing and graphics; operations
research and quality improvement; applications development; data
warehousing (extract, transform, load).
• Pros: very complete tool for data analysis, flexibility and
programming capabilities (eg for Bayesian, bootstrap, conditional, or
meta-analyses), large volumes of data
• Cons: complex programming environment, labyrinth of modules and
interfaces, very expensive
• Price: €€€€-€€€€
Statistica
• STATISTICA is a powerful statistics and analytics software package
developed by StatSoft, Inc.
• Provides a wide selection of data analysis, data management, data mining,
and data visualization procedures. Features of the software include basic
and multivariate statistical analysis, quality control modules and a
collection of data mining techniques.
• Pros: extensive range of methods, user-friendly graphical interface, has
been called “the king of graphics”
• Cons: limited flexibility and programming capabilities, labyrinth
• Price: €€€€.
SPSS
• SPSS (originally, Statistical Package for the Social Sciences) is a
computer program used for statistical analysis released in its
first version in 1968 and now distributed by IBM.
• SPSS is among the most widely used programs for statistical
analysis in social science. It is used by market researchers,
health researchers, survey companies, government, education
researchers, marketing organizations and others.
• Pros: extensive range of tests and procedures, user-friendly
graphical interface.
• Cons: limited flexibility and programming capabilities.
• Price: €€€€.
Stata
• Stata (name formed by blending "statistics" and "data") is a
general-purpose statistical software package created in 1985
by StataCorp.
• Stata's full range of capabilities includes: data management,
statistical analysis, graphics generation, simulations, custom
programming. Most meta-analysis tools were first developed
for Stata, and thus this package offers one of the most
extensive libraries of statistical tools for systematic reviewers
• Pros: flexibility and programming capabilities (eg for
bootstrap, or meta-analyses), sophisticated graphical
capabilities
• Cons: relatively complex interface
• Price: €€-€€€
WinBUGS and OpenBUGS
• WinBUGS (Windows-based Bayesian inference Using
Gibbs Sampling) is a statistical software for the Bayesian
analysis of complex statistical models using Markov chain
Monte Carlo (MCMC) methods, developed by the MRC
Biostatistics Unit, at the University of Cambridge, UK. It is
based on the BUGS (Bayesian inference Using Gibbs
Sampling) project started in 1989.
• OpenBUGS is the open source variant of WinBUGS.
• Pros: flexibility and programming capabilities
• Cons: complex interface
• Price: free
Take home messages
• Advanced statistical methods are best seen as a
set of modular tools which can be applied and
tailored to the specific task of interest.
• The concept of generalized linear model
highlights how most statistical methods can be
considered part of a broader family of methods,
depending on the specific framework or link
function.
Many thanks for your attention!
For any query:
[email protected]
[email protected]
For these slides and similar slides:
http://www.metcardio.org/slides.html