Basic statistics Trial Design Epidemiological Techniques
Statistics, Trial design,
Epidemiological
Techniques
Performance status and
co-morbidity
I Dukic, B Zelhof,
North Western Urology Teaching, 8th July 2014
Overview
Basic statistics and epidemiological statistics
Measurement of performance status and co-morbidity
Trial design
Vivas
1. What is evidence based medicine? What are the
problems with randomisation studies?
2. How would you set up a phase III trial?
3. What do you understand by statistical significance and
confidence intervals?
Basic Statistics
2x2 table - sensitivity, specificity
Prevalence/incidence
ROC
Hypothesis testing
P value
Confidence interval
RRR, ARR, CER, EER, NNT
Data types
Qualitative: categorical measurement, i.e. not a
number
▪ Nominal: unordered categories, e.g. yes/no
▪ Ordinal: ranked categories, e.g. most useful to least useful
Quantitative: numerical measurement
▪ Interval: equal steps but no true zero, e.g. temperature in °C
▪ Ratio: equal steps with a true zero, e.g. height
Parametric vs Non-Parametric
Non-parametric tests are less powerful, so a
parametric test should be used where possible, provided the
following conditions are met
The basic distinction for parametric vs non-parametric
is:
1. If the measurement scale is nominal or ordinal (Qualitative): non-parametric tests
2. If the measurement scale is interval or ratio (Quantitative):
parametric tests
3. Other considerations include whether the data are normally distributed
(required for parametric tests), and the relationship between the groups or
variables being tested
Statistics
Average of data points
Mean, median, mode, weighted mean
Measure of spread
Centile, standard deviation, range
How sure we are of the answer
P value, power calculation, confidence interval (CI), etc.
Skew
Mean = The average value, calculated by adding all the observations and dividing by the
number of observations (parametric)
Mode= is the most common (frequent) value- (non-parametric)
Median= Middle value of the list (often used when data are skewed) (non parametric)
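A minimal sketch of the three averages, using Python's standard statistics module. The sample values are hypothetical (chosen to be right-skewed) and are not from the talk; they show why the median is often preferred for skewed data.

```python
import statistics

# Hypothetical right-skewed sample, e.g. length of stay in days.
values = [1, 2, 2, 3, 4, 5, 18]

mean = statistics.mean(values)      # sum of observations / number of observations
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequent value

# The single large value (18) pulls the mean above the median.
print("mean:", mean, "median:", median, "mode:", mode)
```

Here the mean is larger than the median, which is the usual pattern in positively skewed data.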
2x2 tables
                 Patient HAS the disease     Patient does NOT have the disease
Positive test    Correct: True Positive      Wrong: False Positive
Negative test    Wrong: False Negative       Correct: True Negative
Sensitivity
Proportion of patients with the disease correctly identified by the test
Sensitivity = TP / (TP + FN)
Specificity
Proportion of patients without the disease correctly identified by the test
Specificity = TN / (TN + FP)
Positive Predictive Value - PPV
Proportion of patients with a positive test who have the disease
PPV = TP / (TP + FP)
Negative Predictive Value - NPV
Proportion of patients with a negative test who do not have the disease
NPV = TN / (TN + FN)
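The four 2x2-table measures can be sketched in a few lines of Python. The function name and the counts are hypothetical and only illustrate the formulas; they are not data from the talk.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly identified
        "specificity": tn / (tn + fp),  # non-diseased patients correctly identified
        "ppv": tp / (tp + fp),          # positive tests that are true positives
        "npv": tn / (tn + fn),          # negative tests that are true negatives
    }

# Hypothetical counts: 100 patients with the disease, 200 without.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity depend only on the test, whereas PPV and NPV also change with the prevalence of disease in the group tested.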
Relevance of Sensitivity + Specificity
Highly specific test is unlikely to give a false positive: a +ve
result should be regarded as a true +ve. SPIN (SPecific test, Positive rules IN)
Sensitive test rarely misses a condition: a -ve test is reassuring. SNOUT (SeNsitive test, Negative rules OUT)
Type I + Type II Error
Type I
Rejecting the null hypothesis when it is in fact true is called
a Type I error
False positive
Type II
Not rejecting the null hypothesis when in fact the alternate
hypothesis is true is called a Type II error
False negative
Likelihood Ratio
The chance that a specified test result would be expected in a
patient with the condition of interest, versus a patient without
the condition.
Likelihood ratio for a positive test = Sensitivity / (1 - Specificity)
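A short sketch of the likelihood ratio calculation. The positive likelihood ratio follows the formula above; the negative likelihood ratio, (1 - sensitivity) / specificity, is the standard companion measure and is added here as an extra. The input values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)  # how much a +ve result raises the odds of disease
    lr_negative = (1 - sensitivity) / specificity  # how much a -ve result lowers the odds of disease
    return lr_positive, lr_negative

# Hypothetical test: 90% sensitive, 85% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.85)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```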
Receiver Operating Characteristics
(ROC)
• Graph of the pairs of true
positive rates (sensitivity)
and false positive rates
(100% - specificity)
• Assess if a test is useful
• Can compare two different
tests
• Select optimal cut off value
for test
ROC
It shows the trade off between
sensitivity and specificity (any
increase in sensitivity will be
accompanied by a decrease in
specificity).
The closer the curve follows the left-hand border and then the top border
of the ROC space, the more accurate
the test.
The closer the curve comes to the 45-degree diagonal of the ROC space,
the less accurate the test.
The area under the curve is a
measure of test accuracy
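As an illustration of how an ROC curve is built, the sketch below moves a cut-off through a set of test values, records the (false positive rate, true positive rate) pair at each cut-off, and estimates the area under the curve with the trapezium rule. The scores and disease labels are invented for the example, and a result is treated as positive when it is at or above the cut-off.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per cut-off, from the strictest cut-off down."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every score: nothing called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezium-rule area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical test values; label 1 = disease present, 0 = disease absent.
scores = [1.0, 2.5, 3.0, 4.0, 5.5, 6.0, 8.0, 10.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(f"AUC = {auc:.4f}")
```

An area of 0.5 corresponds to the 45-degree diagonal (a useless test) and 1.0 to a perfect test.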
Importance of cut-off value on test
performance
ROC Curves for optimal PSA range in
patients aged 50 - 80
50 - 60 years
60 - 70 years
70 - 80 years
El-Gallery et al, Urology, 46:2000. 1995
Incidence + Prevalence
• Incidence - The proportion of new cases of a disease in
the population at risk during a specified time interval. It
is usual to define the disorder, and the population, and
the time, and report the incidence as a rate
• Prevalence - This is a measure of the proportion of
people in a population who have a disease at a point in
time, or over some period of time.
Hypothesis (significance) Testing
Null hypothesis (H0): the exposure /
intervention being studied is not associated with
the outcome of interest, e.g. the difference in means
= 0
Alternative hypothesis (H1): holds if the null
hypothesis is not true
Two-tailed test: allows a difference in either
direction, e.g. smoking rates differ between
men and women (men>women or women>men)
One-tailed test: direction of effect specified in
H1, e.g. new drug cannot make things worse.
P value
Viva question 3
Hypothesis testing produces a p-value
The p-value is the probability of obtaining results at least as extreme as
ours if the null hypothesis is true (i.e. by chance alone). It is not the
probability that the null hypothesis is true or correct
Allows assessment of whether findings are statistically
significant or not statistically significant relative to a reference
value
P<0.05: conventionally statistically significant (the smaller the
p-value, the greater the evidence against the null hypothesis)
P>0.05: we do not reject the null hypothesis; this does
not mean the null hypothesis is true.
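One hedged illustration of where a p-value comes from: a two-sided test comparing two proportions, using the large-sample normal approximation. This is only one of many possible tests, the function name is an assumption of this sketch, and the event counts are hypothetical.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for H0: both groups share the same event rate."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # event rate if H0 is true
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_error
    # Probability of a z at least this extreme in either direction under H0.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical trial: 30/100 events in one arm, 10/100 in the other.
p_different = two_proportion_p_value(30, 100, 10, 100)
# Identical event rates give no evidence against H0.
p_same = two_proportion_p_value(10, 100, 10, 100)
print(f"p (30% vs 10%) = {p_different:.4f}, p (10% vs 10%) = {p_same:.1f}")
```

For a one-tailed test with the direction specified in advance, the p-value would be half the two-sided value when the effect lies in the predicted direction.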
Confidence Interval
Viva Teaching – question 3
The range of plausible values for the “true”
effect
Generally use 95% confidence
It can be used to make a decision without
providing an exact p-value
If the null value (e.g. a difference of 0 or a ratio of 1) lies outside the
95% C.I., then reject H0: p < 0.05, although no exact value is given.
What determines the width of the
confidence interval?
1. Sample size - a larger sample size will give more
precise results with narrower C.I.
2. Variability of the characteristic being studied; the
less variable it is (between subjects, within
subjects, measurement error etc.), the more precise
3. The degree of confidence required (95%,90%,
65%); the more confidence required, the wider the
interval.
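The three determinants of width can be seen in a small sketch of a confidence interval for a mean, using the normal approximation (mean ± z × SD / √n). The z multipliers are the standard normal values; the mean, SD and sample sizes are hypothetical.

```python
import math

# Standard normal multipliers for common confidence levels.
Z_VALUES = {90: 1.645, 95: 1.960, 99: 2.576}

def mean_ci(mean, sd, n, level=95):
    """Approximate confidence interval for a mean (normal approximation)."""
    half_width = Z_VALUES[level] * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def ci_width(mean, sd, n, level=95):
    low, high = mean_ci(mean, sd, n, level)
    return high - low

base = ci_width(10.0, 2.0, 100)                   # reference case
bigger_sample = ci_width(10.0, 2.0, 400)          # 4x the sample size
less_variable = ci_width(10.0, 1.0, 100)          # half the standard deviation
more_confident = ci_width(10.0, 2.0, 100, level=99)

print(base, bigger_sample, less_variable, more_confident)
```

Quadrupling the sample size or halving the variability halves the width, while asking for 99% rather than 95% confidence widens it.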
Relative Risk / Risk Ratio
• Risk of an event / developing a disease relative to exposure.
It is the ratio of the probability of the event occurring in the
exposed group versus the non-exposed group
                 Outcome    No Outcome
Exposed          A          B
Not Exposed      C          D
RR = [A / (A + B)] / [C / (C + D)]
Risk ratio = risk in exposed / risk in unexposed
Relative Risk 2
                 Outcome    No Outcome
Exposed          A          B
Not Exposed      C          D
Experimental Event Rate (EER) = A / (A + B)
Control Event Rate (CER) = C / (C + D)
Relative Risk = EER / CER
Relative Risk 3
Suited to clinical trials
A relative risk of 1 means there is no difference in risk
between the two groups
An RR of < 1 means the event is less likely to occur in the
experimental group than in the control group
An RR of > 1 means the event is more likely to occur in the
experimental group than in the control group
Relative Risk Reduction
                 Outcome    No Outcome
Exposed          A          B
Not Exposed      C          D
Absolute risk reduction (ARR) = CER - EER
Relative risk reduction (RRR) = risk difference / baseline risk
= (CER - EER) / CER
Worked Example: Relative Risk
•‘PROSCAR more than halves the risk of developing acute
urinary retention and the need for surgery’
•PLESS study
                 Retention    No Retention    Total
Finasteride      42           1471            1513
Placebo          99           1404            1503
• EER = A / (A + B) = 42 / 1513 = 2.8%
• CER = C / (C + D) = 99 / 1503 = 6.6%
• RR = EER / CER = 2.8 / 6.6 = 0.42
Worked Example RRR
RRR = risk difference / baseline risk = (CER - EER) / CER
= (6.6 - 2.8) / 6.6 = 58%
Worked Example ARR
Absolute Risk Reduction = Control event rate - Experimental event rate
= CER - EER
= 6.6% - 2.8% = 3.8%
Worked Example NNT
NNT = 1 / ARR = 1 / 0.038 = 26
Statistics in
PLESS study
•‘PROSCAR more than halves the
risk of developing acute urinary
retention and the need for surgery’
Retention in control = 6.6%
Retention in treatment = 2.8%
Relative risk reduction = 58%
Absolute risk reduction = 3.8%
NNT = 26
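The PLESS figures can be reproduced with a short sketch. It takes the finasteride arm as 42 retentions in 1513 men and the placebo arm as 99 in 1503, which is the assignment that gives the 2.8% and 6.6% quoted above; the function and variable names are illustrative only.

```python
def effect_measures(exp_events, exp_total, ctrl_events, ctrl_total):
    """Return the standard effect measures from trial event counts."""
    eer = exp_events / exp_total    # experimental event rate
    cer = ctrl_events / ctrl_total  # control event rate
    arr = cer - eer                 # absolute risk reduction
    return {
        "EER": eer,
        "CER": cer,
        "RR": eer / cer,            # relative risk
        "RRR": arr / cer,           # relative risk reduction
        "ARR": arr,
        "NNT": 1 / arr,             # number needed to treat
    }

# Acute urinary retention: finasteride 42/1513, placebo 99/1503.
pless = effect_measures(42, 1513, 99, 1503)

print(f"EER = {pless['EER']:.1%}, CER = {pless['CER']:.1%}")
print(f"RR = {pless['RR']:.2f}, RRR = {pless['RRR']:.0%}")
print(f"ARR = {pless['ARR']:.1%}, NNT = {pless['NNT']:.0f}")
```

The "more than halves the risk" claim is the relative risk reduction of 58%; the absolute reduction is 3.8 percentage points, which is why about 26 men need treating to prevent one episode of retention.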
Combination therapy NNT
                       MTOPS    CombAT
Progress clinically    37       18
Prevent AUR            147      22
Prevent surgery        52       18
McConnell, J.D. et al., 2003. The Long-Term Effect of Doxazosin, Finasteride,
and Combination Therapy on the Clinical Progression of Benign Prostatic
Hyperplasia. New England Journal of Medicine, 349(25), pp.2387–2398.
Roehrborn, C.G. et al., 2010. The Effects of Combination Therapy with
Dutasteride and Tamsulosin on Clinical Outcomes in Men with Symptomatic
Benign Prostatic Hyperplasia: 4-Year Results from the CombAT Study.
European Urology, 57(1), pp.123–131.
Why is NNT important?
It takes into account the underlying frequency of the outcome
(which RRR does not)
The ideal NNT is 1, where everyone has improved with treatment
and no-one has with control.
The higher the NNT, the less effective is the treatment
NNTs are only one element of decision making and need to be
integrated with
Patients’ underlying risk
patient preferences,
caregiver experience and judgment
local constraints and conditions
Common NNTs in Urology
Treatment                                        NNT
Appropriate antibiotic in UTI                    1
Treatment for stone passage in ureteric colic    4
Intraurethral alprostadil for ED                 2.3
Compression stockings for post op DVT            9
Aspirin/streptokinase after M.I                  20
Finasteride to prevent retention                 26
Aspirin after MI                                 40
Odds Ratio
Way of comparing whether the probability of an event is
the same for 2 groups
OR = 1: equally likely in both groups
OR > 1: more likely in first group
OR < 1: less likely in first group
                 Outcome    No Outcome
Exposed          A          B
Not Exposed      C          D
OR = (A / B) / (C / D) = AD / BC
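A minimal sketch of the odds ratio, applied for illustration to the PLESS retention counts (taking finasteride as 42 with retention and 1471 without, and placebo as 99 with and 1404 without). The function name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (A / B) / (C / D), equivalently AD / BC."""
    return (a * d) / (b * c)

# A, B = exposed with / without outcome; C, D = not exposed with / without outcome.
or_retention = odds_ratio(a=42, b=1471, c=99, d=1404)
print(f"OR = {or_retention:.2f}")
```

The result is close to the relative risk of 0.42 for the same data because the outcome is uncommon; the two measures diverge as the event becomes more frequent.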
Summary of effect measures
Measure of Effect (Abbreviation): Description; value with No effect; value with Total success
Absolute Risk Reduction (ARR)
Absolute change in risk: risk of event in control group - risk of event
in treated group
No effect: ARR = 0%
Total success: ARR = initial risk
Relative Risk Reduction (RRR)
Proportion of risk removed by treatment: ARR divided by initial risk
in control group
No effect: RRR = 0%
Total success: RRR = 100%
Relative Risk (RR)
Risk of event in treated group divided by risk of event in control
group
No effect: RR = 1 or 100%
Total success: RR = 0
Odds Ratio (OR)
Odds of event in treated group divided by odds of event in control
group
No effect: OR = 1
Total success: OR = 0
Number Needed to Treat (NNT)
Number of patients needing treatment to prevent one event:
reciprocal of ARR
No effect: NNT = ∞
Total success: NNT = 1/initial risk
Other Random Statistical Stuff
Population: should be clearly defined
Sample: every individual in the population from which it is drawn must have an
equal chance of being included
chi-square test (also chi squared test or χ2 test) is any statistical
hypothesis test in which the sampling distribution of the test statistic is
a chi-squared distribution when the null hypothesis is true, or any in
which this is asymptotically true, meaning that the sampling
distribution (if the null hypothesis is true) can be made to approximate
a chi-square distribution as closely as desired by making the sample
size large enough.
Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon
(MWW) or Wilcoxon rank-sum test) is a non-parametric statistical
hypothesis test for assessing whether two independent samples of
observations have equally large values
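To make the Mann–Whitney idea concrete, the sketch below computes the U statistic by direct pairwise comparison: each observation in one sample is compared with each observation in the other, scoring 1 for a win and 0.5 for a tie. The samples are hypothetical, and converting U to a p-value (from tables or a normal approximation) is left out.

```python
def mann_whitney_u(sample_x, sample_y):
    """Return (U_x, U_y): pairwise wins for each sample, ties counting 0.5."""
    u_x = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    # Every pair is a win for one side or a shared tie, so the two U values sum to n_x * n_y.
    u_y = len(sample_x) * len(sample_y) - u_x
    return u_x, u_y

# Hypothetical scores in two independent groups.
u_x, u_y = mann_whitney_u([3, 5, 8, 9], [1, 2, 4, 5])
print("U_x =", u_x, "U_y =", u_y, "test statistic U =", min(u_x, u_y))
```

Because only the ordering of the values matters, the test makes no assumption that the data are normally distributed.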
Levels of evidence
1a – evidence obtained from meta-analysis of randomized
studies
1b - evidence obtained from at least one randomized trial
2a - evidence obtained from well-designed controlled study
without randomization
2b - Evidence obtained from at least one other type of well
designed cohort or case-control study
3 - Evidence obtained from well designed non-experimental
studies such as comparative studies, correlation studies and
case report
4 - Evidence obtained from expert opinion
Controlled trials
Randomized controlled trial (RCT):
A specific type of scientific experiment in which participants in
the trial are randomly allocated to one of the interventions being
compared (or to control). It is the gold standard for a clinical trial.
RCTs are often used to test the efficacy of various types of
intervention within a patient population.
Double blinded
Single blinded
Non-Randomized trial
Observational studies
Cohort study (longitudinal study)
One way of getting around the problem of the small
proportion of people with the disease of interest is the
cohort study
Follows a group of people (i.e. the cohort) over time and
observes whether they develop disease
Generally concerned with the aetiology of disease rather
than treatment
Observational studies
Case-control study
Another solution to the problem of the small number of
people with the disease of interest.
Patients with a particular condition (cases) are matched with
controls who do not have it
Case-control study is generally concerned with the
aetiology of disease rather than treatment
Cross-sectional study
Cross-sectional studies involve data collected at a defined
time. They are often used to assess the prevalence of
acute or chronic conditions, or to answer questions about
the causes of disease or the results of medical
intervention. They may also be described as censuses
Phases of studies
Clinical trials consist of 4 phases. A drug that passes phases 1-3 is usually
approved by the regulatory bodies.
Phase 1: Screening for safety- experimental drug or
treatment in a small group of subjects (20-80) for the
first time to evaluate its safety, determine a safe
dosage range and identify side effects.
Phase 2: Establishing the testing protocol- experimental
treatment is given to a larger group (100-300) to see if
it is effective and to further evaluate its safety.
Phase 3: Final testing-treatment is given to a large
group of subjects (1000-3000) to confirm its
effectiveness, monitor side effects, compare it to
commonly used treatments and collect information that
will allow it to be used safely
Phase 4: Post-approval studies- post marketing studies.
Including treatment’s risks, benefits and optimal use
Parametric statistical tests
Compare the difference between normally distributed
data sets
Analysis of variance (ANOVA) – used to compare the
means of two or more samples to see whether they
come from the same population, testing the null
hypothesis
t-test – compare two samples
Χ2 (Chi squared) – a measure of difference between
actual and expected frequencies (usually based on a
null hypothesis), alternative includes Fisher’s exact test
(small numbers) and Mantel Haenszel test for comparing
multiple two way tables
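A hedged sketch of the chi-squared calculation for a single 2x2 table, without continuity correction. It uses the shortcut formula N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)], which gives the same value as summing (observed - expected)² / expected over the four cells, and converts the result to a p-value using the chi-squared distribution with 1 degree of freedom. The counts are hypothetical.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With 1 degree of freedom, chi2 is the square of a standard normal z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical table: 10/30 events in one group, 20/30 in the other.
chi2, p_value = chi_squared_2x2(a=10, b=20, c=20, d=10)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
```

With small expected counts the approximation becomes unreliable, which is when Fisher's exact test is the usual alternative.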
Non parametric tests
Data not normally distributed, examples include:
Mann-Whitney U
Wilcoxon rank test
Kruskal Wallis
Friedman
Resources
Medical stats made easy (2003)
Notes on statistics for medical students
Statistics at Square One – BMJ publishing (free online)
How to read a paper
How to read a paper: Statistics for the non-statistician
Clinicians guide to statistics for medical practice and
research
Measurement of Performance
status and comorbidity
K Moore
Presented by I Dukic 2014
Performance Status
Definition
scales and criteria used by doctors and researchers to
assess how a patient's disease is progressing, assess how
the disease affects the daily living abilities of the patient, and
determine appropriate treatment and prognosis.
Assessment of performance status
Various scoring systems
Zubrod / WHO / ECOG
Karnofsky
Lansky - children
Eastern Cooperative Oncology Group
ECOG was established in 1955 as one of the first
cooperative groups launched to perform multi-center
cancer clinical trials. ECOG has evolved from a five
member consortium of institutions on the East Coast to
one of the largest clinical cancer research organizations
in the U.S.
ECOG PERFORMANCE STATUS
0 - Fully active, able to carry on all pre-disease performance without
restriction
1 - Restricted in physically strenuous activity but ambulatory and
able to carry out work of a light or sedentary nature, e.g., light
house work, office work
2 - Ambulatory and capable of all selfcare but unable to carry out
any work activities. Up and about more than 50% of waking
hours
3 - Capable of only limited selfcare, confined to bed or chair more
than 50% of waking hours
4 - Completely disabled. Cannot carry on any selfcare. Totally
confined to bed or chair
5 - Dead
Oken et al. Toxicity and Response Criteria of the Eastern Cooperative
Oncology Group. Am J Clin Oncol 5:649-655, 1982
WHO PERFORMANCE STATUS
0 - you are fully active and more or less as you were before your
illness
1 - you cannot carry out heavy physical work, but can do anything
else
2 - you are up and about more than half the day; you can look
after yourself, but are not well enough to work
3 - you are in bed or sitting in a chair for more than half the day;
you need some help in looking after yourself
4 - you are in bed or a chair all the time and need a lot of looking
after
KARNOFSKY PERFORMANCE
STATUS
100 – you don’t have any evidence of disease and feel well
90 – you only have minor signs or symptoms but are able to carry on as normal
80 – you have some signs or symptoms and it takes a bit of effort to carry on as normal
70 – you are able to care for yourself but unable to carry on with normal activities/active
work
60 – you need help from time to time but can mostly care for yourself
50 – you need quite a lot of help to care for yourself
40 – you always need help to care for yourself
30 – you are disabled and may need to stay in hospital
20 – you are sick, in hospital and need a lot of treatment
10 – you are very sick and unlikely to recover
Comparing ECOG with KARNOFSKY
ECOG    Performance                                                          KARNOFSKY
0       Fully active                                                         90-100
1       Not able to do strenuous work but otherwise OK                       70-80
2       Capable of self-care only. Up and about > 50% of waking hours        50-60
3       Limited self care only. In bed or chair > 50% of waking hours        30-40
4       Completely disabled. No self care possible. Confined to bed/chair    10-20
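The comparison can be written as a simple lookup. This is a sketch only: the names are hypothetical, and the lowest band is taken as 10-20 on the assumption that it follows the 10-point steps of the Karnofsky scale listed earlier.

```python
# ECOG grade -> (lowest, highest) Karnofsky score in the comparison table.
# Assumption: the lowest band runs from 10, matching the Karnofsky scale's 10-point steps.
ECOG_TO_KARNOFSKY = {
    0: (90, 100),
    1: (70, 80),
    2: (50, 60),
    3: (30, 40),
    4: (10, 20),
}

def karnofsky_to_ecog(score):
    """Return the ECOG grade whose Karnofsky band contains the given score."""
    for grade, (lowest, highest) in ECOG_TO_KARNOFSKY.items():
        if lowest <= score <= highest:
            return grade
    raise ValueError(f"Karnofsky score {score} is outside the comparison table")

print(karnofsky_to_ecog(80))  # falls in the 70-80 band
```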
Comorbidity
Either the presence of one or more diseases in addition to
a primary disease, or the effect of such additional
disorders or diseases
Many tests attempt to standardise the weight or value of
comorbid conditions
Attempt to consolidate each individual comorbid
condition into a single, predictive variable that measures
mortality or other outcomes
Researchers have validated such tests because of their
predictive value, but no one test is as yet recognized as a
standard
Comorbidity Index
13 different methods identified and critically reviewed – 1
disease count, 12 indexes
Charlson Index – most extensively studied
Cumulative Illness Rating Scale (CIRS) – addresses
body systems without specific diagnoses
Index of Coexisting Disease (ICED) – 2D, measures
disease severity and disability
Kaplan – for use in diabetes
Insufficient data on others
De Groot V et al. How to measure comorbidity: a critical review of available methods. J Clin
Epid 2003; 56: 221-229.
Charlson Comorbidity Index
Charlson index most extensively studied and seems to
be the method of choice in urology.
22 diseases in the index selected and weighted on the
basis of the strength of their association with mortality
Validated as a predictor of short term and long term
mortality.
Charlson Comorbidity Index
Studied in Prostate, Renal, Bladder.
Charlson score divided into 3 levels
Low (0)
Medium (1-2)
High (3 or more)
Each 1 point increase in CCI score leads to a 2.3-fold
increase in relative risk of death at 12 months
(data from 1987).
Can look at
Survival for the same level of comorbidity over time (CaP)
Survival for different levels of comorbidity (RCC)
[Diagram: Tumour Factors, Patient Factors (Performance Status, Comorbidity Index, Life Tables) and Patient Preference feed into the DOCTOR's decision between Treatment and Observation]