Reading and reporting evidence
from trial-based evaluations
Professor David Torgerson
Director, York Trials Unit
www.rcts.org
Background
• Good quality randomised controlled trials
(RCTs) are the best form of evidence to
inform policy and practice.
• However, poorly conducted RCTs may be
more misleading than other types of
evidence.
RCTs – a reminder
• Randomised controlled trials (RCTs)
provide the strongest basis for causal
inference by:
• Controlling for regression to the mean
effects;
• Controlling for temporal changes;
• Providing a basis for statistical inference;
• Removing selection bias.
Selection Bias
• Selection bias can occur in non-randomised
studies when group selection is related to a
known or unknown prognostic variable.
• If the variable is either unknown or
imperfectly measured then it is not possible
to control for this confound and the
observed effect may be biased.
Randomisation
• Randomisation ONLY ensures removal of
selection bias if all those who are randomised are
retained in the analysis within the groups they
were originally allocated.
• If we lose participants or the analyst moves
participants out of their original randomised
groups, this violates the randomisation and can
introduce selection bias.
Is it randomised?
• “The students were assigned to one of three
groups, depending on how revisions were
made: exclusively with computer word
processing, exclusively with paper and
pencil or a combination of the two
techniques.”
Grejda and Hannafin, J Educ Res 1992;85:144.
The ‘Perfect’ Trial
• Does not exist.
• All trials can be criticised methodologically,
but it is best to be transparent in trial
reporting so that the results can be interpreted
in light of the quality of the trial.
Types of randomisation
• Simple randomisation
• Stratified randomisation
• Matched design
• Minimisation
Simple randomisation
• Use of a coin toss or random number tables.
– Characteristics: tends to produce some
numerical imbalance (e.g., for a total n = 30 we
might get 14 vs 16); exact numerical balance is
unlikely. For sample sizes of <50 units it is less
efficient than restricted randomisation, but it is
more resistant to subversion in a sequentially
recruiting trial.
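The numerical-imbalance point is easy to check with a short simulation. This Python sketch (purely illustrative, not from the talk) tosses a fair coin for each of 30 participants and counts how often an exact 15/15 split occurs:

```python
import random

def simple_randomise(n, seed=None):
    """Allocate n participants to groups A/B by independent coin tosses."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n)]

# How often does simple randomisation give exact numerical balance?
rng = random.Random(1)
trials = 10_000
exact_balance = sum(
    1 for _ in range(trials)
    if [rng.choice("AB") for _ in range(30)].count("A") == 15
)
print(f"Exact 15/15 balance in {exact_balance / trials:.0%} of simulated trials")
```

With n = 30 an exact split turns up in only around one trial in seven; splits such as 14 vs 16 are the norm, which is why restricted schemes are preferred for small samples.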
Stratified randomisation
• To ensure balance on known covariates,
restrictions on randomisation are used.
Blocks of allocation are used: ABBA;
AABB etc.
– Characteristics: ensures numerical balance
within the block size; increases subversion risk
in sequentially recruiting trials; small trials with
numerous covariates can result in imbalances.
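A permuted-block scheme of the ABBA/AABB kind described above can be sketched in a few lines of Python (illustrative only, not code from the talk):

```python
import random

def blocked_sequence(n_blocks, block_size=4, seed=None):
    """Permuted-block allocation: each block holds equal numbers of A and B
    in random order (ABBA, AABB, ...), so the groups are numerically
    balanced at the end of every block."""
    rng = random.Random(seed)
    half = block_size // 2
    sequence = []
    for _ in range(n_blocks):
        block = list("A" * half + "B" * half)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence

sequence = blocked_sequence(n_blocks=5, seed=0)
print("".join(sequence))  # 20 allocations: exactly 10 As and 10 Bs
```

The balance guarantee is also the subversion risk the slide mentions: anyone who knows the block size can predict the final allocation(s) in each block of a sequentially recruiting trial.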
Matched Designs
• Here participants are matched on some
characteristic (e.g., pre-test score) and then
one member of each pair (or triplet) is
allocated to the intervention.
– Characteristics: numerical equivalence; loss of
numbers if the total is not divisible by the number
of groups; can lose power if matched on a weak
covariate; difficult to match on numerous
covariates; can reduce power in small samples.
Minimisation
• Rarely used in social science trials. Balance
is achieved across several covariates using a
simple arithmetical algorithm.
– Characteristics: numerical and known covariate
balance. Good for small trials with several
important covariates. Increases risk of
subversion in sequentially recruiting trials;
increases risk of technical error.
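The "simple arithmetical algorithm" can be illustrated with a minimal Taves-style minimisation (a Python sketch under my own naming, not code from the talk): each new participant joins whichever group currently has fewer members sharing their covariate levels, ties broken at random.

```python
import random

def minimise(participants, covariates, seed=None):
    """Sequentially allocate participants to the group that minimises
    covariate imbalance; ties are broken at random."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for person in participants:
        # Score each group by how many existing members already share this
        # person's covariate levels; the lowest score needs them most.
        scores = {
            name: sum(
                sum(1 for m in members if m[c] == person[c]) for c in covariates
            )
            for name, members in groups.items()
        }
        best = min(scores.values())
        choice = rng.choice([g for g, s in scores.items() if s == best])
        groups[choice].append(person)
    return groups

people = [{"sex": s, "pretest": t}
          for s in ("M", "F") for t in ("low", "high")] * 5
allocated = minimise(people, ["sex", "pretest"], seed=2)
```

Unlike stratified blocking, this stays workable with several covariates even in a small trial; but, as the slide notes, a largely deterministic rule increases the subversion risk in sequentially recruiting trials.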
Characteristics of a rigorous trial
• Once randomised all participants are included
within their allocated groups.
• Random allocation is undertaken by an
independent third party.
• Outcome data are collected blindly.
• Sample size is sufficient to exclude an important
difference.
• A single analysis is prespecified before data
analysis.
Problems with RCTs
• Failure to keep to random allocation
• Attrition can introduce selection bias
• Unblinded ascertainment can lead to ascertainment
bias
• Small samples can lead to Type II error
• Multiple statistical tests can give Type I errors
• Poor reporting of uncertainty (e.g., lack of
confidence intervals).
Are these RCTs?
• “We took two groups of schools – one group had high
ICT use and the other low ICT use – we then took a
random sample of pupils from each school and tested
them”.
• “We put the students into two groups, we then
randomly allocated one group to the intervention whilst
the other formed the control”
• “We formed the two groups so that they were
approximately balanced on gender and pretest scores”
• “We identified 200 children with a low reading age and
then randomly selected 50 to whom we gave the
intervention. They were then compared to the
remaining 150”.
Examples
• “Of the eight [schools] two randomly chosen schools
served as a control group”[1]
• “From the 51 children… we formed 17 sets of
triplets…One child from each triplet was randomly
assigned to each of the 3 experimental groups”[2]
• “Stratified random assignment was used in forming 2
treatment groups, with strata (low, medium, high) based
on kindergarten teachers’ estimates of reading”[3]
1 Kim et al. J Drug Ed 1993;23:67.
2 Torgesen et al, J Ed Psychology 1992;84:364
3 Uhry and Shepherd, RRQ, 1993;28:219
What is the problem here?
• “A random-block technique was used to
ensure greater homogeneity among the
groups. We attempted to match age, sex,
and diagnostic category of the subjects. The
composition of the final 3 treatment groups
is summarized in Table 1.”
Roberts and Samuels. J Ed Res 1993;87:118.
Stratifying variables
Sex (male / female) × age (young / old) × diagnosis (LD / no LD) = 8 strata
Plus 3 treatment groups for each bottom cell = 24 cells in all; sample size = 36
Blocking
• With so many stratifying variables and a
small sample size, blocked allocation
results in an average of 1.5 children per cell.
It is likely that some cells will be empty, and
this technique can result in greater
imbalances than less restricted allocation.
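The arithmetic behind those 24 cells is worth making explicit (a few illustrative lines of Python):

```python
# Roberts & Samuels stratified on three two-level variables and then split
# every stratum across three treatment groups.
strata = 2 * 2 * 2        # sex x age x diagnostic category
treatment_groups = 3
cells = strata * treatment_groups
sample_size = 36
children_per_cell = sample_size / cells
print(cells, children_per_cell)  # 24 cells, 1.5 children per cell on average
```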
Mixed allocation
• “Students were randomly assigned to either
Teen Outreach participation or the control
condition either at the student level (i.e.,
sites had more students sign up than could
be accommodated and participants and
controls were selected by picking names out
of a hat or choosing every other name on an
alphabetized list) or less frequently at the
classroom level”
Allen et al, Child Development 1997;64:729-42.
Is it randomised?
• “The groups were balanced for gender and,
as far as possible, for school. Otherwise,
allocation was randomised.”
Thomson et al. Br J Educ Psychology 1998;68:475-91.
Class or Cluster Allocation
• Randomising intact classes is a useful
approach to undertaking trials. However, to
balance out class-level covariates we must
have several units per group (a minimum of
5 classes per group is recommended);
otherwise we cannot balance out potential
confounders.
What is wrong here?
• “the remaining 4 classes of fifth-grade
students (n = 96) were randomly assigned,
each as an intact class, to the [4] prewriting
treatment groups;”
Brodney et al. J Exp Educ 1999;68,5-20.
Misallocation issues
• “We used a matched pairs design. Children were
matched on gender and then one of each pair was
allocated to the intervention whilst the
remaining child acted as a control. 31 children
were included in the study: 15 in the control group
and 16 in the intervention.”
• “23 offenders from the treatment group could not
attend the CBT course and they were then placed
in the control group”.
Attrition
• Rule of thumb: 0–5% attrition is unlikely to
be a problem; 6–20% is worrying; >20%
suggests selection bias.
• How to deal with attrition? Sensitivity
analysis.
• Dropping the remaining participant in a
matched design does NOT deal with the
problem.
What about matched pairs?
• We can only match on observable variables
and we trust to randomisation to ensure that
unobserved covariates or confounders are
equally distributed between groups.
• If we lose a participant dropping the
matched pair does not address the
unobservable confounder, which is one of
the main reasons we randomise.
Matched Pairs on Gender
Control (unknown covariate)   Intervention (unknown covariate)
Boy (high)                    Boy (low)
Girl (high)                   Girl (high)
Girl (low)                    Girl (high)
Boy (high)                    Boy (low)
Girl (low)                    Girl (high)
3 girls and 3 highs           3 girls and 3 highs
Drop-out of 1 girl
Control               Intervention
Boy (high)            Boy (low)
Girl (high)           Girl (high)
Girl (low)            Girl (high)
Boy (high)            Boy (low)
(dropped out)         Girl (high)
2 girls and 3 highs   3 girls and 3 highs
Removing matched pair does not
balance the groups!
Control               Intervention
Boy (high)            Boy (low)
Girl (high)           Girl (high)
Girl (low)            Girl (high)
Boy (high)            Boy (low)
2 girls and 3 highs   2 girls and 2 highs
Dropping matched pairs
• In that example, dropping the matched pair makes
the situation worse.
– Balanced on gender but imbalanced on high/low;
– We can correct for gender in the statistical analysis as it
is an observable variable; we cannot correct for high/low
as this is unobservable;
– Removing the matched pair reduces our statistical
power but does not solve the problem.
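The matched-pairs example above can be reproduced in a few lines of Python (illustrative; the high/low labels stand for the unobservable covariate that matching cannot see):

```python
control = [("boy", "high"), ("girl", "high"), ("girl", "low"),
           ("boy", "high"), ("girl", "low")]
intervention = [("boy", "low"), ("girl", "high"), ("girl", "high"),
                ("boy", "low"), ("girl", "high")]

def tally(group):
    """Count (girls, highs) in a group."""
    girls = sum(1 for sex, _ in group if sex == "girl")
    highs = sum(1 for _, level in group if level == "high")
    return girls, highs

drop = 2  # a (girl, low) control participant leaves the study
control_after = control[:drop] + control[drop + 1:]
pair_dropped = intervention[:drop] + intervention[drop + 1:]

print(tally(control), tally(intervention))        # (3, 3) (3, 3) at baseline
print(tally(control_after), tally(intervention))  # (2, 3) (3, 3) after drop-out
print(tally(control_after), tally(pair_dropped))  # (2, 3) (2, 2) pair dropped
```

Dropping the matched partner restores the gender balance on paper but leaves the groups imbalanced on the unobserved covariate, which is exactly the slide's point.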
Sensitivity analysis
• In the presence of attrition we can check whether
the results change because of it. For example, we
can give the worst possible scores to the missing
participants in the group with the better outcome,
and the best possible scores to those missing from
the other group.
• If the difference remains significant we can be
reassured that attrition did not alter the findings.
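The worst-case imputation described above can be sketched like this (Python; the scores are invented for illustration and assume a 0–100 scale):

```python
def worst_case_means(better_group, other_group, missing_better, missing_other,
                     score_range=(0, 100)):
    """Worst-case sensitivity analysis: give the lowest possible scores to
    those missing from the better-performing group and the highest possible
    scores to those missing from the other group, then recompute the means."""
    low, high = score_range
    better = better_group + [low] * missing_better
    other = other_group + [high] * missing_other
    return sum(better) / len(better), sum(other) / len(other)

# Hypothetical data: the intervention group looks better on observed scores.
intervention_scores = [70, 75, 80, 85]   # observed mean 77.5
control_scores = [60, 62, 65, 70]        # observed mean 64.25
worst_int, best_con = worst_case_means(intervention_scores, control_scores, 1, 1)
print(worst_int, best_con)  # 62.0 vs 71.4: the apparent advantage disappears
```

If the effect survives this deliberately hostile imputation (and the mirror-image best-case one), attrition is unlikely to explain it; here one missing participant per arm is enough to overturn the result.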
Flow Diagrams
• 635 children in 16 schools screened using a group spelling test.
• 118 children with poor spelling skills given individual tests of
vocabulary, letter knowledge, word reading and phoneme manipulation.
• 2 schools excluded due to insufficient numbers of poor spellers;
9 children excluded due to behaviour.
• 84/118 children in the 14 remaining schools (6 per school) selected
for randomisation to the interventions.
• 1 school (6 children) withdrew from the study after randomisation.
• 20-week intervention: 39/42 children in the 13 remaining schools
allocated; 39/42 included in the analysis.
• 10-week intervention: 39/42 children allocated; 1 child left the
study (moved school); 38/42 included in the analysis.
Hatcher et al. 2005 J Child Psych Psychiatry: online
Flow Diagram
• In health care trials reported in the main
medical journals authors are required to
produce a CONSORT flow diagram.
• The trial by Hatcher et al. clearly shows the
fate of the participants from randomisation
through to analysis.
Poorly reported attrition
• In an RCT of foster carers, extra training was given.
– “Some carers withdrew from the study once the dates
and/or location were confirmed; others withdrew once
they realized that they had been allocated to the control
group” “117 participants comprised the final sample”
• No split between groups is given except in one
table, which shows 67 in the intervention group
and 50 in the control group: 25% more in the
intervention group. Unequal attrition is a hallmark of
potential selection bias. But we cannot be sure.
Macdonald & Turner, Brit J Social Work (2005) 35,1265
Recent Blocked Trial
“This was a block randomised study (four patients to
each block) with separate randomisation at each of
the three centres. Blocks of four cards were
produced, each containing two cards marked with
"nurse" and two marked with "house officer." Each
card was placed into an opaque envelope and the
envelope sealed. The block was shuffled and, after
shuffling, was placed in a box.”
Kinley et al., BMJ 325:1323.
What is wrong here?
Centre         Doctor   Nurse
Southampton    500      511
Sheffield      308      319
Doncaster      118      118

Kinley et al., BMJ 325:1323.
Type I error issues
• 3-group trial – “Pre-test to posttest scores
improved for most of the 14 variables”. With
3 groups and 14 variables there are 42
potential pairwise between-group comparisons.
The authors actually did more, also reporting
within-group pre-test/post-test comparisons,
which gives 82 tests.
Roberts and Samuels. J Ed Res 1993;87:118.
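The arithmetic of those comparisons, and why so many tests matter, can be checked directly (an illustrative Python calculation; the familywise figure treats the tests as independent, which is only an approximation):

```python
from math import comb

groups, variables = 3, 14
pairwise_tests = comb(groups, 2) * variables  # 3 group pairs x 14 variables
print(pairwise_tests)  # 42

# If the null is true everywhere and each test is run at alpha = 0.05,
# the chance of at least one spurious "significant" result is:
alpha = 0.05
familywise_error = 1 - (1 - alpha) ** pairwise_tests
print(f"{familywise_error:.0%}")  # roughly 88%
```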
Type II errors
• Most social science interventions show
small effect sizes (typically 0.5 of a standard
deviation or lower). To have an 80% chance
of detecting a 0.5 effect size we need 128
participants. For smaller effects we need
much larger studies (e.g., 512 participants
for an effect size of 0.25).
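The slide's sample sizes follow from the standard normal-approximation formula for comparing two means. A Python sketch (this reproduces the textbook calculation, not any code from the talk):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Per-group n for a two-sided, two-sample comparison of means,
    using the normal approximation n = 2 * (z_a + z_b)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Totals: ~126 for d = 0.5 and ~504 for d = 0.25, in line with the slide's
# rounder figures of 128 and 512 (which use 64 per group for d = 0.5).
print(2 * n_per_group(0.5))
print(2 * n_per_group(0.25))
```

Note the quadratic cost of chasing smaller effects: halving the effect size quadruples the required sample.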
Analytical Errors
• Many studies do the following:
– Perform paired tests of pre- and post-test scores:
unnecessary and misleading in an RCT, as we should
compare group means.
– Fail to take cluster allocation into account.
– Use gain scores without adjusting for baseline
values.
– Perform multiple tests.
Pre-treatment differences
• A common approach is to statistically test baseline
covariates:
– “The first issue we examined was whether there were
pretreatment differences between the experimental
groups and the control groups on the following
independent variables” “There were two pretreatment
differences that attained statistical significance”
“However, since they were statistically significant these
2 variables are included as covariates in all statistical
tests”.
Davis & Taylor Criminology 1997;35:307-33.
What is wrong with that?
• If randomisation has been carried out properly
then the null hypothesis is true at baseline: any
differences have occurred by chance.
• Statistical significance gives no clue as to
whether a covariate is worth including in the
analysis. Including a statistically significant but
prognostically unimportant covariate reduces
power, whilst ignoring an important covariate,
even one that is balanced, also reduces power.
The CONSORT statement
• Many journals require authors of RCTs to
conform to the CONSORT guidelines.
• This is a useful approach to deciding
whether or not trials are of good quality.
• Was the study population adequately described? (i.e., were the important
characteristics of the participants described, e.g. age, gender?)
• Was the minimum important difference described? (i.e., was the smallest
clinically important effect size described?)
• Was the target sample size adequately determined?
• Was intention-to-treat analysis used?
• Was the unit of randomisation described (i.e., individuals or groups)?
• Were the participants allocated using random number tables, coin flip, or
computer generation?
• Was the randomization process concealed from the investigators?
• Were follow-up measures administered blind?
• Was the estimated effect on primary and secondary outcome measures stated?
• Was the precision of the effect size estimated (confidence intervals)?
• Were summary data presented in sufficient detail to permit alternative
analyses or replication?
• Was the discussion of the study findings consistent with the data?
Review of Trials
• In a review of RCTs in health care and
education the quality of the trial reports
were compared over time.
Torgerson CJ, Torgerson DJ, Birks YF, Porthouse J. Br Ed
Res J. 2005;31:761-85.
Study Characteristics
Characteristic             Drug   Health   Education
Cluster randomised          1%     36%       18%
Sample size justified      59%     28%        0%
Concealed randomisation    40%      8%        0%
Blinded follow-up          53%     30%       14%
Use of CIs                 68%     41%        1%
Low statistical power      45%     41%       85%
Change in concealed allocation
[Bar chart: percentage of trials using concealed allocation, <1997 vs >1996.
Drug trials: P = 0.04; non-drug trials: P = 0.70.]
NB No education trial used concealed allocation
Blinded Follow-up
[Bar chart: percentage of trials with blinded follow-up, <1997 vs >1996.
Drug: P = 0.54; health: P = 0.13; education: P = 0.03.]
Underpowered
[Bar chart: percentage of underpowered trials, <1997 vs >1996.
Drug: P = 0.01; health: P = 0.76; education: P = 0.22.]
Mean Change in Items
[Bar chart: mean change in reporting items, <1997 vs >1996.
Drug: P = 0.001; non-drug: P = 0.07; education: P = 0.03.]
Summary
• There is a lot of evidence from health care trials
that poor quality studies give different results
from high quality studies.
• Social science trials tend to be poorly reported;
it is often difficult to distinguish poor quality
from poor reporting.
• Reporting quality could easily be improved.