grade - British Association of Dermatologists


GRADE
Grading of Recommendations Assessment, Development and Evaluation
British Association of Dermatologists
April 2014
Previous grading system
Level of evidence
Level of evidence | Type of evidence
1++ | High-quality meta-analyses, systematic reviews of RCTs, or RCTs with a very low risk of bias
1+ | Well-conducted meta-analyses, systematic reviews of RCTs, or RCTs with a low risk of bias
1- | Meta-analyses, systematic reviews of RCTs, or RCTs with a high risk of bias*
2++ | High-quality systematic reviews of case-control or cohort studies; high-quality case-control or cohort studies with a very low risk of confounding, bias or chance and a high probability that the relationship is causal
2+ | Well-conducted case-control or cohort studies with a low risk of confounding, bias or chance and a moderate probability that the relationship is causal
2- | Case-control or cohort studies with a high risk of confounding, bias or chance and a significant risk that the relationship is not causal*
3 | Non-analytical studies (for example, case reports, case series)
4 | Expert opinion, formal consensus
*Studies with a level of evidence ‘-’ should not be used as a basis for making a recommendation.
Strength of recommendation
Class | Evidence
A | At least one meta-analysis, systematic review, or RCT rated as 1++, and directly applicable to the target population, or a systematic review of RCTs or a body of evidence consisting principally of studies rated as 1+, directly applicable to the target population and demonstrating overall consistency of results, or evidence drawn from a NICE technology appraisal
B | A body of evidence including studies rated as 2++, directly applicable to the target population and demonstrating overall consistency of results, or extrapolated evidence from studies rated as 1++ or 1+
C | A body of evidence including studies rated as 2+, directly applicable to the target population and demonstrating overall consistency of results, or extrapolated evidence from studies rated as 2++
D | Evidence level 3 or 4, or extrapolated evidence from studies rated as 2+, or formal consensus
D (GPP) | A good practice point (GPP) is a recommendation for best practice based on the experience of the guideline development group
RCT: randomised controlled trial; NICE: National Institute for Health and
Care Excellence.
Advantages of GRADE over other systems
• Developed by a widely representative group of international guideline
developers
• Clear separation between quality of evidence and strength of
recommendations
• Explicit evaluation of the importance of outcomes of alternative
management strategies
• Explicit, comprehensive criteria for downgrading and upgrading quality of
evidence ratings
• Transparent process of moving from evidence to recommendations
• Explicit acknowledgment of values and preferences
• Clear, pragmatic interpretation of strong versus weak recommendations
for clinicians, patients and policy makers
GRADE has been adopted by the WHO, Cochrane Collaboration, NICE, SIGN and nearly 50 other
international organisations
PICO method
• a technique used in evidence-based practice to frame and answer a clinical question
• P: population/patient
• I: intervention
• C: comparator/control (if applicable)
• O: outcome
“In an elderly man with acutely painful herpes zoster, would treatment with antivirals or with antivirals and corticosteroids lead to a more rapid resolution of pain?”
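To show how a question decomposes into its PICO parts, here is a minimal sketch. `PicoQuestion` and `as_question` are hypothetical names, and reading the example as antivirals plus corticosteroids (intervention) versus antivirals alone (comparator) is an interpretation of the question above, not something the slide states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PicoQuestion:
    """A clinical question split into its PICO components (hypothetical helper)."""
    population: str
    intervention: str
    comparator: Optional[str]  # None when no comparator applies
    outcome: str

    def as_question(self) -> str:
        """Reassemble the components into a single answerable question."""
        versus = f", compared with {self.comparator}," if self.comparator else ""
        return (f"In {self.population}, would {self.intervention}{versus} "
                f"lead to {self.outcome}?")


# The herpes zoster example, read as antivirals plus corticosteroids (I)
# versus antivirals alone (C); this split is an assumption.
zoster = PicoQuestion(
    population="an elderly man with acutely painful herpes zoster",
    intervention="antivirals and corticosteroids",
    comparator="antivirals alone",
    outcome="a more rapid resolution of pain",
)
print(zoster.as_question())
```

Keeping the comparator optional mirrors the "(if applicable)" note on C.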
Evidence hierarchy
Quality of evidence
Evidence:
• high quality
(further research is very unlikely to change our confidence in the estimate of effect)
• moderate quality
(further research is likely to have an important impact on our confidence in the estimate
of effect and may change the estimate)
• low quality
(further research is very likely to have an important impact on our confidence in the
estimate of effect and is likely to change the estimate)
• very low quality
(any estimate of effect is very uncertain)
“The quality of evidence reflects the extent to which confidence in an estimate
of the effect is adequate to support a particular recommendation”
How does GRADE classify the quality of evidence?
Evidence based on RCTs begins as high quality evidence, but confidence may
be decreased due to five factors:
1. study limitations
2. inconsistency of results
3. indirectness of evidence
4. imprecision
5. reporting/publication bias
How does GRADE classify the quality of evidence?
1. Domains for study limitations:
• selection bias
— lack of allocation concealment or sequence generation
• lack of blinding
• attrition bias
— loss of participants (dropouts; non-responders; stopping early for benefit; protocol
deviators and failure to adhere to an intention-to-treat analysis)
• measurement bias
— inaccuracy in the measurement instrument; bias in the expectations of study
participants, carers or researchers
• outcome reporting bias
— selective outcome-reporting or failure to report outcomes
How does GRADE classify the quality of evidence?
2. Inconsistency of results:
• widely differing estimates of the treatment effect across studies suggest
true differences in the underlying treatment effect; variability in the
results may arise from differences in the P, I, C, O
• heterogeneity without plausible explanation
3. Indirectness of evidence:
• indirect comparisons of the magnitude of effect of drug A vs. drug B when the available trials compare drug A vs. placebo and drug B vs. placebo; differences between the P, I, C, O of the studies and those of the clinical question
4. Imprecision:
• studies with relatively few patients and few events, giving wide confidence intervals (CIs)
5. Reporting/publication bias:
• failure to report studies, or concerns arising from study funding
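To illustrate why few patients and few events produce imprecision, the sketch below computes a risk ratio (RR) with an approximate 95% CI using the usual log-scale standard error for two independent groups. The trial counts are hypothetical, and `risk_ratio_ci` is an illustrative name, not part of any GRADE tool.

```python
import math


def risk_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio and approximate 95% CI, computed on the log scale."""
    if events_t == 0 or events_c == 0:
        raise ValueError("zero events: this simple formula needs a continuity correction")
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent groups
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# Hypothetical trials with the same RR (0.5) but ten-fold different sizes
small = risk_ratio_ci(5, 50, 10, 50)
large = risk_ratio_ci(50, 500, 100, 500)
for label, (rr, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Both trials estimate the same RR, but only the small trial's interval crosses 1 (no effect), which is the kind of wide CI that would prompt downgrading for imprecision.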
How does GRADE classify the quality of evidence?
• For study limitations and indirectness (individually), the quality of
evidence is assessed:
– for each study for the particular outcome
– then across all studies for the particular outcome (e.g. for meta-analyses), reporting the quality of the majority of the evidence
• For imprecision, inconsistency and publication bias, the quality of
evidence is assessed for ALL the studies as a whole
How does GRADE classify the quality of evidence?
Using RCTs as an example, for each outcome, obtain an overall quality rating:
• 0 if there are no problems with any of the five factors (high)
• -1 if there is a problem in one factor (high → moderate)
• -2 if there are problems in two factors (high → low)
• -3 if there are problems in three factors (high → very low)
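The counting rule above can be sketched as follows, assuming one level is lost per factor with a problem and the rating cannot fall below very low. The names are hypothetical; the full GRADE guidance also allows a two-level drop for a very serious problem in a single factor, which this sketch deliberately does not model.

```python
from enum import IntEnum


class Quality(IntEnum):
    VERY_LOW = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


DOWNGRADE_FACTORS = frozenset({
    "study limitations", "inconsistency", "indirectness",
    "imprecision", "publication bias",
})
# RCTs start high; observational studies start low
STARTING_QUALITY = {"rct": Quality.HIGH, "observational": Quality.LOW}


def rate_outcome(design: str, problems: set) -> Quality:
    """Drop one level per factor with a problem, never below VERY_LOW."""
    unknown = set(problems) - DOWNGRADE_FACTORS
    if unknown:
        raise ValueError(f"not a GRADE downgrading factor: {sorted(unknown)}")
    start = STARTING_QUALITY[design]  # KeyError for an unknown design label
    return Quality(max(Quality.VERY_LOW, start - len(set(problems))))


print(rate_outcome("rct", {"imprecision", "inconsistency"}).name)  # LOW
```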
How does GRADE classify the quality of evidence?
• Observational studies start with low quality ratings; grading upwards may
be warranted if:
– the magnitude of the treatment effect is large or very large
(large effect: RR > 2 or RR < 0.5; very large effect: RR > 5 or RR < 0.2)
– evidence of a dose-response relationship
– all plausible confounders/biases would have ordinarily decreased the
magnitude of an apparent treatment effect
• Grading upwards is only possible if evidence has not already been
downgraded (due to study limitations, inconsistency, imprecision)
• Upgrade to:
– +1 when RR > 2 or RR < 0.5 (low → moderate)
– +2 when RR > 5 or RR < 0.2 (low → high)
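The effect-size rule for upgrading observational evidence can be sketched using only the RR thresholds listed above. The function names are hypothetical, and the other two upgrade criteria (dose-response and the direction of plausible confounding) are left out because they need judgement rather than a threshold.

```python
LEVELS = ("very low", "low", "moderate", "high")


def effect_size_upgrade(rr: float) -> int:
    """Levels of upgrade suggested by the size of a risk ratio (RR)."""
    if rr <= 0:
        raise ValueError("a risk ratio must be positive")
    if rr > 5 or rr < 0.2:
        return 2  # very large effect
    if rr > 2 or rr < 0.5:
        return 1  # large effect
    return 0


def rate_observational(rr: float, n_downgrades: int = 0) -> str:
    """Observational evidence starts at 'low'. A large effect can raise it,
    but only when nothing has already been downgraded."""
    start = LEVELS.index("low")
    if n_downgrades > 0:
        return LEVELS[max(0, start - n_downgrades)]
    return LEVELS[start + effect_size_upgrade(rr)]


print(rate_observational(0.4))  # moderate
print(rate_observational(6.0))  # high
```

Note that the thresholds are strict inequalities, so an RR of exactly 2 or 0.5 earns no upgrade.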
How does GRADE classify the quality of evidence?
Study design: randomised trials start at High; observational studies start at Low
Quality of evidence: High / Moderate / Low / Very low
Lower if…: study limitations; inconsistency; indirectness; imprecision; publication bias
Higher if…: large treatment effect; dose-response relationship; confounders or biases (all plausible ones would decrease the apparent effect)
Defining outcomes
GRADE is outcome-centric
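[Figure: GRADE rates the quality of evidence separately for each outcome (Outcome #1: high; Outcome #2: moderate; Outcome #3: low; Outcome #4: very low), whereas the previous system assigned an evidence level (e.g. 1++, 2-, 3, 4) to each study.]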
• Critical outcomes determine the overall quality of evidence
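One way to make "critical outcomes determine the overall quality" concrete is the common GRADE convention that the overall rating is the lowest quality among the critical outcomes; treat that rule as an assumption here, since the slide does not spell it out. Which example outcomes count as critical is also hypothetical.

```python
from enum import IntEnum


class Quality(IntEnum):
    VERY_LOW = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_quality(outcomes: dict) -> Quality:
    """Overall quality of evidence = lowest quality among the critical outcomes.

    `outcomes` maps an outcome name to (is_critical, quality).
    """
    critical = [quality for is_critical, quality in outcomes.values() if is_critical]
    if not critical:
        raise ValueError("at least one outcome must be rated critical")
    return min(critical)


# Hypothetical: outcomes #1 and #2 critical, #3 and #4 important but not critical
example = {
    "Outcome #1": (True, Quality.HIGH),
    "Outcome #2": (True, Quality.MODERATE),
    "Outcome #3": (False, Quality.LOW),
    "Outcome #4": (False, Quality.VERY_LOW),
}
print(overall_quality(example).name)  # MODERATE
```

The low-quality non-critical outcomes do not pull the overall rating down, which is the practical consequence of separating critical from merely important outcomes.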
Defining outcomes
GRADE challenges guideline developers to:
• specify all outcomes of importance to patients
• differentiate outcomes that are critical for
decision-making from those that are important but
not critical, and those that are not important
• limit to a suggested maximum of 7 outcomes
Defining outcomes
For decision-making, the guideline development group (GDG) needs to know:
• which outcomes it judges to be critical:
— mortality and QoL
— otherwise, outcomes most likely to affect QoL and mortality
• the magnitude of effect as absolute differences per outcome
• the overall evidence quality per outcome
• an evidence profile helps with decision-making
For decision-making:
• look at the clinical importance of each outcome, not its statistical significance (absolute changes and their direction are needed)
Judging clinical importance:
• The GDG decides whether the effect estimate represents a clinically important difference
• If it does, the direction must be indicated (i.e. benefit or harm)
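Because decisions rest on absolute rather than relative changes, a small sketch of the conversion may help. It turns a risk ratio into a difference per 1000 people and a number needed to treat (NNT) for an assumed baseline (control-group) risk; the baseline risks and RR are hypothetical, and `absolute_effect` is an illustrative name.

```python
import math


def absolute_effect(control_risk: float, rr: float, per: int = 1000):
    """Turn a risk ratio into an absolute difference per `per` people and an NNT."""
    if not 0 <= control_risk <= 1:
        raise ValueError("control risk must be a proportion between 0 and 1")
    difference = control_risk * rr - control_risk  # negative means fewer events
    nnt = math.inf if difference == 0 else 1 / abs(difference)
    return round(difference * per), nnt


# Hypothetical: the same RR of 0.75 at two different baseline risks
for baseline in (0.20, 0.02):
    per_1000, nnt = absolute_effect(baseline, 0.75)
    print(f"baseline {baseline:.0%}: {per_1000} per 1000, NNT {nnt:.0f}")
```

The same relative effect means 50 fewer events per 1000 at a 20% baseline risk but only 5 fewer at 2%, which is why the absolute difference, not the RR alone, informs judgements of clinical importance.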
Defining outcomes
Example from BAD guidelines on hidradenitis suppurativa:
• Quality of Life (9)
• Adverse effects – serious (9)
• Pain (8)
• Disease-specific physician score (6)
• Physician’s Global Assessment (5)
• Patient’s Global Assessment (5)
• Adverse effects – nuisance (4)
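The bracketed numbers appear to be importance ratings on GRADE's usual 1 to 9 scale; assuming the conventional cut-offs (7 to 9 critical, 4 to 6 important but not critical, 1 to 3 not important), the hidradenitis suppurativa list can be grouped as below. The cut-offs and function names are assumptions, not taken from the BAD guideline itself.

```python
MAX_OUTCOMES = 7  # suggested maximum number of outcomes


def classify(score: int) -> str:
    """Map a 1-9 importance rating to three categories (assumed cut-offs)."""
    if not 1 <= score <= 9:
        raise ValueError("importance ratings run from 1 to 9")
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important but not critical"
    return "not important"


def group_outcomes(ratings: dict) -> dict:
    """Group outcomes by category, highest-rated first, enforcing the maximum."""
    if len(ratings) > MAX_OUTCOMES:
        raise ValueError(f"more than {MAX_OUTCOMES} outcomes")
    groups = {}
    for name, score in sorted(ratings.items(), key=lambda item: -item[1]):
        groups.setdefault(classify(score), []).append(name)
    return groups


hs_ratings = {
    "Quality of Life": 9,
    "Adverse effects - serious": 9,
    "Pain": 8,
    "Disease-specific physician score": 6,
    "Physician's Global Assessment": 5,
    "Patient's Global Assessment": 5,
    "Adverse effects - nuisance": 4,
}
for category, names in group_outcomes(hs_ratings).items():
    print(f"{category}: {', '.join(names)}")
```

Under these cut-offs, quality of life, serious adverse effects and pain are the critical outcomes, and the list sits exactly at the suggested maximum of seven.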
How does GRADE classify the strength of recommendations?
Factors determining the strength of recommendations:
• balance between desirable and undesirable effects
• quality of evidence
• variability (or uncertainty) in patient values and preferences
• costs (resource allocation)
Desirable effects include:
• reduction in morbidity and mortality
• improvement in QoL
• reduction in the burden of treatment
• reduction in resource expenditures
Undesirable effects include:
• adverse effects with a deleterious impact on morbidity, mortality or QoL, or an increase in resource expenditures
How does GRADE classify the strength of recommendations?
strong (unconditional):
• desirable effects of an intervention clearly outweigh the undesirable
effects (or clearly do not)
• high-quality evidence with a large, precise effect estimate
• low variability or uncertainty in patient values and preferences
• low resource use (cost)
weak (conditional):
• desirable effects not clearly greater or smaller than undesirable effects
• low-quality evidence with an imprecise estimate
• patient values and preferences very important
• high resource use (cost)
N.B. Recommendations to use interventions in a research context may be
appropriate
Strength of recommendation
NICE approach to recommendations:
• strong:
(the benefits clearly outweigh the harms for most people and the intervention is likely
to be cost-effective; the guideline panel believes that a vast majority of clinicians and
patients would choose a particular intervention if they considered the evidence in the
same way that the panel has)
→ OFFER
• weak:
(there is a closer balance between benefits and harms; some patients are averse to
some side effects whilst others are not)
→ CONSIDER
“Management options associated with strong recommendations are candidates for quality
criteria”
“When recommendations are weak, discussing relative merits of alternative management
options with patients and families may become a quality criterion”
[Figure: overview of the GRADE process, showing a systematic review framed by PICO; outcomes ranked as critical, important or less important; a summary of findings and estimate of effect for each outcome; and quality of evidence rated high, moderate, low or very low, where RCTs start high and observational data start low, grading down for 1. risk of bias, 2. inconsistency, 3. indirectness, 4. imprecision, 5. publication bias, and grading up for 1. large effect, 2. dose response, 3. confounders.]
Guideline development
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering:
• Quality of evidence
• Balance of benefits and harms
• Values and preferences
• Resource use (cost)
GDG reviews the evidence profile for rating the overall quality of evidence
NICE recommendations:
• “Offer…”
• “Consider…”
• “Do not offer…”
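The link between a recommendation's direction and strength and the NICE wording can be sketched as a small lookup. Mapping "Do not offer" to a strong recommendation against is an assumption, and since the list above gives no verb for a weak recommendation against, the sketch refuses that case rather than inventing wording.

```python
from enum import Enum


class Direction(Enum):
    FOR = "for"
    AGAINST = "against"


class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"


def nice_wording(direction: Direction, strength: Strength) -> str:
    """Pick the NICE verb for a recommendation (assumed mapping)."""
    if direction is Direction.FOR:
        return "Offer" if strength is Strength.STRONG else "Consider"
    if strength is Strength.STRONG:
        return "Do not offer"
    # No verb is listed for a weak recommendation against.
    raise ValueError("no wording listed for a weak recommendation against")


print(nice_wording(Direction.FOR, Strength.WEAK))  # Consider
```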
GRADE evidence profile