Transcript Slide 1
Maintaining Data Integrity in
a cluster randomised trial
(The PRINCE Study)
Bernard McCarthy, Declan Devane,
Collette Kirwan, Dympna Casey,
Kathy Murphy, Lorraine Mee, Adeline Cooney
School of Nursing & Midwifery
National University of Ireland
Galway
Aim of the Presentation
This presentation will highlight the process
undertaken to minimise errors in the study
database prior to analysis for the PRINCE study.
Why this Topic?
• Multiple stages are involved in a research study before the collected information is ready for analysis.
• It is well reported that errors are introduced into the information at different points along the process (Goldberg, Niemierko & Turchin 2008, Day, Fayers & Harvey 1998, King & Lashley 2000).
• Yet the integrity of any study is dependent on the quality of the data available.
Background
A cluster randomised controlled trial evaluating the effectiveness of a structured pulmonary rehabilitation education programme for improving the health status of people with chronic obstructive pulmonary disease (COPD).
PRINCE Trial – CONSORT FLOW DIAGRAM
PRINCE Trial Quantitative Outcome Measurements
Primary: Chronic Respiratory Questionnaire (CRQ)
• Four Domains: Dyspnoea, Fatigue, Emotional Function, Mastery
• Combined Domains: Physical, Psychological
Secondary:
• Incremental Shuttle Walk Test
• Muscle Test
• Self-Efficacy for Managing Chronic Disease
Economic Analysis:
• EQ5D
• Utilisation of Health Care Services
The higher the quality of the data
entry, the greater the “reliability”
resulting in more “convincing
inference” from the study.
(Day, Fayers & Harvey 1998)
Where do the errors arise?
Errors arising in databases usually arise from one of three sources:
• Originating at the original data collection
• Incorrect interpretation of what was entered on the original documentation
• Transcription errors when entering the data into the database.
(Goldberg, Niemierko & Turchin 2008)
Original Data Errors
• Little can be done after the event to overcome the first source of these errors, as very few research teams can afford to undertake double transcription of the original data collection.
“Getting it right first time” is what is critical here.
The quality of initial data collection can be improved by:
• Streamlining of data collection tools
• Adequate training of the data collectors
• Verification of answers with the participants
Inputting Errors
• Post-interview computer entry of research data for analysis is a tedious duty and is well known for being "an error prone task" (Polit and Beck 2010).
• This can be overcome by direct computer entry at the initial interview, followed by verification of the inputted answers with the patient.
Data Verification for Errors
• Several methods of data verification exist that are useful for identifying transcription errors, especially typographical errors and, to a lesser extent, interpretation errors.
• No matter which method is utilised, it is impossible to identify all the errors.
• Most of the literature on this topic focuses on the level of error rates for different forms of data entry.
The Process of Validating
• The first port of call is visual scanning of the electronic data for obvious errors, often referred to as "exploratory data analysis" (Polit & Beck 2004, Day et al. 1998).
• The first stage of this process was to identify and check outliers.
• The "range for outliers" was based mainly on clinical judgement or on values falling outside three standard deviations of the mean (Day, Fayers & Harvey 1998), as in the sketch below.
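A minimal sketch of this kind of outlier screen, assuming the data sit in a pandas data frame; the field names and clinical plausibility ranges are illustrative assumptions, not the actual PRINCE data dictionary.

```python
import pandas as pd

# Illustrative plausibility ranges (assumed, not the PRINCE data dictionary)
CLINICAL_RANGES = {
    "age": (40, 100),                  # years
    "shuttle_walk_metres": (0, 1000),  # incremental shuttle walk distance
    "crq_dyspnoea": (1, 7),            # CRQ domains scored on a 1-7 scale
}

def flag_outliers(df: pd.DataFrame, n_sd: float = 3.0) -> pd.DataFrame:
    """Return values that fall outside the clinical range or more than
    n_sd standard deviations from the column mean."""
    flags = []
    for col, (low, high) in CLINICAL_RANGES.items():
        values = df[col].dropna()
        mean, sd = values.mean(), values.std()
        for idx, val in values.items():
            out_of_range = not (low <= val <= high)
            beyond_sd = sd > 0 and abs(val - mean) > n_sd * sd
            if out_of_range or beyond_sd:
                flags.append({"record": idx, "field": col, "value": val,
                              "reason": "range" if out_of_range else f">{n_sd} SD"})
    return pd.DataFrame(flags)

# Usage: each flagged value is checked back against the original case record form.
# print(flag_outliers(pd.read_csv("prince_followup.csv")))
```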
Wild Codes & Consistency Checking
• Stage 2 was identifying "wild codes": codes that appear in the data set but are not possible options for selection (Polit & Beck 2004).
• Stage 3, "consistency checking", was undertaken on certain components of the data and focused on the internal consistency of the data (Polit & Beck 2004, Goldberg, Niemierko & Turchin 2008).
– The data set was checked for errors of compatibility between answers; for example, in the current data set, a patient indicating that they use an inhaler but having no medications listed under medications. A sketch of both checks follows below.
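A minimal sketch of the Stage 2 and Stage 3 checks, again assuming a pandas data frame; the field names, coding scheme and the inhaler/medications rule are illustrative assumptions.

```python
import pandas as pd

# Assumed coding scheme, for illustration only
VALID_CODES = {
    "sex": {1, 2},                 # 1 = male, 2 = female
    "uses_inhaler": {0, 1},        # 0 = no, 1 = yes
    "smoking_status": {1, 2, 3},   # never / former / current
}

def find_wild_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Stage 2: values that are not possible options for the field."""
    flags = []
    for col, allowed in VALID_CODES.items():
        bad = df.loc[~df[col].isin(allowed) & df[col].notna(), col]
        flags += [{"record": i, "field": col, "value": v} for i, v in bad.items()]
    return pd.DataFrame(flags)

def find_inconsistencies(df: pd.DataFrame) -> pd.DataFrame:
    """Stage 3: answers that are incompatible with each other, e.g. a patient
    coded as using an inhaler but with no medications listed."""
    return df.loc[(df["uses_inhaler"] == 1) & (df["medications"].fillna("") == ""),
                  ["uses_inhaler", "medications"]]

# Usage: any record flagged by either check is verified against the case record form.
```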
Key Points
• Returning to the original case record form
or the patient notes available for
verification was essential.
• Visual verification of data is deemed
critical to data quality no matter what
alternative form of sophisticated validation
is undertaken. (Day, Fayers & Harvey
1998)
Clustering of Data Errors
• Goldberg, Niemierko & Turchin (2008), in their analysis of research databases, identified that data entry errors were often grouped by location within the data entry forms.
• The presence of one data error on the
demographic information screen greatly
increased the probability of another data error
in another field on the same screen.
• This justifies increased vigilance in the areas surrounding where errors are found.
Double Data Entry or Not?
• Double data entry is the primary data
verification method utilised in clinical trials to
ensure quality (Kleinman 2001, Day, Fayers & Harvey 1998).
• An increasing debate has arisen in the literature
over the need for complete double data entry.
• The controversy centres on the level and type of errors identified versus the additional cost and effort involved. Gibson et al. (1994) and several other studies highlighted that the typical gain in data quality cannot justify the cost of complete double data entry.
Alternative to Double Data Entry
• Kleinman (2001) presents the adaptive double data entry system (ADDER), which helps decide which of an individual inputter's forms need to be double entered.
• The decision is based on the estimated probability that a form contains errors. The probability is calculated from the rate of errors identified in the previous group of double-entered forms.
• ADDER offers increased data quality at a minimal cost for those who believe complete double data entry is not necessary. A simplified sketch of this adaptive idea follows below.
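A highly simplified sketch of the adaptive idea described above, not Kleinman's published ADDER algorithm; the window size and probability threshold are illustrative assumptions.

```python
from collections import deque

class AdaptiveDoubleEntry:
    """Simplified illustration of adaptive double data entry: double-enter a
    form only when the error rate in an inputter's most recent double-entered
    forms suggests the next form is likely to contain an error."""

    def __init__(self, window: int = 20, threshold: float = 0.05):
        self.window = window                # recent double-entered forms to track (assumed)
        self.threshold = threshold          # double-enter if estimated error probability exceeds this (assumed)
        self.recent = deque(maxlen=window)  # 1 = form had a discrepancy, 0 = clean

    def should_double_enter(self) -> bool:
        if len(self.recent) < self.window:
            return True                     # not enough history yet: double-enter everything
        estimated_p = sum(self.recent) / len(self.recent)
        return estimated_p > self.threshold

    def record_result(self, had_error: bool) -> None:
        self.recent.append(1 if had_error else 0)
```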
Alternative continued
• Targeted auditing of specific documents was raised by Rostami, Nahm and Pieper (2009).
• Auditing only "critical data", in terms of the data inputter's record of errors.
• They argue that reducing the "critical variable error rate" is more important than reducing the "overall error rate" when examining data entry.
Why would one waste resources auditing data that is "not critical to the final analysis" when this would decrease the emphasis on the essential items?
• The Institute of Medicine cautions that errors or spurious data found on examination of any part of a dataset will call the entire dataset into question.
The Approach Taken for PRINCE
• King and Lashley (2000) presented a validation
approach which avoids double data entry.
• It uses single data entry in conjunction with visual verification of selected records identified using a statistically developed continuous sampling plan (CSP).
• The rate of visual verification is determined by two factors: the proportion of errors identified and the anticipated average outgoing quality.
• From this information the CSP was developed.
CSP Plan
• In the CSP developed, (i) indicates the number of consecutive records that must be clear of errors before sampling of a fraction (f) of the records can commence.
• Once an error is found in the records, 100% checking recommences until (i) consecutive clear records have been found.
• This CSP approach is reported to reduce the time associated with double data entry, to enable calculation of the gain in data record quality, and to demonstrate a large improvement in data quality over single data entry alone.
Continuous Sampling Plan PRINCE
A CSP-1 gives the number (i) of successive records with no data entry errors that must be inspected before random sampling of a fraction (f) of records begins. Whenever an error is found, the error is corrected and 100% checking of successive records using i is repeated.
An incoming data field error rate of 0.4% was calculated from a visual inspection of 2490 completed fields (9 field errors; none on the primary outcome). To maintain an average outgoing quality (AOQ) of 0.4%, a CSP-1 plan of i = 15 and f = 0.2 (20%) was implemented, as sketched below.
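A minimal sketch of the CSP-1 decision logic with the parameters reported above (i = 15, f = 0.2); the record representation and the commented usage loop are illustrative assumptions.

```python
import random

# Incoming field error rate observed in PRINCE: 9 errors in 2490 fields (~0.36%, reported as 0.4%)
incoming_error_rate = 9 / 2490

class CSP1:
    """Continuous sampling plan (CSP-1): inspect 100% of records until i
    consecutive error-free records are seen, then inspect a random fraction f;
    finding an error returns the plan to 100% inspection."""

    def __init__(self, i=15, f=0.2, seed=None):
        self.i, self.f = i, f
        self.clear_run = 0       # consecutive error-free records seen during 100% checking
        self.screening = True    # True = 100% inspection, False = sampling at rate f
        self.rng = random.Random(seed)

    def inspect_this_record(self) -> bool:
        """Decide whether the next record entered should be visually verified."""
        return True if self.screening else self.rng.random() < self.f

    def record_outcome(self, inspected: bool, had_error: bool) -> None:
        if not inspected:
            return
        if had_error:                     # error found: correct it and restart 100% checking
            self.screening, self.clear_run = True, 0
        elif self.screening:
            self.clear_run += 1
            if self.clear_run >= self.i:  # i clear records in a row: switch to sampling
                self.screening = False

# Usage sketch (assumed names): step through records in entry order,
# verifying those the plan selects against the case record form.
# plan = CSP1(i=15, f=0.2, seed=1)
# for record in records_in_entry_order:
#     check = plan.inspect_this_record()
#     plan.record_outcome(check, had_error=verify(record) if check else False)
```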
An important principle of data entry quality was raised by Day, Fayers & Harvey (1998), who stated that:
"building quality controls into the system is more productive than just adding on checks onto the end".
• This is further supported by Rostami, Nahm and Pieper (2009), who concluded that:
much higher data quality can be achieved by undertaking sequenced small audits across the duration of data entry, rather than a large-scale audit of the data when all the data has been entered.
• The PRINCE team support this, since repeated inputting errors on the same component of the form were highlighted early and corrective action was put in place to eliminate recurring errors.
• The continuous sampling plan presented by King & Lashley (2000) integrates very well with this concept, allowing small audits and continuous corrective action to ensure data quality.
Questions & Discussion