Use of i2b2 research data mart for clinical trial accrual prediction
Research Data Analytics at
Thomas Jefferson University
Jack London, PhD
Thomas Jefferson University
Sidney Kimmel Cancer Center
Philadelphia PA USA
2015 i2b2 European Academic User Group meeting
October 6, 2015
Disclaimer
In addition to my faculty position at Thomas Jefferson University in Philadelphia, I am a consultant for TriNetX Corporation.
Thomas Jefferson University and the Sidney Kimmel Cancer Center (SKCC), Philadelphia
Jefferson Medical College (JMC) was founded in 1824.
JMC is the second largest private medical school in the U.S.
Located between New York City and Washington DC
The NCI-designated SKCC has ~400 physicians and scientists dedicated to discovery and development of novel approaches for cancer treatment.
SKCC’s IT infrastructure
GE Centricity inpatient EMR
EPIC inpatient and outpatient
Allscripts outpatient (ambulatory care) EHR
Cerner A/P lab system
EPIC Beaker
OpenSpecimen research biobank management
TIES clinical text extraction
i2b2 research data mart
TriNetX data analytics network
Current Jefferson Data Resource Landscape
i2b2 RESEARCH DATA MART, drawing on:
OPEN SPECIMEN: biospecimen annotation (SNOMED)
TJUH CLINICAL DATA WAREHOUSE:
DEMOGRAPHICS (gender, race, age, vital status, ethnicity)
DIAGNOSES (ICD9)
PROCEDURES (ICD9)
CLINICAL LABS (LOINC)
MEDICATIONS
IMPAC METRIQ: cancer registry site, stage, histology, treatment, survival (ICD-O-3)
CERNER A/P: “omic” data
FORTE ONCORE: clinical trial data
Jefferson’s i2b2 Research Data Mart
• Built on “informatics for integrating biology and the bedside” (i2b2) version 1.7.02
• RDM data are de-identified. Re-identification is possible via an honest broker, who has access to a re-identification application.
• Currently contains > 45 million observations on > 450,000 patients. Data are refreshed weekly.
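The slides do not describe how the honest broker's re-identification application works. As a minimal sketch, assuming the broker keeps a private two-way lookup table between medical record numbers (MRNs) and random surrogate patient numbers while the data mart stores only the surrogates, the separation of roles could look like this; the class and method names are hypothetical.

```python
import secrets


class HonestBroker:
    """Hypothetical honest broker: the only holder of the MRN <-> surrogate mapping."""

    def __init__(self) -> None:
        self._mrn_to_surrogate: dict[str, int] = {}
        self._surrogate_to_mrn: dict[int, str] = {}

    def deidentify(self, mrn: str) -> int:
        """Return a stable, random surrogate patient number for an MRN."""
        if mrn not in self._mrn_to_surrogate:
            surrogate = secrets.randbelow(10**9)
            # Draw again on the (unlikely) chance the number is already in use.
            while surrogate in self._surrogate_to_mrn:
                surrogate = secrets.randbelow(10**9)
            self._mrn_to_surrogate[mrn] = surrogate
            self._surrogate_to_mrn[surrogate] = mrn
        return self._mrn_to_surrogate[mrn]

    def reidentify(self, surrogate: int) -> str:
        """Map a surrogate back to an MRN; only the broker can do this."""
        return self._surrogate_to_mrn[surrogate]


broker = HonestBroker()
# Rows loaded into the data mart carry only the surrogate, never the MRN.
rdm_row = {"patient_num": broker.deidentify("MRN-0001")}
```

Because the surrogate is drawn at random rather than computed from the MRN, the data mart contents alone give no route back to identities in this sketch; re-identification requires the broker's private tables.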
Patient data obtained from TJUH EMR
DEMOGRAPHICS
Age
Ethnicity
Gender
Race
Vital Status (alive/dead)
DIAGNOSES
Disease systems --> diseases (organized by ICD9 coding)
CLINICAL LAB RESULTS
Chemistry
Coagulation
Hematology
MEDICATIONS
Anti-neoplastic
INPATIENT PROCEDURES
Diagnostic and Treatment procedures (organized by ICD9 coding)
Patient mutation data obtained from Pathology Molecular Diagnostic Testing (both outsourced and in-house)
ALK rearrangement
BRAF c.1782T>G p.D594E
BRAF c.1801A>G p.K601E
BRAF c.1799T>A p.V600E
EGFR Deletion in exon 19
EGFR Insertion in exon 20
EGFR c.2236G>A p.E746K
EGFR c.2236_2250del15 p.E746_A750delELREA
EGFR c.2156G>C p.G719A
EGFR c.2155G>T p.G719C
EGFR c.2155G>A p.G719S
EGFR c.2573T>G p.L858R
EGFR c.2582T>A p.L861Q
EGFR c.2303G>T p.S768I
JAK2 c.1849G>T p.V617F
JAK3 c.2164G>A p.V722I
KRAS c.35G>C p.G12A
KRAS c.34G>T p.G12C
KRAS c.35G>A p.G12D
KRAS c.34G>C p.G12R
KRAS c.34G>A p.G12S
KRAS c.35G>T p.G12V
KRAS c.38G>A p.G13D
NRAS c.183A>T p.Q61H
NRAS c.181C>A p.Q61K
NRAS c.182A>T p.Q61L
NRAS c.182A>G p.Q61R
PIK3CA c.1633G>A p.E545K
PIK3CA c.3140A>T p.H1047L
PIK3CA c.3140A>G p.H1047R
PTEN c.754G>T p.D252Y
PTEN c.59G>A p.G20E
RET rearrangement
ROS1 rearrangement
SMAD4 c.1157G>A p.G386D
TP53 c.843C>A p.D281E
TP53 c.811G>T p.E271*
TP53 c.857A>C p.E286A
TP53 c.400T>C p.F134L
TP53 c.734G>A p.G245D
TP53 c.388C>G p.L130V
TP53 c.524G>A p.R175H
TP53 c.817C>T p.R273C
TP53 c.818G>A p.R273H
TP53 c.318C>G p.S106R
TP53 c.659A>G p.Y220C
TP53 c.707A>G p.Y236C
Molecular Diagnostics ontology
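Each molecular diagnostics entry pairs a gene with an HGVS cDNA change (for example BRAF c.1799T>A) and the resulting protein change (p.V600E). The presentation does not show how these entries are loaded into the ontology. A minimal sketch, assuming a two-level hierarchy (gene, then variant) under a hypothetical root folder and i2b2-style backslash-delimited concept paths, parses simple substitutions and checks that each cDNA position falls inside the stated codon (coding position n lies in codon ceil(n/3)). All function names are hypothetical.

```python
import re

# Single-nucleotide substitution in HGVS cDNA notation, e.g. "c.1799T>A".
CDNA_SUB = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# Single amino-acid substitution (one-letter codes), e.g. "p.V600E" or "p.E271*".
PROTEIN_SUB = re.compile(r"^p\.([A-Z])(\d+)([A-Z*])$")


def parse_entry(entry: str) -> dict:
    """Split an entry such as 'BRAF c.1799T>A p.V600E' into structured fields.

    Entries that are not simple substitutions (deletions, insertions,
    rearrangements) keep their text label with cdna/protein left as None.
    """
    gene, _, label = entry.partition(" ")
    record = {"gene": gene, "label": label, "cdna": None, "protein": None}
    for token in label.split():
        if m := CDNA_SUB.match(token):
            record["cdna"] = (int(m[1]), m[2], m[3])  # (position, ref, alt)
        elif m := PROTEIN_SUB.match(token):
            record["protein"] = (m[1], int(m[2]), m[3])  # (ref aa, codon, alt aa)
    return record


def codon_consistent(record: dict) -> bool:
    """Check that the cDNA position lies in the stated codon: (n + 2) // 3.

    Returns True when either change is missing, since there is nothing to check.
    """
    if record["cdna"] is None or record["protein"] is None:
        return True
    return (record["cdna"][0] + 2) // 3 == record["protein"][1]


def ontology_path(record: dict, root: str = "Molecular Diagnostics") -> str:
    """Build a backslash-delimited path: root, gene, variant label."""
    return "\\" + "\\".join([root, record["gene"], record["label"]]) + "\\"


entries = ["BRAF c.1799T>A p.V600E", "TP53 c.811G>T p.E271*", "ALK rearrangement"]
records = [parse_entry(e) for e in entries]
assert all(codon_consistent(r) for r in records)
```

The codon check is a cheap way to catch transcription errors (a cDNA change paired with the wrong protein change) when such variant lists are curated by hand.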
Specimen annotation from campus biobanks
Eight biobanks, including the TJUH paraffin block archive of ~400,000 cases since 1990.
Anatomic origin (SNOMED)
Class (tissue, fluid)
Type (frozen, FFPE)
Pathology (normal, malignant, diseased)
Slide images
Specimen annotation management
TJUH clinical paraffin block archive
Pathology Department research tissue bank (J. Evans, PI)
Pancreatic tumor bank (C. Yeo, PI)
Breast tumor bank (J. Palazzo, PI)
Thyroid tumor bank (E. Pribitkin, PI)
Brain tumor bank (D. Andrews, PI)
Liver tumor bank (V. Navarro, PI)
Jefferson integrated Research Specimen management (OpenSpecimen)
> 100,000 patients; > 230,000 patients; > 650,000 specimens
Via i2b2 RDM: cancer patients having comprehensive annotation from the Tumor Registry and banked specimens
Biospecimen ontology
Pathology images are available via the i2b2 query tool
Patient data from Jefferson Tumor Registry
Over 100,000 cases since 1990.
Primary Cancer Diagnosis
Age at diagnosis/date of diagnosis
Survival (months) from diagnosis
Tumor histology and behavior
Stage (AJCC/TNM, clinical and pathological)
Grade
Recurrence
local, distant
Treatment
chemotherapy, radiation, surgery, transplant, palliative
Disease-specific factors
e.g., Gleason score for prostate cancer
Tumor Registry ontology
Typical SKCC Investigator Queries
Example #1:
Form a cohort of “triple negative” (estrogen receptor, progesterone receptor, and HER2 negative) African American patients having matched normal and malignant frozen tissue specimens.
Example #2:
Form a cohort of patients with a primary diagnosis of papillary thyroid cancer whose tumors carry a BRAF V600E mutation.
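i2b2 stores data in a star schema whose central fact table, observation_fact, holds one row per observation keyed by patient_num and concept_cd. To illustrate what a query such as Example #2 amounts to, here is a minimal sketch against a simplified stand-in for that table, assuming hypothetical concept codes for the diagnosis and the mutation. The i2b2 query tool generates its own SQL from panels; the INTERSECT used here only approximates combining two panels with AND and is not the tool's actual SQL.

```python
import sqlite3

# Simplified stand-in for the i2b2 observation_fact table; the real table has
# more columns (encounter_num, provider_id, modifier_cd, instance_num, ...).
SCHEMA = """
CREATE TABLE observation_fact (
    patient_num INTEGER NOT NULL,
    concept_cd  TEXT    NOT NULL,
    start_date  TEXT
);
"""


def cohort_size(conn: sqlite3.Connection, concept_codes: list[str]) -> int:
    """Count distinct patients who have every listed concept (AND across criteria)."""
    if not concept_codes:
        raise ValueError("at least one concept code is required")
    # One sub-query per criterion; INTERSECT keeps patients present in all of them.
    subqueries = " INTERSECT ".join(
        "SELECT DISTINCT patient_num FROM observation_fact WHERE concept_cd = ?"
        for _ in concept_codes
    )
    (count,) = conn.execute(f"SELECT COUNT(*) FROM ({subqueries})", concept_codes).fetchone()
    return count


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical concept codes and toy patients; not real RDM data.
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?)",
    [
        (1, "DX:PAPILLARY_THYROID", "2014-03-01"),
        (1, "MOL:BRAF:V600E", "2014-03-10"),
        (2, "DX:PAPILLARY_THYROID", "2014-05-02"),
        (3, "MOL:BRAF:V600E", "2014-06-15"),
    ],
)
print(cohort_size(conn, ["DX:PAPILLARY_THYROID", "MOL:BRAF:V600E"]))  # 1
```

Only patient 1 has both observations, so the cohort size is 1. The f-string only assembles placeholder text; the concept codes themselves are bound as query parameters.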
Additional data on a selected cohort can be retrieved
Example data summaries from the i2b2 RDM
CLINICAL DIAGNOSES OF TJUH PATIENTS WITH THYROID SPECIMENS
Jefferson – TriNetX project
In the fall of 2014, the SKCC informatics group entered into a collaboration with TriNetX, Inc., a start-up company based in Cambridge, Massachusetts.
TriNetX facilitates collaboration between pharmaceutical companies and academic healthcare providers through the creation of a global, federated data network that connects academic and industry clinical researchers in real time to the patient populations they are attempting to study.
The TriNetX application accesses a site’s i2b2 database and displays aggregate query results in an advanced, flexible manner.
TriNetX application offers an alternative query tool
with enhanced data visualization
Google-like query interface
Graphic result display
TriNetX application offers an alternative query tool with enhanced data visualization
Interactive display capability
Cohort definition via i2b2 can be used to predict
accrual for proposed clinical trials
Problem confronting clinical trials research: studies
that fail to accrue
An Institute of Medicine report [1] on cancer cooperative group trials found that 40% were never completed because of failure to achieve minimum accrual goals:
“The ultimate inefficiency is a clinical trial that is never completed because of insufficient patient accrual, and this happens far too often.”
These non-accruing trials are often kept open for many months before closure, consuming personnel resources in their setup and operation at a significant cost to institutions, without providing any return in definitive research findings.
Furthermore, while many of these trials register zero patients, others accrue some patients, resulting in thousands of patients nationwide who are recruited to unproductive research studies [2].
1. Nass SJ, Moses HL, Mendelsohn J, editors. A National Cancer Clinical Trials System for the 21st Century: Reinvigorating the NCI Cooperative Group Program. Committee on Cancer Clinical Trials and the NCI Cooperative Group Program; Board on Health Care Services. Washington, DC: National Academies Press; 2010.
2. Cheng S, Dietrich M, Finnigan S, Sandler A, Crites J, Ferranti L, Wu A, Dilts D. A sense of urgency: Evaluating the link between clinical trial development time and the accrual performance of CTEP-sponsored studies. 2009 ASCO Annual Meeting Proceedings. J Clin Oncol. 2009.
Study design
The overall objective of this study was to evaluate whether accrual for proposed cancer clinical trials could be predicted by running cohort queries, based on each trial’s eligibility criteria, against recent patient data in Jefferson’s i2b2 research data mart (RDM), which is built from de-identified, integrated hospital clinical, tumor registry, and specimen data.
To determine the ability of the i2b2 RDM to predict accrual for prospective trials, we retrospectively used the RDM to obtain patient populations for the two years preceding the opening of recent trials and compared these cohort sizes to the actual accrual observed after each trial opened. We considered 90 interventional cancer trials opened at SKCC in 2008, 2009, and 2010, since these had been open for at least two years and their accrual performance could be evaluated.
Study methodology
o We constructed RDM cohort queries corresponding to the trial eligibility criteria for the two years prior to each trial’s opening (e.g., we considered TJUH patient populations from 2007 and 2008 for trials opened in 2009).
o We computed an annual cohort size by averaging the two yearly counts (i.e., the 2-year total divided by 2).
o We then compared this annual RDM cohort size to the trial’s annual target goal and to the trial’s actual accrual performance.
• Since we initially assumed that 50% of eligible participants would enroll in a study, the annual RDM cohort had to be at least twice the annual accrual goal for a prediction of “successful” trial accrual.
• We defined a trial’s actual accrual performance as “successful” if it accrued at least 80% of its target enrollment.
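The decision rules above can be written down directly. The following is a minimal sketch with hypothetical function names and made-up example counts, not the study's actual tooling; exact fractions are used so that boundary cases (such as accrual of exactly 80% of target) are not disturbed by floating-point rounding.

```python
from fractions import Fraction


def annual_cohort(count_year1: int, count_year2: int) -> Fraction:
    """Average of the RDM cohort counts for the two years before the trial opened."""
    return Fraction(count_year1 + count_year2, 2)


def predicted_success(cohort: Fraction, annual_goal: int,
                      enroll_rate: Fraction = Fraction(1, 2)) -> bool:
    """Predict success if the expected enrollees (cohort x enroll_rate) reach the goal.

    With the assumed 50% enrollment this is equivalent to cohort >= 2 * goal.
    """
    return cohort * enroll_rate >= annual_goal


def actual_success(accrued: int, target: int,
                   threshold: Fraction = Fraction(4, 5)) -> bool:
    """A trial counts as successful if it accrued at least 80% of its target."""
    return accrued >= threshold * target


# Hypothetical trial: 18 and 22 eligible patients in the two prior years,
# an annual goal of 12, and 7 patients actually accrued.
cohort = annual_cohort(18, 22)  # 20 patients per year
print(predicted_success(cohort, 12), actual_success(7, 12))  # False False
```

In this made-up case the expected 10 enrollees fall short of the goal of 12, and 7 accrued patients are below 80% of 12 (9.6), so the trial is both predicted and observed to fail.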
Results
To assess the predictive performance of our methodology, a contingency table was produced for the 90 trials analyzed.
• A trial was denoted as potentially successful in meeting its annual target accrual (“PREDICTED SUCCESS” row) if the retrospective i2b2 cohort analysis indicated sufficient patients for the trial.
• A trial was denoted as actually successful in meeting its annual target accrual (“ACTUAL SUCCESS” column) if it accrued at least 80% of the protocol’s stated target annual accrual.
Contingency table comparing i2b2 accrual predictions with actual accrual success, assuming only 50% of potential participants identified by i2b2 are enrolled.
Treating accrual failure as the positive class, our methodology correctly identified 0.969 (= 31/32 trials) of the trials that accrued successfully (specificity; 95% C.I. (0.908, 1)) and 0.397 (= 23/58 trials) of the trials that failed to accrue (sensitivity; 95% C.I. (0.271, 0.522)). The positive predictive value, or precision, of a predicted failure is 0.958 (= 23/24 trials) (95% C.I. (0.878, 1)).
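The three reported rates follow from a 2x2 table whose counts can be recovered from the fractions given (23/58, 31/32, 23/24), treating accrual failure as the positive class. The slide does not name its interval method; as a check, a normal-approximation (Wald) interval clipped to [0, 1] reproduces the reported 95% intervals to within rounding. A sketch, with a hypothetical helper name:

```python
import math


def wald_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion k/n with a normal-approximation (Wald) 95% interval clipped to [0, 1]."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# 2x2 counts implied by the reported ratios, with accrual FAILURE as "positive".
tp = 23  # predicted failure, actually failed
fp = 1   # predicted failure, actually succeeded (24 - 23)
fn = 35  # predicted success, actually failed   (58 - 23)
tn = 31  # predicted success, actually succeeded
assert tp + fp + fn + tn == 90  # all trials analyzed

metrics = {
    "sensitivity": wald_interval(tp, tp + fn),  # 23/58
    "specificity": wald_interval(tn, tn + fp),  # 31/32
    "ppv": wald_interval(tp, tp + fp),          # 23/24
}
for name, (p, lo, hi) in metrics.items():
    print(f"{name}: {p:.4f} (95% CI {lo:.3f}, {hi:.3f})")
```

The implied table makes the asymmetry explicit: only 1 of the 24 predicted failures actually succeeded, but 35 of the 58 failing trials were predicted to succeed.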
Results
Our results show that the methodology, while having an excellent positive predictive value (95.8%: 23 of the 24 trials predicted to fail actually failed), is not good at detecting trials that will fail to accrue (sensitivity 39.7%: only 23 of the 58 trials that failed were flagged).
In other words:
if the methodology predicts "failed accrual," then we should trust this prediction and should not proceed to open the trial with its current eligibility criteria;
however, a prediction of accrual success using this method is no guarantee that target goals will be met.
How can this methodology be useful?
A benefit of analyzing potential trial accrual during the protocol design phase is that it offers an opportunity to “tweak” eligibility rules when insufficient patient cohorts are found.
A change in participation criteria that does not significantly affect the scientific objectives of the trial may provide a sufficiently large potential patient pool.
Not opening the 23 trials that were correctly predicted to fail to accrue over the 3 years studied would have prevented the waste of about $200,000 in trial startup costs alone, and the participation of 57 patients in studies that did not contribute to advancing science or clinical care.
Selected areas of research using the RDM:
Hallgeir Rui, MD, PhD: Molecular Cancer Epidemiology, cancer pharmacogenetics, individualized cancer risk assessment and prognostication.
Raphael E. Bonita, MD: Jefferson Heart Institute, correlation of troponin levels and heart failure in transplant patients.
Hushan Yang, PhD: Molecular Cancer Epidemiology.
Jordan Winter, MD: Surgery, Whipple procedure survival study.
Scott Waldman, MD, PhD: Pharmacology and experimental therapeutics.
Ron Myers, PhD: Gene-environment risk assessment.
Stephen Peiper, MD: Biomarker discovery using Next Generation Sequencing.