Research Data Marts In Support Of Cancer Personalized Medicine
Jack London, PhD and Devjani Chatterjee, PhD
Jefferson Kimmel Cancer Center, Philadelphia PA
Mid-Atlantic Healthcare Informatics Symposium, April 25, 2014
“Cancer”
From the late 14th to the late 19th century, the word “apoplexy” referred to any sudden death that began with a loss of consciousness. Ruptured aortic aneurysms, and even heart attacks and strokes, were once all called apoplexy.
Like “apoplexy,” the word “cancer” was used broadly in the 20th century to describe any condition of unrestrained tumor growth. We now know that “cancer” refers to many different diseases with different cellular mechanisms. Although cancer was (and still is) often classified by its anatomic primary site of origin, such as “breast cancer,” the genomic alterations and pathways affected in a given patient are more directly related to the cause and treatment of that patient’s cancer. Not all breast cancer patients share the same disease mechanism, and therefore not all will respond to the same treatment.
Cancer Personalized Medicine
• Cancers are often highly heterogeneous with many different subtypes.
These subtypes confer different outcomes including prognosis,
response to treatments, recurrence, and metastasis.
• These subtypes are often associated with different genetic mutations,
epigenetic events, gene expression profiles, molecular signatures,
tissue and organ morphologies, and clinical phenotypes.
• Effective treatment, and the research needed to develop it, requires a personalized characterization of cancer patients, including their genetic, molecular, and clinical data.
• Cancer research, diagnosis, and treatment also often require biospecimens, from which the data characterizing the patient are obtained.
Research Data Mart
o Cancer translational research and cancer treatment now require clinical data describing the diagnoses, treatments, and outcomes of patient populations. Additionally, genomic and other data, such as the availability of research biospecimens, need to be integrated with the patient’s clinical data.
o A research data mart (RDM) is a data repository (i.e., database) that
integrates clinical and research data for use by investigators.
o The data are often de-identified (or sometimes anonymized). Unlike anonymized data, de-identified data can be re-identified by an honest broker.
o Possible uses of RDMs include:
• hypothesis generation
• cohort identification
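The honest-broker arrangement can be sketched as pseudonymization: direct identifiers are dropped, and each medical record number (MRN) is replaced by a random study ID whose mapping is held only by the broker. This is a minimal illustration under assumptions (the `HonestBroker` class, the identifier field names, and the ID format are all hypothetical); it does not cover everything formal de-identification requires, such as handling dates and other quasi-identifiers.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class HonestBroker:
    """Holds the only link between real identifiers and study IDs (hypothetical design)."""
    _mrn_to_study: Dict[str, str] = field(default_factory=dict)
    _study_to_mrn: Dict[str, str] = field(default_factory=dict)

    def study_id(self, mrn: str) -> str:
        # Issue a random study ID once per patient and reuse it on later loads.
        if mrn not in self._mrn_to_study:
            sid = "S" + secrets.token_hex(6)
            while sid in self._study_to_mrn:  # guard against a (very unlikely) collision
                sid = "S" + secrets.token_hex(6)
            self._mrn_to_study[mrn] = sid
            self._study_to_mrn[sid] = mrn
        return self._mrn_to_study[mrn]

    def reidentify(self, sid: str) -> str:
        # Only the broker can map a study ID back; investigators never see this table.
        return self._study_to_mrn[sid]

# Hypothetical set of direct-identifier fields removed before data reach the RDM.
DIRECT_IDENTIFIERS = {"mrn", "name", "address", "phone"}

def deidentify(record: dict, broker: HonestBroker) -> dict:
    """Drop direct identifiers and replace the MRN with the broker's study ID."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["study_id"] = broker.study_id(record["mrn"])
    return out

broker = HonestBroker()
row = deidentify({"mrn": "123", "name": "Jane Doe", "dx": "C50.9"}, broker)
print(sorted(row))  # ['dx', 'study_id']
```

Using random IDs rather than, say, an unsalted hash of the MRN means the mapping cannot be recomputed by anyone who lacks the broker's table.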
Information Flow from Source to Data Mart
• Clinical data sources: Inpatient EMR, Outpatient EHR*, Billing, Clinical/AP Lab, Pharmacy
• Research data sources: Tumor Registry, Biospecimens, Clinical Trials*, “Omic” NGS
• Flow: source systems → Clinical Data Warehouse → i2b2 Research Data Mart (de-identified)
*Data feed under development
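The i2b2 framework stores facts in a star schema centered on an `observation_fact` table, in which each observation is a row keyed by patient and concept code. A sketch of a load step from two source feeds, assuming a heavily simplified table (the real i2b2 table has more columns, such as `encounter_num`, `provider_id`, and `modifier_cd`) and made-up feeds and concept codes:

```python
import sqlite3

# Hypothetical, heavily simplified subset of the i2b2 observation_fact table.
SCHEMA = """
CREATE TABLE observation_fact (
    patient_num     INTEGER NOT NULL,
    concept_cd      TEXT    NOT NULL,
    start_date      TEXT    NOT NULL,
    sourcesystem_cd TEXT    NOT NULL
)
"""

def load_facts(conn, source, rows):
    """Append (patient_num, concept_cd, start_date) rows from one source feed."""
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
        [(patient, concept, start, source) for patient, concept, start in rows],
    )
    conn.commit()

def summary(conn):
    """Return (number of observations, number of distinct patients)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT patient_num) FROM observation_fact"
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up feeds and concept codes, for illustration only.
load_facts(conn, "TUMOR_REGISTRY", [(1, "DX:C50.9", "2013-02-01"),
                                    (2, "DX:C18.7", "2013-05-10")])
load_facts(conn, "BIOSPECIMENS", [(1, "SPEC:FROZEN", "2013-02-03")])
print(summary(conn))  # (3, 2)
```

Because every source lands in the same narrow table, adding a new feed (such as the clinical trials feed still under development) means adding new concept codes rather than new tables.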
Work Flow for Research Data Access
Current Jefferson i2b2 RDM
• RDM data are de-identified. Re-identification is possible via an honest broker.
• Currently more than 34 million observations on more than 400,000 patients. Data are refreshed weekly.
• Built on the “Informatics for Integrating Biology and the Bedside” (i2b2) framework from the NIH-funded National Center for Biomedical Computing based at Partners HealthCare System.
Available cancer patient and specimen annotation includes:
• Demographics
– gender
– race
– ethnicity
– vital status (alive, deceased)
• Primary cancer diagnosis (ICD-O-3)
– age at diagnosis
– date of diagnosis
– primary tumor sequence
– survival (months from diagnosis)
– primary disease site (ICD-O-3)
– histology (ICD-O-3)
– AJCC stage (clinical and pathological)
– grade
– TNM (clinical and pathological)
– recurrence (distant, local, regional)
• Multiple primary diagnoses
• Treatment
– chemotherapy
– diagnostic (biopsy)
– endocrine
– palliative
– radiation
– surgery
– transplant
• Site-specific factors, including
– ER, PR, HER2
– CEA, KRAS, CA 19-9, PSA
– Gleason score
• Specimen anatomic origin
• Specimen class (tissue, fluid), path (normal, malignant), and type (frozen, fixed, paraffin block)
• Reports (surgical pathology, cytology, molecular/genomic diagnostics)
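Registry-derived fields such as survival (months from diagnosis) are computed from the date of diagnosis and the date of last contact or death. A minimal sketch, assuming survival is counted in completed calendar months; actual tumor registry standards define their own rules (for example, for unknown days of the month), so the `survival_months` function below is illustrative only.

```python
from datetime import date

def survival_months(diagnosis: date, last_contact_or_death: date) -> int:
    """Completed calendar months from diagnosis to last contact or death (assumed convention)."""
    if last_contact_or_death < diagnosis:
        raise ValueError("follow-up date precedes diagnosis")
    months = (last_contact_or_death.year - diagnosis.year) * 12 + (
        last_contact_or_death.month - diagnosis.month
    )
    # Do not count the final month if it has not been completed yet.
    if last_contact_or_death.day < diagnosis.day:
        months -= 1
    return months

print(survival_months(date(2012, 1, 15), date(2014, 4, 25)))  # 27
```

Vital status then tells an analyst whether the interval ends in death or is censored at last contact.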
Available Genomic Annotation Includes
Genes:
ABL1, APC, ATM, BRAF, CSF1R, ERBB4, FBXW7, FGFR2, FGFR3, FLT3, G11, GQ, HNF1A, HRAS, IDH1, JAK3, KDR, KRAS, MET, MPL, NOTCH1, NPM1, NRAS, PDGFRA, PIK3CA, PTEN, RB1, RET, SMAD4, SMO, SRC, STK11, TP53, VHL
Patient genomic annotation includes:
• gene mutation result (POSITIVE or NEGATIVE)
• alternate allele frequency
• mutation type
• nucleotide change
• protein change
• COSMIC ID
• dbSNP ID
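The per-patient genomic fields listed above map naturally onto a small record type. A sketch, assuming a hypothetical `VariantCall` layout in which alternate allele frequency is alternate-allele reads divided by total read depth, and a hypothetical 5% cutoff for calling a gene POSITIVE; a clinical laboratory would apply its own validated thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VariantCall:
    """One variant observed in a patient's NGS result (hypothetical record layout)."""
    gene: str
    nucleotide_change: str          # e.g. HGVS "c." notation
    protein_change: str             # e.g. HGVS "p." notation
    mutation_type: str              # e.g. "missense"
    alt_reads: int                  # reads supporting the alternate allele
    total_depth: int                # all reads covering the position
    cosmic_id: Optional[str] = None
    dbsnp_id: Optional[str] = None

    @property
    def alt_allele_frequency(self) -> float:
        # Fraction of covering reads that carry the alternate allele.
        return self.alt_reads / self.total_depth if self.total_depth else 0.0

def gene_result(calls: List[VariantCall], gene: str, min_af: float = 0.05) -> str:
    """POSITIVE if any call in `gene` reaches the assumed allele-frequency cutoff."""
    hit = any(c.gene == gene and c.alt_allele_frequency >= min_af for c in calls)
    return "POSITIVE" if hit else "NEGATIVE"

calls = [VariantCall("KRAS", "c.35G>A", "p.G12D", "missense", alt_reads=120, total_depth=400)]
print(calls[0].alt_allele_frequency)                          # 0.3
print(gene_result(calls, "KRAS"), gene_result(calls, "BRAF"))  # POSITIVE NEGATIVE
```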
Drag-and-drop i2b2 query tool
[Screenshot: Biospecimen Ontology folders (solid, fluid, etc.; malignant, normal; frozen, paraffin, etc.) and a Chemo folder]
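i2b2 ontologies are hierarchical, so choosing a folder such as a biospecimen type in the query tool selects the concepts beneath it. A sketch of that idea as prefix matching on backslash-delimited concept paths; the paths and patients are hypothetical (the folder names follow the specimen class, path, and type annotation listed earlier), and this is not i2b2's actual implementation.

```python
# Hypothetical facts: (patient_num, ontology path) in an i2b2-like path style.
FACTS = [
    (1, "\\Biospecimen\\Type\\Frozen\\"),
    (2, "\\Biospecimen\\Type\\Paraffin\\"),
    (3, "\\Biospecimen\\Path\\Malignant\\"),
    (4, "\\Treatment\\Chemo\\"),
]

def patients_under(node, facts):
    """Patients with a fact at `node` or anywhere beneath it in the hierarchy."""
    # Normalize to a trailing separator so "\Type" cannot match "\TypeX".
    prefix = node if node.endswith("\\") else node + "\\"
    return {patient for patient, path in facts if path.startswith(prefix)}

print(patients_under("\\Biospecimen\\Type", FACTS))  # {1, 2}
print(patients_under("\\Biospecimen", FACTS))        # {1, 2, 3}
```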
Tumor Registry – Primary Breast Cancer Annotation
How many patients are ER-, PR-, and HER2-negative, have infiltrating breast cancer, and have frozen tissue available for researchers?
Demographics of patients who are ER-, PR-, and HER2-negative, have infiltrating breast cancer, and have frozen tissue available for researchers
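In the i2b2 query tool, concepts placed in the same panel are combined with OR, and separate panels are combined with AND. The breast cancer question above can be sketched that way over hypothetical de-identified data; the concept codes, patients, and race values below are invented for illustration.

```python
from collections import Counter

# Hypothetical de-identified facts (patient -> concept codes) and race values.
PATIENT_FACTS = {
    101: {"ER:NEG", "PR:NEG", "HER2:NEG", "HIST:INF_DUCT", "SPEC:FROZEN"},
    102: {"ER:NEG", "PR:NEG", "HER2:NEG", "HIST:INF_LOBULAR", "SPEC:FROZEN"},
    103: {"ER:NEG", "PR:NEG", "HER2:NEG", "HIST:INF_DUCT", "SPEC:PARAFFIN"},
    104: {"ER:POS", "PR:NEG", "HER2:NEG", "HIST:INF_DUCT", "SPEC:FROZEN"},
}
RACE = {101: "Black", 102: "White", 103: "White", 104: "Asian"}

# One set per panel: codes within a panel are ORed, panels are ANDed.
PANELS = [
    {"ER:NEG"},
    {"PR:NEG"},
    {"HER2:NEG"},
    {"HIST:INF_DUCT", "HIST:INF_LOBULAR"},  # either infiltrating histology
    {"SPEC:FROZEN"},
]

def run_query(panels, facts):
    """Patients having at least one code from every panel."""
    return {p for p, codes in facts.items() if all(codes & panel for panel in panels)}

cohort = run_query(PANELS, PATIENT_FACTS)
print(len(cohort))                       # 2
print(Counter(RACE[p] for p in cohort))  # race breakdown of the cohort
```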
Honest brokers have tools to re-identify data.
Current Usage of Jefferson i2b2 RDM
• The i2b2 query interface is the primary portal for identifying biospecimens available for research.
• The RDM provides cohort size estimates for prospective studies:
– grant applications
– design phase of clinical trials (estimates of the recent patient population satisfying proposed eligibility criteria)
• RDM provides comprehensive patient annotation for
ongoing research projects.
– Next Generation Sequencing (NGS) studies to discover
cancer biomarkers
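A cohort size estimate for trial design amounts to counting recent patients who satisfy the proposed eligibility rules. A sketch under stated assumptions: the `Patient` fields, the one-year lookback window, and the age and stage rules are all hypothetical, and in practice such an estimate would be run as an i2b2 query.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Iterable, List

@dataclass
class Patient:
    study_id: str          # de-identified ID, never an MRN
    diagnosis_date: date
    age_at_diagnosis: int
    ajcc_stage: str

Rule = Callable[[Patient], bool]

def estimate_cohort(patients: Iterable[Patient], rules: List[Rule],
                    as_of: date, lookback_days: int = 365) -> int:
    """Count patients diagnosed in the lookback window who satisfy every rule."""
    start = as_of - timedelta(days=lookback_days)
    return sum(
        1 for p in patients
        if start <= p.diagnosis_date <= as_of and all(rule(p) for rule in rules)
    )

# Hypothetical eligibility rules for a proposed trial.
RULES: List[Rule] = [
    lambda p: 18 <= p.age_at_diagnosis <= 75,
    lambda p: p.ajcc_stage in {"II", "III"},
]

patients = [
    Patient("S1", date(2014, 1, 10), 54, "II"),
    Patient("S2", date(2013, 11, 2), 81, "III"),  # fails age rule
    Patient("S3", date(2012, 6, 15), 60, "II"),   # outside lookback window
    Patient("S4", date(2013, 9, 30), 47, "I"),    # fails stage rule
    Patient("S5", date(2014, 3, 3), 66, "III"),
]
print(estimate_cohort(patients, RULES, as_of=date(2014, 4, 25)))  # 2
```

Keeping each eligibility criterion as a separate predicate makes it easy to see how much each proposed rule shrinks the available population.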