The Linked Clinical Data Project - Medical informatics at Mayo Clinic
Strategic Health IT Advanced Research
Projects (SHARP)
Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Jyotishman Pathak, PhD
Assistant Professor of Biomedical Informatics
June 11, 2012
Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor
SHARPn High-Throughput Phenotyping
©2012 MFMER | slide-2
Phenotyping is still a bottleneck…
[Image from Wikipedia]
EHR systems: United States 2002—2011
[Millwood et al. 2012]
Electronic health records (EHRs) driven phenotyping
• EHRs are becoming more and more prevalent within the U.S. healthcare system
• Meaningful Use is one of the major drivers
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
http://gwas.org
EHR-driven Phenotyping Algorithms - I
• Typical components
  • Billing and diagnoses codes
  • Procedure codes
  • Labs
  • Medications
  • Phenotype-specific co-variates (e.g., Demographics, Vitals, Smoking Status, CASI scores)
  • Pathology
  • Imaging?
• Organized into inclusion and exclusion criteria
EHR-driven Phenotyping Algorithms - II
[Diagram: algorithm development cycle with the stages Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Evaluation, and Visualization]
[eMERGE Network]
Example: Hypothyroidism Algorithm
[Flowchart criteria, in slide order]
• No thyroid-altering medications (e.g., Phenytoin, Lithium)
• 2+ non-acute visits in 3 yrs
• Positive evidence: ICD-9s for hypothyroidism; abnormal TSH/FT4; thyroid replacement meds; antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase)
• Negative evidence: no ICD-9s for hypothyroidism; no abnormal TSH/FT4; no thyroid replacement meds; no antibodies for TTG/TPO
• No secondary causes (e.g., pregnancy, ablation)
• No hx of myasthenia gravis
• Resulting groups: Case 1, Case 2, Control
[Denny et al., 2012]
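The criteria above can be sketched as a small classifier. This is only an illustration: the slide does not spell out how the boxes combine, so the logic below (shared medication and visit requirements, controls having none of the four evidence types, cases having at least one) is an assumption, and `PatientRecord` and `classify` are hypothetical names rather than part of the published algorithm.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient flags, assumed to be precomputed from EHR data."""
    thyroid_altering_meds: bool
    non_acute_visits_3yr: int
    hypothyroid_icd9: bool
    abnormal_tsh_ft4: bool
    thyroid_replacement_meds: bool
    thyroid_antibodies: bool
    secondary_cause: bool
    myasthenia_gravis_hx: bool


def classify(p: PatientRecord) -> str:
    """Return "case", "control", or "excluded" under the assumed logic."""
    # Assumption: the medication and visit requirements apply to everyone.
    if p.thyroid_altering_meds or p.non_acute_visits_3yr < 2:
        return "excluded"
    evidence = (
        p.hypothyroid_icd9,
        p.abnormal_tsh_ft4,
        p.thyroid_replacement_meds,
        p.thyroid_antibodies,
    )
    if not any(evidence):
        # Assumption: myasthenia gravis history rules out a control.
        return "excluded" if p.myasthenia_gravis_hx else "control"
    # Assumption: a secondary cause rules out a case.
    return "excluded" if p.secondary_cause else "case"


case = PatientRecord(False, 3, True, True, True, False, False, False)
control = PatientRecord(False, 2, False, False, False, False, False, False)
too_few_visits = PatientRecord(False, 1, True, True, True, False, False, False)
print(classify(case), classify(control), classify(too_few_visits))
```

Keeping each criterion as a named flag mirrors the inclusion/exclusion structure of the flowchart and makes it easy to swap in a stricter combination rule.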
Hypothyroidism Algorithm: Validation
Positive Predictive Values (PPV) Based on Chart Review – All Sites

Site           EHR-based        Sampled for chart review   Old case   New case
               cases/controls   cases/controls             PPV (%)    PPV (%)
Group Health   430/1,188        50/50                      92         98
Marshfield     509/1,193        50/50                      88         91
Mayo Clinic    250/2,145        100/100                    76         97
Northwestern   103/516          50/50                      88         98
Vanderbilt     184/1,344        50/50                      90         98
All sites      1,421/6,362      -                          87         96
[Denny et al., 2012]
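For readers unfamiliar with the metric, PPV is the share of algorithm-identified cases that chart review confirms. The sketch below shows the arithmetic only; the counts in the usage line are hypothetical and are not taken from the validation study.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value as a percentage of reviewed cases."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("no reviewed cases")
    return 100.0 * true_positives / reviewed


# Hypothetical review: 50 sampled cases, 46 confirmed by chart review.
print(ppv(46, 4))
```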
Data Categories used to define the EHR-driven Phenotyping Algorithms

Alzheimer’s Dementia
  Clinical gold standard: Demographics, clinical examination of mental status, histopathologic examination
  EHR-derived phenotype: Diagnoses, medications
  Phenotype definitions: Demographics, laboratory tests, radiology reports
  Validation (PPV/NPV): 73%
Cataracts
  Clinical gold standard: Clinical exam finding (Ophthalmologic examination)
  EHR-derived phenotype: Diagnoses, procedure codes
  Phenotype definitions: Demographics, medications
  Validation (PPV/NPV): 98%/98%
Peripheral Arterial Disease
  Clinical gold standard: Radiology test results (ankle-brachial index or arteriography)
  EHR-derived phenotype: Diagnoses, procedure codes, medications, radiology test results
  Phenotype definitions: Demographics
  Validation (PPV/NPV): 94%/99%
Type 2 Diabetes
  Clinical gold standard: Laboratory Tests
  EHR-derived phenotype: Diagnoses, laboratory tests, medications
  Phenotype definitions: Demographics, height, weight, family history
  Validation (PPV/NPV): 98%/100%
Cardiac Conduction
  Clinical gold standard: ECG measurements
  EHR-derived phenotype: ECG report results
  Phenotype definitions: Demographics, diagnoses, procedure codes, medications, laboratory tests
  Validation (PPV/NPV): 97%
[eMERGE Network]
Genotype-Phenotype Association Results
[Figure: forest plot comparing published and observed odds ratios (axis from 0.5 to 5.0) for known marker associations in atrial fibrillation, Crohn's disease, multiple sclerosis, rheumatoid arthritis, and type 2 diabetes]

Marker        Gene / region
rs2200733     Chr. 4q25
rs10033464    Chr. 4q25
rs11805303    IL23R
rs17234657    Chr. 5
rs1000113     Chr. 5
rs17221417    NOD2
rs2542151     PTPN22
rs3135388     DRB1*1501
rs2104286     IL2RA
rs6897932     IL7RA
rs6457617     Chr. 6
rs6679677     RSBN1
rs2476601     PTPN22
rs4506565     TCF7L2
rs12255372    TCF7L2
rs12243326    TCF7L2
rs10811661    CDKN2B
rs8050136     FTO
rs5219        KCNJ11
rs5215        KCNJ11
rs4402960     IGF2BP2
[Ritchie et al. 2010]
Key lessons learned from eMERGE
• Algorithm design and transportability
  • Non-trivial; requires significant expert involvement
  • Highly iterative process
  • Time-consuming manual chart reviews
  • Representation of “phenotype logic” for transportability is critical
• Standardized data access and representation
  • Importance of unified vocabularies, data elements, and value sets
  • Questionable reliability of ICD & CPT codes (e.g., billing the wrong code since it is easier to find)
  • Natural Language Processing (NLP) is critical
Algorithm Development Process - Modified
[Diagram: the algorithm development cycle (Data via NLP and SQL, Transform, Mappings, Phenotype Algorithm, Rules, Evaluation, Visualization) with Semi-Automatic Execution added]
[eMERGE Network]
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
[Diagram: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
The SHARPn “phenotyping funnel”
[Diagram: funnel from the Mayo Clinic EHR and the Intermountain EHR through CEMs, QDMs, and DRLs to phenotype specific patient cohorts]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
Clinical Element Models
Higher-Order Structured Representations
[Stan Huff, IHC]
Pre- and Post-Coordination
[Stan Huff, IHC]
CEMs available for patient demographics, medications, lab measurements, procedures etc.
[Stan Huff, IHC]
SHARPn data normalization flow - I
CEM MySQL database with normalized patient information
[Welch et al. 2012]
SHARPn data normalization flow - II
CEM MySQL database
with normalized patient
information
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
[Diagram: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
Our task: human readable to machine computable
[Thompson et al., submitted 2012]
NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
• A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as set of XML schemas
• Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm etc.)
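To make the value-set idea concrete, here is a minimal sketch of how a QDM-style data element might be matched against a coded EHR event. It is not the QDM XML schema: `ValueSet`, `DataElement`, and `matches` are hypothetical names, and the codes are placeholders rather than real members of the value set. Only the OID and the element name come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSet:
    oid: str
    name: str
    codes: frozenset  # (code_system, code) pairs


@dataclass(frozen=True)
class DataElement:
    category: str  # e.g. "Diagnosis"
    state: str     # e.g. "Active"
    value_set: ValueSet

    def matches(self, category: str, state: str, code_system: str, code: str) -> bool:
        """True when the event has the same category and state and its code is in the value set."""
        return (
            category == self.category
            and state == self.state
            and (code_system, code) in self.value_set.codes
        )


# Placeholder codes for illustration only; the real value set content is not shown on the slide.
steroid_induced_diabetes = ValueSet(
    oid="2.16.840.1.113883.3.464.0001.113",
    name="steroid induced diabetes Value Set GROUPING",
    codes=frozenset({("ICD-9", "PLACEHOLDER-1"), ("SNOMED-CT", "PLACEHOLDER-2")}),
)
element = DataElement("Diagnosis", "Active", steroid_induced_diabetes)

print(element.matches("Diagnosis", "Active", "ICD-9", "PLACEHOLDER-1"))
print(element.matches("Diagnosis", "Active", "ICD-9", "OTHER-CODE"))
```

Grouping codes from several terminologies under one identifier is what lets the same criterion run against sites that code diagnoses differently.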
116 Meaningful Use Phase I Quality Measures
Example: Diabetes & Lipid Mgmt. - I
Human readable HTML
Example: Diabetes & Lipid Mgmt. - II
Computable XML
NQF Measure Authoring Tool (MAT)
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
[Diagram: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
JBoss® open-source Drools rules based management system (RBMS)
• Represents knowledge with declarative production rules
• Origins in artificial intelligence expert systems
• Simple when <pattern> then <action> rules specified in text files
• Separation of data and logic into separate components
• Forward chaining inference model (Rete algorithm)
• Domain specific languages (DSL)
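The forward-chaining idea in the list above can be illustrated with a few lines of Python. This is a naive loop that rescans every rule until nothing new is derived; it is not the Rete algorithm that Drools uses, and the fact and rule names are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply (conditions, conclusion) rules repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Rules are kept apart from the data they operate on, as in a rules engine.
rules = [
    (frozenset({"hypoglycemia_on_insulin"}), "notify_care_team"),
    (frozenset({"glucose_low", "insulin_on"}), "hypoglycemia_on_insulin"),
]
derived = forward_chain({"glucose_low", "insulin_on"}, rules)
print(sorted(derived))
```

The second rule's conclusion enables the first rule on the next pass, which is the chaining behaviour; Rete avoids the repeated full rescans by caching partial matches.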
Example Drools rule
rule "Glucose <= 40, Insulin On"    // {Rule Name}
when
    // {binding} : {Java Class}( constraints on {Class Getter Method} values )
    $msg : GlucoseMsg(glucoseFinding <= 40, currentInsulinDrip > 0)
then
    // {Class Setter Method}( Parameter {Java Class} )
    glucoseProtocolResult.setInstruction(GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG);
end
Automatic translation from NQF QDM criteria to Drools
[Diagram: from non-executable to executable. Measure Authoring Toolkit outputs (measures and data types in XML-based structured representation, value sets saved in XLS files) pass through two steps, converting measures to Drools scripts and mapping data types and value sets to fact models, before reaching the Drools engine]
[Li et al., submitted 2012]
Automatic translation from NQF QDM
criteria to Drools
[Li et al., submitted 2012]
The “executable” Drools flow
Phenotype library and workbench - I
http://phenotypeportal.org
1. Converts QDM to Drools
2. Rule execution by querying the CEM database
3. Generate summary reports
Phenotype library and workbench - II
http://phenotypeportal.org
Phenotype library and workbench - III
http://phenotypeportal.org
Phenotype library and workbench - IV
Additional on-going research efforts - I
• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones
[Carroll et al. 2011]
Additional on-going research efforts - I
• Origins from sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients.
• One of those conditions should be DM.
[Table: patients 001 to 005 (rows) by conditions TB, DLM, ND, …, IEC (columns), with Y marking each condition a patient has]
• Support: # of transactions the itemset I appeared in
  • Support({TB, DLM, ND})=3
• Frequent: an itemset I is frequent, if support(I)>minsup
[Figure: itemset lattice over items A, B, C, D (A, B, C, D, AB, AC, AD, BC, BD, CD, ABD, ACD); X: infrequent]
[Simon et al. 2012]
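The support and frequency definitions above can be sketched directly. The code below is a brute-force enumeration for illustration, not an optimized miner such as Apriori, and the patient data and condition labels (other than DM) are hypothetical rather than taken from the slide's table.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions (patients) whose conditions include every item in the itemset."""
    return sum(1 for conditions in transactions.values() if itemset <= conditions)


def frequent_itemsets(transactions, minsup, must_contain=None):
    """All itemsets with support > minsup, optionally restricted to those containing one item."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if must_contain is not None and must_contain not in candidate:
                continue
            count = support(candidate, transactions)
            if count > minsup:
                result[candidate] = count
    return result


# Hypothetical patients (transactions) and co-morbid conditions (items).
patients = {
    "001": {"DM", "HTN", "OBESITY"},
    "002": {"DM", "HTN"},
    "003": {"DM", "OBESITY"},
    "004": {"HTN"},
    "005": {"DM", "HTN", "OBESITY"},
}
frequent = frequent_itemsets(patients, minsup=2, must_contain="DM")
for itemset, count in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Practical miners prune the search using the fact that every superset of an infrequent itemset is also infrequent, which is what the X marks in the lattice figure convey.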
Additional on-going research efforts - II
Additional on-going research efforts - II
TRALI/TACO sniffer
Active Surveillance for TRALI and TACO
Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) of these were reported to the blood bank by the clinical service.
Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
Additional on-going research efforts - III
• Phenome-wide association scan (PheWAS)
• Do a “reverse GWAS” using EHR data
• Facilitate hypothesis generation
[Pathak et al. submitted 2012]
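A "reverse GWAS" fixes one genetic variant and scans many EHR-derived phenotypes for association. The sketch below computes a plain odds ratio per phenotype from 2x2 counts; it is an illustration with hypothetical patient identifiers and phenotype names, and a real PheWAS would add significance testing and multiple-comparison correction.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table; None when a zero cell makes it undefined.

    a: carriers with the phenotype      b: carriers without it
    c: non-carriers with the phenotype  d: non-carriers without it
    """
    if b == 0 or c == 0:
        return None
    return (a * d) / (b * c)


def phewas_scan(carriers, all_patients, phenotypes):
    """Odds ratio of each phenotype for variant carriers versus non-carriers."""
    non_carriers = all_patients - carriers
    results = {}
    for name, affected in phenotypes.items():
        results[name] = odds_ratio(
            len(carriers & affected),
            len(carriers - affected),
            len(non_carriers & affected),
            len(non_carriers - affected),
        )
    return results


# Hypothetical cohort of eight patients, four of whom carry the variant.
all_patients = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}
carriers = {"p1", "p2", "p3", "p4"}
phenotypes = {
    "phenotype_a": {"p1", "p2", "p3", "p5"},
    "phenotype_b": {"p1", "p5", "p6"},
}
scan = phewas_scan(carriers, all_patients, phenotypes)
print(scan)
```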
Publications to date (conservative)
[Bar chart: numbers of papers, abstracts, and manuscripts under review for Year 1 (2011), Year 2 (2012), and Year 3 (2013); vertical axis from 0 to 14]
Mayo projects and collaborations
• Ongoing
  • Transfusion related acute lung injury (Kor)
  • Drug induced liver injury (Talwalkar)
  • Drug induced thrombocytopenia and neutropenia (Al-Kali)
  • Active surveillance for celiac disease (Murray)
  • Warfarin dose response & heart valve replacements (Pereira)
  • Phenotype definition standardization (HCPR/Quality)
• Getting started/planning
  • Pharmacogenomics of systolic heart failure (Bielinski/Pereira)
  • Pharmacogenomics of SSRI (Mrazek/Weinshilboum)
  • Lumbar image reporting with epidemiology (Kallmes)
  • Active clinical trial alerting (CTMS/Cancer Center)
HTP related presentations
• June 11th, 2012
  • Using EHRs for clinical research (Vitaly Herasevich)
  • Association rule mining and T2D risk prediction (Gyorgy Simon)
  • Scenario-based requirements engineering for developing EHR add-ons to support CER in patient care settings (Junfeng Gao)
• June 12th, 2012
  • Exploring patient data in context clinical research studies: Research Data Explorer (Adam Wilcox et al.)
  • Utilizing previous result sets as criteria for new queries with FURTHeR (Dustin Schultz et al.)
  • Semantic search engine for clinical trials (Yugyung Lee)
  • Knowledge-driven workbench for predictive modeling (Peter Haug et al.)
  • Clinical analytics driven care coordination for 30-day readmission – Demonstration from 360 Fresh.com (Ramesh Sairamesh)
Thank You!
[email protected]