High-Throughput Phenotyping and
Cohort Identification from Electronic
Health Records for Clinical and
Translational Research
Jyotishman Pathak, PhD
Assistant Professor of Biomedical Informatics
Health Sciences Research Grand Rounds
April 23, 2012
Background – The Problem
• Patient recruitment is a huge bottleneck step in
conducting successful clinical research studies
• 50% of time is spent in recruitment
• Low participation rates (~ 5%); studies are
underpowered
• Clinicians: lack resources to help patients find
appropriate studies and trials
• Patients: have difficulty finding appropriate studies
that are locally available
High-Throughput Phenotyping from EHRs
©2012 MFMER | slide-2
Background – Use Cases
• Large-scale genomics research
• Linking biospecimens and genetic data to personal
health data via biorepositories
• Need large sample sizes for study design
• Population-based epidemiological studies in
understanding disease etiology
• Often limited in scope or population diversity
• Quality metrics and HITECH Act
• Pay-for-Performance and quality-based incentives
• Population management and cohort identification is nontrivial
Electronic health records (EHRs) driven
phenotyping – The Proposed Solution
• EHRs are becoming more and more prevalent
within the U.S. healthcare system
• Meaningful Use is one of the major drivers
• Overarching goal
• To develop techniques and algorithms that
operate on normalized EHR data to identify
cohorts of potentially eligible subjects on the
basis of disease, symptoms, or related findings
Advantages: EHR-derived phenotyping
• There is a LOT of information about subjects
• Demographics, labs, meds, procedures, clinical notes…
• Identification of otherwise latent population differences
• Minimal costs for case ascertainment, no study-specific recruitment
• Records are “retrospectively longitudinal”
• Records are real world and contain many different
phenotypes
• Transportability and reuse of phenotype definitions
across EHR-enabled sites = power for clinical and
research studies
Challenges: EHR-derived phenotyping
• There is a LOT of information about subjects…
• Non-standardized, heterogeneous, unstructured
data (compared to protocol-based structured
data collection)
• Measured (e.g., demographics) vs. un-measured
(e.g., socio-economic status) population
differences
• Hospital specialization and coding practices
• Population/regional market landscape
The challenges can be addressed…if we
• Develop techniques for standardization and
normalization of clinical data and phenotypes
• Develop techniques for transforming and
managing unstructured clinical text into
structured representations
• Develop techniques for transportability of EHR-driven phenotyping algorithms
• Develop a scalable, robust and flexible
framework for demonstrating all of the above in
a “real-world setting”
http://gwas.org
• Funded by the NHGRI/NIGMS
• Goal: to assess utility of EHRs as resources for genome
science
• Each site includes a biorepository linked to EHRs
• Each project includes informatics, biostatistics, community
engagement, ELSI, genetics experts
• Initial proposals included identifying a primary phenotype of
interest in 3,000 subjects and conduct of a genome-wide
association study at each center: Σ=18,000
• eMERGE Phase II has a target of developing ~40 phenotype
algorithms by the end of 2014
• Algorithm transportability an integral component
EHR-based Phenotyping Algorithms
• Typical components
• Billing and diagnosis codes
• Procedure codes
• Labs
• Medications
• Phenotype-specific co-variates (e.g., Demographics,
Vitals, Smoking Status, CASI scores)
• Pathology
• Imaging?
• Organized into inclusion and exclusion criteria
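A minimal sketch of how such components might be organized into inclusion and exclusion criteria. The patient fields, code set, medication name, and lab threshold are hypothetical and chosen only for illustration; they are not taken from any eMERGE algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical, simplified patient record
    patient_id: str
    dx_codes: set = field(default_factory=set)      # billing/diagnosis codes
    medications: set = field(default_factory=set)
    labs: dict = field(default_factory=dict)        # lab name -> latest value


# Each criterion is a named predicate over a patient record (all illustrative).
INCLUSION = {
    "has_dx_code": lambda p: bool(p.dx_codes & {"250.00", "250.02"}),
    "abnormal_lab": lambda p: p.labs.get("hba1c", 0.0) >= 6.5,
}
EXCLUSION = {
    "on_confounding_med": lambda p: "prednisone" in p.medications,
}


def is_case(p):
    """Case = every inclusion criterion holds and no exclusion criterion holds."""
    included = all(check(p) for check in INCLUSION.values())
    excluded = any(check(p) for check in EXCLUSION.values())
    return included and not excluded


patients = [
    Patient("p1", {"250.00"}, {"metformin"}, {"hba1c": 7.2}),
    Patient("p2", {"250.00"}, {"prednisone"}, {"hba1c": 7.9}),
    Patient("p3", set(), set(), {"hba1c": 5.4}),
]
cases = [p.patient_id for p in patients if is_case(p)]
print(cases)
```

Keeping each criterion as a named predicate makes it easy to report which criterion excluded a given patient during manual review.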
EHR-based Phenotyping Algorithms
• Iteratively refine case definitions through partial
manual review to achieve ~PPV ≥ 95%
• For controls, exclude all potentially overlapping
syndromes and possible matches; iteratively refine
such that ~NPV ≥ 98%
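The refinement targets above can be checked with a few lines of arithmetic. The sketch below uses hypothetical chart-review counts; only the 95% and 98% thresholds come from the slide.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value: share of algorithm-flagged cases confirmed by chart review."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg, false_neg):
    """Negative predictive value: share of algorithm-flagged controls confirmed by chart review."""
    return true_neg / (true_neg + false_neg)


# Hypothetical chart-review counts for one refinement round
case_ppv = ppv(true_pos=48, false_pos=2)
control_npv = npv(true_neg=99, false_neg=1)

# Thresholds from the slide: PPV >= 95% for cases, NPV >= 98% for controls
meets_targets = case_ppv >= 0.95 and control_npv >= 0.98
print(f"PPV={case_ppv:.2f} NPV={control_npv:.2f} meets targets: {meets_targets}")
```

If either value falls short, the case or control definition is tightened and another sample is reviewed.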
Algorithm Development Process
[Figure: algorithm development cycle. Labeled elements: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Evaluation, Visualization]
[eMERGE Network]
Hypothyroidism: Initial Algorithm
[Flowchart; box labels as extracted]
• No thyroid-altering medications (e.g., Phenytoin, Lithium)
• 2+ non-acute visits in 3 yrs
• ICD-9s for Hypothyroidism
• Abnormal TSH/FT4
• Thyroid replace. meds
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase)
• No ICD-9s for Hypothyroidism
• No Abnormal TSH/FT4
• No thyroid replace. meds
• No Antibodies for TTG/TPO
• No secondary causes (e.g., pregnancy, ablation)
• No hx of myasthenia gravis
• Outcomes: Case 1, Case 2, Control
[Denny et al., 2012]
Hypothyroidism: Initial Algorithm
[Conway et al. 2011]
Hypothyroidism: Algorithm Refinement
[Flowchart; box labels as extracted]
• No thyroid-altering medications (e.g., Phenytoin, Lithium)
• 2+ non-acute visits in 3 yrs
• ICD-9s for Hypothyroidism
• Abnormal TSH/FT4
• Thyroid replace. meds
• Antibodies for TTG or TPO (anti-thyroglobulin, anti-thyroperoxidase)
• No ICD-9s for Hypothyroidism
• No Abnormal TSH/FT4
• No thyroid replace. meds
• No Antibodies for TTG/TPO
• No secondary causes (e.g., pregnancy, ablation)
• No hx of myasthenia gravis
• Outcomes: Case 1, Case 2, Control
[Denny et al., 2012]
New Hypothyroidism Algorithm: Validation
Positive Predictive Values (PPV) Based on Chart Review – All Sites

Site           EHR-based        Sampled for Chart Review   Old Case   New Case
               Cases/Controls   Cases/Controls             PPV (%)    PPV (%)
Group Health   430/1,188        50/50                      92         98
Marshfield     509/1,193        50/50                      88         91
Mayo Clinic    250/2,145        100/100                    76         97
Northwestern   103/516          50/50                      88         98
Vanderbilt     184/1,344        50/50                      90         98
All sites      1,421/6,362      -                          87         96
[Denny et al., 2012]
Data Categories used to define the EHR-driven Phenotyping Algorithms
Columns: EHR-derived phenotype | Clinical gold standard | Data categories used in phenotype definitions | Validation (PPV/NPV)

• Alzheimer’s Dementia
  Clinical gold standard: Demographics, clinical examination of mental status, histopathologic examination
  Data categories: Diagnoses, medications; Demographics, laboratory tests, radiology reports
• Cataracts
  Clinical gold standard: Clinical exam finding (Ophthalmologic examination)
  Data categories: Diagnoses, procedure codes; Demographics, medications
  Validation: 98%/98%
• Peripheral Arterial Disease
  Clinical gold standard: Radiology test results (ankle-brachial index or arteriography)
  Data categories: Diagnoses, procedure codes, medications, radiology test results; Demographics
  Validation: 94%/99%
• Type 2 Diabetes
  Clinical gold standard: Laboratory Tests
  Data categories: Diagnoses, laboratory tests, medications; Demographics, height, weight, family history
  Validation: 98%/100%
• Cardiac Conduction
  Clinical gold standard: ECG measurements
  Data categories: ECG report results; Demographics, diagnoses, procedure codes, medications, laboratory tests
Two further validation values, 73% and 97%, appear on the slide; their row assignment is not recoverable from the extracted text.
[eMERGE Network]
Genotype-Phenotype Association Results
Diseases: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes

Marker        Gene / region
rs2200733     Chr. 4q25
rs10033464    Chr. 4q25
rs11805303    IL23R
rs17234657    Chr. 5
rs1000113     Chr. 5
rs17221417    NOD2
rs2542151     PTPN22
rs3135388     DRB1*1501
rs2104286     IL2RA
rs6897932     IL7RA
rs6457617     Chr. 6
rs6679677     RSBN1
rs2476601     PTPN22
rs4506565     TCF7L2
rs12255372    TCF7L2
rs12243326    TCF7L2
rs10811661    CDKN2B
rs8050136     FTO
rs5219        KCNJ11
rs5215        KCNJ11
rs4402960     IGF2BP2

[Forest plot: published vs. observed odds ratios per marker; x-axis: Odds Ratio, ticks at 0.5, 1.0, 2.0, 5.0]
[Ritchie et al. 2010]
Key lessons learned from eMERGE
• Algorithm design and transportability
• Non-trivial; requires significant expert involvement
• Highly iterative process
• Time-consuming manual chart reviews
• Representation of “phenotype logic” for transportability
is critical
• Standardized data access and representation
• Importance of unified vocabularies, data elements, and
value sets
• Questionable reliability of ICD & CPT codes (e.g., billing
the wrong code since it is easier to find)
• Natural Language Processing (NLP) needs
Algorithm Development Process - Modified
[Figure: modified algorithm development cycle. Labeled elements: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization]
[eMERGE Network]
Strategic Health IT Advanced Research
Projects (SHARPn): Secondary Uses of
EHR Data
• Mission: To enable the use of EHR data for
secondary purposes, such as clinical research
and public health. Leveraging clinical and health
informatics to:
• generate new knowledge
• improve care
• address population needs
http://sharpn.org
[Chute et al. 2011]
SHARPn: Secondary Use of EHR Data
A $15M National Consortium
• Agilex Technologies
• CDISC (Clinical Data Interchange Standards Consortium)
• Centerphase Solutions
• Deloitte
• Group Health, Seattle
• IBM Watson Research Labs
• University of Utah
• University of Pittsburgh
• Harvard University
• Intermountain Healthcare
• Mayo Clinic
• Mirth Corporation, Inc.
• MIT
• MITRE Corp.
• Regenstrief Institute, Inc.
• SUNY, Buffalo
• University of Colorado
Cross-integrated suite of projects
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
• Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
• Use JBoss® Drools (DRLs)
• Standardized representation of clinical data
• Create new and re-use existing clinical element models (CEMs)
[Figure: modified algorithm development cycle. Labeled elements: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
The SHARPn “phenotyping funnel”
[Figure: funnel. Labeled elements: Mayo Clinic EHR, Intermountain EHR, CEMs, QDMs, DRLs, Phenotype specific patient cohorts]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
Clinical data normalization
• Data Normalization
• Clinical data comes in all different forms even for
the same kind of information
• Comparable and consistent data is foundational to
secondary use
• Clinical Element Models (CEMs)
• Basis for retaining computable meaning when data
is exchanged between heterogeneous computer
systems
• Basis for shared computable meaning when clinical
data is referenced in decision support logic
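To make the normalization problem concrete, here is a small sketch in which two sites report the same glucose result with different local codes and units, and both are mapped into one common shape. The `NormalizedLab` fields, the local codes, and the mapping table are hypothetical stand-ins, not the actual CEM schema; the conversion factor applies to glucose only.

```python
from dataclasses import dataclass

MMOL_TO_MGDL_GLUCOSE = 18.016  # approximate conversion factor, glucose only


@dataclass(frozen=True)
class NormalizedLab:
    # Hypothetical common shape; not the actual CEM schema
    patient_id: str
    code: str      # standard terminology code
    value: float
    unit: str


# Hypothetical mapping from site-local test names to one standard code
LOCAL_TO_STANDARD = {"GLU": "2345-7", "glucose_serum": "2345-7"}


def normalize(record):
    """Map a site-specific lab record onto the common structure."""
    value = float(record["value"])
    unit = record["unit"].lower()
    if unit == "mmol/l":
        value = round(value * MMOL_TO_MGDL_GLUCOSE, 1)
        unit = "mg/dl"
    return NormalizedLab(record["pid"], LOCAL_TO_STANDARD[record["test"]], value, unit)


site_a = {"pid": "a1", "test": "GLU", "value": "126", "unit": "mg/dL"}
site_b = {"pid": "b7", "test": "glucose_serum", "value": 7.0, "unit": "mmol/L"}
labs = [normalize(site_a), normalize(site_b)]
print(labs)
```

Once both records share a code and a unit, a single phenotype criterion can be evaluated against either site's data.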
Clinical Element Models
Higher-Order Structured Representations
[Stan Huff, IHC]
Pre- and Post-Coordination
[Stan Huff, IHC]
Data element harmonization
• Stan Huff (Intermountain Healthcare)
• Clinical Information Modeling Initiative (CIMI)
• NHS Clinical Statement
• CEN TC251/OpenEHR Archetypes
• HL7 Templates
• ISO TC215 Detailed Clinical Models
• CDISC Common Clinical Elements
• Intermountain/GE CEMs
SHARPn data normalization flow - I
SHARPn data normalization flow - II
CEM MySQL database with normalized patient information
[Welch et al. 2012]
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
• Use the NQF Quality Data Model (QDM)
• Standardized representation of clinical data
• Create new and re-use existing clinical element models (CEMs)
[Figure: modified algorithm development cycle. Labeled elements: Data (NLP, SQL), Transform, Mappings, Phenotype Algorithm, Rules, Semi-Automatic Execution, Evaluation, Visualization]
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
NQF Quality Data Model (QDM) - I
• Standard of the National Quality Forum (NQF)
• A standard structure and grammar to represent quality
measures precisely and accurately in a standardized
format that can be used across electronic patient care
systems
• First (and only) standard for “eMeasures”
• “All patients 65 years of age or older with at least two
provider visits during the measurement period receiving
influenza vaccine subcutaneously”
• Implemented as a set of XML schemas
• Links to standard terminologies (ICD-9, ICD-10,
SNOMED-CT, CPT-4, LOINC, RxNorm etc.)
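The example measure quoted above can be read as three conditions joined by AND. The sketch below expresses it as a plain predicate; the record layout, the measurement period dates, and the choice to compute age at the start of the period are assumptions made for illustration and are not part of the QDM specification.

```python
from datetime import date

# Hypothetical measurement period
PERIOD_START, PERIOD_END = date(2011, 1, 1), date(2011, 12, 31)


def age_on(birth_date, on):
    """Whole years of age on a given date."""
    before_birthday = (on.month, on.day) < (birth_date.month, birth_date.day)
    return on.year - birth_date.year - before_birthday


def in_period(d):
    return PERIOD_START <= d <= PERIOD_END


def meets_measure(patient):
    """Age >= 65, at least two visits in the period, influenza vaccine given subcutaneously."""
    old_enough = age_on(patient["birth_date"], PERIOD_START) >= 65  # assumption: age at period start
    visits = sum(1 for d in patient["visit_dates"] if in_period(d))
    vaccinated = any(
        v["vaccine"] == "influenza" and v["route"] == "subcutaneous" and in_period(v["date"])
        for v in patient["immunizations"]
    )
    return old_enough and visits >= 2 and vaccinated


eligible = {
    "birth_date": date(1940, 5, 1),
    "visit_dates": [date(2011, 2, 3), date(2011, 9, 14)],
    "immunizations": [{"vaccine": "influenza", "route": "subcutaneous", "date": date(2011, 10, 1)}],
}
too_young = dict(eligible, birth_date=date(1960, 5, 1))
print(meets_measure(eligible), meets_measure(too_young))
```

A standardized representation replaces ad hoc predicates like this one with shared data types and value sets, so that the same definition can run at more than one site.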
NQF Quality Data Model (QDM) - II
• Supports temporality & sequences
• AND: "Procedure, Performed: eye exam" > 1 year(s)
starts before or during "Measurement end date"
• Groups of codes in a code set (ICD9, etc.)
• Can group groups
• Represented by OIDs, requires lookup
• "Diagnosis, Active: steroid induced diabetes" using
"steroid induced diabetes Value Set GROUPING
(2.16.840.1.113883.3.464.0001.113)"
• Focus on structured data
• Would require extensions for NLP
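Because value sets are referenced by OID and can contain other value sets, an executor has to resolve a grouping into a flat set of codes before matching patient data. A minimal sketch of that lookup follows; the registry layout and the OIDs in it are hypothetical placeholders, not real NQF identifiers.

```python
# Hypothetical registry: an OID maps either to codes or to member OIDs (a grouping).
VALUE_SETS = {
    "1.2.3.10": {"codes": {"250.00", "250.02"}},
    "1.2.3.11": {"codes": {"249.00"}},
    "1.2.3.1": {"members": ["1.2.3.10", "1.2.3.11"]},  # a group of groups
}


def expand(oid, registry=VALUE_SETS, _seen=None):
    """Recursively resolve an OID to the flat set of codes it covers."""
    seen = _seen if _seen is not None else set()
    if oid in seen:  # guard against circular groupings
        return set()
    seen.add(oid)
    entry = registry[oid]
    codes = set(entry.get("codes", ()))
    for member in entry.get("members", ()):
        codes |= expand(member, registry, seen)
    return codes


grouping_codes = expand("1.2.3.1")
print(sorted(grouping_codes))
```

A criterion such as "Diagnosis, Active" then reduces to a membership test of the patient's diagnosis codes against the expanded set.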
116 Meaningful Use Phase I Quality Measures
Example: Diabetes & Lipid Mgmt. - I
Example: Diabetes & Lipid Mgmt. - II
NQF Measure Authoring Tool (MAT)
Our task: human readable → machine
computable
[Thompson et al., submitted 2012]
JBoss® open-source Drools environment
• Represents knowledge with declarative production
rules
• Origins in artificial intelligence expert systems
• Simple when <pattern> then <action> rules
specified in text files
• Separation of data and logic into separate
components
• Forward chaining inference model (Rete algorithm)
• Domain specific languages (DSL)
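The idea of declarative "when/then" rules with forward chaining can be illustrated without Drools. The sketch below is a deliberately naive loop that re-evaluates every rule until no new facts appear; it is not the Rete algorithm and not the Drools API, and the fact names are hypothetical.

```python
def run(rules, facts, max_cycles=100):
    """Naive forward chaining: keep firing rules until no rule adds a new fact."""
    facts = set(facts)
    for _ in range(max_cycles):
        new = set()
        for _name, when, then in rules:
            if when(facts):
                new |= then(facts) - facts
        if not new:
            return facts
        facts |= new
    raise RuntimeError("rules did not reach a fixed point")


# Rules are (name, when, then) triples kept separate from the data they act on.
rules = [
    ("abnormal lab -> candidate",
     lambda f: "abnormal_tsh" in f,
     lambda f: {"candidate"}),
    ("candidate + medication -> case",
     lambda f: {"candidate", "on_thyroid_replacement"} <= f,
     lambda f: {"case"}),
]

result = run(rules, {"abnormal_tsh", "on_thyroid_replacement"})
print(sorted(result))
```

The second rule can only fire after the first has added `candidate`, which is the chaining behaviour; a production engine avoids re-testing every rule on every cycle.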
Drools inference architecture
Inference Execution Model
1. Define a Knowledge Base
• Compiled Rules
• Produces Production Memory
2. Extract Knowledge Session from Knowledge Base
3. Insert Facts (data) into Knowledge Session → “Agenda”
4. Fire Rules (Race Conditions/Infinite Loops)
5. Retrieve End Results
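The execution steps above can be mimicked with a toy model. The `KnowledgeBase` and `Session` classes below are hypothetical names written for this sketch and do not reproduce the Drools API; the fire limit stands in for the infinite-loop concern noted on the slide.

```python
class KnowledgeBase:
    """Holds the rules (the 'production memory' of this toy model)."""

    def __init__(self, rules):
        self.rules = list(rules)  # (name, condition, action) triples

    def new_session(self):
        return Session(self)


class Session:
    def __init__(self, kb):
        self.kb, self.facts, self.fired = kb, [], set()

    def insert(self, fact):
        self.facts.append(fact)

    def _agenda(self):
        # Activations: rule/fact pairs whose condition holds and that have not fired yet
        return [(name, action, fact)
                for name, cond, action in self.kb.rules
                for fact in self.facts
                if cond(fact) and (name, id(fact)) not in self.fired]

    def fire_all_rules(self, limit=1000):
        count = 0
        while (agenda := self._agenda()):
            if count >= limit:  # guard against rules that retrigger forever
                raise RuntimeError("fire limit reached")
            name, action, fact = agenda[0]
            self.fired.add((name, id(fact)))
            action(fact, self)
            count += 1
        return count


kb = KnowledgeBase([
    ("low glucose on insulin",
     lambda f: f.get("glucose", 999) <= 40 and f.get("insulin_drip", 0) > 0,
     lambda f, s: f.update(instruction="GLUCOSE_LESS_THAN_40_INSULIN_ON")),
])
session = kb.new_session()                           # extract a session
session.insert({"glucose": 38, "insulin_drip": 2})   # insert facts
session.insert({"glucose": 110, "insulin_drip": 2})
fired = session.fire_all_rules()                     # fire rules
results = [f for f in session.facts if "instruction" in f]  # retrieve results
print(fired, results)
```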
Example Drools rule
rule "Glucose <= 40, Insulin On"    // {Rule Name}
when
    // {binding} : {Java Class}({Class Getter Method} constraints)
    $msg : GlucoseMsg(glucoseFinding <= 40, currentInsulinDrip > 0)
then
    // {Class Setter Method}, with a {Java Class} constant as Parameter
    glucoseProtocolResult.setInstruction(
        GlucoseInstructions.GLUCOSE_LESS_THAN_40_INSULIN_ON_MSG);
end
The “obvious” slide - T2DM Drools flow
Automatic translation from NQF QDM
criteria to Drools
[Figure: translation pipeline, from non-executable to executable. Labeled elements: Measure Authoring Toolkit; Measures (XML-based structured representation); Data Types (XML-based structured representation); Value Sets; Converting measures to Drools scripts; Mapping data types and value sets; Drools scripts; Fact Models; saved in XLS files; Drools Engine]
[Li et al., submitted 2012]
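As a rough illustration of what translating structured criteria into rule text involves, the sketch below renders a small dictionary as Drools-style rule text. It is not the translator described in the cited work: the real input is QDM XML, and the dictionary layout, the `Patient` fact type, its fields, and the `results` global are all hypothetical.

```python
def render_value(value):
    """Quote strings; leave numbers as they are."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def to_drl(measure):
    """Render one structured criterion set as Drools-style rule text."""
    constraints = ", ".join(
        "{} {} {}".format(c["field"], c["op"], render_value(c["value"]))
        for c in measure["criteria"]
    )
    lines = [
        'rule "{}"'.format(measure["name"]),
        "when",
        "    $p : {}({})".format(measure["fact_type"], constraints),
        "then",
        "    results.add($p);",  # hypothetical global collecting matched facts
        "end",
    ]
    return "\n".join(lines)


measure = {
    "name": "Age 65 and older",
    "fact_type": "Patient",  # hypothetical fact model class
    "criteria": [
        {"field": "age", "op": ">=", "value": 65},
        {"field": "sex", "op": "==", "value": "F"},
    ],
}
drl = to_drl(measure)
print(drl)
```

Generating rule text from a structured source keeps the phenotype definition in one place while the executable form is derived from it.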
SHARPn phenotyping architecture
using CEMs, QDMs, and DRLs
[Welch et al. 2012]
Phenotype library and workbench - I
http://phenotypeportal.org
1. Converts QDM to Drools
2. Rule execution by querying
the CEM database
3. Generate summary reports
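The three steps can be sketched end to end on a toy dataset. The example below uses an in-memory SQLite table as a stand-in for the normalized CEM database, and a dictionary as a stand-in for a criterion that has already been converted to executable form; the table layout, codes, and threshold are hypothetical.

```python
import sqlite3

# Stand-in for the normalized patient database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab (patient_id TEXT, code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab VALUES (?, ?, ?)",
    [("p1", "hba1c", 7.4), ("p2", "hba1c", 5.6), ("p3", "hba1c", 8.1)],
)

# Step 1 stand-in: a criterion already converted to an executable form
criterion = {"code": "hba1c", "threshold": 6.5}

# Step 2: execute the criterion against the normalized store
rows = conn.execute(
    "SELECT DISTINCT patient_id FROM lab "
    "WHERE code = ? AND value >= ? ORDER BY patient_id",
    (criterion["code"], criterion["threshold"]),
).fetchall()
cohort = [row[0] for row in rows]

# Step 3: generate a summary report
population = conn.execute("SELECT COUNT(DISTINCT patient_id) FROM lab").fetchone()[0]
report = {"cohort_size": len(cohort), "population": population, "patients": cohort}
print(report)
```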
Phenotype library and workbench - II
http://phenotypeportal.org
Phenotype library and workbench - III
http://phenotypeportal.org
Additional on-going research efforts
• Machine learning and association rule mining [Carroll et al. 2011]
• Manual creation of algorithms takes time
• Let computers do the “hard work”
• Validate against expert-developed ones
• Just-in-time phenotyping
• Current approach: retrospective, longitudinal
and offline data processing for phenotypes
• Future: online, real-time phenotyping by
implementing “phenotype sniffers”
• Applications in active syndrome surveillance
for transfusion medicine [Kor et al. 2012]
What does this R&D mean to HSR?
• Common, agreed-upon and well-validated phenotype
definitions and criteria
• Standardized clinical data retrieval and management
• “One-stop place” for visualization, execution, and
report generation of phenotyping algorithms
• Implications for (to name a few):
• Center for Science of Healthcare Delivery (SHCD)
• Data Management Services (DMS/BSI)
• Epidemiology & Health Care and Policy Research
(Epi./HCPR/Rochester Epi. Project)
• Mayo Clinic Biobank/Genome Consortia (MayoGC)
Summary
• EHRs contain a wealth of phenotypes for clinical
and translational research
• EHRs represent real-world data, and hence have
challenges with interpretation, wrong diagnoses,
and compliance with medications
• Handling referral patients even more so
• Standardization and normalization of clinical
data and phenotype definitions is critical
• Phenotyping algorithms are often transportable
between multiple EHR settings
• Validation is an important component
Acknowledgements: eMERGE collaborators
• NHGRI: Rongling Li, Teri Manolio
• Group Health: Eric Larson, Gail Jarvik, Chris Carlson, Wylie Burke, Gene Jart, David Carrell, Malia Fullerton, Walter Kukull, Paul Crane, Noah Weston
• Marshfield: Cathy McCarty, Peggy Peissig, Marilyn Ritchie, Russ Wilke
• Northwestern: Rex Chisholm, Bill Lowe, Phil Greenland, Luke Rassmussen, Justin Starren, Maureen Smith, Jen Allen-Pacheco, Will Thompson
• Mayo Clinic: Christopher G. Chute, Iftikhar J. Kullo, Suzette Bielinski, Mariza de Andrade, John Heit, Jyoti Pathak, Matt Durski, Sean Murphy, Kevin Bruce
• Vanderbilt: Dan Roden, Josh Denny, Brad Malin, Ellen Wright Clayton, Dana Crawford, Melissa Basford
Acknowledgement: SHARPn collaborators
• Agilex Technologies
• CDISC (Clinical Data Interchange Standards Consortium)
• Centerphase Solutions
• Deloitte
• Group Health, Seattle
• IBM Watson Research Labs
• University of Utah
• University of Pittsburgh
• Harvard University
• Intermountain Healthcare
• Mayo Clinic
• Mirth Corporation, Inc.
• MIT
• MITRE Corp.
• Regenstrief Institute, Inc.
• SUNY, Buffalo
• University of Colorado
Thank You!
http://jyotishman.info