Discovering Novel Adverse Drug Events Using Natural Language Processing and Mining of Electronic Health Records
Carol Friedman, PhD
Department of Biomedical Informatics
Columbia University
July 21 - AIME 2009
Motivation: Severity of Problem
• Clinical trials do not test a broad population
• Adverse Drug Events (ADEs) are a worldwide problem
• *ADEs cost $5.6 billion annually
• *Over 2 million patients are estimated to be hospitalized due to ADEs
• *ADEs are the fourth leading cause of death
*In the US alone
Motivation: Limitations of Approaches
• Manual review of case reports (Venulet J 1988)
• Spontaneous reporting to a designated agency (Evans JM 2001; Eland IA 1999; Wysowski DK 2005)
– Serious ADEs reported in less than 1-10% of cases
– Reporting is voluntary for physicians/patients
– Recognition of ADEs is highly subjective
– Difficult to determine the cause of an ADE
– Biased by length of time on market and other factors
– Cannot determine number of patients on the drug or percent at risk
• Drug prescribing/claims data (Hershman D 2007; Ray WA 2009)
Severity of Underreporting
A study showed that 87% of the time, physicians ignored patient reports of known ADEs
(Golomb et al. Physician response to patient reports of adverse drug effects. Drug Safety 2007)
Related Work
• Automated methods mainly based on spontaneous reporting databases
– Most methods (Evans SJ 2001; Szarfman A 2002) use:
• Surrogate observed-to-expected ratios
• Incidence of drug-event reporting compared to background reporting across all drugs and events
• Some research aimed at improving the effectiveness of spontaneous reporting (SPR) databases
– Create an ontology of higher-order adverse events
• MedDRA
– Avoid fragmentation of signal
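The observed-to-expected idea can be illustrated with one widely used disproportionality measure, the proportional reporting ratio (PRR). This is a minimal sketch, not the exact method of any study cited above: it assumes a 2x2 table of spontaneous-report counts, and the function name and counts are hypothetical.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from spontaneous-report counts.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug and other events
    c: reports mentioning other drugs and the event
    d: reports mentioning other drugs and other events
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    # Share of the drug's reports that mention the event, divided by the
    # same share in the background of all other drugs.
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 20 of 400 reports for the drug mention the event,
# versus 100 of 9,600 reports for all other drugs.
print(prr(20, 380, 100, 9500))
```

With these made-up counts the event is reported about 4.8 times more often for the drug than in the background. Production systems typically add minimum-count thresholds or Bayesian shrinkage, which this sketch omits.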
Related Work
• Pharmacoepidemiology databases used to confirm suspicions
– General Practice Research Database (GPRD) (Wood & Martinez 2004)
– New Zealand Intensive Medicines Monitoring Programme (IMMP) (Coulter 1998)
– Medicines Monitoring Unit (MEMO) (Evans et al. 2001)
• EHR databases used to find signals (Brown JS et al. 2007; Berlowitz DR et al. 2006; Wang X et al. 2009)
– Mainly coded data used
– Potential for active, real-time surveillance
– Should reduce reporting bias
Related Work
• Consortia involving multiple EHRs
– EU-ADR project (http://www.alert-project.org/)
– eHealth Initiative (http://www.ehealthinitiative.org/drugSafety/)
• Related work using the EHR to detect known ADEs, not aimed at discovering novel ADEs (Bates DW 2003; Honigman B 2001)
Exploiting the Electronic Health Record
[Figure: EHR data (text notes from primary care, inpatient, progress, specialty and admission history reports; labs such as bun 83, inr 1.3, hct 22; orders such as lasix and pepcid) pass through NLP + integration into centralized, executable data that supports applications:]
• Decision support
• Patient safety
• Acquire knowledge
• Discovery
• Guidelines
• Surveillance
• Patient management
• Clinical trial recruitment
• Improved documentation
• Quality assurance
The Electronic Health Record (EHR)
• Rich source of patient information
• Mostly untapped
• Primary uses of the EHR
– Documenting care in a multi-provider environment
– Manual review by providers
• More complete than ICD-9 codes
– Symptoms
– Clinical conditions not relevant for billing
• Fragmented
• Heterogeneous
• Noisy
Research Opportunities: NLP Issues
• Occurrence of clinical events in natural language
– Drugs, diseases, symptoms
– Temporal information is critical
• Irregularity of reports
– Section headings important but abbreviated/missing
– Use of indentation, lists, run-on sentences
– Tables & semi-structured data in reports
• Abbreviations
– 2/2 meaning "secondary to"
– co meaning "cardiac output" or "complaining of"
• Mapping terms in text to an ontology/controlled vocabulary
– "infiltrate" in a chest x-ray report means chest infiltrate
– Ontology terms are more limited than language
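The abbreviation examples above can be made concrete with a toy context rule. This is a hypothetical sketch, not how MedLEE resolves abbreviations: it assumes that lowercase "co" followed by a number is cardiac output and otherwise means "complaining of", and that "2/2" not embedded in a date means "secondary to".

```python
import re

# Hypothetical context rules for the two ambiguous abbreviations on this slide.
# A real system needs a larger lexicon and richer context than one regex each.
SECONDARY_TO = re.compile(r"(?<![\d/])2/2(?![\d/])")  # skips dates like 2/2003
CO_NUMERIC = re.compile(r"\bco\b(?=\s*\d)")            # "co 4.5": cardiac output
CO_OTHER = re.compile(r"\bco\b")                        # otherwise: complaining of


def expand_abbreviations(sentence: str) -> str:
    text = SECONDARY_TO.sub("secondary to", sentence)
    text = CO_NUMERIC.sub("cardiac output", text)
    return CO_OTHER.sub("complaining of", text)


print(expand_abbreviations("sob 2/2 chf"))
print(expand_abbreviations("pt co chest pain"))
```

Uppercase "CO" (for example carbon monoxide) is deliberately left alone, which shows how quickly such rules need more context than a single token.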
Research Opportunities: Statistical Issues
• Find associations between drugs, symptoms, and diseases
– Not explicit in the EHR
• Large volumes of data
– Statistical significance vs. clinical significance
• Statistical associations are not relationships
– Drug treats condition / Drug causes condition
• Integrating time sequences is important
– For treats: condition must precede drug event
– For causes: drug event must precede condition
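The time-ordering rule above can be written as a small check. This sketch assumes each drug and condition already has a known start date, which, as later slides note, is often not stated in reports; the function name is hypothetical, and passing the check only makes a pair eligible for a relation, it does not establish it.

```python
from datetime import date
from typing import Optional


def eligible_relation(drug_start: date, condition_onset: date) -> Optional[str]:
    """Return which relation the time order still allows.

    Follows the rule on the slide: for "treats" the condition must precede
    the drug event; for "causes" the drug event must precede the condition.
    """
    if condition_onset < drug_start:
        return "treats"
    if drug_start < condition_onset:
        return "causes"
    return None  # same day: the order cannot be decided from dates alone


print(eligible_relation(date(2004, 3, 1), date(2004, 3, 10)))
```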
Research Opportunities: Statistical Issues
• Confounding (indirect associations)
– Metolazone treats heart failure (HF)
– HF is manifested by shortness of breath (SOB)
– Metolazone and SOB indirectly related
• Higher-order associations
– Drug interactions: Drug1, drug2, condition
– Drug-contraindications: Drug, disease, condition
• Rare ADEs
Other Research Opportunities:
Knowledge Acquisition
• Structured knowledge bases
– UMLS relations (may_be_treated_by)
– Proprietary ones – usually unavailable
• Text/Semi-Structured Knowledge (need NLP)
– Spontaneous reporting databases: indications, drugs, adverse events
– Literature (Medline)
– Web sites (WebMD, Micromedex)
– Online medical textbooks
– Claims Data (Health IT payors)
Text Mining for Knowledge Acquisition
• Statistical methods: co-occurrences
– Discovered associations between diseases and
diets from literature (Weeber M 2002)
– Identified disease candidate genes (Hristovski D 2005)
• NLP systems
– Trends in medications based on the literature and
narrative clinical reports (Chen ES 2007, 2008)
– Semantic relations in the literature (Hristovski D 2006)
Overview of Our NLP-EHR based
Pharmacovigilance System
[Figure: pipeline. Narrative records are processed by MedLEE NLP; its output and coded data are standardized and integrated into the EHR; then Selecting & filtering → Detect associations → Eliminate confounding (with medical knowledge) → ADE Signals]
Natural Language Processing of EHR
[Pipeline figure repeated; current step: MedLEE NLP processing of narrative records]
Meds:
Tegretol xr
Zocor
All:
Several sz meds
PMHx:
sz d/o - well controlled on tegretol
high chol - on zocor
CAD - 60% lesion in LADM by cath
MR - secondary to mitral prolapse
PSHx:
rib fx in 2001, shoulder fx secondary to trauma
Vitals: 130/80 12 80
A/P: 54 y/o m with mult med problems, all relatively well controlled. Pt sz
free, not anemic as of 2/2003. Concerned of MR and its possible long
term effects.
Coded Output from NLP
med:tegretol xr
sectname>> report medication item
code>> UMLS:C0592163_Tegretol XR
med:zocor
sectname>> report medication item
code>> UMLS:C0678181_Zocor
.........
problem:mitral valve regurgitation
sectname>> report past history item
code>> UMLS:C0026266_Mitral Valve Insufficiency
……..
problem:rib fracture
date>> 2001
sectname>> report past history item
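The output above follows a simple line-oriented layout: a `type:value` header line followed by `key>> value` attribute lines. A minimal parser for that layout, as shown on this slide, might look like this; it is not MedLEE's own interface, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    kind: str                                   # e.g. "med" or "problem"
    value: str                                  # e.g. "tegretol xr"
    attrs: dict = field(default_factory=dict)   # e.g. sectname, code, date


HEADER = re.compile(r"^(\w+):(.+)$")          # "med:tegretol xr"
ATTRIBUTE = re.compile(r"^(\w+)>>\s*(.*)$")   # "code>> UMLS:C0592163_Tegretol XR"
CUI = re.compile(r"^UMLS:(C\d{7})_")


def parse_findings(text: str) -> list[Finding]:
    findings: list[Finding] = []
    for raw in text.splitlines():
        line = raw.strip()
        attribute = ATTRIBUTE.match(line)
        if attribute and findings:
            # Attribute lines belong to the most recent header line.
            findings[-1].attrs[attribute.group(1)] = attribute.group(2)
            continue
        header = HEADER.match(line)
        if header:
            findings.append(Finding(header.group(1), header.group(2).strip()))
        # Other lines (e.g. the elision dots on the slide) are skipped.
    return findings


def umls_cui(code: str) -> Optional[str]:
    """Extract the CUI from a code such as "UMLS:C0592163_Tegretol XR"."""
    match = CUI.match(code)
    return match.group(1) if match else None
```

Keeping the section name (`sectname`) and `date` as attributes matters downstream: the filtering step uses them to drop past-history findings and to approximate time order.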
Coding Issues
• Not all conditions have codes
– Non-communicative
• Some conditions are combinations of codes
– Difficulty sleeping
– Vascular injury
• Granularity of coding system
– Many different codes for a concept
Asthma: asthma exacerbation, asthma disturbing
sleep, moderate asthma, suspected asthma, …
Standardizing Coded Data
[Pipeline figure repeated; current step: Standardize & integrate. Example: the lab value HCT:20 is standardized to the coded concept C0744727: low hematocrit]
Standardizing Coded EHR Data:
Laboratory Tests and Medications
• Ranges defining normal/abnormal lab values vary
– Abnormal range may depend on age, sex, ethnicity, weight
– Change in lab values and its duration must be considered
• Standardizing medications is complex & requires additional knowledge
– Trade name to generic (Avandia → rosiglitazone)
– Handling of combination medications
• 1.5% Lidocaine with 1:200,000 Epinephrine
– Handling of dose & route
• Diazepam 2 MG Oral Tablet
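Both kinds of standardization can be sketched in a few lines. The hematocrit limits below are illustrative assumptions only (real reference ranges are lab-specific), the trade-name table is a hypothetical hand-written stand-in for a drug vocabulary, and combination products such as the lidocaine/epinephrine example are not handled.

```python
import re
from typing import Optional

# Illustrative lower limits of normal hematocrit (%) by sex. Real reference
# ranges are lab-specific and may also depend on age, ethnicity and weight.
HCT_LOWER_LIMIT = {"F": 36.0, "M": 41.0}


def standardize_hct(value: float, sex: str) -> Optional[str]:
    """Map a raw hematocrit value to the coded finding, or None if not low."""
    if value < HCT_LOWER_LIMIT[sex]:
        return "C0744727: low hematocrit"
    return None


# Hypothetical trade-name-to-generic table; a real system would draw on a
# drug vocabulary rather than a hand-written dict.
GENERIC = {"avandia": "rosiglitazone", "tegretol xr": "carbamazepine",
           "zocor": "simvastatin"}

# "<name> <dose> <unit> <form>", e.g. "Diazepam 2 MG Oral Tablet"
DOSE_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>MG|MCG|ML)\s+(?P<form>.+)$",
    re.IGNORECASE,
)


def standardize_med(text: str) -> dict:
    """Split dose, unit and form off a medication string and map the name
    to its generic ingredient. Combination products are not handled."""
    match = DOSE_PATTERN.match(text.strip())
    name = (match.group("name") if match else text).strip().lower()
    result = {"ingredient": GENERIC.get(name, name)}
    if match:
        result.update(dose=float(match.group("dose")),
                      unit=match.group("unit").upper(),
                      form=match.group("form"))
    return result


print(standardize_hct(20, "M"))
print(standardize_med("Diazepam 2 MG Oral Tablet"))
```

The lab check uses only a single value and sex; the slide's point that changes over time and their duration matter would need a series of values per patient.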
Selecting and Filtering
[Pipeline figure repeated; current step: Selecting & filtering]
• Select using UMLS classes (diseases, medications)
• Filter out:
– negations, past info, …
– wrong time order
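The selection and the first two filters can be sketched as a single pass over NLP output. The record type, the set of kept UMLS semantic types and the section names treated as "past info" are all hypothetical configuration for this illustration; ordering checks are left to a separate step.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    concept: str
    semantic_class: str  # UMLS semantic type name assigned to the concept
    negated: bool
    section: str


# Hypothetical configuration: which UMLS semantic types to keep and which
# report sections count as "past info".
KEEP_CLASSES = {"Disease or Syndrome", "Sign or Symptom", "Pharmacologic Substance"}
PAST_SECTIONS = {"past history", "past surgical history"}


def select_and_filter(mentions: list[Mention]) -> list[Mention]:
    """Keep mentions of the selected classes that are neither negated nor past."""
    return [
        m for m in mentions
        if m.semantic_class in KEEP_CLASSES
        and not m.negated
        and m.section not in PAST_SECTIONS
    ]
```

Because selection relies on the assigned semantic class, any classification error in the vocabulary passes straight through this filter, which is the dependence the next slide describes.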
Selecting and Filtering
• Dependence on accuracy of semantic classification
– UMLS classification errors
• Finding class includes unwanted concepts: birth history, cardiac output, divorce
• ...as well as wanted ones: cardiomegaly, fever
• Temporal information is difficult to obtain
– An adverse drug event should only follow the drug event
– Processing explicit time information is complex, and the expressions are vague
• Yesterday, last admission, 2/5
– Information typically occurs in reports without dates
Detect Associations
[Pipeline figure repeated; current step: Detect associations]
• Obtain event frequencies
• Co-occurrence frequencies
• Form 2x2 tables
• Calculate associations
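The four steps above can be sketched for one drug-event pair, treating each discharge summary as a set of coded concepts. The choice of Pearson's chi-square as the association statistic is an assumption for this illustration, not necessarily the statistic used in the studies, and the concept names are hypothetical.

```python
def two_by_two(reports: list[set[str]], drug: str, event: str) -> tuple[int, int, int, int]:
    """Count reports by presence of the drug and the event.

    Returns (a, b, c, d): a = drug and event, b = drug only,
    c = event only, d = neither.
    """
    a = b = c = d = 0
    for concepts in reports:
        has_drug, has_event = drug in concepts, event in concepts
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


reports = [{"drug_x", "rash"}, {"drug_x", "rash"}, {"drug_x"}, {"rash"},
           set(), set(), set(), set()]
table = two_by_two(reports, "drug_x", "rash")
print(table, chi_square(*table))
```

Chi-square is symmetric, so a real screen would also check that `a * d > b * c` (a positive association) and, with large volumes of data, weigh clinical rather than only statistical significance.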
Detect Associations
• Correct temporal sequence is critical
– Drug event should precede adverse event
– Dates are not usually stated along with events
– Report sections are a helpful surrogate
• Statistical associations correspond to different clinical relations
– For pharmacovigilance:
• Want: drug causes adverse event
• Confounding caused by dependencies in data
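Using report sections as a surrogate for dates might look like the sketch below. The section names and the coarse time points assigned to them are hypothetical assumptions, not the mapping used in the studies; pairs are discarded only when the sections clearly put the event before the drug.

```python
from typing import Optional

# Hypothetical mapping from section name to a coarse time point:
# 0 = before admission, 1 = during the stay, 2 = at discharge.
SECTION_TIME = {
    "past history": 0,
    "history of present illness": 0,
    "admission medications": 0,
    "hospital course": 1,
    "discharge medications": 2,
    "discharge diagnosis": 2,
}


def order_allows_ade(drug_section: str, event_section: str) -> bool:
    """False only when the sections place the event before the drug.

    Unknown sections return True: there is no evidence to filter the pair.
    Same-time sections also return True, since order within them is unknown.
    """
    t_drug: Optional[int] = SECTION_TIME.get(drug_section)
    t_event: Optional[int] = SECTION_TIME.get(event_section)
    if t_drug is None or t_event is None:
        return True
    return t_drug <= t_event
```

Erring toward keeping a pair trades precision for recall; the confounding step downstream is then responsible for removing more of the spurious pairs.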
Confounding Interdependencies
[Figure: Drug treats Disease; Disease is manifested by Adverse Event; Drug causes Adverse Event (Cause_ADE)]
Confounding Interdependencies
[Figure: the same structure with ML, HD and SOB]
ML: Metolazone; HD: Hypertensive Disease; SOB: Shortness of Breath
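The confounding pattern can be checked with a small worked example on made-up counts: ML is given mostly to HD patients, and SOB depends only on HD. Stratifying by disease is a standard epidemiological check used here only for illustration; it is not the method these studies use, which relies on mutual information instead.

```python
# Made-up counts per stratum, as (a, b, c, d):
# a = ML and SOB, b = ML without SOB, c = SOB without ML, d = neither.
STRATA = {
    "HD": (40, 40, 10, 10),   # SOB in 50% of HD patients, with or without ML
    "no HD": (1, 9, 9, 81),   # SOB in 10% of other patients, with or without ML
}


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    return (a * d) / (b * c)


def collapse(tables) -> tuple:
    """Add the strata together, ignoring HD (the crude table)."""
    return tuple(sum(t[i] for t in tables) for i in range(4))


crude = collapse(STRATA.values())              # (41, 49, 19, 91)
print(f"crude OR: {odds_ratio(*crude):.2f}")   # crude OR: 4.01
for name, table in STRATA.items():
    print(f"OR within {name}: {odds_ratio(*table):.2f}")  # 1.00 in both strata
```

Ignoring HD, ML appears to quadruple the odds of SOB; within each disease stratum the association disappears, which is exactly the indirect ML-SOB link shown in the figure.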
Drug Associations Network
[Figure: network linking drugs (Rx1-n), symptoms (Sx1-n) and diseases (Dx1-n) through ADE, treatment, association and process edges]
Reduce Confounding
[Pipeline figure repeated; current step: Eliminate confounding using medical knowledge]
Reduce Confounding
• Collect knowledge from external sources and associations
– Drug-treats-disease
– Disease-manifested by-symptom
– Drug-interacts with-drug
• Use information theory
– Mutual information (MI)
– Data processing inequality: MI3 ≤ min(MI1, MI2)
[Figure: triangle of Drug, Disease and Adverse Event; MI3 labels the Drug-Adverse Event edge, MI1 and MI2 the two edges through Disease]
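A minimal sketch of the information-theoretic step, assuming binary per-patient flags for drug, disease and event. The exact rule and thresholds used in the studies are not given on the slides; this version applies the data processing inequality directly, in the style of the ARACNE network-inference algorithm, labelling the drug-event edge "indirect" when it is the weakest edge of the triangle. The patient data reuse the made-up metolazone counts and are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs: list[bool], ys: list[bool]) -> float:
    """Empirical mutual information, in bits, between two binary variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (nxy / n) * math.log2((nxy / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), nxy in joint.items()
    )


def classify_drug_ade(mi_drug_disease: float, mi_disease_ade: float,
                      mi_drug_ade: float) -> str:
    """DPI-style rule: if drug-ADE is the weakest edge of the triangle,
    the association may be explained through the disease (indirect)."""
    if mi_drug_ade < min(mi_drug_disease, mi_disease_ade):
        return "indirect"
    return "direct"


def make_patients() -> list[tuple[bool, bool, bool]]:
    """Made-up (disease, drug, event) flags: the drug is given mostly to
    patients with the disease, and the event depends only on the disease."""
    patients = []
    for disease, (a, b, c, d) in [(True, (40, 40, 10, 10)), (False, (1, 9, 9, 81))]:
        patients += [(disease, True, True)] * a + [(disease, True, False)] * b
        patients += [(disease, False, True)] * c + [(disease, False, False)] * d
    return patients


patients = make_patients()
disease = [p[0] for p in patients]
drug = [p[1] for p in patients]
event = [p[2] for p in patients]

mi1 = mutual_information(drug, disease)   # drug-disease edge (named here MI1)
mi2 = mutual_information(disease, event)  # disease-event edge (named here MI2)
mi3 = mutual_information(drug, event)     # drug-event edge (MI3)
print(round(mi1, 3), round(mi2, 3), round(mi3, 3), classify_drug_ade(mi1, mi2, mi3))
```

Since the min() is symmetric in MI1 and MI2, the rule does not depend on which of the two edges through the disease carries which label.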
Initial Study: Methods
• 6 drugs chosen
– Ibuprofen, Morphine, Warfarin: long time on market with known ADEs
– Bupropion, Paroxetine, Rosiglitazone: ADEs discovered after 2004
– 1 drug class: ACE inhibitors
• 25,074 textual discharge summaries in 2004 from NYPH processed using MedLEE NLP
• Reference standard created using expert knowledge sources
• Drug-potential ADE pairs determined
• Recall/precision calculated
• Qualitative analysis performed to classify the drug-potential ADE pairs detected
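Recall and precision over drug-ADE pairs reduce to set arithmetic against the reference standard. The pairs below are illustrative only, not the study's reference standard or output; they show how detected drug-indication pairs lower precision without affecting recall, the pattern the qualitative analysis describes.

```python
def precision_recall(detected: set, reference: set) -> tuple[float, float]:
    """Precision and recall of detected drug-ADE pairs against a reference set."""
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Illustrative pairs only; not the study's reference standard or output.
reference = {("warfarin", "bleeding"), ("morphine", "constipation"),
             ("ibuprofen", "gi bleeding"), ("bupropion", "seizure")}
detected = {("warfarin", "bleeding"), ("morphine", "constipation"),
            ("ibuprofen", "gi bleeding"),
            ("warfarin", "atrial fibrillation"),   # drug-indication pair
            ("morphine", "pain")}                  # drug-indication pair
print(precision_recall(detected, reference))  # (0.6, 0.75)
```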
Initial Study: Results
• Quantitative
– recall (.75), precision (.30)
• Qualitative analysis: potential drug-ADE pairs
a. Known drug-ADEs: 30%
b. Drug-indication pairs: 30%
c. Remote drug-indication pair: 33%
d. Unknown clinical associations: 6%
Confounding Interdependencies
[Figure: relations among Drug, Disease, Disease2 and Adverse Event (treats, manifested by, Cause_ADE)]
Study 2: Reduction of Confounding
• Evaluation set
– 14 associations related to 2 drugs from Study 1
• Reference standard
• Drug-ADE associations determined, and MI and DPI used to automatically classify them
Drug-ADE relation categories:
– Direct: side effects of the drug (rosiglitazone-headache)
– Indirect: conditions related to the disease/symptoms the drug treats (metolazone-shortness of breath)
– Either: conditions in both ‘direct’ and ‘indirect’ categories (rosiglitazone-chest pain)
Results
• Precision
– 0.86 with confounding handled
– 0.31 without handling confounding
Discussion: Limitations
& Future Directions
• Mutual information was the only strategy used to handle confounding
– More complex MI strategies will be explored
– Other statistical/knowledge-based methods will be explored
• Inpatient data only/sicker patient population
– The same methods could be used for outpatient data as well, though it is possibly noisier
• Drug dosage, drug-drug and more complex interactions should be explored
Discussion: Limitations
& Future Directions
• Small evaluation data set
– More comprehensive evaluation
• Limitations inherent from NLP, coding,
association detection
• Limitations due to fragmented/incomplete
patient data
Summary
• Need for more pharmacovigilance research
– Based on the EHR
– Using available databases and text
• Studies demonstrated promising results
• Many interesting research opportunities
– Natural language processing
– Statistical methods
– Integrating different sources of data
– Gathering knowledge from different sources
– Automated knowledge acquisition for evidence-based medicine
Acknowledgement
• NLP Data Mining group at DBMI at Columbia
– George Hripcsak
– Marianthi Markatou
– Herb Chase
– Xiaoyan Wang
– David Albers
– Jung-wei Fan
– Lyudmila Shagina
– Noemie Elhadad
• Grants
– R01 LM007659 from NLM
– R01 LM008635 from NLM
– R01 LM06910 from NLM
– 5T15LM007079 from NLM training grant
QUESTIONS
THANK YOU!