Data - PRISME Forum
EMR Data Mining for Drug Safety:
Challenges and Opportunities
Zhaohui (John) Cai, MD, PhD
Director, Biomedical Informatics
AstraZeneca
PRISM SIG 2010
La Jolla, CA
Oct 19, 2010
Proprietary and Confidential © AstraZeneca 2009
FOR INTERNAL USE ONLY
Outline
• Introduction of EMR data for drug safety research
• Data sources and limitations
• Challenges
• EMR safety data mining methods
• Proposing an interdisciplinary approach
• One AZ example
• EMR data interacting with drug development data: enabling
two-way translation between clinical research and practice
EMR, EHR, PHR
• NAHIT definition of EMR and EHR
• EMR: The electronic record of health-related information on an individual that is
created, gathered, managed, and consulted by licensed clinicians and staff from a
single organization who are involved in the individual’s health and care.
• EHR: The aggregate electronic record of health-related information on an individual that
is created and gathered cumulatively across more than one health care organization
and is managed and consulted by licensed clinicians and staff involved in the
individual’s health and care.
• By these definitions, an EHR is an EMR with interoperability
• In reality, it’s common to see the 2 terms used interchangeably
• PHR (Personal Health Record):
• A personal health record is a digital health record that is owned, updated, and
controlled by the consumer. It contains a summary of health information from
throughout an individual's entire lifetime.
• Examples of information contained in a PHR include a record of immunizations, family
health history, personal health history (i.e., significant illnesses and surgical
procedures), significant diagnostic procedures and dates (such as mammograms), a
list of health problems, a current medication list, allergies, contact information for
physicians seen on a routine basis, and a physician visit history.
What are we talking about?
• EHR/EMR can be
• Hospital Information System
• Departmental Systems (laboratory, radiology, pharmacy, materials
management)
• Computerized Physician Order Entry Systems
• ePrescribing Systems
• Administrative/Financial/Billing Systems
• Ambulatory Systems
• Specialty Systems (Cardiology, OBGYN, Pediatrics, Nephrology, …)
• Data Warehouse
• Research Database
• Ideally, data must be integrated across the continuity of care and all
EHRs should be interoperable in the sense they can be overlaid, and all
data is sharable when and as needed
EMR vs. SRS for drug safety research
• SRS: spontaneous reporting systems, which are database resources containing
millions of voluntarily submitted reports of suspected ADEs occurring during
regular clinical practice
• Current mainstay within pharmacovigilance: typically mined for statistical drug-event
associations to screen for unknown potential ADEs that are then clinically validated and
flagged for continued monitoring
• Major SRSs: FDA AERS and WHO Programme for International Drug Monitoring
• Well-recognized limitations: under-reporting, over-reporting due to media influences, subjective
diagnoses by the reporter of the event, uneven levels of granularity used to describe or encode the
drugs and events, duplicate reporting for the same patient and event, missing data, typographical
errors, confounding, lack of a denominator or information on exposure
• EMR/EHR
• Advantages: earlier detection of ADEs, potential for active and real-time surveillance, the
absence of most of the reporting biases attributed to SRS, knowledge of the number of patients exposed
(in the dataset), and thus a denominator, with data to assess co-morbidity, co-medication, etc.
• Challenges:
• Unstructured narratives are unsuitable for direct application to pharmacovigilance.
• Confounding requires the examination of a much wider range of possible drug-event
associations, the majority of which are completely unrelated or associated only because of
confounding
• Same biases still exist as in SRS to some extent
EMR vs. Claims data
• Medical and pharmacy claims
• Benefits: captures real-world utilization patterns; encompass a wealth of
variables and analyses of these data can be used for benchmarking purposes
• Challenges: lag time in the availability of information about new therapies;
does not capture clinical experience; data limited to patients with adjudicated
claims; data limited to insured populations
• EHR/EMR data
• Benefits: data available to reflect more complete care experience; data can
be analyzed on an ongoing basis for populations under care; may improve
depth and breadth of outcomes studies; used with e-prescribing can reduce
adverse drug events, medical errors and redundant tests
• Challenges: converting paper-based systems to electronic ones; collecting and
storing data in a standardized format; certification to ensure security and
privacy of EMR systems; interoperability; slow adoption; limited populations,
e.g. general practitioners or hospital data
EMR vs. PMS systems
• PMS – Practice Management System – a software program and
database system that processes billing and scheduling information for
physicians and hospitals
• Administrative data sources created primarily to support reimbursement
• There are many well-recognized limitations with the use of administrative data
sets compared to comprehensive EMR data
• The widespread availability of administrative data makes them the most widely
used source of internal and comparative quality indicators
• EHR / EMR systems – a software program and database system that
stores medical record information about a patient’s health
• Contain detailed clinical data that are not contained in administrative data
sets. The availability of more clinically relevant data in electronic queryable
format represents a new source of data that can be leveraged without the
expense of manual chart abstraction.
• The American Recovery and Reinvestment Act contains explicit language
linking the ‘meaningful EHR user’ to the ability to capture and report clinical
quality measures.
PMS data content
• Demographics – Similar to EHR demographics, gender, year of birth and city (or
3-digit postal code) are common. Race is very rarely captured.
• ICD9 (International Classification of Diseases, 9th Revision) – PMS Systems
capture ICD9 codes as a regular method of billing to insurance companies,
Medicare and Medicaid.
• Coding could be frequently exaggerated in order to obtain higher billing
reimbursement.
• Some diagnoses are omitted deliberately so as to protect the patient from
insurance company blacklisting.
• When dealing with ONLY PMS ICD9 codes, the research team should be
vigilant about data abnormalities and irregularities and treat the data
accordingly
• CPT4 (Current Procedural Terminology, 4th Edition) – All ambulatory PMS
systems employ the CPT4 coding system for medical billing.
• Coding what services and levels of service were performed by the provider
• Highly accurate, as they are a direct reflection of the work done by the provider
EHR/EMR data content
• Demographics – Critical data includes gender, the year of birth (without month or
day for privacy reasons) and the city or first three digits of postal code. Race is
very valuable if captured
• ICD9 – Very useful data, when entered as part of the EHR data capture
• CPT4 – Secondary data and not often captured. Valuable to determine what was
ordered (labs, pathology, and radiology) and what procedures were administered
• Vitals – Height + Weight (rendering BMI) are highly desirable. Blood pressure
readings are also of primary importance
• Problems – “free-form” problems such as “constipation” as written by the nurse
or in the patient’s own words. Hard to standardize unless ICD9 code is used
• Lab Results – Extremely important in pharmacovigilance and readily available in
huge quantities.
• Lab Orders – Can be derived from HL-7 lab data (the “ORC” segment) and
alternatively can be obtained from the EHR order entry system
• Pathology Results – Values that return “positive” and “negative” are most
valuable to pharmacovigilance. Values that return verbose narrative are hard to
incorporate into data analysis unless NLP is employed
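The "Lab Orders" bullet above mentions deriving orders from the HL7 "ORC" segment. A minimal sketch of that idea, assuming pipe-delimited HL7 v2 text with carriage-return segment separators; the function name `extract_orc_segments` and the sample message are hypothetical and for illustration only:

```python
def extract_orc_segments(message: str) -> list[dict]:
    """Collect order control and order numbers from every ORC segment."""
    orders = []
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    for segment in message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] != "ORC":
            continue
        # Pad so that missing trailing fields read as empty strings.
        fields += [""] * (4 - len(fields))
        orders.append({
            "order_control": fields[1],        # ORC-1
            "placer_order_number": fields[2],  # ORC-2
            "filler_order_number": fields[3],  # ORC-3
        })
    return orders


# Made-up message, not real patient data.
sample = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EMR|CLINIC|201010190830||ORU^R01|MSG0001|P|2.3",
    "PID|1||12345",
    "ORC|RE|ORD001|FIL001",
    "OBR|1|ORD001|FIL001|ALT^Alanine aminotransferase",
])
print(extract_orc_segments(sample))
```

A production feed would normally go through a full HL7 parser that handles escape sequences, repetitions and components; the string split here only shows where the order information sits.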
EMR/EHR data content -- continued
• Radiology Results
• Text – Very valuable data only when processed through NLP to render discrete nomenclature
values
• Images – Not useful by themselves. Only useful when drilling down to study significant adverse
events detected by signal detection
• Immunizations – the administration of vaccines and immunizations as a single binary
event (given/not given) are the key factors of value in this data
• History
• Text – most common form and valuable only when processed through NLP to render discrete
nomenclature values
• Structured Data from Templates – tremendously valuable in pharmacovigilance, only if those
discrete values can be captured and interpreted correctly.
• Chart Notes
• Text – Same rule as “History, Text”
• Structured Data from Templates – Same consideration as “History, Structured Data from
Templates.”
• Attachments – Attachments are typically stored as (1) images or (2) documents with
searchable text. Documents can be processed through NLP and rendered into discrete
nomenclature points, which would be very valuable for pharmacovigilance
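Several bullets above depend on turning free text into discrete values. The toy sketch below shows the general shape of that step with a dictionary lookup and a crude negation check. It is not how any real clinical NLP system works; the two-term vocabulary, the negation cue list and the function name `extract_codes` are assumptions made for illustration:

```python
import re

# Toy vocabulary: free-text term -> ICD-9 code (illustrative subset only).
VOCABULARY = {"constipation": "564.0", "jaundice": "782.4"}

# A negation cue counts only if no comma separates it from the term.
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b[^.;,]*$", re.IGNORECASE)


def extract_codes(note: str) -> list[str]:
    """Return sorted codes for vocabulary terms that are mentioned and not negated."""
    found = set()
    for sentence in re.split(r"[.;\n]", note):
        for term, code in VOCABULARY.items():
            match = re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE)
            if match and not NEGATION.search(sentence[:match.start()]):
                found.add(code)
    return sorted(found)


print(extract_codes("Patient denies constipation. Skin shows jaundice."))
```

Real notes need far more than this (abbreviations, misspellings, temporality, family history), which is why dedicated NLP systems are used in practice.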
Challenges for safety data mining
• Data source limitation
• Small populations in some cases (depending on vendors or individual systems)
• Legacy data migration is a major bottleneck
• Missing data type (e.g. no lab results in claims databases)
• Text data needs to be converted to discrete data
• NLP (Natural Language Processing): converting text from a History and Physical or Discharge
Summary dictation note into discrete data values is of significant interest in signal detection
• Example: MedLEE™ (Medical Language Extraction and Encoding).
• Data itself
• Varied quality (e.g. ICD9 codes from a PMS system)
• Lack of data standards
• Different coding systems/nomenclatures (e.g. National Drug Code, SNOMED, UMLS, MEDCIN)
• Hypothesis generation vs. testing
• What is a signal or how to define signal thresholds?
• What is a drug-related AE?
• Statistical significance vs. clinical significance
• Confounding factors: co-medication, co-morbidities, indication, or any combination of the three
• What analysis methods to use
Safety data mining methods at individual level
• Identifying individual ADEs
• Case report and case review
• Rule-based: ICD-9 classification rules, allergy rules, drug–laboratory rules
(Honigman et al., 2001)
• NLP/Text mining: events are identified across all possible combinations of drug names
and adverse effects, using a controlled vocabulary of medical
concepts and drug terminology that allows multiple relationships between
multiple medical terms and events (Honigman et al., 2001)
• Deviation detection: looks for outliers or values that deviate from the norm
(e.g. FDA guideline on DILI, 2009) and can be seen either graphically or
statistically (FDA eDISH tool)
• Individual patient timeline, i.e. temporal relationship, needed to establish
drug-event pairs for all the above
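The deviation-detection and timeline bullets above can be combined in a small screen. The sketch below flags patients whose post-exposure peak ALT exceeds 3×ULN together with peak total bilirubin above 2×ULN, which is the biochemical part of the Hy's Law criteria discussed in the FDA DILI guidance. It is a screen for case review, not a diagnosis. The record layout, the function name `flag_liver_outliers` and the sample values are assumptions:

```python
from datetime import date

ALT_MULTIPLE = 3.0  # peak ALT threshold, in multiples of the upper limit of normal (ULN)
TBL_MULTIPLE = 2.0  # peak total bilirubin threshold, in multiples of ULN


def flag_liver_outliers(labs, first_exposure):
    """labs: iterable of (patient_id, test, date, value, uln).
    first_exposure: dict of patient_id -> date of first drug exposure.
    Returns sorted patient ids whose post-exposure peaks exceed both thresholds."""
    peaks = {}
    for pid, test, when, value, uln in labs:
        start = first_exposure.get(pid)
        # Temporal relationship: ignore results before (or without) exposure.
        if start is None or when < start:
            continue
        key = (pid, test)
        peaks[key] = max(peaks.get(key, 0.0), value / uln)
    return sorted(
        pid for pid in first_exposure
        if peaks.get((pid, "ALT"), 0.0) > ALT_MULTIPLE
        and peaks.get((pid, "TBL"), 0.0) > TBL_MULTIPLE
    )


# Made-up values for illustration.
labs = [
    ("p1", "ALT", date(2010, 1, 5), 30, 40),    # before exposure, ignored
    ("p1", "ALT", date(2010, 3, 1), 200, 40),   # 5.0 x ULN
    ("p1", "TBL", date(2010, 3, 3), 3.0, 1.2),  # 2.5 x ULN
    ("p2", "ALT", date(2010, 2, 1), 160, 40),   # high, but before exposure
    ("p2", "TBL", date(2010, 3, 1), 3.0, 1.2),
    ("p3", "ALT", date(2010, 3, 1), 150, 40),   # ALT elevated, bilirubin normal
    ("p3", "TBL", date(2010, 3, 1), 1.0, 1.2),
]
first_exposure = {"p1": date(2010, 2, 1), "p2": date(2010, 2, 15), "p3": date(2010, 1, 1)}
print(flag_liver_outliers(labs, first_exposure))
```

Plotting the two peak ratios against each other for every patient gives the kind of graphical view the eDISH bullet refers to.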
Safety data mining methods at population level
• Measures of disproportionality
• Frequentist methods: the signal threshold for disproportional reporting rates should be
considered against epidemiology-based rates
• Proportional reporting ratio (PRR) and Reporting odds ratio (ROR)
• The most commonly published threshold is PRR > 2 with the number of reports
or cases N > 3. Additional criteria can include the chi-square statistic as a measure of
statistical strength; if used, this is often set at χ² > 4 (Deshpande, Gogolak et al.)
• Most useful for initial assessments particularly with newer drugs and to monitor
changes in proportional reporting rate over time
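The PRR and ROR named above both come from a 2×2 table of drug by event counts. A minimal sketch using the thresholds quoted on this slide (PRR > 2, N > 3, χ² > 4); the function names, the use of Pearson's chi-square without continuity correction, and the example counts are assumptions:

```python
def disproportionality(a, b, c, d):
    """2x2 table counts:
    a: target drug, target event    b: target drug, other events
    c: other drugs, target event    d: other drugs, other events
    Returns (PRR, ROR, chi-square)."""
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, no continuity correction (an assumption).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, ror, chi2


def is_signal(a, b, c, d, min_prr=2.0, min_cases=3, min_chi2=4.0):
    """Apply the slide's screening rule: PRR > 2, N > 3 and chi-square > 4."""
    prr, _, chi2 = disproportionality(a, b, c, d)
    return prr > min_prr and a > min_cases and chi2 > min_chi2


# Made-up counts for illustration.
prr, ror, chi2 = disproportionality(20, 80, 100, 1800)
print(f"PRR={prr:.2f} ROR={ror:.2f} chi2={chi2:.1f} signal={is_signal(20, 80, 100, 1800)}")
```

The sketch does not guard against zero cells; sparse tables would need a correction or a different method before the ratios are meaningful.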
Wilson et al., 2004, Br J Clin Pharmacol 57:2
Safety Data Mining Methods at population level
• Bayesian methods
• Bayesian Confidence Propagation Neural Network (BCPNN) – WHO UMC
• Empirical Bayes Geometric Mean (EBGM), based on Multi-item Gamma Poisson Shrinker (MGPS) – FDA
• The specificity of these methods is very good and can be configured to
evaluate drug-drug interactions and complex induced medical syndromes,
but they do have lower sensitivity.
• In retrospective comparisons of different methods, the frequentist
methods almost always triggered an alert sooner than the
Bayesian methods (Hauben and Reich 2004; Hauben, Reich et al. 2006;
Chen, Guo et al. 2008).
• Statistical testing
• Statistical hypothesis tests such as the chi-squared test and Fisher’s exact
test
• Used to test the hypothesis of independence between a pair of drug and
event
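The two independence tests named above can be computed directly from a drug by event 2×2 table. A standard-library sketch, assuming a two-sided Fisher's exact test that sums all tables no more probable than the observed one, and a Pearson chi-squared test with one degree of freedom and no continuity correction; the function names are hypothetical:

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


def chi_squared_test(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Made-up counts: drug/event, drug/no event, no drug/event, no drug/no event.
print(fisher_exact_two_sided(3, 1, 1, 3))
print(chi_squared_test(3, 1, 1, 3))
```

With counts this small the chi-squared approximation is unreliable, which is the usual reason to prefer the exact test for rare drug-event pairs.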
Other methods
• Predictive modeling
• Classification: develop a model to relate a dependent variable (AE) with a set
of independent variables (drug, dose, age, gender, etc.) and predict group
membership (w/ or w/o the AE) of new records based on their characteristics
(the independent variables).
• Regression: value prediction for continuous dependent variables based on a
set of independent variables
• Clustering:
• Reduce a large sample of records to a smaller set of specific homogeneous
subgroups (clusters) without losing much information about the whole sample.
• Hypothesis generation (clustering by symptoms or diagnoses to see if there is
a drug association)
• Both are designed to deal with very large data sets that may contain
many more variables (predictors) than observations, as in safety databases or
EMR systems, where the thousands of drugs, conditions, or events that exist
can be analyzed simultaneously for drug-event associations.
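To make the classification bullet concrete, here is a small logistic regression that relates a binary AE outcome to two binary predictors (drug exposure and age of 65 or over). It is a sketch fitted by plain gradient ascent on grouped counts, with made-up data; the function names, learning rate and iteration count are assumptions, and a real analysis would use an established statistics package:

```python
from math import exp


def predict(weights, features):
    """Predicted AE probability for one record; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(cells, n_features, lr=0.5, iterations=5000):
    """cells: list of (features, n_records, n_events) for each predictor combination.
    Returns [intercept, coef_1, ...] maximizing the binomial log-likelihood."""
    weights = [0.0] * (n_features + 1)
    total = sum(n for _, n, _ in cells)
    for _ in range(iterations):
        grad = [0.0] * len(weights)
        for features, n, events in cells:
            p = predict(weights, features)
            # Log-likelihood gradient: (observed - expected events) times each input.
            for j, xj in enumerate((1.0, *features)):
                grad[j] += (events - n * p) * xj
        weights = [w + lr * g / total for w, g in zip(weights, grad)]
    return weights


# Made-up counts: ((drug, age_65_plus), records, records with the AE).
cells = [((0, 0), 100, 5), ((0, 1), 100, 10), ((1, 0), 100, 15), ((1, 1), 100, 30)]
weights = fit_logistic(cells, n_features=2)
print("coefficients:", [round(w, 2) for w in weights])
print("P(AE | drug, 65+):", round(predict(weights, (1, 1)), 3))
print("P(AE | no drug, 65+):", round(predict(weights, (0, 1)), 3))
```

New records are then assigned to the "with AE" or "without AE" group by comparing the predicted probability to a chosen cut-off.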
Approaches to address confounding
• Stratification approaches
• Stratification and Mantel-Haenszel test statistics
• Effective for addressing confounding in large sample sizes and small
number of confounding variables.
• In other cases (e.g. safety databases or EHR systems) they are not
as effective
• Regression (McNamee R 2005) or classification models
• Allow for the evaluation of several risk factors simultaneously
• Incorporating potential confounding variables into the model: the value
of a dependent variable (e.g., the presence of an AE) is explained by a set of predictor
variables (e.g., different drugs or conditions), each with its own degree of contribution
• Controlling for possible confounders: the effect or influence of the confounding
variables on the predictor variables could be assessed to determine whether or not the
relationship between the dependent and predictor variables is influenced by the
confounders
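A small worked example of the stratification approach: the sketch below computes the Mantel-Haenszel pooled odds ratio and the Cochran-Mantel-Haenszel chi-square across strata of a confounder. The counts are made up so that the drug looks harmful in the crude table but shows no association within either co-medication stratum. The function names and the omission of a continuity correction are assumptions:

```python
def mantel_haenszel(strata):
    """strata: list of (a, b, c, d) tables, one per level of the confounder.
    a: exposed with AE      b: exposed without AE
    c: unexposed with AE    d: unexposed without AE
    Returns (pooled odds ratio, CMH chi-square without continuity correction)."""
    num = den = diff = var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        diff += a - (a + b) * (a + c) / n  # observed minus expected exposed cases
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return num / den, diff * diff / var


def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into one table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a * d) / (b * c)


# Made-up counts: stratum 1 = no co-medication, stratum 2 = co-medication.
strata = [(10, 90, 40, 360), (120, 280, 30, 70)]
print("crude OR:", round(crude_odds_ratio(strata), 2))
print("MH OR, CMH chi-square:", mantel_haenszel(strata))
```

With only one confounder and large strata this works well; with many confounders the strata become sparse, which is the limitation noted in the bullets above.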
Proposing an inter-disciplinary approach
Flowchart (steps in order):
• Define appropriate questions (by PSE)
• Identify appropriate data sources (by PSE, Ix)
• Choose EMR data*, then apply NLP/text mining (by Ix): mapping vocabularies, standardizing data format, integrating databases
• Or choose claims databases*, PMS*, or SRS
• Disproportionality measures (by Epi, Stats): frequentist methods, Bayesian methods
• Statistical testing (by Epi, Stats): chi-square, Fisher’s exact
• Modeling (by Stats, Ix): regression/classification
• Hypothesis generation (by PSE)
• Hypothesis testing (by PSE, Epi, Stats…): RCT or observational studies
Abbreviations:
• PSE: patient/drug safety experts (physicians or scientists), including pharmacovigilance scientists
• Stats: Statisticians
• Ix: Informatics scientists
• Epi: Epidemiology scientists
* Need one or more external data partners, and it’s possible to use them in combination
AZ-NWeH collaboration
• NWeH
• Formed as a collaboration between
• University of Manchester
• Salford Royal Foundation Trust
• Salford Primary Care Trust
• Combines the University's strength in bio-health informatics and
technology innovation with Salford NHS's strength in front-line clinical
informatics and the integration of primary and secondary care
• DILI study team working with AZ: physicians, informatics scientists,
statisticians, etc.
• AZ multi-disciplinary team:
• Informatics: Clinical Informatics and Discovery Information
• Hepatotoxicity Knowledge Group (safety physicians and scientists)
• Statistics
• Epidemiology
AZ-NWeH DILI Study: aim and objectives
Aim
• To explore the feasibility of using naturalistic cohort data from the NHS,
through linkage of liver function test (LFT) records, to study drug-induced
liver injury (DILI) at the population level
Objectives
• Identification of eligible data-sources
• Linkage of health records
• Identification of case-cohorts
• Detailed analysis of a case-cohort
• Process Improvement for data collection, collation and analysis within a
healthcare firewall
AZ-NWeH DILI Study: challenges & opportunities
Challenges
• Infrastructure/systems building: EHRs, research databases
• Data standard and controlled vocabularies: need to review the rubrics from Salford GP
data, for diagnosis, lab tests, etc.
• Observation period selection: depending on data availability (#patient years) and lab
test frequencies
• Characterising the Salford data in order to understand what questions can be
meaningfully addressed
Opportunities
• Process Improvement for data collection, collation and analysis within a healthcare
firewall
• Building large longitudinal datasets from primary care for research purposes,
combining diagnoses, lab tests, and prescriptions relevant to liver signals
• Developing enhanced metrics and tools to “appraise” a data set, e.g. an information
score to summarize the frequency and regularity of testing as a more general measure of
the “longitudinal strength” of an EHR dataset
• Establishing “baseline” incidence rates of liver signals for a few real-world disease
populations and examining their changes with prescriptions
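The "baseline" incidence rate in the last bullet is events divided by observed person-time. A minimal sketch, assuming each patient contributes follow-up from cohort entry until the first liver signal or the end of observation; the record layout, the function name and the dates are hypothetical:

```python
from datetime import date


def incidence_per_1000_patient_years(patients):
    """patients: list of (start, end, first_event_date or None).
    Follow-up stops at the first event, so each patient counts at most once."""
    events = 0
    person_days = 0
    for start, end, event in patients:
        if event is not None and start <= event <= end:
            events += 1
            person_days += (event - start).days
        else:
            person_days += (end - start).days
    return 1000 * events / (person_days / 365.25)


# Made-up cohort: one patient with a liver signal after a year, one without.
cohort = [
    (date(2008, 1, 1), date(2010, 1, 1), None),
    (date(2008, 1, 1), date(2010, 1, 1), date(2009, 1, 1)),
]
print(round(incidence_per_1000_patient_years(cohort), 1))
```

Computing the same rate separately for periods on and off a given prescription is one way to examine how the rate changes with prescriptions.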
Two-way translation between clinical research and practice
Mining of real world data
• CER (including
comparative safety)
• HTA
• PHC
• Pharmacovigilance
Drug development stages (figure): Discovery, Preclinical Development, Early Clinical Development, Late Clinical Development, Product LCM
Acknowledgements
• Anders Ottosson, Patient Safety, AstraZeneca
• Kaushal Desai, Biomedical Informatics, AstraZeneca
• James Weatherall, Biomedical Informatics, AstraZeneca