Towards Real-time Safety Monitoring of Medical
Products
Xiaochun Li
MBSW May 24, 2010
4/13/2015
Page 1
BACKGROUND
• In the fall of 2007, Congress passed the FDA Amendments Act (FDAAA), mandating the FDA to establish an active surveillance system for monitoring drugs using electronic data from healthcare information holders. The Sentinel Initiative is FDA’s response to that mandate. Its goal is to build and implement a new active surveillance system that will eventually be used to monitor all FDA-regulated products.
• Goal: to create a linked, sustainable system (the Sentinel System) that will draw on existing automated healthcare data from multiple sources to actively monitor the safety of medical products continuously and in real time
Page 2
A real-time Sentinel System with
healthcare data from multiple sources
entails:
• Standardized data structure – a
common data model (CDM)
• Analytical methods that run on CDMs
Page 3
Observational Medical Outcomes
Partnership (OMOP)
A public-private partnership to serve the public health by testing whether multi-source
observational data can improve our ability to assess drug safety and benefits. The design
was developed through a Public-Private Partnership among industry, FDA and FNIH.
• OMOP Objectives
– To determine the feasibility of assembling the required data into an
infrastructure that enables active, systematic monitoring of observational
data
– To determine the value of using observational data to identify and evaluate
the safety and benefits of prescription drugs, as a supplement to currently
available tools
– To test required governance structures
Page 4
Testing data models:
OMOP data community
[Diagram: the OMOP data community. The OMOP Research Core and Research Lab sit within an OMOP Extended Consortium, working with centralized data sources (Thomson Reuters, GE, i3 Drug Safety, SDI) and a Distributed Network of partners (Humana, Regenstrief, Partners HC, federal partners).]
Page 5
Common Data Model
• The common data model includes:
– A single data schema that can be applied to disparate data types
– Standardized terminologies
– Consistent transformation for key data elements
• A common data model can:
– Enable consistent and systematic application of analysis methods
to produce comparable results across sources
– Create a community to facilitate the sharing of tools and practices
– Impose data quality standards
– Create implementation efficiencies
Page 7
Common data model
Using standardized terminologies for
representing
• drugs
• conditions
• procedures
Page 8
Observational Medical Dataset Simulator:
OSIM
• Capable of generating 1 to 100,000,000+ persons
• Two types of output files:
  – Simulated Drug & Condition Files: including attributes used to model confounding (provides “answer key” for analytic research)
  – Hypothetical Person Files: longitudinal record of drug exposures and condition occurrences
• Data characteristics and confounding controlled by input probability distributions
  – Confounding variables age, gender, race, and indication introduced as risk factors for select drugs & conditions
  – Default distributions produced from analysis of real observational data; can be modified by user
• Format of Hypothetical Person Files conforms to OMOP Common Data Model
• Implementation by ProSanos Corporation
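The simulator's output can be sketched in Python. OSIM itself was implemented by ProSanos and draws from calibrated, confounding-aware distributions; the function and field names below are illustrative, using flat per-drug and per-condition probabilities only to show the shape of a Hypothetical Person File record:

```python
import random

def simulate_persons(n_persons, drug_probs, condition_probs, seed=0):
    """Illustrative OSIM-style generator: one record per hypothetical person,
    listing drug exposures and condition occurrences drawn from input
    probability distributions. Names and distributions are not OSIM's actual
    defaults."""
    rng = random.Random(seed)
    persons = []
    for pid in range(n_persons):
        # Independent Bernoulli draws stand in for OSIM's richer,
        # confounding-aware input distributions.
        drugs = [d for d, p in drug_probs.items() if rng.random() < p]
        conditions = [c for c, p in condition_probs.items() if rng.random() < p]
        persons.append({"person_id": pid, "drugs": drugs, "conditions": conditions})
    return persons
```

In the real OSIM, the drawn attributes (age, gender, race, indication) also feed back into the drug and condition probabilities, which is what produces the known confounding structure of the "answer key."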
Page 10
Present Status
• OMOP Research Core completed transformation of 5 central databases into common data model
  – Thomson MedStat - Commercial
  – Thomson MedStat - Medicare
  – Thomson MedStat - Medicaid
  – Thomson MedStat - Lab
  – GE Centricity
• OMOP Research Team made publicly available:
  – Final common data model specification document
  – Program code for instantiating common data model tables
  – Transformation documentation and source code for central datasets
  – Procedure code for constructing eras from drug and condition tables
  – Standardized terminology and source mapping tables (ICD9 -> MedDRA)
• OMOP community (Distributed Partners, Federal Collaborators, Extended Consortium) has implemented or is implementing the common data model for their data sources
  – Feedback on lessons learned
  – Contributions to open-source library of tools for data transformation
• All analysis methods have been developed for the common data model
Page 11
OMOP Methods Development
OMOP analysis domains:
• Hypothesis Generating: identification of non-specified conditions
• Hypothesis Strengthening: evaluation of a drug-condition association
• Monitoring of Health Outcomes of Interest
Identification of non-specified associations: This exploratory analysis
aims to generate hypotheses from observational data by identifying
associations between drugs and conditions for which the relationships
were previously unknown. This type of analysis is likely to be considered
an initial step of a triaged review process, where many drug-outcome pairs
are simultaneously explored to prioritize the drugs and outcomes that
warrant further attention.
Monitoring of Health Outcomes of Interest: The goal of this surveillance
analysis is to monitor the relationship between a series of drugs and
specific outcomes of interest. These analyses require an effective
definition of the events of interest in the context of the available data.
Page 13
Methods development
Analysis Methods
• Epidemiology designs
  – Cohort
  – Case-control
  – Case-crossover
  – Self-controlled case series
• Sequential methods
  – Maximized sequential probability ratio test
  – Conditional Sequential Sampling Procedure
• Disproportionality Analysis
  – Proportional reporting ratio
  – Multi-item Gamma Poisson Shrinker
• Bayesian screening
  – Bayesian confidence propagation neural network
  – Adjusted residual score
• Other methods
  – Local Control
  – Tree-based scan statistic
  – Statistical relational learning
  – Bayesian Logistic Regression
  – Information-theoretic similarity measure
  – Temporal pattern discovery
• Other analytical considerations
  – Propensity score adjustment
  – False discovery rate
  – Matching and stratification
Page 14
Methods testing strategy:
Monitoring of Health Outcomes of Interest
• Each method is implemented in the OMOP Research Lab
against the central databases
• Method feasibility will be tested across the OMOP data network
• Method performance is tested in two ways
  – Identifying drug-condition associations within an entire observational dataset
  – Identifying drug-condition associations as data accumulate over time
• Evaluation focuses on the degree to which a method maximizes ‘true positives’ while minimizing ‘false positives’
• Monitoring of Health Outcomes of Interest studies for each
method will explore 10 HOIs for 10 drugs (100 experiments per
data cut)
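Because the simulated data come with an "answer key" of true drug-condition associations, the true-positive/false-positive evaluation above reduces to simple set arithmetic. A minimal sketch (the helper name and pair representation are ours, not OMOP's actual evaluation code):

```python
def evaluate_method(flagged_pairs, true_pairs):
    """Compare the drug-condition pairs a method flags against the
    simulator's answer key of true associations."""
    flagged, truth = set(flagged_pairs), set(true_pairs)
    tp = len(flagged & truth)   # truly associated pairs the method caught
    fp = len(flagged - truth)   # pairs flagged with no true association
    return {
        "true_positives": tp,
        "false_positives": fp,
        "sensitivity": tp / len(truth) if truth else 0.0,
        "false_discovery_rate": fp / len(flagged) if flagged else 0.0,
    }
```

Running this per data cut, as data accumulate, traces how quickly each method surfaces the true signals and at what false-positive cost.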
Page 15
Drug-HOI Pairs
Drug/class                                   Health Outcome of Interest
ACE inhibitors                               Angioedema
ACE inhibitors                               Hospitalization (including readmission and mortality)
Amphotericin B                               Renal failure
Antibiotics (erythromycins, sulfonamides,    Acute liver injury (symptomatic hepatitis)
  and tetracyclines)
Antiepileptics (carbamazepine, valproic      Aplastic anemia
  acid, and phenytoin)
Benzodiazepines                              Hip fracture
Beta blockers                                Mortality after MI
Bisphosphonates (alendronate)                GI ulcer hospitalizations
Tricyclic antidepressants                    Myocardial infarction
Typical antipsychotics                       Myocardial infarction
Warfarin                                     Bleeding
Page 16
HSIU
High-throughput Safety-screening by IU
IU OMOP Method Team
Siu Hui
Xiaochun Li
Changyu Shen
Yan (Cindy) Ding
Deming Mi
Challenges
• The hypothesis generation task of testing all-by-all (e.g., 4000 × 5000) drug-condition associations in large databases (e.g., 10 million patients) presents a unique challenge
• A practically useful approach will need to balance accuracy and efficiency
• False positive control is important
Proposed approach
• A cohort analysis perspective
• Selection of controls
• Two versions of “event”
• Confounding adjustment
• False positive control
Count and intensity based analyses

Count based:
                     Exposed   Unexposed   Total
Condition present       a          b        a+b
Condition absent        c          d        c+d
Total                  a+c        b+d        N

Association can be assessed by chi-square, odds ratio, relative risk, and risk difference.

Intensity based:
                     Exposed   Unexposed   Total
Condition present       a          b        a+b
Length of exposure     L1         L0       L1+L0

Association can be assessed by chi-square, intensity density ratio, and intensity density difference. Note: for the unexposed, the length of exposure is the sum of exposure to all drugs.
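The measures named under each table map directly onto the cell labels. A minimal Python sketch (the function names are ours; cells a, b, c, d and lengths L1, L0 follow the tables above):

```python
def count_measures(a, b, c, d):
    """Count-based 2x2 measures: columns exposed/unexposed, rows
    condition present/absent, as in the count-based table."""
    risk_exposed = a / (a + c)      # P(condition | exposed)
    risk_unexposed = b / (b + d)    # P(condition | unexposed)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }

def intensity_measures(a, b, L1, L0):
    """Intensity-based measures: event counts a (exposed) and b (unexposed)
    over total exposure lengths L1 and L0."""
    r1, r0 = a / L1, b / L0         # events per unit of exposure time
    return {
        "intensity_density_ratio": r1 / r0,
        "intensity_density_difference": r1 - r0,
    }
```

The intensity-based version accounts for differing lengths of exposure, which the raw count-based table ignores.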
Selection of controls
• The control group: subjects who did not take the medication being studied and had at least one other medication
  – The exposed and control groups are more comparable; likely to reduce false positives
  – Substantially increases computation cost
• An alternative is to include everyone as control, i.e., the population norm
Definition of event (exposed)
The “in” version
• The event Y occurs during any exposure period of drug A
The “after” version
• The event Y occurs after the first prescription of drug A
[Timeline diagram: example patient histories of exposures to drug A and occurrences of event Y over time, each scored 1 or 0 under the “in” and “after” definitions]
Definition of event (control)
The “in” version
• The event Y occurs during any exposure period of ANY drug
The “after” version
• The event Y occurs after the earliest prescription of ANY drug
[Timeline diagram: example control histories of exposures to other drugs (B, C, D) and occurrences of event Y over time, each scored 1 or 0 under the “in” and “after” definitions]
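The two event definitions can be written as a small classifier. A sketch assuming day-indexed exposure periods (the function name and data layout are ours; for exposed subjects the periods are for drug A, for controls they are for any drug):

```python
def classify_event(exposure_periods, event_day, version):
    """Score a subject 1/0 under the two event definitions.
    exposure_periods: list of (start_day, end_day) tuples;
    event_day: day event Y occurred, or None if no event."""
    if event_day is None:
        return 0
    if version == "in":
        # "in": Y falls within some exposure period
        return int(any(s <= event_day <= e for s, e in exposure_periods))
    if version == "after":
        # "after": Y falls on or after the earliest prescription
        return int(event_day >= min(s for s, _ in exposure_periods))
    raise ValueError(f"unknown version: {version}")
```

Note that the "after" version scores 1 for events occurring in gaps between prescriptions, where the "in" version scores 0; that is exactly the distinction the timelines on the slides illustrate.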
Adjustment of confounding
• Stratification, with continuous variables transformed to categorical variables first
• We will consider age, gender, and number of medications
• The advantage of stratification: it automatically generates sub-group analyses
• Stratification is compatible with parallel computing, where data are divided into subsets that run in parallel (data parallelization)
• For drug-condition pairs with a strong signal, further sensitivity analysis can be used to assess possible bias induced by uncontrolled confounding
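The stratification step above amounts to splitting records by categorical covariates and tallying a 2x2 table per stratum. A minimal sketch (record field names are illustrative; the actual implementation was in SAS):

```python
from collections import defaultdict

def stratified_counts(records, strata_keys):
    """Group patient records into strata by categorical covariates
    (e.g. age group, gender, number of medications), then tally a 2x2
    table [condition present/absent] x [exposed/unexposed] per stratum."""
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for r in records:
        stratum = tuple(r[k] for k in strata_keys)
        row = 0 if r["condition"] else 1   # condition present / absent
        col = 0 if r["exposed"] else 1     # exposed / unexposed
        tables[stratum][row][col] += 1
    return dict(tables)
```

Because each stratum's table is built independently, the strata can be processed on separate workers, which is the data parallelization the slide refers to; each per-stratum table is also, for free, a sub-group analysis.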
False positives/negatives
• Multiple-comparison issue for assessment of many drug-condition pairs
• False discovery rate (FDR) as a quantitative measure for false positive control
• We plan to implement the local FDR procedure (Efron, 2000)
  – True association status is a latent binary variable
  – Model the distributions of true and false positives (mixture model)
  – Both parametric and non-parametric methods are straightforward
  – Probabilistic measure of the likelihood of true association for each pair
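The local FDR idea can be sketched as fdr(z) ≈ p0·f0(z)/f(z): the null proportion p0 times the null density f0, over the mixture density f estimated from all test statistics. A crude histogram-based sketch (a standard normal null is assumed, and p0 is taken as given rather than estimated, unlike a full implementation of Efron's procedure):

```python
import math

def local_fdr(z_scores, p0=0.9, n_bins=20):
    """Crude local FDR sketch: for each z-score, return
    min(1, p0 * f0(z) / f_hat(z)), where f0 is the standard normal
    density and f_hat is a histogram estimate of the mixture density."""
    lo, hi = min(z_scores), max(z_scores)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for z in z_scores:
        counts[min(int((z - lo) / width), n_bins - 1)] += 1
    n = len(z_scores)

    def f0(z):  # standard normal density
        return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

    fdrs = []
    for z in z_scores:
        f_hat = counts[min(int((z - lo) / width), n_bins - 1)] / (n * width)
        fdrs.append(min(1.0, p0 * f0(z) / f_hat))
    return fdrs
```

The output is the probabilistic measure the last bullet describes: pairs with small local FDR are likely true associations and can be prioritized for review.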
Computation
• We implemented our method in SAS
• Programs need to balance actual computation and data access to optimize performance (i.e., storing large amounts of intermediate data avoids redundant computation, but accessing large data also costs time)
• Modularize programs to allow flexible functionality
• Easily incorporate new data to update results
Computational Issues
• Large number of patients
• Large number of combinations of drugs and conditions
• Need efficient algorithms
  – for counting events
  – for calculating length of exposure to a specific drug, or to any drug
• Identification of bottleneck(s) for efficiency improvement
Computing Lessons Learned
• Pre-indexing is important for fast query/access of data
  – Identification of the unique drug list of the synthetic data by SAS took 6 min before indexing and less than 1 sec after indexing
• Batching (by patients) saves memory
• Program optimization can reduce computation time by 90%
  – Avoid redundant computations
  – Appropriate data structures avoid storing large amounts of trivial data (i.e., large numbers of zero counts)
• Parallel computing
  – Data parallelization: a single set of instructions runs on different parts of the data
  – Parallel computing using SAS/CONNECT reduces the computing time for 10,000 patients by ~70% on the OMOP stat server
• Effort is still on-going
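The batching and data-parallelization pattern above was realized with SAS/CONNECT; the same idea can be illustrated in Python with a thread pool (function names and the per-batch workload are ours, purely for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def count_events(batch):
    """Per-batch work: the same instructions applied to one slice of
    the patient data (here, just counting events)."""
    return sum(len(p["events"]) for p in batch)

def parallel_event_count(patients, batch_size=1000, workers=4):
    """Data parallelization: split patients into batches, run the same
    counting code on each batch concurrently, combine partial results."""
    batches = [patients[i:i + batch_size]
               for i in range(0, len(patients), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_events, batches))
```

Batching by patients keeps each worker's memory footprint small, and because batches are independent, adding workers (or, in the SAS setting, remote sessions) scales the throughput.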
Where we are now
Methods implemented in SAS
• unstratified analysis
• stratified (by age, sex and number of drugs) analysis
Methods in queue to be tested by OMOP
Lessons Learned
• Implementation of a relatively straightforward method might not be so straightforward in giant databases
• Hardware and software coordination is key for successful execution of the method and enhancement of speed. It will also take a series of trial-and-error experiments to identify the optimal setting.
• Need to work closely with OMOP to achieve a clear mutual understanding of needs from both sides at strategic and tactical levels