The DIMACS Working Group on
Disease and Adverse Event
Surveillance
Henry Rolka and David Madigan
Background
• WG Objective: Bring together researchers in adverse
event monitoring and disease surveillance
• Part of a 5-year special focus on computational and
mathematical epidemiology
• 50+ WG members: epidemiologists, public health
professionals, biostatisticians, etc.
• Focus on analytic/statistical methods
• Two WG meetings plus week-long tutorial (02-03)
• Coordinated closely with National Syndromic
Surveillance Conferences
Areas of Common Interest
• Disease Surveillance
• Vaccine Safety Surveillance
• Drug Safety Surveillance
• Syndromic Surveillance
Representation
• Carnegie-Mellon University
• FDA
• Quintiles Inc.
• CDC
• Rutgers University
• Emergint, Inc.
• AT&T Labs
• NJ State
• NYC Dept. of Health
• University of Pennsylvania
• Aventis
• ATSDR
• University of Connecticut
• Los Alamos National Lab
• Lincoln Technologies
• SAS Institute
Background, cont.
• WG conceived before September 11, 2001
• Surveillance landscape has changed drastically
• Major public health effort directed at
bioterrorism detection
• Proliferation of novel surveillance projects in
response to national threat
• “Good for detecting outbreaks of various
kinds”
New Data Types for Public Health
Surveillance
• Managed care patient encounter data
• Pre-diagnostic/chief complaint (text data)
• Over-the-counter sales transactions
– Drug store
– Grocery store
• 911-emergency calls
• Ambulance dispatch data
• Absenteeism data
• ED discharge summaries
• Prescription/pharmaceuticals
• Adverse event reports
New Analytic Methods and Approaches
• Spatial-temporal scan statistics
• Statistical process control (SPC)
• Bayesian applications
• Market-basket association analysis
• Text mining
• Rule-based surveillance
• Change-point techniques
ANALYTIC METHODS IN USE
• Scan statistics (e.g., Kulldorff’s SaTScan)
• Statistical process control (e.g., Hutwagner’s EARS)
• Association rule mining (e.g., Moore’s WSARE)
• Bayesian shrinkage (e.g., DuMouchel’s MGPS)
• Generalized linear mixed models (e.g., Kleinman)
• Sequential probability ratio tests (e.g., Spiegelhalter, Evans)
SCAN STATISTICS
• Martin Kulldorff’s SaTScan software: spatial and space-time scan statistics.
• e.g., spatial scan – using a Poisson model, computes a likelihood ratio for every candidate circle, comparing an elevated rate inside the circle with the null hypothesis of a common rate everywhere
• Picks the circle with the biggest likelihood ratio
• P-value computed via Monte Carlo
• Big literature on disease clustering: Besag & Newell, Diggle,
Moran test, Turnbull’s method, Cuzick & Edwards, etc.
• Need methodology for multiple sources
Farzad Mostashari
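The spatial scan steps above can be sketched in a few lines. This is a minimal illustration, not SaTScan itself: the function names, the toy 5-by-5 grid, and the rule that circles grow one site at a time up to half the population are all assumptions made for the example, and every site is assumed to have a positive population.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only an elevated rate inside the circle counts
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def best_circle(sites, cases, pops):
    """Return (max LLR, center index, number of sites in the circle)."""
    total_cases, total_pop = sum(cases), sum(pops)
    best = (0.0, None, 0)
    for i, center in enumerate(sites):
        # Grow the circle around this center one site at a time.
        order = sorted(range(len(sites)),
                       key=lambda j: math.dist(center, sites[j]))
        c = p = 0
        for size, j in enumerate(order, start=1):
            c += cases[j]
            p += pops[j]
            if 2 * p > total_pop:  # assumed cap: at most half the population
                break
            e = total_cases * p / total_pop
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, i, size)
    return best


def monte_carlo_p(sites, cases, pops, n_sim=199, seed=0):
    """Monte Carlo p-value: rank the observed max LLR among null replicates."""
    rng = random.Random(seed)
    observed = best_circle(sites, cases, pops)[0]
    total_cases = sum(cases)
    exceed = 0
    for _ in range(n_sim):
        # Null model: each case falls in a site with probability
        # proportional to that site's population.
        sim = [0] * len(sites)
        for j in rng.choices(range(len(sites)), weights=pops, k=total_cases):
            sim[j] += 1
        if best_circle(sites, sim, pops)[0] >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Toy data (hypothetical): 25 sites, with a planted cluster in one corner.
sites = [(x, y) for x in range(5) for y in range(5)]
pops = [1000] * 25
cases = [2] * 25
for hot in ((0, 0), (0, 1), (1, 0)):
    cases[sites.index(hot)] = 12
llr, p_value = monte_carlo_p(sites, cases, pops)
```

Conditioning on the total number of cases, as the null replicates do here, keeps the comparison about where cases fall rather than how many there are.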
BAYESIAN SHRINKAGE ESTIMATION
• DuMouchel’s GPS/MGPS
• Compares observed counts of “market baskets” to
expected counts under some (simple) model. For example,
saw 30 cases in the ER today with G.I. syndrome AND
fever AND work in Newark compared with an expectation
of 3 cases
• 30-to-3 is more convincing than 3-to-0.3 but less
convincing than 300-to-30. Idea: shrink the ratios based on
smaller counts towards one.
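A rough sketch of the shrinkage idea, under a deliberately simplified model: a single Gamma prior with mean one on the true ratio, so the posterior mean is (observed + a) / (expected + a). This is not DuMouchel's GPS/MGPS, which fits a two-component gamma mixture prior by empirical Bayes and reports a geometric mean (EBGM). The function name and the prior strength a = 1 are assumptions chosen for illustration.

```python
def shrunk_ratio(observed, expected, prior_strength=1.0):
    """Posterior mean of the observed/expected ratio under a Gamma(a, a) prior.

    With count ~ Poisson(ratio * expected) and ratio ~ Gamma(shape=a, rate=a),
    the posterior is Gamma(a + observed, a + expected).
    """
    if expected <= 0 or prior_strength <= 0:
        raise ValueError("expected and prior_strength must be positive")
    return (observed + prior_strength) / (expected + prior_strength)


# All three pairs have a raw ratio of 10, but small counts shrink further.
for obs, exp in ((3, 0.3), (30, 3), (300, 30)):
    print(obs, exp, round(shrunk_ratio(obs, exp), 2))
```

The three estimates come out in the order the slide argues for: the 3-to-0.3 pair is pulled most strongly towards one, and the 300-to-30 pair is barely moved.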
GPS SHRINKAGE – AERS DATA
[Figure: log EBGM plotted against log RR, by number of reports (1, 2, 3, 5, 10, 50-100)]
BAYESIAN SHRINKAGE ESTIMATION
• Issues:
Appropriate amount of shrinkage?
Where do the expected values come from?
Temporal dimension?
Covariate information
Simpson’s paradox (“innocent bystander”)
SEQUENTIAL PROBABILITY RATIO TESTS
• Classical much-studied statistical method dating back to
Wald (1948)
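As an illustration of the method, here is a minimal univariate sketch of Wald's test applied to daily counts. The Poisson model, the baseline and outbreak rates, the error rates, and the function name are assumptions for this example, not details given in the talk.

```python
import math


def poisson_sprt(counts, lam0, lam1, alpha=0.05, beta=0.10):
    """Sequential probability ratio test of rate lam0 (null) against lam1.

    Returns (decision, number of observations used), where decision is
    "signal", "no signal", or "continue" if neither boundary was crossed.
    """
    upper = math.log((1 - beta) / alpha)  # crossing it favours lam1
    lower = math.log(beta / (1 - alpha))  # crossing it favours lam0
    llr = 0.0
    for n, x in enumerate(counts, start=1):
        # Log likelihood ratio contribution of one Poisson count.
        llr += x * math.log(lam1 / lam0) - (lam1 - lam0)
        if llr >= upper:
            return "signal", n
        if llr <= lower:
            return "no signal", n
    return "continue", len(counts)


# Hypothetical daily counts, baseline rate 3 per day, outbreak rate 6 per day.
decision, days_used = poisson_sprt([4, 5, 6], lam0=3, lam1=6)
```

The test stops as soon as the accumulated evidence crosses either boundary, which is what makes it attractive for monitoring a stream rather than a fixed sample.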
NATURAL LANGUAGE
• Important sources of health data begin life as free text
“chief complaints” (ED visits, primary care encounters,
adverse event reports, e-mail, etc.)
“Approximately 5 minutes after receiving flu and pneumonia vaccine pt began
hollering, "Oh, Oh my neck is hurting. Feels like a knot in my throat, a
medicine taste." Complained of chest pain moving to back and leg numbness.”
• Some (successful) work on automated coding of free
text.
• Little work on direct surveillance of text data
CONCLUSION
• Analytic methods for surveillance have a long history in
Statistics but currently attract substantial new interest from
researchers in both CS and Statistics
• Urgently need new methods for multivariate, multi-data type
streams
• Data availability a bottleneck; simulation non-trivial.
• DARPA currently staging a competition
THE IDEA OF A COMPETITION
Thesis: Rapid growth in the number of deployed health
surveillance systems and increasing complexity require new
analytic methodologies
Goal: Stimulate mainstream Computer Science and
Statistics researchers to focus on this area
How: A signal detection competition
Examples: the Message Understanding Conferences (MUC),
Text Retrieval Conferences (TREC), KDD Cup, M3 Time
Series competition
COMPETITION STATUS
• DIMACS Working Group on Adverse Event and Disease
Reporting, Surveillance, Analysis
• Subgroup focused on competition; applied for funding;
identified data sources
• Key challenge: appropriate methods for inserting signals
into real data (“spiking”)
• Other groups face the same challenge (e.g. BioStorm)
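To make "spiking" concrete, the sketch below adds a synthetic outbreak to a baseline count series and measures how long a detector takes to flag it. Additive injection, the ramp-shaped signal, and the fixed-threshold detector are assumptions for illustration; the working group's actual spiking methods are not described here.

```python
def spike(baseline, start, extra):
    """Return a copy of baseline with extra[k] added on day start + k."""
    spiked = list(baseline)
    for offset, add in enumerate(extra):
        day = start + offset
        if 0 <= day < len(spiked):  # ignore any part past the series end
            spiked[day] += add
    return spiked


def detection_delay(series, start, threshold):
    """Days from the spike start until a count first exceeds threshold."""
    for day in range(start, len(series)):
        if series[day] > threshold:
            return day - start
    return None  # the signal was never detected


# Hypothetical baseline of 5 visits per day with a 3-day ramp from day 2.
spiked = spike([5, 5, 5, 5, 5, 5], start=2, extra=[1, 2, 3])
delay = detection_delay(spiked, start=2, threshold=6)
```

Because the inserted signal is known exactly, competing algorithms can be scored on the same spiked series by detection delay and false alarms.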
SCAN STATISTICS
• Martin Kulldorff’s SaTScan software: spatial and space-time scan statistics.
• e.g., spatial scan – using Poisson
model computes a likelihood ratio for
all possible circles comparing event
counts inside and outside
• Picks the circle with the biggest
likelihood ratio
• P-value computed via Monte Carlo
• Big literature on disease clustering: Besag & Newell, Cuzick &
Edwards, Diggle, Moran test, Pagano, Turnbull’s method, etc.
• Need methodology for multiple sources
Farzad Mostashari
SEQUENTIAL PROBABILITY RATIO TESTS
• Classical much-studied statistical method dating back to
Wald (1948). Mostly univariate.
THE IDEA OF A COMPETITION
Thesis: Rapid growth in the number of deployed health
surveillance systems and increasing complexity require new
analytic methodologies
Goal: Stimulate mainstream Computer Science and
Statistics researchers to focus on this area
How: A signal detection competition
Examples: the Message Understanding Conferences (MUC),
Text Retrieval Conferences (TREC), KDD Cup, M3 Time
Series competition
HOW CAN THIS BE ACCOMPLISHED
• Definitions of signals.
• Test data sets for refining signal detection procedures.
• Modular, interoperable signal generation algorithms.
• Computing efficiencies for Monte Carlo simulations of signal
detection events in large complex data.
• Multidimensional graphical displays to interpret results and evaluate
algorithms.
• Multivariate statistical techniques for evaluating signal detection
profiles across multiple data sources.
CONCLUSION
• Short-term goals/benefits:
– Promote coordination and collaboration
• Long-term goals/benefits:
– Stimulate methodological research
– Provide objective evaluation of competing algorithms
– Produce high quality spiking algorithms
ANALYTICAL METHODS FOR
HEALTH SURVEILLANCE
DAVID MADIGAN
DEPARTMENT OF STATISTICS
RUTGERS UNIVERSITY
Novel Surveillance Applications
Methodologies
• Early Aberration Reporting System (EARS), CDC
• What’s Strange About Recent Events? (WSARE), U
of Pittsburgh and Carnegie-Mellon U
• Spatial and Space-Time Scan Statistics (SaTScan™, Kulldorff)
• Web Visual Data Mining Environment (WebVDME),
Lincoln Technologies, Inc.
Novel Surveillance Applications
Projects
• Electronic Surveillance System for the Early Notification of
Community-based Epidemics (ESSENCE I&II), DOD
• Real-time Outbreak and Disease Surveillance (RODS), U of
Pittsburgh
• Biological Spatio-Temporal Outbreak Reasoning Module
(BioSTORM), Stanford U
• Rapid Syndrome Validation Project (RSVP), Sandia NL, NM
• Alternative Surveillance Alert Program (ASAP), Health Canada
• Syndromic Surveillance Project, NYC
• Bioterrorism Syndromic Surveillance Demonstration Program,
CDC/Harvard
Conceptual Taxonomy
Public Health Surveillance
Adverse event
(to intervention exposure)
Drug
Vaccine
Disease
Syndromic
Traditional
Infectious disease
Birth defect
Other
Injuries
Etc.