Transcript slides

Bayesian Biosurveillance
Gregory F. Cooper
Center for Biomedical Informatics
University of Pittsburgh
[email protected]
The research described in this talk is based on collaborative work with members
of the Bayesian Biosurveillance project and the RODS Laboratory at the
University of Pittsburgh, and the Auton Laboratory at Carnegie Mellon University.
Special thanks to Bill Hogan for the BARD slides that are included in this
presentation.
Outline
• Provide a brief overview of Bayesian inference
as applied to outbreak detection
• Show an example of a Bayesian biosurveillance
algorithm
Biosurveillance
• Definition: Biosurveillance is the
process of monitoring for new outbreaks
of infectious disease
• Goal: Detect an infectious disease
outbreak in a population rapidly and
accurately
Bayes Rule
$$
P(\text{hypothesis} \mid \text{data}) = \frac{P(\text{data}, \text{hypothesis})}{P(\text{data})}
$$
Bayes Rule
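$$
\begin{aligned}
P(\text{hypothesis} \mid \text{data})
&= \frac{P(\text{data}, \text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{P(\text{data})}
\end{aligned}
$$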
Bayes Rule
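$$
\begin{aligned}
P(\text{hypothesis} \mid \text{data})
&= \frac{P(\text{data}, \text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{\sum_i P(\text{data} \mid \text{hypothesis}_i)\, P(\text{hypothesis}_i)}
\end{aligned}
$$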
Bayes Rule
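$$
\begin{aligned}
P(\text{hypothesis} \mid \text{data})
&= \frac{P(\text{data}, \text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{\sum_i P(\text{data} \mid \text{hypothesis}_i)\, P(\text{hypothesis}_i)} \\
&= \frac{\int P(\text{data} \mid \theta)\, P(\theta \mid \text{hypothesis})\, P(\text{hypothesis})\, d\theta}{\sum_i \int P(\text{data} \mid \theta)\, P(\theta \mid \text{hypothesis}_i)\, P(\text{hypothesis}_i)\, d\theta}
\end{aligned}
$$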
Bayes Rule
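$$
\begin{aligned}
P(\text{hypothesis} \mid \text{data})
&= \frac{P(\text{data}, \text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{P(\text{data})} \\
&= \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{\sum_i P(\text{data} \mid \text{hypothesis}_i)\, P(\text{hypothesis}_i)} \\
&= \frac{\int P(\text{data} \mid \theta)\, P(\theta \mid \text{hypothesis})\, P(\text{hypothesis})\, d\theta}{\sum_i \int P(\text{data} \mid \theta)\, P(\theta \mid \text{hypothesis}_i)\, P(\text{hypothesis}_i)\, d\theta}
\end{aligned}
$$
Labels on the last line: P(θ | hypothesis) is the parameter prior; P(hypothesis) is the hypothesis prior.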
Bayes Rule for
Outbreak Detection
One hypothesis is that there is no
disease outbreak at the present time.
The other hypotheses postulate various
types of outbreaks, such as anthrax,
smallpox, plague, and many others.
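The form of Bayes rule that sums over hypotheses and integrates over parameters can be sketched numerically. The example below is only an illustration under invented assumptions, not the project's model: the data are k respiratory visits out of n ED visits, each hypothesis places a uniform prior on the respiratory-visit probability θ over a made-up range, and the hypothesis priors are made up as well. All names and numbers are hypothetical.

```python
import math

def binomial_likelihood(k, n, theta):
    """P(data | theta): probability of k respiratory visits among n ED visits."""
    return math.comb(n, k) * theta**k * (1.0 - theta)**(n - k)

def uniform_pdf(lo, hi):
    """Parameter prior P(theta | hypothesis): uniform density on [lo, hi]."""
    return lambda theta: 1.0 / (hi - lo) if lo <= theta <= hi else 0.0

def marginal_likelihood(k, n, prior_pdf, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of
    P(data | theta) * P(theta | hypothesis) over theta in [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        theta = lo + (j + 0.5) * width
        total += binomial_likelihood(k, n, theta) * prior_pdf(theta) * width
    return total

def posterior(k, n, hypotheses):
    """P(hypothesis | data) for every hypothesis, normalized by the sum over i."""
    joint = {}
    for name, spec in hypotheses.items():
        lo, hi = spec["theta_range"]
        evidence = marginal_likelihood(k, n, uniform_pdf(lo, hi), lo, hi)
        joint[name] = spec["prior"] * evidence  # numerator of Bayes rule
    normalizer = sum(joint.values())            # denominator: sum over hypotheses
    return {name: value / normalizer for name, value in joint.items()}

# Hypothetical hypothesis priors and parameter ranges, chosen only for illustration.
hypotheses = {
    "no_outbreak":      {"prior": 0.99, "theta_range": (0.00, 0.10)},
    "anthrax_outbreak": {"prior": 0.01, "theta_range": (0.10, 0.40)},
}

post = posterior(k=30, n=100, hypotheses=hypotheses)
for name, p in post.items():
    print(f"P({name} | data) = {p:.6f}")
```

With these invented numbers, a day with 30 respiratory visits out of 100 shifts nearly all posterior mass to the outbreak hypothesis despite its small prior, while a day with 5 out of 100 leaves it on the no-outbreak hypothesis.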
Some Advantages
of a Bayesian Approach
to Biosurveillance
• Permits specification of prior knowledge and belief
– Knowledge about outbreak diseases
– Belief about whether, when and how an outbreak will occur,
based on experience, intel, and intelligent guesses.
• Facilitates modeling
– of complex outbreaks
– with multiple data streams
• Yields inferences
– of P(outbreak | data), which can be used directly in a decision
analysis about what to do
– of other statistics of interest, such as the expected number of
people infected in a probable outbreak situation
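To make the last bullet concrete, the sketch below shows one way a posterior probability could feed a decision analysis and an expected-count calculation. The actions, the cost table, and the posterior over outbreak sizes are all hypothetical placeholders; the slides do not specify a utility model.

```python
def expected_costs(p_outbreak, costs):
    """Expected cost of each action given P(outbreak | data)."""
    return {
        action: p_outbreak * c["outbreak"] + (1.0 - p_outbreak) * c["no_outbreak"]
        for action, c in costs.items()
    }

def best_action(p_outbreak, costs):
    """Action with the lowest expected cost."""
    ec = expected_costs(p_outbreak, costs)
    return min(ec, key=ec.get)

def expected_infected(posterior_over_sizes):
    """Expected number infected: sum over sizes of size * P(size | data)."""
    return sum(size * p for size, p in posterior_over_sizes.items())

# Hypothetical costs in arbitrary units (not taken from the talk).
costs = {
    "investigate": {"outbreak": 10.0,  "no_outbreak": 1.0},
    "wait":        {"outbreak": 500.0, "no_outbreak": 0.0},
}

print(best_action(0.001, costs))  # low posterior probability of an outbreak
print(best_action(0.05, costs))   # higher posterior probability of an outbreak

# Hypothetical posterior over the number of people infected.
print(expected_infected({0: 0.90, 100: 0.07, 1000: 0.03}))
```

Because missing a real outbreak is assumed to be far more costly than a needless investigation, the action switches from waiting to investigating at a posterior probability well below 0.5.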
An Example of a Bayesian
Biosurveillance Algorithm
• BARD (Bayesian Aerosol Release Detector)
is an outbreak detection system that is
designed to compute the posterior probability
of an outdoor, windborne release of anthrax
spores
• Outbreak data
– Emergency Dept (ED) chief complaints
– Over-the-counter (OTC) medication sales
– BioWatch sensors
• Additional data
– Weather data
– Dispersion data
BARD: Overview
• Seeks earlier, more sensitive detection of
windborne outbreaks through recognition
of a characteristic dispersion pattern
• An alert not only detects an outbreak, but
characterizes it as windborne
• Derives estimates of release location,
quantity and timing
• Has been running in Pittsburgh (since
1/2005) and Philadelphia (since 6/2005)
Typical Computation for Aerosol Releases:
Predict Consequences of Release Parameters
Inputs:
• Weather
• Quantity released
• Location of release
• Time of release
→ Dispersion model
→ Downwind airborne concentrations
→ Model of effects of aerosol on people
→ Predicted effect on biosurveillance data over time
BARD Uses Bayesian Inference to Derive
Release Parameters from Data
Observed effect on biosurveillance data over time
→ Inversion of model of aerosol effects on biosurveillance data
→ Downwind airborne concentrations
→ Inversion of dispersion model (with weather)
→ Release parameters:
• Quantity released
• Location of release
• Time of release
BARD Searches for the
Optimal Release Parameters
Wind direction 2 days ago
P(Data | Release Params)
is very low
Wind direction 3 days ago
P(Data | Release Params)
is relatively high
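The idea of scoring candidate release parameters by P(Data | Release Params) and keeping the best can be sketched as an exhaustive grid search. This is a toy stand-in, not BARD's search: the expected-count function (exponential decay with distance, no wind) and the Poisson likelihood are assumptions made for illustration, and every name and number here is hypothetical.

```python
import itertools
import math

def poisson_log_likelihood(observed, expected):
    """log P(data | release params) for independent Poisson counts per region."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1)
               for k, mu in zip(observed, expected))

def expected_counts(release, regions, baseline):
    """Toy forward model: extra cases decay with distance from the release."""
    x, y, quantity = release
    return [baseline + quantity * math.exp(-math.hypot(rx - x, ry - y))
            for rx, ry in regions]

def search_release(observed, regions, baseline, xs, ys, quantities):
    """Score every candidate (x, y, quantity) and return the most likely one."""
    best_release, best_ll = None, -math.inf
    for release in itertools.product(xs, ys, quantities):
        ll = poisson_log_likelihood(
            observed, expected_counts(release, regions, baseline))
        if ll > best_ll:
            best_release, best_ll = release, ll
    return best_release, best_ll

# Hypothetical 5 x 5 grid of regions and a simulated "true" release.
regions = [(rx, ry) for rx in range(5) for ry in range(5)]
true_release = (2, 1, 20)
observed = [round(mu) for mu in expected_counts(true_release, regions, baseline=2.0)]

best_release, best_ll = search_release(
    observed, regions, baseline=2.0,
    xs=range(5), ys=range(5), quantities=[5, 10, 20, 40])
print(best_release, best_ll)
```

A realistic version would also search over release time and height and would let the weather history shape the forward model, which is what makes some wind directions fit the data far better than others.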
The Structure of the BARD Model
Dispersion
Model
The Gaussian Plume Model
$$
d = \frac{Q\, w\, V_E}{2\pi\, \sigma_y(s, x - x_i)\, \sigma_z(s, x - x_i)\, u}
\exp\!\left(-\frac{(y - y_i)^2}{2\, \sigma_y(s, x - x_i)^2}\right)
\left[
\exp\!\left(-\frac{(h_i - h)^2}{2\, \sigma_z(s, x - x_i)^2}\right)
+ \exp\!\left(-\frac{(h_i + h)^2}{2\, \sigma_z(s, x - x_i)^2}\right)
\right]
$$
where
d is the number of spores inhaled by an individual
Q is the number of kilograms of spores released
w is the number of spores per kilogram
V_E is the minute ventilation
(x, y, h) is the coordinate of the hypothesized release location, where x and y specify the
location on the surface of the earth and h specifies the height above ground
(x_i, y_i, h_i) is similarly the coordinate of the patient's location
σ_y and σ_z are the standard deviations of the spore distribution in the crosswind and vertical directions
u is the wind speed
s is the atmospheric stability
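The Gaussian plume formula can be evaluated directly once σ_y and σ_z are available. The sketch below assumes the standard form with a ground-reflection term, a coordinate frame in which x − x_i is the downwind distance (positive for people downwind of the release), and simple power-law dispersion functions. The coefficients in `HYPOTHETICAL_COEFFS` are placeholders, not real stability-class values, and all input quantities must be in mutually consistent units.

```python
import math

# Placeholder (a, b) pairs for sigma = a * distance**b; NOT real
# stability-class coefficients, chosen only so the example runs.
HYPOTHETICAL_COEFFS = {
    "D": {"y": (0.08, 0.9), "z": (0.06, 0.8)},
}

def sigma_y(s, downwind):
    """Crosswind standard deviation of the plume (hypothetical power law)."""
    a, b = HYPOTHETICAL_COEFFS[s]["y"]
    return a * downwind**b

def sigma_z(s, downwind):
    """Vertical standard deviation of the plume (hypothetical power law)."""
    a, b = HYPOTHETICAL_COEFFS[s]["z"]
    return a * downwind**b

def inhaled_dose(Q, w, V_E, u, s, release, person):
    """Number of spores inhaled, d, under the Gaussian plume model.

    release = (x, y, h) and person = (x_i, y_i, h_i).
    Returns 0.0 when the person is not downwind (an assumption of this sketch).
    """
    x, y, h = release
    xi, yi, hi = person
    downwind = x - xi
    if downwind <= 0:
        return 0.0
    sy, sz = sigma_y(s, downwind), sigma_z(s, downwind)
    prefactor = Q * w * V_E / (2.0 * math.pi * sy * sz * u)
    crosswind = math.exp(-((y - yi) ** 2) / (2.0 * sy**2))
    vertical = (math.exp(-((hi - h) ** 2) / (2.0 * sz**2))
                + math.exp(-((hi + h) ** 2) / (2.0 * sz**2)))
    return prefactor * crosswind * vertical

# Hypothetical inputs: ground-level release, people 1000 distance units downwind.
release = (1000.0, 0.0, 0.0)
on_axis = inhaled_dose(1.0, 1e12, 0.01, 3.0, "D", release, (0.0, 0.0, 0.0))
off_axis = inhaled_dose(1.0, 1e12, 0.01, 3.0, "D", release, (0.0, 200.0, 0.0))
print(on_axis, off_axis)
```

The dose falls off sharply away from the plume centerline, which is the characteristic spatial pattern a detector of windborne releases can look for.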
The Structure of the BARD Model
Model of effects on
a person
BARD Evaluation: Methods
• Used BARD to generate data for 20
simulated windborne anthrax releases
(Thus, this is a preliminary evaluation.)
• Injected the simulated ED respiratory chief
complaint data into a real historical dataset
• Used historical weather data for simulation
and detection
AMOC Analysis of BARD
[Figure: AMOC curves plotting time to detection (days, 0 to 3.5) against false alarm rate (per year, 0 to 50). Two curves: time measured from release, and time measured from first ED visit.]
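An AMOC (activity monitoring operating characteristic) curve trades false alarm rate against time to detection as the alarm threshold varies. The sketch below shows one plausible way to compute such points from daily detector scores. The data layout, the function name, and the choice to charge undetected outbreaks the full length of their score series are assumptions for illustration, not the evaluation code behind the slide.

```python
def amoc_points(baseline_scores, outbreak_scores, thresholds, days_per_year=365):
    """Return (false alarms per year, mean days to detection) for each threshold.

    baseline_scores: daily scores on outbreak-free days.
    outbreak_scores: one list per simulated outbreak, indexed by days since release.
    """
    points = []
    for threshold in thresholds:
        false_alarms = sum(score >= threshold for score in baseline_scores)
        false_alarms_per_year = false_alarms * days_per_year / len(baseline_scores)
        delays = []
        for scores in outbreak_scores:
            # First day the score crosses the threshold; if it never does,
            # charge the full series length (an assumption of this sketch).
            delay = next((day for day, score in enumerate(scores)
                          if score >= threshold), len(scores))
            delays.append(delay)
        points.append((false_alarms_per_year, sum(delays) / len(delays)))
    return points

# Hypothetical scores, e.g. posterior probabilities of a release.
baseline_scores = [0.1, 0.2, 0.05, 0.6, 0.1]
outbreak_scores = [[0.1, 0.3, 0.7, 0.9], [0.2, 0.8, 0.9, 0.95]]

for fa_rate, delay in amoc_points(baseline_scores, outbreak_scores, [0.5, 0.85]):
    print(f"{fa_rate:.1f} false alarms/year, {delay:.1f} days to detection")
```

Raising the threshold lowers the false alarm rate at the price of later detection, which is the trade-off the curve displays.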
BARD Evaluation: Results
• Sensitivity = 100% at false alarm rate of zero (for detection
within seven days of the simulated release)
• Mean timeliness at false alarm rate of zero:
– From time of release, 3.1 days
– From time of first ED visit, 1.2 days (28 hours)
• Mean accuracy of release parameters output by BARD:
– X coordinate of release location: 3,400 meters
– Y coordinate of release location: 84 meters
– Height of release: 124 meters
– Quantity of release: 0.5 kilograms
– Time of release: 0.008 days
Accuracy of Identified Release Parameters
Parameter     mean      stdev     min      max       median
x (m)         3,399     5,623     17       24,710    1,784
y (m)         84.1      94.7      0.8      344.0     51.9
x,y (m)       3,412     5,616     45       24,711    1,785
x,y,h (m)     3,422     5,612     60       24,712    1,786
Q (kg)        0.5       1.1       0.0      4.0       0.0
h (m)         124.0     104.2     0.0      390.0     100.0
t (days)      0.0083    0.0373    0.0000   0.1667    0.0000
BARD: Search Time
~ 3 minutes to consider 200,000 release
scenarios in searching for an outbreak in the
Pittsburgh metropolitan area
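As a quick back-of-the-envelope check on what these figures imply per scenario (treating "~3 minutes" as exactly 180 seconds, which is an approximation):

```python
search_seconds = 3 * 60   # "~3 minutes", taken as exactly 180 s
scenarios = 200_000       # release scenarios considered

ms_per_scenario = search_seconds / scenarios * 1000.0
scenarios_per_second = scenarios / search_seconds

print(f"{ms_per_scenario:.2f} ms per scenario")
print(f"{scenarios_per_second:.0f} scenarios per second")
```

That works out to a little under one millisecond per release scenario, or on the order of a thousand scenarios per second.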
Summary
Bayesian biosurveillance
• has a number of attractive qualities
• has been implemented in several algorithms
• is practical
• has many unexplored, promising directions for
future work
Acknowledgments
This research was supported by the
National Science Foundation,
the Pennsylvania Department of Health,
the Department of Homeland Security,
DARPA, and the Centers for Disease
Control and Prevention.
Additional Information
• Bayesian Biosurveillance Project:
www.cbmi.pitt.edu/panda
• Real-Time Outbreak and Disease
Surveillance (RODS) Laboratory:
rods.health.pitt.edu
• Greg Cooper: [email protected]