Building the Model

Download Report

Transcript Building the Model

Challenges for Discrete Mathematics
and Theoretical Computer Science
in Defense Against Bioterrorism
Great concern about the deliberate introduction of
diseases by bioterrorists has led to new challenges
for mathematical scientists.
smallpox
Dealing with bioterrorism requires detailed
planning of preventive measures and responses.
Both require precise reasoning and extensive
analysis.
Understanding infectious systems requires being
able to reason about highly complex biological
systems, with hundreds of demographic and
epidemiological variables.
Intuition alone is insufficient to fully understand
the dynamics of such systems.
Experimentation or field trials are often
prohibitively expensive or unethical and do not
always lead to fundamental understanding.
Therefore, mathematical modeling becomes an
important experimental and analytical tool.
Mathematical models have become important tools
in analyzing the spread and control of infectious
diseases and plans for defense against bioterrorist
attacks, especially when combined with powerful,
modern computer methods for analyzing and/or
simulating the models.
What Can Math Models Do For Us?
What Can Math Models Do For Us?
•Sharpen our understanding of fundamental
processes
•Compare alternative policies and interventions
•Help make decisions.
•Prepare responses to bioterrorist attacks.
•Provide a guide for training exercises and
scenario development.
•Guide risk assessment.
•Predict future trends.
What are the challenges for mathematical scientists
in the defense against disease?
This question led DIMACS, the Center for Discrete
Mathematics and Theoretical Computer Science, to
launch a “special focus” on this topic.
Post-September 11 events soon led to an emphasis
on bioterrorism.
DIMACS Special Focus on
Computational and Mathematical
Epidemiology 2002-2005
Anthrax
Methods of Math. and Comp. Epi.
Math. models of infectious diseases go back to
Daniel Bernoulli’s mathematical analysis of
smallpox in 1760.
Hundreds of math. models since have:
•highlighted concepts like core population in
STD’s;
•Made explicit concepts such as herd immunity
for vaccination policies;
•Led to insights about drug resistance, rate of
spread of infection, epidemic trends, effects of
different kinds of treatments.
The size and overwhelming complexity of modern
epidemiological problems -- and in particular the
defense against bioterrorism -- calls for new
approaches and tools.
The Methods of Mathematical
and Computational Epidemiology
•Statistical Methods
–long history in epidemiology
–changing due to large data sets involved
•Dynamical Systems
–model host-pathogen systems, disease spread
–difference and differential equations
–little systematic use of today’s powerful computational
methods
The Methods of Mathematical
and Computational Epidemiology
•Probabilistic Methods
–stochastic processes, random walks, percolation,
Markov chain Monte Carlo methods
–simulation
–need to bring in more powerful computational tools
Discrete Math. and Theoretical
Computer Science
• Many fields of science, in particular molecular
biology, have made extensive use of DM broadly
defined.
Discrete Math. and Theoretical
Computer Science Cont’d
•Especially useful have been those tools that make
use of the algorithms, models, and concepts of
TCS.
•These tools remain largely unused and unknown
in epidemiology and even mathematical
epidemiology.
DM and TCS Continued
•These tools are made especially relevant to
epidemiology because of:
–Geographic Information Systems
DM and TCS Continued
–Availability of large and disparate
computerized databases on subjects relating to
disease and the relevance of modern methods
of data mining.
DM and TCS Continued
–The increasing importance of an evolutionary
point of view in epidemiology and the relevance
of DM/TCS methods of phylogenetic tree
reconstruction.
Challenges for Discrete Math
and Theoretical Computer
Science in Bioterrorism Defense
What are DM and TCS?
DM deals with:
•arrangements
•designs
•codes
•patterns
•schedules
•assignments
TCS deals with the theory of computer algorithms.
During the first 30-40 years of the computer age,
TCS, aided by powerful mathematical methods,
especially DM, probability, and logic, had a direct
impact on technology, by developing models, data
structures, algorithms, and lower bounds that are
now at the core of computing.
DM and TCS have found extensive use in many
areas of science and public policy, for example in
Molecular Biology.
These tools, which seem especially relevant to
problems of epidemiology, are not well known to
those working on public health problems.
So How are DM/TCS Relevant to the Fight
Against Bioterrorism?
1. Detection/Surveillance
1a. Streaming Data Analysis:
•When you only have one shot at the data
•Widely used to detect trends and sound alarms in
applications in telecommunications and finance
•AT&T uses this to detect fraudulent use of credit
cards or impending billing defaults
•Columbia has developed methods for detecting
fraudulent behavior in financial systems
•Uses algorithms based in TCS
•Needs modification to apply to disease detection
Research Issues:
•Modify methods of data collection,
transmission, processing, and visualization
•Explore use of decision trees, vector-space
methods, Bayesian and neural nets
•How are the results of monitoring systems best
reported and visualized?
•To what extent can they incur fast and safe
automated responses?
•How are relevant queries best expressed, giving
the user sufficient power while implicitly
restraining him/her from incurring unwanted
computational overhead?
1b. Cluster Analysis
•Used to extract patterns from complex data
•Application of traditional clustering algorithms
hindered by extreme heterogeneity of the data
•Newer clustering methods based on TCS for
clustering heterogeneous data need to be modified
for infectious disease and bioterrorist applications.
1c. Visualization
•Large data sets are sometimes best understood by
visualizing them.
1c. Visualization (continued)
•Sheer data sizes require new visualization
regimes, which require suitable external memory
data structures to reorganize tabular data to
facilitate access, usage, and analysis.
•Visualization algorithms become harder when data
arises from various sources and each source
contains only partial information.
1d. Data Cleaning
•Disease detection problem: Very “dirty” data:
1d. Data Cleaning (continued)
•Very “dirty” data due to
–manual entry
–lack of uniform standards for content and formats
–data duplication
–measurement errors
•TCS-based methods of data cleaning
–duplicate removal
–“merge purge”
–automated detection
1e. Dealing with “Natural Language”
Reports
•Devise effective methods for translating natural
language input into formats suitable for analysis.
•Develop computationally efficient methods to
provide automated responses consisting of followup questions.
•Develop semi-automatic systems to generate
queries based on dynamically changing data.
1f. Cryptography and Security
•Devise effective methods for protecting privacy of
individuals about whom data is provided to
biosurveillance teams -- data from emergency dept.
visits, doctor visits, prescriptions
•Develop ways to share information between
databases of intelligence agencies while protecting
privacy?
1f. Cryptography and Security (continued)
•Specifically: How can we make a simultaneous
query to two datasets without compromising
information in those data sets? (E.g., is individual
xx included in both sets?)
•Issues include:
–insuring accuracy and reliability of responses
–authentication of queries
–policies for access control and authorization
2. Social Networks
•Diseases are often spread through social contact.
•Contact information is often key in controlling an
epidemic, man-made or otherwise.
•There is a long history of the use of DM tools in
the study of social networks: Social networks as
graphs.
2a. Spread of Disease through a Network
•Dynamically changing networks: discrete times.
•Nodes (individuals) are infected or non-infected
(simplest model).
•An individual becomes infected at time t+1 if
sufficiently many of its neighbors are infected at
time t. (Threshold model)
•Analogy: saturation models in economics.
•Analogy: spread of opinions through social
networks.
Complications and Variants
•Infection only with a certain probability.
•Individuals have degrees of immunity and
infection takes place only if sufficiently many
neighbors are infected and degree of immunity is
sufficiently low.
•Add recovered category.
•Add levels of infection.
•Markov models.
•Dynamic models on graphs related to neural nets.
Research Issues:
•What sets of vertices have the property that their
infection guarantees the spread of the disease to x%
of the vertices?
•What vertices need to be “vaccinated” to make
sure a disease does not spread to more than x% of
the vertices?
•How do the answers depend upon network
structure?
•How do they depend upon choice of threshold?
These Types of Questions Have Been
Studied in Other Contexts Using DM/TCS
2b. Distributed Computing:
2b. Distributed Computing (continued):
•Eliminating damage by failed processors -- when a
fault occurs, let a processor change state if a
majority of neighbors are in a different state or if
number is above threshold.
•Distributed database management.
•Quorum systems.
•Fault-local mending.
2c. Spread of Opinion
2c. Spread of Opinion
•Of relevance to bioterrorism.
•Dynamic models of how opinions spread through
social networks.
•Your opinion changes at time t+1 if the number of
neighboring vertices with the opposite opinion at
time t exceeds threshold.
•Widely studied.
•Relevant variants: confidence in your opinion (=
immunity); probabilistic change of opinion.
3. Evolution
3. Evolution (continued)
•Models of evolution might shed light on new
strains of infectious agents used by bioterrorists.
•New methods of phylogenetic tree reconstruction
owe a significant amount to modern methods of
DM/TCS.
• Phylogenetic analysis might help in identification
of the source of an infectious agent.
3a. Some Relevant Tools of DM/TCS
•Information-theoretic bounds on tree
reconstruction methods.
•Optimal tree refinement methods.
•Disk-covering methods.
•Maximum parsimony heuristics.
•Nearest-neighbor-joining methods.
•Hybrid methods.
•Methods for finding consensus phylogenies.
3b. New Challenges for DM/TCS
•Tailoring phylogenetic methods to describe the
idiosyncracies of viral evolution -- going beyond a
binary tree with a small number of
contemporaneous species appearing as leaves.
•Dealing with trees of thousands of vertices, many
of high degree.
•Making use of data about species at internal
vertices (e.g., when data comes from serial
sampling of patients).
•Network representations of evolutionary history if recombination has taken place.
3b. New Challenges for DM/TCS:
Continued
•Modeling viral evolution by a collection of trees -to recognize the “quasispecies” nature of viruses.
•Devising fast methods to average the quantities of
interest over all likely trees.
4. Decision Making/Policy
Analysis
4. Decision Making/Policy
Analysis (continued)
•DM/TCS have a close historical connection with
mathematical modeling for decision making and
policy making.
•Mathematical models can help us:
–understand fundamental processes
–compare alternative policies and interventions
–provide a guide for scenario development
–guide risk assessment
–aid forensic analysis
–predict future trends
4a. Consensus
•DM/TCS fundamental to theory of group decision
making/consensus
•Based on fundamental ideas in theory of “voting”
and “social choice”
•Key problem: combine expert judgments (e.g.,
rankings of alternatives) to make policy
4a. Consensus Continued
•Prior application to biology (Bioconsensus):
–Find common pattern in library of molecular
sequences
–Find consensus phylogeny given alternative
phylogenies
•Developing algorithmic view in consensus theory:
fast algorithms for finding the consensus policy
•Special challenge re bioterrorism/epidemiology:
instead of many “decision makers” and few
“candidates,” could be few decision makers and
many candidates (lots of different parameters to
modify)
4b. Decision Science
•Formalizing utilities and costs/benefits.
•Formalizing uncertainty and risk.
•DM/TCS aid in formalizing optimization
problems and solving them: maximizing utility,
minimizing pain, …
•Bringing in DM-based theory of meaningful
statements and meaningful statistics.
•Some of these ideas virtually unknown in public
health applications.
•Challenges are primarily to apply existing tools to
new applications.
4c. Game Theory
4c. Game Theory (continued)
•History of use in military decision making
•Relevant to conflicts: bioterrorism
•DM/TCS especially relevant to multi-person
games
•Of use in allocating scarce resources to different
players or different components of a
comprehensive policy.
•New algorithmic point of view in game theory:
finding efficient procedures for computing the
winner or the appropriate resource allocation.
5. Operations Research
•O.R. a traditional tool in defense.
•Many applications in planning defense against
attacks by bioterrorists.
•Methods of Discrete Optimization/Queueing
relevant to:
–size of stockpiles of vaccines
–allocation of medications
–analysis of bottlenecks in treatment facilities
5. Operations Research (continued)
Challenges are not primarily development of new
methods, but modification of existing O.R.
methods to apply to new contexts.
6. Some Additional Relevant
DM/TCS Topics
6a. Order-Theoretic Concepts:
•Relevance of partial orders and lattices.
•The exposure set (set of all subjects whose
exposure levels exceed some threshold) is a
common construction in dimension theory of
partial orders.
•Point lattices may be useful for visualizing the
relationships of contigency tables to effect
measures and cut-off choices.
6b. Combinatorial Group Testing
•Natural or human-induced epidemics might
require us to test samples from large populations at
once.
•Combinatorial group testing arose from need for
mathematical methods to test millions of WWII
draftees for syphilis.
•Identify all positive cases in large population by:
–dividing items into subsets
–testing if subset has at least one positive item
–iterating by dividing into smaller groups.
Would DM/TCS help with a
deliberate outbreak of Anthrax?
What about a deliberate release of
smallpox?
Similar approaches, using mathematical models
based in DM/TCS, have proven useful in many
other fields, to:
•make policy
•plan operations
•analyze risk
•compare interventions
•identify the cause of observed events
Why shouldn’t these approaches work in the
defense against bioterrorism?