
Putting non-parametric methods
in the service of public health
Seydou Doumbia, MD, PhD,
Professor of Epidemiology, Department of Public Health
& Deputy Director of NIAID/NIH Research Program at
Malaria Research & Training Center,
Faculty of Medicine, University of Bamako, Mali
INTRODUCTION
• Importance of forecasting
• "Life has to be lived forward but can only be understood backward"
• The basic and ultimate purpose of forecasting is to predict what will happen in the near term in order to avoid substantial cost or loss
• The cost of poor prediction may be the loss of soldiers in war, jobs in the economy, or profit in business
• With informed opinions on future probabilities, the planner can mobilize and deploy the necessary resources and reduce the substantial cost of miscalculation
Introduction (CONTINUED)
Predicting infectious diseases can maximize intervention impact and minimize cost.
[Diagram: cost-benefit = outcome (predicted or measured) relative to financial + non-financial costs]
• The cost-benefit of an epidemiological intervention may be measured a posteriori or estimated a priori.
• Optimal predictions may improve outcomes.
• There are myriad predictive approaches in statistical and mathematical epidemiology, ranging in complexity and generalizability.
• Most approaches are parametric and hence often difficult to optimize, disease-specific, sensitive to outliers, and setting-dependent.
• A toolbox encapsulating general-purpose approaches, applicable to different diseases and settings, is needed.
• Thus, let’s discuss a few unorthodox predictive approaches that may become part of such a toolbox.
Predicting infectious diseases
•Endemic, meso-endemic, or epidemic
•Multivariate or univariate requirements
•Temporally or spatio-temporally extended
General-purpose methods
•Disease-independent
•Easily operated
•Versatile
•Adaptable
Unorthodox approaches
•Non-parametric methods
•Fuzzy logic methods
•Artificial intelligence
Example 1: Non-parametric approach
Exponential smoothing methods:
•Econometric tradition (e.g., inventory control)
•Capture non-linearity in endemic and meso-endemic time-series (climates, geography, demography)
•Learn from experience (adapt to time-series perturbations)
•Usually univariate, yet covariates may be introduced
District of Niono, Mali:
•Meso-endemic time-series: diarrhea, acute respiratory infection (ARI), malaria
•Endemic time-series: schistosomiasis
•Suboptimal for epidemic time-series
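The Niono forecasts use a seasonal multiplicative Holt-Winters method, and the core recursions can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several choices here are assumptions:
- the function name `holt_winters_multiplicative`;
- the simple two-season initialisation;
- the fixed smoothing constants `alpha`, `beta` and `gamma`, which would normally be optimised on the data.

Observations must be strictly positive because seasonality enters as a ratio. Monthly consultation rates would use `m = 12`; the toy series uses `m = 4`.

```python
def holt_winters_multiplicative(y, m, alpha, beta, gamma, h):
    """Seasonal multiplicative Holt-Winters (additive trend): h-step-ahead point forecasts."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Assumed initialisation from the first two seasons (not taken from the paper).
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    level = mean1
    trend = (mean2 - mean1) / m
    seasonal = [y[i] / mean1 for i in range(m)]  # one multiplicative index per season slot
    for t in range(m, len(y)):
        s = seasonal[t % m]
        previous_level = level
        # Level: deseasonalised observation blended with the previous one-step projection.
        level = alpha * (y[t] / s) + (1 - alpha) * (level + trend)
        # Trend: smoothed change in level.
        trend = beta * (level - previous_level) + (1 - beta) * trend
        # Seasonal index: observation/level ratio blended with the old index.
        seasonal[t % m] = gamma * (y[t] / level) + (1 - gamma) * s
    n = len(y)
    # The forecast k steps past the last observation reuses the matching seasonal index.
    return [(level + k * trend) * seasonal[(n - 1 + k) % m] for k in range(1, h + 1)]


toy = [10, 20, 30, 20] * 4  # hypothetical, perfectly periodic series
print(holt_winters_multiplicative(toy, m=4, alpha=0.3, beta=0.1, gamma=0.2, h=4))
```

Each update blends the newest observation with the previous estimate. This is how the method "learns from experience": a perturbation shifts the level and seasonal indices by a fraction set by the smoothing constants, with no full refit.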
Irrigation system and stagnant water
reservoirs in the district of Niono, Mali.
Observed diarrhea consultation rate time-series are depicted as black lines while red and blue traces
correspond to contemporaneous 2- and 3-month horizon forecasts, respectively; their 95% prediction
interval bounds are symbolized by dots of the same colors. Forecasts and prediction interval bounds are
calculated with a bootstrap-coupled seasonal multiplicative Holt-Winters method. Panel A: 0–11 months;
Panel B: 1–4 years; Panel C: 5–15 years; and, Panel D: >15 years. Medina DC et al. (2007) Forecasting
Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria Time-Series in Niono, Mali. PLoS ONE
2(11): e1181.
Observed ARI consultation rate time-series are depicted as black lines while red and blue traces
correspond to contemporaneous 2- and 3-month horizon forecasts, respectively; their 95%
prediction interval bounds are symbolized by dots of the same colors. Forecasts and prediction
interval bounds are calculated with a bootstrap-coupled seasonal multiplicative Holt-Winters
method. Panel A: 0–11 months; Panel B: 1–4 years; Panel C: 5–15 years; and, Panel D: >15 years.
Medina DC et al. (2007) Forecasting Non-Stationary Diarrhea, Acute Respiratory Infection, and
Malaria Time-Series in Niono, Mali. PLoS ONE 2(11): e1181.
Observed malaria consultation rate time-series are depicted as black lines while red and blue traces
correspond to contemporaneous 2- and 3-month horizon forecasts, respectively; their 95% prediction
interval bounds are symbolized by dots of the same colors. Forecasts and prediction interval bounds are
calculated with a bootstrap-coupled seasonal multiplicative Holt-Winters method. Panel A: 0–11 months;
Panel B: 1–4 years; Panel C: 5–15 years; and, Panel D: >15 years. Medina DC et al. (2007) Forecasting
Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria Time-Series in Niono, Mali. PLoS ONE
2(11): e1181.
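One simple way to couple a bootstrap with such point forecasts is to resample past forecast errors and read the 95% bounds off the empirical percentiles. The sketch below illustrates that general technique and is not the exact scheme of Medina et al. (2007). Its assumptions:
- the helper names are hypothetical;
- errors are treated as multiplicative, to match the seasonal multiplicative model;
- it uses in-sample one-step errors, whereas an h-month horizon would ideally use h-step errors.

```python
import random


def relative_errors(observed, fitted):
    """Multiplicative errors (observed / fitted - 1) from an in-sample fit."""
    return [o / f - 1 for o, f in zip(observed, fitted)]


def bootstrap_interval(point_forecast, errors, b=1000, level=0.95, seed=1):
    """Percentile prediction interval from b resampled multiplicative errors."""
    rng = random.Random(seed)
    # Each bootstrap draw rescales the point forecast by one past error, sampled with replacement.
    draws = sorted(point_forecast * (1 + rng.choice(errors)) for _ in range(b))
    lower = draws[int((1 - level) / 2 * (b - 1))]
    upper = draws[int((1 + level) / 2 * (b - 1))]
    return lower, upper


errs = relative_errors([98, 110, 87, 105, 93], [100, 100, 100, 100, 100])  # hypothetical fit
print(bootstrap_interval(120.0, errs))
```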
Thus, the forecast accuracy of SA3 degrades faster than
that of the multiplicative Holt-Winters (MHW) method as
the forecast horizon increases.
Medina DC et al. (2007) Forecasting Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria
Time-Series in Niono, Mali. PLoS ONE 2(11): e1181.
Observed Schistosoma haematobium consultation rate time-series in the district of Niono, Mali, are
depicted as black lines in this composite panel, while red traces correspond to contemporaneous h-month horizon forecasts; 95% prediction interval bounds are symbolized by red dots.
Forecasts were generated with exponential smoothing (ES) methods, which are encapsulated within
the state-space forecasting framework. Panels A, B, C, and D correspond to 2-, 3-, 4-, and 5-month
horizon forecasts, respectively. Medina DC et al. (2008) State–Space Forecasting of Schistosoma
haematobium Time-Series in Niono, Mali. PLoS Negl Trop Dis 2(8): e276.
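The state-space framework writes each exponential smoothing method as an observation equation plus state equations, all driven by the same one-step error. The sketch below shows the simplest member: no trend, no seasonality, additive errors. It only illustrates the framework and is not the model selected for the Schistosoma haematobium series; the function names are hypothetical.

```python
def ses_innovations(y, alpha, initial_level):
    """Simple exponential smoothing in state-space (innovations) form.

    Observation: y_t = level_{t-1} + e_t
    State:       level_t = level_{t-1} + alpha * e_t
    """
    level = initial_level
    innovations = []
    for obs in y:
        e = obs - level            # one-step forecast error (innovation)
        innovations.append(e)
        level = level + alpha * e  # the level absorbs a fraction alpha of each error
    return level, innovations


def forecast(level, h):
    """Without trend or seasonality, every h-step forecast equals the final level."""
    return [level] * h


final_level, errs = ses_innovations([12.0, 15.0, 11.0, 14.0], alpha=0.4, initial_level=12.0)
print(forecast(final_level, 3))
```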
Mean absolute percentage error (MAPE) values between Schistosoma haematobium time-series
observations for the district of Niono, Mali, and their corresponding h-month horizon forecasts measure
external accuracy. MAPE values for 1–5-month horizon forecasts were approximately 25%. Therefore, this panel
demonstrates that forecast accuracy is reasonable for short horizons. Of note, MAPE assesses the skill
of h-month horizon forecasts. Medina DC et al. (2008) State–Space Forecasting of Schistosoma
haematobium Time-Series in Niono, Mali. PLoS Negl Trop Dis 2(8): e276.
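MAPE averages the absolute forecast errors, each expressed as a percentage of the corresponding observation. A minimal sketch follows; the function name is hypothetical. MAPE is undefined when an observation is zero, which matters for low-count months.

```python
def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("need two non-empty series of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observation is zero")
    return 100 * sum(abs((o - f) / o) for o, f in zip(observed, forecast)) / len(observed)


print(mape([100, 200], [75, 250]))  # 25.0
```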
Example 2: Knowledge-driven approach
Fuzzy logic functions (e.g., trigonometric, weighted):
•Engineering tradition
•Assign membership to an item with different
degrees of certainty
•Knowledge- and/or data-driven
•Capture non-linearity (climates, geography, demography)
•Learn from experience
•Usually multivariate
•Optimal for spatially extended systems with scarce data
African continent:
•Rift Valley fever
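Fuzzy membership functions map a raw factor value onto a degree of membership between 0 (not a member) and 1 (full member). The suitability maps then rescale this to 0–255. The sketch below shows a linear and a trigonometric (cosine-squared, sigmoidal) increasing function between two control points. The function names and control points are hypothetical. The actual factors, functions, and thresholds used for Rift Valley fever are those of Clements et al. (2006) and are not reproduced here.

```python
import math


def linear_membership(x, a, b):
    """Monotonically increasing linear membership: 0 at or below a, 1 at or above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def sigmoidal_membership(x, a, b):
    """Monotonically increasing cosine-squared (trigonometric) membership between a and b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    angle = (1 - (x - a) / (b - a)) * math.pi / 2  # pi/2 at a, 0 at b
    return math.cos(angle) ** 2


def to_suitability_score(mu):
    """Rescale a 0-1 membership to the 0-255 suitability scale of the maps."""
    return round(255 * mu)


# Hypothetical factor: membership rises from 0 at 50 units to 1 at 150 units.
for value in (40, 75, 100, 125, 160):
    print(value,
          to_suitability_score(linear_membership(value, 50, 150)),
          to_suitability_score(sigmoidal_membership(value, 50, 150)))
```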
Endemic suitability map for Rift Valley fever
in Africa based on ordered weighted
averages analysis. Suitability scores range
from 0 (completely unsuitable) to 255
(completely suitable). Clements et al.
International Journal of Health Geographics
2006 5:57
Epidemic suitability map for Rift Valley
fever in Africa based on ordered weighted
averages analysis. Suitability scores range
from 0 (completely unsuitable) to 255
(completely suitable). Clements et al.
International Journal of Health Geographics
2006 5:57
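Ordered weighted averaging assigns weights to ranks rather than to named factors. For each location the factor scores are sorted, and the first weight goes to the highest score. The choice of order weights sets the attitude to risk:
- skewed towards the highest scores, the result is optimistic and OR-like;
- skewed towards the lowest scores, it is pessimistic and AND-like;
- equal weights give the plain mean.

The sketch implements Yager's basic OWA operator only. GIS implementations of OWA commonly combine factor weights with the order weights; that step is omitted here, and the example scores are hypothetical.

```python
def owa(scores, order_weights):
    """Yager's ordered weighted average: weights attach to ranks, not to factors."""
    if len(scores) != len(order_weights):
        raise ValueError("one order weight per factor is required")
    if abs(sum(order_weights) - 1) > 1e-9:
        raise ValueError("order weights must sum to 1")
    ranked = sorted(scores, reverse=True)  # the highest score receives the first weight
    return sum(w * s for w, s in zip(order_weights, ranked))


pixel = [200, 120, 40]  # hypothetical standardised factor scores (0-255) at one location
print(owa(pixel, [1, 0, 0]))        # maximum: most optimistic (OR-like)
print(owa(pixel, [0, 0, 1]))        # minimum: most pessimistic (AND-like)
print(owa(pixel, [1/3, 1/3, 1/3]))  # plain mean: full trade-off
```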
Overlay of observed serological
prevalence and estimated endemic
suitability for Rift Valley fever in
Senegal (ruminant). Suitability
estimates were derived using weighted
linear combination. Clements et al.
International Journal of Health
Geographics 2006 5:57
Overlay of observed serological
prevalence and estimated epidemic
suitability for Rift Valley fever in
Senegal (ruminant). Suitability estimates
were derived using weighted linear
combination. Clements et al. International
Journal of Health Geographics 2006 5:57
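Weighted linear combination is the simpler rule: each standardised factor score is multiplied by a factor weight, and the products are summed. A high score on one factor can therefore compensate for a low score on another. A minimal sketch follows; the factor names, scores, and weights are hypothetical and are not those of Clements et al. (2006).

```python
def weighted_linear_combination(scores, weights):
    """Weighted sum of standardised factor scores; factor weights must sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("each factor needs exactly one weight")
    if abs(sum(weights.values()) - 1) > 1e-9:
        raise ValueError("factor weights must sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical factors already standardised to the 0-255 scale.
scores = {"factor_a": 180, "factor_b": 220, "factor_c": 90}
weights = {"factor_a": 0.5, "factor_b": 0.3, "factor_c": 0.2}
print(weighted_linear_combination(scores, weights))
```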
Example 3: Artificial Intelligence approach
Support vector machines:
•Artificial intelligence tradition: kernel methods, support vector machines (regression, classification, anomaly
detection), neural networks
•Solve problems for which an analytical treatment is lacking or
intractable
•Capture non-linearity (climates, geography, demography)
•Learn from experience
•Univariate or multivariate
•Temporally or spatio-temporally extended
Support Vector Regression (SVR):
•Kernel-based → transforms the data set into a linear space
•Large data sets → automatic regularization
•Highly generalizable
Support vector machines rely on
a kernel that transforms non-linear
input data into a high-dimensional
feature space, where simple linear
regression can be executed.
The output is always in the
original dimension.
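The kernel idea can be checked numerically. For two-dimensional inputs, the degree-2 polynomial kernel (x·z + 1)^2 gives the same number as an ordinary dot product taken after mapping each input into a six-dimensional feature space. A linear model in that space can therefore be fitted using kernel evaluations alone, and SVR relies on this property. The kernel choice here is illustrative; the source does not state which kernel was used for the Somalia estimates. The function names are hypothetical.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2, computed in the 2-D input space."""
    return (x[0] * z[0] + x[1] * z[1] + 1) ** 2


def poly2_features(x):
    """Explicit 6-D feature map whose dot product reproduces poly2_kernel."""
    r2 = math.sqrt(2)
    return [1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1]]


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z), dot(poly2_features(x), poly2_features(z)))
```

Predictions from a kernel model are weighted sums of such kernel values. The fitted output is therefore a value on the original scale, even though the fit happens in the larger feature space.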
Somalia:
•Ruminant IgG seroprevalence
•Two-stage cluster-randomized serological
survey
•Spatial estimates with SVR
•Built-in bootstrap for
dispersion estimation
Figure 8. Spatial ruminant serological prevalence. Centrality and dispersion were calculated
via B = 100 ordinary bootstraps of multivariate observations, SVR-based spatially resolved
prevalence estimation for each resample, and finally computation of the appropriate order statistics: A)
median, B) maximum, C) IQR, and D) minimum. Courtesy of Daniel Medina.
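The bootstrap summary in Figure 8 follows a resample, estimate, summarise pattern. In the sketch below, a plain seroprevalence proportion stands in for the SVR-based spatial estimator; this is an assumption made to keep the example self-contained. Each resample therefore yields a single number instead of a map. With a map, the same order statistics would be taken pixel by pixel. The function names and data are hypothetical.

```python
import random
import statistics


def prevalence(records):
    """Hypothetical stand-in estimator: share of seropositive animals (1 = positive)."""
    return sum(records) / len(records)


def bootstrap_summary(records, estimator, b=100, seed=0):
    """Ordinary bootstrap: b resamples with replacement, then order statistics."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(b):
        resample = [rng.choice(records) for _ in records]  # same size as the original
        estimates.append(estimator(resample))
    q1, _, q3 = statistics.quantiles(estimates, n=4)  # quartiles of the b estimates
    return {
        "median": statistics.median(estimates),
        "maximum": max(estimates),
        "iqr": q3 - q1,
        "minimum": min(estimates),
    }


survey = [1] * 30 + [0] * 70  # hypothetical: 30 positives out of 100 animals
print(bootstrap_summary(survey, prevalence, b=100))
```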
Conclusion
1. Non-parametric approaches may be applied to multiple
diseases and settings without parametric disadvantages
such as multicollinearity and sensitivity to outliers.
2. Although non-parametric approaches resemble a "black box", they are robust, simple to interpret, and
easily optimized.
3. Fuzzy logic is ideal for spatially extended areas where
transmission is epidemic and/or data are scarce, thus
minimizing data collection needs.
4. The general-purpose nature of non-parametric, fuzzy
logic, and artificial intelligence approaches implies that
studies of multiple diseases and sites could be more
readily compared.
5. Adequate predictions maximize intervention impact and
minimize costs.
Acknowledgements
Thanks to the organizers, participants, Malaria
Research & Training Center, Mali; Columbia
University, US; and the District Hospital of
Niono, Mali.