26th International Conference on Software Engineering and Knowledge Engineering
SEKE 2014, Hyatt Regency, Vancouver, Canada
Artificial neural networks for infectious diarrhea
prediction using meteorological factors in Shanghai
Yongming Wang, Junzhong Gu and Zili Zhou
Department of Computer Science & Technology, East China Normal University
Institute of Computer Applications, Shanghai, China
E-mail: [email protected]
http://www.ica.stc.sh.cn
OUTLINE
• Introduction
• Study area and dataset
• Prediction method and performance metrics
• Development of FFBPNN model
  – Input and output parameters
  – Data pre-processing and post-processing
  – Determination of optimum network and parameters
• Development of MLR model
• Experiment results and discussion
• Sensitivity analyses
• Conclusions
Introduction
Infectious diarrhea is a common and important infectious disease that
poses a serious threat to human health, causing one billion disease
episodes and 1.8 million deaths each year (WHO, 2008).
In Shanghai, a city in China (the largest developing country), the
incidence of infectious diarrhea is strongly seasonal and has been
particularly high in summer and autumn in recent years.
Hence, a robust short-term forecasting model for infectious diarrhea
incidence is necessary for decision-making in policy and public health.
Introduction
Infectious diseases are closely related to meteorological factors such as
temperature and rainfall, which can affect them in a linear or nonlinear
fashion. In recent years, there has been extensive scientific and public
debate on climate change and its direct and indirect effects on human
health.
Many forecasting models for diarrheal diseases based on statistical
methods have been reported in the literature.
However, because many meteorological factors affect infectious diarrhea
and the inter-relations among them are complicated, prediction models
based on statistical methods may not be fully suitable for this type of
problem.
Introduction
Nowadays, Artificial Neural Networks (ANNs) are regarded as intelligent
tools for understanding complex problems and have been widely used in
the medical and health field. To the best of the authors' knowledge, no
work has been carried out to apply ANNs to predicting diarrheal disease.
Contribution: Establish a new ANN model (FFBPNN) to predict
infectious diarrhea in Shanghai with a set of meteorological factors as
predictors.
Study area and Dataset-Study area
Shanghai is located in eastern China, the largest developing country in
the world. The city has a mild subtropical climate with four distinct
seasons and abundant rainfall. It is the most populous city in China,
comprising urban/suburban districts and counties, with a total area of
6,340.5 square kilometers and a population of more than 25.0 million at
the end of 2013.
Study area and dataset-dataset
Figure: The infectious diarrhea cases for the period 2005.1.3-2009.1.4
(y-axis: Weekly number of infectious diarrhea; x-axis: Time (week)).
Study area and dataset-dataset
Figure: The meteorological factors data for the period 2005.1.3-2009.1.4.
Nine panels, (a) to (i), each plotted against Time (week):
• Weekly average maximum temperature
• Weekly average minimum temperature
• Weekly average temperature
• Weekly average minimum relative humidity
• Weekly average relative humidity
• Weekly average atmospheric pressure
• Weekly average sunshine duration
• Weekly average wind speed
• Weekly average rainfall
Method and performance metrics
Figure: The schematic flowchart of the proposed method.
Step 1: Data collection
Step 2: Data pre-processing
Step 3: Data mining
Box labels in the flowchart: Dataset, Data gathering, Data collecting,
Data normalizing, Data calculating, Pre-processing, Data mining, Models
development, Models testing and comparing, Prediction model.
Method and performance metrics
\[
y = f(x_i) = w_0 + \sum_{j=1}^{m} w_j \, \varphi\!\left( \sum_{i=1}^{n} v_{ij} x_i + b_j \right) + b
\]
Three-layered feed-forward back-propagation artificial neural network
model.
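Method and performance metrics
\[
\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|
\]
\[
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 }
\]
\[
\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%
\]
\[
R = \left[ 1 - \frac{\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}{\sum_{t=1}^{n} \left( y_t \right)^2} \right]^{1/2}
\]
\[
R^2 = 1 - \frac{\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}{\sum_{t=1}^{n} \left( y_t - \bar{y} \right)^2}
\]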
The models with the smallest RMSE, MAE and MAPE and the largest R
and R2 are considered to be the best models.
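As an illustration of how these criteria are evaluated, the following is a minimal sketch of MAE, RMSE, MAPE and R2 in their standard forms, where y is the observed series and y_hat the predicted one. The function names and the three-point toy series are assumptions made for this example; they are not data from the study.

```python
import numpy as np


def _as_arrays(y, y_hat):
    """Convert both series to float arrays of equal length."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    if y.shape != y_hat.shape:
        raise ValueError("observed and predicted series must have the same shape")
    return y, y_hat


def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = _as_arrays(y, y_hat)
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = _as_arrays(y, y_hat)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    y, y_hat = _as_arrays(y, y_hat)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)


def r2(y, y_hat):
    """Coefficient of determination: 1 - SSE / total sum of squares."""
    y, y_hat = _as_arrays(y, y_hat)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - sse / sst)


# Toy series (hypothetical, for illustration only).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(mae(observed, predicted), rmse(observed, predicted),
      mape(observed, predicted), r2(observed, predicted))
```

Note that MAPE is undefined for weeks with zero observed cases, so a real weekly series would need those weeks handled separately.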
Development of FFBPNN model
The FFBPNN modeling consists of two steps:
--- Train the network using the training dataset
    – Model input and output parameters
    – Data pre-processing and post-processing
    – Determination of optimum network and parameters
--- Test the network with the testing dataset
Figure: Hidden neurons and network errors.
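The slides do not state which scaling was used for pre-processing and post-processing. As one possible reading, the sketch below assumes min-max scaling of every variable to [-1, 1] (a common choice with a tanh-type hidden layer) and the inverse mapping for turning network outputs back into case counts. The range, the function names and the toy matrix are all assumptions.

```python
import numpy as np


def fit_minmax(x):
    """Return per-column minimum and maximum of a 2-D array."""
    x = np.asarray(x, dtype=float)
    return x.min(axis=0), x.max(axis=0)


def normalize(x, lo, hi, a=-1.0, b=1.0):
    """Pre-processing: map each column linearly from [lo, hi] to [a, b]."""
    x = np.asarray(x, dtype=float)
    return a + (x - lo) * (b - a) / (hi - lo)


def denormalize(z, lo, hi, a=-1.0, b=1.0):
    """Post-processing: map scaled values back to the original units."""
    z = np.asarray(z, dtype=float)
    return lo + (z - a) * (hi - lo) / (b - a)


# Hypothetical two-column example (e.g. a temperature and a humidity column).
raw = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
lo, hi = fit_minmax(raw)
scaled = normalize(raw, lo, hi)
restored = denormalize(scaled, lo, hi)
```

The minimum and maximum should be taken from the training data only and then reused for the testing data, so that the test set does not influence the scaling.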
Development of FFBPNN model
Parameters                           FFBPNN
Number of input layer units          9
Number of hidden layers              1
Number of hidden layer units         4
Number of output layer units         1
Momentum rate                        0.9
Learning rate                        0.74
Error after learning                 1e-6
Learning cycle                       1500 epochs
Transfer function in hidden layer    Tansig
Transfer function in output layer    Purelin
Training function                    TRAINGDM
The optimum model architecture and parameters for the diarrhea
prediction.
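To make the configuration concrete, here is a minimal NumPy sketch of a 9-4-1 network with a tanh hidden layer (standing in for Tansig), a linear output (Purelin) and batch gradient descent with momentum. It is not the authors' implementation: the update rule is assumed to follow the form velocity = mc * velocity - lr * (1 - mc) * gradient, the weight initialisation is arbitrary, and the training data are synthetic.

```python
import numpy as np


def init_params(n_in=9, n_hidden=4, seed=0):
    """Small random weights for a single-hidden-layer network (assumed init)."""
    rng = np.random.default_rng(seed)
    return {
        "V": rng.normal(0.0, 0.3, (n_in, n_hidden)),  # input -> hidden weights
        "b_h": np.zeros(n_hidden),                    # hidden biases
        "w": rng.normal(0.0, 0.3, n_hidden),          # hidden -> output weights
        "b_o": np.zeros(1),                           # output bias
    }


def forward(p, X):
    """Return hidden activations and the linear network output."""
    h = np.tanh(X @ p["V"] + p["b_h"])
    return h, h @ p["w"] + p["b_o"]


def train(p, X, t, lr=0.74, mc=0.9, epochs=1500, goal=1e-6):
    """Batch gradient descent with momentum on the loss 0.5 * MSE."""
    vel = {k: np.zeros_like(v) for k, v in p.items()}
    history = []
    for _ in range(epochs):
        h, y = forward(p, X)
        err = y - t
        mse = float(np.mean(err ** 2))
        history.append(mse)  # error measured before this epoch's update
        if mse <= goal:
            break
        # Back-propagate the output error through the tanh hidden layer.
        delta_h = (err[:, None] * p["w"][None, :]) * (1.0 - h ** 2)
        grads = {
            "V": X.T @ delta_h / len(X),
            "b_h": delta_h.mean(axis=0),
            "w": h.T @ err / len(X),
            "b_o": np.array([err.mean()]),
        }
        for k in p:
            vel[k] = mc * vel[k] - lr * (1.0 - mc) * grads[k]
            p[k] = p[k] + vel[k]
    return p, history


# Synthetic stand-in for nine scaled meteorological inputs (not study data).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (120, 9))
t = 0.5 * X[:, 0] - 0.3 * X[:, 1]
params, history = train(init_params(), X, t)
_, fitted = forward(params, X)
```

The defaults mirror the table (learning rate 0.74, momentum 0.9, 1500 epochs, error goal 1e-6); on real data the inputs and target would first be scaled, and the hidden layer size would be chosen by comparing errors across candidate sizes.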
Development of MLR model
WNID  1972.7903  10.9619Tmax
 20.8158Tmin  2.6208Tavg  1.6506 RH min
 0.2993RH avg  2.0902 APavg  5.7734 SD
 15.7205WS avg  1.6048 R
Dependent variable: diarrhea number (WNID)
Independent variables: meteorological factors
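A regression of this form can be fitted by ordinary least squares. The sketch below shows one way to do so with NumPy; the function names are hypothetical and the data are synthetic, generated from made-up coefficients, so the numbers bear no relation to the fitted equation on this slide.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, coef_1, ..., coef_k]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted linear model on new predictor rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic, noise-free data with nine predictors (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 9))
true_coef = np.array([3.0, 1.0, -2.0, 0.5, 0.0, 1.5, -1.0, 2.0, 0.25, -0.75])
y_demo = true_coef[0] + X_demo @ true_coef[1:]
coef_hat = fit_mlr(X_demo, y_demo)
y_fit = predict_mlr(coef_hat, X_demo)
```

Because the toy target is exactly linear, the fit recovers the generating coefficients; with real weekly counts the residuals would not vanish, and correlated predictors such as the three temperature series can make individual coefficients unstable.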
Results and discussion
PECs        FFBPNN                  MLR
            Training    Testing     Training    Testing
MAE         20.7628     27.7547     29.8077     35.3774
RMSE        28.3007     36.0526     39.3739     48.9395
MAPE (%)    27.27%      38.41%      43.37%      41.82%
R           0.8490      0.8089      0.6968      0.8783
R2          0.9213      0.9125      0.8811      0.8388
The better performance of the FFBPNN model compared with the MLR model
may be attributed to the complex nonlinear relationship between
infectious diseases and meteorological factors, which a linear
regression cannot represent.
Results and discussion
Figure: Comparison curves plot of actual vs. predicted trends for training
dataset. Two panels (FFBPNN and MLR), each plotting the actual and
predicted weekly number of infectious diarrhea against Time (week).
Results and discussion
Figure: Comparison scatter plot of actual vs. predicted values for training
dataset. Two panels (FFBPNN and MLR), each plotting predicted values
against actual values. Surviving fit annotation: y=0.83x+17, R2=0.9385.
Results and discussion
Figure: Comparison curves plot of actual vs. predicted trends for testing
dataset. Two panels (FFBPNN and MLR), each plotting the actual and
predicted weekly number of infectious diarrhea against Time (week).
Results and discussion
Figure: Comparison scatter plot of actual vs. predicted values for testing
dataset. Two panels (FFBPNN and MLR), each plotting predicted values
against actual values. Fit annotations: y=0.54x+39 and y=0.68x+28;
R2=0.9125 and R2=0.8388.
Sensitivity analyses
Diagram labels: Meteorological factor, ANNs (black-box), Infectious diarrhea
Sensitivity analysis (Cosine Amplitude Method)
\[
r_{ij} = \frac{\sum_{k=1}^{m} x_{ik} x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^2 \, \sum_{k=1}^{m} x_{jk}^2}}
\]
Sensitivity analyses
Most effective meteorological factor: temperature
Least effective meteorological factor: average rainfall
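The cosine amplitude method named in the sensitivity analysis can be sketched as follows: each pair of series (for example one meteorological factor and the weekly case count) is treated as a pair of m-dimensional vectors, and the strength of their relation is the cosine of the angle between them. The function name and the toy vectors are assumptions for illustration.

```python
import numpy as np


def cosine_amplitude(x_i, x_j):
    """Cosine amplitude r_ij between two equal-length series."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    if x_i.shape != x_j.shape:
        raise ValueError("series must have the same length")
    denom = np.sqrt(np.sum(x_i ** 2) * np.sum(x_j ** 2))
    return float(np.sum(x_i * x_j) / denom)


# Toy vectors (hypothetical): proportional series and orthogonal series.
r_proportional = cosine_amplitude([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_orthogonal = cosine_amplitude([1.0, 0.0], [0.0, 1.0])
```

Values close to 1 indicate a strong relation between the two series and values close to 0 a weak one, so ranking the factors by their r value against the case count gives an ordering from most to least effective.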
Conclusions
1. The proposed method is more suitable for predicting infectious
diarrhea than the statistical MLR method.
2. The feed-forward back-propagation neural network (FFBPNN)
model with a 9-4-1 architecture gives the most accurate predictions
of the weekly number of infectious diarrhea cases.
3. The most effective meteorological factor for infectious diarrhea is
weekly average temperature, whereas weekly average rainfall is the
least effective.
Therefore, this technique can be used to predict infectious diarrhea.
The results can serve as a baseline against which to compare other
prediction techniques in the future.