A Data Mining Approach to Study the Air Pollution Induced by Urban
A data mining approach to elucidate the
relationships between air pollution and respiratory
diseases in big cities
Professor Fabio Teodoro de Souza
Postgraduate Program in Urban Management (PPGTU)
Pontifical Catholic University of Paraná (PUCPR)
Valencia, October 24th 2016
Landslide that occurred in February 1967 in the Laranjeiras
neighborhood, Rio de Janeiro.
[Figure labels: 50 km scale; crucial 72-hour window for rescue operations; earthquake or aftershock epicenter (x, y); mass movement (x, y, z)]
Classification Rules (CCR = 84.79%)
Confusion matrix (share of all records):
                       Wildfire not active   Wildfire active
Wildfire not active          76.09%               2.17%
Wildfire active              13.04%               8.70%
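The reported CCR can be checked from the confusion matrix: it is the sum of the diagonal cells divided by the sum of all cells. The sketch below is only an illustration under that assumption; the function name is hypothetical and the slides do not say how the figure was computed.

```python
def correct_classification_rate(matrix):
    """Share of records on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Cell shares (in percent) taken from the wildfire slide.
wildfire_shares = [
    [76.09, 2.17],   # wildfire not active
    [13.04, 8.70],   # wildfire active
]

ccr = correct_classification_rate(wildfire_shares)
print(f"CCR = {ccr:.2%}")
```

The two diagonal cells (76.09% and 8.70%) add up to the 84.79% shown on the slide.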
Introduction
* Evaluating the effect of air pollution in megacities of developing
countries is difficult because of poor environmental monitoring systems,
a lack of cohesive air quality policies and a shortage of disease
surveillance data (Shah et al., 2013);
* Rapid and unplanned growth of cities implies an increase in air pollution;
* Sources include industries, motor vehicles and mining activities, among others;
* The integration of urban management and public health is essential to find
innovative solutions for the many problems that our society faces in dense
urban environments.
Curitiba, Araucária and Colombo
* Data from three different cities in the
Metropolitan Region of Curitiba (MRC):
Curitiba, Araucária and Colombo.
* These cities were selected because of
the resilient inter-institutional relations
among the government agencies acting in
them.
METHODOLOGY
This research focuses on extracting knowledge for the management of urban
phenomena in the context of air pollution and respiratory diseases.
Data Acquisition
The Environmental Institute of Paraná (IAP) monitors seven air quality
parameters and publishes on its website the daily measurements of:
* total suspended particulates (TSP);
* smoke;
* inhalable particles (IP or PM10);
* sulfur dioxide (SO2);
* carbon monoxide (CO);
* ozone (O3);
* nitrogen dioxide (NO2).
Air Quality Data Measurements in May
2014 of the Ouvidor Pardinho Station.
AIR QUALITY INDEX (AQI) & IMPACTS ON HEALTH.
Air Quality Index: Levels of Health Concern (Numerical Value): Meaning
* Good (0 to 50): Air quality is considered satisfactory, and air pollution
poses little or no risk.
* Moderate (51 to 100): Air quality is acceptable; however, for some
pollutants there may be a moderate health concern for a very small number
of people who are unusually sensitive to air pollution.
* Unhealthy for Sensitive Groups (101 to 150): Members of sensitive groups
may experience health effects. The general public is not likely to be
affected.
* Unhealthy (151 to 200): Everyone may begin to experience health effects;
members of sensitive groups may experience more serious health effects.
* Very Unhealthy (201 to 300): Health warnings of emergency conditions. The
entire population is more likely to be affected.
* Hazardous (301 to 500): Health alert: everyone may experience more serious
health effects.
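The AQI ranges above map directly to a lookup. The following is a minimal sketch, assuming integer AQI values; the names `AQI_LEVELS` and `aqi_level` are hypothetical and are not part of the IAP data or of the original study.

```python
# (lower bound, upper bound, level of health concern), as listed in the AQI table.
AQI_LEVELS = [
    (0, 50, "Good"),
    (51, 100, "Moderate"),
    (101, 150, "Unhealthy for Sensitive Groups"),
    (151, 200, "Unhealthy"),
    (201, 300, "Very Unhealthy"),
    (301, 500, "Hazardous"),
]


def aqi_level(value: int) -> str:
    """Return the level of health concern for an integer AQI value."""
    for low, high, label in AQI_LEVELS:
        if low <= value <= high:
            return label
    raise ValueError(f"AQI value out of range: {value}")


for sample in (42, 101, 300):
    print(sample, "->", aqi_level(sample))
```

Values outside 0 to 500 raise an error rather than being silently clamped, which makes faulty readings visible during data preparation.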
Data Preparation and Modeling
Data Preparation
* Treatment of inconsistencies (replacement of missing values, outliers, false zeros, etc.).
* Insertion of the data into a GIS
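The treatment of inconsistencies could look like the sketch below. It is an assumption, not the procedure used in the study: missing values, false zeros and readings above a plausibility limit are replaced by the median of the valid readings. The function name, the limit and the sample readings are all hypothetical.

```python
from statistics import median


def clean_series(values, zero_is_valid=False, max_valid=None):
    """Replace missing, false-zero and out-of-range readings with the median
    of the remaining valid readings (one possible replacement strategy)."""

    def is_valid(v):
        if v is None:
            return False                      # missing value
        if v == 0 and not zero_is_valid:
            return False                      # false zero (sensor offline)
        if max_valid is not None and v > max_valid:
            return False                      # implausible outlier
        return True

    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid readings to estimate a replacement value")
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]


# Hypothetical daily PM10 readings, for illustration only.
readings = [21.0, None, 0.0, 25.0, 980.0, 23.0]
cleaned = clean_series(readings, max_valid=600.0)
print(cleaned)
```

A median is used here because it is less sensitive to the outliers being removed than a mean; interpolation between neighboring days would be another reasonable choice.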
Data Modeling
* Multivariate analysis:
consists mainly of identifying qualitative patterns among the variables
involved. Clearly identifying these relationships helps in the construction
of quantitative models.
* Quantitative models:
An association rule is a simple model that readily expresses a cause (IF) and
effect (THEN) relationship. These rules may be predictive rules or
classification rules.
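A common first step in the multivariate analysis described above is a pairwise correlation matrix, which is also the usual input to principal component analysis and to variable clustering. The sketch below uses only the standard library; the sample values are made up for illustration and are not measurements from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical daily values; variable names follow the labels used in the plots.
data = {
    "PM10": [30.0, 42.0, 55.0, 38.0, 61.0],
    "NO2": [18.0, 25.0, 33.0, 22.0, 36.0],
    "UMID": [85.0, 70.0, 62.0, 78.0, 55.0],  # humidity
}

names = list(data)
corr = {(a, b): pearson(data[a], data[b]) for a in names for b in names}

for a in names:
    print(f"{a:>5}", " ".join(f"{corr[(a, b)]:+.2f}" for b in names))
```

Variables that are strongly correlated in such a matrix tend to appear close together in a loading plot and to merge early in a dendrogram.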
An example of a classification rule to predict landslides during rainfall is
described below:
IF cumulative rain in the last 6 hours > 43.7 mm (A)
THEN LANDSLIDE OCCURRENCE (B)
(90.6%, 106 of 117)
Of the 117 records in which the 6-hour cumulative rain (h_6) exceeded
43.7 mm (A), 106 corresponded to a landslide (B), giving the rule a
confidence of 90.6%.
The rule is easy to understand and to use; for example, the government
could use it to issue alerts during rainfall events.
Another advantage of association rules and classification rules is that
expert knowledge can be inserted into the models.
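The landslide rule and its confidence can be sketched as follows. The threshold (43.7 mm) and the counts (106 of 117) come from the example above; the function names are hypothetical and the code is only an illustration, not the tool used in the study.

```python
def rule_confidence(matched_antecedent: int, matched_both: int) -> float:
    """Confidence of IF A THEN B: records matching A and B over records matching A."""
    if matched_antecedent <= 0:
        raise ValueError("the rule antecedent must match at least one record")
    return matched_both / matched_antecedent


def landslide_alert(h_6_mm: float, threshold_mm: float = 43.7) -> bool:
    """Apply the rule: alert when 6-hour cumulative rain exceeds the threshold."""
    return h_6_mm > threshold_mm


confidence = rule_confidence(matched_antecedent=117, matched_both=106)
print(f"confidence = {confidence:.1%}")   # 106 / 117
print(landslide_alert(50.0))              # above the threshold
print(landslide_alert(20.0))              # below the threshold
```

Keeping the threshold as a parameter makes it straightforward for an expert to adjust the rule, which is the kind of expert input the slide mentions.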
Curitiba, Araucária and Colombo
Rule 1:
M_N_TNSPR_TOT_T0_>_30 = Y
TempMin_Pinhais_<_13oC = Y
-> Class = M_S_CMPLG_DR_T1_>_100
(10.924% 100.000% 13 13 10.924%)
AIR QUALITY STATIONS IN MRC
Real Time Air Quality
Permanence in Curitiba
Permanence in Araucária
[Figure: permanence (%) by month; series A_ASSIS_QUALITY_GOOD and A_ASSIS_QUALITY_REGULAR]
Permanence in Colombo
[Figure: permanence (%) by month; series Colombo_QUALITY_GOOD, Colombo_QUALITY_REGULAR and Colombo_QUALITY_INADEQUATE]
Principal Component Analysis for Curitiba
[Figure: Curitiba_CIC loading plot, Factor 1 (horizontal axis) versus Factor 2 (vertical axis); variables: PM10, CO, NO2, IQA (air quality index), SO2, O3, UMID (humidity), TEMP (temperature)]
Principal Component Analysis for Araucária
[Figure: Araucária - ASSIS loading plot, Factor 1 (horizontal axis) versus Factor 2 (vertical axis); variables: SO2, NO2, IQA (air quality index), O3, UMID (humidity), TEMP (temperature)]
Dendrogram for Curitiba
[Figure: Curitiba_CIC dendrogram, vertical axis: linkage distance; leaves in plotted order: UMID, IQA, O3, CO, SO2, PM10, NO2, TEMP]
Dendrogram for Araucária
[Figure: Araucária - ASSIS dendrogram, vertical axis: linkage distance; leaves in plotted order: UMID, IQA, O3, SO2, NO2, TEMP]
PARTIAL CONCLUSIONS
* The phenomena may be explained by different urban features of those cities
or of the region.
* The results at the end of this scientific research (started in November 2014)
may be used by the local government for actions and interventions to:
* minimize the risks of air pollution and
* improve air quality and people's well-being in urban centers.
“Relationship Between Respiratory Diseases
and Mining of Limestone in Curitiba and its
Surroundings” – Valquiria Jablinski
Mining in general is the cause of many lung diseases, but silicosis is the
most common and serious in Brazil. It is caused by constant exposure
to silica dust, which often comes from the extraction of limestone.
Other diseases such as emphysema and bronchitis are commonly seen
among mining workers and residents near zones of limestone
extraction.
All limestone extracted in Paraná comes from the Metropolitan Region
of Curitiba (MRC). Rio Branco do Sul is responsible for more than 65%
of all limestone mined in the state, followed by Campo Largo and
Almirante Tamandaré, each extracting approximately 13%.
The map shows the four cities with the highest percentages of hospital
admissions and deaths caused by respiratory diseases.
According to wind statistics for Curitiba, the direction of the wind
throughout the year is west, which may explain why Itaperuçu had the
highest percentages in the studied area. Other factors, such as temperature
and humidity, may also influence the dust concentration in the cities.
Results
[Figures: results for T+1 and T+2]
Other perspectives
1) Motorization, climatology and respiratory diseases;
2) Charcoal production in Bocaiúva do Sul and respiratory diseases;
3) LUNG DISEASES AND EXPOSURE TO NITROGEN DIOXIDE PARTICLE:
AN ANALYSIS ON A UBS (basic health unit) IN ARAUCÁRIA CITY;
4) LUNG DISEASES AND EXPOSURE TO OZONE PARTICLE:
AN ANALYSIS ON A UBS (basic health unit) IN ARAUCÁRIA CITY
Research Proposal
Urban Variables and Aedes aegypti
* Dengue fever;
* Zika virus;
* Chikungunya.
Thank you!
ACKNOWLEDGMENT
Special thanks to CNPq for the financial support of the current research
and to the institutes that provided the data: Environmental Institute of
Paraná (IAP); State Health Secretariat (SESA); Institute of Urban Planning
of Curitiba (IPPUC); Mary's Protection Center of Children and Teenagers
(CEDIN).