Transcript air masses

WEATHER TYPE CLASSIFICATIONS
IN HUMAN HEALTH STUDIES
Applications in South Korea
Jan Kyselý
Institute of Atmospheric Physics, Prague
Czech Republic
with support and inputs from
Radan Huth
Institute of Atmospheric Physics, Prague
Jiyoung Kim
Korea Meteorological Administration, Seoul
WEATHER TYPE CLASSIFICATIONS
IN HUMAN HEALTH (?) STUDIES
MORTALITY
Population: 47 million (2005); Seoul: 10.4 million
Area: 99 000 km²
Air temperature, heat index and excess mortality in summer 2004, South Korea
Air temperature, heat index and excess mortality in summer 1994, South Korea
+3400 excess deaths; excess mortality in all age groups
absence of an efficient heat-watch-warning system (HWWS)
Documented large natural disasters affecting the Korean Peninsula since 1901
(Kyselý and Kim 2009, Climate Research 38:105-116)
Year | Death toll | Event | Affected region over which death toll is given
1994 | 3384 | Heat waves | South Korea
1936 | 1104 | Typhoon | South and North Korea
2006 | 844 | Flooding | North Korea
1959 | 768 | Typhoon Sarah | South Korea and Japan
1972 | 672 | Seoul, Kyonggi flood | South Korea
2007 | 610 | Flooding | North Korea
1969 | 408 | Gyeongsangbukdo, Gyeongsangnamdo, Gangwon flood and landslides | South Korea
1987 | 345 | Chungchongnamdo, Chollanamdo, Kangwon flood and landslides | South Korea
1998 | 324 | Massive rain, floods and landslides | South Korea
2002 | 246 | Typhoon Rusa | South Korea
~3400 excess deaths represent net excess mortality as no mortality
displacement effect appeared after the heat waves
OUTLINE
1. Introduction to the methodology; the 1994 heat
wave in Korea
2. Data & classification procedure
3. Results
3.1 Identification of oppressive air masses
3.2 Dependence on settings of the classification procedure
3.3 Selected classifications C6 and C15
3.4 Regression models for excess mortality within the oppressive AMs
4. Concluding remarks
5. The last slide
The 1994 heat wave
summer 1994 vs. a ‘usual summer’ (2002)
The 1994 heat wave
Excess mortality in individual age groups and genders during the 1994 heat waves
The 1994 heat wave
(in a ‘climate change’ perspective)
if a gradual warming of 0.04C/year is assumed over the period 2001-2060, the
recurrence interval of a very long spell of days with temperature exceeding a high
threshold (as in the 1994 heat wave) is estimated to decrease to around 40 (10)
years in the 2021-2030 (2041-2050) decade
“Air mass” classifications & mortality
Heat-watch-warning systems (HWWS):
apply methods to determine, from the weather forecast, whether a day will be
associated with elevated mortality risks, and take action when an oppressive
day is predicted
often make use of objective classifications of weather types (‘air masses’,
AMs), which take into account the entire weather situation rather than single
elements
→ identify ‘oppressive’ AMs associated with elevated mortality in a given
location/area
& apply regression models within the oppressive AMs in order to account
for (and predict) excess mortality
“Air mass” classifications & mortality
The idea behind:
human physiology responds to the whole ‘umbrella of air’ and not single
weather elements
(although there is little doubt that air temperature and humidity are the two
most important parameters determining thermal comfort)
“Air mass” classifications & mortality
Two basic types of classifications:
• ‘Temporal Synoptic Index’
(TSI; Kalkstein, 1991; Kalkstein et al., 1996; McGregor, 1999)
• ‘Spatial Synoptic Classification’
(SSC; Kalkstein and Greene, 1997; Sheridan, 2002)
in both methods, a single AM is assigned to each day for the location/region
under study
the classifications are based on a relatively standard set (although differing
among studies) of input variables: air temperature (T), humidity variable
(e.g. T-Td), total cloud amount (TCA), wind components, and atmospheric
pressure
• TSI: location-specific AMs produced
• SSC: more universal, allows for a comparison between places
“Air mass” classifications & mortality
Methodology: TSI & SSC differ in the statistical approach to the
identification of AMs:
TSI – PCA and cluster analysis used to define the AMs
SSC – days are assigned to one of several predetermined types (seed
days)
(the straightforward interpretation of the SSC over larger areas compensates for its
drawback that the representative days must be identified manually, which involves a
large degree of subjectivity)
Focus on TSI in this presentation (SSC “for comparison”):
(i) no need to make regional- or continental-scale comparisons of the AMs and
their links to human health
(ii) avoid the subjectivity involved in the initial step of the SSC, i.e. the
predetermination of the AMs
“Air mass” classifications & mortality
both TSI and SSC utilized to a comparable extent in previous studies
applications:
• impacts of weather conditions on mortality
• impacts on hospital admissions
• mostly summer, but winter also examined
• total mortality of all causes usually found the most useful and reliable
characteristic of human health effects
• the geographic range: North America (US, Canada), Europe (UK, France,
Italy, Greece, Czech Republic), Asia (China, Korea, Japan), Australia
• the development of HWWS (based on TSI – Kalkstein et al. 1996, or SSC –
Sheridan and Kalkstein 2004)
→ the AM classifications have become a sort of ‘standard tool’ in
biometeorological studies
“Air mass” classifications & mortality
TSI: individual studies differ in many specific ‘settings’ of the classification
procedure, including
• the set of input variables,
• the clustering algorithm,
• the number of clusters (AMs) formed
the effect of these settings and choices on the results is usually not discussed
! often no reasoning or justification of the settings !
important details of the classification which are needed for its ‘reproduction’
are missing (only 3 out of 11 studies on TSI specify the number of PCs
retained; some studies do not present the clustering algorithm and/or even the
number of AMs formed)
Kyselý et al., International Journal of Climatology 2009
Main questions
To what extent do results (i.e. the AMs formed and their links to human
mortality) depend on the settings of the classification procedure?
• the selection of input meteorological variables
• the way the input variables are treated (averaged/pooled station data)
• the number of PCs retained for the cluster analysis
• the number of clusters (AMs) formed
Which classification is most useful for a possible application in HWWS?
Data & classification procedure
Daily mortality data:
1991-2005
total (all-cause) mortality
Excess daily mortality: deviations of the observed number of deaths from
expected (baseline) number of deaths
Expected (baseline) number of deaths takes into account long-term changes
in mortality, the seasonal and the weekly cycles
excess mortality examined in the whole population (all ages) and the elderly
(persons aged 70+ years)
several confounding factors controlled for
e.g. days with very large accidents (aviation and maritime disasters, a store collapse,
…) and death tolls due to severe natural disasters (typhoons and floods), resulting in
more than 100 accidental or disaster-related deaths each
Long-term changes in mortality (→ standardization)
BUT also seasonal and weekly cycles…
Data & classification procedure
Input meteorological data:
• air temperature (T), dew-point deficit (T-Td), zonal wind, meridional
wind and total cloud amount (TCA)
• 10 stations representative of the area of South Korea, 4 times a day
(3, 9, 15 and 21 LT)
• mid-May to mid-September
AM classifications differ in the way the station data are taken into
account: the input variables originate from
– the average series for the area, or
– the pooled series at the 10 stations considered together
Data & classification procedure
AM classification methodology:
STEP 1
– unrotated PCA to form a set of new orthogonal variables (PCs)
– the number of PCs retained conforms to the criteria recommended in the
literature (a large gap in explained variance; e.g. Richman 1986)
– more than one solution for the number of PCs is possible in most cases,
so time series of different numbers of PCs enter the cluster analysis
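A minimal Python sketch of the ‘large gap’ rule for choosing the number of PCs. It assumes the eigenvalues of the correlation matrix of the standardized input variables have already been computed by some PCA routine, and it takes the gap as the absolute drop between consecutive eigenvalues (one possible reading of the criterion, not necessarily the exact one used in the study). Returning the two largest gaps reflects the point above that more than one solution is often defensible; the function names and eigenvalues are hypothetical.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance explained by each PC (largest first)."""
    total = sum(eigenvalues)
    return [ev / total for ev in sorted(eigenvalues, reverse=True)]


def candidate_pc_numbers(eigenvalues, n_solutions=2):
    """Candidate numbers of PCs to retain, ranked by the size of the drop
    between the last retained eigenvalue and the first discarded one."""
    ev = sorted(eigenvalues, reverse=True)
    # gaps[k]: drop in eigenvalue right after the k-th PC (k is 1-based)
    gaps = {k: ev[k - 1] - ev[k] for k in range(1, len(ev))}
    # the largest gaps mark the most defensible cut-off points
    ranked = sorted(gaps, key=gaps.get, reverse=True)
    return ranked[:n_solutions]


# Hypothetical eigenvalues for six standardized input variables (sum = 6).
eigenvalues = [2.6, 2.2, 0.8, 0.2, 0.15, 0.05]
print(candidate_pc_numbers(eigenvalues))  # [2, 3]
print([round(f, 3) for f in explained_variance(eigenvalues)])
```

Both candidate solutions would then be passed to the cluster analysis, as done for the reduced sets of input variables.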
STEP 2
– cluster analysis: non-hierarchical k-means method (more useful for the
identification of oppressive AMs than hierarchical average linkage
clustering)
– the number of clusters: 6 (‘small’), 10 (‘moderate’) and 15 (‘large’)
(this spans a range around values appearing in the literature; no objective
method to determine the number of clusters was used, because the data are not
clearly structured and form a continuum rather than a set of well-defined
separate states)
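STEP 2 can be illustrated with a plain Lloyd's k-means in Python. This is a sketch, not the study's implementation: the initialisation (k distinct days drawn at random), the iteration cap and the function names are assumptions, and a real analysis would typically run several initialisations and keep the partition with the smallest within-cluster sum of squares. The daily PC scores in the example are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples (e.g. daily PC scores).

    Returns (labels, centroids). Initial centroids are k distinct days drawn
    at random, so the partition can depend on `seed`.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each day joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no day changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member days.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical scores on 2 PCs: three 'hot-humid' days and three 'cool' days.
days = [(2.1, 1.9), (2.3, 2.0), (1.8, 2.2), (-1.0, -0.8), (-1.2, -1.1), (-0.9, -1.0)]
labels, centroids = kmeans(days, k=2)
print(labels)
```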
Data & classification procedure
STEP 3
– oppressive AMs: those associated with mean excess mortality
significantly different from 0 (t-test) & a mean increase of at least 3%
relative to the baseline mortality (≈ 20 excess deaths a day in Korea)
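The STEP 3 screening rule could be sketched as follows, assuming the daily relative excess mortality (in % of the expected number of deaths) has already been grouped by AM. The critical value 1.96 is a large-sample normal approximation to the two-sided 5% level of the one-sample t-test (an assumption; for AMs with few days the exact t quantile is more appropriate). Function names, AM names and values are hypothetical.

```python
import math
import statistics


def relative_excess(observed, expected):
    """Relative excess mortality in % of the expected (baseline) number of deaths."""
    return 100.0 * (observed - expected) / expected


def oppressive_ams(excess_by_am, min_mean=3.0, t_crit=1.96):
    """Return AMs whose mean relative excess mortality (%) differs significantly
    from 0 (one-sample t-test, |t| > t_crit) and is at least `min_mean` %."""
    flagged = []
    for am, values in excess_by_am.items():
        n = len(values)
        if n < 2:
            continue                       # t-test undefined for a single day
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)      # sample standard deviation
        if sd == 0:
            continue
        t = mean / (sd / math.sqrt(n))
        if abs(t) > t_crit and mean >= min_mean:
            flagged.append(am)
    return flagged


# Hypothetical daily relative excess mortality (%) on days with two AMs.
excess_by_am = {
    "AM4": [5.0, 6.0, 7.0, 4.0, 8.0] * 4,   # warm, humid AM
    "AM1": [0.5, -0.5, 1.0, -1.0] * 5,      # near-baseline AM
}
print(oppressive_ams(excess_by_am))  # ['AM4']
```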
Data & classification procedure
Regression models for predicting excess mortality within the oppressive AM:
STEP 4
– to evaluate the impact of within-AM variations in meteorological
elements, a stepwise multiple regression analysis is performed on all days
classified with the oppressive AM
– dependent variables: relative daily excess mortality in the whole
population and the elderly (70+ years)
– independent variables: weather elements (T, Td and heat index measured
4 times a day; daily averages of T, Td, heat index, TCA and wind speed) &
non-meteorological factors (day in sequence; time of season – within-season
acclimation to heat; year – long-term changes in vulnerability to heat stress; the
numbers of days with the oppressive AM since the beginning of summer and in the
previous summer – shorter-term and longer-term acclimation to oppressive
weather conditions)
– meteorological variables lagged by 1 and 2 days (t-1, t-2) and changes
over 24-h periods (d/dt; cf. McGregor, 1999) also considered as possible
predictors
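The non-meteorological and lagged predictors of STEP 4 are simple transformations of the daily AM sequence and the weather series. The sketch below shows one possible construction for a single summer; the function names are hypothetical, and counting only days before the current one in the ‘days with the oppressive AM since the beginning of summer’ predictor is an assumption. The previous-summer predictor would simply be the total number of oppressive days in the preceding season.

```python
def day_in_sequence(labels, oppressive):
    """1 on the first day of a spell of the oppressive AM, 2 on the second, ...;
    0 on days classified with any other AM."""
    out, pos = [], 0
    for lab in labels:
        pos = pos + 1 if lab == oppressive else 0
        out.append(pos)
    return out


def oppressive_days_so_far(labels, oppressive):
    """Number of oppressive days since the start of the season, counting only
    days before the current one (an assumption about the exact definition)."""
    out, count = [], 0
    for lab in labels:
        out.append(count)
        if lab == oppressive:
            count += 1
    return out


def lagged(series, lag):
    """Value from `lag` days earlier (t-1, t-2, ...); None if not available."""
    return [series[t - lag] if t >= lag else None for t in range(len(series))]


def change_24h(series):
    """Day-to-day change x(t) - x(t-1), in the spirit of the d/dt predictors."""
    return [None] + [today - yesterday for yesterday, today in zip(series, series[1:])]


# Hypothetical single summer: AM labels and afternoon (15 LT) temperature.
labels = ["A3", "OPP", "OPP", "OPP", "A1", "OPP"]
t15 = [28.0, 31.0, 33.5, 34.0, 29.0, 32.0]
predictors = {
    "day_in_sequence": day_in_sequence(labels, "OPP"),          # [0, 1, 2, 3, 0, 1]
    "oppressive_days_so_far": oppressive_days_so_far(labels, "OPP"),
    "T15_lag1": lagged(t15, 1),
    "dT15": change_24h(t15),
}
```

Only the rows for days with the oppressive AM would then enter the regression.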
IDENTIFICATION OF OPPRESSIVE AMs
• the whole set of input variables (T, T-Td, W and TCA): the solution for
the number of PCs is unique and the same for both averaged and
pooled input data (4 PCs retained)
• the reduced sets of input variables (either TCA or both TCA and W
omitted): the number of PCs was ambiguous → two solutions
considered for both averaged and pooled data
T, T-Td, TCA, W: 4 PCs
T, T-Td, W: 3 / 4 PCs
T, T-Td: 2 / 3 PCs (avg), 2 / 5 PCs (pooled)
• the additional PCs describe mainly diurnal variations and local effects
• more PCs means more variance of the input variables explained, at
the expense of possibly including too many details that may be
irrelevant for the AM definition
• the relationship between AMs and mortality is better expressed when
fewer PCs are retained
IDENTIFICATION OF OPPRESSIVE AMs
Boxplots of relative excess mortality in individual AMs
• most classifications identify an oppressive AM with enhanced mortality
• the mean relative excess mortality is 3-7% for the whole population and up
to 9% for the elderly (70+ years)
• the mean mortality increases in the age group 70+ years are always larger
than in the whole population
IDENTIFICATION OF OPPRESSIVE AMs
• mean excess mortality is negative or close to zero in the other, non-oppressive
AMs → demonstrates the skill of the classification procedure in identifying
weather conditions associated with mortality impacts
• around a quarter of days classified with the oppressive AM are associated
with marked excess mortality of 10% or more above the baseline (more
than 60 excess deaths a day!)
IDENTIFICATION OF OPPRESSIVE AMs
• BUT not all days with the oppressive AM have excess mortality
→ further analysis of which meteorological and non-meteorological
factors may account for the excess deaths is needed
• no classification has a ‘deficit mortality’ counterpart to the oppressive AM,
i.e. an AM associated with pronounced average deficit mortality
IDENTIFICATION OF OPPRESSIVE AMs
Input variables: T, T-Td; pooled data; 2 PCs retained; classification with 6 AMs
• weather conditions: the oppressive AM is the warmest one, associated with
high humidity, weak southerly flow and below-average TCA (although not the
lowest TCA among the AMs)
• if there is more than one oppressive AM, they differ in ‘additional’ weather
characteristics (mainly zonal and meridional wind) rather than in T and Td
DEPENDENCE ON SETTINGS OF THE CLASSIFICATION
Basic characteristics of the oppressive AM (the relative frequency; the mean
relative mortality increase on days with the oppressive AM; and the
coverage of days with pronounced excess mortality) differ among
classifications
Two important criteria that the oppressive AM should meet:
1. it is separated from the rest of the sample in terms of mean excess mortality
2. it covers a large percentage of days with pronounced excess mortality
(the most important criterion for the application in predicting elevated mortality
risks – pronounced excess mortality in summer is usually heat-related)
DEPENDENCE ON SETTINGS OF THE CLASSIFICATION
[Figure: coverage of days with large excess mortality plotted against mean excess mortality (70+ yrs) in the oppressive AMs]
→ the less frequent the oppressive AM, the larger the mean excess
mortality in the oppressive AM (the AM is better separated from the rest
of the sample); however, this comes at the expense of the coverage of days
with elevated mortality
the oppressive AM of the classifications with 6 clusters (if any) is therefore
associated with smaller mean excess mortality, but a much higher percentage
of days with large excess mortality, compared to the classifications with
10 and 15 clusters
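The frequency, separation and coverage trade-off can be made concrete with a small helper that summarises a given definition of the oppressive AM(s). It assumes daily AM labels and relative excess mortality (%) are aligned day by day and that ‘large’ excess mortality means at least 10% above the baseline (a hypothetical threshold choice); the toy data contrast a broad, C6-like oppressive AM with a narrow, C15-like one. The function name is hypothetical.

```python
import statistics


def am_skill(labels, excess, oppressive, large=10.0):
    """Summarise a definition of the oppressive AM(s).

    labels     - daily AM labels
    excess     - daily relative excess mortality (%), aligned with labels
    oppressive - set of AM labels regarded as oppressive
    large      - threshold (%) for a day with 'large' excess mortality
    Returns (frequency, mean_excess, coverage).
    """
    in_am = [lab in oppressive for lab in labels]
    am_excess = [e for e, flag in zip(excess, in_am) if flag]
    large_flags = [flag for e, flag in zip(excess, in_am) if e >= large]
    frequency = sum(in_am) / len(labels)  # share of all days in the oppressive AM
    mean_excess = statistics.fmean(am_excess) if am_excess else float("nan")
    coverage = sum(large_flags) / len(large_flags) if large_flags else float("nan")
    return frequency, mean_excess, coverage


excess = [15.0, 12.0, 11.0, 2.0, -1.0, 0.0]                       # hypothetical days
broad = am_skill(["O", "O", "O", "O", "N", "N"], excess, {"O"})   # C6-like
narrow = am_skill(["O", "O", "N", "N", "N", "N"], excess, {"O"})  # C15-like
print(broad)   # frequency 4/6, mean excess 10.0 %, coverage 1.0
print(narrow)  # frequency 2/6, mean excess 13.5 %, coverage 2/3
```

The narrower AM has the larger mean excess but misses one of the three large-excess days, which is the pattern described for the 15-cluster classifications.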
DEPENDENCE ON SETTINGS OF THE CLASSIFICATION
Summary of the findings:
– additional variables (W, TCA) do not generally improve the results →
the effects of wind and cloudiness are of secondary importance
– differences between classifications based on averaged and pooled data
show little consistency and depend on the particular set of input variables
– the dependence of the results on the number of PCs retained is relatively
large, particularly for pooled input data; fewer PCs give better results
for all 3 classifications with T, T-Td and W based on pooled data, the number of
PCs governs not only the mean relative excess mortality in the oppressive AMs
but even the number of oppressive AMs (0, 1 or 2)!
DEPENDENCE ON SETTINGS OF THE CLASSIFICATION
Classifications based on (T, T-Td) with pooled input data and 2 retained
PCs form the most interesting set:
C15 – oppressive AM with very large mean relative excess mortality
(6.7% in the whole population, 8.9% in the 70+ age group)
C6 – the oppressive AM has the largest coverage of days with
large excess mortality
CLASSIFICATIONS C6 & C15
Boxplots of relative excess mortality in individual AMs: C6 and C15
CLASSIFICATIONS C6 & C15
large interannual variations in the occurrence of the oppressive AM:
• 0 days in 1993 (for both C6 and C15)
• >30 (10) days in some summers for C6 (C15)
• maximum: 52 (25) days in summer 1994
CLASSIFICATIONS C6 & C15
within-season variability:
• maximum in late July or early August
• no occurrences in May and June
CLASSIFICATIONS C6 & C15
another specific characteristic = persistence:
• the average duration of a spell = 4.7 (2.7) days in C6 (C15), much longer
than for any other AM
• record-breaking durations of the oppressive AMs: 29 (15) days in C6 (C15)
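Spell statistics such as the average and record durations of the oppressive AM can be computed by run-length encoding the daily labels, e.g. with Python's `itertools.groupby`. A minimal sketch with hypothetical function names and labels; it assumes the function is applied to each summer separately so that spells never span two seasons, with the run lengths from all summers pooled before averaging.

```python
from itertools import groupby


def spell_lengths(labels, am):
    """Lengths of uninterrupted runs of `am` in a sequence of daily AM labels."""
    return [len(list(run)) for lab, run in groupby(labels) if lab == am]


def spell_summary(labels, am):
    """(number of spells, mean duration, longest duration) for one AM."""
    runs = spell_lengths(labels, am)
    if not runs:
        return 0, 0.0, 0
    return len(runs), sum(runs) / len(runs), max(runs)


summer = ["A", "O", "O", "O", "B", "O", "A", "O", "O"]  # hypothetical labels
print(spell_summary(summer, "O"))  # (3, 2.0, 3)
```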
CLASSIFICATIONS C6 & C15
average mortality impacts in the oppressive AM depend on the day in sequence:
mean excess mortality increases with the duration of a spell over the first few days
→ then a slight decline appears → followed by a sharp increase in the mortality
response, mainly in the elderly, during late stages of prolonged occurrences of
the oppressive weather (days 15-23 in C6, 9-15 in C15)
CLASSIFICATIONS C6 & C15
heat stress effects tend to accumulate over the first few days with the oppressive
weather;
a certain degree of short-term acclimation to heat develops after a week or so,
BUT this acclimation (which may be physiological as well as behavioural) no
longer plays a role if the oppressive weather persists for a very long time
CLASSIFICATIONS C6 & C15
the oppressive AM covers most days with pronounced excess mortality (daily
excess mortality exceeds 200 deaths in the peak of the 1994 heat waves!)
BUT some days with relatively large excess mortality in 1994, after the peak
of the heat wave, are not classified with the oppressive AM in C15
→ a consequence of the trade-off between the coverage of days with
pronounced excess mortality (better in C6) and the mean mortality increase on
days classified with the AM (larger in C15)
CLASSIFICATIONS C6 & C15
the oppressive AM covers a large portion of days with pronounced heat-related
mortality in C15, and nearly all of them in C6
BUT not all days classified with the oppressive AMs are associated with excess
mortality: the links are complex and mortality is affected not only by
meteorological elements but also by other factors (timing within a season, timing
within a spell of oppressive days, longer-term changes in the public
perception of heat, etc.)
REGRESSION MODELS FOR EXCESS MORTALITY
STEP 1: linear regression models developed by stepwise screening using
the whole available period of data (1991-2005); the Bayesian information
criterion (BIC) used to control for overfitting
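A sketch of BIC-controlled stepwise screening, simplified to forward selection only (the ‘step-wise screening’ above may also remove predictors, which is not shown). Ordinary least squares is solved through the normal equations in pure Python to keep the example self-contained; with strongly collinear predictors (e.g. T at 15 LT and daily mean T) the normal equations become ill-conditioned and a QR-based solver would be preferable. The BIC is the Gaussian-likelihood form n ln(RSS/n) + k ln(n), up to an additive constant. Function names and data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def bic(rss_value, n, k):
    """Gaussian BIC up to a constant: n ln(RSS/n) + k ln(n), k = no. of coefficients."""
    return n * math.log(rss_value / n) + k * math.log(n)


def forward_stepwise(predictors, y):
    """Greedy forward selection: repeatedly add the predictor that lowers BIC
    the most; stop when no remaining predictor lowers it."""
    n = len(y)
    selected = []
    best = bic(rss([], y), n, 1)  # intercept-only model
    while len(selected) < len(predictors):
        trials = []
        for name in predictors:
            if name not in selected:
                cols = [predictors[s] for s in selected + [name]]
                trials.append((bic(rss(cols, y), n, len(cols) + 1), name))
        score, name = min(trials)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected


# Hypothetical oppressive-AM days: excess mortality driven by T15 plus noise.
rng = random.Random(1)
t15 = [30 + 0.2 * i for i in range(40)]
noise = [rng.gauss(0, 1) for _ in range(40)]
excess = [1.5 * (t - 30) + rng.gauss(0, 1) for t in t15]
print(forward_stepwise({"T15": t15, "noise": noise}, excess))
```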
REGRESSION MODELS FOR EXCESS MORTALITY
C6:
excess mortality positively associated with day-time temperature (T15)
and day-to-day change in night-time dew-point temperature (dTd3)
non-meteorological factors also important: mortality impacts decrease
with the number of days with the oppressive AM both in previous summer
and since the beginning of summer in a given year
for the whole population, mortality effects are found to decrease over
time, too
REGRESSION MODELS FOR EXCESS MORTALITY
C15:
regression models somewhat more complex, with different predictors
selected for the whole population and the elderly
two non-meteorological factors are important: excess mortality in the
oppressive AM increases with the day in sequence, and decreases with the
time of season
REGRESSION MODELS FOR EXCESS MORTALITY
larger percentage of explained variance in C15 than in C6, BUT the C15 sample
is much smaller (98 vs. 343 days), so overfitting is possible in C15 (i.e. the
models may be too complex for the given amount of data)
REGRESSION MODELS FOR EXCESS MORTALITY
as expected, the model for the oppressive AM of the C15 classification
produces a better fit BUT many days with excess mortality are not classified
with the oppressive AM
the model for the C6 classification performs reasonably well except for the
8-day mortality peak in 1994, the magnitude of which is substantially
underestimated
REGRESSION MODELS FOR EXCESS MORTALITY
20 (50) days with the largest observed excess mortality
(June-August 1991-2005):
19 (39) are associated with the oppressive AM of the C6 classification
modelled relative excess mortality exceeds 5% on 16 (31) of them
(note that large excess mortality may also be related to factors other than oppressive
weather conditions, so a ‘perfect fit’ is not expected)
REGRESSION MODELS FOR EXCESS MORTALITY
for possible applications in HWWS, the models may be evaluated in terms
of a skill score based on the number of ‘successes’ (excess mortality
observed and modelled, Y/Y), ‘false alerts’ (excess mortality modelled but
not observed, N/Y) and ‘missing hits’ (excess mortality observed but not
modelled, Y/N)
we use the threat score (‘critical success index’)
TS = n(Y/Y) / [n(Y/Y) + n(Y/N) + n(N/Y)],
i.e. the number of correct forecasts of large excess mortality divided by the
total number of cases when large excess mortality is observed and/or
modelled
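The threat score and the bias for a given pair of alert thresholds can be computed from daily observed and modelled relative excess mortality as sketched below. The bias formula (days with modelled large excess mortality divided by days with observed large excess mortality) is not stated on the slides; it is inferred from the performance tables for C6 and C15, where it reproduces the Bias column to within rounding, e.g. (4.0 + 5.9) / (4.0 + 3.3) ≈ 1.36 for the C6 >=10/5% row. Function names are hypothetical.

```python
def contingency(observed, modelled, obs_threshold, mod_threshold):
    """Count days as Y/Y, Y/N, N/Y, N/N (OBS/MOD) for daily relative excess
    mortality (%) and separate alert thresholds for observed and modelled values."""
    yy = yn = ny = nn = 0
    for obs, mod in zip(observed, modelled):
        obs_event, mod_event = obs >= obs_threshold, mod >= mod_threshold
        if obs_event and mod_event:
            yy += 1          # success
        elif obs_event:
            yn += 1          # missing hit
        elif mod_event:
            ny += 1          # false alert
        else:
            nn += 1          # correct 'no alert'
    return yy, yn, ny, nn


def threat_score(yy, yn, ny):
    """TS = n(Y/Y) / [n(Y/Y) + n(Y/N) + n(N/Y)]."""
    return yy / (yy + yn + ny)


def frequency_bias(yy, yn, ny):
    """Days with modelled large excess mortality / days with observed ones."""
    return (yy + ny) / (yy + yn)


# Percentages from the C6 '>=10/5%' row can be plugged in directly.
print(round(threat_score(4.0, 3.3, 5.9), 2))    # 0.3
print(round(frequency_bias(4.0, 3.3, 5.9), 2))  # 1.36
```

Because TS and bias are ratios, counts and percentages of days give the same values.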
REGRESSION MODELS FOR EXCESS MORTALITY
C6
OBS/MOD | Y/Y [%] | Y/N [%] | N/Y [%] | N/N [%] | TS | Bias
>=3% | 10.3 | 20.5 | 6.2 | 62.9 | 0.28 | 0.54
>=5% | 6.1 | 14.0 | 3.8 | 76.1 | 0.26 | 0.49
>=10% | 1.6 | 5.7 | 0.2 | 92.5 | 0.21 | 0.25
>=5/3% | 8.2 | 11.9 | 8.4 | 71.5 | 0.29 | 0.82
>=10/5% | 4.0 | 3.3 | 5.9 | 86.8 | 0.30 | 1.35
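models’ performance over July-August 1991-2005 evaluated using different
thresholds of excess mortality at which alerts would be issued
(Bias = number of days with modelled large excess mortality / number of days
with observed large excess mortality)
the performance is better when the bias is partly compensated for (the
last two rows, in which the threshold for modelled excess mortality is lower than
that for observed excess mortality, e.g. >=10/5% = observed at least 10%,
modelled at least 5% above the baseline)
in C6, the days with modelled excess mortality at least 5% above
the expected number of deaths cover 55% of days with observed excess
mortality more than 10% above the baseline
BUT TS is not large (0.30), particularly as the number of ‘false alerts’ exceeds that of
‘successes’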
REGRESSION MODELS FOR EXCESS MORTALITY
C15 (C6 values as in the table above)
OBS/MOD | Y/Y [%] | Y/N [%] | N/Y [%] | N/N [%] | TS | Bias
>=3% | 5.2 | 25.7 | 1.8 | 67.3 | 0.16 | 0.23
>=5% | 3.5 | 16.6 | 1.8 | 78.1 | 0.16 | 0.27
>=10% | 1.9 | 5.4 | 0.4 | 92.3 | 0.25 | 0.32
>=5/3% | 4.3 | 15.8 | 2.7 | 77.2 | 0.19 | 0.35
>=10/5% | 2.7 | 4.6 | 2.7 | 90.0 | 0.27 | 0.74
important finding: the skill of the model for C15 is generally smaller than for C6
(TS is higher only for the >=10% threshold): ‘false alerts’ are reduced at the
expense of an enhanced number of ‘missing hits’
REGRESSION MODELS FOR EXCESS MORTALITY
STEP 2: models developed using only 12 years of data (1991-2002) & their
performance tested on an independent 3-year sample (2003-2005)
altogether 30 days in July-August 2003-2005 with relative excess mortality
at least 5% above the expected number of deaths:
the oppressive AM of the C6 (C15) classification covers 17 (10) of them,
and positive excess mortality was predicted on all of them in both C6 and C15
C6: 12 out of the 15 days with the largest excess mortality are classified with
the oppressive AM → the classification is a useful tool for finding the most
stressful weather conditions
REGRESSION MODELS FOR EXCESS MORTALITY
the reproduction of day-to-day
variations in mortality not very
successful in either
classification
C6 outperforms C15 – the
oppressive AM of the C6
classification covers more days
with large excess mortality
for the application in a HWWS,
the prediction of when excess
mortality may be expected is
more important than the
prediction of the magnitude of
the excess mortality itself
TS of the prediction of days with large excess mortality is also higher in C6 than in C15
(for any threshold of what is considered ‘large’ excess mortality)
for the threshold of relative excess mortality at least 3% above the baseline, the
predicted days cover almost half of the observed days with large excess mortality, and
the ratio of ‘successes’ to ‘false alerts’ is around 3 → TS = 0.40
Seoul: TSI (6, 10, 15 AMs) vs. SSC (bottom)
TSI is superior with respect to the coverage of days with large excess mortality:
oppressive AMs (OAMs) in TSI15 ~ 55%
DT (dry tropical) & MT+ (moist tropical plus) in SSC ~ 30%
CONCLUDING REMARKS 1/4
• results strongly depend on the settings of the classification procedure
• general rules concerning the most appropriate methodology for the
identification of oppressive AMs are difficult to formulate
• the method has to be adjusted for specific goals and location
CONCLUDING REMARKS 2/4
• for South Korea, the classifications based on only two input variables, T
and T-Td, are superior in identifying conditions associated with large excess
mortality
• results support the idea that air temperature and humidity are most
important for characterizing the effects of daily weather on human health,
while other weather elements may be relatively unimportant
CONCLUDING REMARKS 3/4
• the classification with 6 AMs is more useful for a possible application in a
HWWS, particularly as it better covers and ‘predicts’ days with large excess
mortality
• both meteorological and non-meteorological parameters are found to be
predictors in regression models for excess mortality within the oppressive
AMs of the selected classifications
• the models show better skill in predicting when large excess mortality
occurs than in predicting its magnitude
CONCLUDING REMARKS 4/4
further analysis may benefit from
• inclusion of air pollution factors (e.g. total suspended particulates, ozone)
into models (BUT data in Korea available since 2001 only)
• more general models than the linear regression
• the use of PCs instead of raw meteorological parameters in order to
overcome the issue of collinearity and get more stable regression equations
THE LAST SLIDE
general usefulness of the air-mass-based synoptic approach in the analysis
of relationships between weather and human health, including the
application in HWWS
BUT
• there is a large number of more or less subjective decisions that have
to be made during the process of forming the AMs,
and, most importantly,
• the outcome of the classification procedure, the characteristics of the
oppressive AM, and the statistical models that link meteorological and
non-meteorological parameters to excess mortality substantially depend on
these decisions
→ MUCH ATTENTION NEEDED
TSI in detail:
Kyselý J., Huth R., Kim J., 2009: Evaluating heat-related mortality in Korea by
objective classifications of ‘air masses’. International Journal of Climatology, doi
10.1002/joc.1994.
comparison of TSI and SSC (for Seoul):
Kyselý J., Huth R., 2010: Relationships between summer air masses and
mortality in Seoul: Comparison of weather-type classifications. Physics and
Chemistry of the Earth, doi 10.1016/j.pce.2009.11.001.