Transcript Slide 1

PM Model Performance Workshop, February 10-11, 2004, Research Triangle Park, NC
Model Evaluation: Looking for
Spatial and Temporal Patterns
By
John S. Irwin1
Edith L. Gego2, P. Steven Porter3, Christian Hogrefe4 and
S. Trivikrama Rao1
1 NOAA Atmospheric Sciences Modeling Division, On Assignment to the U.S. Environmental
Protection Agency, Research Triangle Park, NC 27711, U.S.A.
2 Corporation for Atmospheric Research, Idaho Falls, ID 84401
3 Department of Civil Engineering, University of Idaho, Idaho Falls, ID 83401, U.S.A.
4 Atmospheric Sciences Research Center, University at Albany, Albany, NY 12222, U.S.A.
Figure 1. Panel a, location of the analysis domain and of the temperature and ozone
measurement sites. Panel b, histogram of the distance between temperature measurement
sites and their nearest neighbors. Panel c, histogram of the distance between ozone
monitoring sites and their nearest neighbors. Modal separation distance for temperature
measurements is 20 km. Modal separation distance for ozone monitors is 10 km.
Figure 2. July 15, 1995, 5PM. Comparison of 4 km and 12 km temperature predictions.
Panel a, 4 km model predictions. Panel b, 12 km model predictions. Panel c, histogram
of the differences between the 4 km and 12 km predictions.
Notice the increased “texture”
in the 4-km results (top panel)
as compared to the 12-km
results. Can we show this to
be an increase in skill?
RAMS4a and UAM-V were executed to simulate three episodes during the summer of 1995
when the 1-hour ozone standard (0.12 ppm) was exceeded in the northeastern U.S.:
June 19-20, July 14-15, and August 12, 1995.
The mean bias, mean gross error and correlation coefficient are all slightly improved
for daytime hours (1PM to 5PM) for sites located near water and in mountainous areas.
Little to no improvement could be seen in the temperature results for nighttime hours
(1AM to 5AM).
Figure 3. Locations with the highest local variability (top 5%) in the 4 km temperature
predictions, where variability is expressed by the standard deviation of the nine 4 km
predictions within each 12 km grid cell.
Figure 4. July 15, 1995 5PM. Comparison
of 4 km and 12 km ozone predictions.
Panel a, 4 km model predictions. Panel b,
12 km model predictions. Panel c,
histogram of the differences between the
4 km and 12 km predictions.
Notice the increase in “texture” (streaks)
in the 4-km ozone results (top panel) as
compared to the 12-km results. Can we
show this to be an increase in skill?
The mean bias, mean gross error and correlation coefficient for the high local
variability sites suggest that, in the case of ozone, the 4 km grid did not lead to
better results than the 12 km one for either nighttime or daytime simulations.
It even appears that refining the grid size from 12 to 4 km led to a deterioration
in the quality of the ozone estimates.
Figure 5. Locations of highest local variability (top 5%) in the 4 km ozone predictions,
where variability is expressed by the standard deviation of the nine 4 km predictions within
each 12 km grid cell.
Temperature – locating improvement and deterioration
Figure 6. Comparison of
mean gross errors (panel
a) and correlation
coefficients (panel b)
calculated for both 12 km
and 4 km simulations at all
temperature monitors.
The 10% of stations having the largest improvement of model performance for the 4 km
predictions, as measured by a particular metric, are marked with upward-pointing
arrows, while the 10% of stations having the largest deterioration of model
performance for the 4 km simulations are marked with downward-pointing arrows.
Panels c and d depict the locations of the stations that were marked by arrows in
panels a and b, respectively.
Improvements appear to be concentrated at grid cells near water.
Ozone – locating improvement and deterioration
Figure 7. Comparison of
mean gross errors (panel
a) and correlation
coefficients (panel b)
calculated for both 12 km
and 4 km simulations at all
ozone monitors.
The 10% of stations having the largest improvement of model performance for the 4 km
predictions, as measured by a particular metric, are marked with upward-pointing
arrows, while the 10% of stations having the largest deterioration of model
performance for the 4 km simulations are marked with downward-pointing arrows.
Panels c and d depict the locations of the stations that were marked by arrows in
panels a and b, respectively.
It is difficult to see any
obvious clustering or
patterns.
[Figure 8 panels: CASTNet and IMPROVE sulfate (ug/m3); panels A, B, C; x-axis 1999-2001]
Figure 8. Decomposition of sulfate concentrations recorded from January 1999 to
December 2001 at GSMP into low- and high-frequency signals (panel A: raw data,
panel B: low-frequency signal, panel C: high-frequency signal). IMPROVE = 24-hr
(http://vista.cira.colostate.edu/improve/Publications/OtherDoc/IMPROVEDataGuide/)
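The decomposition in Figures 8 and 9 splits each record into a slowly varying baseline and the remaining fluctuations. The transcript does not name the filter used; as a minimal sketch, a centered moving average can stand in for the low-frequency signal, with the residual as the high-frequency part (the `window` length is an assumed parameter, not taken from the slides):

```python
import numpy as np

def decompose(series, window=15):
    """Split a concentration record into low- and high-frequency signals.

    Low frequency: centered moving average over `window` samples (an
    assumed stand-in; the transcript does not name the actual filter).
    High frequency: residual after removing the low-frequency part.
    """
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    low = np.convolve(series, kernel, mode="same")  # edges are zero-padded
    high = series - low
    return low, high
```

In terms of Figures 8 and 9, panel A is the raw series, panel B corresponds to `low`, and panel C to `high`.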
[Figure 9 panels: CASTNet and IMPROVE nitrate (ug/m3); panels A, B, C; x-axis 1999-2001]
Figure 9. Decomposition of nitrate concentrations recorded from January 1999 to
December 2001 at GSMP into low- and high-frequency signals (panel A: raw data,
panel B: low-frequency signal, panel C: high-frequency signal) CASTNet = Weekly
(http://www.epa.gov/CASTNet)
[Figure 10: sulfate, nitrate, and ammonium concentrations (ug/m3); CASTNet, IMPROVE, STN; panels A and B]
Figure 10. Panel A - Scatter plots of the block average concentrations calculated for STN and
IMPROVE vs the block average concentrations calculated for CASTNet (window: 5 weeks), Panel
B - time series of the long-term signals (5-week moving average) for CASTNet: BEL116, Beltsville,
Maryland; IMPROVE: WASN1, District of Columbia, and STN: 110010043, District of Columbia.
[Figure 11 group labels, for each of CASTNet, IMPROVE, and STN: Mid-Atlantic States,
Southeastern States, New England States, Kentucky area, Western Great Lakes States]
Figure 11. PCA results for sulfate at sites located east of -100º longitude (eastern
U.S.), from July 1st 2001 to July 31st 2002, for those sites with less than 20%
missing values. Similar groups are formed by the three networks, although some
differences are evident in the time series within the groups.
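The PCA grouping of sites in Figures 11 and 12 can be sketched as follows. This is an illustrative procedure, not the authors' exact one (the transcript gives no preprocessing or rotation details): compute the inter-site correlation matrix, extract the leading components, and assign each site to the component on which it loads most heavily.

```python
import numpy as np

def pca_group_sites(data, n_components=5):
    """Group monitoring sites by PCA of their concentration time series.

    data: (n_times, n_sites) array; each column is one site's record
    (complete records assumed, i.e. sites with too many missing values
    already screened out). Returns one integer group label per site:
    the principal component on which that site loads most heavily.
    """
    corr = np.corrcoef(data, rowvar=False)        # inter-site correlations
    eigvals, eigvecs = np.linalg.eigh(corr)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]                  # (n_sites, n_components)
    return np.argmax(np.abs(loadings), axis=1)
```

Sites sharing a label form one subregion, such as the "Mid-Atlantic States" or "Kentucky area" groups in the figures.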
[Figure 12 group labels, for each of CASTNet, IMPROVE, and STN: Mid-Atlantic States,
Kentucky area, Southeastern States, New England States, Mid-West States, Western
Great Lakes States]
Figure 12. PCA results for ammonium at sites located east of -100º longitude (eastern
U.S.), from July 1st 2001 to July 31st 2002, for those sites with less than 20%
missing values.
The groupings by the three networks are not quite as clean as for sulfate.
Some differences in the time series are seen in almost all the groups.
Figure 13. Comparison of the group structure for sulfate and nitrate. The local emission contributions to nitrate are more
pronounced at some sites, and this causes some of the groups to “intermingle”.
Figure 14. PCA results for sulfate for IMPROVE sites
for 1996.
Figure 15. 5-week running averages for each group comparing
observations with modeling results. CMAQ 2002 release;
REMSAD version 7.06.
[Figure 15 groups: Pacific Coast, New England, Idaho Wyoming, Kentucky Virginia,
South West, Central Florida; legend: Observations, CMAQ, REMSAD]
Figure 16. Comparison of the long-term
seasonal time series can be accomplished
using 5-week running averages, at each site or
for a group of sites (where you have averaged
the results together for a group).
Washington D.C. (WASH1): 38.88330 North latitude, -77.03330 East longitude.
[Plot: sulfate concentration, 1x10^0 to 1x10^1 (log scale), vs. day number 60-360;
series: Observed, CMAQ, REMSAD.
CMAQ NMSE = 0.12, STD = 0.0206; REMSAD NMSE = 0.07, STD = 0.0133; t-Value = 1.608]
This can be done 35 times: we compute the 35-day (5-week) averages sequentially with
a start date of day 1, then day 2, etc. For each start date we generate a new pair
of NMSE values.
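The sequential 35-day averaging just described can be sketched as follows. The NMSE form used here (mean squared error normalized by the product of the observed and modeled means) is one common definition and is an assumption, since the transcript does not spell it out; the function names are illustrative.

```python
import numpy as np

def nmse(obs, mod):
    """Normalized mean square error: mean((obs-mod)^2) / (mean(obs)*mean(mod)).
    One common definition, assumed here."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    return np.mean((obs - mod) ** 2) / (np.mean(obs) * np.mean(mod))

def sequential_nmse(obs, mod, window=35, n_starts=35):
    """NMSE of `window`-day block averages for start days 1, 2, ..., n_starts.

    For each start day, obs and mod are averaged over consecutive
    `window`-day blocks, and the NMSE of those block averages is kept.
    """
    out = []
    for start in range(n_starts):
        o, m = obs[start:], mod[start:]
        n_blocks = len(o) // window
        o_avg = o[: n_blocks * window].reshape(n_blocks, window).mean(axis=1)
        m_avg = m[: n_blocks * window].reshape(n_blocks, window).mean(axis=1)
        out.append(nmse(o_avg, m_avg))
    return np.array(out)
```

Running this once per model against the same observations yields the 35 paired NMSE values described in the text.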
Bryce Canyon (BRCA1): 37.60000 North latitude, -112.16670 East longitude.
[Plot: sulfate concentration, 1x10^-1 to 1x10^0 (log scale), vs. day number 60-360;
series: Observed, CMAQ, REMSAD.
CMAQ NMSE = 0.49, STD = 0.0301; REMSAD NMSE = 0.50, STD = 0.0126; t-Value = -0.530]
Here the NMSE has been computed for each
model, comparing the modeled result with the
corresponding observation. When you compare
the NMSE value for two models, the question
being posed is, which model is closer to the
observations on average (i.e., which model has
the smallest NMSE)?
To test whether the NMSE values are really different, take the difference between the
two models' NMSE values for each start date. Then compute the mean and standard
deviation of the 35 differences. Use a Student's t-test to see if the average
difference is, statistically speaking, significantly different from zero: the
t-statistic is the mean difference divided by its standard error (SD/√35). For 35
values, the computed t-statistic must exceed about 2.03 (1.96 in the large-sample
normal approximation) to have 95% confidence that the mean of the 35 differences is
different from zero. See ASTM Standard Guide D 6589.
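The paired-difference test described above can be sketched as follows (the function and its name are illustrative):

```python
import math

def paired_nmse_ttest(nmse_a, nmse_b):
    """Paired t-test on the differences between two models' NMSE series.

    nmse_a, nmse_b: NMSE values for the same start dates (e.g. 35 of them).
    Returns (mean difference, t statistic), where the t statistic is the
    mean difference divided by its standard error, SD/sqrt(n).
    """
    diffs = [a - b for a, b in zip(nmse_a, nmse_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return mean, t
```

An absolute t statistic above roughly 2 (for 35 paired values, two-sided, 95% confidence) indicates the two models' NMSE values differ significantly on average.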
Figure 17. Synoptic pattern typing.
A climatology of synoptic circulation
patterns, computed using 1996 and
2001 1200Z sea-level pressures.
We label each day as being in one
of these patterns, and then
characterize model performance
(MM5, CMAQ), looking for
variations in skill within and
between patterns.
McKendry, I.G., Steyn, D.G., and
McBean, G., 1995: Validation of
synoptic circulation patterns
simulated by the Canadian Climate
Center General Circulation Model
for Western North America.
Atmosphere-Ocean 33(4), 809-825.
Yarnal, B., 1993: Synoptic
climatology in environmental
analysis: a primer, Belhaven
Press, London, UK, 195 pp.
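Labeling each day by its circulation pattern, as described for Figure 17, can be sketched as follows. The pattern centroids are assumed inputs here (e.g., composites from a prior clustering of the 1200Z sea-level pressure fields; the clustering itself is not shown), and each day is assigned the pattern its pressure field correlates with most strongly:

```python
import numpy as np

def classify_days(slp_days, centroids):
    """Assign each day's sea-level pressure field to a synoptic pattern.

    slp_days: (n_days, n_gridpoints) daily 1200Z SLP fields, flattened.
    centroids: (n_patterns, n_gridpoints) pattern composites from a
    prior clustering (assumed available).
    Each day gets the pattern with the highest spatial correlation.
    """
    def standardize(x):
        x = np.asarray(x, float)
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    d = standardize(slp_days)
    c = standardize(centroids)
    corr = d @ c.T / d.shape[1]      # Pearson correlation, day vs. pattern
    return np.argmax(corr, axis=1)
```

Model performance statistics can then be stratified by the returned labels to look for variations in skill within and between patterns.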
On the left, freely available Census Bureau data, representing where people live
(residences, i.e., the night-time population distribution). On the right, a Los
Alamos National Laboratory-derived day-time population distribution. [Maps shown
for Washington DC.]
The presentation is broken into three parts.
1) Available monitors are too far apart for spatial interpolation to assess whether fine-scale
photochemical models are adding valid information (is the increased "texture" seen in such simulations
an increase in model skill?), so we attempted to assess performance by other means (see Figures 2-7,
Tables 1 and 2). We concluded that for temperature, going from 12 km to 4 km may have improved the
temperature estimates ever so slightly along the coastlines, but we could not confirm an increase in
skill for the ozone predictions in going from 12 km to 4 km.
2) We compared measurements from the three aerosol networks for sulfate, nitrate and ammonium.
PCA and comparison of the long-term time series suggest that the three networks have similar
spatial patterns and similar (long-term) temporal patterns (within the subregions) for sulfate and
ammonium; nitrate measurements appear somewhat different and will require further work to fully
understand.
3) We used our understanding of PCA on sulfate to see if we could devise a way to assess model
performance. It appears that when we look at the long-term temporal patterns predicted and
observed in the PCA subregions, we can detect where the model is performing well and where it is
failing (e.g., the failure of the sulfate predictions over most of the western states, and especially
along the west coast). We plan to explore how we can adapt this evaluation method for ammonium
and nitrate. We plan to extend this work to assess model performance within the subregions based
on the synoptic situation. We provide an example of how differences in model performance in
simulating the 5-week average sulfate concentrations can be objectively determined.
The last slide in the presentation is there (if I have time) to alert people to start thinking of
population as a function of time of day, which will possibly affect emission inventories and most
definitely will affect exposure assessments. This slide is from a presentation I heard up in DC as
part of the homeland defense meetings I am going to. One of the other "comments" mentioned at
this meeting was that "urban" as a land-use description was estimated to be underestimated in the
US by about 50%... food for thought.