Fuzzy verification


Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss
Fuzzy Verification toolbox:
definitions and results
Felix Ament
MeteoSwiss, Switzerland
Motivation for new scores
Which rain forecast would you rather use?
Mesoscale model (5 km), 21 Mar 2004, Sydney: RMS=13.0
Global model (100 km), 21 Mar 2004, Sydney: RMS=4.6
Observed 24h rain
Fuzzy Verification toolbox
[email protected]
Fine scale verification: Fuzzy Methods
“… do not evaluate a point by point match!”
General Recipe
• (Choose a threshold to define event
and non-event)
• define scales of interest
• consider statistics at these scales
for verification
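The recipe above can be sketched in a few lines. This is a minimal illustration, not the toolbox code: the function name `event_fraction`, the non-overlapping tiling and the toy fields are assumptions made for the example.

```python
import numpy as np

def event_fraction(field, threshold, window):
    """Fraction of grid points at or above `threshold` in each window x window box.

    Assumption for this sketch: boxes are non-overlapping tiles, and edge
    rows/columns that do not fill a complete box are dropped.
    """
    events = (np.asarray(field) >= threshold).astype(float)  # step 1: event / non-event
    ny, nx = events.shape
    ny_c, nx_c = ny // window, nx // window
    tiles = events[:ny_c * window, :nx_c * window]
    tiles = tiles.reshape(ny_c, window, nx_c, window)        # step 2: scale of interest
    return tiles.mean(axis=(1, 3))                           # step 3: box statistics

# Toy fields: one rain cell, displaced by one grid point in the forecast.
forecast = np.zeros((4, 4))
forecast[0, 2] = 5.0
observed = np.zeros((4, 4))
observed[0, 3] = 5.0

for window in (1, 2, 4):
    f = event_fraction(forecast, threshold=1.0, window=window)
    o = event_fraction(observed, threshold=1.0, window=window)
    print(window, np.abs(f - o).max())
```

At window 1 the two fields disagree point by point; from window 2 upward the box fractions agree, which is the scale dependence the recipe is after.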
[Figure: forecast and observation grids with events marked x; box statistics are evaluated in windows of varying scale and for varying intensity]
→ score depends on spatial scale and intensity
A Fuzzy Verification Toolbox
Fuzzy method | Decision model for useful forecast
Upscaling (Zepeda-Arce et al. 2000; Weygandt et al. 2004) | Resembles obs when averaged to coarser scales
Anywhere in window (Damrath 2004), 50% coverage | Predicts event over minimum fraction of region
Fuzzy logic (Damrath 2004), Joint probability (Ebert 2002) | More correct than incorrect
Multi-event contingency table (Atger 2001) | Predicts at least one event close to observed event
Intensity-scale (Casati et al. 2004) | Lower error than random arrangement of obs
Fractions skill score (Roberts and Lean 2005) | Similar frequency of forecast and observed events
Practically perfect hindcast (Brooks et al. 1998) | Resembles forecast based on perfect knowledge of observations
Pragmatic (Theis et al. 2005) | Can distinguish events and non-events
CSRR (Germann and Zawadzki 2004) | High probability of matching observed value
Area-related RMSE (Rezacova et al. 2005) | Similar intensity distribution as observed
Ebert, E.E., 2007: Fuzzy verification of high resolution gridded forecasts: A review and proposed framework. Meteorol. Appls., submitted.
Toolbox available at http://www.bom.gov.au/bmrc/wefor/staff/eee/fuzzy_verification.zip
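As an illustration of one entry in the table, here is a sketch of the fractions skill score in its commonly published form, FSS = 1 - mean((Pf - Po)^2) / (mean(Pf^2) + mean(Po^2)), where Pf and Po are the forecast and observed event fractions in a moving window. The toolbox implementation is not shown in the slides and may differ; the function names, the zero padding at the edges and the toy fields are assumptions.

```python
import numpy as np

def window_fraction(events, window):
    """Moving-window mean of a binary field (odd `window`, zero padding at the edges)."""
    half = window // 2
    padded = np.pad(events.astype(float), half, mode="constant")
    ny, nx = events.shape
    total = np.zeros((ny, nx))
    for dy in range(window):
        for dx in range(window):
            total += padded[dy:dy + ny, dx:dx + nx]
    return total / window**2

def fractions_skill_score(forecast, observed, threshold, window):
    """FSS for one intensity threshold and one window size."""
    pf = window_fraction(forecast >= threshold, window)
    po = window_fraction(observed >= threshold, window)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / reference if reference > 0 else np.nan

# Toy fields: one observed rain cell, forecast displaced by two grid points.
observed = np.zeros((9, 9))
observed[4, 4] = 5.0
forecast = np.zeros((9, 9))
forecast[4, 6] = 5.0

for window in (1, 3, 5):
    print(window, fractions_skill_score(forecast, observed, 1.0, window))
```

The score is zero at grid scale and grows once the window is wide enough to cover the displacement.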
Applying fuzzy scores
Fuzzy scores provide a wealth of information, but
• the results seem to be contrasting
• their interpretation is sometimes difficult
• they contain too many numbers
[Figure: colour scale from poor to good]
Application versus testbed
Application: know the scores (!), forecast error is unknown (?)
Testbed: know the forecast error (!), scores are unknown (?)
A Fuzzy Verification testbed
Virtual truth (radar data, model data, synthetic field)
→ Perturbation Generator
→ Realizations of virtual erroneous model forecasts
→ Fuzzy Verification Toolbox
→ Realizations of verification results
→ Analyzer
→ Assessment of
• sensitivity (mean)
• [reliability (STD)]
[Figure: example tables of verification results, with values between 0.30 and 1.00]
Two ingredients:
1. Reference fields: Hourly radar derived rain fields, August 2005 flood event, 19 time stamps (Frei et al., 2005)
2. Perturbations: → next slide
Perturbations
Perturbation | Type of forecast error | Algorithm
PERFECT | No error, perfect forecast! | -
XSHIFT | Horizontal translation | Horizontal translation (10 grid points)
BROWNIAN | No small scale skill | Random exchange of neighboring points (Brownian motion)
LS_NOISE | Wrong large scale forcing | Multiplication with a disturbance factor generated by large scale 2d Gaussian kernels
SMOOTH | High horizontal diffusion (or coarse scale model) | Moving window arithmetic average
DRIZZLE | Overestimation of low intensity precipitation | Moving window filter setting each point below the window average to the mean value
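Three of the perturbations are simple enough to sketch directly. This is one possible reading of the algorithm column, not the testbed code: the window size of 5, the edge handling and the zero fill after shifting are assumptions.

```python
import numpy as np

def moving_average(field, window):
    """Moving-window arithmetic mean (odd `window`, edge values repeated)."""
    half = window // 2
    padded = np.pad(field, half, mode="edge")
    ny, nx = field.shape
    total = np.zeros((ny, nx))
    for dy in range(window):
        for dx in range(window):
            total += padded[dy:dy + ny, dx:dx + nx]
    return total / window**2

def xshift(field, shift=10):
    """XSHIFT: translate along x by `shift` grid points; vacated columns become zero."""
    shifted = np.zeros_like(field)
    shifted[:, shift:] = field[:, :field.shape[1] - shift]
    return shifted

def smooth(field, window=5):
    """SMOOTH: moving window arithmetic average."""
    return moving_average(field, window)

def drizzle(field, window=5):
    """DRIZZLE: raise every point that lies below its window mean to that mean."""
    return np.maximum(field, moving_average(field, window))

# Toy field with a single rain cell in the centre.
field = np.zeros((21, 21))
field[10, 10] = 25.0
print(xshift(field)[10, 20], smooth(field)[10, 10], drizzle(field)[10, 11])
```

In this reading DRIZZLE leaves the peak untouched and only adds light precipitation around it, while SMOOTH also lowers the peak.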
Perfect forecast
All scores should be perfect!
• But, in fact, 5 out of 12 do not!
Effect of "Leaking" Scores
Problem: Some methods assume no skill at scales below window size!
An example: Joint probability method
observation: pobs=0.5
forecast: pforecast=0.5
Assuming random ordering within window:
             | OBS yes | OBS no
Forecast yes | 0.25    | 0.25
Forecast no  | 0.25    | 0.25
Not perfect!
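The arithmetic behind this example can be checked in a few lines. The function name is hypothetical; the only ingredient taken from the slide is the assumption of random ordering within the window, which makes the joint probabilities the product of the two event fractions.

```python
def joint_probability_table(p_forecast, p_obs):
    """2x2 table (forecast, obs) assuming forecast and observed events are
    arranged randomly, hence independently, within the window."""
    return {
        ("yes", "yes"): p_forecast * p_obs,
        ("yes", "no"): p_forecast * (1 - p_obs),
        ("no", "yes"): (1 - p_forecast) * p_obs,
        ("no", "no"): (1 - p_forecast) * (1 - p_obs),
    }

table = joint_probability_table(0.5, 0.5)
fraction_correct = table[("yes", "yes")] + table[("no", "no")]
print(table)
print(fraction_correct)
```

Even when forecast and observation are identical within the window, half of the probability mass lands in the off-diagonal cells, so the score cannot reach its perfect value.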
Expected response to perturbations
[Figure: one panel per perturbation (XSHIFT, BROWNIAN, LS_NOISE, SMOOTH, DRIZZLE); axes: spatial scale (fine to coarse) versus intensity (low to high); marking: sensitivity expected (=0.0) or not expected (=1.0)]
Summary in terms of contrast:
Contrast := mean( ) – mean( )
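A sketch of how such a contrast could be computed from a table of scores. The two arguments of the means are lost in the transcript, so the ordering used here is an assumption: mean over cells where no sensitivity is expected minus mean over cells where sensitivity is expected, which gives 1.0 for the ideal response. The function name and the example values are illustrative.

```python
import numpy as np

def contrast(scores, sensitivity_expected):
    """Assumed definition: mean(score where no sensitivity is expected)
    minus mean(score where sensitivity is expected)."""
    scores = np.asarray(scores, dtype=float)
    expected = np.asarray(sensitivity_expected, dtype=bool)
    return scores[~expected].mean() - scores[expected].mean()

# Rows: spatial scale (coarse to fine); columns: intensity (low to high).
scores = [[1.0, 1.0, 0.9],
          [0.7, 0.5, 0.5],
          [0.3, 0.4, 0.4]]
# Hypothetical expectation: the perturbation should only be felt at the two finer scales.
expected = [[False, False, False],
            [True, True, True],
            [True, True, True]]

print(contrast(scores, expected))
```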
Summary real
[Figure: contrast (axis from -0.1 to 0.7) and STD (axis from 0 to 0.2) for each perturbation (BROWNIAN, SMOOTH, LS_NOISE, DRIZZLE, XSHIFT) and each method: Upscaling, Anywhere in window, 50% coverage, Fuzzy logic, Joint probability, Multi-event contingency table, Fraction skill score, Intensity scale, Pragmatic approach, Practically perfect hindcast, CSRR, Area-related RMSE; leaking scores are marked]
• Leaking scores show an overall poor performance
• “Intensity scale” and “Practically Perfect Hindcast” perform in general well, but …
• Many scores have problems detecting large scale noise (LS_NOISE); “Upscaling” and “50% coverage” are beneficial in this respect
Spatial detection versus filtering
Horizontal translation (XSHIFT) with variable displacement Dx
[Figure: panels for Dx=25km, Dx=10km and Dx=5km]
• “Intensity scale” method can detect spatial scale of perturbation
• All other methods like the “Fraction Skill score” just filter small scale errors
Redundancy of scores
Correlation (%) between the resulting scores of all methods, over all thresholds and window sizes, averaged over all types of perturbation:
→ Groups of scores:
• UP, YN, MC, FB, PP
• FZ, JP
• FB, PP, (IS)
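A redundancy analysis of this kind can be sketched with a correlation matrix over flattened score tables. The method names and values below are placeholders, not toolbox results; for the averaging over perturbation types one would compute one matrix per perturbation and take their mean.

```python
import numpy as np

def score_correlation(score_tables):
    """Percent correlation between methods.

    `score_tables` maps a method name to an array of scores over
    (threshold, window size); all arrays must have the same shape.
    """
    names = list(score_tables)
    flat = np.array([np.ravel(score_tables[name]) for name in names])
    return names, 100.0 * np.corrcoef(flat)  # rows of `flat` are the variables

# Placeholder score tables (3 thresholds x 4 window sizes).
base = np.arange(12.0).reshape(3, 4)
tables = {
    "method_a": base / 11.0,
    "method_b": 0.5 * base / 11.0 + 0.2,  # linear in method_a, hence redundant
    "method_c": (base % 4) / 3.0,
}

names, corr = score_correlation(tables)
print(names)
print(np.round(corr))
```

Methods whose pairwise correlation stays close to 100% carry nearly the same information and can be grouped.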
August 2005 flood event
Precipitation sum 18.8.-23.8.2005:
[Figure: four panels of precipitation sums, with means 73.1mm, 62.8mm, 106.2mm and 43.2mm]
(Hourly radar data calibrated using rain gauges (Frei et al., 2005))
Fuzzy Verification of August 2005 flood
Based on 3 hourly accumulations during August 2005 flood period (18.8.-23.8.2005)
[Figure: fuzzy score panels for COSMO-2 and COSMO-7; axes: scale (7km gridpoints) versus intensity threshold (mm/3h); colour scale from good to bad]
Fuzzy Verification of August 2005 flood
Difference of Fuzzy Scores
[Figure: axes: scale (7km gridpoints) versus intensity threshold (mm/3h); colour scale: COSMO-2 better, neutral, COSMO-7 better]
D-PHASE
Demonstration of Probabilistic Hydrological and Atmospheric Simulation of flood Events in the Alpine region
• Operational phase (June until November 2007) is running
• 33 atmospheric models take part …
• … and store their output in a common format in one data archive
RADAR
Standard verification (see Poster)
Let’s apply the fuzzy toolbox
• Models: COSMO-2, -7, -DE, -EU
• Period: August 2007
• Lead times: most recent forecast starting at forecast hour +03.
• Observations: Swiss Radar data aggregated on each model grid
• To be verified: 3h accumulation of precip.
D-PHASE: August 2007
[Figure: panels for COSMO-7, COSMO-2, COSMO-EU and COSMO-DE]
Intensity Scale score (preliminary), 3h accumulation
Conclusions
• Fuzzy verification scores are a promising framework for verification of high resolution precipitation forecasts.
• The testbed is a useful tool to evaluate the wealth of scores (not necessarily fuzzy ones):
  • Not all scores indicate a perfect forecast by perfect scores (leaking scores).
  • The “intensity scale” method is able to detect the specific scale of a spatial error.
• MeteoSwiss goes for: Upscaling, Intensity scale, Fraction skill score (and Practically perfect hindcast) methods.
• First long term application for D-PHASE has just started.
Summary ideal
[Figure: contrast and STD per method and perturbation, same layout as “Summary real”]
D-PHASE: August 2007 – cosmoch7
D-PHASE: August 2007 – Cosmoch2
D-PHASE: August 2007 - LME
D-PHASE: August 2007 - LMK
August 2005 flood event
Fuzzy Verification (hourly accumulations):
[Figure: panels for COSMO-7 and COSMO-2]
August 2005 flood event
Fuzzy Verification COSMO-2 – COSMO-7:
• Surprisingly small differences
• However, COSMO-2 seems to be slightly better at