UWME - Atmospheric Sciences


JEFS Calibration:
Bayesian Model Averaging
Adrian E. Raftery
J. McLean Sloughter
Tilmann Gneiting
University of Washington
Statistics
Eric P. Grimit
Clifford F. Mass
Jeff Baars
University of Washington
Atmospheric Sciences
Research supported by:
Office of Naval Research
Multi-Disciplinary University Research Initiative (MURI)
JEFS Technical Meeting; Monterey, CA
23 August 2005 11:30 AM
The General Goal
“The general goal in EF [ensemble forecasting] is to
produce a probability density function (PDF) for the future
state of the atmosphere that is reliable…and sharp…”
-- Plan for the Joint Ensemble Forecast System (2nd Draft),
Maj. F. Anthony Eckel
Calibration and Sharpness
Calibration ~ reliability (also: statistical consistency)
A probability forecast p ought to verify with relative frequency p.
The verification ought to be indistinguishable from the forecast
ensemble (the verification rank histogram* is uniform).
However, a forecast from climatology is reliable (by definition), so
calibration alone is not enough.
Sharpness ~ resolution (also: discrimination, skill)
The variance, or confidence interval, should be as small as
possible, subject to calibration.
*Verification Rank Histogram
Record of where verification fell (i.e., its rank) among the ordered ensemble members:
Flat: well-calibrated (truth is indistinguishable from the ensemble members)
U-shaped: under-dispersive (truth falls outside the ensemble range too often)
Humped: over-dispersive (truth falls near the middle of the ensemble too often)
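To make the footnote concrete, here is a minimal Python sketch (not from the slides) of how a verification rank histogram can be tallied, assuming an array of ensemble forecasts and matching verifications; ties are broken at random.

```python
import numpy as np

def verification_rank_histogram(ensemble, obs, rng=None):
    """Tally where each verification falls among its ordered ensemble members.

    ensemble: (n_cases, n_members) array of member forecasts
    obs:      (n_cases,) array of verifying values
    Returns counts for ranks 1 .. n_members + 1 (flat => well-calibrated).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_cases, n_members = ensemble.shape
    ranks = np.empty(n_cases, dtype=int)
    for i in range(n_cases):
        below = int(np.sum(ensemble[i] < obs[i]))   # members strictly below the verification
        ties = int(np.sum(ensemble[i] == obs[i]))   # members tied with the verification
        ranks[i] = below + rng.integers(0, ties + 1) + 1  # random placement among ties
    return np.bincount(ranks, minlength=n_members + 2)[1:]
```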
Typical Verification Rank Histograms
[Figure: 36-h verification rank histograms for the *UWME (*ACMEcore) and *UWME+ (*ACMEcore+) ensembles, with verification rank (1-9) on the horizontal axis and probability on the vertical axis. Panels: (a) Z500 and (b) MSLP (synoptic variables) and (c) WS10 and (d) T2 (surface/mesoscale variables). Annotations note that errors depend on analysis uncertainty and on model uncertainty, and each panel lists the excessive outlier percentages (EOP*) of the two ensembles: 9.0 % / 6.7 %, 5.0 % / 4.2 %, 25.6 % / 13.3 %, and 43.7 % / 21.0 %.]
*Excessive outlier percentage
[c.f. Eckel and Mass 2005, Wea. Forecasting]
Objective and Constraints
Objective: Calibrate JEFS (JGE and JME) output.
Utilize available analyses/observations as surrogates for
truth.
Employ a method that:
  accounts for ensemble member construction and relative skill
    Bred-mode / ETKF initial conditions (JGE; equally skillful members)
    Multiple models (JGE and JME; differing skill for sets of members)
    Multi-scheme diversity within a single model (JME)
  is adaptive
    Can be rapidly relocated to any theatre of interest
    Does not require a long history of forecasts and observations
  accommodates regional/local variations within the domain
    Spatial (grid point) dependence of forecast error statistics
  works for any observed variable at any vertical level
First Step: Mean Bias Correction
Calibrate the first moment: the ensemble mean.
In a multi-model and/or multi-scheme physics ensemble,
individual members have unique, often compensatory,
systematic errors (biases).
Systematic errors do not represent forecast uncertainty.
Implemented a member-specific bias correction for
UWME using a 14-day training period (running mean).
Advantages and disadvantages:
Ensemble spread is reduced (in an under-dispersive system).
The ensemble spread-skill relationship is degraded.
(Grimit 2004, Ph.D. dissertation)
Forecast probability skill scores improve.
Excessive outliers are reduced.
Verification rank histograms become quasi-symmetric.
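As an illustration of the member-specific 14-day running-mean bias correction just described (a minimal sketch, not the operational UWME code, assuming a per-site record of member forecasts and verifying observations):

```python
import numpy as np

def running_mean_bias_correct(fcst, obs, window=14):
    """Member-specific running-mean bias correction.

    fcst: (n_days, n_members) forecasts at one site, oldest first (last row = today)
    obs:  (n_days,) verifying observations/analyses (today's value may be NaN)
    Returns today's forecasts with each member's mean error over the previous
    `window` days removed.
    """
    train_fcst = fcst[-window - 1:-1]                            # the preceding `window` days
    train_obs = obs[-window - 1:-1]
    bias = np.nanmean(train_fcst - train_obs[:, None], axis=0)   # mean error per member
    return fcst[-1] - bias
```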
Second Step: Calibration
Calibrate the higher moments: the ensemble variance.
Forecast error climatology
Add the error variance from a long history of forecasts and
observations to the current (deterministic) forecast.
For the ensemble mean, we shall call this forecast mean error
climatology (MEC).
MEC is time-invariant (a static forecast of uncertainty; a
climatology).
MEC is calibrated for large samples, but not very sharp.
Advantages and disadvantages:
Simple. Difficult to beat!
Gaussian.
Not practical for JGE/JME implementation, since a long history is
required.
A good baseline for comparison of calibration methods.
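A minimal sketch of how such a MEC predictive distribution can be formed, assuming a long record of ensemble-mean forecasts and verifying observations is available:

```python
import numpy as np
from scipy import stats

def mec_forecast(ens_mean_now, past_ens_means, past_obs):
    """Mean error climatology: a Gaussian centred on the current ensemble-mean
    forecast whose spread is the standard deviation of past ensemble-mean errors
    (time-invariant, i.e., a static forecast of uncertainty)."""
    errors = np.asarray(past_ens_means) - np.asarray(past_obs)
    sigma = errors.std(ddof=1)
    return stats.norm(loc=ens_mean_now, scale=sigma)

# e.g., probability of 2-m temperature below freezing:
# p_freeze = mec_forecast(1.8, past_means, past_obs).cdf(0.0)
```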
Mean Error Climatology (MEC) Performance
Comparison of *UWME 48-h 2-m temperature forecasts:
Member-specific mean bias correction applied to both [14-day running mean]
FIT = Gaussian fit to the raw forecast ensemble
MEC = Gaussian fit to the ensemble-mean + the mean error climatology
[00 UTC Cycle; October 2002 – March 2004; 361 cases]
[Figure: CRPS comparison of the FIT and MEC forecasts]
CRPS = continuous ranked probability score
[Probabilistic analog of the mean absolute error (MAE) for scoring deterministic forecasts]
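For reference (not shown on the slide), the CRPS of a predictive CDF F for a verifying value y is

\[
\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl[ F(x) - \mathbf{1}\{x \ge y\} \bigr]^2 \, dx ,
\]

which reduces to the absolute error |f - y| when F is a deterministic (step-function) forecast; smaller values are better.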
Bayesian Model Averaging (BMA)
Bayesian Model Averaging (BMA) Summary
BMA has several advantages over MEC. The BMA predictive PDF is a weighted mixture of member-specific PDFs,

  p(y | f_1, ..., f_K) = w_1 g_1(y | f_1) + ... + w_K g_K(y | f_K),

where each component g_k is centered on the member-specific mean-bias-corrected forecast, the weights w_k are member-specific BMA parameters, and the spread of the components is the BMA variance (not member-specific here, but it can be).
A time-varying uncertainty forecast.
A way to keep multi-modality, if it is warranted.
Maximizes information from short (2-4 week) training periods.
Allows for different relative skill between members through the BMA
weights (multi-model, multi-scheme physics).
[c.f. Raftery et al. 2005, Mon. Wea. Rev.]
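A minimal sketch of evaluating that predictive mixture, assuming the weights and common standard deviation have already been fitted (by maximum likelihood via EM over the training period, as in Raftery et al. 2005):

```python
import numpy as np
from scipy import stats

def bma_pdf(y, member_fcsts, weights, sigma):
    """BMA predictive density at y: a weighted sum of Gaussians, one per
    bias-corrected member forecast, with a common spread sigma."""
    comps = stats.norm.pdf(y, loc=np.asarray(member_fcsts), scale=sigma)
    return float(np.dot(weights, comps))

def bma_cdf(y, member_fcsts, weights, sigma):
    """Predictive probability P(Y <= y) under the same mixture."""
    comps = stats.norm.cdf(y, loc=np.asarray(member_fcsts), scale=sigma)
    return float(np.dot(weights, comps))
```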
BMA Performance Using Analyses
BMA was initially implemented using training data from the entire UWME 12-km domain (Raftery et al. 2005, MWR):
  No regional variation of the BMA weights or variance parameters.
  Observations used as truth.
After several attempts to implement BMA with local or regional training data using NCEP RUC 20-km analyses as truth, we found that selecting the training data from a neighborhood of grid points with similar land-use type and elevation produced EXCELLENT results!
The example application to 48-h 2-m temperature forecasts uses only 14 training days.
[Figure: MEC vs. BMA comparison]
BMA-Neighbor* Calibration and Sharpness
[Figure: calibration and sharpness comparison of the FIT, MEC, and BMA forecasts]
*neighbors have same land use type and
elevation difference < 200 m within a
search radius of 3 grid points (60 km)
Probability integral transform (PIT) histograms: an analog of verification rank histograms for continuous forecasts.
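A small sketch (assuming one predictive CDF per forecast case) of how PIT values are computed; a calibrated forecast yields PIT values that are uniform on [0, 1], i.e., a flat histogram:

```python
import numpy as np

def pit_values(predictive_cdfs, obs):
    """predictive_cdfs: list of callables, one predictive CDF per case; obs: verifications."""
    return np.array([F(y) for F, y in zip(predictive_cdfs, obs)])

# counts, edges = np.histogram(pit_values(cdfs, obs), bins=10, range=(0.0, 1.0))
```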
BMA-Neighbor* CRPS Improvement
BMA improvement over MEC
*neighbors have same land use type and
elevation difference < 200 m within a
search radius of 3 grid points (60 km)
BMA-Neighbor Using Observations
Use observations, remote if necessary, to
train BMA.
Follow the Mass-Wedam procedure for bias correction to select the BMA training data (a sketch of the selection step follows below):
1. Choose the N closest observing locations to the center of the grid box that have similar elevation and land-use characteristics.
2. Find the K occasions during a recent period (up to Kmax days previous) on which the interpolated forecast state was similar to the current interpolated forecast state at each station n = 1, ..., N.
   a) Similar ensemble-mean forecast states.
   b) Similar min/median/max ensemble forecast states.
3. If N*K matches are not found, relax the similarity constraints and repeat (1) and (2).
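A minimal, hypothetical sketch of the similarity search in step 2 for a single station, using only the ensemble-mean criterion; the function name and tolerance are illustrative, not the operational Mass-Wedam code:

```python
import numpy as np

def select_similar_occasions(past_means, current_mean, k, tol):
    """Return the indices of up to k past days whose interpolated ensemble-mean
    forecast is within `tol` of today's interpolated ensemble-mean forecast.
    The operational procedure also compares min/median/max ensemble states and
    relaxes `tol` when fewer than N*K matches are found."""
    diffs = np.abs(np.asarray(past_means) - current_mean)
    matches = np.flatnonzero(diffs <= tol)
    return matches[np.argsort(diffs[matches])][:k]   # keep the k most similar days
```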
Summary and the Way Forward
Mean error climatology
Good benchmark to evaluate competing calibration methods.
Generally beats a raw ensemble, even though it is not state-dependent.
The ensemble mean contains most of the information we can use.
The ensemble variance (state-dependent) is generally a poor prediction of
uncertainty, at least on the mesoscale.
Bayesian model averaging (BMA)
A calibration method that is becoming popular (e.g., at CMC-MSC).
A calibration method that meets many of the constraints that
FNMOC and AFWA will face with JEFS.
It accounts for differing relative skill of ensemble members (multi-model,
multi-scheme physics).
It is adaptive (short training period).
It can be rapidly relocated to any theatre.
It can be extended to any observed variable at any vertical level
(although, research is ongoing on this point).
Extending BMA to Non-Gaussian Variables
For quantities such as wind speed and precipitation, the distributions are not only non-Gaussian but also not purely continuous: there are point masses at zero.
For probabilistic quantitative precipitation forecasts (PQPF):
Model P(Y = 0) with a logistic regression.
Model the precipitation amount, given Y > 0, with a finite gamma mixture distribution.
Fit the gamma means by a linear regression of the cube root of the observation on the forecast and an indicator function for no precipitation.
Fit the gamma variance parameters and BMA weights by the EM algorithm, with some modifications.
[c.f. Sloughter et al. 200x, manuscript in preparation]
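A simplified sketch of the resulting mixed discrete-continuous predictive CDF, assuming per-member probabilities of zero precipitation and per-member gamma parameters have already been fitted (the cube-root transformation in the Sloughter et al. model is omitted here for brevity):

```python
import numpy as np
from scipy import stats

def bma_pqpf_cdf(y, weights, p_zero, shapes, scales):
    """P(Y <= y) = sum_k w_k * [ p0_k + (1 - p0_k) * GammaCDF_k(y) ] for y >= 0,
    i.e., each member contributes a point mass at zero plus a gamma for positive amounts."""
    if y < 0:
        return 0.0
    gcdf = stats.gamma.cdf(y, a=np.asarray(shapes), scale=np.asarray(scales))
    p0 = np.asarray(p_zero)
    return float(np.dot(weights, p0 + (1.0 - p0) * gcdf))

# probability of exceeding a small measurable-precipitation threshold:
# pop = 1.0 - bma_pqpf_cdf(0.01, w, p0, shapes, scales)
```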
PoP Reliability Diagrams
Results for 24-hour accumulation PoP forecasts from January 1, 2003 through December 31, 2004, with 25-day training and no regional parameter variations.
Ensemble consensus voting shown as crosses; the BMA PQPF model as red dots.
[c.f. Sloughter et al. 200x, manuscript in preparation]
PQPF Rank Histograms
[Figure: verification rank histogram and PIT histogram for the PQPF forecasts]
[c.f. Sloughter et al. 200x, manuscript in preparation]
QUESTIONS and DISCUSSION
Forecast Probability Skill Example
[Figure: Brier Skill Score (BSS) vs. lead time (00-48 h) for forecast probability of the event "10-m wind speed > 18 kt", comparing the raw and bias-corrected (*) UWME (ACMEcore) and UWME+ (ACMEcore+) ensembles. BSS = 1 is perfect; BSS < 0 is worthless.]
(0000 UTC Cycle; October 2002 – March 2003)
Eckel and Mass 2005
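For reference (not on the slide), the Brier Skill Score plotted above is

\[
\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (p_i - o_i)^2 , \qquad
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}},
\]

where p_i is the forecast probability of the event, o_i is 1 if the event occurred and 0 otherwise, and the reference forecast is usually climatology; BSS = 1 is a perfect forecast and BSS <= 0 indicates no skill over the reference.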
UWME: Multi-Analysis/Forecast Collection
Abbreviation / Model / Source | Type | Computational resolution (~ @ 45 N) | Distributed resolution | Objective analysis
GFS, Global Forecast System (GFS), National Centers for Environmental Prediction | Spectral | T382 / L64 (~35 km) | 1.0° / L14 (~80 km) | SSI 3D-Var
CMCG, Global Environmental Multi-scale (GEM), Canadian Meteorological Centre | Finite Diff. | 0.9° / L28 (~70 km) | 1.25° / L11 (~100 km) | 4D-Var
ETA, North American Mesoscale limited-area model, National Centers for Environmental Prediction | Finite Diff. | 12 km / L45 | 90 km / L37 | SSI 3D-Var
GASP, Global AnalysiS and Prediction model, Australian Bureau of Meteorology | Spectral | T239 / L29 (~60 km) | 1.0° / L11 (~80 km) | 3D-Var
JMA, Global Spectral Model (GSM), Japan Meteorological Agency | Spectral | T213 / L40 (~65 km) | 1.25° / L13 (~100 km) | 4D-Var
NGPS, Navy Operational Global Atmos. Pred. Sys., Fleet Numerical Meteorological & Oceanographic Cntr. | Spectral | T239 / L30 (~60 km) | 1.0° / L14 (~80 km) | 3D-Var
TCWB, Global Forecast System, Taiwan Central Weather Bureau | Spectral | T79 / L18 (~180 km) | 1.0° / L11 (~80 km) | OI
UKMO, Unified Model, United Kingdom Meteorological Office | Finite Diff. | 5/6° x 5/9° / L30 (~60 km) | same / L12 (~60 km) | 4D-Var
UWME: MM5 Physics Configuration
(January 2005 - current)
Member | PBL | Soil / LSM | Cloud vertical diffusion | Microphysics | Cumulus (36-km domain) | Cumulus (12-km domain) | Shallow cumulus | Radiation | SST Perturbation | Land Use Table
UWME (standard, all core members) | MRF | 5-Layer | Y | Reisner II | Kain-Fritsch | Kain-Fritsch | N | CCM2 | none | default
GFS+ | MRF | LSM | Y | Simple Ice | Kain-Fritsch | Kain-Fritsch | Y | RRTM | SST_pert01 | LANDUSE.plus1
CMCG+ | MRF | 5-Layer | Y | Reisner II | Grell | Grell | N | cloud | SST_pert02 | LANDUSE.plus2
ETA+ | Eta | 5-Layer | N | Goddard | Betts-Miller | Grell | Y | RRTM | SST_pert03 | LANDUSE.plus3
GASP+ | MRF | LSM | Y | Shultz | Betts-Miller | Kain-Fritsch | N | RRTM | SST_pert04 | LANDUSE.plus4
JMA+ | Eta | LSM | N | Reisner II | Kain-Fritsch | Kain-Fritsch | Y | cloud | SST_pert05 | LANDUSE.plus5
NGPS+ | Blackadar | 5-Layer | Y | Shultz | Grell | Grell | N | RRTM | SST_pert06 | LANDUSE.plus6
TCWB+ | Blackadar | 5-Layer | Y | Goddard | Betts-Miller | Grell | Y | cloud | SST_pert07 | LANDUSE.plus7
UKMO+ | Eta | LSM | N | Reisner I | Kain-Fritsch | Kain-Fritsch | N | cloud | SST_pert08 | LANDUSE.plus8

Assumed differences between model physics options approximate model error coming from sub-grid scales.
Perturbed surface boundary parameters according to their suspected uncertainty:
1) Albedo
2) Roughness Length
3) Moisture Availability
Member-Wise Forecast Bias Correction
UWME+ 2-m Temperature
[Figure: average RMSE (°C) and (shaded) average bias of UWME+ 2-m temperature forecasts at 12-, 24-, 36-, and 48-h lead times, by member: GFS+, CMCG+, ETA+, GASP+, JMA+, NGPS+, TCWB+, UKMO+, and MEAN+.]
(0000 UTC Cycle; October 2002 – March 2003)
Eckel and Mass 2005
Member-Wise Forecast Bias Correction
UWME+ 2-m Temperature
14-day running-mean bias correction
[Figure: average RMSE (°C) and (shaded) average bias of the bias-corrected UWME+ 2-m temperature forecasts at 12-, 24-, 36-, and 48-h lead times, by member: *GFS+ (plus01), *CMCG+ (plus02), *ETA+ (plus03), *GASP+ (plus04), *JMA+ (plus05), *NGPS+ (plus06), *TCWB+ (plus07), *UKMO+ (plus08), and *MEAN+ (mean).]
(0000 UTC Cycle; October 2002 – March 2003)
Eckel and Mass 2005
Post-Processing: Probability Densities
Q: How should we infer forecast probability density functions from a finite
ensemble of forecasts?
A: Some options are…
Democratic Voting (DV)
P = x / M, where x = # of members above (or below) the threshold and M = total # of members
Uniform Ranks (UR)***
Assume a flat verification rank histogram
Linear interpolation of the DV probabilities between adjacent member forecasts
Extrapolation beyond the ensemble range using a fitted Gumbel (extreme-value) distribution
Parametric Fitting (FIT)
Fit a statistical distribution (e.g., normal) to the member forecasts
***currently operational scheme
[Figure: sample ensemble forecasts]
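A minimal sketch (not the operational code) of the DV and UR exceedance probabilities for a single ensemble; this UR version omits the fitted Gumbel tails and simply clamps at the outermost rank bins:

```python
import numpy as np

def prob_exceed_dv(members, threshold):
    """Democratic voting: P = x / M, the fraction of members above the threshold."""
    members = np.asarray(members, dtype=float)
    return float(np.sum(members > threshold)) / members.size

def prob_exceed_ur(members, threshold):
    """Uniform ranks: each of the M+1 rank bins carries probability 1/(M+1),
    with the CDF interpolated linearly between adjacent sorted members."""
    m = np.sort(np.asarray(members, dtype=float))
    M = m.size
    if threshold <= m[0]:
        return M / (M + 1.0)            # below all members: clamp (no tail model)
    if threshold >= m[-1]:
        return 1.0 / (M + 1.0)          # above all members: clamp (no tail model)
    j = np.searchsorted(m, threshold, side="right") - 1   # m[j] <= threshold < m[j+1]
    frac = (threshold - m[j]) / (m[j + 1] - m[j])
    cdf = (j + 1 + frac) / (M + 1.0)
    return 1.0 - cdf
```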
A Concrete Example
A Concrete Example
Minimize Misses
Minimize False Alarms
How to Model Zeroes
[Figure: logit of the proportion of rain versus the cube root of the bin center]
How to Model Non-Zeroes
[Figure: mean (left) and variance (right) of the fitted gammas for each bin]
Power-Transformed Obs
[Figure: observations under power transformations: untransformed, square root, cube root, and fourth root]
A Possible Fix
Try a more complicated model, fitting a point mass at zero, an exponential for "drizzle," and a gamma for true rain around each member forecast.
[Figure legend: red = no rain, green = drizzle, blue = rain]
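A hypothetical sketch of such a per-member density, with illustrative parameter names; the weights would have to sum to one and, like the other BMA parameters, be fitted over the training period:

```python
import numpy as np
from scipy import stats

def mixed_precip_density(y, w_zero, w_drizzle, w_rain, drizzle_rate, rain_shape, rain_scale):
    """Point mass at zero ("no rain"), exponential for "drizzle", gamma for true rain.
    Returns the discrete probability at y == 0 and the mixture density for y > 0."""
    if y < 0:
        return 0.0
    if y == 0:
        return w_zero
    return (w_drizzle * stats.expon.pdf(y, scale=1.0 / drizzle_rate)
            + w_rain * stats.gamma.pdf(y, a=rain_shape, scale=rain_scale))
```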