Verifying and interpreting
ensemble products
Goals of an EPS
• Predict the observed distribution of events and
atmospheric states
• Predict uncertainty in the day’s prediction
• Predict the extreme events that are possible on
a particular day
• Provide a range of possible scenarios for a
particular forecast
Ensemble design decisions depend on the norm (i.e., which of these goals the system is designed to optimize)
How ensembles can be used
• Scenarios
– Forecasters are used to looking at multiple solutions
– Has not been a goal of the ensemble community per
se
– Bandwidth can be an issue
• Sensitivities (lagged covariance)
– Hardly explored at this point
• Probabilistic: more typical
– Interpret the ensemble as a sample from the forecast PDF of the atmosphere
– Calibration is desirable
Challenges in probabilistic
mesoscale prediction
• Model formulation
  – Bias (marginal and conditional)
  – Lack of variability caused by truncation and approximation
  – Non-universality of closure and forcing
• Initial conditions
  – Small scales are damped in analysis systems, and the model must develop them
  – Perturbation methods designed for medium-range systems may not be appropriate
• Lateral boundary conditions
  – After short time periods the lateral boundary conditions can dominate
  – Representing uncertainty in lateral boundary conditions is critical
• Lower boundary conditions
  – Dominate the boundary-layer response
  – Difficult to estimate uncertainty in lower boundary conditions
Motivation for Generating Ensemble Forecasts
1) Greater accuracy of ensemble mean forecast (half the
error variance of single forecast)
2) Likelihood of extremes
3) Non-Gaussian forecast PDFs
4) Ensemble spread as a representation of forecast uncertainty
=> All rely on forecasts being calibrated
This section:
1) calibration issues and ongoing ATEC work
2) Ensemble verification in general, focusing on ATEC
ensemble (probabilistic) scores
Gridded Forecasts (2D or 3D) vs. Point Observations (2D or 3D)
Calibration typically involves making adjustments of forecasts where observations are already available.
Slide from Tressa Fowler
Matching Points to Grids
• Observation points are unlikely to fall exactly on forecast
grid points.
• Match in horizontal space via a choice of methods (see the sketch below):
  – Closest
  – Interpolate
  – Function of surrounding points, e.g.
    • min of closest 4
    • median of closest 25
• Match in vertical by interpolating between the level above and below.
Slide from Tressa Fowler
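To make the matching options concrete, here is a minimal sketch (Python/NumPy) of nearest-point matching, bilinear interpolation, and a neighborhood function for a point observation on a regular grid. The grid spacing, field, and station location are made up for illustration; this is not the operational ATEC matching code.

```python
import numpy as np

# Hypothetical regular grid (assumed 0.1-degree spacing) and one station.
lats = np.arange(35.0, 45.0, 0.1)                      # grid latitudes
lons = np.arange(-115.0, -105.0, 0.1)                  # grid longitudes
temp = 280.0 + np.random.randn(lats.size, lons.size)   # fake 2-D forecast field
stn_lat, stn_lon = 40.03, -112.47                      # made-up observation location

# 1) Closest grid point
i = np.abs(lats - stn_lat).argmin()
j = np.abs(lons - stn_lon).argmin()
t_nearest = temp[i, j]

# 2) Bilinear interpolation from the 4 surrounding points
i0 = np.searchsorted(lats, stn_lat) - 1
j0 = np.searchsorted(lons, stn_lon) - 1
wy = (stn_lat - lats[i0]) / (lats[i0 + 1] - lats[i0])
wx = (stn_lon - lons[j0]) / (lons[j0 + 1] - lons[j0])
t_bilin = ((1 - wy) * (1 - wx) * temp[i0, j0]
           + (1 - wy) * wx * temp[i0, j0 + 1]
           + wy * (1 - wx) * temp[i0 + 1, j0]
           + wy * wx * temp[i0 + 1, j0 + 1])

# 3) A "function of surrounding points", e.g. median of the closest 25
#    (5x5 box; assumes the station is not at the grid edge)
t_median25 = np.median(temp[i - 2:i + 3, j - 2:j + 3])
```

The vertical match mentioned above would apply the same kind of linear interpolation between the model level just above and just below the observation height.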
Time
• If your forecasts and observations are not at the same time, you may need to define a time window for your observations.
[Figure: timeline marking the forecast time and the observation window containing the matched obs]
Slide from Tressa Fowler
What do we mean by “calibration” or “post-processing”?
[Figure: raw and post-processed forecast PDFs of temperature (K) compared with the obs, illustrating “bias”, “spread” or “dispersion”, and the effect of calibration]
Post-processing has corrected:
• the “on average” bias
• as well as under-representation of the 2nd moment of the empirical
forecast PDF (i.e. corrected its “dispersion” or “spread”)
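As a toy numerical illustration of these two corrections (not the quantile-regression procedure described later in this section), the sketch below removes an assumed mean bias and inflates the spread of a synthetic ensemble; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "raw" 15-member temperature ensemble: ~1 K warm bias, under-dispersive.
truth = 288.0
raw_ens = truth + 1.0 + 0.5 * rng.standard_normal(15)

# Suppose a training period estimated a ~1 K mean bias and that the spread
# should be inflated by a factor of ~2 (assumed numbers, for illustration only).
mean_bias = 1.0
inflation = 2.0

# Correct the first moment (bias) and the second moment (spread).
calibrated = (raw_ens - raw_ens.mean()) * inflation + (raw_ens.mean() - mean_bias)

print("raw    mean %.2f  spread %.2f" % (raw_ens.mean(), raw_ens.std()))
print("calib. mean %.2f  spread %.2f" % (calibrated.mean(), calibrated.std()))
```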
Specific Benefits of Post-Processing
Improvements in:
• statistical accuracy, bias, and reliability
  – correcting basic forecast statistics (increasing user “trust”)
• discrimination and sharpness
  – increasing “information content”; in many cases, gains equivalent to years of NWP model development!
• Relatively inexpensive!
(cont) Benefits of Post-Processing
Essential for tailoring to local application:
NWP provides spatially- and temporally-averaged
gridded forecast output
=> Applying gridded forecasts to point locations requires location-specific calibration to account for spatial and temporal variability (=> increasing ensemble dispersion)
What are we doing …
Working with 30-member HPC operational ensemble forecasts at DPG:
• Developed a post-processing procedure for temperature and dew point (applicable to other weather variables)
• Introduced Quantile Regression – a powerful, under-utilized approach in atmospheric applications
• Other more-standard approaches (e.g. Logistic Regression) are employed under the Quantile Regression framework
Example of Quantile Regression (QR)
Our application: fitting temperature quantiles using QR, conditioned on:
1) reforecast ensemble
2) ensemble mean
3) ensemble median
4) ensemble standard deviation
5) persistence
6) logistic regression quantile
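Quantile regression of this kind can be tried directly with statsmodels. The sketch below fits a single temperature quantile on synthetic data, with predictor names echoing the list above; the data, predictor set, and quantile level are illustrative assumptions, not the operational ATEC setup.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500

# Synthetic training set: ensemble-derived predictors and the verifying obs.
ens_mean  = 285.0 + 5.0 * rng.standard_normal(n)
ens_stdev = np.abs(1.0 + 0.5 * rng.standard_normal(n))
persist   = ens_mean + rng.standard_normal(n)
obs       = ens_mean + ens_stdev * rng.standard_normal(n)   # truth with flow-dependent spread

train = pd.DataFrame(dict(obs=obs, ens_mean=ens_mean,
                          ens_stdev=ens_stdev, persist=persist))

# Fit the 0.9 quantile of observed temperature given the regressors.
model = smf.quantreg("obs ~ ens_mean + ens_stdev + persist", train)
fit90 = model.fit(q=0.9)
print(fit90.params)

# Predict the 90th-percentile temperature for a new forecast case.
new_case = pd.DataFrame(dict(ens_mean=[286.0], ens_stdev=[1.4], persist=[287.0]))
print(fit90.predict(new_case))
```

Repeating the fit over a set of quantile levels (e.g. 0.05–0.95) builds up a full calibrated forecast CDF.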
Calibration Procedure
[Figure: forecast PDF of temperature with the obs marked, and a time series of forecasts versus observations]
1) Fit Logistic Regression (LR) ensembles
   – Calibrate the CDF over a prescribed set of climatological quantiles
   – For each forecast: resample a 15-member ensemble set
Then, for each quantile:
2) Perform a “climatological” fit to the data
3) Starting with the full regressor set, iteratively select the best subset using “step-wise cross-validation”
   – Fitting done using QR
   – Selection done by:
     a) minimizing the QR cost function
     b) satisfying the binomial distribution
(2nd pass: segregate forecasts into differing ranges of ensemble dispersion, and refit the models)
Regressors for each quantile: 1) reforecast ensemble 2) ensemble mean 3) ensemble median 4) ensemble standard deviation 5) persistence 6) logistic regression quantile
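A sketch of the step-wise selection idea in step 3, using the pinball (quantile) cost function that QR minimizes and simple K-fold cross-validation. The helper names and data layout are assumptions (a DataFrame with an `obs` column and one column per candidate regressor, like `train` in the previous sketch); the binomial-consistency check of step 3b and the dispersion-segregated second pass are omitted for brevity.

```python
import numpy as np
import statsmodels.formula.api as smf

def pinball_loss(y, q_pred, tau):
    """Quantile-regression cost function at quantile level tau."""
    err = y - q_pred
    return np.mean(np.maximum(tau * err, (tau - 1.0) * err))

def cv_loss(data, regressors, tau, k=5):
    """Mean cross-validated pinball loss for one candidate regressor set."""
    folds = np.arange(len(data)) % k
    losses = []
    for f in range(k):
        train, test = data[folds != f], data[folds == f]
        fit = smf.quantreg("obs ~ " + " + ".join(regressors), train).fit(q=tau)
        pred = np.asarray(fit.predict(test))
        losses.append(pinball_loss(test["obs"].values, pred, tau))
    return float(np.mean(losses))

def stepwise_select(data, candidates, tau):
    """Greedy forward selection: keep adding the regressor that most reduces CV loss."""
    selected, best = [], np.inf
    improved = True
    while improved and candidates:
        improved = False
        for c in list(candidates):
            loss = cv_loss(data, selected + [c], tau)
            if loss < best:
                best, best_c, improved = loss, c, True
        if improved:
            selected.append(best_c)
            candidates.remove(best_c)      # note: mutates the caller's list
    return selected, best
```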
6hr Temperature Time-series
[Figure: temperature time series before calibration and after calibration]
36hr Temperature Time-series
[Figure: temperature time series before calibration and after calibration]
Raw versus Calibrated PDFs
[Figure: forecast PDFs with the observed value marked]
• Blue is the “raw” ensemble
• Black is the calibrated ensemble
• Red is the observed value
Notice the significant change in both the “bias” and the dispersion of the final PDF (also notice the PDF asymmetries).
Verifying ensemble (probabilistic)
forecasts
Overview:
1) Rank histogram
2) Mean square error (MSE)
3) Brier score
4) Rank Probability Score (RPS)
5) Reliability diagram
6) Relative Operating Characteristic (ROC) curve
7) Skill score
Troubled Rank Histograms
[Figure: two example rank histograms – counts (0–30) versus ensemble rank (1–10)]
Slide from Matt Pocernic
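For reference, a rank histogram can be computed with a few lines of NumPy. The sketch below uses synthetic data and the common convention of ranks 1 through n_members + 1; the numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cases, n_members = 1000, 10

# Synthetic verification set: a 10-member ensemble and the verifying obs.
# The obs have larger variance than the members, so the ensemble is
# under-dispersive and the histogram comes out U-shaped.
ens = rng.standard_normal((n_cases, n_members))
obs = 1.3 * rng.standard_normal(n_cases)

# Rank of the observation within each sorted ensemble (1 .. n_members + 1).
rank = 1 + np.sum(ens < obs[:, None], axis=1)

counts = np.bincount(rank, minlength=n_members + 2)[1:]
for r, c in enumerate(counts, start=1):
    print(f"rank {r:2d}: {'#' * (c // 10)}")
```

A flat histogram indicates a statistically consistent (well-calibrated) ensemble; a U shape indicates under-dispersion, a dome shape over-dispersion, and a one-sided slope a bias.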
6hr Temperature Rank Histograms
36hr Temperature Rank Histograms
Types of Forecasts / Types of Observations
Both forecasts and observations can be:
• Continuous
  – Wind speed
  – Temperature
• Categorical (includes binary)
  – Rain / No Rain
  – Hurricane Category 1–5
• Ensembles => probabilistic (forecasts only)
Data type => Analyses
• Continuous forecasts, continuous observations => Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Bias
• Continuous and/or probabilistic forecasts, categorical observations => Receiver Operating Characteristic (ROC) curve
• Categorical forecasts, categorical observations => Contingency table statistics and skill scores
• Probabilistic forecasts, categorical observations => Brier score, reliability diagram
• Probabilistic forecasts, multi-categorical or continuous observations => Rank probability score (RPS)
Slide from Tressa Fowler
Continuous scores: MSE

MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2

Attribute: measures accuracy.
Average of the squared errors: it measures the magnitude of the error, but does not indicate its direction.
Quadratic rule, therefore large weight on large errors:
• good if you wish to penalize large errors
• sensitive to large values (e.g. precipitation) and outliers; sensitive to large variance (high-resolution models); encourages conservative forecasts (e.g. climatology)
=> For an ensemble forecast, use the ensemble mean.
Slide from Barbara Casati
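A few lines suffice to compute the MSE (and RMSE) of the ensemble-mean forecast; the arrays below are synthetic stand-ins for matched forecasts and observations.

```python
import numpy as np

rng = np.random.default_rng(3)
obs = rng.standard_normal(500)                        # verifying observations
ens = obs[:, None] + rng.standard_normal((500, 10))   # 10-member ensemble around truth

ens_mean = ens.mean(axis=1)
mse = np.mean((ens_mean - obs) ** 2)                  # MSE of the ensemble-mean forecast
rmse = np.sqrt(mse)
print(mse, rmse)
```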
Rank Probability Score
for multi-categorical or continuous variables

RPS = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathrm{CDF}_{fc,i} - \mathrm{CDF}_{obs,i}\right)^2
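A sketch of the RPS for a single forecast case with three categories; the category probabilities are made-up numbers and the 1/(n−1) normalization follows the formula above.

```python
import numpy as np

# Hypothetical 3-category forecast (e.g. no rain / light / heavy) and the outcome.
p_fc  = np.array([0.2, 0.5, 0.3])   # forecast probabilities per category
p_obs = np.array([0.0, 1.0, 0.0])   # observed category (one-hot)

cdf_fc, cdf_obs = np.cumsum(p_fc), np.cumsum(p_obs)
n = len(p_fc)
rps = np.sum((cdf_fc - cdf_obs) ** 2) / (n - 1)
print(rps)   # 0 for a perfect categorical forecast, larger for worse ones
```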
Brier Score
Does the forecast correctly detect temperatures above 18 degrees?

BS = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - o_i\right)^2

y = forecast probability of the event
o = observed occurrence (0 or 1)
i = sample index, of n total samples
=> Note the similarity to MSE
Slide from Barbara Casati
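A minimal sketch of the Brier score for the “temperature above 18 degrees” event, with the forecast probability taken as the ensemble relative frequency; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
n_cases, n_members = 500, 15

ens = 17.0 + 3.0 * rng.standard_normal((n_cases, n_members))  # synthetic T forecasts (deg C)
obs = 17.0 + 3.0 * rng.standard_normal(n_cases)               # synthetic verifying T

threshold = 18.0
y = (ens > threshold).mean(axis=1)      # forecast probability from ensemble frequency
o = (obs > threshold).astype(float)     # observed occurrence (0 or 1)

bs = np.mean((y - o) ** 2)
print(bs)    # 0 is perfect; the climatological reference gives p_clim * (1 - p_clim)
```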
Conditional Distributions
Conditional histogram and
conditional box-plot
Slide from Barbara Casati
Reliability (or Attribute) Diagram
Slide from Matt Pocernic
Scatter-plot and Contingency Table
Does the forecast correctly detect temperatures above 18 degrees?
Does the forecast correctly detect temperatures below 10 degrees?
Slide from Barbara Casati
Discrimination Plot
[Figure: distributions of forecasts conditioned on Outcome = No and Outcome = Yes, with a decision threshold separating hits from false alarms]
Slide from Matt Pocernic
Receiver Operating Characteristic (ROC) Curve
Slide from Matt Pocernic
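The ROC curve is built from the same hits/false-alarms idea as the discrimination plot: sweep the decision threshold and collect hit rate versus false alarm rate. The sketch below does this on synthetic probability forecasts (NumPy only, no plotting) and adds a rough trapezoidal area under the curve.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
o = (rng.random(n) < 0.3).astype(int)   # observed event (1) or non-event (0)
# A somewhat skillful synthetic probability forecast of that event.
p = np.clip(0.3 + 0.4 * (o - 0.3) + 0.2 * rng.standard_normal(n), 0.0, 1.0)

hit_rates, false_alarm_rates = [], []
for thresh in np.linspace(0.0, 1.0, 21):            # sweep the decision threshold
    yes = p >= thresh
    hits = np.sum(yes & (o == 1))
    misses = np.sum(~yes & (o == 1))
    false_alarms = np.sum(yes & (o == 0))
    corr_neg = np.sum(~yes & (o == 0))
    hit_rates.append(hits / (hits + misses))
    false_alarm_rates.append(false_alarms / (false_alarms + corr_neg))

# Approximate area under the ROC curve (0.5 = no skill, 1.0 = perfect).
order = np.argsort(false_alarm_rates)
fa = np.array(false_alarm_rates)[order]
hr = np.array(hit_rates)[order]
auc = np.sum(0.5 * (hr[1:] + hr[:-1]) * np.diff(fa))
print(auc)
```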
Skill Scores
SS = \frac{A_{forc} - A_{ref}}{A_{perf} - A_{ref}}

• Single value to summarize performance
• Reference forecast: the best naive guess, e.g. persistence or climatology
• A perfect forecast implies that the object can be perfectly observed
• Positively oriented – positive is good
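A quick worked example with made-up numbers: taking RMSE as the accuracy measure $A$, a perfect forecast has $A_{perf} = 0$; if the calibrated forecast has an RMSE of 2 K and the persistence reference an RMSE of 4 K, then

$$SS = \frac{2\,\mathrm{K} - 4\,\mathrm{K}}{0\,\mathrm{K} - 4\,\mathrm{K}} = 0.5,$$

i.e. the forecast removes half of the reference forecast's error.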
36hr Temperature Time-series
[Figure: CRPS skill score and RMSE skill score]
Reference forecasts: black – raw ensemble, blue – persistence
RMSE of Models
[Figure: RMSE at 6hr and 36hr lead-times]
Significant Calibration Regressors
[Figure: significant calibration regressors at 6hr and 36hr lead-times]
References:
Jolliffe and Stephenson (2003): Forecast Verification: A Practitioner’s Guide. Wiley & Sons, 240 pp.
Wilks (2005): Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.
Stanski, Burrows, and Wilson (1989): Survey of Common Verification Methods in Meteorology.
http://www.eumetcal.org.uk/eumetcal/verification/www/english/courses/msgcrs/index.htm
http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html