Variational Quality Control
Erik Andersson
Room: 302 Extension: 2627
[email protected]
DA/SAT Training Course, March 2006
Plan of Lecture
• Introduction
• VarQC formalism
• Rejection limits and tuning
• Minimisation aspects
• Examples
• Summary
Introduction (1)
Assuming Gaussian statistics, the maximum likelihood solution to the linear estimation problem results in observation analysis weights (w) that are independent of the observed value.
$$ x_a = x_b + w\,(y - H x_b), \qquad w = \frac{\sigma_b^2}{\sigma_o^2 + \sigma_b^2} $$
Outliers will be given the same weight as good data, potentially corrupting the analysis.
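The point can be illustrated with a scalar sketch in which H is the identity. The function names and the numerical values below are illustrative assumptions, not taken from the lecture.

```python
def gaussian_weight(sigma_b: float, sigma_o: float) -> float:
    """Analysis weight w = sigma_b^2 / (sigma_o^2 + sigma_b^2)."""
    return sigma_b ** 2 / (sigma_o ** 2 + sigma_b ** 2)


def analysis(xb: float, y: float, sigma_b: float, sigma_o: float) -> float:
    """Scalar analysis x_a = x_b + w (y - x_b), assuming H is the identity."""
    w = gaussian_weight(sigma_b, sigma_o)
    return xb + w * (y - xb)


xb = 280.0                                       # hypothetical background (K)
good = analysis(xb, 281.0, sigma_b=1.0, sigma_o=1.0)
outlier = analysis(xb, 300.0, sigma_b=1.0, sigma_o=1.0)

# The weight never looks at y, so the outlier pulls the analysis
# with exactly the same weight as the plausible observation.
print(gaussian_weight(1.0, 1.0), good, outlier)
```

With equal background and observation errors the weight is one half in both cases, so the analysis moves halfway towards the observation however implausible it is.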
Introduction (2)
Real histogram distributions of departures (y-Hxb) can show significant deviations from the pure Gaussian form.
They often reveal a more frequent occurrence of large departures than expected from the corresponding Gaussian (normal) distribution with the same mean and standard deviation, showing as wide “tails”.
[Figure: histogram of Airep temperature observations. Curves: actual distribution and Gaussian. Annotations: “Tail”, “QC-rejection?”. Horizontal axis: observed departure from the background (K).]
The Normal Jo (1)
The normal observation cost function Jo(x) has a quadratic form, which is consistent with the assumption that the errors are Gaussian in nature.
$$ J_o(x) = \frac{1}{2}\,(y - Hx)^T R^{-1} (y - Hx) $$
y: array of observations of length N
x: represents the model/analysis variables
H: observation operators
R: observation error covariance matrix
σo: observation error standard deviation
If observation errors are uncorrelated then this simplifies to:
$$ J_o(x) = \frac{1}{2} \sum_{i=1}^{N} \left( \frac{y - Hx}{\sigma_o} \right)_i^2 $$
The term in parentheses is the normalized departure.
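A minimal sketch of the uncorrelated form, with the model equivalents Hx passed in as a precomputed list. The function names and numbers are illustrative assumptions.

```python
def normalized_departures(y, hx, sigma_o):
    """Return (y - Hx) / sigma_o for each observation."""
    return [(yi - hxi) / si for yi, hxi, si in zip(y, hx, sigma_o)]


def jo_uncorrelated(y, hx, sigma_o):
    """Quadratic observation cost for uncorrelated observation errors."""
    return sum(0.5 * z * z for z in normalized_departures(y, hx, sigma_o))


y = [1.0, 2.0, 4.0]          # hypothetical observations
hx = [0.0, 2.0, 2.0]         # hypothetical model equivalents Hx
sigma_o = [1.0, 1.0, 2.0]    # hypothetical observation error std devs

print(normalized_departures(y, hx, sigma_o))
print(jo_uncorrelated(y, hx, sigma_o))
```

Because the cost is quadratic, doubling a departure quadruples its contribution, which is why a single gross error can dominate Jo.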
The Normal Jo (2)
The general expression for the observation cost function is based on the probability density function (the pdf) of the observation error distribution (see Lorenc 1986):
$$ J_o = -\ln p + \text{const} $$
p is the probability density function of observation error.
const is an arbitrary constant, chosen such that Jo=0 when y=Hx.
The Normal Jo (3)
When for p we insert the normal (Gaussian) distribution (N):
$$ N = \frac{1}{\sigma_o \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{y - Hx}{\sigma_o} \right)^2 \right] $$
we obtain the usual expression
$$ J_o^N = -\ln N + \text{const} = \frac{1}{2} \left( \frac{y - Hx}{\sigma_o} \right)^2 $$
In VarQC a non-Gaussian pdf will be used, resulting in a non-quadratic expression for Jo.
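The step from the Gaussian pdf to the quadratic cost can be written out explicitly. The only choice made is the value of the constant, fixed by requiring Jo=0 when y=Hx:

```latex
\begin{aligned}
-\ln N &= \ln\!\left(\sigma_o\sqrt{2\pi}\right)
          + \frac{1}{2}\left(\frac{y - Hx}{\sigma_o}\right)^2 \\
\text{const} &= -\ln\!\left(\sigma_o\sqrt{2\pi}\right) \\
J_o^N &= -\ln N + \text{const}
       = \frac{1}{2}\left(\frac{y - Hx}{\sigma_o}\right)^2
\end{aligned}
```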
VarQC formulation (1)
In an attempt to better describe the tails of the observed distributions, Ingleby and Lorenc (1993) suggested a modified pdf, written as a sum of two distinct distributions:
$$ p^{QC} = (1 - A)\,N + A\,p^G $$
N: normal distribution (pdf), as appropriate for ‘good’ data
pG: pdf for data affected by gross errors
A: the prior probability of gross error
VarQC formulation (2)
Thus, a pdf for the data affected by gross errors (pG) needs to be specified. Several different forms could be considered.
In the ECMWF implementation (Andersson and Järvinen 1999, QJRMS) a flat distribution was chosen.
$$ p^G = \frac{1}{D} = \frac{1}{2 d\,\sigma_o} $$
D is the width of the distribution. D is here written as a multiple d of the observation error, either side of zero.
The reasons for this choice and its consequences will become clear in the following.
VarQC formulation (3)
Inserting pQC for p in the expression Jo=-ln p + const, we obtain:
$$ J_o^{QC} = -\ln\left[ \frac{\gamma + \exp(-J_o^N)}{\gamma + 1} \right] $$
$$ \nabla J_o^{QC} = \nabla J_o^N \left[ 1 - \frac{\gamma}{\gamma + \exp(-J_o^N)} \right] $$
with γ defined as:
$$ \gamma = \frac{A \sqrt{2\pi}}{(1 - A)\,2d} $$
We can see how the presence of γ modifies the normal cost function and its gradient.
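The VarQC cost and its gradient can be sketched for a single observation as functions of the normalized departure z = (y - Hx)/σo. The function names and the values A = 0.01, d = 5 are assumptions chosen for illustration, not operational settings.

```python
import math


def gamma(A: float, d: float) -> float:
    """gamma = A sqrt(2 pi) / ((1 - A) 2 d)."""
    return A * math.sqrt(2.0 * math.pi) / ((1.0 - A) * 2.0 * d)


def jo_normal(z: float) -> float:
    """Gaussian (quadratic) cost of one normalized departure."""
    return 0.5 * z * z


def jo_qc(z: float, g: float) -> float:
    """VarQC cost: -ln[(gamma + exp(-JoN)) / (gamma + 1)]."""
    return -math.log((g + math.exp(-jo_normal(z))) / (g + 1.0))


def grad_jo_qc(z: float, g: float) -> float:
    """Derivative of the VarQC cost with respect to z.

    The Gaussian derivative (z) is scaled by 1 - gamma / (gamma + exp(-JoN)).
    """
    e = math.exp(-jo_normal(z))
    return z * (1.0 - g / (g + e))


g = gamma(A=0.01, d=5.0)     # hypothetical parameter values
for z in (0.0, 1.0, 3.0, 5.0, 10.0):
    print(z, jo_normal(z), jo_qc(z, g), grad_jo_qc(z, g))
```

For small departures the two costs nearly coincide. For large departures the VarQC cost levels off at ln((γ+1)/γ) and its gradient tends to zero, so the observation stops influencing the analysis.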
Probability of gross error
The term modifying the gradient (on the previous slide) can be shown to be equal to the a-posteriori probability of gross error P, given x and assuming that Hx is correct (see Ingleby and Lorenc 1993):
$$ P = \frac{\gamma}{\gamma + \exp(-J_o^N)} $$
Furthermore, we can define a VarQC weight W:
$$ W = 1 - P $$
It is the factor by which the gradient’s magnitude is reduced:
$$ \nabla J_o^{QC} = W\,\nabla J_o^N $$
• Data which are found likely to be incorrect (P ≈ 1) are given reduced weight in the analysis.
• Data which are found likely to be correct (P ≈ 0) are given the weight they would have had using a purely Gaussian observation error pdf.
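A short sketch of P and W as functions of the normalized departure z = (y - Hx)/σo. The value of γ used here is a hypothetical one, picked only to make the transition visible.

```python
import math


def prob_gross_error(z: float, g: float) -> float:
    """A-posteriori probability of gross error, P = gamma / (gamma + exp(-z^2/2))."""
    return g / (g + math.exp(-0.5 * z * z))


def varqc_weight(z: float, g: float) -> float:
    """VarQC weight W = 1 - P, the factor applied to the Gaussian gradient."""
    return 1.0 - prob_gross_error(z, g)


g = 0.0025                   # hypothetical value of gamma
for z in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0):
    print(z, prob_gross_error(z, g), varqc_weight(z, g))
```

P rises smoothly from γ/(γ+1) at zero departure towards 1 for large departures, so the weight is reduced gradually rather than switched off at a fixed threshold.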
Illustrations
[Figure: two columns of panels, flat pQC (left) and wide Gaussian pQC (right). Panels in each column: Gradient, QC Weight.]
Application
In the case of many observations, all with uncorrelated errors, JoQC is computed as a sum (over the observations i) of independent cost function contributions:
$$ J_o^{QC} = -\ln \prod_i p_i^{QC} + \text{const} = -\sum_i \ln p_i^{QC} + \text{const} = \sum_i J_{o,i}^{QC} $$
The global set of observational data includes a variety of observed quantities, as used by the variational scheme through their respective observation operators. All are quality controlled together, as part of the main 4D-Var estimation.
The application of VarQC is always in terms of the observed quantity.
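The equivalence between the product of pdfs and the sum of per-observation costs can be checked numerically. This sketch builds pQC directly from its two components and, as an assumption consistent with the γ-form of the cost, applies the flat term at every departure. All names and values are illustrative.

```python
import math


def p_qc(dep: float, sigma_o: float, A: float, d: float) -> float:
    """Mixture pdf pQC = (1 - A) N + A pG with a flat pG = 1 / (2 d sigma_o)."""
    normal = math.exp(-0.5 * (dep / sigma_o) ** 2) / (sigma_o * math.sqrt(2.0 * math.pi))
    flat = 1.0 / (2.0 * d * sigma_o)   # assumed to apply at all departures
    return (1.0 - A) * normal + A * flat


def jo_qc_single(dep: float, sigma_o: float, A: float, d: float) -> float:
    """Cost of one observation, with the constant set so that it is 0 when dep = 0."""
    return -math.log(p_qc(dep, sigma_o, A, d) / p_qc(0.0, sigma_o, A, d))


def jo_qc_total(departures, sigmas, A: float, d: float) -> float:
    """Total VarQC cost as a sum of independent contributions."""
    return sum(jo_qc_single(dep, s, A, d) for dep, s in zip(departures, sigmas))


departures = [0.5, -1.0, 8.0]   # hypothetical departures y - Hx
sigmas = [1.0, 2.0, 1.5]        # hypothetical observation error std devs
print(jo_qc_total(departures, sigmas, A=0.01, d=5.0))
```

Each observation enters with its own σo, so different observed quantities can share one cost function while being judged in their own units.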
Minimisation
VarQC requires a good ‘preliminary analysis’. Otherwise incorrect QC-decisions will occur.
In operations VarQC is therefore switched on after 40 iterations. (Now 40)
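A toy scalar problem shows why the preliminary analysis matters. This is a sketch under strong assumptions: plain gradient descent stands in for the operational minimisation algorithm, H is the identity, and the background, observations, γ and iteration counts are all hypothetical.

```python
import math

GAMMA = 0.0025  # hypothetical value of gamma


def varqc_weight(z: float, g: float = GAMMA) -> float:
    """W = 1 - P = exp(-z^2/2) / (gamma + exp(-z^2/2))."""
    e = math.exp(-0.5 * z * z)
    return e / (g + e)


def minimise(xb, sigma_b, obs, sigma_o, n_iter, qc_start):
    """Gradient descent on Jb + Jo; VarQC weights are applied from iteration qc_start."""
    x = xb
    # Fixed step: inverse curvature of the purely quadratic problem.
    step = 1.0 / (1.0 / sigma_b ** 2 + len(obs) / sigma_o ** 2)
    for it in range(n_iter):
        grad = (x - xb) / sigma_b ** 2
        for y in obs:
            z = (y - x) / sigma_o
            w = varqc_weight(z) if it >= qc_start else 1.0
            grad -= w * z / sigma_o
        x -= step * grad
    return x


obs = [9.5, 10.0, 10.5]      # three mutually consistent observations
cold = minimise(0.0, 5.0, obs, 1.0, n_iter=50, qc_start=0)
warm = minimise(0.0, 5.0, obs, 1.0, n_iter=50, qc_start=10)
print(cold, warm)
```

With VarQC active from the first iteration, the poor background makes all three observations look like gross errors and the analysis stays at the background. With a few quadratic iterations first, the analysis moves close to the observations and they are then accepted.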
Multiple minima
The probability of gross error depends on the size of the departure. When the probability for rejection is close to 50/50 this will be reflected in the cost function as two distinct minima of roughly equal magnitude.
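The two minima can be reproduced in a one-variable sketch with a single observation and a background term. The setup (H as the identity, unit error variances, the departure of 4.9 and the value of γ) is an assumption chosen so that the two minima have similar cost.

```python
import math

GAMMA = 0.0025  # hypothetical value of gamma


def jo_qc(z: float, g: float = GAMMA) -> float:
    """VarQC observation cost as a function of the normalized departure z."""
    return -math.log((g + math.exp(-0.5 * z * z)) / (g + 1.0))


def total_cost(x, xb=0.0, sigma_b=1.0, y=4.9, sigma_o=1.0):
    """Background term plus the VarQC cost of a single observation y."""
    jb = 0.5 * ((x - xb) / sigma_b) ** 2
    return jb + jo_qc((y - x) / sigma_o)


def local_minima(lo=-1.0, hi=6.0, n=701):
    """Scan a regular grid and return (x, J) at strict local minima."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    js = [total_cost(x) for x in xs]
    return [(xs[i], js[i]) for i in range(1, n - 1)
            if js[i] < js[i - 1] and js[i] < js[i + 1]]


minima = local_minima()
for x, j in minima:
    print(f"x = {x:.2f}, J = {j:.3f}")
```

One minimum sits next to the background (observation effectively rejected) and the other roughly halfway to the observation (observation accepted). Which one a descent algorithm reaches depends on where it starts.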
Rejection limits
VarQC does not require the specification of threshold values at which rejections occur, so-called rejection limits. Rejections occur gradually.
If, for example, we classify as rejected those data that have P>0.75, then we can obtain an analytical relationship for the effective rejection limit, expressed as a normalized departure, as a function of the two VarQC input parameters A and d (or γ) only:
$$ \left| \frac{y - Hx}{\sigma_o} \right|_i = \sqrt{2 \ln(3/\gamma)} $$
The first implementation of VarQC was thereby tuned to roughly reproduce the rejections of the old scheme (OIQC).
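The effective rejection limit follows from setting P equal to the chosen threshold and solving for the normalized departure. The sketch below generalises the 0.75 case to any threshold; the function names and the values A = 0.01, d = 5 are illustrative assumptions.

```python
import math


def gamma(A: float, d: float) -> float:
    """gamma = A sqrt(2 pi) / ((1 - A) 2 d)."""
    return A * math.sqrt(2.0 * math.pi) / ((1.0 - A) * 2.0 * d)


def prob_gross_error(z: float, g: float) -> float:
    """P = gamma / (gamma + exp(-z^2/2))."""
    return g / (g + math.exp(-0.5 * z * z))


def rejection_limit(g: float, p_reject: float = 0.75) -> float:
    """Normalized departure at which P reaches p_reject.

    Solving P = p_reject gives z^2 = 2 ln(odds / gamma) with
    odds = p_reject / (1 - p_reject), which is 3 for p_reject = 0.75.
    Assumes gamma < odds, so that the logarithm is positive.
    """
    odds = p_reject / (1.0 - p_reject)
    return math.sqrt(2.0 * math.log(odds / g))


g = gamma(A=0.01, d=5.0)     # hypothetical parameter values
limit = rejection_limit(g)
print(g, limit, prob_gross_error(limit, g))
```

Increasing A or decreasing d raises γ and therefore lowers the limit, which makes the check stricter.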
Tuning the rejection limit
The histogram on the left has been transformed (right) such that the Gaussian part appears as a pair of straight lines forming a ‘V’ at zero. The slope of the lines gives the standard deviation of the Gaussian.
The rejection limit can be chosen to be where the actual distribution is some distance away from the ‘V’; around 6 to 7 K would be appropriate in this case.
Tuning example
[Figure: two panels, “BgQC too tough” and “BgQC and VarQC correctly tuned”. The shading reflects the value of P, the probability of gross error.]
Example (1)
For given values of the VarQC input parameters (A and d), the QC result (i.e. P) is a function of the normalized departure (y-Hx)/σo only.
Example (2)
VarQC checks all data and all data types simultaneously. In this Australian example the presence of aircraft data has led to the rejection of a PILOT wind.
Example (3) - a difficult one
Observations of intense and small-scale features may be rejected although the measurements are correct.
The problem occurs when the resolution of the analysis system (as determined by the B-matrix) is insufficient.
Summary
• VarQC provides a satisfactory and very efficient quality control mechanism - consistent with 3D/4D-Var.
• The implementation can be very straightforward.
• VarQC does not replace the pre-analysis checks - the checks against the background for example.
• All observational data from all data types are quality controlled simultaneously, as part of the general 3D/4D-Var minimisation.
• The setting of VarQC parameters needs regular revision.
A good description of background errors is essential for effective, flow-dependent QC.