Workshop Uncertainty Analysis in Geophysical Inverse Problems
Lorentz Center, Leiden, The Netherlands
A Few Concluding Comments …
Olivier Talagrand
Laboratoire de Météorologie Dynamique, École Normale Supérieure, Paris, France
With Acknowledgments to M. Jardak, A. Fichtner, J. Trampert, P. J. van Leeuwen
and all participants in the Workshop
11 November 2011
Vocabulary (e.g. ‘model’) !
- Summary and comments
- Ensemble estimation (resampling methods, Monte-Carlo or otherwise)
- A few problems to be solved
Bayesian vs. frequentist point of view (L. Tenorio, M.
Holschneider, J. Trampert) ?
Bayesian estimation
The problem, if there is any, is with the prior.
But is there always a prior ?
The first time geophysicists attempted to determine a ‘model’ of the Earth’s interior from observations of the propagation of seismic waves, did they have a prior ?
Early 2005. Huygens probe descended into the atmosphere of Titan (satellite of Saturn).
No prior was used.
And, if there is a prior, where does it come from, if not from earlier data ? L. Tenorio,
who works on cosmological problems, mentioned he had a prior for the Universe,
based on quantum fluctuations.
Bayesian estimation (continued)
Prior can be implicit
State vector x, belonging to state space S (dim S = n), to be estimated.
Data vector z, belonging to data space D (dim D = m), available.

z = F(x, ε)        (1)

where ε is a random element representing the uncertainty on the data (or, more precisely, on the link between the data and the unknown state vector).

For example

z = Γx + ε
Bayesian estimation (continued)
Probability that x = ξ for given z ?

x = ξ  ⇒  z = F(ξ, ε)

P(x = ξ | z) = P[ε ; z = F(ξ, ε)] / Σξ’ P[ε ; z = F(ξ’, ε)]

Unambiguously defined iff, for any ε, there is at most one x such that (1) is verified ⇔ the data contain information, either directly or indirectly, on any component of x.

Determinacy condition.
No need for explicit knowledge of a prior probability distribution for x, but need for explicit knowledge of the probability distribution for ε.
Presentation by H. Igel (starting point m0 not associated with probability
distribution)
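The implicit-prior formula above can be made concrete with a toy discrete example (entirely made up for illustration; the states, error values and probabilities are not from the workshop): the posterior over candidate states ξ is built only from the probability distribution of the data error ε, with no explicit prior on x.

```python
import numpy as np

# Data model: z = F(x, eps) with F(x, eps) = x + eps,
# eps taking values in {-1, 0, +1} with probabilities p_eps.
states = np.array([0, 1, 2])            # candidate values xi for x
eps_vals = np.array([-1, 0, 1])         # possible data errors
p_eps = np.array([0.25, 0.5, 0.25])     # P[eps]

z = 1                                   # observed data

# P[eps such that z = F(xi, eps)] for each candidate xi
likelihood = np.array([
    p_eps[eps_vals == (z - xi)].sum() for xi in states
])
posterior = likelihood / likelihood.sum()
print(posterior)
```

The determinacy condition holds here because, for a given ε, at most one state ξ satisfies z = F(ξ, ε).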
Example

z = Γx + ε        ε ~ N [μ, S]

Then the conditional probability distribution is

P(x | z) = N [xa, Pa]

where

xa = (ΓT S-1 Γ)-1 ΓT S-1 [z - μ]
Pa = (ΓT S-1 Γ)-1

Determinacy condition : rank Γ = n. Requires m ≥ n.
Expressions

xa = (ΓT S-1 Γ)-1 ΓT S-1 [z - μ]        (2a)
Pa = (ΓT S-1 Γ)-1        (2b)

can also be obtained, from the knowledge of only μ and S, as defining the variance-minimizing linear estimator of x from z, or Best Linear Unbiased Estimate (BLUE) of x from z (but the significance of Pa is now totally different).

Expressions (2) are commonly used, and are useful, in many (especially, but not only, geophysical) applications where the error ε is distinctly non-gaussian and the operator Γ is mildly nonlinear. In fact, going beyond (2) turns out to be very challenging in many practical problems.
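As a minimal numerical sketch of expressions (2a)-(2b) in the linear-gaussian case (the names Γ, S, μ follow the slides; the dimensions and numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 5
Gamma = rng.standard_normal((m, n))     # linear data operator, rank n (m >= n)
S = np.diag(rng.uniform(0.5, 2.0, m))   # data-error covariance
mu = np.zeros(m)                        # data-error mean

x_true = rng.standard_normal(n)
eps = rng.multivariate_normal(mu, S)
z = Gamma @ x_true + eps                # data vector

Si = np.linalg.inv(S)
Pa = np.linalg.inv(Gamma.T @ Si @ Gamma)    # (2b)
xa = Pa @ Gamma.T @ Si @ (z - mu)           # (2a)
```

The same two lines compute the BLUE of x from z when only μ and S (not the full distribution of ε) are known.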
This Workshop
In addition to mathematical and theoretical papers, large variety of different
physical applications and methods.
Beyond differences in vocabulary and notation, the methods used are fundamentally the same : linear and gaussian inversion (ray-tracing inversion, …), heuristically extended to (mildly) nonlinear and non-gaussian situations, and sampling, or ‘ensemble’, methods, intended to explicitly sample the uncertainty.
Seismic Tomography
Major development. Full Waveform Inversion. Is physics simpler for seismologists
than for fluid dynamicists ?
Major problem of underdeterminacy. Null space matters, hence Backus-Gilbert
theory (D. Al-Attar)
Sampling algorithms: Gibbs, Metropolis-Hastings, Neighbourhood Algorithm (M.
Sambridge).
Neural network (NN, J. Trampert). Very efficient for numerical inversion of
nonlinear functions. But link with bayesian approach and uncertainty
quantification ?
Assimilation
The word assimilation refers to the situation when the observed system evolves in
time and the dynamical laws which govern the evolution of the system are
among the information to be used.
This does not change anything in the conceptual nature of the problem, but it means that in the description

z = F(x, ε)

the dynamical laws are present in one way or another.
Because they have to produce daily forecasts, meteorologists started assimilation very early (A. Lorenc)
[Figure (A. Lorenc, © Crown copyright Met Office) : ratio of supercomputer costs of one day’s assimilation (total DA, incl. forecast) to one day’s forecast, 1985-2010, rising from the AC scheme through 3D-Var (on T3E) and simple 4D-Var (on SX8) to 4D-Var with outer loop.]

Computer power increased by 1M in 30 years. Only 0.04% of the Moore’s Law increase over this time went into improved DA algorithms, rather than improved resolution!
Three classes of algorithms have been presented, with both specific applications and
theoretical discussions
- Variational assimilation, which is a standard tool in meteorology, and has been extended in recent years to applications in solid earth geophysics.
- Kalman filter, which is equivalent to variational assimilation in the linear case,
and has been presented in its nonlinear extension, the Ensemble Kalman filter
(EnKF)
- Particle filters, which are totally bayesian in their principle (P. J. van
Leeuwen).
Assimilation, as it has done over the years, is continuously extending to more and
more novel physical applications : geomagnetism (A. Fournier, A. Jackson),
observations of the Earth’s rotation (J. Saynisch), atmospheric chemistry (H.
Eskes), identification of CO2 surface fluxes (V. Gaur).
Theory : R. Potthast, P. J. Van Leeuwen
Very active field, which we can expect to further develop in the coming years, both
through the development of new physical applications, and of theoretical
methods.
Purely bayesian methods, particle filters ?
Quantification of uncertainty : identification and quantification of model (I mean
‘theory’) errors, and ensemble methods.
- Ensemble estimation (resampling methods, Monte-Carlo
or otherwise)
ECMWF, Technical Report 499, 2006
Data

z = Γx + ε        ε ~ N [μ, S]

Then conditional posterior probability distribution

P(x | z) = N [xa, Pa]

with

xa = (ΓT S-1 Γ)-1 ΓT S-1 [z - μ]
Pa = (ΓT S-1 Γ)-1

Ready recipe for producing a sample of independent realizations of the posterior probability distribution (resampling method) :

Perturb the data vector additively according to the error probability distribution N [0, S], and compute the analysis xa for each perturbed data vector.
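The recipe can be sketched in a few lines (a made-up linear-gaussian toy, not the workshop’s code): perturb z with independent draws from N [0, S], recompute xa for each, and check that the spread of the resulting ensemble reproduces Pa.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, Ne = 2, 4, 5000
Gamma = rng.standard_normal((m, n))
S = np.eye(m)
z = rng.standard_normal(m)

Si = np.linalg.inv(S)
Pa = np.linalg.inv(Gamma.T @ Si @ Gamma)
K = Pa @ Gamma.T @ Si                   # gain mapping data to analysis

# One analysis per perturbed data vector
ensemble = np.array([
    K @ (z + rng.multivariate_normal(np.zeros(m), S)) for _ in range(Ne)
])
# Sample covariance of the ensemble approaches Pa as Ne grows,
# since K S K^T = Pa in the linear-gaussian case
P_sample = np.cov(ensemble.T)
```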
Available data consist of

- Background estimate at time 0

x0b = x0 + ε0b        ε0b ~ N [0, P0b]

- Observations at times k = 0, …, K

yk = Hkxk + εk        εk ~ N [0, Rk]

- Model (supposed to be exact)

xk+1 = Mkxk        k = 0, …, K-1

Errors assumed to be unbiased and uncorrelated in time, Hk and Mk linear.

Then the optimal state (mean of the bayesian gaussian pdf) at initial time 0 minimizes, over ξ0 ∈ S, the objective function

J(ξ0) = (1/2) (x0b - ξ0)T [P0b]-1 (x0b - ξ0) + (1/2) Σk [yk - Hkξk]T Rk-1 [yk - Hkξk]

subject to ξk+1 = Mkξk ,        k = 0, …, K-1
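The constrained minimization above can be sketched as an objective-function evaluation (a minimal illustration: names and shapes are assumptions, the constraint is propagated explicitly, and no minimizer is included):

```python
import numpy as np

def J(xi0, x0b, P0b_inv, ys, Ms, Hs, R_invs):
    """Strong-constraint 4D-Var cost: background term plus observation
    terms along the trajectory xi_{k+1} = M_k xi_k."""
    d0 = x0b - xi0
    cost = 0.5 * d0 @ P0b_inv @ d0          # background term
    xi = xi0
    for k, y in enumerate(ys):
        dk = y - Hs[k] @ xi                 # innovation at time k
        cost += 0.5 * dk @ R_invs[k] @ dk
        if k < len(Ms):
            xi = Ms[k] @ xi                 # subject to xi_{k+1} = M_k xi_k
    return cost
```

With ξ0 equal to the background and all observations exactly on the trajectory, J vanishes, which is a quick sanity check of the implementation.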
Work done here (with M. Jardak). Apply the ‘ensemble’ recipe described above to nonlinear and non-gaussian cases, and look at what happens.

Everything synthetic. Two one-dimensional toy models : the Lorenz ’96 model and the Kuramoto-Sivashinsky equation. Perfect-model assumption.
(Nonlinear) Lorenz’96 model
There is no (and there cannot be) a general objective test of bayesianity. We use here
as a substitute the much weaker property of reliability.
Reliability. Statistical consistency between predicted probability of occurrence
and observed frequency of occurrence (it rains 40% of the time in circumstances
when I predict 40%-probability for rain).
The observed frequency of occurrence p’(p) of an event, given that it has been predicted to occur with probability p, must be equal to p :

For any p, p’(p) = p

More generally, the frequency distribution F’(F) of reality, given that probability distribution F has been predicted for the state of the system, must be equal to F.
Reliability can be objectively assessed, provided a large enough sample of
realizations of the estimation process is available.
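Such an assessment can be sketched as follows (a synthetic sample that is reliable by construction; all numbers are made up): bin the predicted probabilities p and compare the observed frequency p’(p) in each bin with p itself.

```python
import numpy as np

rng = np.random.default_rng(2)

M = 200_000                               # realizations of the estimation process
p = rng.uniform(0.0, 1.0, M)              # predicted probabilities
occurred = rng.uniform(0.0, 1.0, M) < p   # event occurs with probability p

bins = np.linspace(0.0, 1.0, 11)          # ten probability classes
idx = np.digitize(p, bins) - 1
p_prime = np.array([occurred[idx == b].mean() for b in range(10)])
bin_centers = 0.5 * (bins[:-1] + bins[1:])
# For a reliable system, p_prime is close to bin_centers
```

Plotting p_prime against bin_centers gives the reliability diagramme shown below; the diagonal corresponds to perfect reliability.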
Reliability diagramme, NCEP, event T850 > Tc - 4C, 2-day range,
Northern Atlantic Ocean, December 1998 - February 1999
Rank histograms, T850, Northern Atlantic, winter 1998-99
Top panels: ECMWF, bottom panels: NMC (from Candille, Doctoral Dissertation, 2003)
In both linear and nonlinear cases, size of ensembles Ne = 30. Number of realizations of the process M = 3000 and 7000 for Lorenz and Kuramoto-Sivashinsky respectively.
Non-gaussianity of the error, if model is kept linear, has no
significant impact on scores.
Brier Score (Brier, 1950), relative to binary event E

B ≡ E[(p - po)2]

where p is the predicted probability of occurrence, po = 1 or 0 depending on whether E has been observed to occur or not, and E denotes the average over all realizations of the prediction system.

Decomposes into

B = E[(p - p’)2] - E[(p’ - pc)2] + pc(1 - pc)

where pc ≡ E(po) = E(p’) is the observed frequency of occurrence of E.
First term E[(p-p’)2] measures reliability.
Second term E[(p’-pc)2] measures dispersion of a posteriori calibrated probabilities p’. The
larger that dispersion, the more discriminating, or resolving, and the more useful, the prediction
system. That term measures the resolution of the system.
Third term, called uncertainty term, depends only on event E, not on performance of prediction
system.
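The decomposition can be checked numerically. The following sketch (synthetic forecasts, made-up numbers) computes B and its three terms, with p’ taken as the observed frequency within each class of predicted probability, for which the decomposition is exact:

```python
import numpy as np

rng = np.random.default_rng(3)

M = 100_000
p = rng.choice(np.linspace(0.05, 0.95, 10), M)   # forecast probabilities
occurred = (rng.uniform(0, 1, M) < p).astype(float)  # po = 1 or 0

B = np.mean((p - occurred) ** 2)                 # Brier score
pc = occurred.mean()                             # observed frequency of E

# p' = observed frequency given the forecast probability (per class)
p_prime = np.empty(M)
for v in np.unique(p):
    sel = p == v
    p_prime[sel] = occurred[sel].mean()

reliability = np.mean((p - p_prime) ** 2)
resolution = np.mean((p_prime - pc) ** 2)
uncertainty = pc * (1 - pc)
# Identity: B = reliability - resolution + uncertainty
```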
Brier Skill Score

BSS ≡ 1 - B / [pc(1 - pc)]        (positively oriented)

and components

Brel ≡ E[(p - p’)2] / [pc(1 - pc)]
Bres ≡ 1 - E[(p’ - pc)2] / [pc(1 - pc)]        (negatively oriented)
Very similar results obtained with the Kuramoto-Sivashinsky equation.
Preliminary conclusions
• In the linear case, ensemble variational assimilation
produces, in agreement with theory, reliable estimates of
the state of the system.
• Nonlinearity significantly degrades the reliability (and therefore the bayesianity) of variational assimilation ensembles. Resolution (i. e., the capability of ensembles to reliably estimate a broad range of different probabilities of occurrence) is also degraded. Similar results (not shown) have been obtained with the Particle Filter, which produces ensembles with low reliability.
Perspectives
• Perform further studies on physically more realistic
dynamical models (shallow-water equations)
• Further comparison with the performance of Particle Filters, and with the performance of the Ensemble Kalman Filter.
Bayesian estimation (continued)
Consider again the general situation

z = F(x, ε)        (1)

What if the probability distribution for ε is not known, or known insufficiently for determining the posterior bayesian probability distribution for x ?

Consider the class C of all probability distributions for x that are compatible with the data (i. e., that are the bayesian posterior probability distribution for some choice of the missing information on the distribution for ε). Then choose one distribution in C.

Frequent choice. Choose the pdf p(x) which maximizes the entropy

E ≡ - ∫ p(x) ln p(x) dx = - E[ln p(x)]

Maximum entropy (S. Losa). Least committing choice.
Bayesian estimation (continued)
In the case

z = Γx + ε

assuming that only the expectation μ and the covariance matrix S of ε are known, the entropy-maximizing probability distribution is the gaussian distribution N [xa, Pa] defined above.
Question. How to quantify uncertainty ?
Through bounds, and then interval arithmetic, which carries the bounds through the computations ? That has probably been done. But if the operator F-1, while existing, is unbounded, the bounds are infinite. Regularization can help here (but is it legitimate to use regularization only for imposing bounds for which there is no physical justification ?).
‘Most-squares’ method presented by M. Meju.
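A minimal sketch of the interval-arithmetic idea (a toy Interval class written for illustration, not a library API): each uncertain quantity is a pair of bounds, and the operations carry the bounds through the computations.

```python
class Interval:
    """Closed interval [lo, hi] with exact bound propagation for + and *."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Bounds of a product: extrema over the four corner products
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

# Uncertain inputs x in [1, 2], y in [-1, 3]; bounds propagate through x*y + x
x = Interval(1.0, 2.0)
y = Interval(-1.0, 3.0)
r = x * y + x
```

Note that interval arithmetic is conservative: here the computed bounds [-1, 8] for x·y + x are wider than the true range [0, 8] of x(y + 1), an instance of the well-known dependency problem.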
Through probability distributions ? Probability distributions are very convenient for describing
uncertainty, especially when it comes to updating uncertainty through Bayes’ rule.
Jaynes, E. T. 2007, Probability Theory: The Logic of Science, Cambridge University Press.
Following earlier work by Cox (1946), the author posits a number of axioms that any numerical function measuring ‘uncertainty’ must verify, and then finds (the maths is not trivial) that such an uncertainty function follows the rules of the calculus of probability.
Alternative approaches
- Dempster-Shafer theory of evidence (Dempster, A. P., 1967, Annals of Math. Statistics ; Shafer, G., 1976, A Mathematical Theory of Evidence, Princeton Univ. Press). Introduces distinct notions of belief and plausibility. Has been used in remote observation of Earth surface parameters.
- Theory of possibility, linked with fuzzy logic and fuzzy sets (Zadeh, L. A., 1978, Fuzzy Sets and Systems)
SIAM Activity Group on Uncertainty Quantification (http://www.siam.org/activity/uq/)
Statistical and Applied Mathematical Sciences Institute (SAMSI) 2011-12 Program on
Uncertainty Quantification (http://www.samsi.info/programs/2011-12-program-uncertaintyquantification)
- A few problems to be solved
The data weighting problem

[Figure (de Wit & Trampert) : original Earth model, and model modified without changing the misfit]

• Practically no change in the global misfit (the modification lies essentially in the nullspace).
• Big change in the local misfit at the observatory.

While being globally plausible, the local observation rules out the modified model.
Can uncertainties be based on one single scalar misfit measure?
Can we attach more significance to the observatory without becoming un-Bayesian?
What do we do with all the other observatories that may claim to be as important as that observatory ?
- How to represent, or even define, uncertainty ?
- How to objectively evaluate estimates of uncertainty
(requires very large validation sets, that may not be
available for solid Earth) ?
- Identification and validation of errors in ‘models’ (in the
sense of statements of relevant physical laws)
Announcement
International Conference on Ensemble Methods in Geophysical Sciences
Supported by World Meteorological Organization, Météo-France, Agence Nationale de
la Recherche (France)
Dates and location : 12-16 November 2012, Toulouse, France
Forum for study of all aspects of ensemble methods for estimation of state of
geophysical systems (atmosphere, ocean, solid earth, …), in particular in assimilation of
observations and prediction : theory, algorithmic implementation, practical applications,
evaluation, …
Scientific Organizing Committee is being set up. Announcement will be widely
circulated in coming weeks.
If interested, send email to [email protected]