LCFR Presentation – Probabilistic Methods (US)


Advances in methods for uncertainty
and sensitivity analysis
Nicolas Devictor
CEA Nuclear Energy Division
[email protected]
in co-operation with:
Nadia PEROT, Michel MARQUES and Bertrand IOOSS (CEA)
Julien JACQUES (INRIA Rhône-Alpes, PhD student),
Christian LAVERGNE (Montpellier 2 University & INRIA).
International Workshop on level 2 PSA and Severe Accident Management
Cologne, Germany, March 2004
Introduction (1/2)
• In the framework of the study of the influence of uncertainties on the results of
severe accident computer codes, and thus on the results of Level 2 PSA (responses,
hierarchy of important inputs…)
• Why take uncertainty into account?
– There are many sources of uncertainty;
– To show their impact explicitly and traceably → a decision process that can be
robust against uncertainties.
• Probabilistic framework is one of the tools for a coherent and rational treatment of
uncertainties in a decision-making process.
• Some applications of the treatment of uncertainty by probabilistic methods
– For a better understanding of a phenomenon
• To evaluate the most influential input variables. To steer R&D.
– For the improvement of a model or of a code
• Calibration, Qualification…
– In a risk decision-making process
• Hierarchy of contributors → interest of actions to reduce uncertainty or
to define a mitigation means (for example a SAM measure)
• Confidence intervals or probabilistic density functions or margins…
• In any analysis, we must keep in mind the modelling choices and the assumptions.
– Case: a variable has a large influence on the response variability, but we have
low confidence in its value…
Sources of uncertainties
[Diagram: from the real phenomenon, through human understanding and theory, to a simplified model and a code. Sources of uncertainty appear at each step: input variables (meaning? variability?), model parameters, and the « mathematics » (equations, numerical schemes, convergence criteria…), all affecting the output.]
Introduction (2/2)
• A lot of methods exist, but these methods are often not suitable, from a
theoretical point of view, when
– the phenomena that are modelled by the computer code are discontinuous
in the variation range of the influential parameters;
– input variables are statistically dependent.
• For an overview of the methods → see the paper
• The talk mainly covers:
– Sensitivity analysis in the case of dependent input variables.
– The validation of response surfaces.
– The estimation of the additional error that is introduced by the use of a
response surface on the results of the uncertainty and sensitivity analysis.
– Clustering methods, which can be useful when we want to apply statistical
methods based on Monte-Carlo simulation.
(in this talk) “influence of uncertainties” means:
Uncertainties on:
- physical variables,
- model parameters,
- models…
Process (codes, experiments…)
→ Uncertainty on outputs
→ Probability Y > Y_target
→ Most influential variables
Inputs for the study:
– probabilistic models of the uncertainties on physical variables and parameters;
– mathematical model of the ageing or failure phenomenon;
– acceptance criterion.
Propagation of uncertainties → probability to exceed a threshold, sensitivity analysis.
Sensitivity analyses
y = f(x1, … , xp)
(where y could be a probability)
• 1st Question : what is the impact of a variation of the value of an input
variable on the value of the response Y ?
– Gradient, differential analysis
– Often deterministic approach
• 2nd Question : what is the part of the variance of Y that comes from the
variance of Xi (or a set {Xi}) ?
V E Y X 
V Y 
– Usual sensitivity indices
• Pearson’s correlation coefficient, Spearman’s correlation coefficient,
Coefficients from a linear regression, PRCC…
– In the case of non-linear or non-monotonic models: Sobol's method or FAST
• with very time-consuming codes (→ use of a response surface),
• problems with correlated uncertainties.
– All these indices are defined under the assumption that the
input variables are statistically independent (a minimal estimation sketch is given below).
Sensitivity analyses – dependent inputs
• The problem of sensitivity analysis for models with dependent inputs is a real one,
and concerns the interpretation of the sensitivity index values.
S_j = V(E[Y | Xj]) / V(Y)
• Inputs are statistically independent → the sum of these sensitivity indices = 1.
• Inputs are statistically dependent
– the terms of the model function decomposition (Sobol's method) are no longer
orthogonal, so a new term appears in the variance decomposition
→ the sum of the sensitivity indices of all orders is not equal to 1.
– Indeed, the variabilities of two correlated variables are linked, so when
we quantify the sensitivity to one of these two variables we also quantify part of
the sensitivity to the other. The same information is therefore counted several
times in the sensitivity indices of the two variables, and the sum of all indices
is thus greater than 1.
• We have studied the natural idea: defining multidimensional sensitivity indices for
groups of correlated variables (a brute-force estimation sketch follows this list).
– We can also define higher order indices and total sensitivity indices.
– If all input variables are independent, these sensitivity indices are the same
as in the independent case.
– The assessment is often time-consuming (extension of Sobol's method)
→ some computational improvements are in progress and are very promising.
Response surface method
• Interest of a response surface (or meta-model or surrogate model):
– Good capability in approximation (study on the training sample) ;
– Good capability in prediction ;
– Low CPU time for a calculation.
• Data needed in a Response Surface Method (RSM) :
– a training sample D of points (x(i), z(i)) drawn from P(X,Z), the probability law of
the random vector (X,Z) (unknown in practice) ;
– a family F of functions f(x,c), where c is either a parameter vector or an index
vector that identifies the different elements of F.
• The best function in the family F is then the function f0 that minimizes a
risk function :
R(f) = ∫ L(z, f(x, c)) dP(x, z)
• In practice, an empirical risk function is often used (a least-squares fitting sketch follows):
R_E(f) = (1/N) Σ_{i=1}^{N} [ z(i) − f(x(i), c) ]²
Examples of response surface
• Polynomial models
• Generalized Linear Models (GLM)
– Regression models (assumption : continuous function).
– Other possibility : discriminant function (logit, probit models).
– Qualitative and quantitative inputs.
• Thin plate spline
– Regression models (assumption : continuous function).
• PLS (Partial Least Squares)
– Regression models (assumption : continuous function).
– Qualitative and quantitative inputs.
• Neural networks
– Regression models (assumption : continuous function).
– Other possibility : discriminant function (logit, probit models).
• A simplified « physical » model (3D → 1D, …)
With regard to the validation step
• The characteristic « good approximation » is subjective and
depends on the use of the response surface.
– What is the future use of the built response surface ?
– What are the constraints imposed by the use ?
– How to define the validity domain of a response surface ?
• Calibration, modelling, prediction, probability computation…
• Specific criteria in the decision making process
– Conservatism / a bound on the remainder / better accuracy in
an area of interest (distribution tail…).
• How to define the expected accuracy ?
– Ratio “residual deviance / null deviance” ?
– Calibration : representativeness of the most influential parameters,
– Prediction : robustness, bias/variance compromise,
– The quality of the response surface should be compatible with the
accuracy of the studied code.
Validation of a response surface
Statistics (often under assumptions like the Gauss-Markov assumptions…)
– Variance analysis
– Estimator of the variance s²
– R² statistics
– Confidence area 1-α for the coefficients c
– …
Prediction : test base (bias), cross validation.
Bootstrap method (a sketch of these checks follows):
– to improve the estimation of the bias between learning and generalization error,
– to estimate the sensitivity of the trained model f with respect to the available data.
Comparison of results : pdf of the output, confidence interval…
[Figures: scatter plot of values computed by a polynomial response surface against values computed by the function g; convergence of the mean, standard deviation, minimum and maximum of the output as a function of the database size.]
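A sketch of the prediction-oriented checks listed above: K-fold cross validation for the prediction error and a simple bootstrap of the fit to probe its sensitivity to the available data. The surrogate is the degree-2 polynomial of the previous sketch and the data are synthetic, not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(3)

def code(x):
    return np.exp(-x[:, 0]) + 0.5 * x[:, 1] ** 2

def design_matrix(x):
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

x = rng.uniform(0.0, 2.0, size=(60, 2))
z = code(x)

# K-fold cross-validation estimate of the prediction error
k = 5
folds = np.array_split(rng.permutation(len(x)), k)
cv_errors = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(x)), test_idx)
    c, *_ = np.linalg.lstsq(design_matrix(x[train_idx]), z[train_idx], rcond=None)
    pred = design_matrix(x[test_idx]) @ c
    cv_errors.append(np.mean((z[test_idx] - pred) ** 2))
print(f"cross-validation MSE = {np.mean(cv_errors):.4f}")

# Bootstrap: refit on resampled training sets to see how stable the fit is
boot_coeffs = []
for _ in range(200):
    idx = rng.integers(0, len(x), size=len(x))
    c_b, *_ = np.linalg.lstsq(design_matrix(x[idx]), z[idx], rcond=None)
    boot_coeffs.append(c_b)
print("coefficient std over bootstrap replicates:", np.std(boot_coeffs, axis=0).round(3))
```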
Example : The direct containment heating (DCH)
• In the framework of a contract with the PSA Level 2 project at IRSN (in 2000).
• Code : RUPUICUV module of Escadre (→ the model has changed since 2000)
• The calculations were performed with the code version available in 2000. A database of 300
calculations is available. The input vectors for these calculations have been
generated randomly in the variation domain.
• Responses
– maximum pressure in the containment;
– the presence of corium in the containment outside the reactor pit; it is a
discrete response with value 0 (no corium) or 1 (presence).
• Input variables (a sampling sketch follows this list)
– MCOR : mass of corium, uniformly distributed between 20 and 80 tons,
– FZRO : fraction of oxidised Zr, uniformly distributed between 0.5 and 1,
– PVES : primary pressure, uniformly distributed between 1 and 166 bar,
– DIAM : break size, uniformly distributed between 1 cm and 1 m,
– ACAV : flow cross-section in the reactor pit, varying between 8 and 22 m²,
– FRAC : fraction of corium directly ejected into the containment, uniformly
distributed between 0 and 1,
– CDIS : discharge coefficient at the break, uniformly distributed between 0.1
and 0.9,
– KFIT : adjustment parameter, uniformly distributed between 0.1 and 0.3,
– HWAT : water height in the reactor pit, discrete random variable (0 or 3 metres).
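The 300 input vectors were generated randomly in the variation domain; the sketch below shows how such a design could be drawn from the distributions listed above. The variable names and ranges follow the slide; the uniform law assumed for ACAV, the seed and the NumPy implementation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, not from the study
n = 300

sample = {
    "MCOR": rng.uniform(20.0, 80.0, n),      # corium mass [t]
    "FZRO": rng.uniform(0.5, 1.0, n),        # fraction of oxidised Zr
    "PVES": rng.uniform(1.0, 166.0, n),      # primary pressure [bar]
    "DIAM": rng.uniform(0.01, 1.0, n),       # break size [m]
    "ACAV": rng.uniform(8.0, 22.0, n),       # flow section in the reactor pit [m^2] (uniform assumed)
    "FRAC": rng.uniform(0.0, 1.0, n),        # fraction of corium directly ejected
    "CDIS": rng.uniform(0.1, 0.9, n),        # discharge coefficient at the break
    "KFIT": rng.uniform(0.1, 0.3, n),        # adjustment parameter
    "HWAT": rng.choice([0.0, 3.0], n),       # water height in the reactor pit [m]
}
inputs = np.column_stack(list(sample.values()))
print(inputs.shape)  # (300, 9) input vectors for the code runs
```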
Example : maximum pressure (1/2)
• Use of the empirical risk function
• Approximation capabilities : all the response surfaces seem good
• Prediction capabilities :
– non-negligible residuals (see the table below; a sketch of the test-base statistics follows the table)
Response surface | Training set (pdf comparison / RS statistics) | Test base (pdf comparison / RS statistics) | Sensitivity indices (partial / Spearman correlations) | Max. residue (bars)
Degree-2 polynomial | Similar / R² = 90.3 %, CC = 0.95 | Similar / R² = 86.8 %, CC = 0.93 | Very good / Very good | −2.94 ; +1.83
Thin plate spline (… = 3) | Interpolation / Interpolation | Similar / R² = 82.2 %, CC = 0.90 | Not same hierarchy / Not same hierarchy | −1.80 ; +2.81
RN-1 (neural network) | Similar / R² = 97.0 %, CC = 0.98 | Similar / R² = 74.7 %, CC = 0.86 | Not same hierarchy / Not same hierarchy | −2.81 ; +3.68
Mean of 3 RN | Similar / R² = 97.3 %, CC = 0.98 | Similar / R² = 82.3 %, CC = 0.91 | Very good / Very good | −1.81 ; +2.70
GLM3 | Similar / R² = 91.1 %, CC = 0.95 | Similar / R² = 87.9 %, CC = 0.94 | Very good / Very good | −2.99 ; +1.52
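The R², correlation coefficient (CC) and extreme residuals reported in the table can be computed on a held-out test base as sketched below. The data here are synthetic placeholders, not the actual RUPUICUV results.

```python
import numpy as np

def validation_statistics(z_true, z_pred):
    """R², correlation coefficient and extreme residuals on a sample."""
    residuals = z_true - z_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((z_true - z_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    cc = np.corrcoef(z_true, z_pred)[0, 1]
    return r2, cc, residuals.min(), residuals.max()

# Placeholder test base: code results vs. response-surface predictions
rng = np.random.default_rng(4)
z_code = rng.uniform(2.0, 8.0, size=100)          # e.g. maximum pressure [bar]
z_rs = z_code + rng.normal(0.0, 0.5, size=100)    # surrogate with some error

r2, cc, res_min, res_max = validation_statistics(z_code, z_rs)
print(f"R² = {100 * r2:.1f} %, CC = {cc:.2f}, residuals in [{res_min:.2f}; {res_max:.2f}] bar")
```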
Example : maximum pressure (2/2)
[Figures: box-and-whisker plots, density traces and fitted-model plots comparing the code response (Pression) with the response-surface prediction (po2), on the training sample and on the test sample.]
About the impact of response surface « error »
• Use of a RS in an uncertainty and sensitivity analysis (UASA) → a bias or an error on the results of the uncertainty and
sensitivity analysis.
• Usual questions are:
– What is the impact of this “error” on the results of an uncertainty and sensitivity
analysis made on a response surface?
– Can we deduce results on the “true” function from results obtained from a
response surface?
• → “residual function” ε(x1, …, xp) = RS(x1, …, xp) − f(x1, …, xp)
• Assume that all Xi are independent, and that sensitivity analyses have been carried out on the
two functions RS and ε; we denote S_RS,i and S_ε,i the computed sensitivity indices.
The index of the “true” function f, obtained from S_RS,i and S_ε,i, is:

S_f,i = S_RS,i · V(RS(X1,…,Xp)) / V(f(X1,…,Xp))
      + S_ε,i · V(ε(X1,…,Xp)) / V(f(X1,…,Xp))
      − 2 cov( E[RS(X1,…,Xp) | Xi], E[ε(X1,…,Xp) | Xi] ) / V(f(X1,…,Xp))
• Problem of the computation of the covariance term → it is generally impossible to deduce
results on the “true” function from results obtained with a RS.
• The only cases where results can be deduced are :
– RS is a truncated model obtained from a decomposition in an orthogonal basis;
– ε is not very sensitive to the variables X1, …, Xp
→ S_f,i ≈ S_RS,i · V(RS(X1,…,Xp)) / ( V(ε(X1,…,Xp)) + V(RS(X1,…,Xp)) )
(a numerical check of the decomposition is sketched below)
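The decomposition above can be checked numerically on a toy case: compute the conditional means of RS and of the residual ε on a common set of conditioning values, then compare the direct index of f with the one rebuilt from the RS and ε terms. Everything in the sketch (functions, distributions, sample sizes) is illustrative, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy true function f, response surface RS and residual eps = RS - f
f   = lambda x: x[:, 0] + x[:, 1] ** 2 + 0.3 * x[:, 0] * x[:, 1]
rs  = lambda x: x[:, 0] + x[:, 1] ** 2          # surrogate missing the interaction term
eps = lambda x: rs(x) - f(x)

def conditional_means(i, n_outer=2000, n_inner=2000):
    """E[f|X_i], E[RS|X_i] and E[eps|X_i] on a common set of conditioning values."""
    e_f, e_rs, e_eps = np.empty(n_outer), np.empty(n_outer), np.empty(n_outer)
    for k in range(n_outer):
        x = rng.uniform(0.0, 1.0, size=(n_inner, 2))
        x[:, i] = rng.uniform(0.0, 1.0)          # frozen value of X_i for this outer step
        e_f[k], e_rs[k], e_eps[k] = f(x).mean(), rs(x).mean(), eps(x).mean()
    return e_f, e_rs, e_eps

e_f, e_rs, e_eps = conditional_means(i=0)

x_big = rng.uniform(0.0, 1.0, size=(200_000, 2))
v_f = np.var(f(x_big), ddof=1)

s_direct = np.var(e_f, ddof=1) / v_f
s_decomposed = (np.var(e_rs, ddof=1) + np.var(e_eps, ddof=1)
                - 2.0 * np.cov(e_rs, e_eps)[0, 1]) / v_f
print(f"S_f,1 direct: {s_direct:.3f}   from RS/eps decomposition: {s_decomposed:.3f}")
```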
Discontinuous model
• No usual response surface family is suitable.
• In practice, discontinuous behaviour generally means that more than one
physical phenomenon is implemented in the code.
• To avoid misleading interpretations of the results of uncertainty and
sensitivity analysis, discriminant analysis should be used to define the areas
where the function is continuous. Analyses are then carried out on each continuous area.
• Possible methods:
– neural networks with a sigmoid activation function,
– GLM models with a logit link or logistic regression,
– support vector machines,
– decision trees, and variants like random forests…
• Practical problems are often encountered if the sample is « linearly separable ».
• Support vector machines and methods based on decision trees are very
promising for that case (a work-flow sketch follows the figure).
[Figure: sample split into a 1st « continuous » set and a 2nd « continuous » set.]
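A sketch of the suggested work flow for a discontinuous code: train a classifier on an indicator of the physical regime (the discriminant analysis step), then analyse each predicted « continuous » area separately. The toy discontinuous model is illustrative, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)

# Toy discontinuous code: two physical regimes separated by a threshold on x1 + x2
def code(x):
    regime = (x[:, 0] + x[:, 1] > 1.0).astype(int)
    y = np.where(regime == 1, 5.0 + x[:, 0] ** 2, 0.5 * x[:, 1])
    return y, regime

x = rng.uniform(0.0, 1.0, size=(300, 2))
y, regime = code(x)

# Discriminant analysis step: learn the regime boundary from the sample
clf = DecisionTreeClassifier(max_depth=3).fit(x, regime)

# Split the sample into "continuous" areas and analyse each one separately
predicted = clf.predict(x)
for r in np.unique(predicted):
    mask = predicted == r
    print(f"area {r}: {mask.sum()} points, "
          f"mean response {y[mask].mean():.2f}, variance {y[mask].var(ddof=1):.2f}")
```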
Example : presence of corium in the containment
• First tool → generalized linear model with a logit link.
– There always exists a model that explains 100 % of the dispersion of the results
for the training set.
– But there are some drawbacks :
• the list of the terms that are statistically significant varies strongly with
the training set;
• the prediction error is around 20 %.
• Use of neural networks → similar problems.
• Other methods → SVM, decision trees and random forests
• Conclusion (for that example)
– The most efficient method is the Random Forest method.
– The J48 and Random Forest methods are faster than the algorithms based
on an optimisation step (like Naïve Bayes, SVM, neural networks…).
– The principle of decision trees and random forests is simple, based on
building a set of logical combinations of decision rules. They are often very
readable, and have very good prediction capabilities (as shown by the example).
Example : presence of corium in the containment
Method | Naive Bayes | SVM (SMO) | J48 | Random Forest
Description | Optimisation of the classification probability in a Bayesian framework under Gaussian assumptions | Support vector machine method with the SMO algorithm for the optimisation step | Decision tree method, also named the C4.5 algorithm | Algorithm based on a set of decision-tree predictors
Method parameters | -D | -C2 -E2 -G0.01 -A1000003 -T0.001 -P10-10 -N0 -R -M -V-1 -W1 | -S -C0.25 -M2 | -A -I40 -K0 -S1
Training set: badly classified | 7.5 % | 8.5 % | 5 % | 0 %
Training set: error matrix | [10 11; 5 168] | [12 15; 2 171] | [17 10; 0 173] | [27 0; 0 173]
Test set: badly classified | 9 % | 9 % | 7 % | 5 %
Test set: error matrix | [11 3; 6 80] | [7 7; 2 84] | [11 3; 4 82] | [10 4; 1 85]
Cross validation (300 calculations): badly classified | 8.67 % | 9.33 % | 9.67 % | 7.33 %
Cross validation: error matrix | [26 15; 11 248] | [18 23; 5 254] | [21 20; 9 250] | [25 16; 6 253]

Note: a more global indicator of the quality (approximation + prediction capabilities) of the model is obtained by the cross-validation method (a reproduction sketch with standard library implementations follows).
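A comparison of this kind can be reproduced with standard library implementations, as sketched below with scikit-learn (assumed available). The classifiers are analogues of those in the table, not one-to-one translations of the Weka parameter strings, and the data are synthetic placeholders for the 300 RUPUICUV calculations.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)

# Placeholder for the 300 (inputs, corium-presence) results
x = rng.uniform(0.0, 1.0, size=(300, 9))
y = (x[:, 0] + 2.0 * x[:, 3] - x[:, 5] > 1.2).astype(int)   # synthetic 0/1 response

classifiers = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(C=2.0, kernel="rbf", gamma=0.01),
    "Decision tree (C4.5-like)": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=40),
}

for name, clf in classifiers.items():
    # 10-fold cross-validation error, the "more global indicator" of the slide
    scores = cross_val_score(clf, x, y, cv=10)
    print(f"{name:28s} badly classified: {100 * (1 - scores.mean()):.2f} %")
```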
Conclusions
• A lot of methods exist for UASA in the framework of level 2 PSA and
severe accident codes.
• As these methods are often not suitable, from a theoretical point of view,
when
– the phenomena that are modelled by the computer code are
discontinuous in the variation range of the influential parameters;
– input variables are statistically dependent,
new results and ideas to overcome these problems have been described
in the paper.
• The practical interest of these “new” methods should be confirmed by
application to « real » problems.
Response uncertainty
• Probability distribution
– Simulation + fit + statistical tests (asymptotic)
• First statistical moments
– Statistics on a sample (convergence, Bootstrap)
– Approximation of the standard deviation
s_f(x1,…,xn) ≈ [ Σ_i (∂f/∂x_i)² s_{x_i}² + 2 Σ_{i=1}^{n} Σ_{j=i+1}^{n} (∂f/∂x_i)(∂f/∂x_j) cov(x_i ; x_j) ]^(1/2)
• Confidence interval
– From the density function
– Wilks formula
Wilks formula (second order): P( P(Y ≤ Y_M) ≥ γ ) = 1 − N·γ^(N−1) + (N−1)·γ^N,
where Y_M is the second largest value among the N runs.
Monte-Carlo Simulations
– Variance-reduction methods: conditional MC, stratified MC,
Latin Hypercube (a sampling sketch follows)
– More suitable for the computation of a probability: importance
sampling, directional simulation
– Practical problem with very time-consuming codes → response surface
FORM/SORM Methods
• Probabilistic transformation Z → U
(the Ui are N(0,1)-distributed and independent)
•
In U-space, a new failure surface G(U)=H(T(Z))=0
• Design point U* and Hasofer-Lind index:
β_HL = min √(uᵀ u) subject to G(u) ≤ 0
• FORM approximation:
P(F) ≈ Φ(−β_HL)
• SORM approximation (Breitung):
P(F) ≈ Φ(−β_HL) · Π_{i=1}^{n−1} (1 + β_HL κ_i)^(−1/2)
• Sensitivity factors:
α_i = u_i* / β_HL
(a numerical sketch of these steps follows the figure)
[Figure: standard space (U1, U2) showing the safe domain, the failure domain, the limit state G(U) = 0, the design point U* and the index β_HL.]
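A sketch of the FORM steps in standard space, using a generic optimizer to locate the design point. The limit-state function is an arbitrary example and SciPy is assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Example limit-state function in standard normal space: G(u) <= 0 is failure
def G(u):
    return 3.0 - u[0] - 0.5 * u[1] ** 2

# Design point: minimize ||u||^2 on the failure surface G(u) = 0
res = minimize(lambda u: u @ u, x0=np.array([1.0, 1.0]),
               constraints={"type": "eq", "fun": G})
u_star = res.x
beta_hl = np.linalg.norm(u_star)          # Hasofer-Lind index
p_form = norm.cdf(-beta_hl)               # FORM approximation P(F) ~ Phi(-beta)
alpha = u_star / beta_hl                  # sensitivity factors

print(f"beta_HL = {beta_hl:.3f}, P_FORM = {p_form:.3e}, alpha = {alpha.round(3)}")
```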
FORM – simple case
• Random variables : N(0,1)-distributed and independent
• Limit state function : a hyperplane
P_f = Prob(U1 ≥ β_HL) = Φ(−β_HL)
[Figure: standard space (U1, U2) with the failure domain, the design point P*, the index β_HL and the limit state G(U) = 0.]
Validation of the FORM/SORM results
• Sets of results : FORM, SORM, Conditional Importance Sampling, etc.
• Comparison of FORM, SORM and Conditional Importance Sampling (CIS) results
– Coherence of all these results ?
• If yes, good confidence is obtained in the FORM result and in the geometrical
assumption of the FORM method.
– Coherence of FORM and CIS results ?
• If yes, good confidence is obtained in the FORM result and in the
geometrical assumption of the FORM method.
– Coherence of SORM and CIS results ?
• If yes, good confidence is obtained in the SORM result, and the
geometrical assumption of the FORM method is false.
– If no coherence
• Geometrical assumptions for FORM and SORM are false.
• Existence of other minima ?
• Monte-Carlo simulation or a variance reduction method (with or
without a response surface).
• New tests have been developed to check that the computed minimum is a global
minimum (at a non-negligible cost).
Conditional importance sampling
[Figure: conditional importance sampling in standard space (U1, U2), showing the failure domain, the design point P*, the index β and the limit state G(u) = 0.]
Comparison of methods
Simulations | FORM/SORM
Results: failure probability; error on the estimation; distribution of the response | Results: failure probability; most influential variables (for the probability); efficiency (depends on the number of random variables)
Assumptions: no assumption on the random variables (discrete, continuous, dependency…); no assumption on the limit state function | Assumptions: continuous random variables; continuous limit state function
Drawbacks: computation costs (depend on the probability level) | Drawbacks: no error on the estimation; the computed minimum must be the global minimum
Examples of response surface
• Polynomial models
• Generalized Linear Models (GLM)
– Regression models (assumption : continuous function).
– Other possibility : discriminant function (logit, probit models).
– Qualitative and quantitative variables.
• Thin plate spline
– Regression models (assumption : continuous function).
– Qualitative (if 2 factors) and quantitative variables.
• PLS (Partial Least Squares)
– Regression models (assumption : continuous function).
– Qualitative and quantitative variables.
• Neural networks
– Regression models (assumption : continuous function).
– Other possibility : discriminant function (logit, probit models).
– Qualitative (if 2 factors) and quantitative variables.