
Summary of P-box
• Probability bounds analysis (PBA)
• PBA can be implemented by nested Monte Carlo simulation.
– Generate a CDF for each instance of the epistemic uncertainty.
– The p-box is given by the bounds of the resulting family of probability distributions.
– As a result, the probability is an interval-valued quantity instead of a single value.
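The nested (double-loop) procedure summarized above can be sketched as follows. This is only an illustration under stated assumptions: the toy response y = mu + noise, the epistemic interval [2, 4] for mu, the uniform sampling of that interval, and the sample sizes are all hypothetical and not taken from the lecture. randn is used instead of normrnd to avoid a toolbox dependency.

```matlab
% Nested Monte Carlo sketch of a p-box (toy model; all values are assumptions)
rng(0);                              % fix the seed for repeatability
nOuter = 50;                         % epistemic samples (outer loop)
nInner = 2000;                       % aleatory samples (inner loop)
xx = linspace(-2, 8, 201);           % grid on which each CDF is evaluated
cdfs = zeros(nOuter, numel(xx));

for k = 1:nOuter
    mu = 2 + 2*rand;                 % one epistemic realization, mu in [2, 4]
    y  = mu + randn(1, nInner);      % aleatory samples of the response
    for i = 1:numel(xx)
        cdfs(k, i) = mean(y <= xx(i));   % empirical CDF for this mu
    end
end

Fupper = max(cdfs, [], 1);           % upper bound of the CDF family
Flower = min(cdfs, [], 1);           % lower bound of the CDF family

stairs(xx, Fupper, 'LineWidth', 2); hold on
stairs(xx, Flower, 'LineWidth', 2); hold off
ylim([0 1]); xlabel('y'); ylabel('CDF'); legend('upper bound', 'lower bound')
```

At any response value, the probability is then read off as the interval [Flower, Fupper] rather than as a single number.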
-1-
Validation metric in case of aleatory uncertainty
• Continuous model response and discrete experimental observations
– Cumulative distribution function (CDF) from a continuous PDF:
$$ F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt $$
– Empirical distribution function (EDF) from a finite number of data:
$$ S_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i, x), \quad \text{where } I(x_i, x) = \begin{cases} 1, & x_i \le x \\ 0, & x_i > x \end{cases} $$
• Example
– Single data point: x = 1.5
– 3 data points: x = 1.2, 1.5, 3.3
– 100 data points: x = normrnd(3.5,1.2,1,100)
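% Empirical distribution function (EDF) of n samples, evaluated on a grid
dx = 0.2; xx = 0:dx:8; nx = length(xx);
n = 100; xp = sort(normrnd(3.5, 1.2, 1, n));   % normrnd needs the Statistics Toolbox
% xp = [1.2 1.5 3.3]; n = length(xp);          % 3-data case
yy = zeros(1, nx);
for i = 1:nx
    yy(i) = sum(xp <= xx(i)) / n;              % fraction of data with x_i <= x
end
stairs(xx, yy, 'LineWidth', 2); ylim([0 1])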
[Figure: EDFs for the three data sets (1, 3, and 100 data); horizontal axis x from 0 to 8, vertical axis from 0 to 1]
-2-
How to compare the two, and what is the metric?
– Compare the shapes of the two distributions visually.
– Compare the values of moments such as the mean and variance.
– Compare the maximum difference between the CDFs, the Kolmogorov-Smirnov (K-S) distance:
$$ d_{KS}(F, S_n) = \max_x \left| F(x) - S_n(x) \right| $$
– In Oberkampf and Roy, the area metric is suggested:
$$ d(F, S_n) = \int_{-\infty}^{\infty} \left| F(x) - S_n(x) \right| dx $$
• The area between the model distribution F and the data distribution S_n serves as the measure of mismatch between the model and the experiment.
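A minimal sketch of both metrics for a continuous model CDF and a small data set. It assumes, purely for illustration, a normal model N(3.5, 1.2) and the three data values 1.2, 1.5, 3.3 from the earlier example; the normal CDF is written with erfc so that no toolbox is needed, and the area is obtained by numerical integration on a grid.

```matlab
% K-S distance and area metric between a model CDF F and an EDF Sn
mu = 3.5; sigma = 1.2;                           % assumed normal model
F  = @(x) 0.5*erfc(-(x - mu)/(sigma*sqrt(2)));   % normal CDF via erfc
xd = sort([1.2 1.5 3.3]);  n = numel(xd);        % experimental data

% K-S distance: the largest gap occurs at or just before a jump of Sn
Fd  = F(xd);
dKS = max(max(abs(Fd - (1:n)/n)), max(abs(Fd - (0:n-1)/n)));

% Area metric: numerical integration of |F - Sn| on a fine grid
xx = linspace(mu - 8*sigma, mu + 8*sigma, 20001);
Sn = zeros(size(xx));
for i = 1:n
    Sn = Sn + (xx >= xd(i))/n;                   % add one step of height 1/n
end
dArea = trapz(xx, abs(F(xx) - Sn));

fprintf('K-S distance = %.4f, area metric = %.4f\n', dKS, dArea);
```

Note that the K-S distance is a probability difference and is dimensionless, while the area metric carries the units of x.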
-3-
Validation metric is better than comparing moments
• The figures explain:
– Even if the means agree, the area can still be large.
– Even if both the mean and the standard deviation agree, the area metric can still be large.
– Only when the two distributions agree closely at every point is the area small.
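The first point can be checked numerically. The sketch below compares two normal distributions with the same mean but different standard deviations; the values 5, 0.5, and 2 are assumptions chosen only for illustration. For two normals with equal means, the area between the CDFs has the closed form |s1 - s2| sqrt(2/pi), which is used as a cross-check.

```matlab
% Same mean, different spread: the area metric still detects the mismatch
mu = 5; s1 = 0.5; s2 = 2;                          % assumed illustrative values
Phi = @(x, m, s) 0.5*erfc(-(x - m)/(s*sqrt(2)));   % normal CDF (no toolbox)
xx  = linspace(mu - 10*s2, mu + 10*s2, 40001);
dArea  = trapz(xx, abs(Phi(xx, mu, s1) - Phi(xx, mu, s2)));
dExact = abs(s1 - s2)*sqrt(2/pi);                  % closed form for equal means
fprintf('mean difference = 0, area metric = %.4f (closed form %.4f)\n', dArea, dExact);
```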
-4-
Area metric is better than the K-S metric
– In the first figure below, the right case is better than the left according to the K-S metric, but from an engineering viewpoint the left one is better.
– In the second figure below, both cases give the same K-S metric of unity, whereas the area metric distinguishes them.
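The saturation of the K-S metric can be reproduced with a small sketch. It assumes a single observation at x = 1.5 and two hypothetical deterministic model predictions, m = 2 and m = 6, so that every CDF involved is a unit step. The K-S distance is 1 in both cases, while the area metric grows with the separation |m - 1.5|.

```matlab
% K-S saturates at 1 for non-overlapping distributions; the area metric does not
xd = 1.5;                          % single observation
m  = [2 6];                        % two hypothetical deterministic predictions
xx = linspace(0, 8, 8001);
Sn = double(xx >= xd);             % data EDF: unit step at xd
dKS = zeros(size(m)); dArea = zeros(size(m));
for k = 1:numel(m)
    F = double(xx >= m(k));        % model CDF: unit step at m(k)
    dKS(k)   = max(abs(F - Sn));
    dArea(k) = trapz(xx, abs(F - Sn));
    fprintf('m = %g: K-S = %g, area = %.3f\n', m(k), dKS(k), dArea(k));
end
```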
-5-
Area metric applies even when both are discrete
• Sometimes only a few simulation outputs are available from the model because of the computational cost.
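When both distributions are empirical, the area can be computed exactly, because both step functions are constant between consecutive jump locations. The sketch below uses hypothetical values (three model outputs and one measurement) chosen so the result is easy to check by hand.

```matlab
% Area metric between two empirical CDFs (both model and data are samples)
ym = [1 2 3];                      % hypothetical model outputs
yd = 2;                            % hypothetical measurement
z  = sort([ym yd]);                % all jump locations of either EDF
Fm = zeros(size(z)); Fd = zeros(size(z));
for k = 1:numel(z)
    Fm(k) = mean(ym <= z(k));      % model EDF at z(k)
    Fd(k) = mean(yd <= z(k));      % data EDF at z(k)
end
% Both EDFs are constant on [z(k), z(k+1)), so the integral is an exact sum
dArea = sum(abs(Fm(1:end-1) - Fd(1:end-1)) .* diff(z));   % here dArea = 2/3
```

By hand: the two EDFs differ by 1/3 on [1, 2) and by 1/3 on [2, 3), giving an area of 2/3.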
-6-
Vicente Romero's metric and critique
• The area metric does not distinguish between the left and middle cases.
• Romero prefers a metric in real space (see notes page for source).
P-box in case of epistemic uncertainty
– Case (a): p-box for interval uncertainty, i.e., no aleatory uncertainty, only pure epistemic uncertainty. The SRQ (system response quantity) then also lies in an interval. In terms of probability, CDF = 0 below the left bound and CDF = 1 above the right bound.
– Case (c): degenerate p-box, i.e., a precise CDF, obtained when there is only aleatory uncertainty. This is the traditional probability distribution.
– Case (b): p-box for a mixture of aleatory and epistemic uncertainty. As a result, we get a bounded CDF; e.g., at SRQ = 34, the CDF is in the range [0.2, 0.6]. Compare with case (c), where the CDF is a single value.
-8-
Area metric d under epistemic uncertainty
– Consider only the case of a single observation.
– For easier understanding, think about the extreme case of purely aleatory uncertainty, in which the lower and upper bounds are identical.
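A hedged sketch of the single-observation case follows. The slide does not state the formula, so this assumes the common convention that only the area where the observation's step CDF lies outside the p-box band is counted. The bounding CDFs (normals with means 4 and 6, unit standard deviation) and the observation at 8 are hypothetical values.

```matlab
% Area between a p-box and a single observation (sketch; values are assumptions)
Phi = @(x, m, s) 0.5*erfc(-(x - m)/(s*sqrt(2)));   % normal CDF (no toolbox)
xx  = linspace(-10, 20, 30001);
Fupper = Phi(xx, 4, 1);            % left (upper) bounding CDF, mean 4
Flower = Phi(xx, 6, 1);            % right (lower) bounding CDF, mean 6

xo = 8;                            % single hypothetical observation
Sn = double(xx >= xo);             % its EDF: unit step at xo

% Count only the part of Sn that lies outside the band [Flower, Fupper]
gap = max(Flower - Sn, 0) + max(Sn - Fupper, 0);
d   = trapz(xx, gap);
fprintf('area metric between p-box and observation = %.4f\n', d);
```

Setting Fupper equal to Flower recovers the ordinary area metric between one CDF and the observation, which is the extreme aleatory-only case mentioned above.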
-9-
Homework
Consider the model used in the previous lecture, which simulates a response y(x) as a function of input x:
$$ y(x) = x + 3\sin(x/2) $$
At x = 3.0, the performance y was measured experimentally three times, giving 5.7, 5.84, and 6.45.
1. Determine the confidence interval of the unknown mean of the difference between the model and the experimental data. Plot the sample mean and its confidence interval. Based on the result, is the model valid?
2. Now the input x is found to be a random variable following a normal distribution with mean 3 and standard deviation 0.5. As a result, the performance y is no longer deterministic but is given by a distribution. Use crude Monte Carlo simulation to obtain samples of y and plot the histogram. From the samples, calculate the mean and standard deviation of the output y. How large is the difference between the mean of y and the deterministic output y?
3. Plot the empirical cumulative distribution function from the experimental data, as in slide 2 of this lecture.
4. Plot the two CDFs, the CDF of the model and the empirical CDF, in one figure.
5. Calculate the validation metric as defined by the Kolmogorov-Smirnov test, which is the maximum difference between the two CDFs.
6. Calculate the validation metric defined as the area between the two CDFs, as given on page 9.
- 10 -
Homework (continued)
7. Draw the Romero real-space validation metric based on the confidence intervals.
8. The model is now changed to the following equation, which includes a parameter u instead of the fixed value 2:
$$ y(x) = x + 3\sin(x/u) $$
The value of u is known only to lie in the interval (1, 3). Assuming u is uniformly distributed over this interval, use a double-loop Monte Carlo process to draw the p-box of the CDF.
9. Plot the empirical cumulative distribution function on top of the p-box.
10. Calculate the validation metric defined as the area between the two CDFs.
The terminology used in this lecture is very important!
- 11 -