Uncertainty and Sampling
Dr. Richard Young
Optronic Laboratories, Inc.
CORM 2002: Uncertainty
Introduction
Uncertainty budgets are a growing requirement of measurements.
Multiple measurements are generally required for estimates of uncertainty.
Multiple measurements can also decrease uncertainties in results.
How many measurement repeats are enough?
Random Data Simulation
Here is an example probability distribution function of some hypothetical measurements.
We can use a random number generator with this distribution to investigate the effects of sampling.
Figure: PDF of Normal Distribution [μ=100, σ=10]; x-axis: Value, 60 to 140; y-axis: Probability, 0 to 0.04.
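The simulation described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's `random.gauss` with mean 100 and standard deviation 10, and the seed, the helper name `running_stats` and the chosen checkpoints are hypothetical, not taken from the talk.

```python
import math
import random

def running_stats(values):
    """Yield (n, mean, sample_sd) after each value, using Welford's update."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        # The sample standard deviation is undefined for a single value.
        sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
        yield n, mean, sd

rng = random.Random(2002)  # fixed seed (assumption) so the run is repeatable
data = [rng.gauss(100.0, 10.0) for _ in range(10_000)]
history = list(running_stats(data))

# Print the running estimates at a few sample counts.
for n in (2, 10, 100, 1000, 10_000):
    _, mean, sd = history[n - 1]
    print(f"n={n:>5}  mean={mean:7.2f}  sample sd={sd:6.2f}")
```

With few samples both estimates wander; they settle towards 100 and 10 only as the sample count grows.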
Random Data Simulation
Here is a set of 10,000 data points…
Figure: Effect of sampling on mean and standard deviation; series: data; x-axis: Sample #, 0 to 10000; y-axis: Value of data, 0 to 160.
Random Data Simulation
Plotting Sample # on a log scale is better to show behaviour at small samples.
Figure: Effect of sampling on mean and standard deviation; series: data, mean, Standard Deviation; x-axis: Sample #, 0 to 10000 (linear scale); left y-axis: Value of data or mean, 0 to 160; right y-axis: Value of sample standard deviation, 0 to 35.
Random Data Simulation
There is a lot of variation, but how is this affected by the data set?
Figure: Effect of sampling on mean and standard deviation; series: data, mean, Standard Deviation; x-axis: Sample #, 1 to 10000 (log scale); left y-axis: Value of data or mean, 0 to 160; right y-axis: Value of sample standard deviation, 0 to 35.
Sample Mean
Here we have results for 200 data sets.
Figure: Sample means of normal distribution random numbers [μ=100, σ=10] vs number of samples; envelope: μ ± 3σ/√n; x-axis: Number of Samples, 1 to 100 (log scale); y-axis: Sample mean, 70 to 130.
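A rough way to check the μ ± 3σ/√n envelope is to simulate many data sets and count how many sample means fall inside it. The sketch below assumes independent draws for each sample count, which is not necessarily how the original 200 data sets were built; the seed and the function name `fraction_inside` are hypothetical.

```python
import math
import random
import statistics

MU, SIGMA = 100.0, 10.0
rng = random.Random(14)  # arbitrary fixed seed (assumption)

def fraction_inside(n, data_sets=200):
    """Fraction of data sets whose n-sample mean lies within MU +/- 3*SIGMA/sqrt(n)."""
    half_width = 3.0 * SIGMA / math.sqrt(n)
    inside = 0
    for _ in range(data_sets):
        mean = statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        if abs(mean - MU) <= half_width:
            inside += 1
    return inside / data_sets

results = {n: fraction_inside(n) for n in (1, 10, 100)}
for n, frac in results.items():
    print(f"n={n:>3}  envelope=+/-{3 * SIGMA / math.sqrt(n):5.2f}  inside={frac:.3f}")
```

The envelope narrows as 1/√n, so a tenfold reduction in the spread of the mean costs a hundredfold increase in the number of samples.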
Sample Mean
Figure: PDF for sample mean [μ=100, σ=10] with samples taken; curves for 2, 3, 5, 10 and 100 samples; x-axis: Value of calculated mean, 80 to 120; y-axis: Probability, 0 to 0.4.
Sample Standard Deviation
Figure: Sample standard deviations of normal distribution [μ=100, σ=10] vs number of samples; envelope: σ ± 2σ/√(n−1); x-axis: Number of Samples, 1 to 100 (log scale); y-axis: Sample standard deviation, 0 to 35.
Sample Standard Deviation
The most probable value for the sample standard deviation of 2 samples is zero!
Many samples are needed to make 10 most probable.
Figure: PDF of Sample standard deviation [σ=10] with samples taken; curves for 2, 3, 4, 5, 10 and 100 samples; x-axis: Value of calculated sample standard deviation, 0 to 20; y-axis: Probability, 0 to 0.6.
Cumulative Distribution
Sometimes it is best to look at the CDF.
The 50% level is where lower or higher values are equally likely.

Samples   50%
2         6.75
3         8.29
4         8.86
5         9.15
10        9.60
100       9.98

Figure: CDF of Sample standard deviation [σ=10] with samples taken; x-axis: Value of calculated sample standard deviation, 0 to 30; y-axis: Cumulative Probability, 0 to 1.
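The 50% levels can be approximated by simulation: draw many small data sets from a normal distribution with σ = 10, compute the sample standard deviation of each, and take the median. The following is a minimal sketch, not the author's calculation; the seed, trial count and function names are assumptions, and simulated medians will differ slightly from run to run and from the tabulated values.

```python
import math
import random
import statistics

SIGMA = 10.0
rng = random.Random(50)  # arbitrary fixed seed (assumption)

def sample_sd(values):
    """Sample standard deviation with the usual n-1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def median_sample_sd(n, trials=2000):
    """Median of the sample standard deviation over many simulated n-point sets."""
    sds = [sample_sd([rng.gauss(100.0, SIGMA) for _ in range(n)]) for _ in range(trials)]
    return statistics.median(sds)

medians = {n: median_sample_sd(n) for n in (2, 3, 4, 5, 10, 100)}
for n, med in medians.items():
    print(f"{n:>3} samples: median sample sd = {med:5.2f}")
```

The median sits below 10 for every sample count and approaches it only slowly, which is the sense in which small samples tend to underestimate the standard deviation.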
Uniform Distribution
What if the distribution was uniform instead of normal?
The most probable value for >2 samples is ≈10.
Figure: PDF of Sample standard deviation [σ=10] with samples taken - Uniform Distribution; curves for 2, 3, 4, 5, 10 and 100 samples; x-axis: Value of calculated sample standard deviation, 0 to 20; y-axis: Probability, 0 to 0.9.
Uniform Distribution
Underestimated values are still more probable because the PDF is asymmetric.

Samples   50%
2         7.28
3         9.21
4         9.65
5         9.83
10        9.94
100       9.99

Figure: CDF of Sample standard deviation [σ=10] with samples taken - Uniform Distribution; x-axis: Value of calculated sample standard deviation, 0 to 30; y-axis: Cumulative Probability, 0 to 1.
Uniform Distribution
- Throwing a die is an example of a uniform random distribution.
- A uniform distribution is not necessarily random however.
- It may be cyclic e.g. temperature variations due to air conditioning.
- With computer controlled acquisition, data collection is often at regular intervals.
- This can give interactions between the cycle period and acquisition interval.
Cyclic Variations
For symmetric cycles, any multiple of two data points per cycle will average to the average of the cycle.
Figure: Sinusoidal Variation [μ=100, σ=10]; x-axis: Phase, 0 to 1; y-axis: Value, 80 to 120.
Cyclic Variations
Unless synchronized, data collection may begin at any point (phase) within the cycle.
Correct averages are obtained when full cycles are sampled, regardless of the phase.
Figure: Mean values of sinusoidal data with phase at 10 samples per cycle; one curve per starting phase, 0 to 0.95 in steps of 0.05; x-axis: Sample #, 1 to 1000 (log scale); y-axis: Sample Mean Value, 80 to 120.
Cyclic Variations
Again, whole cycles are needed to give good values.
The value is not 10 because sample standard deviation has a (n−1)^0.5 term.
Figure: Sample standard deviations of sinusoidal data with phase at 10 samples per cycle; one curve per starting phase (legend shows 0 to 0.45 in steps of 0.05); x-axis: Sample #, 1 to 1000 (log scale); y-axis: Sample standard deviation, 0 to 14.
Cyclic Variations
The population standard deviation is 10 at each complete cycle.
Each cycle contains all the data of the population.
The standard deviation for full cycle averages = 0.
Figure: Population standard deviations of sinusoidal data with phase at 10 samples per cycle; one curve per starting phase (legend shows 0 to 0.45 in steps of 0.05); x-axis: Sample #, 1 to 1000 (log scale); y-axis: Population standard deviation, 0 to 12.
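The behaviour of full and partial cycles can be reproduced with a short sketch. It assumes a pure sine wave with mean 100 and standard deviation 10 (amplitude 10·√2), sampled at 10 equally spaced points per cycle; the function name `sample_cycle` and the chosen starting phases are hypothetical.

```python
import math
import statistics

MEAN, SD = 100.0, 10.0
AMPLITUDE = SD * math.sqrt(2.0)  # a sine of amplitude A has standard deviation A/sqrt(2)
PER_CYCLE = 10                   # samples per cycle, as in the plots

def sample_cycle(start_phase, count):
    """Return `count` equally spaced readings starting at `start_phase` (0 to 1)."""
    return [
        MEAN + AMPLITUDE * math.sin(2.0 * math.pi * (start_phase + k / PER_CYCLE))
        for k in range(count)
    ]

for phase in (0.0, 0.15, 0.3):
    full = sample_cycle(phase, PER_CYCLE)          # exactly one cycle
    partial = sample_cycle(phase, PER_CYCLE + 3)   # one cycle plus three extra readings
    print(
        f"phase={phase:4.2f}  full cycle: mean={statistics.fmean(full):7.3f} "
        f"pop sd={statistics.pstdev(full):6.3f} sample sd={statistics.stdev(full):6.3f}  "
        f"partial mean={statistics.fmean(partial):7.3f}"
    )
```

For a full cycle the mean and population standard deviation do not depend on the starting phase, while the sample standard deviation is larger by the factor √(n/(n−1)). The partial-cycle mean is biased by an amount that depends on where the acquisition started.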
Smoothing
Smoothing involves combining adjacent data points to create a smoother curve than the original.
A basic assumption is that data contains noise, but the calculation does NOT allow for uncertainty.
Smoothing should be used with caution.
Smoothing
What is the difference?
Savitzky-Golay Smoothing
Here is a spectrum of a white LED.
It is recorded at very short integration time to make it deliberately noisy.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Savitzky-Golay Smoothing
A 25 point Savitzky-Golay smooth gives a line through the center of the noise.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data, 0.02s 25 pt S-G; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Savitzky-Golay Smoothing
The result of the smooth is very close to the same device measured with optimum integration time.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data, 0.02s 25 pt S-G, 7s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
But how does the number of data points affect results?
Here we have 1024 data points.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
Now we have 512 data points.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
Now we have 256 data points.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
Now we have 128 data points.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
A 25 point smooth follows the broad peak but not the narrower primary peak.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data, 0.02s 25 pt S-G; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
To follow the primary peak we need to use a 7 point smooth…
But it doesn’t work so well on the broad peak.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data, 0.02s 7 pt S-G; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
Comparing to the optimum scan, the intensity of the primary peak is underestimated.
This is because some of the higher signal data have been removed.
Figure: Effect of Savitzky-Golay smoothing; series: 0.02s data, 0.02s 7 pt S-G, 7s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Spectral Sampling
Beware of under-sampling peaks: you may underestimate or overestimate intensities.
Figure: two panels, each titled Effect of Savitzky-Golay smoothing; one panel shows 0.02s data, 0.02s 7 pt S-G and 7s data, the other shows 0.02s data, 0.02s 25 pt S-G and 7s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
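The trade-off between window width and peak width can be illustrated with a small Savitzky-Golay sketch. It assumes the common quadratic/cubic smoothing weights, for which a closed-form expression exists; the talk does not state which polynomial order was used, and the test peak (a noise-free Gaussian, 3 samples wide) and the function names are hypothetical.

```python
import math

def savgol_weights(half_width):
    """Quadratic/cubic Savitzky-Golay smoothing weights for 2*half_width+1 points."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in range(-m, m + 1)]

def savgol_smooth(values, half_width):
    """Smooth interior points; the first and last half_width points are left unchanged."""
    weights = savgol_weights(half_width)
    out = list(values)
    for centre in range(half_width, len(values) - half_width):
        window = values[centre - half_width:centre + half_width + 1]
        out[centre] = sum(w * v for w, v in zip(weights, window))
    return out

# Hypothetical noise-free test peak: Gaussian, height 6000, sigma = 3 samples.
peak = [6000.0 * math.exp(-((i - 50) ** 2) / (2 * 3.0 ** 2)) for i in range(101)]
smooth7 = savgol_smooth(peak, 3)     # 7-point window
smooth25 = savgol_smooth(peak, 12)   # 25-point window

print(f"true peak height    : {max(peak):7.1f}")
print(f"7-point S-G height  : {max(smooth7):7.1f}")
print(f"25-point S-G height : {max(smooth25):7.1f}")
```

When the window is wide compared with the peak, the smooth flattens it. Reducing the number of data points across a peak has the same effect as widening the window.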
Exponential Smoothing
What about other types of smoothing?
Here is the original data again.
Figure: Effect of Exponential smoothing; series: 0.02s data; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
Exponential Smoothing
An exponential smooth shifts the peak.
Beware of asymmetric algorithms!
Figure: Effect of Exponential smoothing; series: 0.02s data, 0.02s 0.8 Exp; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], -1000 to 7000.
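The peak shift is easy to reproduce. The sketch below assumes the simplest single-pass form of exponential smoothing and reads the "0.8" in the legend as the weight given to the previous smoothed value; both are assumptions, as are the test peak and the function name.

```python
import math

def exponential_smooth(values, factor):
    """Single-pass smoothing: out[i] = factor*out[i-1] + (1-factor)*values[i]."""
    out = [values[0]]
    for v in values[1:]:
        out.append(factor * out[-1] + (1.0 - factor) * v)
    return out

# Hypothetical symmetric test peak centred on index 50.
peak = [math.exp(-((i - 50) ** 2) / (2 * 5.0 ** 2)) for i in range(101)]
smoothed = exponential_smooth(peak, 0.8)

raw_position = peak.index(max(peak))
smoothed_position = smoothed.index(max(smoothed))
print(f"raw peak at index {raw_position}, smoothed peak at index {smoothed_position}")
print(f"raw height {max(peak):.3f}, smoothed height {max(smoothed):.3f}")
```

Because each output value depends only on earlier points, the smoothed curve lags the data in the scan direction, which moves the apparent peak position as well as lowering it.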
Sampling Without Noise
With lower noise, can we describe curves with fewer points?
This is the optimum integration scan but with 128 points like the noisy example.
Figure: Effect of Sampling on data without noise; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], 0 to 7000.
Sampling Without Noise
… 64 points.
Figure: Effect of Sampling on data without noise; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], 0 to 7000.
Sampling Without Noise
… 32 points.
Is this enough to describe the peak?
Figure: Effect of Sampling on data without noise; x-axis: Wavelength [nm], 350 to 800; y-axis: Signal [cps], 0 to 7000.
Interpolation
Interpolation is the process of estimating data between given points.
National Laboratories often provide data that requires interpolation to be useful.
Interpolation algorithms generally estimate a smooth curve.
Interpolation
There are many forms of interpolation:
- Lagrange, B-spline, Bezier, Hermite, Cardinal spline, cubic, etc.
They all have one thing in common:
- They go through each given point and hence ignore uncertainty completely.
Generally, interpolation algorithms are local in nature and commonly use just 4 points.
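A local 4-point scheme can be sketched with Lagrange's formula. This is a minimal illustration, not the implementation used for the plots; the table values and the function names `lagrange4` and `interpolate` are hypothetical.

```python
def lagrange4(xs, ys, x):
    """Evaluate the cubic through four (xs, ys) points at x using Lagrange's formula."""
    total = 0.0
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def interpolate(xs, ys, x):
    """Local interpolation: use four neighbouring tabulated points around x."""
    # Index of the first tabulated point at or beyond x (last index if there is none).
    right = next((k for k, xk in enumerate(xs) if xk >= x), len(xs) - 1)
    # Start two points to the left of it, clamped so the window stays inside the table.
    start = min(max(right - 2, 0), len(xs) - 4)
    return lagrange4(xs[start:start + 4], ys[start:start + 4], x)

# Hypothetical, unevenly spaced table (not data from the talk).
xs = [300.0, 320.0, 350.0, 400.0, 450.0, 555.0, 600.0]
ys = [98.0, 112.0, 95.0, 104.0, 87.0, 101.0, 109.0]

print(interpolate(xs, ys, 350.0))  # a tabulated point is reproduced
print(interpolate(xs, ys, 500.0))  # an estimate between tabulated points
```

The curve passes through every tabulated value, so any noise in the table is reproduced and can be amplified between points.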
Interpolation
The interesting thing about interpolating data containing random noise is you never know what you will get.
Let’s zoom this portion…
Figure: Effect of Interpolation; series: Random data, Lagrange Interpolated, Excel Smooth curve; x-axis: Wavelength [nm], 200 to 2400; y-axis: Random Number [μ=100, σ=10], 40 to 180.
Interpolation
Uneven sampling can cause overshoots.
The Excel curve can even double back.
Figure: Effect of Interpolation; series: Random data, Lagrange Interpolated, Excel Smooth curve; x-axis: Wavelength [nm], 500 to 700; y-axis: Random Number [μ=100, σ=10], 40 to 180.
Combining a Smooth and Interpolation
If a spectrum can be represented by a function, e.g. polynomial, the closest “fit” to the data can provide smoothing and give the values between points.
The “fit” is achieved by changing the coefficients of the function until it is closest to the data.
- A least-squares fit.
Combining a Smooth and Interpolation
The squares of the differences between values predicted by the function and those given by the data are added to give a “goodness of fit” measure.
Coefficients are changed until the “goodness of fit” is minimized.
Excel has a regression facility that performs this calculation.
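The least-squares idea can be shown with the simplest possible function, a straight line, for which the minimizing coefficients have a closed form. This sketch is a stand-in for the polynomial fits discussed in the talk, not a reproduction of them; the data values and function names are hypothetical.

```python
def sum_of_squares(xs, ys, intercept, slope):
    """Least-squares 'goodness of fit': sum of squared data-minus-model differences."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data, roughly y = 2 + 0.5 x with scatter.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.4, 3.2, 3.4, 4.1, 4.4]

a, b = fit_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)
print(f"intercept={a:.4f} slope={b:.4f} sum of squares={best:.4f}")

# Moving either coefficient away from the fitted value makes the sum larger.
shifts = ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.02), (0.0, -0.02))
for da, db in shifts:
    print(f"shift ({da:+.2f}, {db:+.2f}) -> {sum_of_squares(xs, ys, a + da, b + db):.4f}")
```

Every data point contributes with equal weight here, whatever its uncertainty.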
Combining a Smooth and Interpolation
Theoretically, any simple smoothly varying curve can be fitted by a polynomial.
Sometimes it is better to “extract” the data you want to fit by some reversible calculation.
This means you can use, say, 9th order polynomials instead of 123rd order to make the calculations easier.
Polynomial Fitting
NIST provide data at uneven intervals.
To use the data, we have to interpolate to intervals required by our measurements.
Figure: Lamp 525; series: Data; x-axis: Wavelength [nm], 0 to 2500; y-axis: E [µW cm^-2 nm^-1], 0 to 30.
Method 1
NIST recommend fitting a high-order polynomial to data values multiplied by λ^5/exp(a+b/λ) for interpolation.
The result looks good, but…
Figure: 9th power polynomial fit; series: data, fit; x-axis: Wavelength [nm], 0 to 2500; y-axis: E*µm^5/exp(a+b/µm), -5 to 30.
Method 1
...on a log scale, the match is very poor at lower values.
Figure: 9th power polynomial fit; series: data, fit; x-axis: Wavelength [nm], 250 to 450; y-axis: E*µm^5/exp(a+b/µm), 0.001 to 1 (log scale).
Method 1
When converted back to the original scale, lower values bear no relation to the data.
Figure: Lamp 525; series: Data, fit; x-axis: Wavelength [nm], 250 to 450; y-axis: E [µW cm^-2 nm^-1], 0.01 to 10 (log scale).
What went wrong?
- The “goodness of fit” parameter is a measure of absolute differences, not relative differences.
- NIST use a weighting of 1/E^2 to give relative differences, and hence closer matching, but that is not easy in Excel.
- Large values tend to dominate smaller ones in the calculation.
- A large dynamic range of values should be avoided.
- We are trying to match data over 4 decades!
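The effect of weighting can be seen on a toy example. The sketch below fits a straight line to hypothetical values spanning four decades, once with equal weights (absolute differences) and once with 1/E^2 weights (relative differences). A straight line is a deliberately poor model for these values; the point is only to show which data dominate each fit. None of the numbers are lamp data, and the function names are assumptions.

```python
def weighted_line_fit(xs, ys, weights):
    """Straight line minimising sum(w * (y - a - b*x)**2); returns (a, b)."""
    sw = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / sw
    mean_y = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def relative_residuals(xs, ys, a, b):
    """(fit - data) / data at each point."""
    return [(a + b * x - y) / y for x, y in zip(xs, ys)]

# Hypothetical values spanning four decades (1 to 10000).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [10.0 ** x for x in xs]

plain = weighted_line_fit(xs, ys, [1.0] * len(xs))                 # absolute differences
relative = weighted_line_fit(xs, ys, [1.0 / y ** 2 for y in ys])   # 1/E^2 weighting

for name, (a, b) in (("unweighted", plain), ("1/E^2 weighted", relative)):
    res = relative_residuals(xs, ys, a, b)
    print(f"{name:>15}: relative residual at smallest value = {res[0]:+.2f}")
```

In the unweighted fit the largest values set the coefficients and the smallest values are missed by orders of magnitude; with 1/E^2 weights the small values are matched far more closely in relative terms.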
How do NIST deal with it?
Although NIST’s 1/E^2 weighting gives closer matches than the fit shown here, to get best results they split the data into 2 regions and calculate separate polynomials for each.
This is a reasonable thing to do but can lead to local data effects and arbitrary splits that do not fit all examples.
Is there an alternative?
Alternative Method 1
A plot of the log of E*λ^5 values vs. λ^-1 is a gentle curve, almost a straight line.
We can calculate a polynomial without splitting the data.
The fact that we are fitting a log scale means we are effectively using relative differences in the least squares calculation.
Figure: Alternative 9th power polynomial fit; series: data, fit; x-axis: 1/λ [µm^-1], 0 to 4; y-axis: Log(E*µm^5), -5 to 3.
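Why the transformed data are nearly straight can be checked against an ideal blackbody. The sketch below is an illustration under assumptions: it uses Planck's law with the 3207.9 K temperature quoted later for the scaled blackbody, natural logarithms rather than whatever base the plot used, and a hypothetical 50 nm wavelength grid. It compares ln(E·λ^5) with the straight line that Wien's approximation predicts in 1/λ.

```python
import math

C2 = 1.4388e-2   # second radiation constant, m K
T = 3207.9       # temperature quoted for the scaled blackbody, K

def planck_relative(wavelength_nm):
    """Blackbody spectral shape (arbitrary scale) from Planck's law."""
    lam = wavelength_nm * 1e-9
    return lam ** -5 / math.expm1(C2 / (lam * T))

wavelengths = [250.0 + 50.0 * k for k in range(46)]   # 250 nm to 2500 nm
raw = [planck_relative(w) for w in wavelengths]
transformed = [math.log(e * (w * 1e-9) ** 5) for w, e in zip(wavelengths, raw)]
inverse_um = [1000.0 / w for w in wavelengths]        # 1/lambda in 1/um

# Wien's approximation predicts a straight line: ln(E*lambda^5) = -(C2/T) * (1/lambda)
wien_line = [-(C2 / T) * 1e6 * x for x in inverse_um]
worst = max(abs(t - w) for t, w in zip(transformed, wien_line))

print(f"dynamic range of E          : {max(raw) / min(raw):.0f} to 1")
print(f"span of ln(E*lambda^5)      : {max(transformed) - min(transformed):.2f}")
print(f"largest departure from line : {worst:.3f}")
```

The departure from a straight line is small compared with the total span and is largest at the long-wavelength end, so a modest polynomial in 1/λ has little curvature to absorb.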
Method 2
Incandescent lamp emission is close to that of a blackbody.
Figure: Lamp 525; series: Data, Scaled Blackbody @ 3207.9K; x-axis: Wavelength [nm], 0 to 2500; y-axis: E [µW cm^-2 nm^-1], 0 to 30.
Method 2
If we calculate a scaled blackbody curve as we would to get the distribution temperature…
…and then divide the data by the blackbody...
Figure: Lamp 525; series: Data, Scaled Blackbody @ 3207.9K; x-axis: Wavelength [nm], 0 to 2500; y-axis: E [µW cm^-2 nm^-1], 0 to 30.
Method 2
...we get a smooth curve with very little dynamic range.
The “fit” is not good because of the high initial slope and almost linear falling slope.
Figure: 9th power polynomial fit; series: Data, fit; x-axis: Wavelength [nm], 0 to 2500; y-axis: E/E_BB, 0.6 to 1.1.
Method 2
Plotting vs. λ^-1, as in alternative method 1, allows close fitting of the polynomial.
Figure: 9th power polynomial fit; series: Data, fit; x-axis: 1/λ [µm^-1], 0 to 4; y-axis: E/E_BB, 0.6 to 1.1.
Comparing results
Method 2 shows lower residuals, but there is not much difference.
Figure: Residuals for Lamp 525; series: NIST Program, Region 1; NIST Program, Region 2; Alternative Method 1; Method 2; x-axis: Wavelength [nm], 0 to 2500; y-axis: Residuals [%], -0.6% to 0.3%.
Comparing results
All methods discussed give essentially the same result when converted back to the original scale.
Figure: Lamp 525; series: Data, fit; x-axis: Wavelength [nm], 0 to 2500; y-axis: E [µW cm^-2 nm^-1], 0 to 30.
Algorithms and Uncertainty
The algorithms mentioned either make no allowance for uncertainty or assume it is constant.
If we replaced the least-squares “goodness of fit” parameter with “most probable,” this would use the uncertainty we know is there to determine the best fit.
Why is this not done?
- Difficult in Excel.
- Easy with custom programs.
Algorithms and Uncertainty
From the data value (mean) and the standard deviation, we can calculate the PDF.
The value from the fit has a probability that we can use.
Figure: PDF of Normal Distribution [μ=100, σ=10]; x-axis: Value, 60 to 140; y-axis: Probability, 0 to 0.04.
Algorithms and Uncertainty
Multiply the probabilities at each point to give the “goodness of fit” parameter.
Use this parameter instead of the least-squares one in the fit calculations.
MAXIMIZE the “goodness of fit” parameter to obtain the best fit.
The fit will be closest where uncertainties are lowest.
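The recipe above can be sketched for a straight-line fit. The example assumes a normal PDF at each data point with its own standard deviation, and works with the logarithm of the product of probabilities to avoid numerical underflow (maximizing the log is equivalent to maximizing the product). Under these assumptions the maximum has a closed form, which is a least-squares fit weighted by 1/σ^2. The data, uncertainties and function names are hypothetical.

```python
import math

def log_probability(xs, ys, sigmas, a, b):
    """Log of the product of normal PDF values of the fit y = a + b*x at each point."""
    total = 0.0
    for x, y, s in zip(xs, ys, sigmas):
        z = (a + b * x - y) / s
        total += -0.5 * z * z - math.log(s * math.sqrt(2.0 * math.pi))
    return total

def most_probable_line(xs, ys, sigmas):
    """Line maximising the product of probabilities (closed form for normal PDFs)."""
    w = [1.0 / s ** 2 for s in sigmas]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: the last two points carry much larger uncertainty.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.1, 2.9, 5.0, 3.5]
sigmas = [0.1, 0.1, 0.1, 1.0, 1.0]

a, b = most_probable_line(xs, ys, sigmas)
best = log_probability(xs, ys, sigmas, a, b)
print(f"a={a:.3f} b={b:.3f} log probability={best:.3f}")
for x, y, s in zip(xs, ys, sigmas):
    print(f"x={x:.0f} sigma={s:.1f} fit-data={a + b * x - y:+.3f}")
```

The fitted line stays close to the three low-uncertainty points and is allowed to miss the two uncertain ones by a much larger margin.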
Conclusions
Standard deviations may be underestimated with small samples.
Cyclic variations should be integrated for complete cycle periods.
Smoothing and interpolation should be used with caution:
- Do not assume results are valid: check.
Conclusions
Polynomial fits can give good results, but:
- Avoid large dynamic range
- Avoid complex curvatures
- Avoid high initial slopes
All these manipulations ignore uncertainty (or assume it is constant).
- But least-squares fits can be replaced by maximum probability to take uncertainty into consideration.