Error and Error Analysis
Types of error
Random error – Unbiased error due to unpredictable fluctuations
in the environment, measuring device, and so forth.
Systematic error – Error that biases a result in a particular
direction. This can be due to calibration error, instrumental bias, errors
in the approximations used in the data analysis, and so forth.
For example, if we carried out several independent measurements
of the normal boiling point of a pure unknown liquid, we would expect
some scatter in the data due to random differences that occur from
measurement to measurement.
If the thermometer used in the
measurements was not calibrated, or we failed to correct the
measurements for the effect of pressure on boiling point, there would be
systematic errors.
Precision and Accuracy
Precision is a measure of how close successive independent
measurements of the same quantity are to one another.
Accuracy is a measure of how close a particular measurement (or
the average of several measurements) is to the true value of the measured
quantity.
[Figure: illustrations contrasting good precision with poor accuracy, and
good precision with good accuracy.]
Significant Figures
The total number of digits in a number that provide information
about the value of a quantity is called the number of significant figures.
It is assumed that all but the rightmost digit are exact, and that there is
an uncertainty of ± a few in the last digit.
Example: The mass of a metal cylinder is measured and found to
be 42.816 g. This is interpreted as follows: the digits 4, 2, 8, and 1 are
considered certain, while the last digit (6) carries an uncertainty of
± 2 or 3. The number of significant figures is 5.
Rules For Counting Significant Figures
1) Exact numbers have an infinite number of significant figures.
2) Leading zeros are not significant.
3) Zeros between nonzero digits are always significant.
4) Trailing zeros to the right of a decimal point are always
significant.
5) Trailing zeros to the left of a decimal point may or may not be
significant (they may, for example, be placeholders).
6) All other digits are significant.
Counting Significant Figures (Examples)
3.108 – 4 significant figures
0.00175 – 3 significant figures (the leading zeros are placeholders)
7214.0 – 5 significant figures
4000 – 1, 2, 3, or 4 significant figures (the trailing zeros may be
placeholders)
We can always make clear how many significant figures are
present in a number by writing the number in scientific notation.
4. × 10³ – 1 significant figure
4.00 × 10³ – 3 significant figures
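To make the counting rules concrete, here is a minimal Python sketch
(the function name and string-based interface are our own choices, not
part of the transcript); the ambiguous trailing-zero case returns a range:

```python
# A minimal sketch of the counting rules; the number is supplied as a
# decimal string, e.g. "0.00175" or "4.00e3".
def count_sig_figs(s: str):
    s = s.lstrip("+-")
    if "e" in s.lower():                             # scientific notation
        mantissa = s.lower().split("e")[0]
        return len(mantissa.replace(".", "").lstrip("0"))
    if "." in s:
        return len(s.replace(".", "").lstrip("0"))   # rules 2 and 4
    digits = s.lstrip("0")                           # no decimal point: rule 5
    core = digits.rstrip("0")
    if len(core) == len(digits):
        return len(digits)
    return (len(core), len(digits))                  # ambiguous trailing zeros

print(count_sig_figs("3.108"))    # 4
print(count_sig_figs("0.00175"))  # 3
print(count_sig_figs("7214.0"))   # 5
print(count_sig_figs("4000"))     # (1, 4): 1 to 4 significant figures
print(count_sig_figs("4.00e3"))   # 3
```

For 4000 the sketch returns (1, 4), mirroring the ambiguity noted above;
writing the number in scientific notation removes it.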
Significant Figures in Calculations
There are two common general cases. A general method that
applies to cases other than those given below will be discussed later.
Multiplication and/or division - The number of significant figures
in the result is equal to the smallest number of significant figures present
in the numbers used in the calculation.
Example. For a particular sample of a solid, the mass of the solid
is 34.1764 g and the volume of the solid is 8.7 cm3. What is the value for
D, the density of the solid?
D = mass/volume = (34.1764 g)/(8.7 cm³) = 3.9283218 g/cm³ = 3.9 g/cm³
Addition and/or subtraction - The position of the last significant figure
in the result is set by the number whose significant figures, relative to
the decimal point, run out first; that is, the result is rounded to the
same decimal place as the least precise number in the calculation.
Example. The masses for three rocks are 24.19 g, 2.7684 g, and
91.8 g. What is the combined mass of the rocks?
24.19 g + 2.7684 g + 91.8 g = 118.7584 g = 118.8 g
(The significant figures of 91.8 g run out first, at the tenths place, so
the result is rounded to one decimal place.)
Additional Cases
1) For calculations involving both addition/subtraction and
multiplication/division the rules for significant figures are applied one at
a time, rounding off to the correct number of significant figures at the
end of the calculation.
Example.
(46.38 − 39.981)/13.821 = 6.399/13.821 = 0.4629911 = 0.463
2) Significant figures for other kinds of mathematical operations
(logarithm, exponential, etc.) are more complicated and will be discussed
separately.
Rounding Numbers
In the above calculations we had to round off the results to the
correct number of significant figures. The general procedure for doing
so is as follows:
1) If the leftmost digit being removed is less than 5, round down (leave
the last retained digit unchanged).
2) If the leftmost digit being removed is greater than 5, or is a 5
followed by nonzero digits, round up.
3) If the digits being removed are exactly 5 (a 5 followed by nothing or
only zeros), round the last retained digit to the even number.
Examples:
3.4682 to 3 sig figs → 3.47
18.4513 to 4 sig figs → 18.45
1.4500 to 2 sig figs → 1.4
1.4501 to 2 sig figs → 1.5
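Python's decimal module supports this round-half-to-even convention
directly; a short sketch (our own illustration, assuming the value is
supplied as a string) that reproduces the examples above:

```python
# Round-half-to-even rounding to n significant figures using the
# decimal module's ROUND_HALF_EVEN mode.
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(value: str, n: int) -> Decimal:
    d = Decimal(value)
    exponent = d.adjusted() - (n - 1)   # power of 10 of the last kept digit
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)

print(round_sig("3.4682", 3))   # 3.47
print(round_sig("18.4513", 4))  # 18.45
print(round_sig("1.4500", 2))   # 1.4  (exactly 5 removed: round to even)
print(round_sig("1.4501", 2))   # 1.5
```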
Statistical Analysis of Random Error
There are powerful general techniques that may be used to
determine the error present in a set of experimental measurements.
These techniques generally make the following assumptions:
1) Successive measurements are independent of one another.
2) Systematic error in the measurements can be ignored.
3) The random error present in the measurements has a normal
(Gaussian) distribution
P(x) dx = [1/(2πσ²)^(1/2)] exp[−(x − μ)²/2σ²] dx

P(x) = probability of obtaining x from the measurement
μ = average value (the true value in the absence of systematic error)
σ = standard deviation (a measure of the width of the distribution)
Normal Error Distribution
[Figure: plot of the normal error distribution P(x) versus x for μ = 0,
σ = 1.]
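For reference, the distribution is easy to evaluate numerically; a small
Python sketch of P(x) (our own illustration, not from the transcript):

```python
# Evaluate the normal error distribution P(x) defined above.
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

print(normal_pdf(0.0))   # 0.3989..., the peak height for mu = 0, sigma = 1
```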
Error Analysis – General Case
Consider the case where we have N independent experimental
measurements of a particular quantity. We may define the following:
x̄ = ( Σ xi ) / N

S = [ Σ (xi − x̄)² / (N − 1) ]^(1/2)

Sm = S/√N = [ Σ (xi − x̄)² / (N(N − 1)) ]^(1/2)
In the above equations the summation is assumed to run from i = 1 to i =
N.
We call x̄ the average (mean) value for the N measurements, S
the estimated standard deviation, and Sm the estimated standard deviation
of the mean.
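A minimal Python sketch of these three formulas (the function name is
our own):

```python
# Mean, estimated standard deviation, and estimated standard deviation
# of the mean for a list of independent measurements.
from math import fsum, sqrt

def mean_and_errors(x):
    n = len(x)
    xbar = fsum(x) / n                                      # average value
    s = sqrt(fsum((xi - xbar) ** 2 for xi in x) / (n - 1))  # est. std. dev. S
    sm = s / sqrt(n)                                        # est. std. dev. of mean Sm
    return xbar, s, sm
```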
Limiting Case: N → ∞
The equations given on the previous slide show interesting
behavior in the limit of a large number of measurements. As N → ∞:
x̄ → μ
S → σ
Sm → 0
What does this mean? In the limit of a large number of
measurements (and subject to the restrictions previously given) the
average value approaches the true value for the quantity being measured,
the estimated standard deviation approaches the true standard deviation,
and the estimated standard deviation of the mean approaches zero. This
means that in the absence of systematic error we can reduce the
uncertainty (error) in our average value by carrying out a larger and
larger number of measurements. Unfortunately, since Sm ~ 1/√N, the
uncertainty falls off only as the square root of the number of
measurements made, and so the reduction in the uncertainty is slow. We
normally have to make a compromise between
reducing the uncertainty and making a reasonable number of
measurements.
Uncertainty for a Finite Number of Measurements
Consider the case where N does not become large. In this case,
we usually report uncertainties in terms of P, the confidence limits on the
result. We interpret the confidence limits in the following way. If the
value for the experimental quantity is given as x̄ ± Δ, at (P × 100%)
confidence limits, it means that the probability that the true value for the
quantity being measured lies within the stated error limits is P × 100%.
Example: The density of an unknown substance has been
measured and the following results obtained.
D = (1.485 ± 0.16) g/cm³, at 95% confidence limits
This means that there is a 95% probability that the true value for
the density lies within ± 0.16 g/cm³ of the average value obtained from
experiment.
We typically use 95% confidence limits in reporting data, though
other limits can be used.
Determination of Confidence Limits
The theoretical basis for determining confidence limits is
complicated (see Garland et al., p 48-50 for a brief discussion). We will
therefore focus on the results and not be concerned with the theoretical
justification.
Consider the case where N independent measurements have been
carried out, and x̄ and Sm have been found by the methods previously
discussed. The confidence limits for x̄ are given by the following
relationship:
P, = tP, Sm = tP, S/N1/2
where P is the confidence limits (expressed as a decimal number instead
of a percentage), Sm is the estimated standard deviation of the mean, S is
the estimated standard deviation,  is the number of degrees of freedom
in the measurements, and tP, is the critical value of t (Table 3, page 50 of
Garland et al.)
The number of degrees of freedom in a measurement is equal to
the number of measurements made minus the number of parameters
determined from the measurements. For example, for the case of N
measurements, then…
…if we find the average value for the measurements (one
parameter), then ν = N − 1
…if we fit the data to a line (two parameters), then ν = N − 2
…if we fit the data to a cubic equation (y = a0 + a1x + a2x² + a3x³,
four parameters), then ν = N − 4
and so forth.
Example
Consider the determination of ΔUc, the energy of combustion, for an
unknown pure organic compound. The following experimental results are
found:
N = 5

Trial    ΔUc (kJ/g)
1        −11.348
2        −11.215
3        −11.496
4        −11.335
5        −11.290

ν = N − 1 = 4
x̄ = −11.3368 kJ/g
S = 0.10308 kJ/g
For  = 4 and P = 0.95 (95% percent confidence limits), we get
from Table 3 that t0.95,4 = 2.78. Therefore
 = tP, S/N1/2 = (2.78)(0.10308 kJ/g)/(5)1/2 = 0.1281 kJ/g
We generally round off the confidence limits to two significant
figures, and then report the average value out to the same number of
significant figures as the confidence limits.
We would therefore report the result from the above
measurements as
ΔUc = (−11.34 ± 0.13) kJ/g, at 95% confidence limits
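As a check, a short Python sketch (our own) that reproduces the numbers
in this example, with the t value taken from Table 3:

```python
# Mean, estimated standard deviation, and 95% confidence limits for the
# five combustion trials above.
from math import fsum, sqrt

data = [-11.348, -11.215, -11.496, -11.335, -11.290]    # ΔUc in kJ/g
n = len(data)
xbar = fsum(data) / n                                   # -11.3368
s = sqrt(fsum((x - xbar) ** 2 for x in data) / (n - 1)) # 0.10308
t_95 = 2.78                                             # t(P=0.95, nu=4)
delta = t_95 * s / sqrt(n)                              # 0.1281
print(f"ΔUc = ({xbar:.2f} ± {delta:.2f}) kJ/g")         # (-11.34 ± 0.13) kJ/g
```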
Comparison of Experimental Results to Literature or
Theoretical Values
You will often compare your experimental results to either a
literature value or a value obtained by some appropriate theory. While
literature values have their own uncertainty, it will generally be the case
that this uncertainty is small compared to the uncertainty in your results.
You will therefore generally treat literature values as “exact”.
If a literature result or theoretical result falls within the
confidence limits of your result, then we say that your result agrees with
the literature or theoretical value at the stated confidence limits. If
the result falls outside the confidence limits, then we say that your
result disagrees with the literature or theoretical value at the stated
confidence limits.
Discordant Data
It will sometimes be the case that one result will look much
different from the other results that have been found in an experiment. In
this case it is usually a good idea to go back and check your calculations
to see if a mistake has been made.
Assuming there are no obvious calculation mistakes, there is a
procedure (called the Q-test) that can be used to decide whether or not to
keep an anomalous experimental result (Garland et al., p 42-43).
Generally speaking, one should be reluctant to discard any
experimental result, and should resist the temptation to discard results just to
make your experimental data look better. If you do decide not to include
an experimental result in your data analysis I expect that the result should
still be reported, and that a reason be given for not including the result in
your calculations.
Propagation of Error
We often use experimental results in calculations. In this case
there is a general procedure for finding how the errors propagate.
Let F(x,y,z) be a function whose value is determined by the
values of the variables x, y, and z. We further assume that x, y, and z are
independent variables. Let Δx, Δy, and Δz be the uncertainties in the
values for x, y, and z (these might, for example, be confidence limits for
the values). ΔF, the uncertainty in the value of F, is related to the
uncertainties in x, y, and z as follows:

(ΔF)² = (∂F/∂x)²(Δx)² + (∂F/∂y)²(Δy)² + (∂F/∂z)²(Δz)²

where ∂F/∂x, ∂F/∂y, and ∂F/∂z are the partial derivatives of F with
respect to x, y, and z.
Examples
F = x + y + z
(ΔF)² = (∂F/∂x)²(Δx)² + (∂F/∂y)²(Δy)² + (∂F/∂z)²(Δz)²
(ΔF)² = (1)²(Δx)² + (1)²(Δy)² + (1)²(Δz)²
= (Δx)² + (Δy)² + (Δz)²

F = xyz
(ΔF)² = (∂F/∂x)²(Δx)² + (∂F/∂y)²(Δy)² + (∂F/∂z)²(Δz)²
(ΔF)² = (yz)²(Δx)² + (xz)²(Δy)² + (xy)²(Δz)²
If we divide both sides of the equation by F² = (xyz)², we get (after
some rearrangement)
(ΔF/F)² = (Δx/x)² + (Δy/y)² + (Δz/z)²
Note the above results are general for combinations of addition
and subtraction (1st result) or of multiplication and division (2nd result).
Example
k = A exp(- Ea/RT)
(the Arrhenius equation)
(k)2 = (k/A)2(A)2 + (k/Ea)2(Ea)2
(k)2 = [exp(-Ea/RT)]2(A)2 + [(A/RT) exp(-Ea/RT)]2(Ea)2
If we divide through by k2, we can rewrite the above in a slightly
nicer form
(k/k)2 = (A/A)2 + (Ea/RT)2
The nice thing about the general method for finding how error
propagates in a calculation is that it allows you to deal with “nonstandard” equations - that is, equations involving operations more
complicated than addition, subtraction, multiplication, and division.
There are more sophisticated treatments of error that take into
account correlation between errors from different terms in an equation,
but we will not consider such treatments here.
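Because the partial derivatives can also be taken numerically, the
general formula is easy to apply even to nonstandard equations. Here is
a minimal Python sketch using central differences, demonstrated on the
Arrhenius equation; all numerical values and uncertainties below are
assumed for illustration and are not from the transcript:

```python
from math import exp, sqrt

def propagate(f, values, deltas, h=1e-6):
    """Propagate uncertainties via (ΔF)² = Σ (∂F/∂x_i)² (Δx_i)²."""
    var = 0.0
    for i, v in enumerate(values):
        step = h * max(abs(v), 1.0)
        hi = list(values); hi[i] = v + step
        lo = list(values); lo[i] = v - step
        dfdx = (f(hi) - f(lo)) / (2 * step)   # central-difference derivative
        var += (dfdx * deltas[i]) ** 2
    return sqrt(var)

# Arrhenius equation k = A exp(-Ea/RT) at fixed T; assumed values only.
R, T = 8.314, 300.0
k = lambda v: v[0] * exp(-v[1] / (R * T))     # v = [A, Ea]
A, Ea, dA, dEa = 1.0e13, 5.0e4, 2.0e11, 1.0e3
k0 = k([A, Ea])
dk = propagate(k, [A, Ea], [dA, dEa])
# agrees with the closed form (Δk/k)² = (ΔA/A)² + (ΔEa/RT)²
print(dk / k0, sqrt((dA / A) ** 2 + (dEa / (R * T)) ** 2))
```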
“Best Fit” of Data to a Line
It is often the case that two variables x and y are linearly related.
In this case we would like to be able to find the line that best fits the
experimental data, that is, find the best values for the slope (m) and
intercept (b). We consider the following case (there are other more
complicated cases that we will not consider here).
Assume that the values for x (independent variable) are exact.
Assume the error is the same for all values of y. The general way in
which we proceed is as follows:
1) Define an error function
E(m,b) = Σ (yc,i − yi)²
       = Σ (mxi + b − yi)²

where xi and yi are the i-th experimental values for x and y, yc,i is the
i-th calculated value for y, and the sum runs from i = 1 to i = N.
2) Find the values for m and b that minimize the error. We do
this by setting the first derivatives of the error function equal to zero
(strictly speaking this only finds extreme points in the error function, but
in this case the extreme point will turn out to be a minimum).
(E/m) = 0
(E/b) = 0
This will give us two equations in terms of two unknowns, m and b.
3) Use the equations found in 2 to determine the best fit values
for m and b.
The results (not derived here) are as follows
m = ( xiyi) – (1/N)(xi)(yi)
(xi2) – (1/N)(xi)2
b = (1/N) [ (yi) – m(xi) ]
The above procedure is called the method of least squares. Note that
expressions for the confidence limits for m and b can also be found.
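A minimal Python sketch of the least-squares formulas above (the data
in the example call are illustrative only):

```python
# Best-fit slope and intercept by the method of least squares.
from math import fsum

def least_squares_line(xs, ys):
    n = len(xs)
    sx, sy = fsum(xs), fsum(ys)
    sxy = fsum(x * y for x, y in zip(xs, ys))
    sxx = fsum(x * x for x in xs)
    m = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    b = (sy - m * sx) / n
    return m, b

print(least_squares_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 7.1]))  # ≈ (2.03, 1.03)
```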
“Best Fit” of Data to a Polynomial
The same procedure outlined above may be used to fit data to a
function that is a power series expansion in x.
y = a0 + a1x + a2x² + … + am x^m

where a0, a1, …, am are constants determined by finding the values that
minimize the error. Note that the above polynomial is of order m.
While it is tedious to do the math involved in obtaining
expressions for the constants a0, a1, …, am, it turns out that the method
outlined above for finding the best-fitting line works in the general case,
and that a (relatively) simple method for determining the constants and
the confidence limits for the constants can be found using matrix methods.
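A short sketch of the matrix approach in Python, using the design
(Vandermonde) matrix and a least-squares solve; the data and polynomial
order are illustrative only:

```python
# Fit an m-th order polynomial by solving the least-squares problem
# X a = y, where the columns of X are 1, x, x², ...
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 4.9, 10.2, 16.8])       # illustrative data only
m = 2                                            # order of the polynomial
X = np.vander(x, m + 1, increasing=True)         # design matrix
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)   # a0, a1, a2
print(coeffs)
```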
Best Fit of Data to an Arbitrary Function
The same procedure outlined above can usually be used to fit
experimental data to any function. The difference is that the math
involved is generally more complicated, and that correlation among the
variables in the functions used can become important. In many cases
numerical approximation methods must be used to determine the values
of the constants that give a best fit of the function to the experimental
data.
The POLY Program
A computer program (called POLY) has been written that finds
the set of constants that gives the best fit of experimental results to an mth
order polynomial. Note that for m = 0 (0th order polynomial) the
program finds the average value for the data, and for m = 1 (1st order
polynomial) the program finds the line that gives the best fit to the data.
This program also gives the confidence limits for the fitting constants.
The program and instructions for its use are available in the
physical chemistry laboratory (CP 375) and on my website. Note that
the program does not seem to work on all computers.
You are welcome to use other data analysis programs (the
enhanced version of Excel, ANOVA-based programs, and others) to find
confidence limits when fitting data to polynomial functions, but you are
responsible for making sure the programs are providing correct results.
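For readers using their own tools, here is a pure-Python sketch (our
own, not the POLY program itself) of what a POLY-style fit computes for
m = 1: the best-fit line plus confidence limits for the slope and
intercept. The data are illustrative, and the standard-error formulas
are the usual ones for a straight-line fit with ν = N − 2:

```python
# Best-fit line with 95% confidence limits on the slope and intercept.
from math import fsum, sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.2, 4.1, 5.9, 8.1, 9.8, 12.2]           # illustrative data only
n = len(x)
xbar = fsum(x) / n
sxx = fsum((xi - xbar) ** 2 for xi in x)
m = fsum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
b = (fsum(y) - m * fsum(x)) / n
# residual variance with nu = N - 2 degrees of freedom (two parameters)
s2 = fsum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
sm = sqrt(s2 / sxx)                            # std. error of the slope
sb = sqrt(s2 * fsum(xi ** 2 for xi in x) / (n * sxx))  # std. error of intercept
t_95 = 2.78                                    # t(0.95, nu = 4) from Table 3
print(f"m = {m:.3f} ± {t_95 * sm:.3f}")
print(f"b = {b:.3f} ± {t_95 * sb:.3f}")
```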