Error Analysis and Curve Fitting Lecture


Atomic Lab
Introduction to Error Analysis
Significant Figures


For any quantity x, the best measurement of x is
defined as $x_{best} \pm \delta x$
In an introductory lab, $\delta x$ is rounded to 1
significant figure

Example: $\delta x = 0.0235 \rightarrow \delta x = 0.02$


Right and Wrong



g= 9.82 ± 0.02
Wrong: speed of sound= 332.8 ± 10 m/s
Right: speed of sound= 330 ± 10 m/s
Always keep extra significant figures throughout a
calculation and round only the final result; otherwise rounding errors are introduced
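The rounding rule can be sketched in Python. This is a minimal illustration, not part of the lecture: the function name `round_measurement` is hypothetical, and it assumes the uncertainty is positive and that the best value is rounded to the same decimal place as the one-significant-figure uncertainty.

```python
import math

def round_measurement(x_best, dx):
    """Round dx to one significant figure and x_best to the same decimal place."""
    if dx <= 0:
        raise ValueError("uncertainty must be positive")
    # Decimal position of the leading digit of dx, e.g. 0.0235 -> -2, 10 -> 1.
    exponent = math.floor(math.log10(dx))
    dx_rounded = round(dx, -exponent)
    x_rounded = round(x_best, -exponent)
    return x_rounded, dx_rounded

# Speed of sound example from the slide: 332.8 +/- 10 m/s
print(round_measurement(332.8, 10))
# Uncertainty example from the slide: 0.0235 rounds to 0.02
print(round_measurement(9.8213, 0.0235))
```

Rounding is applied only when reporting; intermediate calculations should use the unrounded values.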
Statistically the Same
Student A = 30 ± 2
 Student B = 34 ± 5
 Since the uncertainties for A & B overlap,
these numbers are statistically the same
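The overlap test can be written as a one-line check. This is a sketch under the assumption that "overlap" means the two intervals, each the best value plus or minus its uncertainty, share at least one point; the function name `statistically_same` is hypothetical.

```python
def statistically_same(a, da, b, db):
    """True if the intervals [a - da, a + da] and [b - db, b + db] overlap."""
    return abs(a - b) <= da + db

# Student A = 30 +/- 2, Student B = 34 +/- 5
print(statistically_same(30, 2, 34, 5))
```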

Precision

Mathematical Definition
$$\text{precision} = \frac{\delta x}{|x_{best}|}$$
Precision of speed of sound = 10/330 ≈ 0.03 or
3%
So often we write: speed of sound = 330 m/s ± 3%
Propagation of Uncertainties–
Sums & Differences
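Suppose that x, …, w are independent
measurements with uncertainties $\delta x, \ldots, \delta w$
and you need to calculate
$$q = x + \cdots + z - (u + \cdots + w)$$
If the uncertainties are independent, i.e. w is not
some function of x etc., then
$$\delta q = \sqrt{(\delta x)^2 + \cdots + (\delta z)^2 + (\delta u)^2 + \cdots + (\delta w)^2}$$
Note: $\delta q \le \delta x + \cdots + \delta z + \delta u + \cdots + \delta w$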
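A short sketch of adding independent uncertainties in quadrature for a sum or difference. The function name `sum_uncertainty` and the sample values are illustrative assumptions.

```python
import math

def sum_uncertainty(*uncertainties):
    """Uncertainty of a sum or difference of independent measurements."""
    return math.sqrt(sum(d * d for d in uncertainties))

# Two independent uncertainties of 3.0 and 4.0 (hypothetical values)
dq = sum_uncertainty(3.0, 4.0)
print(dq, "is no larger than the plain sum", 3.0 + 4.0)
```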
Propagation of Uncertainties–
Products and Quotients
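Suppose that x, …, w are independent
measurements with uncertainties $\delta x, \ldots, \delta w$ and you
need to calculate
$$q = \frac{x \times \cdots \times z}{u \times \cdots \times w}$$
If the uncertainties are independent, i.e. w is not some
function of x etc., then
$$\frac{\delta q}{|q|} = \sqrt{\left(\frac{\delta x}{x}\right)^2 + \cdots + \left(\frac{\delta z}{z}\right)^2 + \left(\frac{\delta u}{u}\right)^2 + \cdots + \left(\frac{\delta w}{w}\right)^2}$$
or
$$\delta q = |q| \sqrt{\left(\frac{\delta x}{x}\right)^2 + \cdots + \left(\frac{\delta z}{z}\right)^2 + \left(\frac{\delta u}{u}\right)^2 + \cdots + \left(\frac{\delta w}{w}\right)^2}$$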
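For products and quotients it is the fractional uncertainties that add in quadrature. The sketch below assumes nonzero measured values; the function name `product_quotient` and the numbers are hypothetical.

```python
import math

def product_quotient(numerators, denominators):
    """Each argument is a list of (value, uncertainty) pairs; returns (q, dq).

    Assumes independent uncertainties and nonzero values.
    """
    q = 1.0
    rel_sq = 0.0  # running sum of squared fractional uncertainties
    for value, unc in numerators:
        q *= value
        rel_sq += (unc / value) ** 2
    for value, unc in denominators:
        q /= value
        rel_sq += (unc / value) ** 2
    return q, abs(q) * math.sqrt(rel_sq)

# q = x * y / u with x = 10 +/- 0.3 (3%), y = 20 +/- 0.8 (4%), u = 4 (exact)
q, dq = product_quotient([(10.0, 0.3), (20.0, 0.8)], [(4.0, 0.0)])
print(q, dq)
```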
Functions of 1 Variable
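$$\delta q = \left| \frac{dq}{dx} \right| \delta x$$

Suppose $\theta = 20 \pm 3$ deg and we want to find $\cos\theta$

3 deg is 0.05 rad
$\left| \frac{d(\cos\theta)}{d\theta} \right| = |-\sin\theta| = \sin\theta$
$\delta(\cos\theta) = \sin\theta \cdot \delta\theta = \sin(20^\circ) \times 0.05$
$\delta(\cos 20^\circ) = 0.02$ (dimensionless) and $\cos 20^\circ = 0.94$
So $\cos\theta = 0.94 \pm 0.02$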
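The cosine example can be checked numerically. This sketch uses a hypothetical helper `propagate` that takes the function and its analytic derivative; the angle uncertainty must be converted to radians before it is multiplied by the derivative.

```python
import math

def propagate(f, dfdx, x, dx):
    """Return (f(x), |df/dx| * dx) for a function of one variable."""
    return f(x), abs(dfdx(x)) * dx

theta = math.radians(20.0)   # 20 deg
dtheta = math.radians(3.0)   # 3 deg, about 0.05 rad
q, dq = propagate(math.cos, lambda t: -math.sin(t), theta, dtheta)
print(f"cos(theta) = {q:.2f} +/- {dq:.2f}")
```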
Power Law

Suppose $q = x^n$ and x is measured as $x \pm \delta x$
$$\frac{\delta q}{|q|} = |n| \frac{\delta x}{|x|}$$
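As a quick illustration of the power-law rule, the fractional uncertainty in q is |n| times the fractional uncertainty in x. The function name `power_uncertainty` and the values are assumptions for the example.

```python
def power_uncertainty(x, dx, n):
    """Return (q, dq) for q = x**n, assuming x is nonzero."""
    q = x ** n
    dq = abs(q) * abs(n) * dx / abs(x)
    return q, dq

# x = 2.0 +/- 0.1 (5%), n = 3, so q carries a 15% uncertainty
q, dq = power_uncertainty(2.0, 0.1, 3)
print(q, dq)
```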
Types of Errors
Measure the period of a revolution of a
wheel
 As we repeat the measurement, some values will be
larger and some smaller
 These are called “random errors”


In this case, caused by reaction time
What if the clock is slow?



We would never know from our own measurements that the clock is slow; we
would have to compare it to another clock
This is a “systematic error”
In some cases, there is not a clear difference
between random and systematic errors

Consider parallax:


Move head around: random error
Keep head in 1 place: systematic
Mean (or average)
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Deviation
$$d_i = \bar{x} - x_i$$
Need to calculate an average or “standard” deviation
The deviations themselves sum to zero, so to avoid a zero average deviation we square $d_i$
$$\sigma_x = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N - 1}}$$
When you divide by N-1, it is called the
sample standard deviation
If dividing by N, the population standard
deviation
Standard Deviation of the Mean
The uncertainty in the best measurement is given by the
standard deviation of the mean (SDOM)
$$\sigma_x = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N - 1}}$$
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$
If $x_{best}$ = the mean, then $\sigma_{best} = \sigma_{\bar{x}}$
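The mean, the standard deviation with N − 1 in the denominator, and the SDOM can be computed with Python's standard library. The readings below are hypothetical period measurements, and `summarize` is an illustrative name.

```python
import math
import statistics

def summarize(values):
    """Return (mean, standard deviation with N - 1, standard deviation of the mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)   # divides by N - 1
    sdom = sigma / math.sqrt(n)
    return mean, sigma, sdom

periods = [2.1, 2.3, 2.0, 2.2, 2.4]    # hypothetical readings, in seconds
mean, sigma, sdom = summarize(periods)
print(f"best value = {mean:.2f} +/- {sdom:.2f} s (sigma = {sigma:.2f} s)")
```

`statistics.pstdev` is the counterpart that divides by N.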
Histograms
[Figure: histogram. x-axis: Value (1 to 6); y-axis: number of times that value has occurred (0 to 4.5)]
Distribution of a Craps Game
[Figure: histogram of the outcomes of a craps game. x-axis: Bin (2 to 12, plus More); y-axis: Frequency (0 to 50). Annotation: Bell Curve or Normal Distribution]
Bell Curve
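[Figure: frequency of dice values (Dice) with a Gaussian overlaid. x-axis: Dice Value (0 to 14); y-axis: Frequency (0 to 400). The centroid, or mean, is marked; 68% of the population lies between $\bar{x} - \sigma$ and $\bar{x} + \sigma$]
Between $\bar{x} - 2\sigma$ and $\bar{x} + 2\sigma$, 95% of population
$2\sigma$ is usually defined as Error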
Gaussian
[Figure: frequency of dice values (Dice) with a Gaussian overlaid and the centroid $x_0$ marked. x-axis: Dice Value (0 to 14); y-axis: Frequency (0 to 400)]
$$\text{Gaussian} = A \, e^{-\frac{(x - x_0)^2}{2 \sigma_x^2}}$$
In the Gaussian, $x_0$ is the mean and
$\sigma_x$ is the standard deviation.
They are mathematically equivalent
to the formulae shown earlier
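A minimal sketch of the Gaussian and of the 68% and 95% figures quoted for the bell curve. The fraction of a normal population within k standard deviations of the mean is erf(k/√2); the function names and sample numbers are assumptions for illustration.

```python
import math

def gaussian(x, amplitude, x0, sigma):
    """Gaussian with peak height `amplitude`, mean x0 and standard deviation sigma."""
    return amplitude * math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

print(gaussian(7.0, 100.0, 7.0, 2.0))   # value at the centroid equals the amplitude
print(fraction_within(1.0))             # about 68%
print(fraction_within(2.0))             # about 95%
```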
Error and Uncertainty
While definitions vary between scientists,
most would agree to the following
definitions
 Uncertainty of measurement is the value
of the standard deviation ($1\sigma$)
 Error of the measurement is the value of
two times the standard deviation ($2\sigma$)

Full Width at Half Maximum

A special quantity is the full width at half
maximum (FWHM)




The FWHM is measured at ½ of the maximum
value (the maximum is usually at the centroid)
The width of the distribution is measured from the point on the left
side of the centroid where the frequency
is this half value
to the corresponding point on the right
side of the centroid.
Mathematically, the FWHM is related to the
standard deviation by $\text{FWHM} = 2\sqrt{2 \ln 2}\,\sigma_x \approx 2.355\,\sigma_x$
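The conversion factor can be verified numerically: for a Gaussian, the exact factor is 2√(2 ln 2), approximately 2.35. The sketch below evaluates a unit-height Gaussian half a FWHM away from the centroid and confirms that it has fallen to one half. The function name and the value of sigma are illustrative.

```python
import math

def fwhm_from_sigma(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

sigma = 1.5                                   # hypothetical standard deviation
half_width = fwhm_from_sigma(sigma) / 2.0
# Height of a unit-peak Gaussian at a distance half_width from the centroid
height = math.exp(-(half_width ** 2) / (2.0 * sigma ** 2))
print(fwhm_from_sigma(1.0), height)
```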
Weighted Average

Suppose each measurement has a unique
uncertainty such as
$x_1 \pm \sigma_1$
$x_2 \pm \sigma_2$
…
$x_N \pm \sigma_N$
What is the best value of x?
We need to construct statistical
weights


We desire that measurements with small errors
have the largest influence and the ones with the
largest errors have very little influence
Let $w_i$ = weight = $1/\sigma_i^2$
$$x_{best} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$
This formula can be used to
determine the centroid of a Gaussian
where the weights are the values of
the frequency for each measurement
$$\sigma_{x_{best}} = \frac{1}{\sqrt{\sum_{i=1}^{N} w_i}}$$
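A short sketch of the weighted average with weights 1/σ², together with the uncertainty 1/√(Σw). The function name `weighted_average` is hypothetical; the inputs reuse the Student A and Student B values from earlier in the lecture.

```python
import math

def weighted_average(values, sigmas):
    """Return (x_best, sigma_best) using weights w_i = 1 / sigma_i**2."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    x_best = sum(w * x for w, x in zip(weights, values)) / total
    return x_best, 1.0 / math.sqrt(total)

# Student A = 30 +/- 2, Student B = 34 +/- 5
x_best, sigma_best = weighted_average([30.0, 34.0], [2.0, 5.0])
print(f"{x_best:.1f} +/- {sigma_best:.1f}")
```

The result sits much closer to 30 than to 34 because the measurement with the smaller uncertainty carries the larger weight.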
Least Squares Fitting
What if you want to fit a straight line
through your data?
 In other words, yi = A*xi + B
 First, you need to calculate residuals
 Residual= Data – Fit or
 Residual= yi – (A*xi+B)
 As the fit approaches the data, the
residuals should become very small (or zero).

Big Problem
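Some residuals > 0
 Some residuals < 0
 If there is no bias, a positive residual can cancel a negative one: if $r_j = -r_k$, then
$r_j + r_k = 0$, so the plain sum of residuals can vanish even for a poor fit
 The way to correct this is to square $r_j$ and
$r_k$; the sum of the squares is then
never negative, and it is zero only when every residual is zero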

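Chi-square, $\chi^2$
$$\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - (A x_i + B)}{\sigma_i} \right)^2$$
We need to minimize this function with respect to A and B, so
we take the partial derivatives of $\chi^2$ with respect to these variables and set the
resulting derivatives equal to 0
$$\frac{\partial \chi^2}{\partial A} = \frac{\partial}{\partial A} \sum_{i=1}^{N} \left( \frac{y_i - (A x_i + B)}{\sigma_i} \right)^2 = 0$$
$$\frac{\partial \chi^2}{\partial B} = \frac{\partial}{\partial B} \sum_{i=1}^{N} \left( \frac{y_i - (A x_i + B)}{\sigma_i} \right)^2 = 0$$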
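Chi-square, $\chi^2$
$$\frac{\partial \chi^2}{\partial A} = \frac{\partial}{\partial A} \sum_{i=1}^{N} \left( \frac{y_i - (A x_i + B)}{\sigma_i} \right)^2 = -2 \sum_{i=1}^{N} \frac{\left( y_i - (A x_i + B) \right) x_i}{\sigma_i^2} = 0$$
$$\frac{\partial \chi^2}{\partial B} = \frac{\partial}{\partial B} \sum_{i=1}^{N} \left( \frac{y_i - (A x_i + B)}{\sigma_i} \right)^2 = -2 \sum_{i=1}^{N} \frac{y_i - (A x_i + B)}{\sigma_i^2} = 0$$
If every point has the same uncertainty, $\sigma_i = \sigma$, the common factor $-2/\sigma^2$ drops out:
$$\sum_{i=1}^{N} x_i y_i - A \sum_{i=1}^{N} x_i^2 - B \sum_{i=1}^{N} x_i = 0$$
$$\sum_{i=1}^{N} y_i - A \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} B = 0, \qquad \sum_{i=1}^{N} B = NB$$
so that
$$\sum_{i=1}^{N} x_i y_i = A \sum_{i=1}^{N} x_i^2 + B \sum_{i=1}^{N} x_i$$
$$\sum_{i=1}^{N} y_i = A \sum_{i=1}^{N} x_i + NB$$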
Using Determinants
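$$\sum_{i=1}^{N} x_i y_i = A \sum_{i=1}^{N} x_i^2 + B \sum_{i=1}^{N} x_i$$
$$\sum_{i=1}^{N} y_i = A \sum_{i=1}^{N} x_i + NB$$
$$A = \frac{\begin{vmatrix} \sum x_i y_i & \sum x_i \\ \sum y_i & N \end{vmatrix}}{\begin{vmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & N \end{vmatrix}}$$
$$B = \frac{\begin{vmatrix} \sum x_i^2 & \sum x_i y_i \\ \sum x_i & \sum y_i \end{vmatrix}}{\begin{vmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & N \end{vmatrix}}$$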
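A Pseudocode
' Straight-line fit y = A*x + B to N data points
Dim x(100), y(100)
N = 100
xsum = 0
ysum = 0
xysum = 0
x2sum = 0
' Accumulate the sums that appear in the normal equations
For i = 1 To N
    xsum = xsum + x(i)
    ysum = ysum + y(i)
    xysum = xysum + x(i) * y(i)
    x2sum = x2sum + x(i) * x(i)
Next i
' Solve for the slope A and intercept B by determinants
Delta = N * x2sum - (xsum * xsum)
A = (N * xysum - xsum * ysum) / Delta
B = (x2sum * ysum - xsum * xysum) / Delta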
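For readers who want to run the fit, the same sums translate directly into Python. This is a sketch assuming equal uncertainties on every point; the function name `fit_line` and the sample data are illustrative, not from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_sum = sum(xs)
    y_sum = sum(ys)
    xy_sum = sum(x * y for x, y in zip(xs, ys))
    x2_sum = sum(x * x for x in xs)
    delta = n * x2_sum - x_sum * x_sum
    if delta == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = (n * xy_sum - x_sum * y_sum) / delta
    b = (x2_sum * y_sum - x_sum * xy_sum) / delta
    return a, b

# Points lying exactly on y = 2x + 1
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```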
$\chi^2$ Values

If calculated properly, the $\chi^2$ per degree of freedom starts at large values and
approaches 1

This is because the residual at a given point should
approach the value of the uncertainty
Your best fit is the values of A and B which give
the lowest $\chi^2$
What if the $\chi^2$ per degree of freedom is less than 1?!

Your fit matches the data more closely than the uncertainties justify: typically the fit has too many
free parameters for the number of data points, or the uncertainties are overestimated
Now you must change A and B until the $\chi^2$ doesn’t
vary too much
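A sketch of evaluating chi-square for a trial line. It assumes that "approach 1" refers to chi-square divided by the number of degrees of freedom (data points minus fitted parameters), which is an interpretation rather than something the slide states. The function names and data are hypothetical.

```python
def chi_square(xs, ys, sigmas, a, b):
    """Sum of squared residuals of y = a*x + b, each scaled by its uncertainty."""
    return sum(((y - (a * x + b)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))

def reduced_chi_square(xs, ys, sigmas, a, b, n_params=2):
    """Chi-square per degree of freedom (N data points minus n_params)."""
    dof = len(xs) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_square(xs, ys, sigmas, a, b) / dof

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]          # each point is one sigma away from y = 2x + 1
sigmas = [0.1, 0.1, 0.1, 0.1]
print(chi_square(xs, ys, sigmas, 2.0, 1.0))
print(reduced_chi_square(xs, ys, sigmas, 2.0, 1.0))
```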
Without Proof
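[Equations: expressions for the parameter variances $\sigma_A^2$ and $\sigma_B^2$, each a sum over $i = 1, \ldots, N$ involving the measurement variances $\sigma_{y_i}^2$, with $x_i$ appearing in the $\sigma_A^2$ expression and $N$ in the $\sigma_B^2$ expression]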
Extending the Method

Obviously, the method can be
extended to higher-order
polynomials, e.g.
$$\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - (A x_i^2 + B x_i + C)}{\sigma_i} \right)^2$$
This becomes a matrix
inversion problem
Exponential Functions

Linearize by taking the
logarithm
Solve as a straight line
$$A(t) = A_0 e^{-\lambda t}$$
$$\ln\left(A(t)\right) = \ln\left(A_0\right) - \lambda t$$
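The linearization can be sketched as follows, assuming the exponential has the form A(t) = A0·exp(−λt) with λ a decay constant: the logarithm of the data is fitted with a straight line, whose slope is −λ and whose intercept is ln(A0). The function names and the synthetic data are assumptions, and the fit treats every point with equal weight.

```python
import math

def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b; returns (a, b)."""
    n = len(xs)
    x_sum, y_sum = sum(xs), sum(ys)
    xy_sum = sum(x * y for x, y in zip(xs, ys))
    x2_sum = sum(x * x for x in xs)
    delta = n * x2_sum - x_sum * x_sum
    a = (n * xy_sum - x_sum * y_sum) / delta
    b = (x2_sum * y_sum - x_sum * xy_sum) / delta
    return a, b

def fit_exponential(ts, activities):
    """Fit A(t) = A0 * exp(-lam * t) via a straight line in ln A; returns (A0, lam)."""
    logs = [math.log(a) for a in activities]   # requires positive activities
    slope, intercept = fit_line(ts, logs)
    return math.exp(intercept), -slope

ts = [0.0, 1.0, 2.0, 3.0]
activities = [100.0 * math.exp(-0.5 * t) for t in ts]   # synthetic, noise-free
a0, lam = fit_exponential(ts, activities)
print(a0, lam)
```

Taking logarithms changes the relative weighting of the points, so with noisy data the result can differ slightly from a direct fit of the exponential.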
Extending the Method

Power Law
$$I = A x^N$$
$$\log I = N \log x + \log A$$

Multivariate
multiplicative
function
Efficiency    A  t
log Efficiency   log   log A  log t
Uglier Functions





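q=f(x,y,z)
Use a gradient search
method
Gradient is a vector
which points in the
direction of steepest
ascent
$\nabla f$ = a direction
So follow $-\nabla f$ (the direction of steepest descent) until it
hits a minimum
$$\nabla = \hat{x} \frac{\partial}{\partial x} + \hat{y} \frac{\partial}{\partial y} + \hat{z} \frac{\partial}{\partial z}$$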
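A minimal gradient search, sketched under several assumptions: the gradient is estimated by central differences, the step size is fixed, and the search moves against the gradient because the goal is a minimum. The test function `bowl` and all names and parameter values are hypothetical.

```python
def numerical_gradient(f, point, h=1e-6):
    """Central-difference estimate of the gradient of f at `point`."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

def gradient_search(f, start, step=0.1, tol=1e-8, max_iter=10000):
    """Step against the gradient until its magnitude falls below tol."""
    point = list(start)
    for _ in range(max_iter):
        grad = numerical_gradient(f, point)
        if sum(g * g for g in grad) ** 0.5 < tol:
            break
        point = [p - step * g for p, g in zip(point, grad)]
    return point

def bowl(p):
    """Simple test surface with its minimum at (1, -2, 0.5)."""
    x, y, z = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + (z - 0.5) ** 2

minimum = gradient_search(bowl, [0.0, 0.0, 0.0])
print(minimum)
```

A fixed step works for this smooth bowl; for a real chi-square surface the step size usually has to be tuned or adapted, and the search may stop in a local minimum.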
Correlation Coefficient, $r^2$
$$r^2 = \frac{\left( N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i \right)^2}{\left[ N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2 \right] \left[ N \sum_{i=1}^{N} y_i^2 - \left( \sum_{i=1}^{N} y_i \right)^2 \right]}$$
$r^2$ starts at 0 and approaches 1 as the fit gets better
$r^2$ shows the correlation of x and y … i.e. is
y=f(x)?

As a rule of thumb, if $r^2 < 0.5$ then x and y are treated as uncorrelated.
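The correlation coefficient can be computed from the same kind of sums used in the straight-line fit. This sketch assumes neither x nor y is constant, since the denominator would then be zero; the function name `r_squared` and the data are illustrative.

```python
def r_squared(xs, ys):
    """Squared linear correlation coefficient of paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = (n * sxy - sx * sy) ** 2
    denominator = (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    return numerator / denominator

xs = [0.0, 1.0, 2.0, 3.0]
print(r_squared(xs, [1.0, 3.0, 5.0, 7.0]))     # points exactly on a line
print(r_squared(xs, [1.0, -1.0, 1.0, -1.0]))   # alternating values, weak correlation
```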