Gauss and the Method of Least Squares

Teddy Petrou, Hongxiao Zhu
Outline
• Who was Gauss?
• Why was there controversy in finding the method of least squares?
• Gauss' treatment of error
• Gauss' derivation of the method of least squares
• Gauss' derivation by modern matrix notation
• The Gauss-Markov theorem
• Limitations of the method of least squares
• References
Johann Carl Friedrich Gauss
Born: 1777, Brunswick, Germany
Died: February 23, 1855, Göttingen, Germany
At the age of eight, during an arithmetic class, he astonished his teachers by instantly finding the sum of the first hundred integers.
Facts about Gauss
• Attended Brunswick College in 1792, where he discovered many important theorems before even reaching them in his studies.
• Found a square root in two different ways to fifty decimal places by ingenious expansions and interpolations.
• Constructed a regular 17-sided polygon, the first advance in this matter in two millennia; he was only 18 when he made the discovery.
Ideas of Gauss
• As a young man, Gauss was a mathematical scientist with interests in many areas, including the theory of numbers, algebra, analysis, geometry, probability, and the theory of errors.
• His interests grew to include observational astronomy, celestial mechanics, surveying, geodesy, capillarity, geomagnetism, electromagnetism, mechanics, optics, and actuarial science.
Intellectual Personality and Controversy
• Those who knew Gauss best found him to be cold and uncommunicative.
• He published only half of his ideas and found no one with whom to share his most valued thoughts.
• In 1805 Adrien-Marie Legendre published a paper on the method of least squares. His treatment, however, lacked a 'formal consideration of probability and its relationship to least squares', making it impossible to determine the accuracy of the method when applied to real observations.
• Gauss claimed that he had written to colleagues concerning the use of least squares dating back to 1795.
Formal Arrival of Least Squares
• Gauss
  • Published 'The Theory of the Motion of Heavenly Bodies' in 1809. He gave a probabilistic justification of the method, which was based on the assumption of a normal distribution of errors.
  • He himself later abandoned the use of the normal error function.
  • Published 'Theory of the Combination of Observations Least Subject to Errors' in the 1820s. He substituted the root mean square error for Laplace's mean absolute error.
• Laplace
  • Derived the method of least squares (between 1802 and 1820) from the principle that the best estimate should have the smallest 'mean error', the mean of the absolute value of the error.
Treatment of Errors
• Gauss used probability theory to describe error.
• Error will be treated as a random variable.
• Two types of errors:
  • Constant errors, associated with calibration
  • Random errors
Error Assumptions
• Gauss began his study by making two assumptions:
  • Random errors of measurements of the same type lie within fixed limits.
  • All errors within these limits are possible, but not necessarily with equal likelihood.
Density Function
We define the function $\varphi(x)$ with the same meaning as a density function, with the following properties:
– The probability of an error lying within the interval $(x, x + dx)$ is $\varphi(x)\,dx$.
– Small errors are more likely to occur than large ones.
– Positive and negative errors of the same magnitude are equally likely: $\varphi(-x) = \varphi(x)$.
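As a concrete example (not stated on this slide, but consistent with the normal-error assumption mentioned under 'Formal Arrival of Least Squares'), the zero-mean normal density satisfies all three properties:

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{x^2}{2\sigma^2}\right), \qquad \varphi(-x) = \varphi(x), \qquad \varphi \text{ decreasing in } |x|.$$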
Mean and Variance
• Define $k = \int x\,\varphi(x)\,dx$ (the mean error). In many cases we assume $k = 0$.
• Define the mean square error as
$$m^2 = \int x^2\,\varphi(x)\,dx.$$
• If $k = 0$, then the variance equals $m^2$.
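A minimal numerical sketch of these definitions (not from the slides; it assumes, purely for illustration, a uniform error density on [-a, a]):

```python
from scipy.integrate import quad

a = 0.5  # assumed fixed error limit, for illustration only

def phi(x):
    """Uniform error density on [-a, a]: constant inside the limits, zero outside."""
    return 1.0 / (2 * a) if -a <= x <= a else 0.0

# k = integral of x * phi(x) dx  (the mean error; zero for a symmetric density)
k, _ = quad(lambda x: x * phi(x), -a, a)

# m^2 = integral of x^2 * phi(x) dx  (the mean square error)
m2, _ = quad(lambda x: x**2 * phi(x), -a, a)

print(k, m2)       # k ~ 0, m2 = a**2 / 3
print(m2 - k**2)   # variance; equals m2 when k = 0
```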
Reasons for m²
• $m^2$ is always positive and is simple.
• The function is differentiable and integrable, unlike the absolute value function.
• The function approximates the average value in cases where large numbers of observations are being considered, and is simple to use when considering small numbers of observations.
More on Variance
• If $k \neq 0$, then the variance equals $m^2 - k^2$.
• Suppose we have independent random variables $\{e, e', e'', \ldots\}$ with standard deviation 1 and expected value 0. The linear function of the total errors is given by $E = \lambda e + \lambda' e' + \cdots$.
• The variance of $E$ is then
$$M^2 = \lambda^2 + \lambda'^2 + \cdots = \sum_i \lambda_i^2.$$
• This assumes every error is expressed in standard deviations from the mean.
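A small simulation sketch (not from the slides; the coefficients are arbitrary examples) illustrating the variance formula for a linear combination of unit-variance errors:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, -1.2, 2.0, 0.3])   # example coefficients lambda, lambda', ...

# independent errors e, e', ... with expected value 0 and standard deviation 1
e = rng.standard_normal((100_000, lam.size))

E = e @ lam                  # E = lambda*e + lambda'*e' + ...
print(E.var())               # empirical variance of E
print(np.sum(lam**2))        # M^2 = sum of the squared coefficients; should agree
```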
Gauss’ Derivation of the Method of Least Squares
• Suppose a quantity $V = f(x)$, where $V$ and $x$ are unknown. We estimate $V$ by an observation $L$.
• If $x$ is calculated from $L$ (taking $L \approx f(x)$), error will occur.
• But if several quantities $V, V', V'', \ldots$ depend on the same unknown $x$ and are determined by inexact observations, then we can recover $x$ by some combination of the observations.
• Similar situations occur when we observe several quantities that depend on several unknowns.
Gauss’ Derivation of the Method of Least Squares
Problem: We want to estimate $V, V', V'', \ldots$ by taking independent observations $L, L', L'', \ldots$, where $V, V', V'', \ldots$ are $\rho$ functions of $\pi$ unknowns $x, y, z, \ldots$:
$$V = f_1(x, y, z, \ldots), \qquad V' = f_2(x, y, z, \ldots), \qquad V'' = f_3(x, y, z, \ldots), \qquad \ldots$$
Let the errors in the observations be
$$v := \frac{V - L}{p}, \qquad v' := \frac{V' - L'}{p'}, \qquad \ldots$$
where the $p$'s are the weights of the 'mean errors of the observations'. (Note: we scaled the errors so that they have the same variance.)
Gauss’ Derivation of the Method of Least Squares
Consider the following linear system:
$$v = ax + by + cz + \cdots - l$$
$$v' = a'x + b'y + c'z + \cdots - l'$$
$$v'' = a''x + b''y + c''z + \cdots - l''$$
$$\vdots$$
Here $v, v', v'', \ldots$ are written as linear functions of the $\pi$ unknowns $x, y, z, \ldots$, where the coefficients $a, b, c, \ldots$ are known.
Note: 1. This system is 'overdetermined', since $\rho > \pi$.
2. The system describes a mapping $F: \mathbb{R}^{\pi} \to \mathbb{R}^{\rho}$, or: parameter space $(x, y, z, \ldots) \to$ observation space $(v, v', v'', \ldots)$.
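A brief sketch (illustrative numbers only) of why such a system is 'overdetermined': with more observations than unknowns, no choice of the unknowns drives every residual to zero, which is what motivates minimizing the sum of squares.

```python
import numpy as np

# rho = 5 observations of a model with pi = 2 unknowns (x, y):
# coefficient matrix and right-hand sides are made-up example values
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
l = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# For any candidate (x, y), the residual vector v = A @ [x, y] - l stays nonzero,
# so we compare candidates by the size of v'v instead of seeking an exact solution.
for xy in ([1.0, 1.0], [0.0, 1.0], [0.1, 1.0]):
    v = A @ np.array(xy) - l
    print(xy, np.round(v, 3), round(float(v @ v), 4))
```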
Solve an optimization problem:
$$\min\ \lambda^2 + \lambda'^2 + \lambda''^2 + \cdots$$
where $\lambda, \lambda', \lambda'', \ldots$ are the coefficients of $v, v', v'', \ldots$,
subject to:
$$\lambda v + \lambda' v' + \lambda'' v'' + \text{etc.} = x + k$$
for some constant $k$ independent of $x, y, z, \ldots$.
We can state the problem as: we are looking for a linear mapping $G(v, v', v'', \ldots)$ from $\mathbb{R}^{\rho}$ to $\mathbb{R}^{\pi}$ such that:
1. $G \circ F$ is the identity on $\mathbb{R}^{\pi}$.
2. $G$ satisfies an optimality condition, described as follows. Suppose $x = g(v, v', v'', \ldots)$ is the first component of $G$. Then $x = g(v, v', v'', \ldots) = \lambda v + \lambda' v' + \lambda'' v'' + \cdots + k$. We want $\sum \lambda^2$ to be as small as possible, and we want a similar condition for the other components.
Gauss’ Derivation of the Method of Least Squares
• Solution:
$$\lambda^2 + \lambda'^2 + \lambda''^2 + \cdots = \kappa^2 + \kappa'^2 + \kappa''^2 + \cdots + (\lambda - \kappa)^2 + (\lambda' - \kappa')^2 + (\lambda'' - \kappa'')^2 + \cdots,$$
where the $\kappa$'s denote the coefficients derived by elimination from the system. From this it is obvious that the sum $\lambda^2 + \lambda'^2 + \lambda''^2 + \cdots$ attains its minimum when $\lambda = \kappa$, $\lambda' = \kappa'$, $\lambda'' = \kappa''$, etc.
• It is still not obvious how these results relate to the least squares estimate.
Gauss’ Derivation of the Method of Least Squares
• It can be proved that if we let
$$\Omega = v^2 + v'^2 + v''^2 + \cdots = \frac{(V(x, y, z, \ldots) - L)^2}{p} + \frac{(V'(x, y, z, \ldots) - L')^2}{p'} + \cdots,$$
then least squares picks the parameter values that minimize $\Omega$, i.e., the values at which all the partials vanish:
$$\frac{\partial \Omega}{\partial x} = 0, \qquad \frac{\partial \Omega}{\partial y} = 0, \qquad \ldots$$
• At these values we get the same results as from the minimization of $\lambda^2 + \lambda'^2 + \cdots$.
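A sketch of this minimization with made-up data and weights (none of these numbers come from the slides): Ω is minimized numerically, and at the minimizer its partial derivatives are approximately zero, matching the vanishing-partials condition.

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical linear model V_i(x, y) = B[i, 0]*x + B[i, 1]*y,
# observations L_i and weights p_i (all values invented for illustration)
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
L = np.array([0.9, 2.1, 3.2, 5.0])
p = np.array([1.0, 1.0, 2.0, 4.0])

def omega(theta):
    """Omega = sum over observations of (V_i - L_i)^2 / p_i."""
    r = B @ theta - L
    return np.sum(r**2 / p)

res = minimize(omega, x0=np.zeros(2))
print(res.x)   # parameter values minimizing Omega

# central-difference partial derivatives at the minimizer: they should be ~ 0
eps = 1e-6
grad = [(omega(res.x + eps * np.eye(2)[i]) - omega(res.x - eps * np.eye(2)[i])) / (2 * eps)
        for i in range(2)]
print(grad)
```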
Gauss’ derivation by modern matrix notation:
Assume that observable quantities $V_1, V_2, \ldots, V_\rho$ are linear functions of parameters $x_1, x_2, \ldots, x_\pi$ such that
$$V_i = b_{i1} x_1 + \cdots + b_{i\pi} x_\pi + c_i, \qquad b_{ij}, c_i \in \mathbb{R},$$
and we know the values of all the $b_{ij}$ and $c_i$. We measure the $V_i$ in an attempt to infer the values of the $x_i$. Assume $L_i$ is an observation of $V_i$.
Switch to a new coordinate system by setting
$$v_i = (V_i - L_i)/p_i.$$
The system becomes
$$v = Ax - l.$$
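A minimal sketch (with an arbitrary example matrix and observation vector) of solving v = Ax − l in the least squares sense, once with numpy's built-in solver and once through the normal equations:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
l = np.array([2.1, 2.9, 4.2, 4.8])

# x minimizing ||v||^2 = ||A x - l||^2
x_lstsq, *_ = np.linalg.lstsq(A, l, rcond=None)

# the same solution from the normal equations (A^T A) x = A^T l
x_normal = np.linalg.solve(A.T @ A, A.T @ l)

print(x_lstsq, x_normal)   # the two solutions agree
```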
Gauss’ derivation by modern matrix notation:
Gauss’ results are equivalent to the following lemma:
Suppose $A$ is a $\rho \times \pi$ ($\rho \geq \pi$) matrix of rank $\pi$. Then there is a $\pi \times \rho$ matrix $K$ such that the following holds:
$$\forall x \in \mathbb{R}^{\pi}: \quad KAx = x,$$
and among all such matrices, the matrix $E = (A^T A)^{-1} A^T$ has rows of minimum norm.
Proof:
$E = (A^T A)^{-1} A^T$ satisfies the first condition. The optimization condition is that
$$\|K_i\|^2 = K_{i1}^2 + \cdots + K_{i\rho}^2$$
should be as small as possible. This is equivalent to demanding that the sum of the diagonal entries of $KK^T$ should be as small as possible.
Proof continued:
$A^T A$ is invertible; denote $D = (A^T A)^{-1}$. Thus for all $x$, $x = (A^T A)^{-1} A^T A x = D A^T A x$.
Since $E = (A^T A)^{-1} A^T$, we have $E = D A^T$ and $EAx = x$; also, $KAx = x$.
Subtracting, we get that for all $x$, $(K - E) A x = 0$. Thus $(K - E) A$ is the zero matrix.
Right-multiplying by $D^T$ and noting that $A D^T = E^T$, we get $(K - E) E^T = 0$.
Finally, $KK^T = (E + (K - E))(E + (K - E))^T = E E^T + (K - E)(K - E)^T$, the cross terms vanishing by the previous step.
This shows that the solution $E$ is in fact the optimal one, since if $(K - E)$ has any non-zero entries, $(K - E)(K - E)^T$ will have strictly positive entries on its diagonal.
Returning to our original equation $v = Ax - l$, our lemma shows that $G(v) := E(Ax - l) + El$ is the left inverse of the function $F(x) = Ax - l$ (so that $G \circ F(x) = x$), and among all linear left inverses, the non-constant part of $G$ is optimal.
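A numerical sketch of the lemma (the matrix A is an arbitrary full-rank example): E = (AᵀA)⁻¹Aᵀ is a left inverse of A, and its rows have smaller norms than those of another left inverse K obtained by inverting a square subsystem.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])          # rho = 4, pi = 2, rank 2

# E = (A^T A)^{-1} A^T
E = np.linalg.inv(A.T @ A) @ A.T
print(np.round(E @ A, 10))          # the identity, so E A x = x for every x

# another left inverse K: invert the first two rows of A and ignore the rest
K = np.zeros((2, 4))
K[:, :2] = np.linalg.inv(A[:2, :])
print(np.round(K @ A, 10))          # also the identity

# squared row norms, i.e. the diagonal entries of the matrix times its transpose:
# E's rows are the smaller ones, as the lemma asserts
print(np.diag(E @ E.T), np.diag(K @ K.T))
```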
Gauss-Markov theorem
In a linear model
$$x = A\beta + \varepsilon,$$
where $A$ is an $n \times p$ matrix with rank $p$, $\beta$ is an unknown vector, and $\varepsilon$ is the error vector, assume $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2 I$. Then for any linear unbiased estimator $\tilde{\theta}$ of $\theta = C^T \beta$, we have $E(C^T \hat{\beta}_{LS}) = \theta$ and $\operatorname{Var}(C^T \hat{\beta}_{LS}) \leq \operatorname{Var}(\tilde{\theta})$.
In other words, when the $\varepsilon$'s have the same variance and are uncorrelated, the least squares estimator is the best linear unbiased estimator, i.e., the one with the smallest variance.
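A simulation sketch of the theorem on synthetic data (the model, coefficients, and the competing estimator are all invented for illustration): both estimators of c^T β come out unbiased, but the least squares one shows the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])   # n = 4, p = 2
beta = np.array([0.5, 2.0])
c = np.array([1.0, 1.0])            # estimate theta = c^T beta
sigma = 1.0

E = np.linalg.inv(A.T @ A) @ A.T    # least squares: beta_hat_LS = E x
K = np.zeros((2, 4))
K[:, :2] = np.linalg.inv(A[:2, :])  # a competing linear unbiased estimator

ls_est, other_est = [], []
for _ in range(20_000):
    x = A @ beta + sigma * rng.standard_normal(4)   # x = A beta + eps
    ls_est.append(c @ (E @ x))
    other_est.append(c @ (K @ x))

print(np.mean(ls_est), np.mean(other_est))   # both near c^T beta = 2.5 (unbiased)
print(np.var(ls_est), np.var(other_est))     # the least squares variance is smaller
```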
Limitations of the Method of Least Squares
• Nothing is perfect: the method is very sensitive to the presence of unusual data points. One or two outliers can sometimes seriously skew the results of a least squares analysis.
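A short sketch of this sensitivity on made-up data: corrupting a single point noticeably changes the fitted least squares line.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0, 0.2, 10)   # data scattered around y = 2x + 1

A = np.column_stack([np.ones_like(x), x])    # design matrix for intercept and slope
clean_fit, *_ = np.linalg.lstsq(A, y, rcond=None)

y_out = y.copy()
y_out[9] += 30.0                             # a single gross outlier
outlier_fit, *_ = np.linalg.lstsq(A, y_out, rcond=None)

print(clean_fit)     # intercept and slope close to (1, 2)
print(outlier_fit)   # noticeably skewed by the one outlier
```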
References
• Gauss, Carl Friedrich (translated by G. W. Stewart). 1995. Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Philadelphia: Society for Industrial and Applied Mathematics.
• Plackett, R. L. 1949. A Historical Note on the Method of Least Squares. Biometrika 36:458–460.
• Stigler, Stephen M. 1981. Gauss and the Invention of Least Squares. The Annals of Statistics 9(3):465–474.
• Plackett, R. L. 1972. The Discovery of the Method of Least Squares.
• Brand, Belinda B. 2003. Gauss' Method of Least Squares: A Historically-Based Introduction.
• http://www.infoplease.com/ce6/people/A0820346.html
• http://www.stetson.edu/~efriedma/periodictable/html/Ga.html