Transcript: Probability

Probability function P on subsets of the sample space S
1. P(S)=1
2. For every event A in S, P(A)≥0
3. If A1, A2, ... are mutually exclusive, then
P(∪i Ai) = ∑i P(Ai)
A random variable X is a function that assigns a
value to each outcome s in the sample space
S (realizations of the random variable).
Example: dart throw, bull's eye counts 50, i.e. X(s)=50,
s = bull's eye location
Example: measurement of a mass five times,
yielding the true value m + random
measurement errors ei (m+e1, m+e2, …)
Probability Density Function (PDF), fX(x)
The relative probability of realizations for a
random variable
P(X≤a) = ∫_-∞^a fX(x) dx
∫_-∞^∞ fX(x) dx = 1
PDF for common functions:
1. Uniform random variable on [a,b]:
fU(x)=1/(b-a), a≤x≤b; 0, x<a or x>b
PDF for common functions:
2. Normal (Gaussian) function:
fN(x)=1/√2 exp(-0.5(x-)2/2)
N(2) normal distribution
3. Exponential function:
fexp(x)= exp(-x), x≥0; 0, x<0
4. Double-sided Exponential function:
fdexp(x) = 1/(√2 σ) exp(-√2|x-μ|/σ)
5. χ² function:
fχ²(x) = 1/(2^(ν/2) Γ(ν/2)) x^(ν/2-1) exp(-x/2)
with Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt
For n independent variables Xi with standard normal
distributions, Z = ∑ Xi² is a χ² random variable
with ν = n degrees of freedom.
6. Student's t distribution with ν degrees of
freedom:
ft(x) = Γ((ν+1)/2)/(Γ(ν/2)√(νπ)) (1+x²/ν)^(-(ν+1)/2)
with Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt as above
Approaches a standard normal distribution for a
large number of degrees of freedom
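All of these PDFs are available in scipy.stats; a minimal sketch (scipy assumed, parameter values arbitrary; scipy's loc/scale parameterizations correspond to the μ, σ, λ, ν above):

  # Sketch: evaluating the common PDFs above with scipy.stats
  # (parameter values are arbitrary examples).
  import numpy as np
  from scipy import stats

  x = np.linspace(0.1, 4.0, 8)
  print(stats.uniform(loc=0, scale=2).pdf(x))   # uniform on [a,b] = [0,2]
  print(stats.norm(loc=0, scale=1).pdf(x))      # normal, mu=0, sigma=1
  print(stats.expon(scale=1.0).pdf(x))          # exponential, lambda=1
  print(stats.laplace(loc=0, scale=1).pdf(x))   # double-sided exponential
  print(stats.chi2(df=3).pdf(x))                # chi-square, nu=3
  print(stats.t(df=3).pdf(x))                   # Student's t, nu=3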
Cumulative distribution functions (CDF):
FX(a) = P(X≤a) = ∫_-∞^a fX(x) dx
P(a≤X≤b) = ∫_a^b fX(x) dx = FX(b) - FX(a)
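A quick sketch of this identity for a standard normal X (scipy.stats assumed):

  # Sketch: P(a <= X <= b) = F_X(b) - F_X(a) for a standard normal X.
  from scipy import stats
  X = stats.norm(loc=0, scale=1)
  a, b = -1.0, 1.0
  print(X.cdf(b) - X.cdf(a))   # ~0.683, the 68% interval of N(0,1)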
Characterization of PDFs:
Expected value of random variable X:
E[X] = ∫_-∞^∞ x fX(x) dx,  E[g(X)] = ∫_-∞^∞ g(x) fX(x) dx
Peak of distribution: xML (maximum likelihood)
Variance:
Var(X)=X2 = E[(X-x)2] = E[X2]-X2

2 f (x)dx,  =√Var(X)
(x-
)
x
X
x
-
=∫
Variance measures the width of a PDF. Wide
PDFs indicate noisy data; narrow PDFs indicate
relatively noise-free data.
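A minimal numerical check of E[X] and Var(X) by direct integration of a normal PDF (scipy.integrate.quad assumed; μ and σ arbitrary):

  # Sketch: E[X] and Var(X) by direct integration of a normal PDF.
  import numpy as np
  from scipy.integrate import quad
  from scipy import stats

  mu, sigma = 2.0, 0.5
  f = stats.norm(loc=mu, scale=sigma).pdf
  EX, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
  VarX, _ = quad(lambda x: (x - EX)**2 * f(x), -np.inf, np.inf)
  print(EX, VarX)   # ~2.0 and ~0.25 = sigma^2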
Joint PDFs:
The joint PDF quantifies the probability that a
set of random variables will take on given
values.
P(X≤a, Y≤b) = ∫_-∞^a ∫_-∞^b f(x,y) dy dx
Expected value for joint PDF:
E[g(X,Y)] = ∫_-∞^∞ ∫_-∞^∞ g(x,y) f(x,y) dy dx
If X and Y are independent,
f(x,y)=fX(x)fY(y)
Covariance of X and Y with Joint PDF:
Cov(X,Y)=E[(X-E[X])(Y-E[Y])]=E[XY]-E[X]E[Y]
X and Y independent, E[XY]=E[X]E[Y] and
Cov(X,Y)=0
Covariance of a variable with itself = variance
If Cov(X,Y)=0, X and Y are called uncorrelated
Correlation of X and Y
ρ(X,Y) = Cov(X,Y)/√(Var(X) Var(Y))
(correlation is a scaled covariance)
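A short sketch with synthetic correlated data (numpy assumed); np.cov and np.corrcoef estimate Cov(X,Y) and ρ(X,Y) from realizations:

  # Sketch: sample covariance and correlation of two correlated variables.
  import numpy as np
  rng = np.random.default_rng(0)
  x = rng.normal(size=10000)
  y = 0.8 * x + 0.6 * rng.normal(size=10000)   # correlated with x
  print(np.cov(x, y))        # 2x2 covariance matrix; diagonal = variances
  print(np.corrcoef(x, y))   # correlation ~0.8, 1's on the diagonal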
Model covariance matrix [cov m]:
The general linear solution mest = Md + v
has covariance matrix M [cov d] MT (B.64)
The least-squares solution mest=[GTG]-1GTd
has covariance matrix
[cov m] = [GTG]-1GT [cov d] ([GTG]-1GT)T
If the data are uncorrelated and have equal variance
σd², then
[cov m] = σd² [GTG]-1
More on Gaussian (Normal) distributions:
Central limit theorem:
Let X1, X2, …, Xn be independent and identically
distributed random variables with finite
expected value μ and variance σ². Let
Zn = (X1+X2+...+Xn - nμ)/(σ√n)
In the limit as n approaches infinity, Zn
approaches the standard normal distribution
Thus many summed variables in nature are approximately
normally distributed, which justifies LS solutions
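A quick simulation sketch of the theorem with uniform summands (numpy assumed):

  # Sketch: sums of n uniform variables, standardized, approach N(0,1).
  import numpy as np
  rng = np.random.default_rng(1)
  n, trials = 100, 50000
  X = rng.uniform(0, 1, size=(trials, n))      # mu = 0.5, sigma^2 = 1/12
  Zn = (X.sum(axis=1) - n * 0.5) / (np.sqrt(1/12) * np.sqrt(n))
  print(Zn.mean(), Zn.std())   # ~0 and ~1, approximately standard normal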
Means and confidence intervals:
Given noisy measurements m1, m2, .., mn
Estimate the true value m and the uncertainty of
the estimate. Assume errors are independent
and normally distributed with expected value
0 and some unknown standard deviation σ
Compute average mave = (m1+m2+...+mn)/n
s = [∑_{i=1}^n (mi-mave)²/(n-1)]^(1/2)
Sampling Theorem:
Given independent, normally distributed measurements
with expected value m and standard deviation σ,
the random quantity
t = (m-mave)/(s/√n)
has a Student’s t distribution with n-1 degrees of
freedom. If the true standard deviation σ is known,
we are dealing with a standard normal distribution.
The t distribution converges toward the normal
distribution for large n
Confidence intervals: Probability that one realization
falls within a specified distance of the true mean
Let tn-1,0.975 = 97.5 percentile of t distribution
tn-1,0.025 = 2.5 percentile of t distribution
P(tn-1,0.025 ≤ (m-mave)/(s/√n) ≤ tn-1,0.975) = 0.95
P(tn-1,0.025 s/√n ≤ m-mave ≤ tn-1,0.975 s/√n) = 0.95
95% confidence interval
Due to symmetry:
mave-tn-1,0.975 s/ √n to mave+tn-1,0.975 s/ √n
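A sketch of this 95% interval (scipy.stats assumed; the measurement values are hypothetical):

  # Sketch: 95% confidence interval for the mean using Student's t.
  import numpy as np
  from scipy import stats
  m = np.array([10.2, 9.8, 10.1, 10.4, 9.9])   # hypothetical measurements
  n = len(m)
  mave, s = m.mean(), m.std(ddof=1)            # ddof=1: divide by n-1
  t975 = stats.t.ppf(0.975, df=n - 1)
  print(mave - t975 * s / np.sqrt(n), mave + t975 * s / np.sqrt(n))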
Confidence intervals are related to σ: PDFs with
large σ will have large CIs, and vice versa
For a Gaussian PDF, the 68% CI is ±1σ and the 95% CI
is ±2σ about the mean
If a particular Gaussian random variable has σ=1, and
if a realization of that variable is 50, there is a 95%
chance that the mean of that random variable lies
between 48 and 52
Example B.12 illustrates the case where σ is
estimated
If σ is known (rarely the case), we have a normal
distribution. We can use the t-distribution with an
infinite # of observations.
E.g., 16 obs, m estimated to be 31.5. σ is known to be
5. Estimate 80% CI for m.
mave-k ≤ m ≤ mave+k
k=1.282 x 5/√16 = 1.6
31.5-1.6 ≤ m ≤ 31.5+1.6
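Checking these numbers with scipy (assumed): 1.282 is the 90th percentile of the standard normal, which brackets the central 80%:

  # Sketch: 80% CI for the example above (n=16, mave=31.5, sigma=5 known).
  import numpy as np
  from scipy import stats
  n, mave, sigma = 16, 31.5, 5.0
  k = stats.norm.ppf(0.90) * sigma / np.sqrt(n)   # ppf(0.90) ~ 1.282
  print(mave - k, mave + k)                       # ~29.9 to ~33.1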
Statistical Aspects of LS:
PDF for Normal Distribution:
fi(di|m)=1/√2 exp(-(di-(Gm)i)2/22)
The likelihood function L(m|d) is the product of all
individual probability densities:
L(m|d) = f1(d1|m) · f2(d2|m) · … · fm(dm|m)
Idea: Maximize L(m|d)
Maximize log{L(m|d)}
Minimize - log{L(m|d)}
Minimize -2 log{L(m|d)}
min ∑ [di-(Gm)i]²/σi²
Aside from the 1/σi² weighting factors, we have the LS solution
min ∑ [di-(Gm)i]²/σi²
W = diag(1/σ1, 1/σ2, …, 1/σm)
Gw=WG
dw=Wd
Gwm=dw
mL2 = [GwTGw]-1 GwTdw
||dw-Gwm||₂² = ∑ [di-(Gm)i]²/σi²
χ²obs = ∑ [di-(Gm)i]²/σi²
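A minimal sketch of the weighted LS solution above (G, d, and σi are hypothetical):

  # Sketch: weighted least squares with W = diag(1/sigma_i).
  import numpy as np
  rng = np.random.default_rng(2)
  t = np.linspace(0, 10, 20)
  G = np.column_stack([np.ones_like(t), t])    # hypothetical design matrix
  sigma = np.full(len(t), 0.5)                 # data standard deviations
  d = G @ np.array([1.0, 2.0]) + rng.normal(0, sigma)
  W = np.diag(1.0 / sigma)
  Gw, dw = W @ G, W @ d
  mL2 = np.linalg.solve(Gw.T @ Gw, Gw.T @ dw)
  print(mL2)   # close to the true model [1, 2]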
χ²obs has a χ² distribution with m-n degrees of freedom
The probability of finding a χ² value as large as or larger than the
observed value is
p = ∫_χ²obs^∞ fχ²(x) dx
p-value test
With a correct model and independent errors, the p-values will be
uniformly distributed between 0 and 1. Near-0 or near-1 p-values
indicate problems (incorrect model, underestimation of data errors,
unlikely realization)
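A sketch of the p-value test (scipy assumed; stats.chi2.sf gives the upper-tail integral; G, d, σ hypothetical as in the previous sketch):

  # Sketch: chi-square misfit and p-value for a weighted LS fit.
  import numpy as np
  from scipy import stats
  rng = np.random.default_rng(2)
  t = np.linspace(0, 10, 20)
  G = np.column_stack([np.ones_like(t), t])    # hypothetical design matrix
  sigma = np.full(len(t), 0.5)
  d = G @ np.array([1.0, 2.0]) + rng.normal(0, sigma)
  W = np.diag(1.0 / sigma)
  mL2 = np.linalg.solve((W @ G).T @ (W @ G), (W @ G).T @ (W @ d))
  chi2_obs = np.sum(((d - G @ mL2) / sigma)**2)
  p = stats.chi2.sf(chi2_obs, df=G.shape[0] - G.shape[1])  # upper tail
  print(chi2_obs, p)   # p near 0 or 1 would indicate problems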
Multivariate normal distribution
If random variables X1, …, Xn have a multivariate normal
distribution, then the joint PDF is (B.61)
f(x) = (2π)^(-n/2) (det[C])^(-1/2) exp[-(x-μ)TC-1(x-μ)/2]
Ci,j=Cov(Xi,Xj)
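A sketch with scipy.stats.multivariate_normal (μ and C arbitrary):

  # Sketch: evaluating a multivariate normal PDF.
  import numpy as np
  from scipy import stats
  mu = np.array([1.0, 2.0])
  C = np.array([[2.0, 0.5],
                [0.5, 1.0]])               # C[i,j] = Cov(X_i, X_j)
  mvn = stats.multivariate_normal(mean=mu, cov=C)
  print(mvn.pdf([1.0, 2.0]))               # density at the mean
  print(mvn.rvs(size=3, random_state=0))   # a few realizations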
Eigenvalues and eigenvectors
Ax = λx
(A-λI)x = 0
det(A-λI) = 0
The roots λ are the eigenvalues
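A quick numpy sketch (matrix arbitrary):

  # Sketch: eigenvalues/eigenvectors as solutions of Ax = lambda*x.
  import numpy as np
  A = np.array([[2.0, 1.0],
                [1.0, 2.0]])
  lam, X = np.linalg.eig(A)
  print(lam)                                    # eigenvalues: 3 and 1
  print(np.allclose(A @ X, X @ np.diag(lam)))   # A x = lambda x, True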
SVD
G=USVT
U (m x m) orthogonal spanning data space
V (n x n) orthogonal spanning model space
S (m x n) singular values along diagonal
Let rank(G)=p
G=[Up U0] [Sp 0;0 0] [Vp V0]T = UpSpVpT
G+=VpSp-1UpT = Generalized Inverse (pseudoinverse)
m+=G+d=VpSp-1UpT d = pseudoinverse solution
Since Vp-1=VpT and Up-1=UpT (A.6)
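A sketch of the pseudoinverse solution with numpy's SVD (G and d hypothetical; numpy returns the singular values as a vector and V as VT):

  # Sketch: pseudoinverse solution m+ = Vp Sp^-1 Up^T d.
  import numpy as np
  G = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])   # hypothetical 2x3, rank p = 2
  d = np.array([1.0, 2.0])
  U, s, Vt = np.linalg.svd(G)
  p = np.sum(s > 1e-12 * s[0])      # effective rank
  Gplus = Vt[:p].T @ np.diag(1.0 / s[:p]) @ U[:, :p].T
  print(Gplus @ d)                  # m+, same as np.linalg.pinv(G) @ d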
SVD
G=USVT
U (m x m) orthogonal spanning data space
V (n x n) orthogonal spanning model space
S (m x n) singular values along diagonal
Rank(G)=p
Theorem A.5: N(GT)+R(G)=Rm, i.e.
p columns of Up form an orthonormal basis for R(G)
columns of U0 form an orthonormal basis for N(GT)
p columns of Vp form an orthonormal basis for R(GT)
columns of V0 form an orthonormal basis for N(G)
Properties of SVD
G=USVT
1. N(G) and N(GT) are trivial (only null vector):
Up=U, Vp=V square orthogonal matrices,
and UpT=Up-1 and VpT=Vp-1
G+=VpSp-1UpT = (UpSpVpT)-1 = G-1
(inverse for full rank matrix, m=n=p). Unique solution,
data are fit exactly.
Properties of SVD
G=USVT
2. N(G) is nontrivial (model, V); N(GT) is trivial (data, U)
UpT=Up-1 and VpTVp=Ip
Gm+=GG+d=UpSpVpTVpSp-1UpTd=UpSpIpSp-1UpTd=d
i.e. the data are fit exactly (an LS solution), but nonuniquely
due to the nontrivial model null space
m = m+ + m0 = m+ + ∑_{i=p+1}^n αi V.,i
||m||₂² = ||m+||₂² + ∑_{i=p+1}^n αi² ≥ ||m+||₂² -> minimum length solution
Properties of SVD
G=USVT
3. N(G) is trivial (model, V); N(GT) is nontrivial (data, U)
Gm+=GG+d=UpSpVpTVpSp-1UpTd=UpUpTd
= projection of d onto R(G), i.e. the point in R(G)
that is closest to d, m+ is LS solution to Gm=d.
If d is in R(G), m+ will be an exact solution to Gm=d.
The solution is unique but in general does not fit the data exactly
Properties of SVD
G=USVT
4. N(G) is nontrivial (model, V); N(GT) is nontrivial (data, U)
p < min(m,n)
Gm+=GG+d=UpSpVpTVpSp-1UpTd=UpUpTd
= projection of d onto R(G), i.e. the point in R(G)
that is closest to d, m+ is LS solution to Gm=d.
i.e. the minimum-norm LS solution, combining cases 2) and 3)
Properties of SVD
G=USVT
- Always exists
- LS or minimum length
- Properly accommodates the rank and dimensions of G
- A nontrivial model null space m0 is the heart of the problem ->
- Infinite # of solutions will fit the data equally well, since
components of N(G) have no effect on data fit, i.e.,
selection of a particular solution requires a priori constraints
(smoothing, minimum length)
- A nontrivial data null space consists of vectors d0 that have no
influence on m+. If p<m, there are an infinite # of data sets that
will produce the same model
Properties of SVD - covariance/resolution
Least squares solution is unbiased:
min ∑ [di-(Gm)i]²/σi²
W = diag(1/σ1, 1/σ2, …, 1/σm)
Gw = WG, dw = Wd, Gwm = dw
mL2 = [GwTGw]-1 GwTdw
||dw-Gwm||₂² = ∑ [di-(Gm)i]²/σi²
E[mL2]=E[(GwTGw)-1 GwTdw] = (GwTGw)-1 GwT E[dw]=
(GwTGw)-1 GwT dwtrue
= (GwTGw)-1 GwT Gwmtrue = mtrue
Properties of SVD - covariance/resolution
Generalized inverse not necessarily unbiased:
E[m+] = E[G+d] = G+E[d] = G+Gmtrue = Rmmtrue
Bias = E[m+]-mtrue = Rmmtrue-mtrue = (Rm-I)mtrue
= (VpVpT-VVT)mtrue = -V0V0T mtrue, i.e., as p increases, Rm->I
Cov(mL2) = σ²(GTG)-1
Cov(m+) = G+[Cov(d)]G+T; with [cov d] = σ²I:
= σ² G+G+T = σ² VpSp-2VpT = σ² ∑_{i=1}^p V.,i V.,iT/si²
I.e., as p increases, Rm->I:
bias decreases while variance increases…!
Model resolution
Rm -> I: increasing resolution
Resolution test: multiply Rm onto a particular model, e.g. a spike
model with one element 1 and the rest 0; this picks out the
corresponding column of Rm (see the sketch below)
Data resolution
d+ = Gm+ = GG+d = Rdd
Rd = UpSpVpTVpSp-1UpT = UpUpT
p=m -> Rd=I, d+=d
p<m -> Rd≠I, m+ doesn't fit the data exactly
Rm, Rd can be assessed during design of experiment
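A sketch computing Rm and Rd from the SVD (same hypothetical rank-2, 2x3 G as in the pseudoinverse sketch; the spike test picks out a column of Rm):

  # Sketch: model and data resolution matrices from the SVD.
  import numpy as np
  G = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])    # hypothetical, p = 2 < n = 3
  U, s, Vt = np.linalg.svd(G)
  p = np.sum(s > 1e-12 * s[0])
  Vp, Up = Vt[:p].T, U[:, :p]
  Rm = Vp @ Vp.T                     # model resolution, n x n
  Rd = Up @ Up.T                     # data resolution, m x m
  spike = np.array([1.0, 0.0, 0.0])  # spike test: one model element = 1
  print(Rm @ spike)                  # = first column of Rm
  print(Rd)                          # = I here, since p = m = 2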
Instabilities of SVD
Small singular values -> m+ sensitive to small amounts of noise
Small singular values may be indistinguishable from 0
Possible to remove small singular values to stabilize the solution ->
Truncated SVD, TSVD
Condition number cond(G)=s1/sk
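A sketch of TSVD on a nearly rank-deficient matrix (values hypothetical): drop singular values below a cutoff to stabilize the solution:

  # Sketch: truncated SVD (TSVD) for an ill-conditioned problem.
  import numpy as np
  G = np.array([[1.0, 1.0],
                [1.0, 1.0 + 1e-8]])   # hypothetical, nearly rank-deficient
  d = np.array([2.0, 2.0])
  U, s, Vt = np.linalg.svd(G)
  print(s[0] / s[-1])                 # cond(G) = s1/sk, huge here
  k = np.sum(s > 1e-6 * s[0])         # keep only well-resolved values
  m_tsvd = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T @ d
  print(m_tsvd)                       # stable minimum-length solution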