Transcript Document
QMDA
Review Session
Things you should remember
1. Probability & Statistics
The Gaussian (normal) distribution, with expected value $\bar{x}$ and variance $\sigma^2$:

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\bar{x})^2}{2\sigma^2} \right\}$
Properties of the normal distribution
Expectation = Median = Mode = $\bar{x}$
95% of the probability lies within $2\sigma$ of the expected value.
[figure: bell curve $p(x)$ vs. $x$, with 95% of the area between $\bar{x}-2\sigma$ and $\bar{x}+2\sigma$]
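A quick numerical check of this 95% rule; a minimal sketch assuming numpy, with an arbitrary mean and standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
xbar, sigma = 5.0, 2.0                       # arbitrary illustrative values
x = rng.normal(xbar, sigma, size=1_000_000)

# fraction of samples within 2 sigma of the expected value
frac = np.mean(np.abs(x - xbar) <= 2 * sigma)
print(frac)  # about 0.954
```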
Multivariate Distributions
The Covariance Matrix, C, is very important.
The diagonal elements give the variance of each $x_i$: $\sigma_{x_i}^2 = C_{ii}$.
The off-diagonal elements of C indicate whether pairs of x's are correlated, e.g. $C_{12}$:
$C_{12} > 0$: $x_1$ and $x_2$ are positively correlated
$C_{12} < 0$: $x_1$ and $x_2$ are negatively correlated
[figure: scatter plots of $x_2$ vs. $x_1$ illustrating positive and negative correlation]
the multivariate normal distribution
$p(\mathbf{x}) = (2\pi)^{-N/2}\, |C_x|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1} (\mathbf{x}-\bar{\mathbf{x}}) \right\}$

has expectation $\bar{\mathbf{x}}$ and covariance $C_x$, and is normalized to unit area.

If y is linearly related to x, $\mathbf{y} = M\mathbf{x}$, then
$\bar{\mathbf{y}} = M\bar{\mathbf{x}}$ (rule for means)
$C_y = M C_x M^T$ (rule for propagating error)
These rules work regardless of the distribution of x.
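A minimal sketch of these two rules, assuming numpy and a made-up M and C_x; it compares the analytic results against the sample statistics of transformed random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical example: 2 correlated variables x, transformed by y = M x
xbar = np.array([1.0, 2.0])
Cx = np.array([[2.0, 0.8],
               [0.8, 1.0]])
M = np.array([[1.0, -1.0],
              [0.5,  2.0]])

# analytic rules: mean and covariance propagate through the linear map
ybar_rule = M @ xbar          # rule for means
Cy_rule = M @ Cx @ M.T        # rule for propagating error

# Monte Carlo check with many realizations of x
x = rng.multivariate_normal(xbar, Cx, size=100_000)
y = x @ M.T
print(ybar_rule, y.mean(axis=0))   # should agree closely
print(Cy_rule)
print(np.cov(y.T))                 # should agree closely
```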
2. Least Squares
Simple Least Squares
Linear relationship between data, d, and model, m
d = Gm
Minimize the prediction error $E = \mathbf{e}^T \mathbf{e}$ with $\mathbf{e} = \mathbf{d}^{obs} - G\mathbf{m}$:
$\mathbf{m}^{est} = [G^T G]^{-1} G^T \mathbf{d}$
If the data are uncorrelated with variance $\sigma_d^2$, then
$C_m = \sigma_d^2\, [G^T G]^{-1}$
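A minimal sketch of simple least squares, assuming numpy and a hypothetical straight-line problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical example: fit a straight line d = m0 + m1*t
t = np.linspace(0, 10, 50)
G = np.column_stack([np.ones_like(t), t])   # data kernel
m_true = np.array([1.0, 0.5])
sigma_d = 0.3
d = G @ m_true + rng.normal(0, sigma_d, t.size)

# least-squares estimate m = [G^T G]^{-1} G^T d (solve, don't invert)
m_est = np.linalg.solve(G.T @ G, G.T @ d)

# covariance of the estimate for uncorrelated data
Cm = sigma_d**2 * np.linalg.inv(G.T @ G)
print(m_est, np.sqrt(np.diag(Cm)))          # estimates and their std errors
```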
Least Squares with prior constraints
Given uncorrelated data with variance $\sigma_d^2$ that satisfy a linear relationship $\mathbf{d} = G\mathbf{m}$,
and prior information with variance $\sigma_m^2$ that satisfies a linear relationship $\mathbf{h} = D\mathbf{m}$,
the best estimate of the model parameters, $\mathbf{m}^{est}$, solves (in the least-squares sense)

$\begin{bmatrix} G \\ \varepsilon D \end{bmatrix} \mathbf{m} = \begin{bmatrix} \mathbf{d} \\ \varepsilon \mathbf{h} \end{bmatrix}$ with $\varepsilon = \sigma_d / \sigma_m$.

Previously, we discussed only the special case h = 0.
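A sketch of the stacked system, assuming numpy; the data, the prior matrix D, and the prior value h below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical problem: noisy data d = G m plus prior information h = D m
t = np.linspace(0, 10, 20)
G = np.column_stack([np.ones_like(t), t])
m_true = np.array([1.0, 0.5])
sigma_d, sigma_m = 0.5, 0.1
d = G @ m_true + rng.normal(0, sigma_d, t.size)

D = np.array([[0.0, 1.0]])        # prior: the slope ...
h = np.array([0.45])              # ... is believed to be about 0.45

eps = sigma_d / sigma_m           # relative weight of the prior rows
F = np.vstack([G, eps * D])       # stacked least-squares system
f = np.concatenate([d, eps * h])
m_est = np.linalg.lstsq(F, f, rcond=None)[0]
print(m_est)
```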
Newton's Method for Non-Linear Least-Squares Problems
Given data that satisfy a non-linear relationship $\mathbf{d} = g(\mathbf{m})$,
guess a solution $\mathbf{m}^{(k)}$ with k = 0 and linearize around it:
$\Delta\mathbf{m} = \mathbf{m} - \mathbf{m}^{(k)}$ and $\Delta\mathbf{d} = \mathbf{d} - g(\mathbf{m}^{(k)})$ and $\Delta\mathbf{d} = G\,\Delta\mathbf{m}$
with $G_{ij} = \partial g_i / \partial m_j$ evaluated at $\mathbf{m}^{(k)}$.
Then iterate, $\mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} + \Delta\mathbf{m}$ with $\Delta\mathbf{m} = [G^T G]^{-1} G^T \Delta\mathbf{d}$,
hoping for convergence.
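A sketch of this iteration for a hypothetical exponential model d = m0 exp(m1 t), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical non-linear model: d_i = m0 * exp(m1 * t_i)
t = np.linspace(0, 4, 30)
m_true = np.array([2.0, -0.7])
d = m_true[0] * np.exp(m_true[1] * t) + rng.normal(0, 0.02, t.size)

m = np.array([1.0, -0.3])                 # initial guess m^(0)
for k in range(20):
    pred = m[0] * np.exp(m[1] * t)
    dd = d - pred                         # Delta d
    # Jacobian G_ij = dg_i/dm_j evaluated at the current guess
    G = np.column_stack([np.exp(m[1] * t), m[0] * t * np.exp(m[1] * t)])
    dm = np.linalg.solve(G.T @ G, G.T @ dd)
    m = m + dm
    if np.linalg.norm(dm) < 1e-10:        # converged
        break
print(m)   # close to m_true
```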
3. Bootstraps
Investigate the statistics of y by
creating many datasets y’
and examining their statistics
each y’ is created through
random sampling with replacement
of the original dataset y
N original data    Random integers in the range 1-N    N resampled data
y1                 4                                   y'1 = y4
y2                 3                                   y'2 = y3
y3                 7                                   y'3 = y7
y4                 11                                  y'4 = y11
y5                 4                                   y'5 = y4
y6                 1                                   y'6 = y1
y7                 9                                   y'7 = y9
...                ...                                 ...
yN                 6                                   y'N = y6
Example: statistics of the mean of y, given N data.
Compute the estimate $N^{-1} \sum_i y'_i$.
Now repeat a gazillion times and examine the resulting distribution of estimates.
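A minimal bootstrap sketch, assuming numpy; the dataset is made up:

```python
import numpy as np

rng = np.random.default_rng(0)

y = rng.exponential(scale=3.0, size=200)  # hypothetical dataset, N = 200
N = y.size

n_boot = 10_000                           # "a gazillion" resamplings
means = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, N, size=N)      # random integers, sampling with replacement
    means[b] = y[idx].mean()              # statistic of the resampled dataset

# the spread of the bootstrap estimates approximates the std error of the mean
print(y.mean(), means.std(), y.std(ddof=1) / np.sqrt(N))
```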
4. Interpolation and Splines
linear splines
In the interval $[x_i, x_{i+1}]$:
$y(x) = y_i + (y_{i+1} - y_i)(x - x_i)/(x_{i+1} - x_i)$
The 1st derivative is discontinuous at the nodes.
[figure: piecewise-linear y(x) passing through $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$]
cubic splines
A cubic, $a + bx + cx^2 + dx^3$, in this interval; a different cubic in the next interval.
The 1st and 2nd derivatives are continuous at the nodes.
[figure: piecewise-cubic y(x) passing through $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$]
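A sketch of both interpolants, assuming numpy and scipy are available; the nodes are made up:

```python
import numpy as np
from scipy.interpolate import CubicSpline

xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # nodes
yi = np.array([0.0, 0.8, 0.9, 0.1, -0.8])     # values at the nodes

x = np.linspace(0, 4, 101)

# linear spline: y(x) = yi + (yi+1 - yi)(x - xi)/(xi+1 - xi)
y_lin = np.interp(x, xi, yi)

# cubic spline: 1st and 2nd derivatives continuous at the nodes
cs = CubicSpline(xi, yi)
y_cub = cs(x)
print(y_lin[:5], y_cub[:5])
```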
5. Hypothesis Testing
The Null Hypothesis
always a variant of this theme:
the result of an experiment differs from the expected value only because of random variation.
Test of Significance of Results
say, to 95% significance:
the Null Hypothesis would generate the observed result less than 5% of the time.
Four important distributions
Normal distribution: distribution of the $x_i$
Chi-squared distribution: distribution of $\chi^2 = \sum_{i=1}^{N} x_i^2$
Student's t-distribution: distribution of $t = x_0 / \sqrt{ N^{-1} \sum_{i=1}^{N} x_i^2 }$
F-distribution: distribution of $F = \{ N^{-1} \sum_{i=1}^{N} x_i^2 \} / \{ M^{-1} \sum_{i=1}^{M} x_{N+i}^2 \}$
5 tests
$\mu^{obs} = \mu^{prior}$ when $\mu^{prior}$ and $\sigma^{prior}$ are known: normal distribution
$\sigma^{obs} = \sigma^{prior}$ when $\mu^{prior}$ and $\sigma^{prior}$ are known: chi-squared distribution
$\mu^{obs} = \mu^{prior}$ when $\mu^{prior}$ is known but $\sigma^{prior}$ is unknown: t-distribution
$\sigma_1^{obs} = \sigma_2^{obs}$ when $\mu_1^{prior}$ and $\mu_2^{prior}$ are known: F-distribution
$\mu_1^{obs} = \mu_2^{obs}$ when $\sigma_1^{prior}$ and $\sigma_2^{prior}$ are unknown: modified t-distribution
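For instance, the third test (a mean against a prior mean with unknown variance) can be sketched with scipy's one-sample t-test; the sample below is made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# hypothetical sample: does its mean differ from a prior mean of 10,
# when the prior variance is unknown?  -> t-test
d = rng.normal(10.4, 2.0, size=25)
t_stat, p_value = stats.ttest_1samp(d, popmean=10.0)

# reject the Null Hypothesis at 95% significance if p < 0.05
print(t_stat, p_value, p_value < 0.05)
```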
6. filters
Filtering operation g(t) = f(t)*h(t), "convolution":
$g(t) = \int_0^t f(t-\tau)\, h(\tau)\, d\tau$, discretely $g_k = \Delta t \sum_{p=0}^{k} f_{k-p}\, h_p$
or alternatively
$g(t) = \int_0^t f(\tau)\, h(t-\tau)\, d\tau$, discretely $g_k = \Delta t \sum_{p=0}^{k} f_p\, h_{k-p}$
How to do convolution by hand
$\mathbf{x} = [x_0, x_1, x_2, x_3, x_4, \ldots]^T$ and $\mathbf{y} = [y_0, y_1, y_2, y_3, y_4, \ldots]^T$
Reverse one time-series, line them up as shown, and multiply rows. This is the first element of x*y:
x0, x1, x2, x3, x4, ...
... y4, y3, y2, y1, y0
$[x*y]_1 = x_0 y_0$
Then slide, multiply rows, and add to get the second element of x*y:
x0, x1, x2, x3, x4, ...
... y4, y3, y2, y1, y0
$[x*y]_2 = x_0 y_1 + x_1 y_0$
And so on.
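The same reverse-slide-multiply recipe is what np.convolve computes; a minimal check assuming numpy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, -1.0, 0.5])

# "reverse, slide, multiply, add" is exactly what np.convolve does
g = np.convolve(x, y)
print(g)   # first element x0*y0 = 1.0, second x0*y1 + x1*y0 = 1.0, ...
```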
Matrix formulations of g(t) = f(t)*h(t):

g = F h:
$\begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_N \end{bmatrix} = \Delta t \begin{bmatrix} f_0 & 0 & \cdots & 0 \\ f_1 & f_0 & \cdots & 0 \\ \vdots & & \ddots & \\ f_N & \cdots & f_1 & f_0 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_N \end{bmatrix}$

and g = H f:
$\begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_N \end{bmatrix} = \Delta t \begin{bmatrix} h_0 & 0 & \cdots & 0 \\ h_1 & h_0 & \cdots & 0 \\ \vdots & & \ddots & \\ h_N & \cdots & h_1 & h_0 \end{bmatrix} \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{bmatrix}$
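A sketch of the matrix formulation, assuming numpy and scipy; f, h, and Δt are made up:

```python
import numpy as np
from scipy.linalg import toeplitz

dt = 0.1
f = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, -1.0, 0.5, 0.0])

# lower-triangular Toeplitz matrix H: first column is h, first row is zeros
H = toeplitz(h, np.zeros_like(h))
g = dt * H @ f

# same result from direct convolution, truncated to the first N+1 samples
print(g)
print(dt * np.convolve(f, h)[: f.size])
```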
Least-squares equation: $[H^T H]\, \mathbf{f} = H^T \mathbf{g}$

$\begin{bmatrix} A(0) & A(1) & A(2) & \cdots \\ A(1) & A(0) & A(1) & \cdots \\ A(2) & A(1) & A(0) & \cdots \\ \vdots & \vdots & & \\ A(N-1) & A(N-2) & \cdots & \end{bmatrix} \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{bmatrix} = \begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ \vdots \\ X(N) \end{bmatrix}$

$H^T H$ is the autocorrelation of h; $H^T \mathbf{g}$ is the cross-correlation of h and g.
$A_i$ and $X_i$
Auto-correlation of a time-series T(t):
$A(\tau) = \int_{-\infty}^{+\infty} T(t)\, T(t-\tau)\, dt$, discretely $A_i = \sum_j T_j\, T_{j-i}$
Cross-correlation of two time-series $T^{(1)}(t)$ and $T^{(2)}(t)$:
$X(\tau) = \int_{-\infty}^{+\infty} T^{(1)}(t)\, T^{(2)}(t-\tau)\, dt$, discretely $X_i = \sum_j T^{(1)}_j\, T^{(2)}_{j-i}$
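A minimal cross-correlation sketch assuming numpy; the two made-up series differ by a 3-sample delay, which the peak lag recovers:

```python
import numpy as np

rng = np.random.default_rng(0)
T1 = rng.normal(size=500)
T2 = np.roll(T1, 3) + 0.1 * rng.normal(size=500)  # T1 delayed by 3 samples

# full cross-correlation X_i = sum_j T1_j T2_{j-i} over all lags i
X = np.correlate(T1, T2, mode="full")
lags = np.arange(-T1.size + 1, T1.size)
print(lags[np.argmax(X)])   # peak at lag -3 reveals the 3-sample delay
```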
7. fourier transforms and spectra
Integral transforms:
$C(\omega) = \int_{-\infty}^{+\infty} T(t)\, \exp(-i\omega t)\, dt$
$T(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} C(\omega)\, \exp(+i\omega t)\, d\omega$
Discrete transforms (DFT):
$C_k = \sum_{n=0}^{N-1} T_n \exp(-2\pi i k n / N)$ with $k = 0, \ldots, N-1$
$T_n = N^{-1} \sum_{k=0}^{N-1} C_k \exp(+2\pi i k n / N)$ with $n = 0, \ldots, N-1$
Frequency step: $\Delta\omega\, \Delta t = 2\pi / N$
Maximum (Nyquist) frequency: $f_{max} = 1/(2\Delta t)$, i.e. $\omega_{max} = \pi/\Delta t$
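A minimal DFT sketch assuming numpy; the sampling interval and test frequency are arbitrary:

```python
import numpy as np

N, dt = 64, 0.5
t = np.arange(N) * dt
T = np.sin(2 * np.pi * 0.4 * t)          # 0.4 Hz sinusoid

C = np.fft.fft(T)                        # DFT coefficients C_k
f = np.fft.fftfreq(N, d=dt)              # frequency of each coefficient

# Nyquist frequency is 1/(2*dt) = 1 Hz; the spectral peak sits near 0.4 Hz
print(f[np.argmax(np.abs(C[: N // 2]))])
```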
Aliasing and cyclicity
In a digital world, $\omega_{n+N} = \omega_n$, and, since time and frequency play symmetrical roles in $\exp(-i\omega t)$, $t_{k+N} = t_k$.
One FFT that you should know:
$C(\omega) = \int_{-\infty}^{+\infty} \delta(t)\, \exp(-i\omega t)\, dt = \exp(0) = 1$
The FFT of a spike at t = 0 is a constant.
Error Estimates for the DFT
Assume uncorrelated, normally-distributed data, $d_n = T_n$, with variance $\sigma_d^2$.
The matrix G in Gm = d is $G_{nk} = N^{-1} \exp(+2\pi i k n / N)$.
The problem Gm = d is linear, so the unknowns, $m_k = C_k$ (the coefficients of the complex exponentials), are also normally-distributed.
Since the exponentials are orthogonal, $G^H G = N^{-1} I$ is diagonal, and $C_m = \sigma_d^2\, [G^H G]^{-1} = N \sigma_d^2\, I$ is diagonal, too.
Apportioning the variance equally between the real and imaginary parts of each $C_k$, each part has variance $\sigma^2 = N \sigma_d^2 / 2$.
The spectrum $s_k^2 = (\mathrm{Re}\, C_k)^2 + (\mathrm{Im}\, C_k)^2$ is the sum of the squares of two uncorrelated, normally distributed random variables and is thus $\chi_2^2$-distributed (scaled by $\sigma^2$).
The 95% value of $\chi_2^2$ is about 5.9, so to be significant a peak must exceed $5.9\, N \sigma_d^2 / 2$.
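A small simulation, assuming numpy, that checks the variance of a single DFT coefficient of pure noise against this formula (N, σd, and the coefficient index are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma_d = 256, 1.5

# many realizations of pure noise; variance of one DFT coefficient
trials = 5000
Ck = np.array([np.fft.fft(rng.normal(0, sigma_d, N))[10] for _ in range(trials)])

# real and imaginary parts should each have variance N * sigma_d^2 / 2
print(Ck.real.var(), Ck.imag.var(), N * sigma_d**2 / 2)
```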
Convolution Theorem
transform[ f(t)*g(t) ] = transform[ f(t) ] transform[ g(t) ]
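A minimal check of the theorem assuming numpy; the two short, zero-padded sequences are made up (the padding makes the circular FFT convolution match ordinary convolution):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0])
g = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0])

# multiply the transforms, then transform back
conv_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
print(conv_fft)
print(np.convolve(f, g)[:8])   # same numbers
```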
Power spectrum of a stationary time-series
T(t) = stationary time series
$C(\omega) = \int_{-T/2}^{+T/2} T(t)\, \exp(-i\omega t)\, dt$
$S(\omega) = \lim_{T\to\infty} T^{-1}\, |C(\omega)|^2$
S(ω) is called the power spectral density: the spectrum normalized by the length of the time series.
Relationship of power spectral density to the DFT
To compute the Fourier transform, C(ω), you multiply the DFT coefficients, $C_k$, by $\Delta t$. So to get the power spectral density,
$T^{-1} |C(\omega)|^2 = (N \Delta t)^{-1}\, |\Delta t\, C_k|^2 = (\Delta t / N)\, |C_k|^2$,
you multiply the DFT spectrum, $|C_k|^2$, by $\Delta t / N$.
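A sketch of this scaling for made-up white noise, assuming numpy; for white noise the mean PSD level should be about (variance × Δt):

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt = 1024, 0.01
T = rng.normal(0, 2.0, N)                # white noise, variance 4

Ck = np.fft.fft(T)
psd = (dt / N) * np.abs(Ck) ** 2         # power spectral density

# mean PSD level of white noise is roughly variance * dt
print(psd.mean(), 4.0 * dt)
```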
Windowed Time Series
The Fourier transform of a windowed time-series is the Fourier transform of the long time-series convolved with the Fourier transform of the windowing function.
Window Functions
Boxcar: its Fourier transform is a sinc function, which has a narrow central peak but large side lobes.
Hanning (cosine) taper: its Fourier transform has a somewhat wider central peak but much smaller side lobes.
8. EOFs and factor analysis
Representation of samples as a linear mixing of factors: S = C F
Here S is the N×M matrix of samples, C is the N×M matrix of coefficients, and F is the M×M matrix of factors (shown for M = 3 elements A, B, C and factors f1, f2, f3):

samples S:                          coefficients C:                        factors F:
[ (A in s1) (B in s1) (C in s1) ]   [ (f1 in s1) (f2 in s1) (f3 in s1) ]   [ (A in f1) (B in f1) (C in f1) ]
[ (A in s2) (B in s2) (C in s2) ] = [ (f1 in s2) (f2 in s2) (f3 in s2) ] × [ (A in f2) (B in f2) (C in f2) ]
[ (A in s3) (B in s3) (C in s3) ]   [ (f1 in s3) (f2 in s3) (f3 in s3) ]   [ (A in f3) (B in f3) (C in f3) ]
[ ...                           ]   [ ...                              ]
[ (A in sN) (B in sN) (C in sN) ]   [ (f1 in sN) (f2 in sN) (f3 in sN) ]
Data approximated with only the most important factors:
the p most important factors = those with the biggest coefficients.

S ≈ C' F', where C' (N×p) keeps only the selected coefficients and F' (p×M) keeps only the selected factors (here, ignore f3, so p = 2):

selected coefficients C':     selected factors F':
[ (f1 in s1) (f2 in s1) ]     [ (A in f1) (B in f1) (C in f1) ]
[ (f1 in s2) (f2 in s2) ]     [ (A in f2) (B in f2) (C in f2) ]
[ (f1 in s3) (f2 in s3) ]
[ ...                   ]
[ (f1 in sN) (f2 in sN) ]
Singular Value Decomposition (SVD)
Any N×M matrix S can be written as the product of three matrices:
S = U Λ V^T
where U is N×N and satisfies $U^T U = U U^T = I$,
V is M×M and satisfies $V^T V = V V^T = I$,
and Λ is an N×M diagonal matrix of singular values, $\lambda_i$.
SVD decomposition of S:
S = U Λ V^T = [U Λ] [V^T] = C F
So the coefficients are C = U Λ and the factors are F = V^T.
The factors with the biggest $\lambda_i$'s are the most important.
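A sketch of SVD-based factor analysis, assuming numpy; the two underlying factors and the noise level are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical samples: N = 100 samples of M = 3 elements, built from 2 factors
F_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
C_true = rng.uniform(0, 1, (100, 2))
S = C_true @ F_true + 0.01 * rng.normal(size=(100, 3))

U, lam, VT = np.linalg.svd(S, full_matrices=False)
print(lam)                                # two large singular values, one tiny

# coefficients C = U*Lambda, factors F = V^T; keep the p = 2 biggest
p = 2
C = U[:, :p] * lam[:p]
F = VT[:p, :]
print(np.allclose(S, C @ F, atol=0.05))   # S is well approximated
```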
Transformations of Factors
If you choose the p most important factors, they define both a subspace in which the samples must lie and a set of coordinate axes for that subspace. The choice of axes is not unique and can be changed through a transformation, T:
$F^{new} = T\, F^{old}$
A requirement is that $T^{-1}$ exists, else $F^{new}$ will not span the same subspace as $F^{old}$:
$S = C F = C I F = (C T^{-1})(T F) = C^{new} F^{new}$
So you could try to achieve more desirable factors by designing an appropriate transformation matrix, T.
9. Metropolis Algorithm and Simulated Annealing
Metropolis Algorithm
a method to generate a vector x of realizations of the distribution p(x)
The process is iterative:
start with an x, say $x^{(i)}$;
then randomly generate another x in its neighborhood, say $x^{(i+1)}$, using a distribution $Q(x^{(i+1)}|x^{(i)})$;
then test whether you will accept the new $x^{(i+1)}$:
if it passes, you append $x^{(i+1)}$ to the vector x that you are accumulating;
if it fails, you append $x^{(i)}$.
A reasonable choice for $Q(x^{(i+1)}|x^{(i)})$: a normal distribution with mean $x^{(i)}$ and a $\sigma_x^2$ that quantifies the sense of neighborhood.
The acceptance test is as follows. First compute the quantity
$\alpha = \frac{ p(x^{(i+1)})\, Q(x^{(i)}|x^{(i+1)}) }{ p(x^{(i)})\, Q(x^{(i+1)}|x^{(i)}) }$
If $\alpha > 1$, always accept $x^{(i+1)}$.
If $\alpha < 1$, accept $x^{(i+1)}$ with a probability of $\alpha$ and accept $x^{(i)}$ with a probability of $1-\alpha$.
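A minimal Metropolis sketch assuming numpy; the target p(x) is a made-up normal distribution, so the result can be checked against its known mean and σ:

```python
import numpy as np

rng = np.random.default_rng(0)

# target distribution p(x), unnormalized is fine: normal with mean 3, sigma 1
def p(x):
    return np.exp(-0.5 * (x - 3.0) ** 2)

sigma_q = 0.8          # "sense of neighborhood" of the proposal Q
x = 0.0                # starting point
samples = []
for i in range(50_000):
    x_new = rng.normal(x, sigma_q)         # Q is symmetric, so the Q-ratio cancels
    alpha = p(x_new) / p(x)
    if alpha > 1 or rng.uniform() < alpha:
        x = x_new                          # accept the new x
    samples.append(x)                      # on failure the old x is appended again

samples = np.array(samples[5000:])         # drop burn-in
print(samples.mean(), samples.std())       # approximately 3 and 1
```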
Simulated Annealing
Application of the Metropolis Algorithm to non-linear optimization:
find m that minimizes $E(\mathbf{m}) = \mathbf{e}^T \mathbf{e}$ where $\mathbf{e} = \mathbf{d}^{obs} - g(\mathbf{m})$.
Based on using the Boltzmann distribution for p(x) in the Metropolis Algorithm:
$p(\mathbf{m}) = \exp\{ -E(\mathbf{m}) / T \}$
where the temperature, T, is slowly decreased during the iterations.
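A minimal simulated-annealing sketch assuming numpy; the one-parameter error function, proposal width, and cooling schedule are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 1-parameter misfit with local minima
def E(m):
    return (m - 2.0) ** 2 + 1.5 * np.cos(5.0 * m)

m = -4.0                                    # poor starting guess
T = 10.0                                    # initial temperature
for i in range(20_000):
    m_new = rng.normal(m, 0.5)
    dE = E(m_new) - E(m)
    # Boltzmann acceptance: always downhill, sometimes uphill
    if dE < 0 or rng.uniform() < np.exp(-dE / T):
        m = m_new
    T = max(0.01, T * 0.9995)               # slowly cool
print(m, E(m))                              # near the global minimum, m about 1.9
```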
10. Some final words
Start Simple!
Examine a small subset of your data and look it over carefully.
Build processing scripts incrementally, checking intermediate results at each stage.
Make lots of plots and look them over carefully.
Do reality checks.