Transcript ada09
Sample variance and sample error
• We learned recently how to determine the
sample variance using the sample mean.
• How do we translate this to an unbiased
estimate of the error on a single point?
• We can’t just take the square root! This would introduce a bias:

$$E\!\left[\sqrt{S^2}\right] \neq \sigma, \quad \text{even though} \quad E[S^2] = \sigma^2.$$

[Figure: the concave curve $Y = \sqrt{X}$]
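A quick Monte Carlo sketch of this point, assuming Gaussian data (the sample size, seed and true σ below are arbitrary choices): $S^2$ averages to $\sigma^2$, but $\sqrt{S^2}$ comes out systematically low.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0            # true standard deviation
N = 5                  # small samples make the bias visible
trials = 200_000

x = rng.normal(0.0, sigma, size=(trials, N))
s2 = x.var(axis=1, ddof=1)     # sample variance with the 1/(N-1) factor

print(s2.mean())               # close to sigma**2 = 4.0 (unbiased)
print(np.sqrt(s2).mean())      # clearly below sigma = 2.0 (biased low)
```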
Mean and variance of S²
• Like any other statistic, $S^2$ has its own mean and variance.
• Need to know these to compute the bias in $S$:
$$E[S^2] = E\!\left[\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^2\right] = \sigma^2$$

For Gaussian data, $(N-1)\,S^2/\sigma^2 \sim \chi^2_{N-1}$, so

$$\mathrm{Var}(S^2) = \mathrm{Var}\!\left(\frac{\sigma^2\,\chi^2_{N-1}}{N-1}\right) = \frac{\sigma^4}{(N-1)^2}\cdot 2(N-1) = \frac{2\,\sigma^4}{N-1}.$$
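A numerical check of $\mathrm{Var}(S^2) = 2\sigma^4/(N-1)$, under the same Gaussian assumption (sizes and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, N, trials = 1.5, 8, 400_000

x = rng.normal(0.0, sigma, size=(trials, N))
s2 = x.var(axis=1, ddof=1)       # sample variance for each trial

predicted = 2 * sigma**4 / (N - 1)   # Var(S^2) = 2 sigma^4 / (N-1)
print(s2.var())                      # close to `predicted`
```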
Bias in sqrt(S2)
• Define square root function g:
• g(X) and its derivatives:
$$g(S^2) = \sqrt{S^2} = S$$

$$g(x) = x^{1/2}, \quad g'(x) = \frac{1}{2}x^{-1/2}, \quad g''(x) = -\frac{1}{4}x^{-3/2}$$
• Hence compute bias:
$$E[S] = E\!\left[g(S^2)\right] \approx g(\sigma^2) + \frac{g''(\sigma^2)}{2}\,\mathrm{Var}(S^2) + \dots$$

$$= \sigma - \frac{1}{8}\,\sigma^{-3}\cdot\frac{2\,\sigma^4}{N-1} + \dots = \sigma\left(1 - \frac{1}{4(N-1)} + \dots\right)$$
Unbiased estimator for σ
• Re-define a bias-corrected estimator for σ:

$$\hat{\sigma} = \frac{S}{1 - \dfrac{1}{4(N-1)}}, \qquad S = \left[\frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^2\right]^{1/2}$$
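A Monte Carlo sketch of the correction at work (Gaussian data assumed; the numbers below are arbitrary): the raw $S$ undershoots σ, the corrected estimator nearly removes the bias.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, N, trials = 3.0, 6, 300_000

x = rng.normal(0.0, sigma, size=(trials, N))
s = np.sqrt(x.var(axis=1, ddof=1))       # raw estimator S
s_corr = s / (1 - 1 / (4 * (N - 1)))     # bias-corrected estimator

print(s.mean())        # below sigma = 3.0
print(s_corr.mean())   # much closer to sigma
```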
Conditional probabilities
• Consider 2 random variables X and Y with a joint p.d.f. P(X,Y).
• To get P(X) or P(Y), project P(X,Y) on to the X or Y axis and normalise.
• Can also determine P(X|Y) (“probability of X given Y”), which is a normalised slice through P(X,Y) at a fixed value of Y, or vice versa.
• At any point along each slice, can get P(X,Y) from:

$$P(X,Y) = P(X\,|\,Y)\,P(Y) = P(Y\,|\,X)\,P(X)$$

[Figure: contours of the joint p.d.f. P(X,Y), with the projection P(X) and a slice P(X|Y)]
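These operations can be sketched on a discretised toy joint p.d.f. (the correlated-Gaussian shape and grid below are invented for illustration): projection gives P(X), a normalised slice gives P(X|Y), and the product rule above holds point by point.

```python
import numpy as np

# Toy joint p.d.f. on a grid
x = np.linspace(-4, 4, 81)
y = np.linspace(-4, 4, 81)
X, Y = np.meshgrid(x, y, indexing="ij")
joint = np.exp(-(X**2 - 1.2 * X * Y + Y**2))   # unnormalised P(X,Y)
joint /= joint.sum()                           # normalise on the grid

p_x = joint.sum(axis=1)                        # project on to X axis: P(X)
p_y = joint.sum(axis=0)                        # project on to Y axis: P(Y)

j = 60                                         # fix Y = y[j]
p_x_given_y = joint[:, j] / joint[:, j].sum()  # normalised slice: P(X|Y)

# Product rule: P(X,Y) = P(X|Y) P(Y) at every point of the slice
print(np.allclose(joint[:, j], p_x_given_y * p_y[j]))  # True
```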
Bayes’ Theorem and Bayesian inference
• Bayes’ Theorem:

$$P(Y\,|\,X) = \frac{P(X\,|\,Y)\,P(Y)}{P(X)}$$

• This leads to the method of Bayesian inference:

$$\underbrace{P(\mathrm{model}\,|\,\mathrm{data})}_{\text{Inference}} = \frac{\overbrace{P(\mathrm{data}\,|\,\mathrm{model})}^{\text{Evidence}}\;\overbrace{P(\mathrm{model})}^{\text{Prior}}}{P(\mathrm{data})}$$
• We can determine the evidence P(data|model)
using goodness-of-fit statistics.
• We can often determine P(model) using prior
knowledge about the models.
• This allows us to make inferences about the
relative probabilities of different models, given
the data.
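A minimal sketch of this inference with two hypothetical models of a coin (the models, data and equal priors below are invented for illustration):

```python
from math import comb

# Two hypothetical models: m1 = fair coin (p=0.5), m2 = biased coin (p=0.8).
# Data: 8 heads in 10 tosses.
def likelihood(p, heads=8, tosses=10):
    # P(data | model): binomial probability of the observed count
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

prior = {"m1": 0.5, "m2": 0.5}                       # P(model)
evidence = {"m1": likelihood(0.5), "m2": likelihood(0.8)}  # P(data | model)

norm = sum(evidence[m] * prior[m] for m in prior)    # P(data)
posterior = {m: evidence[m] * prior[m] / norm for m in prior}
print(posterior)   # m2 is favoured by these data
```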
Choice of prior
• Suppose our model of a set of data X is controlled by a parameter α.
• Our knowledge about α before X is measured is quantified by the prior p.d.f. P(α).
• Choice of P(α) is arbitrary, subject to common sense!
• After measuring X we get the posterior p.d.f. P(α|X) ∝ P(X|α)·P(α).
• Different priors P(α) lead to different inferences P(α|X)!
[Figure: likelihood P(X|α) as a function of α]
Examples
• Suppose α is the Doppler shift of a star.
• Adopting a search range −200 < α < 200 km/sec in uniform velocity increments implicitly assumes a uniform prior.
• Alternatively: scaling an emission-line profile of known shape.
• If you know α ≥ 0, can force α > 0 by constructing the pdf in uniform increments of Log α, so P(α) ∝ 1/α.
• Posterior distributions are skewed differently according to choice of prior.
[Figure: posteriors P(α|X) for a uniform prior P(α) and for P(α) ∝ 1/α]
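The effect of the two priors can be sketched numerically (the Gaussian-shaped likelihood, its peak at α = 2 and the grid are invented for illustration): the 1/α prior pulls the posterior toward small α.

```python
import numpy as np

a = np.linspace(0.1, 10.0, 1000)          # parameter grid, alpha > 0
like = np.exp(-0.5 * (a - 2.0)**2)        # P(X|a): Gaussian shape, peak at 2

post_uniform = like / like.sum()          # prior P(a) ~ constant
post_log = like / a
post_log /= post_log.sum()                # prior P(a) ~ 1/a (uniform in Log a)

mean_u = (a * post_uniform).sum()         # posterior mean, uniform prior
mean_l = (a * post_log).sum()             # posterior mean, 1/a prior
print(mean_u > mean_l)   # True: the 1/a prior skews the posterior left
```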
Relative probabilities of models
• Two models m1, m2
• Relative probabilities depend on
– Ratio of prior probabilities
– Relative ability to fit data
$$\frac{P(m_1\,|\,X)}{P(m_2\,|\,X)} = \frac{P(X\,|\,m_1)}{P(X\,|\,m_2)}\cdot\frac{P(m_1)}{P(m_2)}$$
• Note that P(data) cancels.
Maximum likelihood fits
• Suppose we try to fit a spectral line +
continuum using a set of data points Xi, i=1...N
• Suppose our model is:

$$\bar{X}_i = C + A\,e^{-(\lambda_i-\lambda_0)^2/2w^2}, \qquad \mathrm{Var}(X_i) = \sigma_i^2$$

• Parameters are $C$, $A$, $\lambda_0$, $w$.
• $\lambda_i$, $\sigma_i$ assumed known.
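A line-plus-continuum model of this kind can be sketched as follows; the function name, the width symbol `w` and the Hα-like wavelength values are illustrative assumptions, not from the source:

```python
import numpy as np

# Continuum C plus a Gaussian line of amplitude A, centre lam0, width w
def model(lam, C, A, lam0, w):
    return C + A * np.exp(-(lam - lam0)**2 / (2 * w**2))

lam = np.linspace(6500.0, 6620.0, 100)     # hypothetical wavelength grid
mu = model(lam, C=10.0, A=5.0, lam0=6563.0, w=8.0)

print(mu.max())   # peak reaches ~A above the continuum level C
```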
Likelihood of a model
• Likelihood of a particular set of model parameters $\theta$ (i.e. probability of getting this set of data given model $\theta$) is:

$$L(\theta) = P(X\,|\,\theta) = P(X_1\,|\,\theta)\,P(X_2\,|\,\theta)\cdots P(X_N\,|\,\theta) = \prod_{i=1}^{N} P(X_i\,|\,\theta)$$

• If errors are gaussian, then:

$$P(X_i\,|\,\theta) = \frac{1}{\sigma_i\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\left(\frac{X_i-\bar{X}_i}{\sigma_i}\right)^2\right]$$

$$L(\theta) = (2\pi)^{-N/2}\left[\prod_{i=1}^{N}\frac{1}{\sigma_i}\right]e^{-\chi^2/2}, \qquad \chi^2 = \sum_{i=1}^{N}\left(\frac{X_i-\bar{X}_i}{\sigma_i}\right)^2$$

• To maximise $L(\theta)$, must minimise $\chi^2$, since

$$-2\ln L(\theta) = \chi^2 + 2\sum_i \ln\sigma_i + \mathrm{constant}$$

and the $\sigma_i$ are assumed known.
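A numerical check of the identity $-2\ln L = \chi^2 + 2\sum_i\ln\sigma_i + N\ln 2\pi$ for Gaussian errors (the model predictions, errors and seed below are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
mu = np.linspace(0.0, 5.0, N)             # model predictions
sig = rng.uniform(0.5, 2.0, N)            # known errors sigma_i
x = mu + rng.normal(0.0, sig)             # simulated data X_i

chi2 = np.sum(((x - mu) / sig)**2)
logL = np.sum(-0.5 * ((x - mu) / sig)**2
              - np.log(sig) - 0.5 * np.log(2 * np.pi))

# -2 ln L equals chi^2 plus the sigma-dependent and constant terms
print(np.isclose(-2 * logL,
                 chi2 + 2 * np.sum(np.log(sig)) + N * np.log(2 * np.pi)))
```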
Estimating σ
• Data points $X_i$ with no quoted errors: model $X_i = A$ with a single unknown error σ:

$$\chi^2 = \sum_{i=1}^{N}\frac{(X_i - A)^2}{\sigma^2}$$

• To find $A$, minimise $\chi^2$.
[Figure: data points $X_i$ scattered about a constant level $A$]
• Can’t use $\chi^2$ minimisation to estimate σ, because $\chi^2 \to 0$ as $\sigma \to \infty$.
• Instead, minimise

$$-2\ln L = 2N\ln\sigma + \chi^2$$

[Figure: $-2\ln L$ vs σ, showing the competing $2N\ln\sigma$ and $\chi^2$ terms and a minimum]
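A sketch of this recipe on invented data: setting the derivatives of $-2\ln L = 2N\ln\sigma + \chi^2$ to zero gives $A = \bar{X}$ and $\sigma^2 = \sum_i (X_i - A)^2 / N$ (note $N$, not $N-1$), which a scan over trial σ values confirms is the minimum.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(100.0, 10.0, size=20)      # invented data, no error bars

A_hat = x.mean()                          # ML estimate of A
sig_hat = np.sqrt(np.sum((x - A_hat)**2) / len(x))   # ML estimate of sigma

def m2lnL(A, sig):
    # -2 ln L for the constant model with single unknown error sigma
    return 2 * len(x) * np.log(sig) + np.sum((x - A)**2) / sig**2

# The analytic solution beats every trial sigma on a grid
trial = min(m2lnL(A_hat, s) for s in np.linspace(5.0, 20.0, 100))
print(m2lnL(A_hat, sig_hat) <= trial)   # True
```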