Transcript ada09

Sample variance and sample error
• We learned recently how to determine the
sample variance using the sample mean.
• How do we translate this to an unbiased
estimate of the error on a single point?
• We can’t just take the square root! This would introduce a bias:
$$\left\langle \sqrt{S^2} \right\rangle \neq \sigma \quad\text{even though}\quad \left\langle S^2 \right\rangle = \sigma^2 .$$
[Figure: the concave curve $Y = \sqrt{X}$, with $S^2$ and $\sqrt{S^2}$ marked on the axes.]
Mean and variance of S²
• Like any other statistic, S² has its own mean
and variance.
• Need to know these to compute bias in S:
$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i-\bar{X}\right)^2 = \frac{\sigma^2}{N-1}\,\chi^2_{N-1}$$

$$\left\langle S^2\right\rangle = \frac{\sigma^2}{N-1}\left\langle \chi^2_{N-1}\right\rangle = \frac{\sigma^2}{N-1}\,(N-1) = \sigma^2$$

$$\mathrm{Var}(S^2) = \left(\frac{\sigma^2}{N-1}\right)^{\!2} \mathrm{Var}\left(\chi^2_{N-1}\right) = \frac{\sigma^4}{(N-1)^2}\cdot 2(N-1) = \frac{2\sigma^4}{N-1}$$
Bias in sqrt(S²)
• Define square root function g:
$$g(S^2) = \sqrt{S^2} = S$$
• g(x) and its derivatives:
$$g(x) = x^{1/2}, \qquad g'(x) = \tfrac{1}{2}\,x^{-1/2}, \qquad g''(x) = -\tfrac{1}{4}\,x^{-3/2}$$
• Hence compute the bias by Taylor-expanding g about ⟨S²⟩ = σ²:
$$\langle S \rangle = \left\langle g(S^2) \right\rangle \approx g(\sigma^2) + \frac{g''(\sigma^2)}{2}\,\mathrm{Var}(S^2) + \ldots$$
$$= \sigma - \frac{1}{8}\,\sigma^{-3}\cdot\frac{2\sigma^4}{N-1} + \ldots = \sigma\left(1 - \frac{1}{4(N-1)} + \ldots\right)$$
Unbiased estimator for σ
• Re-define bias-corrected estimator for σ:
$$\hat{\sigma} = \frac{S}{1 - \dfrac{1}{4(N-1)}} \approx \left[\frac{\sum_i \left(X_i - \bar{X}\right)^2}{N - \tfrac{3}{2}}\right]^{1/2}$$
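The bias formula from the previous slide and the N − 3/2 correction can likewise be checked numerically; a sketch under the same assumptions (Gaussian data, deliberately small N so the bias is visible):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials, sigma = 5, 500_000, 1.0

samples = rng.normal(0.0, sigma, size=(trials, N))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

S      = np.sqrt(ss / (N - 1))    # naive estimator sqrt(S^2), biased low
S_corr = np.sqrt(ss / (N - 1.5))  # bias-corrected denominator N - 3/2

print("naive <S>:     ", S.mean())
print("predicted bias:", sigma * (1 - 1 / (4 * (N - 1))))
print("corrected <S>: ", S_corr.mean(), "(target:", sigma, ")")
```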
Conditional probabilities
• Consider 2 random variables X and Y with a
joint p.d.f. P(X,Y) that looks like:
[Figure: contour map of a joint p.d.f. P(X,Y), with the projections P(X) and P(Y) and the conditional slices P(X|Y) and P(Y|X) indicated.]
• To get P(X) or P(Y), project P(X,Y) on to the X
or Y axis and normalise.
• Can also determine P(X|Y) (“probability of X
given Y”), which is a normalised slice through
P(X,Y) at a fixed value of Y, or vice versa.
• At any point along each slice, can get P(X,Y) from:
$$P(X,Y) = P(X\,|\,Y)\,P(Y) = P(Y\,|\,X)\,P(X)$$
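A sketch of these operations on a discretised joint p.d.f. (I assume a hypothetical correlated 2-D Gaussian for P(X,Y); the grid and correlation coefficient are placeholders, not from the lecture):

```python
import numpy as np

# Hypothetical joint p.d.f.: a correlated 2-D Gaussian evaluated on a grid.
x = np.linspace(-4.0, 4.0, 201)
y = np.linspace(-4.0, 4.0, 201)
X, Y = np.meshgrid(x, y, indexing="ij")
rho = 0.7
P_XY = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
P_XY /= np.trapz(np.trapz(P_XY, y, axis=1), x)     # normalise over the grid

# P(X): project (integrate) P(X,Y) on to the X axis.
P_X = np.trapz(P_XY, y, axis=1)

# P(X|Y=1): a slice through P(X,Y) at fixed Y, re-normalised.
iy = np.argmin(np.abs(y - 1.0))
P_X_given_Y = P_XY[:, iy] / np.trapz(P_XY[:, iy], x)
```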
Bayes’ Theorem and Bayesian inference
• Bayes’ Theorem:
$$P(Y\,|\,X) = \frac{P(X\,|\,Y)\,P(Y)}{P(X)}$$
• This leads to the method of Bayesian
inference:
$$P(\mathrm{model}\,|\,\mathrm{data}) = \frac{P(\mathrm{data}\,|\,\mathrm{model})\,P(\mathrm{model})}{P(\mathrm{data})}$$
$$\mathrm{Inference} \propto \mathrm{Evidence} \times \mathrm{Prior}$$
• We can determine the evidence P(data|model)
using goodness-of-fit statistics.
• We can often determine P(model) using prior
knowledge about the models.
• This allows us to make inferences about the
relative probabilities of different models, given
the data.
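As a minimal numeric sketch of this (two hypothetical models with made-up evidences and priors; P(data) is just the normalising sum over models):

```python
# Hypothetical evidences P(data|model) and priors P(model) for two models.
evidence = {"m1": 0.03, "m2": 0.01}
prior    = {"m1": 0.50, "m2": 0.50}

# P(data) normalises the posterior: sum of evidence * prior over all models.
p_data = sum(evidence[m] * prior[m] for m in evidence)

posterior = {m: evidence[m] * prior[m] / p_data for m in evidence}
print(posterior)   # {'m1': 0.75, 'm2': 0.25}
```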
Choice of prior
• Suppose our model of a set of data X is
controlled by a parameter α.
• Our knowledge about α before X is measured
is quantified by the prior p.d.f. P(α).
• Choice of P(α) is arbitrary, subject to
common sense!
• After measuring X we get the posterior p.d.f.
P(α|X) ∝ P(X|α)·P(α).
• Different priors P(α) lead to different
inferences P(α|X)!
[Figure: the likelihood P(X|α) plotted against the data X.]
Examples
• Suppose α is the Doppler shift of a star.
Adopting a search range −200 < α < 200 km/sec
in uniform velocity increments implicitly
assumes a uniform prior.
• Alternatively, suppose α scales an emission-line
profile of known shape. If you know α ≥ 0, can
force α > 0 by constructing the p.d.f. in uniform
increments of log α, so that P(α) ∝ 1/α.
• Posterior distributions are skewed differently
according to choice of prior (see the sketch below).
[Figure: the likelihood P(X|α) combined with a uniform prior P(α) and with P(α) ∝ 1/α; the resulting posteriors P(α|X) are skewed differently.]
Relative probabilities of models
• Two models m1, m2
• Relative probabilities depend on
– Ratio of prior probabilities
– Relative ability to fit data
$$\frac{P(m_1\,|\,X)}{P(m_2\,|\,X)} = \frac{P(X\,|\,m_1)}{P(X\,|\,m_2)} \times \frac{P(m_1)}{P(m_2)}$$
• Note that P(data) cancels.
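Continuing the toy numbers from the earlier sketch, the odds ratio needs no normalisation precisely because P(data) cancels:

```python
# Posterior odds of two hypothetical models: likelihood ratio times prior ratio.
def posterior_odds(L1, L2, p1=0.5, p2=0.5):
    return (L1 / L2) * (p1 / p2)

print(posterior_odds(0.03, 0.01))   # 3.0, matching 0.75 / 0.25 above
```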
Maximum likelihood fits
• Suppose we try to fit a spectral line +
continuum using a set of data points Xi, i=1...N
• Suppose our model is:
$$X_i = C + A\,\exp\!\left[-\frac{1}{2}\left(\frac{\lambda_i - \lambda_0}{\Delta}\right)^2\right], \qquad \mathrm{Var}(X_i) = \sigma_i^2$$
• Parameters are C, A, λ₀, Δ.
• The λᵢ and σᵢ are assumed known.
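In code, this model is a one-liner (a sketch; the function and variable names are my own):

```python
import numpy as np

def line_plus_continuum(lam, C, A, lam0, delta):
    """Flat continuum C plus a Gaussian line of amplitude A,
    centre lam0 and width delta, evaluated at wavelengths lam."""
    return C + A * np.exp(-0.5 * ((lam - lam0) / delta) ** 2)
```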
Likelihood of a model
• Likelihood of a particular set θ of model
parameters (i.e. probability of getting this set of
data given model θ) is:
$$L(\theta) = P(X\,|\,\theta) = P(X_1\,|\,\theta)\times P(X_2\,|\,\theta)\times\cdots\times P(X_N\,|\,\theta) = \prod_{i=1}^{N} P(X_i\,|\,\theta)$$
• If errors are Gaussian, then:
$$P(X_i\,|\,\theta) = \frac{1}{\sigma_i\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$$
where μᵢ is the model prediction for the i-th data point, so that
$$L(\theta) = (2\pi)^{-N/2}\left(\prod_{i=1}^{N}\frac{1}{\sigma_i}\right) e^{-\chi^2/2}$$
• To maximise L(θ), must minimise χ² + 2Σᵢ ln σᵢ, since
$$-2\ln L(\theta) = \chi^2 + 2\sum_i \ln\sigma_i + \mathrm{constant.}$$
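Putting the pieces together, a minimal maximum-likelihood fit might look like this (a sketch using scipy.optimize.minimize on simulated data; all numbers are invented, and since the σᵢ are fixed here the 2Σ ln σᵢ term is constant, so the fit reduces to χ² minimisation):

```python
import numpy as np
from scipy.optimize import minimize

def line_plus_continuum(lam, C, A, lam0, delta):
    return C + A * np.exp(-0.5 * ((lam - lam0) / delta) ** 2)

def neg2lnL(theta, lam, X, sig):
    C, A, lam0, delta = theta
    chi2 = np.sum(((X - line_plus_continuum(lam, C, A, lam0, delta)) / sig) ** 2)
    return chi2 + 2 * np.sum(np.log(sig))   # constant term kept for clarity

# Simulated data: a noisy Gaussian line on a flat continuum.
rng = np.random.default_rng(0)
lam = np.linspace(400.0, 420.0, 100)
sig = np.full_like(lam, 2.0)
X = line_plus_continuum(lam, 10.0, 25.0, 410.0, 1.5) + rng.normal(0.0, sig)

fit = minimize(neg2lnL, x0=[8.0, 20.0, 409.0, 2.0], args=(lam, X, sig))
print(fit.x)   # best-fit (C, A, lam0, delta)
```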
Estimating σ
• Data points Xᵢ with no errors:
$$\chi^2 = \sum_{i=1}^{N}\left(\frac{X_i - A}{\sigma}\right)^2$$
• To find A, minimise χ².
[Figure: simulated data points Xᵢ plotted against i, scattered about a constant level A.]
• Can’t use χ² minimisation to estimate σ
because:
$$\chi^2 \to 0 \ \text{ as } \ \sigma \to \infty$$
• Instead, minimise
$$-2\ln L = \chi^2 + 2N\ln\sigma$$
[Figure: −2 ln L versus σ, showing the falling χ² term, the rising 2N ln σ term, and the minimum of their sum.]
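A closing sketch of this last point (invented constant-level data; differentiating χ² + 2N ln σ with respect to σ gives the analytic minimum σ̂² = Σ(Xᵢ − Â)²/N):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(100.0, 10.0, size=500)   # invented data: A = 100, sigma = 10

A_hat = X.mean()                        # minimises chi^2 for any fixed sigma
resid2 = np.sum((X - A_hat) ** 2)

sigmas = np.linspace(1.0, 30.0, 1000)
neg2lnL = resid2 / sigmas**2 + 2 * len(X) * np.log(sigmas)

print("grid minimum:    ", sigmas[np.argmin(neg2lnL)])
print("analytic minimum:", np.sqrt(resid2 / len(X)))   # sigma_hat = sqrt(sum/N)
```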