Statistics_and_kosmoshow

Transcript Statistics_and_kosmoshow

Statistical methods in
cosmology
André Tilquin (CPPM) [email protected]
m( z , ms ,  m ,  X , w0 , w1 )  ms  5 log 10 c.DL
k  1  m   X
k  0 
 DL 
1 z
sin(   k J )
 k
k  0 
 DL  (1  z ) J
k  0 
 DL 
z
J 
0
1 z
sinh(  k J )
k
dz '
; H ( z )  (1  z ) 3  m  f ( z ) X  (1  z ) 2  k
H ( z' )
f ( z )  e 3 w1z (1  z ) 3(1 w0  w1 )
Outlook
•
•
General problem
The frequentist statistic
– Likelihood, log-likelihood and 2
– Fisher analysis and limitation
•
The Bayesian statistic
– The Bayes theorem
– Example and interpretation.
•
Summary
General problem(1)
•Let assumes we have N Supernovae at different redshift
(mi , zi ,  mi )i 1, N
and a given model:
How to find the best curve ?
1. We look for the closest curve with
respect to the data points:
2
min
D
 i (m(k , zi )  mi )
2
But the worst measured points should
have less weight.

2
min
 m( k , zi )  mi 

 i 



m
i


2
mth  m(k , z )
General problem(2)
2. The problem now is:
•
Find the k value such that
minimum
2
is
 
0
 k
•
Computed the errors on k starting
from the errors on mi
   f ( m , mi ,  k )
k
i
Statistic is necessary
 m( k , zi )  mi 

  i 


 mi


2
2
Freqentist statistic
Definition: Probability is interpreted as the frequency
of the outcome of a repeatable experiment.
Central limit theorem
If you repeat N times a measurement, and for N, the
measurement distribution (pdf) will be a Gaussian
distribution around the mean value with a half width
equal to the error of your experiment.
Maximum likelihood(=0).
What is the best curve ?
Answer : The most probable curve !
The probability of the theoretical curve is
the product of each individual point to be
around the curve:

mi  m (  k , zi ) 2

n
n
L   p(mi ,m( k , zi ))  
i 1
i 1
1
2  m
e
2
2 m
i
i
We have to maximze L with respect to k 
Because it is simpler to work with sum:
1 n m  m( k , zi ) 
Ln( L)   Ln( 2  mi )  i 1 i
2
 m2 i
i 1
n
We have to minimize
L
0
 k
2
 with respect to k

 
0
 k
Some 2 property
Probability and 2 : By definition
Ln( L  p)  Ln( L0 )  1 / 2  2  L(  2 )  L0e

1
2
2
Definition in matrix form:
 m1 
Si m   .  then    (m  m0 )  V 1 (m  m0 )
 
mn 
Second derivative:
2
2
2 2
(
x

x
)
(
x

x
)




2

0
0
 

2
 2  2
2
2
x
x
x
x
x
Vii1 (   0)  [
2 2 

1


1
Vij  

2  mi m j 
•The first 2 derivative gives le minimum
•The second 2 derivative gives the weight matrix = inverse of the
error matrix independent of the measured data points.
1
 i2
]
Computing errors
When the 2 is defined on measured variables (i.e magnitude),
how to compute the errors on physical parameters, k ?
   m  mth ( )V 1 (m  mth ( )
We perform a Tailor expansion of the 2 around minimum:
n 
 (mi ,  k )  

2
min
(mi , 
min
k
)
n  p 

 k  

min T
k
 2
 k

 k  

min T
k
 min
k
=0
1 2 2
2  k l

k
  min
k
 min
k

   ,  min
k
 p
If the transformation m(k) is linear, then:
The second derivative of the 2 is a symetric positive matrix
0 
0

1
The error on k are Gaussian
  k  k U k  k

2 2

1

 
1
U kl  

2  k l 
 


Computing errors(2)
  i 1
2
Simple case:
n
mi  m( k , zi )2
 m2

n m  m( k , zi )  m( k )
 2i 1 i
 k
 m2 i
 k
i
2
2
1 2 2
n m( k , zi ) 1 m( k , zi )
n mi  m( k , zi )   m( k )
 i 1
 i 1
2
2  k  l
 k
 mi
 l
 m2 i
 k  l
If m(k, zi) is linear
1
ii
V
0
Jacobi
Error on physical parameter are deduced by a simple projection on the k
space parameter (linear approximation) Fisher analysis

U
1
 m( k , zi ) 
1  m( k , zi ) 

 V 





k
k

 i ,k

 i ,k
Independant
of the
measured
points
2
If m(k, zi) is not linear: Fisher is a
1   m( k ) 
1
mi  m( k , zi )V 
  U
good approximation if:
  k  l 
Exemple: variables change
Assume we know errors on m et  , (no correlation). We would like to
compute errors on S=m+ et D=m-:
•We construct the covariance matrix
•We construct the Jacobian:
SD




 m
2

  S  D


2

•We project:
V
1

1
1
J U J
 2 m   2 
V  2
2



 
m
m  )


1 /  2  
0
 m / S  / S 
1 1 
J 
 1 / 2




/

D


/

D
1

1



 m

•We inverse V:
 (
2

1
/

m
U 1  
 0
4 2 m  2 
 2   2
 2   2

m
 2 m   2 
 2
2



  
m
2



S
m

 
   S  D
  (  m   )   2 m   2 
 2   2
 2   2

m
 S  D 

 D2 
 2   2
;  2
    2

m

m


 

m
External constraint or prior
•Problem: Using SN we would like to measured ( m,) knowing
that from the CMB we have :T=m+=1.010.02.
•This measurement is independent from the SN measurement. So
we can add it to 2.
2
o 2
 2  i 1
n
mi  m( m ,   , zi ) 
 m2


i
m
    T
 2

T
All the previous equations are still correct by replacing:
1 /  m21

 0
U 1  
 ..
 0

0
..
..
0
0 1 /  m2 n
0
0
And the Jacobi:

 m1  m( m ,   , z1 ) 



...



 et mi  m( k , zi )  
mn  m( m ,   , zn )



o
1 /  2 T 
  m     T  ( n 1)
0
0
0
 m( k , zi ) 


 k

J 

(



)
m
 


 k ( m , )
 k
Minimisation of 2 and bias
estimate
 2  i 1
n
 2
n mi  m( k , zi )  m( k , zi )
 2i 1
0
2
k
 mi
k
mi  m( k , zi )2
 m2
i
We Tailor expand the 2 around ko:
2


 2 (k )   2 (ok ) 
 k
2 2
1

k  ok   k  ok  2 2
k
o
We apply the minimum condition in k
k
 2 ( k )
1 2 2
0
 k
2  2k
1  2
k    
2  k

 ok
V 1 (0k )  J   U 1 ( mi )  J
o
k

k
 ok 
ok

 ok


J   U 1 ( mi )  mi  m(ok , zi ) ( n )
We get the first order iterative équation:

k
 ok

( p)


 ( J TU 1 J ) 1  J   U 1  mi  m(ok , zi ) ( n )
If theoritical model is linear, this equation is exact (no iteration)
Non-linéarity
•If m(k) is linear in k then:
•If errors on mi are Gaussian then errors on k will be
12  m  m0 ( )V 1 (m  m0 ( )
• 2(k) is exactly a quadratic form

 
0 
k
0
k

  k   U k  
• The covariance matrix is positive and symetric
2
2
1

U 1
 m( k , zi ) 
1  m( k , zi ) 

 V 

 k
 k

 i ,k

 i ,k
•Fisher analysis is rigorously exact.
•On the contrary only 12 is rigorously exact: Fisher matrix is a linear
approximation.
The only valid properties are:
  
 
12


Best fit is given by
 0 =>      

1
2
 k
The « s » sigma error is given by
o
i
2
 k i
solving: 12 =
i

2

2

ok
 k
 ok
2
+s
min
Non linearity: example
Evolution of 12- min2 : SNAP simlation, flatness at 1 %
2=min2+1
Fisher
analysis
Secondary
minimum
asymetric error
-
w0  1.00 00..38
13
+
Rem:This secondary minimum is
highly due to non linearity.
w0  0.8800..26
25
Non Gaussianity
When errors on observables are not Gaussian, only the minimum
iterative equation can be use. So go back to the definition:
“Probability is interpreted as the frequency of the outcome of a
repeatable experiment” and do simulation: Gedanken experiments
a) Determine the cosmological model {k0 } by looking for the best fit
parameters on data. This set of parameters is assumed to be the true
cosmology.
b) Compute the expected observables and randomize them inside the
experimental errors, taking into account non Gaussianity. Do the same thing
with prior.
c) For each “virtual” experiment, compute the new minimum to get a new set of
cosmological parameters ki
d) Simulate as many virtual experiments as you can
e) The distributions of these “best fit” value {ki} give the errors:
• The error matrix is given by second order moments:
Ui,j={<ij> - <i> <j>} positive define
• The error on errors scale as σ(σ) ~σ/√2N
Bayesian statistic
or
The complexity of interpretation
1702-1761 (paper only published in 1764)
Bayesian theorem.
measurement
Posterior to measurement
Posterior 
Prior to measurement
Likelihood * prior
marginal Likelihood
Normalization factor=Evidence
Normalization factor is the sum over all possible posteriors to ensure
unitarily of probability.
p(M  E i ) * p ( Ei  M )
p(E i  M) 
 j p(M  E j ) * p( E j  M )
Where > means after and < means before.
Example
• Question: Suppose you have been tested
positive for a disease; what is the probability that
you actually have the disease?
-Efficiency of the test:
-Disease is rare:
L(T   D  )  L(T   D  )  95%
L(T   D  )  L(T   D  )  5%
p( D   T )  1%
p( D   T )  99%
What is the Bayesian probability?
L(T   D  ) * p( D   T )
p( D  T ) 
L(T   D  ) * p( D   T )  L(T   D  ) * p( D   T )
0.95 * 0.01
p( D   T  ) 
 16%
0.95 * 0.01  0.05 * 0.99


Why a so small Bayesian probability (16%) compare to Likelihood
probability of 95%? Which method is wrong?
Intuitive argument.
• Over 100 people, the doctor expect 1 people has a disease and 99
have no disease. If doctor makes test to all people:
– 1 has a disease and will probably have a positive test
– 5 will have a positive test while they have no disease
6 positive tests for only 1 true disease
So the probability for a patient to have a disease when the test is positive is
1/6~16%
=>Likelihood is wrong?
• In the previous argument doctor used the whole population to
compute its probability:
– 1% disease and 99% not disease before the test
– He have assumed that the patient is a random guy. The patient state before
the measurement is a superposition of 2 states
• /patient> = 0.01*/desease>+0.99*/healthy>
• But what about yourself before the test?
• /you> = /disease> or /healthy> but not both state in the same time
/you>  /patient>
=>Bayesian is wrong?
Which statistic is correct? Both!
• But they do not answer to the same question!:
– Frequentist: If “my” test is positive what is the
probability for me to have a disease? 95%
– Bayesian: If “one of the” patient has a positive test
what is the probability for this patient to have a
disease? 16%
• Different questions give different answers!
Conclusion: In Bayesian statistic, the most important is the prior
because it can change the question!
It’s the reason why statistician like Bayesian statistic, because
just playing with prior can solve a lot of different problems.
On the contrary, in Scientific works, we should care about prior
and interpretation of the Bayesian probability.
Summary
• Both statistics are used in cosmology and give similar
results if no or week priors are used.
• The frequentist statistic is very simple to use for
gaussian errors and rather linear model. Errors can
easily be computed using Fisher analysis.
• Bayesian statistic might be the only method to solve very
complex problem. But warning about probability
interpretation!
• For complex problem, only simulation can be used and
are lot of computing time.
• When using priors in both cases a careful analysis of
results should be done
References
• http://pdg.lbl.gov/2009/reviews/rpp2009-revstatistics.pdf
• http://www.inference.phy.cam.ac.uk/mackay/itila/
• http://ipsur.r-forge.r-project.org/
• http://www.nu.to.infn.it/Statistics/
• http://en.wikipedia.org/wiki/F-distribution
• http://www.danielsoper.com/statcalc/calc07.aspx
• If you have any question or problem, send me an
e-mail: [email protected]
Kosmoshow: cosmology in one click
Actions: Choose different probes
Kosmosfit: fitting and error computing
Main table: SN definition
Fitting options
Prior definition.
Your cosmology
used for simulation
Predefines survey:
Data or simulation
Define different
dark energy
Manage files
parameterization
and load
predefine survey
Parameters to be fitted
Click icon and
then in any
place for help
Server : 166.111.26.237
• Connect to the server:
– User: student
– Pwd: thcaWorkshop
• Create a directory with your name:
– mkdir tilquin
•
•
•
•
source /cosmosoft/environment/idl-env.sh
Go to your work directory
cp /home/tilquin/kosmoshow/*.* .
idl
kosmoshowsc
Or download from http://marwww.in2p3.fr/~tilquin/
Non-linearity: computing
errors
If m(k) is not linear, errors on k is not gaussian. Fisher analysis is
no more correct. If one use it, results should be verify a posteriori.
To estimated the errors we should come back to the first definition of
the 12 and solve the equation 12 = 2min +1
12 ( M ,  )  m  m0 ( M ,  )V 1 (m  m0 ( M ,  )
If we want to estimate () what about M ? How to take care of
correlation ? How to marginalize over M ?
•Average answer (simulation)
•Most probable answer (data)
1
 ( )   12 ( M ,  ) p( 12 ( M ,  )d M
2
1
0
12 (  )  min 12 ( M ,   ),  M 
It can be shown than both methods are equivalent for simulation
if simulated point are not randomized. mmes = mth
Bayesian evidence
• Bayes forecasts
•
method:
•
define experiment configuration and models
•
simulate data D for all fiducial parameters
•
compute evidence (using the data from b)
•
plot evidence ratio B01 = E(M0)/E(M1)
•
limits: plot contours of iso-evidence ratio
•
ln(B01) = 0 (equal probability)
•
ln(B01) = -2.5 (1:12 ~ substantial)
•
ln(B01) = -5 (1:150 ~ strong)
•
Computationally intensive: need to calc. 100s of
• evidences
Graphical interpretation (contour)
2
 s 2 define an iso-probability ellipse.
•The equation:  2   min
U
2
1


1 /  2 m

 0
(0)


m
m

2
m


1 /  2  
  
2
0
V 1  J U 1 J
(0)






2

m  


2
1
( 0)
( 0)
( 0)
( 0)
2  m     ( m    )
1  m     ( m    )
 
V 
1
( 0)
( 0) 
( 0)
(0) 
  m     (  m    )    m     ( m    ) 
m  
m  
2
 2   min
1
 2   2
m
 (0 )

2
 2   min
1


(0)
m


(0)

39%
39%

 (m0 )
68%
m
 2   2
m

m
 X Y
tg (2 )  2  2
 X   Y2
-/4
 (m0 )   (0 )


m  
Systematic errors
Définition: Systematic error is all that is not statistic.
Statistic: If we repeat « n » the measurement of the quantity Q
with a statistical error Q, the average value <Q>
tends to the true value Q0 with an error Q/n.
Systematic:Whatever is the number of experiments, <Q> will
never tends to Q0 better than the systematic error S.
How to deal with:
If systematic effect is measurable, we correct it, by calculating
<Q-Q> with the error Q’2= Q2+ Q2
If not, we add the error matrices: V’ = Vstat+Vsyst and we
use the general formalism.
Challenge:The systematic error should be less than the statistical error. If
not, just stop the experiment, because they won !!!!
Error on the z parameter
SNAP mesure mi et zi with errors
m et z. Redshift is used as a
paremeter on the theoritical model
and its error is not on the 2.
mi  m( m ,   , zi ) 2
 2  i 1
n
 m2
i
m
But the error on z leads to an
error on m(m,,zi)

(z)
m
m(k , z )

 zi
z
zi
Thus, the error on the difference
mi-mth is:
 m(T )   m2
i
i
 m( k , z )


 zi 


z
zi


2
z

Statistics_and_kosmoshow

Transcript Statistics_and_kosmoshow

Directory