Combining p-values
i.e. what happens to the SIGNIFICANCE when the next event comes?
There are two ways:
1) difficult, correct: the Bayesian way
2) easy, approximate: the Frequentist way
Luca Stanco – INFN - Padova
2 December 2010
I assume that everybody knows what the following are:
- p-values
- H0/H1 hypotheses
(otherwise please refer to e.g.
http://pdg.lbl.gov/2010/reviews/rpp2010-rev-statistics.pdf )
As a shortcut:
p-value = probability, under the H0 hypothesis, of its less probable region
1-p = significance of the H1 hypothesis (power 1-β, error of type II)
(only in the case of 1 random variable!)
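As an illustration of the shortcut above, here is a minimal sketch of a p-value for a single counting experiment. It assumes a Poisson background-only H0; the function names and the background value `0.018` are hypothetical, chosen for illustration and not taken from the slides.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of observing exactly n events for a given mean."""
    return mean ** n * math.exp(-mean) / math.factorial(n)

def p_value_counting(n_obs, background):
    """P(N >= n_obs | H0): probability of the region at least as extreme
    as the observation, for a background-only Poisson hypothesis."""
    return 1.0 - sum(poisson_pmf(k, background) for k in range(n_obs))

# Hypothetical inputs: 1 observed candidate, 0.018 expected background events.
p = p_value_counting(1, 0.018)
print(f"p-value = {p:.5f}, 1-p = {1.0 - p:.5f}")
```

With a single random variable the "less probable region" is simply the upper tail of the count distribution, which is why the shortcut only holds in that case.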
1st way
Exercise:
suppose the 2nd event has a p-value similar to that of the 1st one
2.98 sigmas
Of course, with the FISHER rule we neglect any correlation!
Moreover, it is somewhat wrong when the 2 p-values are quite different:
p1 = 0.1
p2 = 0.0001
→ pTOT = 0.00012 > p2
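The numbers above can be checked with a short sketch of Fisher's rule in its standard form: for k independent p-values, -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under H0. The function names are hypothetical, and the two-sided conversion to sigmas is an assumption, since the slides do not state which convention they use. The input 0.0177 for the equal-p-value exercise is also an assumption, borrowed from the 1.77% figure quoted later in the slides.

```python
import math
from statistics import NormalDist

def fisher_combine(p_values):
    """Fisher's rule for independent p-values.

    Uses the closed-form chi-square survival function for an even number
    of degrees of freedom (2k), so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # chi-square statistic / 2
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )

def sigmas_two_sided(p):
    """Convert a p-value to Gaussian sigmas (two-sided convention, assumed)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)

# The two very different p-values from the slide.
p_tot = fisher_combine([0.1, 0.0001])
print(f"pTOT = {p_tot:.6f}")  # larger than p2 = 0.0001

# Exercise with two similar p-values (0.0177 is an assumed input).
z = sigmas_two_sided(fisher_combine([0.0177, 0.0177]))
print(f"combined significance = {z:.2f} sigmas")
```

Under these assumptions the first call gives a combined p-value slightly above p2, as on the slide, and the second lands close to the 2.98 sigmas of the exercise.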
It turns out that the FISHER rule is too conservative in case of two
independent Poissonians, being the lowest limiting p-value:
$$P(x_1, x_2; n) = P(x = x_1 + x_2; n) = \frac{(x_1 + x_2)^n}{n!}\, e^{-(x_1 + x_2)}$$
In the simplest case of no correlation, with 2 candidates
as before, the result is:
3.39 sigmas
This is a simple demonstration that the FISHER rule is CONSERVATIVE and not so good for discrete cases
BUT the final result should be even greater since that probability is:
$$P_{TOT} = P(x; 2+0) + P(x; 1+1) + P(x; 0+2)$$
WHY is the Bayesian way “difficult”?
If we simulate 1 million pseudo-experiments for 1 candidate,
for 2 candidates a priori we should simulate (1 million)^2 = 10^12 !!
Some tricks may be applied:
- Integrating the likelihood over a “normal domain” (simply connected)
- Computing 1-p
- Decoupling variables as much as possible
(this is formally correct)
Then, a Multivariate Likelihood computation is affordable.
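One possible reading of the "decoupling" trick can be sketched with toy pseudo-experiments: if the two channels are independent, each tail probability is estimated with its own set of toys and the joint probability is taken as the product, so the cost grows as 2N instead of N^2. This is an illustrative assumption, not the author's actual procedure; the Poisson mean `0.018`, the number of toys, and all function names are hypothetical.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson count (Knuth's multiplication method, fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def tail_fraction(mean, n_obs, n_toys, rng):
    """Fraction of pseudo-experiments with at least n_obs events."""
    hits = sum(1 for _ in range(n_toys) if sample_poisson(mean, rng) >= n_obs)
    return hits / n_toys

rng = random.Random(2010)   # fixed seed so the sketch is reproducible
MEAN = 0.018                # hypothetical expected background per channel
N_TOYS = 200_000            # toys per channel (not one million, to keep it fast)

p1 = tail_fraction(MEAN, 1, N_TOYS, rng)
p2 = tail_fraction(MEAN, 1, N_TOYS, rng)

# Independence assumed: joint probability is the product of the marginals.
p_joint = p1 * p2
print(f"p1 = {p1:.5f}, p2 = {p2:.5f}, joint = {p_joint:.2e}")
```

The factorisation is only valid when the variables really are independent; any correlation between candidates would require sampling the joint distribution.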
In the example of the simplest OPERA case the correct result is:
3.60 sigmas
[Figure: table of probabilities for the two-candidate case. Values recoverable from the extraction: 96.452%, 98.22%, 1.77%, 1.742%, 1.739%, 0.031%, 0.018%, 0.01%, 0%. Annotation: "Error due to limited exps."]
Backup
Feldman-Cousins is not meaningful in the case of few events (<5)
and more than 1 random variable
The method of T. Junk may be used (Modified Frequentist Technique):
(arXiv:hep-ex/9902006v1 5 Feb 1999)
Valid only for fully independent searches
For example, it is used by D0 for the Higgs search, but:
- CDF uses Bayes
- the two methods agree within 10% on the single channel
and 1% overall
- Tevatron decided to release the official result based
on the CDF/Bayes analysis.