
Statistical methods
in LHC data analysis
part II.1
Luca Lista
INFN Napoli
Contents
• Bayes theorem
• Bayesian probability
• Bayesian inference
Conditional probability
• Probability that the event A occurs given
that B also occurs:
P(A|B) = P(A ∩ B) / P(B)
[Figure: events A and B]
Bayes theorem
Thomas Bayes (1702-1761)
P(A|B) = P(B|A) P(A) / P(B)
• P(A) = prior probability
• P(A|B) = posterior probability
Pictorial view of Bayes theorem (I)
[Figure: pictorial definitions of P(A), P(B), P(A|B) and P(B|A) in terms of the regions A and B. From a drawing by B. Cousins]
Pictorial view of Bayes theorem (II)
P(A|B) P(B) = P(A ∩ B)
P(B|A) P(A) = P(A ∩ B)
[Figure: both products shown pictorially in terms of the regions A and B]
A concrete example
• A person received a diagnosis of a serious
illness (say H1N1, or worse…)
• The probability of a positive test result for an ill
person is ~100%
• The probability of a positive test result for a
healthy person is 0.2%
• What is the probability that the person is
really ill? Is 99.8% a reasonable answer?
G. Cowan, Statistical data analysis 1998,
G. D'Agostini, CERN Academic Training, 2005
Conditional probability
• Probability to be really ill =
conditioned probability after the event of the positive
diagnosis
– P(+ | ill) ≈ 100%, P(- | ill) << 1
– P(+ | healthy) = 0.2%, P(- | healthy) = 99.8%
• Using Bayes theorem:
– P(ill | +) = P(+ | ill) P(ill) / P(+) ≈ P(ill) / P(+)
• We need to know:
– P(ill) = probability that a random person is ill (<< P(healthy))
• And we have:
– Using: P(ill) + P(healthy) = 1 and P(ill and healthy) = 0
– P(+) = P(+ | ill) P(ill) + P(+ | healthy) P(healthy)
≈ P(ill) + P(+ | healthy)
Pictorial view
[Figure: the populations P(ill) and P(healthy) ≈ 1, each split by test outcome: P(+|ill) ≈ 1, P(+|healthy) and P(-|healthy)]
Pictorial view
[Figure: the populations P(ill) and P(healthy) ≈ 1, split by test outcome into P(+|ill) ≈ 1, P(+|healthy) and P(-|healthy); the positive outcomes are divided into P(ill|+) and P(healthy|+)]
P(ill|+) + P(healthy|+) = 1
Adding some numbers
• Probability of being really ill:
– P(ill | +) ≈ P(ill) / P(+)
≈ P(ill) / (P(ill) + P(+ | healthy))
• If:
– P(ill) = 0.17%, P(+ | healthy) = 0.2%
• We have:
– P(ill | +) = 17 / (17 + 20) = 46%
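The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior_ill` is hypothetical, and P(+ | ill) is set to exactly 1, which is the approximation made in the slides.

```python
def posterior_ill(p_ill, p_pos_given_ill, p_pos_given_healthy):
    """P(ill | +) from Bayes theorem with two exclusive, exhaustive hypotheses."""
    p_healthy = 1.0 - p_ill
    # Total probability of a positive result
    p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * p_healthy
    return p_pos_given_ill * p_ill / p_pos

# Numbers from the slide; P(+ | ill) = 1 is the slide's approximation
exact = posterior_ill(0.0017, 1.0, 0.002)
# Further approximation used on the slide: P(healthy) ≈ 1
approx = 0.0017 / (0.0017 + 0.002)
print(f"without P(healthy) ≈ 1: {exact:.3f}, with it: {approx:.3f}")
```

Both values round to 46%, far from the naive 99.8%, because the small prior P(ill) is comparable to the false-positive rate.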
A more physics example
• A muon selection has:
– Efficiency for the signal: ε = P(sel | μ)
– Efficiency for background: δ = P(sel | π)
• Given a collection of particles, what is the fraction of
muons among the selected particles?
• Can’t answer, unless you know the fraction of muons:
P(μ) (and P(π) = 1 - P(μ))!
• So: P(μ | sel) = P(sel | μ) P(μ) / P(sel)
• Or: P(μ | sel) = ε P(μ) / (ε P(μ) + δ P(π))
Prob. ratios and prob. inversion
• Another convenient way to re-state the
Bayes posterior is through ratios:
P(A1 | B) / P(A2 | B) = [P(B | A1) / P(B | A2)] × [P(A1) / P(A2)]
• No need to consider all possible
hypotheses (not known in all cases)
• Clear how the ratio of priors plays a role
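A minimal sketch of the ratio form of Bayes theorem (posterior odds = likelihood ratio × prior odds), reusing the diagnosis numbers from the earlier slides. The function name `posterior_odds` is hypothetical, and P(+ | ill) = 1 is the same approximation used there.

```python
def posterior_odds(prior_a, prior_b, like_a, like_b):
    """Posterior odds P(A|obs) / P(B|obs) = likelihood ratio * prior odds."""
    return (like_a / like_b) * (prior_a / prior_b)

# A = ill, B = healthy, observation = positive test
odds = posterior_odds(prior_a=0.0017, prior_b=0.9983, like_a=1.0, like_b=0.002)

# With only two exclusive, exhaustive hypotheses the odds convert back to a probability
p_ill = odds / (1.0 + odds)
print(f"posterior odds = {odds:.3f}, P(ill | +) = {p_ill:.3f}")
```

The normalization P(+) never appears: it cancels in the ratio, which is why this form does not require enumerating all possible hypotheses.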
Bayesian probability as learning
• Before the observation B, our degree of belief of A is P(A) (prior
probability)
• After observing B, our degree of belief changes into P(A | B) (posterior
probability)
• Probability can be expressed also as a property of non-random
variables
– E.g.: unknown parameter, unknown events
• Easy approach to extend knowledge with subsequent observation
– E.g. combine experiment = multiply probabilities
• Easy to cope with numerical problems
• Consider P(B) as a normalization factor:
P(B) = Σi P(B | Ai) P(Ai), if Σi P(Ai) = 1 and Ai ∩ Aj = ∅ for i ≠ j
Bayes and likelihood function
• Likelihood function definition: a PDF of the variables x1, …, xn:
L(x1, …, xn; θ1, …, θm)
• Bayesian posterior probability for θ1, …, θm:
P(θ1, …, θm | x1, …, xn) = L(x1, …, xn; θ1, …, θm) P(θ1, …, θm) / ∫ L(x1, …, xn; θ1, …, θm) P(θ1, …, θm) d^mθ
• Where:
– P(θ1, …, θm) is the prior probability.
• Often assumed to be flat in HEP papers, but there is no motivation for this choice (and flat
distribution depends on the parameterization!)
– ∫ L(…)P(…) d^mθ is a normalization factor
• Interpretation:
– The observation modifies the prior knowledge of the unknown parameters
as if L is a probability distribution function for θ1, …, θm
– F.James et al.: “The difference between P(θ) and P(θ | x) shows how one’s
knowledge (degree of belief) about θ has been modified by the observation x. The
distribution P(θ | x) summarizes all one’s knowledge of θ and can be used
accordingly.”
Repeated use of Bayes theorem
• Bayes theorem can be applied sequentially for repeated
observations (posterior = learning from experiments)
P0 = Prior
observation 1 → P1 ∝ P0 × L1 (conditioned posterior 1)
observation 2 → P2 ∝ P1 × L2 ∝ P0 × L1 × L2 (conditioned posterior 2)
observation 3 → P3 ∝ P0 × L1 × L2 × L3 (conditioned posterior 3)
Note that applying Bayes theorem directly
from prior to (obs1 + obs2) leads to the
same result:
P1+2 ∝ P0 × L1+2 = P0 × L1 × L2 ∝ P2
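The equivalence of sequential and direct updating can be illustrated numerically on a grid. This is a sketch under stated assumptions: the parameter `theta` is a hypothetical binomial success probability, the prior is taken flat, and the two observations (3 successes in 10 trials, then 7 in 20) are invented for illustration.

```python
def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def update(prior, likelihood):
    """One Bayes step on a grid: posterior ∝ prior × likelihood."""
    return normalize([p * l for p, l in zip(prior, likelihood)])

# Grid of values for the hypothetical parameter theta
thetas = [(i + 0.5) / 100 for i in range(100)]
p0 = normalize([1.0] * len(thetas))  # flat prior (an assumption)

def binomial_shape(k, n):
    # Likelihood up to a constant; the binomial coefficient cancels on normalization
    return [t**k * (1 - t) ** (n - k) for t in thetas]

l1, l2 = binomial_shape(3, 10), binomial_shape(7, 20)

p2_sequential = update(update(p0, l1), l2)               # P0 -> P1 -> P2
p2_direct = update(p0, [a * b for a, b in zip(l1, l2)])  # P0 -> P(1+2)

max_diff = max(abs(a - b) for a, b in zip(p2_sequential, p2_direct))
print(f"largest difference between the two posteriors: {max_diff:.2e}")
```

The two posteriors agree up to floating-point rounding, since the intermediate normalization of P1 is only a constant factor.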
Bayesian in decision theory
• You need to decide to take some action after you
have computed your degree of belief
– E.g.: make a public announcement of a discovery or not
• What is the best decision?
• The answer also depends on the (subjective) cost of
the two possible errors:
– Announce a wrong answer
– Don’t announce a discovery (and be anticipated by a
competitor!)
• Bayesian approach fits well with decision theory,
which requires two subjective inputs:
– Prior degree of belief
– Cost of outcomes
Falsifiability within statistics
• With Aristotle’s or “Boolean” logic, if a
cause A forbids the observation of the
effect B, observing the effect B implies
that A is false
• Naively migrating to random possible
events (Bi) with different (uncertain!)
hypotheses (Aj) would lead to:
– Observing an event Bi that
has very low probability,
given a cause Aj, implies
that Aj is very unlikely
False!!!!
Detection of paranormal phenomena
• A person claims he has Extrasensory Perception
(ESP)
• He can “predict” the outcome of card extraction with
much higher success rate than random guess
• What is the (Bayesian) probability he really has ESP?
Simpleton, ready to believe!
• If (prior) P(ESP) ≈ P(!ESP) ≈ 0.5
– ⇒ P(ESP|predict) ≈ 1 (posterior)
– A single experiment demonstrates ESP!
[Figure: P(ESP) and P(!ESP), with P(predict|ESP) ≈ 1 and P(predict|!ESP) << 1]
With a skeptical prior prejudice
• If (prior) P(ESP) << P(!ESP)
– ⇒ P(ESP|predict) < 0.5 (at least uncertain!)
– More experiments? More hypotheses?
[Figure: P(ESP) << P(!ESP), with P(predict|ESP) ≈ 1 and P(predict|!ESP) << 1]
Maybe he is cheating?
• How likely is cheating? Assume: P(ESP) << P(cheat)
– ⇒ P(ESP|predict) ≈ 0 (cheating more likely!)
– The ESP guy should now propose alternative hypotheses!
[Figure: P(ESP), P(cheat) and P(no ESP, not cheat), with P(predict|ESP) ≈ P(predict|cheat) ≈ 1 and P(predict|!ESP) << 1]
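A minimal sketch of the three-hypothesis case. All numbers are illustrative assumptions, not taken from the slides; they only encode P(ESP) << P(cheat), P(predict|ESP) ≈ P(predict|cheat) ≈ 1 and a small probability of guessing correctly by chance. The function name `posteriors` is hypothetical.

```python
def posteriors(priors, likelihoods):
    """Posterior of each hypothesis: prior × likelihood, normalized over all hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# Illustrative priors with P(ESP) << P(cheat); they sum to 1
priors = {"ESP": 1e-6, "cheat": 1e-2, "neither": 1.0 - 1e-6 - 1e-2}
# P(predict | hypothesis): near-certain for ESP or cheating, rare by chance
likelihoods = {"ESP": 1.0, "cheat": 1.0, "neither": 1e-4}

post = posteriors(priors, likelihoods)
for hypothesis, p in post.items():
    print(f"P({hypothesis} | predict) = {p:.6f}")
```

Since ESP and cheating predict the observation equally well, their posterior ratio equals their prior ratio, so the successful prediction cannot favor ESP over cheating.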
Ascertain physics observations
• Are those evidence for the pentaquark Θ+(1520) → K0 p?
• Influenced by previous evidence papers?
• Are there other possible interpretations?
[Figure: plots from arXiv:hep-ex/0509033v3, 10σ significance]
Pentaquarks
• From PDG 2006, “PENTAQUARK UPDATE” (G.Trilling, LBNL)
• “In 2003, the field of baryon spectroscopy was almost revolutionized
by experimental evidence for the existence of baryon states constructed
from five quarks …
…To summarize, with the exception described in the previous
paragraph, there has not been a high-statistics confirmation of any of
the original experiments that claimed to see the Θ+; there have been
two high-statistics repeats from Jefferson Lab that have clearly shown
the original positive claims in those two cases to be wrong; there have
been a number of other high-statistics experiments, none of which have
found any evidence for the Θ+; and all attempts to confirm the two
other claimed pentaquark states have led to negative results.
The conclusion that pentaquarks in general, and the Θ+, in
particular, do not exist, appears compelling.”
Dark matter search
• Are those observations of Dark matter?
Nature 456, 362-365
Eur.Phys.J.C56:333-355,2008
B. & F. in the scientific process
[Diagram: a cycle connecting the following steps]
Experiment → Observation of new phenomenon → How likely is the interpretation?
– Bayesian probabilistic interpretation of the new phenomenon:
what is the probability that the interpretation is correct?
– Strong skeptical prejudice motivates confirmation:
repeat the experiment and find other evidence
(⇒ run into the frequentistic domain!)
• Bayesian and Frequentistic approaches have
complementary roles in this process
How to compute Posterior PDF
• Perform analytical integration
– Feasible in very few cases
• Use numerical integration (RooStats::BayesianCalculator)
– May be CPU intensive
• Markov Chain Monte Carlo (RooStats::MCMCCalculator)
– Sampling parameter space efficiently using a random walk
heading to the regions of higher probability
– Metropolis algorithm to sample according to a PDF f(x)
1. Start from a random point, xi, in the parameter space
2. Generate a proposal point xp in the vicinity of xi
3. If f(xp) > f(xi) accept as next point xi+1 = xp
else, accept only with probability p = f(xp) / f(xi);
if the proposal is rejected, keep xi+1 = xi
4. Repeat from point 2
– Convergence criteria and step size
must be defined
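The Metropolis steps listed above can be sketched in plain Python. This is not the RooStats implementation: the Gaussian random-walk proposal, the step size, the burn-in length and the standard-Gaussian target are all illustrative assumptions, and `metropolis` is a hypothetical name.

```python
import math
import random

def metropolis(f, x0, step, n_samples, rng):
    """Sample from a density proportional to f using a Gaussian random-walk proposal."""
    x, fx = x0, f(x0)
    chain = []
    for _ in range(n_samples):
        xp = x + rng.gauss(0.0, step)  # proposal in the vicinity of x
        fxp = f(xp)
        # Accept if f does not decrease, otherwise with probability f(xp) / f(x)
        if fxp >= fx or rng.random() < fxp / fx:
            x, fx = xp, fxp
        chain.append(x)  # a rejected proposal repeats the current point
    return chain

def target(x):
    # Unnormalized standard Gaussian: the normalization is never needed
    return math.exp(-0.5 * x * x)

rng = random.Random(12345)
chain = metropolis(target, x0=0.0, step=1.0, n_samples=50000, rng=rng)
burned = chain[5000:]  # discard an arbitrary burn-in period
mean = sum(burned) / len(burned)
var = sum((v - mean) ** 2 for v in burned) / len(burned)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

Only ratios of f enter the acceptance step, so the posterior can be sampled without computing the normalization integral, which is the main appeal for Bayesian inference.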
Problems of Bayesian approach
• The Bayesian probability is subjective, in the sense
that it depends on a prior probability, or degrees of
belief about the unknown parameters
– Anyway, as the number of observations increases, the
likelihood dominates over the prior, and the final
posterior probability depends less and less
on the initial prior probability
– … but under those conditions, using frequentist or Bayesian
approaches does not make much difference anyway
• How to represent the total lack of knowledge?
– A uniform distribution is not invariant under coordinate
transformations
– Uniform PDF in log is scale-invariant
• Study of the sensitivity of the result on the chosen
prior PDF is usually recommended
Choosing the prior PDF
• If the prior PDF is uniform in a choice of variable (“metrics”), it won’t be
uniform when applying coordinate transformation
• Given a prior PDF in a random variable, there is always a
transformation that makes the PDF uniform
• The problem is: choose one metric where the PDF is uniform
• Harold Jeffreys’ prior: choose the prior form that is invariant under
parameter transformation
– metric related to the Fisher information (metrics invariant!):
p(θ) ∝ √I(θ), where I(θ) is the Fisher information
• Some common cases:
– Poissonian mean: p(μ) ∝ 1/√μ
– Poissonian mean with background b: p(μ) ∝ 1/√(μ + b)
– Gaussian mean: p(μ) ∝ 1
– Gaussian r.m.s: p(σ) ∝ 1/σ
– Binomial parameter: p(ε) ∝ 1/√(ε (1 - ε))
• Problematic with more than one dimension!
Demonstration on Wikipedia:
see: Jeffreys prior
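As a numerical cross-check of the Poissonian case, where Jeffreys' prior is proportional to √I(μ) = 1/√μ, the Fisher information can be summed directly. This is a sketch: the function names and the cutoff `n_max` are assumptions made for illustration.

```python
import math

def poisson_pmf(n, mu):
    # Computed in log space to avoid overflow for large n
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def fisher_information(mu, n_max=500):
    """I(mu) = E[(d ln P(n|mu) / d mu)^2], summed over n up to a cutoff."""
    # For a Poisson distribution, d ln P / d mu = n / mu - 1
    return sum(poisson_pmf(n, mu) * (n / mu - 1.0) ** 2 for n in range(n_max + 1))

for mu in (0.5, 2.0, 10.0):
    info = fisher_information(mu)
    # Jeffreys' prior is proportional to sqrt(I(mu)); compare with 1 / sqrt(mu)
    print(f"mu = {mu}: sqrt(I) = {math.sqrt(info):.4f}, 1/sqrt(mu) = {1 / math.sqrt(mu):.4f}")
```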
Frequentist vs Bayesian
• Bayes theorem can be extended to give a (Bayesian)
probabilistic interpretation for the estimated parameters
• Interpretation of parameter errors:
– θ = θest ± δ
• Frequentist approach:
– Knowing a parameter within some error means that a large fraction
(68% or 95%, usually) of the experiments contain the (fixed) true
value within the quoted confidence interval [θest - δ, θest + δ]
• Bayesian approach:
– The posterior PDF of θ is maximum at θest and integrates to 68%
within the range [θest - δ1, θest + δ2]
– The choice of the interval, i.e. δ1 and δ2, can be done in different
ways, e.g.: same area in the two tails, shortest interval, symmetric
error, …
• Note that both approaches provide the same results for
Gaussian models leading to possible confusions in the
interpretation
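Two of the interval choices mentioned above (same area in the two tails, shortest interval) can be compared on samples drawn from an asymmetric posterior. This is a sketch: the Gamma(3, 1) posterior is an arbitrary stand-in, and the function names are hypothetical.

```python
import random

def central_interval(samples, cl=0.68):
    """Equal-tailed interval: the same probability is left in each tail."""
    s = sorted(samples)
    lo = s[int(len(s) * (1.0 - cl) / 2.0)]
    hi = s[int(len(s) * (1.0 + cl) / 2.0)]
    return lo, hi

def shortest_interval(samples, cl=0.68):
    """Narrowest window that contains a fraction cl of the samples."""
    s = sorted(samples)
    k = int(len(s) * cl)
    start = min(range(len(s) - k), key=lambda i: s[i + k] - s[i])
    return s[start], s[start + k]

# Asymmetric stand-in posterior: Gamma with shape 3 and scale 1 (an assumption)
rng = random.Random(1)
samples = [rng.gammavariate(3.0, 1.0) for _ in range(20000)]

c_lo, c_hi = central_interval(samples)
s_lo, s_hi = shortest_interval(samples)
print(f"central:  [{c_lo:.2f}, {c_hi:.2f}]")
print(f"shortest: [{s_lo:.2f}, {s_hi:.2f}]")
```

For a symmetric Gaussian posterior the two intervals coincide; for a skewed posterior such as this one the shortest interval is shifted toward the mode.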
Frequentist vs Bayesian popularity
• Until 1990’s frequentist approach largely
favored:
– “at the present time (1997) [frequentists] appear to
constitute the vast majority of workers in high energy
physics”
• V.L.Highland, B.Cousins, NIM A398 (1997) 429-430
• More recently Bayesian estimates are getting
more popular and provide simpler mathematical
methods to perform complex estimates
– Bayesian estimators properties can be studied with a
frequentistic approach using Toy Monte Carlos
(feasible with today’s computers)
– Also preferred by several theorists (UTFit team,
cosmologists)
Bayesian inference
• Just use the product of likelihood function times the prior
probability, normalized, as the posterior PDF for the unknown parameter(s) θ:
P(θ | x) = L(x; θ) P(θ) / ∫ L(x; θ) P(θ) dθ
• You can then evaluate the average and variance of θ, as well as
the mode (most likely value)
– In many cases, the most likely value and average don’t coincide!
• Notice that the Maximum Likelihood estimate is the mode of
Bayesian inference with a flat Prior
• Upper limits are easily computed using the Bayesian approach
Bayesian inference of a Poissonian
• Posterior probability, assuming the prior to be f0(s):
P(s | n) = (s^n e^(-s) / n!) f0(s) / ∫ (s'^n e^(-s') / n!) f0(s') ds'
• If f0(s) is uniform:
P(s | n) = s^n e^(-s) / n!
• We have:
⟨s⟩ = n + 1, Var[s] = n + 1
• Most probable value:
s = n
… but this is somewhat
arbitrary, since it is
metric-dependent!
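As a sketch of how an upper limit follows from the flat-prior Poisson posterior P(s | n) = s^n e^(-s) / n!, the limit s_up at credibility level CL solves ∫ from 0 to s_up of P(s | n) ds = CL. The function names, the bisection bracket and the choice CL = 90% are assumptions made for illustration; no background is included.

```python
import math

def posterior_cdf(s_up, n):
    """Integral of s^n e^(-s) / n! from 0 to s_up (flat prior, no background)."""
    tail = sum(math.exp(-s_up) * s_up**k / math.factorial(k) for k in range(n + 1))
    return 1.0 - tail

def upper_limit(n, cl=0.90):
    """Solve posterior_cdf(s_up, n) = cl by bisection."""
    lo, hi = 0.0, 10.0 + 10.0 * n  # bracket assumed wide enough for small n
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, n) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in range(4):
    print(f"n = {n}: 90% upper limit on s = {upper_limit(n):.2f}")
```

For n = 0 the equation reduces to 1 - e^(-s_up) = 0.9, i.e. s_up = ln 10 ≈ 2.30.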
Error propag. with Bayesian inference
• The result of the inference is just a PDF (of
the measured parameters)
• The error propagation is done applying the
usual transformations:
z = Z(x, y)
x' = X(x, y), y' = Y(x, y)
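When the posterior is available as a set of samples (for example from a Markov chain), the transformation z = Z(x, y) can be applied sample by sample. In this sketch the posterior of (x, y) is replaced by two independent Gaussians and Z is a product; both are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(2024)

# Stand-in for samples from a posterior PDF of (x, y): independent Gaussians
n = 100000
xs = [rng.gauss(2.0, 0.1) for _ in range(n)]
ys = [rng.gauss(3.0, 0.2) for _ in range(n)]

def Z(x, y):
    # Hypothetical derived quantity
    return x * y

# Transformed samples are distributed according to the PDF of z
zs = [Z(x, y) for x, y in zip(xs, ys)]

z_mean = statistics.fmean(zs)
z_std = statistics.pstdev(zs)
print(f"z = {z_mean:.3f} +/- {z_std:.3f}")
```

The full set `zs` carries more than a mean and a standard deviation: any interval or tail probability for z can be read from it directly.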
A Bayesian application: UTFit
• UTFit: Bayesian determination of the
CKM unitarity triangle
– Many experimental and theoretical inputs
combined as product of PDF
– Resulting likelihood interpreted as
Bayesian PDF in the UT plane
• Inputs:
– Experimental results that directly or
indirectly measure or put constraints on
Standard Model CKM Parameters
The Unitarity Triangle
• Quark mixing is described
by the CKM matrix:
        d     s     b
  u  ( Vud   Vus   Vub )
  c  ( Vcd   Vcs   Vcb )
  t  ( Vtd   Vts   Vtb )
• Unitarity relations on matrix
elements lead to a triangle
in the complex plane:
VudVub* + VcdVcb* + VtdVtb* = 0
[Figure: triangle with vertices A = (ρ, η), B = (1, 0), C = (0, 0); sides VudVub*/VcdVcb*, VtdVtb*/VcdVcb* and 1]
Inputs
Combine the constraints
• Given {xi} parameters and {ci} constraints
that depend on xi, ρ, η:
• Define the combined PDF
– ƒ( ρ, η, x1, x2 , ..., xN | c1, c2 , ..., cM ) ∝
∏j=1,M ƒj(cj | ρ, η, x1, x2 , ..., xN)
∏i=1,N ƒi(xi)⋅ ƒo (ρ, η)
Prior PDF
– PDF taken from experiments, wherever it is
possible
• Determine the PDF of (ρ, η) integrating over
the remaining parameters
– ƒ(ρ, η) ∝
∫ ∏j=1,M ƒj(cj | ρ, η, x1, x2 , ..., xN)
∏i=1,N ƒi(xi)⋅ ƒo (ρ, η) d^N x d^M c
Unitarity Triangle fit
[Figure: Unitarity Triangle fit result with 68% and 95% contours]
PDFs for ρ and η
Projections on other observables
References
• G. D'Agostini, "Bayesian inference in processing experimental data: principles and basic applications",
Rep.Progr.Phys. 66 (2003) 1383 [physics/0304102]
• H. Jeffreys, "An Invariant Form for the Prior Probability in Estimation Problems",
Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences
186 (1007): 453–461, 1946
• H. Jeffreys, "Theory of Probability", Oxford University Press, 1939
• Wikipedia: "Jeffreys prior", with demonstration of metrics invariance
• G. D'Agostini, "Bayesian Reasoning in Data Analysis: a Critical Guide", World Scientific
(2003).
• W.T. Eadie, D. Drijard, F.E. James, M. Roos, B. Sadoulet, Statistical Methods in Experimental
Physics, North Holland, 1971
• G. D'Agostini: "Telling the truth with statistics", CERN Academic Training Lecture, 2005
– http://cdsweb.cern.ch/record/794319?ln=it
• Pentaquarks update 2006 in PDG
– pdg.lbl.gov/2006/listings/b152.ps
– SVD Collaboration, Further study of narrow baryon resonance decaying into K0s p in pA-interactions
at 70 GeV/c with SVD-2 setup, arXiv:hep-ex/0509033v3
• Dark matter:
– R. Bernabei et al.: Eur.Phys.J.C56:333-355,2008: arXiv:0804.2741v1
– J. Chang et al.: Nature 456, 362-365
• UTFit:
– http://www.utfit.org/
– M. Ciuchini et al., JHEP 0107 (2001) 013, hep-ph/0012308