Recap: The Bayesian Approach – 2


Practical Statistics for Particle Physicists
Lecture 3
Harrison B. Prosper
Florida State University
European School of High-Energy Physics
Anjou, France
6 – 19 June, 2012
ESHEP2012 Practical Statistics
Harrison B. Prosper
Outline
 Lecture 1
Descriptive Statistics
Probability
Likelihood
The Frequentist Approach – 1
 Lecture 2
The Frequentist Approach – 2
The Bayesian Approach
 Lecture 3 – Analysis Example
Practicum
Toy data and code at
http://www.hep.fsu.edu/~harry/ESHEP12
topdiscovery.tar
contactinteractions.tar
just download and unpack
Recap: The Bayesian Approach – 1
Definition:
A method is Bayesian if
1. it is based on the subjective interpretation of
probability and
2. it uses Bayes’ theorem
$$p(\theta, \omega \mid D) = \frac{p(D \mid \theta, \omega)\, \pi(\theta, \omega)}{p(D)}$$
for all inferences.
D = observed data
θ = parameter of interest
ω = nuisance parameters
π = prior density
Recap: The Bayesian Approach – 2
Nuisance parameters are removed by marginalization:
$$p(\theta \mid D) = \int p(\theta, \omega \mid D)\, d\omega = \int p(D \mid \theta, \omega)\, \pi(\theta, \omega)\, d\omega \,/\, p(D)$$
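in contrast to profiling, which can be thought of as
marginalization with a δ-function prior π(θ, ω) = δ[ω − φ(θ)]:
$$\begin{aligned}
p(\theta \mid D) &= \int p(D \mid \theta, \omega)\, \pi(\theta, \omega)\, d\omega \,/\, p(D) \\
&= \int p(D \mid \theta, \omega)\, \delta[\omega - \varphi(\theta)]\, d\omega \,/\, p(D) \\
&\propto p(D \mid \theta, \varphi(\theta))
\end{aligned}$$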
Recap: The Bayesian Approach – 3
Bayes' theorem can be used to compute the probability of a model:
$$p(\theta_M, \omega, M \mid D) = \frac{p(D \mid \theta_M, \omega, M)\, \pi(\theta_M, \omega, M)}{p(D)}$$
D = observed data
θM = parameters of model M
M = model
ω = nuisance parameters
π = prior density
Recap: The Bayesian Approach – 5
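1. Factorize the priors: π(θM, ω, M) = π(θM, ω | M) π(M)
2. Then, for each model, M, compute the function
$$p(D \mid M) = \int\!\!\int p(D \mid \theta_M, \omega, M)\, \pi(\theta_M, \omega \mid M)\, d\theta_M\, d\omega$$
3. Then, compute the probability of each model, M
$$p(M \mid D) = \frac{p(D \mid M)\, \pi(M)}{\sum_{H} p(D \mid H)\, \pi(H)}$$
where the sum runs over all models H under consideration.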
Recap: The Bayesian Approach – 6
In order to compute p(M|D), however, two things are needed:
1. Proper priors over the parameter spaces
$$\int\!\!\int \pi(\theta_M, \omega \mid M)\, d\theta_M\, d\omega = 1$$
2. The priors π(M).
Usually, we compute the Bayes factor:
$$\frac{p(M_1 \mid D)}{p(M_0 \mid D)} = \left[\frac{p(D \mid M_1)}{p(D \mid M_0)}\right] \left[\frac{\pi(M_1)}{\pi(M_0)}\right]$$
which is the ratio in the first bracket, B10.
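The model-probability recipe and the Bayes factor can be illustrated with a minimal Python sketch. The function names and the numerical evidences and priors below are hypothetical, chosen only to exercise the formulas; they do not come from the lecture.

```python
def model_posteriors(evidences, priors):
    """Return p(M | D) for each model from p(D | M) and pi(M)."""
    weights = {m: evidences[m] * priors[m] for m in evidences}
    norm = sum(weights.values())  # plays the role of p(D)
    return {m: w / norm for m, w in weights.items()}


def bayes_factor(evidences, m1, m0):
    """Return B10 = p(D | M1) / p(D | M0)."""
    return evidences[m1] / evidences[m0]


# Hypothetical evidences p(D | M) and model priors pi(M).
evidences = {"M0": 0.02, "M1": 0.06}
priors = {"M0": 0.5, "M1": 0.5}

posteriors = model_posteriors(evidences, priors)
b10 = bayes_factor(evidences, "M1", "M0")
print(posteriors, b10)
```

The posterior odds p(M1|D)/p(M0|D) equal B10 times the prior odds, so with equal model priors the two coincide.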
An Analysis Example
Search for Contact Interactions
Contact Interactions – 1
In our current theories, all interactions proceed via the
exchange of particles:
But,…
Contact Interactions – 2
… when the experimentally available energies are much smaller than the
mass of the exchanged particles, the interactions can be
approximated as contact interactions (CI), for example:
Contact Interactions – 3
Consider the model*
$$\mathcal{L} \sim \lambda\, \bar{\Psi}\gamma^{\mu}\Psi\; \bar{\Psi}\gamma_{\mu}\Psi,$$
with λ = ξ / Λ², where ξ can be positive or negative. At
leading order, the possible reactions are:
The amplitude is linear in λ:
$$a = a_{\mathrm{SM}} + \lambda\, a_{\mathrm{CI}}$$
*Eichten, Hinchliffe, Lane, Quigg, Rev. Mod. Phys. 56, 579 (1984)
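Because the amplitude is linear in λ, the squared amplitude, and hence the cross section, is quadratic in λ. The short derivation below assumes λ is real and that the cross section in each bin is proportional to the squared amplitude integrated over that bin. Identifying the three terms with coefficients c, b and a is an interpretation, not something stated on the slide, and the coefficient a should not be confused with the amplitude a.

```latex
\begin{aligned}
|a|^2 &= |a_{\mathrm{SM}} + \lambda\, a_{\mathrm{CI}}|^2 \\
      &= |a_{\mathrm{SM}}|^2
       + 2\lambda\, \mathrm{Re}\!\left(a_{\mathrm{SM}}^{*}\, a_{\mathrm{CI}}\right)
       + \lambda^2\, |a_{\mathrm{CI}}|^2 , \\
\sigma &= c + b\,\lambda + a\,\lambda^2 ,
\qquad c \propto |a_{\mathrm{SM}}|^2, \quad
b \propto 2\,\mathrm{Re}\!\left(a_{\mathrm{SM}}^{*}\, a_{\mathrm{CI}}\right), \quad
a \propto |a_{\mathrm{CI}}|^2 .
\end{aligned}
```

The sign of b is the sign of the interference term, so b < 0 corresponds to destructive interference between the Standard Model and contact-interaction amplitudes.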
Contact Interactions – 4
Contact interactions
calculated at leading
order with Pythia 6.
QCD calculated at
next-to-leading order
with FastNLO.
Expect cross section in each
bin to be of the form
$$\sigma = c + b\,\lambda + a\,\lambda^2$$
Bayesian Analysis
Simulated Data – 1
Data
M = 25 bins (362 ≤ pT ≤ 2000)
D = counts per bin, ranging from 575,999 down to 0 (large dynamic range!)
Parameters
λ = parameter of interest
ω = nuisance parameters:
c = QCD cross section per pT bin
b, a = signal parameters
For destructive interference, b < 0
Simulated Data – 2
Choose b < 0
Assume integrated
luminosity of 5 fb⁻¹:
α = 5000 pb⁻¹
Analysis – 1
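Step 1. Assume the following probability model for the
observations
$$p(D \mid \lambda, \omega, \alpha) = \prod_{i=1}^{K} \mathrm{Poisson}(N_i \mid \alpha\, \sigma_i)$$
where
$$\sigma_i = c_i + b_i\,\lambda + a_i\,\lambda^2$$
$$D = N_1, \ldots, N_K$$
$$\omega = c_1, b_1, a_1, \ldots, c_K, b_K, a_K$$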
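The probability model of Step 1 can be sketched in a few lines of Python. The function names and the three-bin toy inputs are hypothetical (they are not the lecture's data or code), and the likelihood is evaluated in logarithms because bin counts spanning several orders of magnitude would otherwise underflow.

```python
import math


def cross_sections(lam, c, b, a):
    """Per-bin cross sections sigma_i = c_i + b_i*lam + a_i*lam**2."""
    return [ci + bi * lam + ai * lam ** 2 for ci, bi, ai in zip(c, b, a)]


def log_poisson(n, mean):
    """log Poisson(n | mean), valid for mean > 0."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def log_likelihood(counts, lam, alpha, c, b, a):
    """log p(D | lam, omega, alpha) = sum_i log Poisson(N_i | alpha*sigma_i)."""
    sigmas = cross_sections(lam, c, b, a)
    return sum(log_poisson(n, alpha * s) for n, s in zip(counts, sigmas))


# Hypothetical toy inputs: 3 bins, arbitrary units, b < 0 in every bin.
c = [10.0, 1.0, 0.1]
b = [-0.5, -0.2, -0.05]
a = [0.2, 0.1, 0.05]
counts = [52, 4, 1]

print(log_likelihood(counts, lam=0.0, alpha=5.0, c=c, b=b, a=a))
```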
Analysis Issues
1. Sensitive to jet energy scale (JES)
2. Sensitive to the parton distribution functions (PDF)
3. Large dynamic range causes the limits on Λ to be very
sensitive to the value of α. For example, changing α from
5000 to 5030 decreases the limit by 25%!
Solution:
1., 2. Integrate likelihood over JES and PDF parameters
3. Integrate likelihood over the scale factor α
Analysis – 2
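Step 2: We can rewrite
$$p(D \mid \lambda, \omega, \alpha) = \prod_{i=1}^{K} \mathrm{Poisson}(N_i \mid \alpha\, \sigma_i)$$
as
$$p(D \mid \lambda, \omega, \alpha) = \mathrm{Poisson}(N \mid \alpha\, \sigma) \times \mathrm{Multinomial}(N_1, \ldots, N_K \mid \theta_1, \ldots, \theta_K)$$
where
$$\sigma = \sum_{i} \sigma_i, \qquad N = \sum_{i} N_i, \qquad \theta_i = \sigma_i / \sigma$$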
Exercise 11: Show this
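Before attempting the algebra, the factorization can be checked numerically. The sketch below compares the logarithm of the product of Poisson terms with the logarithm of the Poisson-times-multinomial form; the counts, cross sections and α are hypothetical toy values, and the symbol θ_i for the multinomial probabilities is an assumed name.

```python
import math


def log_poisson(n, mean):
    """log Poisson(n | mean), valid for mean > 0."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def log_multinomial(counts, probs):
    """log Multinomial(N_1..N_K | theta_1..theta_K), all theta_i > 0."""
    out = math.lgamma(sum(counts) + 1)
    for n, p in zip(counts, probs):
        out += n * math.log(p) - math.lgamma(n + 1)
    return out


def log_product_form(counts, sigmas, alpha):
    """log of prod_i Poisson(N_i | alpha*sigma_i)."""
    return sum(log_poisson(n, alpha * s) for n, s in zip(counts, sigmas))


def log_factorized_form(counts, sigmas, alpha):
    """log of Poisson(N | alpha*sigma) * Multinomial(N_i | sigma_i/sigma)."""
    sigma = sum(sigmas)
    thetas = [s / sigma for s in sigmas]
    return log_poisson(sum(counts), alpha * sigma) + log_multinomial(counts, thetas)


# Hypothetical toy values, including an empty bin.
counts = [52, 4, 0]
sigmas = [9.8, 1.0, 0.2]
alpha = 5.0

difference = log_product_form(counts, sigmas, alpha) - log_factorized_form(counts, sigmas, alpha)
print(difference)  # zero up to floating-point rounding
```

A numerical agreement is not a proof, but it is a quick guard against sign or normalization slips when working through the exercise.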
Analysis – 3
We now eliminate α by integrating
$$p(D \mid \lambda, \omega, \alpha) = \mathrm{Poisson}(N \mid \alpha\, \sigma) \times \mathrm{Multinomial}(N_1, \ldots, N_K \mid \theta_1, \ldots, \theta_K)$$
with respect to α.
But to do so, we need a prior density for α. In the absence of
reliable information about this parameter, we shall use
$$\pi(\alpha \mid \lambda, \omega) = \sqrt{\sigma / \alpha}$$
which is an example of a reference prior.
Analysis – 4
Step 3: The integration with respect to α yields
$$p(D \mid \lambda, \omega) \propto \mathrm{Multinomial}(N_1, \ldots, N_K \mid \theta_1, \ldots, \theta_K)$$
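The claim that α drops out can be checked numerically. The sketch below assumes the prior has the Jeffreys form π(α | λ, ω) = √(σ/α); under that assumption the integral of Poisson(N | ασ) π(α | λ, ω) over α equals Γ(N + 1/2)/Γ(N + 1), independent of σ, which leaves only the multinomial factor. The function names, the integration cutoff and the step count are illustrative choices.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n counts for a given mean >= 0."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def marginal_over_alpha(n_total, sigma, steps=20000):
    """Integrate Poisson(N | alpha*sigma) * sqrt(sigma/alpha) over alpha > 0.

    The substitution alpha = t**2 removes the 1/sqrt(alpha) singularity:
    the integrand becomes 2*sqrt(sigma) * Poisson(N | t**2 * sigma).
    """
    # Cut off where the Poisson mean is far above N (about 40 standard deviations).
    mean_max = n_total + 1.0 + 40.0 * math.sqrt(n_total + 1.0)
    t_max = math.sqrt(mean_max / sigma)
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid rule
        total += weight * 2.0 * math.sqrt(sigma) * poisson_pmf(n_total, t * t * sigma)
    return total * h


def expected_constant(n_total):
    """Gamma(N + 1/2) / Gamma(N + 1), the sigma-independent result."""
    return math.exp(math.lgamma(n_total + 0.5) - math.lgamma(n_total + 1))


for sigma in (0.5, 3.0):
    print(sigma, marginal_over_alpha(7, sigma), expected_constant(7))
```

With a prior proportional to 1/√α alone, a factor 1/√σ would survive the integration, so the σ inside the square root is what makes the result purely multinomial.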
Step 4: Randomly sample from:
1. the jet energy scale,
2. jet energy resolution,
3. the PDF parameter sets,
4. the factorization and renormalization scales,
5. and any other nuisance parameters of the problem
This generates an ensemble of points {ωi}
Analysis – 5
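Step 5: We approximate the posterior density using
$$\begin{aligned}
p(\lambda \mid D) &= \int p(D \mid \lambda, \omega)\, \pi(\lambda, \omega)\, d\omega \,/\, p(D) \\
&= \int p(D \mid \lambda, \omega)\, \pi(\lambda \mid \omega)\, \pi(\omega)\, d\omega \,/\, p(D) \\
&\approx \frac{1}{T} \sum_{i=1}^{T} p(D \mid \lambda, \omega_i)\, \pi(\lambda \mid \omega_i) \,/\, p(D)
\end{aligned}$$
where, again, we use a reference prior for π(λ|ω).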
It turns out that this prior can be calculated exactly.
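The Monte Carlo average over the ensemble {ωi} can be sketched as follows. Several ingredients here are assumptions made only for illustration: the nuisance model (a single log-normal scale factor applied to the QCD coefficients c_i), the three-bin toy inputs, and above all a flat prior in λ on a finite grid standing in for the reference prior π(λ|ω), whose exact form is not given in the slides.

```python
import math
import random


def log_multinomial(counts, probs):
    """log Multinomial(N_1..N_K | theta_1..theta_K), all theta_i > 0."""
    out = math.lgamma(sum(counts) + 1)
    for n, p in zip(counts, probs):
        out += n * math.log(p) - math.lgamma(n + 1)
    return out


def likelihood(counts, lam, omega):
    """Multinomial likelihood with theta_i = sigma_i / sum(sigma)."""
    sigmas = [c + b * lam + a * lam ** 2 for c, b, a in omega]
    total = sum(sigmas)
    return math.exp(log_multinomial(counts, [s / total for s in sigmas]))


def sample_omega(rng, nominal, rel_unc=0.05):
    """Hypothetical nuisance model: one log-normal scale factor on all c_i."""
    scale = math.exp(rng.gauss(0.0, rel_unc))
    return [(c * scale, b, a) for c, b, a in nominal]


def posterior_on_grid(counts, grid, omegas):
    """Average the likelihood over the ensemble, then normalize on the grid."""
    raw = [sum(likelihood(counts, lam, w) for w in omegas) / len(omegas)
           for lam in grid]
    norm = sum(0.5 * (raw[k] + raw[k - 1]) * (grid[k] - grid[k - 1])
               for k in range(1, len(grid)))
    return [r / norm for r in raw]


# Hypothetical toy inputs: (c_i, b_i, a_i) per bin, arbitrary units.
nominal = [(10.0, -0.5, 0.2), (1.0, -0.2, 0.1), (0.1, -0.05, 0.05)]
counts = [52, 4, 1]
rng = random.Random(1)
omegas = [sample_omega(rng, nominal) for _ in range(200)]  # T = 200 points
grid = [0.05 * k for k in range(101)]                      # lam from 0 to 5
posterior = posterior_on_grid(counts, grid, omegas)
print(max(posterior))
```

With a flat prior the result depends on where the grid is cut off, because the multinomial likelihood tends to a constant as λ grows; this is one reason a carefully chosen prior for λ matters. For realistic bin counts the average should be formed in logarithms to avoid underflow.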
Analysis – 6
Step 6: Finally, we can compute a 95% Bayesian interval by
solving
$$\int_{0}^{\lambda_{UP}} p(\lambda \mid D)\, d\lambda = 0.95$$
for λUP, from which we compute Λ = 1/√λUP. For the simulated
data (and ignoring Step 4., i.e., systematic uncertainties),
we obtain
Λ > 20.4 TeV @ 95% CL
Exercise 12: Write a program to implement this analysis
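As a starting point for the last step of such a program, the sketch below solves the interval equation on a grid: it accumulates the posterior with the trapezoid rule and interpolates to the point where the cumulative probability reaches 0.95. The function names are hypothetical, and the exponential density is only a stand-in posterior used to exercise the code; it is not the posterior of the contact-interaction analysis.

```python
import math


def upper_limit(grid, density, cl=0.95):
    """Return lam_up with integral of density from grid[0] to lam_up equal to cl.

    grid: increasing values of lam starting at 0.
    density: posterior values on that grid (need not be normalized).
    """
    cdf = [0.0]
    for k in range(1, len(grid)):
        step = 0.5 * (density[k] + density[k - 1]) * (grid[k] - grid[k - 1])
        cdf.append(cdf[-1] + step)
    target = cl * cdf[-1]  # normalizes the posterior implicitly
    for k in range(1, len(grid)):
        if cdf[k] >= target:
            # Linear interpolation of the cumulative distribution within the cell.
            frac = (target - cdf[k - 1]) / (cdf[k] - cdf[k - 1])
            return grid[k - 1] + frac * (grid[k] - grid[k - 1])
    return grid[-1]


def scale_from_lambda(lam_up):
    """Lambda = 1/sqrt(lam_up); lam in TeV^-2 gives Lambda in TeV."""
    return 1.0 / math.sqrt(lam_up)


# Stand-in posterior exp(-lam) on lam in [0, 20], for illustration only.
grid = [0.01 * k for k in range(2001)]
density = [math.exp(-lam) for lam in grid]

lam_up = upper_limit(grid, density)
print(lam_up, scale_from_lambda(lam_up))
```

For the stand-in density the exact answer is −ln(0.05), which gives a simple check of the numerical integration before the real posterior is plugged in.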
Summary
Probability
Two main interpretations:
1. Degree of belief
2. Relative frequency
Likelihood Function
Main ingredient in any non-trivial statistical analysis.
Frequentist Principle
Construct statements such that a given (minimum)
fraction of them will be true over a given ensemble of
statements.
Summary
Frequentist Approach
1. Use likelihood function only
2. Eliminate nuisance parameters by profiling
3. Fisher: Reject null if p-value is judged to be too small
4. Neyman: Decide on a fixed threshold for rejection and
reject null if threshold has been breached, but only if
the probability of the alternative is high enough
Bayesian Approach
1. Model all uncertainty using probabilities and use
Bayes’ theorem to make inferences.
2. Eliminate nuisance parameters through
marginalization.