Statistics In HEP
Statistics in HEP 2
How do we understand/interpret our measurements?
Helge Voss
Hadron Collider Physics Summer School June 8-17, 2011― Statistics in HEP
Outline
Maximum Likelihood fit
strict frequentist Neyman confidence intervals
  what "bothers" people with them
Feldman/Cousins confidence belts/intervals
Bayesian treatment of 'unphysical' results
How the LEP Higgs limit was derived
what about systematic uncertainties?
  Profile Likelihood
Parameter Estimation
want to measure/estimate some parameter θ
  e.g. mass, polarisation, etc.
observe: x_i = (x_1, ..., x_n)_i , i = 1, ..., K
  e.g. n observables for each of K events
"hypothesis", i.e. a PDF P(x; θ): the distribution of x for given θ
  e.g. a differential cross section
K independent events: P(x_1, ..., x_K; θ) = ∏_{i=1}^{K} P(x_i; θ)
for fixed x, regard P(x; θ) as a function of θ (i.e. the Likelihood L(θ))
θ close to θ_true → the Likelihood L(θ) will be large
→ try to maximise L(θ); typically: minimize −2 ln L(θ) with respect to θ
→ the value θ̂ at the maximum is the Maximum Likelihood estimator
Glen Cowan: Statistical data analysis
Maximum Likelihood Estimator
example: P(x) = Gauss(x; μ, σ) = 1/(√(2π) σ) · exp(−(x−μ)² / 2σ²)
estimator for μ_true from the data measured in an experiment: x_1, ..., x_K
full Likelihood: L(x|μ) = ∏_{i=1}^{K} 1/(√(2π) σ) · exp(−(x_i−μ)² / 2σ²)
typically: −2 ln L(x|μ) = Σ_{i=1}^{K} (x_i−μ)² / σ²  (+ const.)
Note: it is a function of μ!
For Gaussian PDFs: −2 ln L(μ) = χ²
[figure: −2 ln L(μ) as a function of μ, with its minimum at μ_best estimate]
Δ(−2 ln L) = 1 around the minimum gives the interval [μ_1; μ_2]
→ the variance ("error") on the estimate μ̂
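A minimal numerical sketch of this Gaussian example (the use of Python/numpy/scipy, the generated data and all names here are my own illustration, not part of the lecture):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(42)
sigma = 1.0
x = rng.normal(loc=5.0, scale=sigma, size=100)        # K = 100 "measurements"

def neg2lnL(mu):
    # -2 ln L(mu) for Gaussian PDFs, dropping mu-independent constants
    return np.sum((x - mu) ** 2 / sigma ** 2)

mu_hat = x.mean()                                      # analytic ML estimate for this case
target = neg2lnL(mu_hat) + 1.0                         # Delta(-2 ln L) = 1
mu1 = brentq(lambda m: neg2lnL(m) - target, mu_hat - 5.0, mu_hat)
mu2 = brentq(lambda m: neg2lnL(m) - target, mu_hat, mu_hat + 5.0)
print(f"mu_hat = {mu_hat:.3f}, interval = [{mu1:.3f}, {mu2:.3f}]")
print("compare to sigma/sqrt(K) =", sigma / np.sqrt(len(x)))
```

The interval half-width should come out close to σ/√K, the familiar error on a Gaussian mean.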
Parameter Estimation
properties of estimators:
  biased or unbiased
  large or small variance
→ look at the distribution of the estimates θ̂ from many (repeated) measurements
[figure from Glen Cowan: distributions of estimates around θ_true]
Small bias and small variance are typically "in conflict".
Maximum Likelihood is typically unbiased only in the limit K → ∞.
If the Likelihood function is "Gaussian" (often the case for large K: central limit theorem):
  get the "error" estimate from −2 Δln L = 1
if (very) non-Gaussian:
  typically revert to (classical) Neyman confidence intervals
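A toy illustration of bias and variance (my own example, not from the slides): the ML estimate of a Gaussian variance carries a finite-sample bias of (K−1)/K that vanishes only as K → ∞.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_true, K, n_toys = 2.0, 10, 20000

# ML estimator of the Gaussian variance, (1/K) * sum (x_i - xbar)^2  (numpy's default ddof=0);
# its expectation is (K-1)/K * sigma^2, i.e. it is unbiased only in the limit K -> infinity
estimates = np.array([np.var(rng.normal(0.0, sigma_true, size=K)) for _ in range(n_toys)])

print("true sigma^2            :", sigma_true ** 2)
print("mean of the ML estimates:", estimates.mean())   # ~ (K-1)/K * sigma^2 = 3.6
print("spread of the estimates :", estimates.std())
```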
Classical Confidence Intervals
another way to look at a measurement, rigorously "frequentist": Neyman's confidence belt for CL α (e.g. 90%)
[figure: Neyman construction, μ_hypothetically-true vs. x_measured, belt with interval [μ_1, μ_2]]
each μ_hyp.true has a PDF of how the measured values will be distributed
determine the (central) intervals ("acceptance regions") in these PDFs such that they contain α
do this for ALL μ_hyp.true; connecting all the "red dots" gives the confidence belt
measure x_obs: the confidence interval [μ_1, μ_2] is given by the vertical line intersecting the belt
by construction: for each x_meas (drawn according to PDF(x; μ_true)) the confidence interval [μ_1, μ_2] contains μ_true in α = 90% of the cases
Feldman/Cousins (1998)
Classical Confidence Intervals
another way to look at a measurement, rigorously "frequentist": Neyman's confidence belt for CL α (e.g. 90%)
the confidence interval [μ_1, μ_2] is given by the vertical line through x_obs intersecting the belt
by construction:
  P(x > x_obs; μ_1) = (1−α)/2
  P(x < x_obs; μ_2) = (1−α)/2
[figure: confidence belt with μ_true, μ_1, μ_2 and x_measured marked]
if the true value were μ_true: x_meas falls inside the acceptance region of μ_true in 90% of the cases (that is how it was constructed)
only those x_meas give intervals [μ_1, μ_2] that intersect the μ_true acceptance region, i.e. that contain μ_true
→ 90% of the intervals cover μ_true
if P(x; μ) is Gaussian (σ = const): the central 68% Neyman confidence interval coincides with the Maximum Likelihood estimate and its "error", [x̂ − σ_x ; x̂ + σ_x]
Feldman/Cousins (1998)
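A small sketch of the belt construction and its inversion for a Gaussian with known σ (Python and all concrete numbers are my own illustration):

```python
import numpy as np
from scipy.stats import norm

alpha, sigma = 0.90, 1.0
z = norm.ppf(0.5 + alpha / 2.0)        # ~1.645: central 90% of a unit Gaussian

def acceptance(mu):
    # central acceptance region of P(x; mu) containing a fraction alpha
    return mu - z * sigma, mu + z * sigma

# invert the belt: the confidence interval is all mu whose acceptance region contains x_obs
x_obs = 0.7
mu_scan = np.linspace(-5.0, 5.0, 2001)
inside = [mu for mu in mu_scan if acceptance(mu)[0] <= x_obs <= acceptance(mu)[1]]
print("90% interval:", round(min(inside), 3), "to", round(max(inside), 3))
# for this simple case the result is just [x_obs - 1.645*sigma, x_obs + 1.645*sigma]
```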
Flip-Flop
When to quote a measurement and when a limit?
estimate a Gaussian distributed quantity μ that cannot be < 0 (e.g. a mass)
same Neyman confidence-belt construction as before:
  once for a measurement (two-sided, each tail contains 5%)
  once for a limit (one-sided, the tail contains 10%)
decide: if x_obs < 0, assume x = 0 ("conservative")
  if you observe x_obs < 3 → quote an upper limit only
  if you observe x_obs > 3 → quote a measurement
→ this induces "undercoverage", as the resulting acceptance regions contain only 85%!!
Feldman/Cousins(1998)
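A toy coverage check of this flip-flop policy (my own sketch; 1.282 and 1.645 are the one-sided 90% and central 90% Gaussian quantiles):

```python
import numpy as np

rng = np.random.default_rng(7)
mu_true, n_toys = 2.5, 200_000
x = rng.normal(mu_true, 1.0, size=n_toys)

# flip-flop policy at 90% CL: below x = 3 quote a one-sided upper limit,
# above x = 3 quote a central two-sided interval; negative x is set to 0
lo = np.where(x < 3, 0.0, x - 1.645)
hi = np.where(x < 3, np.maximum(x, 0.0) + 1.282, x + 1.645)

coverage = np.mean((lo <= mu_true) & (mu_true <= hi))
print(f"coverage for mu_true = {mu_true}: {coverage:.3f}")   # ~0.85 instead of the nominal 0.90
```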
Some things people don’t like..
same example: estimate a Gaussian distributed quantity μ that cannot be < 0 (e.g. a mass), now using the proper confidence belt
assume: x_obs = −1.8
→ the confidence interval is EMPTY!
Note: that is OK in the frequentist interpretation: μ_true ∈ conf. interval in 90% of (hypothetical) measurements (Feldman/Cousins (1998)); obviously we were 'unlucky' to pick one of the remaining 10%
nonetheless: tempted to "flip-flop"? tsk .. tsk .. tsk ..
Feldman Cousins: a Unified Approach
How we determine the "acceptance" region for each μ_hyp.true is up to us, as long as it covers the desired integral of size α (e.g. 90%).
Feldman/Cousins: include those x_meas with the largest likelihood ratio first:
  R = L(x_meas | μ) / L(x_meas | μ_best estimate)
μ_best estimate here: either the observation x_meas itself or the closest ALLOWED μ
[figure: R as a function of x_meas; the acceptance region is filled until it contains α = 90%]
→ no "empty intervals" anymore!
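A sketch of the Feldman/Cousins construction for a Gaussian measurement of a quantity bounded at μ ≥ 0 (my own discretised Python illustration; the grid and all numbers are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

alpha, sigma = 0.90, 1.0
x_grid = np.linspace(-6.0, 8.0, 2801)                 # grid of possible measured values
dx = x_grid[1] - x_grid[0]

def accepted(mu):
    # Feldman-Cousins ordering: rank x by R = L(x|mu) / L(x|mu_best), mu_best = max(x, 0)
    L_mu   = norm.pdf(x_grid, mu, sigma)
    L_best = norm.pdf(x_grid, np.maximum(x_grid, 0.0), sigma)
    order  = np.argsort(L_mu / L_best)[::-1]          # largest likelihood ratio first
    keep, prob = np.zeros(x_grid.size, dtype=bool), 0.0
    for i in order:                                   # fill the region until it holds alpha
        keep[i] = True
        prob += L_mu[i] * dx
        if prob >= alpha:
            break
    return keep

x_obs = -1.8                                          # the "unphysical" observation from above
ix = int(np.argmin(np.abs(x_grid - x_obs)))
interval = [mu for mu in np.linspace(0.0, 6.0, 601) if accepted(mu)[ix]]
print("FC 90% interval:", round(min(interval), 2), "to", round(max(interval), 2))
# a non-empty interval: it starts at mu = 0 and extends to a small upper limit
```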
Being Lucky…
give an upper limit on the signal μ_s on top of a known (mean) background μ_b:
n = s + b counts from a Poisson distribution, P(n) = Poisson(n; μ_s + μ_b)
Neyman: draw the confidence belt with μ_s (the possible true values of μ_s) on the y-axis, for fixed background μ_b
2 experiments (E1 and E2) with μ_b1 = 1 and μ_b2 = 2, both observing (out of luck) n = 0:
  E1: 95% limit on μ_s ≈ 2
  E2: 95% limit on μ_s ≈ 1
→ UNFAIR!?
(sorry, the plots don't quite match: one is for 90% CL, the other for 95% CL)
[figure: upper limit on μ_s vs. observed n for different backgrounds μ_b]
Glen Cowan: Statistical data analysis
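A quick check of those numbers with a classical (one-sided Neyman) Poisson upper limit (my own Python sketch, not from the lecture):

```python
from scipy.stats import poisson
from scipy.optimize import brentq

def classical_upper_limit(n_obs, b, cl=0.95):
    # smallest s that is just excluded: P(n <= n_obs; s + b) = 1 - CL
    return brentq(lambda s: poisson.cdf(n_obs, s + b) - (1.0 - cl), 0.0, 50.0)

for b in (1.0, 2.0):
    print(f"b = {b}: 95% upper limit on s for n_obs = 0 ->",
          round(classical_upper_limit(0, b), 2))
# b = 1 gives ~2.0 and b = 2 gives ~1.0: the experiment with MORE background
# quotes the SMALLER (better-looking) limit when both happen to observe zero events
```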
Being Lucky …
Feldman/Cousins confidence belts were motivated by "popular" Bayesian approaches to handling such problems.
Bayesian: rather than constructing confidence belts, turn the Likelihood for μ_s (given n_obs) into a posterior probability for μ_s:
  p(μ_s | n_obs) ∝ L(n_obs; μ_s) · π(μ_s),  with L(n_obs; μ_s) = Poisson(n_obs; μ_s + μ_b)
add a prior probability for the signal:
  π(μ_s) = 0 for μ_s < 0, uniform for μ_s ≥ 0
→ upper limit on the signal from the posterior (Helene 1983)
[figure: upper limit on the signal vs. n_obs and background b, Bayesian vs. Feldman/Cousins]
Feldman/Cousins:
  there is still SOME "unfairness"
  it is perfectly "fine" in the frequentist interpretation
  one should quote "limit + sensitivity" (Feldman/Cousins 1998)
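A sketch of this flat-prior Bayesian upper limit for a counting experiment (Python; the numbers and helper names are mine):

```python
from scipy.stats import poisson
from scipy.integrate import quad
from scipy.optimize import brentq

def bayes_upper_limit(n_obs, b, cl=0.95):
    # posterior for mu_s with a flat prior on mu_s >= 0: integrate the Poisson likelihood
    like = lambda s: poisson.pmf(n_obs, s + b)
    norm_const = quad(like, 0.0, 100.0)[0]
    post_cdf = lambda s_up: quad(like, 0.0, s_up)[0] / norm_const
    return brentq(lambda s_up: post_cdf(s_up) - cl, 1e-6, 100.0)

for b in (0.0, 1.0, 2.0):
    print(f"b = {b}: Bayesian 95% upper limit for n_obs = 0 ->",
          round(bayes_upper_limit(0, b), 2))
# for n_obs = 0 the flat-prior limit is ~3.0 whatever the background,
# which removes the "being lucky" dependence on b seen on the previous slide
```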
Statistical Tests in Particle Searches
exclusion limits:
  upper limit on the cross section (↔ lower limit on a mass scale)
  (σ < limit, as otherwise we would have seen it)
  → need to estimate the probability of a downward fluctuation of s+b
  → try to "disprove" H0 = s+b
  better: find the minimal s for which you can still exclude H0 = s+b at a pre-specified Confidence Level
  → type-1 error α = 0.05 → 95% CL
discoveries:
  → need to estimate the probability of an upward fluctuation of b
  → try to disprove H0 = "background only"
[figure: Poisson(n_b; μ_b = 4) for "b only" and "s+b" vs. a possible observation; the N_obs needed for a 5σ discovery corresponds to p = 2.8·10⁻⁷]
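A sketch of the discovery side of this counting example (Python is my choice; μ_b = 4 is taken from the figure, the rest is standard):

```python
from scipy.stats import poisson, norm

mu_b = 4.0
p_5sigma = norm.sf(5.0)                         # one-sided 5 sigma ~ 2.87e-7

# p-value for an upward fluctuation of the background: P(n >= n_obs | b only)
p_value = lambda n_obs: poisson.sf(n_obs - 1, mu_b)

n = 0
while p_value(n) > p_5sigma:                    # smallest n_obs that makes a 5 sigma claim
    n += 1
print("n_obs needed for a 5 sigma discovery with b = 4:", n)
print("its p-value:", p_value(n))
```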
Which Test Statistic to Use?
the test statistic does not necessarily have to be simply the counted number of events; remember the Neyman-Pearson likelihood ratio:
  t(n_obs) = Poisson(n_obs; s+b) / Poisson(n_obs; b)
[figure: distributions P(t(x)) of the test statistic under "b only" and "s+b"]
exclusion limit:
  pre-specify α = 0.05 → 95% CL on the exclusion
  make your measurement
  if it falls into the "critical (red) region", where you decided to "reject" H0 = s+b:
  CL_s+b = P(t ≤ t_obs | s+b)
  (i.e. the chance for THIS particular measurement to still have been "fooled", i.e. that there actually WAS a signal)
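A toy version of such a likelihood-ratio test for a single counting channel (my own sketch; the numbers s, b and n_obs are invented):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
s, b, n_obs = 5.0, 10.0, 8

# Neyman-Pearson likelihood ratio for a counting experiment (monotonic in n here)
t = lambda n: poisson.pmf(n, s + b) / poisson.pmf(n, b)

# toy experiments give the distribution of t under the s+b hypothesis
toys_sb = t(rng.poisson(s + b, size=200_000))
t_obs = t(n_obs)

# CL_s+b: probability, were s+b true, to get an outcome at least as background-like as observed
CLsb = np.mean(toys_sb <= t_obs)
print(f"t_obs = {t_obs:.3g},  CL_s+b = {CLsb:.3f}")
# exclude the signal hypothesis at 95% CL if CL_s+b < 0.05
```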
Example: LEP Higgs Search
Remember: there were 4 experiments and many different search channels
→ treat the different experiments just like "more channels"
Evaluate how −2 ln Q is distributed for
  background only
  signal + background
(note: this needs to be done for all Higgs masses)
example: m_H = 115 GeV/c²
[figure: −2 ln Q distributions with the CL_b and CL_s+b regions; left = more signal-like, right = more background-like]
Example: LEP SM Higgs Limit
Example: LEP Higgs Search
In order to "avoid" the possible "problem" of Being Lucky when setting the limit:
rather than only "quoting" the expected sensitivity in addition, weight your CL_s+b by it:
  CL_s = CL_s+b / CL_b = P(LLR ≥ LLR_obs | H1) / P(LLR ≥ LLR_obs | H0),  with LLR = −2 ln Q
[figure: −2 ln Q distributions with the CL_b and CL_s+b tail areas shaded; left = more signal-like, right = more background-like]
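A single-channel toy sketch of this CLs prescription (the real LEP combination used many channels and experiments; Python, the assumption Q = L_s+b / L_b for one counting channel, and all numbers are mine):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(5)
s, b, n_obs = 3.0, 10.0, 6

# LLR = -2 ln Q with Q = Poisson(n; s+b) / Poisson(n; b); large LLR = background-like
llr = lambda n: -2.0 * np.log(poisson.pmf(n, s + b) / poisson.pmf(n, b))

toys_sb = llr(rng.poisson(s + b, size=500_000))   # distribution under signal+background
toys_b  = llr(rng.poisson(b,     size=500_000))   # distribution under background only
llr_obs = llr(n_obs)

CLsb = np.mean(toys_sb >= llr_obs)    # prob. under s+b to be at least as background-like
CLb  = np.mean(toys_b  >= llr_obs)    # the same under background only
print(f"CL_s+b = {CLsb:.3f},  CL_b = {CLb:.3f},  CL_s = {CLsb / CLb:.3f}")
# exclude the signal at 95% CL if CL_s < 0.05; dividing by CL_b protects against
# excluding a signal to which the experiment is not actually sensitive
```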
Systematic Uncertainties
standard popular way (Cousins/Highland):
  integrate over all systematic errors and their "probability distribution"
  → marginalisation of the joint probability density of the measurement parameters and the systematic errors
  → Bayesian! (a probability for the systematic parameter)
  → "hybrid": frequentist intervals with a Bayesian treatment of the systematics
  has been shown to possibly give large "undercoverage" for very small p-values / large significances (i.e. it may underestimate the chance of a "false discovery"!)
LEP Higgs: generated MC to get the PDFs with the parameters "varied" within their systematic uncertainties
  → essentially the same as "integrating over" them: needs a probability density for "how these parameters vary"
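A sketch of such a hybrid (Cousins/Highland-style) limit for a counting experiment, smearing the background expectation with a Gaussian "probability distribution" (my own Python illustration; the numbers are invented):

```python
from scipy.stats import poisson, norm
from scipy.integrate import quad
from scipy.optimize import brentq

n_obs, b0, sigma_b = 0, 1.0, 0.5     # observed counts, nominal background and its uncertainty

def smeared_cdf(n, s):
    # average P(n' <= n; s+b) over a (truncated) Gaussian probability density for b
    num = quad(lambda b: poisson.cdf(n, s + b) * norm.pdf(b, b0, sigma_b), 0.0, b0 + 6 * sigma_b)[0]
    den = quad(lambda b: norm.pdf(b, b0, sigma_b), 0.0, b0 + 6 * sigma_b)[0]
    return num / den

# 95% upper limit on s with the background uncertainty "integrated out"
limit = brentq(lambda s: smeared_cdf(n_obs, s) - 0.05, 0.0, 50.0)
print("hybrid 95% upper limit on s:", round(limit, 2))
# compare with the limit for b fixed to 1.0 in the sketch a few slides back
```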
Systematic Uncertainties
Why don't we include every systematic uncertainty as a "free parameter" in the fit?
e.g. measure the background contribution under the signal peak in sidebands:
  the measurement and the extrapolation from the sidebands have an uncertainty
  but you can parametrise your expected background such that: if the sideband measurement gives this data, then b = ...
  (K. Cranmer, PhyStat 2003)
Note: no need to specify a prior probability
Build your Likelihood function such that it includes:
  your parameters of interest
  those describing the influence of the systematic uncertainties → nuisance parameters
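A minimal sketch of such a fit: a toy "signal region + sideband" counting model in which the background b is a free nuisance parameter constrained by the sideband data (Python; n, m and τ are invented numbers):

```python
from scipy.stats import poisson
from scipy.optimize import minimize

# n counts in the signal region, m in a sideband; tau = sideband / signal-region background ratio
n, m, tau = 25, 60, 3.0

def neg2lnL(params):
    s, b = params
    if s < 0 or b <= 0:
        return 1e12                      # crude way to keep the fit in the allowed region
    return -2.0 * (poisson.logpmf(n, s + b) + poisson.logpmf(m, tau * b))

fit = minimize(neg2lnL, x0=[5.0, 20.0], method="Nelder-Mead")
s_hat, b_hat = fit.x
print(f"s_hat = {s_hat:.2f},  b_hat = {b_hat:.2f}")   # b is fitted from the data, no prior needed
```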
Nuisance Parameters and Profile Likelihood
Build your Likelihood function such that it includes:
  your parameters of interest
  those describing the influence of the systematic uncertainties → nuisance parameters
λ(μ) = L(μ, θ̂(μ)) / L(μ̂, θ̂)     for 0 ≤ μ̂ ≤ μ
λ(μ) = L(μ, θ̂(μ)) / L(0, θ̂(0))   for μ̂ < 0
(θ̂(μ) maximises L for the given, fixed μ; (μ̂, θ̂) is the global maximum)
→ a "ratio of likelihoods": why?
Profile Likelihood
λ(μ) = L(μ, θ̂(μ)) / L(μ̂, θ̂)     for 0 ≤ μ̂ ≤ μ
λ(μ) = L(μ, θ̂(μ)) / L(0, θ̂(0))   for μ̂ < 0
→ a "ratio of likelihoods": why?
Why not simply use L(μ, θ) as the test statistic?
  The number of degrees of freedom of the fit would be N_θ + 1 (the nuisance parameters plus μ)
  However, we are not interested in the values of θ (they are nuisances!)
  Additional degrees of freedom dilute the interesting information on μ
  The "profile likelihood" (= ratio of maximised likelihoods) concentrates the information on what we are interested in
  It is just what we usually do for a chi-squared: Δχ²(μ) = χ²(μ, θ_best′) − χ²(μ_best, θ_best)
  N_d.o.f. of Δχ²(μ) is 1, and the value of χ²(μ_best, θ_best) measures the "goodness-of-fit"
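A sketch of a profile-likelihood scan for the toy "signal region + sideband" model used above (my own Python illustration; all numbers are invented, and the Δ(−2 ln λ) ≤ 1 cut is the usual 1-d.o.f. rule):

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize, minimize_scalar

n, m, tau = 25, 60, 3.0                      # same toy counting model as before

def neg2lnL(s, b):
    if s + b <= 0 or b <= 0:
        return 1e12
    return -2.0 * (poisson.logpmf(n, s + b) + poisson.logpmf(m, tau * b))

# global minimum over (s, b)
glob = minimize(lambda p: neg2lnL(p[0], p[1]), x0=[5.0, 20.0], method="Nelder-Mead")

def profiled(s):
    # conditional minimum over the nuisance parameter b for fixed s ("theta hat of mu")
    return minimize_scalar(lambda b: neg2lnL(s, b), bounds=(1e-3, 100.0), method="bounded").fun

scan = np.linspace(0.0, 15.0, 151)
m2lnlam = np.array([profiled(s) for s in scan]) - glob.fun   # -2 ln lambda(s), >= 0
inside = scan[m2lnlam <= 1.0]                                # ~68% interval, 1 d.o.f.
print(f"s_hat ~ {glob.x[0]:.2f},  approx 68% interval: [{inside.min():.2f}, {inside.max():.2f}]")
```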
Summary
Maximum Likelihood fit to estimate parameters
what to do if the estimator is non-Gaussian:
  Neyman confidence intervals
  what "bothers" people with them
Feldman/Cousins confidence belts/intervals
  unify the "limit" and "measurement" confidence belts
CLs ... the HEP limit
  CLs ... a ratio of "p-values" ... statisticians don't like that
new idea: Power Constrained Limits
  rather than specifying the "sensitivity" together with the Neyman confidence interval,
  decide beforehand that you will "accept" limits only where your experiment has sufficient "power", i.e. "sensitivity"!
... and a bit about the Profile Likelihood and systematic errors.