Statistics In HEP


Statistics In HEP 2
How do we understand/interpret our measurements?
Helge Voss
Hadron Collider Physics Summer School June 8-17, 2011― Statistics in HEP
Outline
• Maximum Likelihood fit
• strict frequentist Neyman confidence intervals
  • what "bothers" people with them
• Feldman/Cousins confidence belts/intervals
• Bayesian treatment of 'unphysical' results
• How the LEP-Higgs limit was derived
• what about systematic uncertainties?
  • Profile Likelihood
Parameter Estimation
• want to measure/estimate some parameter θ
  e.g. mass, polarisation, etc.
• observe: x_i = (x_1, …, x_n)_i , i = 1, …, K
  e.g. n observables for K events
• "hypothesis", i.e. PDF P(x; θ): the distribution of x for a given θ
  e.g. a differential cross section
• K independent events: P(x_1, …, x_K; θ) = ∏_{i=1..K} P(x_i; θ)
• for fixed x, regard P(x; θ) as a function of θ (i.e. the Likelihood L(θ))
• θ close to θ_true → Likelihood L(θ) will be large
• try to maximise L(θ); typically: minimize −2 ln L(θ) → θ̂ → Maximum Likelihood estimator
Glen Cowan: Statistical data analysis
Maximum Likelihood Estimator
example: PDF(x) = Gauss(x; μ, σ)  →  L_Gauss(x|μ) = 1/(√(2π) σ) · exp( −(x−μ)² / 2σ² )
• estimator for μ_true from the data measured in an experiment x_1, …, x_K
• full Likelihood: L(x|μ) = ∏_{i=1..K} 1/(√(2π) σ) · exp( −(x_i−μ)² / 2σ² )
• typically: −2 ln(L(x|μ)) = Σ_{i=1..K} (x_i−μ)²/σ² + const.   Note: it's a function of μ!
• For Gaussian PDFs: −2 ln(L(μ)) == χ²
[figure: −2 ln(L(μ)) vs μ, a parabola with its minimum at μ_best estimate and the Δ(−2 ln L) = 1 points marked]
• Δ(−2 ln L) = 1 around μ̂ → interval [μ_1, μ_2] → variance of the estimate μ̂ (see the sketch below)
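A minimal numerical sketch of this Δ(−2 ln L) = 1 recipe (not from the lecture; it assumes a Gaussian PDF with known σ and uses invented toy data):

```python
# Minimal sketch (illustrative): ML estimate of a Gaussian mean and its
# uncertainty from the Delta(-2 ln L) = 1 rule, with toy data and known sigma.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(42)
sigma = 1.0
data = rng.normal(loc=2.5, scale=sigma, size=100)   # K = 100 toy measurements

def neg2lnL(mu):
    """-2 ln L(mu) for Gaussian PDFs, dropping mu-independent constants."""
    return np.sum((data - mu) ** 2 / sigma**2)

mu_hat = data.mean()                   # analytic ML estimator of a Gaussian mean
n2ll_min = neg2lnL(mu_hat)

# interval [mu1, mu2] where -2 ln L rises by 1 above its minimum
excess = lambda mu: neg2lnL(mu) - (n2ll_min + 1.0)
mu1 = brentq(excess, mu_hat - 5 * sigma, mu_hat)
mu2 = brentq(excess, mu_hat, mu_hat + 5 * sigma)

print(f"mu_hat = {mu_hat:.3f}, interval = [{mu1:.3f}, {mu2:.3f}]")
print(f"compare sigma/sqrt(K) = {sigma / np.sqrt(len(data)):.3f}")
```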
Parameter Estimation
properties of estimators:
• biased or unbiased
• large or small variance
• distribution of θ̂ over many measurements?
[figure: distributions of θ̂ for biased/unbiased and small/large-variance estimators, with θ_true marked; Glen Cowan: Statistical data analysis]
• Small bias and small variance are typically "in conflict"
• Maximum Likelihood is typically unbiased only in the limit K → ∞ (see the toy sketch below)
• If the Likelihood function is "Gaussian" (often the case for large K → central limit theorem)
  → get an "error" estimate from Δ(−2 ln L) = 1
• if (very) non-Gaussian
  → revert typically to (classical) Neyman confidence intervals
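A tiny toy illustration of the bias point (my own sketch, not from the lecture): the ML estimator of a Gaussian variance is biased at finite K, and the bias vanishes as K → ∞.

```python
# Toy sketch (illustrative): the ML estimator of a Gaussian variance has
# expectation (K-1)/K * sigma^2, i.e. it is biased for finite K.
import numpy as np

rng = np.random.default_rng(1)
true_sigma2 = 4.0

for K in (5, 50, 5000):
    # many repeated "experiments", each with K events
    toys = rng.normal(0.0, np.sqrt(true_sigma2), size=(20000, K))
    sigma2_ml = toys.var(axis=1, ddof=0)      # ML estimator: 1/K * sum (x - xbar)^2
    bias = sigma2_ml.mean() - true_sigma2     # -> -sigma^2 / K
    print(f"K={K:5d}  mean(sigma2_ML)={sigma2_ml.mean():.3f}  bias={bias:+.3f}")
```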
Classical Confidence Intervals
another way to look at a measurement, rigorously "frequentist":
• Neyman's confidence belt for CL α (e.g. 90%)
• each μ_hyp.true has a PDF of how the measured values will be distributed
• determine the (central) intervals ("acceptance regions") in these PDFs such that they contain α
• do this for ALL μ_hyp.true
• connect all the "red dots" → confidence belt
• measure x_obs:
  → conf. interval = [μ_1, μ_2] given by the vertical line intersecting the belt
[figure: confidence belt in the (x_measured, μ_hypothetically true) plane, with μ_1 and μ_2 read off at the observed x; Feldman/Cousins (1998)]
• by construction: for each x_meas (drawn according to the PDF for μ_true) the confidence interval [μ_1, μ_2] contains μ_true in α = 90% of cases (see the sketch below)
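A minimal sketch of this belt construction and its inversion (not the lecture's code; it assumes a Gaussian measurement with σ = 1 and a 90% central interval):

```python
# Minimal sketch (illustrative): central Neyman confidence belt for a Gaussian
# measurement with sigma = 1, and its inversion for an observed x.
import numpy as np
from scipy.stats import norm

alpha = 0.90
z = norm.ppf(0.5 + alpha / 2)              # 1.645 for a 90% central interval
sigma = 1.0

mu_grid = np.linspace(-5, 10, 3001)        # hypothetical true values
x_lo = mu_grid - z * sigma                 # acceptance region in x for each mu
x_hi = mu_grid + z * sigma

def confidence_interval(x_obs):
    """Intersect the vertical line at x_obs with the belt -> [mu1, mu2]."""
    accepted = mu_grid[(x_obs >= x_lo) & (x_obs <= x_hi)]
    return accepted.min(), accepted.max()

print(confidence_interval(2.0))            # ~ (0.36, 3.64), i.e. x_obs -/+ 1.645*sigma
```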
Classical Confidence Intervals
another way to look at a measurement, rigorously "frequentist":
• Neyman's confidence belt for CL α (e.g. 90%)
• conf. interval = [μ_1, μ_2] given by the vertical line intersecting the belt
• by construction:
  P(x < x_obs; μ_2) = (1−α)/2
  P(x > x_obs; μ_1) = (1−α)/2
[figure: the same confidence belt, with the acceptance region at μ_true highlighted and the observed x_measured marked; Feldman/Cousins (1998)]
• if the true value were μ_true:
  → μ_true lies in [μ_1, μ_2] if the vertical line at x_meas intersects the acceptance region of μ_true
  → x_meas falls into that acceptance region in 90% of cases (that's how it was constructed)
  → only those x_meas give intervals [μ_1, μ_2] that contain μ_true
  → 90% of intervals cover μ_true (a toy check is sketched below)
• if P(x; μ) is Gaussian (σ = const) → the central 68% Neyman conf. intervals
  == Maximum Likelihood + its "error" estimate: [x − σ_x ; x + σ_x]
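A quick toy coverage check of that statement (illustrative; it assumes the same Gaussian setup with σ = 1 and a 90% central interval):

```python
# Minimal coverage check (illustrative): central 90% intervals
# [x - 1.645, x + 1.645] cover mu_true in ~90% of repeated experiments.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
mu_true, sigma, alpha = 3.2, 1.0, 0.90
z = norm.ppf(0.5 + alpha / 2)

x_meas = rng.normal(mu_true, sigma, size=100_000)            # repeated experiments
covered = (x_meas - z * sigma <= mu_true) & (mu_true <= x_meas + z * sigma)
print(f"coverage = {covered.mean():.3f}")                    # ~0.90
```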
Flip-Flop
When to quote a measurement or a limit!
• estimate a Gaussian distributed quantity μ that cannot be < 0 (e.g. a mass)
• same Neyman confidence belt construction as before:
  • once for a measurement (two sided, each tail contains 5%)
  • once for a limit (one sided, the tail contains 10%)
• decide: if x_obs < 0, act as if you had measured 0 → conservative
• if you observe x_obs < 3 → quote an upper limit only
• if you observe x_obs > 3 → quote a measurement
→ induces "undercoverage", as this acceptance region contains only 85%!! (toy check below)
Feldman/Cousins (1998)
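A toy sketch of that undercoverage (my own illustration; it assumes σ = 1, a 90% CL, the flip-flop policy above, and evaluates the coverage at an example value μ_true = 4):

```python
# Toy sketch (illustrative) of flip-flop undercoverage: quote a one-sided 90%
# upper limit if x_obs < 3, a central 90% interval otherwise, and clamp x_obs
# at 0 if it is negative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
z1 = norm.ppf(0.90)    # 1.282: one-sided 90% upper limit
z2 = norm.ppf(0.95)    # 1.645: central 90% interval

mu_true = 4.0
x = rng.normal(mu_true, 1.0, size=200_000)

lo = np.where(x < 3, 0.0, x - z2)                    # flip-flop policy
hi = np.where(x < 3, np.maximum(x, 0) + z1, x + z2)
covered = (lo <= mu_true) & (mu_true <= hi)
print(f"coverage at mu_true = {mu_true}: {covered.mean():.3f}   (only ~85%, not 90%)")
```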
Some things people don’t like..
same example:
• estimate a Gaussian distributed quantity μ that cannot be < 0 (e.g. a mass)
• using the proper confidence belt
• assume: x_obs = −1.8
→ the confidence interval is EMPTY!
• Note: that's OK from the frequentist interpretation:
  μ_true ∈ conf. interval in 90% of (hypothetical) measurements.
  Obviously we were 'unlucky' and picked one of the remaining 10%.
  Feldman/Cousins (1998)
• nonetheless: tempted to "flip-flop"??? tsk.. tsk.. tsk..
Feldman/Cousins: a Unified Approach
• How we determine the "acceptance" region for each μ_hyp.true is up to us, as long as it covers the desired integral of size α (e.g. 90%)
• include those x_meas with the largest likelihood ratio first:
  R = L(x_meas | μ) / L(x_meas | μ_best estimate)
• μ_best estimate here: either the observation x_meas or the closest ALLOWED μ
→ no "empty intervals" anymore! (see the sketch below)
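A minimal sketch of the Feldman-Cousins ordering for the bounded-Gaussian example (my own illustration on coarse grids, not the original construction code; grid ranges and step sizes are arbitrary choices):

```python
# Minimal sketch (illustrative) of the Feldman-Cousins ordering for a Gaussian
# measurement (sigma = 1) of a quantity mu that cannot be negative.
import numpy as np
from scipy.stats import norm

alpha = 0.90
dx = 0.005
x_grid = np.arange(-4.0, 8.0, dx)            # possible measured values
mu_grid = np.arange(0.0, 6.0, dx)            # physically allowed true values

def acceptance(mu):
    """Boolean mask of x values accepted for this mu, filled in decreasing R order."""
    mu_best = np.maximum(x_grid, 0.0)                      # closest ALLOWED mu to each x
    R = norm.pdf(x_grid, mu) / norm.pdf(x_grid, mu_best)   # FC ordering ratio
    prob = norm.pdf(x_grid, mu) * dx                       # probability of each x bin
    order = np.argsort(-R)
    accepted = np.zeros(x_grid.size, dtype=bool)
    accepted[order[np.cumsum(prob[order]) <= alpha]] = True
    return accepted

x_obs = -1.8
idx = np.argmin(np.abs(x_grid - x_obs))
covered = [mu for mu in mu_grid if acceptance(mu)[idx]]
print(f"FC {alpha:.0%} interval for x_obs = {x_obs}: "
      f"[{min(covered):.2f}, {max(covered):.2f}]   (never empty)")
```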
Being Lucky…
• set an upper limit on a signal μ_s on top of a known (mean) background μ_b
  • n = s + b follows a Poisson distribution: P(n) = Poisson(n; μ_s + μ_b)
• Neyman: draw the confidence belt with μ_s on the "y-axis" (the possible true values of μ_s), for fixed background μ_b, versus the observed n
• 2 experiments (E1 and E2) with μ_b1 = 1, μ_b2 = 2, both observing (out of luck) n = 0:
  • E1: 95% limit on μ_s ≈ 2
  • E2: 95% limit on μ_s ≈ 1
  → UNFAIR!? (see the sketch below)
(sorry… the plots don't match: one is for 90% CL, the other for 95% CL)
[figures: Neyman belt of μ_s vs observed n for fixed background μ_b (Glen Cowan: Statistical data analysis), and upper limit on μ_s vs background μ_b]
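A minimal sketch of these classical limits (illustrative; it uses the textbook inversion P(n ≤ n_obs; μ_s + μ_b) = 1 − CL rather than any experiment's actual code):

```python
# Minimal sketch (illustrative): classical one-sided 95% upper limit on a signal
# mu_s on top of a known background mu_b for a Poisson counting experiment.
import numpy as np
from scipy.stats import poisson
from scipy.optimize import brentq

def upper_limit(n_obs, mu_b, cl=0.95):
    f = lambda mu_s: poisson.cdf(n_obs, mu_s + mu_b) - (1.0 - cl)
    if f(0.0) < 0:            # already below 1-CL at mu_s = 0
        return 0.0
    return brentq(f, 0.0, 50.0)

for mu_b in (1.0, 2.0):
    print(f"n_obs=0, mu_b={mu_b}: 95% upper limit on mu_s = {upper_limit(0, mu_b):.2f}")
# -> ~2.0 for mu_b=1 and ~1.0 for mu_b=2: the experiment with MORE background
#    quotes the SMALLER limit when both see zero events ("being lucky").
```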
Being Lucky …
• Feldman/Cousins confidence belts were motivated by the "popular" Bayesian approaches to handling such problems.
Bayesian: rather than constructing confidence belts:
• turn the Likelihood for μ_s (given n_obs) into a posterior probability for μ_s:
  p(μ_s | n_obs) ∝ L(n_obs; μ_s) · π(μ_s),  with L(n_obs; μ_s) = Poisson(n_obs; μ_s + μ_b)
• add a prior probability for the signal:
  π(μ_s) = 0 for μ_s < 0, uniform for μ_s > 0
→ upper limit on the signal (Helene 1983); a numerical sketch follows below
[figure: upper limit on the signal vs background b for n_obs = 0, 1, …, 10, Bayesian (Helene) and Feldman/Cousins]
Feldman/Cousins:
• there is still SOME "unfairness"
• perfectly "fine" in the frequentist interpretation
• should quote "limit + sensitivity" (Feldman/Cousins 1998)
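A minimal numerical version of that Bayesian upper limit (my own sketch; it simply integrates the flat-prior posterior on a grid):

```python
# Minimal sketch (illustrative) of the Bayesian upper limit with a flat prior
# for mu_s > 0 (Helene 1983): the posterior ~ Poisson(n_obs; mu_s + mu_b), and
# the 95% limit is where the normalised posterior integral reaches 0.95.
import numpy as np
from scipy.stats import poisson

def bayesian_upper_limit(n_obs, mu_b, cl=0.95):
    s = np.linspace(0.0, 50.0, 200_001)                 # grid over mu_s >= 0
    post = poisson.pmf(n_obs, s + mu_b)                 # likelihood x flat prior
    cdf = np.cumsum(post) / post.sum()                  # normalised posterior CDF
    return s[np.searchsorted(cdf, cl)]

for mu_b in (1.0, 2.0):
    print(f"n_obs=0, mu_b={mu_b}: Bayesian 95% upper limit = "
          f"{bayesian_upper_limit(0, mu_b):.2f}")
# -> ~3.0 in both cases: with a flat prior and n_obs = 0 the limit does not
#    depend on the expected background, which many find more "fair".
```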
Statistical Tests in Particle Searches
exclusion limits
• upper limit on the cross section (↔ lower limit on a mass scale)
  (σ < limit, as otherwise we would have seen it)
• need to estimate the probability of a downward fluctuation of s+b
• try to "disprove" H0 = s+b
• better: find the minimal s for which you can still exclude H0 = s+b at a prespecified confidence level
  • type-1 error α = 0.05 → 95% CL
discoveries
• need to estimate the probability of an upward fluctuation of b
• try to disprove H0 = "background only" (a numerical sketch follows below)
[figure: Poisson(n_b; μ_b = 4) for "b only" and for "s+b", with a possible observation and the N_obs needed for a 5σ discovery (P = 2.8·10⁻⁷) indicated]
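A minimal sketch of the discovery p-value for this counting example (illustrative; it assumes the slide's μ_b = 4 and the one-sided 5σ convention P ≈ 2.87·10⁻⁷):

```python
# Minimal sketch (illustrative): p-value of an upward fluctuation of the
# background-only hypothesis for a Poisson counting experiment with mu_b = 4,
# and the smallest n_obs that reaches the 5-sigma discovery threshold.
from scipy.stats import poisson, norm

mu_b = 4.0
p_5sigma = norm.sf(5.0)                    # one-sided 5-sigma p-value ~ 2.87e-7

n_obs = 12                                 # an example observation
p_value = poisson.sf(n_obs - 1, mu_b)      # P(N >= n_obs | b only)
z = norm.isf(p_value)                      # convert the p-value to a significance
print(f"n_obs={n_obs}: p = {p_value:.2e}, Z = {z:.2f} sigma")

n = 0
while poisson.sf(n - 1, mu_b) > p_5sigma:  # first n with P(N >= n) below 2.87e-7
    n += 1
print(f"N_obs needed for a 5-sigma discovery with mu_b={mu_b}: {n}")   # ~18
```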
Which Test Statistic to use?
exclusion limit:
• the test statistic does not necessarily have to be simply the counted number of events:
  • remember Neyman-Pearson → likelihood ratio
    t(n_obs) = Poisson(n_obs; s+b) / Poisson(n_obs; b)
• pre-specify α = 0.05 → 95% CL on the exclusion
• make your measurement
• if it falls in the "critical (red) region", where you decided to "reject" H0 = s+b → exclude
[figure: distributions P(t) of the test statistic t for "b only" and "s+b", with the critical region marked]
• CL_s+b = P(t < t_obs)  (sketched with toys below)
  (i.e. what would have been the chance for THIS particular measurement to still have been "fooled", and there would actually have BEEN a signal)
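A toy sketch of CL_s+b for such a counting experiment (my own illustration with assumed s, b and n_obs; pseudo-experiments stand in for the analytic Poisson sums):

```python
# Minimal toy sketch (illustrative) of the counting-experiment test statistic
# t(n) = Poisson(n; s+b)/Poisson(n; b) and of CL_{s+b} = P(t <= t_obs | s+b),
# estimated with pseudo-experiments.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(11)
s, b = 3.0, 4.0        # assumed expected signal and background
n_obs = 2              # an (unlucky) observation

def t(n):
    # log of the likelihood ratio: monotonic in t, so it gives the same ordering
    return poisson.logpmf(n, s + b) - poisson.logpmf(n, b)

toys_sb = t(rng.poisson(s + b, size=200_000))   # distribution of t under s+b
cl_sb = np.mean(toys_sb <= t(n_obs))            # CL_{s+b} = P(t <= t_obs | s+b)
print(f"CL_s+b = {cl_sb:.3f}  (s+b excluded at 95% CL if CL_s+b < 0.05)")
```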
Example: LEP-Higgs search
Remember: there were 4 experiments and many different search channels
• treat different experiments just like "more channels"
• evaluate how −2lnQ is distributed for
  • background only
  • signal + background
  (note: this needs to be done for all Higgs masses)
• example: m_H = 115 GeV/c²
[figure: distributions of −2lnQ for the b-only and s+b hypotheses at m_H = 115 GeV/c², with CL_b and CL_s+b indicated; more signal-like to the left, more background-like to the right]
Example: LEP SM Higgs Limit
Example: LEP Higgs Search
• In order to "avoid" the possible "problem" of Being Lucky when setting the limit:
  rather than quoting the expected sensitivity in addition,
  → weight your CL_s+b by it:
  CL_s = CL_s+b / CL_b = P(LLR ≥ LLR_obs | H1) / P(LLR ≥ LLR_obs | H0)
  (a toy sketch follows below)
[figure: the same −2lnQ distributions with the areas corresponding to CL_s+b and CL_b shaded; more signal-like to the left, more background-like to the right]
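A toy multi-channel sketch of this CL_s construction (my own illustration, not the LEP machinery; channel counts and expectations are invented, and channels are combined by summing log-likelihood ratios):

```python
# Minimal toy sketch (illustrative): several counting channels are combined by
# summing log-likelihood ratios, -2lnQ is generated for the b-only and s+b
# hypotheses, and CL_s = CL_{s+b}/CL_b is formed for the observation.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2011)
s = np.array([1.2, 0.8, 2.0])      # assumed expected signal per channel
b = np.array([3.0, 1.5, 6.0])      # assumed expected background per channel
n_obs = np.array([3, 1, 7])        # assumed observed counts per channel

def minus2lnQ(n):
    """-2 ln [ L(n; s+b) / L(n; b) ], summed over channels."""
    return -2.0 * np.sum(poisson.logpmf(n, s + b) - poisson.logpmf(n, b), axis=-1)

q_obs = minus2lnQ(n_obs)
toys_sb = minus2lnQ(rng.poisson(s + b, size=(200_000, len(s))))
toys_b  = minus2lnQ(rng.poisson(b,     size=(200_000, len(s))))

cl_sb = np.mean(toys_sb >= q_obs)       # prob. of a result at least as b-like under s+b
cl_b  = np.mean(toys_b  >= q_obs)       # same under b only
print(f"CL_s+b = {cl_sb:.3f}, CL_b = {cl_b:.3f}, CL_s = {cl_sb / cl_b:.3f}")
# exclude the signal hypothesis at 95% CL if CL_s < 0.05
```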
Systematic Uncertainties
• standard popular way (Cousins/Highland), sketched numerically below:
  integrate over all systematic errors and their "probability distribution"
  → marginalisation of the joint probability density of measurement parameters and systematic errors
  → ! Bayesian ! (probability of the systematic parameter)
• such "hybrid" frequentist intervals with Bayesian systematics have been shown to possibly exhibit large "undercoverage" for very small p-values / large significances (i.e. they can underestimate the chance of a "false discovery"!)
• LEP-Higgs: generated MC to get the PDFs with the parameters affected by systematic uncertainty "varied"
  → essentially the same as "integrating over" → needs a probability density for "how these parameters vary"
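A minimal sketch of that integration (illustrative; a Poisson counting p-value is marginalised over a Gaussian background uncertainty, with invented numbers):

```python
# Minimal sketch (illustrative) of the Cousins-Highland "hybrid" recipe: the
# p-value of an observation is computed from a Poisson model whose background
# is integrated (marginalised) over a Gaussian "systematic" prior.
import numpy as np
from scipy.stats import poisson, norm

b_nominal, b_sigma = 4.0, 1.0      # assumed background estimate and its uncertainty
n_obs = 10

# marginalised ("prior-predictive") probability of observing >= n_obs
b_grid = np.linspace(0.0, b_nominal + 5 * b_sigma, 2001)
prior = norm.pdf(b_grid, b_nominal, b_sigma)
prior /= prior.sum()                                   # normalise (truncated at b >= 0)
p_marg = np.sum(poisson.sf(n_obs - 1, b_grid) * prior)

p_fixed = poisson.sf(n_obs - 1, b_nominal)             # ignoring the uncertainty
print(f"p-value: fixed b = {p_fixed:.2e}, marginalised over b = {p_marg:.2e}")
# the marginalised p-value is larger, i.e. the quoted significance is reduced
```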
Systematic Uncertainties
 Why don’t we:
 include any systematic uncertainly as “free parameter” in the fit
 eg. measure background
contribution under signal peak in
sidebands
 measurement + extrapolation into
side bands have uncertainty
 but you can parametrise your
expected background such that:
 if sideband measurement gives
this data  then b=…
K.Cranmer Phystat2003
Note: no need to specify prior probability
 Build your Likelyhood function such that it includes:
 your parameters of interest
 those describing the influcene of the sys. uncertainty
 nuisance parameters
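A minimal sketch of such a likelihood with a nuisance parameter (my own toy version of the sideband idea, with invented counts; the sideband is assumed to be τ times larger than the signal region):

```python
# Minimal sketch (illustrative) of a likelihood with a nuisance parameter:
# a signal-region count n = s + b and a sideband count m = tau * b constrain
# the background b from the data themselves, with no prior needed.
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize

n_obs, m_obs, tau = 15, 20, 2.0     # assumed counts; sideband = tau x signal region

def neg2lnL(params):
    s, b = params
    if s < 0 or b <= 0:
        return np.inf
    return -2.0 * (poisson.logpmf(n_obs, s + b) + poisson.logpmf(m_obs, tau * b))

fit = minimize(neg2lnL, x0=[1.0, 5.0], method="Nelder-Mead")
s_hat, b_hat = fit.x
print(f"s_hat = {s_hat:.2f}, b_hat = {b_hat:.2f}")   # b is measured in situ from the sideband
```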
Nuisance Parameters and Profile Likelihood
• Build your Likelihood function such that it includes:
  • your parameters of interest
  • those describing the influence of the systematic uncertainties
    → nuisance parameters

λ(μ) = L(μ, θ̂_μ) / L(μ̂, θ̂)       for 0 ≤ μ̂ ≤ μ
     = L(μ, θ̂_μ) / L(0, θ̂_0)      for μ̂ < 0

(θ̂_μ maximises L for the given, fixed μ; μ̂, θ̂ is the global maximum)
a "ratio of likelihoods", why?
Profile Likelihood
λ(μ) = L(μ, θ̂_μ) / L(μ̂, θ̂)       for 0 ≤ μ̂ ≤ μ
     = L(μ, θ̂_μ) / L(0, θ̂_0)      for μ̂ < 0

a "ratio of likelihoods", why?
Why not simply use L(μ, θ) as the test statistic?
• The number of degrees of freedom of the fit would be N_θ + 1
• However, we are not interested in the values of θ (they are nuisance!)
• Additional degrees of freedom dilute the interesting information on μ
• The "profile likelihood" (= ratio of maximum likelihoods) concentrates the information on what we are interested in
• It is just as we usually do for chi-squared: Δχ²(μ) = χ²(μ, θ_best(μ)) − χ²(μ_best, θ_best)
• N_d.o.f. of Δχ²(μ) is 1, and the value of χ²(μ_best, θ_best) measures the "goodness of fit"
(a numerical sketch follows below)
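A minimal numerical sketch of the profile likelihood (my own illustration, reusing the toy signal-region + sideband counting setup with invented numbers; −2 ln λ(μ) is scanned and the Δ = 1 points give an approximate 68% interval on the signal strength μ):

```python
# Minimal sketch (illustrative) of a profile likelihood ratio lambda(mu):
# n = mu*s + b in the signal region, m = tau*b in a sideband; the background b
# is the nuisance parameter theta that gets profiled out.
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

n_obs, m_obs, tau, s_nominal = 15, 20, 2.0, 5.0   # assumed toy inputs

def neg_lnL(mu, b):
    return -(poisson.logpmf(n_obs, mu * s_nominal + b) + poisson.logpmf(m_obs, tau * b))

def profiled(mu):
    """-ln L(mu, b_hat_hat(mu)): minimise over the nuisance parameter b only."""
    res = minimize_scalar(lambda b: neg_lnL(mu, b), bounds=(1e-6, 100.0), method="bounded")
    return res.fun

# scan mu >= 0; the global minimum of the scan plays the role of (mu_hat, theta_hat)
mu_scan = np.linspace(0.0, 4.0, 401)
prof = np.array([profiled(mu) for mu in mu_scan])
mu_hat = mu_scan[prof.argmin()]

minus2lnlambda = 2.0 * (prof - prof.min())          # -2 ln lambda(mu), 1 d.o.f.
inside = mu_scan[minus2lnlambda <= 1.0]             # approx. 68% interval on mu
print(f"mu_hat ~ {mu_hat:.2f}, 68% interval ~ [{inside.min():.2f}, {inside.max():.2f}]")
```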
Summary
• Maximum Likelihood fit to estimate parameters
• what to do if the estimator is non-Gaussian:
  Neyman confidence intervals
  • what "bothers" people with them
• Feldman/Cousins confidence belts/intervals
  • unify the "limit" and "measurement" confidence belts
• CL_s … the HEP limit;
  CL_s … a ratio of "p-values" … statisticians don't like that
  • new idea: Power Constrained Limits
    • rather than quoting the "sensitivity" alongside the Neyman confidence interval
    • decide beforehand that you'll "accept" limits only where your experiment has sufficient "power", i.e. "sensitivity"!
• … a bit about Profile Likelihood and systematic errors