Transcript s(m H )+

http://eilam.weizmann.ac.il/Images/TheHiggsMechanism.swf
Eilam Gross, Higgs Statistics, Santander
Orsay 09 2009
Statistical issues for
Higgs Physics
Eilam Gross,
Weizmann Institute of Science
Bayesian
Modified Frequentist - CLs
CMS & ATLAS Higgs Prospects
Bayesian
CMS
Grégory Schott SUSY 2009
Profile Likelihood
ATLAS
Discovery vs Exclusion
• Higgs statistics is about testing one hypothesis against another hypothesis
• One hypothesis is the Standard Model with no Higgs boson (H0)
• Another hypothesis is the SM with a Higgs boson with a specific mass mH (H1)
• Rejecting the no-Higgs hypothesis (H0) → DISCOVERY: $s/\sqrt{b} \geq 5$
• Rejecting the Higgs hypothesis (H1) → EXCLUDING the Higgs at the 95% CL: $s/\sqrt{s+b} \geq$ 2? 1.64?
• Usually H0 is referred to as the null hypothesis, but it depends on the context.
Basic Definition: Test Statistic
• Given a hypothesis H0 (background only), one wants to test it against an alternate hypothesis H1 (Higgs with mass mH)
• One (very good) way is to construct a test statistic Q and use it to accept or reject a hypothesis
• For a physicist the test statistic is part of the analysis model
$Q(m_H) = \frac{L(H_1)}{L(H_0)} = \frac{L(s(m_H)+b)}{L(b)}$
Basic Definitions: p-Value
• A lot of it is about a language... a jargon
• Discovery: a deviation from the SM, i.e. from the background-only hypothesis
• When will one reject a hypothesis?
• p-value = probability that the result is as compatible or less compatible with the background-only hypothesis
• A control region of size α defines the criterion
• It is customary to choose α = 2.87·10⁻⁷
• If the result falls within the control region, i.e. p < α, the BG-only hypothesis is rejected: a discovery
[Figure: the pdf of Q, with the control region of size α marked]
From p-values to Gaussian significance
It is customary to express the p-value as the significance associated with it, had the pdf been Gaussian
Beware of 1 vs 2-sided definitions!
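The conversion between a p-value and a Gaussian significance can be sketched in a few lines. This is only an illustration using Python's standard library; the function names p_to_z and z_to_p are made up for this example, and the one-sided convention is assumed unless stated otherwise.

```python
from statistics import NormalDist

def p_to_z(p_one_sided):
    """One-sided significance Z with P(X > Z) = p for a standard normal X."""
    return -NormalDist().inv_cdf(p_one_sided)

def z_to_p(z, two_sided=False):
    """Gaussian tail probability beyond z (doubled for the two-sided convention)."""
    tail = 1.0 - NormalDist().cdf(z)
    return 2.0 * tail if two_sided else tail

z_discovery = p_to_z(2.87e-7)   # the discovery threshold quoted above
z_exclusion = p_to_z(0.05)      # the 95% CL one-sided exclusion threshold
print(round(z_discovery, 2), round(z_exclusion, 2))
```

With the two-sided convention the same Z corresponds to twice the tail probability, which is why the definition in use should always be stated.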
LR motivation: The Neyman-Pearson Lemma
• When performing a hypothesis test between two simple hypotheses, H0 and H1, the likelihood ratio test, which rejects H0 in favor of H1 when
$\Lambda(x) = \frac{L(H_1|x)}{L(H_0|x)} \geq \eta$, with $P(\Lambda(x) \geq \eta \,|\, H_0) = \alpha$,
is the most powerful test of size α for a threshold η
• Define a test statistic
$\Lambda = \frac{L(H_1)}{L(H_0)}$
• Note: likelihoods are functions of the data, even though we often do not specify it explicitly:
$\Lambda(x) = \frac{L(H_1|x)}{L(H_0|x)}$
Basic Definitions:
Confidence Interval & Coverage
• Say you have a measurement mmeas of mH, with mtrue being the unknown true value of mH
• Assume you know the pdf p(mmeas|mH)
• Given the measurement you deduce somehow (based on your statistical model) that there is a 90% confidence interval [m1,m2]...
• The correct statement: in an ensemble of experiments, 90% of the obtained confidence intervals will contain the true value of mH.
• The misconception: given the data, the probability that there is a Higgs with a mass in the interval [m1,m2] is 90%.
Basic Definitions:
Confidence Interval & Coverage
• Confidence Level: a CL of 90% means that in an ensemble of experiments, each producing a confidence interval, 90% of the confidence intervals will contain the true value of mH
• Normally, we make one experiment and try to estimate from this one experiment the confidence interval at a specified CL% confidence level...
• If in an ensemble of (MC) experiments the true value of mH is covered within the estimated confidence interval in 90% of the cases (for every possible mH), we claim coverage
• If in an ensemble of (MC) experiments our estimated confidence interval contains the true value of mH in fewer than 90% of the cases, we claim that our method undercovers
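The ensemble statement can be checked with a toy Monte Carlo. The sketch below assumes a Gaussian measurement with known resolution and a central interval; the values of m_true and sigma are arbitrary illustration values, not taken from the talk.

```python
import random
from statistics import NormalDist

def coverage(m_true, sigma, cl=0.90, n_experiments=20000, seed=1):
    """Fraction of toy experiments whose central CL interval contains m_true."""
    rng = random.Random(seed)
    # Half-width of the central interval in units of sigma (1.645 for 90%).
    z = NormalDist().inv_cdf(0.5 + cl / 2.0)
    covered = 0
    for _ in range(n_experiments):
        m_meas = rng.gauss(m_true, sigma)          # one toy measurement
        if m_meas - z * sigma <= m_true <= m_meas + z * sigma:
            covered += 1
    return covered / n_experiments

frac = coverage(m_true=120.0, sigma=2.0)
print(f"fraction of intervals containing the true value: {frac:.3f}")
```

A method covers if this fraction matches the quoted CL for every possible true value; a smaller fraction signals undercoverage.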

Basic Definitions:
Confidence Interval & Coverage
The basic question will come again and again:
WHAT IS THE IMPORTANCE OF COVERAGE for a Physicist?
• The “problem”: maybe coverage answers the wrong question... You want to know what is the probability that the Higgs boson exists and is in a specific mass range...
• Excluding a Higgs with a mass < mH at the 95% CL is often taken to mean that the interval [0,mH] does not contain a Higgs boson with a 95% probability
• This is a wrong statement, and the right statement depends on the statistical model.
• You can never test the signal by itself, only s+b, so your statement is about s(mH)+b, including the coverage...
• We might prefer not to exclude a signal to which we are not sensitive, at the price of overcoverage (CLs)
Subjective
Bayesian is Good
for YOU
Thomas Bayes (b 1702)
a British mathematician and Presbyterian minister


What is the Right Question
• Is there a Higgs Boson? What do you mean? Given the data, is there a Higgs Boson?
• Can you really answer that without any a priori knowledge of the Higgs Boson?
• Change your question: what is your degree of belief in the Higgs Boson given the data?
• You need a prior degree of belief regarding the Higgs Boson itself:
$P(\mathrm{Higgs}|\mathrm{Data}) = \frac{P(\mathrm{Data}|\mathrm{Higgs})\,P(\mathrm{Higgs})}{P(\mathrm{Data})} = \frac{L(\mathrm{Higgs})\,\pi(\mathrm{Higgs})}{\int L(\mathrm{Higgs})\,\pi(\mathrm{Higgs})\,d(\mathrm{Higgs})}$, with $L(\mathrm{Higgs}) = P(\mathrm{Data}|\mathrm{Higgs})$
• Make sure that when you quote your answer you also quote your prior assumption!
• Can we assign a probability to a model, P(Higgs)? Of course not, we can only assign to it a degree of belief!
• The most refined question is: assuming there is a Higgs Boson with some mass mH, how well does the data agree with that?
• But even then the answer relies on the way you measured the data (i.e. measurement uncertainties), and that might include some pre-assumptions, priors, in P(Data|Higgs)!
What is the Right Answer?
• The Question is: Is there a Higgs Boson? Is there a God?
$P(\mathrm{God}|\mathrm{Earth}) = \frac{P(\mathrm{Earth}|\mathrm{God})\,P(\mathrm{God})}{P(\mathrm{Earth})}$
• In the book the author uses “divine factors” to estimate P(Earth|God), and a prior for God of 50%
• He “calculates” a 67% probability for God’s existence given earth...
• In Scientific American, July 2004, playing a bit with the “divine factors”, the probability drops to 2%...
Basic Definitions:
The Bayesian Way
• Can the model have a probability (what is Prob(SM))?
• We assign a degree of belief in models parameterized by µ:
$\langle n \rangle = \mu \cdot s(m_H) + b$, with $H_0: \mu = 0$ and $H_1: \mu = 1$
$p(\mu|x) = \frac{L(\mu)\,\pi(\mu)}{\int L(\mu)\,\pi(\mu)\,d\mu}$
• Instead of talking about confidence intervals we talk about credible intervals, where p(µ|x) is the credibility of µ given the data.
Systematics
Why “download” only music?
“Download” original ideas as well…
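Systematics is Important
• An analysis might be killed by systematics
• With a relative systematic uncertainty Δ on the background, b → b ± Δ·b, the background variance becomes
$\sigma_b^2 = (\sqrt{b})^2 \;\to\; (\sqrt{b})^2 + (\Delta b)^2 = b + \Delta^2 b^2$
• The significance is degraded, and at high luminosity (L → ∞) it saturates:
$s/\sqrt{b} \;\to\; s/\sqrt{b\,(1+\Delta^2 b)} \;\xrightarrow{L\to\infty}\; \frac{s}{\Delta\, b}$
• A 5σ discovery then requires
$\frac{s}{\Delta\, b} \geq 5 \;\Rightarrow\; s/b \geq 0.5$ for Δ ~ 10%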
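A quick numerical look at how a background systematic caps the significance. This sketch uses the simple s/√(b + Δ²b²) estimate only; the yields are invented, with s/b held at 0.5 while the luminosity scale grows.

```python
import math

def significance(s, b, delta=0.0):
    """Naive significance s / sqrt(b + (delta*b)^2) with relative BG systematic delta."""
    return s / math.sqrt(b + (delta * b) ** 2)

# Hypothetical yields: s/b fixed at 0.5, luminosity scaled up by 'lumi'.
for lumi in (1, 10, 100, 1000):
    b = 100.0 * lumi
    s = 0.5 * b
    print(lumi, round(significance(s, b), 1), round(significance(s, b, delta=0.10), 2))
```

Without systematics the significance keeps growing like the square root of the luminosity; with Δ = 10% it approaches s/(Δ·b) = 5 and never exceeds it.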
Basic Definitions:
Nuisance Parameters (Systematics)
• There are two kinds of parameters:
  • Parameters of interest (signal strength... cross section... µ)
  • Nuisance parameters (background cross section, b, signal efficiency)
• The nuisance parameters carry systematic uncertainties
• There are two related issues:
  • Classifying and estimating the systematic uncertainties
  • Implementing them in the analysis
• The physicist must distinguish between cross checks and identifying the sources of the systematic uncertainty.
  • Example of a cross check: shifting cuts around and measuring the effect on the observable... Very often the observed variation is dominated by the statistical uncertainty in the measurement.
Basic Definitions:
Implementation of Nuisance Parameters
• Implement by marginalizing or profiling
• Marginalization (integrating) (the C&H Hybrid)
  • Integrate L over possible values of the nuisance parameters (weighted by their prior belief functions: Gaussian, gamma, others...)
  • Consistent Bayesian interpretation of the uncertainty on nuisance parameters
• Note that in that sense MC “statistical” uncertainties (like the background statistical uncertainty) are systematic uncertainties
Integrating Out The Nuisance Parameters
(Marginalization)
$p(\mu,\theta|x) = \frac{L(\mu,\theta)\,\pi(\mu,\theta)}{\int L(\mu,\theta)\,\pi(\mu,\theta)\,d\mu\,d\theta} \propto L(\mu,\theta)\,\pi(\mu,\theta)$ (the denominator is the normalization)
• Our degree of belief in µ is the sum of our degree of belief in µ given θ (the nuisance parameter), over “all” possible values of θ
• That’s a Bayesian way
$p(\mu|x) = \int p(\mu,\theta|x)\,d\theta \propto \int L(\mu,\theta)\,\pi(\mu,\theta)\,d\theta$
Basic Definitions:
Priors
$P(\mu|\mathrm{data}) \sim \int L(\mu,\theta)\,\pi(\theta)\,d\theta$
• A prior probability is interpreted as a description of what we believe about a parameter preceding the current experiment
• Informative priors: when you have some information about θ the prior might be informative (Gaussian or truncated Gaussians...)
  • Most would say that subjective informative priors about the parameters of interest should be avoided (“...what's wrong with assuming that there is a Higgs in the mass range [115,140] with equal probability for each mass point?”)
  • Subjective informative priors about the nuisance parameters are more difficult to argue with
    • These priors can come from our assumed model (Pythia, Herwig, etc.)
    • These priors can come from subsidiary measurements of the response of the detector to the photon energy, for example.
    • Some priors come from subjective assumptions (theoretical, prejudice, symmetries...) of our model
Basic Definitions:
Priors – Uninformative Priors
• Uninformative priors: all priors on the parameter of interest are usually uninformative...
• Therefore flat uninformative priors are most common in HEP.
• When taking a uniform prior for the Higgs mass [115, ∞]... is it really uninformative? Do uninformative priors exist?
• When constructing an uninformative prior you actually put some information in it...
• But a prior flat in the coupling g will not be flat in σ ~ g². It depends on the metric!
• Note, flat priors are improper and might lead to serious problems of undercoverage (when one deals with >1 channel, i.e. beyond counting, one should AVOID them)
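The metric dependence can be made explicit with a change of variables. As a worked example (assuming, for illustration only, that the quantity of interest scales exactly as σ = g² with g > 0):

```latex
\pi_g(g) = \text{const}
\quad\Longrightarrow\quad
\pi_\sigma(\sigma) = \pi_g(g)\,\left|\frac{dg}{d\sigma}\right|
                   = \text{const}\cdot\frac{1}{2\sqrt{\sigma}},
\qquad \sigma = g^2 .
```

A prior that is flat in g therefore favors small values of σ, so "flat" is a statement about the chosen parameterization rather than an absence of information.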
DISCOVERY
Basic Definitions:
Rayleigh Distribution
The Discovery Case Study
• We assume a Gaussian “Higgs” signal (s) on top of a Rayleigh-shaped background (b). NOTE: b = b(θ)
• The signal strength is µ: $\langle n \rangle = \mu \cdot s(m_H) + b$
• µ=1: SM Higgs, $\langle n \rangle = s + b$
• µ=0: SM without Higgs
• Two hypothetical measurements
• Data: $n \sim \mu \cdot s(m_H) + b$
[Figure: toy data versus mH]
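The Discovery Case Study
• BG control sample scaled to the expected BG via a factor τ. NOTE: b = b(θ)
• Signal region: $n \sim \mu \cdot s(m_H) + b$, with $\langle n \rangle = \mu \cdot s(m_H) + b$
• Control sample: $m \sim \tau b$, with $\langle m \rangle = \tau \cdot b$
• The control sample constrains the background
[Figure: signal-region and control-sample distributions versus mH]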
A Simultaneous Fit
Two measurements: $n \sim \mu s(m_H) + b$ and $m \sim \tau b$
$L(\mu \cdot s + b(\theta)) = \prod_{i=1}^{n_{bins}} \mathrm{Poisson}(n_i;\, \mu \cdot s_i + b_i(\theta)) \cdot \mathrm{Poisson}(m_i;\, \tau \cdot b_i(\theta))$
[Figure: simultaneous fit to both distributions versus mH]
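Understanding hat and double-hat
• MLE of L(µ·s + b): $\hat\mu, \hat b$, the values that maximize L over both µ and b:
$\frac{\partial L(\mu s+b)}{\partial \mu} = \frac{\partial L(\mu s+b)}{\partial b} = 0$ at $\mu=\hat\mu,\; b=\hat b$
• MLE of L(b): $\hat{\hat b}$, the value that maximizes L over b with µ fixed to 0:
$\frac{\partial L(\mu s+b)}{\partial b} = 0$ at $\mu=0,\; b=\hat{\hat b}$
[Figure: the two fits versus mH]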
The Discovery Case Study
Note, in this example, a signal towards the ends of the background mass distribution (mH=20,80) is better separated from the background than a signal near the middle (mH=50).
[Figure: signal on top of background for the three mass points versus mH]
The frequentist LR (CLb) method
• Define a test statistic
$\Lambda = \frac{L(H_1)}{L(H_0)} = \frac{L(s+b(\theta))}{L(b(\theta))}$
• Use MC to generate the pdf of Λ under H0 (B only) and H1 (S+B)
• Let Λobs be the result of one experiment (LHC)
[Figure: pdfs of log Λ under H0 and H1, with the observed log Λ marked]
Example (from SUSY02):
Simulating BG Only Experiments
• The likelihood ratio, -2lnΛ(mH), tells us how much the outcome of an experiment is signal-like
$\Lambda(m_H) = \frac{L(H_1)}{L(H_0)} = \frac{L(s(m_H)+b)}{L(b)}$
• NOTE: here the s+b pdf is plotted to the left!
[Figure: discriminator -2lnΛ(mH) for BG-only experiments; s+b-like on the left, b-like on the right]
Example (from SUSY02):
Simulating S(mH)+b Experiments
• The likelihood ratio, -2lnΛ(mH), tells us how much the outcome of an experiment is signal-like
• NOTE: here the s+b pdf is plotted to the left!
[Figure: discriminator -2lnΛ(mH) for s(mH)+b experiments; s+b-like on the left, b-like on the right]
Straighten Things Up
[Figure: pdfs of the likelihood discriminator under H0 (b) and H1 (s+b), shown in both orientations of the b-like / s+b-like axis, with the tail areas 1-CLb and CLs+b marked]
The frequentist LR (CL) method
• Define a test statistic
$\Lambda = \frac{L(H_1)}{L(H_0)} = \frac{L(s+b(\theta))}{L(b(\theta))}$
• Use MC to generate the pdf of Λ under H0 (B only) and H1 (S+B)
• Let Λobs be the result of one experiment (LHC)
• The p-value is the probability to get an observation which is less b-like than the observed one
• If the result of the experiment (LHC) yields a p-value < 2.8·10⁻⁷, a 5σ discovery is claimed
• NOTE: the p-value can be interpreted as a frequency → this is a frequentist approach
[Figure: pdf of log Λ under H0 with the observed value marked]
Profile Likelihood
• Let us consider a counting experiment, measuring n
• Define $\langle n \rangle = \mu \cdot s + b$, with µ=0: background only, and µ=1: signal + background
• Construct a likelihood
$L(\mu \cdot s + b) = \mathrm{Poiss}(n;\, \mu s + b)$
• Test the bg-only (µ=0) hypothesis
$\lambda(\mu=0) = \frac{L(0 \cdot s + b)}{L(\hat\mu \cdot s + b)} = \frac{L(b)}{L(\hat\mu \cdot s + b)}$
Profile Likelihood
• Define the PL ratio
$\lambda(\mu=0) = \frac{L(0 \cdot s + b)}{L(\hat\mu \cdot s + b)}$
• Define the test statistic to be
$q_0 = -2\ln\lambda(0)$
• Note: if the data is b-like, $\hat\mu \sim 0 \Rightarrow \lambda(0) \sim 1 \Rightarrow q_0 \sim 0$
• If the data contains a signal, $\hat\mu \sim 1 \Rightarrow \lambda(0) \sim 0 \Rightarrow q_0 \gg 0$
[Figure: pdf of q0 = -2 log λ compared with the χ²₁ p.d.f.; small q0 is b-like, large q0 is s+b-like]
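For the single counting experiment described above, q0 can be evaluated in closed form, because the MLE is simply µ̂ = (n − b)/s, so the best-fit mean equals n. The sketch below assumes a perfectly known background (no nuisance parameters) and lets µ̂ go negative, as in the two-sided definition used on these slides; the numbers are invented.

```python
import math

def log_poisson(n, mean):
    """ln Poiss(n; mean), with the convention 0*ln(0) = 0 for n = 0."""
    term = n * math.log(mean) if n > 0 else 0.0
    return term - mean - math.lgamma(n + 1)

def q0(n, s, b):
    """q0 = -2 ln [ L(b) / L(mu_hat*s + b) ] for one counting channel, known b."""
    mu_hat = (n - b) / s              # MLE of the signal strength
    best_fit_mean = mu_hat * s + b    # equals n
    return -2.0 * (log_poisson(n, b) - log_poisson(n, best_fit_mean))

n_obs, s_exp, b_exp = 150, 50.0, 100.0     # hypothetical yields
q_obs = q0(n_obs, s_exp, b_exp)
print(f"q0 = {q_obs:.2f}, Z = sqrt(q0) = {math.sqrt(q_obs):.2f}")
```

When n equals b the ratio is 1 and q0 vanishes; an excess drives q0 up, and √q0 is the significance once Wilks' theorem applies.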
The Profile Likelihood Simulator
• The Profile Likelihood Matlab Simulator
© O. Vitells & E. Gross
Systematics
• Normally, the background, b(θ), has an uncertainty which has to be taken into account. In this case θ is called a nuisance parameter (which we associate with background systematics)
• How can we take into account the nuisance parameters?
• One way: marginalize them (integrate them out using priors) → the Hybrid CL (a mix of the frequentist and Bayesian approaches), with π(θ) the prior:
$\Lambda = \frac{L(H_1)}{L(H_0)} = \frac{L(H_1(\theta))}{L(H_0(\theta))} \;\to\; \Lambda_{Hybrid} = \frac{\int L(H_1(\theta))\,\pi(\theta)\,d\theta}{\int L(H_0(\theta))\,\pi(\theta)\,d\theta}$
• Another way is profiling via the MLEs
Systematics via Profiling
• $\hat\mu, \hat\theta$ are the MLEs of µ and θ
• $\hat{\hat\theta}_{s+b}$ is the MLE of θ under H1
• $\hat{\hat\theta}_{b}$ is the MLE of θ under H0
• PL Ratio:
$\lambda(\mu) = \frac{L(\mu s + b)}{L(\hat\mu s + b)} = \frac{L(\mu s + b(\hat{\hat\theta}))}{L(\hat\mu s + b(\hat\theta))}$
• PL CL, the LR (CLs+b):
$\Lambda_{PL} = \frac{L(H_1)}{L(H_0)} = \frac{L(H_1(\theta))}{L(H_0(\theta))} = \frac{L(n,m \,|\, s+b(\hat{\hat\theta}_{s+b}))}{L(n,m \,|\, b(\hat{\hat\theta}_{b}))}$


Wilks Theorem
$\lambda(\mu) = \frac{L(\mu \cdot s + b)}{L(\hat\mu \cdot s + b)}$, $\lambda(\mu=0) = \frac{L(0 \cdot s + b)}{L(\hat\mu \cdot s + b)}$
• Under a set of regularity conditions and for a sufficiently large data sample, Wilks’ theorem says that for a hypothesized value of μ, the pdf of the statistic −2lnλ(μ) approaches the chi-square pdf for one degree of freedom
• One of the conditions is Hµ ⊃ H0 (the hypotheses are nested)
Wilks Theorem
$\lambda(\mu) = \frac{L(\mu \cdot s + b)}{L(\hat\mu \cdot s + b)}$
• $q_\mu = -2\ln\lambda(\mu)$ distributes as a χ² with 1 d.o.f. for experiments under the hypothesis Hµ
• i.e. q0 distributes as χ² for b-only experiments, and q1 distributes as χ² for s+b experiments
• This ensures simplicity, coverage, speed
[Figure: f(qµ|µ), the pdf of -2 log λ compared with the χ²₁ p.d.f., on linear and logarithmic scales]
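The χ² behaviour can be checked with toys for the simple counting experiment. This is a rough sketch under stated assumptions: a known background of b = 100 (an arbitrary choice), no nuisance parameters, and a hand-written Poisson generator (Knuth's method) to stay within the standard library.

```python
import math
import random
from statistics import NormalDist

def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for moderate means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def q0_counting(n, b):
    """q0 = -2 ln [Poiss(n; b) / Poiss(n; n)] for a counting experiment, known b."""
    if n == 0:
        return 2.0 * b
    return 2.0 * (n * math.log(n / b) - (n - b))

rng = random.Random(7)
b_true, n_toys = 100.0, 5000
q_values = [q0_counting(poisson_sample(rng, b_true), b_true) for _ in range(n_toys)]

# 90% quantile of a chi-square with 1 d.o.f. is the square of the 95% normal quantile.
threshold = NormalDist().inv_cdf(0.95) ** 2
tail_fraction = sum(q > threshold for q in q_values) / n_toys
mean_q = sum(q_values) / n_toys
print(f"P(q0 > {threshold:.2f}) = {tail_fraction:.3f} (chi2_1 predicts 0.10), mean = {mean_q:.2f}")
```

The agreement is only approximate because the Poisson counts are discrete; it improves as b grows, which is the "sufficiently large data sample" condition.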
Wilks Theorem
$\lambda(\mu) = \frac{L(\mu \cdot s + b)}{L(\hat\mu \cdot s + b)}$, $q_\mu = -2\ln\lambda(\mu)$
• qµ distributes as a χ² with 1 d.o.f. for experiments under the hypothesis Hµ:
$f(q_\mu|\mu) \sim \chi^2_1(q_\mu)$
$f(q_{obs}|\mu) \sim \chi^2_1(q_{obs})$, with $q_{obs} = -2\ln\lambda_{obs}(\mu) = \left(\frac{\hat\mu_{obs}-\mu}{\sigma}\right)^2 = Z^2$
• Z is the significance:
$Z_{obs} = \sqrt{q_{obs}} = \sqrt{-2\ln\lambda_{obs}(\mu)}$
$Z_{obs}^{discovery} = \sqrt{-2\ln\lambda_{obs}(0)}$
$Z_{obs}^{exclusion} = \sqrt{-2\ln\lambda_{obs}(1)}$
Discovery - Illustrated
$\lambda(\mu=0) = \frac{L(0 \cdot s + b \,|\, \mathrm{data})}{L(\hat\mu \cdot s + b \,|\, \mathrm{data})}$, $q_0 = -2\log\lambda(\mu=0)$
• f(q0|µ=0): the profile LR of bg-only experiments (µ=0) under the hypothesis of BG only (H0)
• f(q0|µ=1): the profile LR of S+B experiments (µ=1) under the hypothesis of BG only (H0)
• The observed profile LR:
$q_{0,obs} = -2\log\frac{L(0 \cdot s + b \,|\, \mathrm{data})}{L(\hat\mu \cdot s + b \,|\, \mathrm{data})}$
• p0 is the level of compatibility between the data and the no-Higgs hypothesis
• If p0 is smaller than ~2.8·10⁻⁷ we claim a 5σ discovery
Median Sensitivity
• To estimate the median sensitivity of an experiment (before looking at the data), one can either perform lots of s+b experiments and estimate the median q0,med, or evaluate q0 with respect to a representative data set, the ASIMOV data set with µ=1, i.e. n=s+b
$Z_{med} = \sqrt{-2\ln\lambda_A(0)}$
$\lambda_A(0) = \frac{L(\mu=0 \,|\, \mathrm{ASIMOV\ data} = s+b)}{L(\hat\mu_A=1 \,|\, \mathrm{ASIMOV\ data} = s+b)}$
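As a worked example, take the single counting experiment with a known background b (an assumption made here to keep the algebra short; no nuisance parameters). Inserting the Asimov value n = s + b into the Poisson likelihood ratio gives

```latex
-2\ln\lambda_A(0)
  = -2\ln\frac{\mathrm{Poiss}(s+b;\,b)}{\mathrm{Poiss}(s+b;\,s+b)}
  = 2\left[(s+b)\ln\!\left(1+\frac{s}{b}\right) - s\right],
\qquad
Z_{med} = \sqrt{2\left[(s+b)\ln\!\left(1+\frac{s}{b}\right) - s\right]}
\;\xrightarrow{\;s \ll b\;}\; \frac{s}{\sqrt{b}} .
```

So the familiar s/√b is recovered as the small-signal limit of the Asimov significance.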
The ASIMOV data sets
• The name of the Asimov data set is inspired by the short story Franchise, by Isaac Asimov [1]. In it, elections are held by selecting a single voter to represent the entire electorate.
• The "Asimov" Representative Data-set for Estimating Median Sensitivities with the Profile Likelihood, G. Cowan, K. Cranmer, E. Gross, O. Vitells, in preparation
[1] Isaac Asimov, Franchise, in Isaac Asimov: The Complete Stories, Vol. 1, Broadway Books, 1990.
Wilks Theorem with Nuisance Parameters
$\lambda(\mu) = \frac{L(\mu,\, b(\hat{\hat\theta}))}{L(\hat\mu,\, b(\hat\theta))}$, $q_\mu = -2\ln\lambda(\mu)$
• If there are n parameters of interest, i.e. those parameters that do not get a double hat in the numerator of the likelihood ratio, then −2lnλ(μ) asymptotically follows a chi-square distribution for n degrees of freedom.
[Figure: f(qµ|µ), the pdf of -2 log λ compared with the χ²₁ p.d.f.]
The Profile Likelihood with Nuisance Parameters
• The procedure is identical to the one without nuisance parameters
$\lambda(\mu) = \frac{L(\mu,\, b(\hat{\hat\theta}))}{L(\hat\mu,\, b(\hat\theta))} = \frac{L(\mu \cdot s + b(\hat{\hat\theta}))}{L(\hat\mu \cdot s + b(\hat\theta))}$
CL & CI - Wikipedia
• A confidence interval (CI) is a particular kind of interval estimate of a population parameter. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. Thus, confidence intervals are used to indicate the reliability of an estimate. How likely the interval is to contain the parameter is determined by the confidence level or confidence coefficient. Increasing the desired confidence level will widen the confidence interval.
The Profiled CL way
• Generate the pdf of -2lnΛPL under H0 and H1, where
$\Lambda_{PL} = \frac{L(H_1)}{L(H_0)} = \frac{L(n,m \,|\, s+b(\hat{\hat\theta}_{s+b}))}{L(n,m \,|\, b(\hat{\hat\theta}_{b}))}$
• Define the CI for q0: [-∞, qobs]
• The CL of this CI is given by
$CL_b = \int_{-\infty}^{q_{obs}} f(q_0|H_0)\,dq_0$
$p_0 = \int_{q_{obs}}^{\infty} f(q_0|H_0)\,dq_0$
• p0 ''='' 1-CLb
ATLAS, CERN – Open 2008-029
Cowan, Cranmer, E.G., Vitells, in preparation
[Figure: pdfs of the likelihood discriminator under H0 (b) and H1 (s+b), with the areas 1-CLb and CLs+b marked]
The Profiled CL way
$\Lambda_{PL} = \frac{L(H_1)}{L(H_0)} = \frac{L(n,m \,|\, s+b(\hat{\hat\theta}_{s+b}))}{L(n,m \,|\, b(\hat{\hat\theta}_{b}))}$, p0 ''='' 1-CLb
• The median significance can be obtained with the one Asimov data set n~s+b, m~b
ATLAS, CERN – Open 2008-029
Cowan, Cranmer, E.G., Vitells, in preparation


The Profiled CL way
$\Lambda_{PL} = \frac{L(H_1)}{L(H_0)} = \frac{L(n,m \,|\, s+b(\hat{\hat\theta}_{s+b}))}{L(n,m \,|\, b(\hat{\hat\theta}_{b}))}$
In this example a Higgs with a mass mH<32 or mH>52 is expected to be discovered, i.e. if the Higgs exists in this mass range it will be discovered in >50% of hypothetical LHC experiments


The Profile Likelihood Ratio
• CL Profiled LR:
$\Lambda_{PL} = \frac{L(H_1)}{L(H_0)} = \frac{L(n,m \,|\, s+b(\hat{\hat\theta}_{s+b}))}{L(n,m \,|\, b(\hat{\hat\theta}_{b}))}$
PROFILE LIKELIHOOD RATIO IS DIFFERENT!
• PL Ratio:
$\lambda(\mu) = \frac{L(\mu \cdot s + b(\hat{\hat\theta}(\mu)))}{L(\hat\mu \cdot s + b(\hat\theta))}$
• Test the null H0 hypothesis:
$\lambda(0) = \frac{L(b(\hat{\hat\theta}(0)))}{L(\hat\mu \cdot s + b(\hat\theta))}$
• Use the Wilks theorem and the Asimov data set to deduce p0,med

The frequentist
Profile Likelihood Ratio vs Profiled CL
• It's not surprising that using a modified LR (not the Neyman-Pearson motivated LR) gives a slightly less sensitive result.
• Yet why use a method with a slightly lower sensitivity?
• Because of the Wilks theorem, which can save us hours and days of computer time
[Figure: expected sensitivity, Profiled CL compared with PL Ratio]
Expected Discovery Sensitivity
Combination of Channels
• Each channel has its PLR with respect to the Asimov data set (x=s+b)
• For each channel
$L_i(x_A = s+b;\, \hat\mu, \hat\theta) = L_{A,i}(\hat\mu_A, \hat\theta_A) = L_{A,i}(1, \hat\theta_A)$
• We find
$\lambda_A(0 \,|\, x = s+b) = \prod_i \frac{L_{A,i}(0, \hat{\hat\theta}_0)}{L_{A,i}(1, \hat\theta_A)}$
• That is, λA(0|x=s+b) approximates the median value of λ(0) one would obtain from data generated according to the s+b hypothesis.
• The value of λA(0|x=s+b) is used to determine the median q0,med, which is used to find the median p-value, pmed:
$q_{0,med} = -2\ln\lambda_A(0)$; $Z = \sqrt{q_{0,med}}$
The Look Elsewhere Effect
Look Elsewhere Effect
• To establish a discovery we try to reject the background-only hypothesis H0 against the alternate hypothesis H1
• H1 could be
  • A Higgs Boson with a specified mass mH
  • A Higgs Boson at some mass mH in the search mass range
• The look elsewhere effect deals with the floating mass case
• Let the Higgs mass, mH, and the signal strength µ be 2 parameters of interest:
$\lambda(\mu, m_H) = \frac{L(\mu, m_H, \hat{\hat b})}{L(\hat\mu, \hat m_H, \hat b)}$
Look Elsewhere Effect
2 parameters of interest: the signal strength µ and the Higgs mass mH
$\lambda(\mu, m_H \,|\, n = s(m_H{=}50) + b,\; m = \tau b) = \frac{L(\mu, m_H, \hat{\hat b})}{L(\hat\mu, \hat m_H, \hat b)}$
[Figure: -2 log λ(0) versus mH]
Look Elsewhere Effect
• Letting the Higgs mass float, we find that the background-only experiments distribute approximately as a $\chi^2_2$
• The median sensitivity is given by the corresponding p-value
• $\lambda(0, m_H) = \lambda(0 \cdot s(m'_H) + b)$
• Does it satisfy Wilks' theorem (H0,mH versus H1,mH)?
[Figure: distribution of -2 log λ(0)]
Look Elsewhere Effect
$\mathrm{trial\ factor} = \frac{p_{float}}{p_{fix}}$
Back of the envelope:
$\mathrm{trial\ factor} \approx \frac{\mathrm{mass\ range}}{\mathrm{mass\ resolution}}$
[Figure: trial factor illustration versus mH]
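The back-of-the-envelope estimate is easy to play with numerically. In this sketch the search range, the resolution and the local p-value are invented illustration values, and the helper names are hypothetical; the only relations used are the two given above.

```python
from statistics import NormalDist

def trial_factor(mass_range, mass_resolution):
    """Back-of-the-envelope trial factor: search range over mass resolution."""
    return mass_range / mass_resolution

def floating_mass_p(p_fix, factor):
    """p_float = trial factor * p_fix, capped at 1 since it is a probability."""
    return min(1.0, factor * p_fix)

factor = trial_factor(mass_range=80.0, mass_resolution=2.0)   # hypothetical values
p_fix = 1.0e-3                                               # hypothetical local p-value
p_float = floating_mass_p(p_fix, factor)
z_fix = -NormalDist().inv_cdf(p_fix)
z_float = -NormalDist().inv_cdf(p_float)
print(f"trial factor {factor:.0f}: Z drops from {z_fix:.2f} to {z_float:.2f}")
```

The point of the exercise is that a locally impressive fluctuation becomes much less significant once the whole search range is accounted for.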
Discovery via the Bayes Way:
Bayes Factors
The Bayes Way
Frequentist ~ Bayesian ?
• We found
$B_{10}(\mu) = \frac{\lambda(\mu)}{\lambda(0)}$
• With some work (E.G., O. Vitells, PoS Krakow 09) we show that for the Asimov data set (s+b)
• Finally we find
Frequentist~Bayesian
Why Does It Work
• This relationship between the Bayes factor and the frequentist PL ratio, though disturbing at first, is not surprising when you come to think about it.
• Wilks' theorem ensures that using the PL ratio you do not need to perform any toy MC experiments to tell the significance of an observation based on the one observed data set.
• This is also the characteristic of a Bayesian hypothesis test.
EXCLUSION
Exclusion Case Study
[Figure: three panels, M=20, M=50 and M=80, each versus mH]
Exclusion and p-value
• To exclude the s(mH)+b hypothesis (H1) we try to reject it
• i.e. we calculate the observed or expected p-value of the test statistic under the s(mH)+b pdf
• If the p-value ps+b<5% we claim the s+b hypothesis was rejected at >95% CL
• Here we use the arguable relationship CL=1-ps+b
The frequentist CLs+b method
• Generate the pdf of -2lnΛ under H0 and H1, where
$\Lambda = \frac{L(H_1)}{L(H_0)} = \frac{L(n,m \,|\, s+b(\theta))}{L(n,m \,|\, b(\theta))}$
• Test the H1 hypothesis
• Define the CI for q1: [-∞, qobs]
• The CL of this CI is given by
$CL_{s+b} = \int_{-\infty}^{q_{obs}} f(q_1|H_1)\,dq_1$
$p_1 = p_{s+b} = \int_{-\infty}^{q_{obs}} f(q_1|H_1)\,dq_1$
• p1 ''='' CLs+b
[Figure: pdfs of the likelihood discriminator under H0 (b) and H1 (s+b), with the areas 1-CLb and CLs+b marked]
The frequentist CLs+b method
• Use the LR as a test statistic
$\Lambda = \frac{L(H_1)}{L(H_0)} = \frac{L(n,m \,|\, s+b(\theta))}{L(n,m \,|\, b(\theta))}$
• To take systematics into account, integrate the nuisance parameters or profile them
• The exclusion is given by the s(mH)+b hypothesis p-value, ps+b=CLs+b
• If ps+b<5%, the s(mH)+b hypothesis is rejected at the 95% CL
Exclusion – Illustrated
$\lambda(\mu=1) = \frac{L(s + \hat{\hat b} \,|\, \mathrm{data})}{L(\hat\mu \cdot s + \hat b \,|\, \mathrm{data})}$, $q_1 = -2\log\lambda(\mu=1)$
• f(q1|µ=1): the profile LR of s+b experiments (µ=1) under the hypothesis of s+b (H1)
• f(q1|µ=0): the profile LR of b-only experiments (µ=0) under the hypothesis of s+b (H1)
• The observed profile LR:
$q_{1,obs} = -2\log\frac{L(s + \hat{\hat b} \,|\, \mathrm{data})}{L(\hat\mu \cdot s + \hat b \,|\, \mathrm{data})}$
$p_1 = \int_{q_{1,obs}}^{\infty} f(q_1|1)\,dq_1$
• p1 is the level of compatibility between the data and the Higgs hypothesis
• If p1 is smaller than 0.05 we claim an exclusion at the 95% CL
Exclusion with Profile Likelihood
• Exclusion is related to the probability of the “would be” signal to fluctuate down to the background-only region (i.e. the p-value of the s+b hypothesis)
• To evaluate the median sensitivity of an experiment we generate BG-only data and calculate the median q1,med → p1,med → Zmed, with
$Z = \sqrt{-2\ln\lambda(1)}$
• Exclusion at the 95% C.L. (one sided) means Z=1.64
Deriving an Upper Limit
• µ is the signal strength
• When measured it can be interpreted as $\mu = \sigma/\sigma_{SM}$
• To get a “95% CL” upper limit on µ one has to solve
$Z = 1.64 = \sqrt{-2\ln\lambda(\mu_{95})}$
• This will give the 95% credible (or confidence) interval [0,µ95]
• If this interval contains the value µ=1, the SM Higgs is NOT excluded
• This µ95 is therefore interpreted as a 95% upper limit on µ, i.e. µupper.
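Solving for µ95 is a one-dimensional root-finding problem. The sketch below does it by bisection for a single counting channel with a known background and no nuisance parameters (both simplifying assumptions), using background-only Asimov data n = b; the yields and the search bracket are invented for illustration.

```python
import math

def q_mu(mu, n, s, b):
    """-2 ln lambda(mu) for one counting channel with known b (mu_hat = (n-b)/s)."""
    expected = mu * s + b
    log_ratio = (n * math.log(expected / n) if n > 0 else 0.0) - (expected - n)
    return -2.0 * log_ratio

def upper_limit(n, s, b, z=1.64, mu_high=100.0):
    """Bisection for the mu above mu_hat at which sqrt(q_mu) reaches z."""
    lo = max((n - b) / s, 0.0)    # q_mu vanishes at mu_hat and grows above it
    hi = mu_high                  # assumed to bracket the solution
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.sqrt(max(q_mu(mid, n, s, b), 0.0)) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_exp, b_exp = 10.0, 100.0                  # hypothetical yields
mu95 = upper_limit(n=b_exp, s=s_exp, b=b_exp)
print(f"mu95 = {mu95:.3f}, SM Higgs excluded: {mu95 < 1.0}")
```

With these numbers µ95 comes out above 1, so the interval [0,µ95] contains µ=1 and the SM signal would not be expected to be excluded.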
Exclusion Expected Limit,
Combination of Channels
• Each channel has its PLR with respect to the Asimov data set (background only, x=b)
• For each channel
$L_i(x_A = b;\, \hat\mu, \hat\theta) = L_{A,i}(\hat\mu_A, \hat\theta_A) = L_{A,i}(0, \hat\theta_A)$
• We find
$\lambda_A(\mu \,|\, x = b) = \prod_i \frac{L_{A,i}(\mu, \hat{\hat\theta}_\mu)}{L_{A,i}(0, \hat\theta_A)}$
• That is, λA(µ|x=b) approximates the median value of λ(µ) one would obtain from data generated according to the background-only hypothesis.
• The value of λA(µ|x=b) is used to determine the median qµ,med, which is used to find the median p-value, pµ,med.
• This has to be computed for all µ, and the point where pµ,med = 0.05 gives the median 95% CL upper limit on µ.
Profile Likelihood Ratio
Test the S(mH)+b hypothesis, i.e. test the µ=1 hypothesis
$\lambda(\mu=1) = \frac{L(s + b(\hat{\hat\theta}(\mu=1)))}{L(\hat\mu \cdot s + b(\hat\theta))}$, $q_1 = -2\ln\lambda(\mu=1)$
• q1 distributes as a χ² under s(mH)+b experiments (H1)
• The exclusion significance
$Z = \sqrt{q_1} = \sqrt{-2\ln\lambda(1)}$
can be expressed in terms of an equivalent exclusion CL:
$p_1 = p_{s+b} = 1 - CL$
• The exclusion sensitivity is the median CL, and using toy MCs one can find the 1σ and 2σ bands
If ps+b<5% we (using a wrong jargon) say that the signal is excluded at >95% CL (CL=1-ps+b)
Note that the exclusion CL is identified here with CL=1-CLs+b, where CLs+b=ps+b
Exclusion Profile Likelihood Ratio
• A Higgs with a specific mass mH is excluded at the 95% CL if the observed p-value of the s(mH)+b hypothesis is below 0.05
• If ps+b<5%, the s(mH)+b hypothesis is rejected at the 95% CL:
$p_1 = p_{s+b} = 1 - CL$
• In this example a Higgs Boson is expected to be excluded, p1<0.05 (CL>95%), in all the mass range
Exclusion Bayesian
Let prob(µ|n,m) be the posterior for µ:
$\mathrm{prob}(\mu \,|\, n,m) = \frac{\int L(\mu \cdot s + b(\theta))\,\pi(\mu)\,\pi(\theta)\,d\theta}{\int\!\!\int L(\mu \cdot s + b(\theta))\,\pi(\mu)\,\pi(\theta)\,d\theta\,d\mu}$
NOTE: The pdf of the posterior is based on the one observed data event, with the likelihood integrated over the nuisance parameters
To set an upper limit on the signal strength $\mu = \sigma/\sigma_{SM}$, calculate the credibility interval [0,µ95]:
$0.95 = \int_0^{\mu_{95}} \mathrm{Prob}(\mu \,|\, n,m)\,d\mu$
[Figure: posterior for µ; data = Asimov b]
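The posterior integration can be sketched numerically for the simplest case. The assumptions here are strong and worth stating: a single counting channel, a flat prior on µ ≥ 0, a known background (so there is no nuisance-parameter integral), and a plain midpoint sum on a grid up to an arbitrary cutoff mu_max.

```python
import math

def bayesian_upper_limit(n, s, b, cred=0.95, mu_max=50.0, steps=20000):
    """Upper edge of the credible interval [0, mu_up] for a flat prior on mu >= 0."""
    dmu = mu_max / steps
    grid = [(i + 0.5) * dmu for i in range(steps)]          # midpoints in mu
    # Unnormalized posterior = Poisson likelihood times a flat prior.
    weights = [
        math.exp(n * math.log(mu * s + b) - (mu * s + b) - math.lgamma(n + 1))
        for mu in grid
    ]
    total = sum(weights)
    running = 0.0
    for mu, w in zip(grid, weights):
        running += w
        if running >= cred * total:
            return mu + 0.5 * dmu     # upper edge of the bin that crosses 'cred'
    return mu_max

# Classic check: zero observed events, no background, s = 1 gives -ln(0.05).
limit_zero = bayesian_upper_limit(n=0, s=1.0, b=0.0)
print(f"mu95 for n=0, b=0: {limit_zero:.3f}")
```

As the slide notes, only the observed data enter this calculation; toys are needed only to find the median expected limit.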
Exclusion Bayesian
Let prob(µ|n,m) be the posterior for µ
NOTE: The toy MC are needed just to find the median sensitivity, but once the data is delivered, it is sufficient to determine the upper limit using the posterior integration
$0.95 = \int_0^{\mu_{95}} \mathrm{Prob}(\mu \,|\, n,m)\,d\mu$
[Figure: posterior for µ; data = b-only]
Exclusion Bayesian
• We find that the credibility interval [0,µ95] does not contain µ=1 (SM) for mH<28 or mH>61
• This is sometimes wrongly expressed as an exclusion at the 95% CL
[Figure: µ95 versus mH]
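Comparing Bayesian to Frequentist PL
$\mathrm{prob}(\mu \,|\, n,m) = \frac{\int L(n,m \,|\, \mu \cdot s + b(\theta))\,\pi(\mu)\,\pi(\theta)\,d\theta}{\int\!\!\int L(n,m \,|\, \mu \cdot s + b(\theta))\,\pi(\mu)\,\pi(\theta)\,d\theta\,d\mu}$
In the saddle-point approximation (for flat priors):
$\mathrm{prob}(\mu \,|\, n,m) \approx \frac{e^{\log L(\mu s + \hat{\hat b})}}{e^{\log L(\hat\mu s + \hat b)}} = e^{\log\frac{L(\mu s + \hat{\hat b})}{L(\hat\mu s + \hat b)}} = \lambda(\mu)$
For the Asimov BG-only data, $\hat\mu = 0$:
$-2\log\lambda(\mu) \sim \frac{\mu^2}{\sigma^2}$, so $\mathrm{prob}(\mu \,|\, x = b) \sim \lambda(\mu) \sim e^{-\mu^2/2\sigma^2}$
Note: Taking the proper normalizations into account we find the following equivalence:
$0.95 = \int_0^{\mu_{95}} \mathrm{Prob}(\mu \,|\, n,m)\,d\mu \;\Leftrightarrow\; \sqrt{-2\log\lambda(\mu_{95})} = 1.96 \;\Leftrightarrow\;$ 97.5% exclusion CL
$0.90 = \int_0^{\mu_{90}} \mathrm{Prob}(\mu \,|\, n,m)\,d\mu \;\Leftrightarrow\; \sqrt{-2\log\lambda(\mu_{90})} = 1.64 \;\Leftrightarrow\;$ 95.0% exclusion CL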
Exclusion Bayesian vs PL Ratio
• Comparing a Bayesian credibility interval to a 95% frequentist CL is like comparing oranges to apples…. Yet
• In Bayesian statistics, the observed data is sufficient to infer how strong an hypothesis is (assuming some priors).
• In the Profile Likelihood frequentist approach, Wilks’ theorem ensures that under a set of regularity conditions and for a sufficiently large data sample, for an hypothesized value of µ, the pdf of -2 log λ(µ) approaches a χ² distribution with one degree of freedom.
• NOTE: One has to be careful about the 1-sided vs 2-sided significance
The problem with the CLs+b method
• Use the LR as a test statistic:
$$Q = \frac{L(H_1)}{L(H_0)} = \frac{L(n, m \mid s + b(\theta))}{L(n, m \mid b(\theta))}$$
• To take systematics into account, integrate the nuisance parameters or profile them
• The exclusion is given by the s(mH)+b hypothesis p-value, ps+b = CLs+b
• If ps+b<5%, the s(mH)+b hypothesis is rejected at the 95% CL
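As a minimal sketch of the exclusion rule, consider a single counting experiment with known s and b and no nuisance parameters (an assumption made only for this illustration). The likelihood ratio then grows monotonically with the observed count, so CLs+b reduces to the Poisson probability of observing n_obs events or fewer under s+b. The helper names are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cl_sb(n_obs, s, b):
    """p-value of the s+b hypothesis for a counting experiment."""
    return poisson_cdf(n_obs, s + b)

def excluded_at_95(n_obs, s, b):
    """True if the s+b hypothesis is rejected at the 95% CL."""
    return cl_sb(n_obs, s, b) < 0.05

print(cl_sb(0, 3.0, 0.0), excluded_at_95(0, 3.0, 0.0))
print(cl_sb(0, 2.9, 0.0), excluded_at_95(0, 2.9, 0.0))
```

With no background and no observed events, CLs+b = exp(-s), so a signal of 3 expected events is just excluded while 2.9 is not.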
A wrong jargon –
mixing p-values and Confidence Levels
Figure: PDF of the likelihood-ratio test statistic under H0 (b) and H1 (s+b), with the tail areas 1-CLb and CLs+b indicated. Horizontal axis: Likelihood, labelled b-like and s+b like at its two ends.
CLs+b and CLb
• 1-CLb is the p-value of the b-hypothesis, i.e. the probability to get a result less compatible with the BG only hypothesis than the observed one (in experiments where the BG only hypothesis is true)
• CLs+b is the p-value of the s+b hypothesis, i.e. the probability to get a result which is less compatible with a Higgs signal than the observed one when the signal hypothesis is true!
• A small CLs+b leads to an exclusion of the signal hypothesis at the CL = 1-CLs+b confidence level.
Figure: PDF of the test statistic under H0 (b) and H1 (s+b), with the observed likelihood marked and the tail areas 1-CLb and CLs+b indicated. Horizontal axis: Likelihood, labelled b-like and s+b like at its two ends.
The Problem of Small Signal
• <Nobs>=s+b leads to the physical requirement that Nobs>b
• A very small expected s might lead to an anomaly when Nobs fluctuates far below the expected background, b.
• At one point DELPHI alone had CLs+b=0.03 for mH=116 GeV. However, the cross section for a 116 GeV Higgs at LEP was too small and DELPHI actually had no sensitivity to observe it
• The frequentist would say: Suppose there is a 116 GeV Higgs….
• Only 3% of the confidence intervals contain the true value of qSM(mH), i.e. in 3% of the experiments the true signal would be rejected… (one would obtain a result as incompatible or more so with mH=116 GeV), i.e. a 116 GeV Higgs is excluded at the 97% CL…..
• 97% of the intervals [qobs, ∞] do not contain qSM(mH)
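The anomaly can be reproduced with a toy counting experiment. The numbers below (b = 10 expected background events, 4 observed) are hypothetical and chosen only to illustrate a downward background fluctuation; they are not the DELPHI values. As before, CLs+b is taken as the Poisson probability of observing n_obs events or fewer under s+b.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

b = 10.0     # hypothetical expected background
n_obs = 4    # hypothetical observation, far below b

for s in (0.0, 0.1, 1.0):
    cl_sb = poisson_cdf(n_obs, s + b)
    print(f"s = {s:4.1f}  CLs+b = {cl_sb:.4f}  excluded at 95% CL: {cl_sb < 0.05}")
```

In this toy, CLs+b is below 0.05 for every s, including s = 0 (about 0.029), so the exclusion is driven entirely by the background fluctuation and says nothing about sensitivity to the signal.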
Figure: PDF of the test statistic under H0 (b) and H1 (s+b), with the observed likelihood marked and the tail areas 1-CLb and CLs+b indicated. Horizontal axis: Likelihood.
The CLs Method for Upper Limits
• Inspired by Zech's (Roe and Woodroofe's) derivation for counting experiments,
$$P(n_{s+b} \le n_o \mid n_b \le n_o) = \frac{P(n_{s+b} \le n_o)}{P(n_b \le n_o)}$$
A. Read suggested the CLs method with
$$CL_s \equiv \frac{CL_{s+b}}{CL_b} = \frac{p_{s+b}}{1 - p_b}$$
• In the DELPHI example, CLs = 0.03/0.13 ≈ 0.23, i.e. a 116 GeV Higgs could not be excluded at the 97% CL anymore….. (pb = 1-CLb = 0.87)
Figure: PDF of the test statistic under H0 (b) and H1 (s+b), with the observed likelihood marked and the tail areas 1-CLb and CLs+b indicated. Horizontal axis: Likelihood.
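The ratio is easy to evaluate once the two p-values are known. The sketch below assumes a single counting experiment with known s and b for `cls_counting`, and takes the p-values as given for `cls_from_pvalues`; both function names are invented for this example.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cls_counting(n_obs, s, b):
    """CLs = CLs+b / CLb for a counting experiment."""
    cl_sb = poisson_cdf(n_obs, s + b)
    cl_b = poisson_cdf(n_obs, b)
    return cl_sb / cl_b

def cls_from_pvalues(p_sb, p_b):
    """CLs = p_{s+b} / (1 - p_b)."""
    return p_sb / (1.0 - p_b)

# Rounded inputs quoted for the DELPHI example: CLs+b = 0.03, pb = 0.87.
print(f"CLs = {cls_from_pvalues(0.03, 0.87):.2f}")

# Hypothetical toy: tiny signal, background fluctuating down from 10 to 4.
print(f"CLs = {cls_counting(4, 0.1, 10.0):.2f}")
```

With the rounded inputs quoted for DELPHI the ratio evaluates to about 0.23, well above 0.05, so the signal is not excluded. In the toy case CLs stays close to 1, reflecting the lack of sensitivity, and for zero observed events CLs equals exp(-s) independently of b, which is Zech's result.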
The Meaning of CLs
Figure: false exclusion rate of signal when signal is true, as a function of mH (80 to 130 GeV), with the nominal 5% level and mH = 115 GeV marked.
• Is it really that bad that a method overcovers where physics is sort of handicapped… (due to loss of sensitivity)?
The modified frequentist CLs
• In this example, while using PL or CLs+b the Higgs is excluded in all the mass range, CLs reduces the sensitivity and does not allow excluding a Higgs with 30<mH<60
Conclusions
• We have explored and compared all the methods to test hypotheses that are currently in use in the High Energy Physics market (PLR, CLs+b, CLs, Bayesian)
• We have shown that all methods tend to give similar results (for both exclusion and discovery, using flat priors), whether one integrates the nuisance parameters or profiles them
• Even though we have used typical case studies, real life might be different and all available methods should be explored