
Statistics Forum Follow-up
info for Physics Coordination
27 June, 2011
Glen Cowan, Eilam Gross
and the Third Man Kyle Cranmer
E. Gross, G. Cowan, K. Cranmer
Follow-up from the Statistics Forum / CERN, 24 June 2011
1
Main questions
What do we see as the main way forward with CMS?
What do we recommend in the short term (summer 2011)?
What do we recommend after summer 2011?
Interactions with the CMS Statistics Group
Interaction between the ATLAS and CMS statistics groups began
several years ago in the context of the Higgs combination;
this effort continues successfully in the separate HCG,
with CLs as the agreed method, using the ATLAS one-sided test
statistic and the ATLAS treatment of nuisance parameters.
In addition, the meetings between the ATLAS and CMS Statistics
Groups have increased this year with the goal of agreeing on
statistical tools and practice to facilitate comparison and eventual
combination of results.
ATLAS: G. Cowan, E. Gross, K. Cranmer, O. Vitells, W. Murray
CMS: R. Cousins, L. Lyons, L. Demortier, T. Dorigo
The way forward with CMS
We met again with CMS in the evening of 23 June 2011
(ATLAS: Cowan, Gross, Murray, Read, Cranmer;
CMS: Cousins, Lyons, Dorigo, Demortier)
Cousins more or less ruled out supporting either CLs or PCL as
a long-term recommendation for CMS. We tried to clarify whether
this was his view alone or that of CMS. He believes that his own view,
which is to use Feldman-Cousins unified (two-sided) intervals,
would be followed in CMS.
We replied that the prevailing view in ATLAS has been to
quote a one-sided upper limit, and it was difficult to envisage
adopting F-C in place of this. So at present there is no single
frequentist method that would have long-term support from both
ATLAS and CMS.
ATLAS/CMS discussions on one-sided limits
Some prefer to report one-sided frequentist upper limits (CLs,
PCL); others prefer unified (Feldman-Cousins) limits, where
the lower edge may or may not exclude zero.
The prevailing view in the ATLAS Statistics Forum has been that
in searches for new phenomena, one wants to know whether a cross
section is excluded on the basis that its predicted rate is too high
relative to the observation, not excluded on some other grounds
(e.g., a mixture of too high or too low).
Among statisticians there is support for both approaches.
ATLAS/CMS discussions on one-sided limits
Using F-C is almost sure to produce intervals which exclude
μ = 0 at the 95% CL; these might require an apologetic
explanation to the reader.
Using two-sided intervals to produce limits would enable an exclusion
of the Higgs boson when the data fluctuate upward with respect to
the expected signal.
We prefer to stick to the traditional one-sided approach, where we
state clearly that we are interested in a limit and produce a
confidence interval which always includes zero, i.e., [0, μ_up].
Discussions concerning flip-flopping
One-sided limits (CLs, PCL) can suffer from “flip-flopping”, i.e.,
violation of coverage probability if one decides, based on the data,
whether to report an upper limit or a measurement with error bars
(two-sided interval).
This can be avoided by “always” reporting:
(1) An upper limit based on a one-sided test.
(2) The discovery significance (equivalent to p-value
of background-only hypothesis with the q0 test statistic).
In practice, “always” can mean “for every analysis carried out
as a search”, i.e., until the existence of the process is well
established (e.g., 5σ).
That is, we only require that what is done in practice map
approximately onto the idealized infinite ensemble.
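For the simple case of a Gaussian measurement x ~ Gauss(μ, σ), the "always report both" prescription can be sketched as follows (a minimal illustration only; the function name and the default σ = 1 are ours, not from the talk):

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal

def search_report(x, sigma=1.0, alpha=0.05):
    """For a measurement x ~ Gauss(mu, sigma), ALWAYS report both
    quantities, however the data came out (no flip-flopping)."""
    # (1) Upper limit from a one-sided test at 95% CL
    mu_up = x + sigma * norm.inv_cdf(1.0 - alpha)
    # (2) Discovery significance: p-value of the background-only
    #     hypothesis via the q0 statistic (Z = sqrt(q0) = x/sigma, x > 0)
    p0 = 1.0 - norm.cdf(max(x, 0.0) / sigma)
    return mu_up, p0
```

An analysis then quotes both numbers until the process is established at, e.g., 5σ.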
Discussions on CLs and F-C
CLs has been criticized as a method for preventing spurious
exclusion because it leads to significant overcoverage that is in
practice not communicated to the reader.
This was the motivation behind PCL.
We have also not supported using the upper edge of a Feldman-Cousins
interval as a substitute for a one-sided upper limit, since
when used in this way F-C has lower power.
Furthermore F-C unified intervals protect against small (or null)
intervals by counting the probability of upward data fluctuations,
which are not relevant if the goal is to establish an upper limit.
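The overcoverage of CLs relative to the plain one-sided (CLs+b) limit can be made concrete in the Gaussian example, where both limits have closed forms (a sketch under our own naming conventions):

```python
from statistics import NormalDist

norm = NormalDist()

def limit_clsb(x, sigma=1.0, alpha=0.05):
    """Plain one-sided 95% CL upper limit: solve p_{s+b} = alpha."""
    return x + sigma * norm.inv_cdf(1.0 - alpha)

def limit_cls(x, sigma=1.0, alpha=0.05):
    """CLs upper limit: solve p_{s+b} / CL_b = alpha, where
    CL_b = P(X <= x_obs | background only).  This has a closed
    form for the Gaussian model x ~ Gauss(mu, sigma)."""
    clb = norm.cdf(x / sigma)
    return x - sigma * norm.inv_cdf(alpha * clb)

# The CLs limit is always weaker (higher), i.e. CLs overcovers:
for xobs in (-2.0, -1.0, 0.0, 1.0, 2.0):
    assert limit_cls(xobs) > limit_clsb(xobs)
```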
The way forward with CMS (2)
In the short term, there is support for CLs in both collaborations
as an interim solution to allow for comparison of limits.
Bayesian methods emerged as a solution with support from
both sides. They have always been viewed as a useful complement
to the frequentist limit. Furthermore,
one can study and report the frequentist properties of Bayesian
intervals (i.e., the fraction of times they would cover the
true parameter value), and in many examples this turns out to
be very good.
Both sides agreed to consider Bayesian methods with priors
chosen to have good frequentist properties as a common method.
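As an illustration of checking the frequentist properties of a Bayesian interval, a toy Monte Carlo for a Gaussian measurement with a flat prior on μ ≥ 0 might look like this (a sketch under our own conventions, not the ATLAS implementation):

```python
import random
from statistics import NormalDist

norm = NormalDist()

def bayes_upper(x, sigma=1.0, alpha=0.05):
    """95% credible upper limit on mu, flat prior for mu >= 0;
    the posterior is Gauss(x, sigma) truncated at mu = 0."""
    return x + sigma * norm.inv_cdf(1.0 - alpha * norm.cdf(x / sigma))

def coverage(mu_true, n_toys=20000, sigma=1.0, seed=1):
    """Fraction of toy experiments whose interval [0, mu_up]
    contains mu_true: the frequentist coverage probability."""
    rng = random.Random(seed)
    hits = sum(mu_true <= bayes_upper(rng.gauss(mu_true, sigma), sigma)
               for _ in range(n_toys))
    return hits / n_toys

# Coverage is at least 95% everywhere, with overcoverage near mu = 0:
print(coverage(0.0), coverage(3.0))
```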
The way forward with CMS (3)
At a more detailed level it will take some more time to agree
on and implement the procedures. So in the short term this is
not a realistic solution for analyses where Bayesian methods have
not already been developed.
We have already started to discuss the Bayesian implementation
in ATLAS. A twiki page has been created:
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/BayesianLimitRecommendationImplementation
Recommendation on minimum power
for PCL from 16% to 50%
For summer 2011 (and beyond), we recommend quoting PCL
limits with the minimum power of 50%. The reasons for
moving the minimum power to 50% are both theoretical and
practical:
50% avoids the possibility of having a conservative treatment
of systematics lead to a stronger limit.
Some computational issues related to low-count analyses are
less problematic with 50%.
There is a slight reduction in the burden on the analyst, since
the 50% quantile (median) needed for the power constraint is
easier to find than the 16% quantile (-1 sigma error band).
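For the Gaussian example, the power constraint amounts to not letting the observed limit fall below the limit evaluated at the background-only quantile set by the minimum power (a sketch; the function name is ours):

```python
from statistics import NormalDist

norm = NormalDist()

def pcl_upper(x, sigma=1.0, alpha=0.05, min_power=0.5):
    """Power-constrained limit for x ~ Gauss(mu, sigma): the observed
    limit may not fall below the limit evaluated at the quantile of
    the background-only distribution set by the minimum power."""
    k = norm.inv_cdf(1.0 - alpha)
    mu_up = x + sigma * k                   # unconstrained limit
    x_q = sigma * norm.inv_cdf(min_power)   # 50% power: median, x_q = 0
    return max(mu_up, x_q + sigma * k)

# With 50% minimum power the floor is 1.64*sigma (the median expected
# limit); with the 16% choice it sat near 0.65*sigma (-1 sigma band).
```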
Recommendation on minimum power
for PCL from 16% to 50% (2)
A 50% minimum power gives a slight reduction in the
“psychological burden” on conference speakers, in that the
fraction of times one sees a sizable difference between PCL and
CLs would be smaller, and then only in cases where a strong
downward fluctuation leads to a stronger CLs limit (see graph
on next page, and recall that under the background-only model,
μ̂ lies between −1σ and +1σ 68% of the time).
Owing to the short notice before EPS, it may be desirable to
leave the minimum power at 16% for the short term. This
should depend on whether groups feel they need more time to
shift from 16% to 50%. In practice this step should not take any
more time, and in some cases will save time.
[Figure: upper limits for the Gaussian problem, plotted vs. the measurement (horizontal axis) and the true value (vertical axis).]
Changing the power constraint to 50% looks natural and puts PCL
psychologically on the same footing as CLs.
The median expected PCL limit is preferable because it covers at exactly
95%, while CLs has an intrinsic overcoverage (its median expected limit
corresponds to the 97.5% CL).
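The 97.5% statement can be verified directly for the Gaussian case (our own numerical check, not from the talk):

```python
from statistics import NormalDist

norm = NormalDist()
sigma, alpha = 1.0, 0.05

# Median expected limits under the background-only model (x = 0):
pcl_median = sigma * norm.inv_cdf(1.0 - alpha)   # 1.645*sigma, exact 95% CL
# CLs at x = 0 solves Phi(-mu/sigma) = alpha * Phi(0) = alpha/2,
# which is precisely a one-sided limit at the 97.5% CL:
cls_median = -sigma * norm.inv_cdf(alpha * 0.5)  # 1.960*sigma

assert abs(cls_median - sigma * norm.inv_cdf(0.975)) < 1e-9
```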
Recommendations
PCL solves the problem of “spurious exclusion” by separating the
parameter space into regions in which one does or does not have
sufficient sensitivity, as given by the probability to reject μ if
the background-only model is true.
Recommendations for ATLAS:
Report unconstrained limit.
Report power constrained limit (with power M0(μ) ≥ 0.5).
Report p-value of background-only hypothesis.
Also report CLs (new).
In problems with low background, there has been a recent improvement
to the software implementation related to the treatment of nuisance
parameters.
ATLAS also has ongoing effort to establish recommendations for
Bayesian limits (Georgios Choudalakis, Diego Casadei).
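The four recommended quantities can be collected in one place for the Gaussian toy model (an illustrative sketch only; none of this is the ATLAS software, and the function name is ours):

```python
from statistics import NormalDist

norm = NormalDist()

def full_report(x, sigma=1.0, alpha=0.05, min_power=0.5):
    """All four recommended quantities for the Gaussian toy model
    x ~ Gauss(mu, sigma): unconstrained limit, PCL, background-only
    p-value, and the CLs limit."""
    k = norm.inv_cdf(1.0 - alpha)
    mu_unc = x + sigma * k                                         # unconstrained
    mu_pcl = max(mu_unc, sigma * norm.inv_cdf(min_power) + sigma * k)  # PCL
    p0 = 1.0 - norm.cdf(max(x, 0.0) / sigma)                       # p-value of b-only
    mu_cls = x - sigma * norm.inv_cdf(alpha * norm.cdf(x / sigma))  # CLs
    return {"unconstrained": mu_unc, "PCL": mu_pcl, "p0": p0, "CLs": mu_cls}
```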
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools
New frequentist limit document
https://twiki.cern.ch/twiki/pub/AtlasProtected/StatisticsTools/Frequentist_Limit_Recommendation.pdf
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation
Low Counts
Intermediate
Asymptotic
Conclusions
We recommend using PCL with a minimum power of 50% as
the primary result.
For the short term, we also support reporting CLs to allow for
comparison with CMS.
In the longer term, the Bayesian approach appears to have
common support in both ATLAS and CMS. This will take some
time to implement for many analyses; for others it is already
available.
Search analyses should also report the discovery significance
(p-value of the background-only hypothesis).
Documentation and code exist, with a twiki walkthrough:
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation
Extra material (repeated from 23 June talk)
Discussions concerning PCL
PCL has been criticized because it does not obviously map onto a
Bayesian result for some choice of prior (CLs = Bayesian in
special cases, e.g., x ~ Gauss(μ, σ) with a constant prior for μ ≥ 0).
We are not convinced of the need for this. The frequentist properties
of PCL are well defined, and as with all frequentist limits one
should not interpret them as representing Bayesian credible intervals.
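The CLs = Bayesian correspondence for the Gaussian case with a flat prior on μ ≥ 0 can be checked numerically (our own sketch; function names are ours):

```python
from statistics import NormalDist

norm = NormalDist()

def cls_upper(x, sigma=1.0, alpha=0.05):
    """CLs upper limit for x ~ Gauss(mu, sigma)."""
    return x - sigma * norm.inv_cdf(alpha * norm.cdf(x / sigma))

def bayes_upper(x, sigma=1.0, alpha=0.05):
    """Credible upper limit at 1 - alpha, flat prior on mu >= 0:
    the posterior is Gauss(x, sigma) truncated at zero."""
    return x + sigma * norm.inv_cdf(1.0 - alpha * norm.cdf(x / sigma))

# The two limits coincide for every measured value x:
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(cls_upper(x) - bayes_upper(x)) < 1e-9
```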
A further criticism of PCL relates to the unconstrained limit,
which could exclude all values of μ; a remnant of this problem could
survive after application of the power constraint (cf. “negatively
biased relevant subsets”).
PCL does not have negatively biased relevant subsets (nor does
our unconstrained limit, as it never excludes μ = 0).
On both points, the debate is still ongoing.