Statistics Forum Follow-up
info for Physics Coordination
24 June, 2011
Glen Cowan, Eilam Gross
G. Cowan
Follow-up from the Statistics Forum / CERN, 24 June 2011
1
Main questions
What do we see as the main way forward with CMS?
What do we recommend in the short term (summer 2011)?
What do we recommend after summer 2011?
The way forward with CMS
We met again with CMS in the evening of 23 June 2011
(ATLAS: Cowan, Gross, Murray, Read, Cranmer; CMS:
Cousins, Lyons, Dorigo, Demortier)
Cousins more or less ruled out supporting either CLs or PCL as
a long-term recommendation for CMS. We tried to clarify whether
this was his view or that of CMS. He believes that his own view,
which is to use Feldman-Cousins unified (two-sided) intervals,
would be followed in CMS.
We replied that the prevailing view in ATLAS has been to
quote a one-sided upper limit, and it was difficult to envisage
adopting F-C in place of this. So at present there is no single
frequentist method that would have long-term support from both
ATLAS and CMS.
In the short term, there is support for CLs in both collaborations
as an interim solution to allow for comparison of limits.
The way forward with CMS (2)
Bayesian methods emerged as a solution with support from
both sides. On the one hand this had always been viewed
as a useful complement to the frequentist limit. Furthermore,
one can study and report the frequentist properties of Bayesian
intervals (i.e., the fraction of times they would cover the
true parameter value), and in many examples this turns out to
be very good.
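As an illustration of what reporting the frequentist properties of a Bayesian interval can look like, here is a minimal sketch for a hypothetical counting experiment. The model (n ~ Poisson(s + b) with known background b), the flat prior on s ≥ 0, and the function names are assumptions made for this example and are not taken from the slides.

```python
import math

def posterior_tail(s, n, b):
    """Posterior probability that the signal exceeds s, given n observed
    events, known background b and a flat prior on s >= 0 (assumed model)."""
    num = sum((s + b) ** m / math.factorial(m) for m in range(n + 1))
    den = sum(b ** m / math.factorial(m) for m in range(n + 1))
    return math.exp(-s) * num / den

def bayes_upper_limit(n, b, alpha=0.05):
    """Solve posterior_tail(s_up) = alpha for s_up by bisection."""
    lo, hi = 0.0, 1.0
    while posterior_tail(hi, n, b) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if posterior_tail(mid, n, b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(s_true, b, alpha=0.05):
    """Fraction of repeated experiments in which the upper limit is at or
    above s_true. The limit increases with n, so only small n can miss."""
    mean = s_true + b
    missed = 0.0
    n = 0
    while bayes_upper_limit(n, b, alpha) < s_true:
        missed += math.exp(-mean) * mean ** n / math.factorial(n)
        n += 1
    return 1.0 - missed

if __name__ == "__main__":
    for s_true in (1.0, 3.5, 5.0, 10.0):
        print(f"s = {s_true:5.1f}, b = 3.0: coverage = {coverage(s_true, 3.0):.4f}")
```

In this particular model the coverage never falls below the nominal 95%; other models or priors can behave differently, which is why the coverage has to be studied case by case.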
Both sides agreed to consider Bayesian methods with priors
chosen to have good frequentist properties as a common method.
At a more detailed level it will take some more time to agree
on and implement the procedures. So in the short term this is
not a realistic solution for analyses where Bayesian methods have
not already been developed.
Recommendation on minimum power
for PCL from 16% to 50%
For summer 2011 (and beyond), we recommend quoting PCL
limits with the minimum power of 50%. The reasons for
moving the minimum power to 50% are both theoretical and
practical:
50% avoids the possibility of having a conservative treatment
of systematics lead to a stronger limit.
Some computational issues related to low-count analyses are
less problematic with 50%.
There is a slight reduction in the burden on the analyst, since
the 50% quantile (median) needed for the power constraint is
easier to find than the 16% quantile (-1 sigma error band).
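The effect of the minimum power can be sketched for the simplest case. The example below assumes a Gaussian-distributed estimator μ̂ with known σ, a 95% CL one-sided limit, and power defined with respect to the background-only (μ = 0) alternative; the function names are hypothetical and the formulas are the standard Gaussian-case expressions, not something stated on the slides.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: cdf and inverse cdf

def unconstrained_limit(mu_hat, sigma=1.0, alpha=0.05):
    """One-sided upper limit without any power constraint."""
    return mu_hat + sigma * PHI.inv_cdf(1.0 - alpha)

def power_threshold(min_power, sigma=1.0, alpha=0.05):
    """Smallest mu for which the test of mu has power >= min_power
    with respect to the background-only hypothesis (mu = 0)."""
    return sigma * (PHI.inv_cdf(min_power) + PHI.inv_cdf(1.0 - alpha))

def pcl_limit(mu_hat, min_power=0.5, sigma=1.0, alpha=0.05):
    """Power-constrained limit: never quote a limit below the threshold."""
    return max(unconstrained_limit(mu_hat, sigma, alpha),
               power_threshold(min_power, sigma, alpha))

if __name__ == "__main__":
    for m in (0.16, 0.5):
        print(f"minimum power {m:.2f}: threshold = {power_threshold(m):.3f} sigma")
    for mu_hat in (-2.0, -1.0, 0.0, 1.0):
        print(f"mu_hat = {mu_hat:+.1f}: PCL(50%) = {pcl_limit(mu_hat):.3f}")
```

With a minimum power of 50% the threshold is the median of the unconstrained limit under the background-only hypothesis, which is the quantity referred to in the last bullet.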
Recommendation on minimum power
for PCL from 16% to 50% (2)
50% minimum power gives a slight reduction in the
“psychological burden” on conference speakers, in that the
fraction of times one sees a sizable difference between PCL and
CLs would be less, and then only in cases where a strong
downward fluctuation leads to a stronger CLs limit (see graph
on next page and recall that under the background-only model,
μ̂ (in units of σ) lies between -1 and 1 68% of the time).
Owing to the short notice before EPS, it may be desirable to
leave the minimum power at 16% for the short term. This
should depend on whether groups feel they need more time to
shift from 16% to 50%. In practice this step should not take any
more time, and in some cases will save time.
Upper limits for Gaussian problem
[Figure: axes are the measurement and the (unknown) true value]
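Since the plot itself is not reproduced here, the following sketch tabulates the kind of comparison it shows. It assumes a Gaussian measurement with σ = 1 and 95% CL; the closed-form expressions for the CLs and PCL limits are the standard Gaussian-case results and the function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()
ALPHA = 0.05
Z_ALPHA = PHI.inv_cdf(1.0 - ALPHA)  # about 1.645

def unconstrained(mu_hat):
    """Classical one-sided upper limit (sigma = 1)."""
    return mu_hat + Z_ALPHA

def pcl(mu_hat, min_power):
    """Power-constrained limit for the given minimum power."""
    return max(unconstrained(mu_hat), PHI.inv_cdf(min_power) + Z_ALPHA)

def cls(mu_hat):
    """CLs upper limit: solve Phi(mu_hat - mu) / Phi(mu_hat) = alpha for mu."""
    return mu_hat - PHI.inv_cdf(ALPHA * PHI.cdf(mu_hat))

if __name__ == "__main__":
    print(" mu_hat  unconstr  PCL(16%)  PCL(50%)      CLs")
    for mu_hat in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
        print(f"{mu_hat:7.1f} {unconstrained(mu_hat):9.3f} {pcl(mu_hat, 0.16):9.3f}"
              f" {pcl(mu_hat, 0.5):9.3f} {cls(mu_hat):9.3f}")
```

In this setup the CLs limit drops below the 50% PCL limit only for sufficiently negative measurements, i.e. for downward fluctuations.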
Conclusions
We recommend using PCL with a minimum power of 50% as
the primary result.
For the short term, we also support reporting CLs to
allow for comparison with CMS.
In the longer term, the Bayesian approach appears to have
common support in both ATLAS and CMS. This will take some
time to implement for many analyses; for others it is already
available.
Search analyses should also report the discovery significance
(p-value of the background-only hypothesis).
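For reference, the relation between a p-value and the equivalent Gaussian significance can be sketched as follows. The example assumes a Gaussian-distributed estimator μ̂ with known σ and the usual one-sided convention Z = Φ⁻¹(1 − p); the function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()

def p_value_background_only(mu_hat, sigma=1.0):
    """p-value of mu = 0: probability of a measurement at least as large
    as mu_hat if there is no signal (assumed Gaussian estimator)."""
    return PHI.cdf(-mu_hat / sigma)  # upper tail, written to avoid 1 - cdf round-off

def significance(p):
    """Equivalent Gaussian significance Z, defined by p = 1 - Phi(Z)."""
    return -PHI.inv_cdf(p)

if __name__ == "__main__":
    for mu_hat in (1.0, 3.0, 5.0):
        p = p_value_background_only(mu_hat)
        print(f"mu_hat = {mu_hat:.1f} sigma: p = {p:.3e}, Z = {significance(p):.2f}")
```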
Extra material (repeated from 23 June talk)
ATLAS/CMS discussions on one-sided limits
Some prefer to report one-sided frequentist upper limits (CLs,
PCL); others prefer unified (Feldman-Cousins) limits, where
the lower edge may or may not exclude zero.
The prevailing view in the ATLAS Statistics Forum has been that
in searches for new phenomena, one wants to know whether a cross
section is excluded on the basis that its predicted rate is too high
relative to the observation, not excluded on some other grounds
(e.g., a mixture of too high or too low).
Among statisticians there is support for both approaches.
Discussions concerning flip-flopping
One-sided limits (CLs, PCL) can suffer from “flip-flopping”, i.e.,
violation of coverage probability if one decides, based on the data,
whether to report an upper limit or a measurement with error bars
(two-sided interval).
This can be avoided by “always” reporting:
(1) An upper limit based on a one-sided test.
(2) The discovery significance (equivalent to p-value
of background-only hypothesis).
In practice, “always” can mean “for every analysis carried out
as a search”, i.e., until the existence of the process is well
established (e.g., 5σ).
That is, we require only that what is done in practice map
approximately onto the idealized infinite ensemble.
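The coverage violation from flip-flopping can be made concrete with a small calculation. The sketch below assumes a Gaussian measurement x with σ = 1, a 95% CL, and a hypothetical policy of quoting an upper limit when x < 3σ and a central two-sided interval otherwise; the 3σ switch point and the function names are assumptions chosen for illustration.

```python
from statistics import NormalDist

PHI = NormalDist()

def prob_between(lo, hi, mu, sigma):
    """P(lo <= x <= hi) for x ~ Gauss(mu, sigma); zero if the range is empty."""
    if hi <= lo:
        return 0.0
    return PHI.cdf((hi - mu) / sigma) - PHI.cdf((lo - mu) / sigma)

def flip_flop_coverage(mu, switch=3.0, sigma=1.0, cl=0.95):
    """Coverage when the interval type is chosen from the data."""
    z_one = PHI.inv_cdf(cl)              # one-sided quantile, about 1.645
    z_two = PHI.inv_cdf(0.5 + cl / 2.0)  # two-sided quantile, about 1.960
    # Upper-limit branch (x < switch): mu is covered if x >= mu - z_one * sigma.
    upper_branch = prob_between(mu - z_one * sigma, switch * sigma, mu, sigma)
    # Two-sided branch (x >= switch): mu is covered if |x - mu| <= z_two * sigma.
    two_sided_branch = prob_between(max(switch * sigma, mu - z_two * sigma),
                                    mu + z_two * sigma, mu, sigma)
    return upper_branch + two_sided_branch

if __name__ == "__main__":
    for mu in (0.0, 1.0, 2.5, 4.0, 10.0):
        print(f"mu = {mu:4.1f}: coverage = {flip_flop_coverage(mu):.4f}")
```

For intermediate values of μ the coverage under this policy falls below the nominal 95%, whereas always quoting the one-sided limit gives exactly 95% in this Gaussian example.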
Discussions on CLs and F-C
CLs has been criticized as a method for preventing spurious
exclusion as it leads to significant overcoverage that is in practice
not communicated to the reader.
This was the motivation behind PCL.
We have also not supported using the upper edge of a
Feldman-Cousins interval as a substitute for a one-sided upper
limit, since when used in this way F-C has lower power.
Furthermore F-C unified intervals protect against small (or null)
intervals by counting the probability of upward data fluctuations,
which are not relevant if the goal is to establish an upper limit.
Discussions concerning PCL
PCL has been criticized as it does not obviously map onto a
Bayesian result for some choice of prior (CLs = Bayesian for
special cases, e.g., x ~ Gauss(μ, σ), constant prior for μ ≥ 0).
We are not convinced of the need for this. The frequentist properties
of PCL are well defined, and as with all frequentist limits one
should not interpret them as representing Bayesian credible intervals.
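The special case mentioned above, x ~ Gauss(μ, σ) with a constant prior for μ ≥ 0, can be checked numerically. The sketch below compares the closed-form CLs limit with a Bayesian upper limit obtained by brute-force integration of the posterior; the grid settings and function names are assumptions made for this example.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def cls_limit(x, sigma=1.0, alpha=0.05):
    """CLs upper limit: solve Phi((x - mu)/sigma) / Phi(x/sigma) = alpha."""
    return x - sigma * PHI.inv_cdf(alpha * PHI.cdf(x / sigma))

def bayes_limit(x, sigma=1.0, alpha=0.05, steps=20000):
    """Upper limit from the posterior with a flat prior on mu >= 0,
    integrated with a midpoint rule on [0, max(x, 0) + 10 sigma]."""
    mu_max = max(x, 0.0) + 10.0 * sigma
    h = mu_max / steps
    density = [math.exp(-0.5 * ((x - (i + 0.5) * h) / sigma) ** 2)
               for i in range(steps)]
    total = sum(density)
    running = 0.0
    for i, d in enumerate(density):
        running += d
        if running >= (1.0 - alpha) * total:
            return (i + 1) * h  # upper edge of the bin reaching 1 - alpha
    return mu_max

if __name__ == "__main__":
    for x in (-1.0, 0.0, 2.0):
        print(f"x = {x:+.1f}: CLs = {cls_limit(x):.3f}, Bayes = {bayes_limit(x):.3f}")
```

The two numbers agree to within the grid spacing for each x, which is the sense in which CLs coincides with a Bayesian limit in this case; no such correspondence is claimed for PCL.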
Further criticism of PCL is related to an unconstrained limit that
could exclude all values of μ. A remnant of this problem could
survive after application of the power constraint (cf. “negatively
biased relevant subsets”).
PCL does not have negatively biased relevant subsets (nor does
our unconstrained limit, as it never excludes μ = 0).
On both points, debate still ongoing.