Report from the Statistics Forum
ATLAS Week Physics Plenary
CERN, 23 June, 2011
Glen Cowan, Eilam Gross, Kyle Cranmer*
* on behalf of the ATLAS Statistics Forum
G. Cowan
Report from the Statistics Forum / CERN, 23 June 2011
Outline
Recent issues concerning upper limits
Update to recommendation for PCL
New software for low background analyses
Interactions with the CMS Statistics Group
Provisional agreement for summer conferences
Longer-term issues
Setting Limits
There are several methods one may use for setting limits:
One-sided (frequentist), e.g., PCL, CLs
Unified intervals (Feldman-Cousins)
Bayesian
In ATLAS, we have recommended using Power-Constrained
Limits (PCL) and also to report CLs limits to allow for
comparison with CMS.
This recommendation was adopted by Physics Coordination
after the Statistics Workshop held on 15 April 2011, and will be
revisited at the upcoming PC meeting on 27 June 2011.
PCL Quick Review (see arXiv:1105.3166)
Consider a parameter μ proportional to rate of signal (μ ≥ 0).
“Naive” upper limits can exclude parameter values to which
one has little or no sensitivity (for s << b, exclusion prob ~ 5%).
CLs solves this by effectively penalizing the test of each parameter
value by an amount that varies continuously with the sensitivity;
result is a limit with coverage probability > 95%.
PCL addresses the problem by regarding μ to be excluded if:
(a) It is excluded by a statistical test at 95% CL.
(b) One has sufficient sensitivity to μ.
Here sensitivity is measured by the power M0(μ) of a test of μ
with respect to the background-only alternative. I.e. require
M0(μ) = P(μ above limit | 0) ≥ Mmin
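The two conditions can be sketched in a few lines for the simple case x ~ Gauss(μ, σ) with known σ. This is only an illustration under that assumption, not the ATLAS implementation; the function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def unconstrained_upper_limit(x, sigma, cl=0.95):
    """One-sided upper limit on mu for a measurement x ~ Gauss(mu, sigma)."""
    return x + sigma * _STD.inv_cdf(cl)


def power_vs_background_only(mu, sigma, cl=0.95):
    """M0(mu): probability that mu is above the limit if the true value is 0."""
    return _STD.cdf(mu / sigma - _STD.inv_cdf(cl))


def pcl_excludes(mu, x, sigma, cl=0.95, m_min=0.5):
    """Return True if mu is excluded by the power-constrained limit."""
    rejected = mu > unconstrained_upper_limit(x, sigma, cl)        # condition (a)
    sensitive = power_vs_background_only(mu, sigma, cl) >= m_min   # condition (b)
    return rejected and sensitive
```

For example, with x = −2 and σ = 1 the value μ = 1 is rejected by the test, but the power is only about 0.26, so μ = 1 is not excluded for Mmin = 0.5.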
PCL in practice
[Figure: observed limit compared with the median (unconstrained) limit and the ±1σ band of the limit distribution assuming μ = 0; PCL shown with Mmin = 0.16. Where the power is below threshold, do not exclude.]
Important to report both the constrained and unconstrained limits.
Choice of minimum power
Choice of Mmin is convention. Formally it should be large relative
to 1 – CL (5%). Earlier we have proposed
Mmin = Φ(−1) = 0.1587 (Φ = standard Gaussian cumulative distribution)
because in case of x ~ Gauss(μ,σ) this means that one applies the
power constraint if the observed limit fluctuates down by one
standard deviation or more.
For the Gaussian example, this gives μmin = 0.64σ, i.e., the lowest
limit is similar to the intrinsic resolution of the measurement (σ).
We have recently revisited this point and now propose moving the
minimum power to Mmin = 0.5, i.e., PCL never goes below the
median limit under assumption of background only.
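For the Gaussian example the quoted values follow from a short calculation, sketched here assuming x ~ Gauss(μ, σ) with known σ and a one-sided test of size α = 1 − CL. Φ denotes the standard Gaussian cumulative distribution and μ_up the unconstrained upper limit.

```latex
\begin{align*}
\mu_{\mathrm{up}} &= x + \sigma\,\Phi^{-1}(1-\alpha), \\
M_0(\mu) &= P(\mu_{\mathrm{up}} < \mu \mid \mu' = 0)
          = \Phi\!\left(\frac{\mu}{\sigma} - \Phi^{-1}(1-\alpha)\right), \\
M_0(\mu_{\min}) = M_{\min}
  \;&\Rightarrow\;
  \mu_{\min} = \sigma\left[\Phi^{-1}(M_{\min}) + \Phi^{-1}(1-\alpha)\right].
\end{align*}
```

With α = 0.05, Mmin = Φ(−1) gives μmin = (−1 + 1.645)σ ≈ 0.64σ, while Mmin = 0.5 gives μmin = 1.645σ, which is the median of μ_up under the background-only hypothesis.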
Aggressive conservatism
It could be that owing to practical constraints, certain systematic
uncertainties are over-estimated in an analysis; this could
be justified by wanting to be conservative.
The consequence of this will be that the ±1σ bands of
the unconstrained limit are broader than they otherwise would be.
If the unconstrained limit fluctuates low, it could be that the
PCL limit, constrained at the −1σ band, is lower than it
would be had the systematics been estimated correctly.
Being conservative could be more aggressive.
If the power constraint Mmin is at 0.5, then by inflating the
systematics the median of the unconstrained limit is expected to
move less, and in any case upwards, i.e., it will lead to a less
strong limit (as one would expect from “conservatism”).
Upper limits for Gaussian problem
[Figure: upper limits as a function of the measurement; axes: measurement →, (unknown) true value →.]
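A comparison of this kind can be sketched with closed-form expressions for x ~ Gauss(μ, σ), μ ≥ 0. The formulas below are an illustration under that assumption; the function names are hypothetical, and the CLs expression follows from requiring p_μ / (1 − p_b) = α with p_μ = Φ((x − μ)/σ) and 1 − p_b = Φ(x/σ).

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def unconstrained_limit(x, sigma=1.0, alpha=0.05):
    """Classical one-sided upper limit."""
    return x + sigma * _STD.inv_cdf(1.0 - alpha)


def pcl_limit(x, sigma=1.0, alpha=0.05, m_min=0.5):
    """Unconstrained limit, but never below mu_min (power equal to m_min)."""
    mu_min = sigma * (_STD.inv_cdf(m_min) + _STD.inv_cdf(1.0 - alpha))
    return max(unconstrained_limit(x, sigma, alpha), mu_min)


def cls_limit(x, sigma=1.0, alpha=0.05):
    """CLs upper limit; intended for moderate x (roughly x > -8 sigma)."""
    return x - sigma * _STD.inv_cdf(alpha * _STD.cdf(x / sigma))


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
        print(x, unconstrained_limit(x), pcl_limit(x), cls_limit(x))
```

In this sketch the CLs limit always lies above the unconstrained limit and approaches it for large x, while the PCL coincides with the unconstrained limit whenever the latter is above μmin.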
Coverage probability for Gaussian problem
[Figure: coverage probability P(μ ≤ μup | μ) as a function of the (unknown) true value.]
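Coverage curves of this type can be computed directly in the Gaussian case with σ = 1. The sketch below is an illustration under that assumption, with hypothetical function names. For CLs it uses the fact that Φ(x − μ)/Φ(x) increases with x, so the set of measurements that do not exclude μ is x ≥ x_crit, and x_crit is found by bisection.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def phi(z):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def coverage_pcl(mu, alpha=0.05, m_min=0.5):
    """100% below mu_min (never excluded), 1 - alpha from mu_min upwards."""
    mu_min = _STD.inv_cdf(m_min) + _STD.inv_cdf(1.0 - alpha)
    return 1.0 if mu < mu_min else 1.0 - alpha


def coverage_cls(mu, alpha=0.05):
    """P(mu is not excluded by CLs | mu) for x ~ Gauss(mu, 1), mu >= 0."""
    def cls(x):
        return phi(x - mu) / phi(x)

    lo, hi = mu - 12.0, mu + 12.0
    if mu == 0.0 or cls(lo) >= alpha:
        return 1.0  # exclusion would need a downward fluctuation beyond 12 sigma
    for _ in range(200):  # bisection for cls(x_crit) = alpha
        mid = 0.5 * (lo + hi)
        if cls(mid) < alpha:
            lo = mid
        else:
            hi = mid
    x_crit = 0.5 * (lo + hi)
    return 1.0 - phi(x_crit - mu)
```

In this model the CLs coverage is close to 100% for small μ and tends to 95% only for large μ, whereas the PCL coverage is a step function at μmin.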
PCL summary of recent developments
Proposal to move minimum power from 16% to 50%.
Power constraint applied at the median limit.
Improvement of approximations used for low-count analyses.
New code available (see Statistics Forum twiki):
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools
Substantial improvement in speed.
Substantial progress on documentation, including background on
method and implementation details (see twiki).
New frequentist limit document
https://twiki.cern.ch/twiki/pub/AtlasProtected/StatisticsTools/Frequentist_Limit_Recommendation.pdf
New usage details on twiki
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/FrequentistLimitRecommendationImplementation
Example from twiki of how to determine whether asymptotic formulae are valid.
The new scripts implement the appropriate procedures for different regimes,
e.g., asymptotic, b < 10, b > 10.
Interactions with the CMS Statistics Group
Interaction between ATLAS and CMS statistics groups began
already several years ago in the context of the Higgs combination;
this effort continues in the separate LHC Higgs Combination Group.
In addition, the meetings between the ATLAS and CMS Statistics
Groups have increased this year with the goal of agreeing on
statistical tools and practice to facilitate comparison and eventual
combination of results.
ATLAS: G. Cowan, E. Gross, K. Cranmer, O. Vitells, W. Murray
CMS: R. Cousins, L. Lyons, L. Demortier, T. Dorigo
Discussion on Limits with CMS
Within CMS it has been recommended to use at least one of the
three methods mentioned in the PDG Statistics Review:
Bayesian
CLs
Feldman-Cousins
In ATLAS, we have recommended using Power-Constrained
Limits (PCL) and also to report CLs limits to allow for
comparison with CMS.
In recent meetings with CMS we have listed the mathematical
properties of the various limits and on these we essentially agree.
There is some disagreement on the importance that one should
attach to different properties.
Properties of Frequentist Limits (1)
One-sided (PCL, CLs) versus unified (Feldman-Cousins)
Exclude parameter values because predicted rate higher
than data, or because prediction ≠ data on other grounds
(e.g., likelihood ratio wrt two-sided alternative).
Coverage
Substantial over-coverage for CLs and upper edge of
F-C. Exact for full interval of F-C. Exact for PCL in
region of sensitivity; 100% otherwise.
Flip-flopping
Violation of coverage if decision to report limit or
two-sided interval is based on data. Not problem for F-C;
OK for one-sided limits if one agrees to always report
upper limit for searches (also should report p-value of
background-only hypothesis, p0).
Properties of Frequentist Limits (2)
Avoiding exclusion in cases with little/no sensitivity:
PCL: Discontinuous separation of (in)sensitive regions.
CLs: Ratio of p-values → penalty against low sensitivity.
F-C: Counts prob. of upwards fluctuation for upper limit.
Power (related to median limit under background-only hypothesis):
PCL: Most powerful for region with sensitivity; zero otherwise.
CLs: Less powerful than PCL.
F-C: Upper edge as limit less powerful than PCL, CLs, but full
interval also has power relative to higher values of μ.
Correspondence with Bayesian result for some prior:
CLs yes; F-C yes (approx.); PCL no.
Properties of Frequentist Limits (3)
Negatively biased relevant subsets
Related to conditional coverage probability given that outcome
is observed in some identifiable subset of data space.
PCL, CLs, F-C do not have NBRS.
If also condition on m, all methods have (adapted) NBRS.
Familiarity in HEP community
CLs widely used. F-C used for many problems but not often as
a replacement for upper limits. PCL is new but core concepts
are textbook statistics and documentation now greatly improved:
arXiv:1105.3166 and info on method and implementation on
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/StatisticsTools
Areas where ATLAS and CMS agree
Both collaborations support RooStats as the software tool for
combinations. See, e.g., K. Cranmer talk at PHYSTAT 2011:
https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=107747
Within both collaborations there are many who support the
Bayesian approach, especially for limits (see, e.g., talks by
A. Harel and D. Casadei at PHYSTAT 2011):
Recent effort in ATLAS to establish recommendations for
Bayesian limits (Georgios Choudalakis, Diego Casadei).
Within both ATLAS and CMS there exist different views on
unfolding, with a strong tendency away from use of bin-by-bin
factors. (See e.g. talks by G. Choudalakis and M. Weber from
PHYSTAT 2011).
Discussions on Discovery with CMS
The two collaborations broadly agree on how to report the
significance of a discovery.
The test statistic recommended in ATLAS coincides with the
Feldman-Cousins approach for testing the background-only model.
There is also support in both collaborations for an approximate
correction for the Look-Elsewhere Effect using the approach of
Gross and Vitells (EPJC 70 (2010) 525, arXiv:1005.1891;
arXiv:1105.4355).
And there is no controversy if analyses correct for LEE exactly
(e.g., floating-mass Higgs search), as long as the uncorrected
(e.g., fixed-mass) discovery significance is also reported.
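The approximate look-elsewhere correction can be sketched as follows, using the bound from the cited Gross and Vitells paper for a single parameter of interest: P(max q > c) ≤ P(χ²₁ > c) + ⟨N(c)⟩, with the expected number of upcrossings scaled as ⟨N(c)⟩ = ⟨N(c0)⟩ exp(−(c − c0)/2). The function names and the example inputs are hypothetical; in practice ⟨N(c0)⟩ would be estimated from a small number of background-only pseudo-experiments at a low reference level c0.

```python
import math


def chi2_1dof_tail(c):
    """P(chi^2 with 1 degree of freedom > c)."""
    return math.erfc(math.sqrt(c / 2.0))


def expected_upcrossings(c, n_c0, c0):
    """Scale the mean number of upcrossings from level c0 to level c."""
    return n_c0 * math.exp(-(c - c0) / 2.0)


def global_p_value_bound(c, n_c0, c0):
    """Upper bound on the global p-value for an observed maximum c."""
    return min(1.0, chi2_1dof_tail(c) + expected_upcrossings(c, n_c0, c0))


if __name__ == "__main__":
    # Hypothetical inputs: observed maximum c = 25, 4 upcrossings on average at c0 = 1.
    print(global_p_value_bound(25.0, n_c0=4.0, c0=1.0))
```

The ratio of the global bound to the local tail probability gives an effective trials factor, which in this approximation grows with the significance of the observed excess.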
Both collaborations have made some progress in studying Bayesian
Model Selection using Bayes Factors (ongoing).
Summary and conclusions (1)
PCL solves problem of “spurious exclusion” by separating the
parameter space into regions in which one has/hasn’t sufficient
sensitivity as given by the probability to reject μ if the
background-only model is true.
Recommendations for ATLAS:
Report unconstrained limit.
Report power constrained limit (with power M0(μ) ≥ 0.5; new).
Report p-value of background-only hypothesis.
Also report CLs.
In problems with low background, recent improvement to
software implementation related to treatment of nuisance params.
ATLAS also has ongoing effort to establish recommendations for
Bayesian limits (Georgios Choudalakis, Diego Casadei).
Summary and conclusions (2)
Discussions with the CMS Statistics Group are ongoing.
Goal is to agree on statistical tools and practice to facilitate
comparison and eventual combination of results.
Broad agreement in a number of areas but still non-trivial issues
concerning limits:
one-sided vs. unified
PCL vs. CLs
We essentially agree on the mathematical properties of the
approaches; debate is on relative importance of various properties.
Provisional agreement to use CLs as basis for comparison;
in longer term Bayesian limit may play this role.
Extra slides
Some reasons to consider increasing Mmin
Mmin is supposed to be “substantially” greater than α (5%).
So Mmin = 16% is fine for 1 – α = 95%, but if we ever want
1 – α = 90%, then 16% is not “large” compared to 10%;
μmin = 0.28σ starts to look small relative to the intrinsic resolution
of the measurement. Not an issue if we stick to 95% CL.
PCL with Mmin = 16% is often substantially lower than CLs.
This is because of the conservatism of CLs (see coverage).
But goal is not to get a lower limit per se, rather
● to use a test with higher power in those regions where one
feels there is enough sensitivity to justify exclusion and
● to allow for easy communication of coverage (95% for
μ ≥ μmin; 100% otherwise).
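The values of μmin quoted for the Gaussian example can be checked numerically. The sketch below assumes x ~ Gauss(μ, σ), for which the power threshold corresponds to μmin/σ = Φ⁻¹(Mmin) + Φ⁻¹(1 − α); the function name is hypothetical.

```python
from statistics import NormalDist


def mu_min_over_sigma(m_min, cl):
    """Smallest mu (in units of sigma) with power >= m_min at confidence level cl."""
    std = NormalDist()
    return std.inv_cdf(m_min) + std.inv_cdf(cl)


if __name__ == "__main__":
    m16 = NormalDist().cdf(-1.0)  # the "16%" convention, Phi(-1) = 0.1587
    for cl in (0.95, 0.90):
        for m_min in (m16, 0.5):
            print(f"CL={cl:.2f}  Mmin={m_min:.4f}  mu_min/sigma={mu_min_over_sigma(m_min, cl):.2f}")
```

The Mmin = 16% rows give 0.64 at 95% CL and 0.28 at 90% CL, matching the numbers on the slides.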
A few further considerations
Obtaining PCL requires the distribution of unconstrained limits,
from which one finds the Mmin (16%, 50%) percentile.
In some analyses this can entail calculational issues that
are expected to be less problematic for Mmin = 50% than for 16%.
Analysts produce anyway the median limit, even in absence of
the error bands, so with Mmin = 50% the burden on the analyst is
reduced somewhat (but one would still want the error bands).
We therefore recently proposed moving Mmin to 50%.
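The percentile step can be sketched with background-only pseudo-experiments. The toy below assumes the simple Gaussian model x ~ Gauss(μ, σ) rather than a real analysis likelihood, and the function name, seed and number of toys are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def toy_limit_quantile(m_min, sigma=1.0, cl=0.95, n_toys=20000, seed=1):
    """Estimate the m_min quantile of the unconstrained limit for mu = 0."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(cl)
    # Each toy: draw a background-only measurement and compute its upper limit.
    limits = sorted(rng.gauss(0.0, sigma) + sigma * z for _ in range(n_toys))
    index = min(n_toys - 1, int(m_min * n_toys))
    return limits[index]


if __name__ == "__main__":
    print("Mmin = 50%:", toy_limit_quantile(0.5))
    print("Mmin = 16%:", toy_limit_quantile(0.1587))
```

The 16% quantile sits in the tail of the limit distribution, where fewer toys fall nearby, so for a given number of pseudo-experiments it is estimated less precisely than the median.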
Treatment of nuisance parameters
In most problems, the data distribution is not uniquely specified
by μ but contains nuisance parameters θ.
This makes it more difficult to construct an (unconstrained)
interval with correct coverage probability for all values of θ,
so sometimes approximate methods used (“profile construction”).
More importantly for PCL, the power M0(μ) can depend on θ.
So which value of θ to use to define the power?
Since the power represents the probability to reject μ if the
true value is μ = 0, to find the distribution of μup we take the
values of θ that best agree with the data for μ = 0, i.e., the
conditional maximum-likelihood estimate of θ given μ = 0.
May seem counterintuitive, since the measure of sensitivity
now depends on the data. We are simply using the data to choose
the most appropriate value of θ where we quote the power.
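As a concrete illustration, consider a hypothetical on/off counting model that is not taken from the slides: n ~ Poisson(μs + b) events in a signal region and m ~ Poisson(τb) events in a control region, with the background b as the nuisance parameter. For μ = 0 the value of b that best agrees with the data has a closed form, sketched below with assumed function names.

```python
import math


def log_likelihood_mu0(b, n, m, tau):
    """Log-likelihood (up to constants) for mu = 0: n ~ Pois(b), m ~ Pois(tau * b)."""
    return n * math.log(b) - b + m * math.log(tau * b) - tau * b


def conditional_background_estimate(n, m, tau):
    """Value of b that maximizes the mu = 0 likelihood."""
    return (n + m) / (1.0 + tau)


if __name__ == "__main__":
    n, m, tau = 12, 20, 2.0  # hypothetical observed counts and scale factor
    b_hat = conditional_background_estimate(n, m, tau)
    print(b_hat, log_likelihood_mu0(b_hat, n, m, tau))
```

In such a model the pseudo-experiments used to obtain the distribution of μup, and hence the power, would be generated with b fixed to this value.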
ATLAS/CMS discussions on one-sided limits
Some prefer to report one-sided frequentist upper limits (CLs,
PCL); others prefer unified (Feldman-Cousins) limits, where
the lower edge may or may not exclude zero.
The prevailing view in the ATLAS Statistics Forum has been that
in searches for new phenomena, one wants to know whether a cross
section is excluded on the basis that its predicted rate is too high
relative to the observation, not excluded on some other grounds
(e.g., a mixture of too high or too low).
Among statisticians there is support for both approaches.
Discussions concerning flip-flopping
One-sided limits (CLs, PCL) can suffer from “flip-flopping”, i.e.,
violation of coverage probability if one decides, based on the data,
whether to report an upper limit or a measurement with error bars
(two-sided interval).
This can be avoided by “always” reporting:
(1) An upper limit based on a one-sided test.
(2) The discovery significance (equivalent to p-value
of background-only hypothesis).
In practice, “always” can mean “for every analysis carried out
as a search”, i.e., until the existence of the process is well
established (e.g., 5σ).
I.e. we only require what is done in practice to map approximately
onto the idealized infinite ensemble.
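The equivalence between the discovery significance Z and the p-value p0 of the background-only hypothesis is the usual one-sided Gaussian conversion, Z = Φ⁻¹(1 − p0). A minimal sketch, with hypothetical function names:

```python
import math
from statistics import NormalDist


def significance_from_p0(p0):
    """Z = Phi^-1(1 - p0), written as -Phi^-1(p0) to keep precision for small p0."""
    return -NormalDist().inv_cdf(p0)


def p0_from_significance(z):
    """One-sided Gaussian tail probability beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    print(p0_from_significance(5.0))    # p-value corresponding to 5 sigma
    print(significance_from_p0(0.05))   # significance corresponding to p0 = 0.05
```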
Discussions on CLs and F-C
CLs has been criticized as a method for preventing spurious
exclusion as it leads to significant overcoverage that is in practice
not communicated to the reader.
This was the motivation behind PCL.
We have also not supported using the upper edge of a Feldman-Cousins
interval as a substitute for a one-sided upper limit, since
when used in this way F-C has lower power.
Furthermore F-C unified intervals protect against small (or null)
intervals by counting the probability of upward data fluctuations,
which are not relevant if the goal is to establish an upper limit.
Discussions concerning PCL
PCL has been criticized as it does not obviously map onto a
Bayesian result for some choice of prior (CLs = Bayesian for
special cases, e.g., x ~ Gauss(μ, σ), constant prior for μ ≥ 0).
We are not convinced of the need for this. The frequentist properties
of PCL are well defined, and as with all frequentist limits one
should not interpret them as representing Bayesian credible intervals.
Further criticism of PCL is related to an unconstrained limit that
could exclude all values of μ. A remnant of this problem could
survive after application of the power constraint (cf. “negatively
biased relevant subsets”).
PCL does not have negatively biased relevant subsets (nor does
our unconstrained limit, as it never excludes μ = 0).
On both points, debate still ongoing.