
Advanced Statistical
Techniques in Particle
Physics
Conference Summary
(Thanks to Bob Cousins!)
Jim Linnemann
MSU HEP Seminar
23 April, 2002
Conference Overview
Durham, UK
– 5 days, nearly no rain!
• Mixture of “theoretical” and practical
– Overview/Tutorial talks
– Systematic Comparisons of Methods
– New Developments
– Problems
• Visiting (tolerant!) Statisticians:
– Michael Goldstein
– Wolfgang Rolke
• Radical idea: if a phenomenologist can be in a collaboration, why not a professional statistician (à la medical research)?
http://www.ippp.dur.ac.uk/statistics/
Tutorials, Overviews, Explanations
• Fred James:
– Overview
– Goodness of Fit vs. Intervals
• Roger Barlow:
– Systematics:
• mistakes, effects, errors
• Niels Kjaer: Monte Carlo
– Interpolating (+ much else)
• Pekka Sinervo:
– Significance
• Berkan Aslan (G. Zech):
– Goodness of Fit measures
Multidimensional:
• Sherry Towers:
– PDE’s
– Reducing variables in
classification
• Harrison Prosper:
– multi- dimensional methods
• Tony Vaiciulis:
– Support Vector Machines
• Glen Cowan, Volker Blobel:
– Unfolding
• Paul Harrison:
– Blind Analysis
Theory, Practice, and Methods
• Chris Parkes
– Combining LEP W results
• Gary Hill, Tyce De Young
– Bayes in Amanda tracking
• Rudy Bock, Wolfgang Wittek
– Multidimensional methods for
Gamma/hadron separation
• Volker Blobel
– Global Alignment Fits
• Alex Read
– CLs
• Dean Karlen
– Credibility of Conf Intervals
• Raja
– Uncertainty of Limits
Problems to Chew On
• Nigel Smith and Dan Tovey
– Dark Matter Searches
• Bruce Yabsley
– Statistics in Practice at Belle
Fred James
Important not to confuse these problems, e.g., interval
estimation and goodness-of-fit testing.
– Hypothesis-testing vs. parameter-fitting criteria
[Figure: parameter-fitting criterion, cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3]
Roger Barlow
Calculated the σ to use for comparison checks…
Multidimensional Methods
• Aspire to full extraction of information
• Equivalent to trying to fit
P(signal)/P(background) (Neyman-Pearson)
• Issues in:
– choice of dimensionality (no one tells you how many!)
– methods of approximation
– control of the bias/variance tradeoff
– complexity of the fit
– number of free parameters
– amount of training data needed
– “ease” of interpretation
• We are following the field; hope for theory to help
• See: Elements of Statistical Learning,
– Hastie, Tibshirani, Friedman
Sherry Towers
Wow! Several questions come to my mind…
[In the general case, variable deletion is safer than variable addition. –M.G.]
Harrison Prosper
• Thumbnail sketch of some methods of interest:
– Fisher Linear Discriminant
– Principal Components Analysis
– Independent Component Analysis
– Self-Organizing Map
– Grid Search
– Probability Density Estimation
– Neural Networks
– Support Vector Machines
• Said these are all attempts to solve the single classification problem whose solution is the Bayes discriminant (see the sketch below):
D(x) = P(S|x)/P(B|x) = [P(x|S)/P(x|B)] · [P(S)/P(B)]
… = the Neyman-Pearson likelihood ratio when P(S) = P(B)
• Multivariate analysis is hard: important to use all the
information used by D(x) (which might be lost, e.g., by
marginalization). Appears that there is no single optimal
approximation.
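To make D(x) concrete, here is a minimal Python sketch (not from the talk) for a toy one-dimensional problem where both class densities are known Gaussians; the densities, priors, and cut value are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D problem (invented for illustration): both class densities known.
p_x_given_S = norm(loc=+1.0, scale=1.0).pdf   # L(S) = P(x|S)
p_x_given_B = norm(loc=-1.0, scale=1.5).pdf   # L(B) = P(x|B)
P_S, P_B = 0.5, 0.5                           # prior class probabilities

def bayes_discriminant(x):
    """D(x) = P(S|x)/P(B|x) = [P(x|S)/P(x|B)] * [P(S)/P(B)].
    With P(S) = P(B) this reduces to the Neyman-Pearson likelihood ratio."""
    return (p_x_given_S(x) / p_x_given_B(x)) * (P_S / P_B)

# Accept as signal wherever D(x) exceeds a chosen cut (here, 1.0).
for xi in np.linspace(-3.0, 3.0, 7):
    d = bayes_discriminant(xi)
    print(f"x = {xi:+.1f}  D(x) = {d:8.3f}  ->",
          "signal" if d > 1.0 else "background")
```

Every multivariate method in the list above can be read as an approximation to this ratio when the densities are not known and must be learned from training data.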
SVM
Vaiciulis
Glen Cowan
Unfolding (unsmearing)
• Inherently unstable
– the measured distribution is smoother than the true one
– un-smoothing enhances the noise!
• Nice discussion of regularization, biases, uncertainties
• See talk
– and his statistics book
• Must balance between oscillations and an oversmoothed result
– = the bias-variance tradeoff
– same issues arise in multidimensional methods
V. Blobel
Unfolding: Insight
• View as a matrix problem
• “ill-posed” = (nearly) singular
• Analyze in terms of eigenvalues/eigenvectors and the condition number
• Truncate eigenfunctions when their coefficients fall below the statistical error bound
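A rough numerical illustration of this picture (my sketch, not Blobel's program; the 10-bin spectrum and tridiagonal response matrix are invented): unfold by singular-value decomposition of the whitened response matrix, keeping only the modes whose coefficients stand above their unit statistical errors.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup (invented for illustration): a 10-bin true spectrum and a
# response matrix R that smears each bin into its neighbors.
n = 10
true = 100.0 + 50.0 * np.sin(np.linspace(0, np.pi, n))
R = 0.6 * np.eye(n) + 0.2 * np.eye(n, k=1) + 0.2 * np.eye(n, k=-1)
measured = rng.poisson(R @ true).astype(float)
errors = np.sqrt(np.maximum(measured, 1.0))

# Whiten: divide each measured bin by its error, so every expansion
# coefficient below carries unit statistical error.
Rw = R / errors[:, None]
yw = measured / errors

U, s, Vt = np.linalg.svd(Rw)
print("condition number:", s[0] / s[-1])

# Expansion coefficients of the data in the left singular basis.
c = U.T @ yw

# Truncate eigenfunctions whose coefficients fall below the error bound
# (|c| < 1 sigma): those modes carry mostly noise, not information.
keep = np.abs(c) > 1.0
unfolded = Vt.T @ np.where(keep, c / s, 0.0)

print("modes kept:", int(keep.sum()), "of", n)
print("true    :", np.round(true))
print("unfolded:", np.round(unfolded))
```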
Unfolding Results
Blobel
• An oversmoothed result: statistical errors in neighboring bins are no longer uncorrelated!
• High frequencies not measured: report fewer bins? (or supply them from a prior??)
• Higher modes converge slowly: iterate only a few times (d’Agostini)
N.J. Kjaer (I)
DELPHI MC
• Re-interpretation of data to interpolate on physics parameters
• Analogy with Stat Mech MC techniques?
Goodness of Fit
Aslan/Zech
• “Energy Test” (motivated by electrostatics)
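The flavour of the test can be sketched as follows (my reconstruction of the idea; see the Aslan/Zech write-up for their actual definition and choice of distance function R): treat the data points as one kind of charge and the reference (MC) points as the opposite kind, and compute a pairwise "potential energy"; the p-value comes from a permutation test.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_statistic(x, y, eps=1e-6):
    """'Energy test' statistic between a data sample x and a reference
    sample y (each an (N, d) array), using R(r) = -ln(r + eps) as the
    distance function. Small values indicate good agreement."""
    n, m = len(x), len(y)
    R = lambda d: -np.log(d + eps)
    dxx, dyy, dxy = R(cdist(x, x)), R(cdist(y, y)), R(cdist(x, y))
    # i < j sums for the same-sample terms (exclude the zero diagonal)
    phi_xx = np.sum(np.triu(dxx, k=1)) / n**2
    phi_yy = np.sum(np.triu(dyy, k=1)) / m**2
    return phi_xx + phi_yy - dxy.sum() / (n * m)

# Toy example: is a 2-D data sample compatible with a Gaussian MC sample?
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(200, 2))
mc   = rng.normal(0.0, 1.0, size=(1000, 2))
phi_obs = energy_statistic(data, mc)

# p-value by permutation: pool the samples and re-split them at random.
pool = np.vstack([data, mc])
phis = []
for _ in range(200):
    rng.shuffle(pool)
    phis.append(energy_statistic(pool[:200], pool[200:]))
p_value = np.mean(np.array(phis) >= phi_obs)
print(f"phi = {phi_obs:.4f}, permutation p-value = {p_value:.2f}")
```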
Paul Harrison
Blind Analysis (“liberating”)
Cousins: it takes longer, especially the first time
By the way, no one can read light green print…
Blind Analysis
• A called shot
– a step towards making 3σ mean 3σ
• Many ways to blind
– 10% of the data; background only; obscured fit result
• Creates a mindset
– avoiding biases and subjectivity
R.K. Bock
Method details and comments: composite probabilities (2-D)
• intuitive determination of event probabilities by multiplying the probabilities in all 2-D projections that can be made from the image parameters, using constant bin content for some data
• shown on some IACT data to at least match the best existing results (but strict comparisons suffered from moving data sets)
• The CP program uses same-content binning in 2 dimensions
– bins are set up for gammas (red); probabilities are evaluated for protons (blue); all possible 2-D projections are used
• Interesting idea: expansion in the dimensionality of the correlation
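A rough sketch of how such a composite-probability scheme might look (my reconstruction; the variables, binning, and normalization are invented, and Bock's CP program surely differs in detail): equal-content binning is defined from the gamma sample, and each event's probability is the product over all 2-D projections.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Invented toy "image parameters": 4 variables for gammas and protons.
gammas  = rng.normal(0.0, 1.0, size=(5000, 4))
protons = rng.normal(0.8, 1.3, size=(5000, 4))

NBINS = 10  # equal-content bins per axis, defined from the gamma sample

def edges_equal_content(v):
    """Bin edges giving (approximately) constant bin content for sample v."""
    return np.quantile(v, np.linspace(0, 1, NBINS + 1))

def projection_tables(ref):
    """For every 2-D projection, build equal-content edges (from the
    reference sample) and the reference probability per 2-D cell."""
    tables = {}
    for i, j in combinations(range(ref.shape[1]), 2):
        ex = edges_equal_content(ref[:, i])
        ey = edges_equal_content(ref[:, j])
        h, _, _ = np.histogram2d(ref[:, i], ref[:, j], bins=[ex, ey])
        tables[(i, j)] = (ex, ey, h / h.sum())
    return tables

def composite_prob(event, tables):
    """Product of the reference probabilities in all 2-D projections."""
    p = 1.0
    for (i, j), (ex, ey, prob) in tables.items():
        bx = np.clip(np.searchsorted(ex, event[i]) - 1, 0, NBINS - 1)
        by = np.clip(np.searchsorted(ey, event[j]) - 1, 0, NBINS - 1)
        p *= prob[bx, by]
    return p

tables = projection_tables(gammas)   # bins set up for the gamma sample
# Gamma-like events get a larger composite probability than proton-like ones.
print("mean P, gamma events :",
      np.mean([composite_prob(e, tables) for e in gammas[:500]]))
print("mean P, proton events:",
      np.mean([composite_prob(e, tables) for e in protons[:500]]))
```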
R.K. Bock
(Hill/DeYoung)
Very Interesting Technique!
• Let’s relate it to something we do: say particle ID
in a detector:
– In hot part of detector near beam: lots of background,
we tighten particle-ID cuts
– In lower-occupancy part of the detector away from
beam, can loosen certain particle-ID cuts without letting
in a lot of background
• Use our knowledge of position-dependent
occupancy rates in Bayes’s Theorem to calculate
the probability that a given particle in a given
location is the species of interest.
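A minimal numerical sketch of that idea (all efficiencies and occupancies below are invented for illustration):

```python
def p_species_given_hit(p_hit_given_s, p_hit_given_b, occupancy_s, occupancy_b):
    """Bayes's theorem: P(S | hit, location) from the particle-ID response
    P(hit|S), P(hit|B) and the location-dependent occupancies P(S), P(B).
    All numbers fed in below are invented for illustration."""
    num = p_hit_given_s * occupancy_s
    return num / (num + p_hit_given_b * occupancy_b)

# Same detector response, different locations:
# hot region near the beam: background dominates the occupancy
print("near beam:", p_species_given_hit(0.9, 0.1, occupancy_s=0.01, occupancy_b=0.99))
# quiet region away from the beam: much smaller background occupancy
print("away     :", p_species_given_hit(0.9, 0.1, occupancy_s=0.30, occupancy_b=0.70))
```

The same particle-ID response yields very different posteriors in the two regions, which is exactly why one tightens cuts near the beam and can loosen them far from it.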
Comments:
• If all input P’s are frequentist P’s, the output
P(particle type | data) is a frequentist P.
• We can use this posterior frequentist P like any
other observable for cuts, weights, etc. If we
independently calibrate the signal efficiency/
background rejection of this use, there is nothing
circular about using our knowledge of the input
occupancies.
• If the input occupancy knowledge is imperfect it
will not introduce a bias, but rather make the
technique less powerful.
Bayes’s Theorem applies to any P
satisfying the axioms of probability
• Frequentist P: limiting frequency
– Theorem not much use if the unknown is a constant of
nature: P(unknown) = delta-function at unknown value
• Bayesian P: degree of belief
– For constant of nature, P(unknown) can be combination
of delta-function and continuous function, reflecting
degree of belief
• Is the Amanda technique “Bayesian”?
– Not if “Bayesian” implies “not frequentist”, as I think is
common, even though frequency P is emulated in a
certain application/limit of degree of belief.
• In any case, instructive example!
Practicalities of Combining Analyses: W Physics Results at LEP
Chris Parkes
Now the stuff you don’t
normally see…
RC: An informative talk about both methodology and sociology!
An important reminder: pragmatic considerations (sometimes even irrational ones) can be as important as principles in getting out a result.
Parkes
This talk is not for the squeamish or over-idealistic,
but it is a vivid description of the real world in action!
Cousins:
• LEP experiments contained a sizable fraction of
world HEP community, and reached very mature
state of analysis.
– We have much to learn from them, both theoretical and
practical.
Studies of Intervals
• Byron Roe and Michael Woodroofe: MiniBooNE
• Punzi: Strong Confidence Intervals
• Giovanni Signorelli et al.: Strong C.I. and systematics
• Jan Conrad: Coverage with systematics
• Rolke and Lopez: Bias correction via double bootstrap
• Giunti and Laveder: the “power” of confidence intervals
Dean Karlen’s Proposal to Evaluate
Credibility of Confidence Intervals
• Yesterday evening, a generally interested-to-favorable reaction
• Cousins: I’m the outlier: I think it will only encourage unthinking “easy” use of Bayes, with more flat (i.e., not degree-of-belief) priors.
• We evaluate Bayesian intervals with serious
frequentist methods.
• Why not evaluate confidence intervals with
serious Bayesian methods? One metric-dependent
prior constituteth not a sensitivity analysis.
• Who was it who said “How do you know that the
outlier isn’t right?”
Alex Read’s Beautiful Talk on CLs
• CLs = PV(s+b)/PV(b) (Cowan, PDG Stats); see the sketch after this slide
– PV = P-value = prob(obs or worse), like P(χ²)
• Behavior compared to LR Ordering (F-C) is
understood and lucidly explained. Application to
neutrino oscillations!
• Please see his talk
• Cousins comment: the non-standard conditioning (on an inequality, not an ancillary statistic) of Zech, Roe & Woodroofe, and Read leads to problems with the lower end of confidence intervals (see Cousins’ PRD Comment). Alex recognized this.
• Therefore, Alex now advocates CLs only for limits; in the case of a signal, he would now use LR Ordering.
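For a simple Poisson counting experiment the CLs prescription can be sketched as follows (my illustration; treating PV as P(n ≤ n_obs | rate) is an assumption appropriate for setting upper limits):

```python
from scipy.stats import poisson

def cls(n_obs, s, b):
    """CLs = PV(s+b) / PV(b) for a Poisson counting experiment, with
    PV = P(n <= n_obs | rate). A small observed count disfavors the
    signal+background hypothesis, but dividing by PV(b) protects against
    excluding a signal when the background alone is already disfavored."""
    return poisson.cdf(n_obs, s + b) / poisson.cdf(n_obs, b)

# Exclude signal strengths with CLs <= 0.05 (95% CL upper limit).
n_obs, b = 3, 3.0
for s in [1.0, 3.0, 5.0, 7.0, 9.0]:
    tag = "  -> excluded at 95% CL" if cls(n_obs, s, b) <= 0.05 else ""
    print(f"s = {s:3.1f}  CLs = {cls(n_obs, s, b):.3f}{tag}")
```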
To Use Bayes or not?
• Professional statisticians have become much more Bayes-oriented in the last 20 years
– computationally possible
– philosophically coherent
• (solipsistic?? Subjective Bayes…)
• In HEP: we want to publish the result, not the prior
– We want to talk about P(theory|data)
• But this requires a prior: P(theory)
– Likelihoods we can agree on!
– Conclusions should be insensitive to a range of priors
• probably true, with enough data
• search limits DO depend on priors!
– Hard to convince anyone of a single objective prior!!!
– Unpleasant properties of naïve frequentist limits, too
• Feldman-Cousins is the current consensus
• Systematic errors are hard to treat in the frequentist framework
– PDG currently recommends Bayesian “smearing of the likelihood”
• close in spirit to the Cousins-Highland mixed frequentist-Bayesian approach
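A sketch of that “smearing of the likelihood” (my illustration in the spirit of Cousins-Highland; the Gaussian background uncertainty and all numbers are assumptions): average the Poisson likelihood over the uncertain nuisance parameter.

```python
import numpy as np
from scipy.stats import poisson

def smeared_likelihood(n_obs, s, b_mean, b_sigma, n_samples=100_000, seed=0):
    """Poisson likelihood for signal s, with the uncertain background b
    integrated out against a (truncated) Gaussian prior -- the Bayesian
    'smearing of the likelihood' close in spirit to Cousins-Highland.
    All numbers fed in below are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    b = rng.normal(b_mean, b_sigma, n_samples)
    b = b[b > 0]                       # truncate unphysical negative rates
    return poisson.pmf(n_obs, s + b).mean()

n_obs, b_mean, b_sigma = 5, 3.0, 1.0
for s in [0.0, 2.0, 4.0]:
    plain = poisson.pmf(n_obs, s + b_mean)
    smeared = smeared_likelihood(n_obs, s, b_mean, b_sigma)
    print(f"s = {s}: L(fixed b) = {plain:.4f}, L(smeared b) = {smeared:.4f}")
```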
Michael Goldstein
• A real pleasure to have you here!
• Since subjective Bayes is rarely used in HEP, but
is “known” to be the “coherent” version, it has
been very enlightening:
• “Sensitivity Analysis is at the heart of scientific
Bayesianism”
– How skeptical would the community as a whole have to be in order not to be convinced?
– What prior gives P(hypothesis) > 0.5?
– What prior gives P(hypothesis) > 0.99, etc.? (see the sketch after this slide)
• There’s a split among Bayesians; M.G. is in the
group that sees no virtue in objective (“arbitrary”)
priors (except as one of many examples of
possible prior beliefs in a sensitivity analysis).
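A tiny sketch of such a sensitivity analysis (my illustration; the Bayes factor is an invented number): since posterior odds = Bayes factor × prior odds, one can scan the prior and ask which priors leave P(hypothesis | data) above a threshold.

```python
def posterior_prob(bayes_factor, prior_prob):
    """Posterior P(H|data) from the Bayes factor B = P(data|H)/P(data|not H)
    and the prior P(H), via posterior odds = B * prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = bayes_factor * prior_odds
    return post_odds / (1.0 + post_odds)

# Suppose the data favor H by a Bayes factor of 100 (invented number).
B = 100.0
for prior in [0.5, 0.1, 0.01, 1e-3, 1e-4]:
    print(f"prior P(H) = {prior:7.4f}  ->  posterior = {posterior_prob(B, prior):.4f}")

# To fail to reach P(H|data) > 0.5, the skeptic's prior must satisfy
# prior odds < 1/B, i.e. P(H) < 1/(1+B):
print("skeptic's threshold: P(H) <", 1.0 / (1.0 + B))
```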
Michael Goldstein (cont.)
• Procedures should obey the likelihood
principle. Frequentist methods don’t obey
it: fundamental flaw.
• Bayesian methods are hard to do right, but
they are the only way to attack certain hard
problems.
• Bayes Linear Methodology: addresses
expectations rather than whole pdf’s.
• HEP problems: appear to map onto a very
similar set of abstract problems.
Cousins would add:
• (Coherent) Subjective priors behave like real probabilities
under transformations, unlike, e.g., flat priors.
• M.G. represents only one school of Bayesian stats, but I
don’t think you will find a school advocating uniform prior
for a Poisson mean.
• M.G. portrays Bayesian methods as hard, but worth the
effort. This should be stressed in HEP, where the hard part
(subjective prior) is dodged, and the math is (indeed) easily
cranked out (without backwards thinking) to give an
“answer” that I think is without much content unless
evaluated by frequentist standards.
• I think M.G.’s point about sensitivity analysis has to be
taken to heart in HEP, whether one uses objective or
subjective priors.
Cousins’ Last Words (for now!)
• The area under the likelihood function is
meaningless.
• Mode of a probability density is metric-dependent,
as are shortest intervals.
• A confidence interval is a statement about
P(data | parameters), not P(parameters | data)
• Don’t confuse confidence intervals (statements
about parameter) with goodness of fit (statement
about model itself).
• P(non-SM physics | data) requires a prior; you
won’t get it from frequentist statistics.
• The argument for coherence of Bayesian P is
based on P = subjective degree of belief.