Sensitivity, Completeness, Reliability

Lecture 6
• A little bit more about source detection.
• Parameter fitting – to measure the source.
• Mr Bayes gets to put his point of view at last.
• How to assess our source list when we have it.
NASSP Masters 5003F - Computational Astronomy - 2009
Significance
• We have talked about the probability of the Null Hypothesis as a way of deciding which detections to label as sources.
• Significance is another way to talk about this.
• An "X-sigma detection" means the measured amplitude equals X times the standard deviation of the noise.
• The term is often used loosely and questionably.
• It is related to the concept of confidence intervals (see later).
• For Gaussian noise, the P-value is the one-sided tail probability beyond Xσ; a 5-sigma detection therefore corresponds to a Null Hypothesis P-value of about 3×10⁻⁷.
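A quick check of that number (a minimal Python sketch, assuming SciPy is available):

```python
from scipy.stats import norm

# One-sided tail probability of Gaussian noise beyond X standard deviations.
for x_sigma in (3, 4, 5):
    p_value = norm.sf(x_sigma)   # survival function = 1 - CDF
    print(f"{x_sigma}-sigma detection: Null Hypothesis P-value ~ {p_value:.2e}")

# The 5-sigma line prints ~2.9e-07, i.e. the ~3e-7 quoted above.
```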
What is the best detection method?
[Figure: from my 2009 Cash paper.]
Now back to fitting.
• We want good values of the parameters.
– Amplitude terms and units:
• Counts, count rate (instrumental units)
• Brightness
• Magnitudes in a given filter band (e.g. B magnitudes)
• Flux S:
– 10⁻²⁶ W m⁻² Hz⁻¹ = 1 Jansky (radio)
– erg cm⁻² s⁻¹ (X-ray)
– Position terms and units:
• RA and Dec
• arcsec offset from a reference position
• detector coordinates.
Confidence intervals - uncertainties.
• Frequentist interpretation:
– Given a confidence interval enclosing probability P, the parent value of θ will fall within the interval a fraction P of the time.
• One can 'cut the cake' an infinite number of ways.
• A symmetric interval of ±1σ => P ≈ 0.68 for Gaussian noise.
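A minimal sketch (Python, assuming NumPy; all numbers are illustrative) of what the frequentist statement means: simulate many data sets from a known parent mean, form the ±1σ interval on the sample mean each time, and count how often the true value lands inside.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean, sigma, n_points, n_trials = 10.0, 2.0, 25, 10000

covered = 0
for _ in range(n_trials):
    y = rng.normal(true_mean, sigma, n_points)   # one simulated data set
    mean_hat = y.mean()
    err = sigma / np.sqrt(n_points)              # 1-sigma uncertainty on the mean
    if abs(mean_hat - true_mean) <= err:         # does the +/-1 sigma interval cover the truth?
        covered += 1

print(f"Coverage of the +/-1 sigma interval: {covered / n_trials:.3f}  (expect ~0.68)")
```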
Confidence intervals.
• Uncertainties with multi-dimensional data:
– fit a paraboloid to the maximum;
– find the 68% confidence contour (an ellipse);
– read σ₁², σ₂², σ₁₂² etc. from the ellipse equation.
– Alternatively, this covariance matrix is the inverse of the curvature matrix for χ².
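A sketch of the second route (Python/NumPy; the straight-line model and all numbers are illustrative choices, not from the lecture): build the curvature matrix of χ² and invert it to get the parameter covariance matrix.

```python
import numpy as np

# Straight-line model y = a + b*x, with per-point Gaussian errors sigma_y.
x = np.linspace(0.0, 10.0, 20)
sigma_y = np.full_like(x, 0.5)

# Curvature matrix alpha_kl = sum_i (dM_i/dtheta_k)(dM_i/dtheta_l) / sigma_i^2
# (half the Hessian of chi^2; exact for a model that is linear in its parameters).
design = np.vstack([np.ones_like(x), x]).T          # derivatives of the model w.r.t. (a, b)
alpha = design.T @ (design / sigma_y[:, None] ** 2)

covariance = np.linalg.inv(alpha)                   # covariance matrix of (a, b)
sigma_a, sigma_b = np.sqrt(np.diag(covariance))
print("sigma_a =", sigma_a, " sigma_b =", sigma_b, " cov_ab =", covariance[0, 1])
```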
General problems with fitting:
• When some of the θs are 'near degenerate'.
– Solution: avoid this.
• When the model is wrong – or several different models fit equally well.
– Solution: F-test (sometimes). Supposedly restricted to the case in which the two models differ by an additive component.
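As an illustration of the F-test for nested models that differ by an additive component (a Python sketch assuming SciPy; the χ² values and degrees of freedom are made up):

```python
from scipy.stats import f as f_dist

# Hypothetical chi^2 results from fitting the same data with two nested models.
chi2_simple, dof_simple = 130.0, 97    # e.g. continuum only
chi2_complex, dof_complex = 112.0, 95  # e.g. continuum + additive line (2 extra parameters)

extra_params = dof_simple - dof_complex
f_stat = ((chi2_simple - chi2_complex) / extra_params) / (chi2_complex / dof_complex)
p_value = f_dist.sf(f_stat, extra_params, dof_complex)

print(f"F = {f_stat:.2f}, P(improvement this large by chance) = {p_value:.3g}")
# A small P-value suggests the additive component is genuinely required.
```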
Degenerate θs
Competing models
• Both models give a moderately OK χ², but clearly neither really describes what is happening, i.e. we don't understand the physics here.
Bayesian vs Frequentist
• Frequentist: start from the data y; the true parent distribution f is unknown. Assume a parent f̂ and, by Monte Carlo or analysis, calculate p(y) – that is, p(y|f), the probability density that parent f gives data y.
• Bayesian: the quantity of interest is p(f|y), the probability density of the parent f given the data y. It is built from the data y plus prior knowledge of p(f) via Bayes' theorem, in practice often by Markov Chain Monte Carlo.
Bayesian statistics – a bare outline.
• Bayes' theorem:  p(Θ|y,I) = p(y|Θ) p(Θ|I) / p(y|I), where
– p(Θ|y,I) is called the 'posterior distribution' of the model parameters,
– p(y|Θ) is the probability distribution of the data given the model,
– p(Θ|I) contains 'prior' knowledge about Θ,
– and p(y|I) is a normalizing constant.
• Hopefully some examples next week…
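In the meantime, a minimal illustration (a Python/NumPy sketch with made-up measurements): the posterior for a single amplitude parameter evaluated on a grid is just prior × likelihood, renormalized.

```python
import numpy as np

# Hypothetical data: a few measurements of a source amplitude, known noise sigma.
y = np.array([4.1, 5.3, 4.7, 5.0])
sigma = 0.5

theta = np.linspace(0.0, 10.0, 1001)                    # grid of candidate amplitudes
prior = np.ones_like(theta)                             # flat prior p(theta|I)
log_like = -0.5 * (((y[:, None] - theta[None, :]) / sigma) ** 2).sum(axis=0)
posterior = prior * np.exp(log_like - log_like.max())   # p(theta|y,I), up to a constant
posterior /= np.trapz(posterior, theta)                 # normalize: the role of p(y|I)

print(f"Posterior peaks at theta ~ {theta[np.argmax(posterior)]:.2f}")
```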
Markov Chain Monte Carlo (MCMC)
• (To some extent this may just be grandiose terminology… shh…)
• The Monte Carlo part you know about – it is just a machine for generating random numbers having a particular distribution.
• The Markov Chain part means that you have a loop: a starting value is fed through a set of rules to produce a new random number, which is then fed back into the rules, and so on.
• If you set the rules up correctly, then no matter what the starting value, the random numbers converge to the desired distribution (a minimal sketch of one such rule follows).
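A bare-bones Metropolis rule (a Python sketch; this is one possible set of 'rules', with an arbitrary Gaussian chosen as the target): propose a step from the current value, accept or reject by comparing target densities, and the chain converges to samples from the target no matter where it starts.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    """Log of the (unnormalized) distribution we want samples from: a Gaussian, mean 3, sigma 2."""
    return -0.5 * ((x - 3.0) / 2.0) ** 2

x = 20.0                          # a deliberately poor starting value
samples = []
for _ in range(20000):
    proposal = x + rng.normal(0.0, 1.0)                 # the 'rules': a random step from the current value
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal                                    # accept the move; otherwise stay put
    samples.append(x)

chain = np.array(samples[5000:])                        # discard the early, unconverged part ('burn-in')
print(f"mean ~ {chain.mean():.2f}, std ~ {chain.std():.2f}  (expect ~3 and ~2)")
```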
How to analyse the source catalog.
• Things we want to know:
– Reliability:
• The number of false positives.
– Completeness:
• The fraction of real sources we are finding. People say things such as "the survey is essentially complete at a flux greater than so-and-so."
– Sensitivity:
• Broadly speaking, this is the flux at which we are only detecting 50% of the sources.
• These are often not very exactly defined terms.
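A sketch of how a source-injection Monte Carlo estimates these quantities (Python; the threshold detector and all numbers are made up for illustration): inject fake sources of known flux, 'detect' them, then compare the output against the input list and against pure noise.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical survey: injected fluxes, Gaussian measurement noise, and a
# simple threshold detector at 5 times the noise level.
noise_sigma = 1.0
threshold = 5.0 * noise_sigma
true_fluxes = 10.0 ** rng.uniform(0.0, 1.5, 5000)
measured = true_fluxes + rng.normal(0.0, noise_sigma, true_fluxes.size)
detected = measured > threshold

# Completeness: fraction of injected (real) sources recovered, per flux bin.
for lo, hi in [(1, 3), (3, 5), (5, 10), (10, 30)]:
    in_bin = (true_fluxes >= lo) & (true_fluxes < hi)
    frac = detected[in_bin].mean() if in_bin.any() else float("nan")
    print(f"flux {lo:>2}-{hi:<2}: completeness = {frac:.2f}")

# Reliability needs the false positives: pure-noise samples exceeding the threshold.
n_noise_trials = 1_000_000
false_positives = (rng.normal(0.0, noise_sigma, n_noise_trials) > threshold).sum()
print(f"false positives per {n_noise_trials} independent noise samples: {false_positives}")
```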
A source-detection Monte Carlo:
• Things to note from this plot:
1. Fainter sources become more numerous…
2. …until a cutoff value of S.
3. The measured Ŝ is scattered about the true S.
4. The Ŝ distribution is biased at low S.
Log N – log S.
[Plot: the input fluxes.]
• In a 'Euclidean universe', sources are uniformly distributed in space and their fluxes fall off as the inverse square of distance.
• Therefore N(>S) ∝ S^(-3/2): the log N – log S curve is a straight line of slope -3/2 (derivation sketched below).
• But Olbers' paradox says there must be a cutoff.
– This is observed in several actual surveys => large-scale structure.
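The standard Euclidean-counts argument behind that slope, written out:

```latex
S \propto \frac{L}{4\pi r^{2}} \;\Rightarrow\; r \propto S^{-1/2},
\qquad
N(>S) \propto r^{3} \propto S^{-3/2},
\qquad
\frac{d\log N}{d\log S} = -\frac{3}{2}.
```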
Eddington bias
[Plot: red – 'measurable' fluxes.]
• Happens because the measured flux Ŝ is random – it is scattered about the true value.
• The result is a 'blurring' of the 'true' log N – log S.
• Because n(S) usually has a negative slope, this blurring inflates the number of sources.
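A small Monte Carlo sketch of this blurring (Python; the power law, noise level, and cuts are illustrative): draw true fluxes from Euclidean counts, add Gaussian measurement noise, and compare counts above a given flux before and after the scatter.

```python
import numpy as np

rng = np.random.default_rng(3)

# True fluxes from N(>S) ~ S^(-3/2), via inverse-transform sampling.
n_sources = 200_000
s_min = 1.0
u = 1.0 - rng.uniform(size=n_sources)          # uniform in (0, 1]
true_s = s_min * u ** (-2.0 / 3.0)

# Measured fluxes are scattered about the true values by Gaussian noise.
noise_sigma = 1.0
measured_s = true_s + rng.normal(0.0, noise_sigma, n_sources)

for cut in (3.0, 5.0, 10.0):
    n_true = (true_s > cut).sum()
    n_meas = (measured_s > cut).sum()
    print(f"S > {cut:>4}: true counts = {n_true:6d}, measured counts = {n_meas:6d} "
          f"(inflated by {n_meas / n_true:.2f}x)")
```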
How does our catalog shape up?
[Plot: blue – 'true' detections.]
• The really interesting things in the log N – log S curve always seem to be happening just at our sensitivity limit. Two things to do:
– Persuade ESA/NASA etc. to spend $$$$ on a bigger and better telescope;
• What do you mean, "don't be ridiculous?"
– OK then, let's lower the P cutoff.
• But…
…Don’t forget the falsies.
[Plot: cyan – false detections.]
• Next episode: confusion, dynamic range.
From R L White et al (1997).
From M Tajer et al (2004).