Priors
Trevor Sweeting
Department of Statistical Science
University College London
RSC 2003
Structure of talk
Bayesian inference: the basics
Specification of the prior
Examples
Subjective priors
Nonsubjective priors
Examples
Methods of prior construction
Coverage probability bias
Relative entropy loss
Wrap-up
Bayesian inference: the basics
X – the experimental or observational data to be
observed
Y – the future observations to be predicted
Data model f(x | θ)
(Possibly improper) prior distribution π(θ)
The posterior density of θ is
Posterior density ∝ Prior density × Likelihood function
From the posterior we obtain posterior probabilities, moments, marginal densities, expected losses, predictive densities ...
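In symbols (a standard statement, with π(θ) the prior and f(x | θ) the likelihood):

$$\pi(\theta \mid x) \;=\; \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta')\, f(x \mid \theta')\, d\theta'} \;\propto\; \pi(\theta)\, f(x \mid \theta)$$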
Bayesian inference
The predictive density of Y given X = x is

$$p(y \mid x) \;=\; \int f(y \mid \theta)\, \pi(\theta \mid x)\, d\theta$$
Where are we?...
Philosophical basis
Practical implementation
Prior construction ...
Specification of the prior
Approaches vary
from fully Bayesian analyses based on fully elicited
subjective priors
to fully frequentist analyses based on nonsubjective
(‘objective’) priors
[Diagram: a spectrum of approaches from Fully Bayesian to Fully Frequentist; Subjective (elicited prior) at one end, Mixed in between, Nonsubjective (default prior) toward the other; verification ranges from dual verification to performance and penalty-function criteria]
Examples
Four examples
All taken from Applied Statistics, 52 (2003)
Competing risks
Image analysis
Diagnostic testing
Geostatistical modelling
Competing risks (Basu and Sen)
System failure data; cause of failure not identified
n systems, R competing risks
Datum for each system is (T, S, C)
T is failure time, S are the possible causes of failure, C is a
censoring indicator
Parameters in the model are of location & scale type
Use (i) informative conjugate priors
Source: historical data
or (ii) ‘noninformative’ priors
Such that they have a ‘minimal effect’ on the analysis
Implementation: via Gibbs sampling
Image analysis (Dryden, Scarr and Taylor)
Segmentation of weed and crop textures
Automatic identification of weeds in images of row crops
Parameters are (k, C, f)
k is the number of texture components, C are texture labels,
f are parameters associated with the distribution of pixel
intensities
Highly structured prior for (k, C, f)
Markov random field for C, truncated conjugate priors for f
Hyperparameters set in context e.g. to ‘encourage relatively
few textures’
Implementation: via Markov chain Monte Carlo
Diagnostic testing (Georgiadis, Johnson, Gardner and Singh)
Multiple-test screening data models are unidentifiable
A Bayesian analysis therefore depends critically on prior
information
Parameters consist of various (at least 8) joint
sensitivity and specificity probabilities
Independent beta priors; two informative, the rest
noninformative
Investigate coverage performance and sensitivities
for various choices of prior
Implementation: via Gibbs sampling
Geostatistical modelling (Kammann and Wand)
Geostatistical mapping to study geographical variability
of reproductive health outcomes (disease mapping)
Geoadditive models
Universal kriging model involves a stationary zero-mean stochastic process over sites
leads to ‘borrowing strength’
Non-Bayesian analysis, but model could be formulated
in a Bayesian way, with the mean responses at the given
sites having a multivariate normal prior
Implementation: residual ML and splines
Table for examples
[Table: the four examples placed on the spectrum from Fully Bayesian/Subjective to Fully Frequentist/Nonsubjective: image analysis (most subjective), then competing risks, diagnostic testing, and geostatistical modelling (most nonsubjective)]
Subjective priors
To some extent, all the previous examples included
subjective prior specification
Methods of elicitation
Industrial and medical contexts
Scientific reporting
Range of prior specifications; conduct sensitivity analyses
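A minimal sketch of such a sensitivity analysis, assuming a simple binomial model with a range of conjugate Beta priors; the data and the prior settings are hypothetical illustrations, not from the talk:

```python
# A sensitivity analysis for a binomial proportion: compare posterior
# summaries across a range of Beta prior specifications.
from scipy.stats import beta

x, n = 14, 20  # hypothetical data: 14 successes in 20 trials

# Range of Beta(a, b) prior specifications, from sceptical to enthusiastic
priors = {"sceptical": (2, 8), "uniform": (1, 1), "enthusiastic": (8, 2)}

for label, (a, b) in priors.items():
    post = beta(a + x, b + n - x)              # conjugate Beta posterior
    lo, hi = post.ppf(0.025), post.ppf(0.975)  # equal-tailed 95% interval
    print(f"{label:>12}: mean = {post.mean():.3f}, "
          f"95% interval = ({lo:.3f}, {hi:.3f})")
```

If the reported interval moves materially across the range of priors, the prior matters and should be elicited (and reported) with care.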
Subjective priors
Psychological research should be taken into account when devising methods for prior elicitation
Construction of questions
Anchors
Probability assessment by frequency
Availability; inverse expertise effect
Priors are often ‘too narrow’
Experimental Psychology, Behavioural Decision Making,
Management Science, Cognitive Psychology
Nonsubjective priors
Nonsubjective (‘objective’) priors: why?
Sensible default priors for non-experts (and experts!)
Recognise basis often weak
Possible nasty surprises!
Reference priors for regulatory bodies
Clinical trials, industrial standards, official statistics
Safe default priors for high-dimensional problems
Priors are more difficult to specify and may have a more severe effect
Nonsubjective priors
Some general problems
Improper priors
Improper posteriors
E.g. Hierarchical models
Marginalisation and sampling theory paradoxes
Dutch books
Inconsistency
Posterior doesn’t concentrate around true value asymptotically
Inadmissibility
of Bayes decision rules/estimators
Nonsubjective priors
Proper ‘diffuse’ priors
Near-impropriety of posterior
Unintended large impact on posterior
Example to follow ...
Arbitrary choice of hyperparameters
Non-objectivity
Lack of invariance
Egg on face ...
Two examples ...
WinBUGS - the Movie!
Model (φ is the precision): Xᵢ ~ N(μ, φ⁻¹), with priors φ ~ Gamma(a, b) and μ ~ N(0, c⁻¹)
Data: 529.0, 530.0, 532.0, 533.1, 533.4, 533.6, 533.7,
534.1, 534.8, 535.3
Prior parameters: a = b = c = 0.001
Relatively diffuse prior
Results ...
WinBUGS - the Movie!
Just another few iterations to make sure ...
WinBUGS - the Movie!
Oops!
WinBUGS - the Movie!
Effect of choice of c (the prior precision of μ)
c = 0.001: WinBUGS eventually gets the ‘right’ answer
but presumably not the answer we wanted!
[Figure: marginal posterior density of μ (prior precision = 0.001)]
The ‘noninformative’ prior dominates the likelihood.
WinBUGS - the Movie!
c = 0.0002: WinBUGS gives the ‘right’ answer with the likelihood dominating
However, it's the ‘wrong’ answer, as the true marginal posterior of μ is still dominated by the prior
[Figure: marginal posterior density of μ (prior precision = 0.0002)]
WinBUGS - the Movie!
c = 0.00016: WinBUGS again gives the ‘right’ answer with the likelihood dominating
But it's still the ‘wrong’ answer
[Figure: marginal posterior density of μ (prior precision = 0.00016)]
The true marginal posterior distribution of μ is bimodal
WinBUGS - the Movie!
c = 0.0001: WinBUGS gives the right answer
... and presumably the one we wanted!
[Figure: marginal posterior density of μ (prior precision = 0.0001)]
Care needed in the choice of prior parameters in
diffuse but proper priors
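A minimal Python sketch of this experiment, assuming the model above (Xᵢ ~ N(μ, φ⁻¹), φ ~ Gamma(a, b), μ ~ N(0, c⁻¹); the prior mean 0 for μ is an assumption, since the slide's model formula was not captured). Integrating φ out analytically gives the marginal posterior of μ on a grid, so one can watch the posterior mode jump from near the prior mean to near the data as c decreases; the exact crossover values of c depend on the assumed prior mean:

```python
# Marginal posterior of mu on a grid, with the precision phi integrated
# out analytically against its Gamma(a, b) prior:
#   p(mu | x)  propto  exp(-c*mu^2/2) * (b + sum_i (x_i - mu)^2 / 2)^-(a + n/2)
# NOTE: the prior mean 0 for mu is an assumption, not from the talk.
import numpy as np

x = np.array([529.0, 530.0, 532.0, 533.1, 533.4,
              533.6, 533.7, 534.1, 534.8, 535.3])
n, a, b = len(x), 0.001, 0.001

mu = np.linspace(-300.0, 1100.0, 4001)      # grid for mu
for c in (0.001, 0.0004, 0.0001):           # prior precisions of mu
    ss = ((x[:, None] - mu[None, :]) ** 2).sum(axis=0)
    log_post = -0.5 * c * mu**2 - (a + n / 2) * np.log(b + 0.5 * ss)
    post = np.exp(log_post - log_post.max())
    post /= post.sum() * (mu[1] - mu[0])    # normalise on the grid
    print(f"c = {c}: posterior mode at mu = {mu[post.argmax()]:.1f}")
```

For the larger values of c the mode sits far from the data, near the assumed prior mean; only for sufficiently small c does the likelihood dominate.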
Normal regression
Model (φ is the precision): y = Xβ + ε, ε ~ N(0, φ⁻¹I), with q regressors
Conjugate prior: normal-gamma (β | φ normal, φ gamma)
The limit as the hyperparameters tend to zero is the improper prior π(β, φ) ∝ φ⁻¹
Jeffreys' prior is π(β, φ) ∝ φ^{q/2 − 1}
Here π(β, φ) ∝ φ⁻¹ gives exact matching in both posterior and predictive distributions
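For reference, the standard distributional results under π(β, φ) ∝ φ⁻¹ that underlie the exact matching (with β̂ the least-squares estimate, R the residual sum of squares and s² = R/(n − q)):

$$\phi \mid y \sim \mathrm{Gamma}\!\left(\frac{n-q}{2},\, \frac{R}{2}\right), \qquad \frac{\beta_j - \hat\beta_j}{\widehat{\mathrm{se}}(\hat\beta_j)} \,\Big|\, y \;\sim\; t_{n-q},$$

and for a future observation Y at covariate vector $x_0$,

$$\frac{Y - x_0^{\mathsf T}\hat\beta}{s\sqrt{1 + x_0^{\mathsf T}(X^{\mathsf T}X)^{-1}x_0}} \,\Big|\, y \;\sim\; t_{n-q},$$

the same $t_{n-q}$ distributions as under repeated sampling, so Bayesian credible and frequentist confidence statements coincide exactly.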
Normal regression
Data: n = 25, R = residual sum of squares = 2.1
[Numerical posterior results under the two priors not captured]
Normal regression
Prediction. Let Y be a future observation and let T denote the ‘usual’ predictive pivotal quantity (the Student-t pivot shown above).
[Numerical predictive results under the two priors not captured]
Prediction less sensitive to prior than estimation
Methods of prior construction
Limits of proper priors
Uniform priors/choice of scale
Data-translated likelihood
Constant asymptotic precision
Canonical parameterisation
Coverage Probability Bias
Decision-theoretic
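In regular one-parameter models, several of these recipes (notably data-translated likelihood and constant asymptotic precision) lead to Jeffreys' general rule:

$$\pi_J(\theta) \;\propto\; |I(\theta)|^{1/2}, \qquad I(\theta) \;=\; -E_\theta\!\left[\frac{\partial^2 \log f(X \mid \theta)}{\partial \theta\, \partial \theta^{\mathsf T}}\right],$$

which is invariant under reparameterisation; in a parameterisation with constant information (the canonical choice), Jeffreys' prior is uniform.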
Coverage probability bias
Sometimes investigated in papers via simulation (cf.
the diagnostic testing example)
Parametric CPB
When do Bayesian credible intervals have the correct
frequentist coverage?
In regular one-parameter problems, ‘matching’ is asymptotically
achieved by Jeffreys' prior (Welch and Peers, 1963)
In multiparameter families cannot in general achieve
matching for all marginals using the same prior
Usually contravenes the likelihood principle (see Sweeting,
2001 for a discussion)
Avoid infinite confidence sets! (e.g. ratios of parameters)
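The simulation check referred to above is easy to sketch. A minimal example, assuming a binomial model with Jeffreys' Beta(1/2, 1/2) prior and equal-tailed 95% credible intervals; the parameter values and simulation size are illustrative, not from the talk:

```python
# Estimate the frequentist coverage of Bayesian credible intervals:
# binomial data, Jeffreys' Beta(1/2, 1/2) prior, equal-tailed 95% interval.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
n, p_true, n_sims = 20, 0.3, 100_000   # illustrative values

x = rng.binomial(n, p_true, size=n_sims)      # simulated datasets
lo = beta.ppf(0.025, x + 0.5, n - x + 0.5)    # posterior 2.5% quantile
hi = beta.ppf(0.975, x + 0.5, n - x + 0.5)    # posterior 97.5% quantile
coverage = np.mean((lo <= p_true) & (p_true <= hi))
print(f"estimated coverage: {coverage:.3f}")  # near 0.95 if CPB is small
```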
Coverage probability bias
Predictive CPB
When do Bayesian predictive intervals have the correct
frequentist coverage?
In regular one-parameter problems, there exists a unique
prior for which there's no asymptotic CPB ...
... but in general this depends on the probability level α!
If there does exist a matching prior that is free from α then it is
Jeffreys' prior (Datta, Mukerjee, Ghosh and Sweeting, 2000)
In the multiparameter case, if there exists a matching prior
then it is usually not Jeffreys' prior
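In symbols, writing $q_{1-\alpha}(X)$ for the $(1-\alpha)$-quantile of the Bayesian predictive distribution of Y given X, a predictive matching prior is one for which

$$P_\theta\{\,Y \le q_{1-\alpha}(X)\,\} \;=\; 1 - \alpha + o(n^{-1}) \qquad \text{for all } \theta.$$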
Relative entropy loss
The ‘reference prior’ (Bernardo, 1979) maximises the
Shannon mutual information between θ and X
Maximises the ‘distance’ between the prior and posterior;
minimal effect of the prior
Also arises as an asymptotically minimax solution
under relative entropy loss (Clarke and Barron, 1994,
Barron, 1998)
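The objective being maximised is (a standard statement)

$$I(\pi) \;=\; \iint \pi(\theta)\, f(x \mid \theta)\, \log \frac{\pi(\theta \mid x)}{\pi(\theta)}\, dx\, d\theta \;=\; E\!\left[\mathrm{KL}\big(\pi(\cdot \mid X)\,\|\,\pi(\cdot)\big)\right],$$

the expected Kullback–Leibler divergence from prior to posterior, with the expectation taken over the marginal distribution of X.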
Relative entropy loss
Define the prior-predictive regret

$$R(\theta, \pi) \;=\; E_\theta\!\left[\log \frac{f(X \mid \theta)}{m_\pi(X)}\right], \qquad m_\pi(x) = \int f(x \mid \theta')\, \pi(\theta')\, d\theta',$$

the Kullback–Leibler divergence between the true density and the prior predictive density
Minimax/reference prior solution for the full parameter is
usually Jeffreys' prior
Bernardo argues that when nuisance parameters are
present the reference prior should depend on which
parameter(s) are considered to be of primary interest
Relative entropy loss
A predictive relative entropy approach
Geisser (1979) suggested a predictive information criterion introduced by Aitchison (1975)
Standard argument for using log q(Y) as an operational/default utility function for q as a predictive density for a future observation Y (cf. Good, 1968)
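In outline, the standard argument: for any candidate predictive density q and true density p(· | θ),

$$E_\theta\big[\log p(Y \mid \theta)\big] - E_\theta\big[\log q(Y)\big] \;=\; \mathrm{KL}\big(p(\cdot \mid \theta)\,\|\,q\big) \;\ge\; 0,$$

so maximising expected log-utility is equivalent to minimising the relative entropy to the true density; moreover the log score is essentially the only proper local scoring rule.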
Relative entropy loss
Define the expected regret under the loss function −log q(Y) associated with using the predictive density p_π(· | X) when Y arises from p(· | θ)
This is the appropriate object to study for constructing objective prior distributions when we are interested in the predictive performance of π under repeated use or under alternative subjective priors τ
Relative entropy loss
Now define the predictive relative entropy loss (PREL) as the excess of this expected regret under π over that under Jeffreys' prior J
Studying the behaviour of the regret over τ in sets of constant 'predictive information' is equivalent to studying the behaviour of the PREL
Relative entropy loss
Under suitable regularity conditions, although the defined loss functions cover an infinite variety of possibilities for (a) the amount of data to be observed and (b) the predictions to be made, they are all approximately equivalent to a single quantity, provided that a sufficient amount of data is to be observed
Call this common limit the (asymptotic) predictive loss, L(θ; π)
Relative entropy loss
More generally, define the asymptotically worst-case predictive loss over θ and investigate its behaviour
The prior π is minimax if it minimises this worst-case loss
Relative entropy loss
Example 1.
Consider a class of improper priors indexed by a constant c
These all deliver constant risk
All the priors with c nonzero are therefore inadmissible
Jeffreys' prior (c = 0) is minimax
Relative entropy loss
Example 2.
Consider a class of improper priors indexed by a constant a
These all deliver constant risk
L attains its minimum value when a = 1, which corresponds to Jeffreys' independence prior
The minimum value is −½ < 0, so Jeffreys' prior is inadmissible
Relative entropy loss
Example 3.
Consider again a class of improper priors indexed by a constant a
These all deliver constant risk
L attains its minimum value when a = 1, which again corresponds to Jeffreys' independence prior
The drop in predictive loss increases as the square of the number q of regressors in the model
Relative entropy loss
The above predictive minimax priors also give rise to
minimum predictive coverage probability bias
(Datta, Mukerjee, Ghosh and Sweeting, 2000)
Final note: an inappropriately elicited subjective prior
may lead to very high predictive risk!
Wrap-up
We have reviewed some common approaches to prior
construction, from full elicitation to using default recipes
Need to be aware of dangers, whatever the approach
As model complexity increases it becomes more
difficult to make sensible prior assignments. At the
same time, the effect of the prior specification can
become more pronounced
Important to have a sound methodology for the construction
of priors in the multiparameter case
Data-dependent priors may be justifiable (e.g. Box-Cox
transformation model)
Wrap-up
More extensive analysis of the predictive risk
approach needed
Developing general methods of finding exact and approximate
solutions for practical implementation
Investigating connections with predictive coverage probability
bias
Analysing dependent and non-regular problems
Investigating problems involving mixed
subjective/nonsubjective priors
Priors for model choice or model averaging ...
... another talk!
Wrap-up
And finally
Have a great conference!