You may believe you are a Bayesian But you are probably wrong

You may believe you are a Bayesian
But you are probably wrong
Stephen Senn
(c) Stephen Senn 2010
Outline
• The four systems of statistical inference
– An example of where it is good to be Bayesian
– Fisher’s argument against the Neyman-Pearson approach
• Examples of experts applying ‘the Bayesian’ approach
– Adrian Smith and colleagues 1987
– Lindley, 1993
– Howson and Urbach, 1989
• Some theoretical reasons for hesitation
• Conclusion
– Why I shall (probably) still be using mongrel statistics after this
conference
Warning
• This talk should not be taken as an attack
on the subjective Bayesian approach to
statistical inference
• I do not claim it is a bad approach
• I do claim it can be very difficult and
perhaps dangerous to rely on it as the only
approach
Four systems (Barnard)
• Fisherian
• Neyman-Pearson
• Jeffreys
• Bayesian (Ramsey-De Finetti-Savage)
George Barnard’s advice was to be familiar with all four
A two dimensional view of the four systems
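[Figure: the four systems arranged in two dimensions, inverse versus direct probability and inferences versus decisions]
• Inverse probability, inferences: Jeffreys. Use of semi-objective prior distributions to produce inverse probabilities
• Inverse probability, decisions: De Finetti. Use of subjective expectation via utility
• Direct probability, inferences: Fisher. Fiducial inference, likelihood, significance tests
• Direct probability, decisions: Neyman and Pearson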
TGN1412
• A monoclonal antibody
• First-in-man study on 13 March 2006
carried out by Parexel on behalf of
TeGenero
• In first cohort 8 volunteers
• Six allocated TGN1412 and two allocated
placebo
• All six given TGN1412 suffered a cytokine
storm
See: Senn SJ. Lessons from TGN1412. Applied Clinical Trials 2007;16(6):18-22.
A Conventional Analysis
FISHER'S EXACT TEST
Statistic based on the observed 2 by 2 table (x):
  P(X) = Hypergeometric Prob. of the table = 0.0357
  FI(X) = Fisher statistic = 6.095
Asymptotic p-value (based on Chi-Square distribution with 1 df):
  Two-sided: Pr{FI(X) .GE. 6.095} = 0.0136
  One-sided: 0.5 * Two-sided = 0.0068
Exact p-value and point probabilities:
  Two-sided: Pr{FI(X) .GE. 6.095} = Pr{P(X) .LE. 0.0357} = 0.0357
             Pr{FI(X) .EQ. 6.095} = Pr{P(X) .EQ. 0.0357} = 0.0357
  One-sided: Let y be the value in Row 1 and Column 1
             y = 6  min(Y) = 4  max(Y) = 6  mean(Y) = 4.500  std(Y) = 0.5669
             Pr{Y .GE. 6} = 0.0357
             Pr{Y .EQ. 6} = 0.0357
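The exact one-sided figure in this output can be checked by hand, since with the margins fixed only the hypergeometric distribution is involved. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and are not taken from the StatXact output.

```python
from math import comb

def hypergeom_pmf(y, n_treated, n_events, n_total):
    """P(Y = y): y events among the treated, with all margins fixed."""
    return (comb(n_treated, y) * comb(n_total - n_treated, n_events - y)
            / comb(n_total, n_events))

# TGN1412 first cohort: 6 on TGN1412 (all 6 with a cytokine storm), 2 on placebo (none).
n_treated, n_events, n_total = 6, 6, 8
y_obs = 6

# Range of Y allowed by the fixed margins.
y_min = max(0, n_events - (n_total - n_treated))
y_max = min(n_treated, n_events)

# One-sided exact p-value: probability of a table at least as extreme as observed.
p_one_sided = sum(hypergeom_pmf(y, n_treated, n_events, n_total)
                  for y in range(y_obs, y_max + 1))

print(f"min(Y) = {y_min}, max(Y) = {y_max}")
print(f"Pr(Y >= {y_obs}) = {p_one_sided:.4f}")
```

Because the observed table is the most extreme one the margins allow, the tail probability is a single term, 1/28.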
A Slightly Less Conventional Analysis
Datafile: C:\Program Files\Numerical\StatXact-4.0.1\Files\Research\TGN1412.cy3
BARNARD'S UNCONDITIONAL TEST FOR DIFFERENCE OF TWO BINOMIAL PROPORTIONS
Statistic based on the observed 2 by 2 table:
  Binomial proportion for column <Yes>: pi_1                    =  1.000
  Binomial proportion for column <No>:  pi_2                    =  0.0000
  Difference of binomial proportions: Delta = pi_2 - pi_1      = -1.000
  Standardized difference of binomial proportions: Delta/Stdev = -2.828
Results:
------------------------------------------------------------
Method    P-value (1-sided)       P-value (2-sided)
------------------------------------------------------------
Asymp     0.0023 (Left Tail)      0.0047
Exact     0.0111 (Left Tail)      0.0113
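The one-sided figures for the unconditional test can be reproduced approximately with a short calculation. This is a sketch only: it assumes the ordering statistic is the pooled-variance standardised difference and that the exact p-value is the maximum tail probability over a grid for the common event rate, which may not match the software's algorithm in every detail.

```python
from math import comb, erfc, sqrt

N1, N2 = 6, 2            # group sizes: TGN1412, placebo
X1_OBS, X2_OBS = 6, 0    # cytokine storms observed in each group

def z_stat(x1, x2):
    """Pooled-variance standardised difference (p2 - p1); 0 when undefined."""
    p1, p2 = x1 / N1, x2 / N2
    pooled = (x1 + x2) / (N1 + N2)
    var = pooled * (1 - pooled) * (1 / N1 + 1 / N2)
    return (p2 - p1) / sqrt(var) if var > 0 else 0.0

z_obs = z_stat(X1_OBS, X2_OBS)

# Asymptotic left-tail p-value from the standard normal distribution.
p_asymp = 0.5 * erfc(-z_obs / sqrt(2))

# Tables at least as extreme as the one observed (left tail).
extreme = [(x1, x2) for x1 in range(N1 + 1) for x2 in range(N2 + 1)
           if z_stat(x1, x2) <= z_obs + 1e-12]

def tail_prob(pi):
    """Probability of the extreme tables when both groups share event rate pi."""
    return sum(comb(N1, x1) * pi**x1 * (1 - pi)**(N1 - x1)
               * comb(N2, x2) * pi**x2 * (1 - pi)**(N2 - x2)
               for x1, x2 in extreme)

# Exact unconditional p-value: maximise over the nuisance parameter on a grid.
p_exact = max(tail_prob(i / 1000) for i in range(1001))

print(f"z = {z_obs:.3f}, asymptotic p = {p_asymp:.4f}, exact p = {p_exact:.4f}")
```

The maximisation over the unknown common rate is what distinguishes the unconditional test from the conditional (Fisher) analysis.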
Conclusions
• “If you need statistics to prove it, I don’t
believe it”
• Here the problem is the reverse
• You can’t prove it with statistics but
everybody believes it
• So does this mean statistics is irrelevant?
• Not if you look more closely…
Further information
• Timing of adverse events
• Increasing interest in using this feature in
epidemiological studies
– Case series methodology
• Farrington and Whitaker (2006)
• Also if we use background knowledge of
risk of cytokine storm we come to quite
different conclusions
• But this is to be rather Bayesian
Fisher on Neyman-Pearson
‘Their method only leads to definite results when mathematical
postulates are introduced which could only be justified as a result of
extensive experience.’
Fisher to Chester Bliss 6 October 1938 (Published in Bennett, 1990)
What Fisher is pointing to here is that although a null hypothesis may
be more primitive than a test statistic, the same is not true of an
alternative hypothesis.
Thus the alternative hypothesis cannot be made the justification for
choosing the test statistic
Three Examples Provided by
Expert Bayesians
• Two involve choice of prior distribution
followed by formal Bayesian updating
– Racine-Poon, Grieve, Fluehler and Smith
1987
– Lindley 1993
• One involves an intuitive assertion of the
posterior result, which is claimed to be
Bayesian
– Howson and Urbach
Racine et al
• This is a fine paper with many examples
as to how the Bayesian approach can be
applied in drug development
• I shall just look at one of these
• The analysis of the Martin and Browning
(1985) Data of metoprolol
– Actually, this paper is not cited by Racine et al
but this is the relevant citation
Design
• Run-in: 4 weeks
• Randomisation to one of two sequences
– 100 mg once daily in period 1 (6 weeks), then 200 mg once daily in period 2 (6 weeks)
– 200 mg once daily in period 1 (6 weeks), then 100 mg once daily in period 2 (6 weeks)
31 patients aged 65+ with diastolic blood pressure in excess of 100 mmHg
randomised to these sequences. DBP measured after 6 weeks and 4-8
hours after last dose.
Carry-over Problem
• The period 2 values could still be
influenced by the period 1 treatment
• Hence a comparison of period 1 and period 2
results would provide a biased measurement of
the effect of treatment
• However, if we knew the magnitude of the carry-over we could take account of it
• Hence carry-over is a nuisance parameter and a
prime candidate for the Bayesian approach
Unfortunately
• None of the authors noted that the carry-over effect has to last for six weeks
– Nor did any of the discussants, whether
Bayesian or frequentist
• However the treatment effect only has to
last for 4-8 hours
• The ratio of one to the other is at least 126
• You cannot use an uninformative prior for
carry-over and be coherent
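The ratio of 126 follows from simple arithmetic on the durations in the design. A minimal sketch, assuming the comparison is between a full six-week period and the 4-8 hour window after the last dose:

```python
HOURS_PER_WEEK = 7 * 24

# Carry-over from period 1 must persist to the end of the six-week period 2.
carry_over_hours = 6 * HOURS_PER_WEEK

# DBP is measured 4-8 hours after the last dose, so the direct effect need last only that long.
treatment_hours_min, treatment_hours_max = 4, 8

ratio_at_least = carry_over_hours / treatment_hours_max
ratio_at_most = carry_over_hours / treatment_hours_min

print(f"carry-over duration: {carry_over_hours} hours")
print(f"ratio of durations: between {ratio_at_least:.0f} and {ratio_at_most:.0f}")
```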
Anyone who is not shocked by quantum theory
has not understood a single word.
Niels Bohr
Anyone who is not shocked by the Bayesian theory
of statistical inference has not understood a single
word
Stephen Senn
A Bayesian Lady Tasting
Wine
Paper by Lindley.
Lindley, D. The Analysis of Experimental Data, Teaching
Statistics, 15, 22-25 (1993)
“The lady is a wine expert, testified by her being a Master (sic) of
Wine, MW. She was given 6 pairs of glasses (not cups). One
member of each pair contained some French claret. The other had
a Californian Cabernet Sauvignon Merlot Blend.”
see also
Lindley, D. A Bayesian lady tasting tea. In Statistics: An Appraisal, David
and David (eds), Iowa State University Press (1984).
Lindley’s Prior for Wine Tasting
[Figure: prior probability density of P (vertical axis, 0.0 to 3.0) plotted against the probability of successful identification, P (horizontal axis, 0.5 to 1.0)]
‘At this point I can only speak for myself though I hope many will
agree with me. You may freely disagree and still be sensible.’
Lindley
I do disagree
Either the Lady knows something about wine or she hasn’t a clue.
If she has, I think that she can repeat the trick of correct
identification with high probability.
If she is a charlatan, there is a small probability that she may have a
fine palate
The Difference between Mathematical
and Applied Statistics
Mathematical statistics is full of
lemmas whereas applied statistics is
full of dilemmas.
Senn’s Prior for Wine Tasting
[Figure: prior probability density (vertical axis, 0 to 4) plotted against the probability of successful identification (horizontal axis, 0.0 to 1.0)]
Place Your Bets
• Imagine the lady has to distinguish
between 20 pairs of glasses.
• You are given £100,000 to place at evens
either for or against the following:
• The lady will choose correctly in 12, 13, 14,
15 or 16 pairs.
• How do you choose?
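How one bets depends on the prior predictive probability of 12 to 16 correct identifications. The sketch below compares two priors, both of which are assumptions made for illustration: a smooth density 48(1 - p)(p - 0.5) on [0.5, 1], chosen only because it peaks at 3 like the first plotted prior, and a hypothetical two-point "clueless or expert" prior with arbitrary values for the weight and for the expert's success probability. Neither is claimed to be the prior actually used in the talk.

```python
from math import comb

N_PAIRS = 20
WINNING = range(12, 17)  # 12, 13, 14, 15 or 16 correct

def prob_win_given_p(p):
    """P(12 <= correct <= 16 | success probability p)."""
    return sum(comb(N_PAIRS, k) * p**k * (1 - p)**(N_PAIRS - k) for k in WINNING)

def smooth_prior_density(p):
    """ASSUMED smooth prior on [0.5, 1]: 48 (1 - p) (p - 0.5), peaking at 3 when p = 0.75."""
    return 48 * (1 - p) * (p - 0.5)

def predictive_smooth(n_grid=20000):
    """Midpoint-rule integral of P(win | p) * density(p) over [0.5, 1]."""
    width = 0.5 / n_grid
    mids = (0.5 + (i + 0.5) * width for i in range(n_grid))
    return sum(prob_win_given_p(p) * smooth_prior_density(p) for p in mids) * width

def predictive_two_point(weight_clueless=0.5, p_expert=0.95):
    """ASSUMED 'clueless or expert' prior: p = 0.5 or p = p_expert (hypothetical values)."""
    return (weight_clueless * prob_win_given_p(0.5)
            + (1 - weight_clueless) * prob_win_given_p(p_expert))

print(f"smooth prior:    P(12..16 correct) = {predictive_smooth():.3f}")
print(f"two-point prior: P(12..16 correct) = {predictive_two_point():.3f}")
```

Under the two-point prior the middle range 12 to 16 is unlikely either way, so the bet goes against it; a smooth prior centred near 0.75 puts much more probability there.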
An Example of Howson and
Urbach’s
• Consider example of die rolled 600 times
• Results are
– 100 each of sixes, fives, fours and threes
– 123 twos
– 77 ones
• Pearson-Fisher chi-square statistic is
10.58
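The quoted statistic can be checked directly from the counts. The sketch below computes the Pearson-Fisher chi-square and, for comparison, the likelihood-ratio chi-square, together with the upper-tail probability on 5 degrees of freedom. The closed form used for the tail probability is specific to 5 degrees of freedom, and the function name is illustrative.

```python
from math import erfc, exp, log, pi, sqrt

# Counts by face of the die from the example: 77 ones, 123 twos, 100 of each other face.
observed = {1: 77, 2: 123, 3: 100, 4: 100, 5: 100, 6: 100}
n_rolls = sum(observed.values())
expected = n_rolls / 6  # expected count per face for a fair die

# Pearson-Fisher chi-square: sum of (O - E)^2 / E.
pearson = sum((o - expected) ** 2 / expected for o in observed.values())

# Likelihood-ratio chi-square: 2 * sum of O * ln(O / E).
lik_ratio = 2 * sum(o * log(o / expected) for o in observed.values())

def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square variable with 5 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2) * (1 + x / 3)

print(f"Pearson-Fisher = {pearson:.2f}, likelihood ratio = {lik_ratio:.2f}")
print(f"p-value (Pearson-Fisher, 5 df) = {chi2_sf_5df(pearson):.3f}")
```

The p-value falls just above 0.05, which is why the conventional test does not reject the fair-die hypothesis at the 5% level.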
Howson and Urbach’s
Conclusion
“...one is, therefore, under no obligation to reject
the null hypothesis, even though that hypothesis
has pretty clearly got it badly wrong, in particular,
in regard to the outcomes two and one” (p136, my
italics).
From the second edition
Results of rolling two different dice 600 times
[Figure: two bar charts of counts by score (1 to 6). Howson and Urbach's die: chi-square statistics PF = 10.58, LR = 10.68. Alternative die: chi-square statistics PF = 10.88, LR = 10.91.]
An Analysis Using Good’s
Approach
• Lump of probability on fair die
• Symmetric Dirichlet prior over alternative
• Do not commit yourself to a particular value
of k (the Dirichlet parameter)
• Instead plot the Bayes factor as a function of k
• This is a sort of Type II likelihood
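The approach just outlined can be sketched numerically. The code below assumes the Bayes factor is taken in favour of the symmetric Dirichlet(k) alternative against the point null of a fair die, and uses the Dirichlet-multinomial marginal likelihood for the counts from the die example; the function names are illustrative.

```python
from math import exp, lgamma, log

counts = [77, 123, 100, 100, 100, 100]  # ones, twos, threes, fours, fives, sixes
n_rolls = sum(counts)
n_faces = len(counts)

def log_marginal_dirichlet(k):
    """Log marginal likelihood of the ordered rolls under a symmetric Dirichlet(k) prior."""
    return (lgamma(n_faces * k) - lgamma(n_faces * k + n_rolls)
            + sum(lgamma(k + c) - lgamma(k) for c in counts))

def bayes_factor(k):
    """Bayes factor for the Dirichlet(k) alternative against the fair die."""
    log_null = n_rolls * log(1 / n_faces)
    return exp(log_marginal_dirichlet(k) - log_null)

# Tabulate the Bayes factor over a range of k rather than committing to one value.
for k in (1, 10, 50, 100, 250):
    print(f"k = {k:>3}: Bayes factor = {bayes_factor(k):.3f}")
```

Small k corresponds to a diffuse alternative and favours the fair die; as k grows the alternative concentrates on the fair die and the Bayes factor tends to 1.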
Bayes factor as function of prior
[Figure: Type II Bayes factor (vertical axis, 0.0 to 2.5) plotted against k, the parameter of the symmetric Dirichlet (horizontal axis, 0 to 250), with curves for the Howson and Urbach example and the modified example]
Conclusion
• If you had witnessed the die being rolled you would not
necessarily conclude it was unfair
• If you were asked to decide whether these were results
from a real die or one some philosophers had written
down in a book you might decide on the latter
• This is because the Dirichlet distribution could not model
your prior distribution
• It is somewhat unfair of H&U to claim that the frequentist
approach has pretty clearly got it badly wrong
• I think that they would have great difficulty honestly
specifying a prior distribution that allowed them to ‘get it
right’ for this example and not look foolish for others
Am I being unfair?
• Yes, if my aim is to claim that Bayesian
methods are particularly bad
– They are not
– We all make errors in our search for errors
and I am no exception
• No, if my aim is to counter the claim that
Bayesian statistics is uniquely good
• In particular if the argument is that the only
requirement for inference is coherence
Perfection and Goodness
• The De Finetti theory is a theory of how to
remain perfect
• You have a prior probability of all possible
sequences of events
• As events unfold you strike out the
sequences that did not occur and
renormalise
• This is not, however, a theory of how to
become good
If You are not Already a Bayesian
You have a collection of priors which do not form a coherent
set.
You can only become Bayesian by trashing some of the
priors until those that are left are coherent.
But if this is a legitimate thing to do, it seems to me that it
must remain a legitimate thing to do in the future. This is
then a license to continue not being Bayesian.
This then means that the Dutch book argument loses much of
its force.
The Date of Information
Problem
Statistician: Here is the result of the analysis of the trial you
asked me to look at. I have added the likelihood to your prior.
This is the posterior distribution.
Physician: Excellent! Now could you please take the results
of the previous trials and do a meta-analysis?
Statistician (after a pause): There is no need. The result I gave
you is the meta-analysis. The previous trials are in your prior.
Physician (after a pause): If the previous trials are in my prior,
they got into my prior without your help at all. Why did I need
you to help with producing the posterior?
The Bayesian Meta-Analyst’s Dilemma
In general, P_{n-1} + D_n → P_n
Step 1: P_0 + D_1 → P_1
Step 2: P_1 + D_2 → P_2
Or equivalently, P_0 + D_1 + D_2 → P_2
But suppose P0 already includes D1 then this analysis would
be illegitimate (like analysing 50 values using a chi-square
on the percentages).
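The updating scheme and the ways it can go wrong can be illustrated with a conjugate Beta-binomial model. This is a sketch only: the prior and trial results are hypothetical numbers, and the model is chosen because updating reduces to adding counts.

```python
def update(prior, successes, failures):
    """Conjugate Beta-binomial update: returns the posterior (a, b)."""
    a, b = prior
    return (a + successes, b + failures)

# HYPOTHETICAL numbers, for illustration only.
p0 = (1, 1)    # P0: prior before any trial
d1 = (12, 8)   # D1: (successes, failures) in trial 1
d2 = (15, 5)   # D2: (successes, failures) in trial 2

# Coherent route: P0 + D1 -> P1, then P1 + D2 -> P2.
p1 = update(p0, *d1)
p2_sequential = update(p1, *d2)

# Equivalent batch route: P0 + D1 + D2 -> P2.
p2_batch = update(p0, d1[0] + d2[0], d1[1] + d2[1])

# Illegitimate route: the stated prior already contains D1, and D1 is added again.
p2_double_counted = update(update(p1, *d1), *d2)

# Incomplete route: D1 is in neither the prior nor the analysis.
p2_omitting_d1 = update(p0, *d2)

for label, post in [("sequential", p2_sequential), ("batch", p2_batch),
                    ("double-counted", p2_double_counted), ("omitting D1", p2_omitting_d1)]:
    print(f"{label:>15}: Beta{post}, effective sample size {sum(post)}")
```

The sequential and batch routes agree; double counting overstates the amount of information, and omitting D1 understates it. Which route applies depends on knowing what is already in the prior.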
The Dilemma Continued
So use step 2 only. But suppose that P1 does not include D1.
This would be equivalent to analysing a contingency table of 200
observations using a chi-square on the percentages.
Then the principle of total information has been violated.
(Note, however, that according to David Miller the principle of
total information seems to be an independent principle which
cannot be derived from maximising expected posterior utility
except by imposing very artificial additional conditions.)
The Two Faces of (subjective) Bayes
Theory
• Elegant development based on coherence
• Claim that it is the only way to behave
• Claim to integrate all sources of information
• Requires (in my view) perfect temporal coherence
Practice
• A rag-bag of computational tools
• Use of Bayes theorem but not therefore coherent
• Often surprisingly poor treatment of prior distributions
• Back to the drawing board allowed
My conclusion
• It is highly doubtful that the strong claims for
Bayesian theory are a justification for Bayesian
practice
• This does not mean that Bayesian statistics as
practised is not useful
– The applied statistician needs a method that is useful
in practice and not just in theory
• I remain sceptical of its claims to be the only
useful statistical approach, not least because
admitting this to be true would still leave you
sorely puzzled as to what to do in practice
The Robot Turtle in the Corner
• When the robot gets stuck the scientist
gets up and gives it a kick
• The robot does not know it is stuck
• To avoid being stuck in the inferential
corner it is useful for us to have different
ways of making inferences
• Where they disagree there is a warning
that it is time to do some creative thinking
Where this leaves me
• Bayesian approach is excellent when you have to make
decisions
• If you are going to use frequentist approaches to
decision making you may need to use stopping rules
• However, stopping rule adjustments are not a good way
to summarise evidence
• And the same is true of Bayesian analyses
• I like randomisation but don’t make a fetish of it
• I like the likelihood principle but don’t make a fetish of it
• No (current) single approach to statistical inference
seems to fit my needs as a jobbing statistician
• I like (to the limits of my lesser ability) following George
Barnard’s advice of being prepared to consider all four
Finally
Frequentists think that it is the
thought that counts whereas
Bayesians count the thoughts.