Pierce Reference Set Talk.


Effect of the Reference Set on Frequency Inference
Donald A. Pierce, Radiation Effects Research Foundation, Japan
Ruggero Bellio, Udine University, Italy
Paper, this talk, other things at http://home.att.ne.jp/apple/pierce/
Frequency inferences depend to first order only on the
likelihood function, and to higher order on other aspects of
the probability model or reference set
That is, “other aspects” not affecting the
likelihood function: e.g. censoring models,
stopping rules
Here we study to what extent, and in what
manner, second-order inferences depend on the
reference set
Various reasons for interest in this, e.g.:
Foundational: to what extent do frequency inferences violate the Likelihood Principle?
Unattractiveness of specifying censoring models
Practical effects of stopping rules
Example: Sequential Clinical Trials
Patients arrive and are randomized to treatments, with outcomes $y_1, y_2, \ldots$
Stop at $n$ patients based on outcomes to that point.
Then the data has probability model $p(y, n; \theta)$ and the likelihood function $L(\theta; y, n)$ is this as a function of $\theta$, defined only up to proportionality.
The likelihood function does not depend on the stopping
rule, including that with fixed n .
First-order inference based only on the likelihood
function does not depend on the stopping rule, but
higher-order inference does depend on this.
How does inference allowing for the stopping rule differ
from that for fixed sample size?
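A short worked step may help show why the stopping rule drops out of the likelihood. This is only a sketch under stated assumptions: outcomes are independent with densities $p(y_i; \theta)$, and the stopping rule is a deterministic function of the outcomes seen so far. The indicator $s_n$ is notation introduced here for illustration and is not from the talk.

```latex
% s_n(y_1,...,y_n) = 1 if the rule continues through n-1 patients and stops at n, else 0
p(y, n; \theta) = \Big\{ \prod_{i=1}^{n} p(y_i; \theta) \Big\}\, s_n(y_1, \ldots, y_n),
\qquad
L(\theta; y, n) \propto \prod_{i=1}^{n} p(y_i; \theta)
```

Since $s_n$ does not involve $\theta$, it is absorbed into the proportionality constant. The stopping rule still changes the set of possible $(y, n)$, that is, the reference set.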
Example: Censored Survival Data
Patients arrive and are given treatments. Outcome is
response time, and when it is time for analysis some
patients have yet to respond
The likelihood function based on data $(t_1, f_1), \ldots, (t_n, f_n)$ is
$$L(\theta; t) = \prod_{i=1}^{n} p_i(t_i; \theta)^{f_i}\, \mathrm{pr}_i(T_i > t_i)^{1 - f_i}$$
but the full probability model involves matters such as
the probability distribution of patient arrival times
This involves what is called the censoring model.
First-order inferences depend on only the likelihood
function and not on the censoring model.
In what way (how much) do higher-order inferences
depend on the censoring model? It is unattractive that
they should depend at all on this.
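To make the censored-data likelihood concrete, here is a minimal sketch assuming exponential response times with rate theta. The exponential model, the function names and the toy data are illustrative assumptions, not from the talk. Note that nothing about patient arrival times enters the calculation.

```python
import math

def censored_exponential_loglik(theta, times, responded):
    """Log-likelihood for right-censored exponential response times.

    Illustrative assumption: p(t; theta) = theta * exp(-theta * t),
    so pr(T > t) = exp(-theta * t). responded[i] plays the role of f_i:
    1 if patient i responded at time t_i, 0 if still waiting at analysis.
    """
    total = 0.0
    for t, f in zip(times, responded):
        log_density = math.log(theta) - theta * t   # log p(t; theta)
        log_survival = -theta * t                   # log pr(T > t)
        total += f * log_density + (1 - f) * log_survival
    return total

def censored_exponential_mle(times, responded):
    """Closed-form MLE: number of responses divided by total follow-up time."""
    return sum(responded) / sum(times)

# Hypothetical toy data: three responses, two patients still waiting
times = [2.0, 5.0, 1.5, 7.0, 3.0]
responded = [1, 0, 1, 0, 1]
theta_hat = censored_exponential_mle(times, responded)
```

The same function is used whatever mechanism produced the censoring times, which is the sense in which first-order inference ignores the censoring model.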
Typical second-order effects
Binomial regression: test for trend with 15 observations,
estimate towards the boundary, P-values that should be 5%
First-order: 7.3%, 0.8%
Second-order: 5.3%, 4.7%
Sequential experiment: underlying $N(\mu, 1)$ data, stop when $\sqrt{n}\,\bar{y} > c_n$, with $c_1 = 3$, $c_{59} = 2$, or when $n = 60$
Testing $\mu = 0.25$, P-values when stopping at the following n:
n:          10      20      30
P_exact:    5.6%    12.5%   16.9%
P_first:    2.1%    6.1%    9.1%
P_second:   5.3%    11.6%   15.8%
Generally, in settings with substantial numbers of
nuisance parameters, and even for large samples,
adjustments may be much larger than this --- or
they may not be
Some general notation and concepts
Model for data $p(y; \theta)$, parametric function of interest $\psi = \psi(\theta)$
MLE $\hat\theta$, constrained MLE $\hat\theta_\psi$, profile likelihood $L_P(\psi; y) = L(\hat\theta_\psi; y)$
Starting point: signed LR statistic, first order N(0,1)
$$r(y) = \mathrm{sgn}(\hat\psi - \psi)\,[2\{l(\hat\theta; y) - l(\hat\theta_\psi; y)\}]^{1/2}$$
so to first order P-value $= \Phi\{r(y_{obs})\}$
To second order, modern likelihood asymptotics yield that
$$\mathrm{pr}\{r(y) \le r(y_{obs})\} = \Phi\{r(y_{obs}) + ADJ(y_{obs})\}$$
Only the adjustment ADJ depends on the reference
set, so this is what we aim to study
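As a concrete instance of the first-order calculation, the sketch below computes the signed LR statistic and its normal-approximation P-value for a one-parameter exponential model, so there is no nuisance parameter and the constrained MLE is just the hypothesized value. The model, function names and data are illustrative assumptions. The ADJ term is not computed here.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exp_loglik(theta, y):
    """Log-likelihood for i.i.d. exponential data with rate theta (assumed model)."""
    return len(y) * math.log(theta) - theta * sum(y)

def signed_lr(theta0, y):
    """Signed root of the likelihood ratio statistic for testing theta = theta0."""
    theta_hat = len(y) / sum(y)  # unconstrained MLE
    deviance = 2.0 * (exp_loglik(theta_hat, y) - exp_loglik(theta0, y))
    # max(..., 0) guards against tiny negative values from rounding
    return math.copysign(math.sqrt(max(deviance, 0.0)), theta_hat - theta0)

# Hypothetical data, testing theta = 1
y = [0.8, 1.9, 0.4, 2.7, 1.2]
r = signed_lr(1.0, y)
p_first_order = normal_cdf(r)  # first-order approximation to pr{r(Y) <= r(y_obs)}
```

A second-order version would replace `r` by `r + ADJ` inside `normal_cdf`. It is that extra term, and only that term, that requires knowing the reference set.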
Even to higher order, ideal inference should be based on the
distribution of r ( y ) . Modifications of this pertain to its
distribution, not to its inferential relevance
Computing a P-value for testing an hypothesis on $\psi = \psi(\theta)$ requires only an ordering of datasets for evidence against the hypothesis
Consider integrated likelihoods of form
$$L_B(\psi; y) = \int L(\psi, \lambda; y)\, \pi(\lambda \mid \psi)\, d\lambda$$
where $\pi(\lambda \mid \psi)$ is any smooth prior on the nuisance parameter $\lambda$
Then, regardless of the prior, the signed LR statistic based on $L_B(\psi; y)$ provides to second order the same ordering of datasets, for evidence against an hypothesis, as does $r(y)$
Now return to the main point of higher-order likelihood
asymptotics, namely
pr r ( y)  r ( yobs )   r ( yobs )  ADJ ( yobs )
The theory for this is due to Barndorff-Nielsen (Bmtrka,
1986) and he refers to r ( yobs )  ADJ ( yobs ) as r *
Thinking of the data as (ˆ, a) , ADJ depends on
notorious sample space derivatives
 2l ( ;ˆ, a) / ˆ , l (ˆ ;ˆ, a)  l (ˆ;ˆ, a) / ˆ


Very difficult to compute, but Skovgaard (Bernoulli, 1996)
showed they can be approximated to second order as


cov l (ˆ )  l (ˆ), l (ˆ )
cov l (ˆ), l (ˆ ) iˆ 1 ˆj


iˆ 1 ˆj
It turns out that each of these approximations has a
leading term depending on only the likelihood function,
with a next term of one order smaller depending on the
reference set
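For example,
$$\mathrm{cov}\{l'(\hat\theta), l'(\hat\theta_\psi)\}\,\hat{i}^{-1}\hat{j} \approx [\mathrm{cov}\{l'(\hat\theta), l'(\hat\theta)\} + (\hat\theta_\psi - \hat\theta)\,\mathrm{cov}\{l'(\hat\theta), l''(\hat\theta)\}]\,\hat{i}^{-1}\hat{j}$$
$$= \hat{j} + (\hat\theta_\psi - \hat\theta)\,\mathrm{cov}\{l'(\hat\theta), l''(\hat\theta)\}\,\hat{i}^{-1}\hat{j}$$
using $\mathrm{cov}\{l'(\hat\theta), l'(\hat\theta)\} = \hat{i}$
Thus we need the quantity $\mathrm{cov}\{l'(\hat\theta), l''(\hat\theta)\}$ to only first order to obtain second-order final results
A similar expansion gives the same result for the other sample-space derivative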
This provides our first main result: If within some class of
reference sets (models) we can write, without regard to the
reference set,
$$l(\theta; y) = \sum_i l_i(\theta; y)$$
where the li are stochastically independent, then
second-order inference is the same for all of the
reference sets
The reason is that when the contributions are independent, the value of
$$\mathrm{cov}\{l'(\hat\theta), l''(\hat\theta)\}$$
must agree to first order with the empirical mean of the contributions $l_i'(\hat\theta)\, l_i''(\hat\theta)$, and this mean does not depend on the reference set
Thus, in this “independence” case, second-order
inference, although not determined by the likelihood
function, is determined by the contributions to it
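The step behind the independence result can be written out as follows. This is a sketch that assumes each contribution $l_i$ is itself a regular log-likelihood, so that $E\{l_i'(\theta)\} = 0$. The last step replaces expectations by observed values at $\hat\theta$.

```latex
\mathrm{cov}\{l'(\theta), l''(\theta)\}
  = \sum_i \mathrm{cov}\{l_i'(\theta), l_i''(\theta)\}    % independence of the l_i
  = \sum_i E\{l_i'(\theta)\, l_i''(\theta)\}              % since E\{l_i'(\theta)\} = 0
  \approx \sum_i l_i'(\hat\theta)\, l_i''(\hat\theta)     % first-order estimate
```

The final sum is $n$ times the empirical mean of the products. It is computed from the contributions alone, with no reference to which of the candidate reference sets generated the data.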
A main application of this pertains to censoring models, if
censoring and response times for individuals are
stochastically independent
Then the usual contributions to the likelihood,
namely
$$l_i = \log\{p_i(t_i; \theta)^{f_i}\, \mathrm{pr}_i(T_i > t_i)^{1 - f_i}\}$$
do not depend on the censoring model, and are
stochastically independent
So to second order, frequency inference is the same for
any censoring model --- even though some higher-order
adjustment should be made
Probably should either assume some convenient
censoring model, or approximate the covariances from
the empirical covariances of contributions to the
loglikelihood
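The second option, approximating the covariance from the contributions themselves, might look like the sketch below. It assumes exponential response times with rate theta, so that $l_i = f_i \log\theta - \theta t_i$. The model, the function names and the toy data are illustrative assumptions, not from the talk.

```python
def contribution_derivatives(theta, times, responded):
    """First and second theta-derivatives of each contribution l_i.

    Illustrative exponential model: l_i = f_i * log(theta) - theta * t_i.
    """
    first = [f / theta - t for t, f in zip(times, responded)]
    second = [-f / theta ** 2 for f in responded]
    return first, second

def empirical_cov_score_hessian(times, responded):
    """Estimate cov{l'(theta_hat), l''(theta_hat)} by summing l_i' * l_i''."""
    theta_hat = sum(responded) / sum(times)  # MLE under the assumed model
    first, second = contribution_derivatives(theta_hat, times, responded)
    return sum(a * b for a, b in zip(first, second))

# Hypothetical toy data: three responses, two censored patients
times = [2.0, 5.0, 1.5, 7.0, 3.0]
responded = [1, 0, 1, 0, 1]
cov_estimate = empirical_cov_score_hessian(times, responded)
```

No censoring distribution is specified anywhere in this calculation. Only the observed pairs $(t_i, f_i)$ are used.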
Things are quite different for comparing sequential and fixed
sample size experiments --- usually cannot have “contributions”
that are independent in both reference sets
But first we need to consider under what conditions second-order likelihood asymptotics applies to sequential settings
We argue in our paper that it does whenever usual first-order asymptotics applies
These conditions are given by Anscombe’s Theorem: A statistic asymptotically standard normal for fixed n remains so when: (a) the coefficient of variation (CV) of n approaches zero, and (b) the statistic is asymptotically suitably continuous. Discrete n in itself does not invalidate (b)
In the key relation
$$\mathrm{pr}\{r(y) \le r(y_{obs})\} = \Phi\{r(y_{obs}) + ADJ(y_{obs})\}$$
need to consider, following Pierce & Peters (JRSSB 1992), the decomposition
$$ADJ(y) = NP(y) + INF(y)$$
Related to Barndorff-Nielsen’s modified profile likelihood
$$L_{MP}(\psi; y) = M(y)\, L_P(\psi; y)$$
by
$$NP = -\partial \log(M) / \partial r$$
NP pertains to the effect of fitting nuisance parameters, and INF pertains to moving from likelihood to frequency inference --- INF is small when adjusted information is large
When $\psi$ and $\lambda$ are chosen as orthogonal, we have that to second order
$$M(y) \propto |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|^{-1/2}$$
depending only on the likelihood function
Parameters orthogonal for fixed-size experiments remain orthogonal for any stopping rule, since (for underlying i.i.d. observations) we have from the Wald Identity that
$$i_n(\theta) = E(n)\, i_1(\theta)$$
Thus, in sequential experiments the NP adjustment and MPL do not depend on the stopping rule, but the INF adjustment does
Except for Gaussian experiments with regression parameter $\psi$, there is an INF adjustment both for fixed n and sequential, but they are different
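The information identity can be checked by simulation. The sketch below uses $N(\mu, 1)$ data, for which the per-observation information is 1, and a stopping boundary of the kind in the sequential example. The linear interpolation of the boundary between its two stated endpoints, the function names and the simulation settings are all assumptions made for illustration.

```python
import math
import random

def boundary(n):
    """Hypothetical boundary: 3 at n = 1 falling linearly to 2 at n = 59 (assumed shape)."""
    return 3.0 - (n - 1) / 58.0

def run_trial(mu, rng, max_n=60):
    """Simulate one sequential experiment; return (n, score) at stopping.

    score is the derivative of the N(mu, 1) log-likelihood: sum of (y_i - mu).
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(mu, 1.0)
        # total / sqrt(n) equals sqrt(n) times the running mean
        if n == max_n or total / math.sqrt(n) > boundary(n):
            return n, total - n * mu

def check_information_identity(mu=0.25, reps=20000, seed=0):
    """Monte Carlo estimates of E(n) and of the expected squared score at stopping."""
    rng = random.Random(seed)
    results = [run_trial(mu, rng) for _ in range(reps)]
    mean_n = sum(n for n, _ in results) / reps
    mean_sq_score = sum(s * s for _, s in results) / reps
    return mean_n, mean_sq_score

mean_n, mean_sq_score = check_information_identity()
```

If the identity holds, `mean_sq_score` should be close to `mean_n`, up to Monte Carlo error, even though the stopping time depends on the data.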
SUMMARY
When there are contributions to the likelihood that are
independent under each of two reference sets, then
second-order ideal frequency inference is the same for
these.
In sequential settings we need to consider the nuisance
parameter and information adjustments. To second order,
the former and the modified profile likelihood do not
depend on the stopping rule, but the latter does.
This is all as one might hope, or expect. Inference should
not, for example, depend on the censoring model but it
should depend on the stopping rule
Appendix: Basis for higher-order likelihood asymptotics
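$$p(\hat\theta \mid a; \theta) = \frac{p(\hat\theta \mid a; \theta)}{p(\hat\theta \mid a; \hat\theta)}\, p(\hat\theta \mid a; \hat\theta)$$
$$= \frac{p(\hat\theta, a; \theta)}{p(\hat\theta, a; \hat\theta)}\, p(\hat\theta \mid a; \hat\theta)$$
$$= \frac{L(\theta; y)}{L(\hat\theta; y)}\, p(\hat\theta \mid a; \hat\theta)$$
$$\propto \frac{L(\theta; y)}{L(\hat\theta; y)}\, |j(\hat\theta)|^{1/2}\, \{1 + O(n^{-1})\}$$
Transform from $\hat\theta$ to $\{r(\hat\theta, a), \hat\lambda_\psi\}$, integrate out $\hat\lambda_\psi$
Provides a second-order approximation to the distribution of $r(y)$. The Jacobian and resultant from the integration are what comprise $ADJ(y)$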