Cox model with additional info on the baseline

Empirical Likelihood
Mai Zhou
Dept. of Statistics, University of Kentucky
• Any first year Statistical Inference course
will talk about “likelihood”.
• For independent observations $x_1, x_2, \ldots, x_n$
the likelihood is
• $L(\theta) = \prod_{i=1}^{n} f_\theta(x_i)\,dx_i$
• Notice that f(a) dx ≈ P(a ≤ X < a + dx), informally written P(X = a)
• Three inference methods (tests) based on
likelihood:
• 1. Wald test
• 2. Score test (Rao’s Score test)
• 3. Likelihood ratio test (Wilks)
• Empirical likelihood is a nonparametric
version of 3
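To make the three tests concrete, here is a minimal sketch in R. The binomial model and the counts (14 successes in 20 trials, null value 0.5) are assumptions made up for illustration; they do not come from the talk.

```r
# Illustrative setting (assumed, not from the talk): k successes in n trials,
# testing H0: p = p0 in a binomial model.
k <- 14
n <- 20
p0 <- 0.5
p_hat <- k / n                       # maximum likelihood estimate

# Binomial log-likelihood (constant term dropped)
loglik <- function(p) k * log(p) + (n - k) * log(1 - p)

# 1. Wald: squared distance from p0, variance estimated at the MLE
wald <- (p_hat - p0)^2 / (p_hat * (1 - p_hat) / n)
# 2. Score: same distance, variance evaluated under H0
score <- (p_hat - p0)^2 / (p0 * (1 - p0) / n)
# 3. Likelihood ratio (Wilks): no variance estimate needed
lrt <- 2 * (loglik(p_hat) - loglik(p0))

stats <- c(Wald = wald, Score = score, LR = lrt)
pvals <- pchisq(stats, df = 1, lower.tail = FALSE)
print(rbind(statistic = stats, p.value = pvals))
```

All three statistics are referred to a chi-square distribution with 1 degree of freedom. Only the likelihood ratio avoids a variance (information) estimate, which is the property empirical likelihood inherits.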
• Empirical Likelihood allows the statistician to
employ likelihood methods, without having to
pick a parametric family of distributions for the
data. --- Owen
• Empirical Likelihood allows for hypothesis
testing and confidence region construction
without an information/variance estimator. --- me
• Plus many additional nice properties.
• A new (2001) book by A. Owen,
“Empirical Likelihood”.
But in the Cox model the (partial) likelihood
ratio has existed for a long time (over 20 years).
SAS proc phreg and the Splus function coxph( )
both compute it.
Claim: The (partial) likelihood ratio statistic
for the regression coefficients in the Cox
model can be interpreted as a case of the
Empirical Likelihood Ratio. (Pan 1997)
• For n independent observations $x_1, x_2, \ldots, x_n$
from $F(t)$, the empirical likelihood is
• $EL(F) = \prod_{i=1}^{n} \Delta F(x_i)$
where $\Delta F(x_i) = F(x_i+) - F(x_i-) = P_F(X = x_i)$
• EL(F) is maximized by the empirical distribution
function: $\Delta F(x_i) = 1/n$
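A small numerical check of this claim, as a sketch: the helper name `log_el` and the competing weights are made up for illustration, and the observations are assumed to have no ties.

```r
# Log empirical likelihood for point masses p[i] = Delta F(x_i).
# (log_el is a hypothetical helper name, not part of any package.)
log_el <- function(p) {
  stopifnot(all(p > 0), isTRUE(all.equal(sum(p), 1)))
  sum(log(p))
}

n <- 5
p_edf <- rep(1 / n, n)                        # empirical distribution: 1/n each
p_other <- c(0.10, 0.15, 0.20, 0.25, 0.30)    # another distribution on the same points

print(log_el(p_edf))     # equals -n * log(n)
print(log_el(p_other))   # smaller than the value above
```

Any probability vector other than the equal weights gives a smaller product, which is the arithmetic-geometric mean inequality.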
Censored Observations
• For a right censored observation $x_i$,
the likelihood contribution is $1 - F(x_i)$
• For a left censored observation the contribution is $F(x_i)$
• Interval censored: $F(U_i) - F(L_i)$
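For right censored data these contributions can be multiplied together and evaluated directly. The sketch below uses a made-up sample (sorted, no ties, largest time uncensored) and hypothetical helper names. It compares the Kaplan-Meier jumps, the known maximizer in this setting, with one competing distribution.

```r
# Made-up right-censored sample: sorted, no ties, last time uncensored.
x <- c(1, 2, 3, 4, 5, 6)     # observed times
d <- c(1, 0, 1, 0, 1, 1)     # 1 = event observed, 0 = right censored

# Log EL for right-censored data; w[i] is the jump of F at x[i]
# (zero at censored times, jumps summing to 1).
log_el_cens <- function(w, x, d) {
  surv <- sapply(x, function(t) sum(w[x > t]))   # 1 - F(x_i)
  sum(log(w[d == 1])) + sum(log(surv[d == 0]))
}

# Kaplan-Meier jumps from the product-limit formula
km_jumps <- function(x, d) {
  n <- length(x)
  at_risk <- n:1                        # valid because x is sorted with no ties
  surv <- cumprod(1 - d / at_risk)      # S(x_i)
  c(1, surv[-n]) - surv                 # S(x_i-) - S(x_i)
}

w_km <- km_jumps(x, d)
w_alt <- c(0.25, 0, 0.25, 0, 0.25, 0.25)   # a competing distribution

print(log_el_cens(w_km, x, d))
print(log_el_cens(w_alt, x, d))
```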
Truncated observations
For a left truncated observation (often referred to as
delayed entry):
(entry time, survival time) = $(y_i, x_i)$
• The likelihood contribution is
$\frac{\Delta F(x_i)}{1 - F(y_i)}$
• If the survival time is right censored, then the
likelihood contribution is
$\frac{1 - F(x_i)}{1 - F(y_i)}$
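Putting the two cases together gives the full empirical likelihood for left truncated, right censored data. The indicator $d_i$ (1 = uncensored, 0 = right censored) is notation assumed here for compactness, matching the censoring status used in the data example of this talk.

```latex
EL(F) \;=\; \prod_{i=1}^{n}
  \frac{\bigl[\Delta F(x_i)\bigr]^{d_i}\,\bigl[1 - F(x_i)\bigr]^{1-d_i}}
       {1 - F(y_i)}
```

Each factor is the untruncated contribution conditioned on survival past the entry time $y_i$.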
• Empirical Likelihood Theorem:
• If the null hypothesis is true then
• $-2 \log \frac{\max_{F \in H_0} EL(F)}{\max_{F \in H_0 \cup H_A} EL(F)} \to \chi^2_1$ (in distribution, as n grows)
• If the hypothesis is not true, the quantity above
will tend to be larger.
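As a sketch of how the statistic is computed in the simplest case, the code below tests H0: E[X] = mu with uncensored data using the standard Lagrange multiplier form of the constrained maximum. The function name `el_mean_test` and the data are made up for illustration; this is not the emplik implementation.

```r
# -2 log empirical likelihood ratio for H0: E[X] = mu, uncensored data.
# (el_mean_test is a hypothetical name; data below are made up.)
el_mean_test <- function(x, mu) {
  z <- x - mu
  stopifnot(min(z) < 0, max(z) > 0)    # mu must lie strictly inside the data range
  # The multiplier lambda solves sum(z / (1 + lambda * z)) = 0; search only
  # where every weight stays positive.
  lo <- (-1 / max(z)) * (1 - 1e-8)
  hi <- (-1 / min(z)) * (1 - 1e-8)
  g <- function(lambda) sum(z / (1 + lambda * z))
  lambda <- uniroot(g, c(lo, hi), tol = 1e-12)$root
  w <- 1 / (length(x) * (1 + lambda * z))   # maximizer of EL under H0
  stat <- 2 * sum(log(1 + lambda * z))      # -2 log [EL(w) / EL(1/n)]
  list(weights = w, stat = stat,
       pval = pchisq(stat, df = 1, lower.tail = FALSE))
}

x <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 5.1)
res <- el_mean_test(x, mu = 4.5)
print(res$stat)
print(res$pval)
```

The denominator of the ratio is the unconstrained maximum, attained at weights 1/n, so only the constrained maximization needs numerical work.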
R = Gnu S/Splus
http://cran.us.r-project.org
+ many add-on packages
A Package for empirical likelihood with
censored/truncated data
Contributed package – emplik (maintained by
Mai Zhou)
It performs empirical likelihood ratio tests for the mean or a
weighted hazard, based on left-truncated, right-censored data or
on left-, right-, or doubly censored data.
Tests hypotheses of the form:
$H_0: \int g(t)\,dF(t) = \mu$
$H_0: \int g(t)\,d\Lambda(t) = \theta$
with right, left, doubly censored data. Or
with left-truncated, right censored data.
Here $\Lambda$ is the cumulative hazard:
$\Lambda(t) = -\log(1 - F(t))$ (continuous F)
$\Lambda(t) = \int_{-\infty}^{t} \frac{dF(s)}{1 - F(s-)}$ (in general)
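The integral form of the cumulative hazard is the one that applies to discrete distributions such as the NPMLE. A minimal sketch, assuming an empirical distribution on n untied points (the helper name `cum_hazard` is made up):

```r
# Cumulative hazard of a discrete F with jumps dF at sorted points:
# Lambda(t) = sum over s <= t of dF(s) / (1 - F(s-)).
# (cum_hazard is a hypothetical helper name.)
cum_hazard <- function(dF) {
  F_left <- c(0, cumsum(dF)[-length(dF)])   # F(s-), just before each jump
  cumsum(dF / (1 - F_left))
}

n <- 5
dF <- rep(1 / n, n)              # empirical distribution of n untied observations
Lambda <- cum_hazard(dF)
hazard_jumps <- diff(c(0, Lambda))
print(hazard_jumps)              # 1/n, 1/(n-1), ..., 1
```

For a discrete F this differs from -log(1 - F(t)); the two formulas agree only when F is continuous.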
• Example: Data taken from Klein & Moeschberger (1997)
Survival Analysis book, as reported in their Table 1.7
• y = left truncation time
= (51, 58, 55, 28, 25, 48, 47, 25, 31, 30, 33, 43, 45, 35, 36)
• x = survival times of female psychiatric inpatients
= (52, 59, 57, 50, 57, 59, 61, 61, 62, 67, 68, 69, 69, 65, 76)
• d = censoring status (1 = uncensored, 0 = right censored)
= ( 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1 )
> library(emplik)
> el.ltrc.EM( y, x, d, mu=62)
The mean of the NPMLE is 63.18557.
• If ‘fun’ is left out, the default is fun(t) = t, so the test is about the mean.
Two of the outputs are
-2LLR = 0.2740571
Pval = 0.6006231
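For readers who want to reproduce the call, here is a sketch that enters the data as R vectors and runs the test only when emplik is installed. The list element names "-2LLR" and "Pval" are assumed from the output shown in this talk. The last lines check that the reported p-value is the chi-square(1) upper tail at the reported -2LLR.

```r
# Data from the example (Klein & Moeschberger, Table 1.7)
y <- c(51, 58, 55, 28, 25, 48, 47, 25, 31, 30, 33, 43, 45, 35, 36)  # entry times
x <- c(52, 59, 57, 50, 57, 59, 61, 61, 62, 67, 68, 69, 69, 65, 76)  # survival times
d <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1)                 # censoring status
stopifnot(length(y) == 15, length(x) == 15, length(d) == 15, all(y < x))

# Run the test only if the emplik package is available.
if (requireNamespace("emplik", quietly = TRUE)) {
  res <- emplik::el.ltrc.EM(y, x, d, mu = 62)
  print(res[["-2LLR"]])   # element names assumed from the output shown in the talk
  print(res[["Pval"]])
}

# The p-value is the upper tail of a chi-square(1) at the observed -2LLR.
pval <- pchisq(0.2740571, df = 1, lower.tail = FALSE)
print(pval)
```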
• Repeat the test for many different values of
the mean (mu=59, etc.).
• If the hypothesized mean is inside
[58.78936, 67.81304], the p-value of the test
is larger than 0.05, so the 95% confidence
interval for the mean is
• [58.78936, 67.81304]
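The repeated testing can be automated with a root finder. The sketch below shows the inversion for uncensored, made-up data, using a hand-written EL statistic for the mean as a stand-in for the censored-data test. All names are hypothetical, and the same idea would apply with the -2LLR returned by the package.

```r
# -2 log EL ratio for H0: mean = mu, uncensored data (stand-in for the
# censored-data statistic; el_stat is a hypothetical helper name).
el_stat <- function(mu, x) {
  z <- x - mu
  lo <- (-1 / max(z)) * (1 - 1e-8)     # keep all weights positive
  hi <- (-1 / min(z)) * (1 - 1e-8)
  lambda <- uniroot(function(l) sum(z / (1 + l * z)), c(lo, hi), tol = 1e-12)$root
  2 * sum(log(1 + lambda * z))
}

x <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 5.1)   # made-up data
cutoff <- qchisq(0.95, df = 1)                   # p-value 0.05 corresponds to this value
xbar <- mean(x)

# The statistic is 0 at the sample mean and grows as mu moves away from it,
# so each endpoint is a root of el_stat(mu) - cutoff on one side of xbar.
h <- function(mu) el_stat(mu, x) - cutoff
pad <- 1e-3 * diff(range(x))
lower <- uniroot(h, c(min(x) + pad, xbar), tol = 1e-8)$root
upper <- uniroot(h, c(xbar, max(x) - pad), tol = 1e-8)$root
print(c(lower = lower, upper = upper))
```

Every mu between the two roots has a p-value above 0.05, so the pair forms a 95% confidence interval without any standard error estimate.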
• For doubly censored data, the standard deviation of
the NPMLE is hard to compute.
• The Wald test/confidence interval is hard to do.
• No problem with empirical likelihood ratio!
• No need to estimate the standard deviation, instead,
we need to maximize EL under some constraint.
• The maximization can be achieved with the help of
modern computing. (E-M algorithm etc.)
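To illustrate the kind of iteration involved, here is a minimal EM (self-consistency) sketch for the unconstrained NPMLE with right censored data. It is an assumption-laden illustration, not the emplik algorithm: the data are made up, the times are sorted with no ties, the largest time is uncensored, and `em_npmle` is a hypothetical name.

```r
# EM (self-consistency) iteration for the NPMLE with right-censored data.
# Assumes sorted times, no ties, and an uncensored largest observation.
em_npmle <- function(x, d, maxit = 1000, tol = 1e-12) {
  n <- length(x)
  w <- d / sum(d)                   # start: equal mass on the uncensored times
  for (it in seq_len(maxit)) {
    w_new <- d / n                  # each uncensored point keeps its own 1/n
    for (j in which(d == 0)) {
      # E-step: spread the 1/n of censored point j over the jumps to its
      # right, in proportion to the current weights.
      later <- x > x[j]
      w_new[later] <- w_new[later] + (1 / n) * w[later] / sum(w[later])
    }
    if (max(abs(w_new - w)) < tol) break
    w <- w_new
  }
  w_new
}

x <- c(1, 2, 3, 4, 5, 6)
d <- c(1, 0, 1, 0, 1, 1)
w_em <- em_npmle(x, d)
w_km <- c(1/6, 0, 5/24, 0, 5/16, 5/16)   # Kaplan-Meier jumps for these data
print(rbind(EM = w_em, KM = w_km))
```

With right censoring only, the fixed point is the Kaplan-Meier estimator, which gives a convenient check. The same redistribution idea extends to doubly censored data, where no closed form is available.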