Transcript Motivation

models with continuous covariates
with a practical emphasis on fractional polynomials and applications in clinical epidemiology
Professor Patrick Royston,
MRC Clinical Trials Unit, London.
Berlin, April 2005.
8/4/2005
The problem …
“Quantifying epidemiologic risk factors
using non-parametric regression: model
selection remains the greatest challenge”
Rosenberg PS et al, Statistics in Medicine 2003; 22:3369-3381
Trivial nowadays to fit almost any model
To choose a good model is much harder
Overview
• Context and motivation
• Introduction to fractional polynomials for the
univariate smoothing problem
• Extension to multivariable models
• More on spline models
• Stability analysis
• Stata aspects
• Conclusions
Motivation
• Often have continuous risk factors in epidemiology
and clinical studies – how to model them?
• Linear model may describe a dose-response
relationship badly
‘Linear’ = straight line = $\beta_0 + \beta_1 X$ + … throughout talk
• Using cut-points has several problems
• Splines recommended by some – but are not ideal
Lack a well-defined approach to model selection
‘Black box’
Robustness issues
Problems of cut-points
• Step-function is a poor approximation to true
relationship
Almost always fits data less well than a suitable
continuous function
• ‘Optimal’ cut-points have several difficulties
Biased effect estimates
Inflated P-values
Not reproducible in other studies
Example datasets
1. Epidemiology
• Whitehall 1
17,370 male Civil Servants aged 40-64 years
Measurements include: age, cigarette smoking,
BP, cholesterol, height, weight, job grade
Outcomes of interest: coronary heart disease, all-cause mortality → logistic regression
Interested in risk as function of covariates
Several continuous covariates
→ Some may have no influence in multivariable context
Example datasets
2. Clinical studies
• German breast cancer study group (BMFT-2)
Prognostic factors in primary breast cancer
Age, menopausal status, tumour size, grade, no. of
positive lymph nodes, hormone receptor status
Recurrence-free survival time → Cox regression
686 patients, 299 events
Several continuous covariates
Interested in prognostic model and effect of
individual variables
Example:
Systolic blood pressure vs. age
[Figure: Whitehall 1: BP vs age. Systolic blood pressure (axis 50 to 300) against Age, years (40 to 65).]
(Systolic BP and age – not linear)
[Figure: Whitehall 1: BP vs age (BP axis 125 to 150; Age, years, 40 to 65). Curves: Linear function, FP1 function, Running line, with 95% CI.]
Empirical curve fitting: Aims
• Smoothing
• Visualise relationship of Y with X
• Provide and/or suggest functional form
Some approaches
• ‘Non-parametric’ (local-influence) models
Locally weighted (kernel) fits (e.g. lowess)
Regression splines
Smoothing splines (used in generalized additive models)
• Parametric (non-local influence) models
Polynomials
Non-linear curves
Fractional polynomials
→ Intermediate between polynomials and non-linear curves
Local regression models
• Advantages
Flexible – because local!
May reveal ‘true’ curve shape (?)
• Disadvantages
Unstable – because local!
No concise form for models
→ Therefore, hard for others to use: publication, comparison of results with those from other models
Curves not necessarily smooth
‘Black box’ approach
Many approaches – which one(s) to use?
Polynomial models
• Do not have the disadvantages of local
regression models, but do have others:
• Lack of flexibility (low order)
• Artefacts in fitted curves (high order)
• Cannot have asymptotes
Fractional polynomial models
• Describe for one covariate, X
multiple regression later
• Fractional polynomial of degree m for X with powers
p1, … , pm is given by
$FP_m(X) = \beta_1 X^{p_1} + \dots + \beta_m X^{p_m}$
• Powers p1, …, pm are taken from a special set
$\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$ (power 0 denotes log X)
• Usually m = 1 or m = 2 is sufficient for a good fit
FP1 and FP2 models
• FP1 models are simple power transformations
• $1/X^2$, $1/X$, $1/\sqrt{X}$, $\log X$, $\sqrt{X}$, $X$, $X^2$, $X^3$
8 models
• FP2 models are combinations of these
For example $\beta_1 (1/X) + \beta_2 X^2$
28 models
• Note ‘repeated powers’ models
For example $\beta_1 (1/X) + \beta_2 (1/X) \log X$
8 models
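The model counts on this slide can be checked by enumerating the power set directly. The following is a minimal sketch, not the author's code; the names `POWERS`, `fp_term` and `fp2_terms` are illustrative, and the only conventions assumed are the ones stated in the talk (power 0 means log X, and a repeated power multiplies the second term by log X).

```python
import math
from itertools import combinations_with_replacement

# Power set from the talk; power 0 denotes log X.
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Return x**p, with the FP convention that p = 0 means log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x, p1, p2):
    """Return the two basis terms of an FP2 function at x.

    For a repeated power (p1 == p2) the second term is the first
    multiplied by log(x).
    """
    first = fp_term(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_term(x, p2)


fp1_models = [(p,) for p in POWERS]
fp2_models = list(combinations_with_replacement(POWERS, 2))
distinct = [m for m in fp2_models if m[0] != m[1]]
repeated = [m for m in fp2_models if m[0] == m[1]]

print(len(fp1_models), len(distinct), len(repeated), len(fp2_models))
# prints: 8 28 8 36
```

The 28 distinct-power pairs plus the 8 repeated-power pairs give 36 FP2 candidates in total.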
FP1 and FP2 models:
some properties
• Many useful curves
• A variety of features are available:
Monotonic
Can have asymptote
Non-monotonic (single maximum or minimum)
Single turning-point
• Get better fit than with conventional
polynomials, even of higher degree
Examples of FP2 curves
- varying powers
[Figure: FP2 curves with powers (-2, 1), (-2, 2), (-2, -2), (-2, -1).]
- single power, different coefficients
[Figure: FP2 curves with powers (-2, 2) and different coefficients; Y against x, for x from 10 to 50.]
A philosophy of function
selection
• Prefer simple (linear) model
• Use more complex (non-linear) FP1 or FP2
model if indicated by the data
• Contrast to local regression modelling
Already starts with a complex model
Estimation and significance
testing for FP models
• Fit model with each combination of powers
FP1: 8 single powers
FP2: 36 combinations of powers
• Choose model with lowest deviance (MLE)
• Comparing FPm with FP(m − 1):
compare deviance difference with $\chi^2$ on 2 d.f.
one d.f. for power, one d.f. for regression coefficient
supported by simulations; slightly conservative
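The search over powers can be sketched for the simplest case: an FP1 function in a normal-errors regression, compared with the null model (FP0) on 2 d.f. This is an illustrative sketch only. The data are made up, the function names are hypothetical, and the deviance is taken as n·log(RSS/n), which equals −2 × the maximised log-likelihood up to a constant for Gaussian errors.

```python
import math

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # power 0 denotes log X


def fp_term(x, p):
    """x**p with the FP convention that p = 0 means log(x); needs x > 0."""
    return math.log(x) if p == 0 else x ** p


def gaussian_deviance(rss, n):
    """Deviance of a normal-errors model, up to an additive constant."""
    return n * math.log(rss / n)


def best_fp1(xs, ys):
    """Fit y = b0 + b1 * x^(p) for each power p by least squares.

    Returns (best power, its deviance, deviance of the null model).
    """
    n = len(xs)
    ybar = sum(ys) / n
    syy = sum((y - ybar) ** 2 for y in ys)
    fits = []
    for p in POWERS:
        t = [fp_term(x, p) for x in xs]
        tbar = sum(t) / n
        stt = sum((v - tbar) ** 2 for v in t)
        sty = sum((v - tbar) * (y - ybar) for v, y in zip(t, ys))
        rss = syy - sty ** 2 / stt  # residual sum of squares
        fits.append((gaussian_deviance(rss, n), p))
    dev, power = min(fits)  # lowest deviance wins
    return power, dev, gaussian_deviance(syy, n)


# Toy data (made up): log(x) plus a small alternating perturbation.
xs = [float(i) for i in range(1, 21)]
ys = [math.log(x) + (0.05 if i % 2 else -0.05) for i, x in enumerate(xs)]

power, dev_fp1, dev_null = best_fp1(xs, ys)
diff = dev_null - dev_fp1
p_value = math.exp(-diff / 2)  # upper tail of chi-square on 2 d.f.
print(power, round(diff, 2), p_value)
```

On these toy data the selected power is 0, i.e. log X. The closed form exp(−x/2) is the exact upper tail of a chi-square distribution on 2 d.f., so no statistics library is needed for this comparison.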
Selection of FP function
• Has flavour of a closed test procedure
• Use $\chi^2$ approximations to get P-values
• Define nominal P-value for all tests (often 5%)
• Fit linear and best FP1 and FP2 models
• Test FP2 vs. null – test of any effect of X ($\chi^2$ on 4 d.f.)
• Test FP2 vs. linear – test of non-linearity ($\chi^2$ on 3 d.f.)
• Test FP2 vs. FP1 – test of more complex function against simpler one ($\chi^2$ on 2 d.f.)
Example: Systolic BP and age
Model          d.f.   Deviance difference   P-value
FP2 v Null     4      944.57                0.000
FP2 v Linear   3      29.95                 0.000
FP2 v FP1      2      3.29                  0.2

Reminder:
FP1 had power 3: $\beta_1 X^3$
FP2 had powers (1, 1): $\beta_1 X + \beta_2 X \log X$
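The sequence of tests in the selection procedure can be sketched as a short function and applied to the three deviance differences reported for systolic BP and age. This is a sketch under stated assumptions, not the Stata implementation: `chi2_sf` and `select_fp` are hypothetical names, the chi-square tail probabilities use closed forms valid only for 2, 3 and 4 d.f., and the FP2 deviance is set to 0 as an arbitrary baseline because only differences matter.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms, df = 2, 3, 4 only)."""
    if x <= 0:
        return 1.0
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("only df = 2, 3, 4 are implemented in this sketch")


def select_fp(dev_null, dev_linear, dev_fp1, dev_fp2, alpha=0.05):
    """Closed-test style selection among omit / linear / FP1 / FP2."""
    if chi2_sf(dev_null - dev_fp2, 4) >= alpha:
        return "omit X"   # no evidence of any effect of X
    if chi2_sf(dev_linear - dev_fp2, 3) >= alpha:
        return "linear"   # no evidence of non-linearity
    if chi2_sf(dev_fp1 - dev_fp2, 2) >= alpha:
        return "FP1"      # FP2 not significantly better than FP1
    return "FP2"


# Deviance differences from the systolic BP and age example,
# expressed relative to an FP2 deviance of 0.
choice = select_fp(dev_null=944.57, dev_linear=29.95, dev_fp1=3.29, dev_fp2=0.0)
print(choice)  # prints: FP1
```

With a nominal level of 5%, the first two tests are significant and the third is not (P about 0.2), so the procedure stops at the FP1 function.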
Aside: FP versus spline
• Why care about FPs when splines are more
flexible?
• More flexible → more unstable
More chance of ‘over-fitting’
• In epidemiology, dose-response relationships
are often simple
• Illustrate by small simulation example
FP versus spline (continued)
• Logarithmic relationships are common in practice
• Simulate regression model $y = \beta_0 + \beta_1 \log(X) + \text{error}$
• Error is normally distributed $N(0, \sigma^2)$
• Take $\beta_0 = 0$, $\beta_1 = 1$; X has lognormal distribution
• Vary $\sigma = \{1, 0.5, 0.25, 0.125\}$
• Fit FP1, FP2 and spline with 2, 4, 6 d.f.
• Compute mean square error
• Compare with mean square error for true model
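The data-generating step and the FP1 part of this comparison can be sketched as follows. Several details are assumptions, since the talk does not give them: the sample size (200), the lognormal parameters (0, 1), the seed, and the definition of mean square error as the average squared distance between the fitted curve and the true log curve at the observed X. FP2 and spline fits are omitted to keep the sketch short.

```python
import math
import random

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # power 0 denotes log X


def fp_term(x, p):
    """x**p with the FP convention that p = 0 means log(x)."""
    return math.log(x) if p == 0 else x ** p


def simulate(sigma, n=200, seed=1):
    """Draw X ~ lognormal(0, 1) and y = 0 + 1 * log(X) + N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    ys = [math.log(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_line(t, ys):
    """Least-squares intercept, slope and residual sum of squares."""
    n = len(t)
    tbar, ybar = sum(t) / n, sum(ys) / n
    stt = sum((v - tbar) ** 2 for v in t)
    sty = sum((v - tbar) * (y - ybar) for v, y in zip(t, ys))
    slope = sty / stt
    intercept = ybar - slope * tbar
    rss = sum((y - intercept - slope * v) ** 2 for v, y in zip(t, ys))
    return intercept, slope, rss


def fp1_mse(sigma):
    """Pick the FP1 power with the smallest RSS; return (power, MSE vs truth)."""
    xs, ys = simulate(sigma)
    fits = []
    for p in POWERS:
        a, b, rss = fit_line([fp_term(x, p) for x in xs], ys)
        fits.append((rss, p, a, b))
    _, p, a, b = min(fits)
    mse = sum((a + b * fp_term(x, p) - math.log(x)) ** 2 for x in xs) / len(xs)
    return p, mse


results = {sigma: fp1_mse(sigma) for sigma in (1, 0.5, 0.25, 0.125)}
for sigma, (p, mse) in results.items():
    print(f"sigma={sigma}: selected power {p}, MSE {mse:.5f}")
```

For a normal-errors model, choosing the power with the smallest residual sum of squares is the same as choosing the lowest deviance.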
FP vs. spline (continued)
[Figure: simulated data, y against x, in panels for Sigma = 1, Sigma = 0.5, Sigma = 0.25 and Sigma = 0.125.]
FP vs. spline (continued)
[Figure: FP1 and spline with 2 df. Solid: FP1; dashed: spline 2 df.]
FP vs. spline (continued)
[Figure: FP2 and spline with 4 df.]
FP vs. spline (continued)
[Figure: FP vs. spline: prediction error, plotted against sigma (0.125, 0.25, 0.5, 1). Curves: True, FP1, FP2, Spline 2df, Spline 4df, Spline 6df.]
FP vs. spline (continued)
• In this example, spline usually less accurate
than FP
• FP2 less accurate than FP1 (over-fitting)
• FP1 and FP2 more accurate than splines
• Splines often had non-monotonic fitted curves
Could be medically implausible
• Of course, this is a special example
Multivariable FP (MFP) models
• Assume have k > 1 continuous covariates and
perhaps some categoric or binary covariates
• Allow dropping of non-significant variables
• Wish to find best multivariable FP model for
all X’s
• Impractical to try all combinations of powers
• Require iterative fitting procedure
Fitting multivariable FP models
(MFP algorithm)
• Combine backward elimination of weak
variables with search for best FP functions
• Determine fitting order from linear model
• Apply FP model selection procedure to each X
in turn
fixing functions (but not $\beta$’s) for other X’s
• Cycle until FP functions (i.e. powers) and
variables selected do not change
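The cycling logic of the algorithm can be sketched separately from the model fitting. In the sketch below, `mfp_cycle` and `toy_select` are hypothetical names; `select_function` stands in for the FP selection procedure applied to one covariate with the functions of the others held fixed, and the toy rule used here is invented purely to show the loop converging. Starting every covariate as linear is an assumption of the sketch.

```python
def mfp_cycle(variables, select_function, max_cycles=20):
    """Cycle over covariates until the selected functions stop changing.

    variables: covariate names, already in fitting order.
    select_function(name, current): returns the function chosen for `name`
        given the functions currently selected for all covariates
        (None means the covariate is dropped).
    Returns (selected functions, number of cycles used).
    """
    current = {name: "linear" for name in variables}  # assumed starting point
    for cycle in range(1, max_cycles + 1):
        previous = dict(current)
        for name in variables:
            current[name] = select_function(name, current)
        if current == previous:  # neither powers nor variables changed
            return current, cycle
    raise RuntimeError("functions did not stabilise within max_cycles")


def toy_select(name, current):
    """Invented rule standing in for the real FP selection procedure."""
    if name == "x1":
        return (0,)  # FP1 with power 0 (log)
    if name == "x2":
        # dropped once x3 has been given a non-linear function
        return "linear" if current["x3"] == "linear" else None
    return (0.5,)    # x3: FP1 with power 0.5


selected, cycles = mfp_cycle(["x1", "x2", "x3"], toy_select)
print(selected, cycles)
```

In a real analysis `select_function` would refit the model for each candidate function of one covariate, re-estimating all regression coefficients, and apply the significance tests to choose among omission, linear, FP1 and FP2.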
Example: Prognostic factors in
breast cancer
• Aim to develop a prognostic index for risk of
tumour recurrence or death
• Have 7 prognostic factors
4 continuous, 3 categorical
• Select variables and functions using 5%
significance level
Univariate linear analysis
Variable   Name                            $\chi^2$
X1         Age                             0.58
X2         Menopausal status               0.28
X3         Tumour size                     15.68
X4a        Grade 2 or 3                    19.92
X4b        Grade 3                         8.19
X5         No. of positive lymph nodes     50.02
X6         Progesterone receptor status    34.04
X7         Oestrogen receptor status       4.70
Univariate FP2 analysis
Variable    Powers      $\chi^2$   d.f.   P         Gain
X1 age      (2, 0.5)    17.61      4      0.001     17.03
X3 size     (1, 3)      19.81      4      0.001     4.13
X5 nodes    (1, 2)      81.36      4      < 0.001   31.34
X6 PgR      (0.5, 0)    52.73      4      < 0.001   18.69
X7 ER       (2, 1)      23.07      4      < 0.001   18.37
Gain compares FP2 with linear on 3 d.f.
All factors except for X3 have a non-linear effect
Multivariable FP analysis
Variable        FP etc.     $\chi^2$   d.f.   P
X1 age          (2, 0.5)    19.33      4      0.001
X3 size         Out         5.31       4      0.3
X5 nodes        (2, 1)      74.14      4      <0.001
X6 PgR          0.5         32.70      4      <0.001
X7 ER           Out         2.15       4      0.7
X2 mens.        Out         0.21       1      0.6
X4a grad 2/3    In          4.59       1      0.03
X4b grad 3      Out         0.15       1      0.7
Comments on analysis
• Conventional backwards elimination at 5%
level selects X4a, X5, X6, and X1 is excluded
• FP analysis picks up same variables as
backward elimination, and additionally X1
• Note considerable non-linearity of X1 and X5
• X1 has no linear influence on risk of recurrence
• FP model detects more structure in the data
than the linear model
Plots of fitted FP functions
[Figure: Breast cancer: Fitted FP functions. Panels: Age (Age, years), Nodes (No. of positive lymph nodes), Progesterone receptor (Progesterone receptor status); y-axis: Log relative hazard.]
Survival by risk groups
[Figure: Prognostic classification scheme. Survival curves (0.00 to 1.00) against Recurrence-free survival, yr (0 to 8), for the Low risk, Medium risk and High risk groups.]
Robustness of FP functions
• Breast cancer example showed non-robust
functions for nodes – not medically sensible
• Situation can be improved by performing
covariate transformation before FP analysis
• Can be done systematically (work in progress)
• Sauerbrei & Royston (1999) used negative
exponential transformation of nodes
exp(–0.12 * number of nodes)
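The pre-transformation quoted above is simple to state in code. A minimal sketch follows; the function name `transform_nodes` is hypothetical, and the rate 0.12 is the value given on the slide.

```python
import math


def transform_nodes(n_nodes, rate=0.12):
    """Negative exponential transformation exp(-rate * number of nodes)."""
    return math.exp(-rate * n_nodes)


for n in (0, 1, 5, 10, 20, 50):
    print(n, round(transform_nodes(n), 3))
```

The transformed covariate is bounded between 0 and 1 and flattens out for large node counts, so a few patients with very many positive nodes have little leverage on the fitted function.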
Making the function for lymph nodes more robust
[Figure: fitted functions against No. of positive lymph nodes (0 to 50): Original vs. Exponential transformation.]
2nd example: Whitehall 1
MFP analysis
Covariate            FP etc.
Age                  Linear
Cigarettes           0.5
Systolic BP          -1, -0.5
Total cholesterol    Linear
Height               Linear
Weight               -2, 3
Job grade            In
No variables were eliminated by the MFP algorithm
Weight is eliminated by linear backward elimination
Plots of FP functions
[Figure: Whitehall 1: multivariable FP analysis. Panels: Age (Age at entry), Cigarettes (Cigarettes/day), Systolic BP, Total cholesterol (Cholesterol/ mmol/l), Height (Height/cms), Weight (Weight/kgs); y-axis: Probability of death.]
A new multivariable regression
algorithm with spline functions
• Inspired by closed test procedure for selecting an FP
function
• Start with predefined number of knots
Determines maximum complexity of function
• Use predetermined knot positions
E.g. at fixed percentile positions of the distribution of x
• Simplest function (default) is linear
• Closed test procedure to reduce the knot set if some
knots are not significant
• Apply backfitting procedure as in mfp
• Implemented in Stata as new command mrsnb
Splines: Breast cancer example
• Selects variables similar to mfp
Grade 2/3 omitted, otherwise selected variables
are identical
• Knots: age(46, 53); transformed nodes(linear);
PgR(7, 132)
• Deviance of selected model almost identical to
mfp model
Plots of fitted FP functions
[Figure: Log HR against Age, years; No. of positive lymph nodes; Progesterone receptor status. Solid lines, FP; dashed lines, spline.]
Improving the robustness of
spline models
• Often have covariates with positively skew
distributions – can produce curve artefacts
• Simple approach is to log-transform covariates
with a skew distribution – e.g. skewness $\gamma_1 > 0.5$
• Then fit the spline model
• In the breast cancer example, this approach
gives a more satisfactory log function for PgR
Stability of FP models
• Models (variables, FP functions) selected by
statistical criteria – cut-off on P-value
• Approach has several advantages …
• … and also is known to have problems
Omission bias
Selection bias
Unstable – many models may fit equally well
Stability investigation
• Instability may be studied by bootstrap resampling
(sampling with replacement)
Take bootstrap sample B times
Select model by chosen procedure
Count how many times each variable is selected
Summarise inclusion frequencies & their dependencies
Study fitted functions for each covariate
• May lead to choosing several possible models, or a
model different from the original one
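The resampling loop can be sketched generically. In the sketch below, `bootstrap_inclusion` is a hypothetical name, `select_model` is a placeholder for whatever selection procedure is under study (for example an MFP fit), and the toy selector and data are invented so that the example runs on its own.

```python
import random
from collections import Counter


def bootstrap_inclusion(data, select_model, n_boot=200, seed=0):
    """Return (inclusion frequency per variable, count per distinct model)."""
    rng = random.Random(seed)
    n = len(data)
    var_counts, model_counts = Counter(), Counter()
    for _ in range(n_boot):
        # Bootstrap sample: n rows drawn with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        selected = frozenset(select_model(sample))
        model_counts[selected] += 1
        var_counts.update(selected)
    freqs = {name: count / n_boot for name, count in var_counts.items()}
    return freqs, model_counts


def corr(a, b):
    """Pearson correlation; 0.0 if either variable is constant."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    saa = sum((u - abar) ** 2 for u in a)
    sbb = sum((v - bbar) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - abar) * (v - bbar) for u, v in zip(a, b))
    return sab / (saa * sbb) ** 0.5


def toy_select(sample, threshold=0.3):
    """Invented stand-in: keep a covariate if |corr with y| > threshold."""
    y = [row["y"] for row in sample]
    return [name for name in ("x1", "x2")
            if abs(corr([row[name] for row in sample], y)) > threshold]


# Made-up data: y depends strongly on x1 and only weakly on x2.
data = [{"x1": i, "x2": (7 * i) % 11, "y": 2 * i + 0.1 * ((7 * i) % 11)}
        for i in range(30)]
freqs, models = bootstrap_inclusion(data, toy_select)
print(freqs)
print(len(models), "distinct models selected")
```

Keeping the count of each distinct model alongside the per-variable frequencies makes it possible to look at dependencies between inclusions, not only at marginal inclusion rates.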
Bootstrap stability analysis of
the breast cancer dataset
• 5000 bootstrap samples taken (!)
• MFP algorithm with Cox model applied to
each sample
• Resulted in 1222 different models (!!)
• Nevertheless, could identify stable subset
consisting of 60% of replications
Judged by similarity of functions selected
Bootstrap stability analysis of
the breast cancer dataset
Variable                  Model selected   % bootstraps model selected
Age                       FP1              16
                          FP2              76
Menopausal status         -                20
Tumour size               FP1              34
                          FP2              6
Grade 2/3                 -                58
Grade 3                   -                9
Lymph nodes               FP1              100
Progesterone receptors    FP1              95
                          FP2              4
Oestrogen receptors       FP1              13
                          FP2              6
of fitted curves from stable
subset
[Figure: Log relative hazard against Age, years; Tumour size, mm; PgR, fmol/L; Number of positive lymph nodes.]
Presentation of models for
continuous covariates
• The function + 95% CI gives the whole story
• Functions for important covariates should
always be plotted
• In epidemiology, sometimes useful to give a
more conventional table of results in
categories
• This can be done from the fitted function
Example: Cigarette smoking and
all-cause mortality (Whitehall 1)
Cigarettes per day           Number               OR (model based)
Range          Ref. point    At risk    Dying     Estimate    95% CI
0 (referent)   0             10103      690       1.00        -
1-10           5             2254       243       1.69        1.59, 1.80
11-20          15            3448       494       2.25        2.04, 2.49
21-30          25            1117       185       2.60        2.31, 2.91
31-40          35            283        48        2.86        2.52, 3.24
41-50          45            43         8         3.07        2.68, 3.52
51-60          55            12         2         3.25        2.82, 3.75
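A table of this kind is obtained by evaluating the fitted function at a reference point in each category and comparing it with the referent. The sketch below shows the arithmetic for the point estimates only. The talk does not state the fitted function or its coefficient, so both are assumptions here: a log(x + 1) term with coefficient 0.293, chosen for illustration because it gives values close to the tabulated estimates. Confidence intervals need the standard error of the fitted function and are omitted.

```python
import math


def log_odds_term(x, beta=0.293, shift=1.0):
    """Hypothetical fitted function for cigarettes/day: beta * log(x + shift)."""
    return beta * math.log(x + shift)


def odds_ratio(x, ref=0.0):
    """Model-based odds ratio at x relative to the reference value."""
    return math.exp(log_odds_term(x) - log_odds_term(ref))


ref_points = [0, 5, 15, 25, 35, 45, 55]
for x in ref_points:
    print(x, f"{odds_ratio(x):.2f}")
```

Because the odds ratio is a difference of the fitted function on the log-odds scale, any intercept and any terms for other covariates cancel, which is why only the cigarette term is needed.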
Other issues (1)
• Handling continuous confounders
May use a larger P-value for selection e.g. 0.2
Not so concerned about functional form here
• Binary/continuous covariate interactions
Can be modelled using FPs (Royston & Sauerbrei
2004)
Adjust for other factors using MFP
Other issues (2)
• Time-varying effects in survival analysis
Can be modelled using FP functions of time
(Berger; also Sauerbrei & Royston, in progress)
• Checking adequacy of FP functions
May be done by using splines
Fit FP function and see if spline function adds
anything, adjusting for the fitted FP function
Stata aspects
• Command mfp is part of Stata 8
• Example of use:
    mfp stcox x1 x2 x3 x4a x4b x5 x6 x7 hormon, select(0.05, hormon:1)
• Command mrsnb is available from PR
• Example of use:
    mrsnb stcox x1 x2 x3 x4a x4b x5 x6 x7 hormon, select(0.05, hormon:1)
• Command mfpboot is available from PR
Does bootstrap stability analysis of MFP models
Concluding remarks (1)
• FP method in general
No reason (other than convention) why regression models
should include only positive integer powers of covariates
FP is a simple extension of an existing method
Simple to program and simple to explain
Parametric, so can easily get predicted values
FP usually gives better fit than standard polynomials
Cannot do worse, since standard polynomials are included
Concluding remarks (2)
• Multivariable FP modelling
Many applications in general context of multiple
regression modelling
Well-defined procedure based on standard
principles for selecting variables and functions
Aspects of robustness and stability have been
investigated (and methods are available)
Much experience gained so far suggests that
method is very useful in clinical epidemiology
Some references
• Royston P, Altman DG (1994) Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Applied Statistics 43: 429-467.
• Royston P, Altman DG (1997) Approximating statistical functions by using fractional polynomial regression. The Statistician 46: 1-12.
• Sauerbrei W, Royston P (1999) Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. JRSS(A) 162: 71-94. Corrigendum JRSS(A) 165: 399-400, 2002.
• Royston P, Ambler G, Sauerbrei W (1999) The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology 28: 964-974.
• Royston P, Sauerbrei W (2004) A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine 23: 2509-2525.
• Royston P, Sauerbrei W (2003) Stability of multivariable fractional polynomial models with selection of variables and transformations: a bootstrap investigation. Statistics in Medicine 22: 639-659.
• Armitage P, Berry G, Matthews JNS (2002) Statistical Methods in Medical Research. Oxford, Blackwell.