Survival analysis, also called event history analysis in social science,
or reliability analysis in engineering, deals with time until occurrence
of an event of interest. However, this failure time may not be observed
within the relevant time period, producing so-called censored observations.
This task view presents R packages that are useful for the analysis
of time-to-event data.
Please let the
maintainers
know if
something is inaccurate or missing.
Standard Survival Analysis
Estimation of the Survival Distribution
-
Kaplan-Meier:
The
survfit
function from the
survival
package
computes the Kaplan-Meier estimator for truncated and/or censored data.
rms
(replacement of the Design package)
proposes a modified version of the
survfit
function.
The
prodlim
package implements a fast algorithm and some features
not included in
survival.
Various confidence intervals and confidence bands for the Kaplan-Meier estimator
are implemented in the
km.ci
package.
plot.Surv
of package
eha
plots
the Kaplan-Meier estimator.
The
NADA
package includes a function to compute the Kaplan-Meier
estimator for left-censored data.
svykm
in
survey
provides a weighted Kaplan-Meier
estimator.
nested.km
in
NestedCohort
estimates the
survival curve for each level of categorical variables with missing data.
The
kaplan.meier
function in
spatstat
computes the Kaplan-Meier estimator from histogram data.
The
MAMSE
package computes a weighted Kaplan-Meier estimate.
The
KM
function in package
rhosp
plots the survival
function using a variant of the Kaplan-Meier estimator in a hospitalisation
risk context.
The
survPresmooth
package computes presmoothed estimates of the main quantities used
for right-censored data, i.e., survival, hazard and density functions.
The
asbio
package computes the Kaplan-Meier estimator following Pollock et al. (1998).
-
Nonparametric maximum likelihood estimation (NPMLE):
The
Icens
package provides several ways to compute the NPMLE
of the survival distribution for various censoring and truncation
schemes.
MLEcens
can also be used to compute the MLE for interval-censored data.
dblcens
computes the NPMLE of the cumulative distribution
function for left- and right-censored data.
The
icfit
function in package
interval
computes the NPMLE
for interval-censored data.
The
dcens
package simultaneously estimates marginal survival probabilities
in the presence of interval censoring.
The
DTDA
package implements several algorithms for the analysis of possibly doubly
truncated survival data.
The conditional survival function can be estimated nonparametrically for successive
survival times using the
bwsurvival
package.
-
MLE:
The
fitdistrplus
package fits a univariate
distribution by maximum likelihood. Data can be interval censored.
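To make the Kaplan-Meier entry above concrete, here is a minimal sketch using survfit from the survival package. It assumes the aml example data set that ships with survival; the object names (km_fit, km_table, manual_surv) are illustrative only. The last step recomputes the estimate from the product-limit formula, which is one way to see what survfit reports in the simplest right-censored case.

```r
library(survival)

# Surv() pairs each follow-up time with its event indicator
# (in aml: status 1 = relapse observed, 0 = censored).
km_fit <- survfit(Surv(time, status) ~ 1, data = aml)

# Survival probabilities at the observed event times, with 95% limits.
km_summary <- summary(km_fit)
km_table <- data.frame(time  = km_summary$time,
                       surv  = km_summary$surv,
                       lower = km_summary$lower,
                       upper = km_summary$upper)
print(km_table)

# Product-limit formula: S(t) is the running product of
# (1 - events / number at risk) over the observed times.
manual_surv <- cumprod(1 - km_fit$n.event / km_fit$n.risk)
print(all.equal(manual_surv, km_fit$surv))
```

Calling plot(km_fit) on the fitted object draws the step function with its pointwise confidence limits.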
Hazard Estimation
-
The
muhaz
package estimates
the hazard function through kernel methods for right-censored data.
-
The
epi.insthaz
function from
epiR
computes
the instantaneous hazard from the Kaplan-Meier estimator.
-
polspline,
gss
and
logspline
estimate
the hazard function using splines.
-
The
ICE
package estimates the hazard function for
interval-censored data.
Testing
-
The
survdiff
function in
survival
compares survival curves using the Fleming-Harrington G-rho family of tests.
NADA
implements this class of tests for left-censored
data.
-
clinfun
implements a permutation version of the
logrank test and a version of the logrank that adjusts for
covariates.
-
The
exactRankTests
package implements the shift algorithm by Streitberg and Roehmel for
computing exact conditional p-values and quantiles, possibly for censored data.
-
SurvTest
in the
coin
package implements
the logrank test reformulated as a linear rank test.
-
surv2sample
computes Neyman's smooth, logrank,
Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling tests
to compare 2 survival curves.
-
The
maxstat
package performs tests using maximally selected
rank statistics.
-
The
interval
package implements logrank and Wilcoxon type tests
for interval-censored data.
-
Three generalised logrank tests and a score test for interval-censored data
are implemented in the
glrt
package.
-
survcomp
compares 2 hazard ratios.
-
The
TSHRC
package implements a two-stage procedure for comparing
hazard functions.
-
The
Survgini
package tests the equality of
two survival distributions based on the Gini index.
-
The
cmprskContin
package compares continuous mark-specific relative
risks in two treatment groups, with a view towards vaccine trials.
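As an illustration of the survdiff entry at the top of this list, the sketch below compares the two arms of the aml data set shipped with survival (an assumed example; any data with a grouping variable would do). The rho argument selects the member of the G-rho family: rho = 0 is the logrank test and rho = 1 is the Peto and Peto modification of the Gehan-Wilcoxon test. The object names are illustrative.

```r
library(survival)

# x is the treatment arm in aml (maintained vs. non-maintained).
logrank_test <- survdiff(Surv(time, status) ~ x, data = aml, rho = 0)
peto_test    <- survdiff(Surv(time, status) ~ x, data = aml, rho = 1)

# The test statistic is referred to a chi-square distribution with
# (number of groups - 1) degrees of freedom.
n_groups  <- length(logrank_test$n)
p_logrank <- pchisq(logrank_test$chisq, df = n_groups - 1, lower.tail = FALSE)
p_peto    <- pchisq(peto_test$chisq,    df = n_groups - 1, lower.tail = FALSE)

print(c(logrank = p_logrank, peto = p_peto))
```

The two tests weight early and late differences differently, so they need not agree when the survival curves cross or diverge late.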
Regression Modelling
-
Cox model:
The
coxph
function in
the
survival
package fits the Cox model.
cph
in the
rms
package and
the
eha
package propose some extensions to the
coxph
function. The package
coxphf
implements Firth's penalised maximum likelihood bias reduction
method for the Cox model. An implementation of weighted
estimation in Cox regression can be found in
coxphw.
The
coxrobust
package proposes a robust implementation
of the Cox model. The Cox model can be fitted to
interval-censored data using the
intcox
package.
timecox
in package
timereg
fits Cox models
with possibly time-varying effects. The
mfp
package
fits Cox models with multiple fractional polynomials.
The
NestedCohort
package fits Cox models for covariates with
missing data. A Cox model can be fitted to data from
a complex survey design using the
svycoxph
function
in
survey. The
dynsurv
package fits
time-varying coefficient models for interval-censored and
right-censored survival data using a Bayesian Cox model, a spline-based
Cox model or a transformation model.
The
CPHshape
package computes the Cox proportional hazards model
with shape constrained hazard functions.
The
OrdFacReg
package implements the Cox model using an active set algorithm for
dummy variables of ordered factors.
The
cumres
function in
gof
computes goodness-of-fit methods for
the Cox proportional hazards model. The proportionality
assumption can be checked using the
cox.zph
function
in
survival, and also with the
proptest
package. The
CPE
package calculates concordance
probability estimate for the Cox model, as does
the
coxphCPE
function in
clinfun.
The
coxphQuantile
function in the latter package draws a
quantile curve of the survival distribution as a function of
covariates. The
multcomp
package computes simultaneous
tests and confidence intervals for the Cox model and other
parametric survival models. The
multtest
package on
Bioconductor proposes a resampling based multiple hypothesis
testing that can be applied to the Cox model. Testing
coefficients of Cox regression models using a Wald test with a
sandwich estimator of variance can be done using
the
saws
package. The
rankhazard
package
plots a visualisation of the relative importance of
covariates in a proportional hazards model.
-
Parametric Proportional Hazards Model:
survreg
(from
survival) fits parametric models in accelerated failure time
form; with a Weibull or exponential distribution these are also
proportional hazards models. The
eha
and
mixPHM
packages implement a proportional hazards
model with a parametric baseline hazard. The
pphsm
function in
rms
translates an AFT model to a proportional
hazards form. The
polspline
package includes
the
hare
function that fits a hazard regression
model, using splines to model the baseline hazard. Hazards may be,
but need not be, proportional. The
flexsurv
package
implements the model of Royston and Parmar (2002). The model uses
natural cubic splines for the baseline survival function, and
proportional hazards, proportional odds or probit functions for
regression.
-
Accelerated Failure Time (AFT) Models:
The
survreg
function in package
survival
can fit an accelerated failure time model.
A modified version of
survreg
is implemented in the
rms
package (
psm
function). It gives access to some of the
rms
functionalities.
The
eha
package also proposes an implementation of the AFT model
(function
aftreg).
An AFT model with an error distribution assumed to be a mixture of G-splines
is implemented in the
smoothSurv
package.
The
NADA
package provides a front end to the
survreg
function for left-censored data.
A least-squares implementation of the AFT model can be found in the
lss
package.
The
simexaft
package implements the Simulation-Extrapolation algorithm for the
AFT model, which can be used when covariates are subject to measurement error.
A robust version of the accelerated failure time model can be found in
RobustAFT.
-
Additive Models:
Both
survival
and
timereg
fit the additive hazards model of Aalen in
functions
aareg
and
aalen,
respectively.
timereg
also proposes an implementation
of the Cox-Aalen model (that can also be used to perform the Lin, Wei and
Ying (1994) goodness-of-fit for Cox regression models) and the
partly parametric additive risk model of McKeague and Sasieni.
-
Buckley-James Models:
The
bj
function in
rms
and
BJnoint
in
emplik
compute the
Buckley-James model, though the latter does it without
an intercept term.
-
Other models:
Functions like
survreg
can fit other types
of models depending on the chosen distribution,
e.g., a tobit model.
The
AER
package provides the
tobit
function,
which is a wrapper of
survreg
to fit the tobit model.
An implementation of the tobit model for cross-sectional data and panel data
can be found in the
censReg
package.
The
timereg
package provides implementations of
the proportional odds model and of the proportional excess hazards
model.
The
pseudo
package computes pseudo-observations for
modelling the survival function based on the Kaplan-Meier estimator
and the restricted mean.
flexsurv
fits parametric time-to-event models, in which
any parametric distribution can be used to model the survival probability, and
where one of the parameters is a linear function of covariates.
The
Icens
function in package
Epi
provides
a multiplicative relative risk and an additive excess risk model
for interval-censored data. The
VGAM
package can fit
vector generalised linear and additive models for censored data.
The
gamlss.cens
package implements the generalised
additive model for location, scale and shape that can be fitted to
censored data. The
nltm
package fits non-linear
transformation models for censored data, which
include, e.g., proportional hazards, proportional odds,
or cure models. The
locfit.censor
function
in
locfit
produces local regression estimates.
The
crq
function included in the
quantreg
package implements a conditional quantile regression model for
censored data. The
JM
package fits shared parameter
models for the joint modelling of a longitudinal response and
event times. The temporal process regression model is implemented
in the
tpr
package. The
TwoWaySurvival
package fits an additive model using two time scales.
Aster models, which combine aspects of
generalized linear models and Cox models, are implemented in
the
aster
and
aster2
packages.
The
concreg
package implements conditional logistic
regression for survival data as an alternative to the Cox model
when hazards are non-proportional.
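The Cox and AFT entries at the top of this list can be sketched with the survival package alone. The example below is a minimal illustration, assuming the lung data set shipped with survival (status coded 1 = censored, 2 = dead) and an arbitrary choice of covariates; the object names are illustrative. It fits a Cox model, checks the proportionality assumption with cox.zph, fits a Weibull AFT model with survreg, and converts the AFT coefficients to the proportional hazards scale, which is possible for the Weibull distribution.

```r
library(survival)

# Semiparametric Cox proportional hazards model.
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
hazard_ratios <- exp(coef(cox_fit))
print(hazard_ratios)

# Test of the proportional hazards assumption based on scaled
# Schoenfeld residuals; one row per covariate plus a global test.
ph_check <- cox.zph(cox_fit)
print(ph_check$table)

# Weibull accelerated failure time model; coefficients act on log(time).
aft_fit <- survreg(Surv(time, status) ~ age + sex, data = lung,
                   dist = "weibull")

# For the Weibull distribution only, AFT coefficients translate into
# log hazard ratios as -coefficient / scale (intercept dropped).
weibull_log_hr <- -coef(aft_fit)[-1] / aft_fit$scale
print(cbind(cox = coef(cox_fit), weibull = weibull_log_hr))
```

Small p-values in the cox.zph table suggest that the effect of the corresponding covariate changes over time, in which case a model with time-varying effects may be more appropriate.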
Multistate Models
-
General Multistate Models:
The
coxph
function from package
survival
can be used to fit a model for any
transition of a multistate model. It can also be used for
comparing two transition hazards, using correspondence between
multistate models and time-dependent covariates. Besides, all the
regression methods presented above can be used for multistate
models as long as they allow for left-truncation.
The
mvna
package provides convenient functions for
estimating and plotting the cumulative transition hazards in any
multistate model, possibly subject to right-censoring and
left-truncation.
changeLOS
estimates and
plots the transition probabilities for any multistate model. It
also estimates the change of length of hospital stay. The
etm
package estimates and plots transition
probabilities for any multistate model. It can also estimate the
variance of the Aalen-Johansen estimator, and handles
left-truncated data. It also implements the technique for
estimating the length of hospital stay included in
changeLOS
in the presence of left-truncation. The
msSurv
package provides nonparametric estimation for
multistate models subject to right-censoring (possibly
state-dependent) and left-truncation. The
mstate
package estimates hazards and probabilities, possibly
depending on covariates, and to obtain prediction probabilities in
the context of competing risks and multistate models. The
msm
package contains functions for fitting general
continuous-time Markov and hidden Markov multistate models to
longitudinal data. Transition rates and output processes can be
modelled in terms of covariates. The
mspath
package,
based on
msm, can fit non-Markov multistate models by
maximum likelihood, using a discrete-time approximation.
Nonparametric estimates in illness-death models and other three
state models can be obtained with package
p3state.msm.
The
Epi
package implements
Lexis objects as a way to represent, manipulate and summarise data
from multistate models. The
TraMineR
package is
intended for analysing state or event sequences that describe life
courses. Also, the
Biograph
package provides various
functions for exploring life histories.
asbio
computes the
expected numbers of individuals in specified age classes or life
stages given survivorship probabilities from a transition matrix.
-
Competing risks:
surv2sample
provides estimation of the cumulative incidence
functions for several causes of failure. The package also performs comparison
in two samples.
The package
cmprsk
also estimates the cumulative
incidence functions, but they can be compared in more than two samples.
The package also implements the Fine and Gray model for regressing the
subdistribution hazard of a competing risk.
crrSC
extends the
cmprsk
package to stratified and clustered data.
The
kmi
package performs a Kaplan-Meier multiple imputation to recover missing
potential censoring information from competing risks events, so that standard
right-censored methods can be used to analyse cumulative incidence functions.
Package
pseudo
computes pseudo observations
for modelling competing risks based on the cumulative incidence functions.
timereg
does flexible regression modelling for competing risks data based
on inverse-probability-of-censoring weights and the direct binomial regression approach.
riskRegression
implements risk regression for competing risks data,
along with other extensions of existing packages useful for survival analysis and
competing risks data.
The
CompetingRiskFrailty
package estimates the cause-specific hazards
of a competing risks model using frailties and splines.
The
Cprob
package estimates the conditional probability of a competing event, also known as
the conditional cumulative incidence. It also implements a proportional-odds model using either
the temporal process regression or the pseudo-value approaches.
Packages
survival
(via
survfit) and
prodlim
can also be used to estimate the cumulative incidence function.
The
compeir
package estimates event-specific incidence rates,
rate ratios, event-specific incidence proportions and cumulative incidence functions.
randomSurvivalForest
provides summary measures for
competing risks, such as the ensemble cumulative incidence function (CIF).
-
Recurrent event data:
coxph
from the
survival
package can be used to analyse recurrent event
data. The
cph
function of the
rms
package
fits the Andersen-Gill model for recurrent events, a model that can
also be fitted with the
frailtypack
package. The latter
also fits joint frailty models for joint modelling of
recurrent events and a terminal event. The
survrec
package implements several estimators for recurrent
event data, such as the Peña-Strawderman-Hollander and
Wang-Chang estimators, and maximum likelihood estimation under a gamma frailty
model. The Peña-Hollander model can be fitted using the
gcmrec
package. The
condGEE
package
implements the conditional GEE for recurrent event gap times.
survivalBIV
estimates the bivariate
distribution of two gap times.
Relative Survival
-
The
relsurv
package proposes several functions to deal
with relative survival data. For example,
rs.surv
computes a relative
survival curve.
rs.add
fits an additive model and
rsmul
fits the Cox model of Andersen et al. for relative survival, while
rstrans
fits a Cox model in transformed time.
-
The
timereg
package fits relative survival models such as
the proportional excess and additive excess models.
Multivariate Survival
Multivariate survival refers to the analysis of related units,
e.g., the survival of twins or of members of a family. To analyse
such data, we can estimate the joint distribution of the
survival times or use frailty models.
-
Joint modelling:
Both
Icens
and
MLEcens
can estimate the bivariate
survival distribution for data subject to interval censoring.
-
Frailties:
Frailty terms can be added
in
coxph
and
survreg
functions in
package
survival. A mixed-effects Cox model is
implemented in the
coxme
package.
The
two.stage
function in the
timereg
package fits the Clayton-Oakes-Glidden model.
The
parfm
package fits fully parametric frailty models
via maximisation of the marginal likelihood.
The
frailtypack
package fits proportional hazards
models with a shared Gamma frailty to right-censored and/or
left-truncated data using a penalised likelihood on the hazard
function. The package also fits additive and nested frailty models
that can be used for, e.g., meta-analysis and for hierarchically
clustered data (with 2 levels of clustering), respectively. A
proportional hazards model with mixed effects can be fitted using
the
phmm
package. The
lmec
package fits a
linear mixed-effects model for left-censored data. The Cox model
using h-likelihood estimation for the frailty terms can be fitted
using the
frailtyHL
package. The
tlmec
package implements a linear mixed effects model for censored data
with Student-t or normal distributions.
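As a sketch of the first frailty entry above (frailty terms in coxph), the example below assumes the kidney data set shipped with survival, in which each patient (id) contributes two infection times. It fits a shared gamma frailty model and, for comparison, a marginal Cox model with a cluster-robust variance. The covariates and object names are illustrative choices, not a recommended analysis.

```r
library(survival)

# Shared gamma frailty: one random effect per patient, acting
# multiplicatively on the hazard of both of that patient's events.
frail_fit <- coxph(Surv(time, status) ~ age + sex +
                     frailty(id, distribution = "gamma"),
                   data = kidney)
print(frail_fit)

# Marginal alternative: same fixed effects, no random effect, with
# a robust (sandwich) variance that accounts for the clustering.
marg_fit <- coxph(Surv(time, status) ~ age + sex + cluster(id),
                  data = kidney)
print(summary(marg_fit)$coefficients)
```

The frailty model gives effects conditional on the patient, whereas the marginal model gives population-averaged effects, so the two sets of coefficients are not expected to coincide.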
Bayesian Models
-
survBayes
fits a proportional hazards model for right- and
interval-censored data using a Bayesian approach.
-
The
bayesSurv
package proposes an implementation of a bivariate
AFT model.
-
The package
BMA
computes a Bayesian model averaging for
Cox proportional hazards models.
-
The
DPsurvint
function in
DPpackage
fits a Bayesian
semiparametric AFT model.
LDDPsurvival
in the same package
fits a Linear Dependent Dirichlet Process Mixture of survival models.
-
NMixMCMC
in
mixAK
performs an MCMC estimation
of normal mixtures for censored data.
-
Gaussian linear regression with left-, right- or interval-censored
data can be fitted by MCMC using the
MCMCtobit
function in
MCMCpack.
-
The
BayHaz
package estimates the hazard function from censored
data in a Bayesian framework.
-
The
weibullregpost
function in
LearnBayes
computes
the log posterior density for a Weibull proportional-odds regression model.
-
The
MCMCglmm
package fits generalised linear mixed models using MCMC
to right-, left- and interval-censored data.
-
The
splinesurv
package implements a proportional hazards model
for possibly clustered survival data using MCMC. Baseline hazard and frailty
densities are modelled using B-splines.
-
The
BaSTA
package aims at drawing inference on
age-specific mortality from capture-recapture/recovery data when
some or all records have missing information on times of birth
and death. Covariates can also be included in the model.
High-Dimensional Data
-
Recursive partitioning:
The
rpart
and
mvpart
packages implement CART-like trees that can be used with
censored outcomes.
The
party
package implements recursive partitioning for survival
data.
LogicReg
can perform logic regression.
kaps
implements K-adaptive partitioning and recursive
partitioning algorithms for censored survival data.
-
Random forest:
Package
ipred
implements bagging for survival data.
The
randomSurvivalForest
package fits random forests
to survival data,
while a variant of the random forest is implemented in
party.
-
Regularised and shrinkage methods:
The
glmpath
package implements an L1-regularised Cox
proportional hazards model.
L1 and L2 penalised Cox models are available in
penalized.
The
pamr
package computes a nearest shrunken centroid
for survival gene expression data.
A high dimensional Cox model using univariate shrinkage is available
in
uniCox.
The
lpc
package implements the lassoed principal components
method.
The
ahaz
package implements the LASSO and elastic net estimator for the
additive risk model.
bujar
provides the Buckley-James regression model for
high-dimensional data.
-
Boosting:
Gradient boosting for the Cox model is implemented in the
gbm
package.
The
mboost
package includes a generic gradient boosting algorithm
for the construction of prognostic and diagnostic models for right-censored data.
globalboosttest
implements a permutation-based testing procedure to test
the additional predictive value of high-dimensional data. It is based on
mboost.
CoxBoost
provides routines for fitting the Cox proportional hazards model
and the Fine and Gray model by likelihood based boosting.
-
Other:
The
superpc
package implements the supervised principal components
for survival data.
Package
plsRcox
fits Cox models in a high dimensional setting
through partial least squares regression.
The
AIM
package can construct index models for survival outcomes, that is,
construct scores based on a training dataset.
Predictions and Prediction Performance
-
The
pec
package provides utilities to plot prediction error
curves for several survival models.
-
peperr
implements prediction error techniques that can
be computed in parallel, which is useful for high-dimensional
data.
-
survivalROC
computes time-dependent ROC curves and time-dependent AUC from
censored data using Kaplan-Meier or Akritas's nearest neighbour estimation method
(cumulative sensitivity and dynamic specificity).
-
risksetROC
implements time-dependent ROC curves,
AUC and integrated AUC of Heagerty and Zheng (Biometrics, 2005).
-
Various time-dependent true/false positive rates and
Cumulative/Dynamic AUC are implemented in the
survAUC
package.
-
The
survcomp
package provides several functions to
assess and compare the performance of survival models.
-
C-statistics for risk prediction models with censored survival
data can be computed via the
survC1
package.
Miscellaneous
-
dynpred
is the companion package to "Dynamic Prediction
in Clinical Survival Analysis".
-
Inverse probability weights can be estimated using the
ipw
package.
These can be used to fit marginal structural models to estimate causal effects
from observational data.
-
Package
boot
proposes the
censboot
function that
implements several types of bootstrap techniques for right-censored data.
-
The
powerSurvEpi
package provides power and sample size
calculation
for survival analysis (with a focus on epidemiological studies).
-
The
PermAlgo
package permits the user to simulate complex survival data,
in which event and censoring times could be conditional on
a user-specified list of (possibly time-dependent) covariates.
-
The
survJamda
package provides functions for performing meta-analyses
of gene expression data and for predicting patients' survival and assessing risk.
-
ipdmeta
provides tools for individual patient data meta-analysis, mixed-level meta-analysis with patient
level data and multivariate survival estimates for aggregate studies.
-
The
KMsurv
package includes the data sets from Klein and Moeschberger (1997).
Some supplementary data sets and functions can be found in the
OIsurv
package.
The packages
SMIR,
which accompanies Aitkin et al. (2009),
SMPracticals,
which accompanies Davison (2003),
and
DAAG,
which accompanies Maindonald and Braun
(2003, 2007), also contain survival data sets.
-
The
rtv
package proposes convenience functions for
representing, manipulating and visualising time data.
-
The
logconcens
package computes the MLE of a log-concave density,
possibly for interval-censored data.