simrec

Simulation of recurrent event data for non-constant baseline hazard (total-time model)

This function simulates recurrent event data following the multiplicative intensity model of Andersen and Gill (1982), with the baseline hazard defined as a function of total (calendar) time. To induce between-subject heterogeneity, a random-effect covariate (frailty term) can be incorporated. Data for individual $i$ are generated according to the intensity process $$Y_i(t)\,\lambda_0(t)\,Z_i\exp(\beta^\top X_i),$$ where $X_i$ is the covariate vector and $\beta$ the vector of regression coefficients. $\lambda_0(t)$ denotes the baseline hazard as a function of total time $t$, and $Y_i(t)$ is the predictable process that equals one as long as individual $i$ is under observation and at risk of experiencing events. $Z_i$ denotes the frailty variable, where the $Z_i$ are i.i.d. with $E(Z_i)=1$ and $Var(Z_i)=\theta$. The parameter $\theta$ describes the degree of between-subject heterogeneity. Data are returned in the counting process format.
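As a small illustration of the intensity above, the following base R sketch evaluates $Y_i(t)\lambda_0(t)Z_i\exp(\beta^\top X_i)$ for one individual. The helper `intensity_i` and its arguments are hypothetical and not part of simrec; the at-risk indicator $Y_i(t)$ is passed as a function of $t$.

```r
# Hypothetical helper (not part of simrec): intensity of individual i at total time t,
# Y_i(t) * lambda_0(t) * Z_i * exp(beta' X_i)
intensity_i <- function(t, at_risk, lambda0, z, beta, x) {
  # at_risk(t) returns 1 while the individual is under observation and at risk, else 0
  at_risk(t) * lambda0(t) * z * exp(sum(beta * x))
}

# Example: Weibull baseline hazard with lambda = 0.5 and nu = 1.5
lambda0 <- function(t) 0.5 * 1.5 * t^(1.5 - 1)
# Individual followed up until total time t = 2
at_risk <- function(t) as.numeric(t <= 2)

intensity_i(1, at_risk, lambda0, z = 1.2, beta = c(0.3, -0.2), x = c(1, 0.5))
intensity_i(3, at_risk, lambda0, z = 1.2, beta = c(0.3, -0.2), x = c(1, 0.5))  # 0: no longer at risk
```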

simrec(
  N,
  fu.min,
  fu.max,
  cens.prob = 0,
  dist.x = "binomial",
  par.x = 0,
  beta.x = 0,
  dist.z = "gamma",
  par.z = 0,
  dist.rec,
  par.rec,
  pfree = 0,
  dfree = 0
)

Arguments

N: Number of individuals.
fu.min: Minimum length of follow-up.
fu.max: Maximum length of follow-up. Each individual's length of follow-up is generated from a uniform distribution on [fu.min, fu.max]. If fu.min=fu.max, all individuals have a common follow-up length.
cens.prob: Probability of being censored due to loss to follow-up before fu.max. A random set of individuals, whose size follows a B(N, cens.prob) distribution, is lost to follow-up; for these individuals the time to censoring is generated from a uniform distribution on [0, fu.max]. Default is cens.prob=0, i.e. no censoring due to loss to follow-up.
dist.x: Distribution of the covariate(s) $X$. If there is more than one covariate, dist.x must be a vector of distributions with one entry for each covariate. Possible values are "binomial" and "normal", default is dist.x="binomial".
par.x: Parameters of the covariate distribution(s). For "binomial", par.x is the probability that $x=1$. For "normal", par.x=c($\mu, \sigma$), where $\mu$ is the mean and $\sigma$ the standard deviation of the normal distribution. If one of the covariates is normally distributed, par.x must be a list, e.g. dist.x <- c("binomial", "normal") and par.x <- list(0.5, c(1,2)). Default is par.x=0, i.e. $x=0$ for all individuals.
beta.x: Regression coefficient(s) for the covariate(s) $X$. If there is more than one covariate, beta.x must be a vector of coefficients with one entry for each covariate; simrec generates as many covariates as there are entries in beta.x. Default is beta.x=0, corresponding to no effect of the covariate $X$.
dist.z: Distribution of the frailty variable $Z$ with $E(Z)=1$ and $Var(Z)=\theta$. Possible values are "gamma" for a Gamma distributed frailty and "lognormal" for a lognormal distributed frailty. Default is dist.z="gamma".
par.z: Parameter $\theta$ for the frailty distribution: this parameter gives the variance of the frailty variable $Z$. Default is par.z=0, which causes $Z=1$, i.e. no frailty effect.
dist.rec: Form of the baseline hazard function. Possible values are "weibull", "gompertz", "lognormal" or "step".
par.rec: Parameters of the baseline hazard function. If dist.rec="weibull", the hazard function is $$\lambda_0(t)=\lambda\nu t^{\nu-1},$$ where $\lambda>0$ is the scale and $\nu>0$ the shape parameter; then par.rec=c($\lambda, \nu$). The exponential distribution (constant hazard $\lambda$) is the special case $\nu=1$.
If dist.rec="gompertz", the hazard function is $$\lambda_0(t)=\lambda\exp(\alpha t),$$ where $\lambda>0$ is the scale and $\alpha\in(-\infty,+\infty)$ the shape parameter; then par.rec=c($\lambda, \alpha$).
If dist.rec="lognormal", the hazard function is $$\lambda_0(t)=\frac{\frac{1}{\sigma t}\,\phi\left(\frac{\ln(t)-\mu}{\sigma}\right)}{\Phi\left(\frac{\mu-\ln(t)}{\sigma}\right)},$$ where $\phi$ is the probability density function and $\Phi$ the cumulative distribution function of the standard normal distribution, $\mu\in(-\infty,+\infty)$ is a location parameter and $\sigma>0$ a shape parameter; then par.rec=c($\mu,\sigma$). Note that specifying dist.rec="lognormal" together with covariates does not define the usual lognormal model (in which covariates act on the parameters of the lognormal distribution, resulting in non-proportional hazards); it only defines the baseline hazard, and covariate effects are incorporated under the proportional hazards assumption.
If dist.rec="step", the hazard function is $$\lambda_0(t)=\begin{cases} a, & t\le t_1,\\ b, & t>t_1,\end{cases}$$ and par.rec=c($a,b,t_1$).
pfree: Probability that, after experiencing an event, the individual is not at risk of further events for dfree time units. Default is pfree=0.
dfree: Length of the risk-free interval, in the same time unit as fu.max. Default is dfree=0, i.e. the individual is continuously at risk of experiencing events until the end of follow-up.
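The arguments fu.min, fu.max, cens.prob, dist.x, par.x, dist.z and par.z describe how the subject-level quantities are drawn. The base R sketch below mirrors those descriptions; the helper names are hypothetical, and it is an assumption that the follow-up of an individual lost to follow-up is the minimum of the uniform follow-up length and the censoring time. The gamma and lognormal parametrizations shown are one way to obtain $E(Z)=1$ and $Var(Z)=\theta$, not necessarily the ones used inside simrec.

```r
# Hypothetical helpers sketching the argument semantics (not simrec's source code)

# Follow-up: uniform on [fu.min, fu.max]; a random subset is lost to follow-up
# at a time uniform on [0, fu.max] (assumed: effective follow-up is the minimum)
gen_followup <- function(N, fu.min, fu.max, cens.prob = 0) {
  fu.admin  <- runif(N, fu.min, fu.max)
  lost      <- rbinom(N, 1, cens.prob) == 1
  cens.time <- ifelse(lost, runif(N, 0, fu.max), Inf)
  pmin(fu.admin, cens.time)
}

# Covariates: one column per entry of dist.x
gen_covariates <- function(N, dist.x = "binomial", par.x = 0) {
  if (!is.list(par.x)) {
    # a single covariate keeps its parameter vector; several binomial ones get one entry each
    par.x <- if (length(dist.x) == 1) list(par.x) else as.list(par.x)
  }
  x <- vapply(seq_along(dist.x), function(j) {
    p <- par.x[[j]]
    switch(dist.x[j],
           binomial = as.numeric(rbinom(N, 1, p)),  # P(x = 1) = p
           normal   = rnorm(N, mean = p[1], sd = p[2]),
           stop("unsupported covariate distribution: ", dist.x[j]))
  }, numeric(N))
  matrix(x, nrow = N)
}

# Frailty with E(Z) = 1 and Var(Z) = par.z
gen_frailty <- function(N, dist.z = "gamma", par.z = 0) {
  if (par.z == 0) return(rep(1, N))  # no frailty effect
  switch(dist.z,
         gamma     = rgamma(N, shape = 1 / par.z, scale = par.z),
         lognormal = {
           s2 <- log(1 + par.z)        # variance of log(Z)
           rlnorm(N, meanlog = -s2 / 2, sdlog = sqrt(s2))
         },
         stop("unsupported frailty distribution: ", dist.z))
}

set.seed(1)
fu  <- gen_followup(N = 100, fu.min = 1, fu.max = 3, cens.prob = 0.2)
x   <- gen_covariates(100, dist.x = c("binomial", "normal"), par.x = list(0.5, c(1, 2)))
z   <- gen_frailty(100, dist.z = "gamma", par.z = 0.5)
eta <- drop(x %*% c(0.4, -0.1))      # linear predictor beta' X_i
```

For the gamma frailty, shape $1/\theta$ and scale $\theta$ give mean 1 and variance $\theta$; for the lognormal frailty, $\sigma^2=\ln(1+\theta)$ and $\mu=-\sigma^2/2$ give the same two moments.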
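The four baseline hazards described under par.rec can be collected in a single base R function. This is a hypothetical helper for illustration, not an exported simrec function; the lognormal branch divides the density by the survival function $\Phi((\mu-\ln t)/\sigma)=1-\Phi((\ln t-\mu)/\sigma)$.

```r
# Hypothetical helper: baseline hazard lambda_0(t) for the four dist.rec options
baseline_hazard <- function(t, dist.rec, par.rec) {
  switch(dist.rec,
         # lambda * nu * t^(nu - 1); par.rec = c(lambda, nu)
         weibull   = par.rec[1] * par.rec[2] * t^(par.rec[2] - 1),
         # lambda * exp(alpha * t); par.rec = c(lambda, alpha)
         gompertz  = par.rec[1] * exp(par.rec[2] * t),
         # lognormal density divided by the lognormal survival function; par.rec = c(mu, sigma)
         lognormal = {
           mu <- par.rec[1]; sigma <- par.rec[2]
           (dnorm((log(t) - mu) / sigma) / (sigma * t)) / pnorm((mu - log(t)) / sigma)
         },
         # a for t <= t1, b for t > t1; par.rec = c(a, b, t1)
         step      = ifelse(t <= par.rec[3], par.rec[1], par.rec[2]),
         stop("unsupported dist.rec: ", dist.rec))
}

t <- c(0.5, 1, 2, 4)
baseline_hazard(t, "weibull", c(0.3, 1))   # constant 0.3 (exponential case)
baseline_hazard(t, "step", c(0.5, 1, 2))   # 0.5 0.5 0.5 1
```

As a cross-check, the lognormal branch agrees with dlnorm(t, mu, sigma) / plnorm(t, mu, sigma, lower.tail = FALSE) from base R.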

Value

The output is a data.frame consisting of the columns:

id: An integer number for identification of each individual
x: or x.V1, x.V2, ... - depending on the covariate matrix. Contains the randomly generated value of the covariate(s) $X$ for each individual.
z: Contains the randomly generated value of the frailty variable $Z$ for each individual.
start: The start of the interval [start, stop], i.e. the time at which the individual is (again) at risk for the next event.
stop: The time of an event or censoring, i.e. the end of the interval [start, stop].
status: An indicator of whether an event occurred at time stop (status=1) or the individual is censored at time stop (status=0).
fu: Length of follow-up period [0,fu] for each individual.

For each individual there are as many rows as events experienced, plus one row if the individual is censored. The data format corresponds to the counting process format.
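To make the layout concrete, the sketch below turns the event times of one individual into counting-process rows with the columns described above (covariates and frailty omitted). The helper `to_counting_process` is hypothetical and assumes no risk-free intervals, so each interval starts at the previous event time.

```r
# Hypothetical helper: counting-process rows for one individual
# (assumes no risk-free intervals: each interval starts at the previous event)
to_counting_process <- function(id, event.times, fu) {
  event.times <- sort(event.times[event.times < fu])  # only events during follow-up
  data.frame(id     = id,
             start  = c(0, event.times),
             stop   = c(event.times, fu),
             status = c(rep(1L, length(event.times)), 0L),  # last row: censored at fu
             fu     = fu)
}

# Two events at t = 0.4 and t = 1.3, follow-up until t = 2: three rows
to_counting_process(id = 1, event.times = c(1.3, 0.4), fu = 2)
```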

Details

Simulation of recurrent event data for non-constant baseline hazard in the total time model, with risk-free intervals and possibly a competing event. The simrec package also provides functions to cut the data to an interim data set and to plot the simulated data.

Data are simulated by extending the methods proposed by Bender et al. (2005) to the multiplicative intensity model.
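The following base R sketch illustrates the inversion idea in the total-time model for a Weibull baseline hazard with cumulative hazard $\Lambda_0(t)=\lambda t^{\nu}$: if an individual is at risk from time $s$, the next event time is $\Lambda_0^{-1}\{\Lambda_0(s)+E/(Z\exp(\eta))\}$, with $E$ standard exponential and $\eta=\beta^\top X$. The function name and its handling of risk-free intervals (pfree, dfree) are assumptions for illustration, not the simrec implementation.

```r
# Hypothetical sketch (not simrec's code): events of one individual, total-time Weibull model
sim_individual_weibull <- function(id, fu, lambda, nu, z = 1, eta = 0,
                                   pfree = 0, dfree = 0) {
  Lambda0     <- function(t) lambda * t^nu          # cumulative baseline hazard
  Lambda0.inv <- function(u) (u / lambda)^(1 / nu)  # its inverse
  start <- stop <- status <- numeric(0)
  s <- 0                                            # time from which the individual is at risk
  repeat {
    e <- rexp(1)                                    # standard exponential variate
    t <- Lambda0.inv(Lambda0(s) + e / (z * exp(eta)))
    if (t >= fu) {                                  # no further event before end of follow-up
      if (s < fu) {                                 # censoring row only if still at risk
        start <- c(start, s); stop <- c(stop, fu); status <- c(status, 0)
      }
      break
    }
    start <- c(start, s); stop <- c(stop, t); status <- c(status, 1)
    # with probability pfree, the individual is not at risk for dfree time units
    s <- if (runif(1) < pfree) t + dfree else t
  }
  data.frame(id = id, start = start, stop = stop, status = status, fu = fu)
}

set.seed(1)
sim_individual_weibull(1, fu = 5, lambda = 0.5, nu = 1.2, z = 1.3, eta = 0.2,
                       pfree = 0.5, dfree = 0.3)
```

With pfree=0 the events form a Poisson process with intensity $\lambda_0(t)Z\exp(\eta)$, so the expected number of events in [0, fu] is $\Lambda_0(\mathrm{fu})\,Z\exp(\eta)$, which gives a simple simulation check of the sketch.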

References

Andersen P, Gill R (1982): Cox's regression model for counting processes: a large sample study. The Annals of Statistics 10:1100-1120
Bender R, Augustin T, Blettner M (2005): Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine 24:1713-1723
Jahn-Eimermacher A, Ingel K, Ozga AK, Preussler S, Binder H (2015): Simulating recurrent event data with hazard functions defined on a total time scale. BMC Medical Research Methodology 15:16

Author

Katharina Ingel, Stella Preussler, Antje Jahn-Eimermacher, Federico Marini

Maintainer: Antje Jahn-Eimermacher jahna@uni-mainz.de

Katharina Ingel, Stella Preussler, Antje Jahn-Eimermacher. Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, Germany

Examples