Simulation of recurrent event data for non-constant baseline hazard (total-time model)
This function simulates recurrent event data following the multiplicative intensity model described in Andersen and Gill (1982), with the baseline hazard being a function of the total (calendar) time. To induce between-subject heterogeneity, a random effect covariate (frailty term) can be incorporated. Data for individual \(i\) are generated according to the intensity process $$Y_i(t)\,\lambda_0(t)\,Z_i\,\exp(\beta^\top X_i),$$ where \(X_i\) denotes the covariate vector and \(\beta\) the regression coefficient vector. \(\lambda_0(t)\) denotes the baseline hazard, a function of the total/calendar time \(t\), and \(Y_i(t)\) is the predictable process that equals one as long as individual \(i\) is under observation and at risk for experiencing events. \(Z_i\) denotes the frailty variable, where the \(Z_i\) are i.i.d. with \(E(Z_i)=1\) and \(Var(Z_i)=\theta\). The parameter \(\theta\) describes the degree of between-subject heterogeneity. Data are output in the counting process format.
simrec(
N,
fu.min,
fu.max,
cens.prob = 0,
dist.x = "binomial",
par.x = 0,
beta.x = 0,
dist.z = "gamma",
par.z = 0,
dist.rec,
par.rec,
pfree = 0,
dfree = 0
)
N: Number of individuals.
fu.min: Minimum length of follow-up.
fu.max: Maximum length of follow-up. Each individual's length of follow-up is generated from a uniform distribution on [fu.min, fu.max]. If fu.min = fu.max, all individuals have a common follow-up.
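The follow-up mechanism can be sketched in a few lines of base R. This is an illustration assuming follow-up is drawn exactly as described, not simrec's internal code; the values of N, fu.min and fu.max are made up.

```r
# Sketch: draw each individual's follow-up length uniformly on [fu.min, fu.max].
# Assumption: mirrors the description above; simrec's internal code may differ.
set.seed(1)
N <- 5
fu.min <- 1
fu.max <- 3
fu <- runif(N, min = fu.min, max = fu.max)
print(fu)

# With fu.min == fu.max, runif() returns that value, i.e. a common follow-up.
fu.common <- runif(N, min = 2, max = 2)
print(fu.common)  # all equal to 2
```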
cens.prob: Probability of being censored due to loss to follow-up before fu.max. For a random set of individuals, whose size follows a B(N, cens.prob) distribution, the time to censoring is generated from a uniform distribution on [0, fu.max]. Default is cens.prob = 0, i.e. no censoring due to loss to follow-up.
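A minimal sketch of the loss-to-follow-up mechanism, assuming each individual is censored independently with probability cens.prob (so the number censored is B(N, cens.prob)) and that observation ends at the earlier of the follow-up length and the censoring time. Variable names are illustrative, not simrec internals.

```r
# Sketch of the censoring mechanism described above (assumed logic,
# not the package's own code).
set.seed(2)
N <- 1000
fu.max <- 2
cens.prob <- 0.2
fu <- rep(fu.max, N)                                     # common follow-up for simplicity
censored <- rbinom(N, size = 1, prob = cens.prob) == 1   # who is lost to follow-up
cens.time <- runif(N, min = 0, max = fu.max)             # potential censoring times

# Observation ends at the earlier of end of follow-up and censoring time
fu.obs <- ifelse(censored, pmin(fu, cens.time), fu)
mean(censored)  # close to cens.prob
```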
dist.x: Distribution of the covariate(s) \(X\). If there is more than one covariate, dist.x must be a vector of distributions with one entry for each covariate. Possible values are "binomial" and "normal"; default is dist.x = "binomial".
par.x: Parameters of the covariate distribution(s). For "binomial", par.x is the probability for \(x=1\). For "normal", par.x = c(\(\mu, \sigma\)), where \(\mu\) is the mean and \(\sigma\) is the standard deviation of a normal distribution. If one of the covariates is defined to be normally distributed, par.x must be a list, e.g. dist.x <- c("binomial", "normal") and par.x <- list(0.5, c(1, 2)). Default is par.x = 0, i.e. \(x=0\) for all individuals.
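The pairing of dist.x and par.x can be illustrated with a hypothetical helper, gen_covariates(), which is not part of simrec; it only shows how each list entry of par.x parametrises the corresponding entry of dist.x.

```r
# Hypothetical helper (not part of simrec): draw covariates from dist.x / par.x.
gen_covariates <- function(N, dist.x, par.x) {
  # A single, non-list parameter vector belongs to the single covariate
  if (!is.list(par.x)) {
    par.x <- if (length(dist.x) == 1) list(par.x) else as.list(par.x)
  }
  X <- matrix(0, nrow = N, ncol = length(dist.x))
  for (j in seq_along(dist.x)) {
    p <- par.x[[j]]
    X[, j] <- switch(dist.x[j],
      binomial = rbinom(N, size = 1, prob = p),        # P(x = 1) = p
      normal   = rnorm(N, mean = p[1], sd = p[2]),     # p = c(mu, sigma)
      stop("unsupported distribution: ", dist.x[j])
    )
  }
  X
}

set.seed(3)
X <- gen_covariates(1000, c("binomial", "normal"), list(0.5, c(1, 2)))
colMeans(X)  # roughly 0.5 and 1
```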
beta.x: Regression coefficient(s) for the covariate(s) \(X\). If there is more than one covariate, beta.x must be a vector of coefficients with one entry for each covariate. simrec generates as many covariates as there are entries in beta.x. Default is beta.x = 0, corresponding to no effect of the covariate \(X\).
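In the intensity model, beta.x acts by multiplying each individual's baseline hazard by \(\exp(\beta^\top X_i)\). The sketch below computes this factor for made-up covariate values.

```r
# Sketch: proportional-hazards factor exp(beta^T X_i) per individual.
X <- cbind(c(0, 1, 1), c(-0.5, 0, 2))   # 3 individuals, 2 covariates (made-up values)
beta.x <- c(0.3, 0.2)
lin.pred <- drop(X %*% beta.x)          # beta^T X_i
hr <- exp(lin.pred)                     # multiplicative effect on lambda_0(t)
lin.pred  # -0.1 0.3 0.7
```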
dist.z: Distribution of the frailty variable \(Z\) with \(E(Z)=1\) and \(Var(Z)=\theta\). Possible values are "gamma" for a gamma-distributed frailty and "lognormal" for a lognormally distributed frailty. Default is dist.z = "gamma".
par.z: Parameter \(\theta\) of the frailty distribution, i.e. the variance of the frailty variable \(Z\). Default is par.z = 0, which causes \(Z=1\), i.e. no frailty effect.
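A sketch of frailty generation with mean 1 and variance \(\theta\). The parametrisations are assumptions chosen to satisfy the stated moments (gamma with shape \(1/\theta\) and scale \(\theta\); lognormal with \(\sigma^2=\ln(1+\theta)\) and \(\mu=-\sigma^2/2\)); simrec may parametrise internally differently, and gen_frailty() is a hypothetical helper.

```r
# Hypothetical helper: frailties with E(Z) = 1 and Var(Z) = theta.
gen_frailty <- function(N, dist.z = "gamma", theta = 0) {
  if (theta == 0) return(rep(1, N))            # no frailty effect
  switch(dist.z,
    # gamma: mean = shape * scale = 1, variance = shape * scale^2 = theta
    gamma = rgamma(N, shape = 1 / theta, scale = theta),
    lognormal = {
      s2 <- log(1 + theta)                     # Var(log Z) giving Var(Z) = theta
      rlnorm(N, meanlog = -s2 / 2, sdlog = sqrt(s2))   # E(Z) = exp(0) = 1
    },
    stop("unsupported frailty distribution: ", dist.z)
  )
}

set.seed(4)
z.gamma <- gen_frailty(1e5, "gamma", 0.25)
z.logn  <- gen_frailty(1e5, "lognormal", 0.25)
c(mean(z.gamma), var(z.gamma))  # roughly 1 and 0.25
c(mean(z.logn), var(z.logn))    # roughly 1 and 0.25
```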
dist.rec: Form of the baseline hazard function. Possible values are "weibull", "gompertz", "lognormal" or "step".
par.rec: Parameters for the distribution of the event data.
If dist.rec = "weibull", the hazard function is $$\lambda_0(t)=\lambda\,\nu\,t^{\nu-1},$$ where \(\lambda>0\) is the scale and \(\nu>0\) is the shape parameter. Then par.rec = c(\(\lambda, \nu\)). A special case of this is the exponential distribution for \(\nu=1\).
If dist.rec = "gompertz", the hazard function is $$\lambda_0(t)=\lambda\exp(\alpha t),$$ where \(\lambda>0\) is the scale and \(\alpha\in(-\infty,+\infty)\) is the shape parameter. Then par.rec = c(\(\lambda, \alpha\)).
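For simulation, the cumulative baseline hazard \(\Lambda_0(t)=\int_0^t\lambda_0(s)\,ds\) is the key quantity. For the Weibull and Gompertz forms it has closed forms, \(\lambda t^\nu\) and \(\frac{\lambda}{\alpha}(e^{\alpha t}-1)\) (reducing to \(\lambda t\) when \(\alpha=0\)). The sketch codes them with hypothetical helper names and checks them numerically with stats::integrate().

```r
# Hypothetical helpers: baseline hazards and cumulative hazards
haz_weibull    <- function(t, lambda, nu) lambda * nu * t^(nu - 1)
cumhaz_weibull <- function(t, lambda, nu) lambda * t^nu

haz_gompertz    <- function(t, lambda, alpha) lambda * exp(alpha * t)
cumhaz_gompertz <- function(t, lambda, alpha) {
  if (alpha == 0) lambda * t else (lambda / alpha) * (exp(alpha * t) - 1)
}

# Integrating the hazard reproduces the closed-form cumulative hazard
integrate(haz_weibull, 0, 2, lambda = 1, nu = 2)$value   # approximately 4
cumhaz_weibull(2, lambda = 1, nu = 2)                     # 4
integrate(haz_gompertz, 0, 1.5, lambda = 0.5, alpha = -0.3)$value
cumhaz_gompertz(1.5, lambda = 0.5, alpha = -0.3)
```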
If dist.rec = "lognormal", the hazard function is $$\lambda_0(t)=\frac{\frac{1}{\sigma t}\,\phi\left(\frac{\ln t-\mu}{\sigma}\right)}{\Phi\left(\frac{\mu-\ln t}{\sigma}\right)},$$ where \(\phi\) is the probability density function and \(\Phi\) is the cumulative distribution function of the standard normal distribution, \(\mu\in(-\infty,+\infty)\) is a location parameter and \(\sigma>0\) is a shape parameter. The denominator \(\Phi((\mu-\ln t)/\sigma)=1-\Phi((\ln t-\mu)/\sigma)\) is the lognormal survival function. Then par.rec = c(\(\mu, \sigma\)).
Please note that specifying dist.rec = "lognormal" together with covariates does not specify the usual lognormal model (in which covariates act on the parameters of the lognormal distribution, resulting in non-proportional hazards); it only defines the baseline hazard, and covariate effects are incorporated under the proportional hazards assumption.
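The lognormal hazard is the ratio of density to survival function. This sketch (haz_lognormal() is a hypothetical helper) codes the formula and cross-checks it against R's dlnorm() and plnorm(). Under the proportional hazards assumption, an individual's hazard is then this curve multiplied by \(Z_i\exp(\beta^\top X_i)\).

```r
# Hypothetical helper: lognormal baseline hazard = density / survival
haz_lognormal <- function(t, mu, sigma) {
  dens <- (1 / (sigma * t)) * dnorm((log(t) - mu) / sigma)
  surv <- pnorm((mu - log(t)) / sigma)   # = 1 - pnorm((log(t) - mu) / sigma)
  dens / surv
}

# Cross-check with R's built-in lognormal density and survival function
tt <- c(0.5, 1, 2, 5)
haz_lognormal(tt, mu = 0, sigma = 1)
dlnorm(tt, 0, 1) / plnorm(tt, 0, 1, lower.tail = FALSE)
```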
If dist.rec = "step", the hazard function is $$\lambda_0(t)=\begin{cases} a, & t\le t_1,\\ b, & t>t_1. \end{cases}$$ Then par.rec = c(\(a, b, t_1\)).
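The step hazard and its piecewise linear cumulative hazard can be sketched with hypothetical helper names:

```r
# Hypothetical helpers: step baseline hazard (a up to t1, b afterwards)
haz_step    <- function(t, a, b, t1) ifelse(t <= t1, a, b)
cumhaz_step <- function(t, a, b, t1) ifelse(t <= t1, a * t, a * t1 + b * (t - t1))

haz_step(c(0.5, 1, 3), a = 1, b = 0.5, t1 = 1)      # 1.0 1.0 0.5
cumhaz_step(c(0.5, 1, 3), a = 1, b = 0.5, t1 = 1)   # 0.5 1.0 2.0
```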
pfree: Probability that, after experiencing an event, the individual is not at risk for experiencing further events for a length of dfree time units. Default is pfree = 0.
dfree: Length of the risk-free interval. Must be in the same time unit as fu.max. Default is dfree = 0, i.e. the individual is continuously at risk for experiencing events until the end of follow-up.
The output is a data.frame consisting of the columns:
id: An integer number for identification of each individual.
x (or x.V1, x.V2, ..., depending on the covariate matrix): The randomly generated value of the covariate(s) \(X\) for each individual.
z: The randomly generated value of the frailty variable \(Z\) for each individual.
start: The start of the interval [start, stop], when the individual starts to be at risk for the next event.
stop: The time of an event or censoring, i.e. the end of the interval [start, stop].
status: An indicator of whether an event occurred at time stop (status = 1) or the individual is censored at time stop (status = 0).
fu: Length of the follow-up period [0, fu] for each individual.
For each individual there are as many rows as events it experiences, plus one row for the time at which it is censored. The data format corresponds to the counting process format.
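To illustrate the counting process layout, the following builds a small data.frame by hand. All values are made up, and the column names id, x, z, start, stop, status and fu are assumed from the description of the output. Without risk-free intervals each row's start equals the previous row's stop; a risk-free interval would leave a gap of dfree between them.

```r
# Hand-made illustration of the counting process format (values are made up)
cp <- data.frame(
  id     = c(1, 1, 1, 2),
  x      = c(1, 1, 1, 0),
  z      = c(0.8, 0.8, 0.8, 1.3),
  start  = c(0.0, 0.4, 1.1, 0.0),
  stop   = c(0.4, 1.1, 2.0, 1.5),
  status = c(1, 1, 0, 0),   # two events then censoring for id 1; censoring only for id 2
  fu     = c(2.0, 2.0, 2.0, 1.5)
)

# Number of events per individual
ev <- tapply(cp$status, cp$id, sum)
ev  # id 1: 2 events, id 2: 0 events
```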
The simrec package simulates recurrent event data for a non-constant baseline hazard in the total-time model, with risk-free intervals and, optionally, a competing event. It also allows cutting the data to an interim data set and provides plotting functionality.
Data are simulated by extending the methods proposed by Bender et al. (2005) to the multiplicative intensity model.
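The inversion idea can be sketched for a single individual with a Weibull baseline. Given the previous event at \(t_k\), the next event time solves \(\Lambda_0(T_{k+1})-\Lambda_0(t_k)=E/(Z\exp(\beta^\top X))\) with \(E\sim\mathrm{Exp}(1)\), so \(T_{k+1}=\Lambda_0^{-1}\big(\Lambda_0(t_k)+E/(Z\exp(\beta^\top X))\big)\). The sketch is simplified (no loss to follow-up, no risk-free intervals), and sim_one_weibull() is a hypothetical helper, not simrec's code.

```r
# Hypothetical helper: event times for ONE individual, Weibull total-time baseline,
# generated by inverting the cumulative hazard Lambda_0(t) = lambda * t^nu.
sim_one_weibull <- function(fu, lambda, nu, z = 1, lin.pred = 0) {
  cumhaz     <- function(t) lambda * t^nu
  inv_cumhaz <- function(y) (y / lambda)^(1 / nu)
  times <- numeric(0)
  t.cur <- 0
  repeat {
    e <- rexp(1)                                   # E = -log(U), U ~ Uniform(0, 1)
    t.next <- inv_cumhaz(cumhaz(t.cur) + e / (z * exp(lin.pred)))
    if (t.next > fu) break                         # next event after end of follow-up
    times <- c(times, t.next)
    t.cur <- t.next
  }
  times
}

set.seed(6)
sim_one_weibull(fu = 2, lambda = 1, nu = 2)
```

With \(\lambda=1\), \(\nu=2\) and no frailty or covariate effect, the number of events in [0, 2] is Poisson with mean \(\Lambda_0(2)=4\), which gives a quick sanity check over many replicates.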
Andersen P, Gill R (1982): Cox's regression model for counting processes: a large sample study. The Annals of Statistics 10:1100-1120
Bender R, Augustin T, Blettner M (2005): Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine 24:1713-1723
Jahn-Eimermacher A, Ingel K, Ozga AK, Preussler S, Binder H (2015): Simulating recurrent event data with hazard functions defined on a total time scale. BMC Medical Research Methodology 15:16
### Example:
library(simrec)
### A sample of 10 individuals
N <- 10
### with a binomially distributed covariate with a regression coefficient
### of beta=0.3, and a standard normally distributed covariate with a
### regression coefficient of beta=0.2,
dist.x <- c("binomial", "normal")
par.x <- list(0.5, c(0, 1))
beta.x <- c(0.3, 0.2)
### a gamma distributed frailty variable with variance 0.25
dist.z <- "gamma"
par.z <- 0.25
### and a Weibull-shaped baseline hazard with scale parameter lambda=1
### and shape parameter nu=2.
dist.rec <- "weibull"
par.rec <- c(1, 2)
### Subjects are to be followed for two years with 20% of the subjects
### being censored according to a uniformly distributed censoring time
### within [0,2] (in years).
fu.min <- 2
fu.max <- 2
cens.prob <- 0.2
### After each event a subject is not at risk for experiencing further events
### for a period of 30 days with a probability of 50%.
dfree <- 30 / 365
pfree <- 0.5
simdata <- simrec(
N, fu.min, fu.max, cens.prob, dist.x, par.x, beta.x, dist.z, par.z,
dist.rec, par.rec, pfree, dfree
)
# print(simdata) # only run for small N!