The line list is simulated using a branching process and parameterised with epidemiological parameters.
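To give a feel for what a branching process does, the following is a toy sketch in base R. It is not the implementation used by the package: the function name `simulate_outbreak_size()` and its arguments are hypothetical, and it only tracks the number of cases per generation, not dates or individual attributes. It assumes each case draws a Poisson number of contacts and that each contact is infected independently with a fixed probability.

```r
# Toy branching process (illustrative only, NOT the package's internals).
# Each case draws a number of contacts, and each contact becomes a new
# case with probability `prob_infection`.
simulate_outbreak_size <- function(mean_contacts = 2,
                                   prob_infection = 0.5,
                                   max_cases = 1e4) {
  total_cases <- 1L  # the index case
  active_cases <- 1L # cases in the current generation
  while (active_cases > 0L && total_cases < max_cases) {
    # number of contacts made by each case in this generation
    contacts <- stats::rpois(n = active_cases, lambda = mean_contacts)
    # number of those contacts that become infected
    new_cases <- sum(
      stats::rbinom(n = active_cases, size = contacts, prob = prob_infection)
    )
    total_cases <- total_cases + new_cases
    active_cases <- new_cases
  }
  # the last generation may overshoot max_cases slightly
  total_cases
}

set.seed(123)
sizes <- replicate(100, simulate_outbreak_size())
summary(sizes)
```

With these assumed values each case infects one other case on average, so many simulated outbreaks die out after a handful of cases while a few grow large. This is one reason a minimum outbreak size is useful when simulating a line list.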
Usage
sim_linelist(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  onset_to_hosp = function(x) stats::rlnorm(n = x, meanlog = 1.5, sdlog = 0.5),
  onset_to_death = function(x) stats::rlnorm(n = x, meanlog = 2.5, sdlog = 0.5),
  onset_to_recovery = NULL,
  hosp_risk = 0.2,
  hosp_death_risk = 0.5,
  non_hosp_death_risk = 0.05,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
  config = create_config()
)
Arguments
- contact_distribution
A function or an <epiparameter> object to generate the number of contacts per infection.
The function can be defined or anonymous. The function must have a single argument, an integer vector whose elements represent numbers of contacts, and must return a numeric vector in which each element is the probability of observing the corresponding number of contacts. Because contact counts start at zero, the index of the returned vector is offset by one, i.e. the first element of the output vector is the probability of observing zero contacts, the second element is the probability of observing one contact, etc.
An <epiparameter> can be provided. This will be converted into a probability mass function internally.
The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\(\lambda\)) of 2 contacts per infection.
- infectious_period
A function or an <epiparameter> object for the infectious period. This defines the duration from becoming infectious to no longer being infectious. In the simulation, individuals are assumed to become infectious immediately after being infected (the latency period is assumed to be zero). The time intervals between an infected individual and their contacts are assumed to be uniformly distributed within the infectious period. Infectious periods must be strictly positive.
The function can be defined or anonymous. The function must have a single argument, the number of random infectious periods to generate, and must return a vector of randomly generated real numbers representing sampled infectious periods.
An <epiparameter> can be provided. This will be converted into a random number generator internally.
The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.
- prob_infection
A single numeric for the probability of a secondary contact being infected by an infected primary contact.
- onset_to_hosp
A function or an <epiparameter> object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.
The function can be defined or anonymous. The function must have a single argument, the number of delays to sample, and must return a vector of numerics for the length of the onset-to-hospitalisation delay.
An <epiparameter> can be provided. This will be converted into a random number generator internally.
The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.
If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.
- onset_to_death
A function or an <epiparameter> object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.
The function can be defined or anonymous. The function must have a single argument, the number of delays to sample, and must return a vector of numerics for the length of the onset-to-death delay.
An <epiparameter> can be provided. This will be converted into a random number generator internally.
The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.
If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.
- onset_to_recovery
A function or an <epiparameter> object for the onset-to-recovery delay distribution. onset_to_recovery can also be NULL to not simulate dates for individuals that recovered.
The function can be defined or anonymous. The function must have a single argument, the number of delays to sample, and must return a vector of numerics for the length of the onset-to-recovery delay.
An <epiparameter> can be provided. This will be converted into a random number generator internally.
The default is NULL, so by default cases that recover get an NA in the $date_outcome line list column.
- hosp_risk
Either a single numeric for the hospitalisation risk of everyone in the population, or a <data.frame> with age-specific hospitalisation risks. Default is 20% hospitalisation (0.2) for the entire population. If the onset_to_hosp argument is set to NULL, this argument will automatically be set to NULL if not specified, or it can be manually set to NULL. See details and examples for more information.
- hosp_death_risk
Either a single numeric for the death risk for hospitalised individuals across the population, or a <data.frame> with age-specific hospitalised death risks. Default is 50% death risk in hospitals (0.5) for the entire population. If the onset_to_death argument is set to NULL, this argument will automatically be set to NULL if not specified, or it can be manually set to NULL. See details and examples for more information. The hosp_death_risk can vary through time if specified in the time_varying_death_risk element of config, see vignette("time-varying-cfr", package = "simulist") for more information.
- non_hosp_death_risk
Either a single numeric for the death risk outside of hospitals across the population, or a <data.frame> with age-specific death risks outside of hospitals. Default is 5% death risk outside of hospitals (0.05) for the entire population. If the onset_to_death argument is set to NULL, this argument will automatically be set to NULL if not specified, or it can be manually set to NULL. See details and examples for more information. The non_hosp_death_risk can vary through time if specified in the time_varying_death_risk element of config, see vignette("time-varying-cfr", package = "simulist") for more information.
- outbreak_start_date
A date for the start of the outbreak.
- anonymise
A logical for whether case names should be anonymised. Default is FALSE.
- outbreak_size
A numeric vector of length 2 defining the minimum and the maximum number of infected individuals for the simulated outbreak. Default is c(10, 1e4), so the minimum outbreak size is 10 infected individuals, and the maximum outbreak size is 10,000 infected individuals. Either number can be changed to simulate larger or smaller outbreaks. If the minimum outbreak size cannot be reached after running the simulation for many iterations (internally), the function errors. If the maximum outbreak size is exceeded, the function returns the data early with a warning stating how many cases and contacts are returned.
- population_age
Either a numeric vector with two elements or a <data.frame> with the age structure of the population. Use a numeric vector to specify the age range of the population: the first element is the lower bound of the age range and the second is the upper bound (both inclusive, i.e. [lower, upper]). Use a <data.frame> to specify age groups and the proportion of the population in each group. See details and examples for more information.
- case_type_probs
A named numeric vector with the probability of each case type. The names of the vector must be "suspected", "probable", "confirmed". Values of each case type must sum to one.
- config
A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.
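The function-valued arguments described above can be sketched with base R as follows. The distributions and parameter values here are illustrative assumptions, not recommendations: the contact distribution is a probability mass function evaluated at contact counts, while the delay and infectious period arguments are random number generators whose single argument is the number of samples to draw.

```r
# Contact distribution: a probability mass function over contact counts.
# A negative binomial (assumed values) allows for overdispersed contacts.
contact_distribution <- function(x) stats::dnbinom(x = x, mu = 2, size = 0.5)

# First element is P(0 contacts), second is P(1 contact), and so on
contact_distribution(0:2)

# Infectious period and delays: random number generators whose single
# argument is the number of values to sample (assumed distributions)
infectious_period <- function(n) stats::rgamma(n = n, shape = 2, scale = 2)
onset_to_hosp <- function(n) stats::rweibull(n = n, shape = 2, scale = 5)

# Sanity checks before passing the functions to a simulation:
# probabilities are non-negative and sum to (approximately) one ...
probs <- contact_distribution(0:1000)
stopifnot(all(probs >= 0), abs(sum(probs) - 1) < 1e-6)
# ... and sampled infectious periods are strictly positive
stopifnot(all(infectious_period(100) > 0))
```

Functions written this way could then be supplied as contact_distribution, infectious_period and onset_to_hosp in place of the defaults.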
Details
For age-stratified hospitalisation and death risks, a <data.frame> will need to be passed to the hosp_risk, hosp_death_risk and/or non_hosp_death_risk arguments. This <data.frame> should have two columns:
- age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive).
- risk: a column with one numeric per cell for the proportion (or probability) of hospitalisation, or of death for the death risk arguments, for that age group. Should be between 0 and 1.
For an age-structured population, a <data.frame> with two columns should be passed to the population_age argument:
- age_range: a column with characters specifying the lower and upper bound of that age group, separated by a hyphen (-). Both bounds are inclusive (integers). For example, an age group of one to ten would be given as "1-10".
- proportion: a column with the proportion of the population that are in that age group. Proportions must sum to one.
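As a sketch of the two <data.frame> layouts described above, the following base R code builds one of each and checks the stated constraints. The age groups and values are illustrative assumptions. The helper `lookup_risk()` is hypothetical and only shows how an inclusive lower bound maps an age to a risk. It is not how the package performs the lookup internally.

```r
# Age-stratified risk: lower bound of each age group and its risk
age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)
stopifnot(all(age_dep_hosp_risk$risk >= 0 & age_dep_hosp_risk$risk <= 1))

# Age-structured population: inclusive "lower-upper" ranges and proportions
age_struct <- data.frame(
  age_range = c("1-4", "5-79", "80-90"),
  proportion = c(0.1, 0.7, 0.2),
  stringsAsFactors = FALSE
)
stopifnot(isTRUE(all.equal(sum(age_struct$proportion), 1)))

# Hypothetical helper: find the risk for given ages, treating each
# age_limit as the inclusive lower bound of its group
lookup_risk <- function(age, risk_df) {
  stopifnot(all(age >= min(risk_df$age_limit)))
  risk_df$risk[findInterval(age, risk_df$age_limit)]
}

# ages 3, 5 and 85 fall in the first, second and third group respectively
lookup_risk(c(3, 5, 85), age_dep_hosp_risk)
```

Note that age 5 falls in the second group because the lower bound is inclusive.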
Examples
# quickly simulate a line list using the function defaults
linelist <- sim_linelist()
head(linelist)
#>   id          case_name case_type sex age date_onset date_admission   outcome
#> 1  1       Kelly Wesney  probable   f  45 2023-01-01           <NA> recovered
#> 2  2       Milton Brown confirmed   m  84 2023-01-01           <NA> recovered
#> 3  7      David Rodgers  probable   m  47 2023-01-02     2023-01-04 recovered
#> 4  8    Rashaad el-Ozer  probable   m  34 2023-01-04           <NA> recovered
#> 5  9 Layaali el-Mahfouz confirmed   f  71 2023-01-03           <NA> recovered
#> 6 10    Aaron Goldstein  probable   m  90 2023-01-02           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2023-01-03        2023-01-06     22.5
#> 3         <NA>         2022-12-28        2023-01-02       NA
#> 4         <NA>         2023-01-04        2023-01-07       NA
#> 5         <NA>         2023-01-01        2023-01-03     22.9
#> 6         <NA>         2023-01-01        2023-01-05       NA
# to simulate a more realistic line list load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)
#> Citation cannot be created as author, year, journal or title is missing
infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)
#> Citation cannot be created as author, year, journal or title is missing
# get onset to hospital admission from {epiparameter} database
onset_to_hosp <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). “Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data.” _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>..
#> To retrieve the citation use the 'get_citation' function
# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). “Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data.” _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>..
#> To retrieve the citation use the 'get_citation' function
# example with single hospitalisation risk for entire population
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = 0.5
)
head(linelist)
#>   id           case_name case_type sex age date_onset date_admission   outcome
#> 1  1      Henry Sumearll confirmed   m  39 2023-01-01           <NA> recovered
#> 2  2           Warren Le confirmed   m  44 2023-01-01     2023-01-01      died
#> 3  3      Iffat el-Qadir  probable   f  32 2023-01-01     2023-01-10 recovered
#> 4  7    Simone Patterson confirmed   f  67 2023-01-02     2023-01-03 recovered
#> 5  8 Lutfiyya al-Khalili suspected   f  59 2023-01-02     2023-01-06      died
#> 6  9   Fakeeha el-Hashim  probable   f  49 2023-01-02           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>     26.5
#> 2   2023-01-12         2023-01-01        2023-01-06     20.6
#> 3         <NA>         2022-12-29        2023-01-01       NA
#> 4         <NA>         2022-12-29        2023-01-05     26.4
#> 5   2023-01-26         2023-01-02        2023-01-06       NA
#> 6         <NA>         2023-01-02        2023-01-04       NA
# example with age-stratified hospitalisation risk
# 20% for over 80s
# 10% for under 5s
# 5% for the rest
age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk
)
head(linelist)
#>   id       case_name case_type sex age date_onset date_admission   outcome
#> 1  1    Elizabeth Ko suspected   f  15 2023-01-01           <NA> recovered
#> 2  2   Dekota Rector confirmed   m  56 2023-01-01           <NA> recovered
#> 3  3   Nicklos Rough suspected   m  49 2023-01-01           <NA> recovered
#> 4  4    Quartus Sufi confirmed   m  10 2023-01-03           <NA> recovered
#> 5  5 William Madison  probable   m  15 2023-01-01           <NA> recovered
#> 6  7   Seth Cretecos suspected   m  37 2023-01-03           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2023-01-02        2023-01-05     22.1
#> 3         <NA>         2022-12-30        2023-01-02       NA
#> 4         <NA>         2022-12-28        2023-01-01     23.8
#> 5         <NA>         2023-01-03        2023-01-05       NA
#> 6         <NA>         2023-01-04        2023-01-07       NA
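The examples above do not exercise outbreak_size, anonymise or case_type_probs. The following is a minimal sketch of how those arguments might be set, with illustrative values chosen as assumptions. The call to sim_linelist() only runs if the {simulist} package is installed, and no output is shown because the result is random.

```r
# Case type probabilities must be named and sum to one
case_type_probs <- c(suspected = 0.05, probable = 0.05, confirmed = 0.9)
stopifnot(
  setequal(names(case_type_probs), c("suspected", "probable", "confirmed")),
  isTRUE(all.equal(sum(case_type_probs), 1))
)

# Minimum and maximum number of infected individuals (assumed values)
outbreak_size <- c(20, 500)

if (requireNamespace("simulist", quietly = TRUE)) {
  # a warning is expected if the maximum outbreak size is exceeded
  linelist <- simulist::sim_linelist(
    outbreak_size = outbreak_size,
    anonymise = TRUE,
    case_type_probs = case_type_probs
  )
  head(linelist)
}
```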