Skip to contents

If you are unfamiliar with the {simulist} package or the sim_linelist() function Get Started vignette is a great place to start.

This vignette will describe how to simulate a line list with age-stratified (or age-dependent) hospitalisation and death risks.

There are three risks in {simulist}, specifically in the sim_linelist() and sim_outbreak() functions, that relate to hospitalisations and deaths:

  1. Hospitalisation risk (hosp_risk)
  2. Death risk in hospitals (hosp_death_risk)
  3. Death risk outside of hospitals (non_hosp_death_risk)

The default for sim_linelist() is a 0.2 (or 20%) hospitalisation risk for individuals infected, 0.5 (or 50%) death risk for hospitalised individuals, and 0.05 (or 5%) death risk for infected individuals outside of hospitals. These default risks are applied to all age groups in the populations.

However, it may be unrealistic to assume that the probability an infected person is admitted to hospital or dies is independent of their age. For many diseases, young children or elderly individuals are at higher risk of being hospitalised and having adverse outcomes.

The sim_linelist() function can accommodate age-stratified risks by accepting a <data.frame> instead of a single risk for the entire population.

Here is an example that uses the default hospitalisation and death risks (inside and outside of hospital). First we load the requested delay distributions using the {epiparameter} package.

contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)
#> Citation cannot be created as author, year, journal or title is missing

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)
#> Citation cannot be created as author, year, journal or title is missing

# get onset to hospital admission from {epiparameter} database
onset_to_hosp <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.. 
#> To retrieve the citation use the 'get_citation' function

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.. 
#> To retrieve the citation use the 'get_citation' function

We set the seed to ensure we have the same output each time the vignette is rendered. When using {simulist}, setting the seed is not required unless you need to simulate the same line list multiple times.

Population-wide risks

Simulate a line list with population-wide default risks:

  • hospitalisation risk: 0.2
  • death risk in hospitals 0.5
  • death risk outside of hospitals 0.05
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death
)

# first 6 rows of linelist
head(linelist)
#>   id       case_name case_type sex age date_onset date_admission   outcome
#> 1  1 Wajdi al-Demian  probable   m  35 2023-01-01           <NA> recovered
#> 2  2   Raaid el-Diab confirmed   m  43 2023-01-01     2023-01-07 recovered
#> 3  3  Nickolas Nault suspected   m   1 2023-01-01           <NA> recovered
#> 4  5     Hee Kennedy confirmed   m  78 2023-01-01     2023-01-03      died
#> 5  6     Hope Arshad suspected   f  22 2023-01-01     2023-01-28      died
#> 6  8  Shanta Holiday  probable   f  28 2023-01-01           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2022-12-30        2023-01-05     23.2
#> 3         <NA>         2022-12-30        2023-01-02       NA
#> 4   2023-01-21         2022-12-29        2023-01-02     25.2
#> 5   2023-01-10         2023-01-01        2023-01-03       NA
#> 6         <NA>         2023-01-03        2023-01-04       NA

We can run another simulation and change the hospitalisation and death risks, inside and outside the hospital, still applied to the entire population. In this scenario the probability of being hospitalised if infected is higher, but the mortality risk for both hospitalised and non-hospitalised groups is lower.

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = 0.4,
  hosp_death_risk = 0.2,
  non_hosp_death_risk = 0.01
)

head(linelist)
#>   id          case_name case_type sex age date_onset date_admission   outcome
#> 1  1      Robert Wanzek suspected   m  80 2023-01-01     2023-01-08 recovered
#> 2  2           Kacy Kim  probable   f  85 2023-01-01           <NA> recovered
#> 3  4    Jade Goldsberry  probable   f  76 2023-01-01           <NA> recovered
#> 4  8    Brittany Brooks confirmed   f  12 2023-01-01     2023-01-01      died
#> 5 11 Fat'hiyaa al-Zafar suspected   f  50 2023-01-01           <NA> recovered
#> 6 14   Desirae Carranza  probable   f  54 2023-01-01     2023-01-03 recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2023-01-04        2023-01-06       NA
#> 3         <NA>         2022-12-30        2023-01-03       NA
#> 4   2023-01-18         2023-01-05        2023-01-05     26.4
#> 5         <NA>         2023-01-03        2023-01-05       NA
#> 6         <NA>         2022-12-31        2023-01-02       NA

Age-stratified hospitalisation and death risks

To define age-stratified risks, we must create a table (<data.frame>) which contains the lower limits of the age groups and their respective risks.

In this example the hospitalisation risk will be:

  • 0.2 (or 20%) for those aged 80 years or older
  • 0.1 (or 10%) for those younger than 5 years
  • 0.05 (or 5%) for the remaining age group

The oldest age group stops at the upper age range given in the population_age argument. The default upper age range in 90, so in this example the oldest age bracket will be 80-90 (inclusive). The minimum age of each age group is inclusive, and the maximum age of each age group is exclusive, except the oldest age group which is inclusive of the minimum and maximum age. In this example the first age group is the first element of each vector, so the minimum age is 1, maximum age is four (as the next age group starts at five), and the hospitalisation risk for that group is 0.1. Each age group forms a row in the table.

age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)
age_dep_hosp_risk
#>   age_limit risk
#> 1         1 0.10
#> 2         5 0.05
#> 3        80 0.20
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk
)

head(linelist)
#>   id        case_name case_type sex age date_onset date_admission   outcome
#> 1  1     Ignacio Albo  probable   m  88 2023-01-01           <NA> recovered
#> 2  2    Desire Aguayo confirmed   f  90 2023-01-01           <NA> recovered
#> 3  4 Miska al-Ramadan confirmed   f  30 2023-01-01           <NA> recovered
#> 4  5   Frankie Garcia confirmed   f  37 2023-01-01           <NA> recovered
#> 5  6   Tahma Mitchell  probable   m  88 2023-01-01           <NA> recovered
#> 6  7      Tan Antonio  probable   m  56 2023-01-02           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2023-01-02        2023-01-04     23.5
#> 3         <NA>         2023-01-05        2023-01-07     22.9
#> 4         <NA>         2023-01-05        2023-01-06     24.9
#> 5         <NA>         2022-12-29        2023-01-03       NA
#> 6         <NA>         2023-01-01        2023-01-05       NA

The minimum age of the youngest age group must match the age range specified in the population_age argument of sim_linelist(), and the largest age limit in the risk <data.frame> must not be older than the upper age range. If these conditions are not met the function will error.

If the age-stratified risk table does not match the default (c(1, 90)), the population_age argument will need to be set to match.

For example, the default age range of the population is 1 to 90 (inclusive). In our example above, the lowest age group started at 1 and the oldest age group stopped at 90. This matches the default population_age = c(1, 90). However, see here that if the lower age limit exceeds the age range the function will not run.

age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 95),
  risk = c(0.1, 0.05, 0.2)
)
age_dep_hosp_risk
#>   age_limit risk
#> 1         1 0.10
#> 2         5 0.05
#> 3        95 0.20

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk
)
#> Error in .check_risk_df(hosp_risk, age_range = age_range): Lower bound of oldest age group must be lower than highest age range

In order to make this code run with the age-stratified hospitalisation risk given, the population_age can be adjusted. Here the oldest age bracket is now 95 to 100 ([95, 100]).

age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 95),
  risk = c(0.1, 0.05, 0.2)
)

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk,
  population_age = c(1, 100)
)

head(linelist)
#>   id         case_name case_type sex age date_onset date_admission   outcome
#> 1  1     Devin Soloman  probable   m  80 2023-01-01           <NA> recovered
#> 2  2    Hazm el-Vaziri confirmed   m  28 2023-01-01           <NA> recovered
#> 3  3         Grace Vue  probable   f  83 2023-01-01           <NA> recovered
#> 4  4 Shajee'a al-Sadek suspected   f  16 2023-01-01           <NA> recovered
#> 5  5   Azzaam el-Yusuf confirmed   m  36 2023-01-01           <NA> recovered
#> 6  7 Cordero Manheimer confirmed   m   5 2023-01-01           <NA>      died
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2022-12-29        2023-01-04     27.2
#> 3         <NA>         2023-01-03        2023-01-05       NA
#> 4         <NA>         2022-12-31        2023-01-03       NA
#> 5         <NA>         2022-12-30        2023-01-03     29.8
#> 6   2023-01-19         2023-01-03        2023-01-08     24.2

Exactly the same method of age-stratified risks applies to death risks. First create the <data.frame> with the age groups and their respective, in this case, death risks, and then supply this to either the hosp_death_risk or non_hosp_death_risk arguments, to define the death risks in and outside the hospital, respectively, or both.

Here are a couple of examples:

age_dep_hosp_death_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.3, 0.1, 0.6)
)
age_dep_hosp_death_risk
#>   age_limit risk
#> 1         1  0.3
#> 2         5  0.1
#> 3        80  0.6

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_death_risk = age_dep_hosp_death_risk
)
age_dep_non_hosp_death_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.1)
)
age_dep_non_hosp_death_risk
#>   age_limit risk
#> 1         1 0.10
#> 2         5 0.05
#> 3        80 0.10

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  non_hosp_death_risk = age_dep_non_hosp_death_risk
)

Up until now these age-stratified tables have only been supplied to one risk. However, these can be supplied in the same simulation. In this case the hospitalisation risk, and death risks inside and outside of hospital, are all specified.

age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)
age_dep_hosp_death_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.3, 0.1, 0.6)
)
age_dep_non_hosp_death_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.1)
)

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk,
  hosp_death_risk = age_dep_hosp_death_risk,
  non_hosp_death_risk = age_dep_non_hosp_death_risk
)

head(linelist)
#>   id        case_name case_type sex age date_onset date_admission   outcome
#> 1  1   Afeef al-Saber suspected   m  54 2023-01-01           <NA> recovered
#> 2  3  Duncan Kleinman suspected   m  39 2023-01-02     2023-01-03 recovered
#> 3  4 Montica Gallegos  probable   f  37 2023-01-01           <NA> recovered
#> 4  5        Son Ahuna suspected   m  25 2023-01-03           <NA> recovered
#> 5  6  Alyssa Gonzalez confirmed   f  10 2023-01-02           <NA> recovered
#> 6  7    Ashley Palomo confirmed   f  13 2023-01-02           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2022-12-27        2023-01-01       NA
#> 3         <NA>         2022-12-31        2023-01-06       NA
#> 4         <NA>         2023-01-02        2023-01-06       NA
#> 5         <NA>         2023-01-02        2023-01-06     25.8
#> 6         <NA>         2023-01-02        2023-01-03     27.3