During an infectious disease outbreak, epidemiological parameters are often required to characterise and model the dynamics of disease transmission and to evaluate control strategies. These parameters are commonly retrieved from the literature, but there is currently no library that makes it possible to compare and contrast the parameters reported for a range of infectious diseases and pathogens across different published studies over time, some of which may be meta-analyses.

The {epiparameter} R package is a library of epidemiological parameters, with classes to handle this data and a set of functions to manipulate and use epidemiological parameters and distributions. The package also contains functionality for converting and extracting distribution parameters from summary statistics.

Use case

An outbreak of a known or potentially novel pathogen is detected and key parameters such as delay distributions (e.g. incubation period or serial interval) are required to interpret early data.

{epiparameter} can provide these distributions from a selection of published sources, such as past analyses of the same or a similar pathogen, to supply relevant epidemiological parameters for a new analysis.

This vignette provides an introduction to the data stored within {epiparameter}, how to read the data into R and manipulate it, and the functions (and methods) implemented in the package to make it easy to apply parameters in epidemiological pipelines.

The {distributional} package is loaded as some of the {epiparameter} methods use S3 generics from this package.

Code
library(epiparameter)
library(distributional)

Working with {epiparameter} data

{epiparameter} introduces two new classes for working with epidemiological parameters in R:

  • <epidist>: Contains the name of the disease, the name of the epidemiological distribution, parameters (if available) and citation information of parameter source, as well as other information.
  • <vb_epidist>: A list of two <epidist> objects for a vector-borne disease. One for the human (intrinsic) distribution, and one for the vector (extrinsic).

Library of epidemiological parameters

First, we will introduce the library, or database, of epidemiological parameters available from {epiparameter}. The library is stored in the package as a JSON file and can be read into R using the epidist_db() function. By default all entries in the library are supplied.

Code
epi_dist_db <- epidist_db()
#> Returning 122 results that match the criteria (99 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function
epi_dist_db
#> List of <epidist> objects
#>   Number of entries in library: 122
#>   Number of studies in library: 47
#>   Number of diseases: 23
#>   Number of delay distributions: 112
#>   Number of offspring distributions: 10

The output is a list of <epidist> objects, where each element in the list corresponds to an entry in the parameter database. Technical note: the object does not look like a regular R list because it uses a custom print method, which prints a summary of the data to the console when there are more than 5 entries so that a large list does not flood the console. When 5 or fewer database entries are returned, the output is printed like a standard R list.
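
The printing behaviour described in the technical note can be illustrated with a small S3 print method. This is a minimal sketch only: the class name multi_entry and the constructor new_multi_entry() are hypothetical and are not the class or code used by {epiparameter}.

```r
# Hypothetical constructor: tag a plain list with an illustrative S3 class
new_multi_entry <- function(x) {
  structure(x, class = "multi_entry")
}

# Print a short summary for long lists, and fall back to the
# standard list printing when there are 5 or fewer entries
print.multi_entry <- function(x, ...) {
  if (length(x) > 5) {
    cat("List of entries\n")
    cat("  Number of entries:", length(x), "\n")
  } else {
    print(unclass(x), ...)
  }
  invisible(x)
}

short <- new_multi_entry(as.list(1:3))
long <- new_multi_entry(as.list(1:10))

print(short) # printed like a standard R list
print(long)  # printed as a summary
```

Because the underlying object is still a list, elements can be extracted with [[ as usual, whichever way the object is printed.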

To see a full list of the diseases and distributions stored in the library use the list_distributions() function. Here we show the first six rows of the output.

Code
head(list_distributions(multi_epidist = epi_dist_db))
#>             disease  epi_distribution prob_distribution       author year
#> 1        Adenovirus incubation period             lnorm Justin L.... 2009
#> 2 Human Coronavirus incubation period             lnorm Justin L.... 2009
#> 3              SARS incubation period             lnorm Justin L.... 2009
#> 4         Influenza incubation period             lnorm Justin L.... 2009
#> 5         Influenza incubation period             lnorm Justin L.... 2009
#> 6         Influenza incubation period             lnorm Justin L.... 2009

list_distributions() can also subset the database supplied to the function.

Code
list_distributions(multi_epidist = epi_dist_db, disease = "Ebola")
#>                disease             epi_distribution prob_distribution
#> 1  Ebola Virus Disease       offspring distribution            nbinom
#> 2  Ebola Virus Disease            incubation period             lnorm
#> 3  Ebola Virus Disease               onset to death             gamma
#> 4  Ebola Virus Disease            incubation period             gamma
#> 5  Ebola Virus Disease            incubation period             gamma
#> 6  Ebola Virus Disease            incubation period             gamma
#> 7  Ebola Virus Disease            incubation period             gamma
#> 8  Ebola Virus Disease              serial interval             gamma
#> 9  Ebola Virus Disease              serial interval             gamma
#> 10 Ebola Virus Disease              serial interval             gamma
#> 11 Ebola Virus Disease              serial interval             gamma
#> 12 Ebola Virus Disease     hospitalisation to death             gamma
#> 13 Ebola Virus Disease hospitalisation to discharge             gamma
#> 14 Ebola Virus Disease        notification to death             gamma
#> 15 Ebola Virus Disease    notification to discharge             gamma
#> 16 Ebola Virus Disease               onset to death             gamma
#> 17 Ebola Virus Disease           onset to discharge             gamma
#>          author year
#> 1  J. O. Ll.... 2005
#> 2  Martin E.... 2011
#> 3  The Ebol.... 2018
#> 4  WHO Ebol.... 2015
#> 5  WHO Ebol.... 2015
#> 6  WHO Ebol.... 2015
#> 7  WHO Ebol.... 2015
#> 8  WHO Ebol.... 2015
#> 9  WHO Ebol.... 2015
#> 10 WHO Ebol.... 2015
#> 11 WHO Ebol.... 2015
#> 12 WHO Ebol.... 2015
#> 13 WHO Ebol.... 2015
#> 14 WHO Ebol.... 2015
#> 15 WHO Ebol.... 2015
#> 16 WHO Ebol.... 2015
#> 17 WHO Ebol.... 2015

More details on the data collation and the library of parameters can be found in the Data Collation and Synthesis Protocol vignette.

Single set of epidemiological parameters

The core data structure introduced in the {epiparameter} package is the <epidist> class. This holds a single set of epidemiological parameters.

An <epidist> object can be:

  1. Pulled from the database (epidist_db())
  2. Created manually (using the class constructor function: epidist())
Code
# <epidist> from database

# fetch <epidist> for COVID-19 incubation period from database
# return only a single <epidist>
covid_incubation <- epidist_db(
  disease = "COVID-19",
  epi_dist = "incubation period",
  single_epidist = TRUE
)
#> Using McAloon C, Collins Á, Hunt K, Barber A, Byrne A, Butler F, Casey M,
#> Griffin J, Lane E, McEvoy D, Wall P, Green M, O'Grady L, More S (2020).
#> "Incubation period of COVID-19: a rapid systematic review and
#> meta-analysis of observational research." _BMJ Open_.
#> doi:10.1136/bmjopen-2020-039652
#> <https://doi.org/10.1136/bmjopen-2020-039652>.. 
#> To retrieve the citation use the 'get_citation' function
covid_incubation
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Distribution: incubation period
#> Study: McAloon C, Collins Á, Hunt K, Barber A, Byrne A, Butler F, Casey M,
#> Griffin J, Lane E, McEvoy D, Wall P, Green M, O'Grady L, More S (2020).
#> "Incubation period of COVID-19: a rapid systematic review and
#> meta-analysis of observational research." _BMJ Open_.
#> doi:10.1136/bmjopen-2020-039652
#> <https://doi.org/10.1136/bmjopen-2020-039652>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 1.660
#>   sdlog: 0.480

# <epidist> using constructor function
covid_incubation <- epidist(
  disease = "COVID-19",
  pathogen = "SARS-CoV-2",
  epi_dist = "incubation period",
  prob_distribution = "gamma",
  prob_distribution_params = c(shape = 2, scale = 1),
  summary_stats = create_epidist_summary_stats(mean = 2),
  citation = create_epidist_citation(
    author = person(
      given = list("John", "Amy"),
      family = list("Smith", "Jones")
    ),
    year = 2022,
    title = "COVID Incubation Period",
    journal = "Epi Journal",
    DOI = "10.27861182.x"
  )
)
#> Using Smith J, Jones A (2022). "COVID Incubation Period." _Epi Journal_.
#> doi:10.27861182.x <https://doi.org/10.27861182.x>. 
#> To retrieve the citation use the 'get_citation' function
covid_incubation
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Distribution: incubation period
#> Study: Smith J, Jones A (2022). "COVID Incubation Period." _Epi Journal_.
#> doi:10.27861182.x <https://doi.org/10.27861182.x>.
#> Distribution: gamma
#> Parameters:
#>   shape: 2.000
#>   scale: 1.000

Not all arguments are specified in the example above that uses the class constructor (epidist()); for example, the metadata and the parameter uncertainty (uncertainty) are not provided. See the help documentation for epidist() (?epidist) for a description of each argument. Also see the documentation for the <epidist> helper functions, e.g., ?create_epidist_citation().

Manually creating <epidist> objects can be especially useful if new parameter estimates become available but are not yet incorporated into the {epiparameter} library.

As seen in the examples in this vignette, the <epidist> class has a custom printing method which shows the disease, the pathogen (if known), the epidemiological distribution, a citation of the study the parameters are from, and the probability distribution and its parameters (if available).

Benefit of <epidist>

By providing a consistent and robust object to store epidemiological parameters, <epidist> objects can be applied in epidemiological pipelines, for example {episoap}. The data contained within the object (e.g. parameter values, pathogen type, etc.) can be modified but the pipeline will continue to operate because the class is unchanged.

The probability distribution (prob_distribution) argument requires the distribution to be specified using the standard R naming. In some cases the R name is the same as the distribution's name, e.g., gamma and weibull. Examples where the distribution name and the R name differ are lognormal and lnorm, negative binomial and nbinom, geometric and geom, and Poisson and pois.
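
The R names listed above are the suffixes used by the distribution functions in the {stats} package, where each distribution has functions prefixed with d (density), p (cumulative distribution), q (quantile) and r (random generation). The sketch below uses only base R; the lookup vector r_names is an illustrative helper and is not part of {epiparameter}.

```r
# Illustrative lookup from common distribution names to their R names
r_names <- c(
  lognormal = "lnorm",
  "negative binomial" = "nbinom",
  geometric = "geom",
  poisson = "pois",
  gamma = "gamma",
  weibull = "weibull"
)

# Build the name of the density function for each distribution,
# e.g. "dlnorm", and check that it exists in the R session
density_fns <- paste0("d", r_names)
vapply(density_fns, exists, logical(1))
```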

Subsetting database

The database can be subset directly in the call to epidist_db(). Here the results are subset by author. It is recommended to use the family name of the first author instead of the full name. Only the first author is matched when the entry is from a source with multiple authors.

Code
epidist_db(
  disease = "COVID-19",
  epi_dist = "incubation period",
  author = "Linton"
)
#> Returning 3 results that match the criteria (3 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function
#> [[1]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Distribution: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 1.456
#>   sdlog: 0.555
#> 
#> [[2]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Distribution: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 1.611
#>   sdlog: 0.472
#> 
#> [[3]]
#> Disease: COVID-19
#> Pathogen: SARS-CoV-2
#> Epi Distribution: incubation period
#> Study: Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.
#> Distribution: lnorm
#> Parameters:
#>   meanlog: 1.525
#>   sdlog: 0.629

The results can be further subset using the subset argument, for example subset = sample_size > 100 will return entries with a sample size greater than 100. See ?epidist_db() for details on how to use this argument to subset which database entries get returned.

Adding library entries and contributing to {epiparameter}

If a set of epidemiological parameters has been inferred and is known to the user but has not yet been incorporated into the {epiparameter} database, these parameters can be manually added to the library.

Code
# wrap <epidist> in list to append to database
new_db <- append(epi_dist_db, list(covid_incubation))

Note that this only adds the parameters to the library in the environment, and does not save to the database file in the package.

The library of epidemiological parameters is a living database, and we hope to incorporate new studies as they are published. Because searching for and recording parameters takes a large amount of time, we welcome others to add parameters, either by making a pull request to the package or by adding information to the contributing spreadsheet. These will be incorporated into the database by the package maintainers. See the Data Collation and Synthesis Protocol vignette for information about contributing to the library of epidemiological parameters.

Distribution functions

<epidist> objects store distributions, and mathematical functions of these distributions can easily be extracted directly from them. It is often useful to access the probability density function, the cumulative distribution function or the quantiles of the distribution, or to generate random numbers from the distribution in the <epidist> object. The distribution functions in {epiparameter} allow users to do this easily.

Code
ebola_incubation <- epidist_db(
  disease = "Ebola",
  epi_dist = "incubation period",
  single_epidist = TRUE
)
#> Using WHO Ebola Response Team, Agua-Agum J, Ariyarajah A, Aylward B, Blake I,
#> Brennan R, Cori A, Donnelly C, Dorigatti I, Dye C, Eckmanns T, Ferguson
#> N, Formenty P, Fraser C, Garcia E, Garske T, Hinsley W, Holmes D,
#> Hugonnet S, Iyengar S, Jombart T, Krishnan R, Meijers S, Mills H,
#> Mohamed Y, Nedjati-Gilani G, Newton E, Nouvellet P, Pelletier L,
#> Perkins D, Riley S, Sagrado M, Schnitzler J, Schumacher D, Shah A, Van
#> Kerkhove M, Varsaneux O, Kannangarage N (2015). "West African Ebola
#> Epidemic after One Year — Slowing but Not Yet under Control." _The New
#> England Journal of Medicine_. doi:10.1056/NEJMc1414992
#> <https://doi.org/10.1056/NEJMc1414992>.. 
#> To retrieve the citation use the 'get_citation' function

density(ebola_incubation, at = 0.5)
#> [1] 0.03608013
cdf(ebola_incubation, q = 0.5)
#> [1] 0.01178094
quantile(ebola_incubation, p = 0.5)
#> [1] 8.224347
generate(ebola_incubation, times = 10)
#>  [1]  0.7464232  1.8696650  6.9982148 13.0108742  2.1266018  5.4595597
#>  [7]  0.4965333  3.7836991  2.4298127  0.3236717
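
For comparison, the same four operations can be written with the base R distribution functions once the parameters are known. The sketch below uses the lognormal parameters printed earlier for the COVID-19 incubation period (meanlog 1.66, sdlog 0.48). It is an illustration of the underlying calculations, and it is an assumption that the <epidist> methods return equivalent values.

```r
# Lognormal parameters as printed for the COVID-19 incubation period
meanlog <- 1.66
sdlog <- 0.48

# Probability density at 5 days
dlnorm(5, meanlog = meanlog, sdlog = sdlog)

# Probability that the incubation period is 5 days or shorter
plnorm(5, meanlog = meanlog, sdlog = sdlog)

# Median incubation period; for a lognormal this equals exp(meanlog)
median_days <- qlnorm(0.5, meanlog = meanlog, sdlog = sdlog)
median_days

# Ten random incubation periods (seed set for reproducibility)
set.seed(1)
rlnorm(10, meanlog = meanlog, sdlog = sdlog)
```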

Plotting epidemiological distributions

<epidist> objects can easily be plotted to see the PDF and CDF of the distribution.

Code
plot(ebola_incubation)

The default plotting range for time since infection is from zero to ten days. This can be altered by specifying the day_range argument when plotting an <epidist> object.

Code
plot(ebola_incubation, day_range = 1:25)

This plotting function can be useful for visually comparing epidemiological distributions from different publications on the same disease. In addition, plotting the distribution after manually creating an <epidist> helps to check that the parameters are sensible and produce the expected distribution.
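
The same kind of visual check can be done with base R graphics when only the parameters are at hand. The sketch below uses the gamma parameters from the manually created example (shape 2, scale 1) and a zero to ten day range. It is a rough base R equivalent and does not reproduce the appearance of the plot() method for <epidist> objects.

```r
# Gamma parameters from the manually created example
shape <- 2
scale <- 1

# Evaluate the PDF and CDF over a zero to ten day range
day_range <- 0:10
pdf_values <- dgamma(day_range, shape = shape, scale = scale)
cdf_values <- pgamma(day_range, shape = shape, scale = scale)

# Draw the PDF and CDF side by side, then restore the graphics settings
op <- par(mfrow = c(1, 2))
plot(
  day_range, pdf_values,
  type = "l", xlab = "Time since infection (days)", ylab = "PDF"
)
plot(
  day_range, cdf_values,
  type = "l", xlab = "Time since infection (days)", ylab = "CDF"
)
par(op)
```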

Parameter conversion and extraction

Conversion

Parameters are often reported in the literature as a mean and standard deviation (or variance). These summary statistics can often be converted analytically to the parameters of the distribution using the conversion function in the package (convert_summary_stats_to_params()). We also provide a conversion function for the opposite direction, from parameters to summary statistics (convert_params_to_summary_stats()).
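
To show what an analytical conversion involves, the sketch below applies the standard moment relationships for the gamma and lognormal distributions in base R. The function names are hypothetical helpers written for this illustration; they are not the package's conversion functions, and their interface is not meant to mirror them.

```r
# Gamma: mean = shape * scale and variance = shape * scale^2
gamma_from_mean_sd <- function(mean, sd) {
  c(shape = (mean / sd)^2, scale = sd^2 / mean)
}

# Opposite direction: gamma parameters to mean and standard deviation
mean_sd_from_gamma <- function(shape, scale) {
  c(mean = shape * scale, sd = sqrt(shape) * scale)
}

# Lognormal: mean = exp(meanlog + sdlog^2 / 2) and
# variance = (exp(sdlog^2) - 1) * mean^2
lnorm_from_mean_sd <- function(mean, sd) {
  sdlog <- sqrt(log(1 + (sd / mean)^2))
  c(meanlog = log(mean) - sdlog^2 / 2, sdlog = sdlog)
}

# A gamma distribution with mean 2 and variance 2 has shape 2 and scale 1
gamma_params <- gamma_from_mean_sd(mean = 2, sd = sqrt(2))
gamma_params

# Converting back recovers the summary statistics
mean_sd_from_gamma(gamma_params[["shape"]], gamma_params[["scale"]])
```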

Extraction

The function extract_param() handles all extraction of parameter estimates from summary statistics. The two extractions currently supported in {epiparameter} are from percentiles and from the median and range.
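
Unlike the conversions above, extraction from percentiles generally has no closed-form solution and is done numerically. The sketch below shows one possible approach in base R: searching for the gamma parameters whose quantiles best match two reported percentiles. The function gamma_from_percentiles() is a hypothetical helper, and the least-squares approach is an assumption made for illustration, not a description of how extract_param() is implemented.

```r
# Find the gamma shape and scale whose quantiles at `probs`
# are closest (in squared error) to the reported `values`
gamma_from_percentiles <- function(values, probs) {
  objective <- function(log_par) {
    par <- exp(log_par) # optimise on the log scale to keep parameters positive
    fitted <- qgamma(probs, shape = par[1], scale = par[2])
    sum((fitted - values)^2)
  }
  fit <- optim(par = log(c(1, 1)), fn = objective)
  setNames(exp(fit$par), c("shape", "scale"))
}

# Take the 25th and 75th percentiles of a gamma distribution with
# shape 2 and scale 1, then try to recover the parameters from them
probs <- c(0.25, 0.75)
reported <- qgamma(probs, shape = 2, scale = 1)
estimate <- gamma_from_percentiles(values = reported, probs = probs)
estimate

# The quantiles implied by the estimate should be close to those reported
qgamma(probs, shape = estimate[["shape"]], scale = estimate[["scale"]])
```

Numerical searches of this kind can be sensitive to the starting values, so it is worth checking that the fitted quantiles agree with the reported ones, as in the last line.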