Design Principles for {superspreading}
Source: vignettes/design-principles.Rmd
This vignette outlines the design decisions made during the development of the {superspreading} R package, along with the reasoning behind each decision and its possible pros and cons.
This document is primarily intended for those interested in understanding the code within the package and for potential package contributors.
Scope
The {superspreading} package aims to provide a range of summary
metrics to characterise individual-level variation in disease
transmission and its impact on the growth or decline of an epidemic.
These include the probability that an outbreak becomes an epidemic
(probability_epidemic()), or conversely goes extinct
(probability_extinct()), the probability that an outbreak can be
contained (probability_contain()), the proportion of cases in clusters
of a given size (proportion_cluster_size()), and the proportion of
cases that cause a given proportion of transmission
(proportion_transmission()).
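As an illustration of the kind of analytical result that underlies these metrics, the sketch below computes an extinction probability for a branching process, assuming a negative binomial offspring distribution with mean R and dispersion k. It uses base R only and is not the package's implementation; the function name extinction_prob_sketch() is hypothetical. The extinction probability is taken as the smallest fixed point of the offspring probability generating function.

```r
# Hypothetical helper for illustration; not a {superspreading} function.
# Assumes a negative binomial offspring distribution (mean R, dispersion k).
extinction_prob_sketch <- function(R, k, num_init_infect = 1) {
  stopifnot(R > 0, k > 0, num_init_infect >= 1)
  # a (sub)critical branching process goes extinct with certainty
  if (R <= 1) {
    return(1)
  }
  # probability generating function of the negative binomial
  pgf <- function(s) (1 + R * (1 - s) / k)^(-k)
  # for R > 1 there is exactly one fixed point strictly below 1
  q <- stats::uniroot(
    function(s) pgf(s) - s,
    lower = 0, upper = 1 - 1e-9, tol = 1e-12
  )$root
  # independent chains: all initial infections must fail to take off
  q^num_init_infect
}

extinction_prob_sketch(R = 2, k = 1) # approximately 0.5 (for k = 1 it is 1 / R)
1 - extinction_prob_sketch(R = 2, k = 1) # complementary probability of an epidemic
```

Under these assumptions the probability of an epidemic is simply one minus the extinction probability, which is why the two metrics are described as converses.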
The package also provides probability density functions and cumulative
distribution functions for distribution models of heterogeneity in
individual-level disease transmission that are not available in base R,
so that their likelihood can be computed. At present we include two
models: Poisson-lognormal (dpoislnorm() & ppoislnorm()) and
Poisson-Weibull (dpoisweibull() & ppoisweibull()) distributions.
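To make the idea of a Poisson compound distribution concrete, the sketch below evaluates a Poisson-lognormal probability by numerically integrating a Poisson probability over a lognormal-distributed rate. It is a sketch under that assumption, using base R only. The names dpoislnorm_sketch() and ppoislnorm_sketch() are hypothetical, and the package's own functions may be implemented differently.

```r
# Hypothetical sketch; not the package's dpoislnorm() or ppoislnorm().
# P(X = x) = integral over lambda of Poisson(x | lambda) * Lognormal(lambda)
dpoislnorm_scalar <- function(x, meanlog, sdlog) {
  integrand <- function(lambda) {
    stats::dpois(x, lambda) * stats::dlnorm(lambda, meanlog, sdlog)
  }
  stats::integrate(integrand, lower = 0, upper = Inf)$value
}

# integrate() handles one x at a time, so vectorise over x
dpoislnorm_sketch <- Vectorize(dpoislnorm_scalar, vectorize.args = "x")

# cumulative probability as the sum of the probabilities up to q
ppoislnorm_sketch <- function(q, meanlog, sdlog) {
  sum(dpoislnorm_sketch(0:q, meanlog = meanlog, sdlog = sdlog))
}

probs <- dpoislnorm_sketch(0:5, meanlog = 0, sdlog = 1)
cum_prob <- ppoislnorm_sketch(5, meanlog = 0, sdlog = 1)
```

The vectorised wrapper returns one value per element of x, mirroring how base R functions such as dpois() behave.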
The package implements a branching process simulation based on bpmodels::chain_sim()
to enable the numerical calculation of the probability of containment
within an outbreak time and outbreak duration threshold. In the future
this function could be removed in favour of using a package implementing
branching process models as a dependency. The package is mostly focused
on analytical functions that are derived from branching process models.
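A numerical estimate of containment from a branching process can be sketched as follows. This is a simplified base R illustration, not the package's simulation and not bpmodels::chain_sim(): it assumes a negative binomial offspring distribution, treats an outbreak as contained if it dies out before reaching a case threshold, and ignores outbreak time entirely. The function name simulate_outbreak_size() and the case_threshold argument are assumptions of the sketch.

```r
# Hypothetical sketch of a branching process simulation.
# Returns the cumulative number of cases when the chain either dies out
# or reaches `case_threshold`.
simulate_outbreak_size <- function(R, k, case_threshold = 100) {
  cases <- 1L  # index case
  active <- 1L # infectious cases in the current generation
  while (active > 0L && cases < case_threshold) {
    # each active case draws its number of secondary cases
    active <- sum(stats::rnbinom(active, size = k, mu = R))
    cases <- cases + active
  }
  cases
}

set.seed(1)
sizes <- replicate(1000, simulate_outbreak_size(R = 1.5, k = 0.5))

# proportion of simulated outbreaks that stayed below the threshold
prob_contain <- mean(sizes < 100)
```

With more replicates the estimate approaches the analytical extinction probability for the same R and k, less the small share of outbreaks that exceed the threshold before dying out.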
The package provides functions to calculate variation in
individual-level transmission but does not provide functions for
inference, and currently relies on {fitdistrplus} for fitting
models.
Output
Functions with the name probability_*() return a single numeric.
Functions with the name proportion_*() return a <data.frame> with as
many rows as combinations of input values (see expand.grid()). The
consistent use of simple, well-known data structures makes it easy for
users to apply these functions in different scenarios.
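The row-per-combination output shape can be sketched with expand.grid(). The function below is hypothetical and only illustrates the structure of the returned <data.frame>; the quantity it computes (the proportion of cases expected to cause no secondary cases under a negative binomial offspring distribution) is chosen because it has a simple closed form, not because it is a {superspreading} function.

```r
# Hypothetical function; illustrates the output shape only.
proportion_zero_sketch <- function(R, k) {
  # one row per combination of the input values
  out <- expand.grid(R = R, k = k)
  # P(no secondary cases) under a negative binomial offspring distribution
  out$prop_zero <- stats::dnbinom(0, size = out$k, mu = out$R)
  out
}

res <- proportion_zero_sketch(R = c(1.5, 2), k = c(0.1, 0.5, 1))
nrow(res) # 6: two values of R times three values of k
```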
The distribution functions return a vector of numerics of equal length
to the input vector. This is the same behaviour as the base R
distribution functions.
Design decisions
- proportion_*() functions return a <data.frame> with the proportion column(s) containing character strings, formatted with a percentage sign (%) by default. It was reasoned that {superspreading} is most likely used either as a stand-alone package, or at the terminus of an epidemiological analysis pipeline, and thus the outputs of {superspreading} functions would not be passed into other functions. For instances where these proportions need to be passed to another calculation or used for plotting, the format_prop argument can be switched to FALSE and a numeric column of proportions will be returned.
- The distribution functions are vectorised (i.e. wrapped in Vectorize()). This enables them to be used identically to base R distribution functions.
- Native interoperability with <epiparameter> objects, from the {epiparameter} package, is enabled for probability_*() and proportion_*() via the offspring_dist argument. This allows users to pass in a single object, from which the parameters required by the {superspreading} function are extracted. If these are not available within the <epiparameter> object, the function returns an informative error. The offspring_dist argument is placed after ... to ensure users specify the argument in full and do not accidentally provide data to this argument.
- Internal functions have a dot (.) prefix; exported functions do not.
- Several functions use internally defined constants (e.g. NSIM and FINITE_INF) to prevent the use of apparently arbitrary magic numbers. Constants are all uppercase to make clear they are internal constants (following MDN and PEP8 styles). These constants should not be exported (i.e. should not appear in the NAMESPACE) as they should only be used by functions and not package users.
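Two of the decisions above, the format_prop switch and the placement of an argument after ..., can be sketched in base R. The function proportion_sketch() below is hypothetical and much simpler than the package's proportion_*() functions. It only shows that an argument placed after ... can be matched solely by its full name, and how a formatted character column differs from a numeric one.

```r
# Hypothetical function for illustration; not part of {superspreading}.
proportion_sketch <- function(R, k, ..., format_prop = TRUE) {
  # proportion of cases expected to cause no secondary cases
  prop <- stats::dnbinom(0, size = k, mu = R)
  if (format_prop) {
    # character strings with a percentage sign, for display
    prop <- paste0(signif(prop * 100, 3), "%")
  }
  data.frame(R = R, k = k, prop_zero = prop, stringsAsFactors = FALSE)
}

formatted <- proportion_sketch(R = 2, k = 0.5)
formatted$prop_zero # "44.7%"

# naming the argument in full returns a numeric column
unformatted <- proportion_sketch(R = 2, k = 0.5, format_prop = FALSE)

# a third positional argument is absorbed by `...`, so format_prop
# keeps its default and the column is still character
positional <- proportion_sketch(2, 0.5, FALSE)
```

In this sketch a stray positional value is silently ignored; a function could instead check the contents of ... and raise an error or warning.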
Dependencies
The aim is to restrict the number of dependencies to a minimal required set for ease of maintenance. The current hard dependencies are:
- {stats}
- {checkmate}
{stats} is distributed with the R language, so it is viewed as a lightweight dependency that should already be installed on a user’s machine if they have R. {checkmate} is an input checking package widely used across Epiverse-TRACE packages.
Suggested dependencies (not including package documentation ({knitr}, {rmarkdown}), testing ({spelling} and {testthat}), and plotting ({ggplot2})) are: {epiparameter}, used to easily access epidemiological parameters from the package’s library, and {fitdistrplus}, used for model fitting methods.
Contribute
There are no special requirements for contributing to {superspreading}; please follow the package contributing guide.