Analysing infectious disease transmission chains with bpmodels
Sebastian Funk, James Azam
Source:vignettes/bpmodels.Rmd
bpmodels.Rmd
bpmodels provides methods to analyse and simulate the size and length of branching processes with an arbitrary offspring distribution. These can be used, for example, to analyse the distribution of chain sizes or length of infectious disease outbreaks, as discussed in Farrington, Kanaan, and Gay (2003) and Blumberg and Lloyd-Smith (2013).
Quick start
Code
chain_ll()
: calculate log-likelihoods
The chain_ll()
function calculates the log-likelihood of a distribution of
chain sizes or lengths given an offspring distribution and its associated
parameters.
For example, if we have observed a distribution of chains of sizes \(1, 1, 4, 7\), we can calculate the log-likelihood of this observed chain by assuming the offspring per generation is Poisson distributed with a mean number (which can be interpreted as the reproduction number, \(R_0\)) of \(0.5\).
To do this, we run
Code
The first argument of chain_ll()
is the chain size (or length, in number of
generations that a chain lasted) distribution to
analyse.
The second argument, offspring
, specifies the offspring
distribution. This is specified as a string of the name of the function used to generate random offspring.
It can be any probability distribution implemented in R
, that is, one that
has a corresponding function for generating random numbers beginning with the
letter r
. In the case of the example above, since random Poisson numbers are
generated in R
using a function called rpois()
, the string to pass to the
offspring
argument is "pois"
.
The third argument, stat
, determines whether to analyse chain sizes
("size"
, the default if this argument is not specified) or lengths
("length"
). Lastly, any named arguments not recognised by chain_ll()
are interpreted as parameters of the corresponding probability distribution,
here lambda = 0.5
as the mean of the Poisson distribution (see the R
help
page for the Poisson distribution for more information).
Imperfect observations
By default, chain_ll()
assumes perfect observation, where obs_prob = 1
(See ?chain_ll
), meaning that all transmission events are observed and
recorded in the data. If observations are imperfect, chain_ll()
provides
the argument, obs_prob
, for specifying the probability of observation.
This probability is used to determine the likelihood of observing the specified
chain sizes or lengths. In the case of imperfect observation, true chain sizes
or lengths are simulated repeatedly (the number of times given by the
nsim_obs
argument), and the likelihood calculated for each of
these simulations.
For example, if the probability of observing each case is obs_prob = 0.30
,
we use
Code
This returns 10
likelihood values (because nsim_obs = 10
), which can be
averaged to come up with an overall likelihood estimate.
To find out about the usage of the chain_ll()
function, you can run ?chain_ll
to access its R
help file.
How chain_ll()
works
If the probability distribution of chain sizes or lengths has an analytical
solution, this will be used. chain_ll()
currently supports the Poisson and
negative binomial size distribution and the Poisson and geometric length
distribution.
If an analytical solution does not exist, simulations are used to approximate
this probability distributions (using a linear approximation to the cumulative
distribution
for unobserved sizes/lengths). In that case, an extra argument nsim_offspring
must be passed to chain_ll()
to specify the number of simulations to be
used for this approximation.
For example, to get offspring drawn from a binomial distribution with
probability prob = 0.5
, we run
chain_sim()
: simulate transmission chains with branching processes
To simulate a branching process, we use the chain_sim()
function. This
function follows the same syntax as chain_ll()
. Note that chain_sim()
is
stochastic, so a seed needs to be set to ensure that the results can be
reproduced.
Below, we are simulating \(5\) chains, assuming the offspring are generated using
a Poisson distribution with mean, lambda = 0.5
. By default, chain_sim()
returns a vector of chain sizes/lengths. If we instead want to return
a tree of infectees and infectors, we need to specify a function for
the serial interval and set tree = TRUE
(see next section).
Code
Simulating trees
To simulate a tree of transmission chains, we specify the serial interval
generation function (serial_interval()
) and set tree = TRUE
as follows:
Code
set.seed(12)
serial_interval <- function(n) {
rlnorm(n, meanlog = 0.58, sdlog = 1.58)
}
chains_df <- chain_sim(
n = 5, offspring = "pois", lambda = 0.5, stat = "length",
infinite = 100, serial = serial_interval, tree = TRUE
)
head(chains_df)
#> n id ancestor generation time
#> 1 1 1 NA 1 0.00000000
#> 2 2 1 NA 1 0.00000000
#> 3 3 1 NA 1 0.00000000
#> 4 4 1 NA 1 0.00000000
#> 5 5 1 NA 1 0.00000000
#> 6 2 2 1 2 0.09968906