Simulating transmission

Last updated on 2024-11-21 | Edit this page

Estimated time: 75 minutes

Overview

Questions

  • How do I simulate disease spread using a mathematical model?
  • What inputs are needed for a model simulation?
  • How do I account for uncertainty?

Objectives

  • Load an existing model structure from {epidemics} R package
  • Load an existing social contact matrix with socialmixr
  • Generate a disease spread model simulation with {epidemics}
  • Generate multiple model simulations and visualise uncertainty

Prerequisite

Learners should familiarise themselves with following concept dependencies before working through this tutorial:

Mathematical Modelling : Introduction to infectious disease models, state variables, model parameters, initial conditions, differential equations.

Epidemic theory : Transmission, Reproduction number.

Introduction


Mathematical models are useful tools for generating future trajectories of disease spread. In this tutorial, we will use the R package {epidemics} to generate disease trajectories of an influenza strain with pandemic potential. By the end of this tutorial, you will be able to generate the trajectory below showing the number of infectious individuals in different age categories over time.

In this tutorial we are going to learn how to use the {epidemics} package to simulate disease trajectories and access to social contact data with socialmixr. We’ll use dplyr, ggplot2 and the pipe %>% to connect some of their functions, so let’s also call to the tidyverse package:

R

library(epidemics)
library(socialmixr)
library(tidyverse)

By the end of this tutorial, learners should be able to replicate the above image on their own computers.

Simulating disease spread


To simulate infectious disease trajectories, we must first select a mathematical model to use. There is a library of models to choose from in {epidemics}. These are prefixed with model_* and suffixed by the name of infection (e.g. model_ebola for Ebola) or a different identifier (e.g. model_default).

In this tutorial, we will use the default model in {epidemics}, model_default() which is an age-structured model that categorises individuals based on their infection status. For each age group \(i\), individuals are categorized as either susceptible \(S\), infected but not yet infectious \(E\), infectious \(I\) or recovered \(R\). Next, we need to define the process by which individuals flow from one compartment to another. This can be done by defining a set of differential equations that specify how the number of individuals in each compartment changes over time.

The schematic below shows the processes which describe the flow of individuals between the disease states \(S\), \(E\), \(I\) and \(R\) and the key parameters for each process.

Model parameters: rates

In population-level models defined by differential equations, model parameters are often (but not always) specified as rates. The rate at which an event occurs is the inverse of the average time until that event. For example, in the SEIR model, the recovery rate \(\gamma\) is the inverse of the average infectious period.

Values of these rates can be determined from the natural history of the disease. For example, if people are on average infectious for 8 days, then in the model, 1/8 of currently infectious people would recover each day (i.e. the rate of recovery, \(\gamma=1/8=0.125\)).

For each disease state (\(S\), \(E\), \(I\) and \(R\)) and age group (\(i\)), we have a differential equation describing the rate of change with respect to time.

\[ \begin{aligned} \frac{dS_i}{dt} & = - \beta S_i \sum_j C_{i,j} I_j \\ \frac{dE_i}{dt} &= \beta S_i\sum_j C_{i,j} I_j - \alpha E_i \\ \frac{dI_i}{dt} &= \alpha E_i - \gamma I_i \\ \frac{dR_i}{dt} &=\gamma I_i \\ \end{aligned} \] Individuals in age group (\(i\)) move from the susceptible state (\(S_i\)) to the exposed state (\(E_i\)) via age-specific contacts with infectious individuals in all groups \(\beta S_i \sum_j C_{i,j} I_j\). The contact matrix \(C\) allows for heterogeneity in contacts between age groups. They then move to the infectious state at a rate \(\alpha\) and recover at a rate \(\gamma\). There is no loss of immunity (there are no flows out of the recovered state).

The model parameters are:

  • transmission rate \(\beta\),
  • contact matrix \(C\) containing the frequency of contacts between age groups (a square \(i \times j\) matrix),
  • infectiousness rate \(\alpha\) (pre-infectious period, or latent period =\(1/\alpha\)), and
  • recovery rate \(\gamma\) (infectious period = \(1/\gamma\)).

Exposed, infected, infectious

Confusion sometimes arises when referring to the terms ‘exposed’, ‘infected’ and ‘infectious’ in mathematical modelling. Infection occurs after a person has been exposed, but in modelling terms individuals that are ‘exposed’ are treated as already infected.

We will use the following definitions for our state variables:

  • \(E\) = Exposed : infected but not yet infectious,
  • \(I\) = Infectious: infected and infectious.

To generate trajectories using our model, we must prepare the following inputs:

  1. Contact matrix
  2. Initial conditions
  3. Population structure
  4. Model parameters

1. Contact matrix

Contact matrices can be estimated from surveys or contact data, or synthetic ones can be used. We will use the R package socialmixr to load a contact matrix estimated from POLYMOD survey data (Mossong et al. 2008).

Load contact and population data

Using the R package socialmixr, run the following lines of R code to obtain the contact matrix for the United Kingdom for the year age bins:

  • age between 0 and 20 years,
  • age between 20 and 40,
  • 40 years and over.

R

polymod <- socialmixr::polymod
contact_data <- socialmixr::contact_matrix(
  survey = polymod,
  countries = "United Kingdom",
  age.limits = c(0, 20, 40),
  symmetric = TRUE
)
# prepare contact matrix
contact_matrix <- t(contact_data$matrix)
contact_matrix

OUTPUT

                 age.group
contact.age.group   [0,20)  [20,40)      40+
          [0,20)  7.883663 2.794154 1.565665
          [20,40) 3.120220 4.854839 2.624868
          40+     3.063895 4.599893 5.005571

The result is a square matrix with rows and columns for each age group. Contact matrices can be loaded from other sources, but they must be formatted as a matrix to be used in epidemics.

Why would a contact matrix be non-symmetric?

One of the arguments of the function contact_matrix() is symmetric=TRUE. This means that the total number of contacts of age group 1 with age group 2, should be the same as the total number of contacts of age group 2 and age group 1 (see the socialmixr vignette for more detail). However, when contact matrices are estimated from surveys or other sources, the reported number of contacts may differ by age group resulting in a non-symmetric contact matrix (Prem et al 2021).

2. Initial conditions

The initial conditions are the proportion of individuals in each disease state \(S\), \(E\), \(I\) and \(R\) for each age group at time 0. In this example, we have three age groups age between 0 and 20 years, age between 20 and 40 years and over. Let’s assume that in the youngest age category, one in a million individuals are infectious, and the remaining age categories are infection free.

The initial conditions in the first age category are \(S(0)=1-\frac{1}{1,000,000}\), \(E(0) =0\), \(I(0)=\frac{1}{1,000,000}\), \(R(0)=0\). This is specified as a vector as follows:

R

initial_i <- 1e-6
initial_conditions_inf <- c(
  S = 1 - initial_i, E = 0, I = initial_i, R = 0, V = 0
)

For the age categories that are free from infection, the initial conditions are \(S(0)=1\), \(E(0) =0\), \(I(0)=0\), \(R(0)=0\). We specify this as follows,

R

initial_conditions_free <- c(
  S = 1, E = 0, I = 0, R = 0, V = 0
)

We combine the three initial conditions vectors into one matrix,

R

# combine the initial conditions
initial_conditions <- rbind(
  initial_conditions_inf, # age group 1
  initial_conditions_free, # age group 2
  initial_conditions_free # age group 3
)

# use contact matrix to assign age group names
rownames(initial_conditions) <- rownames(contact_matrix)
initial_conditions

OUTPUT

               S E     I R V
[0,20)  0.999999 0 1e-06 0 0
[20,40) 1.000000 0 0e+00 0 0
40+     1.000000 0 0e+00 0 0

3. Population structure

The population object requires a vector containing the demographic structure of the population. The demographic vector must be a named vector containing the number of individuals in each age group of our given population. In this example, we can extract the demographic information from the contact_data object that we obtained using the socialmixr package.

R

demography_vector <- contact_data$demography$population
names(demography_vector) <- rownames(contact_matrix)
demography_vector

OUTPUT

  [0,20)  [20,40)      40+
14799290 16526302 28961159 

To create our population object, from the {epidemics} package we call the function population() specifying a name, the contact matrix, the demography vector and the initial conditions.

R

library(epidemics)

uk_population <- population(
  name = "UK",
  contact_matrix = contact_matrix,
  demography_vector = demography_vector,
  initial_conditions = initial_conditions
)

4. Model parameters

To run our model we need to specify the model parameters:

  • transmission rate \(\beta\),
  • infectiousness rate \(\alpha\) (preinfectious period=\(1/\alpha\)),
  • recovery rate \(\gamma\) (infectious period=\(1/\gamma\)).

In epidemics, we specify the model inputs as :

  • transmission_rate \(\beta = R_0 \gamma\),
  • infectiousness_rate = \(\alpha\),
  • recovery_rate = \(\gamma\),

We will simulate a strain of influenza with pandemic potential with \(R_0=1.46\), with a pre-infectious period of 3 days and infectious period of 7 days. Therefore our inputs will be:

R

# time periods
preinfectious_period <- 3.0
infectious_period <- 7.0
basic_reproduction <- 1.46

R

# rates
infectiousness_rate <- 1.0 / preinfectious_period
recovery_rate <- 1.0 / infectious_period
transmission_rate <- basic_reproduction / infectious_period

The basic reproduction number \(R_0\)

The basic reproduction number, \(R_0\), for the SEIR model is:

\[ R_0 = \frac{\beta}{\gamma}.\]

Therefore, we can rewrite transmission rate \(\beta\) as:

\[ \beta = R_0 \gamma.\]

Running the model


Running (solving) the model

For models that are described by differential equations, ‘running’ the model actually means to take the system of differential equations and ‘solve’ them to find out how the number of people in the underlying compartments change over time. Because differential equations describe the rate of change in the disease states with respect to time, rather than the number of individuals in each of these states, we typically need to use numerical methods to solve the equations.

An ODE solver is the software used to find numerical solutions to differential equations. If interested on how a system of differential equations is solved in {epidemics}, we suggest you to read the section on ODE systems and models at the “Design principles” vignette.

Now we are ready to run our model using model_default() from the {epidemics} package.

Let’s specify time_end=600 to run the model for 600 days.

R

output <- model_default(
  # population
  population = uk_population,
  # rates
  transmission_rate = transmission_rate,
  infectiousness_rate = infectiousness_rate,
  recovery_rate = recovery_rate,
  # time
  time_end = 600, increment = 1.0
)
head(output)

OUTPUT

    time demography_group compartment    value
   <num>           <char>      <char>    <num>
1:     0           [0,20) susceptible 14799275
2:     0          [20,40) susceptible 16526302
3:     0              40+ susceptible 28961159
4:     0           [0,20)     exposed        0
5:     0          [20,40)     exposed        0
6:     0              40+     exposed        0

Note: This model also has the functionality to include vaccination and tracks the number of vaccinated individuals through time. Even though we have not specified any vaccination, there is still a vaccinated compartment in the output (containing no individuals). We will cover the use of vaccination in future tutorials.

Our model output consists of the number of individuals in each compartment in each age group through time. We can visualise the infectious individuals only (those in the \(I\) class) through time.

R

library(tidyverse)

output %>%
  filter(compartment == "infectious") %>%
  ggplot() +
  geom_line(
    aes(
      x = time,
      y = value,
      colour = demography_group
    )
  ) +
  scale_y_continuous(
    labels = scales::comma
  ) +
  theme_bw() +
  labs(
    x = "Simulation time (days)",
    linetype = "Compartment",
    y = "Individuals"
  )

Time increments

Note that there is a default argument of increment = 1. This relates to the time step of the ODE solver. When the parameters are specified on a daily time-scale and maximum number of time steps (time_end) is days, the default time step of the ODE solver one day.

The choice of increment will depend on the time scale of the parameters, and the rate at which events can occur. In general, the increment should be smaller than the fastest event. For example:

  • if parameters are on a daily time scale, and all events are reported on a daily basis, then the increment should be equal to one day;
  • if parameters are on a monthly time scale, but some events will occur within a month, then the increment should be less than one month.

Accounting for uncertainty


The epidemic model is deterministic, which means it runs like clockwork: the same parameters will always lead to the same trajectory. However, reality is not so predictable. There are two main reasons for this: the transmission process can involve randomness, and we may not know the exact epidemiological characteristics of the pathogen we’re interested in. In the next episode, we will consider ‘stochastic’ models (i.e. models where we can define the process that creates randomness in transmission). In the meantime, we can include uncertainty in the value of the parameters that go into the deterministic model. To account for this, we must run our model for different parameter combinations.

We ran our model with \(R_0= 1.5\). However, we believe that \(R_0\) follows a normal distribution with mean 1.5 and standard deviation 0.05. To account for uncertainty we will run the model for different values of \(R_0\). The steps we will follow to do this are:

  1. Obtain 100 samples from the from a normal distribution

R

# get mean mean and sd over time
r_estimate_mean <- 1.5
r_estimate_sd <- 0.05

# Generate 100 R samples
r_samples <- withr::with_seed(
  seed = 1,
  rnorm(
    n = 100, mean = r_estimate_mean, sd = r_estimate_sd
  )
)

infectious_period <- 7
beta <- r_samples / infectious_period
  1. Run the model 100 times with \(R_0\) equal to a different sample each time

R

output_samples <- model_default(
  population = uk_population,
  transmission_rate = beta,
  infectiousness_rate = infectiousness_rate,
  recovery_rate = recovery_rate,
  time_end = 600, increment = 1
)
  1. Calculate the mean and 95% quantiles of number of infectious individuals across each model simulation and visualise output

R

output_samples %>%
  mutate(r_value = r_samples) %>%
  unnest(data) %>%
  filter(compartment == "infectious") %>%
  ggplot() +
  geom_line(
    aes(time, value, color = r_value, group = param_set),
    alpha = 3
  ) +
  scale_color_fermenter(
    palette = "RdBu",
    name = "R"
  ) +
  scale_y_continuous(
    labels = scales::comma
  ) +
  facet_grid(
    cols = vars(demography_group)
  ) +
  theme_bw() +
  labs(
    x = "Simulation time (days)",
    y = "Individuals"
  )

Deciding which parameters to include uncertainty in depends on a few factors: how well informed a parameter value is e.g. consistency of estimates from the literature; how sensitive model outputs are to parameter value changes; and the purpose of the modelling task. See McCabe et al. 2021 to learn about different types of uncertainty in infectious disease modelling.

Summary


In this tutorial, we have learnt how to simulate disease spread using a mathematical model. Once a model has been chosen, the parameters and other inputs must be specified in the correct way to perform model simulations. In the next tutorial, we will consider how to choose the right model for different tasks.

Key Points

  • Disease trajectories can be generated using the R package epidemics
  • Uncertainty should be included in model trajectories using a range of model parameter values