Summary and Setup

This is an Epiverse-TRACE tutorial built with The Carpentries Workbench.

Motivation


Outbreaks involve different diseases and arise in different contexts, but they all raise the same key public health questions (Cori et al. 2017). We can relate these key public health questions to outbreak data analysis tasks.

Epiverse-TRACE aims to provide a software ecosystem for outbreak analytics with integrated, generalisable and scalable community-driven software. We support the development of R packages, make existing ones interoperable to improve the user experience, and foster a community of practice.

Epiverse-TRACE tutorials

The tutorials are built around an outbreak analysis pipeline split into three stages: Early tasks, Middle tasks and Late tasks.

Outbreak analysis pipeline

An overview of the tutorial topics

Each task has its own tutorial website. Each tutorial website consists of a set of episodes.

  • Early task tutorials ➠ Read and clean case data, and make linelist: Read, clean and validate case data, convert linelist data to incidence for visualization.
  • Middle task tutorials ➠ Real-time analysis and forecasting: Access delay distributions and estimate transmission metrics, forecast cases, estimate severity and superspreading.
  • Late task tutorials ➠ Scenario modelling: Simulate disease spread and investigate interventions.

Each episode contains:

  • Overview: describes the questions the episode answers and its objectives.
  • Prerequisites: describes what episodes/packages need to be covered before the current episode.
  • Example R code: work through the episodes on your own computer using the example R code.
  • Challenges: complete challenges to test your understanding.
  • Explainers: add to your understanding of mathematical and modelling concepts with the explainer boxes.

Also check out the glossary for any terms you may be unfamiliar with.

Epiverse-TRACE R packages

Our strategy is to gradually incorporate specialised R packages into a traditional analysis pipeline. These packages should fill gaps in the epidemiology-specific tasks performed in response to outbreaks.

Outbreak analysis R packages

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests and is easy to share with others (Wickham and Bryan, 2023).

Prerequisite

This content assumes intermediate R knowledge. These tutorials are for you if:

  • You can read data into R, transform and reshape data, and make a wide variety of graphs
  • You are familiar with functions from dplyr, tidyr, and ggplot2
  • You can use the magrittr pipe %>% and/or native pipe |>.
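
As a quick self-check of the last point, the sketch below chains two base R functions with the native pipe. The values are arbitrary illustrations, not tutorial data. Note that the native pipe |> needs R 4.1.0 or later, whereas %>% requires loading magrittr or a package that re-exports it, such as dplyr.

```r
# Illustrative values only (not tutorial data)
squares <- c(1, 4, 9)

# Native pipe (R >= 4.1.0): pass the vector to sqrt(), then the result to sum()
total <- squares |>
  sqrt() |>
  sum()

# The same computation written as nested calls
total_nested <- sum(sqrt(squares))

# Both forms give the same result
identical(total, total_nested)
```

If the piped version reads naturally to you, you meet this prerequisite.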

We expect learners to have some exposure to basic statistical, mathematical and epidemic theory concepts, but NOT intermediate or expert familiarity with modeling.

R refresher

If you need to refresh your R knowledge to fulfill the prerequisites, we recommend working through these interactive, self-paced online tutorials from Applied Epi on:

  • R basics
  • Data cleaning
  • Data visualization
  • Data preparation

To access these tutorials, follow the instructions at https://appliedepi.org/tutorial/.

Software Setup


Follow these steps:

1. Install or upgrade R and RStudio

R and RStudio are two separate pieces of software:

  • R is a programming language and software used to run code written in R.
  • RStudio is an integrated development environment (IDE) that makes using R easier. We recommend using RStudio to interact with R.

To install R and RStudio, follow the instructions at https://posit.co/download/rstudio-desktop/.

Already installed?

Hold on: This is a great time to make sure your R installation is current.

This tutorial requires R version 4.0.0 or later.

To check if your R version is up to date:

  • In RStudio, your R version is printed in the console window at startup. Alternatively, run sessionInfo().

  • To update R, download and install the latest version from the R project website for your operating system.

    • After installing a new version, you will have to reinstall all your packages with the new version.

    • For Windows, the installr package can upgrade your R version and migrate your package library.

  • To update RStudio, open RStudio and click on Help > Check for Updates. If a new version is available, follow the instructions on the screen.
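
The version check above can also be done programmatically. The following is a minimal sketch using base R only; the variable names are illustrative, and the minimum version is the one stated in this tutorial.

```r
# Minimum R version stated in this tutorial
minimum_version <- "4.0.0"

# getRversion() returns the running R version as a comparable object
current_version <- getRversion()
r_is_recent_enough <- current_version >= minimum_version

if (r_is_recent_enough) {
  message("R ", as.character(current_version), " meets the minimum of ", minimum_version, ".")
} else {
  message("R ", as.character(current_version), " is older than ", minimum_version, "; please update R.")
}
```

Comparing version objects rather than plain strings avoids mistakes such as treating "4.10.0" as older than "4.9.0".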

Check for Updates regularly

While this may sound scary, it is far more common to run into issues due to using out-of-date versions of R or R packages. Keeping up with the latest versions of R, RStudio, and any packages you regularly use is a good practice.

2. Install the required R packages

Open RStudio and copy and paste the following code chunk into the console window, then press Enter (Windows and Linux) or Return (macOS) to execute the command:

R

if(!require("pak")) install.packages("pak")

new_packages <- c(
  # for Introduction tutorial
  "here",
  "tidyverse",
  "visdat",
  "skimr",
  "rmarkdown",
  "quarto",
  # for Early Task tutorials
  "epiverse-trace/cleanepi",
  "rio",
  "DBI",
  "RSQLite",
  "dbplyr",
  "linelist",
  "epiverse-trace/simulist",
  "incidence2",
  "epiverse-trace/tracetheme",
  # for Middle Task tutorials
  "EpiNow2",
  "epiverse-trace/epiparameter",
  "cfr",
  "outbreaks",
  "epicontacts",
  "fitdistrplus",
  "epiverse-trace/superspreading",
  "epichains",
  # for Late task tutorials
  "socialmixr",
  "epiverse-trace/epidemics",
  "scales"
)

pak::pak(new_packages)

During installation, you may be asked ? Do you want to continue (Y/n). If so, type Y and press Enter.

Windows users will need a working installation of Rtools in order to build packages from source. Rtools is not an R package, but separate software that you need to download and install. We suggest following these steps:

  1. Verify Rtools installation. You can do so by searching for Rtools with Windows search. Alternatively, you can use devtools by running:

R

if(!require("devtools")) install.packages("devtools")
devtools::find_rtools()

If the result is FALSE, then you should do step 2.

  2. Install Rtools. Download the Rtools installer from https://cran.r-project.org/bin/windows/Rtools/. Install with default selections.

  3. Verify Rtools installation. Again, we can use devtools:

R

if(!require("devtools")) install.packages("devtools")
devtools::find_rtools()

For example, if you get an error message when installing simulist, try this alternative code:

R

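# for simulist: install from the Epiverse-TRACE R-universe repository,
# keeping a CRAN mirror in the list so that dependencies can be resolved
install.packages(
  "simulist",
  repos = c("https://epiverse-trace.r-universe.dev", "https://cloud.r-project.org")
)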

Try using the classic install.packages() function to install one package at a time, for example:

R

install.packages("rio")

If the error message includes a string like Personal access token (PAT), you may need to set up your GitHub token.

First, install these R packages:

R

if(!require("pak")) install.packages("pak")

new <- c("gh",
         "gitcreds",
         "usethis")

pak::pak(new)

Then, follow these three steps to set up your GitHub token (read this step-by-step guide):

R

# Generate a token
usethis::create_github_token()

# Configure your token 
gitcreds::gitcreds_set()

# Get a situational report
usethis::git_sitrep()

Try installing {epiparameter} again:

R

if(!require("remotes")) install.packages("remotes")
remotes::install_github("epiverse-trace/epiparameter")

If the error persists, contact us!

You should update all of the packages required for the tutorial, even if you installed them relatively recently. New versions bring improvements and important bug fixes.

When the installation has finished, you can try to load the packages by pasting the following code into the console:

R

# for Introduction tutorial
library(here)
library(tidyverse)
library(visdat)
library(skimr)
library(rmarkdown)
library(quarto)
# for Early Task tutorials
library(cleanepi)
library(rio)
library(DBI)
library(RSQLite)
library(dbplyr)
library(linelist)
library(simulist)
library(incidence2)
library(tracetheme)
# for Middle Task tutorials
library(EpiNow2)
library(epiparameter)
library(cfr)
library(outbreaks)
library(epicontacts)
library(fitdistrplus)
library(superspreading)
library(epichains)
# for Late task tutorials
library(socialmixr)
library(epidemics)
library(scales)

If you do NOT see an error like there is no package called ‘...’ you are good to go! If you do, contact us!
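
To list any packages that are still missing without loading each one, the following minimal sketch compares the package names used above against your installed library. It uses base R only; the variable names are illustrative.

```r
# Package names taken from the library() calls above
required_packages <- c(
  # for Introduction tutorial
  "here", "tidyverse", "visdat", "skimr", "rmarkdown", "quarto",
  # for Early Task tutorials
  "cleanepi", "rio", "DBI", "RSQLite", "dbplyr", "linelist",
  "simulist", "incidence2", "tracetheme",
  # for Middle Task tutorials
  "EpiNow2", "epiparameter", "cfr", "outbreaks", "epicontacts",
  "fitdistrplus", "superspreading", "epichains",
  # for Late task tutorials
  "socialmixr", "epidemics", "scales"
)

# Names of all packages currently installed in your R libraries
installed <- rownames(installed.packages())

# Required packages that are not installed yet
missing_packages <- setdiff(required_packages, installed)

if (length(missing_packages) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
}
```

This only confirms that a package is present. Loading it with library() remains the more thorough check, because it also reveals broken installations.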

3. Set up an RStudio project and folder

We suggest using RStudio Projects.

Follow these steps

  • Create an RStudio Project. If needed, follow this how-to guide on “Hello RStudio Projects” to create a New Project in a New Directory.
  • Create the data/ folder inside the RStudio project or corresponding directory. Use the data/ folder to save the data sets you download.

The directory of an RStudio Project named, for example, training should look like this:

training/
|__ data/
|__ training.Rproj

RStudio Projects allow you to use file paths relative to the project directory, making your code more portable and less error-prone. This avoids using setwd() with absolute paths like "C:/Users/MyName/WeirdPath/training/data/file.csv".
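
As a minimal sketch of this setup, the code below creates the data/ folder and builds a relative path with base R. It assumes your working directory is the project directory, which is the case when you open the .Rproj file; "file.csv" is a placeholder name, not a tutorial data set.

```r
# Create the data/ folder inside the project directory if it does not exist yet
if (!dir.exists("data")) {
  dir.create("data")
}

# Build a path relative to the project directory
# ("file.csv" is a placeholder file name)
data_path <- file.path("data", "file.csv")
data_path
```

The here package installed in step 2 offers a related approach: here::here("data", "file.csv") builds the path starting from the project root.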

4. Create a GitHub Account

We can use GitHub as a collaboration platform to communicate package issues and engage in community discussions.

Follow all these steps

  1. Go to https://github.com and follow the “Sign up” link at the top-right of the window.
  2. Follow the instructions to create an account.
  3. Verify your email address with GitHub.

5. Watch and read the pre-training material

Prerequisite

Watch three 5-minute video refreshers on statistical distributions:

Read a two-page paper introduction to Infectious Disease Modelling:

Data sets


Download the data

We will download the data directly from R during the tutorial. However, if you expect network problems, it may be better to download the data beforehand and store it on your machine.

The data files for the tutorial can be downloaded manually here:

Your Questions


If you need any assistance installing the software or have any other questions about this tutorial, please send an email to