Summary and Setup
This is an Epiverse-TRACE tutorial built with The Carpentries Workbench.
Motivation
Outbreaks involve different diseases and occur in different contexts, but they all raise the same key public health questions (Cori et al. 2017). We can relate these key public health questions to outbreak data analysis tasks.
Epiverse-TRACE aims to provide a software ecosystem for outbreak analytics with integrated, generalisable and scalable community-driven software. We support the development of R packages, make existing ones interoperable to improve the user experience, and stimulate a community of practice.
Epiverse-TRACE tutorials
The tutorials are built around an outbreak analysis pipeline split into three stages: Early tasks, Middle tasks and Late tasks.
Each stage has its own tutorial website, and each tutorial website consists of a set of episodes.
Early task tutorials ➠ | Middle task tutorials ➠ | Late task tutorials ➠ |
---|---|---|
Read and clean case data, and make linelist | Real-time analysis and forecasting | Scenario modelling |
Read, clean and validate case data, convert linelist data to incidence for visualization. | Access delay distributions and estimate transmission metrics, forecast cases, estimate severity and superspreading. | Simulate disease spread and investigate interventions. |
Each episode contains:
- Overview: describes the questions the episode will answer and its objectives.
- Prerequisites: describes what episodes/packages need to be covered before the current episode.
- Example R code: work through the episodes on your own computer using the example R code.
- Challenges: complete challenges to test your understanding.
- Explainers: add to your understanding of mathematical and modelling concepts with the explainer boxes.
Also check out the glossary for any terms you may be unfamiliar with.
Epiverse-TRACE R packages
Our strategy is to gradually incorporate specialised R packages into a traditional analysis pipeline. These packages should fill the gaps in these epidemiology-specific tasks in response to outbreaks.
Prerequisite
This content assumes intermediate R knowledge. These tutorials are for you if:
- You can read data into R, transform and reshape data, and make a wide variety of graphs
- You are familiar with functions from dplyr, tidyr, and ggplot2
- You can use the magrittr pipe %>% and/or the native pipe |>.
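As a quick self-check for the last point, here is a minimal sketch that uses only base R. The numbers are illustrative and not taken from the tutorials. Note that the native pipe requires R 4.1.0 or later; the magrittr pipe %>% behaves the same way here once magrittr or dplyr is loaded.

```r
# Native pipe (R >= 4.1.0): the left-hand value becomes the first argument
# of the function call on the right-hand side
total <- c(1, 4, 9) |> sqrt() |> sum()
total

# The same computation written as nested calls, for comparison
identical(total, sum(sqrt(c(1, 4, 9))))
```

If both lines read naturally to you, you meet the pipe prerequisite.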
We expect learners to have some exposure to basic statistical, mathematical and epidemic theory concepts, but NOT intermediate or expert familiarity with modelling.
R refresher
If you need to refresh your R knowledge to fulfill the prerequisites, we recommend working through these interactive, self-paced online tutorials from Applied Epi on:
- R basics
- Data cleaning
- Data visualization
- Data preparation
Access these tutorials by following the instructions at https://appliedepi.org/tutorial/.
Software Setup
Follow these steps:
1. Install or upgrade R and RStudio
R and RStudio are two separate pieces of software:
- R is a programming language and software used to run code written in R.
- RStudio is an integrated development environment (IDE) that makes using R easier. We recommend using RStudio to interact with R.
To install R and RStudio, follow the instructions at https://posit.co/download/rstudio-desktop/.
Already installed?
Hold on: This is a great time to make sure your R installation is current.
This tutorial requires R version 4.0.0 or later.
To check if your R version is up to date:
In RStudio, your R version is printed in the console window at startup. Alternatively, run sessionInfo().
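The version check can also be done programmatically. The following is a minimal sketch using base R only; the variable names are illustrative, and the threshold is the minimum version stated above.

```r
# Minimum R version stated in this tutorial
minimum_version <- "4.0.0"

# getRversion() returns the running R version as a comparable object
current_version <- getRversion()
version_ok <- current_version >= minimum_version

if (version_ok) {
  message("R ", as.character(current_version), " meets the minimum requirement.")
} else {
  message(
    "R ", as.character(current_version), " is older than ",
    minimum_version, "; please update R."
  )
}
```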
To update R, download and install the latest version from the R project website for your operating system.
After installing a new version, you will have to reinstall all your packages with the new version.
For Windows, the installr package can upgrade your R version and migrate your package library.
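Because packages must be reinstalled after an upgrade, it can help to record what is installed beforehand. The sketch below is one possible approach using base R; the helper names save_package_list() and find_missing_packages() and the file name are hypothetical, not part of any package.

```r
# Save the names of the currently installed packages to a file
save_package_list <- function(path) {
  pkgs <- rownames(installed.packages())
  saveRDS(pkgs, path)
  invisible(pkgs)
}

# After upgrading R, list the saved packages that are not installed yet
find_missing_packages <- function(path) {
  setdiff(readRDS(path), rownames(installed.packages()))
}

# Example: tempdir() is used only so the example runs anywhere;
# for a real upgrade, choose a location that persists across R sessions
list_file <- file.path(tempdir(), "installed_packages.rds")
save_package_list(list_file)
find_missing_packages(list_file)
```

After the upgrade, the names returned by find_missing_packages() can be passed to install.packages(). Packages that were installed from GitHub rather than CRAN would need to be reinstalled from their original source.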
To update RStudio, open RStudio and click on Help > Check for Updates. If a new version is available, follow the instructions on the screen.
Check for Updates regularly
While this may sound scary, it is far more common to run into issues due to using out-of-date versions of R or R packages. Keeping up with the latest versions of R, RStudio, and any packages you regularly use is a good practice.
2. Install the required R packages
Open RStudio, copy and paste the following code chunk into the console window, then press Enter (Windows and Linux) or Return (macOS) to execute the command:
R
if(!require("pak")) install.packages("pak")
new_packages <- c(
# for Introduction tutorial
"here",
"tidyverse",
"visdat",
"skimr",
"rmarkdown",
"quarto",
# for Early Task tutorials
"epiverse-trace/cleanepi",
"rio",
"DBI",
"RSQLite",
"dbplyr",
"linelist",
"epiverse-trace/simulist",
"incidence2",
"epiverse-trace/tracetheme",
# for Middle Task tutorials
"EpiNow2",
"epiverse-trace/epiparameter",
"cfr",
"outbreaks",
"epicontacts",
"fitdistrplus",
"epiverse-trace/superspreading",
"epichains",
# for Late task tutorials
"socialmixr",
"epiverse-trace/epidemics",
"scales"
)
pak::pak(new_packages)
These installation steps may prompt you with "? Do you want to continue (Y/n)". If so, type Y and press Enter.
Windows users will need a working installation of Rtools in order to build packages from source. Rtools is not an R package, but separate software that you need to download and install. We suggest you follow these steps:
1. Verify the Rtools installation. You can do so by using Windows search across your system. Optionally, you can use devtools by running:
R
if(!require("devtools")) install.packages("devtools")
devtools::find_rtools()
If the result is FALSE, continue with step 2.
2. Install Rtools. Download the Rtools installer from https://cran.r-project.org/bin/windows/Rtools/ and install it with the default selections.
3. Verify the Rtools installation. Again, we can use devtools:
R
if(!require("devtools")) install.packages("devtools")
devtools::find_rtools()
For example, if you get an error message when installing {simulist}, try this alternative code:
R
# for simulist
install.packages("simulist", repos = c("https://epiverse-trace.r-universe.dev"))
Alternatively, try the standard install.packages() function to install a single package, for example:
R
install.packages("rio")
If the error message includes a string like "Personal access token (PAT)", you may need to set up your GitHub token.
First, install these R packages:
R
if(!require("pak")) install.packages("pak")
new <- c("gh",
"gitcreds",
"usethis")
pak::pak(new)
Then, follow these three steps to set up your GitHub token (read this step-by-step guide):
R
# Generate a token
usethis::create_github_token()
# Configure your token
gitcreds::gitcreds_set()
# Get a situational report
usethis::git_sitrep()
Try installing {epiparameter} again:
R
if(!require("remotes")) install.packages("remotes")
remotes::install_github("epiverse-trace/epiparameter")
If the error persists, contact us!
You should update all of the packages required for the tutorial, even if you installed them relatively recently. New versions bring improvements and important bug fixes.
When the installation has finished, you can try to load the packages by pasting the following code into the console:
R
# for Introduction tutorial
library(here)
library(tidyverse)
library(visdat)
library(skimr)
library(rmarkdown)
library(quarto)
# for Early Task tutorials
library(cleanepi)
library(rio)
library(DBI)
library(RSQLite)
library(dbplyr)
library(linelist)
library(simulist)
library(incidence2)
library(tracetheme)
# for Middle Task tutorials
library(EpiNow2)
library(epiparameter)
library(cfr)
library(outbreaks)
library(epicontacts)
library(fitdistrplus)
library(superspreading)
library(epichains)
# for Late task tutorials
library(socialmixr)
library(epidemics)
library(scales)
If you do NOT see an error like "there is no package called ‘...’", you are good to go! If you do, contact us!
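Instead of reading through the console output, you can also ask R which packages are missing. This is a minimal sketch using base R; find_missing() is a hypothetical helper name, and the vector passed to it is only a short excerpt of the full package list above.

```r
# Return the packages from `pkgs` that are not installed
find_missing <- function(pkgs) {
  # Strip any "owner/" prefix used for GitHub sources,
  # e.g. "epiverse-trace/cleanepi" becomes "cleanepi"
  pkg_names <- basename(pkgs)
  pkg_names[!pkg_names %in% rownames(installed.packages())]
}

# Example with a few packages from the list above
find_missing(c("here", "epiverse-trace/cleanepi", "EpiNow2"))
```

An empty result means every listed package was found. Any names that are returned are the ones to reinstall.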
3. Set up an RStudio project and folder
We suggest using RStudio Projects.
Follow these steps:
- Create an RStudio Project. If needed, follow this how-to guide on “Hello RStudio Projects” to create a New Project in a New Directory.
- Create the data/ folder inside the RStudio project or corresponding directory. Use the data/ folder to save the data sets you download.
The directory of an RStudio Project named, for example, training should look like this:
training/
|__ data/
|__ training.Rproj
RStudio Projects allow you to use file paths relative to the project root, making your code more portable and less error-prone. This avoids using setwd() with absolute paths like "C:/Users/MyName/WeirdPath/training/data/file.csv".
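The folder and relative path described in this step can be sketched as follows, assuming the code is run from inside the RStudio Project so that the working directory is the project root. The file name file.csv is a placeholder.

```r
# Create the data/ folder in the project directory
# (showWarnings = FALSE keeps this quiet if the folder already exists)
dir.create("data", showWarnings = FALSE)

# Build a path relative to the project root instead of an absolute path
relative_path <- file.path("data", "file.csv")
relative_path

# Optional: if the here package is installed, here::here() builds
# the same path starting from the project root
if (requireNamespace("here", quietly = TRUE)) {
  here::here("data", "file.csv")
}
```

Because the path contains no user-specific directories, the same code works unchanged on another computer that has a copy of the project.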
4. Create a GitHub Account
We can use GitHub as a collaboration platform to communicate package issues and engage in community discussions.
Follow these steps:
- Go to https://github.com and follow the “Sign up” link at the top-right of the window.
- Follow the instructions to create an account.
- Verify your email address with GitHub.
5. Watch and read the pre-training material
Prerequisite
Watch three 5-minute video refreshers on statistical distributions:
- StatQuest with Josh Starmer (2017) The Main Ideas behind Probability Distributions, YouTube. Available at: https://www.youtube.com/watch?v=oI3hZJqXJuc&t
- StatQuest with Josh Starmer (2018) Probability is not Likelihood. Find out why!!!, YouTube. Available at: https://www.youtube.com/watch?v=pYxNSUDSFH4
- StatQuest with Josh Starmer (2017) Maximum Likelihood, clearly explained!!!, YouTube. Available at: https://www.youtube.com/watch?v=XepXtl9YKwc
Read a two-page introductory paper on infectious disease modelling:
- Bjørnstad ON, Shea K, Krzywinski M, Altman N. Modeling infectious epidemics. Nat Methods. 2020 May;17(5):455-456. doi: 10.1038/s41592-020-0822-z. PMID: 32313223. https://www.nature.com/articles/s41592-020-0822-z
Data sets
Download the data
We will download the data directly from R during the tutorial. However, if you are expecting problems with the network, it may be better to download the data beforehand and store it on your machine.
The data files for the tutorial can be downloaded manually here:
- https://epiverse-trace.github.io/tutorials/data/linelist.csv
- https://epiverse-trace.github.io/tutorials-early/data/Marburg.zip
- https://epiverse-trace.github.io/tutorials-early/data/simulated_ebola_2.csv
- https://epiverse-trace.github.io/tutorials-early/data/linelist-date_of_birth.csv
- https://epiverse-trace.github.io/tutorials-early/data/ebola_cases_2.csv
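If you prefer to fetch these files from R ahead of time, the following is a minimal sketch using base R. The helper name download_tutorial_data() is hypothetical, and saving into the data/ folder from the project setup step is an assumption.

```r
# URLs of the data files listed above
data_urls <- c(
  "https://epiverse-trace.github.io/tutorials/data/linelist.csv",
  "https://epiverse-trace.github.io/tutorials-early/data/Marburg.zip",
  "https://epiverse-trace.github.io/tutorials-early/data/simulated_ebola_2.csv",
  "https://epiverse-trace.github.io/tutorials-early/data/linelist-date_of_birth.csv",
  "https://epiverse-trace.github.io/tutorials-early/data/ebola_cases_2.csv"
)

# Download each file into dest_dir, skipping files that already exist
download_tutorial_data <- function(urls, dest_dir = "data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest_files <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    if (!file.exists(dest_files[i])) {
      # mode = "wb" keeps binary files such as the .zip archive intact on Windows
      download.file(urls[i], destfile = dest_files[i], mode = "wb")
    }
  }
  invisible(dest_files)
}

# Uncomment to download all files (requires a network connection)
# download_tutorial_data(data_urls)
```

Running the function a second time does not download files that are already present, so it is safe to re-run after an interrupted connection.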
Your Questions
If you need any assistance installing the software or have any other questions about this tutorial, please send an email to andree.valle-campos@lshtm.ac.uk