Read case data

Last updated on 2024-04-29

Overview

Questions

  • Where do you usually store your outbreak data?
  • How many different data formats can I read?
  • Is it possible to import data from databases and health APIs?

Objectives

  • Explain how to import outbreak data from different sources into the R environment for analysis.

Prerequisites

This episode requires you to be familiar with:

Data science: Basic programming with R.

Introduction


The first step in outbreak analysis is importing the target dataset into the R environment. Outbreak data is typically stored in files of diverse formats, in relational database management systems (RDBMS), or in health information systems (HIS) that are accessed through application program interfaces (APIs), such as REDCap and DHIS2. The latter option is particularly well-suited for storing institutional health data. This episode explains how to read case data from each of these sources.

Reading from files


Several packages are available for importing outbreak data stored in individual files into R. These include rio, readr (from the tidyverse), io, ImportExport, and data.table. Together, these packages offer methods to read single or multiple files in a wide range of formats.

The example below shows how to import a csv file into the R environment using the rio package.

R

library("rio")
library("here")

# read data
# e.g.: if path to file is data/raw-data/ebola_cases.csv then:
ebola_confirmed <- rio::import(here::here("data", "raw-data", "ebola_cases.csv"))

# preview data
head(ebola_confirmed, 5)

OUTPUT

        date confirm
1 2014-05-18       1
2 2014-05-20       2
3 2014-05-21       4
4 2014-05-22       6
5 2014-05-23       1

Similarly, you can import files of other formats such as tsv, xlsx, etc.
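
As a minimal sketch of reading another format, the chunk below writes a small tab-separated file to a temporary location and reads it back with rio::import(), which picks the reader from the file extension. The toy_cases data frame and the temporary path are made up for illustration; they are not part of the lesson data.

```r
library("rio")

# Hypothetical toy data, used only to create an example tsv file
toy_cases <- data.frame(
  date = c("2014-05-18", "2014-05-20"),
  confirm = c(1L, 2L)
)

# Write the toy data to a temporary tab-separated file
tsv_path <- tempfile(fileext = ".tsv")
utils::write.table(
  toy_cases, tsv_path,
  sep = "\t", row.names = FALSE, quote = FALSE
)

# import() infers the format from the ".tsv" extension
toy_tsv <- rio::import(tsv_path)

# preview data
head(toy_tsv)
```

The same call works for the other formats rio supports: only the file extension changes.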

Reading compressed data

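Take 1 minute: Is it possible to read compressed data in R?

You can check the full list of supported file formats on the rio package website. Some formats rely on optional packages, which rio::install_formats() installs. Once they are available, rio::import() can read a compressed file such as a zip archive directly from its path: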

R

rio::install_formats()

R

rio::import(here::here("some", "where", "downto", "path", "file_name.zip"))
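
Base R can also read some compressed files without extra packages. The sketch below is an alternative to the rio call above, not a replacement for it: it writes a gzip-compressed csv to a temporary file and reads it back through a gzfile() connection. The toy_cases data frame and the temporary path are hypothetical.

```r
# Hypothetical toy data, used only to create a compressed example file
toy_cases <- data.frame(
  date = c("2014-05-18", "2014-05-20"),
  confirm = c(1L, 2L)
)

# Write the toy data as a gzip-compressed csv file
csv_gz <- tempfile(fileext = ".csv.gz")
gz_out <- gzfile(csv_gz, "w")
utils::write.csv(toy_cases, gz_out, row.names = FALSE)
close(gz_out)

# Read the compressed file back; read.csv() opens and closes
# the gzfile() connection itself
toy_back <- utils::read.csv(gzfile(csv_gz))

# preview data
head(toy_back)
```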

Click here to download a zip file containing data for the Marburg outbreak and then import it into your working environment.

Reading from databases


The DBI package serves as a versatile interface for interacting with database management systems (DBMS) across different back-ends or servers. It offers a uniform method for accessing and retrieving data from various database systems.

The following code chunk demonstrates how to create a temporary SQLite database in memory, store the case_data as a table within it, and subsequently read from it:

R

library("DBI")
library("RSQLite")

# Create a temporary SQLite database in memory
db_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# 'case_data' is assumed here to be the data frame imported earlier
case_data <- ebola_confirmed

# Store the 'case_data' data frame as a table named 'cases'
# in the SQLite database
DBI::dbWriteTable(db_con, "cases", case_data)
# Read data from the 'cases' table
result <- DBI::dbReadTable(db_con, "cases")
# Close the database connection
DBI::dbDisconnect(db_con)
# View the result
base::print(utils::head(result))

OUTPUT

   date confirm
1 16208       1
2 16210       2
3 16211       4
4 16212       6
5 16213       1
6 16214       2

This code first establishes a connection to an SQLite database created in memory using dbConnect(). Then, it writes the case_data into a table named ‘cases’ within the database using the dbWriteTable() function. Subsequently, it reads the data from the ‘cases’ table using dbReadTable(). Finally, it closes the database connection with dbDisconnect(). Read this tutorial episode on SQL databases and R for more examples.
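
Note that the date column in the output above comes back as numbers rather than dates. SQLite has no dedicated date type, so by default the dates are stored as their underlying number of days since 1970-01-01. A minimal sketch of converting such values back in base R, using the first three values from the output above:

```r
# Day counts copied from the 'date' column of the output above
date_numbers <- c(16208, 16210, 16211)

# Convert days since 1970-01-01 back to Date objects
restored_dates <- as.Date(date_numbers, origin = "1970-01-01")

format(restored_dates)
# "2014-05-18" "2014-05-20" "2014-05-21"
```

Applied to the result of dbReadTable(), the same conversion would be result$date <- as.Date(result$date, origin = "1970-01-01").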

Run SQL queries in R using dbplyr

A database interface package such as dbplyr optimizes memory usage by processing the data inside the database before extraction, so only the result is loaded into R. Conversely, conducting all data manipulation outside the database (e.g., in our local RStudio session) can lead to inefficient memory usage and strained system resources.
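
The idea can be sketched with a small in-memory SQLite database. The toy_cases table below is made up for illustration. The sketch assumes that the dplyr and dbplyr packages are installed: dplyr::tbl() creates a reference to the remote table, the filter is translated to SQL and run inside the database, and only the filtered rows are brought into R by dplyr::collect().

```r
library("DBI")
library("RSQLite")
library("dplyr") # the dbplyr package must also be installed

# Hypothetical toy data stored in a temporary in-memory database
db_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
toy_cases <- data.frame(day = 1:6, confirm = c(1, 2, 4, 6, 1, 2))
DBI::dbWriteTable(db_con, "cases", toy_cases)

# Reference the remote table without loading it into memory
cases_tbl <- dplyr::tbl(db_con, "cases")

# Build a query lazily; nothing is extracted yet
busy_days <- cases_tbl %>%
  dplyr::filter(confirm > 2)

# Inspect the SQL that dbplyr generates for this query
dplyr::show_query(busy_days)

# Run the query in the database and pull only the result into R
busy_days_local <- dplyr::collect(busy_days)

# Close the database connection
DBI::dbDisconnect(db_con)

busy_days_local
```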

Read the Introduction to dbplyr vignette to learn how to generate your own queries!

Reading from HIS APIs


Health-related data are also increasingly stored in specialized health information systems that expose APIs, such as Fingertips, GoData, REDCap, and DHIS2. In such cases, you can use the readepi package, which enables reading data from HIS APIs.

Key Points

  • Use {rio}, {io}, {readr} and {ImportExport} to read data from individual files.
  • Use {readepi} to read data from HIS APIs and RDBMS.