Handling data from {incidence2}

This vignette shows how to prepare <incidence2> objects from the incidence2 package for use with cfr, using the prepare_data() method for the <incidence2> class. If detailed individual-level data that include deaths and recoveries are available, alternative methods for severity estimation could be used (e.g. directly calculating CFR from the subset of cases with a known death outcome). However, there may be situations where only deaths are recorded, in which case the methods described here provide an option for CFR calculation.

We first load the packages we require: cfr, and incidence2, which also provides the Covid-19 data used below.

library(cfr)

# load {incidence2}
library(incidence2)
#> Loading required package: grates

Aggregated case data such as the Covid-19 dataset provided by incidence2 can be converted into an <incidence2> object using incidence2::incidence(), and then handled by prepare_data().

# get data bundled with the {incidence2} package
covid_uk <- covidregionaldataUK

# view the data
head(covid_uk)
#>         date          region region_code cases_new cases_total deaths_new
#> 1 2020-01-30   East Midlands   E12000004        NA          NA         NA
#> 2 2020-01-30 East of England   E12000006        NA          NA         NA
#> 3 2020-01-30         England   E92000001         2           2         NA
#> 4 2020-01-30          London   E12000007        NA          NA         NA
#> 5 2020-01-30      North East   E12000001        NA          NA         NA
#> 6 2020-01-30      North West   E12000002        NA          NA         NA
#>   deaths_total recovered_new recovered_total hosp_new hosp_total tested_new
#> 1           NA            NA              NA       NA         NA         NA
#> 2           NA            NA              NA       NA         NA         NA
#> 3           NA            NA              NA       NA         NA         NA
#> 4           NA            NA              NA       NA         NA         NA
#> 5           NA            NA              NA       NA         NA         NA
#> 6           NA            NA              NA       NA         NA         NA
#>   tested_total
#> 1           NA
#> 2           NA
#> 3           NA
#> 4           NA
#> 5           NA
#> 6           NA

Note that the grouping structure of this dataset, given by the “region” variable, is retained in the <incidence2> object when “region” is passed to the groups argument. prepare_data() respects grouping structure when present, and returns a dataset with one additional column for each grouping variable.

# convert to incidence2 object
covid_uk_incidence <- incidence(
  covid_uk,
  date_index = "date",
  groups = "region",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)
#> Warning in incidence(): `cases_new` contains NA values. Consider imputing these
#> and calling `incidence()` again.

# View head of prepared data
# By default, NAs in cases and deaths are replaced with 0s
# Setting fill_NA = FALSE retains NAs, which will cause issues with
# CFR functions such as cfr_static()
head(
  prepare_data(
    covid_uk_incidence,
    cases_variable = "cases_new",
    deaths_variable = "deaths_new"
  )
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#>         date          region cases deaths
#> 1 2020-01-30   East Midlands     0      0
#> 2 2020-01-30 East of England     0      0
#> 3 2020-01-30         England     2      0
#> 4 2020-01-30          London     0      0
#> 5 2020-01-30      North East     0      0
#> 6 2020-01-30      North West     0      0

In this example, the “region” column is added to the data, allowing for disease severity to be calculated separately for each region if needed.

Users who wish to override grouping variables in their data are advised to do this when converting their data into an <incidence2> object, and to be aware of how incidence2 aggregates case and death counts, including how it deals with NAs; see incidence2::incidence() for more details.
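As an illustration of choosing the grouping at the conversion step, the sketch below keeps only the rows for one region and then omits the groups argument in incidence(), so that prepare_data() returns no grouping column. The object names covid_england, england_incidence and england_prepared are made up for this example, and subsetting to "England" is an assumption made to avoid summing overlapping regions; it is one possible approach rather than a recommended workflow.

```r
library(cfr)
library(incidence2)

# keep a single region so that dropping the grouping does not
# sum counts across overlapping geographies (illustrative choice)
covid_england <- covidregionaldataUK[
  covidregionaldataUK$region == "England",
]

# convert without `groups`: counts are indexed by date only
# (incidence() may warn about NA values, as in the grouped example)
england_incidence <- incidence(
  covid_england,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)

# prepare for cfr; NAs are replaced with 0s by default
england_prepared <- prepare_data(
  england_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)

head(england_prepared)
```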

Users who prepare data while maintaining grouping structure should take care to apply cfr_*() to their data by group, as cfr_*() functions cannot currently handle grouped data.
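Applying a cfr_*() function by group can be sketched with base R as follows. This is a minimal example rather than the only approach: it assumes the Covid-19 data and column names used above, calls cfr_static() without a delay distribution (so no correction for delays between case onset and death is applied), and assumes cfr_static() returns a one-row data frame per group. The names covid_uk_prepared, data_by_region and cfr_by_region are introduced here for illustration.

```r
library(cfr)
library(incidence2)

# rebuild the grouped <incidence2> object and prepare it for cfr
covid_uk_incidence <- incidence(
  covidregionaldataUK,
  date_index = "date",
  groups = "region",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)
covid_uk_prepared <- prepare_data(
  covid_uk_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)

# split the prepared data into one data frame per region
data_by_region <- split(covid_uk_prepared, covid_uk_prepared$region)

# apply cfr_static() to each region separately, passing only the
# date, cases and deaths columns
cfr_list <- lapply(data_by_region, function(df) {
  cfr_static(df[, c("date", "cases", "deaths")])
})

# combine the per-region estimates, labelling each row by region
cfr_by_region <- data.frame(
  region = names(cfr_list),
  do.call(rbind, cfr_list),
  row.names = NULL
)

cfr_by_region
```

Splitting before estimation keeps each call to cfr_static() on a single, ungrouped time series, which is the form the cfr_*() functions expect.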