Prepare common epidemiological data formats for CFR estimation

This S3 generic has methods for classes commonly used for epidemiological data.

Currently, the only supported data format is <incidence2> from the incidence2 package; see incidence2::incidence(). Grouped <incidence2> data are supported; see Details.

Usage

prepare_data(data, ...)

# S3 method for class 'incidence2'
prepare_data(
  data,
  cases_variable = "cases",
  deaths_variable = "deaths",
  fill_NA = TRUE,
  ...
)

Arguments

data: A <data.frame>-like object. Currently, only <incidence2> objects are supported. These may be grouped.
...: Currently unused. Passing extra arguments will throw a warning.
cases_variable: A string for the name of the cases variable in the "count_variable" column of data.
deaths_variable: A string for the name of the deaths variable in the "count_variable" column of data.
fill_NA: A logical indicating whether NAs in the cases and deaths data should be replaced by 0s. The default value is TRUE, with a message to make users aware of the replacement.

Value

A <data.frame> suitable for disease severity estimation functions provided in cfr, with the columns "date", "cases", and "deaths".

Additionally, for grouped <incidence2> data, columns representing the grouping variables will also be present.

The result has a continuous sequence of dates between the start and end date of data; this is required if the data are to be passed to functions such as cfr_static().
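The continuity requirement above can be checked by hand. The following is a minimal base R sketch, not part of cfr: the data frame `prepared` and its values are invented for illustration, and the check assumes daily data.

```r
# Toy data frame in the shape described above; all values are made up
prepared <- data.frame(
  date = seq(as.Date("2021-05-28"), as.Date("2021-06-02"), by = "day"),
  cases = c(10, 12, 9, 0, 7, 0),
  deaths = c(1, 0, 2, 0, 1, 0)
)

# Daily data are continuous when every consecutive pair of dates is one day apart
is_continuous <- all(diff(prepared$date) == 1)
is_continuous
#> [1] TRUE

# Dropping a row introduces a gap, so the same check fails
gappy <- prepared[-3, ]
has_no_gap <- all(diff(gappy$date) == 1)
has_no_gap
#> [1] FALSE
```

A check like this may be useful on data assembled by other means before passing them to estimation functions.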

Details

The method for <incidence2> data replaces NAs in the case and death data with 0s when fill_NA = TRUE, which is the default.

Keeping NAs (fill_NA = FALSE) will cause downstream issues when the data are passed to functions such as cfr_static(), which cannot handle NAs. Leaving fill_NA at its default of TRUE avoids this issue.
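To illustrate why NAs are a problem downstream, here is a base R sketch of the effect of replacing NAs with 0s. It is not the internal implementation of prepare_data(), and the ratio of total deaths to total cases is only a naive stand-in, not the estimator used by cfr_static(). The data frame `counts` and its values are invented.

```r
# Toy counts with missing values; all values are made up
counts <- data.frame(
  date = as.Date("2021-01-01") + 0:3,
  cases = c(5, NA, 8, 3),
  deaths = c(NA, 1, 0, NA)
)

# A single NA propagates through sum(), so the naive ratio is NA
ratio_with_na <- sum(counts$deaths) / sum(counts$cases)
ratio_with_na
#> [1] NA

# Replace NAs with 0s in both count columns (the effect of fill_NA = TRUE)
filled <- counts
for (col in c("cases", "deaths")) {
  filled[[col]][is.na(filled[[col]])] <- 0
}

ratio_filled <- sum(filled$deaths) / sum(filled$cases)
ratio_filled
#> [1] 0.0625
```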

Passing a grouped <incidence2> object to data preserves the grouping: the grouping variables are returned as separate columns in the result.
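Because the grouping variables come back as ordinary columns, grouped output can be processed one group at a time. The base R sketch below assumes a single grouping column named `region`; the data frame `grouped`, its values, and the naive deaths-to-cases ratio are illustrative assumptions, not cfr functionality.

```r
# Toy grouped output with a "region" grouping column; all values are made up
grouped <- data.frame(
  date = rep(as.Date("2021-01-01") + 0:2, times = 2),
  region = rep(c("North", "South"), each = 3),
  cases = c(10, 20, 30, 5, 5, 10),
  deaths = c(1, 2, 3, 0, 1, 0)
)

# One data frame per region, each with "date", "cases", and "deaths" columns
by_region <- split(grouped, grouped$region)

# Naive per-region ratio of total deaths to total cases
naive_ratio <- vapply(
  by_region,
  function(df) sum(df$deaths) / sum(df$cases),
  numeric(1)
)
naive_ratio
#> North South 
#>  0.10  0.05 
```

In principle, each element of `by_region` could be handed to an estimation function in place of the naive ratio, possibly after dropping the grouping column.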

Examples

#### For <incidence2> data ####
# load Covid-19 data from incidence2
covid_uk <- incidence2::covidregionaldataUK

# convert to incidence2 object
covid_uk_incidence <- incidence2::incidence(
  covid_uk,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)
#> Warning: `cases_new` contains NA values. Consider imputing these and calling `incidence()` again.

# prepare the data, then view the tail
data <- prepare_data(
  covid_uk_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.

tail(data)
#>           date cases deaths
#> 485 2021-05-28  6205      6
#> 486 2021-05-29  5146      5
#> 487 2021-05-30  5395      8
#> 488 2021-05-31  6251      6
#> 489 2021-06-01  3346      4
#> 490 2021-06-02     0      0

#### For grouped <incidence2> data ####
# convert data to incidence2 object grouped by region
covid_uk_incidence <- incidence2::incidence(
  covid_uk,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable",
  groups = "region"
)
#> Warning: `cases_new` contains NA values. Consider imputing these and calling `incidence()` again.

# prepare the grouped data, then view the tail
data <- prepare_data(
  covid_uk_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.

tail(data)
#>            date                   region cases deaths
#> 6365 2021-06-02                 Scotland     0      0
#> 6366 2021-06-02               South East     0      0
#> 6367 2021-06-02               South West     0      0
#> 6368 2021-06-02                    Wales     0      0
#> 6369 2021-06-02            West Midlands     0      0
#> 6370 2021-06-02 Yorkshire and The Humber     0      0