Prepare common epidemiological data formats for CFR estimation
Source: R/prepare_data.R
This S3 generic has methods for classes commonly used for epidemiological data.
Currently, the only supported data format is <incidence2> from the incidence2 package; see incidence2::incidence(). Grouped <incidence2> data are supported; see Details.
Usage
prepare_data(data, ...)
# S3 method for class 'incidence2'
prepare_data(
  data,
  cases_variable = "cases",
  deaths_variable = "deaths",
  fill_NA = TRUE,
  ...
)
Arguments
- data
  A <data.frame>-like object. Currently, only <incidence2> objects are supported. These may be grouped.
- ...
  Currently unused. Passing extra arguments will throw a warning.
- cases_variable
  A string for the name of the cases variable in the "count_variable" column of data.
- deaths_variable
  A string for the name of the deaths variable in the "count_variable" column of data.
- fill_NA
  A logical indicating whether NAs in the cases and deaths data should be replaced by 0s. The default value is TRUE, with a message to make users aware of the replacement.
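The role of cases_variable and deaths_variable can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data frame `long` and its column names are hypothetical and only loosely mimic the long layout of an <incidence2> object, in which a "count_variable" column labels each count.

```r
# Hypothetical long-format counts: one row per date and count variable.
long <- data.frame(
  date = rep(as.Date("2021-01-01") + 0:2, each = 2),
  count_variable = rep(c("cases_new", "deaths_new"), times = 3),
  count = c(10, 1, 12, NA, 8, 0)
)

# Names of the entries in "count_variable" that hold cases and deaths.
cases_variable <- "cases_new"
deaths_variable <- "deaths_new"

# Select the rows for each variable and rename the count column.
cases <- long[long$count_variable == cases_variable, c("date", "count")]
deaths <- long[long$count_variable == deaths_variable, c("date", "count")]
names(cases)[2] <- "cases"
names(deaths)[2] <- "deaths"

# Join into one wide data frame with columns "date", "cases", and "deaths".
wide <- merge(cases, deaths, by = "date", all = TRUE)
wide
```

The resulting `wide` data frame has one row per date, which is the general shape described under Value.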
Value
A <data.frame> suitable for disease severity estimation functions provided in cfr, with the columns "date", "cases", and "deaths".
Additionally, for grouped <incidence2> data, columns representing the grouping variables will also be present.
The result has a continuous sequence of dates between the start and end date of data; this is required if the data is to be passed to functions such as cfr_static().
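What a continuous sequence of dates means can be sketched in base R. This is an illustration under assumptions, not the package's own code: the data frame `sparse` is hypothetical, and dates missing from it appear in the result as rows with NA counts.

```r
# Hypothetical daily data with a gap: 3 and 4 January are missing.
sparse <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-05")),
  cases = c(10, 12, 8),
  deaths = c(1, 0, 2)
)

# Every day between the first and last observed date.
all_dates <- data.frame(
  date = seq(min(sparse$date), max(sparse$date), by = "day")
)

# A left join onto the full sequence keeps all dates; gaps become NA.
complete <- merge(all_dates, sparse, by = "date", all.x = TRUE)
complete
```

In this sketch the gaps surface as NA counts, which is where an option such as fill_NA becomes relevant.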
Details
The method for <incidence2> data can replace NAs in the case and death data with 0s using the fill_NA argument, which is TRUE by default, meaning that NAs are replaced.
Keeping NAs will cause downstream issues when calling functions such as cfr_static() on the data, as they cannot handle NAs.
Setting fill_NA = TRUE resolves this issue.
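The effect of fill_NA = TRUE can be approximated by the following base R sketch. It assumes a hypothetical data frame `df` with "cases" and "deaths" columns and is not taken from the package source; the message text is abbreviated from the one shown in the Examples.

```r
# Hypothetical prepared data with missing counts.
df <- data.frame(
  date = as.Date("2021-01-01") + 0:2,
  cases = c(10, NA, 8),
  deaths = c(1, 0, NA)
)

fill_NA <- TRUE

# Replace NAs with 0s and tell the user that this has happened.
if (fill_NA && (anyNA(df$cases) || anyNA(df$deaths))) {
  message("NAs in cases and deaths are being replaced with 0s")
  df$cases[is.na(df$cases)] <- 0
  df$deaths[is.na(df$deaths)] <- 0
}
df
```

Note that replacing NAs with 0s treats missing reports as zero counts, which is a modelling choice and may not suit every dataset.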
Passing a grouped <incidence2> object to data will result in the function respecting the grouping and returning grouping variables in separate columns.
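Because the grouping variables are returned as ordinary columns, the output can be divided into one data frame per group with base R. The sketch below uses a hypothetical data frame `grouped` with a "region" column; it shows one possible way to work with such output and is not a workflow prescribed by the package.

```r
# Hypothetical grouped output with a "region" grouping column.
grouped <- data.frame(
  date = rep(as.Date("2021-06-01") + 0:1, times = 2),
  region = rep(c("Scotland", "Wales"), each = 2),
  cases = c(5, 7, 3, 4),
  deaths = c(0, 1, 0, 0)
)

# One data frame per region, each with columns "date", "cases", "deaths".
by_region <- split(grouped[, c("date", "cases", "deaths")], grouped$region)
names(by_region)
```

Each element of `by_region` then has the ungrouped column layout described under Value.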
Examples
#### For <incidence2> data ####
# load Covid-19 data from incidence2
covid_uk <- incidence2::covidregionaldataUK
# convert to incidence2 object
covid_uk_incidence <- incidence2::incidence(
  covid_uk,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable"
)
#> Warning: `cases_new` contains NA values. Consider imputing these and calling `incidence()` again.
# prepare the data and view the tail of the result
data <- prepare_data(
  covid_uk_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
tail(data)
#>           date cases deaths
#> 485 2021-05-28  6205      6
#> 486 2021-05-29  5146      5
#> 487 2021-05-30  5395      8
#> 488 2021-05-31  6251      6
#> 489 2021-06-01  3346      4
#> 490 2021-06-02     0      0
#### For grouped <incidence2> data ####
# convert data to incidence2 object grouped by region
covid_uk_incidence <- incidence2::incidence(
  covid_uk,
  date_index = "date",
  counts = c("cases_new", "deaths_new"),
  count_names_to = "count_variable",
  groups = "region"
)
#> Warning: `cases_new` contains NA values. Consider imputing these and calling `incidence()` again.
# prepare the grouped data and view the tail of the result
data <- prepare_data(
  covid_uk_incidence,
  cases_variable = "cases_new",
  deaths_variable = "deaths_new"
)
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
tail(data)
#>            date                   region cases deaths
#> 6365 2021-06-02                 Scotland     0      0
#> 6366 2021-06-02               South East     0      0
#> 6367 2021-06-02               South West     0      0
#> 6368 2021-06-02                    Wales     0      0
#> 6369 2021-06-02            West Midlands     0      0
#> 6370 2021-06-02 Yorkshire and The Humber     0      0