Base Tools for Storing and Handling Case Line Lists
Source: R/linelist-package.R
The linelist package provides tools to help store and handle case line list
data. The linelist class adds a tagging system to classical data.frame or
tibble objects, which makes it possible to identify key epidemiological data
such as dates of symptom onset, epi case definition, age, gender or disease
outcome. Once tagged, these variables can be seamlessly used in downstream
analyses, making data pipelines more robust and reliable.
Main functions
make_linelist(): to create linelist objects from a data.frame or a tibble, with the possibility to tag key epi variables
set_tags(): to change or add tagged variables in a linelist
tags(): to get the list of tags of a linelist
tags_df(): to get a data.frame of all tagged variables
lost_tags_action(): to change the behaviour of actions where tagged variables are lost (e.g. removing columns storing tagged variables) to issue warnings, errors, or do nothing
get_lost_tags_action(): to check the current behaviour of actions where tagged variables are lost
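As a rough illustration of how these functions fit together, the sketch below builds a linelist from a small made-up data.frame. The data and its column names (case_id, onset, sex, status) are hypothetical and chosen only for this example; the block is skipped if the linelist package is not installed.

```r
if (requireNamespace("linelist", quietly = TRUE)) {
  library(linelist)

  # Hypothetical toy data; column names are made up for illustration
  cases <- data.frame(
    case_id = c("a1", "a2", "a3"),
    onset = as.Date(c("2024-01-02", "2024-01-05", "2024-01-09")),
    sex = c("f", "m", "f"),
    status = c("recovered", "died", "recovered")
  )

  # Tag key variables while creating the linelist
  x <- make_linelist(cases, id = "case_id", date_onset = "onset")

  # Add further tags to the existing linelist
  x <- set_tags(x, gender = "sex", outcome = "status")

  # List of tags and the columns they point to
  print(tags(x))

  # Plain data.frame holding only the tagged variables
  print(tags_df(x))

  # Check the current lost-tags behaviour, change it, then restore the default
  print(get_lost_tags_action())
  lost_tags_action("error")
  lost_tags_action("warning")
}
```

Tagging at creation time and tagging later with set_tags() should give the same result; the split here only shows both entry points.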
Dedicated methods
Specific methods commonly used to handle data.frame are provided for linelist
objects, typically to help flag or prevent actions which could alter or lose
tagged variables (and may thus break downstream data pipelines).
names() <- (and related functions, such as dplyr::rename()): will rename tags as needed
x[...] <- and x[[...]] <- (see sub_linelist): will adopt the desired behaviour when tagged variables are lost
print(): prints info about the linelist in addition to the data.frame or tibble
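A minimal sketch of two of these behaviours, assuming a hypothetical toy data.frame: a tag following a renamed column, and an error being raised when a tagged column is dropped by subsetting (the same kind of operation used in the Examples) while the lost-tags behaviour is set to "error". The error is caught with tryCatch() so that the script keeps running.

```r
if (requireNamespace("linelist", quietly = TRUE)) {
  library(linelist)

  # Hypothetical toy data; column names are made up for illustration
  cases <- data.frame(
    case_id = c("a1", "a2"),
    onset = as.Date(c("2024-01-02", "2024-01-05")),
    sex = c("f", "m")
  )
  x <- make_linelist(cases, id = "case_id", date_onset = "onset")

  # Rename the tagged column; the id tag should now point to the new name
  names(x)[1] <- "identifier"
  print(tags(x)$id)

  # With the "error" setting, dropping the id column stops with an error,
  # which is caught here and replaced by a short marker string
  lost_tags_action("error")
  res <- tryCatch(x[, 2:3], error = function(e) "tagged variable lost")
  print(res)

  # Restore the default behaviour
  lost_tags_action("warning")
}
```

Switching to "error" can be useful in automated pipelines, where a warning is easy to miss.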
Author
Maintainer: Hugo Gruson hugo@data.org (ORCID)
Authors:
Thibaut Jombart [conceptor]
Other contributors:
Tim Taylor [contributor]
Chris Hartgerink (ORCID) [reviewer]
Examples
if (require(outbreaks)) {
  # using base R style

  ## dataset we'll create a linelist from, only using the first 50 entries
  measles_hagelloch_1861[1:50, ]

  ## create linelist
  x <- make_linelist(measles_hagelloch_1861[1:50, ],
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  x

  ## check tagged variables
  tags(x)

  ## robust renaming
  names(x)[1] <- "identifier"
  x

  ## example of dropping tags by mistake - default: warning
  x[, 2:5]

  ## to silence warnings when tags are dropped
  lost_tags_action("none")
  x[, 2:5]

  ## to trigger errors when tags are dropped
  # lost_tags_action("error")
  # x[, 2:5]

  ## reset default behaviour
  lost_tags_action()

  # using tidyverse style

  ## example of creating a linelist, adding a new variable, and adding a tag
  ## for it
  if (require(dplyr) && require(magrittr)) {
    x <- measles_hagelloch_1861 %>%
      tibble() %>%
      make_linelist(
        id = "case_ID",
        date_onset = "date_of_prodrome",
        age = "age",
        gender = "gender"
      ) %>%
      mutate(result = if_else(is.na(date_of_death), "survived", "died")) %>%
      set_tags(outcome = "result") %>%
      rename(identifier = case_ID)

    head(x)

    ## extract tagged variables
    x %>%
      select(has_tag(c("gender", "age")))

    x %>%
      tags()

    x %>%
      select(starts_with("date"))
  }
}
#> Warning: The following tags have lost their variable:
#> id:identifier, gender:gender, age:age
#> Lost tags will now be ignored.
#> Lost tags will now issue a warning.
#> Loading required package: magrittr
#> Warning: The following tags have lost their variable:
#> id:identifier, date_onset:date_of_prodrome, outcome:result
#> Warning: The following tags have lost their variable:
#> id:identifier, gender:gender, age:age, outcome:result
#>
#> // linelist object
#> # A tibble: 188 × 3
#>    date_of_prodrome date_of_rash date_of_death
#>    <date>           <date>       <date>
#>  1 1861-11-21       1861-11-25   NA
#>  2 1861-11-23       1861-11-27   NA
#>  3 1861-11-28       1861-12-02   NA
#>  4 1861-11-27       1861-11-28   NA
#>  5 1861-11-22       1861-11-27   NA
#>  6 1861-11-26       1861-11-29   NA
#>  7 1861-11-24       1861-11-28   NA
#>  8 1861-11-21       1861-11-26   NA
#>  9 1861-11-26       1861-11-30   NA
#> 10 1861-11-21       1861-11-25   NA
#> # ℹ 178 more rows
#>
#> // tags: date_onset:date_of_prodrome