Skip to contents

The linelist package provides tools to help storing and handling case line list data. The linelist class adds a tagging system to classical data.frame or tibble objects which permits to identify key epidemiological data such as dates of symptom onset, epi case definition, age, gender or disease outcome. Once tagged, these variables can be seamlessly used in downstream analyses, making data pipelines more robust and reliable.

Main functions

  • make_linelist(): to create linelist objects from a data.frame or a tibble, with the possibility to tag key epi variables

  • set_tags(): to change or add tagged variables in a linelist

  • tags(): to get the list of tags of a linelist

  • tags_df(): to get a data.frame of all tagged variables

  • lost_tags_action(): to change the behaviour of actions where tagged variables are lost (e.g. removing columns storing tagged variables) to issue warnings, errors, or do nothing

  • get_lost_tags_action(): to check the current behaviour of actions where tagged variables are lost

Dedicated methods

Specific methods commonly used to handle data.frame are provided for linelist objects, typically to help flag or prevent actions which could alter or lose tagged variables (and may thus break downstream data pipelines).

  • names() <- (and related functions, such as dplyr::rename()) will rename tags as needed

  • x[...] <- and x[[...]] <- (see sub_linelist): will adopt the desired behaviour when tagged variables are lost

  • print(): prints info about the linelist in addition to the data.frame or tibble

Author

Maintainer: Hugo Gruson hugo@data.org (ORCID)

Authors:

  • Thibaut Jombart [conceptor]

Other contributors:

  • Tim Taylor [contributor]

  • Chris Hartgerink (ORCID) [reviewer]

Examples


if (require(outbreaks)) {
  # using base R style

  ## dataset we'll create a linelist from, only using the first 50 entries
  measles_hagelloch_1861[1:50, ]

  ## create linelist
  x <- make_linelist(measles_hagelloch_1861[1:50, ],
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  x

  ## check tagged variables
  tags(x)

  ## robust renaming
  names(x)[1] <- "identifier"
  x

  ## example of dropping tags by mistake - default: warning
  x[, 2:5]

  ## to silence warnings when taggs are dropped
  lost_tags_action("none")
  x[, 2:5]

  ## to trigger errors when taggs are dropped
  # lost_tags_action("error")
  # x[, 2:5]

  ## reset default behaviour
  lost_tags_action()


  # using tidyverse style

  ## example of creating a linelist, adding a new variable, and adding a tag
  ## for it

  if (require(dplyr) && require(magrittr)) {
    x <- measles_hagelloch_1861 %>%
      tibble() %>%
      make_linelist(
        id = "case_ID",
        date_onset = "date_of_prodrome",
        age = "age",
        gender = "gender"
      ) %>%
      mutate(result = if_else(is.na(date_of_death), "survived", "died")) %>%
      set_tags(outcome = "result") %>%
      rename(identifier = case_ID)

    head(x)

    ## extract tagged variables
    x %>%
      select(has_tag(c("gender", "age")))

    x %>%
      tags()

    x %>%
      select(starts_with("date"))
  }
}
#> Warning: The following tags have lost their variable:
#>  id:identifier, gender:gender, age:age
#> Lost tags will now be ignored.
#> Lost tags will now issue a warning.
#> Loading required package: magrittr
#> Warning: The following tags have lost their variable:
#>  id:identifier, date_onset:date_of_prodrome, outcome:result
#> Warning: The following tags have lost their variable:
#>  id:identifier, gender:gender, age:age, outcome:result
#> 
#> // linelist object
#> # A tibble: 188 × 3
#>    date_of_prodrome date_of_rash date_of_death
#>    <date>           <date>       <date>       
#>  1 1861-11-21       1861-11-25   NA           
#>  2 1861-11-23       1861-11-27   NA           
#>  3 1861-11-28       1861-12-02   NA           
#>  4 1861-11-27       1861-11-28   NA           
#>  5 1861-11-22       1861-11-27   NA           
#>  6 1861-11-26       1861-11-29   NA           
#>  7 1861-11-24       1861-11-28   NA           
#>  8 1861-11-21       1861-11-26   NA           
#>  9 1861-11-26       1861-11-30   NA           
#> 10 1861-11-21       1861-11-25   NA           
#> # ℹ 178 more rows
#> 
#> // tags: date_onset:date_of_prodrome