
After check_subject_ids() has flagged incorrect subject IDs, use this function to supply the correct IDs and substitute them in the data.

Usage

correct_subject_ids(data, target_columns, correction_table)

Arguments

data

The input <data.frame> or <linelist>.

target_columns

A <vector> of column names that contain the subject IDs.

correction_table

A <data.frame> with the following two columns:

from

a column with the wrong subject IDs

to

a column with the values that replace the incorrect IDs.
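To make the role of the from and to columns concrete, here is a minimal base R sketch of a lookup-and-replace driven by such a table. It is an illustration only, not cleanepi's implementation: the toy data frame, its study_id and age columns, and the replacement values are all made up for this example.

```r
# Toy data (hypothetical); the column name `study_id` mirrors the example
# on this page.
toy <- data.frame(
  study_id = c("PS001P2", "P0005P2", "PS004P2-1"),
  age = c(31, 45, 27),
  stringsAsFactors = FALSE
)

# Correction table with the two required columns.
correction_table <- data.frame(
  from = c("P0005P2", "PS004P2-1"),
  to = c("PS005P2", "PS004P2"),
  stringsAsFactors = FALSE
)

# Position of each ID in `from`; NA means the ID needs no correction.
idx <- match(toy$study_id, correction_table$from)

# Replace only the matched IDs with their `to` counterpart.
matched <- !is.na(idx)
toy$study_id[matched] <- correction_table$to[idx[matched]]

toy$study_id
```

Each row of the table maps exactly one wrong ID to one replacement, and IDs that do not appear in from are left untouched.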

Value

The input dataset, with the subject IDs listed in from replaced by the corresponding values in to, so that all subject IDs comply with the expected format.

Examples

data <- readRDS(
  system.file("extdata", "test_df.RDS", package = "cleanepi")
)
# detect the incorrect subject IDs, i.e. IDs that fail one or more of the
# following criteria:
# - starts with 'PS',
# - ends with 'P2',
# - has a number between 1 and 100,
# - contains 7 characters.
dat <- check_subject_ids(
  data = data,
  target_columns = "study_id",
  prefix = "PS",
  suffix = "P2",
  range = c(1, 100),
  nchar = 7
)
#> ! Detected no missing, no duplicated, and 3 incorrect subject IDs.
#>  Enter `print_report(data = dat, "incorrect_subject_id")` to access them,
#>   where "dat" is the object used to store the output from this operation.
#>  You can use the `correct_subject_ids()` function to correct them.

# display rows with invalid subject ids
print_report(dat, "incorrect_subject_id")
#> $invalid_subject_ids
#>   idx       ids
#> 1   3 PS004P2-1
#> 2   5   P0005P2
#> 3   7   PB500P2
#> 

# generate the correction table
correction_table <- data.frame(
  from = c("P0005P2", "PB500P2", "PS004P2-1"),
  to = c("PB005P2", "PB050P2", "PS004P2")
)

# perform the correction
dat <- correct_subject_ids(
  data = dat,
  target_columns = "study_id",
  correction_table = correction_table
)
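Before passing a correction table to correct_subject_ids(), it can help to sanity-check it. The sketch below uses a hypothetical helper, validate_correction_table(), which is not part of cleanepi. It assumes two checks are worth making: that no wrong ID is listed twice in from, and that every from value actually occurs in the data. The IDs used are toy values.

```r
# Hypothetical helper (not part of cleanepi): returns a character vector of
# problems found in a correction table; an empty vector means none found.
validate_correction_table <- function(ids, correction_table) {
  if (!all(c("from", "to") %in% names(correction_table))) {
    return("correction table must have `from` and `to` columns")
  }
  problems <- character(0)
  # The same wrong ID listed twice makes the replacement ambiguous.
  if (anyDuplicated(correction_table$from) > 0) {
    problems <- c(problems, "duplicated values in `from`")
  }
  # A `from` value that is absent from the data is probably a typo.
  missing_ids <- setdiff(correction_table$from, ids)
  if (length(missing_ids) > 0) {
    problems <- c(
      problems,
      paste("IDs not found in data:", paste(missing_ids, collapse = ", "))
    )
  }
  problems
}

# Toy IDs and tables, for illustration only.
ids <- c("PS001P2", "P0005P2", "PS004P2-1")
good <- data.frame(
  from = c("P0005P2", "PS004P2-1"),
  to = c("PS005P2", "PS004P2")
)
bad <- data.frame(
  from = c("P0005P2", "P0005P2", "XX999P2"),
  to = c("PS005P2", "PS006P2", "PS999P2")
)

validate_correction_table(ids, good)
validate_correction_table(ids, bad)
```

With real data, ids would be the subject ID column of the dataset, for example data$study_id in the example above.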