
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function issues a warning, and the user can then call the correct_subject_ids() function to correct them.

Usage

check_subject_ids(
  data,
  target_columns,
  prefix = NULL,
  suffix = NULL,
  range = NULL,
  nchar = NULL
)

Arguments

data

The input <data.frame> or <linelist>

target_columns

A <vector> of column names with the subject ids.

prefix

A <character> with the expected prefix used in the subject IDs

suffix

A <character> with the expected suffix used in the subject IDs

range

A <vector> with the expected range of numbers in the subject IDs

nchar

An <integer> that represents the expected number of characters in the subject ids.
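
To make the four format arguments concrete, here is a minimal base R sketch of one way such rules could be evaluated. It is not the cleanepi implementation: `is_valid_id()` is a hypothetical helper, and treating the first run of digits in an ID as its "number" is an assumption made for illustration.

```r
# Hypothetical helper, NOT part of cleanepi: returns TRUE for IDs that
# satisfy every rule that was supplied (rules left as NULL are skipped).
is_valid_id <- function(ids, prefix = NULL, suffix = NULL,
                        range = NULL, nchar = NULL) {
  ok <- !is.na(ids)                                   # missing IDs are never valid
  ids_chr <- ifelse(is.na(ids), "", as.character(ids))
  if (!is.null(prefix)) ok <- ok & startsWith(ids_chr, prefix)
  if (!is.null(suffix)) ok <- ok & endsWith(ids_chr, suffix)
  if (!is.null(nchar)) ok <- ok & base::nchar(ids_chr) == nchar
  if (!is.null(range)) {
    # Assumption: the numeric part is the first run of digits in the ID.
    num <- suppressWarnings(
      as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", ids_chr))
    )
    ok <- ok & !is.na(num) & num >= min(range) & num <= max(range)
  }
  ok
}

ids <- c("PS001P2", "PS004P2-1", "P0005P2", "PB500P2", NA, "PS100P2")
valid <- is_valid_id(ids, prefix = "PS", suffix = "P2",
                     range = c(1, 100), nchar = 7)
# only the first and last IDs pass all four rules
```

Each rule is applied independently and combined with a logical AND, so an ID is flagged as soon as it breaks any single rule.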

Value

The input dataset. A warning is issued if missing, duplicated, or incorrect subject IDs are found.

Examples

data <- readRDS(
  system.file("extdata", "test_df.RDS", package = "cleanepi")
)

# make first and last subject IDs the same
data$study_id[10] <- data$study_id[1]

# set subject ID number 9 to NA
data$study_id[9] <- NA

# detect the incorrect subject IDs, i.e. IDs that fail one or more of the
# following criteria:
# - starts with 'PS',
# - ends with 'P2',
# - has a number between 1 and 100,
# - contains 7 characters.
dat <- check_subject_ids(
  data = data,
  target_columns = "study_id",
  prefix = "PS",
  suffix = "P2",
  range = c(1, 100),
  nchar = 7
)
#> ! Detected 1 missing, 2 duplicated, and 2 incorrect subject IDs.
#>  Enter `print_report(data = dat, "incorrect_subject_id")` to access them,
#>   where "dat" is the object used to store the output from this operation.
#>  You can use the `correct_subject_ids()` function to correct them.

# display rows with invalid subject ids
print_report(dat, "incorrect_subject_id")
#> $idx_missing_ids
#> [1] "9"
#> 
#> $duplicated_ids
#> # A tibble: 2 × 3
#> # Groups:   study_id [1]
#>   row_id group_id study_id
#>    <int>    <int> <chr>   
#> 1      1        1 PS001P2 
#> 2     10        1 PS001P2 
#> 
#> $invalid_subject_ids
#>   idx       ids
#> 1   3 PS004P2-1
#> 2   5   P0005P2
#> 3   7   PB500P2
#>
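
The missing and duplicated entries in the report above can be approximated in base R. The sketch below uses a toy vector rather than the package data. It only illustrates the idea and is not how cleanepi builds its report.

```r
# Toy vector standing in for data$study_id (values are illustrative only)
ids <- c("PS001P2", "PS002P2", NA, "PS001P2")

# positions of missing IDs
missing_idx <- which(is.na(ids))

# positions of every member of a duplicated group, not only the later copies
dup_idx <- which(
  !is.na(ids) & (duplicated(ids) | duplicated(ids, fromLast = TRUE))
)
```

Scanning with `duplicated()` in both directions keeps the first occurrence of a repeated ID alongside its later copies.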