Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function issues a warning, and the user can call
correct_subject_ids() to correct them.
Source: R/standardize_subject_ids.R
check_subject_ids.Rd
Usage
check_subject_ids(
data,
target_columns,
prefix = NULL,
suffix = NULL,
range = NULL,
nchar = NULL
)
Arguments
- data
The input <data.frame> or <linelist>.
- target_columns
A <vector> of column names with the subject IDs.
- prefix
A <character> with the expected prefix used in the subject IDs.
- suffix
A <character> with the expected suffix used in the subject IDs.
- range
A <vector> with the expected range of numbers in the subject IDs.
- nchar
An <integer> that represents the expected number of characters in the subject IDs.
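To make the four format arguments concrete, the following is a minimal base R sketch of how such rules could be combined into a single validity check. It is not the cleanepi implementation: `flag_invalid_ids()` and its `n_chars` parameter are hypothetical names introduced here for illustration, and the sketch assumes the numeric part of an ID sits between the prefix and the suffix.

```r
# Hypothetical helper for illustration only; it is not part of cleanepi.
flag_invalid_ids <- function(ids, prefix, suffix, range, n_chars) {
  ids <- as.character(ids)
  # Assumption: the numeric part is whatever sits between prefix and suffix.
  core <- substr(ids, nchar(prefix) + 1L, nchar(ids) - nchar(suffix))
  number <- suppressWarnings(as.integer(core))
  valid <- startsWith(ids, prefix) &
    endsWith(ids, suffix) &
    grepl("^[0-9]+$", core) &
    !is.na(number) & number >= range[1] & number <= range[2] &
    nchar(ids) == n_chars
  # Missing IDs are a separate category, so they are not flagged here.
  !is.na(ids) & !valid
}

# Toy IDs (not the package's test data).
ids <- c("PS001P2", "PS004P2-1", "P0005P2", "PB500P2", NA, "PS100P2")
invalid <- flag_invalid_ids(ids, prefix = "PS", suffix = "P2",
                            range = c(1, 100), n_chars = 7)
ids[invalid]
```

Under these assumptions, the second, third, and fourth toy IDs are flagged, while the missing value is left for a separate check.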
Examples
data <- readRDS(
system.file("extdata", "test_df.RDS", package = "cleanepi")
)
# make first and last subject IDs the same
data$study_id[10] <- data$study_id[1]
# set subject ID number 9 to NA
data$study_id[9] <- NA
# detect the incorrect subject IDs, i.e. IDs that fail one or more of the
# following criteria:
# - starts with 'PS',
# - ends with 'P2',
# - has a number between 1 and 100,
# - contains 7 characters.
dat <- check_subject_ids(
data = data,
target_columns = "study_id",
prefix = "PS",
suffix = "P2",
range = c(1, 100),
nchar = 7
)
#> ! Detected 1 missing, 2 duplicated, and 2 incorrect subject IDs.
#> ℹ Enter `print_report(data = dat, "incorrect_subject_id")` to access them,
#> where "dat" is the object used to store the output from this operation.
#> ℹ You can use the `correct_subject_ids()` function to correct them.
# display rows with invalid subject ids
print_report(dat, "incorrect_subject_id")
#> $idx_missing_ids
#> [1] "9"
#>
#> $duplicated_ids
#> # A tibble: 2 × 3
#> # Groups: study_id [1]
#> row_id group_id study_id
#> <int> <int> <chr>
#> 1 1 1 PS001P2
#> 2 10 1 PS001P2
#>
#> $invalid_subject_ids
#> idx ids
#> 1 3 PS004P2-1
#> 2 5 P0005P2
#> 3 7 PB500P2
#>
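The missing and duplicated entries in a report like the one above can be approximated in base R. This is only a sketch on a toy vector, not the way cleanepi builds its report; the object names `missing_idx` and `duplicated_rows` are chosen here for illustration.

```r
# Toy IDs (not the package's test data): one missing, one repeated.
ids <- c("PS001P2", "PS002P2", NA, "PS001P2")

# Row indices of missing subject IDs.
missing_idx <- which(is.na(ids))

# Flag every member of a duplicated group, not only the later repeats,
# and ignore missing values so that several NAs are not treated as duplicates.
dup <- !is.na(ids) &
  (duplicated(ids) | duplicated(ids, fromLast = TRUE))

duplicated_rows <- data.frame(
  row_id = which(dup),
  study_id = ids[dup]
)
```

Scanning with `duplicated()` in both directions is what keeps the first occurrence in the result, which matches how the report lists both rows of a duplicated pair.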