Perform dictionary-based cleaning
Arguments
- data
The input data frame or linelist
- dictionary
A data frame with the dictionary associated with the input data. This is expected to be compatible with the matchmaker package and must contain the following four columns:
options
This column contains the current values used to represent the different groups in the input data frame (required).
values
The values that will be used to replace the current options (required).
grp
The name of the columns where every option belongs to (required).
orders
This defines the user-defined order of different options (optional).
Value
A data frame where the target options have been replaced with their corresponding values in the columns specified in the data dictionary.
Examples
data <- readRDS(
system.file("extdata", "messy_data.RDS", package = "cleanepi")
)
dictionary <- readRDS(
system.file("extdata", "test_dict.RDS", package = "cleanepi")
)
# adding an option that is not defined in the dictionary to the 'gender'
# column
data$gender[2] <- "homme"
cleaned_df <- clean_using_dictionary(
data = data,
dictionary = dictionary
)
#> ! Cannot replace "homme" present in column gender but not defined in the dictionary.
#> ℹ You can either:
#> • correct the misspelled option from the input data, or
#> • add it to the dictionary using the `add_to_dictionary()` function.