Remove constant data, i.e. empty rows, empty columns, and constant columns
Source: R/remove_constants.R
The function iteratively removes constant data until none is found anymore. It stores the details about the removed constant data as a data frame within the report object.
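The iterative behaviour described above can be illustrated with a minimal base R sketch. This is not the cleanepi implementation: `drop_constants_sketch()` and `collapse_or_na()` are hypothetical helpers, and the sketch assumes that `cutoff` is compared against the proportion of missing values in a row or column and that a constant column is one with exactly one distinct non-missing value.

```r
# Hypothetical helper: collapse a vector into one string, or NA if it is empty.
collapse_or_na <- function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = ", ")
}

# Hypothetical sketch (not the cleanepi code): repeat the three removal steps
# until an iteration removes nothing, logging what each iteration dropped.
drop_constants_sketch <- function(df, cutoff = 1) {
  steps <- list()
  repeat {
    # 1. rows whose share of missing values reaches the cutoff
    #    (indices refer to positions in the data at the start of the iteration)
    empty_rows <- which(rowMeans(is.na(df)) >= cutoff)
    if (length(empty_rows) > 0) df <- df[-empty_rows, , drop = FALSE]

    # 2. columns whose share of missing values reaches the cutoff
    empty_cols <- names(df)[which(colMeans(is.na(df)) >= cutoff)]
    df <- df[, setdiff(names(df), empty_cols), drop = FALSE]

    # 3. columns holding exactly one distinct non-missing value
    is_constant <- vapply(
      df,
      function(x) length(unique(x[!is.na(x)])) == 1L,
      logical(1)
    )
    constant_cols <- names(df)[is_constant]
    df <- df[, !is_constant, drop = FALSE]

    # stop once an iteration finds nothing left to remove
    removed <- length(empty_rows) + length(empty_cols) + length(constant_cols)
    if (removed == 0) break

    steps[[length(steps) + 1]] <- data.frame(
      iteration = length(steps) + 1,
      empty_columns = collapse_or_na(empty_cols),
      empty_rows = collapse_or_na(empty_rows),
      constant_columns = collapse_or_na(constant_cols)
    )
  }
  list(data = df, report = do.call(rbind, steps))
}

# Toy data: row 3 only becomes empty once the constant column `site` is gone,
# so a second iteration is needed to remove it.
toy <- data.frame(
  id = c("a", "b", NA, "d"),
  site = c("x", "x", "x", "x"),
  score = c(1, 2, NA, 4),
  blank = NA
)
result <- drop_constants_sketch(toy, cutoff = 1)
result$report
```

In this toy case the first pass drops `blank` and `site`, which leaves row 3 entirely missing, so a second pass is required. That is the reason a single pass is not enough in general.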
Examples
data <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))
# introduce an empty column
data$empty_column <- NA
# introduce some missing values across some columns
data$study_id[3] <- NA_character_
data$date.of.admission[3] <- NA_character_
data$date.of.admission[4] <- NA_character_
data$dateOfBirth[3] <- NA_character_
data$dateOfBirth[4] <- NA_character_
data$dateOfBirth[5] <- NA_character_
# with cutoff = 1, rows 3, 4, and 5 are not removed
test <- cleanepi::remove_constants(
  data = data,
  cutoff = 1
)
# drop rows or columns with a percentage of constant values
# equal to or more than 50%
test <- cleanepi::remove_constants(
  data = test,
  cutoff = 0.5
)
# drop rows or columns with a percentage of constant values
# equal to or more than 25%
test <- cleanepi::remove_constants(
  data = test,
  cutoff = 0.25
)
# drop rows or columns with a percentage of constant values
# equal to or more than 15%
test <- cleanepi::remove_constants(
  data = test,
  cutoff = 0.15
)
# check the report to see what has happened
report <- attr(test, "report")
report$constant_data
#>   iteration empty_columns empty_rows constant_columns
#> 1         1            NA          3               NA