
The function iteratively removes constant data (empty rows, empty columns, and constant columns) until none remain. It records details of the removed constant data as a data frame within the report object.
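Removing one kind of constant data can expose another, so a single pass may not be enough. The base R sketch below illustrates this idea under stated assumptions. The helpers `drop_constants_once()` and `drop_constants_iteratively()` are hypothetical and not part of cleanepi. A row or column counts as empty only when every value is `NA`, and a column counts as constant when it holds a single distinct value (with `NA` counted as a value). This is not the cleanepi implementation.

```r
# Hypothetical helper (not part of cleanepi): one pass that drops fully
# empty columns, then fully empty rows, then single-valued columns.
drop_constants_once <- function(df) {
  # Columns in which every value is NA
  empty_cols <- vapply(df, function(col) all(is.na(col)), logical(1))
  df <- df[, !empty_cols, drop = FALSE]
  # Rows in which every remaining value is NA
  if (ncol(df) > 0) {
    empty_rows <- rowSums(!is.na(df)) == 0
    df <- df[!empty_rows, , drop = FALSE]
  }
  # Columns holding a single distinct value (NA counts as a value here)
  constant_cols <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
  df[, !constant_cols, drop = FALSE]
}

# Hypothetical driver: repeat passes until a pass removes nothing,
# counting how many passes actually changed the data.
drop_constants_iteratively <- function(df) {
  iterations <- 0
  repeat {
    cleaned <- drop_constants_once(df)
    # Passes only ever remove rows or columns, so equal dimensions mean no change
    if (identical(dim(cleaned), dim(df))) break
    df <- cleaned
    iterations <- iterations + 1
  }
  list(data = df, iterations = iterations)
}

df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("k", "k", "k")
)
res <- drop_constants_iteratively(df)
res$data        # only column a remains, holding input rows 1 and 3
res$iterations  # 2
```

In this toy data, row 2 becomes empty only after the constant column `c` has been dropped in the first pass, which is why a second pass is needed.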

Usage

remove_constants(data, cutoff = 1)

Arguments

data

The input <data.frame> or <linelist>

cutoff

A <numeric> value specifying the cut-off for removing constant data. Valid values lie in the interval (0, 1]: 0 is excluded and 1 is included. The default, 1, removes only rows and columns that are 100% constant.
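Assume that a row or column counts as constant once its share of missing (`NA`) values reaches `cutoff`. Under that assumption, the rule can be sketched in base R as below. `flag_by_cutoff()` is a hypothetical helper written for illustration. It only flags rows and columns and is not the cleanepi implementation.

```r
# Hypothetical helper (not the cleanepi implementation): flag rows and
# columns whose share of missing values is at least `cutoff`.
flag_by_cutoff <- function(df, cutoff = 1) {
  # Enforce the documented range: 0 excluded, 1 included
  stopifnot(is.numeric(cutoff), length(cutoff) == 1, cutoff > 0, cutoff <= 1)
  missing <- is.na(df)  # logical matrix with the same shape as df
  list(
    rows = which(rowMeans(missing) >= cutoff),
    cols = names(df)[colMeans(missing) >= cutoff]
  )
}

df <- data.frame(
  x = c(1, NA, NA, 4),
  y = c(NA, NA, NA, 8),
  z = c(1, 2, NA, 4)
)
flag_by_cutoff(df, cutoff = 1)    # rows: 3; cols: none
flag_by_cutoff(df, cutoff = 0.5)  # rows: 2 and 3; cols: "x" and "y"
```

With `cutoff = 1`, only the fully empty row 3 is flagged, which mirrors the default. Lowering `cutoff` to 0.5 also flags the partially empty row 2 and the columns `x` (50% missing) and `y` (75% missing).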

Value

The input dataset with the constant data removed according to the specified cut-off.

Examples

data <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))

# introduce an empty column
data$empty_column <- NA
# inject missing values into some columns
data$study_id[3] <- NA_character_
data$date.of.admission[3] <- NA_character_
data$date.of.admission[4] <- NA_character_
data$dateOfBirth[3] <- NA_character_
data$dateOfBirth[4] <- NA_character_
data$dateOfBirth[5] <- NA_character_

# with cutoff = 1, rows 3, 4, and 5 are not removed because they are only partially empty
cleaned_df <- remove_constants(
  data = data,
  cutoff = 1
)

# drop rows and columns in which 50% or more of the values are constant
cleaned_df <- remove_constants(
  data = cleaned_df,
  cutoff = 0.5
)

# drop rows and columns in which 25% or more of the values are constant
cleaned_df <- remove_constants(
  data = cleaned_df,
  cutoff = 0.25
)

# drop rows and columns in which 15% or more of the values are constant
cleaned_df <- remove_constants(
  data = cleaned_df,
  cutoff = 0.15
)

# inspect the report to see which constant data were removed
print_report(cleaned_df, "constant_data")
#>   iteration empty_columns empty_rows constant_columns
#> 1         1            NA          3               NA