Generate report from data cleaning operations

Usage

print_report(
  data,
  what = NULL,
  print = FALSE,
  report_title = "{cleanepi} data cleaning report",
  output_file_name = NULL,
  format = "html"
)

Arguments

data

A <data.frame> or <linelist> object returned by clean_data() or by the main function of any data cleaning module.

what

A <character> with the name of the specific data cleaning report to display. The possible values are:

incorrect_date_sequence: To display rows with incorrect date sequences
colnames: To display the column names before and after cleaning
converted_into_numeric: To display the names of the columns that have been converted into numeric
date_standardization: To display rows in the cleaned data with date values that are outside of the specified time frame, and rows with date values that comply with multiple formats
misspelled_values: To display the detected misspelled values
removed_duplicates: To display the duplicated rows that have been removed
found_duplicates: To display the duplicated rows
constant_data: To display the constant data, i.e. constant columns, empty rows, and empty columns
missing_values_replaced_at: To display the names of the columns where the missing value strings have been replaced with NA
incorrect_subject_id: To display the missing, duplicated and invalid subject IDs
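
Because what takes one of a fixed set of names, a typo is easy to make. The sketch below checks a name against the list above before it is passed on. check_what() is a hypothetical helper, not part of cleanepi, and treating NULL as "the full report" is an assumption based on the default shown in Usage.

```r
# Section names copied from the list above.
report_sections <- c(
  "incorrect_date_sequence", "colnames", "converted_into_numeric",
  "date_standardization", "misspelled_values", "removed_duplicates",
  "found_duplicates", "constant_data", "missing_values_replaced_at",
  "incorrect_subject_id"
)

# Hypothetical helper, not part of cleanepi: returns NULL unchanged (assumed to
# mean "no specific section"), returns a known section name as is, and stops
# on an unknown name.
check_what <- function(what = NULL) {
  if (is.null(what)) {
    return(NULL)
  }
  stopifnot(is.character(what), length(what) == 1L)
  if (!what %in% report_sections) {
    stop("Unknown report section: ", what, call. = FALSE)
  }
  what
}

section <- check_what("date_standardization")

# With cleanepi attached and `dat` holding the output of a cleaning step:
# print_report(data = dat, what = section)
```

The commented call mirrors the form suggested by the console hints in the Examples section.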

print

A <logical> that specifies whether to open the report in your browser as an HTML file. Default is FALSE.

report_title

A <character> with the title that will appear on the report. Default is "{cleanepi} data cleaning report".

output_file_name

A <character> used to specify the name of the report file, excluding any file extension. If no file name is supplied, one will be automatically generated with the format cleanepi_report_YYMMDD_HHMMSS.
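
As a rough illustration of the documented pattern, the sketch below builds a name of the form cleanepi_report_YYMMDD_HHMMSS with base R. default_report_name() is a hypothetical helper, not the package's own code. The path printed in the Examples section carries a differently formatted stamp, so the exact pattern may vary between versions.

```r
# Hypothetical helper, not part of cleanepi: builds a file name (without
# extension) following the documented pattern cleanepi_report_YYMMDD_HHMMSS.
default_report_name <- function(time = Sys.time()) {
  paste0("cleanepi_report_", format(time, "%y%m%d_%H%M%S"))
}

# A fixed time stamp makes the result reproducible.
example_name <- default_report_name(
  as.POSIXct("2025-07-16 12:06:13", tz = "UTC")
)
example_name
#> [1] "cleanepi_report_250716_120613"
```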

format

A <character> with the file format of the report. Currently only "html" is supported.

Value

A <character> containing the name and path of the saved report.
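
Since the return value is a file path, it can be stored and reused, for example to check that the file exists or to open it later. The sketch below uses a placeholder file in place of a real report so that it runs without cleanepi. report_path stands in for the value returned by print_report(), and the file name is made up for illustration.

```r
# Stand-in for the value returned by print_report(); the file is created here
# only so that the sketch runs without cleanepi.
report_path <- file.path(tempdir(), "cleanepi_report_example.html")
writeLines("<html><body>placeholder</body></html>", report_path)

# Inspect the path before using it.
report_exists <- file.exists(report_path)
report_ext <- tools::file_ext(report_path)

# Open the file in the default browser, but only in an interactive session.
if (report_exists && interactive()) {
  utils::browseURL(report_path)
}
```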

Examples

# \donttest{
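# attach the package so that its functions are available
library(cleanepi)

# import the example data and dictionary shipped with the package
data <- readRDS(system.file("extdata", "test_df.RDS", package = "cleanepi"))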
test_dictionary <- readRDS(
  system.file("extdata", "test_dictionary.RDS", package = "cleanepi")
)

# scan through the data
scan_res <- scan_data(data)

# Perform data cleaning
cleaned_data <- data %>%
 standardize_column_names(keep = NULL, rename = c("DOB" = "dateOfBirth")) %>%
 replace_missing_values(target_columns = NULL, na_strings = "-99") %>%
 remove_constants(cutoff = 1.0) %>%
 remove_duplicates(target_columns = NULL) %>%
 standardize_dates(
   target_columns = NULL,
   error_tolerance = 0.4,
   format = NULL,
   timeframe = as.Date(c("1973-05-29", "2023-05-29"))
 ) %>%
 check_subject_ids(
   target_columns = "study_id",
   prefix = "PS",
   suffix = "P2",
   range = c(1L, 100L),
   nchar = 7L
 ) %>%
 convert_to_numeric(target_columns = "sex", lang = "en") %>%
 clean_using_dictionary(dictionary = test_dictionary)
#> ℹ No duplicates were found.
#> ! Detected 8 values that comply with multiple formats and no values that are
#>   outside of the specified time frame.
#> ℹ Enter `print_report(data = dat, "date_standardization")` to access them,
#>   where "dat" is the object used to store the output from this operation.
#> ! Detected no missing, no duplicated, and 3 incorrect subject IDs.
#> ℹ Enter `print_report(data = dat, "incorrect_subject_id")` to access them,
#>   where "dat" is the object used to store the output from this operation.
#> ℹ You can use the `correct_subject_ids()` function to correct them.

# add the data scanning result to the report
cleaned_data <- add_to_report(
  x = cleaned_data,
  key = "scanning_result",
  value = scan_res
)

# generate the report from the previously created objects and open it in the
# browser (print = TRUE); the output below shows where the file was written
print_report(
  data = cleaned_data,
  report_title = "{cleanepi} data cleaning report",
  output_file_name = NULL,
  format = "html",
  print = TRUE
)
#> ℹ Generating html report in /tmp/Rtmplb3b3D.
#> [1] "/tmp/Rtmplb3b3D/cleanepi_report__2025-07-16Wedt_120613.html"
# }