Exported functions

cleanepi functions available to end-users

add_to_dictionary()
Add an element to the data dictionary
add_to_report()
Add an element to the report object
check_date_sequence()
Check whether the order of the sequence of date-events is valid
check_subject_ids()
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids() function to correct them.
clean_data()
Clean and standardize data
clean_using_dictionary()
Perform dictionary-based cleaning
common_na_strings
Common strings representing missing values
convert_numeric_to_date()
Convert numeric to date
convert_to_numeric()
Convert columns into numeric
correct_subject_ids()
Correct the wrong subject IDs based on the user-provided values.
find_duplicates()
Identify and return duplicated rows in a data frame or linelist.
print_report()
Generate report from data cleaning operations
remove_constants()
Remove empty rows, empty columns, and constant columns
remove_duplicates()
Remove duplicates
replace_missing_values()
Replace missing values with NA
scan_data()
Scan a data frame to determine the percentage of missing, numeric, Date, character, and logical values in every column.
standardize_column_names()
Standardize column names of a data frame or linelist
standardize_dates()
Standardize date variables
timespan()
Calculate time span between dates

Shared helpers

get_target_column_names()
Get the names of the columns from which duplicates will be found
add_to_report()
Add an element to the report object
get_sum()
Get sum of numbers from a string
numbers_only()
Detects whether a string contains only numbers or not.
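
As a rough illustration of what the last two helpers describe, here is a minimal base R sketch. The function names is_digits_only() and sum_digits_in_string() are hypothetical, and the code is not cleanepi's implementation, which may treat signs, decimals, or whitespace differently.

```r
# Hypothetical helpers for illustration only, not the cleanepi internals.

# TRUE when a string consists solely of digits.
is_digits_only <- function(x) {
  grepl("^[0-9]+$", x)
}

# Sum every run of digits found in a single string.
sum_digits_in_string <- function(x) {
  matches <- regmatches(x, gregexpr("[0-9]+", x))[[1]]
  sum(as.numeric(matches))
}

digits_check <- is_digits_only(c("2024", "20a4"))
digits_total <- sum_digits_in_string("3 cases and 12 contacts")
```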

Clean data

Performs several cleaning operations at once

clean_data()
Clean and standardize data

Check data structure

Scan through the input data to determine its composition

scan_data()
Scan a data frame to determine the percentage of missing, numeric, Date, character, and logical values in every column.
scan_columns()
Calculate the percentage of missing and other data type values in a vector containing different data types such as numeric, Date, character, and logical.
print_report()
Generate report from data cleaning operations
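
The idea of scanning a vector for its composition can be sketched in base R as below. The helper describe_vector() is hypothetical and only separates missing, numeric-looking, and other values; it returns proportions rather than percentages and does not reproduce the Date and logical detection that the functions above describe.

```r
# Hypothetical helper for illustration only, not the cleanepi implementation.
# Returns the proportion of missing, numeric-looking, and other values.
describe_vector <- function(x) {
  x <- as.character(x)
  is_missing <- is.na(x)
  is_number <- !is_missing & grepl("^-?[0-9]+(\\.[0-9]+)?$", x)
  c(
    missing = mean(is_missing),
    numeric = mean(is_number),
    other = mean(!is_missing & !is_number)
  )
}

composition <- describe_vector(c("12", "7.5", NA, "unknown"))
```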

Standardise column names

Harmonise column names so that they use English characters only

standardize_column_names()
Standardize column names of a data frame or linelist
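
A minimal base R sketch of column-name harmonisation follows. The helper to_snake_case() is hypothetical and is not how cleanepi does it; in particular, this sketch replaces accented letters with underscores instead of transliterating them.

```r
# Hypothetical helper for illustration only, not the cleanepi implementation.
to_snake_case <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse runs of other characters into "_"
  gsub("^_+|_+$", "", x)           # drop leading and trailing underscores
}

df <- data.frame(a = 1, b = 2)
names(df) <- c("Date of Onset", " Case.ID ")
names(df) <- to_snake_case(names(df))
```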

Retrieve Date from numeric values

convert_numeric_to_date()
Convert numeric to date
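
Recovering a Date from a number amounts to counting days from a reference date. The base R sketch below assumes the numbers are days since 1970-01-01; numbers exported from other software use a different origin, so the origin here is an assumption and would need to match the data source.

```r
# Days since the assumed origin; the origin depends on where the data came from.
days_since_origin <- c(0, 18262)
recovered_dates <- as.Date(days_since_origin, origin = "1970-01-01")
```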

Convert numbers written in letters into numeric

convert_to_numeric()
Convert columns into numeric
detect_to_numeric_columns()
Detect the numeric columns that appear as characters due to the presence of some character values in the column.

Standardise dates

Coerce date values to the ISO 8601 format Ymd (e.g. 2024-01-31)

standardize_dates()
Standardize date variables
date_check_column_existence()
Check if date column exists in the given dataset
date_check_timeframe()
Check date time frame
date_choose_first_good()
Choose the first non-missing date from a data frame of dates
date_convert()
Convert characters to dates
date_convert_and_update()
Convert and update the date values
date_detect_complex_format()
Detect complex date format
date_detect_day_or_month()
Detect the appropriate abbreviation for day or month value
date_detect_format()
Detect a date format with only 1 separator
date_detect_separator()
Detect the special character that is the separator in the date values
date_detect_simple_format()
Get format from a simple Date value
date_get_format()
Detect date format from a date column
date_get_part1()
Get part1 of date value
date_get_part2()
Get part2 of date value
date_get_part3()
Get part3 of date value
date_guess()
Try to guess dates from a character vector
date_guess_convert()
Guess if a character vector contains Date values, and convert them to date
date_i_extract_string()
Extract date from a character string
date_i_find_format()
Guess date format of a character string
date_make_format()
Build the auto-detected format
date_match_format_and_column()
Check whether the number of provided formats matches the number of target columns to be standardized.
date_process()
Process date variable
date_rescue_lubridate_failures()
Find the dates that lubridate couldn't parse
date_trim_outliers()
Trim dates outside of the defined boundaries
convert_numeric_to_date()
Convert numeric to date
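
To make the goal of this section concrete, the base R sketch below tries a short list of candidate formats in turn and prints every parsed value in ISO form. The helper parse_first_match() and the list of formats are assumptions for illustration; the functions listed above suggest that the package's own detection is considerably more elaborate.

```r
# Hypothetical helper for illustration only, not the cleanepi implementation.
# Tries each format in order and keeps the first successful parse per value.
parse_first_match <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

raw_dates <- c("31/01/2024", "2024-01-31", "01.31.2024")
candidate_formats <- c("%d/%m/%Y", "%Y-%m-%d", "%m.%d.%Y")
iso_dates <- format(parse_first_match(raw_dates, candidate_formats), "%Y-%m-%d")
```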

Dictionary-based substitution

Substitute given options from columns in a data frame with their corresponding values

dictionary_make_metadata()
Make data dictionary for 1 field
add_to_dictionary()
Add an element to the data dictionary
clean_using_dictionary()
Perform dictionary-based cleaning
make_readcap_dictionary()
Convert a REDCap data dictionary into the {matchmaker} dictionary format
construct_misspelled_report()
Build the report of misspelled values detected during dictionary-based data cleaning
detect_misspelled_options()
Detect misspelled options in columns to be cleaned
print_misspelled_values()
Print the detected misspelled values
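
Dictionary-based substitution can be pictured as a lookup from coded options to their labels, with values missing from the dictionary flagged for review. The base R sketch below uses a plain named vector as the dictionary; this is an assumption for illustration and is not the dictionary format the package expects.

```r
# Illustrative dictionary as a named vector: option -> value.
dictionary <- c("1" = "male", "2" = "female")
sex <- c("1", "2", "2", "m")

# Replace only the options found in the dictionary.
known <- sex %in% names(dictionary)
cleaned_sex <- sex
cleaned_sex[known] <- unname(dictionary[sex[known]])

# Options absent from the dictionary are candidates for misspelling checks.
unrecognised <- unique(sex[!known])
```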

Find and remove duplicates

find_duplicates()
Identify and return duplicated rows in a data frame or linelist.
remove_duplicates()
Remove duplicates
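
The difference between finding and removing duplicates can be sketched with base R's duplicated(). The data frame and the choice of key columns are made up for illustration, and this is not the package's implementation.

```r
df <- data.frame(id = c(1, 2, 2, 3), sex = c("f", "m", "m", "f"))

# Columns used to decide whether two rows are duplicates of each other.
key <- df[, c("id", "sex"), drop = FALSE]

# Finding: return every row that belongs to a duplicated group.
in_dup_group <- duplicated(key) | duplicated(key, fromLast = TRUE)
duplicated_rows <- df[in_dup_group, ]

# Removing: keep only the first occurrence of each group.
deduplicated <- df[!duplicated(key), ]
```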

Remove constant data

Remove constant columns, empty rows, and empty columns

remove_constants()
Remove empty rows, empty columns, and constant columns
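
A single pass of this kind of clean-up can be sketched in base R as follows. The helper drop_empty_and_constant() is hypothetical; it treats a row or column as empty when all its values are NA and a column as constant when it holds one distinct value, which may differ from the package's own criteria.

```r
# Hypothetical helper for illustration only, not the cleanepi implementation.
drop_empty_and_constant <- function(df) {
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]  # drop all-NA rows
  df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]  # drop all-NA columns
  is_constant <- vapply(
    df, function(col) length(unique(col)) == 1, logical(1)
  )
  df[, !is_constant, drop = FALSE]
}

df <- data.frame(
  id = c(1, 2, NA),
  country = c("GM", "GM", NA),
  empty = NA,
  age = c(30, 41, NA)
)
trimmed <- drop_empty_and_constant(df)
```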

Replace missing values with NA

replace_missing_values()
Replace missing values with NA
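
In base R, the replacement can be sketched as below. The na_strings vector here is a short made-up list for illustration, not the contents of common_na_strings.

```r
# Illustrative placeholders for missing values; not the package's list.
na_strings <- c("-99", "N/A", "missing")

df <- data.frame(
  age = c("34", "-99", "51"),
  sex = c("f", "N/A", "missing"),
  stringsAsFactors = FALSE
)

# Replace every placeholder with NA, column by column.
df[] <- lapply(df, function(col) {
  col[col %in% na_strings] <- NA
  col
})
```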

Calculate time span between the variables of type Date

timespan()
Calculate time span between dates
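
For reference, a time span between two Date columns can be computed in base R with difftime(). The column names and values below are made up, and this sketch only covers whole units, not the package's options.

```r
onset <- as.Date(c("2024-01-01", "2024-02-10"))
admission <- as.Date(c("2024-01-08", "2024-03-01"))

# Span in days; units = "weeks" would express the same span in weeks.
span_days <- as.numeric(difftime(admission, onset, units = "days"))
```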

Detect incorrect subject ids and correct them if required

check_subject_ids()
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids() function to correct them.
correct_subject_ids()
Correct the wrong subject IDs based on the user-provided values.
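
Checking IDs against an expected format boils down to pattern matching. The base R sketch below assumes a made-up format (prefix "PS", three digits, suffix "P2") and is not the package's implementation.

```r
ids <- c("PS001P2", "PS002P2", "P0005P2", "PS04P2")

# Assumed format for illustration: prefix "PS", three digits, suffix "P2".
expected_pattern <- "^PS[0-9]{3}P2$"

is_incorrect <- !grepl(expected_pattern, ids)
incorrect_ids <- ids[is_incorrect]
```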

Check sequence of date events

check_date_sequence()
Check whether the order of the sequence of date-events is valid
is_date_sequence_ordered()
Check order of a sequence of date-events
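
The check can be sketched in base R as a row-wise comparison of consecutive date columns. The helper rows_in_order() and the column names are hypothetical, and the sketch does not handle missing dates.

```r
# Hypothetical helper for illustration only, not the cleanepi implementation.
# TRUE for each row whose dates are non-decreasing across `cols`.
rows_in_order <- function(df, cols) {
  ok <- rep(TRUE, nrow(df))
  for (i in seq_len(length(cols) - 1)) {
    ok <- ok & df[[cols[i]]] <= df[[cols[i + 1]]]
  }
  ok
}

events <- data.frame(
  onset = as.Date(c("2024-01-01", "2024-01-10")),
  admission = as.Date(c("2024-01-03", "2024-01-05")),
  outcome = as.Date(c("2024-01-09", "2024-01-20"))
)
in_order <- rows_in_order(events, c("onset", "admission", "outcome"))
```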