Exported functions

cleanepi functions available to end-users

add_to_dictionary()
Add an element to the data dictionary
add_to_report()
Add an element to the report object
check_date_sequence()
Checks whether the order in a sequence of date events is chronological.
check_subject_ids()
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids function to correct them.
clean_data()
Clean and standardize data
clean_using_dictionary()
Perform dictionary-based cleaning
common_na_strings
Common strings representing missing values
convert_numeric_to_date()
Convert numeric to date
convert_to_numeric()
Convert columns into numeric
correct_subject_ids()
Correct the wrong subject IDs based on the user-provided values.
find_duplicates()
Identify and return duplicated rows in a data frame or linelist.
get_default_params()
Set and return clean_data default parameters
print_report()
Generate report from data cleaning operations
remove_constants()
Remove constant data, including empty rows, empty columns, and columns with constant values.
remove_duplicates()
Remove duplicates
replace_missing_values()
Replace missing values with NA
scan_data()
Scan through a data frame and return the proportion of missing, numeric, Date, character, logical values.
standardize_column_names()
Standardize column names of a data frame or line list
standardize_dates()
Standardize date variables
timespan()
Calculate time span between dates

Shared helpers

get_target_column_names()
Get the names of the columns from which duplicates will be found
add_to_report()
Add an element to the report object
numbers_only()
Detects whether a string contains only numbers or not.
retrieve_column_names()
Get column names
tr_()
Flag the messages to be translated using the potools package

Clean data

Performs several cleaning operations at once

clean_data()
Clean and standardize data

Check data structure

Scans the input data to determine the composition of character columns

scan_data()
Scan through a data frame and return the proportion of missing, numeric, Date, character, logical values.
scan_in_character()
Scan through a character column
print_report()
Generate report from data cleaning operations
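
To make "composition of a character column" concrete, the base R sketch below classifies each value of a character vector and reports the proportion of each type. It is only an illustration: the classification rules (ISO-only dates, upper-case-insensitive TRUE/FALSE) are assumptions and do not reproduce the internals of scan_data() or scan_in_character().

```r
# Example character column with mixed content.
x <- c("12", "2024-01-31", "TRUE", "abc", NA, "7.5")

is_missing <- is.na(x)
# Values that parse as numbers (warnings from failed coercion are expected).
is_numeric <- !is_missing & !is.na(suppressWarnings(as.numeric(x)))
# Values that parse as ISO dates; other date formats are ignored in this sketch.
is_date    <- !is_missing & !is.na(as.Date(x, format = "%Y-%m-%d"))
is_logical <- !is_missing & toupper(x) %in% c("TRUE", "FALSE")
# Everything left over is treated as plain character.
is_char    <- !(is_missing | is_numeric | is_date | is_logical)

props <- c(
  missing   = mean(is_missing),
  numeric   = mean(is_numeric),
  date      = mean(is_date),
  logical   = mean(is_logical),
  character = mean(is_char)
)
```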

Standardize column names

Harmonizes the usage of English characters in column names

standardize_column_names()
Standardize column names of a data frame or line list
make_unique_column_names()
Make column names unique when duplicated column names are found after the transformation

Retrieve dates from numeric values

convert_numeric_to_date()
Convert numeric to date

Convert numbers written in letters into numeric

convert_to_numeric()
Convert columns into numeric
detect_to_numeric_columns()
Detect the numeric columns that appear as characters due to the presence of some character values in the column.

Standardize dates

Coerce date values to the ISO format Ymd (2024-01-31)

standardize_dates()
Standardize date variables
date_check_outsiders()
Convert and update date values
date_check_timeframe()
Check date time frame
date_choose_first_good()
Choose the first non-missing date from a data frame of dates
date_convert()
Convert characters to dates
date_detect_complex_format()
Detect complex date format
date_detect_day_or_month()
Detect the appropriate abbreviation for day or month value
date_detect_format()
Detect a date format with only 1 separator
date_detect_separator()
Detect the special character that is the separator in the date values
date_detect_simple_format()
Get format from a simple Date value
date_get_format()
Infer date format from a vector of characters
date_get_part1()
Split a string based on a pattern and return the first element of the resulting vector.
date_get_part2()
Get part2 of date value
date_get_part3()
Get part3 of date value
date_guess()
Try to guess dates from a character vector
date_guess_convert()
Guess if a character vector contains Date values, and convert them to date
date_i_guess_and_convert()
Extract date from a character vector
date_make_format()
Build the auto-detected format
date_match_format_and_column()
Check whether the number of provided formats matches the number of target columns to be standardized.
date_process()
Process date variable
date_rescue_lubridate_failures()
Find the dates that lubridate couldn't
date_trim_outliers()
Trim dates outside of the defined timeframe
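
For orientation, ISO 8601 writes a calendar date as year-month-day, so 31 January 2024 becomes 2024-01-31. The base R sketch below shows that target format only. It does not call or reproduce standardize_dates(); the helper name `to_iso_date` and its short list of candidate formats are assumptions made for illustration.

```r
# Hypothetical helper: try each candidate format in turn, keep the first
# successful parse, and return the result as ISO-formatted strings.
to_iso_date <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                      # values not parsed yet
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  format(out, "%Y-%m-%d")                   # unparsed values stay NA
}

iso <- to_iso_date(c("2024-01-31", "31/01/2024", "not a date"))
```

Values that match none of the candidate formats are returned as NA rather than guessed.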

Dictionary-based substitution

Substitutes specified options in data frame columns with their corresponding values

dictionary_make_metadata()
Make data dictionary for 1 field
add_to_dictionary()
Add an element to the data dictionary
clean_using_dictionary()
Perform dictionary-based cleaning
construct_misspelled_report()
Build the report of misspelled values detected during dictionary-based data cleaning
detect_misspelled_options()
Detect misspelled options in columns to be cleaned
print_misspelled_values()
Print the detected misspelled values
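
The idea of dictionary-based substitution can be sketched in base R as a lookup: coded options are replaced by their values, and anything absent from the dictionary is left unchanged so it can be reviewed as a possible misspelling. The column names `options` and `values` and the example data are assumptions for this sketch, not the dictionary layout required by clean_using_dictionary().

```r
# Hypothetical dictionary: coded option -> replacement value.
dictionary <- data.frame(
  options = c("1", "2"),
  values  = c("male", "female"),
  stringsAsFactors = FALSE
)

sex <- c("1", "2", "1", "M")

# Look up each entry; entries without a match keep their original value.
idx       <- match(sex, dictionary$options)
cleaned   <- ifelse(is.na(idx), sex, dictionary$values[idx])
unmatched <- unique(sex[is.na(idx)])   # candidates for a misspelling report
```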

Find and remove duplicates

find_duplicates()
Identify and return duplicated rows in a data frame or linelist.
remove_duplicates()
Remove duplicates
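
The difference between finding and removing duplicates can be illustrated with base R's `duplicated()`. This is a sketch on made-up data, not the implementation of find_duplicates() or remove_duplicates().

```r
df <- data.frame(
  id   = c(1, 2, 2, 3),
  date = c("2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"),
  stringsAsFactors = FALSE
)

# Finding: flag every member of a duplicated group, not only the later copies.
in_dup_group <- duplicated(df) | duplicated(df, fromLast = TRUE)
dups <- df[in_dup_group, ]

# Removing: keep the first occurrence of each row.
deduped <- df[!duplicated(df), ]
```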

Remove constant data

Remove constant columns, empty rows and columns

perform_remove_constants()
Remove constant data.
remove_constants()
Remove constant data, including empty rows, empty columns, and columns with constant values.
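
The three kinds of constant data named above (empty rows, empty columns, constant columns) can be shown with a single pass in base R. This is a sketch on made-up data and makes no claim about how remove_constants() is implemented.

```r
df <- data.frame(
  id      = c("a", "b", NA),
  country = c("GM", "GM", NA),   # constant column
  notes   = c(NA, NA, NA),       # empty column
  stringsAsFactors = FALSE
)

# Drop rows and columns that contain only NA.
keep_rows <- rowSums(!is.na(df)) > 0
keep_cols <- colSums(!is.na(df)) > 0
df <- df[keep_rows, keep_cols, drop = FALSE]

# Drop columns with at most one distinct non-missing value.
is_constant <- vapply(
  df,
  function(col) length(unique(col[!is.na(col)])) <= 1,
  logical(1)
)
df <- df[, !is_constant, drop = FALSE]
```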

Replace missing values with NA

replace_missing_values()
Replace missing values with NA
replace_with_na()
Detect and replace values with NA from a vector
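
As a minimal base R sketch of the idea, the code below replaces a set of missing-value codes with NA in every column of a data frame. The vector `na_codes` is a made-up example and is not the content of `common_na_strings`.

```r
# Hypothetical missing-value codes used in this example only.
na_codes <- c("", "-99", "N/A", "missing")

df <- data.frame(
  age = c("12", "-99", "40"),
  sex = c("F", "N/A", "M"),
  stringsAsFactors = FALSE
)

# Replace every listed code with NA, column by column.
df[] <- lapply(df, function(col) {
  col[col %in% na_codes] <- NA
  col
})
```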

Calculate time span between variables of type Date

timespan()
Calculate time span between dates
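
A time span between Date values reduces to a date difference expressed in a chosen unit. The base R sketch below uses `difftime()` on made-up dates; it does not use timespan() or its arguments.

```r
onset <- as.Date(c("2024-01-01", "2024-02-15"))
end   <- as.Date("2024-03-01")

# Span in days, then in completed weeks.
span_days  <- as.numeric(difftime(end, onset, units = "days"))
span_weeks <- span_days %/% 7
```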

Detect incorrect subject ids and correct them if required

check_subject_ids()
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids function to correct them.
correct_subject_ids()
Correct the wrong subject IDs based on the user-provided values.
check_subject_ids_oness()
Checks the uniqueness in values of the sample IDs column

Check sequence of date events

check_date_sequence()
Checks whether the order in a sequence of date events is chronological.
is_date_sequence_ordered()
Check order of a sequence of date-events
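
A chronological check compares each date event with the next one, row by row. The sketch below does this in base R; the column names and data are made up, and it is not the implementation of check_date_sequence().

```r
df <- data.frame(
  onset     = as.Date(c("2024-01-01", "2024-01-10")),
  admission = as.Date(c("2024-01-03", "2024-01-05")),
  discharge = as.Date(c("2024-01-09", "2024-01-12"))
)

# Expected order of events, earliest first.
cols <- c("onset", "admission", "discharge")

# A row is ordered when each event is on or before the following one.
is_ordered <- rep(TRUE, nrow(df))
for (i in seq_len(length(cols) - 1)) {
  is_ordered <- is_ordered & df[[cols[i]]] <= df[[cols[i + 1]]]
}

out_of_order <- df[!is_ordered, ]
```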