Package index • cleanepi

Exported functions

cleanepi functions available to end-users

add_to_dictionary(): Add an element to the data dictionary

add_to_report(): Add an element to the report object

check_date_sequence(): Checks whether the order in a sequence of date events is chronological.

check_subject_ids(): Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids function to correct them.

clean_data(): Clean and standardize data

clean_using_dictionary(): Perform dictionary-based cleaning

common_na_strings: Common strings representing missing values

convert_numeric_to_date(): Convert numeric to date

convert_to_numeric(): Convert columns into numeric

correct_misspelled_values(): Correct misspelled values by using approximate string matching techniques to compare them against the expected values.

correct_subject_ids(): Correct the wrong subject IDs based on the user-provided values.

find_duplicates(): Identify and return duplicated rows in a data frame or linelist.

get_default_params(): Set and return clean_data default parameters

print_report(): Generate report from data cleaning operations

remove_constants(): Remove constant data, including empty rows, empty columns, and columns with constant values.

remove_duplicates(): Remove duplicates

replace_missing_values(): Replace missing values with NA

scan_data(): Scan through a data frame and return the proportion of missing, numeric, Date, character, and logical values.

standardize_column_names(): Standardize column names of a data frame or line list

standardize_dates(): Standardize date variables

timespan(): Calculate time span between dates

Shared helpers

get_target_column_names(): Get the names of the columns from which duplicates will be found

add_to_report(): Add an element to the report object

numbers_only(): Detects whether a string contains only numbers or not.

retrieve_column_names(): Get column names

tr_(): Flag the messages to be translated using the potools package

Clean data

Performs several cleaning operations at once

clean_data(): Clean and standardize data

Check data structure

Scans the input data to determine the composition of character columns

scan_data(): Scan through a data frame and return the proportion of missing, numeric, Date, character, and logical values.

scan_in_character(): Scan through a character column

Print data cleaning report

print_report(): Generate report from data cleaning operations

Standardize column names

Harmonizes the usage of English characters in column names

standardize_column_names(): Standardize column names of a data frame or line list

make_unique_column_names(): Make column names unique when duplicated column names are found after the transformation

Retrieve dates from numeric values

convert_numeric_to_date(): Convert numeric to date

Convert numbers written in words into numeric values

convert_to_numeric(): Convert columns into numeric

detect_to_numeric_columns(): Detect the numeric columns that appear as characters due to the presence of some character values in the column.

Standardize dates

Coerce date values to the ISO format Ymd (e.g. 2024-01-31)

standardize_dates(): Standardize date variables

date_check_outsiders(): Convert and update date values

date_check_timeframe(): Check date time frame

date_choose_first_good(): Choose the first non-missing date from a data frame of dates

date_convert(): Convert characters to dates

date_detect_complex_format(): Detect complex date format

date_detect_day_or_month(): Detect the appropriate abbreviation for day or month value

date_detect_format(): Detect a date format with only 1 separator

date_detect_separator(): Detect the special character that is the separator in the date values

date_detect_simple_format(): Get format from a simple Date value

date_get_format(): Infer date format from a vector of characters

date_get_part1(): Split a string based on a pattern and return the first element of the resulting vector.

date_get_part2(): Get part2 of date value

date_get_part3(): Get part3 of date value

date_guess(): Try to guess dates from a character vector

date_guess_convert(): Guess if a character vector contains Date values, and convert them to date

date_i_guess_and_convert(): Extract date from a character vector

date_make_format(): Build the auto-detected format

date_match_format_and_column(): Check whether the number of provided formats matches the number of target columns to be standardized.

date_process(): Process date variable

date_rescue_lubridate_failures(): Find the dates that lubridate couldn't

date_trim_outliers(): Trim dates outside of the defined timeframe

Dictionary-based substitution

Substitutes specified options in data frame columns with their corresponding values

dictionary_make_metadata(): Make data dictionary for 1 field

add_to_dictionary(): Add an element to the data dictionary

clean_using_dictionary(): Perform dictionary-based cleaning

construct_misspelled_report(): Build the report for the misspelled values detected during a dictionary-based data cleaning operation

detect_misspelled_options(): Detect misspelled options in columns to be cleaned

print_misspelled_values(): Print the detected misspelled values

Check spelling mistakes

Substitutes misspelled values with their closest match from a user-provided vector of words

correct_misspelled_values(): Correct misspelled values by using approximate string matching techniques to compare them against the expected values.

Find and remove duplicates

find_duplicates(): Identify and return duplicated rows in a data frame or linelist.

remove_duplicates(): Remove duplicates

Remove constant data

Remove constant columns, empty rows, and empty columns

perform_remove_constants(): Remove constant data.

remove_constants(): Remove constant data, including empty rows, empty columns, and columns with constant values.

Replace missing values with NA

replace_missing_values(): Replace missing values with NA

replace_with_na(): Detect and replace values with NA from a vector

Calculate time span between variables of type Date

timespan(): Calculate time span between dates

Detect incorrect subject IDs and correct them if required

check_subject_ids(): Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids function to correct them.

correct_subject_ids(): Correct the wrong subject IDs based on the user-provided values.

check_subject_ids_oness(): Checks the uniqueness of values in the sample IDs column

Check sequence of date events

check_date_sequence(): Checks whether the order in a sequence of date events is chronological.

is_date_sequence_ordered(): Check order of a sequence of date-events