Package index
Exported functions
cleanepi functions available to end-users
- add_to_dictionary() - Add an element to the data dictionary
- add_to_report() - Add an element to the report object
- check_date_sequence() - Check whether the order in a sequence of date events is chronological.
- check_subject_ids() - Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids() function to correct them.
- clean_data() - Clean and standardize data
- clean_using_dictionary() - Perform dictionary-based cleaning
- common_na_strings - Common strings representing missing values
- convert_numeric_to_date() - Convert numeric to date
- convert_to_numeric() - Convert columns into numeric
- correct_misspelled_values() - Correct misspelled values by using approximate string matching techniques to compare them against the expected values.
- correct_subject_ids() - Correct the wrong subject IDs based on the user-provided values.
- find_duplicates() - Identify and return duplicated rows in a data frame or linelist.
- get_default_params() - Set and return clean_data() default parameters
- print_report() - Generate report from data cleaning operations
- remove_constants() - Remove constant data, including empty rows, empty columns, and columns with constant values.
- remove_duplicates() - Remove duplicates
- replace_missing_values() - Replace missing values with NA
- scan_data() - Scan through a data frame and return the proportion of missing, numeric, Date, character, and logical values.
- standardize_column_names() - Standardize column names of a data frame or line list
- standardize_dates() - Standardize date variables
- timespan() - Calculate time span between dates
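
To make the idea behind replace_missing_values() concrete, here is a minimal base R sketch of replacing placeholder strings with NA. It is not cleanepi's implementation: the helper name replace_na_strings() and the default na_strings vector are hypothetical and chosen only for illustration.

```r
# Hypothetical helper, not part of cleanepi: replace placeholder strings
# in character columns with NA. The default `na_strings` is an assumption.
replace_na_strings <- function(data,
                               na_strings = c("", "-99", "N/A", "missing")) {
  data[] <- lapply(data, function(column) {
    if (is.character(column)) {
      # Ignore surrounding whitespace when comparing against the placeholders
      column[trimws(column) %in% na_strings] <- NA_character_
    }
    column
  })
  data
}

linelist <- data.frame(
  sex = c("male", "-99", " N/A "),
  age = c(31L, 45L, 27L),
  stringsAsFactors = FALSE
)
cleaned <- replace_na_strings(linelist)
```

Only character columns are touched here; a numeric column that encodes missing values as -99 would need separate handling.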
 
- get_target_column_names() - Get the names of the columns from which duplicates will be found
- add_to_report() - Add an element to the report object
- numbers_only() - Detect whether a string contains only numbers.
- retrieve_column_names() - Get column names
- tr_() - Flag messages to be translated using the potools package
- clean_data() - Clean and standardize data
- scan_data() - Scan through a data frame and return the proportion of missing, numeric, Date, character, and logical values.
- scan_in_character() - Scan through a character column
- print_report() - Generate report from data cleaning operations
- standardize_column_names() - Standardize column names of a data frame or line list
- make_unique_column_names() - Make column names unique when duplicated column names are found after the transformation
- convert_numeric_to_date() - Convert numeric to date
- convert_to_numeric() - Convert columns into numeric
- detect_to_numeric_columns() - Detect the numeric columns that appear as characters due to the presence of some character values in the column.
 
- standardize_dates() - Standardize date variables
- date_check_outsiders() - Convert and update date values
- date_check_timeframe() - Check date time frame
- date_choose_first_good() - Choose the first non-missing date from a data frame of dates
- date_convert() - Convert characters to dates
- date_detect_complex_format() - Detect complex date format
- date_detect_day_or_month() - Detect the appropriate abbreviation for day or month value
- date_detect_format() - Detect a date format with only 1 separator
- date_detect_separator() - Detect the special character that is the separator in the date values
- date_detect_simple_format() - Get format from a simple Date value
- date_get_format() - Infer date format from a vector of characters
- date_get_part1() - Split a string based on a pattern and return the first element of the resulting vector.
- date_get_part2() - Get part2 of date value
- date_get_part3() - Get part3 of date value
- date_guess() - Try to guess dates from a character vector
- date_guess_convert() - Guess if a character vector contains Date values, and convert them to date
- date_i_guess_and_convert() - Extract date from a character vector
- date_make_format() - Build the auto-detected format
- date_match_format_and_column() - Check whether the number of provided formats matches the number of target columns to be standardized.
- date_process() - Process date variable
- date_rescue_lubridate_failures() - Find the dates that lubridate couldn't
- date_trim_outliers() - Trim dates outside of the defined timeframe
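
Several of the date helpers above describe guessing a format and keeping the first usable result. A rough base R sketch of that general idea follows. It is an illustration only, not how cleanepi does it: the function guess_dates() and its candidate formats are hypothetical.

```r
# Hypothetical helper, not part of cleanepi: try several candidate formats
# and keep, for each value, the first format that parses successfully.
guess_dates <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    parsed <- as.Date(x, format = fmt)
    # Only fill positions that are still missing and that this format parsed
    fill <- is.na(out) & !is.na(parsed)
    out[fill] <- parsed[fill]
  }
  out
}

raw <- c("2024-01-05", "05/01/2024", "05.01.2024", "not a date")
dates <- guess_dates(raw)
```

Values that match none of the candidate formats stay NA. Ambiguous values (for example day/month versus month/day) are resolved purely by the order of `formats`, which is a deliberate simplification.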
 
Dictionary-based substitution
Substitutes specified options in data frame columns with their corresponding values
- dictionary_make_metadata() - Make data dictionary for 1 field
- add_to_dictionary() - Add an element to the data dictionary
- clean_using_dictionary() - Perform dictionary-based cleaning
- construct_misspelled_report() - Build the report for the detected misspelled values during dictionary-based data cleaning operation
- detect_misspelled_options() - Detect misspelled options in columns to be cleaned
- print_misspelled_values() - Print the detected misspelled values
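
As a rough illustration of dictionary-based substitution, the base R sketch below replaces coded options with their corresponding values, column by column. It does not use cleanepi. The helper apply_dictionary() is hypothetical, and the dictionary layout (columns named options, values, and grp, where grp holds the target column name) is an assumption made for this example.

```r
# Hypothetical helper, not part of cleanepi. Assumes a dictionary data frame
# with columns `options` (coded value), `values` (replacement) and `grp`
# (name of the column the entry applies to).
apply_dictionary <- function(data, dictionary) {
  for (col in unique(dictionary$grp)) {
    entries <- dictionary[dictionary$grp == col, ]
    current <- as.character(data[[col]])
    idx <- match(current, entries$options)
    hit <- !is.na(idx)
    # Replace only the values that have a dictionary entry
    current[hit] <- entries$values[idx[hit]]
    data[[col]] <- current
  }
  data
}

linelist <- data.frame(
  id = 1:3,
  sex = c("1", "2", "M"),
  stringsAsFactors = FALSE
)
dictionary <- data.frame(
  options = c("1", "2"),
  values = c("male", "female"),
  grp = "sex",
  stringsAsFactors = FALSE
)
recoded <- apply_dictionary(linelist, dictionary)
```

Values without a dictionary entry (here "M") are left unchanged, which is the kind of case the misspelling helpers listed above are meant to surface.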
 
Check spelling mistakes
Substitutes misspelled values with their closest match from a user-provided vector of words
- correct_misspelled_values() - Correct misspelled values by using approximate string matching techniques to compare them against the expected values.
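
One common way to do approximate string matching in R is edit distance via utils::adist(). The sketch below shows that technique under stated assumptions. It is not necessarily the method cleanepi uses: the helper closest_match() and its max_distance threshold are hypothetical, and the input is assumed to contain no NA values.

```r
# Hypothetical helper, not part of cleanepi: replace each value with the
# closest expected word when the edit distance is within `max_distance`.
# Assumes `values` contains no NA.
closest_match <- function(values, expected, max_distance = 2) {
  # Matrix of edit distances: one row per value, one column per expected word
  distances <- utils::adist(values, expected)
  best <- apply(distances, 1, which.min)
  best_distance <- distances[cbind(seq_along(values), best)]
  ifelse(best_distance <= max_distance, expected[best], values)
}

observed <- c("mael", "female", "femal", "unknown")
corrected <- closest_match(observed, expected = c("male", "female"))
```

Values that are too far from every expected word are returned unchanged, so a threshold that is too generous will silently rewrite legitimate values.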
 
- find_duplicates() - Identify and return duplicated rows in a data frame or linelist.
- remove_duplicates() - Remove duplicates
- perform_remove_constants() - Remove constant data.
- remove_constants() - Remove constant data, including empty rows, empty columns, and columns with constant values.
- replace_missing_values() - Replace missing values with NA
- replace_with_na() - Detect and replace values with NA from a vector
- timespan() - Calculate time span between dates
- check_subject_ids() - Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids() function to correct them.
- correct_subject_ids() - Correct the wrong subject IDs based on the user-provided values.
- check_subject_ids_oness() - Check the uniqueness of the values in the sample IDs column
- check_date_sequence() - Check whether the order in a sequence of date events is chronological.
- is_date_sequence_ordered() - Check order of a sequence of date-events
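
To illustrate what checking a chronological sequence of date events means, here is a small base R sketch. It does not use cleanepi: the helper is_chronological() and the column names are hypothetical, and the date columns are assumed to hold ISO 8601 (YYYY-MM-DD) strings or Date values.

```r
# Hypothetical helper, not part of cleanepi: for each row, check that the
# dates in `date_columns` appear in non-decreasing order, left to right.
is_chronological <- function(data, date_columns) {
  dates <- lapply(data[date_columns], as.Date)
  ordered <- rep(TRUE, nrow(data))
  for (i in seq_len(length(dates) - 1)) {
    # A missing date makes the comparison, and so the row result, NA
    ordered <- ordered & (dates[[i]] <= dates[[i + 1]])
  }
  ordered
}

events <- data.frame(
  onset = c("2024-01-01", "2024-01-10"),
  admission = c("2024-01-03", "2024-01-08"),
  discharge = c("2024-01-09", "2024-01-15"),
  stringsAsFactors = FALSE
)
in_order <- is_chronological(events, c("onset", "admission", "discharge"))
```

The second row is flagged because its admission date precedes its onset date.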