Package index
Exported functions
cleanepi functions available to end-users
-
add_to_dictionary()
- Add an element to the data dictionary
-
add_to_report()
- Add an element to the report object
-
check_date_sequence()
- Checks whether the order in a sequence of date events is chronological.
-
check_subject_ids()
- Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids() function to correct them.
-
clean_data()
- Clean and standardize data
-
clean_using_dictionary()
- Perform dictionary-based cleaning
-
common_na_strings
- Common strings representing missing values
-
convert_numeric_to_date()
- Convert numeric to date
-
convert_to_numeric()
- Convert columns into numeric
-
correct_subject_ids()
- Correct the wrong subject IDs based on the user-provided values.
-
find_duplicates()
- Identify and return duplicated rows in a data frame or linelist.
-
get_default_params()
- Set and return clean_data() default parameters
-
print_report()
- Generate report from data cleaning operations
-
remove_constants()
- Remove constant data, including empty rows, empty columns, and columns with constant values.
-
remove_duplicates()
- Remove duplicates
-
replace_missing_values()
- Replace missing values with NA
-
scan_data()
- Scan through a data frame and return the proportion of missing, numeric, Date, character, and logical values.
-
standardize_column_names()
- Standardize column names of a data frame or line list
-
standardize_dates()
- Standardize date variables
-
timespan()
- Calculate time span between dates
-
get_target_column_names()
- Get the names of the columns from which duplicates will be found
-
add_to_report()
- Add an element to the report object
-
numbers_only()
- Detects whether a string contains only numbers or not.
-
retrieve_column_names()
- Get column names
-
tr_()
- Flag which messages will be translated using the potools package
-
clean_data()
- Clean and standardize data
-
scan_data()
- Scan through a data frame and return the proportion of missing, numeric, Date, character, and logical values.
-
scan_in_character()
- Scan through a character column
-
print_report()
- Generate report from data cleaning operations
-
standardize_column_names()
- Standardize column names of a data frame or line list
-
make_unique_column_names()
- Make column names unique when duplicated column names are found after the transformation
-
convert_numeric_to_date()
- Convert numeric to date
-
convert_to_numeric()
- Convert columns into numeric
-
detect_to_numeric_columns()
- Detect the numeric columns that appear as characters due to the presence of some character values in the column.
-
standardize_dates()
- Standardize date variables
-
date_check_outsiders()
- Convert and update date values
-
date_check_timeframe()
- Check date time frame
-
date_choose_first_good()
- Choose the first non-missing date from a data frame of dates
-
date_convert()
- Convert characters to dates
-
date_detect_complex_format()
- Detect complex date format
-
date_detect_day_or_month()
- Detect the appropriate abbreviation for day or month value
-
date_detect_format()
- Detect a date format with only 1 separator
-
date_detect_separator()
- Detect the special character that is the separator in the date values
-
date_detect_simple_format()
- Get format from a simple Date value
-
date_get_format()
- Infer date format from a vector of characters
-
date_get_part1()
- Split a string based on a pattern and return the first element of the resulting vector.
-
date_get_part2()
- Get part2 of date value
-
date_get_part3()
- Get part3 of date value
-
date_guess()
- Try and guess dates from a character vector
-
date_guess_convert()
- Guess if a character vector contains Date values, and convert them to date
-
date_i_guess_and_convert()
- Extract date from a character vector
-
date_make_format()
- Build the auto-detected format
-
date_match_format_and_column()
- Check whether the number of provided formats matches the number of target columns to be standardized.
-
date_process()
- Process date variable
-
date_rescue_lubridate_failures()
- Find the dates that lubridate couldn't
-
date_trim_outliers()
- Trim dates outside of the defined timeframe
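
As an illustration of what trimming dates outside a defined timeframe can look like, here is a minimal base R sketch. The helper name trim_dates_to_timeframe() and its arguments are hypothetical and chosen for this example only; this is not the cleanepi implementation of date_trim_outliers(), whose signature is not shown in this index.

```r
# Hypothetical helper, for illustration only (not part of cleanepi).
# Values outside [first_date, last_date] are replaced with NA;
# values that are already missing are left untouched.
trim_dates_to_timeframe <- function(dates, first_date, last_date) {
  dates <- as.Date(dates)
  outside <- !is.na(dates) & (dates < first_date | dates > last_date)
  dates[outside] <- NA
  dates
}

x <- as.Date(c("2020-01-15", "1900-01-01", "2021-06-30"))
trimmed <- trim_dates_to_timeframe(
  x,
  first_date = as.Date("2020-01-01"),
  last_date  = as.Date("2020-12-31")
)
print(trimmed)
```

Replacing out-of-range values with NA rather than dropping them keeps the vector the same length as the input, so it can be written back into the original data frame column.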
Dictionary-based substitution
Substitutes specified options in data frame columns with their corresponding values
-
dictionary_make_metadata()
- Make data dictionary for 1 field
-
add_to_dictionary()
- Add an element to the data dictionary
-
clean_using_dictionary()
- Perform dictionary-based cleaning
-
construct_misspelled_report()
- Build the report for the detected misspelled values during dictionary-based data cleaning operation
-
detect_misspelled_options()
- Detect misspelled options in columns to be cleaned
-
print_misspelled_values()
- Print the detected misspelled values
-
find_duplicates()
- Identify and return duplicated rows in a data frame or linelist.
-
remove_duplicates()
- Remove duplicates
-
perform_remove_constants()
- Remove constant data.
-
remove_constants()
- Remove constant data, including empty rows, empty columns, and columns with constant values.
-
replace_missing_values()
- Replace missing values with NA
-
replace_with_na()
- Detect and replace values with NA from a vector
-
timespan()
- Calculate time span between dates
-
check_subject_ids()
- Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the correct_subject_ids() function to correct them.
-
correct_subject_ids()
- Correct the wrong subject IDs based on the user-provided values.
-
check_subject_ids_oness()
- Checks the uniqueness in values of the sample IDs column
-
check_date_sequence()
- Checks whether the order in a sequence of date events is chronological.
-
is_date_sequence_ordered()
- Check order of a sequence of date-events
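
The idea of checking that a sequence of date events is chronological can be sketched in base R as follows. The helper rows_out_of_order() and the example column names are hypothetical and used here only for illustration; this is not the cleanepi implementation of check_date_sequence() or is_date_sequence_ordered().

```r
# Hypothetical helper, for illustration only (not part of cleanepi).
# `columns` lists the date columns in their expected chronological order.
# Returns the indices of rows where that order is violated; rows with a
# missing date are also flagged in this sketch.
rows_out_of_order <- function(data, columns) {
  dates <- lapply(data[columns], function(col) as.numeric(as.Date(col)))
  mat <- do.call(cbind, dates)
  ordered <- apply(mat, 1, function(row) !anyNA(row) && all(diff(row) >= 0))
  which(!ordered)
}

events <- data.frame(
  onset     = c("2023-01-01", "2023-02-10"),
  admission = c("2023-01-03", "2023-02-05"),
  discharge = c("2023-01-10", "2023-02-20")
)
flagged <- rows_out_of_order(events, c("onset", "admission", "discharge"))
print(flagged)
```

In this example the second row is flagged because the admission date precedes the onset date.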