Correct misspelled values by using approximate string matching techniques to compare them against the expected values.
Source: R/correct_misspelled_values.R
Usage
correct_misspelled_values(
  data,
  target_columns,
  wordlist,
  max_distance = 1,
  confirm = rlang::is_interactive(),
  ...
)
Arguments
- data
 The input <data.frame> or <linelist>.
- target_columns
 A <vector> of the target column names. When the input data is a <linelist> object, this parameter can be set to linelist_tags to apply the fuzzy matching exclusively to the tagged columns.
- wordlist
 A <vector> of characters with the words to match to the detected misspelled values.
- max_distance
 An <integer> for the maximum distance allowed when detecting a spelling mistake from the wordlist. The distance is the generalized Levenshtein edit distance (see adist()). Default is 1.
- confirm
 A <logical> that determines whether to show the user a menu of spelling corrections. If TRUE and R is used interactively, the user will have the option to review the proposed spelling corrections. This argument is useful for turning off the menu() when rlang::is_interactive() returns TRUE but the user should not be prompted, e.g. devtools::run_examples().
- ...
 
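To make the max_distance argument more concrete, the following sketch calls utils::adist() directly, the base R function the argument description points to. It only illustrates how the edit distance behaves on a few example strings; the vectors observed and wordlist are made up for this illustration, and nothing here is claimed to reproduce the function's internals.

```r
# Illustrative vectors (assumptions made for this sketch).
wordlist <- c("confirmed", "probable", "suspected")
observed <- c("confermed", "susspected", "probable")

# utils::adist() returns a matrix of generalized Levenshtein distances,
# with one row per observed value and one column per wordlist entry.
d <- utils::adist(observed, wordlist)
dimnames(d) <- list(observed, wordlist)
d

# "confermed" is one substitution away from "confirmed", and "susspected"
# is one deletion away from "suspected", so both fall within the default
# max_distance of 1. "probable" is at distance 0 from itself.
d["confermed", "confirmed"]
d["susspected", "suspected"]
```

A larger max_distance would accept more distant candidates, at the cost of a higher risk of false positives.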
Details
When used interactively (see interactive()), the user is presented with a menu
to confirm that the words detected using approximate string matching are not
false positives, and can decide whether to proceed with the
spelling corrections. In non-interactive sessions, all misspelled values are
replaced by their closest values within the provided vector of expected
values.
If multiple words supplied in the wordlist equally match a word in the
data and confirm is TRUE, the user is presented with a menu to choose the
replacement word. If the function is not used interactively, multiple equal
matches throw a warning.
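The non-interactive behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's implementation: closest_match() is a hypothetical helper written for this example, and leaving a tied value unchanged after the warning is an assumption, since the text above only says that a warning is thrown.

```r
# Hypothetical helper (not part of the package): return the closest
# wordlist entry for a single value, or the value itself if no unique
# match exists within max_distance.
closest_match <- function(value, wordlist, max_distance = 1) {
  # Missing values and values already in the wordlist are kept as they are.
  if (is.na(value) || value %in% wordlist) {
    return(value)
  }
  # Generalized Levenshtein distance from the value to every wordlist entry.
  d <- drop(utils::adist(value, wordlist))
  if (min(d) > max_distance) {
    return(value)
  }
  best <- wordlist[d == min(d)]
  if (length(best) > 1) {
    # Assumption: on a tie, warn and leave the value unchanged.
    warning(
      "'", value, "' equally matches: ", toString(best),
      call. = FALSE
    )
    return(value)
  }
  best
}

wordlist <- c("confirmed", "probable", "suspected", "died", "recovered")
values <- c("confirmed", "confermed", "did", "recoverd")
corrected <- vapply(
  values, closest_match, character(1),
  wordlist = wordlist, USE.NAMES = FALSE
)
corrected
```

With confirm = TRUE in an interactive session, the tie branch is where a menu() prompt would let the user choose among the equally close candidates instead of receiving a warning.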
Examples
df <- data.frame(
  case_type = c("confirmed", "confermed", "probable", "susspected"),
  outcome = c("died", "recoverd", "did", "recovered")
)
df
#>    case_type   outcome
#> 1  confirmed      died
#> 2  confermed  recoverd
#> 3   probable       did
#> 4 susspected recovered
correct_misspelled_values(
  data = df,
  target_columns = c("case_type", "outcome"),
  wordlist = c("confirmed", "probable", "suspected", "died", "recovered"),
  confirm = FALSE
)
#>   case_type   outcome
#> 1 confirmed      died
#> 2 confirmed recovered
#> 3  probable      died
#> 4 suspected recovered