When removing duplicates, users can specify the set of columns to consider with the target_columns argument.

Usage

remove_duplicates(data, target_columns = NULL)

Arguments

data

The input <data.frame> or <linelist>.

target_columns

A <vector> of column names to use when looking for duplicates. When the input data is a <linelist> object, this parameter can be set to "linelist_tags" to look for duplicates on the tagged columns only. Default is NULL, in which case all columns are used.

Value

The input <data.frame> or <linelist> with the duplicated rows removed. Duplicates are identified across all columns, or across the specified columns only when target_columns is provided.

Examples

no_dups <- remove_duplicates(
  data = readRDS(
    system.file("extdata", "test_linelist.RDS", package = "cleanepi")
  ),
  target_columns = "linelist_tags"
)
#> ! Found 57 duplicated rows in the dataset.
#>  Use `attr(dat, "report")[["duplicated_rows"]]` to access them, where "dat" is
#>   the object used to store the output from this operation.
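
The example above relies on a tagged linelist. As a minimal sketch of the other documented usage, the call below passes a vector of column names on a plain data frame. The toy data frame and its column names (id, date_onset, outcome) are hypothetical and only serve as an illustration; the printed messages and the exact rows retained are not shown because they depend on the installed version of cleanepi.

```r
library(cleanepi)

# Hypothetical toy data: rows 2 and 3 share the same id and date_onset
# but differ on outcome.
toy <- data.frame(
  id         = c(1, 2, 2, 3),
  date_onset = c("2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"),
  outcome    = c("recovered", "died", "recovered", "recovered"),
  stringsAsFactors = FALSE
)

# Look for duplicates on id and date_onset only; outcome is ignored
# when rows are compared.
dat <- remove_duplicates(
  data = toy,
  target_columns = c("id", "date_onset")
)

# The duplicated rows are stored in the report attribute, as indicated
# by the message printed by remove_duplicates().
dups <- attr(dat, "report")[["duplicated_rows"]]
```

Restricting the comparison to identifier-like columns is useful when other columns, such as free-text or outcome fields, may legitimately differ between records that describe the same case.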