Skip to contents

When removing duplicates, users can specify a set columns to consider with the target_columns argument.

Usage

remove_duplicates(data, target_columns = NULL)

Arguments

data

The input data frame or linelist.

target_columns

A vector of column names to use when looking for duplicates. When the input data is a linelist object, this parameter can be set to linelist_tags if you wish to look for duplicates on tagged columns only. Default is NULL.

Value

A data frame or linelist without the duplicated rows identified from all or the specified columns.

Examples

no_dups <- remove_duplicates(
  data = readRDS(
    system.file("extdata", "test_linelist.RDS", package = "cleanepi")
  ),
  target_columns = "linelist_tags"
)
#> Found 57 duplicated rows in the dataset. Please consult the report for more details.