Identify and return duplicated rows in a data frame or linelist.
Source:R/find_and_remove_duplicates.R
find_duplicates.Rd
Identify and return duplicated rows in a data frame or linelist.
Arguments
- data
A data frame or linelist.
- target_columns
A vector of columns names or indices to consider when looking for duplicates. When the input data is a
linelist
object, this parameter can be set totags
from which duplicates to be removed. Its default value isNULL
, which considers duplicates across all columns.
Value
A data frame or linelist of all duplicated rows with following 2 additional columns:
row_id
: the indices of the duplicated rows from the input data. Users can choose from these indices, which row they consider as redundant in each group of duplicates.group_id
: a unique identifier associated to each group of duplicates.
Examples
dups <- find_duplicates(
data = readRDS(system.file("extdata", "test_linelist.RDS",
package = "cleanepi")),
target_columns = c("dt_onset", "dt_report", "sex", "outcome")
)
#> Found 57 duplicated rows. Please consult the report for more details.