
Note that THIS FEATURE IS STILL EXPERIMENTAL: we strongly recommend checking a few converted dates manually. This function tries to extract dates from a character vector or a factor. It treats each entry independently, using regular expressions to detect whether a date is present and in which format; if detection succeeds, it converts that entry to a standard Date in the Ymd format (e.g. 2018-01-21). Entries that cannot be processed result in NA. An error threshold can be used to define the maximum number of resulting NA values (i.e. entries without an identified date) that can be tolerated. If this threshold is exceeded, the original vector is returned.
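The behaviour described above (independent per-entry parsing, NA for failures, and a fall-back to the original vector) can be sketched in base R. This is a minimal illustration only, not the package's implementation: the helper `parse_with_threshold()` and its `max_na` argument are hypothetical names, and the sketch recognises a single date format.

```r
# Hypothetical helper, not part of the package: parse each entry
# independently and fall back to the input when too many entries fail.
parse_with_threshold <- function(x, max_na = 1) {
  x <- as.character(x)  # factors are converted to character first
  # Only entries shaped like YYYY-MM-DD are treated as date candidates.
  looks_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  out <- as.Date(ifelse(looks_iso, x, NA_character_), format = "%Y-%m-%d")
  # Too many unidentified dates: return the original vector unchanged.
  if (sum(is.na(out)) > max_na) {
    return(x)
  }
  out
}

# One failure is tolerated, so a Date vector with one NA is returned.
parse_with_threshold(c("2018-01-21", "2019-05-03", "unknown"), max_na = 1)

# Two failures exceed the threshold, so the character input comes back.
parse_with_threshold(c("2018-01-21", "foo", "bar"), max_na = 1)
```

Checking the class of the result (Date versus character) is a quick way to see whether the threshold was exceeded in this sketch.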

Usage

date_guess(
  x,
  column_name,
  quiet = TRUE,
  modern_excel = TRUE,
  orders = list(
    world_named_months = c("Ybd", "dby"),
    world_digit_months = c("dmy", "Ymd"),
    US_formats = c("Omdy", "YOmd")
  )
)

Arguments

x

A character vector or a factor

column_name

The target column name

quiet

A logical indicating if messages should be displayed to the console (TRUE, default); set to FALSE to silence messages

modern_excel

When parsing dates from Excel, some dates are stored as integers. Modern versions of Excel represent dates as the number of days since 1900-01-01, but versions of Excel for OSX released before 2011 have the origin set at 1904-01-01. If this parameter is TRUE (default), all numeric values are assumed to represent dates from either a Windows version of Excel or a 2011 or later version of Excel for OSX. Set this parameter to FALSE if the data came from an OSX version of Excel released before 2011.
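The effect of the two origins can be illustrated with base R's `as.Date()`. This is a sketch under stated assumptions rather than the package's code: `excel_serial_to_date()` is a hypothetical helper, and the origin "1899-12-30" is the value the base R documentation suggests for dates from Windows Excel.

```r
# Hypothetical helper, not the package's code: convert Excel serial
# day numbers to Date using the origin that matches the Excel version.
excel_serial_to_date <- function(serial, modern_excel = TRUE) {
  # "1899-12-30" compensates for the 1900 system counting from day 1
  # and treating 1900 as a leap year; the 1904 system counts from day 0.
  origin <- if (modern_excel) "1899-12-30" else "1904-01-01"
  as.Date(as.numeric(serial), origin = origin)
}

excel_serial_to_date(35981)                        # 1900 date system
excel_serial_to_date(34519, modern_excel = FALSE)  # 1904 date system
```

Both calls return 1998-07-05: the same calendar day is stored as serial numbers that differ by 1462 under the two systems, which is why choosing the wrong origin shifts every numeric date by about four years.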

orders

The date codes for fine-grained parsing of dates. This allows for parsing of mixed dates. If a list is supplied, that list will be used for successive tries in parsing. Default orders are:

list(
  world_named_months = c("Ybd", "dby"),
  world_digit_months = c("dmy", "Ymd"),
  US_formats         = c("Omdy", "YOmd")
)
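The idea of successive tries over a list of orders can be sketched as follows. As an assumption made for illustration, the sketch uses base R `strptime()` format strings as a stand-in for the order codes above, and `parse_successively()` is a hypothetical helper, not a function from the package.

```r
# Stand-in for `orders`: groups of base R strptime formats (an assumption;
# the function itself takes order codes such as "Ybd" or "dmy").
formats <- list(
  named_months = c("%Y %b %d", "%d %b %Y"),  # %b is locale dependent
  digit_months = c("%d/%m/%Y", "%Y-%m-%d")
)

# Hypothetical helper: try each format in turn and keep the first
# successful parse for every entry.
parse_successively <- function(x, formats) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  for (group in formats) {
    for (fmt in group) {
      todo <- is.na(out)             # entries still without a date
      if (!any(todo)) return(out)    # everything parsed: stop early
      out[todo] <- as.Date(x[todo], format = fmt)
    }
  }
  out
}

parse_successively(c("21/01/2018", "2018-01-21", "junk"), formats)
```

Because the first matching format wins, the position of each group in the list matters in this sketch: an ambiguous entry such as "01/02/2018" is read as day/month/year here, since "%d/%m/%Y" is tried before any month-first format.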

Value

A list of the following two elements: a vector of the newly reformatted dates and a data frame with the date values that were converted from more than one format. If all values comply with only one format, the latter element will be NULL.

Author

Thibaut Jombart, Zhian N. Kamvar