This vignette outlines the design decisions that have been taken during the development of the safeframe R package, and provides some of the reasoning, and possible pros and cons of each decision.
This document is primarily intended to be read by those interested in understanding the code within the package and for potential package contributors.
Scope
safeframe provides generic labelling and validation
tools. In contrast to the original versions of linelist
(<=v1.1.4
), safeframe functions at the variable level
instead of the object level.
The validation tooling is specific to type checking variables and providing feedback on potential data loss or coercion. It does not aim to do complex validations at this time.
Naming conventions
We separate functions as much as is reasonable into their own files
under R/
. If there are tests available for a file under
R/
, it follows the convention of
test-<filename>.R
under tests/testthat/
.
Not all source code has respective tests.
We try to make function names as descriptive as possible, while keeping them short. This is to make the package easy to use and understand.
Input/Output/Interoperability
Any data frame object can be passed into safeframe. Output from safeframe remains a data frame object, with an additional safeframe class attribute. This means it remains interoperable with all the regular data frame operations one may attempt to do.
safeframe is interoperable with pipes (that is,
|>
or %>%
). This allows for easy
chaining of functions. Note that there are no guarantees that label
attributes are preserved when piping or wrangling in another way. For
example, dplyr drops variable level attributes when
using dplyr::mutate()
.
Design decisions
- Generic: The package is designed to be a generic tool for labelling and validating data. This is to ensure that the package can be used in a wide range of contexts and is not limited to a specific use case. Any specific use cases should be implemented in separate packages.
- Local: We keep functions as local as possible. This means operations should be as precise as is feasible, to be non-destructive and ensure changes on one variable do not unexpectedly affect another. This helps ensure the package is predictable and easy to use + maintain.
- Minimize number of functions: We aim to keep the number of functions in the package to a minimum. This helps usability and maintainability.
- Base R: We aim to use base R functions where possible. This is to ensure that the package is lightweight and does not have many dependencies. This is for example why we do not use labelled as the labelling package.
If you feel like we did not uphold one of these design decisions, please let us know 😊
Dependencies
- checkmate - provide assertions for function arguments
- lifecycle - help manage function lifecycle
-
rlang -
...
to list parsing -
tidyselect - ensure we can use pipes in
has_label()
Development journey
The safeframe package is a major refactor of
linelist v1.1.4
. The refactor was
necessary to make the package more generic and to make the codebase more
maintainable. The refactor was completed in a series of steps documented
in #37.