The safeframe philosophy is to prevent you from accidentally losing valuable data, while otherwise staying fully transparent and not interfering with your workflow.
One popular ecosystem for data science workflows is the tidyverse. We try to ensure decent safeframe compatibility with the tidyverse. All dplyr verbs are tested in the tests/test-compat-dplyr.R file.
library(safeframe)
#>
#> Attaching package: 'safeframe'
#> The following object is masked from 'package:base':
#>
#> labels
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x <- make_safeframe(
  cars,
  speed = "Miles per hour",
  dist = "Distance in miles"
)
head(x)
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
Verbs operating on rows
safeframe does not change the behaviour of row operations. As such, it is fully compatible with dplyr verbs operating on rows out of the box. The following examples show that safeframe does not produce any errors, warnings or messages, and that its labels are conserved through dplyr operations on rows.
dplyr::slice() ✅
x %>%
  slice(5:10)
#>
#> // safeframe object
#> speed dist
#> 1 8 16
#> 2 9 10
#> 3 10 18
#> 4 10 26
#> 5 10 34
#> 6 11 17
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
x %>%
  slice_head(n = 5)
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
x %>%
  slice_tail(n = 5)
#>
#> // safeframe object
#> speed dist
#> 1 24 70
#> 2 24 92
#> 3 24 93
#> 4 24 120
#> 5 25 85
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
x %>%
  slice_min(speed, n = 3)
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
x %>%
  slice_max(speed, n = 3)
#>
#> // safeframe object
#> speed dist
#> 1 25 85
#> 2 24 70
#> 3 24 92
#> 4 24 93
#> 5 24 120
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
x %>%
  slice_sample(n = 5)
#>
#> // safeframe object
#> speed dist
#> 1 23 54
#> 2 14 80
#> 3 12 14
#> 4 19 46
#> 5 19 68
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
Verbs operating on columns
During operations on columns, safeframe will:
- stay invisible and conserve labels if no labelled column is affected by the operation
- trigger lost_labels_action() if labelled columns are affected by the operation
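The second behaviour can be tuned. The following is a minimal sketch, assuming that lost_labels_action() accepts "error" and "warning" as actions (these values are not shown elsewhere in this document, so please check ?lost_labels_action before relying on them). It switches the action to an error, drops a labelled column, and records which kind of condition was raised:

```r
library(safeframe)
library(dplyr)

x <- make_safeframe(
  cars,
  speed = "Miles per hour",
  dist = "Distance in miles"
)

# Assumption: "error" is a valid action, see ?lost_labels_action
lost_labels_action("error")

# Dropping the labelled column `speed` is an operation that loses a label.
# We catch whatever condition is raised and record its type.
outcome <- tryCatch(
  {
    select(x, dist)
    "no condition"
  },
  error = function(e) "error",
  warning = function(w) "warning"
)
outcome

# Restore the warning behaviour used in the rest of this document
lost_labels_action("warning")
```

Turning label loss into an error can be useful in non-interactive pipelines, where a warning is easy to miss.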
dplyr::mutate() ✓ (partial)
Compatibility with dplyr::mutate() is incomplete, in that simple renames without any actual modification of the column don't update the labels. In this scenario, users should use dplyr::rename() instead.
Although dplyr::mutate() is not able to leverage the full power of safeframe labels, safeframe objects behave as expected, the same way a data.frame would:
# In-place modification doesn't lose labels
x %>%
  mutate(speed = speed + 10) %>%
  head()
#>
#> // safeframe object
#> speed dist
#> 1 14 2
#> 2 14 10
#> 3 17 4
#> 4 17 22
#> 5 18 16
#> 6 19 10
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
# New columns don't affect existing labels
x %>%
  mutate(ticket = speed >= 50) %>%
  head()
#>
#> // safeframe object
#> speed dist ticket
#> 1 4 2 FALSE
#> 2 4 10 FALSE
#> 3 7 4 FALSE
#> 4 7 22 FALSE
#> 5 8 16 FALSE
#> 6 9 10 FALSE
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
# .keep = "unused" generates the expected label loss conditions
x %>%
  mutate(edad = speed, .keep = "unused") %>%
  head()
#> Warning: The following labelled variables are lost:
#> speed - Miles per hour
#>
#> // safeframe object
#> dist edad
#> 1 2 4
#> 2 10 4
#> 3 4 7
#> 4 22 7
#> 5 16 8
#> 6 10 9
#>
#> labelled variables:
#> dist - Distance in miles
#> edad - Miles per hour
dplyr::pull() ✅
dplyr::pull() returns a vector, which results, as expected, in the loss of the safeframe class and labels.
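As a small illustration, the sketch below pulls the speed column out of the safeframe built at the top of this document and checks that the result is a plain vector rather than a safeframe. The variable name speeds is only used for this example:

```r
library(safeframe)
library(dplyr)

x <- make_safeframe(
  cars,
  speed = "Miles per hour",
  dist = "Distance in miles"
)

# pull() extracts a single column as a vector
speeds <- x %>%
  pull(speed)

# The result is an ordinary vector, not a safeframe
is.numeric(speeds)
inherits(speeds, "safeframe")
```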
dplyr::rename() & dplyr::rename_with() ✅
dplyr::rename() is fully compatible out of the box with safeframe, meaning that labels will be updated at the same time that columns are renamed. This is possible because it uses names<-() under the hood, for which safeframe provides a custom names<-.safeframe() method:
x %>%
  rename(edad = speed) %>%
  head()
#>
#> // safeframe object
#> edad dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> edad - Miles per hour
#> dist - Distance in miles
x %>%
  rename_with(toupper) %>%
  head()
#>
#> // safeframe object
#> SPEED DIST
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> SPEED - Miles per hour
#> DIST - Distance in miles
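Because the renaming logic lives in the names<-.safeframe() method, the same behaviour should also be available without dplyr. The following is a minimal sketch using base R assignment; the new column name mph is hypothetical and only chosen for illustration:

```r
library(safeframe)

x <- make_safeframe(
  cars,
  speed = "Miles per hour",
  dist = "Distance in miles"
)

# Base R renaming goes through names<-(), which dispatches to the
# safeframe method on a safeframe object
y <- x
names(y)[names(y) == "speed"] <- "mph"

names(y)
inherits(y, "safeframe")
head(y)
```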
dplyr::select() ✅
dplyr::select() is fully compatible with safeframe, including when columns are renamed in a select():
# Works fine
x %>%
  select(speed, dist) %>%
  head()
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
# labels are updated!
x %>%
  select(dist, edad = speed) %>%
  head()
#>
#> // safeframe object
#> dist edad
#> 1 2 4
#> 2 10 4
#> 3 4 7
#> 4 22 7
#> 5 16 8
#> 6 10 9
#>
#> labelled variables:
#> dist - Distance in miles
#> edad - Miles per hour
Verbs operating on groups ✘
Groups are not yet supported. Applying any verb operating on groups to a safeframe will silently convert it back to a data.frame or tibble.
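Since the conversion is silent, it can be worth checking the class of the result explicitly. The following is a minimal sketch of such a check; the variable name grouped is only used for this example, and no particular output is claimed here beyond what the paragraph above describes:

```r
library(safeframe)
library(dplyr)

x <- make_safeframe(
  cars,
  speed = "Miles per hour",
  dist = "Distance in miles"
)

grouped <- x %>%
  group_by(speed)

# Inspect the classes of the grouped object
class(grouped)

# Check whether the safeframe class survived the grouping
inherits(grouped, "safeframe")
```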
Verbs operating on data.frames
dplyr::bind_cols() ✘
bind_cols() is currently incompatible with safeframe:
- labels from the second element are lost
- warnings are produced about lost labels, even for labels that are not actually lost
bind_cols(
  suppressWarnings(select(x, speed)),
  suppressWarnings(select(x, dist))
) %>%
  head()
#> Warning: The following labelled variables are lost:
#> speed - Miles per hour
#> Warning: The following labelled variables are lost:
#> dist - Distance in miles
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
Joins ✘
Joins are currently not compatible with safeframe as labels from the second element are silently dropped.
full_join(
  suppressWarnings(select(x, speed, dist)),
  suppressWarnings(select(x, dist, speed))
) %>%
  head()
#> Joining with `by = join_by(speed, dist)`
#> Warning in full_join(suppressWarnings(select(x, speed, dist)), suppressWarnings(select(x, : Detected an unexpected many-to-many relationship between `x` and `y`.
#> ℹ Row 17 of `x` matches multiple rows in `y`.
#> ℹ Row 17 of `y` matches multiple rows in `x`.
#> ℹ If a many-to-many relationship is expected, set `relationship =
#> "many-to-many"` to silence this warning.
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
Verbs operating on multiple columns
dplyr::pick() ✘
pick() makes tidyselect functions work in functions that are usually tidyselect-incompatible, such as:
# Note: no column of x ends with "loc", so the row order is unchanged here
x %>%
  dplyr::arrange(dplyr::pick(ends_with("loc"))) %>%
  head()
#>
#> // safeframe object
#> speed dist
#> 1 4 2
#> 2 4 10
#> 3 7 4
#> 4 7 22
#> 5 8 16
#> 6 9 10
#>
#> labelled variables:
#> speed - Miles per hour
#> dist - Distance in miles
As such, we could expect it to work with safeframe's custom tidyselect-like function, has_label(), but this is not the case, since pick() currently strips out all attributes, including the safeframe class and all labels. This unclassing is documented in ?pick:
pick() returns a data frame containing the selected columns for the current group.