Throughout the Epiverse project, we use the renv R package to ensure reproducibility of the training materials and the pipelines we are providing. But we sometimes get reports from users who struggle to rebuild the environment and run the code.
In this post, we dissect the source of these issues, explain why in reality renv is not at fault, and how this is caused by the inherent complexity of reproducibility. The renv documentation already includes caveats explaining why some situations are bound to require more complex tools. This blog post reiterates some of these caveats and illustrates them with concrete examples.
Finally, we mention a couple of more complete (but more complex!) frameworks that can overcome the issues presented here. We do not explore these alternative framework in detail but provide links to more information.
Binaries vs building from source
Software, including R packages, can generally be delivered in two forms: as binaries or as source code. If you are building from the source code, you may in some case need a compilation toolchain on your computer. If that toolchain is missing, it can lead to errors such as:
ld: warning: search path '/opt/gfortran/lib' not found ld: library 'gfortran' not found
Most of the time, regular users of R will not see these errors because they are installing binaries. Indeed, CRAN provides pre-compiled binaries for Windows and macOS for the last version of the package and R.
With renv, you often want to install older versions of the packages, which won’t be available as binaries from CRAN. This means you are more likely to have to compile the package yourself and see this kind of errors, even though renv is not causing them.
If you are an Apple Silicon (Mac M1, M2, M3) user and encounter issues with gfortran, we have had success using the macrtools R package and we strongly recommend checking it out.
Beyond renv scope: incompatibility with system dependency versions
We discussed previously the topic of system dependencies, and dependencies on specific R versions. These special dependencies can also be a source of headaches when using renv.
The heart of the issue is that renv provides a simplified solution to reproducibility: it focuses on R packages and their versions. But other sources of non-reproducibility are outside its scope. In many cases, this will not be a problem, as the main source of non-reproducibility, especially in the relatively short-term, will be R package versions.
But sometimes, it is possible that the renv.lock
lockfile requires such an old version of an R package that it was written with a syntax that is no longer supported by recent R versions or modern compilers.
For example, a recent project (from 2023) was trying to install the version 0.60.1 of the matrixStats
package (from 2021). This lead to this compilation error:
error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?
Click to see the full error message
! Error installing package 'matrixStats':
=======================================
* installing *source* package ‘matrixStats’ ...
** package ‘matrixStats’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
using C compiler: ‘gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c 000.init.c -o 000.init.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c allocMatrix2.c -o allocMatrix2.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c anyMissing.c -o anyMissing.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c binCounts.c -o binCounts.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c binMeans.c -o binMeans.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colCounts.c -o colCounts.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colOrderStats.c -o colOrderStats.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colRanges.c -o colRanges.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c diff2.c -o diff2.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c indexByRow.c -o indexByRow.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c logSumExp.c -o logSumExp.o
gcc -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c mean2.c -o mean2.o
In file included from mean2_lowlevel.h:14,
from mean2.c:9:
mean2_lowlevel_template.h: In function ‘mean2_int’:
mean2_lowlevel_template.h:59:13: error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?
59 | if (sum > DOUBLE_XMAX) {
| ^~~~~~~~~~~
| DBL_MAX
mean2_lowlevel_template.h:59:13: note: each undeclared identifier is reported only once for each function it appears in
In file included from mean2_lowlevel.h:18,
from mean2.c:9:
mean2_lowlevel_template.h: In function ‘mean2_dbl’:
mean2_lowlevel_template.h:59:13: error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?
59 | if (sum > DOUBLE_XMAX) {
| ^~~~~~~~~~~
| DBL_MAX
make: *** [/usr/lib/R/etc/Makeconf:191: mean2.o] Error 1 ERROR: compilation failed for package ‘matrixStats’
The explanation for this error can be found in the matrixStats
release notes, specifically the section for matrixStats 0.63.0:
- Updated native code to use the C99 constant
DBL_MAX
instead of legacy S constantDOUBLE_XMAX
, which is planned to be unsupported in R (>= 4.2.0).
Some solutions
Alternative package managers
We discussed how many issues when using renv can arise during the package compilation from source. A potential solution would be to avoid this compilation step and always install pre-compiled binaries.
This is not possible while installing from CRAN as CRAN only provides binaries for recent versions of R and for a limited number of platforms.
But Posit for example provides a larger collection of binaries, for different package versions, and different platforms, via their Public Posit Package Manager (PPM).
Making sure you install from PPM rather than CRAN can be a first simple step to make some of the issues discussed here vanish.
Extending the scope of reproducibility
Another solution could be to add more complex reproducibility solutions that go beyond the scope of renv.
renv with rig
The R version is specified in renv.lock
and to avoid incompatibility of older package versions with newer versions of R, you could run the declared R version. This can be achieved with various means but a convenient solution is the rig tool.
There are even some discussions to integrate rig and renv more tightly and let rig detect automatically which R version to use based on the renv.lock
file.
Docker, Nix and others
Alternatively, you could use other reproducibility toolkits that focus not just on the R package versions, but on the entire software stack (e.g., including the operating system, the system dependencies). These solutions can be more complex to set up and use, and we won’t detail them in this blog post but you can find more information in:
Conclusion: a final note for developers
renv is an elegant solution that focuses on the most immediate source of non-reproducibility. This however means it needs to be complemented by other tools in more complex cases.
Ultimately, reproducibility is a team effort. People who write code can minimise the risk of renv complications by keeping the packages they use close to their CRAN version and regularly updating their code and renv.lock
accordingly. Other programming languages have automated tooling to help with this, via, e.g., the dependabot tool which submits pull requests to update dependencies. There is no well established equivalent for R yet, but anyone willing to set this mechanism up can look at the code used by the Carpentries workbench for this task.
Thanks to Pratik Gupte and Chris Hartgerink for their valuable comments on earlier drafts of this post.
Reuse
Citation
@online{gruson2024,
author = {Gruson, Hugo},
title = {Things That Can Go Wrong When Using Renv},
date = {2024-01-31},
url = {https://epiverse-trace.github.io/posts/renv-complications/},
doi = {10.59350/hsy4m-6se90},
langid = {en}
}