Key considerations for retiring/superseding an R package

A reflection on the example of epichains and bpmodels

software lifecycles
R
R package
software design
Authors

James Mba Azam

Hugo Gruson

Sebastian Funk

Published

February 3, 2025

Most of our work at Epiverse TRACE involves either developing an R package from scratch or adopting and maintaining an existing R package. In the former case, decision-making during development is guided by internal policies documented in the Epiverse-TRACE blueprints. A less common scenario for us, however, has been taking over the maintenance of an established package from its original authors, a situation we recently encountered with the bpmodels R package.

In this post, we want to share some considerations and lessons learned from maintaining bpmodels, originally developed by Sebastian Funk at the London School of Hygiene & Tropical Medicine (with contributions by Zhian Kamvar and Flavio Finger), and from the decision to retire/supersede it with epichains. The aim is not to define strict rules but to spark a conversation about good enough practices and alternative approaches that the R developer community has used or would like to see used more widely.

One of the first considerations was the scope of the package. When maintaining or re-imagining an R package, assessing its scope and identifying opportunities for refinement is crucial. Several packages in the broader R ecosystem have evolved significantly to better align with user needs, and we highlight a few examples here. plyr was superseded by dplyr and purrr, for manipulating data frames and lists respectively, reflecting more specialized functionality based on object types. Similarly, reshape evolved into reshape2¹ and, later, into tidyr, with each iteration simplifying and improving upon its predecessor. Another example is the renaming of ggmissing to the more general naniar. In the epidemiology ecosystem, two examples are the evolution of EpiNow into EpiNow2 and of incidence into incidence2.

For bpmodels, we wanted to unify the two existing simulation functions into one and improve the function signature by renaming several of the arguments for readability. We also wanted to introduce an object-oriented workflow to aid interoperability (for some inputs and outputs) with existing tools such as epicontacts and epiparameter. Moreover, the object-oriented backend would allow us to implement custom methods for printing, summarising and aggregating the simulation output. Some of these changes could have been introduced with little disruption, in line with the concept of progressive enhancement discussed in a previous blog post. However, changing the function names and signatures would have caused considerable disruption, as the existing functions and arguments would all have needed to be deprecated.
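To make the trade-off more concrete, here is a minimal base-R sketch. The class `chain_sim` and the functions `new_chain_sim()`, `simulate_sizes()` and `old_simulate_sizes()` are hypothetical names chosen for illustration; they are not the bpmodels or epichains API, and the "simulation" only returns placeholder chain sizes. The first part shows how an S3 class lets a package define its own `print()` and `summary()` methods; the last part shows the kind of deprecation wrapper that renaming functions and arguments inside the same package would have required for every affected function.

```r
# Hypothetical constructor: wraps simulation output in an S3 class.
# None of these names are the real bpmodels or epichains API.
new_chain_sim <- function(chain_sizes, offspring_dist) {
  stopifnot(is.numeric(chain_sizes), is.character(offspring_dist))
  structure(
    list(chain_sizes = chain_sizes, offspring_dist = offspring_dist),
    class = "chain_sim"
  )
}

# Custom print method: a compact overview instead of a raw list dump
print.chain_sim <- function(x, ...) {
  cat("<chain_sim>", length(x$chain_sizes), "chains,",
      "offspring distribution:", x$offspring_dist, "\n")
  invisible(x)
}

# Custom summary method: returns a few statistics of the chain sizes
summary.chain_sim <- function(object, ...) {
  c(
    n_chains  = length(object$chain_sizes),
    mean_size = mean(object$chain_sizes),
    max_size  = max(object$chain_sizes)
  )
}

# New (hypothetical) entry point with more readable argument names.
# Placeholder body: every chain has size 1; no branching process is run.
simulate_sizes <- function(n_chains, offspring_dist = "pois") {
  new_chain_sim(rep(1, n_chains), offspring_dist)
}

# Keeping the old name in the same package means a deprecation shim:
# it warns via base R's .Deprecated() and maps the old argument names
# (`n`, `offspring`) onto the new ones.
old_simulate_sizes <- function(n, offspring = "pois") {
  .Deprecated("simulate_sizes")
  simulate_sizes(n_chains = n, offspring_dist = offspring)
}
```

Each renamed function and argument would need a shim like `old_simulate_sizes()`, plus documentation and an eventual removal schedule, which is part of why a fresh package name can be the less disruptive route.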

One pattern stands out from these examples of scope changes: they often come with a name change. The next important decision was therefore whether to rebrand the package with a new name. A new name can signal a fresh approach and address limitations of the original package. The best-known example is the renaming of ggplot to ggplot2. Others include the renaming of reshape to reshape2 and, later, to tidyr. However, changing a package's name can be disruptive and cause breaking changes in downstream packages or workflows that depend on it. If not handled well, the process can lead to confusion and frustration in the user and developer community. With this in mind, and to preserve the reproducibility of existing work using bpmodels, we decided to fork the bpmodels repository to the Epiverse-TRACE GitHub organisation and maintain the old package. At the same time, we introduced epichains as its successor. The name reflects the fact that it is a package for simulating and analysing epidemiological transmission chains.

Another key consideration is the plans that the original package author(s) may have had and their views on any future changes. In our case, this was fairly straightforward because the maintainer of the original package was fully involved in the refactoring process. We were also able to contact another package author who had made substantial contributions and obtain their approval. More generally, however, bringing all package authors on board with, for example, changes in scope and name is an important step in taking on the maintenance of a package, and one that should not be neglected.

We also had to consider whether to archive bpmodels or allow it to co-exist with epichains. We decided to keep bpmodels accessible to sustain the reproducibility of existing code that uses the package. The package was moved back to the epiforecasts GitHub organisation, where it originated. We did, however, add a lifecycle badge to communicate the package's retired status, along with a note in the README explaining that we will only maintain the infrastructure required to keep the package running and will not add any new features.

Another technical consideration was how to handle previous contributions recorded in the commit history. When forking a package, it is important to decide whether to retain that history. Options include squashing the history to start with a clean slate, which risks losing visibility of past contributions, or tagging the HEAD commit of the original repository and building from there. For bpmodels, we chose the second approach, which kept the GitHub commit history intact and preserved the record of the original authors' contributions.

Semantic versioning was another key decision point. Since epichains would not be released immediately but would be developed in the open on GitHub, we needed a way to communicate its status to potential users. We decided to start at version 0.0.0.9999 to signal an experimental and unstable phase² while iterating on features through minor releases.
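As a rough illustration of why such a number works, base R's `package_version()` compares versions component by component, so a development version like 0.0.0.9999 sorts before the first minor release. The helper `is_dev_version()` below is hypothetical; it simply checks for a fourth version component, which is the convention for development versions described in the R Packages book.

```r
# Hypothetical helper: a development version carries a fourth
# component (conventionally 9000 or higher); releases usually have three.
is_dev_version <- function(v) {
  parts <- unclass(package_version(v))[[1]]
  length(parts) > 3
}

versions <- package_version(c("0.2.0", "0.0.0.9999", "1.0.0", "0.1.0"))

# Components are compared numerically, so the development version sorts first
as.character(sort(versions))
# returns "0.0.0.9999" "0.1.0" "0.2.0" "1.0.0"

is_dev_version("0.0.0.9999")  # TRUE
is_dev_version("0.1.0")       # FALSE
```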

Throughout this process, we drew inspiration from various sources. Hadley Wickham's reasoning for reshape2 as a reboot of reshape and Nicholas Tierney's explanation for renaming ggmissing to naniar were helpful. Additionally, a talk at useR! 2024 entitled "retiring packages with extensive reverse dependencies" offered practical advice.

This transition has raised several questions for the community. How do you decide whether to supersede or deprecate a package? What strategies have worked for maintaining backward compatibility while introducing new tools? How do you document and communicate major changes to users? And how can all of this be done while appropriately crediting past contributions and retaining discoverability and citation/use tracking?

We’d love to hear your thoughts and experiences. Let’s start a conversation about maintaining and evolving open-source tools in a sustainable way.

Footnotes

  1. Note that the linked URL points to reshape rather than reshape2, although the README refers to the latter. The README nevertheless lays out the reasons for the evolution.

  2. For more details on R package versioning and what version numbers communicate, see the R Packages book.


Citation

BibTeX citation:
@online{mba_azam2025,
  author = {Mba Azam, James and Gruson, Hugo and Funk, Sebastian},
  title = {Key Considerations for Retiring/Superseding an {R} Package},
  date = {2025-02-03},
  url = {https://epiverse-trace.github.io/posts/superseding-bpmodels/},
  langid = {en}
}
For attribution, please cite this work as:
Mba Azam, James, Hugo Gruson, and Sebastian Funk. 2025. “Key Considerations for Retiring/Superseding an R Package.” February 3, 2025. https://epiverse-trace.github.io/posts/superseding-bpmodels/.