Lesser-known reasons to prefer apply() over for loops

R
functional programming
iteration
readability
good practices
tidyverse
Author: Hugo Gruson

Published: November 2, 2023

The debate over for loops versus the apply() function family (apply(), lapply(), vapply(), etc., along with their purrr counterparts: map(), map2(), map_lgl(), map_chr(), etc.) is a longstanding one in the R community.

While you may occasionally hear that for loops are slower, this notion has already been debunked in other posts. When used correctly (in particular, when the output is preallocated rather than grown at each step), a for loop can achieve performance on par with apply() functions.
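
As a minimal sketch of what using a for loop correctly can look like, the snippet below contrasts a loop that grows its result at every step with one that preallocates it, next to the vapply() equivalent. The data and variable names are made up for illustration, and no timings are shown since they depend on the machine and the R version.

```r
# Illustrative data (hypothetical, not from the original post)
n <- 1e4
x <- runif(n)

# Anti-pattern: the result is grown (and copied) at every step
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, sqrt(x[[i]]))
}

# Preallocated result: the loop only fills existing slots
prealloc <- numeric(n)
for (i in seq_len(n)) {
  prealloc[[i]] <- sqrt(x[[i]])
}

# vapply() handles the allocation behind the scenes
applied <- vapply(x, sqrt, numeric(1))
```

All three approaches return the same values; only the way the result vector is built differs.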

However, there are still lesser-known reasons to prefer apply() functions over for loops, which we will explore in this post.

Preamble: for loops can be used in more cases than apply()

It is important to understand that for loops and apply() functions are not always interchangeable. Indeed, for loops can be used in cases where apply functions can’t: when the next step depends on the previous one. This concept is known as recursion.

Conversely, when each step is independent of the previous one, but you want to perform the same operation on each element of a vector, it is referred to as iteration.

for loops are capable of both recursion and iteration, whereas apply() can only do iteration.

Operator   Iteration   Recursion
for        ✔️          ✔️
apply()    ✔️          ❌
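
To make the distinction concrete, here is a small sketch with made-up numbers. In the first part, each value depends on the one computed at the previous step, so a for loop is used. In the second part, each element is processed independently, so vapply() fits.

```r
# Recursion (in the sense used in this post): step i needs the result of step i - 1.
# Hypothetical example: a balance growing by 5% at each step.
n_steps <- 5
balance <- numeric(n_steps)
balance[[1]] <- 100
for (i in 2:n_steps) {
  balance[[i]] <- balance[[i - 1]] * 1.05
}

# Iteration: the same operation applied to each element independently
squares <- vapply(1:5, function(x) x^2, numeric(1))
```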

With this distinction in mind, we can now focus on why you should favour apply() for iteration over for loops.

Reason 1: clarity of intent

As mentioned earlier, for loops can be used for both iteration and recursion. By consistently employing apply() for iteration[1] and reserving for loops for recursion, we enable readers to immediately discern the underlying concept in the code. This leads to code that is easier to read and understand.

l <- list(c(1, 2, 6), c(3, 5), 6, c(0, 9, 3, 4, 8))

# `for` solution -------------------
res <- numeric(length(l))
for (i in seq_along(l)) {
  res[[i]] <- mean(l[[i]])
}
res
[1] 3.0 4.0 6.0 4.8
# `vapply()` solution ---------------
res <- vapply(l, mean, numeric(1))
res
[1] 3.0 4.0 6.0 4.8

The simplicity of apply() is even more apparent in the case of multiple iterations. For example, if we want to find the median of each matrix row for a list of matrices:

l <- replicate(5, { matrix(runif(9), nrow = 3) }, simplify = FALSE)

# `for` solution -------------------
res <- vector("list", length(l))
for (i in seq_along(l)) {
  meds <- numeric(nrow(l[[i]]))
  for (j in seq_len(nrow(l[[i]]))) {
    meds[[j]] <- median(l[[i]][j, ])
  }
  res[[i]] <- meds
}
res
[[1]]
[1] 0.7807203 0.2448108 0.7407391

[[2]]
[1] 0.3249898 0.1742138 0.3917644

[[3]]
[1] 0.2876697 0.8835354 0.5606563

[[4]]
[1] 0.08360038 0.37515714 0.43738127

[[5]]
[1] 0.4834298 0.6106350 0.6328951
# `lapply()` solution ---------------
lapply(l, function(e) {
  apply(e, 1, median)
})
[[1]]
[1] 0.7807203 0.2448108 0.7407391

[[2]]
[1] 0.3249898 0.1742138 0.3917644

[[3]]
[1] 0.2876697 0.8835354 0.5606563

[[4]]
[1] 0.08360038 0.37515714 0.43738127

[[5]]
[1] 0.4834298 0.6106350 0.6328951

Moreover, this clarity of intent does not benefit human readers alone: automated static analysis tools can also identify suboptimal patterns more effectively. This can be demonstrated with R's most popular static analysis tool, the lintr package, which can suggest vectorized alternatives for calls such as these:

lintr::lint(text = "vapply(l, length, numeric(1))")

lintr::lint(text = "apply(m, 1, sum)")
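
The lintr output is not reproduced here. As a hedged sketch, and assuming the vectorized alternatives in question are base R's lengths() and rowSums(), the following shows that they give the same results as the apply()-style calls. The list l and matrix m are made up for illustration.

```r
# Illustrative inputs (hypothetical)
l <- list(c(1, 2, 6), c(3, 5), 6, c(0, 9, 3, 4, 8))
m <- matrix(1:6, nrow = 2)

# Length of each list element
len_apply <- vapply(l, length, integer(1))
len_vec   <- lengths(l)

# Sum of each matrix row
sum_apply <- apply(m, 1, sum)
sum_vec   <- rowSums(m)
```

Note that rowSums() returns a double vector, whereas apply(m, 1, sum) returns integers for an integer matrix; the values are the same.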

Reason 2: code compactness and conciseness

As illustrated in the preceding example, apply() often leads to more compact code, as much of the boilerplate code is handled behind the scenes: you don’t have to initialize your variables, manage indexing, etc.

This, in turn, impacts code readability since:

  • The boilerplate code does not offer meaningful insights into the algorithm or implementation and can be seen as visual noise.
  • While compactness should never take precedence over readability, a more compact solution allows for more code to be displayed on the screen without scrolling. This ultimately makes it easier to understand what the code is doing. With all things otherwise equal, the more compact solution should thus be preferred.

Reason 3: variable leak

As discussed in the previous sections, you have to manage the iteration index manually in a for loop, whereas it is abstracted away in apply(). This can sometimes lead to perplexing errors:

k <- 10

for (k in c("Paul", "Pierre", "Jacques")) {
  message("Hello ", k)
}
Hello Paul
Hello Pierre
Hello Jacques
rep(letters, times = k)
Warning: NAs introduced by coercion
Error in rep(letters, times = k): invalid 'times' argument

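This is because a for loop does not create its own scope: the loop index is an ordinary variable in the environment where the loop runs (here, the global environment), so it can overwrite an existing variable of the same name and it persists after the loop ends: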

for (k in 1:3) {
  # do something
}

message("The value of k is now ", k)
The value of k is now 3
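
By contrast, here is a minimal sketch of the same greeting example written with vapply(). The argument of the anonymous function is deliberately also named k to show that it stays local to the function and leaves the outer k untouched.

```r
k <- 10

# The function argument `k` only exists inside the anonymous function
greetings <- vapply(
  c("Paul", "Pierre", "Jacques"),
  function(k) paste("Hello", k),
  character(1),
  USE.NAMES = FALSE
)

# The outer `k` still holds 10, so this call works as intended
repeated <- rep(letters, times = k)
```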

Reason 4: pipelines

The final reason is that apply() (or more commonly in this situation purrr::map()) can be used in pipelines due to their functional nature:

l <- list(c(1, 2, 6), c(3, 5), 6, c(0, 9, 3, 4, 8))

# Without pipe
vapply(l, mean, numeric(1))
[1] 3.0 4.0 6.0 4.8
# With pipe
l |> vapply(mean, numeric(1))
[1] 3.0 4.0 6.0 4.8
l |> purrr::map_dbl(mean)
[1] 3.0 4.0 6.0 4.8
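
The benefit is more visible when several steps are chained. The sketch below is an illustration rather than part of the original examples: it keeps the two largest values of each element and then averages them. It assumes R 4.1 or later for the native pipe.

```r
l <- list(c(1, 2, 6), c(3, 5), 6, c(0, 9, 3, 4, 8))

res <- l |>
  lapply(sort, decreasing = TRUE) |>  # sort each element, largest first
  lapply(head, n = 2) |>              # keep at most the two largest values
  vapply(mean, numeric(1))            # average what is left
```

Each step reads top to bottom, and no intermediate variable or index has to be created.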

Conclusion

This post has hopefully convinced you that it is better to use apply() functions rather than for loops where possible (i.e., for iteration). Contrary to a common misconception, the real reason is not performance, but code robustness and readability.

Thanks to Jaime Pavlich-Mariscal, James Azam, Tim Taylor, and Pratik Gupte for their thoughtful comments and suggestions on earlier drafts of this post.

Beyond R

This post focused on R, but the same principles generally apply to other languages with functional features. In Python, for example, you would use list comprehensions or the map() function.

Further reading

If you liked the code patterns recommended in this post and want to use functional programming in more situations, including recursion, I recommend you check out the “Functionals” chapter of the Advanced R book by Hadley Wickham.
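
As a small taste of what a functional approach to recursion can look like, here is a hedged sketch using base R's Reduce(), where each step receives the result of the previous one. The running-total example is made up for illustration and is compared with the equivalent for loop.

```r
x <- 1:5

# `for` solution: each step depends on the previous one
running_for <- integer(length(x))
running_for[[1]] <- x[[1]]
for (i in 2:length(x)) {
  running_for[[i]] <- running_for[[i - 1]] + x[[i]]
}

# Reduce() solution: accumulate = TRUE keeps every intermediate result
running_reduce <- Reduce(`+`, x, accumulate = TRUE)
```

For this particular case, cumsum() would of course be the simplest choice; the point is only to show the pattern.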

Footnotes

  1. There are a handful of rare corner cases where apply() is not the best method for iteration. These are cases that make use of match.call() or sys.call(). More details are available in the lapply() documentation and in this GitHub comment by Tim Taylor during the review of this post.

Citation

BibTeX citation:
@online{gruson2023,
  author = {Gruson, Hugo},
  title = {Lesser-Known Reasons to Prefer `Apply()` over for Loops},
  date = {2023-11-02},
  url = {https://epiverse-trace.github.io/posts/for-vs-apply/},
  langid = {en}
}
For attribution, please cite this work as:
Gruson, Hugo. 2023. “Lesser-Known Reasons to Prefer `Apply()` over for Loops.” November 2, 2023. https://epiverse-trace.github.io/posts/for-vs-apply/.