Many of R's functions with non-standard evaluation, e.g. with, subset, and transform, contain a warning like this:

For interactive use this is very effective and nice to read. For programming however, i.e., in one's functions, more care is needed, and typically one should refrain from using with(), as, e.g., variables in data may accidentally override local variables, see the reference.
(quoted from the documentation for with, the others are less informative)
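
To make the quoted warning concrete, here is a minimal sketch of the masking it describes. The data frame `df` and the helper `scale_x` are hypothetical names made up for illustration; only `with()` itself comes from base R.

```r
# Hypothetical data frame and helper, used only for illustration.
df <- data.frame(x = 1:3, n = c(10, 20, 30))

scale_x <- function(data, n) {
  # The author intends `n` to be the function argument, but with()
  # looks in `data` first, so a column named `n` silently wins.
  with(data, x * n)
}

masked   <- scale_x(df, n = 2)        # uses the column df$n, not the argument
unmasked <- scale_x(df["x"], n = 2)   # no `n` column, so the argument is used

print(masked)
print(unmasked)
```

The same function gives different answers depending on which column names happen to be in the data, and nothing warns you about it.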

"The reference" is this 2003 article. Frankly, I don't see its relevance. It mentions the point about "variables in data may accidentally override local variables" in section 6, but it only does that - mention it. As far as I can see, nothing in that article tells you anything that the warning telling you to check the reference didn't already tell you.

I've searched through the R Manuals, even searching the 3500-page Reference Index for the term "non-standard", but I've found nothing beyond what I've already mentioned. I really thought that it would be in the language definition, but I've read the whole thing and didn't find it. The closest that I got was the section that covers the substitute function, which I happen to know that a lot of functions with non-standard evaluation rely on.

As for other places where I'm now confident that help cannot be found: I've read both the R FAQ and An Introduction to R from cover to cover. The R FAQ mentions eval and substitute a handful of times, but not in any way that is relevant here. The only notable part was here, which also suggests checking the documentation for deriv, but I found nothing useful there.

So, is there any official part of R where the dangers of non-standard evaluation are actually documented? I find it very strange that parts of R's documentation would tell me to take care with something, without providing any place where I'm told how to do that. It's undeniable that care is needed. For example, Advanced R shows several ways that functions with non-standard evaluation can cause problems. I have paid for such carelessness before and it's not hard to find excellent answers with comments full of warnings about non-standard evaluation.

J. Mini
  • 1
    Just a follow up: from what I could find, maybe the answer lies within the fact that R is heavily based on the S programming language (specifically, the S-PLUS version and the white book version), which in turn is heavily inspired by LISP. Maybe non-standard evaluation is covered in more detail in S documentation? However, I couldn't find any reasonably readable or free documentation of it. – eduardokapp Apr 07 '21 at 12:47
  • 1
    @eduardokapp R and S don't share the same scoping rules. From what I've read elsewhere, one of the major issues with non-standard evaluation is how it finds the arguments that it's called with. Presumably, if S has totally different scoping rules, then the issue can't work in the same way there as it does in R. – J. Mini Apr 07 '21 at 13:39
  • 2
    Now that we're near the end of the bounty, the high number of up votes is giving me the strong impression that the answer is "They're not documented anywhere". Shame. – J. Mini Apr 09 '21 at 16:53
  • @eduardokapp You could check the following link about S-PLUS, but as I've already said, I doubt that you will find anything of relevance: https://www2.stat.duke.edu/courses/Fall99/sta240/PGUIDE.PDF – J. Mini Apr 12 '21 at 15:46

2 Answers

1

I guess section 6.3 "More on Evaluation" of the R language definition says a little about the whole problem.

Another case that occurs frequently is evaluation in a list or a data frame. For instance, this happens in connection with the model.frame function when a data argument is given. Generally, the terms of the model formula need to be evaluated in data, but they may occasionally also contain references to items in the caller of model.frame. This is sometimes useful in connection with simulation studies. So for this purpose one needs not only to evaluate an expression in a list, but also to specify an enclosure into which the search continues if the variable is not in the list. Hence, the call has the form eval(expr, data, sys.frame(sys.parent())).
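
As a rough illustration of the pattern described in that passage, the sketch below evaluates an expression in a list and falls back to the caller for names the list does not contain. The names `eval_in_data` and `simulate` are hypothetical, and `parent.frame()` is used here in place of the quoted `sys.frame(sys.parent())`, on the assumption that the two refer to the same frame in this simple case.

```r
# Hypothetical helper mirroring the eval(expr, data, enclos) pattern.
eval_in_data <- function(expr, data) {
  # Capture the unevaluated expression, look names up in `data` first,
  # then continue the search in the caller's frame.
  eval(substitute(expr), data, parent.frame())
}

simulate <- function() {
  shift <- 100                  # lives in the caller, not in the data
  d <- list(y = c(1, 2, 3))
  eval_in_data(y + shift, d)    # `y` comes from d, `shift` from simulate()
}

result <- simulate()
print(result)
```

This two-stage lookup is what makes the mechanism convenient, and it is also why a name that exists in both places resolves to the one in the data.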

And then the specific part where the text seems to "warn" the reader:

Notice that evaluation in a given environment may actually change that environment, most obviously in cases involving the assignment operator, such as eval(quote(total <- 0), environment(robert$balance)) # rob Rob. This is also true when evaluating in lists, but the original list does not change because one is really working on a copy.
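
A small sketch of the difference that passage points at, using a made-up environment `env` and list `lst` rather than the `robert$balance` example from the manual:

```r
# Evaluating an assignment in an environment changes that environment.
env <- new.env()
assign("total", 50, envir = env)
eval(quote(total <- 0), env)
env_total <- get("total", envir = env)

# Evaluating the same assignment in a list works on a temporary copy,
# so the original list keeps its value.
lst <- list(total = 50)
eval(quote(total <- 0), lst)
lst_total <- lst$total

print(env_total)
print(lst_total)
```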

Maybe it should be improved, because it certainly doesn't address non-standard evaluation directly.

eduardokapp
  • 1
    I'm unconvinced. You could be correct, but I very much doubt it. As you've said, it approaches the topic **very** indirectly. It's mostly talking about an optional argument to `eval`, so we're three steps away from talking about functions like `with`. The quoted section is not talking about `with` or even the `substitute` function that it relies on: It's largely talking about an optional argument to the `eval` function, which you _might_ use with `substitute` when using `eval`, which you just might use to build a function like `with`. That's far too far away from the issue. – J. Mini Apr 03 '21 at 22:44
  • 2
    I agree! I'm not sure there's anything official, then. I looked up all the documents you linked to. It's a shame! – eduardokapp Apr 03 '21 at 23:02
1

(Posting as an answer, because this is a bit too long for a comment.)

I don't know of a specific place where the dangers are documented, but from my personal experience, there are two important caveats to keep in mind when working with NSE:

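  1. substitute() does not behave as you might expect in nested functions: it captures the expression as written in the immediate caller, not the one the user originally typed. This leads to problems when trying to do sophisticated things with functions that use substitute(). Examples include glm() and coxph().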

  2. If using rlang, the operator !! results in immediate evaluation of its operand w.r.t. the expression as a whole. This can lead to obscure "variable not found" errors if the expression contains variables that are only defined once other parts of the expression are evaluated.
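
One possible reading of the first caveat, sketched in base R only. The functions `label_of` and `wrapper` are hypothetical and stand in for any function that calls substitute() and any function that wraps it. The second caveat is not shown, since it needs the rlang package.

```r
# Hypothetical functions for illustration only.
label_of <- function(x) deparse(substitute(x))

wrapper <- function(value) {
  # substitute() inside label_of() sees the wrapper's argument name,
  # not the expression the user originally typed.
  label_of(value)
}

direct  <- label_of(height + 1)   # captures the expression as typed
wrapped <- wrapper(height + 1)    # captures only the name `value`

print(direct)
print(wrapped)
```

Neither call evaluates `height`, which is why no such variable needs to exist. The point is only that the captured expression differs once a wrapper sits in between.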

Outside of those two caveats, I generally find NSE to be very robust. This is especially true if you are using rlang, which goes a long way towards standardizing NSE functionality. With that said, my personal advice is to use NSE only when necessary and stick to standard evaluation (SE) as much as possible. While NSE can be extremely powerful, it produces code that can be hard to read, understand and maintain.

Artem Sokolov