R: a very Rebol-like language

In my last post, I mentioned that the R language is remarkably similar to Rebol in some respects. Now, R is not perfect — there is a lot to complain about (126 pages of it, to be precise). But as a widely-used language with similar concepts, it’s worth taking a look at.

(A warning: I don’t actually know very much about R, and much of the below was assembled from various bits of the Language Definition. It seems pretty straightforward, though.)

Two features of R in particular remind me of Rebol. The first is its peculiar implementation of lazy evaluation: not only are function arguments passed as unevaluated thunks, but the original code can be introspected, manipulated and evaluated. (Putting it in Lisp terms, every R function is an fexpr.)
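
To make the first point concrete, here is a minimal base-R sketch. The function names ignore_second and show_code are my own, made up for illustration; substitute(), deparse() and eval() are standard base R.

```r
# Arguments arrive unevaluated, so an argument that is never used
# is never evaluated at all.
ignore_second <- function(x, y) x
ignore_second(1, stop("never evaluated"))   # returns 1, no error raised

# substitute() recovers the code behind an argument instead of its value.
show_code <- function(arg) substitute(arg)
code <- show_code(a + b * 2)     # a and b do not need to exist
class(code)                      # "call"
deparse(code)                    # "a + b * 2"

# The captured call can be edited like a list and then evaluated.
code[[1]] <- as.name("-")        # swap `+` for `-`
deparse(code)                    # "a - b * 2"
eval(code, list(a = 10, b = 3))  # 10 - 3 * 2 = 4
```

The last three lines are the fexpr-like part: the function received code, not a value, and that code was rewritten before anything ran.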

The second is its treatment of scopes. In R, environments are a first-class data structure, consisting of a lookup table from symbols to values (‘frame’), and a pointer to an enclosing environment (‘enclosure’). Each function contains a reference to the environment in which it was created; when called, it creates a new frame, and assembles the evaluation environment from that frame and the enclosing environment. Of course, since environments are first-class values, all of this can be manipulated from within the function itself.
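
A small sketch of what that looks like in practice, using only base R. The make_counter function is a hypothetical example of mine, not something from the Language Definition.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # assigns in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()                     # 1
counter()                     # 2

# The function carries a reference to the environment that created it.
env <- environment(counter)
ls(env)                       # "count"
get("count", envir = env)     # 2

# Environments are ordinary values, so we can reach in and change state.
assign("count", 100, envir = env)
counter()                     # 101

# The enclosure of that frame is wherever make_counter was defined.
identical(parent.env(env), environment(make_counter))   # TRUE
```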

These two concepts come together in the form of promises: conceptually, R’s answer to Rebol block!s. A promise contains unevaluated code, together with the environment in which it was created. Promises can be created explicitly, but all function arguments are implicitly represented as promises. When an argument is used, its code is evaluated in that environment to get the argument’s value. But it is also possible to retrieve the unevaluated code, then manipulate it and/or evaluate it in another environment.
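
Here is a rough base-R illustration of both halves of that paragraph. delayedAssign() is the base function for creating a promise by hand; twice_elsewhere and the variable names are hypothetical, chosen only for the example.

```r
# delayedAssign() creates a promise explicitly: `x` is bound to code,
# not to a value, until it is first used.
evaluated <- FALSE
delayedAssign("x", { evaluated <- TRUE; 42 })
evaluated    # FALSE: nothing has run yet
x            # 42: using x forces the promise
evaluated    # TRUE

# Function arguments are promises too. Using the argument evaluates it in
# the caller's environment; substitute() retrieves the code instead.
twice_elsewhere <- function(arg) {
  code <- substitute(arg)             # the unevaluated code
  other <- new.env()
  assign("n", 1000, envir = other)
  c(normal = arg,                     # evaluated where the caller wrote it
    elsewhere = eval(code, other))    # same code, different environment
}

n <- 1
twice_elsewhere(n + 1)   # normal = 2, elsewhere = 1001
```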

Alas, base R gives no way to retrieve the environment of a promise. This is fixed by the rlang package, which (amongst other things) gives a much more ergonomic interface for unevaluated code. Its basic data structure is the ‘quosure’, which again stores code plus environment. One can create these explicitly from function arguments, but more often they are manipulated via rlang’s quasiquoting functions.

For instance:

> get_mean <- function(data, var) dplyr::summarise(data, mean({{ var }}))
> get_mean(data, air_temp)
  mean(air_temp)
1       21.36986

Which (if I haven’t messed anything up) should be equivalent to the following Rebol code:

get-mean: func [data var] [summarise data compose [mean (var)]]
get-mean data [air-temp]

Incidentally, this shows a major use of R’s metaprogramming facilities: accessing variables in a data frame. Here, data is a table of weather measurements I happen to have, with air_temp being one of the columns. As I understand it, dplyr::summarise works by creating a new environment from its first argument data, and then evaluating its second argument mean(air_temp) in that environment, which is why air_temp refers to a column of data rather than some global variable. This is known as ‘data-masking’, and is widely used within the tidyverse libraries.
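
The same idea can be sketched in a few lines of base R. To be clear, this is not how dplyr is actually implemented; toy_summarise is a made-up function, and the weather table below is a stand-in with invented values.

```r
# Hypothetical stand-in for the weather table; the numbers are made up.
weather <- data.frame(air_temp = c(20, 22, 24), humidity = c(0.5, 0.6, 0.7))

# A toy summarise(): capture the expression, then evaluate it with the
# data frame's columns taking priority over the caller's variables.
toy_summarise <- function(data, expr) {
  code <- substitute(expr)
  eval(code, data, enclos = parent.frame())
}

air_temp <- -999                         # a variable of the same name is masked
toy_summarise(weather, mean(air_temp))   # 22

offset <- 1                              # other names fall through to the caller
toy_summarise(weather, mean(air_temp) + offset)   # 23
```

The enclos argument is what lets names that are not columns (like offset) still resolve in the caller's environment.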

Rebol, of course, has different solutions to these problems. Instead of having functions automatically quote their arguments, the caller is expected to create block!s as needed. And, instead of associating environments with unevaluated expressions, Rebol associates a binding with each word!. This has advantages (quasiquotation is easier to reason about) and disadvantages (scopes no longer exist). But the end result is much the same: functions have complete control over where, when and how their arguments are evaluated.
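
For comparison, R can also be written in the caller-quotes style, which may make the contrast easier to see. This is only a sketch: summarise_code and get_mean_quoted are hypothetical names, and the data frame holds invented values. quote() and bquote() are base R, with bquote() playing roughly the role of compose.

```r
weather <- data.frame(air_temp = c(20, 22, 24))   # made-up values

# The callee takes code as an ordinary value and never quotes anything.
summarise_code <- function(data, code) eval(code, data)

# The caller decides what is code, much like passing a block.
summarise_code(weather, quote(mean(air_temp)))    # 22

# bquote() fills in the parts marked with .() and leaves the rest quoted.
get_mean_quoted <- function(data, var) {
  summarise_code(data, bquote(mean(.(var))))
}
get_mean_quoted(weather, quote(air_temp))         # 22
```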
