R: a very Rebol-like language

bradrn · December 30, 2023, 9:36am

In my last post, I mentioned that the R language is remarkably similar to Rebol in some respects. Now, R is not perfect — there is a lot to complain about (126 pages of it, to be precise). But as a widely-used language with similar concepts, it’s worth taking a look at.

(A warning: I don’t actually know very much about R, and much of the below was assembled from various bits of the Language Definition. It seems pretty straightforward, though.)

Two features of R in particular remind me of Rebol. The first is its peculiar implementation of lazy evaluation: not only are function arguments passed as unevaluated thunks, but the original code can be introspected, manipulated and evaluated. (Putting it in Lisp terms, every R function is an fexpr.)

The second is its treatment of scopes. In R, environments are a first-class data structure, consisting of a lookup table from symbols to values (‘frame’), and a pointer to an enclosing environment (‘enclosure’). Each function contains a reference to the environment in which it was created; when called, it creates a new frame, and assembles the evaluation environment from that frame and the enclosing environment. Of course, since environments are first-class values, all of this can be manipulated from within the function itself.
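To make the frame/enclosure picture concrete, here is a rough Python model of it. The `Env` and `Function` classes are invented for illustration and are not R's implementation; they only mimic the lookup rule described above.

```python
class Env:
    """Hypothetical model of an R environment: a frame plus an enclosure."""

    def __init__(self, enclosure=None):
        self.frame = {}             # lookup table from symbols to values
        self.enclosure = enclosure  # enclosing environment, or None at the top

    def lookup(self, name):
        # Search this frame first, then walk outwards through the enclosures.
        env = self
        while env is not None:
            if name in env.frame:
                return env.frame[name]
            env = env.enclosure
        raise NameError(name)


class Function:
    """A function remembers the environment in which it was created."""

    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env

    def call(self, *args):
        # Each call gets a fresh frame, enclosed by the defining environment.
        call_env = Env(enclosure=self.env)
        call_env.frame.update(zip(self.params, args))
        return self.body(call_env)


global_env = Env()
global_env.frame["y"] = 100
add_y = Function(["x"], lambda env: env.lookup("x") + env.lookup("y"), global_env)
result = add_y.call(1)  # x comes from the call frame, y from the enclosure
```

Since `Env` objects are ordinary values here, a function body could inspect or modify `call_env` directly, which is the kind of manipulation R allows.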

These two concepts come together in the form of promises — conceptually, R’s answer to Rebol block!s. Promises contain unevaluated code, together with the environment in which it was created. Promises can be explicitly created, but all function arguments are implicitly represented as promises. When used, the code is evaluated in the environment to get the value of the function argument. But it is also possible to retrieve the unevaluated code, then manipulate it and/or evaluate it in another environment.
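As a sketch of the idea only (not of R's internals), a promise can be modelled in Python as code plus an environment. The `Promise` class and its method names are made up for this example, a string stands in for the parsed code, and a dict stands in for the environment.

```python
class Promise:
    """Hypothetical model of a promise: unevaluated code plus an environment."""

    def __init__(self, code, env):
        self.code = code      # source text standing in for an AST
        self.env = env        # dict standing in for an environment
        self._forced = False
        self._value = None

    def force(self):
        # Evaluate at most once, in the environment the promise was created in.
        if not self._forced:
            self._value = eval(self.code, {}, self.env)
            self._forced = True
        return self._value

    def eval_in(self, other_env):
        # Ignore the stored environment and evaluate the same code elsewhere.
        return eval(self.code, {}, other_env)


p = Promise("x + 1", {"x": 10})
forced = p.force()                # evaluated where the promise was created
elsewhere = p.eval_in({"x": 41})  # same code, different environment
```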

Alas, base R gives no way to retrieve the environment of a promise. This is fixed by the rlang package, which (amongst other things) gives a much more ergonomic interface for unevaluated code. Its basic data structure is the ‘quosure’, again storing code+environment. One can create these explicitly from function arguments, but more often they are manipulated via rlang’s quasiquoting functions.

For instance:

> get_mean <- function(data, var) dplyr::summarise(data, mean({{ var }}))
> get_mean(data, air_temp)
  mean(air_temp)
1       21.36986

Which (if I haven’t messed anything up) should be equivalent to the following Rebol code:

get-mean: func [data var] [summarise data compose [mean (var)]]
get-mean data [air-temp]

Incidentally, this shows a major use of R’s metaprogramming facilities: accessing variables in a data frame. Here, data is a table of weather measurements I happen to have, with air_temp being one of the columns. To my understanding, dplyr::summarise works by creating a new environment from its first argument data, then evaluating its second argument mean(air_temp) in that environment, which is why air_temp points to a column of data rather than some global variable. This is known as ‘data-masking’, and is widely used within the tidyverse libraries.
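A toy Python version of data-masking, going by my understanding above rather than by how dplyr is actually written: the `summarise` helper, the column values and the `caller_env` dict are all invented for the example.

```python
def mean(xs):
    return sum(xs) / len(xs)


def summarise(data, expr, caller_env):
    """Evaluate expr with the columns of data masking the caller's variables."""
    # Column names win; anything that is not a column falls back to caller_env.
    mask = {**caller_env, **data}
    return eval(expr, {}, mask)


weather = {"air_temp": [20.0, 22.0, 24.0]}  # made-up measurements
caller_env = {"mean": mean, "air_temp": "a global, not a column"}
masked = summarise(weather, "mean(air_temp)", caller_env)
```

The global `air_temp` is shadowed by the column, while `mean` is still found in the caller's environment.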

Rebol, of course, has different solutions to these problems. Instead of having functions automatically quote their arguments, the caller is expected to create block!s as needed. And, instead of associating environments with unevaluated expressions, Rebol associates a binding with each word!. This has advantages (quasiquotation is easier to reason about) and disadvantages (scopes no longer exist). But the end result is much the same: functions have complete control over where, when and how their arguments are evaluated.

hostilefork · January 2, 2024, 2:15am

3 posts were merged into an existing topic: Kaj Gets on the Meta Train

hostilefork · January 4, 2024, 8:08pm

Interesting to find out R has this character.

Wrapping things in blocks is one way. You can also quote things at the callsite (Ren-C allows you to quote all types arbitrarily, while Rebol and Red only have quoted words and paths).

But there are also "hard" quoted arguments, and "soft" quoted arguments. Ren-C does it a little differently than Rebol or Red.

A hard quoted argument, denoted with a quote mark, will give you literally whatever you pass it:

>> test-hard: func ['x] [probe x]

>> test-hard var
== var

>> test-hard :(first [var1 var2])
== :(first [var1 var2])

A soft quoted argument, denoted with a leading colon (GET-WORD!), will evaluate GET-GROUP!s or GET-WORD!s, but give everything else literally:

>> test-soft: func [:x] [probe x]

>> test-soft var
== var

>> test-soft :(first [var1 var2])
== var1
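The difference between the two conventions can be mimicked in Python. This is a loose analogy with invented names (`GetGroup`, `take_hard`, `take_soft`); it models only the GET-GROUP! case of soft quoting, not GET-WORD!.

```python
class GetGroup:
    """Stand-in for a :( ... ) GET-GROUP!; holds a zero-argument thunk."""

    def __init__(self, thunk):
        self.thunk = thunk


def take_hard(arg):
    # Hard quote: the callee receives literally whatever was written.
    return arg


def take_soft(arg):
    # Soft quote: a GET-GROUP! is evaluated, everything else is taken literally.
    if isinstance(arg, GetGroup):
        return arg.thunk()
    return arg


group = GetGroup(lambda: ["var1", "var2"][0])
hard_word = take_hard("var")
hard_group = take_hard(group)  # the group itself, unevaluated
soft_word = take_soft("var")
soft_group = take_soft(group)  # evaluated to its first element
```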

Rebol and Red flip this and use the colon form to do "hard" quoting, for whatever reason. They also don't have GET-GROUP!s, so their soft quoting evaluates plain groups.

Utilizing the feature...historical Rebol has a hard-quoting operator QUOTE that lets you quote whatever you want:

rebol2>> quote (1 + 2)
== (1 + 2)

In Ren-C, QUOTE is used for adding quoting levels to things. The hard-quoting operator is called THE.

ren-c>> x: 10

ren-c>> quote x
== '10

ren-c>> the x
== x

ren-c>> the (1 + 2)
== (1 + 2)

Pursuant to the above: if you want to limit this to passing a variable name, you'd more typically write get-mean data 'air-temp, which would let you type-constrain GET-MEAN to say that VAR is expected as a WORD!.

Note that historical Rebol's COMPOSE required /ONLY to place blocks as-is, otherwise it would splice them. If BLOCK is [a b c]:

rebol2>> compose [(block) d e f (block)]
== [a b c d e f a b c]

rebol2>> compose/only [(block) d e f (block)]
== [[a b c] d e f [a b c]]

In Ren-C, there is no /ONLY, and splicing is done with isotopic groups, created by SPREAD from blocks or groups.

ren-c>> compose [(block) d e f (spread block)]
== [[a b c] d e f a b c]
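For readers more used to other languages, the same splice-or-not distinction can be imitated in Python. The `Group`, `Splice`, `spread` and `compose` names below are stand-ins made up for this sketch, with lambdas playing the role of the code inside each group.

```python
class Group:
    """Stand-in for a ( ... ) group in the template; holds a zero-argument thunk."""

    def __init__(self, thunk):
        self.thunk = thunk


class Splice:
    """Stand-in for what SPREAD produces: items to merge in rather than nest."""

    def __init__(self, items):
        self.items = list(items)


def spread(block):
    return Splice(block)


def compose(template):
    out = []
    for item in template:
        if isinstance(item, Group):
            value = item.thunk()
            if isinstance(value, Splice):
                out.extend(value.items)  # spliced into the surrounding block
            else:
                out.append(value)        # inserted as-is, blocks stay nested
        else:
            out.append(item)
    return out


block = ["a", "b", "c"]
result = compose([Group(lambda: block), "d", "e", "f", Group(lambda: spread(block))])
```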

According to one of your links, the laziness is generally not used (I helped someone with some R code despite not knowing it, and didn't run across this).

Are there a lot of examples of people trying to manipulate the R code as strings, or just the environments?

bradrn · January 5, 2024, 5:38am

OK, this is very interesting: Rebol has fexprs. And what’s particularly interesting about it is that I don’t actually see the need for it, when putting stuff into blocks is so convenient and idiomatic (thanks to definitional scoping). So what purpose does this feature serve in Rebol?

(Also, I didn’t know GET-GROUP!s existed, so thanks for mentioning them!)

So… THE in Ren-C corresponds to QUOTE in Lisp? (Or, at least, is the closest thing to QUOTE in Lisp.)

Yeah, fair enough. I’m still getting used to the Rebol paradigm, and in particular the multiple different kinds of quoting. (In a Lisp, 'a and [a] would both be (quote a).)

EDIT: er, actually, I guess they would really be (quote a) vs (quote (a)). Either way, it’s the same kind of quoting.

No idea, but I don’t see why anyone would choose to manipulate R code as strings, given that unevaluated code is already parsed and stored in AST form. I’m sure there are plenty of usecases for manipulating the AST, though!

hostilefork · January 5, 2024, 5:51am

Well the BLOCK!s are acting as FEXPRs, and you can also quote one item at the callsite at a time if it is convenient.

Beyond things like THE which are useful, it's used in things like for-each x [1 2 3] [print [x]]... the X doesn't have a quoting tic on it to pass to FOR-EACH as the name of the variable. (It is soft quoted, so you can calculate the variable name via GET-GROUP!)

Ren-C extends this to things like type of foo instead of type? foo, where the left-hand argument for OF can be taken as a plain word.

You could say for-each 'x or 'type of, but the idea is just nicer ergonomics for the language when you have a very common and understood expression.

Some discussion: Speaking With Tics

Ren-C has GET-GROUP! and GET-BLOCK!, as well as SET-GROUP! and SET-BLOCK!.

The evaluator's meaning for GET-BLOCK! is a REDUCE:

>> :['foo 1 + 2]
== [foo 3]

SET-BLOCK! is used in multi-returns.

Ren-C also expands TUPLE! to be generic, e.g. a.(b c).d is a 3-element TUPLE!, as a/(b c)/d is a 3-element PATH!.

TUPLE!s are used for field selection in Ren-C.

All of these parts are up to you to use how you wish in your own dialects.

Ok, well if they have an API for manipulating the AST then I guess they are more comparable.

bradrn · January 5, 2024, 7:02am

I’d say that ‘quoting at the callsite’ is specifically the primary attribute of fexprs. Providing block!s as arguments is very fexpr-like, but can be done in any language with quotation.

Personally I much prefer being explicit about such things, which means no quoting at the call-site (‘speaking with tics’). But it’s ultimately down to subjective preference.

bradrn · January 6, 2024, 7:49am

Since my first post above, I’ve been contemplating cases such as this:

> add_quasi <- function(arg1, arg2) quo({{ arg1 }} + {{ arg2 }})
> test <- function(arg) {
+     x <- 10
+     add_quasi({{ arg }}, x)
+ }
> x <- 5
> eval_tidy(test(x))
[1] 15

Here, add_quasi creates a quosure from both its arguments (using rlang’s quasiquotation syntax). test then passes its argument into add_quasi, alongside a local variable x. Finally, we call test with the global variable x. This gives a quosure containing code x + x, and the result of evaluating this is 5 + 10, i.e. 15.

Now, in Rebol this wouldn’t be too surprising, since each WORD! has its own binding. In R, however, it is: quosures associate one environment with a whole syntax tree, so we might expect it to evaluate to either 5 + 5 or 10 + 10. How does this work?

The answer becomes obvious if we print the quosure itself:

> result <- test(x)
> result
<quosure>
expr: ^(^x) + (^x)
env:  0x55dc115b55e8

This is a nested quosure! And of course, each one carries around its own environment:

> quo_get_expr(result)[[2]]
<quosure>
expr: ^x
env:  global
> quo_get_expr(result)[[3]]
<quosure>
expr: ^x
env:  0x55dc11710650

So the first x here is global, while the second is looked up in a local environment. (And the top-level quosure has a different environment yet again, in which + is looked up.)

If we choose, we can also collapse it into one big expression with a single environment, causing it to all be evaluated in the same scope:

> squashed <- quo_squash(result)
> squashed
x + x
> eval_tidy(squashed)
[1] 10
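The behaviour above can be reproduced with a small Python model. It is a sketch of the idea rather than of rlang itself: the `Quosure` class and the two functions only borrow rlang's names, environments are plain dicts, and `+` is hard-wired instead of being looked up in the outer quosure's environment.

```python
import operator

OPS = {"+": operator.add}


class Quosure:
    """Hypothetical quosure: an expression paired with an environment."""

    def __init__(self, expr, env):
        self.expr = expr  # a name (str), a Quosure, or a tuple (op, left, right)
        self.env = env


def eval_tidy(node, env):
    if isinstance(node, Quosure):
        return eval_tidy(node.expr, node.env)  # switch to the quosure's own env
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](eval_tidy(left, env), eval_tidy(right, env))
    return env[node]  # a bare name is looked up in the current env


def quo_squash(node):
    # Strip every inner quosure, discarding the environments they carried.
    if isinstance(node, Quosure):
        return quo_squash(node.expr)
    if isinstance(node, tuple):
        op, left, right = node
        return (op, quo_squash(left), quo_squash(right))
    return node


global_env = {"x": 5}
local_env = {"x": 10}
result = Quosure(("+", Quosure("x", global_env), Quosure("x", local_env)), {})
nested_value = eval_tidy(result, global_env)                # each x uses its own env
squashed_value = eval_tidy(quo_squash(result), global_env)  # both x are now global
```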

In fact, we should really ask rlang to warn us about such a destructive operation:

> quo_squash(result, warn=TRUE)
x + x
Warning messages:
1: Collapsing inner quosure 
2: Collapsing inner quosure

Actually evaluating nested quosures in R is slightly non-obvious, since quosures aren’t part of the base language. Insofar as I can tell, rlang implements eval_tidy using a bit of syntactic trickery: "quosure" is defined as a subclass of "formula", a built-in variety of quoted code which uses a tilde for evaluation. eval_tidy then rebinds the tilde operator to evaluate quosures in their environments.

What would such a system look like in the context of Rebol? I think it would be a system where bindings can be associated with more than one type: not just WORD!, but also BLOCK! (and presumably also GROUP!, etc.). A WORD! would be evaluated in the context of its own binding if it has one; otherwise it would get looked up in the environment of the first containing BLOCK!/GROUP! which has a binding.
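A minimal Python sketch of that lookup rule, to pin down what I mean. This is a model of the proposal, not of any existing Rebol or Ren-C behaviour, and `Word`, `Block` and `reduce_block` are names invented here.

```python
class Word:
    def __init__(self, name, binding=None):
        self.name, self.binding = name, binding


class Block:
    def __init__(self, items, binding=None):
        self.items, self.binding = items, binding


def reduce_block(block, inherited=None):
    """Look up each word; nested blocks are reduced recursively."""
    # A block with its own binding overrides whatever was inherited.
    env = block.binding if block.binding is not None else inherited
    out = []
    for item in block.items:
        if isinstance(item, Block):
            out.append(reduce_block(item, env))
        elif isinstance(item, Word):
            # A word's own binding wins; otherwise use the nearest bound block.
            scope = item.binding if item.binding is not None else env
            if scope is None:
                raise NameError(item.name)
            out.append(scope[item.name])
        else:
            out.append(item)
    return out


outer, other, inner = {"x": 10}, {"x": 20}, {"x": 30}
mixed = reduce_block(Block([Word("x"), Word("x", other)], outer))
nested = reduce_block(Block([Word("x"), Block([Word("x")], inner)], outer))
inherits = reduce_block(Block([Block([Word("x")])], outer))
```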

Considering such a system more seriously, I believe it would cope quite easily with code like that mentioned in Rebol And Scopes: Well, Why Not? :

 global: 10
 x: <not an integer>

 wrapper: func [string] [
     return do compose [interpolate (string)]
 ]

 foo: func [x] [
     let local: 20
     return wrapper {The sum is $(x + local)}
 ]

 foo 30

Starting with foo, it would bind its body block to a new environment. Then, foo 30 would cause the block to be evaluated — the words within it don’t have their own bindings yet, so everything executes within foo’s environment. foo proceeds to create local within its environment, and then it makes a string which is bound to its environment (since strings would also require bindings in order for string interpolation to work). Then, it calls wrapper, which similarly has bound its body to its own environment. As with foo, every word in its body would be looked up in wrapper’s environment — except for the variables in string, because string already has its own binding.

It also copes with this case:

Because you can still rebind whatever you want within a block. So a dialect can create its own environment, populate it with whichever words it wants, and then rebind words within a block to refer to that environment. And it all just works, because the rebound words will use their own bound environment, and the other words will use their parent block’s environment.

So, it seems to me like this system could just possibly work. But I’m admittedly bad with reasoning about this stuff — are there any edge cases I’ve missed? Is there some obvious reason why this wouldn’t work?

hostilefork · January 6, 2024, 11:48am

The point I was trying to make in "Rebol and Scopes: Well Why Not?" wasn't so much that there can't be some behavior. It's the question of whether that's the behavior that people intend.

If I have a block containing some terms--and I pass it to "someone else" to grok--what happens when some of those terms are meant to have values ascribed by the someone else, and others are supposed to carry over values supplied by the caller?

Ultimately there's no such thing as magic, and you need to have nuts and bolts available to say what you mean. To give some insight into what kind of thinking a seemingly simple binding problem requires, I wrote up the following:

Custom Function Generator Pitfalls

And that's really just two contexts in play: the function generator and the incoming material for the function body. People cobbling together source from lots of places with more complex rules about what lookups apply under what rules can get arbitrarily strange.

In something like PARSE, one avenue of attack is just to use keywords. A mapping from keywords (and types) to combinator functions gives the behavior that parse knows about. If a WORD! is encountered that's not in the mapping, it's only then that the binding is consulted.

>> rule: ["a" "b" try "c"]

>> parse "abababcab" [some [rule (print "found")]]
found
found
found
found
== ~null~  ; isotope

SOME and RULE are both words. But SOME exists in the mapping of combinators, and RULE does not. So after RULE is seen to not be in the map it falls back onto the binding to look it up.

I've thought that GET-WORD! might be a way of subverting keyword lookup and forcing the use of binding:

>> some: ["a" "b" try "c"]

>> parse "abababcab" [some [:some (print "found")]]
found
found
found
found
== ~null~  ; isotope
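The lookup order described here, including the GET-WORD! idea, could be modelled roughly as follows in Python. The `resolve` function and the placeholder combinator values are invented for illustration; real PARSE combinators are functions, and a leading colon in a string stands in for a GET-WORD!.

```python
def resolve(word, combinators, binding):
    """Decide whether a word in a rule is a keyword or a bound variable."""
    if word.startswith(":"):
        # GET-WORD!: skip the keyword table and go straight to the binding.
        return ("binding", binding[word[1:]])
    if word in combinators:
        return ("combinator", combinators[word])
    # Not a keyword, so only now is the binding consulted.
    return ("binding", binding[word])


rule_value = ["a", "b", "try", "c"]
combinators = {"some": "<SOME combinator>", "try": "<TRY combinator>"}
binding = {"rule": rule_value, "some": rule_value}

keyword = resolve("some", combinators, binding)
variable = resolve("rule", combinators, binding)
forced = resolve(":some", combinators, binding)
```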

PARSE also has the problem of wanting to be able to define variables via something like LET internally to the parse, and imagining how this would work is all very new.

Anyway having bindings "stick" isn't the problem...it's when you are weaving together code with mixed wishes. And those wishes have historically been managed by a very simplistic model, that goes so far as to say that each method for each instance of an object has to do a full copy of the function body just so that it can be patched up so the WORD!s point to the object instance. :-/

bradrn · January 6, 2024, 11:59am

Hmm, fair enough. In that case, I’ll be more specific, and suggest that this particular model for binding is one which gives a good compromise between sane scoping and flexibility.

And I suggest that largely because I believe it does answer this question:

In this model, the choice is up to the ‘someone else’ (as indeed it is in current Ren-C, if I understand correctly). If they simply want to change some names, they can get at the environment bound to what they’ve been passed, and slip their own environment underneath that. If they want to do something more complicated (like, say, retaining GET-WORD!s as you suggest), they can do a deep traversal and bind their environment to individual words. Either way, the result is some kind of principled mixture of their environment and the caller’s environment.

hostilefork · January 6, 2024, 12:48pm

Historical Rebol did not have a notion of environments, but as I've said, Ren-C has been creeping toward them. Thanks for pointing out the similarities to R.

So long as granular bindings are permitted: I do think that contention between a binding glued onto a word or block vs. using a meaning coming from an environment turns out to be a bigger deal in practice than one might think, in part because stray bindings happen kind of on a whim.

If you want a good explanation of a bad thing, here is how the closest thing to "global environments" used to work in R3-Alpha:

The Real Story about User and Lib Contexts

Things are improved significantly in Ren-C via "sea of words":

The Sea Of Words

But still, saying "oh it has a binding already, don't use the environment" will have problems, so it will take looking through real code to see what the limits are, and where to just give up and put things in objects and refer to them via obj.x instead of trying to convince an invisible property of which x you meant in an ambiguous case.

bradrn · January 6, 2024, 12:59pm

I already understand how environments used to work (in large part because I did read Sea of Words). But your last paragraph confuses me… if you have any specific problems in mind, could you provide some code examples which would exhibit them?

Also, to be clear, it’s not just a matter of ‘don’t use the environment’. What I’m suggesting is that words could be bound to a different environment, which then overrides the next environment up.

hostilefork · January 6, 2024, 2:12pm

I can't tell if you are speaking of a world in which reduce [x x] can never return [10 20] or not.

If you do believe that should be possible, then there are a lot of associated issues. If you don't believe that should be possible then there are different issues of ways people have expected to use Rebol that won't work--and perhaps they shouldn't work, but these boundaries have not been articulated.

There are still basic challenges with the paradigm to feel through. I just changed CONTINUE and BREAK to be aware of which loop they are breaking (e.g. specialized to the frame of the loop)

Definitional Break and Continue... the Time is Now

Binding has to answer the question of how the words connect to the right loop, in deeply nested structures, with nested loops. Historical Rebol just walked code and mutated the bindings (e.g. in for-each x data block any X words in that block would be destructively rebound to an X with the FOR-EACH). Even if you PROTECTed the block, it was still changed.

I have some avenues of solution to these problems but none of them crack the case of people restructuring code arbitrarily. Chains of inheritance can form loops when you take something out and put it back in and evaluate again.

Anyway, I'm interested in seeing how you might propose solving some of the issues. You have relevant experience and are picking up on it quickly. I do think you will find puzzles in the medium on your own--as I did--but now there's more groundwork to build solutions on.

bradrn · January 6, 2024, 11:54pm

I believe this should be possible. With the semantics I’m suggesting, this is what happens:

If neither WORD! in [x x] is bound to an environment, then reduce [x x] will always return two of the same value.
If one or both WORD!s are bound to an environment, there are no guarantees about the result of reduce [x x].

In my model, the easiest way to implement this kind of thing involves creating a new environment with the appropriate definitions of CONTINUE and BREAK, then rebinding the whole block such that, so to speak, that environment is ‘slipped under’ the existing environment of that block. (This amounts to taking the current binding of the block, setting it as the parent of the new environment, then rebinding the block to the new environment.)

The effect of this is as follows: any occurrences of unbound CONTINUE and BREAK within the block get rebound to the new definitions. Any occurrences of CONTINUE and BREAK which were already bound to something else, are still bound to that something else. And all other words remain unchanged.

Amongst other things, this means that nested loops should work with no extra effort. Because this binding occurs at the level of blocks, an environment bound to a nested block will override an environment bound to a higher-level block. So a nested CONTINUE or BREAK gets bound to its surrounding nested loop.
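Here is how I picture it, as a Python sketch of my own proposal (nothing below is existing Ren-C machinery, and the class and function names are made up). The sketch assumes the inner block has already picked up the outer loop's environment by the time the inner loop slips its own environment underneath.

```python
class Env:
    def __init__(self, frame, parent=None):
        self.frame, self.parent = dict(frame), parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.frame:
                return env.frame[name]
            env = env.parent
        raise NameError(name)


class Word:
    def __init__(self, name, binding=None):
        self.name, self.binding = name, binding


class Block:
    def __init__(self, items, binding=None):
        self.items, self.binding = items, binding


def slip_under(block, frame):
    """Rebind block to a new env whose parent is the block's current binding."""
    block.binding = Env(frame, parent=block.binding)


def resolve_all(block, inherited=None):
    env = block.binding if block.binding is not None else inherited
    out = []
    for item in block.items:
        if isinstance(item, Block):
            out.append(resolve_all(item, env))
        else:
            # Words with their own binding keep it; the rest use the block's env.
            scope = item.binding if item.binding is not None else env
            out.append(scope.lookup(item.name))
    return out


user = Env({"print": "<print>"})
custom = Env({"break": "custom-break"})
inner = Block([Word("print"), Word("break")])
outer = Block([Word("print"), Word("break"), Word("break", custom), inner], user)

slip_under(outer, {"break": "outer-break"})  # the outer loop starts
inner.binding = outer.binding                # assumed: picked up when evaluated
slip_under(inner, {"break": "inner-break"})  # the inner loop starts
resolved = resolve_all(outer)
```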

I know that Rebol is not object-oriented… so what precisely do you mean by ‘inheritance’ here?

hostilefork · January 9, 2024, 8:54am

2 posts were split to a new topic: Block Creation Vs. Evaluation

hostilefork · January 9, 2024, 8:53am

What I was referring to is what you might think of as the "parent environment" linkage.

The first cut of implementing "specifiers" would basically just accrue links, so you'd get some issues of double accrual in cases like:

let x: 10
group: '(x + 2)
do reduce ['y: group]

The environment on the composed block would have stuff like the "x: 10" incoming, and then it would be on the group, as well.

A model that said "if something has a binding already, leave it alone" wouldn't encounter such "overbinding" cycles, so that's different. But a model that doesn't do "overbinding" works very differently, so we have to follow through on those discussions... which I'm trying to do (but it takes more time to do so than just answering questions about existing art...)

bradrn · January 9, 2024, 1:07pm

hostilefork:

The first cut of implementing "specifiers" would basically just accrue links, so you'd get some issues of double accrual in cases like:

let x: 10
group: '(x + 2)
do reduce ['y: group]

The environment on the composed block would have stuff like the "x: 10" incoming, and then it would be on the group, as well.

Firstly, I’ll note that ‘double accrual’ isn’t really possible with my proposal at all, because at any one time there is only one active binding. If you DO or REDUCE an inner block, that simply switches to the environment bound to that block, whether or not it’s the same as the current environment.

Additionally, the inheritance tree is a tree… it’s not a graph with loops. I believe that maintaining that invariant should just be a matter of choosing the right native functions for creating and modifying environments. In particular, it’s probably a bad idea to allow the programmer to re-assign the parent of an existing environment.

However, as for this particular test case, it’s worthwhile detailing how it would be evaluated in my model:

It would start with some environment with access to the standard library
x: is unbound, so x: 10 gets added to the current environment
'(x + 2) evaluates to an unbound group (x + 2), which similarly gets added to the current environment under the name group
['y: group] is an unbound block which evaluates to itself, picking up a binding to the current environment as it does so
reduce evaluates the elements of that block within its bound environment, i.e. the current environment, so it evaluates to [y: (x + 2)] where both the SET-WORD! and the GROUP! are unbound
- I just realised I never specified the binding of the block which gets returned from REDUCE… but it seems reasonable to posit that it retains its existing binding, in which case the whole returned block is still bound to the current environment
Finally, do executes the block in its bound environment, which is the same as the current environment, and both the elements of the block are unbound so they execute in that environment too.
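The walkthrough above can be traced in a few lines of Python. This is only a toy model of my proposal: the `Group` and `Block` classes, the `("quote", ...)` and `("word", ...)` item tags, and the two helper functions are all invented for the trace, and `do_block` handles nothing beyond a SET-WORD! followed by a group.

```python
class Group:
    def __init__(self, code, binding=None):
        self.code, self.binding = code, binding


class Block:
    def __init__(self, items, binding=None):
        self.items, self.binding = items, binding


def reduce_block(block):
    # Quoted items evaluate to themselves; words are looked up in the binding.
    out = [val if kind == "quote" else block.binding[val]
           for kind, val in block.items]
    # Assumption from the walkthrough: the result keeps the block's binding.
    return Block(out, binding=block.binding)


def do_block(block):
    env = block.binding
    items = iter(block.items)
    for target, group in zip(items, items):  # consecutive (set-word, group) pairs
        scope = group.binding if group.binding is not None else env
        env[target.rstrip(":")] = eval(group.code, {}, scope)


env = {}                                 # the current environment
env["x"] = 10                            # let x: 10
env["group"] = Group("x + 2")            # group: '(x + 2), left unbound
block = Block([("quote", "y:"), ("word", "group")])
block.binding = env                      # picked up when the block is evaluated
reduced = reduce_block(block)            # [y: (x + 2)]
do_block(reduced)                        # sets y in the current environment
```

Only one environment is ever involved, and the group never acquires a binding of its own, so there is nothing to accrue twice.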

So, most things end up unbound in any case, or bound to the current environment in the case of blocks. There’s not a huge amount of scope for things to go wrong if you aren’t explicitly doing anything weird.