Tail Calls in Ren-C...Yes, We Have Them

Looking into some of Joe Marshall's historical participation in Rebol, I found him speaking about Rebol 1.0 having "continuations" and "tail calls".

Ren-C uses continuations to implement stacklessness, based on a trampoline. Trampolines are generally slower than the fiddly technique Joe describes Rebol 1.0 as having used--but such fiddly methods are not standard C, and do not translate well to WebAssembly. In any case, the outcomes are similar.

Tail calls are a different feature. But if you have continuations, then your code is likely structured in such a way that tail calls are not that difficult to add.

So What Is A Tail Call?

When you invoke a function, the interpreter has to allocate some space for its arguments and locals. This information is then associated with a stack level structure, which also takes some space... and is put in a chain representing the call stack.

If that function calls itself recursively, the recursion will also need new stack space for its arguments and locals...and another stack level structure.

With recursive algorithms where a function calls itself hundreds or thousands of times, you can use up a lot of memory (and/or generate a stack overflow).

Tail calls are based on the observation that right at the moment a function is about to return, it doesn't need its stack level or arguments/locals anymore. So if you're willing to have your call stack suffer some amount of "amnesia" about the "real" stack depth, you can reuse the current call's stack level for a recursion.
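
To make this concrete, here's a sketch of ordinary recursion with no tail call (COUNTDOWN is just a hypothetical name for illustration). Each level has to wait on the level below it for a result, so calling it with a big number keeps that many stack levels alive at once:

countdown: func [return: [tag!] n] [
    if n = 1 [
        return <done>
    ]
    return countdown n - 1  ; the recursive call finishes before RETURN runs
]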

Ren-C Supports Explicit Tail Calls With RETURN/RUN

There are two forms of tail call we can do in a Rebol-like interpreter. The first style can call any function (not just the one you're returning from). It avoids a recursion level on the stack, but does not reuse the memory for arguments and locals.

This is what that looks like:

foo: func [return: [tag!] n] [
    if n = 1 [
        return <success>
    ]
    return/run :foo n - 1
]

>> foo 100
== <success>

That triggers 100 calls to FOO, but uses only one level of stack depth.

There's a fairly obvious technical reason why this variation cannot build the recursion's new N directly into the old stack level's N slot... it uses the old N (as n - 1) to calculate the new one. You can't calculate new parameters into old parameter slots without winding up referencing your intermediate calculations instead of the old values you wanted.

(A compiled language could notice when old arguments are used to calculate new ones, and make space for temporary copies just for the values that would be overwritten before they were read... but being interpreted, we have to assume the worst and keep all the old values.)
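
To see why the old values have to be kept, consider a case where the dependency goes both ways (FIB-ITER is a hypothetical name, just a sketch): both new arguments are computed from the old ones, so old A and old B must stay intact until the recursion's arguments are fully gathered.

fib-iter: func [return: [integer!] a b n] [
    if n = 0 [
        return a
    ]
    return/run :fib-iter b (a + b) (n - 1)  ; new A is old B, new B is old A + B
]

Called as fib-iter 0 1 10 this should land on 55, while occupying only one level of stack depth.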

If you really want to avoid making a new stack level and reuse the memory for the args and locals, you need a different approach: mutate the variables in place before jumping to a restart of the function:

foo: func [return: [tag!] n] [
    if n = 1 [
        return <success>
    ]
    n: n - 1
    return/run @redo
]

>> foo 100
== <success>

Modifying the variables in place means you're responsible for any dependency issues. If you overwrite one of your arguments while calculating the next, that's on you--but since you wrote the assignments in sequence, you can see what you did.
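
For instance, an in-place version of the hypothetical FIB-ITER sketch above has to stash a value: the new A needs the old B, and the new B needs the old A, so one of them must be copied into a temporary before the overwrites. (This sketch assumes a <local> spec entry for declaring the temporary.)

fib-iter: func [return: [integer!] a b n <local> old-a] [
    if n = 0 [
        return a
    ]
    old-a: a        ; keep the old A around before overwriting it
    a: b
    b: old-a + b
    n: n - 1
    return/run @redo
]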

Why RETURN/RUN :FOO vs. RETURN FOO or RERUN FOO

Ren-C's RETURN construct has special knowledge of what function it returns from. It is a specialization of a generic DEFINITIONAL-RETURN function, which is specialized with the target FRAME!. It's done in a somewhat optimized way, but still has some cost.

If tail calls were done with another construct--e.g. something like RERUN--that function would also need to be specialized with this knowledge. It's cheaper to just piggyback on what RETURN already knows and make it a refinement.

As for why it has to take a GET-WORD! vs. a plain WORD! for the function to invoke... this is because RETURN does not quote its first argument. A plain call would be evaluated while gathering RETURN's argument--RETURN would receive the product of the call, and by then it's too late to intercept the mechanics.

It winds up looking fairly natural, because the RUN construct runs a function that gathers its arguments inline:

>> run :append [a b c] [d e]
== [a b c [d e]]
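
So contrasting the two spellings in the FOO example (just a sketch of the difference):

return/run :foo n - 1  ; GET-WORD! hands RETURN/RUN the FOO action itself, reusing this stack level
return foo n - 1       ; plain WORD! calls FOO right here; RETURN only sees its product (<success>)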

Note that even if we could make tail calls implicit, we probably wouldn't want to. Python hasn't added tail calls at all, based on a philosophical objection to the idea of obscuring the call stack. It also seems like if an algorithm depends on tail calls for some important optimization reason, there should be something to indicate that fact... so that someone reorganizing the code will be sensitive to it.

How Important Are Tail Calls?

If your language has a very limited amount of stack, being able to formulate your code to use tail calls so it doesn't accrue stack could be very important.

But if your language has a decent amount of stack (or is completely "stackless") it's not particularly important.

Where it matters is in cases where an algorithm can be cleanly expressed recursively, and you don't want an uglier formulation that builds data structures to mimic what you would be getting naturally from the stack.
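
A state machine is a classic example: each state hands off to the next state's function, and tail calls let you write that handoff directly instead of maintaining a dispatch loop. (STATE-A and STATE-B are hypothetical names for this sketch.)

state-a: func [return: [tag!] input [block!]] [
    if empty? input [return <ended-in-a>]
    return/run :state-b next input  ; hand off without adding a stack level
]

state-b: func [return: [tag!] input [block!]] [
    if empty? input [return <ended-in-b>]
    return/run :state-a next input
]

Calling state-a [x y z] bounces between the two states and should end with <ended-in-b>, no matter how long the input block is, without the stack growing.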

Using them in places that don't matter is likely just going to confuse your stack traces and error messages...like the Python people say. I'd discourage going around and changing all your RETURN SOMETHING to RETURN/RUN :SOMETHING to try and speed things up. Only do it if you're writing an algorithm that would have pathological stack use otherwise.

RETURN/RUN Had A Precursor... REDO

The original formulation of tail calls wasn't built into RETURN, but expected you to work with a more generic tool called REDO. This is what you had to write for the FOO example:

foo: func [return: [tag!] n] [
    frame: binding of 'n
    if n = 0 [
        return <success>
    ]
    n: n - 1
    redo frame
]

This corresponds to the return/run @redo case--except that RETURN already knows the frame it returns from, so it can do the redo without you having to capture the frame yourself.

But REDO had a shorthand for this case: it would accept a word bound into the frame you wanted to redo:

foo: func [return: [tag!] n] [
    if n = 0 [
        return <success>
    ]
    n: n - 1
    redo 'n
]

That's what had led me to say "sure, Ren-C has tail calls". But I realized it didn't really look like what people would expect, so I made RETURN/RUN.

An added benefit of RETURN/RUN is that the function-call form (as opposed to @redo) offers something REDO couldn't do: reusing the internal level structure for an arbitrary new function call. It wasn't actually that hard to implement, so I went ahead and did it.

REDO is still around. It can restart any frame or phase, and it's actually rather powerful. I don't really use it, but here's an example of how tricky it can be:

base: func [return: [text!] n delta /captured-frame [frame!]] [
    log [{BASE} n delta]

    n: n - delta
    if n < 0 [return "base less than zero"]
    if n = 0 [return "base done"]
    if captured-frame [redo captured-frame]
    return "base got no frame"
]

c: chain [
    adapt :base [
       log [{C} n delta]

       captured-frame: binding of 'n
       redo/sibling 'n :s

       comment {fall through to base}
    ]
    lambda [x] [
        if x = "base done" [
            <success>
        ] else [
            spaced ["base exited with" x]
        ]
    ]
]

s: specialize adapt :base [
    log [{S} n delta]

    if n = 1 [n: 10]
][
    delta: 10
]

>> c 11 0
== <success>

With a logging function attached, that gives:

 --- C ---
 n: => 11
 delta: => 0
 --- S ---
 n: => 11
 delta: => 10
 --- BASE ---
 n: => 11
 delta: => 10
 --- C ---
 n: => 1
 delta: => 10
 --- S ---
 n: => 1
 delta: => 10
 --- BASE ---
 n: => 10
 delta: => 10

  • C is called and captures its frame into CAPTURED-FRAME. Then it uses REDO/SIBLING to reuse the frame to call S. S gets the variables and args that it knows about as C left them--such as N and CAPTURED-FRAME--but values it takes for granted are reset, which includes the specialized DELTA of 10.

    (The need to reset specializations for consistency is similar to how locals must be reset--they're not part of the interface of the function, so reaching beneath them would do something illegal in terms of parameterization.)

  • S doesn't have any effect besides resetting DELTA, so it falls through as an adaptation to the base function. BASE subtracts DELTA from N to get 1, which isn't an exit condition. The CAPTURED-FRAME which was set in C and preserved as an argument to S is then used by BASE to REDO and get back up to the start of C again.

  • Once again C captures its frame and does a REDO to start up S, which now notices that N is 1 so it bumps it up to 10. (It cannot set DELTA to 1, because as a specialized argument DELTA is not visible to it.) This time when it falls through to BASE, the subtraction of DELTA from N yields zero so that BASE returns completion.

  • Since the function we originally called and built a frame for was a CHAIN, all this REDOing has effectively been finishing the frame of the adaptation of BASE that sits at the head of the chain. That delegation has now stopped bouncing around on that single frame and come to a completion, which means the chained functions get that result. The string is translated to a tag and signals success.

All Pretty Crazy Stuff You Probably Won't Use

🤪

But it's there, to placate people who think a language has to have tail calls to get respect.
