Tail Calls in Ren-C...Yes, We Have Them

Looking into some of Joe Marshall's historical participation in Rebol, he speaks about Rebol 1.0 having "continuations" and "tail calls".

Ren-C uses continuations to implement stacklessness, based on a trampoline. Trampolines are generally slower than the fiddly technique Joe describes Rebol 1.0 as having used--but such fiddly methods are not standard C, and do not translate well to WebAssembly. In any case, the outcomes are similar.

Tail calls are a different feature. But if you have continuations, then your code is likely structured in such a way that tail calls are not that difficult to add.

So What Is A Tail Call?

When you invoke a function, the interpreter has to allocate some space for its arguments and locals. This information is then associated with a stack level structure, which also takes some space... and is put in a chain representing the call stack.

If that function calls itself recursively, the recursion will also need new stack space for its arguments and locals...and another stack level structure.

With recursive algorithms where a function calls itself hundreds or thousands of times, you can use up a lot of memory (and/or generate a stack overflow).

Tail calls are based on the observation that right at the moment a function is about to return, it doesn't need its stack level or arguments/locals anymore. So if you're willing to have your call stack suffer some amount of "amnesia" about the "real" stack depth, you can reuse the current call's stack level for a recursion.

Ren-C Supports Explicit Tail Calls With RETURN:RUN

There are two forms of tail call we can do in a Rebol-like interpreter. The first style can call any function (not just the one you're returning from). It avoids a recursion level on the stack, but does not reuse the memory for arguments and locals.

This is what that looks like:

foo: func [return: [tag!] n] [
    if n = 1 [
        return <success>
    ]
    return:run foo/ n - 1
]

>> foo 100
== <success>

That triggers 100 calls to FOO, but only using one level of stack depth.

There's a fairly obvious technical reason why this variation cannot build the invocation for the recursion's N into the old stack frame's N argument... it uses the old N (as n - 1) to calculate the new N. You can't calculate new parameters into old parameter slots without winding up referencing your intermediate calculations instead of the old values you wanted.

(A compiled language could notice when old arguments were used to calculate new ones, and if it happened they could make space for temporary copies just for values that would be overwritten before they were seen...but being interpreted, we have to assume the worst and keep all the old values.)

If you really want to avoid making a new stack level and reuse the memory for the args and locals, you need a different approach. Mutate the variables in-place before jumping to a restart of the function:

foo: func [return: [tag!] n] [
    if n = 1 [
        return <success>
    ]
    n: n - 1
    return:run 'redo
]

>> foo 100
== <success>

Modifying the variables in place means you're responsible for any dependency issues. If you overwrite one of your arguments while calculating the next, you wrote the code in sequence and you can see what you did.

Why RETURN:RUN FOO/ vs. RETURN FOO or RERUN FOO

Ren-C's RETURN construct has special knowledge of what function it returns from. It is a specialization of a generic DEFINITIONAL-RETURN function, which is specialized with the target FRAME!. It's done in a somewhat optimized way, but still has some cost.

If tail calls were done with another construct--e.g. something like RERUN--that function would also need to be specialized with this knowledge. It's cheaper to just piggy back on what RETURN already knows and make it a refinement.

As for why it has to take a trailing-slash PATH! vs. plain WORD! for the function to invoke...this is due to RETURN not taking its first argument literally. So it's too late to intercept the mechanics of the call once an evaluation is run. RETURN would receive the product of the call.

It winds up looking fairly natural, because the RUN construct runs a function that gathers its arguments inline does the same thing:

>> run (either 1 < 2 [append/] [insert/]) [a b c] [d e]
== [a b c [d e]]

So if you were to RUN just a function, it would also look like RETURN:RUN...

>> run append/ [a b c] [d e]
== [a b c [d e]]

Note that even if we could make tail calls implicit, we probably wouldn't want to. Python hasn't added tail calls at all, based on a philosophical objection to the idea of obscuring the call stack. It also seems like if an algorithm depends on tail calls for some important optimization reason, there should be something to indicate that fact... so that someone reorganizing the code will be sensitive to it.

How Important Are Tail Calls?

If your language has a very limited amount of stack, being able to formulate your code to use tail calls so it doesn't accrue stack could be very important.

But if your language has a decent amount of stack (or is completely "stackless") it's not particularly important.

Where it matters are cases where an algorithm can be cleanly expressed in a recursive way, and you don't want to use an uglier representation that builds data structures which are mimicking what you would be getting naturally from the stack.

Using them in places that don't matter is likely just going to confuse your stack traces and error messages...like the Python people say. I'd discourage going around and changing all your RETURN SOMETHING to RETURN:RUN SOMETHING/ to try and speed things up. Only do it if you're writing an algorithm that would have pathological stack use otherwise.

2 Likes

RETURN:RUN Had A Precursor... REDO

The original formulation of tail calls wasn't built into RETURN, but expected you to work with a more generic tool called REDO. This is what you had to write for the FOO example:

foo: func [return: [tag!] n] [
    frame: binding of 'n
    if n = 0 [
        return <success>
    ]
    n: n - 1
    redo frame
]

This corresponds to the return:run 'redo variant, but that version takes advantage of the fact that RETURN already knows the frame it returns to, and can use that to redo.

REDO tried to have parity with that, by accepting a word from a frame and assuming you wanted to redo whatever that was bound to:

foo: func [return: [tag!] n] [
    if n = 0 [
        return <success>
    ]
    n: n - 1
    redo $n
]

This had led me to say "sure, Ren-C has tail calls". But I realized that didn't really look like what people would expect, so I made RETURN:RUN.

An added benefit of RETURN:RUN is that if you use it with the function call form then it offers something that REDO couldn't do, which is reuse an internal Level structure for an arbitrary new function call.

But I've Realized REDO Was Fundamentally Broken...

I've mentioned that I've been dotting "i's" and crossing "t's", so assertions are tightening up. And this idea of generalized REDO--regardless of what kind of action generator is in effect--has a lot of holes in it.

Basically, your generator has to know it's being REDO'd and it has to come up with a semantically coherent meaning for that. This is challenging, e.g. if you have a YIELDER.

/y: yielder [yield: [integer!] x [integer!]] [
    yield x + 1
    yield x + 2
    redo binding of $yield
]

>> y 10
== 11

>> y 20
== 22

>> y 30
== 31  ; where's the magic to implement this?

There's a lot that goes into making yielder work, which is tracked in the Details array of the generated function. That state is sensitive to the question of whether the yielder is "in flight" or not. You can't just take an in-flight yielder and start running its C dispatcher code as if it's having its first execution, the whole thing dies.

Must Cooperate With The Generator, like RETURN:RUN

If this kind of feature is going to be offered, it has to be offered with the generator's knowledge, so the right thing can be done.

If it were important, generalized REDO could be a general feature that is intercepted by function generators. But designing that generalization does not make a whole lot of sense at this time, and I can't really see any time in the future for it either.

The formulation as a feature of something like RETURN or YIELD makes much more sense because that cooperative relationship is already established.

So, I'm killing off the broken REDO. To the extent this unimportant feature is important at all, use return:run 'redo

Old REDO Tests

Here's what's going away. (Analogous tests already exist for RETURN:RUN where applicable)

; REDO via a direct FRAME! value
(
    /foo: func [return: [tag!] n] [
        let frame: binding of $n
        if n = 0 [
            return <success>
        ]
        n: n - 1
        redo frame
    ]

    <success> = foo 100
)

; REDO via extraction of FRAME! from an ANY-WORD?
; (has binding to a FRAME! to lookup variable value)
(
    /foo: func [return: [tag!] n] [
        if n = 0 [
           return <success>
        ]
        n: n - 1
        redo $n
    ]

    <success> = foo 100
)

; REDO locals clearing test
; (locals should be cleared on each redo)
(
    /foo: func [return: [tag!] n <local> unset-me] [
        if set? $unset-me [
            return "local not cleared"
        ]
        if n = 0 [
            return <success>
        ]
        n: n - 1
        unset-me: #some-junk
        redo $return
    ]

    <success> = foo 100
)

; REDO type checking test
; (args and refinements must pass function's type checking)
;
~expect-arg~ !! (
    /foo: func [return: [tag!] n i [integer!]] [
        if n = 0 [
            return <success>  ; impossible for this case
        ]
        n: n - 1
        i: #some-junk  ; type check should fail on redo
        redo $return
    ]

    foo 100 1020
)

; REDO phase test
; (shared frame compositions should redo the appropriate "phase")
(
    /inner: func [return: [tag!] n] [
        if n = 0 [
            return <success>
        ]
        n: 0
        redo $n  comment "should redo INNER, not outer"
    ]

    /outer: adapt inner/ [
        if n = 0 [
            return "outer phase run by redo"
        ]
        ; fall through to inner, using same frame
    ]

    <success> = outer 1
)

; !!! This initially seemed to have some broken expectations, that the OUTER
; would be able to view a variable (f.n) of INNER, when INNER does a redo of
; OUTER that does not also redo INNER...hence the frame is stale.  Not clear
; what the point was, but changed to count down a global to at least demo
; the ability to REDO an ENCLOSE frame.
(
    global: 1

    /inner: func [return: [text!] n :captured-frame [frame!]] [
        if n = 0 [
           return "inner phase run by redo"
        ]
        n: 0
        redo captured-frame  ; should redo OUTER, not INNER
    ]

    /outer: enclose inner/ func [return: [tag!] f] [
        if global = 0 [  ; was F.N, see note about that being wrong
            return <success>
        ]

        f.captured-frame: binding of $f
        global: me - 1

        ; Fall through to inner
        ; It is running in the same frame's memory, but...
        ; CAPTURED-FRAME is a FRAME! value that stowed outer's "phase"

        return eval-free f
    ]

    <success> = outer 1
)

; "Sibling" tail-call with compatible function
;
; (CHAINs are compatible with functions at head of CHAIN
;  ADAPTs are compatible with functions they adapt
;  SPECIALIZEs are compatible with functions they specialize...etc.)
;
; If LOG is set to DUMP the following will output:
;
;     --- C ---
;     n: => 11
;     delta: => 0
;     --- S ---
;     n: => 11
;     delta: => 10
;     --- BASE ---
;     n: => 11
;     delta: => 10
;     --- C ---
;     n: => 1
;     delta: => 10
;     --- S ---
;     n: => 1
;     delta: => 10
;     --- BASE ---
;     n: => 10
;     delta: => 10
;
; C is called and captures its frame into F.  Then it uses REDO:SIBLING to
; reuse the frame to call S.  S gets the variables and args that its knows
; about as C left them--such as N and a captured frame F--but values it takes
; for granted are reset, which includes specialized delta of 10.
;
; (The need to reset specializations for consistency is similar to how locals
; must be reset--they're not part of the interface of the function, so to
; reach beneath them does something illegal in terms of parameterization.)
;
; S doesn't have any effect besides resetting delta, so it falls through as
; an adaptation to the base function.  BASE subtracts DELTA from N to get 1,
; which isn't an exit condition.  The F frame which was set in C and was
; preserved as an argument to S is then used by BASE to REDO and get back
; up to the start of C again.
;
; Once again C captures its frame and does a REDO to start up S, which now
; notices that N is 1 so it bumps it up to 10.  (It cannot set DELTA to 1,
; because as a specialized argument DELTA is not visible to it.)  This time
; when it falls through to BASE, the subtraction of DELTA from N yields
; zero so that BASE returns completion.
;
; Since the function we originally called and built a frame for was a CHAIN,
; the REDO is effectively REDO-finishing the frame for the adaptation of
; BASE that sits at the head of the frame.  That delegation has now finished
; bouncing around on that single frame and come to a completion, which means
; the chained functions will get that result.  The string is translated to
; a tag and signals success.
(
    /log: (
        func ['x] []  comment "no-op"
        elide dump/  comment "un-elide to get output"
    )

    /base: func [return: [text!] n delta :captured-frame [frame!]] [
        log ["BASE" n delta]

        n: n - delta
        if n < 0 [return "base less than zero"]
        if n = 0 [return "base done"]
        if captured-frame [redo captured-frame]
        return "base got no frame"
    ]

    /c: cascade [
        adapt base/ [
           log ["C" n delta]

           captured-frame: binding of $n
           redo:sibling $n s/

           comment "fall through to base"
        ]
        lambda [x] [
            if x = "base done" [
                <success>
            ] else [
                spaced ["base exited with" x]
            ]
        ]
    ]

    /s: specialize adapt base/ [
        log ["S" n delta]

        if n = 1 [n: 10]
    ][
        delta: 10
    ]

    <success> = c 11 0
)