What happens to FUNCTION! arguments and locals when the call ends?

R3-Alpha's CLOSURE provided two things. One was a unique identity for the words of a function's arguments and locals for each recursion. This is what I've called "specific binding" and now comes "for free" in all functions...so you don't even have to think about it. (It's not exactly free, but we can hope it will converge to "very low cost".)

So in Ren-C:

>> foo: function [x code] [
    append code [print x]
    if x > 0 [
        probe code
        do code
        foo (x - 1) code
    ]
]

>> foo 2 []
[print x]
2
[print x print x]
2 ;-- R3-Alpha FUNCTION! got 1, only CLOSURE! got 2
1

Users can now take that for granted. :thumbsup:
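
For anyone who finds it easier to see in another language, here is a rough Python analogy of the same experiment. It is only an analogy: Python lambdas stand in for the `[print x]` blocks, and the `out` list is something I added to collect results instead of printing. Each recursion gets its own `x`, so the accumulated code remembers which call it came from.

```python
def foo(x, code, out):
    # Each call appends a closure that captures *this* call's x.
    code.append(lambda: x)
    if x > 0:
        # Run everything accumulated so far; each thunk reports the x
        # of the call that created it, not the x of the current call.
        out.append([thunk() for thunk in code])
        foo(x - 1, code, out)
    return out


result = foo(2, [], [])
print(result)
```

The second round yields 2 and then 1, which matches the Ren-C output above rather than the R3-Alpha FUNCTION! behavior.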

But what I want to talk about is the other emergent feature of R3-Alpha CLOSURE!. This was that if an ANY-WORD! that was bound to the arguments or locals "escaped" the lifetime of the call, that word would continue to have its value after the function ended...for as long as references to it existed.

>> f: closure [x] [return [x]]

>> b: f 10
== [x]

>> reduce b
[10]

Functions did not do this:

>> f: function [x] [return [x]]

>> b: f 10
== [x]

>> reduce b
** Script error: x word is not bound to a context

It goes without saying that the closure mechanic is going to cost more, by the very fact that it needs to hold onto the memory for what the word looks up to. But the way things work today, it doesn't just hold onto that one cell of data...it holds onto all the args and locals of the function. (R3-Alpha was more inefficient still...it not only kept the whole frame of values alive, it made a deep copy of the function body on every invocation of that function...so that the body could be updated to refer to that "frame". Specific binding lets Ren-C dodge that bullet.)

Now and again, the "keep-things-simple" voice says that the system would be simpler and faster if all executing frames (and their frame variables) died after a function ended. If you wanted to snapshot the state of a FRAME! for debugging purposes--to look at after the function ends--you could COPY it into a heap-based object, and return that. If you really were in one of the circumstances where you wanted an arg or local's word to survive, you could manually make an object to hold just those words, and bind to that.

But @Ladislav had a compelling case:

foo: function [x] [
    y: 10
    return function [z] [x + y + z]
]

If x and y were to go bad after foo exited, the returned function would be useless.
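
As a point of comparison, this is what the same shape looks like in a language that closes over variables automatically. It is a Python sketch of Ladislav's example, not Ren-C, and `add` is just a name I picked for the returned function.

```python
def foo(x):
    y = 10

    def inner(z):
        # x and y belong to the call of foo that created this function.
        return x + y + z

    return inner


add = foo(1)
# foo has already returned, but x and y are still reachable through add.
total = add(100)
```

Here `total` is 1 + 10 + 100, because the variables outlive the call that made them.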

Some new mechanics related to Move_Value() are creating possibilities for "automatic closure-i-fication", where stack cells are converted into a heap object at the moment it's noticed that a bound word is "escaping". If none escape, then everything stays on the stack.

But though you might think these kinds of escapes are rare, remember some bindings aren't even intentional. When you return a block out of a function it might just have stray bindings on words that happen to overlap with something in the binding visibility. (Which makes one wonder, when returning a BLOCK! as data, should you always UNBIND/DEEP it before returning...to scrub off any inadvertent pointers into your local state it carries? Should there be a RETURN/BOUND to avoid the scrub?) These invisible bindings would trigger the auto-closurification, on what might seem like random cases to the user.

And remember--each time a word bound to a frame escapes--we're still talking about copying all the values in the frame. (It might be possible to break this down to a smaller granularity, e.g. a PAIR!-wise binding, where what closure-i-fication does is pack each key/value into a REBSER node.)
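
The finer granularity is roughly what CPython does, for what it's worth: only the variables the inner function actually mentions get kept, each in its own cell, and the rest of the locals die with the call. A small sketch to show it (the `scratch` and `subtotal` locals are made up for illustration):

```python
def foo(x):
    y = 10
    scratch = list(range(1000))  # a local that nothing captures
    subtotal = sum(scratch)      # another local that nothing captures

    def inner(z):
        return x + y + z

    return inner


inner = foo(1)

# Names of the variables inner closes over, paired with their cells.
names = inner.__code__.co_freevars
cells = {name: cell.cell_contents for name, cell in zip(names, inner.__closure__)}
print(cells)
```

Only `x` and `y` show up; `scratch` and `subtotal` are not held alive by the returned function.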

Were the user to get involved, and specify the cases, I might suggest something a bit like this (if <HAS> were taken to mean "a kind of per-instance static", while <STATIC> were used for all instances):

foo: function [x <has> x2 y] [
    x2: x
    y: 10
    return function [z] [x2 + y + z]
] 

The advantages to this are that it would mean that any words that "escape" would be explicitly handled by the user, reducing the burden on the system. The entire frame would not need to be preserved, only the part of the frame which had these persistent values. The disadvantage is that it's not automatic, and other languages--even JavaScript--do it automatically.

So how do people feel on this matter? What's acceptable or unacceptable? @MarkI said at one point that he was opposed to locals and args outliving the function call because it created "garbage". Is it wise to hide the consequences from the user, and burden the system with the logic of making it automatic?

Oddly enough, I don't recall knowing that UNBIND even existed - so maybe my life can be easier. It's an interesting question. When are the bindings of returned words and blocks important? What should be "best practice"? I worry that doing Unbind/deep before returning will create collateral damage to bindings not part of the function's mission.

In thinking about the main thrust of your question I realised I'm not entirely clear on what a frame actually is. I have read Relative Binding and FRAME! Internals. Is there some documentation that defines FRAME! at a Rebol language level?

A FRAME! is much like an OBJECT!, but it is the context a function gets when it starts running. So if you ask for BINDING OF an argument/refinement/local of a function, this is the answer you will get. (R3-Alpha would just give you TRUE to say that such words in a function were bound, it had no object-like thing to interact with.)

You can also make a frame for a FUNCTION! explicitly. If you fill in all its required arguments, or refinements with TRUE and then the refinement arguments, you can DO it:

>> f: make frame! :append

>> f/series: copy [a b c]

>> f/value: 'd

>> do f
== [a b c d]

>> do f
== [a b c d d]

One unique feature of FRAME! is that you can ask it for what its function is, via ACTION OF. This was a little flaky before, but now it's more robust--e.g. if you MAKE FRAME! for a RETURN, it can remember which function to return from.

One important issue about FRAME! is that the "object-ness" of it appears on-demand. This is to keep from creating an entity the GC has to worry about on each and every function call. But once user code asks for a frame or gets its hands on it, then the GC has a little stub it has to track.
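
If it helps to have a comparison from elsewhere, the MAKE FRAME! / DO pattern above is loosely similar to binding arguments to a signature in Python and invoking later. This is only an analogy (a Python `BoundArguments` is not a FRAME!, and the `append` here is a stand-in I wrote, not Rebol's APPEND):

```python
import inspect


def append(series, value):
    # Stand-in for APPEND: mutate the series and return it.
    series.append(value)
    return series


# Fill in the arguments ahead of time, much like setting f/series and f/value.
frame = inspect.signature(append).bind(series=["a", "b", "c"], value="d")

# "DO" the frame twice; the same series is reused, so it grows each time.
first = list(append(*frame.args, **frame.kwargs))
second = list(append(*frame.args, **frame.kwargs))
```

As in the console session, the second invocation sees the series already modified by the first.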

As per GitHub issue 605, my preference is for automatic closure-i-fication.

Then, if you want to avoid any unintentional escapes, perhaps add a new function spec tag like <safe> (or similar, e.g. <pure> or <cleaned>) which automatically applies UNBIND/DEEP to any returned BLOCK!

PS. I'm a pragmatist, so if there are too many costs involved with automatic closure-i-fication then I'm happy to leave FUNCTION as is and use <has>, <durable> or even keep a CLOS/CLOSURE wrapper.

PPS. However the x2 in the <has> workaround ruffles my feathers a bit :slight_smile: I'd prefer to go with something like:

foo: function [<durable> x <has> y] [
    y: 10
    function [z] [x + y + z]
]

In fact I'd go further and do this:

foo: function [x [<durable>] <has> y (10)] [
    function [z] [x + y + z]
]

if we can tag individual args? Could be a handy feature going forward for other things!

There's a mechanical reason. Briefly:

Let's imagine you are calling a function and it has 10 arguments, refinements, and refinement arguments. To fulfill each of the arguments involves an evaluation. During any evaluation a garbage collection may occur.

So let's say your function has fulfilled argument 1, and gets to argument 2. And let's say there is a lot of computation to do to supply argument 2--enough so that a GC is triggered.

This GC needs to know about argument 1 and not free any resources held onto by argument 1. And if argument 2 has been partially or temporarily evaluated, that needs to be taken care of as well (hence the argument slot is initialized with GC-readable "nothingness" before the evaluation starts, and during evaluation must stay GC-readably-legit in some way). But the GC must know not to look at 3-10 because those are still raw uninitialized bits.

One possibility would be to do a pre-walk and format cells 3-10 to not be random noise. R3-Alpha did this. But the frame knows how far along in argument processing it is, so when the GC runs it can look at the frame stack and know where to stop. That's cheaper than needing to do two separate walks of the arguments on every function call.
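
To make the "know where to stop" idea concrete, here is a toy model in Python. None of these names come from the actual C sources; `Frame`, `filled`, `gc_mark` and `fulfill` are hypothetical, and `UNINIT` merely stands in for raw uninitialized bits.

```python
UNINIT = object()  # stands in for raw bits the GC must never read


class Frame:
    def __init__(self, num_args):
        self.cells = [UNINIT] * num_args  # no pre-walk to format the cells
        self.filled = 0                   # how many cells are GC-safe so far


def gc_mark(frame_stack):
    """Collect the values a GC would visit, stopping at each frame's progress mark."""
    live = []
    for frame in frame_stack:
        for cell in frame.cells[:frame.filled]:
            assert cell is not UNINIT
            live.append(cell)
    return live


def fulfill(frame, evaluators, frame_stack, gc_log):
    for index, evaluate in enumerate(evaluators):
        frame.cells[index] = None  # GC-readable "nothingness" before evaluating
        frame.filled = index + 1
        frame.cells[index] = evaluate(frame_stack, gc_log)


def expensive(frame_stack, gc_log):
    # Pretend this evaluation allocates enough to trigger a GC.
    gc_log.append(gc_mark(frame_stack))
    return "arg2"


frame = Frame(4)
stack = [frame]
log = []
fulfill(
    frame,
    [lambda s, g: "arg1", expensive, lambda s, g: "arg3", lambda s, g: "arg4"],
    stack,
    log,
)
```

The GC that runs while argument 2 is being evaluated sees argument 1 plus the placeholder for argument 2, and never touches cells 3 and 4.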

So we happily avoid pre-walking the cells, and the evaluator itself just initializes cells as it goes along fulfilling arguments. Unfortunately, the formatting process is different for stack cells and indefinite lifetime cells, which live in arrays. If argument fulfillment has to be sensitive to whether that argument will live indefinitely, then you have to pre-walk it in the stack case...to initialize with bits that can be sniffed by the evaluator.

By splitting it out so that ordinary arguments and locals are known to always have stack lifetime, then the formatting process doesn't have to worry about the cell's previous formatting bits. It can just write stack initialization into it.

So if we can promise we aren't ever going to do argument evaluation into cells with indefinite lifetime (e.g. no slots that are also args) then it's more efficient. That said, users can be made unaware of this at a higher level... the "real argument" could be named out of the way somehow, and then the durable non-argument could take the argument's name and proxy its value once the function started running.

But such things can get messy. (what about an adaptation, how would it know about the funny named variable that actually has the argument it's interested in?) So for starters, I'd rather have the underlying mechanic and "rules of the game" visible so people understand what's happening.


NB. Here's my idea from chat. However, after reading your explanation more closely, I may end up treading over the same ground :frowning:

If <has> is the way to go then we could possibly solve the feather ruffling with a bit of extra FUNCTION generating (i.e. a pre-processor/macro).

So for the following simple example:

foo: function [x [<durable>]] [
    function [y] [x + y]
]

FUNCTION could pre-expand this into (something like) this:

foo: function [`x <has> x] [
    x: `x
    function [y] [x + y]
]

The use of backtick is just for an example. In Lisp you have GENSYM for creating symbols (ie. words) which don't stomp on anything else.

My spec is also just an example. If we could work out what all the closed-over words are then you could go for FUNCTION [x] [...]. Alternatively it could be CLOSURE [x] [...]

Obviously the bound words would still be exposed but it is controlled and easily identifiable (by nomenclature convention).

Anyway more food for thought!

Yup, and along the lines of things I've considered.

But the central questions remain ones we can sort of discuss abstractly. Like, do you really want locals to be surviving by default, by accident?

We already know that FUNCTION's locals gathering behavior can be overaggressive. You can easily get hundreds of locals you don't need if you make an object in the body of that function... every one of its fields then becomes a local as well. (Once, @szeng had to update the workings of the stack because he had some hundreds of locals.) Combining that already-existing worry with the new worry of everything surviving after the function call is over, with possible accidentally leaked bindings, means we aren't talking about trivial cost. The perceived performance of the language could wind up being pretty bad.

I guess if we can agree that survive-by-default is a bad thing, and that CLOSURE is too broad a brush to include in the box, then all we're down to is the question above about when you want to mark an argument as durable.

From an implementation point of view, there's some stuff I really need to get integrated on a branch that's been hanging around too long and I'm tired of rebasing it. It contains the first inklings of virtual binding, but there simply are mechanical problems with using it with persistent parameters/refinements/locals. These problems may not be forever, but they are there for now.

So how bad would it be if, for the moment, args and locals and refinements did not outlive the call? Then <has> would be changed to be different from <static>, meaning per-instance values. It would be an optimized way of reusing a function's frame node (pointing at stack data) to also act as the node for the portion that outlives the invocation, ducking further problems for now. This means USE can still be written as eval func compose [(args)] body or similar (not that this is the greatest idea in the first place).

It's a first step that doesn't throw out much infrastructure for changing our minds later. If virtual binding gets further, we'll know a lot more about how everything could work...which will likely affect these discussions.

For the moment, I am going to kill CLOSURE. Also, locals to a FUNCTION which are leaked past that function's lifetime will give an error when accessed.

To facilitate this, I've turned it around so that USE is now its own native (as opposed to being built on CLOSURE) whose rebound body will have bindings that outlive the USE (since it creates an OBJECT!).

That means it's easy enough to write:

 foo: function [a b <local> c d] [
     use [e f] [
         ;-- e and f will be alive after function call ends
     ]
 ]

@draegtun Note: If this is the intended semantics for USE, then <USE> might be even better than <HAS> in the function spec. :-/ I don't know. It's a question of how often USE intended indefinite lifetime...

This is not the pinnacle of efficiency, since USE doesn't have the same power the system does to avoid copying and rebinding (yet). BUT it's far more taxing on the system to be stuck assuming that you always want leaked args and local words to have indefinite binding lifetimes. And as we've emphasized above, I've become wary of the idea that survival-by-default is even a desirable semantic, when Rebol's model leaks bindings unintentionally all over the place.

More importantly, this unblocks the development of new and interesting ideas in the core, which might even be able to make the deep binding that USE does "closer to free".

Looking back at Ladislav's "good" example of closure necessity...

foo: function [x] [
    y: 10
    return function [z] [x + y + z]
]

His point being that the returned function is useless if x and y are expired references once foo is off the stack. It is a compelling case, but...

...it suggests that if anything, the "closuring" is a property of the usage. Why would you be annotating foo to say it's a "special kind of function whose variables outlive its call", as opposed to annotating the returned function as having "special kinds of references"?

foo: function [x] [
    y: 10
    return function [z <use> x y] [x + y + z]
]

To my mind this makes a lot more sense. If you delete the motivating usage you don't have to update anything about the enclosing function.
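
There is a loosely comparable idiom in Python, where the inner function can say explicitly what it wants to hold onto via default arguments. I'm only offering it as an analogy for "the annotation lives on the returned function"; it is not a claim about how <use> would be implemented.

```python
def foo(x):
    y = 10

    # Default values are evaluated once, when inner is defined, so inner
    # carries its own x and y and does not close over foo's variables.
    def inner(z, x=x, y=y):
        return x + y + z

    return inner


inner = foo(1)
result = inner(100)
has_closure = inner.__closure__ is not None
captured = inner.__defaults__
```

Here `foo` needs no special marking, and deleting `inner` would leave nothing in `foo` to clean up.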

Whether <use>-ing is automatic or not is another question. But that aside, I do think that the existence of a CLOSURE function or a CLOSURE! datatype is not the answer.

Which is what I thought before--hence CLOSURE has been gone for a while. But the %closure.test.reb file was still hanging around. Here are those tests for consideration, but I'm deleting them from the repo:

; datatypes/closure.r
[closure? closure [] ["OK"]]
[not closure? 1]
[closure! = type of closure [] ["OK"]]
; minimum
[closure? closure [] []]
; return-less return value tests
[
    f: closure [] []
    void? f
]
[
    f: closure [] [:abs]
    :abs = f
]
[
    a-value: #{}
    f: closure [] [a-value]
    same? a-value f
]
[
    a-value: charset ""
    f: closure [] [a-value]
    same? a-value f
]
[
    a-value: []
    f: closure [] [a-value]
    same? a-value f
]
[
    a-value: blank!
    f: closure [] [a-value]
    same? a-value f
]
[
    f: closure [] [1/Jan/0000]
    1/Jan/0000 = f
]
[
    f: closure [] [0.0]
    0.0 == f
]
[
    f: closure [] [1.0]
    1.0 == f
]
[
    a-value: me@here.com
    f: closure [] [a-value]
    same? a-value f
]
[
    f: closure [] [trap [1 / 0]]
    error? f
]
[
    a-value: %""
    f: closure [] [a-value]
    same? a-value f
]
[
    a-value: does []
    f: closure [] [:a-value]
    same? :a-value f
]
[
    a-value: first [:a]
    f: closure [] [:a-value]
    (same? :a-value f) and (:a-value == f)
]
[
    f: closure [] [#"^@"]
    #"^@" == f
]
[
    a-value: make image! 0x0
    f: closure [] [a-value]
    same? a-value f
]
[
    f: closure [] [0]
    0 == f
]
[
    f: closure [] [1]
    1 == f
]
[
    f: closure [] [#a]
    #a == f
]
[
    a-value: first ['a/b]
    f: closure [] [:a-value]
    :a-value == f
]
[
    a-value: first ['a]
    f: closure [] [:a-value]
    :a-value == f
]
[
    f: closure [] [true]
    true = f
]
[
    f: closure [] [false]
    false = f
]
[
    f: closure [] [$1]
    $1 == f
]
[
    f: closure [] [:append]
    same? :append f
]
[
    f: closure [] [_]
    blank? f
]
[
    a-value: make object! []
    f: closure [] [:a-value]
    same? :a-value f
]
[
    a-value: first [()]
    f: closure [] [:a-value]
    same? :a-value f
]
[
    f: closure [] [get '+]
    same? get '+ f
]
[
    f: closure [] [0x0]
    0x0 == f
]
[
    a-value: 'a/b
    f: closure [] [:a-value]
    :a-value == f
]
[
    a-value: make port! http://
    f: closure [] [:a-value]
    port? f
]
[
    f: closure [] [/a]
    /a == f
]
[
    a-value: first [a/b:]
    f: closure [] [:a-value]
    :a-value == f
]
[
    a-value: first [a:]
    f: closure [] [:a-value]
    :a-value == all [:a-value]
]
[
    a-value: ""
    f: closure [] [:a-value]
    same? :a-value f
]
[
    a-value: make tag! ""
    f: closure [] [:a-value]
    same? :a-value f
]
[
    f: closure [] [0:00]
    0:00 == f
]
[
    f: closure [] [0.0.0]
    0.0.0 == f
]
[
    f: closure [] [()]
    void? f
]
[
    f: closure [] ['a]
    'a == f
]
; basic test for recursive closure! invocation
[
    i: 0
    countdown: clos [n] [if n > 0 [i: i + 1 | countdown n - 1]]
    countdown 10
    i = 10
]
; bug#21
[
    c: closure [a] [return a]
    1 == c 1
]
; two-function return test
[
    g: closure [f [function!]] [f [return 1] 2]
    1 = g :do
]
; BREAK out of a closure
[
    blank? loop 1 [
        f: closure [] [break]
        f
        2
    ]
]
; THROW out of a closure
[
    1 = catch [
        f: closure [] [throw 1]
        f
        2
    ]
]
; "error out" of a closure
[
    error? trap [
        f: closure [] [1 / 0 2]
        f
        2
    ]
]
; BREAK out leaves a "running" closure in a "clean" state
[
    1 = loop 1 [
        f: closure [x] [
            either x = 1 [
                loop 1 [f 2]
                x
            ] [break]
        ]
        f 1
    ]
]
; THROW out leaves a "running" closure in a "clean" state
[
    1 = catch [
        f: closure [x] [
            either x = 1 [
                catch [f 2]
                x
            ] [throw 1]
        ]
        f 1
    ]
]
; "error out" leaves a "running" closure in a "clean" state
[
    f: closure [x] [
        either x = 1 [
            error? trap [f 2]
            x = 1
        ] [1 / 0]
    ]
    f 1
]
; bug#1659
; inline closure test
[
    f: closure [] reduce [closure [] [true]]
    f
]
; rebind test
[
    a: closure [b] [does [b]]
    b: a 1
    c: a 2
    all [
        1 = b
        2 = c
    ]
]
; bug#447
[slf: 'self eval closure [x] [same? slf 'self] 1]
; bug#1528
[closure? closure [self] []]
[
    f: make closure! reduce [[x] f-body: [x + x]]
    change f-body 'x ;-- makes copies now
    x: 1
    4 == f 2 ; #2048 said this should be 3, but it should not.
    ; function and closure bodies are not "swappable", because keeping the
    ; original series would mean that the original formation would always
    ; drop the index position (there is no index slot in the body series).
    ; A copy must be made -or- series forced to be at their head.
]

; TESTS THAT CAME FROM OTHER FILES THAT STILL USED CLOSURE

; object cloning
; bug#2049
[
    o: make object! [n: 'o f: closure [] [n]]
    p: make o [n: 'p]
    'p = p/f
]

; reflexivity test for closure!
; Uses CLOSURE to make the test compatible.
[equal? a-value: closure [] [] :a-value]

; No structural equivalence for closure!
; Uses CLOSURE to make the test compatible.
[not equal? closure [] [] closure [] []]

; reflexivity test for closure!
[
    a-value: closure [] []
    same? :a-value :a-value
]

; no structural equality for closure!
[not same? closure [] [] closure [] []]

; reflexivity test for closure!
[
    a-value: closure [] []
    strict-equal? :a-value :a-value
]

; no structural equality for closure!
[not strict-equal? closure [] [] closure [] []]

; bug#1549
; BIND works 'as expected' in closure body
[
    b1: [self]
    f: closure [/local b2] [
        b2: [self]
        same? first b2 first bind/copy b1 'b2
    ]
    f
]

Tinkering with JavaScript a bit, the above pattern is omnipresent.

I'm definitely leaning there now myself. It goes without saying that if JavaScript can do something extremely useful that we can't do, that is bad.

When I brought this up exactly a year ago, I mentioned a possibility that was emerging:

The infrastructure to do this is there, and it implements a very coarse version of this. The poor man's version is to consider an "escape" to have happened any time a bound item that is resident in a BLOCK! in an action's body winds up being moved outside of that body.

The optimization that hasn't been done is to detect when a word bound to a frame is moved into a cell belonging to a frame that will outlive it. For instance:

below: func [<local> x] [
   x: 10
   above 'x
]

above: func [w [word!] <local> x-reference] [
    x-reference: w
    print get x-reference
]

There's no technical reason why putting the word for x into the x-reference local variable should force closure-ification, nor should passing the argument into w. Both w and x-reference are cells in the frame for above, which is above the below frame on the stack. Since below will outlive above, it can just use the direct pointer to the frame.

Without that, it's extremely coarse. Half--or more than half--of usermode actions will have to be closure-ified due to something that happens in their body. (This isn't surprising, because just calling an IF statement and passing a BLOCK! from the body would trigger it...since the optimization hasn't been implemented. The IF's condition argument is at a higher stack level, but being treated as if it were indefinite lifetime, forcing auto-closure-ification)
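
The check being described could be sketched like this. It is a simplified Python model under my own assumptions: frames are identified by stack depth (larger means called later, so it ends sooner), `HEAP` means an indefinite-lifetime destination, and `must_closurify` is a hypothetical name.

```python
HEAP = None  # destination cell with indefinite lifetime (e.g. an array)


def must_closurify(bound_frame_depth, dest_frame_depth):
    """Decide if moving a word bound to one frame into a destination cell
    could let the word outlive the frame it is bound to."""
    if dest_frame_depth is HEAP:
        return True
    # A destination frame at the same depth or deeper ends no later than
    # the bound frame, so a direct pointer to the stack frame stays valid.
    return dest_frame_depth < bound_frame_depth


BELOW_DEPTH = 1  # below is running
ABOVE_DEPTH = 2  # above was called by below, so it sits higher on the stack

into_above = must_closurify(BELOW_DEPTH, ABOVE_DEPTH)
into_caller = must_closurify(BELOW_DEPTH, 0)
into_heap = must_closurify(BELOW_DEPTH, HEAP)
```

With that test in place, passing 'x to above costs nothing, while returning it to below's caller or storing it in a heap array still triggers the conversion.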

Good news is that there are a lot of ways to get that number down, which can now be explored. And moreover it's good news that the basic mechanism is working (e.g. the mechanism that's even letting escapes start to be counted at all). Because this mechanism is integral to virtual binding, which is still on the agenda and being enabled by advancements a bit at a time (pun intended).

But overall news is that I'm leaning toward feeling that automatic closure-ification is likely non-negotiable. Can't let JavaScript be more ergonomic about something like this.

As the question of what makes a language "timeless" has become central, we can't let JavaScript have the upper hand here. There are too many uses for this.

With the impending unification of FUNC and FUNCTION as synonyms, I think we should fold in indefinite lifetime as the default. Frames will also be smaller, because locals will be managed using a different technique.

There's a lot of optimization possible--and the codebase is under control to try it.


This feels like justification in and of itself. I feel like most of the future users are going to be coming from a JavaScript-heavy background. We have to avoid making choices that make the language seem inferior or incapable of handling common and useful idioms.

The deed is done:

There's already some optimization and detection of cases that "leak" words. This detection is good enough that natives don't persist their frames (unless you get references to it via the debugger). e.g. the optimization is not "it's a native" but "no references to words bound into the frame escaped".

We'll just have to commit to doing better with performance (and it actually doesn't seem terrible at the moment). But the usermode experience needs to be the "timeless" one.

What is the mechanism to retain the old behaviour? Can I "unbind" locals and have deeper function calls fail to find them?

The one that you design.

All function invocations use the same "paramlist" (which is also the identity of the function) as the specification of the ordering of the keys of the frame. Each invocation of a function starts with an unmanaged "varlist" of equal or greater length (which may be recycled from other calls) to hold the values for that frame. This varlist starts out unmanaged, but may become managed if a reference to the frame "leaks".

Here you can see what happens when the function call finishes: how it treats frames that were never managed versus frames that were discovered to be managed. Previously, discovery that a frame had become managed would collapse the series node and free its data allocation. It had to keep the stub around because otherwise pointer dereferences would crash. Most of the change is to leave the managed frame as-is and allow its references to continue resolving:

https://github.com/metaeducation/ren-c/pull/1015/files#diff-94ddbdf54cabd760b45d9ca65e2739b2R703
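
For those who don't want to read the C, the decision at the end of a call can be modeled roughly as below. This is a toy Python sketch, not a transcription of the linked code: `Varlist`, `pool`, `end_call` and the `keep_leaked_alive` switch are all names I invented to contrast the old and new behavior.

```python
class Varlist:
    def __init__(self, values):
        self.values = values
        self.managed = False    # becomes True if a reference to the frame leaks
        self.accessible = True  # False once the data has been freed


pool = []  # hypothetical pool of varlists that can be recycled by later calls


def end_call(varlist, keep_leaked_alive):
    if not varlist.managed:
        # Nothing escaped: the GC never knew about it, so just reuse it.
        varlist.values.clear()
        pool.append(varlist)
        return "recycled"
    if keep_leaked_alive:
        # New behavior: leave the managed frame as-is so references resolve.
        return "kept"
    # Old behavior: free the data but keep a stub so lookups fail cleanly.
    varlist.values = None
    varlist.accessible = False
    return "expired"


plain = Varlist([1, 2])
leaked_old = Varlist([3, 4])
leaked_old.managed = True
leaked_new = Varlist([5, 6])
leaked_new.managed = True

outcomes = [
    end_call(plain, keep_leaked_alive=True),
    end_call(leaked_old, keep_leaked_alive=False),
    end_call(leaked_new, keep_leaked_alive=True),
]
```

The only branch that changes between the old and new behavior is the one for frames that became managed.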

(Technically speaking, in the "unavailable" node strategy once the GC sees that a reference points to a stub, it could re-point the reference to a canon "unavailable" stub...and free memory for the stub it was pointing to. This would lose some amount of added information, e.g. knowing the paramlist of the specific function that was called. It was never implemented just for that reason--of making debugging harder.)

Anyway...the mechanisms are still there, but I'm convinced of what the default should be (and now quite convinced of the FUNC/FUNCTION synonym as well).