The History of Multi-Return in Ren-C

hostilefork · March 28, 2019, 1:56am

This thread merges discussions from several threads to cover the key points of the history. It starts with material from the post announcing the addition of SET-BLOCK! and GET-BLOCK!, then folds in important points from later posts in roughly the order they happened. So it's broken up into sections by approximate date.

March 2019: SET-BLOCK! and GET-BLOCK! Added

These were generic parts which you could use in dialects however you wanted...just like other types.

But a key motivation was the concept that the evaluator would use them for some kind of multiple-return-value strategy. Users of languages with the feature seemed to rave about it. And if you don't have it, they will complain and constantly try to find ways to work around it.

I had inklings that Rebol could do them in a way that is mindbending and original. But a dirt simple implementation of the idea in 2019 looked roughly like what people were used to with SET of BLOCK! from Rebol2, without needing to write the word SET:

early-multi-concept: function [a b] [
    return reduce [(a * 3) (b - 16)]
]

>> [x y]: early-multi-concept 10 20
== [30 4]

>> x
== 30

>> y
== 4

So ([x y]: foo) would act like (set [x y] foo), including allowances to let you take fewer values than the block contained:

>> [x]: [10 20]
== [10 20]

>> x
== 10

But the simplicity had obvious drawbacks

First of all, Ren-C had NULL states that by design could not be put into blocks. This approach wouldn't be able to return those without distorting them into some different reified value than the intended NULL.

>> [x y]: function-returning-null-and-30
== [<null> 30]  ; can't put null, so... "something else" in first slot?
                ; (note this was long before quasiforms existed)

>> x
; null  (not the same thing as what the block said...seems bad, yes?)

>> y
== 30

Also, you would have to know that what you were calling returned multiple values. If you missed that and used a plain SET-WORD!, you'd just wind up with the block:

>> x: some-function-i-didnt-know-was-multi-return ...
== [ret1 ret2 ret3]  ; the block! could easily be mistaken for single return

I had one of those ideas that just wouldn't go away...

It seemed clearly preferable that rather than choose between [x]: and [x y]: and knowing in advance how many values you're taking or throwing away... someone who only asked for one value should be able to be blissfully ignorant. So the choice would be between:

>> x: multi-return ...
>> [x y]: multi-return ...

Even though the first case is a multiple return, you wouldn't set x to a block of values, but just get the first value. An interface like this would also solve the issue of returning NULL.

Not being forced to return a BLOCK! also made great sense: since BLOCK! is always truthy, you wouldn't really be able to base useful conditional behavior on a BLOCK! return anyway!

>> [x y]: function-returning-null-and-30
; null  (so you could meaningfully say `if [x y]: whatever [...]`)

>> x
; null

>> y
== 30

Going even further, I suggested "it would be important that if a function wanted to do multiple returns that it could know how many results it was assigning. This could save calculation on things that aren't needed."

...but all this would require some kind of magic...

hostilefork · April 8, 2020, 3:39pm

February 2020: Fake It (with Infix) Until We Make It

It occurred to me to work up a prototype completely in usermode using a skippable infix parameter:

Multiple Return Values Via Infix

So a function could act as having multi-returns by peeking to its left... and seeing if there was a SET-BLOCK! there. If there was, then it would do the SET-ing of the variables that were needed (and could choose to only do calculations if variables were requested). If no SET-BLOCK! was on the left, it could just return a value as usual.

But this was not generalized in a way that functions could share.

April 2020: Unify With The Historical Refinement Model

The old school "multiple return" method was to pass in WORD!s to set as variables. Such as DO/NEXT:

 r3-alpha>> value: do/next [1 + 2 10 + 20] 'pos
 == 3

 r3-alpha>> pos
 == [10 + 20]

You see that DO could check for the presence of the /NEXT refinement and behave differently. It knows whether it has one return value or two. Based on that knowledge, many routines might have more optimized implementations when not all the possible return results they could give are wanted.

But since this mechanism existed, why couldn't the evaluator build a bridge so that the variables in the SET-BLOCK! would be passed in to specially marked refinement slots?

In April 2020, I implemented that. It made variables passed from a SET-BLOCK! on the left compatible with the historical method of passing WORD!s via refinements, to be SET by the function as additional outputs.

This comment from the C code commit explains how that prototype was approached.

//==//// SET-BLOCK! //////////////////////////////////////////////////////==//
//
// The evaluator treats SET-BLOCK! specially as a means for implementing
// multiple return values.  The trick is that it does so by pre-loading
// arguments in the frame with variables to update, in a way that could have
// historically been achieved with passing a WORD! or PATH! to a refinement.
// So if there was a function that updates a variable you pass in by name:
//
//     result: updating-function/update arg1 arg2 'var
//
// The /UPDATE parameter is marked as being effectively a "return value", so
// that equivalent behavior can be achieved with:
//
//     [result var]: updating-function arg1 arg2

So all you needed to do to get the feature was mark a refinement as an output parameter. Then check to see if it's null or not, and assign it if it's a WORD! or PATH!...the same way you ever would have.

You can use it in the old style (like a TRANSCODE/NEXT being passed a position to update) or you can use the SET-BLOCK! syntax and let the evaluator do the magic.

That meant these two calls would appear equivalent to the insides of TRANSCODE:

>> value: transcode/next/relax "1 [2] <3>" 'next-pos 'error

>> [value next-pos error]: transcode "1 [2] <3>"

We also got the ability to check whether a given return was requested, and vary the behavior based on it:

 >> transcode "abc def ghi"
 == [abc def ghi]

 >> [value rest]: transcode "abc def ghi"
 == abc

 >> rest
 == [def ghi]
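
To make the bridge concrete, here is a toy model in Python (not Ren-C, and not how the evaluator was actually written). It assumes a dict named `env` stands in for the caller's variables and strings stand in for WORD!s; `transcode_model` and `set_block` are hypothetical names invented for this sketch. It shows the two calling styles reaching the same function body, and the body varying its behavior based on whether the extra output was requested.

```python
# Toy model (not Ren-C): a dict stands in for the caller's variables,
# and strings stand in for WORD!s naming those variables.

def transcode_model(text, env, next_var=None):
    """Hypothetical stand-in for TRANSCODE with a /NEXT-style output."""
    items = text.split()
    if next_var is None:
        # No extra output requested: scan everything in one go.
        return items
    # Extra output requested: scan one item, write the rest to the variable.
    env[next_var] = items[1:]
    return items[0] if items else None


def set_block(targets, func, text, env):
    """Model of `[value rest]: func text` feeding names into output slots."""
    primary, *extras = targets
    next_var = extras[0] if extras else None
    result = func(text, env, next_var=next_var)
    env[primary] = result
    return result


# Old style: pass the variable name explicitly, like a /NEXT refinement.
env = {}
value = transcode_model("abc def ghi", env, next_var="rest")
print(value, env["rest"])

# SET-BLOCK! style: the "evaluator" fills the output slot from the left side.
env2 = {}
set_block(["value", "rest"], transcode_model, "abc def ghi", env2)
print(env2)
```

Note that the callee still writes the caller's variable itself, in the middle of its body. That detail is what the later sections of this thread end up revisiting.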

TRANSCODE and LOAD seemed to show great results immediately (though later problems of composability from this feature ultimately led to it being panned for use in core functions).

And this is how things went along for the next year...

hostilefork · July 4, 2022, 7:10pm

Inside the function, the multi-return parameters held the WORD!s of the caller's variables, while usually being named like the values they represent. This made them misleading if you tried to assign them.

Here's an example of a very common mistake:

remove-tags: func [  ; version that looks like it *should* be right but wasn't
    {Return block with all tags removed}

    return: [block!]
    num-removed: [integer!]

    block [block!]
][
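    num-removed: 0  ; bug: overwrites the WORD! passed in for the output
    let pos: block  ; walk a separate position so BLOCK stays at the head
    while [not tail? pos] [
        either tag? pos.1 [
            take pos
            num-removed: me + 1
        ][
            pos: next pos  ; advance past non-tags so the loop terminates
        ]
    ]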
    return block
]

That doesn't work because it's overwriting the WORD! that NUM-REMOVED may-or-may-not hold.

Here's how you had to do it instead:

remove-tags: func [  ; ugly working version, but that's easy to get wrong
    {Return block with all tags removed}

    return: [block!]
    num-removed: [integer!]

    block [block!]
][
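    let removals: 0
    let pos: block  ; walk a separate position so BLOCK stays at the head
    while [not tail? pos] [
        either tag? pos.1 [
            take pos
            removals: me + 1
        ][
            pos: next pos  ; advance past non-tags so the loop terminates
        ]
    ]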
    if num-removed [set num-removed removals]
    return block
]

July 2022: Proxying Multi-Return Was Invented

Each output was given its own local variable to hold the result. Then when the function finished, the multi-return mechanics proxied the contents of that local variable to whatever variable you passed in to be assigned.

This was desirable for other reasons:

It meant that if a function errored, then no changes to the variables you used on the left hand side of a multi-return would be made.
It prevents unwanted dependencies: you don't want a multi-return to intentionally--or accidentally--have variant behavior based on the name of the variable it is returning. (that is a feature that should be restricted to refinements that pass WORD!, because you're getting the word...)
Having a local variable to store a result in whether there was a multi-return request for it or not means that if you do a calculation that has a by-product you use during the body of your function, you don't need a separate name for that.

But there would have to be some other means to know if a variable was requested by the callsite. With some amount of hand-waving, I called that WANTED?... where you'd pass in the WORD! of the multi-return argument and be told whether it was being assigned to an output variable or not.

How Proxying Worked

Functions would have two cells in the frame for each output parameter: one refinement with the output name, and one hidden local slot to store the writeback variable (if any)

A prelude would run before the function body, where whatever is in the refinement slots for the outputs would be shifted into their associated local slot.
- If no variable is given, then the refinement slot would be overwritten with a ~ isotope, indicating an unset variable state. The hidden writeback slot would be set to null.
- If a variable was given, then the refinement slot would again be written to indicate unset. But the variable would be moved into the writeback slot.
- The WANTED? function is implemented by having the implementation peek into the hidden writeback slot and report if anything is there
During the body of the function they use the named interface slot normally...assigning and reading with ordinary SET-WORD! and variable access (instead of needing SET and GET)
When the body is finished, an epilogue writes the value of the output cells for any that had saved variables
- This is also a good time to do typechecking
- Internally, the parameter slot for the output variable can keep the types to check, as it can never be specialized

The epilogue runs at the moment of RETURN if a function has a return...because that's when the stack still has information of which return path has the problem (if there's more than one such path). This is consistent with the historical concept that it's RETURN that does the typechecking.
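
The prelude/body/epilogue sequence above can be sketched as a toy model in Python (not Ren-C, and not the actual C implementation). The names `call_with_proxying`, `requested`, and `wanted` are hypothetical, `env` stands in for the caller's variables, and raising on an unset requested output is an assumption standing in for the typecheck. The sketch shows the body doing plain assignments, and the caller's variable staying untouched when the body fails.

```python
# Toy model (not Ren-C): `env` is the caller's variables, `frame` is the
# callee's frame, and UNSET marks an output slot not yet assigned.

UNSET = object()


def call_with_proxying(body, outputs, requested, env, *args):
    """Run `body` with prelude/epilogue proxying for the named outputs.

    `requested` maps output name -> caller variable name (hypothetical API).
    """
    # Prelude: move requested variable names into hidden writeback slots and
    # leave every visible output slot unset.
    writeback = {name: requested.get(name) for name in outputs}
    frame = {name: UNSET for name in outputs}

    def wanted(name):
        # WANTED?-style query: peek at the hidden writeback slot.
        return writeback[name] is not None

    # Body: reads and assigns frame slots like ordinary locals.
    result = body(frame, wanted, *args)

    # Epilogue: only reached if the body did not raise, so the caller's
    # variables are untouched on error.
    for name, target in writeback.items():
        if target is not None:
            if frame[name] is UNSET:
                # Assumption: stands in for the typecheck rejecting "unset".
                raise ValueError(f"output {name!r} requested but never set")
            env[target] = frame[name]
    return result


def remove_tags(frame, wanted, block):
    frame["num_removed"] = 0
    kept = []
    for item in block:
        if item.startswith("<") and item.endswith(">"):
            frame["num_removed"] += 1  # plain assignment, no SET needed
        else:
            kept.append(item)
    return kept


def failing(frame, wanted, block):
    frame["num_removed"] = 99
    raise RuntimeError("failure after the output slot was assigned")


env = {"n": "untouched"}
kept = call_with_proxying(
    remove_tags, ["num_removed"], {"num_removed": "n"}, env, ["a", "<b>", "c"]
)
print(kept, env["n"])
```

Calling `failing` the same way raises before the epilogue runs, so `env["n"]` keeps whatever it held beforehand.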

In November 2020, I Had Actually Mused About Proxying

"We could imagine a different setup which tried to let you just do multi: 10 directly, and then when the operation was over would proxy that value into the target variable. And it could use some similar rules about how when the frame started, it could be either NULL if it wasn't wanted or # if it was wanted. But that seems a lot more error-prone. And the variable exists anyway to make the request...so why not go ahead and set it where it is, instead of going through a middleman anyway?"

So I seem to have talked myself out of it because it would cost an extra slot for each variable, and people might not remember to check the state and lose it.

But it was much better. Things quickly got more comprehensible in UPARSE (it was a large diff, so linking directly to it doesn't seem to work):

Inefficient first-cut at multi-return proxying · metaeducation/ren-c@3b0d9ba · GitHub

But lingering issues regarding wrapping and composing multi-returns remained...

hostilefork · November 27, 2022, 4:54am

The proxying technique had shown clear advantages for the authors of functions, to be able to do direct assignments vs. need to remember to always SET the named variable passed in.

But underneath the hood, the names of the variables were still like inputs to the lower-level function. This went as far as trying to act compatibly with the variable-passed-by-refinement trick that historical Redbol used for multiple returns.

Among the many problems was trying to write wrappers using something like ENCLOSE. An enclosing function could only influence the primary output, unless it went through some particularly convoluted work:

It would have to capture the variable name passed in before delegating, because the name would be repurposed as the proxied value slot
The delegated function would be called and would update the variable as part of its RETURN
The encloser would then have to read back this written variable with GET to see it, and use SET to update it...again.

Here's an example of what you could think you might have to do:

multi-returner: func [
    return: [integer!]
    extra: [integer!]
][
    extra: 20  ; by this point, variable passed as /EXTRA hidden
    return 10  ; stowed /EXTRA variable written back using EXTRA's 20 value
]

wrapper: enclose :multi-returner func [f [frame!]] [
    let extra-var: f.extra  ; capture var before call via DO moves it aside
    let result: do f  ; callee proxies input variable during RETURN
    set extra-var (get extra-var) + 1  ; get new written value and update it
    return result + 1
]

>> [a b]: wrapper
== 11

>> a
== 11

>> b
== 21

The situation was actually even worse than that. All the complex logic for filling proxy slots with variable WORD!s was done during the frame building. e.g. extra-var wasn't the hypothetical "parameter passed to extra before it got shifted into a hidden variable slot by FUNC", it was already the unset slot to be filled by the callee. And the actual variable name was private, known only to the enclosed function.

And it's worse than that if you want to preserve the ability to have behavior depending on how many inputs are requested, because there may be no variable at all... or an "opt in for the feature without a variable" placeholder. Correct code would be much more convoluted, if meaningful code could be written at all.

The headaches go deeper. Copying a frame and running it multiple times introduced semantic and ordering problems about the writing of these additional outputs!

Simply Put: Variable Names As Inputs Make Poor Outputs

All of this pointed to the inconvenient truth:

Implementing a function's conceptual outputs by passing named variables as input that are imperatively written during the function's body--anywhere, even at the end--is something that will break composition.

It's also horrible for atomicity, because a SET of an output variable may happen but then there's an error which occurs before the final return result can be produced... so any multi-return function working in this way is either broken or bearing an undue burden to do its own "transaction management", which is to say probably also broken.

Of course we knew this. But to get the desired effects (single return unless you use a SET-BLOCK!), there's no other choice, right?

The idea of making an ANY-VALUE! which tried to bundle values was nixed in the beginning. Because if we declared some new datatype to represent a multi-return pack that decays to its first value when assigned to a variable, you enter a catch-22, like this early puzzle when @[...] was being considered to denote multi-returns:

multi-return: func [] [
    return @[10 20]  ; assume RETURN is "magic" and returns @[10 20] vs. 10
]

>> x: multi-return
== 10

>> [x y]: multi-return
== 10

>> x
== 10

>> y
== 20

The problems are apparent on even a trivial analysis. These "highly reactive" @[...] values wreak havoc in a general system. If you walked across a block and encountered one, then merely storing it in a variable to work with it would introduce distortion on assignment, when it "decayed" to its first element.

for-each x [foo @[10 20] bar] [
    if integer? x [...]  ; INTEGER? sees @[10 20] as just 10
]

...but gee... if only there were some variation of BLOCK! which you could be guaranteed not to encounter when enumerating other blocks, and that couldn't be stored in variables... and a method for dealing with transforming them into and out of reified states so you could work with them...

Hey, waitaminute...

September 2022: Core Multi-Return via Antiform BLOCK!

Generalized Isotopes made longstanding problems seem to fall like dominoes, block splicing and error handling among them. And they could be applied here, too... solving several profound problems:

As with all antiforms, you wouldn't be able to put BLOCK! antiforms in blocks...alleviating many conceptual problems.
Even more strictly, you wouldn't be able to put BLOCK! antiforms in variables. Antiform BLOCK!s would decay to their first value when assigned to any variable.
- This turned out to be the only antiform "decay" mechanism required, subsuming prior concepts like decaying "heavy null" to "light null". Heavy null would simply be a null representation in an antiform block.
- Representing things like null would be possible since ^META values would be used in the multi-return convention, to afford the multi-return of antiforms themselves
Fitting in with the general rules of isotopes, a QUASI-BLOCK! would evaluate to an antiform block:
```
>> ~['10 '20]~
== ~['10 '20]~  ; anti

>> x: ~['10 '20]~
== 10
```
Functions that might be interested in the antiform state would need to take the argument via a ^META parameter, in which case they would receive a QUASI-BLOCK! instead of the decayed value
- A good example of a function that would want this would be RETURN, in order to be able to have a forwarding mode that would return an antiform result vs. its decayed first value
- This doesn't rule out building generators that proxy named output variables. In fact it fixes its problems, by limiting the relevance of the fact that proxying is being used to the interior of the function, and making its externals speak the antiform block protocol. If you want a proxying-FUNC you can have it...just define your own RETURN variation and wire it up.
- It also opens the doors to many other conceptions of how to abstract the multi-return process
SET-BLOCK! assignments would have special understandings of how to decompose antiform blocks and assign the component variables
- This would break the uneasy "backchannel" between caller and callee of variable names
- The most obvious sign this had been a problem was that mere parenthesization would break historical multi-assignment:
```
>> [a b]: multi-return   ; would work

>> [a b]: (multi-return)  ; would act like `(a: multi-return)`
```
- Now any expression that doesn't store a variable as intermediate can act pass-thru (such as a conditional), and if a variable wanted to capture the multi-return character temporarily it could META it...potentially manipulate the QUASI-BLOCK!, and UNMETA it back
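
The protocol in the list above can be modeled roughly in Python (not Ren-C). In this sketch the hypothetical `Pack` class plays the antiform block, `env` plays the variables, and `set_word` / `set_block` play the two kinds of assignment. The ^META quoting of pack contents is not modeled, and raising when more variables are requested than values exist is an assumption. The point it illustrates is that no variable names travel into the callee, so a pass-through expression does not break multi-assignment.

```python
# Toy model (not Ren-C): Pack plays the antiform block, and `env` plays the
# variables. Pack instances are never stored in `env`.

class Pack:
    def __init__(self, *values):
        if not values:
            raise ValueError("this sketch assumes a pack has a first value")
        self.values = values


def decay(value):
    """A pack decays to its first value; anything else passes through."""
    return value.values[0] if isinstance(value, Pack) else value


def set_word(env, name, value):
    """Model of `name: value`; variables only ever hold decayed values."""
    env[name] = decay(value)
    return env[name]


def set_block(env, names, value):
    """Model of `[a b]: value`; unpacks a pack across the named variables."""
    values = value.values if isinstance(value, Pack) else (value,)
    if len(names) > len(values):
        # Assumption: asking for more values than were returned is an error.
        raise ValueError("more variables requested than values in the pack")
    for name, item in zip(names, values):
        env[name] = item
    return values[0]


def multi_return():
    return Pack(10, 20)


def passthru(value):
    # Stands in for parenthesization or a conditional: nothing is stored
    # in a variable, so the pack survives intact.
    return value


env = {}
set_word(env, "x", multi_return())                    # x gets 10
set_block(env, ["a", "b"], passthru(multi_return()))  # a gets 10, b gets 20
print(env)
```

Because the callee only ever produces a value, a failure inside it cannot leave some of the caller's variables half-written.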

Casualties of Composability

One casualty of this was the feature of being able to make a function's behavior depend on how many outputs were requested. But the feature can still be achieved with an infix function that quotes its left-hand side and manages the assignment; it's just no longer something the core attempts to generalize.

Another casualty is legacy compatibility with passing in variable names via refinement. But again: this feature could be achieved by AUGMENT-ing the function with the refinement, then ENCLOSE-ing that with something that wrote the multi-return's output to the variable passed in via that augmented refinement.

But there's really no competition here. As I've hopefully made clear, passing in a named variable via refinement is simply not in the same league as a mechanism which legitimately makes additional outputs.

As usual with these things, I'll admit it may not be simple or obvious at first glance, but the results are speaking for themselves!

hostilefork · November 28, 2022, 5:50am

One point to mention here is that as far as the core protocol goes, there are no names for the individual multi-return outputs. If we wanted that, we'd have to make it so that a function could return an "OBJECT! antiform" with labeled fields (that decayed to the first field?)

This isn't pointless compared to simply returning an OBJECT! by convention, for the same reason the antiform block isn't pointless compared to returning a BLOCK! by convention. You want that single-result default, and then more only if you ask for it. So there's still a way the first field in the frame could be picked out via a SET-WORD! and the other fields by a SET-BLOCK!; or you could ^META it and access them by name.

It's something to keep in mind, but nothing being done currently rules it out as a future possibility. The central point is the same--that there's an antiform bundle that can't be interpreted as something you intend to store in a block or assign to a variable in and of itself. The antiform status decays to one item in the bundle in the absence of a specific unpacking intent.