TRANSCODE and LOAD Multi-Return Improvements

TRANSCODE and LOAD were two of the first routines that took advantage of multi-returns, and showed a real cleanup. To keep a writeup chronicling the development of multi-returns more focused, I'm moving the part where I sing the praises of the improvements to its own Show & Tell thread!


[...In April 2020...] I was looking at TRANSCODE, which is a fairly complex in terms of its parameterization. This is the basic routine that exposes the scanner and turns UTF-8 into Rebol values. (Cool sidenote: in Ren-C with UTF-8 Everywhere you can now use it on strings, while R3-Alpha could only use it on binaries...)

A typical TRANSCODE operation turns a string into a block, so conceptually:

 >> transcode "1 [2] <3>"
 == [1 [2] <3>]

 >> transcode "[1 [2] <3>]"
 == [[1 [2] <3>]]   ; always a block of values, even if only one block value

Sometimes people want to transcode just one value at a time, so there is TRANSCODE/NEXT. As with DO/NEXT, this introduces another useful output... which is where you want to write the advanced position to do further processing. Let's look at it in that old style (none of this is real code in Ren-C, so just read it, don't run it):

 >> transcode/next "1 [2] <3>" 'new-pos
 == 1

 >> new-pos
 == "[2] <3>"

There's yet another potential output variation coming from the /RELAX switch. This means that if you have gibberish, it will skip that token and return an ERROR! value:

 >> transcode/next/relax "4bad [2] <3>" 'new-pos
 == make error! [...whatever...]

 >> new-pos
 == "[2] <3>"

That's not ideal, as with some kind of ANY-CONTEXT! literal solution you might actually find an ERROR! value in a scan.

Then Multi-Returns Were Re-Conceived As Refinement Doppelgangers

The way you would do a multi-return is to mark a parameter a "return" parameter, e.g. an output. The parameter is set either to null, or something to assign--like a refinement--in order to indicate it is requested. Then at the end of the function when it tears down the frame, those values are set.

So this way if you have a WORD! or PATH! you want to assign, you can pass it in as an argument. But if you use the multi-return convention, it will implicitly load those slots with variables out of the block on the left.

That means these two situations would appear equivalent to the insides of TRANSCODE:

>> value: transcode/next/relax "1 [2] <3>" 'next-pos 'error

>> [value next-pos error]: transcode "1 [2] <3>"

And then, these two would appear equivalent:

>> value: transcode/relax "1 [2] <3>" 'error

>> [value _ error]: transcode "1 [2] <3>"

This means that there needs to be a way of marking arguments as returns in the function spec, and their order matters (just as order of ordinary arguments matters).

It was a great duality, giving you the option to work with named arguments if you want, but provides a shorthand...where SET-BLOCK! collaborates with the evaluator to push into those slots.

It was quite amazing to see the benefits to TRANSCODE right off the bat!

Then I repeated the success with LOAD...

If you want a header you can just say:

[data header]: load %some-file.reb

And if you don't want a header, you just say data: load %some-file.reb as normal. This replaces the way the output changed in historical Rebol, where LOAD sometimes returned a BLOCK! with a header as the second item, and sometimes it did not.

rebol2>> load {[Rebol [Title: "hi"] a b c]}
== [a b c]

rebol2>> load/header {[Rebol [Title: "hi"] a b c]}
== [make object! [
        Title: "hi"
        Date: none
        Name: none
        Version: none
        ; ...etc, etc...
        Language: none
        Type: none
        Content: none
    ] a b c
]

What's also an improvement by setting variables directly over the BLOCK! approach is that you don't have to put BLANK! in a slot, you can really return NULL:

>> [data header]: load "<no> #header 'here"
== [<no> #header 'here]

>> data
== [<no> #header 'here]

>> header
; null

Note: This was another change powered by Pure and Refined...simplifying the nature of arguments in frames down to one-name-per argument and one-value-per lets us play with this equivalence. We can imagine function frames expanding via AUGMENT to add new return values, too...

2 Likes

Very cool. Thank you— I can see this being really useful for dialects (being coerced out of text, which is relevant for me), and almost anything which helps dialects is a good thing, in my opinion.