Implementing COLLECT + KEEP

hostilefork · January 10, 2024, 5:20pm

COLLECT allows you to build up a block, without needing to name the block or pass it as a parameter to individual APPEND instructions. Instead you use KEEP, which appends to the implicit nameless block:

>> collect [
       keep 'foo:
       print "Arbitrary code possible"
       keep/line [1 2 3]
       keep spread [Spread #works @(T O O)]
       repeat 2 [keep <whatever>]
    ]
Arbitrary code possible
== [
    foo: [1 2 3]
    Spread #works @(T O O) <whatever> <whatever>]

Leverages LAMBDA To Bind KEEP To Code

The trick is that the body is turned into a function that takes KEEP as a parameter. This defines the word KEEP for the body.

To see how this works, imagine this:

collector: lambda [keep [action?]] [
    keep 'foo:
    print "Arbitrary code possible"
    keep/line [1 2 3]
    keep spread [Spread #works @(T O O)]
    repeat 2 [keep <whatever>]
]

block: copy []
keeper: specialize :append [series: block]

collector :keeper

This code gets the desired result in BLOCK.

Slight Twist: Make KEEP Return Its Input

APPEND will return the block that you append to. This would reveal the partially-built temporary block before the collect is complete. A better and more useful result of KEEP would be to return the value that you pass it.

To accomplish that, we can ENCLOSE the specialization:

keeper: enclose (specialize :append [series: block]) func [f [frame!]] [
    let value: f.value
    do f
    return value
]

We have to capture the value to append before we DO the captured FRAME!, because Rebol functions are permitted to make arbitrary modifications to their arguments during execution. (To help avoid mistakes, you are not allowed to read a frame's values after a DO is complete.) It's possible to DO COPY F but that makes a copy of the entire frame, and here we just copy the value we want.

A more efficient way to do this is to use a LAMBDA for the wrapper function, and ELIDE the DO. There's no need to type check F (since ENCLOSE only passes the FRAME! built for APPEND, never anything else):

keeper: enclose (specialize :append [series: block]) lambda [f] [
    f.value
    elide do f  ; evaluates to anti-isotope of 0 length block, vanishes
]

Putting It Together

Wrapping this up for a working COLLECT implementation:

collect: func [
    return: [block!]
    body [block!]
][
    let block: copy []
    let keeper: enclose (specialize :append [series: block]) lambda [f] [
        f.value
        elide do f
    ]
    run (lambda [keep] body) :keeper
    return block
]

It's a good demonstration of how, with little effort, you can build something in Rebol that feels like a first-class language feature.

hostilefork · January 10, 2024, 6:43pm

I thought it might be interesting to compare how Ren-C does COLLECT to other implementations.

Rebol2

collect: func [
    body [block!]
    /into
    output [series!]
][
    unless output [output: make block! 16]
    do make function! [keep] copy/deep body make function! [value /only] copy/deep [
        output: either only [insert/only output :value] [insert output :value]
        :value
    ]
    either into [output] [head output]
]

MAKE FUNCTION! in Rebol2 was a low-level routine that did not make a copy of the body. If you're wondering why it's being used here instead of FUNC when it does a COPY/DEEP of the body... I think that just means it's misguided inlining (to avoid the overhead of calling FUNC).

Historical DO is variadic, and if passed a function will collect further args at the callsite. Ren-C makes DO arity-1, and only runs functions with complete frames. Variadic invocation is done with the distinct RUN operation.
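
A rough illustration of the difference, using ADD as a convenient stand-in (my choice, not something from the implementations here). The Ren-C line simply mirrors the RUN call in the COLLECT above, so treat it as a sketch rather than verified console output:

rebol2>> do :add 1 2
== 3

run :add 1 2  ; Ren-C: variadic invocation goes through RUN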

We see here a COLLECT/INTO feature that lets you do a COLLECT into an already existing series. These two statements would be equivalent:

collect/into [...] target
insert target collect [...]

There was a certain cabal of people who lobbied for adding /INTO operations to various functions in order to avoid the creation of intermediate series...which they believed was costly. Experience has borne out that the handling of /INTO generally made things slower. I was in the camp that never liked it, and called it "the /INTO virus". All instances of /INTO in Ren-C were dropped.

Due to the /INTO, the implementation is based on INSERT instead of APPEND, and has to update the intermediate block's insertion position as it goes.
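
A small console sketch of that bookkeeping (the variable name OUTPUT matches the code above; the values are made up). INSERT returns the position just past what it inserted, so reassigning OUTPUT each time keeps the items in order, and HEAD gets back to the start:

rebol2>> output: make block! 16
== []

rebol2>> output: insert output <1>
== []

rebol2>> output: insert output <2>
== []

rebol2>> head output
== [<1> <2>]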

There's no specialization in historical Rebol. So the KEEPER function being made here takes an /ONLY refinement, and then dispatches to one of two different calls to either INSERT or INSERT/ONLY.

R3-Alpha

collect: func [
    body [block!]
    /into
    output [series!]
][
    unless output [output: make block! 16]
    do func [keep] body func [value [any-type!] /only] [
        output: apply :insert [output :value none none only]
        :value
    ]
    either into [output] [head output]
]

The misguided inlining is gone, and it just uses FUNC.

Rather than having two calls to INSERT based on whether /ONLY is used, it uses the messy historical APPLY operator (see APPLY II: The Revenge for contrast).

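If you notice the NONE and NONE in the APPLY, they are there because historical APPLY is positional: they fill the slots for INSERT's /PART refinement and its argument, which come before /ONLY in the parameter order. APPEND and INSERT have refinements that aren't given to KEEP... /DUP and /PART.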

r3-alpha>> append/dup [a b c] <d> 3
== [a b c <d> <d> <d>]

r3-alpha>> append/part [a b c] [d e f g] 2
== [a b c d e]

By using specialization, Ren-C inherits all the refinements of APPEND. There's no /ONLY (since splicing is solved with isotopes), but there is a /LINE refinement (demonstrated above)... and /DUP and /PART are still around.

ren-c>> collect [keep/dup <d> 3]
== [<d> <d> <d>]

ren-c>> collect [keep/part spread [d e f g] 2]
== [d e]

Red

collect: func [
    body [block!] 
    /into
    collected [series!] 
    /local keep rule pos
][
    keep: func [v /only] [append/:only collected v v] 
    unless collected [collected: make block! 16] 
    parse body rule: [
        any [
             pos: ['keep | 'collected] (pos/1: bind pos/1 'keep)
             | any-string! | binary! | into rule | skip
        ]
    ] 
    do body 
    either into [collected] [head collected]
]

Okay... well... they did that differently.

It adds another keyword, COLLECTED, to give you access to the collection result as you go. That seems to me about as likely to cause accidents from people not knowing about the feature as it is to be useful. Needing access to the buffer as you collect it means you might as well name it, at which point COLLECT is saving you almost nothing... Ren-C makes it easy enough to specialize APPEND and call it EMIT, adding whatever other features you need.
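
A minimal sketch of that alternative, reusing the SPECIALIZE pattern from the KEEPER above. The names BUFFER and EMIT are just illustrative, and this is an outline of the idea rather than tested code:

buffer: copy []
emit: specialize :append [series: buffer]

emit 'foo
emit spread [a b]

; BUFFER is an ordinary named variable, so it can be inspected at any point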

The code block that you pass in has its bindings mutated directly, and every place in the system that does this has bothersome implications. (pos/1: bind pos/1 'keep) uses 'keep as a shorthand for naming a context, e.g. "bind the word KEEP or COLLECTED that we found in the block to whatever context this word 'keep is bound to". In this case, that means bind it to this function invocation.
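
For anyone unfamiliar with that idiom, here is a minimal sketch of BIND taking a word as its context argument. The function F and the variable X are invented for illustration. The X in the passed-in block is rebound to the function's local X, and because BIND works in place, it is the caller's block that gets modified:

f: func [code /local x] [
    x: 10
    do bind code 'x  ; rebinds X inside CODE to this function's local X
]

f [x + 1]  ; evaluates to 11 in Red or Rebol2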

Red has a weird ability to chain refinements that they're taking advantage of here with append/:only which saves them from doing an APPLY or branching to either APPEND or APPEND/ONLY calls. This only works if the variable name matches the refinement name. Strange feature. Ren-C can do this but via what I think is a more normal way, with append/(if only ['only]). In practice, having things like specialization, adaptation, enclosure, etc. turns out to be much higher-leverage (and faster in most cases).

Offhand I don't see how their /INTO can work correctly if they're using APPEND.

red>> data: [a b c]
== [a b c]

red>> collect/into [keep <1> keep <2> keep <3>] data
== [a b c <1> <2> <3>]

red>> data
== [a b c <1> <2> <3>]

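Okay, so, it doesn't work correctly: the items land at the tail of DATA instead of being inserted at its current position. Compare Rebol2, which inserts at the position it was given and returns the position just past the inserted items: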

rebol2>> data: [a b c]
== [a b c]

rebol2>> collect/into [keep <1> keep <2> keep <3>] data
== [a b c]

rebol2>> data
== [<1> <2> <3> a b c]