Variant Of "COLLECT" Without "KEEP"

no-e-in · January 11, 2024, 10:05am

I recently discovered a version of COLLECT by Brett Handley (for R2) that is less general than what COLLECT has come to be known as, but avoids nested code.

collect: func [
    {Collects block evaluations, use as body in For, Repeat, etc.}
    block [block!] "Block to evaluate."
    /initial result [series! datatype!] "Initialise the result."
    /only "Inserts into result using Only refinement."
] [
    if not initial [result: block!]
    result: any [all [datatype? result make result 1000] result]
    reduce ['head pick [insert insert/only] not only 'tail result to paren! block]
]

Examples (running in Rebol2 interpreter):

>> for i 1 10 2 collect [i * 10]
== [10 30 50 70 90]
    
>> foreach [a b] [1 2 3 4] collect [a + b]
== [3 7]

>> foreach w [a b c d] collect [w]
== [a b c d]

>> repeat e [a b c %.txt] collect/initial [e] %file
== %fileabc.txt

>> iota: func [n [integer!]][repeat i n collect/initial [i] make block! n]
>> iota 10
== [1 2 3 4 5 6 7 8 9 10]
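
The /ONLY refinement isn't exercised in the examples above. Here is a small sketch of the difference, assuming the COLLECT definition above is loaded in a Rebol2 console. Without /ONLY, a block result gets spliced by INSERT; with it, the block goes in as a single item:

```rebol
>> foreach [a b] [1 2 3 4] collect [reduce [a b]]         ; plain INSERT splices each pair
== [1 2 3 4]

>> foreach [a b] [1 2 3 4] collect/only [reduce [a b]]    ; INSERT/ONLY keeps each pair as a block
== [[1 2] [3 4]]
```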

hostilefork · January 11, 2024, 10:36am

Interesting idea... I'm sure @Brett can offer some historical perspective, and whether this was a precursor to COLLECT as we know it, or some other experiment. Doing a HEAD of an insertion at a TAIL suggests it's likely quite old... perhaps before APPEND existed (or before it had been standardized to return the head).

It builds a block of code that references a result block it creates, then returns that code.
The code is used as a loop body: every time it runs, it adds the evaluated result to the block, then evaluates to the head of the collected block.
It then takes advantage of the fact that loops evaluate to their last body result to give the answer.
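
To make that concrete, this is what the generated loop body should look like if you inspect it directly rather than handing it to a loop. It's a sketch of what I'd expect at a Rebol2 console with Brett's definition loaded; the name body is only for illustration:

```rebol
>> body: collect [i * 10]         ; no loop involved yet, just build the body
== [head insert tail [] (i * 10)]

>> body: collect/only [i * 10]    ; /only swaps in INSERT/ONLY
== [head insert/only tail [] (i * 10)]
```

The [] embedded in that code is the result series itself, which is why each evaluation of the body can add to it and hand back its head.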

One thing it can't do is control the result when the loop never runs the body... so you'll always get none, vs. an empty block:

rebol2>> data: [a b]
rebol2>> foreach w data collect [w]
== [a b]

rebol2>> data: []
rebol2>> foreach w data collect [w]
== none

It also can't control the result when a loop breaks (appears to be an UNSET!, in Rebol2):

rebol2>> data: [a b]
rebol2>> unset? foreach w data collect [if w = 'b [break] w]
== true

So these are some contrasts with what you'd get from today's COLLECT, which sits outside the loop with the KEEP inside.

Ren-C Notes

In Ren-C, loops that don't run their body at all return VOID... and loops that BREAK return the reserved value of a ~null~ antiform.

Also, some Ren-C loops started accepting functions, and passing them the loop variable value:

>> repeat 3 (lambda [x] [print [x]])
1
2
3

It's not fully baked, but was inspired by conditionals accepting functions as branches, and passing the condition:

>> if second [a b c] (lambda [x] [print [x]])
b

Brett · January 11, 2024, 8:47pm

My version is old. Append was still a mezzanine at the time, so this formulation was significantly faster than using Append.

Coming from standard imperative languages, Rebol was interesting and exciting and I'd noticed I was writing a lot of boring boilerplate code to collect results into a block from a variety of loop types. It was cool to realise I could manipulate the loop body, which in another language you'd expect was a given, bending it to fulfill a new intent and this would work for multiple types of loop function.

Not too long after I shared Collect, a wrapper function with Keep spontaneously appeared sporting the same name. I think there were a couple of tries of Keep versions by different people and ultimately the community pushed for one to be adopted into the Rebol distribution.

For myself, I never ultimately used Collect much in scripts. Using Collect as a wrapper function meant indenting the code, forcing the keyword Keep into code that might otherwise be reusable, and wearing the performance penalty of doing so. For code with a single Keep, that seemed excessive for the boilerplate it avoided. Being able to have multiple Keeps in code is a feature, but such code seemed inelegant in some way. I let mine go, because doing something different than the more popular conception would be confusing in published code.

hostilefork · January 12, 2024, 10:14pm

Where things have been headed is to offer a MAP operation combined with generators.

>> gen: each [1 2]
== ~#[frame! "gen" []]~  ; anti

>> gen
== 1

>> gen
== 2

>> gen
== ~null~  ; anti

>> map x each [1 2] [x * 10]
== [10 20]

>> for x each [1 2] [print ["x is" x], x * 10]
x is 1
x is 2
== 20

When you combine this with GENERATOR and YIELDER (work in progress) it gives even more options.

You'd get something analogous to COLLECT and KEEP with:

>> map y generator [
       for x each [1 2] [yield x * 10] yield [a b] yield spread [d e]
   ] [y]
== [10 20 [a b] d e]

But it's not the same approach (e.g. there's no YIELD/LINE or YIELD/DUP). The pattern of implementing COLLECT+KEEP is applicable to different kinds of problems.

hostilefork · January 20, 2024, 5:26am

I was writing a PARSE example up with two COLLECTs in it, and I messed both of them up at first...

I expected "collect some integer!" to work, instead of "collect some keep integer!".
I wrote "collect keep some gather" instead of "collect some keep gather".

So I can see the appeal of a KEEP-less COLLECT. But it's tough to implement cleanly; I don't think putting the COLLECT underneath an iteration and having it sneakily retain memory across those iterations is practical (though clever).

One line of attack in PARSE would be a construct that implies both iteration and collection together, so you could say something like "collect-some-keep integer!". That doesn't give you a way to express tolerance of empty collections, though; you'd need "collect-try-some-keep" if you wanted that.

Maybe having a thing and calling it ACCUMULATE would be useful?

>> parse [hello 1 2 3] [let w: word! (print [w]), accumulate integer!]
hello
== [1 2 3]

It could have the at-least-one semantic, and then you could TRY ACCUMULATE and get a NULL if there weren't any (as opposed to an empty block). Then maybe you resort to COLLECT if you always wanted a block. Or vice-versa.

Maybe a variant like ACCUMULATE* could give back NULL if there are no YIELDs, e.g. the function it calls just returns NULL. (There's a COLLECT* that does this if there are no KEEPs.)

>> collect* [keep 1 keep 2]
== [1 2]

>> collect* [print "No keeps!"]
No keeps!
== ~null~  ; anti

A non-combinator ACCUMULATE could be applied to generators, as above:

>> accumulate generator [
      for x each [1 2] [yield x * 10] yield [a b] yield spread [d e]
   ]
== [10 20 [a b] d e]

If you passed ACCUMULATE a BLOCK! it could assume you wanted that block to be a generator:

>> accumulate [
      for x each [1 2] [yield x * 10] yield [a b] yield spread [d e]
   ]
== [10 20 [a b] d e]

Again, that's close to COLLECT and KEEP, minus the ability to KEEP/LINE or KEEP/PART or KEEP/DUP.

hostilefork · February 22, 2024, 9:20am

I've realized this is actually extremely useful. Ren-C's ability to vaporize things with ELIDE makes it even more useful, since making the result you want drop out of an expression is even easier without a temporary variable.

>> parse ["a" a <a> "b" b <b>] [accumulate [text! word! elide tag!]]
== [a b]

So I've added it!

accumulate: combinator [
    return: "Block of accumulated values"
        [block!]
    parser [action?]
    <local> collected
][
    collected: copy []
    remainder: input
    cycle [
        append collected (
            [@ remainder]: parser remainder except e -> [
                return collected
            ]
        )
    ]
]

It just loops through calling the "combinated" parser function it receives, until there's a match failure. Then it returns the block it collected. (Ultimately I decided that an empty block is probably what you want vs a null if there are no matches, but a variation could be easily done...as you can see.)

UPARSE extensibility for the win yet again.