Efficient Consuming Append-Like Operator... GLOM ?

hostilefork · August 5, 2021, 9:36am

So I'm looking at the question of how to take bits of material that may or may not be produced, that become owned by the recipient...and accumulate them.

For efficiency, I'd like an operator that will be null tolerant. Like this:

>> accumulator: null

>> result: make-some-stuff arg1 arg2 arg3
== ~null~  ; isotope (let's say it produced nothing)

>> accumulator: glom accumulator result
== ~null~  ; isotope (okay, that's a no-op)

>> result: make-more-stuff arg4 arg5 arg6
== [a b c]

>> accumulator: glom accumulator result
== [a b c]  ; since there was nothing in the accumulator, we take it

>> append accumulator [d]
== [a b c d]

>> result
== [a b c d]  ; it's okay to reuse result, protocol was we "own" it

This kind of operator is handy when synthesizing results up the stack. Because it has carte-blanche to tear up the series its given, it could notice when the result array was larger than the accumulator and steal its memory to slide results into...rather than do a new allocation.

So after a GLOM the thing being appended (result above) will either be equivalent to the accumulator or it will appear to have been FREE'd.

The specific feature I'm looking to support with this is the "third result" I've been talking about for combinators, of their "pending" items...this would include collected things. I don't want to have to be making new empty blocks with copy [] just in case combinators want to add to the collection something...they may just come back with blanks. So if the accumulator could start off as blank that would avoid these stray series creations most of the time.

I don't think allowing you to JOIN to a NULL and get a new series is a good idea, and it wouldn't be able to do the other optimizations either (since JOIN does not assume you are giving it ownership of the appended data to destroy as you wish). So this is a distinct operator, and calling it GLOM is probably fine for now. But thought I'd describe what it did just to put that out there.

Here is a usermode prototype:

glom: func [
    return: "New accumulator value (may be blank or reused)"
        [blank! block!]
    accumulator [blank! block!]
    result
][
    if accumulator [
        append accumulator result
        elide if block? result [free result]  ; placeholder for stealing memory
    ] else [
        case [
            null? result [null]
            block? result [result]
            (append copy [] :result)
        ]
    ]
]

Note that I'm using CASE fallout above, where if you don't pair up the clauses in a case it will just evaluate to the last condition:
>> case [
      1 = 2 [print "math is broken"]
      10 + 20
   ]
== 30
Rebol2 and R3-Alpha report #[true] for that, while Red says #[none]. The only answers I'd consider for this would be 30 or raise an error. I definitely like just being able to write a FAIL in that spot without having to say true [fail "..."], and I guess that "laxness" makes me feel it's a nice place to put a default as well. This is still an open question.

hostilefork · August 5, 2021, 10:57pm

It occurs to me that GLOM could be even more efficient if it was willing to let QUOTED! items signify a single item. :-/

>> accumulator: null

>> result: ...
== 'x

>> accumulator: glom accumulator result
== 'x

>> result: ...
== 'y

>> accumulator: glom accumulator result
== [x y]

This starts pointing to a rather strange universe in which something like a FOR-EACH on a QUOTED! would treat it as a container of one item.

>> result: the 'z
== 'z

>> for-each item result [print mold item]
z

We're already treating QUOTED! as a container of one item in things like APPEND. So it's not necessarily that crazy an idea.

But it's definitely a sacrifice of clarity for efficiency. :-/ On the back burner for now, but something to think about to further cut down on series allocations...since quote bits (up to 3 quote levels) are compressed into cells themselves.