Stopping the /INTO Virus

hostilefork · June 23, 2018, 12:04pm

My confidence of /INTO being bad means the deed is done and /INTO is gone.

Though I had one little moment of pause when noticing that COLLECT/INTO was able to KEEP into a string:

>> collect/into [keep 'A keep [B C]] x: copy ""
== "ABC"

What it's doing is basically equivalent to the following:

>> x: to text! collect [keep 'A keep [B C]]
== "ABC"

The difference is, the /INTO version "folds" any data it gets into the string as it goes. Using TO TEXT! after-the-fact means that if you do thousands of KEEPs, you'll wind up with a collected BLOCK! that's thousands of elements long...that can't be GC'd as you go. COLLECT/INTO would contribute material to the string one element at a time with no intermediate block.

Having a string target for /INTO when the source material is a block is thus materially different. When the target is a block anyway, you wind up gathering the same amount of total state...and the GC will clean up any intermediates. But targeting a block instead of folding a string means you might run up against an operation your system had enough memory to do with the fold, but not with the block.

Still...COLLECT/INTO an ANY-STRING! is a false economy

The folding nature of /INTO might make it sound good on paper. But if we look at the big picture, we can see why it's not very compelling. I'm going to write this all out because I want there to be no doubt that exterminating it was the right move.

You only get the TO TEXT! semantics of the underlying INSERT. There are a lot of ways to turn blocks into text, or to fold strings together. How often is what INSERT would do exactly what you wanted?

Most of the time I've used COLLECT with strings I find myself writing spaced collect [...]. But SPACED treats single characters differently from strings. It assumes you don't want them to participate in the delimiting but want them treated as-is:

>> spaced ["a" "b" #"c" "d"]
== "a bcd"

This is actually very important, because it means newlines don't get delimited since they are a CHAR!. You see weird behaviors on that in R3-Alpha/Rebol2/Red:

red> form reduce ["a" newline "b"]
== "a ^/b"

If a COLLECT/INTO is just folding text and throwing the material it has away, it can't do such subtleties.

You're losing the power of the evaluator. KEEP itself doesn't evaluate, so once something is folded into a string...you don't have any bindings or anything to evaluate anymore. That's lame. One thing about KEEP is that you don't have your hands tied with it... you can say things like:

x: unspaced collect [
   for-each [flag string] data [
       ...
       if flag [keep 'reverse]
       keep string
       ...
    ]
]

But if you folded it as you went, you'd wind up needing some nested collect:

x: collect/into [
   for-each [flag string] data [
       ...
       keep reduce collect [
           if flag [keep 'reverse]
           keep string
       ]
       ...
    ]
] result: copy ""

The second is less clear than the first. And as I said in the first point, it's not like UNSPACED is even the logic you'll be wanting.

The KEEP result doesn't return what you added and may not be that interesting. You get what was passed to keep passed through just like when you're adding to a block:

 >> collect/into [print ["KEEP returned:" mold keep [A B C]]] result: copy ""
KEEP returned: [A B C]
== "ABC"

It's just another axis of "your application might have different ideas of what's useful"

If you actually have an issue with scale, your problem is probably complicated enough you'd prefer your own emitter. Stressing the point again: there are tons of ways one might want to make strings from blocks, and the odds COLLECT/INTO picked the one matching your scenario are slim. Probably even slimmer if you're dealing with some gigantic bunch of data that requires folding as you go.

COLLECT is deliberately not a hard function to write (and it's even less so without /INTO complicating it!) You can roll your own emitter very easily:

result: copy ""
emit: specialize 'append [series: result]

Now your EMIT has /ONLY and /DUP and /LINE and /PART. That was quick. But what if we wanted to have a different return than result that we've accumulated so far (which APPEND returns). How about an ENCLOSE?

result: copy ""
emit: enclose (specialize 'append [series: result]) func [f [frame!]] [
    f/value: to-text f/value ;-- will be return result due to ELIDE
    elide do f ;-- runs the append (F/VALUE unavailable after DO F)
]

So now your emit [A B C] gives you "ABC" as a result, the part of the string it added. Sky's the limit.

And that's why you shouldn't shed a tear for the folding /INTO

Summarized simply:

It will always be clearer and more powerful to to express a string-targeting COLLECT/INTO as a block-targeting collect with operations that then process that accumulated block.

Thus, the only reason you would have ever favored a string-targeting COLLECT/INTO would be if your problem was one of performance or scale such that holding that intermediate block (and elements it holds) would be excessive. But anyone in that situation probably has complex needs that the limited COLLECT/INTO behavior wouldn't meet anyway!

So it's better to have people learn how to use the powerful tools for designing COLLECT-like abstractions themselves and keep COLLECT simple.

So you shouldn't miss it, nor should you be missing the following bugs, all of which are now "resolved" in Ren-C:

#2081 Make REDUCE/INTO and COMPOSE/INTO work when targeting any-string
#2061 REDUCE/into of non-block doesn't insert into target
#2062 COMPOSE/into of non-block doesn't insert into target
#1748 REDUCE/into and COMPOSE/into bypass PROTECT #1748
#620 EXTRACT /into buffer option #620
#623 READ /into buffer option. #623
#709 MAP-EACH /into buffer option
#621 Change COLLECT /into option to use insert semantics

And you can see why I call it a virus. It's not adding functionality, but all of a sudden here it is being tacked on to everything and anything.