COLLECT-LINES: an adaptation story

hostilefork · December 6, 2018, 9:36pm

Here's an interesting cookbook recipe. Frequently, the goal of a COLLECT process is to collect a certain number of strings representing command lines, or something like that. Each line is represented by a block, but needs to be SPACED. This can be a bit annoying to have to say every time:

 collect [
     keep spaced [...]
     if condition [
         keep spaced [...]
     ]
 ]

What if you wanted to specialize COLLECT as COLLECT-LINES so it would do the SPACED automatically?

 collect-lines: adapt 'collect [
     body: compose [
         keep: adapt 'keep [value: spaced value]
         (as group! body)
     ]
 ]

So you're augmenting the body with a little bit of prelude code that adapts the keep. What's nice is that by using AS you don't need to deep copy that body; you're just aliasing it. This means that when COLLECT goes through and binds the augmented body to its KEEP, the little adapter in the prelude gets the same binding, so it's that KEEP which gets adapted.

There's a particular finesse in Ren-C here: an empty GROUP! in a stream of code won't synthesize anything. So if the group you put in with AS GROUP! is empty, it will act like it's not there.

 >> do [1 + 2 ()]
 == 3

That's actually pretty important when you're doing these kinds of code splicings, because you can really "opt out" of sections.
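
As a rough sketch of that "opt out", assuming COMPOSE inserts the aliased GROUP! as a single element the same way it does in the COLLECT-LINES definition (the names `extra` and `code` are made up for illustration):

```rebol
extra: []  ; hypothetical: optional code to splice in, empty in this case

; (as group! extra) aliases the empty block as a group, so this should
; compose to the same [1 + 2 ()] shown in the console example
code: compose [1 + 2 (as group! extra)]

do code  ; the empty group contributes nothing to the evaluation
```

If `extra` held real code, the same line would splice it in as a group, so the caller gets to decide whether that section exists at all.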

If you're going to do it this simply, you can eliminate some of the repetition with MY (which quotes a set word on the left, and injects the value as the first parameter of what comes next).

 collect-lines: adapt 'collect [
     body: compose [
         keep: my adapt [value: my spaced]
         (as group! body)
     ]
 ]
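
To make the MY shorthand concrete, here is a small sketch going only by the description above (the set-word on the left is quoted, and its current value is passed as the first argument of what follows). The variable `block` is just an illustration:

```rebol
block: [a b c]

; long form: the variable name appears on both sides
block: next block

; with MY: intended to mean the same as `block: next block`
block: my next
```

Read the same way, `keep: my adapt [...]` stands in for adapting the current KEEP, and `value: my spaced` stands in for `value: spaced value`.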

That's quick and dirty enough for casual usage. But this looks nice enough we might even want it in the box.
So thinking through a few edge cases...

Some of the refinements to KEEP don't make sense any more, like /ONLY and perhaps /PART (you'd be passing in a block and then specifying a /PART based on the text...probably not what you meant). You probably want each line to be on its own line when collected in the list, so /LINE should be true. Other refinements, like /DUP, can probably be left as-is.

So let's get rid of the refinements that don't make sense and set it up to default to a newline on each string in the collected block. Also, you need a TRY on the value because it might be null. That way SPACED sees a BLANK! instead of raising an error, and it returns a null in turn, so the null is preserved:

 collect-lines: adapt 'collect [
     body: compose [
         keep: adapt specialize 'keep [
             line: true | only: false | part: _
         ] [value: spaced try :value]
         (as group! body)
     ]
 ]

Now we have a nice little routine:

 >> collect-lines [
       keep ["How" "about" "this?"]
       keep case [
           1 = 2 [["Not" "Kept"]]
           3 = 4 [["This" "Neither"]]
       ]
       keep/dup ["Pretty" "cool" "eh?"] 2
    ]
== [
    "How about this?"
    "Pretty cool eh?"
    "Pretty cool eh?"
]

So... how difficult would that be to do in Rebol2/R3-Alpha/Red? And how likely are you to get it wrong while trying?

One issue to think about...

With the recent change where COLLECT only creates a block if you do a KEEP of some non-null material, there was a workaround: say keep [] as a no-op at the top of the collect body to get the block. That won't work here, since keep [] will add an empty string to the collected lines.

Off the top of my head:

Since COLLECT-LINES is not "full band" any more (it knows you don't want to collect a BLANK!), it might use BLANK! to be the "no-op, but means you kept something". So keep _ would execute the un-adapted keep [] internally, yielding the same effect.
COLLECT-LINES could just sneak in a keep [] before it does the specialization and always return a block, forgoing COLLECT's "null if no KEEPs" property.
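
A minimal sketch of that second option, untested, which only rearranges pieces already shown: the un-adapted keep [] goes into the prelude ahead of the line that redefines KEEP. It assumes the plain keep [] still counts as "keeping something" without adding an item:

```rebol
collect-lines: adapt 'collect [
    body: compose [
        keep []  ; un-adapted KEEP: force a block result, add no items
        keep: adapt specialize 'keep [
            line: true | only: false | part: _
        ] [value: spaced try :value]
        (as group! body)
    ]
]
```

The tradeoff is the one named above: callers can no longer tell "nothing was kept" apart from an empty result by checking for null.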

What's nice about actually doing these little experiments is you get to think about what pressures it puts on routines like SPACED. Increasingly I am of the opinion that SPACED of a TEXT! should just return that text--there's almost no case where enforcing that it's a BLOCK! has value.

hostilefork · December 7, 2018, 12:01am

I said this is a "cookbook" entry more than a feature discovery entry (even though I'm thinking COLLECT-LINES needs to be in the box).

But this actually leads to a rather interesting take on another historical problem: how to collect a single line where some of the components want to be spaced apart, but others want to be tight together.

Crickets and gentleman(s), I give you... COLLECT-TEXT (because calling it COLLECT-LINE would not be different enough from COLLECT-LINES to spot easily):

 collect-text: chain [
     adapt 'collect [
         body: compose/only [
             keep []  ; make sure no null return, an empty block at the least
             keep: adapt specialize 'keep [
                 line: false | only: false | part: false
             ] [value: unspaced try :value]
             (as group! body)
         ]
     ]
         |
     :spaced  ; if an empty block gets here, it becomes a null result
 ]

The individual bits you KEEP with single blocks will be tight together, with the final result being spaced out:

 >> warnings: ["crazy-spectre-thing" "ms-ships-buggy-headers"]

 >> command: collect-text [
        keep "$(REBOL)"
        for-each w warnings [keep ["-Wno-" w]]
    ]

== "$(REBOL) -Wno-crazy-spectre-thing -Wno-ms-ships-buggy-headers"

This seems like another nice tool to go in. I'll say that as hard as it is to have such a heavy amount of code in the make system be all Rebol, being able to use the language to attack it with new Ren-C features redeems it (this was why I endorsed it in the first place, even though it was very...big).

BlackATTR · December 7, 2018, 2:52pm

(Applause!) Very cool, and instructive. Anything which makes text munging more powerful and literate is a huge win for me.
Great stuff, Brian. Reading posts like this feels like:

hostilefork · December 9, 2018, 5:02pm

From my end, it usually feels like the holographic instruction manual from Invader Zim

"Why would you do all that?"
"Because it's Cool."

hostilefork · November 5, 2020, 1:22am

>> collect-lines [
      keep ["How" "about" "this?"]
      keep case [
          1 = 2 [["Not" "Kept"]]
          3 = 4 [["This" "Neither"]]
      ]
      keep/dup ["Pretty" "cool" "eh?"] 2
   ]
== [
    "How about this?"
    "Pretty cool eh?"
    "Pretty cool eh?"
]

Getting the fabric behind COLLECT-LINES working is important, and this has kept pushing on the state of the art.

But going beyond it, I think even better is POINTFREE. It's more general and keeps the definitions right in front of you.

POINTFREE can't do partial argument invocations like the following yet, but it should be able to eventually:

>> collect [
      keep: (<- keep unspaced)

      keep ["How" "about" "this?"]
      keep case [
          1 = 2 [["Not" "Kept"]]
          3 = 4 [["This" "Neither"]]
      ]
      keep/dup ["Pretty" "cool" "eh?"] 2
   ]

And I think that's probably the right way to go with this.

With comma as the new experiment for expression barriers, I'm not sure how I feel about:

      keep: <- keep unspaced,

That comma flies a little under the radar. Maybe not so bad with a space.

      keep: <- keep unspaced ,

Anyway that could be an alternate option.

hostilefork · October 3, 2021, 9:07am

I was looking at a line of code with COLLECT-TEXT and didn't really remember what it did, so I had to look it up.

This supports my argument that really, we're better off letting people define these specializations on an as-needed basis. Also, that lets you give the keep operation the name you want:

collect [
    keep: (<- keep/line spaced)  ; if you don't think name reuse is confusing
    ...
]

collect [
    keepline: (<- keep/line spaced)  ; if you only want it a little briefer
    ...
]

collect [
    keepL: (<- keep/line spaced)  ; if you're this kind of person
    ...
]

Having these examples has certainly been good for testing, so I do want to thank these specializations for their service.

But I think it's good enough to just put COLLECT-TEXT and COLLECT-LINES in the test suite. Having them in the mezzanine is just "that much more stuff" that needs documentation and for people reading the (few) callsites to trip over. I think the case-by-case specializations are clearer.