I was trying to do something that came out a little awkward, and wondered if there was a better way.

Imagine you have data like:

[foo: "a" bar: "b" | foo: "c" bar: "d"]

And you want to write a UPARSE rule with COLLECT to get:

[[foo: "a" bar: "b"] [foo: "c" bar: "d"]]

So you're recognizing specific SET-WORD!s, each of which you expect to be followed by a TEXT!.

What I did was a bit awkward:

>> data: [foo: "a" bar: "b" | foo: "c" bar: "d"]

>> uparse data [collect while further [keep ^ collect [
       [keep [^ 'foo:], keep text!]
       [keep [^ 'bar:], keep text!]
       ['| | <end>]
   ]]]
== [[foo: "a" bar: "b"] [foo: "c" bar: "d"]]

If you don't like the ^ then use ONLY. This version also demonstrates a few points:

>> uparse data [collect while further keep only collect [
       [keep [only the foo:] keep text!]
       [keep only the bar:, keep text!]
       ['| | <end>]
   ]]
== [[foo: "a" bar: "b"] [foo: "c" bar: "d"]]

  • You can use THE if you don't like the quote marks.

  • We're in a world where only 3 evaluates to [3], so it's not like "KEEP ONLY" is a keyword. This means you can move things around... keep [only the foo:] is just as valid as having the ONLY outside the block.

  • You can play around with the dynamics of BLOCK!s and COMMA!s. Sometimes the presence of a BLOCK! makes a comma unnecessary.

    • Sidenote: I actually think we should strongly discourage the idea of commas having semantic meaning in dialects. The status quo should be that they just provide visual separation...and the only impact adding them should have would be an error if they're not put in an "interstitial" position.

What Would Make This Better?

Readability is helped by breaking things into parts, of course. Maybe people would also like <end> stop better than having to think about FURTHER. That is, just stop iterating when you reach the end, rather than making sure each iteration makes progress:

foobar-rule: [collect [
    [keep only ['foo:] keep text!]
    [keep only ['bar:] keep text!]
]]

uparse data [collect while [
    keep only foobar-rule
    ['| | <end> stop]
]]

But what was actually bugging me a little bit was having to write two KEEPs in order to pick up the SET-WORD! and the accompanying TEXT!. We know that in an ordinary COLLECT you can just KEEP a block:

>> collect [keep [foo: "a"]]
== [foo: "a"]

Is there any way to match the pattern foo: "a" and KEEP it in one step?


What I wanted was something like REDUCE, which would give back the results of all the items individually instead of just the last one:

>> uparse [1 2] [integer! integer!]  ; typical case, last result fallout
== 2

>> uparse [1 2] [reduce [integer! integer!]]  ; REDUCE keeps processed items
== [1 2]

UPARSE has alternate meanings for WHILE, and ANY, and many other things. Does it make sense to call this REDUCE?

Such a feature would let the FOOBAR-RULE be cleaned up a bit:

foobar-rule: [collect [  ; old way
    [keep only ['foo:] keep text!]
    [keep only ['bar:] keep text!]
]]


foobar-rule: [collect [  ; new way
    keep reduce @['foo:, text!]
    keep reduce @['bar:, text!]
]]

Note that the REDUCE combinator would need to take its block as an @[...] or become a "quoting combinator". This is because if it were left as a plain BLOCK! combinator, then REDUCE would not have access to the individual pieces of the block...only the advancement and the result.
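To make that distinction concrete outside of UPARSE itself, here is a toy model in Python. It is only a sketch and not how UPARSE is implemented: the names of_type, sequence, reduce_rule and parse are made up for illustration. A "rule" here is just a function from (items, pos) to (result, new pos), or None on failure. The point is that a REDUCE-like combinator can only gather every result if it is handed the sub-rules one by one. If it is handed an already-combined block rule, all it sees is the final result and the advancement.

```python
from typing import Any, Callable, Optional, Tuple

# Toy model: a rule takes (items, pos) and returns (result, new_pos),
# or None if it does not match. All names here are hypothetical.
Rule = Callable[[list, int], Optional[Tuple[Any, int]]]

def of_type(t: type) -> Rule:
    """Match one item of the given type; the result is the item itself."""
    def rule(items, pos):
        if pos < len(items) and isinstance(items[pos], t):
            return items[pos], pos + 1
        return None
    return rule

def sequence(*rules: Rule) -> Rule:
    """Plain block: run rules in order; the result is the last rule's result."""
    def rule(items, pos):
        result = None
        for r in rules:
            out = r(items, pos)
            if out is None:
                return None
            result, pos = out
        return result, pos
    return rule

def reduce_rule(*rules: Rule) -> Rule:
    """REDUCE-like: same matching, but the result is a list of every sub-result."""
    def rule(items, pos):
        results = []
        for r in rules:
            out = r(items, pos)
            if out is None:
                return None
            value, pos = out
            results.append(value)
        return results, pos
    return rule

def parse(items: list, rule: Rule) -> Any:
    """Run a rule from the start; succeed only if it consumes all the input."""
    out = rule(items, 0)
    if out is None or out[1] != len(items):
        return None
    return out[0]

integer = of_type(int)

print(parse([1, 2], sequence(integer, integer)))               # 2
print(parse([1, 2], reduce_rule(integer, integer)))            # [1, 2]
print(parse([1, 2], reduce_rule(sequence(integer, integer))))  # [2]
```

The last line is the situation described above: once the two items have been folded into an ordinary block rule, the REDUCE-like combinator has nothing left to pick apart, so it can only report the one result it was given.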

But, maybe you could do it with GET-BLOCK!:

foobar-rule: [collect [
    keep :['foo:, text!]
    keep :['bar:, text!]
]]

This idea of combinators that break the rules on the meaning of BLOCK! is a heavy thing to think about. There's some semantically tricky stuff and some technically tricky stuff involved.

But one way or another, I think we need this. Thoughts?