The Cool New Repurposing of ANY In PARSE

hostilefork · August 14, 2021, 6:35pm

It's a bit of a pain to collect alternate rules. For instance:

alternates: copy []

rules: [[some integer!] [3 text!]]

for-each rule rules [
    append alternates compose [(rule) |]
]

parse data [alternates]

That will give you alternates as [[some integer!] | [3 text!] |]

But that rule will always succeed. If both of the component rules fail to match, it acts as a no-op, because it's equivalent to [[some integer!] | [3 text!] | []], and [] always succeeds.

You get a similar problem if you go the other way.

for-each rule rules [
    append alternates compose [| (rule)]
]

Now you've got a rule that is always a no-op: [| [some integer!] | [3 text!]]. Again, this is equivalent to [[] | [some integer!] | [3 text!]], and this time the [] succeeds before the other rules get a chance.

You can hack around this by starting out with alternates: [false]. This way, you can add the [| (rule)] and it will never run the false. So it works.
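To make the three cases concrete, here's a rough Python model (not Ren-C, and not how PARSE is implemented). It assumes a block is split on | into alternatives and that an empty alternative always succeeds. The names split_alternates, matches, and never are made up for illustration, and each "rule" is just a predicate on a single value.

```python
# Toy model of alternates in a rule block. Not PARSE itself: rules are
# plain predicates on a single value, and "|" is a separator token.

BAR = "|"

def split_alternates(block):
    """Split a flat block on BAR into a list of alternatives."""
    alternatives = [[]]
    for item in block:
        if item == BAR:
            alternatives.append([])
        else:
            alternatives[-1].append(item)
    return alternatives

def matches(block, value):
    """Return the index of the first alternative accepting value, else None.

    An empty alternative is modeled as always succeeding, like [] in PARSE.
    """
    for index, alternative in enumerate(split_alternates(block)):
        if all(rule(value) for rule in alternative):
            return index
    return None

def is_int(value):
    return isinstance(value, int)

def is_text(value):
    return isinstance(value, str)

def never(value):
    # Stand-in for the `false` rule: it can never match.
    return False

trailing = [is_int, BAR, is_text, BAR]        # built with [(rule) |]
leading = [BAR, is_int, BAR, is_text]         # built with [| (rule)]
seeded = [never, BAR, is_int, BAR, is_text]   # started from [false]

print(matches(trailing, 3.5))  # the empty last alternative accepts anything
print(matches(leading, 10))    # the empty first alternative wins immediately
print(matches(seeded, 3.5))    # no alternative matches
print(matches(seeded, "hi"))   # the text alternative matches
```

In this model the trailing and leading forms both report a match on the empty alternative, while the seeded form only matches when one of the real rules does.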

Wouldn't a New Meaning for the ANY Combinator be Better?

Having reclaimed ANY it seems it would be perfect for this. Why not:

rules: [[some integer!] [3 text!]]
parse data [any (rules)]

You could leave your block in its regular old form, and use it that way. Dyn-o-mite!

Subtlety: A Different Meaning For BLOCK!s

When you use ANY with a literal block, there's something a bit unusual going on:

 >> parse [1 "two" <three>] [some any [integer! text! tag!]]
 == <three>

Ordinarily, a block like [integer! text! tag!] would expect to see three things in sequence. But in the context of ANY, the rules change: it's taking its parameter not as a rule to run through the BLOCK! combinator, but as a BLOCK! by value.

At first I thought this was bad, and we'd have to enforce taking the blocks by value, e.g. as the product of a GROUP!:

 >> parse [1 "two" <three>] [some any ([integer! text! tag!])]
 == <three>

...or perhaps there'd be some non-pure-BLOCK! way of helping to reassure people that the block wasn't going to be processed by PARSE's default BLOCK! combinator:

 >> parse [1 "two" <three>] [some any @[integer! text! tag!]]
 == <three>

But we're adults, here. And we're using a language whose whole concept is context-dependent meaning. If you can't cope with the interpretation of blocks changing depending on what you pass the block to, you're in the wrong place.

However, do note that under today's logic, fetching blocks through a WORD! reference will still run them through the BLOCK! combinator. Hence you can't say:

rules: [[some integer!] [3 text!]]
parse data [any rules]  ; nope, you have to say [any (rules)]

So you need a group there, for now. I don't have an offhand proof of why a variable fetch in that context should run the block combinator--maybe it shouldn't? It's something to think about.
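For anyone who wants the two readings of a block side by side, here's a small Python sketch (a toy model, not Ren-C's implementation). The helpers as_sequence, as_any, and some are invented names, and tags are faked as strings starting with "<". The data is shuffled relative to the example above so that the two readings give different answers.

```python
# Toy model: the same list of rules read two ways. Not Ren-C's implementation.

def is_integer(value):
    return isinstance(value, int)

def is_text(value):
    return isinstance(value, str) and not value.startswith("<")

def is_tag(value):
    # Tags are modeled as strings like "<three>" purely for illustration.
    return isinstance(value, str) and value.startswith("<")

def as_sequence(rules, data, pos):
    """Default block reading: every rule must match, one after another."""
    for rule in rules:
        if pos >= len(data) or not rule(data[pos]):
            return None
        pos += 1
    return pos

def as_any(rules, data, pos):
    """ANY reading: the first rule that matches the current item wins."""
    if pos >= len(data):
        return None
    for rule in rules:
        if rule(data[pos]):
            return pos + 1
    return None

def some(step, rules, data, pos=0):
    """Apply step one or more times; return the final position or None."""
    pos = step(rules, data, pos)
    if pos is None:
        return None
    while True:
        next_pos = step(rules, data, pos)
        if next_pos is None:
            return pos
        pos = next_pos

rules = [is_integer, is_text, is_tag]
data = ["two", 1, "<three>"]  # same items as the example, shuffled

print(some(as_any, rules, data))       # 3: all three items consumed
print(some(as_sequence, rules, data))  # None: first item is not an integer
```

With the items in their original order both readings would succeed, which is part of why the by-value behavior is easy to miss.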

But long story short: Cool Feature, Use It!

BlackATTR · August 14, 2021, 9:09pm

This is a good one.

hostilefork · August 15, 2021, 5:55am

...indeed...

...What a Success Story ANY Turned Out To Be!!

The first place I thought to try it out was in REWORD, because I remembered it created a list of alternate patterns for what it was going to replace. So if you say:

 reword/escape "a(thing)b(thing2)c" [thing "ALPHA" "thing2" "BETA"] ["(" ")"]

A simplified concept of the rules it built looked like:

keyword-suffix-rules: [
    false
       |
    "thing" ")" (key: 'thing)
       |
    "thing2" ")" (key: "thing2")
]

match-rule: ["(" keyword-suffix-rules]

(If you're wondering why it's repeating the suffix ")" in each rule: the problem with putting it outside would be that "thing" would match "thing2" and then jump outside the rule block and see ")". It would then be too late to consider thing2.)
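The prefix problem in that parenthetical can be sketched in Python (a toy model of ordered alternates over a string, not REWORD's actual code). The function names suffix_outside and suffix_inside are hypothetical, and the sketch assumes that once an alternate has matched there is no going back to try a later one.

```python
# Toy model of ordered alternates over a string. Not REWORD's actual code.

def match_literal(text, pos, literal):
    """Return the position after literal if it occurs at pos, else None."""
    if text.startswith(literal, pos):
        return pos + len(literal)
    return None

def suffix_outside(text, pos, keywords, suffix):
    """Pick the first keyword that matches, then require the suffix."""
    for keyword in keywords:
        after_keyword = match_literal(text, pos, keyword)
        if after_keyword is not None:
            # Committed to this keyword: no going back to try the others.
            if match_literal(text, after_keyword, suffix) is None:
                return None
            return keyword
    return None

def suffix_inside(text, pos, keywords, suffix):
    """Each alternate is keyword plus suffix, so a miss tries the next one."""
    for keyword in keywords:
        after_keyword = match_literal(text, pos, keyword)
        if after_keyword is None:
            continue
        if match_literal(text, after_keyword, suffix) is not None:
            return keyword
    return None

keywords = ["thing", "thing2"]
text = "thing2)"

print(suffix_outside(text, 0, keywords, ")"))  # None
print(suffix_inside(text, 0, keywords, ")"))   # thing2
```

With the suffix outside, "thing" wins, the next character is "2" rather than ")", and the match fails. With the suffix inside each alternate, the failed "thing" + ")" simply falls through to "thing2" + ")".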

So here we see that false trick from before. We can get rid of it, and the |, AND use UPARSE rule synthesis to get the key: on the outside!

keyword-suffix-rules: [
    ["thing" ")" ('thing)]
    ["thing2" ")" ("thing2")]
]

match-rule: ["(", key: any (keyword-suffix-rules)]

That's Nothing Compared To What It Did To Whitespace!

Continuing to look at the "generated list of alternates" scenario, it's a HUGE win for the Whitespace interpreter, which generates two levels of selection... one at the category level, and then for the instructions in the category:

any ([[
    [space]
    collect any [
        [keep just push space keep Number]
        [keep just duplicate-top lf space]
        [keep just duplicate-indexed tab space keep Number]
        [keep just swap-top-2 lf tab]
        [keep just discard-top lf lf]
        [keep just slide-n-values tab lf keep Number]
    ]
] [
    [tab space]
    collect any [
        [keep just add space space]
        ...
    ]
    ...
]])

That's the stuff! No sign of those pesky | and the weird edge cases they introduce.
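The two-level selection can be mimicked in Python as a pair of first-match loops (a toy sketch, not the interpreter's code). It uses only the encodings quoted above, leaves out the Number arguments, and assumes that a category whose instructions all fail lets the next category be tried. CATEGORIES and decode_one are invented names.

```python
# Toy two-level first-match dispatch. Only the encodings quoted above are
# used; Number arguments are left out to keep the sketch short.

SPACE, TAB, LF = " ", "\t", "\n"

CATEGORIES = [
    (SPACE, [
        ("push", SPACE),
        ("duplicate-top", LF + SPACE),
        ("duplicate-indexed", TAB + SPACE),
        ("swap-top-2", LF + TAB),
        ("discard-top", LF + LF),
        ("slide-n-values", TAB + LF),
    ]),
    (TAB + SPACE, [
        ("add", SPACE + SPACE),
    ]),
]

def decode_one(source, pos=0):
    """Return (instruction name, next position), or None if nothing matches."""
    for prefix, instructions in CATEGORIES:      # outer level: category
        if not source.startswith(prefix, pos):
            continue
        after_prefix = pos + len(prefix)
        for name, code in instructions:          # inner level: instruction
            if source.startswith(code, after_prefix):
                return name, after_prefix + len(code)
        # Assumption: if no instruction matched, fall through to the next
        # category, the way a failed alternate lets the next one run.
    return None

print(decode_one(SPACE + LF + LF))             # ('discard-top', 3)
print(decode_one(TAB + SPACE + SPACE + SPACE)) # ('add', 4)
print(decode_one(LF))                          # None
```

Neither level needs a separator between entries; each list is just tried in order.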

But There Was A Surprise Application For ANY...

What if you feel the | are not just wrong for your generated expressions, but not right for your source expressions either?

Let's say you're writing something with this kind of pattern:

uparse data [while [not <end>] [
    ;
    ; Here are some comments about what we're doing here
    ; They might be long
    ;
    some "a", between "(" ")", '<whatever>
|
    ; It's not totally clear where to put that |.  Some people would put it
    ; aligned in the same column as the SOME, others would put it one
    ; indent level deeper.  But it's a non-sequitur
    ;
    x: [integer! | text!]
    keep (:[{Writing random code for an example is a pain} x])
|
    ; This could go on for a number of alternates, let's stop here.
    ; By the way, did you know there's a FAIL combinator now?  :-)
    ; It will set the `near` of the error message to the input position.
    ;
    fail "Tired of making things up, want to finish post"
]]

This can be an application for ANY too! Maybe you'd prefer to cluster your code in blocks and leave out the distracting |...

uparse data [while [not <end>] any [
    ;
    ; Here are some comments about what we're doing here
    ; They might be long
    ;
    [some "a", between "(" ")", '<whatever>]

    ; It's not totally clear where to put that |.  Some people would put it
    ; aligned in the same column as the SOME, others would put it one
    ; indent level deeper.  But it's a non-sequitur
    [
        x: [integer! | text!]
        keep (:[{Writing random code for an example is a pain} x])
    ]

    ; This could go on for a number of alternates, let's stop here.
    ; By the way, did you know there's a FAIL combinator now?  :-)
    ; It will set the `near` of the error message to the input position.
    ;
    fail "Tired of making things up, want to finish post"
]]

I hadn't thought of that, so it being useful for this was a surprise.

Note that you don't need to wrap the rules inside an ANY in BLOCK!s if each one is a single rule. any [some "a" some "b"] is legal, not just any [[some "a"] [some "b"]]. The FAIL at the end there isn't in a block, for instance.