The Trickiness of the New ANY Combinator Concept

It's a bit of a pain to collect alternate rules. For instance:

alternates: copy []

rules: [[some integer!] [3 text!]]

for-each rule rules [
    append alternates compose [(rule) |]
]

uparse data [alternates]

That will give you alternates as [[some integer!] | [3 text!] |]

But that rule will always succeed...should both the component rules fail to match, it will act as a no-op. Because it's equivalent to [[some integer!] | [3 text!] | []], and [] will always succeed.

You get a similar problem if you go the other way.

for-each rule rules [
    append alternates compose [| (rule)]
]

Now you've got a rule that is always a no-op: [| [some integer!] | [3 text!]]. Again, this is equivalent to [[] | [some integer!] | [3 text!]], and this time the [] succeeds before the other rules get a chance.

You can hack around this by starting out with alternates: [false]. This way, you can add the [| (rule)] and it will never run the false. So it works.
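
To make the edge cases concrete, here is a rough Python model of alternation (not Ren-C, and not how UPARSE is implemented). It assumes a rule is a flat list where the string "|" separates alternatives and an empty alternative matches without consuming input, as [] does. The names literal, sequence, alternation and never are hypothetical helpers made up for this sketch.

```python
from typing import Callable, List, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def literal(s: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def sequence(parsers: List[Parser]) -> Parser:
    # An empty sequence succeeds without consuming input, like [] in a rule.
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def alternation(tokens: list) -> Parser:
    # Split the flat token list on "|" markers; each piece is one alternative.
    groups, current = [], []
    for tok in tokens:
        if tok == "|":
            groups.append(current)
            current = []
        else:
            current.append(tok)
    groups.append(current)
    alts = [sequence(g) for g in groups]

    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:  # first alternative that succeeds wins
            result = alt(text, pos)
            if result is not None:
                return result
        return None
    return parse

def never(text: str, pos: int) -> Optional[int]:
    # Stand-in for the `false` seed: always fails.
    return None

a, b = literal("a"), literal("b")

trailing = alternation([a, "|", b, "|"])      # like [a | b |]
leading = alternation(["|", a, "|", b])       # like [| a | b]
seeded = alternation([never, "|", a, "|", b]) # like [false | a | b]

print(trailing("zzz", 0))  # 0: the empty tail alternative "succeeds"
print(leading("a", 0))     # 0: the empty head alternative wins before "a"
print(seeded("a", 0))      # 1
print(seeded("zzz", 0))    # None
```

The seeded version behaves the way you'd want, which is why the false hack works: the always-failing first branch gives every appended | something to sit between.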

Wouldn't a New Meaning for the ANY Combinator be Better?

Having reclaimed ANY it seems it would be perfect for this. Why not:

rules: [[some integer!] [3 text!]]
uparse data [any rules]

You could leave your block in its regular old form, and use it that way. Dyn-o-mite!

But wait. BLOCK! already has semantics as a parse rule. Conventionally, ANY doesn't get to see the block at all... it gets a parser function which has been made out of the block.

Uh oh.

Bad Option #1 - Quoting

So ANY could say it's a quoting combinator. This means it would get whatever single thing came after it... be it a WORD! or BLOCK! or whatever. It could try its best to turn that into a BLOCK!.

In the case above ANY would thus get the WORD! rules. It could look up the WORD! and get a block, then walk through the block, combinatorizing each element in it and running those elements in sequence.

That's rather yucky.

Less Bad Option #2 - Take BLOCK! as synthesized rule product

ANY would be a better citizen if it was willing to say that the BLOCK! it's going to walk through came to it by honest means.

rules: [[some integer!] [3 text!]]
uparse data [any (rules)]

At first glance that seems weird to me. But, is it really that weird?

It seems to me this is what has to be done--and it makes much more sense than going down the rabbit hole of quoting and destabilizing the whole syntax.

Also, This Mitigates Compatibility Concerns

If ANY only runs with rules that have BLOCK! synthesized products, that's a (small?) subset of all the ANYs that are out there historically. It can choke if it doesn't like what it sees and tell you that you may be using the old sense of ANY.
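
As a loose illustration of that "choke and explain" behavior, here is a Python sketch (again a model, not Ren-C). It assumes parsers are plain functions and the block of alternates is a plain list; any_of, literal and the error wording are hypothetical.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def literal(s: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def any_of(rules) -> Parser:
    # Hypothetical check: only a list of alternates is accepted, so code
    # written for the old zero-or-more ANY fails loudly instead of silently.
    if not isinstance(rules, list):
        raise TypeError(
            "any_of expects a list of alternate rules; "
            "were you using the old zero-or-more sense of ANY?"
        )

    def parse(text: str, pos: int) -> Optional[int]:
        for rule in rules:  # try each alternate in order
            result = rule(text, pos)
            if result is not None:
                return result
        return None
    return parse

choice = any_of([literal("ab"), literal("a")])
print(choice("abc", 0))  # 2
print(choice("xyz", 0))  # None

try:
    any_of(literal("a"))  # a parser was passed, not a list of alternates
except TypeError as err:
    print("rejected:", err)
```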

Even further, we probably can temporarily make ANY a quoting combinator that only accepts GROUP!...as a simulation of accepting any parser in the future.

I'm going ahead and adding it!

This is a good one. :+1:

How About Even Less Bad (And Maybe Good!) Option #3 ?...

I realized something about the new rules with @... it gives a way to pass BLOCK!s to rules and mark them as not being processed by the BLOCK! combinator. So ANY could take them:

 uparse "a" [any (["a" "b"])]
 <=>
 uparse "a" [any @["a" "b"]]

At first that looks like a pretty modest improvement. You save one character and avoid a call into the evaluator.

But where it actually gets some punch comes from a neat feature of COMPOSE:

>> compose [any @(collect [keep "a" keep "b"])]
== [any @["a" "b"]]

The COLLECT gave back a BLOCK!. But COMPOSE just blended it so the result of the group had the @ on it. (This works for ^META and SET: and :GET as well.)

Now imagine you were trying to get the result [any (["a" "b"])] instead. Hmmm. Certainly possible, but uglier...

>> compose [any (as group! reduce [collect [keep "a" keep "b"]])]
== [any (["a" "b"])]

That could be a special operation (and it is), but it has a weird name and contaminates the source it appears in:

>> compose [any (engroup collect [keep "a" keep "b"])]
== [any (["a" "b"])]

It's preferable to be able to avoid all of that. The @ version doesn't really need any explanation.

(I think this is really only the beginning of the interesting applications of the @ types in UPARSE!)

...indeed...

...What a Success Story ANY Turned Out To Be!! :astonished:

The first place I thought to try it out was in REWORD, because I remembered it created a list of alternate patterns for what it was going to replace. So if you say:

 reword/escape "a(thing)b(thing2)c" [thing "ALPHA" "thing2" "BETA"] ["(" ")"]

A simplified concept of the rules it built looked like:

keyword-suffix-rules: [
    false
       |
    "thing" ")" (key: 'thing)
       |
    "thing2" ")" (key: "thing2")
]

match-rule: ["(" keyword-suffix-rules]

(If you're wondering why it's repeating the suffix ")" in each rule: the problem with putting it outside would be that "thing" would match "thing2" and then jump outside the rule block and see ")". It would then be too late to consider thing2.)

So here we see that false trick again. We can get rid of that, and the |, AND use UPARSE rule synthesis to get the key: on the outside!

keyword-suffix-rules: [
    ["thing" ")" ('thing)]
    ["thing2" ")" ("thing2")]
]

match-rule: ["(", key: any (keyword-suffix-rules)]
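
Here is a small Python model of that shape (not Ren-C; lit, seq_yielding and any_of are hypothetical helpers). It assumes each alternate returns a position plus a synthesized value, and it also shows why the ")" suffix has to stay inside each alternate.

```python
from typing import Callable, List, Optional, Tuple

# A rule takes (text, pos) and returns (new_pos, value), or None on failure.
Result = Optional[Tuple[int, object]]
Rule = Callable[[str, int], Result]

def lit(s: str) -> Rule:
    def parse(text: str, pos: int) -> Result:
        return (pos + len(s), s) if text.startswith(s, pos) else None
    return parse

def seq_yielding(rules: List[Rule], value: object) -> Rule:
    # Run the rules in order; on success, synthesize `value` as the product.
    def parse(text: str, pos: int) -> Result:
        for rule in rules:
            res = rule(text, pos)
            if res is None:
                return None
            pos = res[0]
        return (pos, value)
    return parse

def any_of(rules: List[Rule]) -> Rule:
    # The first alternate that matches supplies both position and value.
    def parse(text: str, pos: int) -> Result:
        for rule in rules:
            res = rule(text, pos)
            if res is not None:
                return res
        return None
    return parse

keyword_suffix_rules = [
    seq_yielding([lit("thing"), lit(")")], "thing"),
    seq_yielding([lit("thing2"), lit(")")], "thing2"),
]

def match_rule(text: str, pos: int = 0) -> Result:
    opened = lit("(")(text, pos)
    if opened is None:
        return None
    return any_of(keyword_suffix_rules)(text, opened[0])

def factored_rule(text: str, pos: int = 0) -> Result:
    # Variant with ")" pulled outside the alternates, for comparison.
    opened = lit("(")(text, pos)
    if opened is None:
        return None
    key = any_of([lit("thing"), lit("thing2")])(text, opened[0])
    if key is None:
        return None
    closed = lit(")")(text, key[0])
    return (closed[0], key[1]) if closed is not None else None

print(match_rule("(thing)"))      # (7, 'thing')
print(match_rule("(thing2)"))     # (8, 'thing2')
print(factored_rule("(thing2)"))  # None: "thing" won, then ")" saw "2"
```

In the factored variant the choice has already committed to "thing" by the time the ")" check fails, and nothing backs up to try "thing2".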

That's Nothing Compared To What It Did To Whitespace!

Continuing to look at the "generated list of alternates" scenario, it's a HUGE win for the Whitespace interpreter, which generates two levels of selection... one at the category level, and then one for the instructions in the category:

any ([[
    [space]
    collect any @[
        [keep @push space keep Number]
        [keep @duplicate-top lf space]
        [keep @duplicate-indexed tab space keep Number]
        [keep @swap-top-2 lf tab]
        [keep @discard-top lf lf]
        [keep @slide-n-values tab lf keep Number]
    ]
] [
    [tab space]
    collect any @[
        [keep @add space space]
        ...
    ]
    ...
]])

That's the stuff! No sign of those pesky | and the weird edge cases they introduce.

But There Was A Surprise Application For ANY...

What if you feel the | are not just wrong for your generated expressions, but also not right for your source expressions?

Let's say you're writing something with this kind of pattern:

uparse data [while [
    ;
    ; Here are some comments about what we're doing here
    ; They might be long
    ;
    some "a", between "(" ")", '<whatever>
|
    ; It's not totally clear where to put that |.  Some people would put it
    ; aligned in the same column as the SOME, others would put it one
    ; indent level deeper.  But it's a non-sequitur
    ;
    x: [integer! | text!]
    keep (:[{Writing random code for an example is a pain} x])
|
    ; This could go on for a number of alternates, let's stop here.
    ; By the way, did you know there's a FAIL combinator now?  :-)
    ; It will set the `near` of the error message to the input position.
    ;
    fail @["Tired of making things up, want to finish post"]
]]

This can be an application for ANY too! Maybe you'd prefer to cluster your code in blocks and leave out the distracting |...

uparse data [while any @[
    ;
    ; Here are some comments about what we're doing here
    ; They might be long
    ;
    [some "a", between "(" ")", '<whatever>]

    ; It's not totally clear where to put that |.  Some people would put it
    ; aligned in the same column as the SOME, others would put it one
    ; indent level deeper.  But it's a non-sequitur
    [
        x: [integer! | text!]
        keep (:[{Writing random code for an example is a pain} x])
    ]

    ; This could go on for a number of alternates, let's stop here.
    ; By the way, did you know there's a FAIL combinator now?  :-)
    ; It will set the `near` of the error message to the input position.
    ;
    fail @["Tired of making things up, want to finish post"]
]]

I hadn't thought of that, so it being useful for this was a surprise.

Note that you don't need to put the rules inside an ANY in BLOCK!s if they are single rules. any @[some "a" some "b"] is legal, not just any @[[some "a"] [some "b"]]. The FAIL at the end there isn't in a block, for instance.

Also notice that the @ form has the convenience of not making you put a coupled ) on the tail of an ANY rule. So you're not just saving a character by not using a GROUP!, you're saving a character that could be an arbitrary distance from the ANY. A small thing, but it makes usage lighter!

:partying_face:
