Reconciling PARSE's ANY with Ordinary ANY

hostilefork · July 15, 2018, 6:35pm

Historical Rebol PARSE's ANY means "match this rule any number of times, including 0 times". So it's effectively an iterative construct.

In the past I've griped about the use of the word. If PARSE were a fruit stand the conversation might go like:

me: "Do you have any apples?"
parse: "Yes."
me: "Can I buy an apple?"
parse: "No."
me: "I'm not @gchiu...so why not?"
parse: "Because I have zero apples."
me: "Why didn't you tell me you didn't have any?"
parse: "Because I do. I just don't have some."

What bothered me was how directly this contradicts ANY's use in the regular language:

if any [
    1 > 2
    3 > 4
][
    print "if 0 matching conditions was 'any', this would run"
]

Add to that the fact that ANY isn't iterative, it just means "pick the first thing that matches from this set", and the semantics feel quite inconsistent.

Freeing Up ANY Would Allow A More Fitting Use

The true parallel application of ANY in UPARSE would be to pick the first matching parse rule out of a list. e.g. these would be equivalent:

parse "abcbca" [some [any ["a" "bc"]]]

parse "abcbca" [some ["a" | "bc"]]

@BlackATTR points out that this might make it easier when generating alternate rules, since you wouldn't have to worry about sticking in the | during generation. It's a pain to worry about putting (N - 1) vertical bars between (N) items to match from a set... but this way lists of items to match from could just be used as-is.

(UPDATE: This has been implemented, and is very slick!)
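
To make the generation point concrete, here is a rough Python model of the idea. It is not Ren-C and not how UPARSE is implemented; the names lit, any_of, some and parse are made up for this sketch, and a "rule" is assumed to be just a function from a position to a new position.

```python
# Toy model: a rule is a function (text, pos) -> new pos, or None on failure.
# All names are hypothetical; this only illustrates the "first match from a list" idea.

def lit(s):
    """Match the literal string s at the current position."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def any_of(rules):
    """Pick the first rule in the list that matches (the repurposed ANY)."""
    def rule(text, pos):
        for r in rules:
            end = r(text, pos)
            if end is not None:
                return end
        return None
    return rule

def some(r):
    """Match r one or more times."""
    def rule(text, pos):
        end = r(text, pos)
        if end is None:
            return None
        while True:
            nxt = r(text, end)
            if nxt is None or nxt == end:  # stop on failure or no progress
                return end
            end = nxt
    return rule

def parse(text, rule):
    """Succeed only if the rule consumes the whole input."""
    return rule(text, 0) == len(text)

# A generated list of alternates is handed over as-is, with no "|" joining step.
alternates = [lit(s) for s in ["a", "bc"]]
matched = parse("abcbca", some(any_of(alternates)))
rejected = parse("abcx", some(any_of(alternates)))
```

The list passed to any_of can come straight out of whatever produced the alternates, which is the convenience being described: no (N - 1) separators to splice in.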

Are There Better Words?

Looking to other languages for inspiration...at least one Haskell parser combinator library also uses some to mean one or more matches, but picks many for zero or more matches.

Doesn't make a whole lot of sense. But because we're sort of dealing in a gray area of learned behavior here, I wonder if the benefit of going with MANY to avoid the inconsistency with ANY in regular code is enough to prefer it?

Or we could switch around and keep ANY for PARSE, but change the language so that ANY is prefix OR with ALL as prefix AND.

and [thing1, thing2] then [...]

or [thing1, thing2] then [...]

But I'm not a fan of that. I like the direction of AND and OR as weird infix operations right now...so I think we should stick with that. I've even been considering that x and y should be allowed so long as Y is not a function with arguments; it can short circuit across the word if it quotes it.

So, Thoughts?

iArnold · March 4, 2021, 7:36am

For me MANY says more than one. Even more than two if you are strict. (As the saying goes, counting: one, two, many.)

SOME says yes there should be one or more of these present. ANY is just fine for me expressing any positive number and zero within PARSE. But indeed with ANY [condition1 condition2] there is at least one of the conditions true.

Well a small difference indeed, but vs. using something like OPTIONAL within PARSE to overcome this, I say I can live with the difference in meaning.

BlackATTR · March 4, 2021, 1:29pm

While it's good to have literate keywords for pattern matching, I think most devs (including newbies) coming to Ren-C will be minimally familiar with the symbols which have been around forever: * (match 0 or more), ? (match 1 item), # (match one digit), and ! (not).

The standard symbol in pattern matching is *.

hostilefork · March 4, 2021, 10:08pm

ANY isn't iterative, it just means "pick the first thing that matches from this set", and the semantics feel quite inconsistent.

It occurs to me that there is an arity-1 looping construct... CYCLE. It was chosen to replace FOREVER, since FOREVER loops usually broke (it was a misnomer).

But CYCLE could be used here:

>> parse "aaaccc" [some "a" opt some "b" opt some "c"]
== "c"

>> parse "aaaccc" [some "a" cycle "b" cycle "c"]
== "c"

It's a bit different semantically because CYCLE in the main language doesn't end until you STOP or BREAK. It's not like an UNTIL where the body result itself can make it stop.

Although PARSE is a bit different in semantics anyway. So stopping the cycle on a failed rule might not be that inconsistent under its rules.

Though...CYCLE could be the anti-UNTIL:

>> n: 1, cycle [print [n], n: n + 1, n < 4]
1
2
3

Anyway...I don't know that CYCLE implies "do this as long as it is true", however...more like "do it until I say to stop". It's interesting to remember that we do have another arity-1 looping construct in the mix though.
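
As a rough illustration of that hypothetical "anti-UNTIL" behavior (keep looping while the body's result is truthy), here is a small Python sketch. The name cycle_while is invented for this example, and nothing here reflects how CYCLE actually behaves in Ren-C today.

```python
def cycle_while(body):
    """Run body repeatedly, stopping the first time it returns a falsy value."""
    while body():
        pass

printed = []          # stands in for PRINT so the output can be inspected
state = {"n": 1}

def body():
    printed.append(state["n"])   # print [n]
    state["n"] += 1              # n: n + 1
    return state["n"] < 4        # the body's result decides whether to continue

cycle_while(body)
```

An UNTIL-style loop would use the opposite test, stopping when the body's result becomes truthy.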

rgchris · March 5, 2021, 4:38am

I'm not suggesting I necessarily agree with the need to change, but if I were, I'd maybe go for ANY-OF and ALL-OF.

hostilefork · March 5, 2021, 6:41am

But Is A Single Keyword Necessary (or Even Good)?

I've gotten to wondering if there is a reason we don't have a separate word for "zero or more" in English. You actually have to write out "zero or more" to convey that intent... maybe because the intent is too weird for a single word?

It has in the past occurred to me that PARSE's ANY was equivalent to OPT SOME:

; any number of "a"s (including zero), followed by some "b"s
parse "bbb" [any "a" some "b"]

; optionally some "a"s, followed by some "b"s
parse "bbb" [opt some "a" some "b"]

Notice how while the code is a few characters "longer", the comment you need to explain what's going on tightens up. It's like a more "proportional" capture of your intent.

Also, in the UPARSE model of synthesized values it's kind of less confusing, because it's clearer what it returns in the case of nothing: the same thing OPT always returns when a rule doesn't match: NULL.
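
Here is a rough Python model of that point, assuming a rule returns a (position, value) pair and that None stands in for NULL. The combinator names, and the choice that some yields the value of its last match, are assumptions of this sketch rather than a description of UPARSE internals.

```python
# Toy model: a rule is a function (text, pos) -> (new_pos, value), or None on failure.

def lit(s):
    """Match the literal string s, yielding s as the value."""
    def rule(text, pos):
        return (pos + len(s), s) if text.startswith(s, pos) else None
    return rule

def some(r):
    """One or more matches; the value is that of the last match (an assumption)."""
    def rule(text, pos):
        result = r(text, pos)
        if result is None:
            return None
        while True:
            nxt = r(text, result[0])
            if nxt is None or nxt[0] == result[0]:
                return result
            result = nxt
    return rule

def opt(r):
    """Never fails; yields None (standing in for NULL) when r doesn't match."""
    def rule(text, pos):
        result = r(text, pos)
        return result if result is not None else (pos, None)
    return rule

def seq(*rules):
    """Match rules one after another, collecting each rule's value."""
    def rule(text, pos):
        values = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            pos, value = result
            values.append(value)
        return (pos, values)
    return rule

zero_or_more_a = opt(some(lit("a")))       # what historical ANY "a" expressed
rule = seq(zero_or_more_a, some(lit("b")))

no_as = rule("bbb", 0)      # the optional part contributes None
with_as = rule("aabbb", 0)  # the optional part contributes a real value
```

With the zero-or-more written as opt(some(...)), the "nothing matched" case falls out of opt's ordinary behavior instead of needing its own rule about what to return.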

Trying Out The Change, I Quickly Saw Benefits...

When you just write ANY it may be that you have a case that's actually supposed to be a SOME but it hasn't really bitten you yet. If you're willing to tolerate between 1 and a million of something, the case of no things being there is distinguished...and calling attention to the fact that the rule you have may not match at all can be an asset.

So OPT SOME offers an advantage, because it encourages you to look at it and decide if the OPT belongs there or not. It may feel kind of like a wart, but maybe it's a helpful wart.

I actually did find a difference in how I read the code. "This entire next section may not be relevant... none of it could match and it would go on." That weight of the OPT is felt more heavily when the word is there than the ANY...which if you frequently expect the thing to be there, you may assume it will always be there for at least one instance.

You also can spot redundancy in OPT more clearly. Things like:

opt [
    any [...]
]

Stand out more if they look like:

opt [
   opt some [...]
]

Many cases I looked at tidied up. I found this code removing 0 or more newlines at the head of a series via ANY:

parse series [
    remove [any newline]
    ...
]

But when you rephrase this with OPT SOME it suggests a better factoring:

parse series [
    remove [opt some newline]
    ...
]

It reads clearest when you bring the OPT outside, to say you're optionally removing some newlines:

parse series [
    opt remove [some newline]
    ...
]

Plus you can now see the result of NULL more obviously in the case when no newlines are removed, and leverage that.

A Motivated Individual Can Overrule It

Remember, UPARSE is going to let you be the judge. If you want your own keywords, you can have them. Maybe you like MANY (some parser combinators seem to think that 0...N is "many" and 1...N is "some"). Maybe you don't want to use the ANY parse abstraction that I think is more interesting. It's your call!

So, I'm Going Ahead With This

One can argue there's a bit of a 1984-newspeak to it ("you don't need words like better or worse, use plus-good and un-good and double-plus-ungood"). But we're sort of asking a programming language to be more "nuanced" in its wording than English, which has evolved to be pretty much where the brain is at. I've shown some concrete benefits here to breaking out the OPT so you can see its relationship to the other OPTs you have and move it around.

hostilefork · December 12, 2021, 11:14pm

The change has been a winner!

...and it becomes even more palatable with TRY replacing OPT:

parse data [opt some rule]
; vs.
parse data [try some rule]

Here's a great example from a little section of code in HELP (that needs revisiting, just in general), where it's breaking down parameters and refinements of a function:

parse3 parameters of :value [
    copy args any [word! | meta-word! | get-word! | quoted-word!]
    copy refinements any path!
] else [
    fail [...]
]

When we rewrite the ANY as TRY SOME it shows us something interesting:

parse parameters of :value [
    args: across try some [word! | meta-word! | get-word! | quoted-word!]
    refinements: across try some path!
] else [
    fail [...]
]

Since our ACROSS goes over something that is effectively optional, we could wind up with an empty block. But an empty block isn't as cleanly differentiated as a null. What if we move the TRY outside the ACROSS (and leverage our new ANY, for good measure)?

parse parameters of :value [
    args: try across some any [word! meta-word! get-word! quoted-word!]
    refinements: try across some path!
] else [
    fail [...]
]

Now we know that args and refinements are either null, or non-empty. So testing "are there args" becomes just if args and not the more laborious if not empty? args.

I think it's interesting to see how these transformations jump off the page when you use TRY SOME instead of an atomic zero-or-more construct!