ANY vs. MANY in PARSE... EOF? TAG! combinators?

At least one Haskell parser combinator library uses some to mean one or more matches, and many to mean zero or more matches.
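To make that distinction concrete, here is a minimal sketch in Python rather than Haskell or Ren-C. The helper names `literal`, `many`, and `some` are hypothetical, and a "parser" is assumed to be a function taking (text, position) and returning either (value, new position) or None on failure.

```python
def literal(s):
    """Parser matching the exact string s at the current position."""
    def parser(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parser

def many(p):
    """Zero or more matches of p; always succeeds, possibly with []."""
    def parser(text, pos):
        results = []
        while True:
            r = p(text, pos)
            # Assumption: stop on failure, or when p made no progress,
            # so a rule that matches nothing cannot loop forever.
            if r is None or r[1] == pos:
                return results, pos
            results.append(r[0])
            pos = r[1]
    return parser

def some(p):
    """One or more matches of p; fails (None) if p never matches."""
    def parser(text, pos):
        r = many(p)(text, pos)
        return r if r[0] else None
    return parser
```

In this sketch `many` plays the role that ANY has in PARSE, while `some` matches the PARSE meaning of SOME.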

I can see why ANY makes more sense...to mean "any number of matches" (including 0). But a disadvantage is that it looks a lot like the common ANY construct in regular code... which kind of has the opposite meaning (non-PARSE ANY means "at least one of the following things, go with the first one that's truthy, else return null").

Because we're sort of dealing in a gray area of learned behavior here, I wonder if the benefit of going with MANY to make parse rules look different is enough to prefer it.

Also, the Haskell combinators use EOF instead of END. END is literate, but one often wants to call variables things like "begin" and "end", or "start" and "end".

This makes me wonder if perhaps we should be a bit more creative in the use of datatypes. If you want to match a WORD! in a dialect, you have to use a tick mark. What if you had to use a tick mark to match TAG!s, and then an ordinary TAG! could have meaning as a rule...such as <end>?

parse "aaa" [data: copy to <end>]

parse "<div>stuff</div>" [x: between '<div> '</div>]

Anyway, that could open up a whole new category of combinators... tag combinators. Maybe <here> is another example, or perhaps <input> if you want to pass the original input position through to a function.

A unifying concept here could be that you'd use it for properties that you don't want to have collide with the names of variables. Consider for example if PARSE tracks the line number, you might want to say something like line: <line> in the middle of a rule.

If you want to match tags by their stringness, it's not like it's all that hard to just say "<div>" in the first place. But quoting is even briefer. Remember that being inert in typical evaluation is not enough in PARSE to mean it's not a rule... INTEGER!, BLOCK!, BLANK! (previously NONE!) and now LOGIC! all have to be quoted to mean their actual literal thing. And quotes are needed on things like WORD!, GROUP!, GET-WORD!, SET-WORD!...and much more.

So is it worth it to get another dialect part, by making you have to quote your tags if you want them to match literally? I kind of feel like it would be. Of course, the concept with UPARSE is that people could disagree and make entirely different answers...

(Note: a downside here is that since TAG!s are strings and not symbols, the comparison costs could be (slightly) higher. However, I've been thinking that to speed up string comparisons they might cache a symbol as part of the comparison process...and clear the symbol cache on each mutation. Then comparisons of strings to symbols could become very fast...so long as the string isn't changing. Wouldn't help if it were looked up in a map, but the optimized native version could do a fast check before hitting the map.)

For me, MANY says more than one. Even more than two if you are strict; as the saying goes when counting: one, two, many.
SOME says yes, there should be one or more of these present. ANY is just fine for me to express zero or any positive number within PARSE. But indeed, with ANY [condition1 condition2] at least one of the conditions is true.
It's a small difference, though, and rather than use something like OPTIONAL within PARSE to overcome it, I'd say I can live with the difference in meaning.

I'd rather rename the ANY function in regular code to SOME (FIRST-IF, FIRST-DID, FIRST-TRUE etc.). :) I think ANY is the right word for UPARSE; the standard symbol in pattern matching is *.

While it's good to have literate keywords for pattern matching, I think most devs (including newbies) coming to Ren-C will be minimally familiar with the symbols which have been around forever: * (match 0 or more), ? (match 1 item), # (match one digit), and ! (not).

The quoting proposal seems okay to me. Could be an adjustment for some, but not a huge leap for me.

It's worth thinking about a replacement for non-PARSE ANY. But I think it would need to be short. ONE is a possibility, though it makes it sound like it could be that it evaluated all the conditions and checked that one-and-only-one is true.

one [thing1, thing2, thing3] then [...]

If we were willing to say that Rebol's disposition is prefix, ANY could be OR with ALL as AND.

and [thing1, thing2] then [...]

or [thing1, thing2] then [...]

But I don't think that's a good idea.

Note: I like the direction of AND and OR as weird infix operations right now...so I think we should stick with that. I've even been considering that x and y should be allowed so long as Y is not a function with arguments; it can short circuit across the word if it quotes it.

The real question is just how nasty parameter-gathering conventions are willing to get to make your source level experience more comfortable. That irregularity makes the functions harder to reuse...e.g. if you MAKE FRAME! for :AND, you have to realize that you're giving it code that it will short-circuit, and you have to know all the rules for that.

I'm not suggesting I necessarily agree with the need to change, but if I were, I'd maybe go for ANY-OF and ALL-OF.

In the past, I've thought we might make the PARSE rule convention for ANY just be OPT SOME.

Unfortunately, combinators break that idea as this would be semantically different in capturing...under the rules I'm thinking of.

For example, just thinking about the idea that INTEGER! might transcode from strings:

parse "10 20 30" [numbers: any integer!]
>> numbers
== [10 20 30]

parse "xxx" [numbers: any integer!]
>> numbers
== []

parse "xxx" [numbers: opt some integer!]
>> numbers
; null

The idea is that OPT will set its result to NULL if the rule does not succeed...and give you the combinator product if it does. But ANY would give you an empty block in the case it doesn't succeed at all.
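The difference in products can be sketched in Python as follows. This is only an illustration under stated assumptions, not how UPARSE is implemented: `integer`, `some`, `any_`, and `opt` are hypothetical helpers, a parser returns (product, new position) or None on failure, and Python's None stands in for NULL.

```python
import re

def integer(text, pos):
    """Hypothetical INTEGER! combinator: skip spaces, transcode digits to int."""
    m = re.compile(r"\s*(\d+)").match(text, pos)
    return (int(m.group(1)), m.end()) if m else None

def some(p):
    """One or more matches; fails (None) when there are no matches."""
    def parser(text, pos):
        results = []
        while (r := p(text, pos)) is not None:
            results.append(r[0])
            pos = r[1]
        return (results, pos) if results else None
    return parser

def any_(p):
    """Zero or more matches; no matches yields an empty list product."""
    def parser(text, pos):
        r = some(p)(text, pos)
        return r if r is not None else ([], pos)
    return parser

def opt(p):
    """Optional: the product of p if it matches, else None as the product."""
    def parser(text, pos):
        r = p(text, pos)
        return r if r is not None else (None, pos)
    return parser
```

Here `any_(integer)` and `opt(some(integer))` both succeed on "xxx" without consuming input, but the first produces an empty list and the second produces None, which is the semantic gap described above.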

The COPY (or ACROSS) that just gets the span of input wouldn't help smooth that over in this particular case. Because copy opt some integer! gives you a span of the input series, which is text. The combinator product for INTEGER! here on text input is an INTEGER!.

Unfortunately it can't be stylized the other way as some opt integer!, by terminating on NULL... if we are to have some work with "rules that have no products", like some "a". (I've been assuming that no-product rules exist, where the only thing you can do with them is COPY across their consumed input).

Instead of ANY, one could say that a certain item is allowed to be present, no matter how many times it occurs, so ALLOW comes to mind.

One thought on that note is that the PARSE forms of ANY and SOME might be useful as plain functions.

I've described how @(...) in PARSE means "generate a value without looking at the input, but fail the rule if it is null".

What this means is that you can sort of impromptu make a repeated call to a generator in PARSE. Since stackless is off on a branch, I'll make a generator manually for an example:

gen-from-1-to-3-then-null: func [<static> n (0)] [
    if n < 3 [return n: n + 1]
    return null
]

>> uparse "a" ["a", data: some @(gen-from-1-to-3-then-null)]
== "a"

>> data
== [1 2 3]

While working on stackless, I realized we need a function that will call a generator a number of times and then return the results collected as a block.

Calling this function "MAP" has some amount of history behind it. But MAP is also useful as a noun.

What if we had variations, ANY and SOME, that would act like their PARSE versions, in terms of deciding on whether to consider no results a "failure" (e.g. overall null result) or not?

>> some (generator [yield 1, yield 2])
== [1 2]

>> any (generator [yield 1, yield 2])
== [1 2]

>> some (generator [print "No yields!"])
No yields!
; null

>> any (generator [print "No yields!"])
No yields!
== []

Both call the generators and collect their results. But ANY is willing to return an empty block, where SOME will return NULL. Hence you can use SOME with ELSE but not ANY.
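For comparison, the same split is easy to sketch with Python generators. The names `any_` and `some` below are hypothetical stand-ins for the proposed functions, and None stands in for NULL; this is an analogy, not a statement about how Ren-C generators behave.

```python
def any_(gen):
    """Collect every yielded value; an empty list if nothing was yielded."""
    return list(gen)

def some(gen):
    """Like any_, but None (standing in for NULL) if nothing was yielded."""
    results = list(gen)
    return results if results else None

def one_two():
    yield 1
    yield 2

def no_yields():
    return
    yield  # unreachable; its presence makes this a generator function
```

With this shape, `some(no_yields()) or fallback` plays the role of SOME with ELSE, while `any_` always hands back a list.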

That would reclaim the word MAP, and bring consistency between PARSE and normal code.

Just a thought, based on a new interesting interoperability of NULL with the parse rules.

In a separate post I'm explaining the historical difference between ANY and WHILE in PARSE...and the question of whether they should be the same.

If they are the same, then we might adopt WHILE as the name. It's an alternative way of thinking of it to use WHILE to mean "keep running this rule as long as it matches":

parse "aaa" [while "a"]

It seems to be more consistent, because WHILE means "keep doing this so long as it is true" in both plain DO code and PARSE, whereas ANY's meaning in DO is "match one of these things and then stop".

e.g. ANY [RULE1, RULE2, RULE3] in PARSE would be more consistent as a synonym for RULE1 | RULE2 | RULE3.

(ALL [RULE1, RULE2, RULE3] comes from the implicit sequencing being ALL, e.g. [RULE1, RULE2, RULE3])

Maybe this verbalizes a little more what my problem is with the reuse of the word...it isn't so much that ANY is in both places, but that there is a fairly clear parallel for what the meaning of ANY would be if it applied in both...and that's not what it is.

I like WHILE as described here and the improved consistency of ANY and ALL.

If you can, please read the most recent summary remarks on ANY vs. WHILE… and NOT END. I'm increasingly feeling certain that WHILE and SOME with no progress requirement are the right primitives. It's running up against my lack of love for the BREAK/ACCEPT/REJECT naming as those kinds of things would need to be invoked more often, but maybe that just needs to get cleaned up too.

Also consider that if ANY were retaken in PARSE it could potentially offer an alternative of a parallel use like:

any [integer! decimal! block! text!]

vs.

[integer! | decimal! | block! | text!]

I don't know how motivating that would be, since if you wanted more than one rule in sequence as part of the set, you'd need to use a BLOCK!. But I think that this "non-looping-feeling" which the language cultivates about ANY gets at the core of why the word never really sat right with me.

The "progress requirement" slipstreamed into the rule makes the curve from simple cases to harder ones more difficult...where you wind up having to learn WHILE anyway.

Related thought: Maybe it is interesting to make a SOME loop, which errors if it doesn't run the body at least once? Conceptually like:

some: func [condition body] [
    let ran: false
    while condition compose [
        ran: true
        ((body))
    ]
    elide if not ran [fail "SOME must run body at least once"]
]

I can imagine myself using such a thing, in lieu of having something like:

assert [not empty? block]
while [item: take block] [...]

I could just say:

some [item: take block] [...]

Though having a few words that are common in parse rules but not in plain code can help cue differentiation of which is which, so that's another thing to consider.
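As a rough cross-language sketch of the SOME loop idea above, here is a Python version. The function name `some` and the choice of ValueError are assumptions made for illustration; `condition` and `body` are plain callables rather than blocks.

```python
def some(condition, body):
    """Run body while condition() is truthy; raise if body never ran."""
    ran = False
    while condition():
        ran = True
        body()
    if not ran:
        raise ValueError("SOME must run body at least once")

# Usage: drain a list, insisting that it was not empty to begin with.
block = [1, 2, 3]
seen = []
some(lambda: block, lambda: seen.append(block.pop(0)))
```

The loop folds the "not empty" assertion into the loop construct itself, so the precondition and the iteration cannot drift apart.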
