I'm really certain that ANY should not be a looping construct in PARSE. Rebol's use of ANY everywhere else means "any one of", not "any number of". That applies to the ANY short-circuit-OR operation, to the ANY-XXX! types, and it can come up in PARSE such as:
parse block [some any-value!]
I like the shorthand for this that works across series types with the TAG! combinator:
uparse block [some <any>]
This means any one element. It gets at that English concept that operators like * (or <*>) just don't have.
Plus, the "zero-or-more matches of a rule" interpretation doesn't jibe with how we use ANY in English:
- "Do you have ANY bananas?"
- "Yes."
- "Cool. Can I have one, then?"
- "No, sorry. I don't have ANY."
But I'm Not Happy With Bending WHILE For This
It seemed appealing at first to say that WHILE would be standardized in the language as arity-1, both in PARSE and in ordinary code loops. This would make UNTIL and WHILE line up, and LOOP could take the arity-2 role that WHILE used to have.
But I've been lamenting just how universally WHILE is arity-2 in pretty much every language, and LOOP doesn't quite cut it when reading code. :-/
Sorry for the flux, but I want to move back to while [condition] [body] as it was. However, going through the process has spurred thought...
An Observation: OPT SOME <=> WHILE
It has in the past occurred to me that PARSE's WHILE (or ANY) was really OPT SOME. It's three more characters to say it:
while pattern
opt some pattern
(Note: This is only true in modern Ren-C, as previously the progress requirement differentiated these...that is now broken out into FURTHER.)
...but although it's more characters, "optionally some number of occurrences of the pattern" is pretty literally what you are talking about. In the UPARSE model of synthesized values it's kind of less confusing, because it's clearer what it returns in the case of nothing...the same thing OPT always returns when a rule doesn't match: NULL.
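To make the equivalence concrete, here's a minimal sketch in Python (not Ren-C, and not how UPARSE is implemented). The names lit, some, opt and zero_or_more are made up for illustration, with zero_or_more standing in for the old PARSE WHILE. Parsers here only report positions, so the sketch says nothing about synthesized values, and the no-progress guard is my own simplification to keep the toy from looping forever.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(ch: str) -> Parser:
    """Match one specific character."""
    def parse(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and text[pos] == ch:
            return pos + 1
        return None
    return parse

def some(p: Parser) -> Parser:
    """One or more matches of p; fails if p does not match even once."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        if end is None:
            return None
        while True:
            nxt = p(text, end)
            # Stop on failure, or on a match that made no progress
            # (a simplification for this model only).
            if nxt is None or nxt == end:
                return end
            end = nxt
    return parse

def opt(p: Parser) -> Parser:
    """Zero or one match of p; never fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        return pos if end is None else end
    return parse

def zero_or_more(p: Parser) -> Parser:
    """Stand-in for the old WHILE: zero or more matches of p; never fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

a = lit("a")
for text in ["", "b", "ab", "aaab"]:
    # Both spellings stop at the same position on every input.
    assert opt(some(a))(text, 0) == zero_or_more(a)(text, 0)
```

In this model the two spellings are indistinguishable by position, which is the point: the only thing OPT SOME adds is that the "may match nothing" part is written out.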
Anyway, I'm feeling remorse and a wish to go back to WHILE for arity-2 loops in the language. But I don't want to go back to ANY in PARSE.
Is OPT SOME really so bad?
I've gotten to wondering if there is a reason we don't have a separate word for "zero or more" in English. You actually have to write out "zero or more" to convey that intent... maybe because the intent is too weird for a single word.
When you just write WHILE it may be that you have a case that's actually supposed to be a SOME, but it hasn't really bitten you yet. If you're willing to tolerate between 1 and a million of something, the case of no things being there is distinguished...and calling attention to the fact that the rule you have may not match at all can be an asset.
I actually think OPT SOME offers an advantage, because it encourages you to look at it and decide if the OPT belongs there or not. It may feel kind of like a wart, but maybe it's a helpful wart.
(It reminds me a bit of the UNLESS vs. IF NOT situation. Many people felt UNLESS is actually obfuscating nearly everywhere it's used, and that it's better to break it apart even if that means two words instead of one.)
Trying Out The Change, I Noticed...
I actually did find a difference in how I read the code. "This entire next section may not be relevant... none of it could match and it would go on." The weight of the OPT is felt more heavily when the word is there than with WHILE...where, if you usually expect the thing to be there, you may assume there will always be at least one instance.
You also can see redundancy in OPT more clearly. Things like:
opt [
while [...]
]
Stand out more if they look like:
opt [
opt some [...]
]
I think some things really do read more clearly. You can look at this as removing 0 or more newlines at the head of a series via a WHILE:
parse series [
remove [while newline]
...
]
Or rephrase that with OPT SOME:
parse series [
remove [opt some newline]
...
]
But I think it reads clearest when you bring the OPT outside, to say you're optionally removing some newlines:
parse series [
opt remove [some newline]
...
]
More Distinct
ANY and WHILE both had the problem that they had analogues in imperative code. But if SOME remains a PARSE-only keyword, that makes the difference easier to intuit...so parse rules look more distinct from ordinary code.
Compression Is Possible By Other Avenues
I noticed a particularly laborious substitution in %make-zlib.r which extracts the headers and code for zlib using parse, because it often was parsing C code and looking for the pattern while whitespace. This would happen multiple lines in a row and multiple times on a line. When it became opt some whitespace it got more annoying.
But this is kind of a problem anytime you repeat something over and over. Maybe that pattern should have been ws*: [opt some whitespace] and then it would just be ws* to mean "any number of whitespace characters here, including zero".
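The same "name it once, reuse it" idea can be sketched with a toy combinator model in Python (again, not Ren-C). Everything here is hypothetical: ws_star plays the role of ws*, and the foo ( ) pattern is a made-up stand-in for whatever C fragment was actually being matched.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def char_in(chars: str) -> Parser:
    """Match any single character from the given set."""
    def parse(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and text[pos] in chars:
            return pos + 1
        return None
    return parse

def word(s: str) -> Parser:
    """Match a literal string."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def some(p: Parser) -> Parser:
    """One or more matches of p."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        if end is None:
            return None
        while True:
            nxt = p(text, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return parse

def opt(p: Parser) -> Parser:
    """Zero or one match of p; never fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = p(text, pos)
        return pos if end is None else end
    return parse

def seq(*parsers: Parser) -> Parser:
    """Match each parser in order; fail if any one fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        cur: Optional[int] = pos
        for p in parsers:
            cur = p(text, cur)
            if cur is None:
                return None
        return cur
    return parse

whitespace = char_in(" \t\n")

# Defined once, in the spirit of  ws*: [opt some whitespace]
ws_star = opt(some(whitespace))

# The rule reads compactly even though whitespace is optional at every gap.
rule = seq(word("foo"), ws_star, word("("), ws_star, word(")"))
```

The verbosity of OPT SOME is paid once at the definition, and each use site stays as short as the WHILE version was.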
A Motivated Individual Can Overrule It
Remember, UPARSE is going to let you be the judge. If you want your own keywords, you can have them. Maybe you like MANY (some parser combinators seem to think that 0...N is "many" and 1...N is "some"). Maybe you don't care if WHILE is different. Maybe you don't want to use the ANY parse abstraction that I think is more interesting.
I'm Trying It Out
One can argue there's a bit of a 1984-newspeak to it ("you don't need words like better or worse, use plus-good and un-good and double-plus-ungood"). But we're sort of asking a programming language to be more "nuanced" in its wording than English, which has evolved to be pretty much where the brain is at. I've shown some concrete benefits here to breaking out the OPT so you can see its relationship to the other OPTs you have and move it around.
I do know I'm getting cold feet on the WHILE <=> LOOP change. And I don't think the arity of WHILE in PARSE should be different from the arity of WHILE in the language, it's jarring.
I'm giving it a shot in the bootstrap and rebmake to see what kind of thoughts it inspires. So far it seems to be around equally good and bad...and since the bad is just largely unfamiliarity which should wear off...that points to a win, especially since it means retaking WHILE.