The Handling of NULL and VOID in UPARSE

hostilefork · April 7, 2023, 12:14am

By design, nulls are handled noisily--right at the moment of fetching the word!--in UPARSE (and PARSE3):

>> prefix: null, suffix: ")"

>> parse "aaa)" [prefix, some "a", suffix]
** Error: (prefix is null, and we raise errors for that in parse)

If we didn't raise an error it seems there are only two other options:

1. Make null always succeed, keeping the parse position where it is (a synonym for [])
2. Make null always be an unsuccessful combinator match, without raising an error (a synonym for false)

I think (1) feels like a pretty obvious bad idea, because null is supposed to represent a soft failure. I've suggested that this is a better behavior for void, e.g. parse "ab" ["a" void "b"] would work.

I'm not too pleased with the idea of (2), and prefer the error as the default.
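To make the three choices concrete, here is a toy Python model (not UPARSE itself, and not how its combinators are written). It assumes a rule is just a function from a position to a new position, with Python's None doing double duty as "null variable" and "no match". The names fetch, parse and the policy strings are made up for this sketch.

```python
# Toy model, not UPARSE itself: a rule is a function (text, pos) -> new pos,
# or None when it does not match. Python's None also stands in for Ren-C null.

def literal(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def fetch(value, policy):
    """Turn a variable's value into a rule under one null-handling policy."""
    if value is not None:
        return literal(value)
    if policy == "error":            # current behavior: complain at fetch time
        raise ValueError("rule variable is null")
    if policy == "no-op":            # option 1: succeed without advancing
        return lambda text, pos: pos
    if policy == "mismatch":         # option 2: behave like a failed match
        return lambda text, pos: None
    raise ValueError("unknown policy: " + policy)

def parse(text, values, policy):
    """Match each value in sequence; True only if the whole input is consumed."""
    pos = 0
    for value in values:
        pos = fetch(value, policy)(text, pos)
        if pos is None:
            return False
    return pos == len(text)

print(parse("a)", [None, "a", ")"], "no-op"))     # True
print(parse("a)", [None, "a", ")"], "mismatch"))  # False
```

With the "error" policy the same call raises as soon as the null variable is fetched, which is the noisy behavior described at the top of the post.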

...that said... it seems there should be some operators or combinators that let you get the other behaviors.

What About a "MAYBE" Combinator To Use With Null?

In standard code, the policy of "void-in-null-out" has worked well, with MAYBE transforming soft-failure nulls to voids:

 ; non-PARSE handling of NULL via MAYBE

 >> append [a b c] null
 ** Error: cannot append ~null~ isotope to a block

 >> append [a b c] maybe null
 == [a b c]

 >> block: null

 >> append maybe block [d e]
 == ~null~  ; isotope
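
For anyone who hasn't followed the void-in-null-out discussions, the transcript above can be mimicked with a small Python sketch. This is only an analogy: VOID is a made-up sentinel, None plays the part of null, and the toy append covers just the three cases shown (how it treats a null series is my assumption).

```python
VOID = object()   # hypothetical stand-in for void; None stands in for null

def maybe(value):
    """Convert a null (None) into VOID; pass anything else through."""
    return VOID if value is None else value

def append(series, value):
    """Toy APPEND that follows the three cases in the transcript."""
    if series is VOID:
        return None                      # void in, null out
    if series is None:
        raise TypeError("cannot append to null")   # assumed, not shown above
    if value is None:
        raise TypeError("cannot append null to a block")
    if value is VOID:
        return series                    # appending void changes nothing
    series.extend(value if isinstance(value, list) else [value])
    return series

print(append(["a", "b", "c"], maybe(None)))   # ['a', 'b', 'c']
print(append(maybe(None), ["d", "e"]))        # None
```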

So if we imagine applying this to the parse example, it would presumably do this:

>> prefix: null, suffix: ")"

>> parse "aaa)" [maybe prefix, some "a", maybe suffix]
== ")"

For the above parse to succeed, the combinator made by maybe prefix would have to succeed and not advance the input.

But It Doesn't Combine Well In Larger Rules

What if what you intended was "if there's a prefix, match some non-zero number of instances, but if prefix is null then don't worry about matching"?

You might try doing that by COMPOSE'ing your rules. But UPARSE actually lets us write that out literally using GET-GROUP! rule synthesis:

>> parse "aaa)))" [:(if prefix '[some prefix]), some "a", :(if suffix '[some suffix])]
== ")"

But what if we tried to do that with MAYBE...could it work?

>> parse "aaa)))" [some maybe prefix, some "a", some maybe suffix]
; infinite loop!

No dice. We've said maybe prefix just succeeds and doesn't advance the input when prefix was null. But if you combine that with some the null case will just match nothing in perpetuity, causing an infinite loop.

This may look familiar, because if you write some opt [...anything...] you'll always get an infinite loop. But in that case it's just wrong thinking: some keeps repeating until it sees a non-match, and opt never produces one, so you must have intended either some [...anything...] (one or more) or opt some [...anything...] (zero or more).
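
The loop is easy to reproduce in a toy model. The Python below is a sketch under my own assumptions, not UPARSE's code: rules are functions returning a new position or None, no_op stands for what maybe prefix would do when prefix is null, and max_steps is there only so the demonstration stops instead of hanging.

```python
def literal(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def no_op(text, pos):
    return pos                       # succeeds, consumes nothing

def opt(inner):
    """Zero or one match: never fails, so it may succeed without advancing."""
    def rule(text, pos):
        nxt = inner(text, pos)
        return pos if nxt is None else nxt
    return rule

def some(inner, max_steps=1000):
    """One or more matches, repeating until the inner rule stops matching."""
    def rule(text, pos):
        pos = inner(text, pos)
        if pos is None:
            return None
        for _ in range(max_steps):
            nxt = inner(text, pos)
            if nxt is None:
                return pos
            pos = nxt
        raise RuntimeError("some: inner rule never stopped matching")
    return rule

print(some(literal("a"))("aaa)", 0))   # 3

for looping in (some(no_op), some(opt(literal("x")))):
    try:
        looping("aaa)", 0)
    except RuntimeError as err:
        print(err)                     # printed for both rules
```

Both some(no_op) and some(opt(...)) hit the step limit for the same reason: the inner rule can always succeed at the current position, so some never sees the non-match it is waiting for.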

NOTE THAT HISTORICAL PARSE HAS NO GOOD ANSWER FOR THIS

Rebol2 treats NONE! as a no-op which just succeeds but doesn't advance the input. So the following gives you an infinite loop:

rebol2>> prefix: none suffix: ")"

rebol2>> parse "aaa)))" [some prefix some "a" some suffix]
; infinite loop

The hackish "must make progress" rules in R3-Alpha actually make the above "work as intended", because the SOME will bail out after one non-advancing match. I don't consider that a "good" answer--more a random effect.
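
Here is a Python sketch of the effect being described, in case it helps. I'm not claiming this is how R3-Alpha implements its progress check; it's a toy where rules return a new position or None, and some_with_progress (a name invented here) stops as soon as a match leaves the position unchanged.

```python
def literal(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def no_op(text, pos):
    return pos                       # models a NONE! rule: succeed, no advance

def some_with_progress(inner):
    """SOME that bails out (successfully) after one non-advancing match."""
    def rule(text, pos):
        nxt = inner(text, pos)
        if nxt is None:
            return None              # not even one match
        while nxt != pos:            # keep going only while input advances
            pos = nxt
            nxt = inner(text, pos)
            if nxt is None:
                return pos
        return pos
    return rule

def sequence(*rules):
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule

rules = sequence(
    some_with_progress(no_op),            # prefix is none
    some_with_progress(literal("a")),
    some_with_progress(literal(")")),
)
print(rules("aaa)))", 0))   # 6, i.e. the whole input was consumed
```

The parse "works", but only because the loop quietly gives up on a rule that was never going to consume anything.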

Another Problem: MAYBE is a very similar word to OPT

Imagine looking at this code:

>> prefix: "(", suffix: ")"

>> parse "aaa)" [maybe prefix, some "a", maybe suffix]
== ~null~  ; isotope

"But wait"... I can imagine someone saying... "doesn't that mean that if it's not there, you skip the rule"?

I've had some mental back-and-forth about the words try, opt, and maybe...with a general dislike of the word OPT. The current idea is that TRY was intended to defuse harder definitional errors:

>> take []
** Error: you can't take from an empty block (stopping further code)

>> try take []
== ~null~  ; isotope

An extra barrier to creating MAYBE is mechanical

... because the error that NULL generates is raised by the "null combinator" itself. It is not a definitional error, because those just represent things like "type didn't match".

The only way I can see a null-disabling MAYBE parse combinator working would be by quoting its argument, doing the rule fetch itself, and turning into a failing combinator if it fetched null. This breaks the model somewhat.

Maybe /prefix Could Mean an Optionally-Null Variable?

>> prefix: null, suffix: ")"

>> parse "aaa)" [/prefix, some "a", /suffix]
== ")"

It's already the case that paths have to be quoted to match in blocks, so an unquoted leading slash is free to carry this meaning in rules.

It's a lot to think about on my first day of thinking about Rebol stuff for a while! But there you go.

hostilefork · June 23, 2023, 1:00am

Okay, voided variables should be no-ops in UPARSE

This is consistent with how quoted voids work:

>> parse [a b] ['a ' 'b]
== b

Or how voided expressions work in GET-GROUP! substitution:

>> parse [a b] ['a 'b :(if false [[some 'c]])]
== b

>> parse [a b c c c] ['a 'b :(if true [[some 'c]])]
== c

And with more liberal policies for void variables via word access in the main evaluator, extending this to WORD! references seems consistent:

>> c-rule: if false [[some 'c]]

>> parse [a b] ['a 'b c-rule]
== b

>> c-rule: if true [[some 'c]]

>> parse [a b c c c] ['a 'b c-rule]
== c

A Quirky MAYBE Combinator Is Probably Bad News

Not everything in the evaluator universe is going to have a PARSE parallel. If you have a null rule, I guess you may just have to use a GET-GROUP! and call the evaluator's MAYBE.

>> c-rule: null

>> parse [a b] ['a 'b :(maybe c-rule)]
== b

This will keep you from erroring on the null by turning the null into a void.

For the higher-order rules, UPARSE has richer mechanisms that express the R3-Alpha progress rule more intentionally, such as FURTHER, which you could use:

>> prefix: null, suffix: ")"

>> parse "aaa)))" [
    opt some further :(maybe prefix)
    some "a"
    opt some further :(maybe suffix)
]
== ")"

If you don't want PREFIX to be "ornery" when it's used in PARSE, then initialize it to void instead of null and this cleans up a bit:

>> prefix: void, suffix: ")"

>> parse "aaa)))" [
    opt some further prefix
    some "a"
    opt some further suffix
]
== ")"
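
To see why opt some further terminates where a bare some would spin, here is a toy Python model. It is a sketch under assumptions, not UPARSE's implementation: rules map a position to a new position or None, VOID is a made-up sentinel whose rule is a no-op, and the model returns the final position rather than the last rule's result as UPARSE does.

```python
VOID = object()   # hypothetical stand-in for a void variable

def literal(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def rule_for(value):
    """A void variable becomes a no-op rule; a string becomes a literal match."""
    return (lambda text, pos: pos) if value is VOID else literal(value)

def further(inner):
    """Succeed only if the inner rule matched AND advanced the position."""
    def rule(text, pos):
        nxt = inner(text, pos)
        return nxt if nxt is not None and nxt > pos else None
    return rule

def some(inner):
    def rule(text, pos):
        pos = inner(text, pos)
        while pos is not None:
            nxt = inner(text, pos)
            if nxt is None:
                return pos
            pos = nxt
        return None
    return rule

def opt(inner):
    def rule(text, pos):
        nxt = inner(text, pos)
        return pos if nxt is None else nxt
    return rule

def sequence(*rules):
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule

def build(prefix, suffix):
    return sequence(
        opt(some(further(rule_for(prefix)))),
        some(literal("a")),
        opt(some(further(rule_for(suffix)))),
    )

print(build(VOID, ")")("aaa)))", 0))     # 6
print(build("(", ")")("((aaa)))", 0))    # 8
```

Because further turns a non-advancing success into a plain mismatch, some sees a failure on its first try for a void rule, and opt converts that failure into "matched zero times".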

There are a lot of tools at one's disposal, and I don't think we need anything crazier than this. I'm content enough with it, I think!