UPARSE needs BREAK, REJECT, ACCEPT... But What Are They?

hostilefork · August 7, 2021, 11:51am

When something doesn't sit right in my head, I notice. Like how I could never remember in the beginning what GET-WORD! or SET-WORD! in PARSE did. (e.g. is a GET-WORD! GET-ting the parse position to SET a variable, or GET-ting a variable's value to use to SET the parse position?)

And I never really understood ACCEPT, REJECT, BREAK, and FAIL.

I Currently Consider the FAIL Confusion "Solved"

Things have settled nicely, in that a LOGIC! is used to get a pure "keep going" or "stop". So FAIL is replaced simply by FALSE. This means if (expression-returning-logic) is written in UPARSE as :(expression-returning-logic), and FAIL can keep its meaning of "raise an error".

This generalization has the pleasing property that we don't need to go introducing "parse switch" or "parse case" or any such things. Since NULL means the same thing as #[true] for a GET-GROUP! splice, you have every option at your disposal.
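Here is a toy model in Python that makes the "every option at your disposal" point concrete. It is not UPARSE's actual implementation. It assumes a rule is a function taking (text, pos) and returning (new_pos, value) on success or None on failure. The helpers lit, seq, get_group and uparse_q are hypothetical names, and the GROUP! is modeled as a zero-argument Python function whose result is then used as a rule.

```python
def lit(s):
    """Match the literal string s at pos."""
    def rule(text, pos):
        return (pos + len(s), s) if text.startswith(s, pos) else None
    return rule

def seq(*rules):
    """Match rules in order; fail if any one of them fails."""
    def rule(text, pos):
        value = None
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            pos, value = result
        return pos, value
    return rule

def get_group(fn):
    """Toy model of :(...): evaluate fn, then treat its result as a rule."""
    def rule(text, pos):
        value = fn()
        if value is False:
            return None               # like #[false]: this rule fails
        if value is True or value is None:
            return pos, None          # like #[true] or NULL: succeed, consume nothing
        return lit(value)(text, pos)  # anything else (here: a string) is matched as a rule
    return rule

def uparse_q(text, rule):
    """True only if the rule matches all of the input."""
    result = rule(text, 0)
    return result is not None and result[0] == len(text)

mode = "b"
print(uparse_q("xb", seq(lit("x"), get_group(lambda: "a" if mode == "a" else "b"))))  # True
print(uparse_q("x", seq(lit("x"), get_group(lambda: mode == "a"))))                    # False
```

The group's result is just used as a rule, so a conditional expression inside the group covers what a dedicated "parse switch" or "parse case" would do.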

BREAK and REJECT Seem Too Similar

The problem I have is that BREAK sounds a lot like "this didn't work". In fact, I've enforced that loops return NULL if-and-only-if you BREAK them:

>> repeat 3 [break]
; null

>> repeat 3 [null]
== ~null~  ; isotope

A NULL is used as a signal of "soft failure", e.g. it causes ELSE to run.

>> repeat 3 [break] else [print "soft failure"]
soft failure

>> repeat 3 [null] else [print "soft failure"]
== ~null~  ; isotope
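For anyone who finds an analogy outside Rebol easier, here is a rough Python model of the NULL-if-and-only-if-BREAK rule. It is only a sketch under its own assumptions. None stands in for NULL, a sentinel object stands in for the ~null~ isotope, and BREAK is modeled as an exception. The repeat and else_ helpers are hypothetical, not any real API.

```python
class Break(Exception):
    """Raised by a loop body to model BREAK."""

class NullIsotope:
    """Stand-in for the ~null~ isotope: a null that ELSE does not react to."""
    def __repr__(self):
        return "~null~"

NULL_ISOTOPE = NullIsotope()

def repeat(count, body):
    """Run body count times; return None if and only if body raised Break."""
    result = NULL_ISOTOPE  # simplification: zero iterations also return this
    for _ in range(count):
        try:
            result = body()
        except Break:
            return None             # pure NULL: the only soft-failure signal
        if result is None:
            result = NULL_ISOTOPE   # the body's own NULL is "isotopified"
    return result

def else_(value, branch):
    """Run branch only when value is a pure NULL (None)."""
    return branch() if value is None else value

def break_body():
    raise Break()

print(repeat(3, break_body))                                   # None
print(repeat(3, lambda: None))                                 # ~null~
print(else_(repeat(3, break_body), lambda: "soft failure"))    # soft failure
print(else_(repeat(3, lambda: None), lambda: "soft failure"))  # ~null~
```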

So the distinction between BREAK and REJECT seems a thin one. I feel like I'd rather that BREAK meant you decided the iterated rule isn't working out...and some other signal indicated that you want to accept it and go on.

But ACCEPT doesn't really hint at ceasing iteration. Perhaps STOP? As a word, it hints more at the ceasing of an iteration...and it's already used that way in CYCLE.

>> cycle [stop]
== ~void~  ; isotope

Unlike BREAK (which always returns NULL) it is able to return a non-NULL...and a NULL will be isotopified so it won't be seen as a "soft failure" by ELSE:

>> cycle [stop 10]
== 10

>> cycle [stop null]
== ~null~  ; isotope

Similarly, if you're going to say that an iterative construct in PARSE should stop iterating and let the parse keep going, then you should have an opportunity to say what value that rule synthesizes. This requires "endable" rules (because we want a plain STOP with no argument to work). I think that's doable.

So I guess I'm saying: prefer BREAK to mean the rule failed (returning NULL), and STOP to mean the rule succeeded. STOP would default to returning a void isotope if no argument is given, but would allow an argument. The argument would be a rule, so the STOP can do some matching of its own.

>> uparse "aaab" [while ["a" (print "A") | stop ["b" (1020)]]]
A
A
A
== 1020
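The proposed BREAK/STOP split can be sketched as a tiny combinator model in Python. This illustrates the idea under stated assumptions and is not how UPARSE combinators are actually written. Rules are functions of (text, pos) returning (new_pos, value) or None. BREAK and STOP are modeled as exceptions caught by the loop combinator. Every name (lit, act, seq, alt, stop, brk, while_, uparse) is hypothetical.

```python
class Break(Exception):
    """Models BREAK: the enclosing loop rule fails."""

class Stop(Exception):
    """Models STOP: the enclosing loop rule succeeds with a value."""
    def __init__(self, pos, value):
        super().__init__()
        self.pos, self.value = pos, value

VOID = "~void~"  # stand-in for the void isotope

def lit(s):
    def rule(text, pos):
        return (pos + len(s), s) if text.startswith(s, pos) else None
    return rule

def act(fn):
    """Models a GROUP!: consumes nothing, synthesizes fn()'s result."""
    return lambda text, pos: (pos, fn())

def seq(*rules):
    def rule(text, pos):
        value = VOID
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            pos, value = result
        return pos, value
    return rule

def alt(*rules):
    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None
    return rule

def stop(arg=None):
    """STOP with an optional rule argument; a plain STOP synthesizes VOID."""
    def rule(text, pos):
        if arg is None:
            raise Stop(pos, VOID)
        result = arg(text, pos)
        if result is None:
            return None  # the argument rule didn't match: an ordinary failure
        raise Stop(*result)
    return rule

def brk():
    def rule(text, pos):
        raise Break()
    return rule

def while_(body):
    """Match body repeatedly; BREAK fails the loop, STOP ends it successfully."""
    def rule(text, pos):
        value = VOID
        while True:
            try:
                result = body(text, pos)
            except Stop as s:
                return s.pos, s.value
            except Break:
                return None
            if result is None or result[0] == pos:
                return pos, value  # body failed (or made no progress): loop is done
            pos, value = result
    return rule

def uparse(text, rule):
    """Return the synthesized value, or None if the input isn't fully matched."""
    result = rule(text, 0)
    if result is None or result[0] != len(text):
        return None
    return result[1]

rules = while_(alt(seq(lit("a"), act(lambda: print("A"))),
                   stop(seq(lit("b"), act(lambda: 1020)))))
print(uparse("aaab", rules))  # prints A three times, then 1020
```

In this model, BREAK and STOP are exceptions caught by the nearest loop combinator. That means the alternates and sequences in between need no special handling, which is roughly the "talk to the enclosing loop" behavior the proposal asks for.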

What About CONTINUE ?

If your loop is only one deep in alternates, then all an alternate needs to do to continue is succeed:

>> uparse "aaab" [while ["a" comment "continue" | "b" comment "continue"]]

But if you're deeper than that, it is trickier. And I don't see any particular reason why you shouldn't be able to ask a rule to CONTINUE a loop.

>> uparse "abbbaccc" [while [
    "a" [some "bbb" (print "BBB"), continue | some "ccc" (print "CCC")]
    (print "like this!")
]]
BBB
CCC
like this!

And CONTINUE could also take an argument, which would matter only if it was the final iteration:

>> uparse "bba" [repeat (3) ["a" continue (<like this>) | "b"]]
== <like this>
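CONTINUE with an argument can be sketched in a hypothetical Python model of a counted REPEAT. The assumptions are as follows. Rules are functions of (text, pos) returning (new_pos, value) or None. CONTINUE is an exception that the loop catches and treats as a successful iteration. All names (lit, act, seq, alt, cont, repeat_, uparse) are made up for illustration.

```python
class Continue(Exception):
    """Models CONTINUE: end this iteration successfully, with a value."""
    def __init__(self, pos, value):
        super().__init__()
        self.pos, self.value = pos, value

VOID = "~void~"  # stand-in for the void isotope

def lit(s):
    def rule(text, pos):
        return (pos + len(s), s) if text.startswith(s, pos) else None
    return rule

def act(fn):
    """Models a GROUP!: consumes nothing, synthesizes fn()'s result."""
    return lambda text, pos: (pos, fn())

def seq(*rules):
    def rule(text, pos):
        value = VOID
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            pos, value = result
        return pos, value
    return rule

def alt(*rules):
    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None
    return rule

def cont(arg=None):
    """CONTINUE with an optional rule argument; a plain CONTINUE synthesizes VOID."""
    def rule(text, pos):
        if arg is None:
            raise Continue(pos, VOID)
        result = arg(text, pos)
        if result is None:
            return None  # the argument rule didn't match: an ordinary failure
        raise Continue(*result)
    return rule

def repeat_(count, body):
    """Match body exactly count times; the result is the last iteration's value."""
    def rule(text, pos):
        value = VOID
        for _ in range(count):
            try:
                result = body(text, pos)
            except Continue as c:
                result = (c.pos, c.value)  # CONTINUE counts as a successful iteration
            if result is None:
                return None
            pos, value = result
        return pos, value
    return rule

def uparse(text, rule):
    """Return the synthesized value, or None if the input isn't fully matched."""
    result = rule(text, 0)
    if result is None or result[0] != len(text):
        return None
    return result[1]

body = alt(seq(lit("a"), cont(act(lambda: "<like this>"))), lit("b"))
print(uparse("bba", repeat_(3, body)))  # <like this>
print(uparse("abb", repeat_(3, body)))  # b
```

The second call shows the argument mattering only on the final iteration. There the CONTINUE happens first, so the last "b" supplies the result.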

Would That Be an Improvement?

I think CONTINUE is pretty obviously useful.

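One thing that's a bit weird about what I suggest is that when a BREAK happens in a non-PARSE loop, the code after the loop still runs. In PARSE, by contrast, a BREAK would make the loop fail, so the rules after it would not run.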

But the idea that "failure" stops progression is a cross-cutting design aspect in PARSE. It seems consistent to me.

Yet another issue is that STOP is not currently offered by plain WHILE or REPEAT or FOR-EACH or other loops. The reason is that if you try to write your own iterator in terms of other iterators, you cannot tell from the outside whether a "cease iterating" intention happened.

Consider this:

>> opaque-code: [print "looping", 1000 + 20]

>> repeat 2 (opaque-code) then [repeat 2 (opaque-code)]
looping
looping
looping
looping
== 1020

That's nice because if the opaque-code has a break, the whole thing will break:

>> opaque-code: [print "entering", break]

>> repeat 2 (opaque-code) then [repeat 2 (opaque-code)]
entering
; null

But if you permit STOP to return a value, the stopping intent is lost:

>> opaque-code: [print "entering", stop 1020]

>> repeat 2 (opaque-code) then [repeat 2 (opaque-code)]
entering
entering
== 1020

When you're trying to write compound looping expressions that are built up of smaller loops, this really matters. CYCLE is an oddball because you know the only way it ever terminates with a value is if there was a stopping intent...which is why it allows STOP.
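The loss of the stopping intent can be reproduced in a few lines of Python. This sketch rests on several assumptions. None models NULL, and BREAK and a value-carrying STOP are modeled as exceptions. Bodies never return None. The repeat, then and compound helpers are hypothetical. Here, compound plays the role of repeat 2 (opaque-code) then [repeat 2 (opaque-code)].

```python
class Break(Exception):
    """Models BREAK."""

class Stop(Exception):
    """Models a hypothetical STOP that carries a value."""
    def __init__(self, value):
        super().__init__()
        self.value = value

VOID = "~void~"  # stand-in for a loop that ran zero times

def repeat(count, body):
    """Returns None iff the body raised Break (bodies here never return None)."""
    result = VOID
    for _ in range(count):
        try:
            result = body()
        except Break:
            return None
        except Stop as s:
            return s.value  # the caller gets only the value; the stop intent is lost
    return result

def then(value, branch):
    """Run branch only when value is not NULL (None)."""
    return None if value is None else branch()

def compound(body):
    """Models: repeat 2 (body) then [repeat 2 (body)]"""
    return then(repeat(2, body), lambda: repeat(2, body))

log = []

def looping():
    log.append("looping")
    return 1000 + 20

def breaking():
    log.append("entering")
    raise Break()

def stopping():
    log.append("entering")
    raise Stop(1020)

for body in (looping, breaking, stopping):
    log.clear()
    print(body.__name__, compound(body), len(log))
# looping 1020 4
# breaking None 1
# stopping 1020 2
```

From compound's point of view, the STOP case looks just like a normal completion, which is why the second loop runs.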

Maybe ACCEPT and REJECT Should Be Used and No BREAK?

...but this kind of runs into the same problem: non-PARSE WHILE doesn't have ACCEPT or REJECT either. So why get worked up about PARSE's WHILE having STOP when non-PARSE WHILE doesn't have STOP, if it makes everything line up?

Or maybe non-PARSE WHILE can have STOP...you just understand that STOP has limits when it comes to loop abstraction. Not everything works all the time. So STOP could carry a warning that, for any loop other than CYCLE, you can't tell from the outside that the stopping intent happened...

What Do You Think?

Are the needs of PARSE different, or the same? Should BREAK make the overall expression evaluate to NULL but keep going? Are ACCEPT and REJECT the right answer?

It's hard to say. I have to work out the mechanism by which such things could work in usermode combinators, whatever you call them...so there's time to think about it.

hostilefork · August 8, 2021, 4:09am

I'd like some weighing in here if possible from @Brett, @IngoHohmann, @giuliolunati, @rgchris, and whoever else might have specific interest in PARSE...

My proposal above is like this:

>> uparse? "aaa" [some ["a" stop] "aa"]  ; the SOME succeeds here
== #[true]

>> uparse? "aaa" [opt some ["a" break] "aaa"]  ; the SOME fails here
== #[true]

This is different from history, and Red:

red>> parse "aaa" [some ["a" break] "aa"]
== true
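A small hypothetical Python model of the proposed SOME can check the two proposed cases. It can also show what the rule from the Red example above would do under the proposal. It assumes rules are functions of (text, pos) returning a new position or None, with STOP and BREAK modeled as exceptions that the SOME combinator catches. All names are illustrative only.

```python
class Break(Exception):
    """Models BREAK: the enclosing SOME fails."""

class Stop(Exception):
    """Models STOP: the enclosing SOME succeeds at pos."""
    def __init__(self, pos):
        super().__init__()
        self.pos = pos

def lit(s):
    """Match s at pos; return the new position or None."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def seq(*rules):
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule

def opt(r):
    """Succeed without consuming input if r fails."""
    def rule(text, pos):
        result = r(text, pos)
        return pos if result is None else result
    return rule

def stop():
    def rule(text, pos):
        raise Stop(pos)
    return rule

def brk():
    def rule(text, pos):
        raise Break()
    return rule

def some(body):
    """One or more matches of body; STOP succeeds at once, BREAK fails the SOME."""
    def rule(text, pos):
        matched = False
        while True:
            try:
                result = body(text, pos)
            except Stop as s:
                return s.pos
            except Break:
                return None  # the whole SOME fails; an enclosing OPT resumes where SOME began
            if result is None:
                return pos if matched else None  # needs at least one match
            if result == pos:
                return pos  # matched without consuming: end the loop rather than spin
            pos, matched = result, True
    return rule

def uparse_q(text, rule):
    """True only if the rule matches the whole input."""
    return rule(text, 0) == len(text)

print(uparse_q("aaa", seq(some(seq(lit("a"), stop())), lit("aa"))))        # True
print(uparse_q("aaa", seq(opt(some(seq(lit("a"), brk()))), lit("aaa"))))  # True
print(uparse_q("aaa", seq(some(seq(lit("a"), brk())), lit("aa"))))        # False (Red says true)
```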

As further proof that I still don't understand REJECT:

red>> parse "aaa" [opt some ["a" reject] "aaa"]
== false

red>> parse "aaa" [some ["a" reject] | "aaa"]
== true

So REJECT seems to be disconnected from iteration, but instead means "don't just fail this rule, but fail all rules in the current alternative". Red's parse introduction claims:

break : break out of a matching loop, returning success.
reject : break out of a matching loop, returning failure.

But they have tests like:

red>> parse [] [break]
== true

Where's the "loop" this is breaking? I don't get it. That should be an error in my book.

If ACCEPT and REJECT communicate with the BLOCK! combinator and act on the alternate-level, that needs to be defined better. Where does it stop? Once it hits the first alternate?

Again: I literally never understood these.

(UPDATE: I found a Red issue where they debate this, it doesn't seem resolved and as far as I can tell they don't know either: Red Issue #3478)

How about REPEAT and the INTEGER! combinator?

It seems clear that SOME and WHILE are loops, but is REPEAT a loop? Is an INTEGER!? Like this:

>> uparse? "aaa" [repeat (3) ["a" stop] "aa"]
== #[true]

>> uparse? "aaa" [3 ["a" stop] "aa"]
== #[true]

We could say that INTEGER! is not a loop but REPEAT is, which would give you something of a choice.

Anyway, I feel like having BREAK's "return NULL" meaning make the loop fail the parse seems pretty solid. I could really use some additional thoughts here.