Generalizing Unwinding in UPARSE

hostilefork · July 19, 2021, 3:09pm

So the COLLECT keyword in Red is--in @rgchris's opinion (and mine) basically useless. Because backtracking does not undo the things you've collected.

red>> parse "ac" [collect [
    "a" keep (<a1>) "b" keep (<b>)
    | "a" keep (<a2>) "c" keep (<c>)
]]
== [<a1> <a2> <c>]

It got half way through the first sequence...matching an "a" and doing a keep, then realized it didn't match the "b". So it rolled back to the start and again matched an "a" and did another keep, then a "c" and did a keep.

It's far more useful if bookkeeping is done to backtrack what's being collected, as Ren-C does.

ren-c>> parse "ac" [collect [
    "a" keep (<a1>) "b" keep (<b>)
    | "a" keep (<a2>) "c" keep (<c>)
]]
== [<a2> <c>]

How Does Red Justify The Non-Rollback Behavior?

Maybe @rgchris can chime in here to say more precisely or find some direct quotes. But I believe the general justification is that they think of rollback as a "new" behavior that swims upstream from the fact that "code with side effects isn't unwound in general".

all-redbols>> parse "ac" [collect [
    "a" (print "a1") "b" (print "b")
   | "a" (print "a2") "c" (print "c")
]]
a1
a2
c

Nothing rolls back arbitrary code in groups...and Ren-C is not planning on being any different.

In this point of view, they would say that in the implementation of KEEP they are running an APPEND. They conclude it's thus reasonable to say that the user experience should correspond to not wanting there to be too much of a gap from the implementation. If there is, it would mean adding more code/design...

I call that the tail wagging the dog.

Anyway, this introduction aside... let's get on to the main part of what I wanted to write about.

The Mechanics Behind Rollback are Non-Trivial

One thing about adding something like COLLECT into the UPARSE combinator model is that it's the sort of thing that winds up having to know about what happens with the relationship of every other parser.

Imagine you have something like an EITHER combinator that will succeed if either of two rules succeed. This would be the ideal behavior.

>> uparse "aaa" [collect [either 2 keep ["a" "a"] 3 keep ["a"]]]
== ["a" "a" "a"]

Remember that a BLOCK! rule evaluates to its last result, and TEXT! rules against strings evaluate to the rule itself, so this is the expected answer.

I wrote this example above the way I did instead of putting the two arguments to this hypothetical EITHER in BLOCK!s:

>> uparse "aaa" [collect [either [2 keep ["a" "a"]] 3 [keep ["a"]]]]
== ["a" "a" "a"]

Because if I did it that way, then you might argue that the only place COLLECT needs to get its hooks in would be BLOCK! combinators. But we are saying part of the interesting natural-language property of PARSE is to (correctly) manage the implicit grammar...it's foundational to Rebol philosophy to not require the blocks (though to offer the option if the grouping makes it more clear).

So when I wrote it without the blocks it was to demonstrate a point. Accomplishing the rollback conceptually needs to have "hooks into the EITHER". It needs to know that although the overall EITHER succeeded it failed in part of its implementation. One of the 2 attempted keeps from the first rule succeeded but that turned out not be enough.

COLLECT must somehow thus get a notice on every combinator's call...effectively before to record the high water mark of the collection... and then after to know if it failed, to roll back.

This Might Sound Familiar...like an ENCLOSE

If you recall how ENCLOSE works.

hooked-append: enclose :append func [f [frame!]] [
    print ["Hey, I know you're trying to append a" f.value]
    print ["That append only happens if I DO this frame..."]
    let result: do f
    print ["I did the frame and got back" mold result]
    print ["But I'm returning something else!"]
    return <you got hooked!>
]

It's pretty nifty:

>> data: [a b c]
== [a b c]

>> hooked-append data 10
Hey, I know you're trying to append a 10
That append only happens if I DO this frame...
I did the frame and got back [a b c 10]
But I'm returning something else!
== <you got hooked!>

>> data
== [a b c 10]

This rather directly corresponds to what COLLECT needs, as a service. Except instead of enclosing one function it's adding some code to a hook in every combinator.

I'm going to have a bit more to say about this as I go. But I had noticed that in my first crack at thinking about where to put the rollback I'd made the mistake of putting it in the BLOCK! combinator. So I was motivated to write up a good example like the EITHER to show why that wasn't correct.

Also any chance to talk about Red as being (intellectually) lazy and anti-intelligent-design is fun.

hostilefork · July 21, 2021, 10:17am

I made the change I discussed to put the rollback mechanics in the combinator hook. And that is good for the reasons I mention...they have to be there.

But when I took the code out of the BLOCK! combinator I noticed a problem.

>> uparse "ab" [collect [keep "a" keep "a" | keep "a" keep "b"]]
== ["a" "a" "b"]

The bad behavior I was complaining about appeared when individual parser success/failure was used to drive the unwinding.

Looking into why...this made me realize a problem with what I said above.

What do you do when some parsers called by a combinator succeed...but those particular successful parses aren't deemed to contribute to what the calling combinator ultimately considered success?

It's always helpful to pick simple examples, so let's say we wanted a parser MAXMATCH which will run two parsers and return the synthesized value of whichever rule got the furthest.

>> uparse "abab" [collect [
     maxmatch ["aba" keep (3)] [some "ab" keep (4)]
     to end
]]

We can tell that some "ab" will match all 4 characters, while "aba" will only match 3 of them. So the second rule "wins".

But should that return [4] or [3 4]? Neither parser failed, but only one was deemed the choice for ultimate success.

You might suggest there's not a precise "right answer" to this question. MAXMATCH could be done one way or the other.

In the implementation--however--not all things are equal. e.g. if you are collecting in a straight buffer--appending to a block as you go--then unwinding the last thing is easier than unwinding an arbitrary thing.

In other words, if the COLLECT process picks up [3] then goes to [3 4] it's easier to drop back to [3] than it is to drop to [4].

This isn't to say it's not possible to have a general mechanism in which a combinator could receive back some "token" from a combinator, and then "reject" the token if it wanted to say that even though the component combinator itself succeeded that any contributions it made to COLLECT would be inapplicable. It just means you can't do this with a simple stack-based collection implementation.

If BLOCK! itself is a combinator, though, none of this is academic. It's a real question of a combinator needing to issue decisions that even successful parsers might need to have their contributions rolled back based on something else that happens.

Semantically it's something to chew on, and as @BlackATTR can attest it took me a bit down the other day. (I'm at his house, watching his dogs. )

hostilefork · July 27, 2021, 5:59am

I could have mentioned an even simpler case than BLOCK! or my fictional MAXMATCH. What about plain old AHEAD?

>> uparse "aaa" [collect [ahead ["a" keep (<1>)] 3 "a" keep (<2>)]]
== [<1> <2>]  ; or just [<1>] ?

The AHEAD matched the ["a" keep (<1>)] rule, and made a note that it succeeded.
That rule's advancement was not used to advance...the parse position was "unwound" back to the position before the AHEAD.
All 3 "a" were consumed by the ensuing rule of 3 "a" keep (<2>)

Philosophical question: if the positional advancement was unwound, why wouldn't the KEEPs be unwound too?

Again, it seems tempting to just say "all rules that succeed don't rewind their KEEPs", but BLOCK! rules show that doesn't work. They roll back on successful rules whose entire alternate clause did not succeed.

What About a "virtual" Sub-Sequence Combinator?

We could say that how BLOCK! combinators avoid the problem is to delegate to sub-combinators.

So if your input is "aa" and the BLOCK! combinator looks like:

 [keep "a" keep "a" | keep "a" keep "b"]

If we say BLOCK! calls 4 KEEP combinators... and the combinator hook is the only rollback, there's not enough information to roll back. The first keep "a" succeeds and the combinator system can't realize that was part of an ultimate failure, because the overall expression succeded in the end.

-or- BLOCK! could break itself down to call 2 SUBSEQUENCE combinators that it synthesizes. Each of which takes a rule BLOCK! and keeps going until it hits a | or end of BLOCK".

This way, the generalized hook for rollback could still be at the combinator level. But it would restrict rollback to a simple sequence of ordered success and failure.

This would address the issue of the BLOCK! combinator; because it makes its decisions on success from left to right only. But it would not permit the theoretical MAXMATCH that does "rollback".

This Is Why Pure FP People Call Redbols Fundamentally Broken

Haskell people would cite this as the problem with shortcuts that come from implicit parameterization and side-effects (why Python, Ruby, JavaScript etc. are dead-ends too...none of them would bother to remark much on Redbols because they would feel them disproven from seeing any single point of them which seem "nothing new under the sun")

When I say "shortcut" I mean at the moment a combinator returns two things: the synthesized value and the remaining amount of input. And COLLECT is trying to sneak past this by not making what a rule collected a third return result. If every combinator had to "pipe" which parts of the collected results of the parsers it had called up to its own return of what it collected...then (for instance) either decision for MAXMATCH would be feasible.

I think this third result is the only way to go, but what bothers me is having it so specific to the COLLECT feature.
I don't want simple combinators to get any harder to write than they are now, so if the third return result is not requested then maybe there's something we can do automatically.

Tough to think about. Let's think about WHILE (simplified slightly, as my point is "I want things to be simple"):

while: combinator [
    {Any number of matches (including 0)}
    return: "Result of last successful match, or NULL if no matches"
        [<opt> any-value!]
    parser [action!]
    <local> last-result result pos
][
    last-result: '~void~
    cycle [
        [result pos]: ^(parser input) else [
            set remainder input  ; overall WHILE never fails
            return unmeta last-result
        ]
        last-result: result
        input: pos
    ]
]

That doesn't look so bad.

But can we exempt it from having to explicitly say that its collect products are going to come from its parsers as called in sequence? Should we? If not, we wind up with:

while: combinator [
    {Any number of matches (including 0)}
    return: "Result of last successful match, or NULL if no matches"
        [<opt> any-value!]
    parser [action!]
    <local> last-result result pos total-collected parser-collected
][
    last-result: '~void~
    total-collected: copy []
    cycle [
        [result pos parser-collected]: ^(parser input) else [
            set remainder input  ; overall WHILE never fails
            set collected total-collected
            return unmeta last-result
        ]
        last-result: result
        append total-collected parser-collected
        input: pos
    ]
]

This seems a slippery slope. It felt like WHILE said everything it had to say in the first version, this junks it up. And there's also GATHER and EMIT.

How might WHILE say "I'm not like the BLOCK! combinator, every failing parser I call I want to drop their aggregate results"? One way is to show a lack of interest in the collection results of parsers. If you don't ask, you don't care, which means you want collection to be a pure by-product of success or failure.

[result pos]: ^(parser input)  ; I don't care

[result pos parser-collected]: ^(parser input)  ; I care

It would certainly be possible to notice if a combinator never asked for a collection result...and if it did not, assume all failing parsers it called contributed nothing...and all successful parsers it called would keep their collection results in the order they were called. I didn't say easy, just possible.

Tempting to Make the BLOCK! Combinator Special

My guilty thought is to say that BLOCK! isn't actually a combinator, BLOCK! is an immutable part of UPARSE and it is paired with the COMBINATOR abstraction itself.

This would mean the only place successful parsers were rolled back would be if they were part of the sequencing operation implicit in BLOCK!

Offhand I'm not thrilled with the idea. I like generalizing BLOCK! as "just another combinator".

I've been in vacation mode for a while but intend to get back to development mode, so, hope to have an answer to this soon.

hostilefork · July 29, 2021, 6:39am

4 posts were split to a new topic: Getting Hooks Into "Events" during PARSE

hostilefork · July 28, 2021, 10:01am

So I thought I might play a little bit of a devil's advocate on what I say below:

hostilefork:

>> uparse "aaa" [collect [either 2 keep ["a" "a"] 3 keep ["a"]]]
== ["a" "a" "a"]
Remember that a BLOCK! rule evaluates to its last result, and TEXT! rules against strings evaluate to the rule itself, so this is the expected answer.

I wrote this example above the way I did instead of putting the two arguments to this hypothetical EITHER in BLOCK!s:
>> uparse "aaa" [collect [either [2 keep ["a" "a"]] 3 [keep ["a"]]]]
== ["a" "a" "a"]
Because if I did it that way, then you might argue that the only place COLLECT needs to get its hooks in would be BLOCK! combinators. But we are saying part of the interesting natural-language property of PARSE is to (correctly) manage the implicit grammar...it's foundational to Rebol philosophy to not require the blocks (though to offer the option if the grouping makes it more clear).

If we make an analogy with a plain EITHER in historical Rebol, the absence or presence of blocks does make a difference.

Without blocks:

rebol2>> truebranch: func [] [print "running truebranch" [<true>]]

rebol2>> falsebranch: func [] [print "running falsebranch" [<false>]]

rebol2>> either false truebranch falsebranch
running truebranch
running falsebranch
== <false>

With blocks:

rebol2>> either false [truebranch] [falsebranch]
running falsebranch
== [<false>]

Though that changes the result. Hence perhaps blocks aren't the best analogy though, as really it's GROUP!s in plain Rebol that line up with how PARSE thinks of BLOCK!:

rebol2>> either false (truebranch) (falsebranch)
running truebranch
running falsebranch
== <false>

Either way, weird early-morning not-quite up-yet thought is: might it be not entirely un-Rebol-like to say that you only get the rollback from BLOCK! when alternates fail?

This would give you the power to introduce rollback whenever you found you weren't getting it where you needed it. Maybe in the search for overall reduction of systemic complexity, maybe I was right when I wrote it the first time? :-/

hostilefork · January 11, 2024, 8:41am

I realized this thread doesn't actually have the solution at the end... :-/

Needs a rewrite.