Semantics of UPARSE's FURTHEST

hostilefork · November 2, 2023, 4:54am

hostilefork:

I made FURTHEST a multi-return value. The hooked combinators are run, and then if they succeed they're checked to see if they got further than any previous combinator. If so they update furthest.

 >> [_ furthest]: uparse "aaabbb" [some "a" some "c"]
 ; null

 >> furthest
 == "bbb"

Giving back FURTHEST as a multi-return had to be axed, for several reasons.

For one thing: PARSE now raises a definitional error when it does not reach the end of input. Returning an error isotope is fundamentally incompatible with unpacking return values from a returned block isotope: you can't do both!

And secondly: PARSE is designed to allow you to use the full bandwidth of return values, including multi-returns. This means combinators themselves might be multi-returns (I've suggested TALLY could be a multi-returner, giving the count as a primary product but also including the synthesized product):

 >> [count result]: parse "aaa" [tally ["a" ('a)] | tally ["b" ('b)]]
 == 3

 >> count
 == 3

 >> result
 == a

Or you can just explicitly evaluate to a multi-return PACK from a group evaluation, for whatever reason...

parse "..." [... accept (pack [<a> 2]) ...]

Stop-Gap Measure: a "Crappy" FURTHEST Hook

An early adaptation of @BlackATTR's Query had used the FURTHEST return value from UPARSE. Without that return value, it had to fall back on painfully including furthest: <here> markers in every rule. That's no good.

But UPARSE's generic hookability allows for a rule-stepwise debugger.

So of course, simply asking how far the parse got is not hard to support. Just use a hook which writes to a variable you specify. I hacked this up separately as PARSE-FURTHEST:

>> parse-furthest "aaabbb" [some "a" some "c"] 'far except [
       print ["Furthest was:" mold far]
   ]

Furthest was: "bbb"

Well, it's better than nothing!
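
To make the hook idea concrete outside of Ren-C, here is a minimal sketch in Python (not UPARSE itself, and not how PARSE-FURTHEST is actually implemented). It assumes combinators are plain functions from a position to a new position, and every name in it (FurthestTracker, literal, some, sequence, parse_furthest, grammar) is hypothetical. Each hooked combinator runs normally, and only if it succeeds is its end position compared against the furthest seen so far.

```python
from typing import Callable, Optional, Tuple

# Assumption: a combinator maps (text, position) to a new position,
# or to None when it fails to match.
Combinator = Callable[[str, int], Optional[int]]


class FurthestTracker:
    """Remembers the furthest position any successful hooked combinator reached."""

    def __init__(self) -> None:
        self.furthest = 0

    def hook(self, combinator: Combinator) -> Combinator:
        def hooked(text: str, pos: int) -> Optional[int]:
            end = combinator(text, pos)  # run the combinator unchanged
            if end is not None and end > self.furthest:
                self.furthest = end  # success that got further than before
            return end
        return hooked


def literal(s: str) -> Combinator:
    """Match the exact string `s` at the current position."""
    def run(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return run


def some(rule: Combinator) -> Combinator:
    """Match `rule` one or more times."""
    def run(text: str, pos: int) -> Optional[int]:
        end = rule(text, pos)
        if end is None:
            return None
        while True:
            nxt = rule(text, end)
            if nxt is None or nxt == end:  # stop on failure or no progress
                return end
            end = nxt
    return run


def sequence(*rules: Combinator) -> Combinator:
    """Match each rule in order; fail if any rule fails."""
    def run(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for rule in rules:
            current = rule(text, current)
            if current is None:
                return None
        return current
    return run


def parse_furthest(text: str, make_rule) -> Tuple[bool, str]:
    """Run a hooked rule; return (reached end of input?, input left at furthest point)."""
    tracker = FurthestTracker()
    rule = make_rule(tracker.hook)
    end = rule(text, 0)
    return end == len(text), text[tracker.furthest:]


def grammar(h):
    # Rough analogue of [some "a" some "c"], with every combinator hooked.
    return sequence(h(some(h(literal("a")))), h(some(h(literal("c")))))


ok, far = parse_furthest("aaabbb", grammar)
print(ok, repr(far))  # False 'bbb'
```

The tracker lives outside the rules, which is the point: the grammar needs no position markers of its own.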

But What Do We Really Want?

One way of looking at this could be that FURTHEST is a field in the error that gets returned.

>> parse "aaabbb" [some "a" some "c"] except e -> [
       assert [e.id = 'incomplete-parse]  ; only raised error?
       print ["Furthest was:" mold e.furthest]
   ]

That's not bad, except that there might be a lot of things you want to know about a failed parse besides just FURTHEST. You'd want to know the actual parse position, but also other aspects of the rule state. Is it too much stuff to pack into an error object?

You might have been keeping running track of the line and column (which I've wanted to find a way to expose via <line> and <column> combinators, where applicable). That has to be updated as the parse proceeds; you wouldn't want to have to recalculate it in case of an error...?

So it seems to me that there may be a need for a parse state object which can be made available to reflect on after a parse that's successful -or- unsuccessful. The PARSE function would be a convenient wrapper on top of this, but you could dig deeper for more serious needs.
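
As a rough illustration of what such a state object might carry, here is a small Python sketch (again not Ren-C, and only one possible shape). ParseState, match_some, and run_parse are hypothetical names. The sketch assumes a parser with no backtracking, so line and column are only ever advanced as input is consumed; a backtracking parser would also need to restore them when it rewinds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParseState:
    """Everything a caller might inspect after a parse, successful or not."""
    text: str
    pos: int = 0
    line: int = 1
    column: int = 1
    furthest: int = 0
    ok: bool = False

    def advance(self, count: int) -> None:
        # Update line/column incrementally, so nothing is recalculated on error.
        for ch in self.text[self.pos:self.pos + count]:
            if ch == "\n":
                self.line += 1
                self.column = 1
            else:
                self.column += 1
        self.pos += count
        self.furthest = max(self.furthest, self.pos)


def match_some(state: ParseState, s: str) -> bool:
    """Match one or more copies of `s`, advancing the state as it goes."""
    if not s:
        raise ValueError("empty literal would never make progress")
    matched = False
    while state.text.startswith(s, state.pos):
        state.advance(len(s))
        matched = True
    return matched


def run_parse(text: str, literals: List[str]) -> ParseState:
    """Match each literal one or more times, in order; always return the state."""
    state = ParseState(text)
    matched_all = all(match_some(state, s) for s in literals)
    state.ok = matched_all and state.pos == len(text)
    return state


state = run_parse("aa\nbbb", ["a", "\n", "c"])
print(state.ok, state.line, state.column, repr(state.text[state.furthest:]))
# False 2 1 'bbb'
```

A PARSE-style wrapper could then be a thin layer that builds the state, runs the rules, and either returns the result or raises an error that points at the state.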

For right now, the PARSE-FURTHEST hack keeps the mechanism behind the functionality tested, but there's clearly a lot to think about here (and I believe looking at Haskell for inspiration on these kinds of questions is wise).