Upon the announcement of UPARSE, @Brett listed as his second most-missed feature the idea of knowing "how far a parse got":
- An ability to return the furthest input point matched and the rule that caused rollback from there on parse failure. During development of rules this generally indicates the rule that is not properly specified.
What I've done is make it so combinators use a common generator COMBINATOR. This generator takes the function body you give it, and stuffs in some boilerplate parameters (like the INPUT and STATE). But it also wraps your code with some more boilerplate that can run before and after the parser.
The current idea of the "parser state" is just to pass around the FRAME! of the UPARSE operation itself. So if you have any global state you want visible to the parse you can put it there. Hence the state gives every combinator access to the arguments, return values, and locals of the invocation.
I made FURTHEST a multi-return value. The hooked combinators are run, and then if they succeed they're checked to see if they got further than any previous combinator. If so they update furthest.
>> [_ furthest]: uparse "aaabbb" [some "a" some "c"]
; null
>> furthest
== "bbb"
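To make the wrapping idea concrete, here is a minimal Python model of it. This is only a sketch, not the Ren-C implementation: `State`, `combinator`, `literal`, `some`, `sequence` and this toy `uparse` are all hypothetical stand-ins, positions are plain integer indices, and a parser signals failure by returning `None`.

```python
from typing import Callable, Optional


class State:
    """Stand-in for the UPARSE frame: state shared by every combinator."""

    def __init__(self, series: str):
        self.series = series
        self.furthest = 0  # high-water mark, as an index into series


# A parser takes (state, position) and returns the new position or None.
Parser = Callable[[State, int], Optional[int]]


def combinator(body: Parser) -> Parser:
    """Wrap a parser body with the "after" hook that tracks furthest."""
    def hooked(state: State, pos: int) -> Optional[int]:
        end = body(state, pos)
        if end is not None and end > state.furthest:
            state.furthest = end  # success that got further than before
        return end
    return hooked


def literal(text: str) -> Parser:
    @combinator
    def parser(state, pos):
        return pos + len(text) if state.series.startswith(text, pos) else None
    return parser


def some(rule: Parser) -> Parser:
    @combinator
    def parser(state, pos):
        end = rule(state, pos)
        if end is None:
            return None  # SOME needs at least one match
        while True:
            nxt = rule(state, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return parser


def sequence(*rules: Parser) -> Parser:
    @combinator
    def parser(state, pos):
        for rule in rules:
            pos = rule(state, pos)
            if pos is None:
                return None
        return pos
    return parser


def uparse(series: str, rule: Parser):
    """Return (matched-to-end?, remainder at the furthest point reached)."""
    state = State(series)
    end = rule(state, 0)
    return end == len(series), series[state.furthest:]


print(uparse("aaabbb", sequence(some(literal("a")), some(literal("c")))))
# prints (False, 'bbb')
```

The combinator bodies never mention `furthest`; only the wrapper does. That is the appeal of the approach, and also the reason it cannot tell when a combinator deliberately discards a sub-parser's advancement.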
I Notice TO and AHEAD Skew FURTHEST a Bit Far...
Consider the case of the TO combinator. It's supposed to move the parse position to right before an instance of the matching rule.
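But the subtlety of backing up that position is lost on FURTHEST...which just notices that a successful parser run occurred, and updates the high water mark. Here the inner "bb" parser succeeds and ends at "cc", so that is what gets recorded, even though TO itself only advances to "bbcc":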
>> [result furthest]: uparse "aabbcc" [to "bb"]
; null
>> furthest
== "cc" ; not "bbcc"
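The overshoot can be reproduced in a small Python model. Again this is a sketch under assumptions, not the real code: `State`, `hooked`, `literal` and `to` are hypothetical names, and `to` is modeled as a simple forward scan that reports the position before the match.

```python
class State:
    def __init__(self, series):
        self.series = series
        self.furthest = 0  # high-water mark index


def hooked(body):
    """Wrapper that updates furthest after any successful parser run."""
    def parser(state, pos):
        end = body(state, pos)
        if end is not None and end > state.furthest:
            state.furthest = end
        return end
    return parser


def literal(text):
    def body(state, pos):
        return pos + len(text) if state.series.startswith(text, pos) else None
    return hooked(body)


def to(rule):
    def body(state, pos):
        for start in range(pos, len(state.series) + 1):
            if rule(state, start) is not None:
                return start  # position *before* the match, not after it
        return None
    return hooked(body)


state = State("aabbcc")
end = to(literal("bb"))(state, 0)

print(state.series[end:])             # prints bbcc (where TO left the parse)
print(state.series[state.furthest:])  # prints cc   (what the wrapper recorded)
```

The inner `literal("bb")` is itself a hooked parser, so its success at index 4 is recorded before `to` has a chance to throw that advancement away.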
It's a problem that's kind of a parallel with rollback. Which leads to the discovery of a quirk! TO does not manage its "pending" list explicitly...it uses the default "auto-routing". Hence the success of the last parser it calls--whose advancement it doesn't want--counts in a collect:
>> uparse "aabbcc" [collect to [some keep "b"], elide [2 "b" 2 "c"]]
== ["b" "b"]
Is that right? (cc: @rgchris) If it's not right, then it would seem that any KEEPs inside a TO rule should never have an effect. That seems strictly less powerful than being able to grab things when you find an ahead-match, so I think the current behavior is okay.
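For anyone trying to picture the "auto-routing" described above, here is a rough Python model of it. It is an assumption-laden sketch, not how UPARSE is written: each hypothetical parser returns `(end, pending)` or `None`, and this `to` passes along the pending list of the sub-parser that matched while discarding its advancement. Only the `collect to [some keep "b"]` part of the example is modeled.

```python
def keep(text):
    """Match literal text and queue it in the pending list."""
    def parser(series, pos):
        if series.startswith(text, pos):
            return pos + len(text), [text]
        return None
    return parser


def some(rule):
    """One or more matches, concatenating each match's pending items."""
    def parser(series, pos):
        result = rule(series, pos)
        if result is None:
            return None
        end, pending = result
        while True:
            nxt = rule(series, end)
            if nxt is None or nxt[0] == end:
                return end, pending
            end, pending = nxt[0], pending + nxt[1]
    return parser


def to(rule):
    """Scan forward; drop the match's advancement but keep its pending list."""
    def parser(series, pos):
        for start in range(pos, len(series) + 1):
            result = rule(series, start)
            if result is not None:
                _, pending = result
                return start, pending  # "auto-routed" pending survives
        return None
    return parser


def collect(rule):
    """Gather whatever pending items the rule produced."""
    def parser(series, pos):
        result = rule(series, pos)
        if result is None:
            return None
        end, pending = result
        return end, list(pending)
    return parser


end, collected = collect(to(some(keep("b"))))("aabbcc", 0)
print(end, collected)  # prints 2 ['b', 'b']
```

In this model the position ends up before "bbcc" while both kept items still reach the collect, which mirrors the `["b" "b"]` result shown above.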
But with FURTHEST it's less clear.
Pathology Studies: How About MINMATCH?
I made MAXMATCH as a case study of different approaches to influence on COLLECT.
Similarly we could ask about MINMATCH, and what its participation with FURTHEST should be.
It could call two parsers...have both succeed...and then only advance by the smaller amount of the two. We might say this "foils" a wrapper-based approach to updating furthest, since the wrapper around the longer-matching parser would already have advanced FURTHEST by the larger amount.
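As a thought-experiment aid, here is a Python sketch of that MINMATCH scenario. Everything in it is hypothetical (`State`, `hooked`, `literal`, `minmatch`), and it assumes MINMATCH fails unless both parsers succeed, which the discussion here does not pin down.

```python
class State:
    def __init__(self, series):
        self.series = series
        self.furthest = 0  # high-water mark index


def hooked(body):
    """Wrapper that updates furthest after any successful parser run."""
    def parser(state, pos):
        end = body(state, pos)
        if end is not None and end > state.furthest:
            state.furthest = end
        return end
    return parser


def literal(text):
    def body(state, pos):
        return pos + len(text) if state.series.startswith(text, pos) else None
    return hooked(body)


def minmatch(first, second):
    def body(state, pos):
        a = first(state, pos)
        b = second(state, pos)
        if a is None or b is None:
            return None  # assumption: both parsers must succeed
        return min(a, b)  # advance only by the smaller amount
    return hooked(body)


state = State("aaab")
end = minmatch(literal("aaa"), literal("a"))(state, 0)

print(state.series[end:])             # prints aab (MINMATCH advanced by one)
print(state.series[state.furthest:])  # prints b   (wrapper saw the longer match)
```

The mismatch between the two remainders is the "foiling": the wrapper on `literal("aaa")` fires before `minmatch` decides which result to use, and nothing in the wrapper-based scheme lets `minmatch` retract it.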
I'm hesitant to burden combinator authors with another parameterization just to express distinctions for the purposes of FURTHEST. But it's a good thought experiment for what the limits are.
I'm not really sure what the behavior here would be.
Maybe you can look at how the furthest detection works and explain in the context of the code what you would want.