PARSE Errors And You: FURTHEST, FAIL, ENSURE?

So far there are only a few UPARSE features related to errors. One is the FURTHEST return result:

>> [v furthest]: uparse "abbbabbabcabab" [some ["a" | "b"]]
; null

>> furthest
== "cabab"

It records the high-water mark: the furthest point in the input that any combinator reached while reporting success.

It's better than nothing, I guess. But for parsers that scan ahead it might be worthless. (I'll point this out to @Brett, since he suggested the feature...)

>> [v furthest]: uparse "[ababbbcabbab]" [
       "[" ahead to "]"  ; this pushes the high water mark to the ]
       some ["a" | "b"]
       "]"
   ]
; null

>> furthest
== "]"

So here we are not implicating the "c", which people would think of as the actual culprit. But it's harder than one might think to figure out who that is.
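To make the high-water-mark idea concrete outside of UPARSE, here is a minimal Python sketch. It is not how UPARSE is implemented; the `Parser` class and its `lit`, `alt`, `some`, `seq`, `ahead_to` and `run` helpers are names made up for illustration. Every successful match bumps `furthest`, and the lookahead does too even though it consumes nothing, which is an assumption chosen to mirror the `ahead to "]"` result above.

```python
class Parser:
    """Toy matcher; `furthest` is the high-water mark of successful matches."""

    def __init__(self, text):
        self.text = text
        self.furthest = 0

    def _mark(self, pos):
        self.furthest = max(self.furthest, pos)

    def lit(self, s):
        def rule(pos):
            if self.text.startswith(s, pos):
                self._mark(pos + len(s))
                return pos + len(s)
            return None
        return rule

    def alt(self, *rules):
        def rule(pos):
            for r in rules:  # every alternative restarts at the same pos
                new_pos = r(pos)
                if new_pos is not None:
                    return new_pos
            return None
        return rule

    def some(self, r):
        def rule(pos):
            pos = r(pos)  # must match at least once
            while pos is not None:
                new_pos = r(pos)
                if new_pos is None or new_pos == pos:
                    return pos
                pos = new_pos
            return None
        return rule

    def seq(self, *rules):
        def rule(pos):
            for r in rules:
                pos = r(pos)
                if pos is None:
                    return None
            return pos
        return rule

    def ahead_to(self, s):
        def rule(pos):
            idx = self.text.find(s, pos)
            if idx == -1:
                return None
            self._mark(idx)  # assumption: the scan counts as a success at idx
            return pos       # lookahead consumes nothing
        return rule

    def run(self, rule):
        end = rule(0)
        return end == len(self.text), self.text[self.furthest:]


p = Parser("abbbabbabcabab")
print(p.run(p.some(p.alt(p.lit("a"), p.lit("b")))))  # (False, 'cabab')

q = Parser("[ababbbcabbab]")
ab = q.some(q.alt(q.lit("a"), q.lit("b")))
print(q.run(q.seq(q.lit("["), q.ahead_to("]"), ab, q.lit("]"))))  # (False, ']')
```

The second call shows the same problem as the transcript: the scan pushed the mark past the "c", so the mark no longer points at what a person would call the culprit.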

Recap of the New FAIL Feature

With the new FAIL in UPARSE, you have a little bit of support on implicating the point of the input to complain about.

The idea is that you make sure the parse position is where you want to implicate, by making the FAIL an alternate to that position:

>> uparse "{ababcababa}" [
       into between "{" "}" [
           some ["a" | "b"] <end>
           | fail @["Between braces should be just a and b"]
       ]
   ]
** User Error: Between braces should be just a and b
** Near: "ababcababa"

(If you've forgotten why FAIL's argument needs the @, it's because the PARSE dialect has a meaning for BLOCK! already...and for the purposes of "regularity" in the dialect this tries not to override that. But this is an open issue if FAIL wants to break the rules.)

For demonstration purposes, I didn't implicate the "c" there: the FAIL is an alternate to the whole rule, so the parse backtracks to where SOME started matching before it fails. You get a different result if you make the FAIL an alternate to just the <end>:

>> uparse "{ababcababa}" [
       into between "{" "}" [
           some ["a" | "b"]
           [<end> | fail @["Between braces should be just a and b"]]
       ]
   ]
** User Error: Between braces should be just a and b
** Near: "cababa"
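The reason the two placements report different positions can be modelled in a few lines of Python. This is only a sketch under assumptions: `ParseFail`, `lit`, `alt`, `seq`, `some`, `at_end`, `fail` and `near_of` are hypothetical names, and it parses the text between the braces directly rather than modelling `into between`. The key point is that each alternative restarts from the position where its enclosing group began, so a `fail` reports whatever position it was reached at.

```python
class ParseFail(Exception):
    def __init__(self, message, near):
        super().__init__(message)
        self.near = near  # remaining input at the point of failure


def lit(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def alt(*rules):
    def rule(text, pos):
        for r in rules:  # each alternative is tried from the same pos
            new_pos = r(text, pos)
            if new_pos is not None:
                return new_pos
        return None
    return rule


def seq(*rules):
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule


def some(r):
    def rule(text, pos):
        pos = r(text, pos)
        while pos is not None:
            new_pos = r(text, pos)
            if new_pos is None or new_pos == pos:
                return pos
            pos = new_pos
        return None
    return rule


def at_end(text, pos):
    return pos if pos == len(text) else None


def fail(message):
    def rule(text, pos):
        raise ParseFail(message, text[pos:])
    return rule


def near_of(rule, text):
    """Return the 'Near' text if the rule raised, else None."""
    try:
        rule(text, 0)
    except ParseFail as e:
        return e.near
    return None


ab = some(alt(lit("a"), lit("b")))
msg = "Between braces should be just a and b"

whole = alt(seq(ab, at_end), fail(msg))  # fail is an alternate to everything
tail = seq(ab, alt(at_end, fail(msg)))   # fail is an alternate to the end only

print(near_of(whole, "ababcababa"))  # ababcababa
print(near_of(tail, "ababcababa"))   # cababa
```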

A New Fuzzy Concept: ENSURE

We have ENSURE for values outside of PARSE. It runs a test and passes through the result if it matches, or stops and errors:

>> x: 10

>> ensure integer! x
== 10

>> ensure tag! x
** Error: ENSURE failed with argument of type integer!

It seems appealing to make PARSE able to do that too:

>> uparse [<x> 10 #y 20] [collect [while [
       keep ensure tag!
       keep ensure integer!
   ]]]
** Error: ENSURE failed with argument of type ISSUE!
** Near: [... 10 \\ #y \\ 20]

So a similar idea to FAIL, where you get some feedback on the input location causing the problem.

But also similar to FAIL, this doesn't work within the model of having alternates. It sees something it doesn't like and errors in the moment, without giving any | options in the rest of the rules a chance. That's a bit harsh, but maybe still would fit a lot of scenarios.
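As a rough model of that behavior, here is a Python sketch of an ENSURE-like rule over a list of values. Everything in it is an assumption for illustration: `Tag` and `Issue` stand in for TAG! and ISSUE!, `EnsureError` and `collect_pairs` are made-up names, and treating end of input as a plain mismatch rather than an error is my guess, not something settled above.

```python
class Tag(str):
    """Stand-in for TAG! values like <x>."""


class Issue(str):
    """Stand-in for ISSUE! values like #y."""


class EnsureError(Exception):
    def __init__(self, got, near):
        super().__init__(
            f"ENSURE failed with argument of type {type(got).__name__}"
        )
        self.near = near  # remaining input, starting at the offending item


def ensure(kind):
    def rule(items, pos):
        if pos >= len(items):
            return None  # assumption: running out of input is a plain mismatch
        if isinstance(items[pos], kind):
            return pos + 1, items[pos]
        # Raises immediately: no enclosing alternative gets a chance to run
        raise EnsureError(items[pos], items[pos:])
    return rule


def collect_pairs(items):
    """Keep tag/integer pairs, like the COLLECT/KEEP rule above."""
    kept, pos = [], 0
    while pos < len(items):
        for rule in (ensure(Tag), ensure(int)):
            result = rule(items, pos)
            if result is None:
                return None
            pos, value = result
            kept.append(value)
    return kept


print(collect_pairs([Tag("x"), 10, Tag("z"), 20]))

try:
    collect_pairs([Tag("x"), 10, Issue("y"), 20])
except EnsureError as e:
    print(e, "| near:", e.near)
```

Because the rule raises instead of returning a mismatch, the harshness described above falls straight out of the code: nothing that wraps it can backtrack and try something else.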

The historical ENSURE only works on datatypes. Could this work on values, or alternate values?

Far-out idea:

>> uparse "abbbcababa" [some ensure ["a" | "b"]]
** Error: ENSURE would have expected:
       "a"
       "b"
   But it received "c"

The idea would be that once ENSURE started, it might have some way of collecting the "leaf nodes" of failed rules. But I have no idea how such a thing could actually work.
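One way such a thing could work, sketched in Python rather than as a UPARSE design: every leaf (literal) match that fails notes its position and what it wanted, and only the notes at the furthest failing position are kept. The `Tracker` class and `report` function are hypothetical names, and the "keep only the furthest failures" policy is a heuristic I'm assuming, not something ENSURE is known to do.

```python
class Tracker:
    """Toy matcher that remembers which literals failed furthest into the input."""

    def __init__(self, text):
        self.text = text
        self.fail_pos = -1
        self.expected = []

    def _note(self, pos, wanted):
        if pos > self.fail_pos:
            # A failure further along replaces everything noted earlier
            self.fail_pos, self.expected = pos, [wanted]
        elif pos == self.fail_pos and wanted not in self.expected:
            self.expected.append(wanted)

    def lit(self, s):
        def rule(pos):
            if self.text.startswith(s, pos):
                return pos + len(s)
            self._note(pos, s)  # a leaf failed: record what it wanted
            return None
        return rule

    def alt(self, *rules):
        def rule(pos):
            for r in rules:
                new_pos = r(pos)
                if new_pos is not None:
                    return new_pos
            return None
        return rule

    def some(self, r):
        def rule(pos):
            pos = r(pos)
            while pos is not None:
                new_pos = r(pos)
                if new_pos is None or new_pos == pos:
                    return pos
                pos = new_pos
            return None
        return rule


def report(text):
    """Return (expected literals, received character), or None on a full match."""
    t = Tracker(text)
    end = t.some(t.alt(t.lit("a"), t.lit("b")))(0)
    if end == len(text):
        return None
    return t.expected, text[t.fail_pos:t.fail_pos + 1]


print(report("abbbcababa"))  # (['a', 'b'], 'c')
```

Note that the tracker also sees harmless failures along the way (every "b" first fails the "a" test). Those get discarded only because a later failure overtakes them, which is part of why alternates make error delivery murky.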

More generally I wonder how alternates figure into any system of error delivery.

Random Weird Dialect Idea: BAD-WORD!

Just wanted to write down a strange idea I had, to use BAD-WORD! to indicate a shorthand for FAIL with a message. The idea was to make it come after a complete rule and imply a message to give if the rule to its left didn't match:

>> uparse "[ababbbcabbab]" [
       "[" ahead to "]"
       some ["a" | "b"] ~a-or-b-expected~
       "]"
   ]
** Error: a-or-b-expected
** At parse input location: "cabbab]"

It sucks, but it was just a brainstorming idea as a shorthand for:

>> uparse "[ababbbcabbab]" [
       "[" ahead to "]"
       [some ["a" | "b"] | fail ~a-or-b-expected~]
       "]"
   ]

Maybe this points to the need for an ELSE construct, as it might be a bit smoother than having to enclose everything in blocks:

>> uparse "[ababbbcabbab]" [
       "[" ahead to "]"
       some ["a" | "b"] else fail ~a-or-b-expected~
       "]"
   ]

Good stuff on providing FURTHEST. Yep better than nothing and what it provides has been quite useful to me.

Not sure about the worthless comment. Rules that scan ahead are not arbitrary in my experience; they tend to serve a specific function. Someone writing a rule that skips ahead to "]" has declared they're not interested in validating the stuff before that bracket. I've done that for performance where I trust the data format. There's likely to be something before or after that yields useful information to FURTHEST.

Yes. How can we decide when the data or rules diverged...

I've suggested the FURTHEST behavior as one heuristic for finding an error. It was something I could achieve in Rebol 2. But with access to the backtracking mechanism, could another event be when the parse completely backtracks out to overall failure without further successful match (other than END)? It may not be exactly where things initially went wrong, but we can say that this is the moment in the progression through the input and the rules that the parse has been abandoned. Hopefully an interesting place.

I'm not sure that adding ENSURE to this example achieves anything, but I really like the idea of collecting the leaf nodes.

It's the failure of the last failed basic match test (leaf test) that precedes overall failure. If parse can track that leaf and its parent operation (NOT, SOME, basic-alternatives, charset), maybe something descriptive can be the message.

And for verbosity, if it's tracking that much, maybe it could avoid clearing the stack of active rules when it backtracks all the way out to failure (I haven't read the code). Perhaps some of these rules could be identified by name, to show where in the parse tree the failure is being reported.
