The FAIL That Wins Big: Combinator Definitional Errors

hostilefork · August 19, 2022, 2:27am

Announcing Another Major Step Forward...

As the wheels of thought began to churn around TRY, it became clear that its higher purpose was defusing definitional errors.

This led to a thought: if UPARSE were to allow calling arbitrary decoders (like DEBIN) that deliver informative errors when the input isn't a fit, those errors would need to be interpreted as "soft" parse match failures, with the parse moving on to the next alternate.

That called back to another musing about why we don't use the readable word TRY instead of OPT in PARSE when we want to say a component of a parse is optional.

But so long as the combinator-skinned-decoders are communicating in errors, why not make combinators always indicate failure with definitional errors?

This would free up NULL as a synthesized product to just be an ordinary result, which had been a sticking point when trying to wrangle the dual needs of isotopic nulls in a fully generic dialect.
The parsers would be able to generate diverse and informative errors, which in debug modes could be tagged with where they came from. Parsers could distinguish between tunneling an error generated by another parser and emitting their own, providing more information for tracing tools.

It only took a couple of days to do... most of which was just sorting out a lot of edge cases from being a very thorough client of the relatively untested definitional error infrastructure.

But expect good things from this!

A Bit Of History: How The Previous Model Came To Be

When UPARSE was first conceived (a mere year and a half ago), the combinators were responsible for returning three things:

1. Whether the parser succeeded or not
2. A synthesized value
3. How much of the input was consumed (represented by a series position of the new "current" parse position, which could potentially be at the tail)

(COLLECT and friends necessitated some more nuances, but you only have to worry about it manually if you need fine-grained control. So most combinators look like these are the only results in play, with the other outputs being "autopiped" around by the machinery.)

The last two results would only be applicable if the parser succeeded. So rather than return three results, the design aimed to return just two, folding the success flag into an otherwise invalid state of one of the other results.

At first it seemed best to fold success into the series position. This would mean that the position could be either a series value or NULL, with NULL meaning the parser failed. That way, NULL could be a valid synthesized product. This came in handy for things like OPT:

>> x: y: <before>

>> did parse [1020] [x: integer! y: opt integer!]
== #[true]  ; parse succeeded

>> x
== 1020

>> y
; null

The first draft used the fledgling multi-return facility to do this, and it had the nice property of working with ELSE. So when a combinator called a parser that failed, it was easy to handle that failure, e.g. to propagate that failure along:

[pos synthesized]: parser input else [return null]
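
To make the shape of that first draft concrete, here is a small model of it. This is Python rather than Ren-C, and everything in it (integer_parser, opt, sequence, the tuple layout) is a hypothetical stand-in for illustration, not the actual UPARSE code. The only idea taken from the description above is that failure is folded into the position, so a "null" product stays legal:

```python
# Hypothetical model: a parser takes (data, pos) and returns (new_pos, synthesized).
# new_pos is None when the parser fails, so None remains a valid product.

def integer_parser(data, pos):
    if pos < len(data) and isinstance(data[pos], int):
        return pos + 1, data[pos]
    return None, None  # failure is signaled by the position, not the product

def opt(parser):
    def combinator(data, pos):
        new_pos, value = parser(data, pos)
        if new_pos is None:
            return pos, None  # succeed without advancing; the product is None
        return new_pos, value
    return combinator

def sequence(*parsers):
    def combinator(data, pos):
        products = []
        for parser in parsers:
            pos, value = parser(data, pos)
            if pos is None:
                return None, None  # propagate the failure, like `else [return null]`
            products.append(value)
        return pos, products
    return combinator
```

Under these assumptions, sequence(integer_parser, opt(integer_parser)) applied to [1020] succeeds at position 1 with a None product for the optional slot, mirroring the x and y example above.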

But This Was Reversed... For... Reasons

A mechanical issue came up: VOID could only be represented by the primary return result of a function. If a secondary multi-return output was going to convey voids, it would have to use the ^META protocol... and the caller would have to be explicitly aware that the result they got would be pre-quoted by convention.

But I also noticed that some combinators didn't want to advance the input at all, only operate to transform one synthesized product into another. Or that they didn't really need to plug into the overall parse architecture. It seemed like making combinators match as closely to a "normal" function--by putting their synthesized result as the primary result--just made sense.

NULL isotopes were just coming on the scene, which gave a potential way to get out of this: a successful parser which wanted to return NULL would return the isotope form. Pure NULL would be reserved as the signal for failure. This meant the reversed parameters would be able to work:

[synthesized pos]: parser input else [return null]

Internally, the OPT in something like y: opt integer! would not return NULL, but a ~null~ isotope.
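
A rough model of that arrangement, again in Python and again purely illustrative: NULL_ISOTOPE is a made-up sentinel standing in for the ~null~ isotope, None stands in for pure NULL, and the decay helper reflects my assumption that the isotope turns back into an ordinary null once the product is stored in a variable.

```python
class _NullIsotope:
    def __repr__(self):
        return "~null~"

NULL_ISOTOPE = _NullIsotope()  # hypothetical stand-in for the ~null~ isotope

def integer_parser(data, pos):
    # Returns (synthesized, new_pos); a synthesized None means the parser failed.
    if pos < len(data) and isinstance(data[pos], int):
        return data[pos], pos + 1
    return None, pos

def opt(parser):
    def combinator(data, pos):
        value, new_pos = parser(data, pos)
        if value is None:
            return NULL_ISOTOPE, pos  # success, but with a "null" product
        return value, new_pos
    return combinator

def decay(value):
    # Assumption: the isotope becomes plain null when assigned to a variable.
    return None if value is NULL_ISOTOPE else value
```

The cost shows up right away in the sketch: every combinator has to remember which flavor of null it is holding, which is the wrangling that the next section gets rid of.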

But Now, It's Done With Definitional Errors!

[synthesized pos]: parser input except e -> [return raise e]

Here you see the error being intercepted, and then passed on. NULL is free to be dealt with as a normal product without interference. And there's a difference between generating a new error (tagging it with the location in the parse rules and the context) vs. just passing on one that was generated by a subparser--you are actually keeping a record of what happened, to show in logs or otherwise.
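
For readers who want something to poke at, here is a loose model of the error-based protocol. It uses Python exceptions, which are not the same thing as Ren-C definitional errors (a definitional error has to be handled at the call site rather than unwinding through arbitrary stack levels), so treat this only as a sketch of the shape. ParseMismatch, its origin and causes fields, and the alternates combinator are all names invented for the example.

```python
class ParseMismatch(Exception):
    def __init__(self, message, origin, causes=()):
        super().__init__(message)
        self.origin = origin        # which combinator generated this error
        self.causes = list(causes)  # errors from subparsers, kept for tracing

def integer_parser(data, pos):
    if pos < len(data) and isinstance(data[pos], int):
        return data[pos], pos + 1
    raise ParseMismatch(f"expected integer at index {pos}", origin="integer")

def opt(parser):
    def combinator(data, pos):
        try:
            return parser(data, pos)
        except ParseMismatch:
            return None, pos  # None is now just an ordinary product
    return combinator

def alternates(*parsers):
    def combinator(data, pos):
        failures = []
        for parser in parsers:
            try:
                return parser(data, pos)
            except ParseMismatch as error:
                failures.append(error)  # soft failure: move on to the next alternate
        # Emit a new error of our own, keeping a record of what the subparsers said.
        raise ParseMismatch(
            f"no alternate matched at index {pos}",
            origin="alternates",
            causes=failures,
        )
    return combinator
```

The origin field is what lets a tracing tool tell an error that a combinator generated itself apart from one it merely passed along from a subparser.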