TRY PARSE + PARSE EXCEPT : FAIL On Mismatch

hostilefork · August 19, 2022, 6:23am

There's a snazzy new potential for giving more informative messages / logs from failed parses... which means we now have a more interesting option than we might have had before.

Imagine something like:

>> parse "aaa" [some "a" some "b"]
** Error: SOME requires at least one match
** Where: [some "a" \\ some "b" \\]
; Note: this failure can be intercepted by TRY, EXCEPT, ATTEMPT

It can't be perfect unless it maintains some kind of large error tree that accumulates the list of all the reasons it decided to fail, so you might have to be in a debugging mode to ask it to give you a bigger diagnostic. But we can build it now with the participation of the combinators themselves.

But not only this, we could open up the full spectrum of return values. Right now, since a failed parse returns NULL, a parse whose rules deliberately produce NULL has to contort it into a null isotope to avoid accidentally cuing an ELSE. Similar contortions are needed for void, blank, and logic false.

result: parse block rules except e -> [print "Got an error", return none]

; If you got here, you know result is good
; Even if it was a purposefully returned NULL, etc.
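To make the mechanism concrete, here is a toy model in Python (not Ren-C, and not how UPARSE is implemented). A rule returns a (value, position) pair on a match or None on a mismatch, and the hypothetical parse() raises a ParseError instead of returning a failure value. Because failure travels out of band, a rule can deliberately synthesize None (standing in for NULL) without it being mistaken for a failed parse. The names lit, produce, seq, parse, and ParseError are all made up for illustration.

```python
class ParseError(Exception):
    """Raised by parse() when the rules fail or don't reach the tail."""


def lit(s):
    """Match the literal s; synthesize the matched text."""
    def rule(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None  # mismatch
    return rule


def produce(value):
    """Consume nothing and synthesize value (which may be None, i.e. NULL)."""
    return lambda text, pos: (value, pos)


def seq(*rules):
    """Run rules in order; the product is the last rule's product."""
    def rule(text, pos):
        value = None
        for r in rules:
            match = r(text, pos)
            if match is None:
                return None
            value, pos = match
        return value, pos
    return rule


def parse(text, rule):
    """Return the synthesized value, or raise ParseError on a mismatch."""
    match = rule(text, 0)
    if match is None:
        raise ParseError("rules did not match input")
    value, pos = match
    if pos != len(text):
        raise ParseError("partially matched the input, but didn't reach the tail")
    return value


# A rule that deliberately synthesizes None is not confused with failure:
result = parse("aaa", seq(lit("aaa"), produce(None)))
print(result)  # None, and no error was raised

# A mismatch raises instead, handled much like `except e -> [...]`:
try:
    parse("aaa", seq(lit("aaa"), lit("b")))
except ParseError as e:
    print("Got an error:", e)
```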

There'd be some way to rig this up without using enfix. I might make things more lax about letting you assign error isotopes, because the isotope will bubble through and cause a problem anyway. So you could write something like:

 if raised? result: parse block rules [
     print "You have a failure, use ^result to get it"
 ]

If you were willing to collapse failure down to a NULL or somesuch, or didn't even care about the result, you could just TRY it.

 try parse block rules
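A rough Python analogue of the two styles above, assuming a hypothetical parse_exact() that raises ParseError on a mismatch. Here capture() plays the role of assigning the raised error to a variable and checking it with RAISED?, and try_() plays the role of TRY by collapsing the failure to None. This only approximates definitional errors: it works by catching one exception class, so an unrelated bug (like a NameError from a typo) still propagates instead of being swallowed.

```python
class ParseError(Exception):
    """Stands in for the error PARSE raises when its rules don't match."""


def parse_exact(text, expected):
    """Hypothetical toy PARSE: succeeds only if text is exactly `expected`."""
    if text != expected:
        raise ParseError(f"expected {expected!r}, got {text!r}")
    return text


def capture(thunk):
    """Like assigning a raised error to a variable: keep it as a value."""
    try:
        return thunk()
    except ParseError as e:
        return e


def is_raised(value):
    """Model of RAISED?: is the captured result an error?"""
    return isinstance(value, ParseError)


def try_(thunk):
    """Model of TRY: collapse a parse failure to None.

    Only ParseError is intercepted; a typo such as a NameError inside
    thunk still propagates, so real bugs are not silently swallowed.
    """
    try:
        return thunk()
    except ParseError:
        return None


result = capture(lambda: parse_exact("abc", "xyz"))
if is_raised(result):
    print("You have a failure:", result)

print(try_(lambda: parse_exact("abc", "xyz")))  # None
print(try_(lambda: parse_exact("abc", "abc")))  # abc
```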

Too Good Not To Be The Default

Of course you'll be able to reskin it however you like for the R3C's or R3Chius out there. But I think this looks like a perfect convergence to put in the box.

Errors aren't going to be that interesting on day one, but it's good to point the ship in the right direction.

hostilefork · August 11, 2023, 2:32pm

Looking at a parse rule in @gchiu's Pharmac project, I saw this:

; get the SAnnnn part of the pdf name
;
root: parse pdfname [between <here> ".pdf"]

It's a clean, casual usage of the value-synthesizing abilities of UPARSE...

...and I think that cleanliness should offer you safety by default.

So it seems that if the filename is aspirin.pdf.zip, you should get an error raised (the rule matches through .pdf but never reaches the end of the input) to draw your attention to the unexpected circumstance.

If you didn't want an error, you have a lot of options for crafting your rules:

root: parse pdfname [between <here> ".pdf" elide ...]

parse pdfname [root: between <here> ".pdf" ...]

parse pdfname [root: between <here> ".pdf" to <end>]

root: parse pdfname [between <here> ".pdf" <end> | accept (#badfile)]

etc. UPARSE dances and sings.

Or if you just wanted it to ignore success or failure in a "classic" way, use TRY:

root: try parse pdfname [between <here> ".pdf"]

That seems a good enough way to ask for the bad enough thing. It will give you a NULL if there's no ".pdf" anywhere, and what's before .pdf for anything up-to-and-including something.pdfoobar
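Here's a toy Python model of what root: parse pdfname [between <here> ".pdf"] is expected to do under these rules, assuming BETWEEN stops at the first ".pdf" it finds (an assumption of the sketch, not a statement about UPARSE internals). The filename SA1234.pdf is made up. A clean name gives the part before ".pdf", a name with trailing material raises, and wrapping the call in the try_() helper turns a name with no ".pdf" into None.

```python
class ParseError(Exception):
    """Stands in for the error PARSE raises on a mismatch."""


def parse_between_here_and(text, marker):
    """Toy model of: parse text [between <here> marker]

    BETWEEN synthesizes everything from the current position up to the
    first occurrence of `marker` (assumed) and leaves the position just
    after it. As with PARSE, that must be the tail or an error is raised.
    """
    index = text.find(marker)
    if index == -1:
        raise ParseError(f"{marker!r} not found")
    pos = index + len(marker)
    if pos != len(text):
        raise ParseError("PARSE partially matched the input, but didn't reach the tail")
    return text[:index]


def try_(thunk):
    """Model of TRY: a raised parse failure becomes None."""
    try:
        return thunk()
    except ParseError:
        return None


print(parse_between_here_and("SA1234.pdf", ".pdf"))  # SA1234
try:
    parse_between_here_and("aspirin.pdf.zip", ".pdf")
except ParseError as e:
    print("Error:", e)
print(try_(lambda: parse_between_here_and("aspirin.txt", ".pdf")))  # None
```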

I Say Again: "Too Good Not To Be The Default"

...but it's a pretty big change, which will take some figuring.

hostilefork · September 16, 2023, 1:09am

Since I feel RAISE of an error is such a superior option, I thought it would be worthwhile to go ahead and retrofit PARSE3 to do this as well.

It really clarifies things. For instance, there was some rather old code that looked like this:

let directives: collect [
    let i
    if block? prior [
        parse3 prior [some [set i: issue! (keep i)]]
    ]
]

It predates COLLECT being a feature of PARSE3, so it uses regular COLLECT (which lacks the nice properties like backtracking the collected data when rules fail).

The idea was that you'd have something like:

[#foo #baz #bar a b c d ...]

It would collect any issues at the head of the block, e.g. [#foo #baz #bar]. There may be no issues at all, in which case the collected result would be an empty block [].

But using a plain SOME rule is a bad way of conveying this. The PARSE is failing (SOME fails outright when there are no issues), and that failure is just treated casually. At the very least, this should be a try some.

Then we have the issue of not reaching the end, and that being "okay". But the okayness is nowhere in the rule formulation, meaning the person reading this code is in the dark. It needs something like a to <end> or an accept (true).

let directives: collect [
    let i
    if block? prior [
        parse3 prior [
            try some [set i: issue! (keep i)]
            accept (true)
        ]
    ]
]

Of course, modern UPARSE is much slicker (!)

let directives: parse prior [accept collect try some keep issue!]
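As a sanity check on what that one line is doing, here's a small Python model with hypothetical names, where issues are just strings starting with "#". It shows the pieces separately: plain SOME fails when there are zero issues, TRY turns that mismatch into a zero-length success (so COLLECT yields an empty list), and ACCEPT ends the parse without a tail check, so trailing non-issue items are fine. Unlike PARSE's COLLECT, this model doesn't backtrack kept values; it doesn't need to for this particular rule.

```python
class ParseError(Exception):
    """Stands in for the error PARSE raises when its rules don't match."""


def is_issue(token):
    """Hypothetical stand-in for ISSUE!: a string starting with '#'."""
    return isinstance(token, str) and token.startswith("#")


# Each rule takes (block, pos, kept) and returns the new position on a
# match, or None on a mismatch. `kept` is the list COLLECT gathers into.

def keep_issue(block, pos, kept):
    """Model of KEEP ISSUE!: match one issue and keep it."""
    if pos < len(block) and is_issue(block[pos]):
        kept.append(block[pos])
        return pos + 1
    return None


def some(rule):
    """Model of SOME: at least one match, then as many more as possible."""
    def run(block, pos, kept):
        new_pos = rule(block, pos, kept)
        if new_pos is None:
            return None
        while new_pos is not None:
            pos = new_pos
            new_pos = rule(block, pos, kept)
        return pos
    return run


def try_(rule):
    """Model of the TRY combinator: a mismatch becomes a zero-length match."""
    def run(block, pos, kept):
        new_pos = rule(block, pos, kept)
        return pos if new_pos is None else new_pos
    return run


def parse_accept_collect(block, rule):
    """Model of: parse block [accept collect <rule>]

    ACCEPT finishes the parse as soon as the COLLECT is done, so leftover
    input after the leading issues is fine (no tail check is made).
    """
    kept = []
    if rule(block, 0, kept) is None:
        raise ParseError("rules did not match input")
    return kept


rule = try_(some(keep_issue))
print(parse_accept_collect(["#foo", "#baz", "#bar", "a", "b"], rule))
print(parse_accept_collect(["a", "b"], rule))  # []

# Without TRY, zero leading issues is a mismatch:
try:
    parse_accept_collect(["a", "b"], some(keep_issue))
except ParseError as e:
    print("Error:", e)
```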

It caught the fact that ABOUT has been broken for years!

If you look way back in the history of open source Rebol, you will find a simple dialect for printing out the ABOUT information. The dialect looked like this:

make-banner [
    *
    -
    "REBOL 3.0 [Alpha Test]"
    -
    = Copyright: [system/build/year "REBOL Technologies"]
    = "" "All rights reserved."
    = Website:  "www.REBOL.com"
    -
    = Version:  system/version
    = Platform: system/platform
    = Build:    system/build
    = Warning:  "For testing purposes only. Use at your own risk."
    -
    = Language: system/locale/language*
    = Locale:   system/locale/locale*
    = Home:     [to-local-file system/options/home]
    -
    *
]

The generator function MAKE-BANNER would transform this into an asterisk-laden box, with the substitutions made:

**************************************************************************
**                                                                      **
**  REBOL 3.0 [Alpha Test]                                              **
**                                                                      **
**    Copyright: 2014 REBOL Technologies                                **
**               All rights reserved.                                   **
**    Website:   www.REBOL.com                                          **
**                                                                      **
**    Version:   2.101.0.4.40                                           **
**    Platform:  Linux libc-x64                                         **
**    Build:     7-Nov-2014/18:50:11                                    **
**    Warning:   For testing purposes only. Use at your own risk.       **
**                                                                      **
**    Language:  none                                                   **
**    Locale:    none                                                   **
**    Home:      ./                                                     **
**                                                                      **
**************************************************************************

Simple though it is, I think it is a pretty good example of how you can "rethink" your problems in a dialect to pare down your code to express your intent.

But it was based on a PARSE rule with no check for whether the parse had actually succeeded or failed! This meant that when paths like system/version were changed to tuples like system.version, the rules stopped matching partway through, and ABOUT only printed the top part of the box in Ren-C:

**************************************************************************
**                                                                      **
**  REBOL 3.0 (Ren-C branch)                                            **
**                                                                      **
**    Copyright: 2012 REBOL Technologies                                **
**    Copyright: 2012-2021 Ren-C Open Source Contributors               **
**               Licensed Under LGPL 3.0, see LICENSE.                  **
**    Website:   http://github.com/metaeducation/ren-c                  **

Even though the tests ran ABOUT to make sure it didn't crash, they didn't catch this half-printed box.
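The failure mode is easy to reproduce in miniature. This Python sketch is a hypothetical stand-in for MAKE-BANNER's rule loop, not its real code, and the width is made up. An item it doesn't recognize (here, a field whose value changed type, loosely mirroring the path-to-tuple change) makes the unchecked version quietly stop, while the checked version raises an error that a test would notice.

```python
class ParseError(Exception):
    """Stands in for the error a checked PARSE would raise."""


WIDTH = 40  # hypothetical banner width


def is_field(item):
    """An ("=", label, value) triple whose value is text."""
    return (isinstance(item, tuple) and len(item) == 3
            and item[0] == "=" and isinstance(item[2], str))


def banner_lines(spec, strict):
    """Toy model of MAKE-BANNER's rule loop over a flat spec list."""
    lines = []
    for item in spec:
        if item == "*":
            lines.append("*" * WIDTH)
        elif item == "-":
            lines.append("**" + " " * (WIDTH - 4) + "**")
        elif is_field(item):
            _, label, value = item
            lines.append(f"**  {label:<10} {value}".ljust(WIDTH - 2) + "**")
        elif strict:
            raise ParseError(f"no banner rule matches {item!r}")
        else:
            break  # unchecked: quietly stop, printing only the top of the box
    return lines


spec = [
    "*",
    "-",
    ("=", "Website:", "www.REBOL.com"),
    "-",
    ("=", "Version:", (3, 0, 0)),  # value changed type; no rule matches it
    "-",
    "*",
]

print("\n".join(banner_lines(spec, strict=False)))  # just the top of the box
try:
    banner_lines(spec, strict=True)
except ParseError as e:
    print("Error:", e)
```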

I Was Convinced, Now I'm More Convinced

As I go through the system and fix/clarify these issues, I have no regrets about the change.

Red's lack of definitional failure means they couldn't embrace the idea even if they wanted to. Triggering an error when a parse didn't succeed would be conflated with any other error that happened inside the parse (typos, etc.), so trapping errors would be impractical. Oh well...

hostilefork · November 28, 2023, 12:39am

hostilefork:

Or if you just wanted it to ignore success or failure in a "classic" way, use TRY:
root: try parse pdfname [between <here> ".pdf"]
That seems a good enough way to ask for the bad enough thing. It will give you a NULL if there's no ".pdf" anywhere, and what's before .pdf for anything up-to-and-including something.pdfoobar

So this is good when you're writing an expression that's supposed to return a value. But if you're only interested in whether the parse reached the end, it's not good enough.

>> try parse "aaabbb" [some "a" try some "b"]
== ~null~  ; isotope

That parse reached the end, but when the rule wrapped by the TRY combinator doesn't succeed, TRY's product is NULL, and as the last rule's product, that becomes the result of the whole parse. So if try parse ... is unfortunately not a general answer to testing whether a parse completed.
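A toy Python model of that rule shows the ambiguity. It assumes (as labeled in the code) that the product of TRY SOME "b" is "b" when something matched and None when nothing did. With input "aaa", where there's no "b" for TRY SOME "b" to match, a parse that reaches the end and a parse that fails both come back from the hypothetical try_() as None.

```python
class ParseError(Exception):
    """Stands in for the error PARSE raises on a mismatch."""


def parse_some_a_try_some_b(text):
    """Toy model of: parse text [some "a" try some "b"]

    The parse's product is the product of its last rule, TRY SOME "b".
    Assumed for this model: "b" if any "b" matched, else None.
    """
    pos = 0
    while pos < len(text) and text[pos] == "a":
        pos += 1
    if pos == 0:
        raise ParseError("SOME requires at least one match")
    product = None  # what TRY yields when SOME "b" doesn't succeed
    while pos < len(text) and text[pos] == "b":
        product = "b"
        pos += 1
    if pos != len(text):
        raise ParseError("PARSE partially matched the input, but didn't reach the tail")
    return product


def try_(thunk):
    """Model of TRY: a raised parse failure becomes None."""
    try:
        return thunk()
    except ParseError:
        return None


reached_end = try_(lambda: parse_some_a_try_some_b("aaa"))  # completed
stopped = try_(lambda: parse_some_a_try_some_b("aaaxyz"))   # failed
print(reached_end, stopped)  # None None
```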

if not raised? parse ... works but is overly verbose.
if unraised? parse ... is a little shorter but awkward.
if ok? parse ... uses a very old idea of OK? meaning not-errored, but a bit nebulous in what it means, and perhaps looks a little "cheap" and abbreviated.
if okay? parse ... looks a little less cheap and abbreviated but I don't know if the added length vs. OK? really pays for itself.
if try? parse ... builds on the existing TRY concept so you can see the relationship, but is a bit weird.
if did parse ... looks nice but DID is taken for something else, that I don't think should be conflated with error defusion.
if success? parse ... or if succeeded? parse ... is a little bit like OK? in the sense that success is a pretty open-ended concept. But TRY is tied up a bit with "success" so SUCCESS? as the complement to RAISED? isn't a terrible thing.
if good? parse ... is weird, and the suggestion of BAD? as a synonym for RAISED? is weird also. But it reads a little better than IF OK?
if parse? ... as a different form of PARSE that just returns true or false based on reaching the end is something that has shown up from time to time, but I've never been crazy about it. Functions ending in a question mark tend to take one argument. This reads like it's testing whether something is a parse-state object or similar, and I can't quite read it as "if parse reached end".
if completed? parse ... is appealing; it's a bit of a parallel to Rebol2's FOUND? pairing with FIND. But it leads to a world where completed? null is true. And though it doesn't sound like a generic word, it would work generically on other things that might raise failures, inviting misuse. It's also a long word.

Everything has its upsides and downsides. I'm not crazy about if ok? parse ... but I like it better than if parse? ... and it's still brief. It also gives you OK? as a generic NOT RAISED?, as well as if not ok? as a way of saying if raised? if you like that better.
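For comparison, here's what a generic OK? boils down to, as a Python sketch: a hypothetical ok() that reports only whether a ParseError was raised and throws away the product. The parse function is a toy model of [some "a" try some "b"], not Ren-C. With it, the completed parse of "aaa" counts as success even though its product is None.

```python
class ParseError(Exception):
    """Stands in for the error PARSE raises on a mismatch."""


def parse_some_a_try_some_b(text):
    """Toy model of: parse text [some "a" try some "b"] (product may be None)."""
    stripped = text.lstrip("a")
    if stripped == text:
        raise ParseError("SOME requires at least one match")
    rest = stripped.lstrip("b")
    if rest:
        raise ParseError("PARSE partially matched the input, but didn't reach the tail")
    return "b" if rest != stripped else None  # None when no "b" matched


def ok(thunk):
    """Model of the proposed OK? (a generic NOT RAISED?).

    Only whether a parse failure was raised matters; the product is
    discarded, so a None product still counts as success.
    """
    try:
        thunk()
    except ParseError:
        return False
    return True


print(ok(lambda: parse_some_a_try_some_b("aaa")))     # True
print(ok(lambda: parse_some_a_try_some_b("aaaxyz")))  # False
```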

Or... A Refinement to PARSE, or Combinator?

For the sake of covering all the bases, I'll mention this could be attacked from the inside of PARSE as well, with a refinement or a combinator:

if parse/completed "aaabbb" [some "a" try some "b"] ...

if parse/tail? "aaabbb" [some "a" try some "b"]

if parse "aaabbb" [completed? [some "a" try some "b"]] ...

if parse "aaabbb" [some "a" try some "b" || accept <end?>] ...

I'm actually not hating PARSE/COMPLETED. Maybe something shorter...in the spirit of PARSE/DONE but better?

But if you shorten it to PARSE/OK, you aren't really doing better than OK? PARSE, and you're losing generality.

Experience Probably Informs This

I think it's hard to see this clearly, particularly having seen so many different return modes of PARSE over time.

As you gain experience in the new world where "parse raises an error if it doesn't reach the end," a generic tool like OK? for "check whether it raises an error" might seem to make perfect sense. You just have to be used to the idea of errors and not expect anything else... so of course you need some kind of weird error-detecting construct.

Trying it out a little, SUCCESS? is definitely more literate than OK?. But I don't know that the length is worth it, and OK? seems pretty learnable.

People can of course easily make their own shorthands, even PARSE?, if they like it.

hostilefork · February 22, 2024, 8:50am

hostilefork:

So this is good when you're writing an expression that's supposed to return a value. But if you're only interested in if the parse reached the end or not, it's not good enough.
>> try parse "aaabbb" [some "a" try some "b"]
== ~null~  ; isotope
That parse reached the end, but the product of the TRY combinator inside the parse block when it doesn't succeed is NULL. So if try parse ... is unfortunately not a general answer to testing if a parse completed.

I Think I Have The Answer...

...and that answer is VALIDATE.

>> validate "aaa" [some #a]
== "aaa"

>> validate "aaa" [some #b]
== ~null~  ; anti

If the only thing you're interested in is testing for success or failure, then this variant of PARSE, which evaluates to either the input series or null, will give you a falsey result only if the parse does not succeed.

It's also a combinator, so you can use it from within PARSE (or another VALIDATE).
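A minimal Python model of VALIDATE's contract, using a regular expression as a stand-in for PARSE rules (a swapped-in technique: re.fullmatch plays the role of "the rules matched and reached the tail"). It only models the top-level form, not VALIDATE used as a combinator. One wrinkle of the translation: Python treats an empty string as falsy, whereas in Ren-C the returned series is truthy, so the Python side should compare against None.

```python
import re


def validate(text, pattern):
    """Toy model of VALIDATE, with a regex standing in for PARSE rules.

    Returns the input itself if the rules match all of it, else None.
    """
    return text if re.fullmatch(pattern, text) else None


print(validate("aaa", "a+"))  # aaa   (like: validate "aaa" [some #a])
print(validate("aaa", "b+"))  # None  (like: validate "aaa" [some #b])

# Unlike Ren-C, Python treats "" as falsy, so test against None:
if validate("", "a*") is not None:
    print("matched the empty input")
```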

So don't write if parse ... unless you are testing a result that your rules deliberately synthesize, because that's what you designed the rules to output. For example:

 if parse "ttt" [some "t" (true) | some "f" (false)] [
     ...
 ]

If you're using arbitrary rules and not controlling the result carefully, you want VALIDATE and not TRY PARSE.

The only use for TRY PARSE should be when you don't care whether the parse reaches the end and you don't care whether the rules fail. If you only want to allow for not reaching the end of the input, I've added PARSE/RELAX.

>> parse "aaa" [#a]
** Error: PARSE partially matched the input, but didn't reach the tail

>> parse/relax "aaa" [#a]
== #a

>> parse "aaa" [#b]
** Error: PARSE BLOCK! combinator did not match input

>> parse/relax "aaa" [#b]
** Error: PARSE BLOCK! combinator did not match input
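The same table of cases can be modeled in Python, again with a regex standing in for the rules (so this is an analogy, not UPARSE's implementation) and a hypothetical relax flag standing in for the /RELAX refinement. A mismatch raises either way; relax only waives the requirement to reach the tail. In this model the product is the matched text, rather than the #a the real console shows.

```python
import re


class ParseError(Exception):
    """Stands in for the error PARSE raises."""


def parse(text, pattern, relax=False):
    """Toy model of PARSE and PARSE/RELAX using a regex as the "rules".

    A mismatch always raises. Without relax, the match must also reach
    the tail of the input; with relax=True, leftover input is allowed.
    """
    m = re.match(pattern, text)
    if m is None:
        raise ParseError("PARSE BLOCK! combinator did not match input")
    if not relax and m.end() != len(text):
        raise ParseError("PARSE partially matched the input, but didn't reach the tail")
    return m.group(0)


try:
    parse("aaa", "a")
except ParseError as e:
    print("Error:", e)  # didn't reach the tail

print(parse("aaa", "a", relax=True))  # a
```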

It took a long time to get here, but I think this gives pretty complete coverage.
