The PARSE of PROGRESS

There has been a lot of fiddling over time with PARSE's return value. :violin:

It was long believed that a failed PARSE should return NULL. This would make it play nicely with ELSE and THEN. The question was thus what to return on success:

  1. Just returning ~okay~ makes the output of PARSE easier to read in tutorials. This isn't overwhelmingly important.

  2. Returning the input value would make it easy to use PARSE as a validator for data.

    if parse data [integer! integer!] [  ; exactly two integers
       call-routine data
    ] else [fail]
    
    call-routine (parse data [integer! integer!] else [fail])  ; nicer
    
    call-routine non null parse data [integer! integer!]  ; even nicer :-)
    
  3. Returning how far a successful parse got was strictly more informative, as information about a partial match is difficult to reconstruct otherwise.

For at least some time, @rgchris favored #3, because many sophisticated tasks are helped by knowing how far PARSE got. But that required changing the semantics of PARSE to not automatically fail on partial inputs, so the rules had to explicitly ask to hit an <end>.

But the need to tack on <end> made some things seem less concise and elegant. And surveying how other languages do "destructuring" made me feel that PARSE requiring completion was the best answer in the Redbol world. When you're matching a structure against [tag! tag!] it feels somewhat wrong for [<x> <y> <z>] to "match" when it seems "over the limit".

UPARSE Offers The Best Of All Worlds

Everything changed with UPARSE.

First of all, if a PARSE doesn't match it raises a definitional error. This provides a welcome safety net.

>> parse "abc" ["ab"]
** Error: PARSE partially matched the input, but didn't reach the tail

You can use TRY PARSE if you like and get NULL... though that can conflate with a NULL synthesized by the last matching rule (e.g. OPT synthesizes null when the optional thing was not there). You can use EXCEPT to specifically handle mismatches in a postfix manner. Or META/EXCEPT will give you a plain ERROR! on a definitional error, and a META'd value otherwise.
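A quick sketch of the first two approaches (assuming the behaviors described above, with outputs following the conventions used elsewhere in this post):

>> try parse "abc" ["ab"]  ; TRY turns the definitional error into NULL
== ~null~  ; anti

>> parse "abc" ["ab"] except [print "mismatch"]  ; postfix error handling
mismatch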

All rules synthesize a result (though a nihil result is legal, e.g. you can ELIDE a rule), and you can end the parse at any time with ACCEPT:

>> parse "abc" ["ab", accept <input>]
== "abc"

>> parse "abc" ["ab", accept <here>]
== "c"

You can even pack up multi-return values and give them back. The possibilities are pretty much endless, and so the policy of returning the synthesized result has won out.
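To illustrate that default (a sketch, assuming GROUP!s in the rules synthesize their evaluated product, as UPARSE groups do):

>> parse "aaa" [some "a" (1 + 2)]
== 3

Here the GROUP! is the last rule to run, so its product becomes the overall return of the PARSE.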


I've mentioned that this is pretty easy to write. But it doesn't mean there shouldn't be a name for it...

It seems to me a reasonably good name for this is PARSE-THRU:

>> parse-thru "aaabbb" [some "a"]
== "bbb"

It can be implemented any number of ways, but an easy one is to ADAPT the rules slightly before running the PARSE. Since RULES is a BLOCK!, you can just compose it in, and follow it with an ACCEPT of wherever the current position is.

/parse-thru: adapt parse/ [
    rules: compose* [(rules) accept <here>]
]

This will default to erroring if it doesn't match, so you'd have to use try parse-thru if you wanted a null when there was a deliberate mismatch:

>> parse-thru "bbbaaa" [some "a"]
** Error: PARSE BLOCK! combinator did not match input

>> try parse-thru "bbbaaa" [some "a"]
== ~null~

If you want to work around this, there's lots of ways to do it. You could make an alternative to return null:

/parse-thru: adapt parse/ [
    rules: compose*:deep [[(rules) accept <here>] | accept null]
]

Or rig it up so that the rule is optional, and use PARSE:RELAX to remove the requirement that it reach the end:

/parse-thru: adapt parse:relax/ [
    rules: compose*:deep [opt [(rules) accept <here>]]
]

Lots of ways to get the effect:

>> parse-thru "bbbaaa" [some "a"]
== ~null~  ; anti

Another Interesting Interface: PARSE-MATCH

Being able to get the input back, or a NULL, can be useful as well. A similar technique will get it: just swap the <here> combinator for the <input> combinator, and don't remove the requirement to reach the end:

/parse-match: adapt parse/ [
    rules: compose* [(rules) <end> <input> | accept null]
]

>> parse-match "aaabbb" [some "a" some "b"]
== "aaabbb"

>> parse-match "bbbaaa" [some "a" some "b"]
== ~null~  ; anti

>> parse-match "aaabbb" [some "a"]
== ~null~  ; anti

Endless Possibilities... But How To Compose Them?

In the Visual Parse Demo I showed how a tweaked PARSE variant, that I called eparse, could be rigged up to make underlines in the web-based text editor for anything you marked with a MARK combinator (with rollback, such that marks would not be made if the whole rule did not ultimately match...)

So do you have to write EPARSE-THRU and EPARSE-MATCH?

If these modes were refinements on PARSE itself, instead of being done with wrappers, you'd get EPARSE:THRU and EPARSE:MATCH "for free". Perhaps they could be more efficient in their implementation as well.

But then you start having situations where people can ask for nonsensical combinations, like eparse:thru:match. :frowning:

...or (Weird Idea) Could PARSE Have Some Other Hookability?

It might be that if you ask to PARSE an OBJECT!, that the object could act as some kind of specification... like providing the combinators and where to look for the data.

e.g. parse editor [some "a"] could look at the editor object, and have behaviors particular to that object. This would mean that parse-match editor [some "a"] could work as well.

Separate Entry Points vs. Refinements Is The Safer Bet

In the scheme of things, having PARSE-MATCH and PARSE-THRU entry points is easiest, because you'll be able to do that regardless.

But like I say, the default being the synthesized result of the rules... with an error by default if a match or ACCEPT is not reached... that's a super powerful default that I'm really happy with.

I should mention that this isn't necessarily the best way to do it, because someone might redefine those combinators.

So for instance... in the case of PARSE-MATCH, you can do it another way... one that's actually faster, and doesn't rely on any specific combinators:

/parse-match: enclose parse/ func [f] [
    eval f except [return null]
    return f.input
]

But making PARSE-THRU work without relying on combinators would pretty much require exposing some sort of alternate interface from the lower-level parse. Or maybe composing in the combinators it wants literally (!), which it occurs to me should probably be possible:

/parse-thru: adapt parse/ [
    rules: reduce [rules default-combinators.accept default-combinators.<here>]
]

Anyway, I just wanted to mention the double-edged sword of building these kinds of generic routines on top of combinators that someone could override, unless you remove the ability to override combinators from the interface.