UPARSE's spin on RETURN

hostilefork · April 18, 2021, 7:53am

R3-Alpha re-used the word RETURN as a PARSE keyword. When used, it made the overall PARSE expression evaluate to whatever you passed it in a GROUP!.

Conflating the word "return" bugged me, because it felt too easy to confuse with a function's return. Of course, other keywords in PARSE were conflated as well - you had to know you were writing parse rules to know what the semantics were. But RETURN felt worse, for some reason.

RETURN Deleted in 2018, Brought Back for UPARSE in 2021

Despite having dropped the R3-Alpha feature, I thought to give it another shot in UPARSE... to see if my opinion about it had changed.

I noticed an annoyance: seemingly convenient casual usages of it require you to specify END, or you may not notice that part of the input went unmatched:

>> uparse [1 2 <gotcha!>] [return collect [some keep integer!]]
== [1 2]
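To make the pitfall concrete outside of PARSE, here is a minimal Python model (not Ren-C; the function name `collect_integers` and the `require_end` flag are made up for illustration). It mimics a rule that collects leading integers, with and without insisting that the whole input was consumed:

```python
def collect_integers(data, require_end):
    """Collect leading integers; optionally insist all input was consumed."""
    collected = []
    pos = 0
    while pos < len(data) and isinstance(data[pos], int):
        collected.append(data[pos])
        pos += 1
    if require_end and pos != len(data):
        return None  # leftover input means the match as a whole fails
    return collected


# Early-return style: the trailing item is silently ignored.
early = collect_integers([1, 2, "<gotcha!>"], require_end=False)
# End-checked style: the trailing item makes the match fail.
strict = collect_integers([1, 2, "<gotcha!>"], require_end=True)

print(early)   # [1, 2]
print(strict)  # None
```

The early-return version reports success even though `<gotcha!>` was never examined, which is the kind of mistake being described.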

So I was (rightfully) wary of how easy it was to use incorrectly.

BUT... when I was writing and obsessing over this issue, PARSE wasn't returning the synthesized result of its block. So getting values out of a parse would have to be done with assignments. People would thus be tempted to reach for RETURN even when they intended the match to make it to the end--resulting in mistakes like the above.

But UPARSE Started Returning Rule Results By Default

Once the natural behavior of a rule would be to synthesize out of the block, why would you "reach for RETURN" when it's so obviously cleaner to not use it?

>> parse data [collect [...]]
== [...]

People can understand that this works and checks the rules to the end... and that this is what you should be doing if you want the rules to be checked to the end.

With the clean answer sitting right in front of you for how to avoid a variable... the only time you would use RETURN was when it did what you meant: terminate the parse now, with this result!

So the functionality was deemed to be worth keeping!

But the name... probably shouldn't be "return".

hostilefork · September 9, 2022, 3:26pm

I have been thinking that this name should be ACCEPT. It could be paired with a REJECT, so that you could give immediate failure as well.

rule: [
     some [integer! | accept tag!] reject ("No tag found")
     || reject ("Not solely an INTEGER!-TAG! block")
]

>> parse [1 2 <tag> 3 4] rule
== <tag>

>> parse [1 2 3 4] rule
** Error: No tag found

>> parse [1 2 3 4] rule else [print "Intercepted rejection"]
Intercepted rejection

>> parse [1 2 3 4 a b c] rule
** Error: Not solely an INTEGER!-TAG! block
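As a rough way to think about these semantics, here is a Python model (not Ren-C) that treats ACCEPT and REJECT as exceptions that unwind the parse immediately. Everything in it is hypothetical: the class names `Accept` and `Reject`, the helpers `is_tag` and `run_rule`, and the choice to represent tags and words as strings. The control flow in `run_rule` is one reading of the transcripts above, not a translation of how the rule block itself would be evaluated.

```python
class Accept(Exception):
    """Stops the parse right away, carrying the result value."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


class Reject(Exception):
    """Fails the parse right away, carrying an error message."""


def is_tag(item):
    # Assumption: tags are modeled as strings such as "<tag>".
    return isinstance(item, str) and item.startswith("<") and item.endswith(">")


def run_rule(data):
    for item in data:
        if isinstance(item, int):
            continue  # keep matching integers
        if is_tag(item):
            raise Accept(item)  # terminate now; remaining input is ignored
        raise Reject("Not solely an INTEGER!-TAG! block")
    raise Reject("No tag found")  # ran out of input without seeing a tag


def parse(data):
    try:
        run_rule(data)
    except Accept as accepted:
        return accepted.value
    # A Reject is deliberately left to propagate to the caller.


print(parse([1, 2, "<tag>", 3, 4]))  # <tag>

try:
    parse([1, 2, 3, 4])
except Reject as rejection:
    print("Intercepted rejection:", rejection)  # Intercepted rejection: No tag found
```

Catching `Reject` at the call site plays the role that ELSE plays in the transcript: the rejection is an error by default, but the caller can choose to intercept it.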

I'm thinking PACK could have an analogue in PARSE, to make multi-returns:

>> [a b]: parse [1 2 <tag> 3 4] [some [integer! | accept pack [tag! <here>]]]
== <tag>

>> b
== [3 4]

This would enable returning a result -and- returning a position.
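For comparison, the same idea in a Python model (not Ren-C) is just returning a tuple and unpacking it. The function name `accept_tag_with_position` is hypothetical, tags are assumed to be modeled as strings, and the "position" is modeled as the remaining input after the tag:

```python
def accept_tag_with_position(data):
    """Return (tag, rest_of_input) at the first tag, or None if there is none."""
    for index, item in enumerate(data):
        if isinstance(item, int):
            continue  # skip over integers
        if isinstance(item, str) and item.startswith("<") and item.endswith(">"):
            return item, data[index + 1:]  # the result plus where parsing stopped
        break  # anything else ends the scan without a match
    return None


a, b = accept_tag_with_position([1, 2, "<tag>", 3, 4])
print(a)  # <tag>
print(b)  # [3, 4]
```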

If you're in a SUBPARSE, I kind of feel like the ACCEPT and REJECT should apply to the subparse. If you need to bypass that and exit the outer parse, I think I could live with that being done with a THROW of some kind.

There were previous usages of ACCEPT and REJECT that didn't really make sense to me. This seems a lot more powerful.

A FUNCTION's job is to RETURN value(s). Use RETURN.
A CATCH's job is to intercept THROWN values. Use THROW.
A GENERATOR's job is to YIELD values. Use YIELD.
A PARSE(er)'s job is to ACCEPT or REJECT input. Use one or the other... or use the default behavior to ACCEPT the result of the last combinator if the end of input is reached.