`||` as an Inline Sequencing Operator for UPARSE

hostilefork · May 3, 2021, 9:29pm

Let's say you have a rule like ["a" | "b"], but you want to change it to sequence a "c" after it...like this:

>> parse? "ac" [["a" | "b"] "c"]
== #[true]

>> parse? "bc" [["a" | "b"] "c"]
== #[true]

Tacking on that "c" seems like more work than it should be. You have to go a long way back to find the place to put the start of the block.

So... how about an operator that acts like "everything on the left" is in a block? Let's call it ||:

>> parse? "ac" ["a" | "b" || "c"]
== #[true]

>> parse? "bc" ["a" | "b" || "c"]
== #[true]

Slick.

I don't feel bothered by the fact that | means "or" and || means "then", because they're both being picked for the same property: drawing a clean visual line between the parts of a rule.

The illustration below shows why I think || is an okay notation: it kind of evokes ][.

["a" | "b" || "c"] <=> [[ "a" | "b" ][ "c" ]]

There's a performance consequence to having this operator, because it means that a successful alternate match inside of a BLOCK! combinator doesn't mean you're done. You have to scan ahead and make sure there isn't a sequencing operator coming up...in which case the block still has processing to do.

I'm not that concerned about the performance issue. You could choose a block combinator that supports || or one that doesn't...it's your choice. The default being more expressive is fine. And truthfully this just puts a successful alternate on par with a failed one...when a rule fails, it already has to seek forward through the block for the next |.
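To make the scan-ahead concrete, here is a rough sketch of a flat block matcher. It is a toy model in Python, not UPARSE's actual BLOCK! combinator: the names Op, match_item and match_block are made up for illustration, rules are plain lists of strings, and it assumes the ][ reading of ||, where alternates after a || restart from the input position reached at that ||.

```python
from enum import Enum


class Op(Enum):
    ALT = "|"    # ordered choice between alternates
    SEQ = "||"   # inline sequencing (hypothetical model of UPARSE's ||)


def match_item(item, text, pos):
    """Match one rule item at pos; return the new position or None."""
    if isinstance(item, list):          # nested block
        return match_block(item, text, pos)
    return pos + len(item) if text.startswith(item, pos) else None


def match_block(rule, text, pos=0):
    """Match a flat rule; return the end position, or None on failure."""
    start = pos   # input position where the current run of alternates began
    i = 0
    while i < len(rule):
        item = rule[i]
        if item is Op.SEQ:
            # Everything to the left matched; later alternates restart here.
            start = pos
            i += 1
        elif item is Op.ALT:
            # The alternate before this | matched. Without ||, the block
            # would be done; now it has to scan ahead for a pending ||.
            while i < len(rule) and rule[i] is not Op.SEQ:
                i += 1
        else:
            end = match_item(item, text, pos)
            if end is not None:
                pos = end
                i += 1
                continue
            # Failure: seek forward for the next | in this segment.
            while i < len(rule) and rule[i] not in (Op.ALT, Op.SEQ):
                i += 1
            if i == len(rule) or rule[i] is Op.SEQ:
                return None       # no alternates left to try
            pos = start           # rewind the input, try the next alternate
            i += 1
    return pos


rule = ["a", Op.ALT, "b", Op.SEQ, "c"]
print(match_block(rule, "ac"))   # 2
print(match_block(rule, "bc"))   # 2
print(match_block(rule, "a"))    # None
```

The two inner while loops are the point: the success path skips forward looking for ||, just as the failure path skips forward looking for |, so the extra cost on success mirrors what failure was already paying.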

And Guess What... It's In UPARSE Right Now!

I actually noticed a bug in a rewrite of some command line argument processing I'd suggested for Giulio's webserver:

["-h" | "-help" | "--help" (-help, quit)]

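It would only print the help and quit for "--help", not for "-help" or "-h". That's because the (-help, quit) group is sequenced only after the last alternate, "--help"; the first two alternates match and then do nothing.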

Inline sequencing to the rescue!

["-h" | "-help" | "--help" || (-help, quit)]

It's really nice for editing to be able to just throw in the || right where you mean it, instead of needing to put the individual [ and ] at different locations.
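For anyone who wants to see the difference between the two help rules spelled out, here is a small sketch in Python. It only models the grouping, not real parsing: the helper names split_on and flags_that_run are invented for this example, the operators are plain "|" and "||" strings, and the action is a placeholder string that is never executed.

```python
ALT, SEQ = "|", "||"          # operator tokens in this toy model
ACTION = "(-help, quit)"      # placeholder for the GROUP!, never run


def split_on(tokens, separator):
    """Split a flat token list into sublists at each separator."""
    parts, current = [], []
    for token in tokens:
        if token == separator:
            parts.append(current)
            current = []
        else:
            current.append(token)
    parts.append(current)
    return parts


def flags_that_run(rule, action):
    """Return the flags in the first segment after which the action runs."""
    segments = split_on(rule, SEQ)
    alternates = split_on(segments[0], ALT)
    # An action placed after || is shared by every alternate to its left.
    shared_tail = any(action in segment for segment in segments[1:])
    return [alt[0] for alt in alternates if shared_tail or action in alt]


buggy = ["-h", ALT, "-help", ALT, "--help", ACTION]
fixed = ["-h", ALT, "-help", ALT, "--help", SEQ, ACTION]

print(flags_that_run(buggy, ACTION))   # ['--help']
print(flags_that_run(fixed, ACTION))   # ['-h', '-help', '--help']
```

In the buggy rule the action sits inside the last alternate only, while in the fixed rule it lives in its own segment after the ||, so all three flags reach it.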

The hits just keep coming, don't they?