2022 UPDATE: This brainstorm was the origin of the behavior of BLOCK! rules in UPARSE: "What if it made the last result in the rule the value?"
A year later, this idea is taken for granted...and has gone through evolutions in development.
Post retained for historical reference.
I've noticed patterns come up like:
parse data [
    [some-rule (variable: xxx)
        | variable: integer!
        | some-other-rule (variable: yyy)
        | ... (variable: ...)
    ]
]
It's a little bit like a SWITCH. But you wind up repeating the same variable name several times.
There's tremendous general value in having more ways to push the name of the assigned value outside of the rule. This makes it easier to write reusable rules.
One thing we might consider would be some kind of CATCH parallel to COLLECT, which lets you get single values instead of a block of them:
parse data [
    variable: catch [
        some-rule, throw @(xxx)
        | throw integer!
        | some-other-rule, throw @(yyy)
        | ... throw @(...)
    ]
]
That's a bit verbose. It also ties the moment you decide what to capture to the moment you return it. You might not want that (think of BETWEEN-like things where you still have some stuff to match after you've found your value).
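To make the BETWEEN concern concrete, here is a toy model in Python (not Ren-C, and not how any real PARSE is implemented). Every name in it (`Thrown`, `lit`, `to`, `throw`, `seq`, `catch`) is made up for illustration, and it assumes that a throw exits the enclosing catch immediately:

```python
# Toy model only: a rule is a function (text, pos) -> new_pos, or None on failure.

class Thrown(Exception):
    """Carries the thrown value and the input position at the moment of the throw."""
    def __init__(self, value, pos):
        super().__init__(value)
        self.value = value
        self.pos = pos

def lit(s):
    """Match a literal string."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def to(s):
    """Advance up to (but not past) the next occurrence of s."""
    def rule(text, pos):
        idx = text.find(s, pos)
        return idx if idx != -1 else None
    return rule

def throw(inner):
    """Match inner, then immediately throw the text it matched."""
    def rule(text, pos):
        end = inner(text, pos)
        if end is None:
            return None
        raise Thrown(text[pos:end], end)
    return rule

def seq(*rules):
    """Run rules in order, failing if any of them fails."""
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule

def catch(inner):
    """Return (value, pos) from the first throw inside inner, or None if nothing was thrown."""
    def rule(text, pos):
        try:
            inner(text, pos)
        except Thrown as t:
            return t.value, t.pos
        return None
    return rule

# A BETWEEN-like rule: the throw fires before the closing ">" is matched.
between = catch(seq(lit("<"), throw(to(">")), lit(">")))
value, pos = between("<a>", 0)
```

In this sketch the value comes back as "a", but the position stops at 2: the trailing `lit(">")` never ran, which is exactly the coupling described above.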
So maybe we should put more practical thought into the question of "what does it mean to SET a variable to a BLOCK! rule in the first place".
Capturing everything matched in the block seems like such a rare intent...your TOs and THRUs and matches of string bits seem like they're often not part of the capture. You might argue it should only capture things marked with @, and if what you're capturing is itself a parse rule then you put it in an @[block]:
parse data [
    variable: [
        some-rule, @(xxx)
        | @[integer!]
        | some-other-rule, @(yyy)
        | ... @(...)
    ]
]
It's more succinct. And then this gives you the option of capturing single values via parse rule in the traditional way with an @ rule:
parse [1 ...] [variable: @[integer! | text!] ...]
So imagine capturing an integer or falling back to a default:
parse ["foo" ...] [variable: [@[integer!] | @(0)] ...]
Should @integer! Be Treated as @[integer!] or @(integer!)?
I feel like the bias should probably be toward variables. Having to write keep @var to keep a variable's value, when a plain keep rule keeps what a rule matched, is already an added burden.
Hence @integer! would act like @(integer!). I think I can live with the idea that a "capturing rule" is always in @[...].
What if You Say var: [@(1) @(2)]
As previously mentioned, because this isn't a CATCH/THROW situation, the idea here is that the "@-capturing" wouldn't interrupt the rule:
rule: [
    [some "(", @[to ")"], some ")"]
    | @('unmatched)
]
parse "((1))" [x: rule] ; gives you `"1"`, still ran the `some ")"`
parse "[[1]]" [x: rule] ; gives you `unmatched`
Notice how we're ducking the need to ELIDE those SOMEs with this "@ capture" rule.
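As a sanity check on the idea, here is a toy Python model of "capturing doesn't interrupt the rule". It is a sketch under assumptions, not UPARSE: `capture` stands in for @[...], `value` stands in for @(...), and all of the helper names are hypothetical. Rules return the new position plus a list of whatever was captured along the way, which deliberately leaves the "more than one capture" and "zero captures" questions open:

```python
# Toy model only: a rule is a function (text, pos) -> (new_pos, captures), or None.

def some(lit):
    """Match one or more repetitions of a literal; captures nothing."""
    def rule(text, pos):
        end = pos
        while text.startswith(lit, end):
            end += len(lit)
        return (end, []) if end > pos else None
    return rule

def to(lit):
    """Advance up to (but not past) the next occurrence of lit; captures nothing."""
    def rule(text, pos):
        idx = text.find(lit, pos)
        return (idx, []) if idx != -1 else None
    return rule

def capture(inner):
    """Stand-in for @[...]: record the text that the inner rule matched."""
    def rule(text, pos):
        result = inner(text, pos)
        if result is None:
            return None
        end, _ = result
        return end, [text[pos:end]]
    return rule

def value(v):
    """Stand-in for @(...): capture a plain value without consuming input."""
    return lambda text, pos: (pos, [v])

def seq(*rules):
    """Run rules in order; a capture does not stop the later rules from running."""
    def rule(text, pos):
        captures = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            pos, found = result
            captures += found
        return pos, captures
    return rule

def alt(*rules):
    """Try alternatives in order; the first one that succeeds wins."""
    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None
    return rule

rule = alt(seq(some("("), capture(to(")")), some(")")), value("unmatched"))
matched = rule("((1))", 0)
fallback = rule("[[1]]", 0)
```

Here `matched` ends at position 5 with "1" captured, so the trailing `some(")")` did run. The fallback branch captures "unmatched" without advancing, and something like `seq(value(1), value(2))` simply yields two captures; what a real implementation should do with that list is the open question.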
So what happens if you have more than one capture group? What about zero?
How Do You Get Parameters?
We might say if you want parameters, there's no shortcut...you need to write a new combinator.
But you could also use GET-GROUP! to invoke a function that builds a block:
make-rule: func [delimit] [
    compose [
        some (delimit), x: to (delimit), some (delimit)
    ]
]
parse "**1**" [x: :(make-rule "*")]
I think it should be clear that the system compares poorly against other parser combinator libraries if you don't have a way to do this without the :(...).
But it may be wise to say that if what you are using isn't an actual bona-fide "combinator" (as recognized by some aspect of its signature)...and is just a "function that makes a BLOCK!", then you run it this way.
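For comparison, this is roughly what a parameterized rule looks like when rules are ordinary functions in the host language. It is a toy Python sketch, not any particular combinator library, and `some`, `to`, `capture`, `seq`, and `make_rule` are hypothetical helpers written just for this example:

```python
# Toy model only: a rule is a function (text, pos) -> (new_pos, captures), or None.

def some(lit):
    """Match one or more repetitions of a literal; captures nothing."""
    def rule(text, pos):
        end = pos
        while text.startswith(lit, end):
            end += len(lit)
        return (end, []) if end > pos else None
    return rule

def to(lit):
    """Advance up to (but not past) the next occurrence of lit."""
    def rule(text, pos):
        idx = text.find(lit, pos)
        return (idx, []) if idx != -1 else None
    return rule

def capture(inner):
    """Record the text that the inner rule matched."""
    def rule(text, pos):
        result = inner(text, pos)
        if result is None:
            return None
        end, _ = result
        return end, [text[pos:end]]
    return rule

def seq(*rules):
    """Run rules in order, collecting captures as they go."""
    def rule(text, pos):
        captures = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            pos, found = result
            captures += found
        return pos, captures
    return rule

def make_rule(delimit):
    """Rule factory: the parameter is just a function argument."""
    return seq(some(delimit), capture(to(delimit)), some(delimit))

stars = make_rule("*")("**1**", 0)
dashes = make_rule("--")("--ab--", 0)
```

The call to `make_rule("*")` sits directly where a rule is expected, with no escape syntax around it. That is the ergonomic bar the :(...) form is being measured against.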
Big Picture Conclusion: I Like This Direction for Block Capturing
While people have gotten used to writing things like:
parse [1] [set x [integer! | text!]]
I think that the semantics of "whatever you write as a rule in the block is part of the capture" is an uncommon desire outside of that narrow case. The longer the block rule, the less likely you meant to capture every fragment of rule material in it.
So I'm proposing making "capture everything" the special case:
parse [1] [x: @[integer! | text!]]
This would leave plain BLOCK! available for a semantic that is useful more of the time, where the things you want to capture have to be called out explicitly.
Does anyone have strong counter-arguments?