Toward Reusable Rules: SET of a BLOCK! in PARSE

hostilefork · April 6, 2021, 8:25pm

2022 UPDATE: This brainstorm was the origin of the behavior of BLOCK! rules in UPARSE: "What if it made the last result in the rule the value?"

A year later, this idea is taken for granted...and has gone through evolutions in development.

Post retained for historical reference.

I've noticed patterns come up like:

 parse data [
     [some-rule (variable: xxx)
     | variable: integer!
     | some-other-rule (variable: yyy)
     | ... (variable: ...)
     ]
 ]

It's a little bit like a SWITCH. But you wind up repeating the same variable name several times.

There's tremendous general value in having more ways to push the name of the assigned value outside of the rule. This makes it easier to write reusable rules.

One thing we might consider would be some kind of CATCH parallel to COLLECT, which lets you get single values instead of a block of them:

 parse data [
     variable: catch [
         some-rule, throw @(xxx)
         | throw integer! 
         | some-other-rule, throw @(yyy)
         | ... throw @(...)
     ]
 ]

That's a bit verbose. It also ties the moment you decide what to capture to the moment you return it. You might not want that (think of BETWEEN-like things where you still have some stuff to match after you've found your value).

So maybe we should put more practical thought into the question of "what does it mean to SET a variable to a BLOCK! rule in the first place".

Capturing everything matched in the block seems like such a rare intent...your TOs and THRUs and matches of string bits seem like they're often not part of the capture. You might argue it should only capture things inside @... where if what you're capturing is actually a parse rule then you put it in an @[block]:

 parse data [
     variable: [
         some-rule, @(xxx)
         | @[integer!] 
         | some-other-rule, @(yyy)
         | ... @(...)
     ]
 ]

It's more succinct. And then this gives you the option of capturing single values via parse rule in the traditional way with an @ rule:

 parse [1 ...] [variable: @[integer! | text!] ...]

So imagine capturing an integer or falling back to a default:

 parse ["foo" ...] [variable: [@[integer!] | @(0)] ...]

Should @integer! be treated as @[integer!] or @(integer!)?

I feel like the bias should probably be toward variables, because `keep @var` is already an added burden when you want to keep a variable, compared to keeping a rule's result with `keep rule`.

Hence @integer! would act like @(integer!). I think I can live with the idea that a "capturing rule" is always in @[...]

What if You Say `var: [@(1) @(2)]`

As previously mentioned, because this isn't a CATCH/THROW situation, the idea here is that the "@-capturing" wouldn't interrupt the rule:

 rule: [
     [some "(", @[to ")"], some ")"]
     | @('unmatched)
 ]

 parse "((1))" [x: rule]  ; gives you `"1"`, still ran the `some ")"`
 parse "[[1]]" [x: rule]  ; gives you `unmatched`

Notice how we're ducking the need to ELIDE those SOMEs with this "@ capture" rule.

So what happens if you have more than one capture group? What about zero?
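To make the "capture doesn't interrupt the rule" idea concrete, here is a rough model in Python rather than Ren-C. Everything in it is hypothetical: `capture` stands in for @[...], `capture_value` for @(...), and `parse_captures` is an invented helper that is not part of any PARSE implementation. It returns every capture as a list, so it shows the zero and multiple capture cases without deciding what they should mean.

```python
# A sketch under stated assumptions: rules are functions taking
# (text, pos, caps) and returning (new_pos, result), or None on failure.

def some(ch):
    """One or more repetitions of `ch`; contributes no capture."""
    def rule(text, pos, caps):
        end = pos
        while text.startswith(ch, end):
            end += len(ch)
        return (end, text[pos:end]) if end > pos else None
    return rule

def to(ch):
    """Advance up to (not past) the next `ch`; result is the skipped text."""
    def rule(text, pos, caps):
        end = text.find(ch, pos)
        return (end, text[pos:end]) if end != -1 else None
    return rule

def capture(inner):
    """Stand-in for @[...]: run a rule and record its result."""
    def rule(text, pos, caps):
        out = inner(text, pos, caps)
        if out is not None:
            caps.append(out[1])
        return out
    return rule

def capture_value(value):
    """Stand-in for @(...): record a value without consuming input."""
    def rule(text, pos, caps):
        caps.append(value)
        return (pos, value)
    return rule

def seq(*rules):
    """Run rules in order; a capture does NOT stop the sequence."""
    def rule(text, pos, caps):
        result = None
        for r in rules:
            out = r(text, pos, caps)
            if out is None:
                return None
            pos, result = out
        return (pos, result)
    return rule

def alt(*rules):
    """Try alternates in order; captures of a failed alternate are dropped."""
    def rule(text, pos, caps):
        for r in rules:
            trial = []
            out = r(text, pos, trial)
            if out is not None:
                caps.extend(trial)
                return out
        return None
    return rule

def parse_captures(text, rule):
    """Return the list of captures, or None if the rule fails.
    (Unlike a real PARSE, this does not require reaching the end.)"""
    caps = []
    return caps if rule(text, 0, caps) is not None else None

rule = alt(
    seq(some("("), capture(to(")")), some(")")),
    capture_value("unmatched"),
)

print(parse_captures("((1))", rule))  # ['1']
print(parse_captures("[[1]]", rule))  # ['unmatched']
print(parse_captures("x", seq(capture_value(1), capture_value(2))))  # [1, 2]
```

In the first case the trailing `some(")")` still runs after the capture, which is the difference from a CATCH/THROW design. The last line produces two captures; whether that should be an error, the last one, or something else is exactly the open question.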

How Do You Get Parameters?

We might say if you want parameters, there's no shortcut...you need to write a new combinator.

But you could also use GET-GROUP! to invoke a function that builds a block:

make-rule: func [delimit] [
    compose [
        some (delimit), x: to (delimit), some (delimit)
    ]
]

parse "**1**" [x: :(make-rule "*")]

I think it should be clear that the system compares poorly against other parser combinator libraries if you don't have a way to do this without the :(...).

But it may be wise to say that if what you are using isn't an actual bona-fide "combinator" (as recognized by some aspect of its signature)...and is just a "function that makes a BLOCK!", then you run it this way.
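For comparison, this is roughly what the "function that makes a rule" pattern looks like when modeled in Python. It is only a sketch: `make_rule` here returns a closure instead of a composed BLOCK!, and the rule hands back the captured text so that the variable name stays with the caller. None of the names correspond to a real PARSE API.

```python
def make_rule(delimit):
    """Build a rule parameterized by a delimiter.

    The returned rule models: some (delimit), <capture> to (delimit), some (delimit)
    It returns the captured text, or None if the input doesn't match.
    """
    def rule(text):
        pos = 0

        # some (delimit): one or more leading delimiters
        start = pos
        while text.startswith(delimit, pos):
            pos += len(delimit)
        if pos == start:
            return None

        # to (delimit): capture everything up to the next delimiter
        end = text.find(delimit, pos)
        if end == -1:
            return None
        captured = text[pos:end]
        pos = end

        # some (delimit): one or more trailing delimiters, then end of input
        start = pos
        while text.startswith(delimit, pos):
            pos += len(delimit)
        if pos == start or pos != len(text):
            return None

        return captured
    return rule

# The caller picks the variable name, not the rule.
x = make_rule("*")("**1**")
print(x)  # 1
```

The closure is the Python analogue of composing the delimiter into a block. A true combinator would presumably take the delimiter as a parse-time argument instead of being rebuilt for each call.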

Big Picture Conclusion: I Like This Direction for Block Capturing

While people have gotten used to writing things like:

parse [1] [set x [integer! | text!]]

I think that the semantics of "whatever you write as a rule in the block is part of the capture" is an uncommon desire outside of that narrow case. The longer the block rule, the less likely you meant to capture every fragment of rule material in it.

So I'm proposing making the "capture everything" the special case:

parse [1] [x: @[integer! | text!]]

This would leave plain BLOCK! available for a semantic that is more useful more of the time, where the things you want to capture have to be called out explicitly.

Does anyone have strong counter-arguments?

BlackATTR · April 6, 2021, 8:55pm

I like all of the above, it definitely helps with stuff I'm currently working on. I've recently taken to using PARSE as a more versatile kind of SWITCH (i.e., leveraging the matching power of PARSE rules against a directed graph of input).

And just to be clear, are your comments focused on UPARSE here? I have it in my mind that the traditional PARSE is deprecated.

hostilefork · April 6, 2021, 8:57pm

The real question here is if there's some indisputable need to be able to write:

parse [1] [x: [integer! | text!]]

If that is not negotiable, then it means there has to be compromise in other areas to facilitate it.

But I feel like the balance here is that having to say x: @[integer! | text!] is better than needing to decorate the block in some way for other capture scenarios...which in total would likely out-number and out-prioritize needing to do this undecorated.

As usual, it's a theory...so if I go write it I may figure out problems with it. I already see one, which is that the @ capture would have to be in the immediate top-level of the block...so if you have a sub-block doing a capture you'd need to note that:

 x: [... | to ... @[... @(1) ...] to ... | ...]

not:

 x: [... | to ... [... @(1) ...] to ... | ...]

So you can't skip out on it, the capture must be singularly pointed out at every level of depth. This is different from how COLLECT/GATHER work, where the KEEP and EMIT can be nested and you don't annotate things in intervening layers. But then that doesn't blend with my argument for what @[integer! | text!] might mean. Maybe that would need to be :[integer! | text!] ?

Anyway, as I say: ongoing thought...

UPARSE is a prototype for an architecture for PARSE. Features in UPARSE will just begin migrating into PARSE as they are adopted.

hostilefork · April 7, 2021, 3:27am

Something I already don't like about it is the readability problem of not being able to tell which @xxx are standalone and which are parameters. If I say [..., keep @(xxx), ...] in a rule, that's not a result of the block, that's an argument to a keep.

So should there just be a keyword that comes into being as an implicit "block result" whenever you are assigning a block to a SET-WORD!?

parse ... [var: [... blockresult @(1) ... | ... blockresult integer! ...]]

That's obviously a long keyword, and any keyword is going to mean typing more. But one must remember that a key benefit here is getting the name outside the block, so that it becomes reusable.

What if it made the last result in the rule the value?

This is a bit like what happens with x: (1 2) or x: do [1 2] being 2 in regular DO code. What if rule assignments worked the same way?

There are problems with this idea, but it does mean that you can say:

parse [1] [x: [integer! | text!]]

...and get X as 1. However it means that if you said:

parse [1 2] [x: [integer! integer! | text! text!]]

You would get X as 2.

You'd have to stylize your rules somewhat so the result was the last thing. With LET that's not so difficult:

parse [1 2] [x: [let i: integer!, integer!, @(i) | let t: text!, text!, @(t)]]

So there you could get 1 as the result, instead of 2.

But with invisibles you have other options:

parse [1 2] [x: [integer!, elide integer! | text!, elide text!]]

Invisibility giveth, and Invisibility taketh away

What if you put in debug messages or something of the sort?

parse [1 2] [x: [integer! integer! (print "Hey!") | text! text!]]

Plain GROUP! is another invisible. So it doesn't count. Another win for invisibility.

That might seem good, but it means as the parse engine proceeded it wouldn't know when it was on the "last rule" just by looking at source. This inability to predict would mean every rule would be generating content which we might have to throw away. @IngoHohmann may recall my reluctance to make TO "A" produce a value without a modifier (like ACROSS TO "A"), and this kind of thing was part of why.
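The "last result wins, invisibles don't count" behavior can be modeled in a few lines of Python. This is a sketch of the idea only, not of how UPARSE is implemented; `INVISIBLE`, `elide`, `group`, `seq`, `alt`, and `parse_set` are all hypothetical names. Note that every rule returns a result whether or not anyone ends up using it, which is the cost concern described above.

```python
# Rules take (items, pos) and return (new_pos, result), or None on failure.

INVISIBLE = object()  # sentinel: the rule matched but contributes no result

def integer(items, pos):
    if pos < len(items) and type(items[pos]) is int:
        return pos + 1, items[pos]
    return None

def text(items, pos):
    if pos < len(items) and type(items[pos]) is str:
        return pos + 1, items[pos]
    return None

def elide(inner):
    """Match `inner` but make its result invisible."""
    def rule(items, pos):
        out = inner(items, pos)
        return None if out is None else (out[0], INVISIBLE)
    return rule

def group(action):
    """Stand-in for a plain GROUP!: run code, consume nothing, stay invisible."""
    def rule(items, pos):
        action()
        return pos, INVISIBLE
    return rule

def seq(*rules):
    """Block rule: the result is the last non-invisible result."""
    def rule(items, pos):
        result = INVISIBLE
        for r in rules:
            out = r(items, pos)
            if out is None:
                return None
            pos, value = out
            if value is not INVISIBLE:
                result = value  # earlier results are computed, then discarded
        return pos, result
    return rule

def alt(*rules):
    """Alternates: the first one that matches supplies the result."""
    def rule(items, pos):
        for r in rules:
            out = r(items, pos)
            if out is not None:
                return out
        return None
    return rule

def parse_set(items, rule):
    """Model of `x: rule`: return the block's result, or None on failure."""
    out = rule(items, 0)
    if out is None or out[0] != len(items):
        return None
    return out[1]

print(parse_set([1], alt(integer, text)))                               # 1
print(parse_set([1, 2], alt(seq(integer, integer), seq(text, text))))  # 2
print(parse_set([1, 2], alt(seq(integer, elide(integer)),
                            seq(text, elide(text)))))                  # 1
```

In `seq`, the first `integer` in `seq(integer, integer)` still builds its result even though the second one overwrites it. A real engine would want to avoid that work when it can tell a result is going to be dropped.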

Still...something about this concept appeals to me

An obvious likeable thing about it is that the x: [integer! | text!] would still work.

But taking advantage of the "last expression" position, when bolstered by invisibles, has some unifying-seeming strength with the evaluator people already have to know to use. It feels synergistic.

It might warrant serious consideration as the way to get a result out of a block rule, but the need to ask every step for a result just to throw it away could end up being cost prohibitive.

hostilefork · August 15, 2022, 2:42am