Mutation and UPARSE: Winds of CHANGE

hostilefork · March 13, 2021, 5:06pm

Mutating operations in PARSE introduce a bag of issues that are a bit hard to manage. Topaz decided to dispense with mutation altogether. I don't think one should be so quick to dismiss it, especially in a language that embraces mutability at its core.

Anyway...historically CHANGE has had the somewhat odd question of what its second argument represents. The first is a rule that spans the region you want to replace. And the second argument is...what, exactly?

You can do a literal:

red>> str: "(aaa)"
red>> parse str ["(" [change [some "a"] "bbb"] ")"]
red>> str
== "(bbb)"

Note that the literal isn't a rule...it wasn't matched against any "b" in the input. You could also have calculated the replacement via a GROUP!:

red>> str: "(aaa)"
red>> parse str ["(" [change [some "a"] (reverse "xyz")] ")"]
red>> str
== "(zyx)"

But what if you wanted to make the replacement a function of the data it was replacing?

I thought I'd give a shot at a different way of thinking about this with UPARSE. What if the second argument was a rule, just a value-bearing one? And what if it processed the same input position that was given to the thing determining the span to replace?

Since (...) are value-bearing rules you have the choice of just dropping in a value with that:

>> str: "(aaa)"
>> uparse str ["(" [change [some "a"] ("bbb")] ")"]
>> str
== "(bbb)"

For the cost of a couple more characters, you have the advantage of helping readers know that you are not matching that "bbb" against any input...but fabricating a value out of whole cloth.

But you're not just signaling that...because by making that second argument a rule slot, you can gather information. We give the replacement rule the same starting position to look at as the rule that determines the span.

Let's try something a little different.

>> str: "(aba)"
>> uparse str [
    "("
    change [to ")"] collect [
        some ["a" keep ("A") | skip]
    ]
    ")"
]
>> str
== "(AA)"

This shows the added flexibility of making the second argument fit into the model of "rule": You can react to the input you are replacing.
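For anyone who wants to poke at the idea outside of Ren-C, here is a rough Python model of it. This is only a sketch under my own assumptions, not how UPARSE is implemented: a "rule" is modeled as a function from (text, position) to (new position, product), and the names `some_char`, `to_char`, `group`, `collect_upper_a` and `change` are all hypothetical. The point it illustrates is that the span rule and the replacement rule both start from the same position.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model: a rule returns (position after match, product) or None.
Rule = Callable[[str, int], Optional[Tuple[int, object]]]

def some_char(ch: str) -> Rule:
    """Model of `some "a"`: one or more repetitions of ch."""
    def rule(text, pos):
        end = pos
        while end < len(text) and text[end] == ch:
            end += 1
        return (end, text[pos:end]) if end > pos else None
    return rule

def to_char(ch: str) -> Rule:
    """Model of `to ")"`: advance up to (not past) the next ch."""
    def rule(text, pos):
        end = text.find(ch, pos)
        return None if end < 0 else (end, None)
    return rule

def group(value) -> Rule:
    """Model of `("bbb")`: consumes nothing, just produces a value."""
    return lambda text, pos: (pos, value)

def collect_upper_a(text, pos):
    """Model of `collect [some ["a" keep ("A") | skip]]` (unbounded)."""
    start, kept = pos, []
    while pos < len(text):
        if text[pos] == "a":
            kept.append("A")
        pos += 1
    return (pos, kept) if pos > start else None

def change(text, pos, span_rule: Rule, replacement_rule: Rule):
    """Replace what span_rule consumed with replacement_rule's product.
    Both rules are run from the same starting position."""
    span = span_rule(text, pos)
    repl = replacement_rule(text, pos)
    if span is None or repl is None:
        return None
    span_end, _ = span
    _, product = repl
    piece = "".join(product) if isinstance(product, list) else str(product)
    return text[:pos] + piece + text[span_end:], pos + len(piece)

# Position 1 is just past the opening "(" in both cases.
literal_case = change("(aaa)", 1, some_char("a"), group("bbb"))
reactive_case = change("(aba)", 1, to_char(")"), collect_upper_a)
```

In this model `literal_case` carries the text "(bbb)" and `reactive_case` carries "(AA)", matching the two UPARSE sessions above. Notice that `collect_upper_a` walks past the ")" because nothing tells it where the span ended.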

Do Combinators (and PARSE) Need /LIMIT ?

So one thing that's weird about that rule is that the replacement rule isn't bounded by how far the first argument matched. It seems like it perhaps should be limited by that (?)

Yet what such a limit would cover won't be 100% obvious, because CHANGE replaces the span a rule consumed and not the value it produced. Saying change [between "(" ")"] [...] will include the parentheses in the changed region, because you aren't changing the generated value, you're always changing the input.

So even if there were a limit, it would be a limit at the tail of the span...and the beginning would be the start of the span. This boils down to there being a difference in using the consumed range of a rule vs. its product.
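To make the consumed-range vs. product distinction concrete, here is another small Python sketch. Again this is a model under my own assumptions rather than anything in UPARSE: `between` and `scan_for_a` are hypothetical stand-ins, and the `limit` parameter is just one guess at what a /LIMIT might mean (stop the replacement rule at the tail of the span).

```python
def between(text, pos, open_ch="(", close_ch=")"):
    """Model of `between "(" ")"`.
    Returns (end of consumed span, product); the product excludes the delimiters."""
    if not text.startswith(open_ch, pos):
        return None
    close = text.find(close_ch, pos + len(open_ch))
    if close < 0:
        return None
    return close + len(close_ch), text[pos + len(open_ch):close]

def scan_for_a(text, pos, limit=None):
    """Model of `collect [some ["a" keep ("A") | skip]]`.
    `limit` is a hypothetical tail bound; None means run to the end of input."""
    stop = len(text) if limit is None else limit
    start, kept = pos, []
    while pos < stop:
        if text[pos] == "a":
            kept.append("A")
        pos += 1
    return (pos, kept) if pos > start else None

text = "x(aba)a"
start = 1                                   # position of the "("
span_end, product = between(text, start)

consumed = text[start:span_end]             # what CHANGE would replace
_, unbounded = scan_for_a(text, start)      # sees the trailing "a" too
_, bounded = scan_for_a(text, start, limit=span_end)

# Replacing the consumed span swallows the parentheses as well.
changed = text[:start] + "".join(bounded) + text[span_end:]
```

Here `consumed` is "(aba)" while `product` is only "aba". Without the limit the scan keeps three "A"s, because it also picks up the "a" after the closing parenthesis; with the limit it keeps two, and `changed` comes out as "xAAa" with the parentheses gone.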

Anyway, these changes are things anyone can hack on.