When Should PARSE Notice Changes

hostilefork · February 27, 2021, 7:59pm

What do you think should happen in the following situation?

rule: [2 "a" (rule: [2 "b"])]

parse "aabb" [some rule]

Should that be a successful PARSE, or not?

Rebol2, R3-Alpha, and Red say yes. But I don't think it should succeed. I think that if you say some rule then the entirety of the SOME should be matching the same value.

The Case For Making SOME RULE Capture the RULE

Let's consider a parallel experience outside of PARSE. Imagine a plain function called SOME, where a RULE variable is passed as an argument:

some: func [arg] [
    until [not do arg]
    return <finished>
]

rule: [(rule: [false]) true]

>> some rule
; infinite loop

That would be an infinite loop, because SOME would latch onto a value for its arg and never let go of it. It would get arg as [(rule: [false]) true]...and that's what arg would be regardless of what you do to rule.

The only way you could get the above SOME to see that the RULE variable changed would be if it took its argument quoted. It would then have to fetch the variable every time.

some: func ['arg] [
    until [not do (either word? arg [get arg] [arg])]
    return <finished>
]

rule: [(rule: [false]) true]

>> some rule  ; this version terminates
== <finished>

Getting messier, and quoting arguments to functions takes away the power of the evaluator. The SOME function becomes concerned about the details of variable fetching vs. literals. Once it fetches, real code would have to type check and becomes convoluted.

You then have another issue: if the quoting abstraction wanted to support GROUP!s to calculate the argument to SOME, then you have to evaluate that argument each time. Beyond performance, you have to worry about side-effects.

Bringing it back to PARSE again: delegating to the "power of the evaluator" for PARSE actually means delegating to the parsing engine. e.g. if I want to say some 2 opt "a" then that's a bit like passing an expression to a function. There's no variable that contains [2 opt "a"], so you can't refer to it through an address. SOME needs to be able to accept some sort of "parser combinator" by value that the parse engine builds and gives to it.

Not Capturing Is Either Slow, Or Broken

I didn't think Red actually would look-up the variable each time, but it does.

red>> rule: [2 "a" (rule: [2 "b"] outer/2: 'surprise)]

red>> parse "aabb" outer: [some rule]
*** Script Error: PARSE - invalid rule or usage of rule: surprise

We modified the outer rule from [some rule] to [some surprise] and it saw that. This is technically sound (on the surface, at least)...a bit like a quoting function...but I've made the case above for why it isn't desirable.

Rebol2 and R3-Alpha do not re-fetch the variable:

rebol2>> rule: [2 "a" (rule: [2 "b"] outer/2: 'surprise)]

rebol2>> parse "aabb" outer: [some rule]
== true

If it didn't fetch the variable, how did it see the change? The answer is that SOME was doing a tricky thing that user code can't do... it's holding onto a reference to a variable's value by its address.

Every time I see such things I know there's trouble. Because addresses of variables aren't stable across arbitrary user code execution. Objects can expand, moving memory around and GC'ing the old memory. Easy to break, just like this:

 obj: make object! [
      expand: does [
          repeat n 100 [append obj to set-word! rejoin ["var" n]]  ; expand
          recycle  ; reclaim memory
          loop 10000 [make object! [x: y: #junk]]  ; try fill up old memory
      ]

      rule: [2 "a" (rule: [2 "b"] expand)]
 ]

 r3-alpha>> parse "aabb" bind [some rule] obj
 ** Script error: PARSE - invalid rule or usage of rule: #junk

What you get there is unpredictable...but in this case it just shows that the use of invalid memory happened to have a value in it that allowed for an error message. Crashing could easily happen--it's stale memory.

I'd imagine you could probably crash Rebol2 also...because it's probably holding the address without actually GC marking the variable through PARSE. Something like this should be able to crash it--though it didn't on a first try--but without source code I don't feel like trying any harder:

 obj: make object! [
      corrupt: does [
          outer/2: 'not-rule
          obj: <unused>
          recycle  ; reclaim memory
          loop 10000 [make object! [x: y: #junk]]  ; try fill up old memory
      ]

      rule: [2 "a" (rule: [2 "b"] corrupt)]
 ]

 parse "aabb" bind outer: copy [some rule] obj

But generally speaking, you have a complex internal state that's hard to model to expose to userspace constructs like debugging...since you're using a feature usermode doesn't have (references to a variable's value). Being buggy is just an ancillary fact of using that unsupported concept.

Conclusion

some rule should evaluate what "rule" is once, and then repeat that for all the matches of the SOME.

This aligns with a saner division of responsibility between the individual combinators like SOME/ANY/WHILE/etc. and the parse engine itself, by removing the expectation that they should be able to see changes in their arguments...which if they were expressions, would likely be bad to recalculate (and require backtracking of the rule block to implement).

There exists a mechanism for getting the variable each time by triggering another parse block recursion, via some [rule]. Those cases that truly need the behavior can do that...recognizing that it will be less performant.

rgchris · February 28, 2021, 3:16am

This obviously merits a more considered response, but felt compelled to post this:

Gromit

iArnold · February 28, 2021, 10:57am

Agree on that this should not succeed feeling here.
PARSE is a complex operation in itself already and changing the rules during that operation is something that I would consider to be a feature that should not be allowed in software that is to be maintained (on par with the GOTO statement). When such a feature is needed perhaps it is better to test in two or more steps.
Someone should bring an awfully good case for this to be possible to the table for considering it to be allowed. I'd say if you do this, don't complain about unexpected results.

hostilefork · March 2, 2021, 10:57pm

I posted this argument before I'd gotten UPARSE written. But it should be obvious that my opinion on this was motivated by the combinators.

The value should--by this point--be apparent. As I establish, there is no loss of generality since you can put even a single word in a BLOCK!. The block combinator iterates across the rules each time, re-fetching the elements...so this is the tradeoff you make if you don't want to be building hardened/reusable combinator functions.