How to Capture Binding Of PARSE Items

hostilefork · January 29, 2024, 7:44pm

Consider some simple code that used to "work" (though of course only in the simplest of cases):

>> parse [word: 10] [
       let word: set-word! let val: integer! (
           set word val
       )
   ]

We're getting some unbound values by structural extraction. But now that structural extraction doesn't propagate bindings... how do we look those values up in an environment?

We'd get the wrong answer if we said set (inside [] word) val... that would try to bind the "word" word to the LET variable from the rule. I made the names conflict to stress that, most of the time, the processing code is not the right environment in which to look up values from the data.

When PARSE is doing the processing (and recursions in our data for us), we're cut out of the loop on binding.
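
To make the problem concrete, here is a toy model in Python (not Ren-C, and not a claim about how the interpreter is implemented). Every name in it (`Word`, `Block`, `pick`, `bind_in`, `set_word`) is hypothetical. The assumption is simply that a block carries an environment, that picking an item out structurally hands back an unbound word, and that binding is a separate, explicit step against whichever environment you choose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """A symbol plus an optional environment (a dict) to resolve it in."""
    name: str
    env: Optional[dict] = None  # None models an unbound word


@dataclass
class Block:
    """A list of items together with the environment the block was seen in."""
    items: list
    env: dict


def pick(block, index):
    """Structural extraction: return the item as stored, adding no binding."""
    return block.items[index]


def bind_in(env, word):
    """Explicit binding step: make a copy of `word` that resolves in `env`."""
    return Word(word.name, env)


def set_word(word, value):
    """Assign through a word's binding; an unbound word is an error."""
    if word.env is None:
        raise LookupError(f"{word.name} is not bound")
    word.env[word.name] = value


data_env = {"word": None}          # where the data's `word` should live
rule_env = {"word": "rule-local"}  # the rule's own variable, same spelling
data = Block([Word("word"), 10], data_env)

extracted = pick(data, 0)  # unbound after structural extraction
set_word(bind_in(data.env, extracted), pick(data, 1))  # bind against the data
```

Binding `extracted` against `rule_env` instead would overwrite the rule's own variable, which is the conflict described above.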

Solution Tactics

You can use the <input> TAG! combinator to get the input, and if there were an IN combinator you could do this yourself, handling the recursions:

>> parse [[word: 10]] [
       let i: <input>
       subparse in (i) block! [  ; make subparse input propagate specifier
           let sub: <input>
           let word: set-word! let val: integer! (
               set (in sub word) val 
           )
       ]
   ]

A combinator that captures the parse state object might make this a little easier, since the input could then be fetched at any time:

>> parse [[word: 10]] [
       let s: <state>
       subparse in (s.input) block! [  ; subparse changes s.input
           let word: set-word! let val: integer! (
               set (in s.input word) val
           )
       ]
   ]

Certainly some pain involved here. Perhaps @bradrn can appreciate the reason why propagating binding through structure automatically seemed necessary so things like this worked "like magic".

But it was bad magic. If the structural operations presume ideas about binding, that ties our hands in the interpretation of binding for the input block. We have [[word: 10]] now, but what if we wanted something like [let word [word: 10]]? It's up to the parser of this "dialect" to decide the bindings, rather than having them assigned automatically. Refusing that automatic behavior is the only thing that allows the LET in PARSE above to be implemented!

Though actually in this simple case, you could just say:

>> parse [[word: 10]] [
       subparse in <input> block! [  ; make subparse input propagate specifier
           let word: in <input> set-word! let val: integer! (
               set word val
           )
       ]
   ]

Even briefer would be a TAG! combinator <in> that means in <input>:

parse [[word: 10]] [
   subparse <in> block! [
       let word: <in> set-word! let val: integer! (
           set word val
       )
   ]
]

Not too arduous, and you have the necessary hook points for alternative binding interpretation when you need it. And if you're just processing code structurally, you don't have to worry about it.

(Note: Trying this I remembered that TAG! combinators haven't been set up to take arguments. Should they be able to? Maybe not... none do at the moment, and it seems a reasonable policy to say they don't. If not a TAG! then what should this be? It could be the behavior of the @ operator... which is a bit incongruous with how @word etc. are handled in PARSE, but lines up sort of with wanting to capture the current sense of binding on the next argument. Something to think about, I'm calling it *in* as a placeholder just to move along)
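
The propagation that `subparse <in> block!` performs in the examples above can be pictured with another toy Python model. Again this is only a sketch under assumptions: `Block`, `inherit` and `assign_pairs` are hypothetical names, nested blocks are assumed to be stored without an environment, and the walker hands one down explicitly at each level. The `choose_env` parameter stands in for the "hook point" where a dialect could pick a different environment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    items: list
    env: Optional[dict] = None  # nested blocks start with no environment


def inherit(parent, child):
    """Model of `in <input>`: view `child` with the parent's environment."""
    return Block(child.items, parent.env)


def assign_pairs(block, choose_env=inherit):
    """Walk name/value pairs, recursing into nested blocks.

    `choose_env` decides which environment a nested block gets; the
    default simply propagates the parent's.
    """
    i = 0
    while i < len(block.items):
        item = block.items[i]
        if isinstance(item, Block):
            assign_pairs(choose_env(block, item), choose_env)
            i += 1
        else:
            name, value = item, block.items[i + 1]
            block.env[name] = value
            i += 2


env = {}
outer = Block([Block(["word", 10])], env)  # models [[word: 10]]
assign_pairs(outer)
```

Passing a different `choose_env` (for instance one that returns the child with a fresh dictionary) is the model's version of a dialect deciding the bindings for itself.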

Other Places This Pops Up

If you're writing something like a FOR-EACH loop, and you want to get the bindings of things, you can look the thing up in an environment that you have on hand:

>> block: [word: 10]
>> for-each [word val] block [
      set (in block word) val
   ]

>> word
== 10

It's manual, but it works. But what if the block were literal, and you didn't have access to it?

>> for-each [word val] [word: 10] [
      set (??? word) val
   ]

Where this may be pointing is that instead of trying to imagine weirdly designed FOR-EACH variants that incorporate binding, it may be that you should think in terms of PARSE as the tool for when you want to enumerate with binding...
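
As a rough illustration of "enumerating with binding", here is a Python sketch with hypothetical names (`for_each_pair`, `for_each_pair_bound`). It assumes the enumerator is the only code that can see the literal block's environment, so it is the enumerator that has to do the binding. It is a model of the idea, not of any existing FOR-EACH or PARSE behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    name: str
    env: Optional[dict] = None  # None models an unbound word


@dataclass
class Block:
    items: list
    env: dict


def for_each_pair(block):
    """Plain enumeration: items come out exactly as stored (unbound)."""
    for i in range(0, len(block.items) - 1, 2):
        yield block.items[i], block.items[i + 1]


def for_each_pair_bound(block):
    """Enumeration that binds each word in the block's environment."""
    for word, value in for_each_pair(block):
        yield Word(word.name, block.env), value


env = {}
# The block is built inline, so the loop body has no name to refer to it by.
for word, value in for_each_pair_bound(Block([Word("word"), 10], env)):
    word.env[word.name] = value
```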

bradrn · January 29, 2024, 11:59pm

Although a combinator <in> (or, from the other thread, *in*) seems like the best option here, it’s worth noting that combinators which capture the parse state are quite standard in parser combinator libraries, e.g. megaparsec. They let you write a bunch of really useful things: for instance match, which yields the portion of the input which was consumed during a subparse.

hostilefork · January 30, 2024, 12:11am

Not sure what you're referring to being similar? The binding is a very distinct issue.

Just to get on the same page for terminology...

SUBPARSE in Ren-C (traditionally INTO) spans only one element...the sub-series you are parsing.

 >> parse [1 "aabb"] [integer! subparse text! [some "a" some "b"]]
 == "b"

 >> parse [1 [a a b b]] [integer! subparse block! [some 'a some 'b]]
 == b

If you want to get a span of data out of a rule's match, there is ACROSS in Ren-C (traditionally COPY):

>> parse "aaabcbcaabc" [collect some [some "a" | keep across some ["b" | "c"]]]
== ["bcbc" "bc"]

(It doesn't have a secondary multi-return of the original synthesized product of the rule you copied across, but it could.)

But this doesn't help with the binding issue at hand, because copying data out of input arrays is exactly what I'm calling "structural": it doesn't take the specifier into account.

bradrn · January 30, 2024, 12:30am

What I quoted was the idea of a combinator for capturing the parse state object. I’m saying that such combinators, which capture the parse state, are standard in parser combinator libraries.

Ah, in that case I didn’t mean ‘subparse’ in the same way. The match combinator I mentioned in megaparsec sounds like ACROSS here.

hostilefork · January 31, 2024, 2:35am

There was an instance of this in @Brett's %source-analysis.r (yes, it's still running...and presenting ponderable situations).

    for-each list [tabbed whitespace-at-eol] [
        if not empty? get list [
            emit as tag! list [(file) (get list)]
        ]
    ]

Here you can say "well, those variables are in the current context anyway" and write:

    for-each list [tabbed whitespace-at-eol] [
        if not empty? get inside [] list [
            emit as tag! list [(file) (get inside [] list)]
        ]
    ]

Not a great answer. But it's there.

However, another idea is to use @word under reduction, picking up the binding:

    for-each list reduce [@tabbed @whitespace-at-eol] [
        if not empty? get list [
            emit as tag! list [(file) (get list)]
        ]
    ]

And if you are a fan of GET-BLOCK! for REDUCE this becomes:

    for-each list :[@tabbed @whitespace-at-eol] [
        if not empty? get list [
            emit as tag! list [(file) (get list)]
        ]
    ]

It's hard to say how "natural" this will feel. But for someone who has always experienced that get 'tabbed is the same as get first [tabbed whitespace-at-eol] and won't work, while get @tabbed will work and requires evaluation, this might be a perfectly obvious thing to reach for.

I'm trying to avoid a generalized harden-bindings [tabbed whitespace-at-eol] as long as I can, because I don't really like the idea of that being a common way to solve problems. At some point, something like that will have to be made, but it's going to raise a lot of questions (should all things be hardened, or just things the evaluator would... so quoted values remain unaffected?)

bradrn · January 31, 2024, 4:45am

It feels pretty natural to me: conceptually, foobar is now a simple name, while @foobar is the variable that name refers to. On the other hand, I’m not sure I would have noticed it on simply perusing this code.

Agreed on that point.

I think it should work like this:

harden-bindings: func [block] [
    return collect [for-each value block [
        keep in block value
    ]]
]

(Apologies if I made any mistakes there, I’m still not great with actually programming in Rebol, but hopefully it should be clear enough.)

But I guess this just pushes back the problem to the behaviour of IN: what happens if you do in block ''word? (Or, if you prefer, in block first ['word].) Personally, I think it should make a bound, quoted word. Perhaps one could defend a situation where quoted words can never be bound, but I don’t love that.
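
To compare the two policies for quoted words side by side, here is a small Python model. It is only a sketch: `Word`, `harden` and the `bind_quoted` flag are hypothetical names, and nothing here reflects how IN behaves in Ren-C today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    name: str
    quotes: int = 0             # number of quote levels, e.g. 'word -> 1
    env: Optional[dict] = None  # None models an unbound word


def harden(items, env, bind_quoted=True):
    """Return a new list in which words are bound to `env`.

    With bind_quoted=False, quoted words pass through untouched, modeling
    "only harden what the evaluator would look up".
    """
    result = []
    for item in items:
        if isinstance(item, Word) and (bind_quoted or item.quotes == 0):
            result.append(Word(item.name, item.quotes, env))
        else:
            result.append(item)
    return result


env = {"tabbed": [], "word": 10}
items = [Word("tabbed"), Word("word", quotes=1), 42]

everything = harden(items, env)                         # bound, quoted word
evaluator_only = harden(items, env, bind_quoted=False)  # quoted word left alone
```

In the first policy the quoted word keeps its quote level and gains a binding; in the second it stays unbound until something removes the quote and binds it.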