A (Lame) Hole-Punch Motivating Dialect

hostilefork · January 13, 2024, 7:39am

Referenced in the explanation of "What Dialects Need From Binding"

This code uses hypothetical mechanisms for a new approach to pure virtual binding that are not yet implemented at the time of writing.

Let's imagine you have the idea of a dialect which validates the count of some character in a string. You give it a list of strings, and you match using the "keywords" ONE or TWO for the character at hand. You can also run code in GROUP!s if you want:

>> dialect ["aaaa" "bbbb"] [
       [#a [one two one (print "rule one match")]]
       [#b [two one one (print "passed 2 1 1") one (print "rule two match")]]
   ]
  rule one match
  passed 2 1 1
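
For readers who want the counting rules spelled out, here is a rough Python model of what the dialect is meant to do. It is a sketch under my own assumptions, not Ren-C code: `run_rule` and `dialect` are hypothetical names, strings stand in for the ONE/TWO keywords, and Python callables stand in for GROUP!s. It also shows why the output stops at "passed 2 1 1". The #b rule asks for 2+1+1+1 = 5 b's, but "bbbb" only has four, so the final ONE fails before "rule two match" can print.

```python
# Hypothetical Python model of the counting dialect (not Ren-C code).
# "one" consumes one occurrence of the character, "two" consumes two,
# and callables stand in for GROUP!s that run when matching reaches them.
def run_rule(string, char, rule):
    pos = 0
    for step in rule:
        if callable(step):          # a GROUP!: run it at this point
            step()
            continue
        count = {"one": 1, "two": 2}[step]
        if string[pos:pos + count] != char * count:
            return False            # not enough of the character left
        pos += count
    return pos == len(string)       # the whole string must be consumed


def dialect(strings, lines):
    # Pair each string with its (char, rule) line, as in the example above
    return [run_rule(s, char, rule) for s, (char, rule) in zip(strings, lines)]


if __name__ == "__main__":
    results = dialect(["aaaa", "bbbb"], [
        ("a", ["one", "two", "one", lambda: print("rule one match")]),
        ("b", ["two", "one", "one", lambda: print("passed 2 1 1"),
               "one", lambda: print("rule two match")]),
    ])
    print(results)  # [True, False]
```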

Now imagine a callsite that wants to use this dialect. Let's say it happens to have its own, incidental definition of TWO, but it knows the dialect's meaning should override that. However, let's say it also has some MESSAGE that it doesn't want the dialect to override.

Let's also throw in COLLECT for good measure:

 let two: lambda [body] [repeat 2 body]  ; some incidental definition

 let message: "passed 2 1 1"  ; intended to be seen by the dialect

 let results: collect [
     dialect ["aaaa" "bbbb"] [
         [#a [one two one (keep <finished a>)]]
         [#b [two one one (print message) one (keep <finished b>)]]
     ]
 ]

The caller and DIALECT have a common understanding: ONE and TWO are things that the dialect provides. But it's the dialect's responsibility to sort that out. Even though the binding at the tip of the block it receives includes a definition for TWO, the dialect shouldn't be influenced by that... because all the ONE and TWO words are unbound inside that block.
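
To see why the unbound ONE and TWO are a problem, it may help to model the binding as a chain of frames that lookups walk outward through. This Python sketch is a hypothetical illustration (the `Env` class and the frame contents are made up for the example, not how Ren-C represents specifiers): if the dialect simply resolved words through the caller's chain, it would find the caller's MESSAGE, which is wanted, but also the caller's TWO, which is not.

```python
# Hypothetical model: a binding environment is a chain of frames, each a
# dict plus a pointer to its parent. Lookup walks outward until it finds
# the word. This mirrors the "aggregated chain" idea, not Ren-C internals.
class Env:
    def __init__(self, names, parent=None):
        self.names = dict(names)
        self.parent = parent

    def lookup(self, word):
        env = self
        while env is not None:
            if word in env.names:
                return env.names[word]
            env = env.parent
        raise NameError(f"{word} is unbound")


lib = Env({"print": "lib print", "repeat": "lib repeat"})
callsite = Env({"two": "caller's lambda", "message": "passed 2 1 1"}, lib)
collect_env = Env({"keep": "collect's keep"}, callsite)

# Resolving words naively through the caller's chain:
print(collect_env.lookup("message"))  # passed 2 1 1  (wanted)
print(collect_env.lookup("two"))      # caller's lambda  (leaks in)
# collect_env.lookup("one") raises NameError: nothing in the chain has ONE
```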

Now say the plan of attack that the author has is to build upon the PARSE dialect to implement what they're doing. It seems plausible they should be able to do the following:

 dialect: func [strings [block!] lines [block!]] [
     for-each line lines [
         line: in lines line  ; LINE picks up the binding of LINES
         let char: line.1
         let rule: in line line.2  ; RULE sees the caller's chain, TWO included
         let string: strings.1
         do compose/deep [
             let one: (char)
             let two: [repeat 2 (char)]
             parse (string) [comment "your code here" (unuse [one two] rule)]
         ]
         strings: next strings
     ]
 ]

In more detail:

- Propagating with IN LINES and IN LINE means that RULE gets the binding of the original LINES block, which is the aggregated chain of bindings (from LIB for things like PRINT, from the LETs, and from COLLECT for KEEP). But that chain also includes the caller's incidental definition of TWO.
- We want RULE to come out of this as a BLOCK!, not boxed into a function, because PARSE needs to enumerate it as well as DO the GROUP!s inside it.
- We don't want to have to copy that aggregated binding (e.g. an entire copy of LIB to remove any ONE and TWO, plus copies of all the other contexts to remove ONE and TWO). Beyond the inefficiency, we don't want to explode the number of binding environment identities. Instead, we need an additive means of saying "I want everything but ONE and TWO from this binding environment": something like a "persistent vector" approach, where the new version is layered on top of the old one instead of copying it.
- This is what I call a hole-punching instruction. It becomes the new specifier for the embedded rule, with RULE's specifier as its parent. Later, during the DO of the composed code, when PARSE descends into the block and uses IN, that's where the hole-punched binding is "coalesced" with the definitions of ONE and TWO available in PARSE's "current environment" (which it propagated from the second parameter it received).
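
Here is a hypothetical Python model of the hole punch and the later coalescing step. `Env`, `HolePunch`, and `Coalesced` are illustrative names of my own, not Ren-C internals. The point is only that the punch is a single small layer pointing at the original chain as its parent, so nothing from LIB or the other contexts gets copied, and the holes are filled from PARSE's environment only at the moment of lookup.

```python
# Hypothetical model of hole punching and coalescing (not Ren-C internals).
class Env:
    """A frame of definitions chained to a parent frame."""
    def __init__(self, names, parent=None):
        self.names = dict(names)
        self.parent = parent

    def lookup(self, word):
        if word in self.names:
            return self.names[word]
        if self.parent is None:
            raise NameError(f"{word} is unbound")
        return self.parent.lookup(word)


class HolePunch:
    """'Everything from parent except these words': one small layer, no copy."""
    def __init__(self, holes, parent):
        self.holes = frozenset(holes)
        self.parent = parent

    def lookup(self, word):
        if word in self.holes:
            raise NameError(f"{word} is punched out")
        return self.parent.lookup(word)


class Coalesced:
    """Fill a punched specifier's holes from another (current) environment."""
    def __init__(self, punched, current):
        self.punched = punched
        self.current = current

    def lookup(self, word):
        if word in self.punched.holes:
            return self.current.lookup(word)
        return self.punched.lookup(word)


lib = Env({"print": "lib print"})
callsite = Env({"two": "caller's lambda", "message": "passed 2 1 1"}, lib)

rule_env = HolePunch(["one", "two"], callsite)          # like UNUSE [one two]
parse_env = Env({"one": "#a", "two": "[repeat 2 #a]"})  # the composed LETs

bound = Coalesced(rule_env, parse_env)  # what PARSE would see after IN
print(bound.lookup("two"))      # [repeat 2 #a]  (dialect's meaning wins)
print(bound.lookup("message"))  # passed 2 1 1   (caller's MESSAGE survives)
print(bound.lookup("print"))    # lib print
```

Note that `callsite` itself is never modified: other code holding the original binding still sees the caller's TWO.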

There are many other ways to accomplish this intent, and this particular example does suck (among other criticisms, it inadvertently exposes all the features of PARSE even if it didn't want to). But I maintain this implementation strategy is analogous to real situations that come up, not some imagined thing.

hostilefork · January 13, 2024, 7:34am

I've shown that you can't always (or don't always want to) position your hole punches at the moment the environment is completely built.

Still, in this example, what's the tradeoff of doing it at the very top vs. inside a compose?

 dialect: func [strings [block!] lines [block!]] [
     lines: unuse [one two] lines  ; <-- what's the impact of moving here?
     for-each line lines [
         line: in lines line
         let char: line.1
         let rule: in line line.2
         let string: strings.1
         do compose/deep [
             let one: (char)
             let two: [repeat 2 (char)]
             parse (string) [comment "your code here" (rule)]  ; <-- vs. here
         ]
         strings: next strings
     ]
 ]

The answer is clearly that wherever you do it, that's the moment the incoming definitions become unavailable.

We don't have any meaningful code in the outer layers of the dialect blocks which might try to use the looping TWO definition from the callsite. But what we'd want to happen in such code would guide the choice of whether to do the hole punching sooner or later. Should it be seen? Should it not? Only the dialect knows.

One impact is that if bound blocks get composed into the dialect somehow as deeper rules, they won't see the hole punches if you only punch at the top. Generally speaking, composing in fully bound blocks is not going to work well for some dialect implementation strategies. It's one reason I think "industrial strength" dialects should probably be built more on keywords recognized literally as their "language"... and binding should be reserved for things you might think of more as variables.
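
The difference between punching once at the top and punching each rule can be modeled the same way. In this Python sketch, `Block` pairs items with an optional binding, and `in_` stands in for IN under an assumption of mine (for illustration only): a block that already carries a binding keeps it. A rule that was bound before being composed in never passes through the top-level punch, so it still resolves TWO to the caller's definition, while punching each rule just before use catches it.

```python
# Hypothetical model: blocks carry an optional binding. ASSUMPTION: IN only
# supplies the outer binding to blocks that don't already have their own.
class Env:
    def __init__(self, names, parent=None):
        self.names, self.parent = dict(names), parent

    def lookup(self, word):
        if word in self.names:
            return self.names[word]
        if self.parent is None:
            raise NameError(f"{word} is unbound")
        return self.parent.lookup(word)


class HolePunch:
    def __init__(self, holes, parent):
        self.holes, self.parent = frozenset(holes), parent

    def lookup(self, word):
        if word in self.holes:
            raise NameError(f"{word} is punched out")
        return self.parent.lookup(word)


class Block:
    def __init__(self, items, env=None):
        self.items, self.env = items, env


def in_(outer, block):
    """Propagate OUTER's binding only if BLOCK has none of its own."""
    return block.env if block.env is not None else outer.env


def resolves_two(env):
    try:
        return env.lookup("two")
    except NameError:
        return None  # the hole is still open, waiting to be coalesced


callsite = Env({"two": "caller's lambda"})
plain_rule = Block(["one", "two", "one"])     # unbound rule, as in the example
bound_rule = Block(["two", "one"], callsite)  # already bound, composed in

# Strategy 1: punch once, at the top of LINES
lines = Block([plain_rule, bound_rule], HolePunch(["one", "two"], callsite))
top = [resolves_two(in_(lines, r)) for r in lines.items]

# Strategy 2: punch each rule right before handing it to PARSE
lines = Block([plain_rule, bound_rule], callsite)
per_rule = [resolves_two(HolePunch(["one", "two"], in_(lines, r)))
            for r in lines.items]

print(top)
print(per_rule)
```

Here `top` comes out as `[None, "caller's lambda"]` while `per_rule` is `[None, None]`, where `None` means the hole is still open and waiting to be coalesced with the dialect's own ONE and TWO.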

(...although, in some cases, it may be natural for the person composing the bound blocks to do the hole punching themselves rather than expect it all to be done by the dialect. I guess how painful that is depends on how many holes there are, and whether they're available in some sort of list alongside the dialect.)

Anyway, this just sort of hammers home that the system has no chance of making such decisions automatically, so dialect complicity in binding at each level is the only sensible answer.