What Dialects Need From Binding

Brett · January 11, 2024, 9:59pm

Rebol dialects can have different grammars or evaluation rules than the standard Rebol language. The idea is that you put standard Rebol value types in any order that makes sense to you within your block, and evaluate the block with your custom interpreter.

I made a SQL-like dialect interpreter some years back: https://github.com/codebybrett/rebol2/blob/master/scripts/rowsets.r

Some dialect examples are in the comments of that script - have a brief look at the blocks following rowset/query.

To interpret the code, the rowset/query function deeply parses the block and selectively changes the binding of some of the words. But I don't want to change everything: for example, I want WHERE conditions to evaluate as you'd expect. Some words are keywords bound to internally defined functions, and "score" will resolve to the column of the currently processing row, but the ">" in the WHERE condition, for example, is unchanged.

Now consider that you might want to compose your own code into that block somewhere. I hope my code leaves your code's bindings in peace.

Dialects Are Everywhere

I believe any function that evaluates a block argument can be thought of as an interpreter of some dialect grammar.

Say I ask you to write me a function that takes a block. The block will be used as a template for a resulting block, because I want to use the template to generate a script to be run later. The holes to fill in are represented by "... (CODE) ...". My client code, inside a loop, will define CODE. Your function should return a block where CODE has been reduced and the result inserted within the boilerplate. For example:

files: [%a new-name]
replace-holes [fn (files)]

Should yield:

[fn %a new-name]

It's important to me that the binding of FILES does not change, otherwise I will not get my expected result. Likewise the bindings of FN and NEW-NAME should not change, because I may have set FN to be the same as RENAME, and NEW-NAME could be a function that spits out a new filename every time it's called. I might call DO on the block you give me immediately. Or I may decide to process multiple files, collecting all the results into a script that I can review before executing it - because bulk file operations are scary.

Ask yourself "How could I write replace-holes? How many ways could I write it?"
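One possible answer, sketched as a Python model rather than real Rebol: words are objects that carry their own binding, and the function touches only the holes. The names `Word`, `Group` and `replace_holes` here are illustrative assumptions, and the sketch only handles holes that contain a single word.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Word:
    """A symbol that carries its own binding (the context it looks itself up in)."""
    name: str
    context: dict

    def get(self):
        return self.context[self.name]


class Group(list):
    """Stands in for a GROUP!, i.e. the parenthesized hole in the template."""


def replace_holes(template):
    """Return a new block with each hole replaced by the value of its word.

    Everything that is not a hole is passed through as the very same
    object, so its binding is untouched.
    """
    result = []
    for item in template:
        if isinstance(item, Group):
            (word,) = item              # this sketch supports one word per hole
            value = word.get()          # looked up through the word's own binding
            if isinstance(value, list):
                result.extend(value)    # splice block results into the output
            else:
                result.append(value)
        else:
            result.append(item)
    return result


user = {}                               # the caller's context
fn = Word("fn", user)
new_name = Word("new-name", user)
files = Word("files", user)
user["files"] = ["%a", new_name]        # models  files: [%a new-name]

script = replace_holes([fn, Group([files])])
```

The result holds the same `fn` and `new_name` objects the caller supplied, so whatever they are bound to when the script is finally run is the caller's business, not the function's.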

Hope that helps.

hostilefork · January 12, 2024, 1:29am

So the point we're trying to get at is that every dialect becomes an interpreter in its own right. When you pass it a block, it wants to do more with that block than just delegate to the evaluator's rules by DO-ing it. It wants to enumerate over it with something like a FOR-EACH... making decisions based on the logic of whatever its mini-language rules want.

In the past, if one of these dialect languages used FOR-EACH and came across a WORD!, it would be able to GET that word as a variable...thanks to binding passes that walked the blocks deeply before the dialect function received it.

The weaknesses of that binding strategy are articulated in detail elsewhere (weak enough that I consider it broken and unfit for purpose, hence the search for new ideas). But it can seem powerful at first glance.

Imagine a silly Rebol2 dialect that just takes SET-WORD!s and INTEGER!s in pairs, and assigns double the value:

rebol2>> double-assigner: func [block] [
             foreach [sw i] block [
                 assert [(set-word? sw) (integer? i)]
                 set sw 2 * i
             ]
             return none
        ]

rebol2>> x: none y: none
== none

rebol2>> double-assigner [x: 10 y: 20]
== none

rebol2>> x
== 20

rebol2>> y
== 40

I'll throw in an example which might be interesting to you: the DESTRUCTURE dialect.

The idea that you can do such things has hinged on a very different idea from "the currency of code is largely unbound, with bindings at the tips affecting evaluative products of non-quoted material", which is what is currently under consideration.

If that takes effect, then the dialect must mirror the evaluator's logic of propagating bindings stored in blocks as it goes where applicable... making decisions at each point. Because by default, any PICKs or FOR-EACHs will just be giving back the literal elements with no binding. (Unless they happen to be one of the relatively rare items which have been explicitly bound.)

Brett's example of "write your own COMPOSE", e.g. REPLACE-HOLES, has a similar problem of finding a GROUP! via enumeration as opposed to evaluation... then wanting to DO it, when the enumeration led to getting it with no binding.

hostilefork · January 12, 2024, 4:06am

IN is a nice short word, which we might take for exposing the evaluator's mechanic of "leave binding alone, or add if not (and merge if bound-but-with-a-holepunch)":

double-assigner: func [block] [
    for-each [sw i] block [
        assert [(set-word? sw) (integer? i)]
        set (in block sw) 2 * i
    ]
]

(Some might say IN makes a better variable name, e.g. as a complement to OUT, and hence shouldn't be used for some important general purpose... though you can always name it out of the way and override it!)
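To make the mechanics concrete, here is a small Python model (not Rebol) of the first DOUBLE-ASSIGNER above. It assumes, purely for illustration, that a block holds one binding at its tip, that the binding is a flat dictionary rather than a chain, and that words inside the block start out unbound. The hole-punch merge is not modelled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SetWord:
    name: str
    binding: Optional[dict] = None      # None models an unbound word


class Block(list):
    """A list of values plus one binding held by the block itself."""

    def __init__(self, items, binding=None):
        super().__init__(items)
        self.binding = binding


def in_(block, word):
    """Model of the proposed IN: leave a bound word alone, else try to bind it."""
    if word.binding is not None:
        return word
    if block.binding is not None and word.name in block.binding:
        return replace(word, binding=block.binding)
    return word                          # not found: passes through unbound


def double_assigner(block):
    # Enumeration hands back literal, unbound words; the dialect binds each one.
    for sw, i in zip(block[0::2], block[1::2]):
        assert isinstance(sw, SetWord) and isinstance(i, int)
        target = in_(block, sw)
        target.binding[target.name] = 2 * i


env = {"x": None, "y": None}
double_assigner(Block([SetWord("x"), 10, SetWord("y"), 20], binding=env))
```

After the call, `env` maps `x` to 20 and `y` to 40, matching the Rebol2 session earlier in the thread. The difference is that the dialect had to ask for the binding instead of finding it already on each word.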

BIND is also a short word. Historically the parameters to BIND have been the thing to bind first, then where to bind it:

double-assigner: func [block] [
    for-each [sw i] block [
        assert [(set-word? sw) (integer? i)]
        set (bind sw block) 2 * i
    ]
]

That parameter order isn't set in stone. It's likely more often the case that the context is a short expression and the thing to bind is a longer one, and putting the shorter parameter first tends to be favorable. Putting the context first has seemingly clear value, for instance, if you're trying to bind quoted material:

 bind context '[
     your <bunch> of
     arbitrary stuff
     here
 ]

vs.

 bind '[
     your <bunch> of
     arbitrary stuff
     here
 ] context

Just sort of depends how you read it "bind context to..." or "bind data in..."

Note that this cleaner parameterization was one of the historical selling points of IN as "bind with reversed parameters":

 in context '[
     your <bunch> of
     arbitrary stuff
     here
 ]

IN has been the term favored for "virtual" binding (e.g. producing a new value with its specifier bits shuffled) because BIND traditionally has walked into blocks and mutated them. But there's no reason that legacy can't be forgotten and a new understanding that BIND does not mutate can't be established. (REDUCE doesn't mutate either, sounds like it might...)

It could get a little confusing because in this model, both parameters to bind could be a block or group. Easy to mix up.

But forcing people to extract the "environment"/"specifier"/whatever gets wordy.

double-assigner: func [block] [
    let specifier: binding of block
    for-each [sw i] block [
        assert [(set-word? sw) (integer? i)]
        set (bind specifier sw) 2 * i
    ]
]

Starts getting ugly. And the point is that we don't want doing this kind of thing to get too ugly, because it really is the value proposition of the language.

Anyway... you can write APPEND with two blocks and get confused on which is which, too. So probably just have to drill in which parameter is which.

hostilefork · January 12, 2024, 6:18am

So I think I can make a seemingly rapid decision here--that's actually based largely on prior experience:

IN should shift to being an operation that will not override an existing binding (modulo the merge of holepunches mentioned)...so merely adds bindings to things that don't have them

It will be used extremely frequently in the proposed binding model where the majority of values are unbound, so being shorter than BIND will actually add up.
There's less potential for confusion in ordering of arguments. While value in context might make some sense, in value context does not. So if you can at least remember that IN is a prefix operation of arity-2, the order is clearly in context value
- Not making mistakes in ordering is especially important since in block1 block2 will be a perfectly valid call.
The parameter ordering is the most pleasing to look at when passing quoted bodies of code as a second argument, which would be common.

IN Will No Longer Be Used For Testing WORD!-in-OBJECT!

Making IN the fundamental merge-binding-operator would kill off historical applications such as using it for looking up words in objects, which have been around since Rebol2. (It's a secondary behavior to being the reverse-order operator for BIND on blocks.)

rebol2>> obj: make object! [x: 10 y: 20]

rebol2>> x: <global>

rebol2>> word: in obj 'x  ; Note: incoming X bound to global X
== x  ; Note: output is rebound to OBJ

rebol2>> get word
== 10

rebol2>> in obj 'z
== none

In the new formulation, 'x and 'z would wind up losing their quotes and being passed to IN as unbound. Then an X bound into OBJ would be returned, and an unbound Z would pass through. Had a bound word been passed in, it would not be rebound.

For a replacement of the historical usage, I'll suggest the more conditional sounding HAS...which is not used in Ren-C for anything else. HAS would be for looking up words only, return null if not found, otherwise override any existing binding on the word.
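The difference between the two operations can be sketched with a Python model (not Rebol). Contexts are plain dictionaries here, and `Word`, `in_` and `has` are hypothetical stand-ins for the behavior described above, not an actual Ren-C API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Word:
    name: str
    binding: Optional[dict] = None      # None models an unbound word


def in_(context, word):
    """Proposed IN: bind only unbound words that the context defines."""
    if word.binding is not None:
        return word                      # never override an existing binding
    if word.name in context:
        return replace(word, binding=context)
    return word                          # unknown word passes through unbound


def has(context, word):
    """Proposed HAS: None if the key is absent, else rebind unconditionally."""
    if word.name not in context:
        return None
    return replace(word, binding=context)


obj = {"x": 10, "y": 20}
global_ctx = {"x": "<global>"}
bound_elsewhere = Word("x", global_ctx)

from_in = in_(obj, bound_elsewhere)      # keeps the global binding
from_has = has(obj, bound_elsewhere)     # switches to OBJ
```

So `from_in` still resolves to `<global>` while `from_has` resolves to 10, and `has(obj, Word("z"))` gives `None` where `in_` would hand back the unbound word.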

So What Will BIND Be For In The New Model?

...don't know.

Perhaps it's single-arity and means BIND-TO-CURRENT (which seems a little...weak). Or maybe it could be saved for "the BIND dialect" which assists in various patterns that aren't easily solvable by just in context value and in context unbind value alone.

Perhaps "unuse" and other patterns that are similar to hole punching would all be done with the BIND dialect, as a mini-language for bind merging algorithms.

bradrn · January 12, 2024, 3:05pm

Trying to catch up with all the posts here… it’s helpful for my understanding, thanks! But I think @hostilefork has covered everything I would have wanted to say.

One of my original thoughts was that environments could be first-class values. Then BIND would simply bind an environment value to a block. But, since it now seems that functions can be used to do most environment manipulation, perhaps this is no longer needed? (Though I still think it’s a good idea.)

bradrn · January 12, 2024, 3:07pm

I don’t get this… why would ‘hole punching’ need any change to how specifiers work? I thought that getting ‘hole-punching’ behaviour is simply a side-effect of how FUNC creates a new environment.

hostilefork · January 12, 2024, 6:05pm

All you can do with a function is call it. So you've accomplished the hole punching, but at the cost of becoming a black box.

(I wrote the post showing COLLECT and KEEP's implementation to point out that we did already have instances where LAMBDA was used to do the binding work. For that case it works...though it may not be the most efficient way to go about it. But other cases might not be able to use the approach.)

If you're assembling a block with the goal of passing it to a dialect, then such black boxes undermine its ability to interpret that block under the meanings it wants.

A block which has been assembled/composed together where some slots were UNUSE'd via a binding instruction (that IN can coalesce) can still be enumerated and treated as a dialect... with contents inspected literally to implement a mini language, while DO-ing and GET-ing some parts of the structure.

Beyond not being a fit for when blocks are the desired currency, functions also bring in performance implications. They've historically made a deep copy of their bodies. If they did not:

>> block: [print "Hello"]

>> foo: func [] block

>> append block [print "World"]

>> bar: func [] block

>> foo
Hello
World

>> bar
Hello
World

That copy has an associated cost. We could say "that's not a bug, it's a feature" and that it's the responsibility of the user to make their own copies in this situation. Or perhaps the block just becomes deep locked on function creation, to force you to make copies if you want to modify. (Copying has historically served a dual purpose of allowing the copied body to cache the binding relationship of deep-walked words for arguments and locals to the function, but that may be unnecessary/impossible in the pure virtual binding model.)
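The aliasing at stake can be shown with a short Python analogy (not Rebol). The helper `make_func` is hypothetical: it either keeps a reference to the body it is given or takes a deep copy, and "running" the function just returns the texts it would print.

```python
import copy


def make_func(body, copy_body):
    """Build a 'function' that runs the print commands listed in BODY."""
    held = copy.deepcopy(body) if copy_body else body

    def run():
        return [text for (_command, text) in held]

    return run


block = [("print", "Hello")]
foo_shared = make_func(block, copy_body=False)
foo_copied = make_func(block, copy_body=True)

block.append(("print", "World"))        # mutate the block after FOO exists
bar_shared = make_func(block, copy_body=False)
```

Only `foo_copied` is insulated from the later append. The two uncopied functions see both commands, which is the surprise the copy has historically prevented.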

Even so, implementing UNUSE via a function would add expense, as it needs to build the function and then build a frame to invoke. A binding instruction that IN understands as part of the specifier chain would presumably be a lot more efficient.

bradrn · January 13, 2024, 12:19am

OK, this makes sense.

One relevant thing I realised yesterday: if implemented right, it’s possible to allow functions to access the environment from which they were called. If we combine that with the idea of first-class environments, it becomes possible to implement UNUSE without any fundamental change in how specifiers work. Something like:

unuse: func [vars block] [
    let old-env: get-calling-environment
    let new-env: new-environment-inheriting-from binding of block
    for-each var vars [
        do in new-env compose [(var): (old-env.(var))]
    ]
    return in new-env block
]

I actually have been wondering about this… there were a bunch of things which made no sense to me assuming that function bodies were left uncopied. So this is good to know!

hostilefork · January 13, 2024, 2:02am

Hole punch instructions are one case.

Something that has a hole punched in it is bound. But that binding needs to allow the punched definitions to leak in.

I am calling that "coalescing", but it may not be the only example.

bradrn · January 13, 2024, 2:07am

Fair enough… but, like I said, it should be possible to implement this ‘hole-punching’ using normal environmental inheritance, without any kind of special-casing. So I’m not sure there’s any need for any particular notion of ‘coalescing’ beyond the usual inheritance.

hostilefork · January 13, 2024, 5:48am

Once again... the situation is... you're building up a block to pass to a dialect. The character of dialects is such that the ability to execute imperative commands isn't available (in any sort of general case), so you have to run the UNUSE at composition time, not while the dialect is being interpreted, when the environments in play are fully formed.

I worked up an example to try and articulate the problem... and in doing so found an alternative line of argument which may be clarifying: At the time you need to perform the UNUSE, you may not have the new variables available yet.

Code fragment:

do compose/deep [
    let one: (char)
    let two: [repeat 2 (char)]
    parse (string) [comment "your code here" (unuse [one two] rule)]
]

Observe that the moment of UNUSE in the compose does not yet have the new definitions for ONE and TWO available. Those are coming along later when they are generated during the DO. The LETs have to run before the coalesce can happen, so that is what creates the issue.

Observe also that rewriting the code to move the UNUSE later is not always possible, because you may not be in an imperative context where you can call UNUSE. That's why I threw in the COMMENT (imagine there's debugging stuff there, or something, I was too lazy to rethink the problem statement to need some pre-matching in a way that would be more coherent than ["prefixaaaa" "prefixbbbb"].)

For contrast, if the code had been like this:

 do compose/deep [
      let one: (char)
      let two: [repeat 2 (char)]
      parse (string) unuse [one two] (rule)
 ]

You're not trapped inside a dialect and can run the UNUSE at the evaluator step where the block is being fulfilled as an argument. Then what you're suggesting about slipping ONE and TWO in via normal binding would work. But note that even if things lined up with an imperative context so this was possible, you still might have reasons of code organization or phasing to want to do the UNUSE earlier.

I think that laser-focuses the point. But since I only got to that articulation after writing up an example... I went ahead and posted the example:

A (Lame) Hole-Punch Motivating Dialect

bradrn · January 13, 2024, 1:10pm

After re-reading these posts a few times… I think I understand what you’re getting at now.

To double-check my understanding, let me try to summarise what happens in the example:

RULE is a block containing instances of ONE and TWO, intended to be substituted for their values (which are parsing rules)
ONE and TWO are defined within the block, with their matched character being COMPOSEd into their definitions
Similarly, RULE also needs to be COMPOSEd into the block: specifically, it ends up nested within a larger PARSE rule
But at the same time, it needs any instances of ONE or TWO to refer to the just-defined parse rules, rather than any bindings it might already have
However, the ‘easy’ options don’t work in this case:
- UNUSE can’t run after ONE and TWO are defined, because by that point the code is within a dialect which has no clue how to UNUSE a block
- But my earlier implementation of UNUSE can’t run before ONE and TWO are defined, because it relies on being able to gather up their definitions
Therefore, this requires an UNUSE which can do the ‘hole-punching’ without prior knowledge of what the names will be bound to.
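The ordering problem in that summary can be reproduced in a small Python model (not Rebol). `Env`, `eager_unuse` and `Punched` are hypothetical names: the eager version mirrors an UNUSE that copies values immediately, while `Punched` is only a guess at what a deferred hole-punch instruction might record, since the thread does not pin down its representation.

```python
class Env:
    """A chain of variable frames, innermost first."""

    def __init__(self, variables, parent=None):
        self.variables = variables
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise KeyError(name)


def eager_unuse(names, block_env, calling_env):
    """Copy the current values right now; fails if they do not exist yet."""
    copied = {name: calling_env.lookup(name) for name in names}
    return Env(copied, parent=block_env)


class Punched:
    """A binding with holes: NAMES are resolved by whoever coalesces later."""

    def __init__(self, names, block_env):
        self.names = set(names)
        self.block_env = block_env

    def coalesce(self, later_env):
        # Stand-in for what IN might do when it meets a punched binding.
        holes = {name: later_env.lookup(name) for name in self.names}
        return Env(holes, parent=self.block_env)


rule_env = Env({"one": "old-one", "two": "old-two", "other": "kept"})
compose_time = Env({})                       # ONE and TWO do not exist yet

try:
    eager_unuse(["one", "two"], rule_env, compose_time)
    eager_failed = False
except KeyError:
    eager_failed = True

punched = Punched(["one", "two"], rule_env)  # fine: nothing is looked up yet
run_time = Env({"one": "#a", "two": ["repeat", 2, "#a"]}, parent=compose_time)
merged = punched.coalesce(run_time)
```

The eager version raises at compose time because there is nothing to copy. The punched version defers the lookup until the later environment exists, while names that were not punched still resolve through the rule's original binding.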

I’ll note that my earlier implementation of UNUSE actually can work if the code is rewritten to use two levels of COMPOSE:

do compose [
    let one: (char)
    let two: [repeat 2 (char)]
    rule: (rule)
    parse (string) compose [comment "your code here" (unuse [one two] rule)]
]

To my mind this is slightly easier to comprehend, but I can see how it would get annoying quickly.

(Incidentally, this also puts the final nail in the coffin for the way I originally suggested implementing UNUSE using FUNC: that one relied on function application to collect the new bindings, and that doesn’t work in dialects either!)

hostilefork · January 13, 2024, 1:55pm

Yep!

bradrn:

I’ll note that my earlier implementation of UNUSE actually can work if the code is rewritten to use two levels of COMPOSE:

do compose [
    let one: (char)
    let two: [repeat 2 (char)]
    rule: (rule)
    parse (string) compose [comment "your code here" (unuse [one two] rule)]
]

Good observation. Though if we're going to say "just rewrite it" this particular example could be simplified as not having an outer compose at all:

let one: char
let two: compose [repeat 2 (char)]
parse string compose [comment "your code here" (unuse [one two] rule)]

The cost-benefit may work out to say "you can't hole punch things until the variables exist...so if your code requires that then rewrite it". Saying no is often the right answer.

My intuition has been that this would be a pain point for people who think they should be able to write dialect composing in an "obvious" way. I hit it relatively easily...but as I say it's a sucky example, where I was trying to hit it (or at least, not trying to avoid it... I figured just compose some things and something won't work eventually). Though if you think about subroutines building composed code--and passing that to other routines--and having to coordinate pushing all the composition to the edges... I suspect it's going to come up. The convolutions people would wind up doing may just be a buggy isomorphism of solving the problem once in the implementation of IN and being done with it.

I might be wrong. So a prudent thing to do may be to just write the simpler version first--see if it generates any complaints in actual practice vs. trying to force it in examples--and if not, consider a bullet of complexity dodged.

Either way, I'm glad to have the concern explained better as an ordering problem. Thanks for pushing through to understanding it, and it may mean a prototype can be done faster by trying the simple version first!

hostilefork · January 13, 2024, 2:38pm

One of the nice things about being able to pinpoint the mechanical issue is that you can beeline for real problems...

So a pathological case which would not be amenable to pushing the UNUSE operation outwards would be if the dialect itself created variables, and then encountered the unused block without having any time in a plain evaluative context between.

This could happen e.g. if PARSE got a LET instruction (it needs one, and I believe the new mechanism using IN paves the way for it to be able to work):

compose/deep [
    parse [#a [repeat 2 #a] "aaaa"] [
        let one: <any>  ; gets rule #a
        let two: <any>  ; gets rule [repeat 2 #a]
        subparse text! (unuse [one two] rule)
    ]
]

Here you're stuck as far as the ordering goes... because there's no moment where you have access to plain evaluation between the LETs and the unused block. There's no way to tease that out and keep the structure. PARSE itself would have to have an UNUSE combinator.

(Having to use the phrase "the unused block" suggests that UNUSE is not a good final name for this tool. Maybe PUNCH? "the punched block".)

bradrn · January 13, 2024, 11:54pm

hostilefork:

Though if we're going to say "just rewrite it" this particular example could be simplified as not having an outer compose at all:

let one: char
let two: compose [repeat 2 (char)]
parse string compose [comment "your code here" (unuse [one two] rule)]

I did contemplate this… but presumably that outer DO is there for a reason.

I don’t think it’s just ordering, as such. Rather it’s the conflict between evaluation and ordering which causes the problem: unuse [one two] rule needs to be evaluated partly before the block runs and partly while it runs.

Thus, this problem would also be solved by, say, adding a SUBSTITUTE rule to the PARSE dialect, which would evaluate a following BLOCK! and substitute its value in as a parse rule:

do compose/deep [
    let one: (char)
    let two: [repeat 2 (char)]
    parse (string) [
        comment "your code here"
        substitute [unuse [one two] (rule)]
    ]
]

But this could not be a general solution, since it would need to be added to every dialect.

For that matter… now that I think of it, you can almost replicate this by using an extra DO:

do compose/deep [
    let one: (char)
    let two: [repeat 2 (char)]
    parse (string) [comment "your code here" (do unuse [one two] (rule))]
]

But then you’d need some way to tell the COMPOSE to substitute the nested GROUP!, but leave the outer GROUP! alone. Is there some way of doing that? In general, it seems like the kind of capability which could be very useful to have.

hostilefork:

This could happen e.g. if PARSE got a LET instruction (it needs one, and I believe the new mechanism using IN paves the way for it to be able to work):

compose/deep [
    parse [#a [repeat 2 #a] "aaaa"] [
        let one: <any>  ; gets rule #a
        let two: <any>  ; gets rule [repeat 2 #a]
        subparse text! (unuse [one two] rule)
    ]
]

OK, this is an even better example, thanks!

bradrn · January 14, 2024, 12:40am

A further thought on this: at one stage @hostilefork suggested ((Doubled Groups)) as a Dialecting Tool. Might we be able to resurrect this as a tool for including literal GROUP!s in a COMPOSE? The syntax even makes intuitive sense — conceptually, COMPOSE gets rid of the outer layer of brackets, leaving the inner layer.

So, if that idea works out, we could get rid of any special ‘UNUSE specifiers’, since it would let us just do this:

do compose/deep [
    let one: (char)
    let two: [repeat 2 (char)]
    parse (string) [comment "your code here" ((do unuse [one two] (rule)))]
]
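As a rough check that the rule is mechanically simple, here is a Python model (not Rebol) of a deep compose where a group whose only content is another group keeps the inner one in the output. Everything in it is an assumption made for illustration: words and strings are both plain Python strings, ordinary groups may only hold a single name to look up, and composed values are inserted without splicing.

```python
class Group(list):
    """Stands in for a GROUP!; plain lists stand in for BLOCK!s."""


def compose_deep(block, env):
    """Model of COMPOSE/DEEP with the proposed ((...)) 'keep a group' rule."""
    out = type(block)()
    for item in block:
        if isinstance(item, Group):
            if len(item) == 1 and isinstance(item[0], Group):
                # ((...)): drop the outer layer, keep the inner one as a group
                out.append(compose_deep(item[0], env))
            else:
                (name,) = item          # this sketch evaluates one-name groups only
                out.append(env[name])
        elif isinstance(item, list):
            out.append(compose_deep(item, env))
        else:
            out.append(item)
    return out


env = {"char": "#a", "string": "aaaa", "rule": ["one", "two"]}
template = [
    "let", "one:", Group(["char"]),
    "parse", Group(["string"]),
    ["comment", "your code here",
     Group([Group(["do", "unuse", ["one", "two"], Group(["rule"])])])],
]
result = compose_deep(template, env)
kept = result[5][2]
```

Here `kept` is still a `Group`, with RULE already substituted inside it, so the evaluation it describes is left for whoever processes the inner block later.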

hostilefork · January 14, 2024, 3:51am

Ren-C parse uses GET-GROUP! to evaluate and substitute as a rule:

 >> found-plus: false

 >> parse "aa+bb" [
        some ["+" (found-plus: true) | :(either found-plus [#b] [#a])]
    ]
 == #b

On the downside, you pay for that evaluation every time the rule is visited--which I consider an acceptable cost if it helps you accomplish your intent clearly.
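A hand-written Python model of that rule (not Rebol, and not how Ren-C's PARSE is implemented) makes the repeated evaluation visible. The function `parse_plus` and its counter are hypothetical, and the count only covers this model's loop.

```python
def parse_plus(text):
    """Hand-written model of the FOUND-PLUS rule shown above.

    Returns (matched_all, evaluations), where EVALUATIONS counts how many
    times the stand-in for the GET-GROUP! had to be evaluated.
    """
    state = {"found_plus": False}
    evaluations = 0

    def get_group():
        # Stand-in for :(either found-plus [#b] [#a]); re-run on every visit.
        nonlocal evaluations
        evaluations += 1
        return "b" if state["found_plus"] else "a"

    pos = 0
    while pos < len(text):
        if text[pos] == "+":                 # first alternative: "+"
            state["found_plus"] = True
            pos += 1
        elif text[pos] == get_group():       # second alternative: generated rule
            pos += 1
        else:
            break
    return pos == len(text), evaluations


matched, evaluations = parse_plus("aa+bb")
```

In this model the generated rule is recomputed for each of the four non-plus characters, which is the per-visit cost being weighed against clarity.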

(If not already clear, I don't fret -all- that much about performance... I think of this as an "artistic medium", the so-called "Minecraft of Programming"... so I'm more concerned about what you can express and if that expressivity has reasonable composability. Though at the same time it is trying to be a usable system, so I can't actually -completely- not care...)

Putting that aside, yes: it does seem that it can be the means for slipstreaming an evaluator-based hole punching operation, by means of putting the evaluator in the dialect to generate values processed by the dialect.

BUT note that it's not so simple, because you have to COMPOSE a GROUP! that made a GET-GROUP! with the right bindings to find RULE at a later wave of evaluation... would have to actually work through the details of that.

This medium is so weird and (relatively) unexplored that I can't say it's wrong to have a general rule: "most dialects should reserve GET-GROUP! to mean substitute an evaluative expression and treat it like it was written there". It might be the superior idea to trying to systemize hole punching. We're already looking at this idea of making dialects complicit in binding via IN vs. doing something automatic, so maybe this is the next level of that.

bradrn · January 14, 2024, 4:45am

OK, good to know! I honestly wasn’t expecting this to exist already.

(Incidentally, I just noticed my last post is wrong — (do unuse [one two] rule) wouldn’t evaluate rule as a parse rule, it would merely return it. But this means it was very nearly correct.)

This kind of thing is why I suggested reusing the double-parenthesis syntax to mean ‘keep this as a GROUP! in the output’. It would be a minor addition to make it support GET-GROUP!s too (and SET-GROUP!s, etc.):

do compose/deep [
    let one: (char)
    let two: [repeat 2 (char)]
    parse (string) [comment "your code here" :((do unuse [one two] (rule)))]
]

So RULE gets substituted in by the COMPOSE/DEEP to yield a block [comment "your code here" :(do unuse [one two] [rule goes here])], and it all just works.

Could there not be some kind of preprocessing step?

…hmm, no, I guess not, given that the GET-GROUP! can refer to values assigned within the parsing process itself (as in your FOUND-PLUS example earlier). So this performance impact, however minor, seems inherent and unavoidable in this design, and that does make me a little nervous. I’m starting to feel some kind of systematised UNUSE specifier may indeed be the preferable choice, but I’m not sure yet.

hostilefork · January 15, 2024, 12:43am

Enhancements to "the COMPOSE dialect" that make it easier to make GROUP!s in output have always been something I've thought about. My current best try is labeling the groups you want composed, which affords leaving groups you don't want evaluated as-is. But that doesn't help you if you wanted to evaluate something as a group, and coerce the product (likely a block) to a group, preserving any decorations on the slot.

Interesting idea to let ((...)) mean "groupify" instead of my thought of meaning "plainify". Probably more semiotic. In any case, I think you're starting to catch on to what dialecting is about!

Before you called it into question, I just sort of assumed you'd need it. But that's based on my instinct that it's not the only instruction in the box... that what people will ultimately want from "specifiers" is a more nuanced concept than a simple chain of environmental inheritance can provide.

I think it's probably about time to take a crack at prototyping the new model. I'm sure it will unleash a flurry of posts as I hit issues.

(As I say, the "start the world unbound and let the binding trickle down" was tried before...minus the quoted material not getting trickled tweak. But also, I was still trying to make PICK-ing and FOR-EACH-ing propagate that trickled specifier in a way that gave the illusion of the old binding model, so dialects could just GET the products, instead of making them say where to BIND and do their own trickle. Implicit propagation was a dead-end: the eagerness of putting binding on meant you were pretty much always binding something that already had a binding on it. There had to be a merge binding strategy... more complex than the hole-punching... to accomplish even the most basic things. This merge binding led to cycles I've previously mentioned, and other intrinsic problems. What we're talking about now starts limiting the problems to COMPOSE scenarios...the problems will still be there, but hopefully more manageable.)

bradrn · January 15, 2024, 3:15am

OK, this is a good idea! And means my proposal is possible already, which is nice.

I don’t understand this… can’t you just do compose [… (as group! […]) …]?

Hmm, quite possibly.

Of course, that’s always the way!

As it happens, I’m also about to start working on a language which uses this binding model. (Indeed, that’s how I got interested in binding in the first place.) Its syntax is more like Lisp than Rebol, but I’m sure I’ll still discover new edge cases as I write it.