Block Creation Vs. Evaluation

bradrn · January 8, 2024, 5:27am

Beyond the discussion in Custom Function Generator Pitfalls, I just thought of another interesting case for my conception of binding…

somevar: 20

body: [
    somevar: 10
    return do [somevar]
]

test: func [] body

With my model, upon running test, the function would create a new environment based on that bound to body. Then it would assign somevar within that environment (overriding the global assignment). Finally, do would evaluate the block it’s given; when that block is created, it is bound to the same environment, so it looks up somevar in that environment, to return 10.

The flaw in this conception, I’ve just realised, is that notion of ‘creation’. What does it mean for a block to be ‘created’ in Rebol? Because one could just as easily evaluate do fifth body… and then it’s unclear which context it should be evaluated in. Normally it would be evaluated in the environment of the function, but that environment hasn’t been created yet. Should fifth body even have a bound environment or not? This is a huge gap in my model.

But not an unsolvable one, I think. Instead of ‘creation-time’, substitute ‘evaluation-time’: when an unbound block is evaluated, the result is a block which is bound to the currently active environment. (When a bound block is evaluated, of course, I see no reason for its binding to change.)

Thus, in the above program, body is a block which got bound to the global environment when it was evaluated. By contrast, fifth body never got evaluated, so it has no binding. When test is called, it creates a new environment, which is set as the current environment; later on, fifth body is evaluated, at which point it receives a binding to that new environment, so that do [somevar] picks up the right value of SOMEVAR.

This also implies a very convenient way to create unbound blocks: just quote them! A quoted block evaluates to a block, but that resulting block isn’t itself evaluated, so in this model it never gets a binding. This means, for instance, that I can write my with-function from the other thread even more simply:

with-return: func [name spec body] [
    let passthru: lambda [return] reduce ['lambda spec body]
    return spread compose '[(setify name) (bind-to-current 'passthru) :return]
]

(Of course, knowing what I do now about isotopic FRAME!s, it would probably be saner to abandon the explicit binding and do this instead:

with-return: func [name spec body] [
    let passthru: lambda [return] reduce ['lambda spec body]
    return spread '[(name): (:passthru) :return]
]

So now nothing is explicitly bound, and it would be fine because the body of the inserted FRAME! still carries around its binding to the environment in which it was created. Not that it needs to, because it doesn’t refer to any local variables of with-return. Either way, the original version remains a good exercise in convoluted binding patterns.)

bradrn · January 8, 2024, 11:04am

Thinking about this further… now that I’ve made that mental leap from ‘creation’ to ‘evaluation’, my proposal can be quite concisely summarised in a few rules for what happens when things are evaluated:

When an unbound WORD! is evaluated, it gets looked up in the currently active environment.
When a bound WORD! is evaluated, it gets looked up in its bound environment.
When an unbound BLOCK! is evaluated, it yields a BLOCK! bound to the currently active environment.
When a bound BLOCK! is evaluated, it yields itself.
When an isotopic FRAME! is evaluated, it evaluates its arguments, then runs its body in the context of the environment bound to its body BLOCK!, or in the null environment if it’s unbound.

Plus, the behaviour of some important functions:

DO, REDUCE, COMPOSE etc. evaluate the provided BLOCK! in the context of its own bound environment (like FRAME! evaluation above).
FUNC can implement definitional scoping similarly to the current situation: by figuring out which variables should be local, collecting them in a new environment, and binding that environment to the block which is its body.
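
The WORD! and BLOCK! rules above can be sketched as a small Python model. This is not Ren-C code and not an implementation of the proposal: `Env`, `Word`, `Block`, `evaluate` and `do` are hypothetical names chosen for illustration, FRAME! evaluation is left out, and treating an unbound block passed to DO as running in an empty environment is an assumption borrowed from the FRAME! rule.

```python
from dataclasses import dataclass
from typing import Optional


class Env:
    """A set of variables plus an optional parent to inherit from."""

    def __init__(self, variables=None, parent=None):
        self.variables = dict(variables or {})
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise NameError(f"{name} is unbound")


@dataclass(frozen=True)
class Word:
    name: str
    binding: Optional[Env] = None  # None models an unbound WORD!


@dataclass(frozen=True)
class Block:
    items: tuple
    binding: Optional[Env] = None  # None models an unbound BLOCK!


def evaluate(value, current):
    """Apply the WORD! and BLOCK! rules to one value."""
    if isinstance(value, Word):
        # A bound word uses its own environment, an unbound one
        # uses the currently active environment.
        env = value.binding if value.binding is not None else current
        return env.lookup(value.name)
    if isinstance(value, Block):
        if value.binding is not None:
            return value  # a bound block yields itself
        return Block(value.items, current)  # bind to the active environment
    return value  # anything else evaluates to itself


def do(block):
    """Model DO: run a block in the environment bound to it."""
    # Assumption: an unbound block runs in an empty ("null") environment.
    env = block.binding if block.binding is not None else Env()
    result = None
    for item in block.items:
        result = evaluate(item, env)
    return result


# The `test` example: the global SOMEVAR is 20, the function's is 10.
global_env = Env({"somevar": 20})
function_env = Env({"somevar": 10}, parent=global_env)

inner = Block((Word("somevar"),))            # `fifth body`, never evaluated
bound_inner = evaluate(inner, function_env)  # evaluated while `test` runs
print(do(bound_inner))  # 10
```

In this model the nested block picks up its binding only at the moment it is evaluated inside the function's environment, which is why the lookup sees 10 rather than 20.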

The more I think about this system, the more I like it. It strikes me as being elegant and easy to reason about. Even if for some reason it’s unsuitable for Ren-C, it feels like what I’ve been grasping for in my own thoughts about binding.

hostilefork · January 9, 2024, 10:32am

There was a "bug" (or quirk) that the "wave of constness" was not preserved by quoting:

It was giving this behavior:

foo: func [x <local> block] [
   block: []
   return append block x
]

>> foo 10
** Access Error: CONST or iterative value (see MUTABLE): []

bar: func [x <local> block] [
    block: '[]
    return append block x
]

>> bar 10
== [10]

>> bar 20
== [10 20]

When you used a quote, you were effectively getting the behavior as if you had fetched the block from a variable. This shielded it from the propagation of <const> that was coming along the body.
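
A rough Python model of the FOO/BAR difference, under the assumption that constness is a flag on the reference to a series rather than on its storage. `Series`, `append`, `foo_literal` and `bar_literal` are hypothetical names for illustration; this is not how Ren-C is implemented.

```python
class Series:
    """A view onto shared storage; `const` marks the view read-only."""

    def __init__(self, storage, const=False):
        self.storage = storage
        self.const = const


def append(series, value):
    if series.const:
        raise PermissionError("CONST or iterative value (see MUTABLE)")
    series.storage.append(value)
    return series.storage


# Storage for the literal that appears in each function body. It lives
# as long as the body does, so it is shared between calls.
foo_literal = []
bar_literal = []


def foo(x):
    # Models `block: []`: a plain literal picks up const from the body.
    block = Series(foo_literal, const=True)
    return append(block, x)


def bar(x):
    # Models `block: '[]`: quoting yields the value without the const flag.
    block = Series(bar_literal, const=False)
    return append(block, x)


print(bar(10))  # [10]
print(bar(20))  # [10, 20]
try:
    foo(10)
except PermissionError as error:
    print("error:", error)
```

The second call to `bar` sees the 10 appended by the first call because both calls share the storage that belongs to the function body.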

This wound up being useful in cases like the API. I've mentioned before that generalized quoting is your recourse when you are splicing C variable values, and don't get the "fetched-value-is-unevaluated" protection that being able to use a WORD! variable gets you. But it also gets you the ability to bypass the const protection as a variable would allow.

Consider the ordinary code:

 >> block: [a b c]

 >> repeat 2 [append block 'd]  ; first form
 == [a b c d d]

 >> repeat 2 [append [a b c] 'd]  ; second form
 ** Access Error: CONST or iterative value (see MUTABLE): [a b c]

Then consider the API code:

 REBVAL* block = rebValue("[a b c]");

 rebElide("repeat 2 [append", block, "'d]");  // errors

While the intent is the first form, the interpreter experiences it as the second form.

So I kept the behavior of quoting being able to bypass this, instead of propagating the const bit:

 REBVAL* block = rebValue("[a b c]");

 rebElide("repeat 2 [append", rebQ(block), "'d]");  // works

But that means that this also works:

 >> repeat 2 [append '[a b c] 'd]
 == [a b c d d]

It does open some doors to bugs, e.g. if a function does return '(1 + 2) then that group will be mutable... and if the caller appends to it then you get self-modifying code to make return '(1 + 2 + 3) or whatever.

But, it's a tradeoff. You could try to teach people to use THE (1 + 2) and save quoting for special cases (today quoted arguments still get the constness; maybe they shouldn't, and should instead behave with parity to QUOTED!). Another possibility might be to shift to a "const by default" system where you'd have to make your return values or arguments <mutable>, else they'd be const.

This has actually crossed my mind before... inspired by the const issue... whether quoting could be used as a kind of "binding shield".

Basically the idea of extending things so that quoting always acts the same as if you fetched the thing that's quoted from a variable. It would preserve what binding it has on it, if any. In a model where things are unbound by default, this would mean they'd frequently be unbound if they originated in source (vs. from a QUOTE or rebQ() operation).

By and large I do find it appealing if "stray" bindings are less common. There are a lot of cases where you want to pass a word somewhere and it's really just meant as a word and not a variable, but you pick up bindings anyway. Same with blocks that come with all kinds of bindings you don't actually want.

But generalizing this breaks a lot of current understandings. This is expected to work, today:

>> foo: lambda [x] [get 'x]

>> foo 10
== 10

New models with completely new understandings might say you need an operator to put a binding on:

>> foo: lambda [x] [get 'x]

>> foo 10
** Error: x is unbound

>> foo: lambda [x] [get bind-to-current 'x]

>> foo 10
== 10

(If BIND-TO-CURRENT is needed often it should have a snappier name. Perhaps this arity-1 operator would even be common enough to take over the name BIND... who knows.)

Or perhaps--again--using THE instead of quoting would act differently (and again, maybe a bad idea if quoted args don't have parity).

Anyway, just connecting it to a previous issue where quoting stepped in as a surrogate for reference by variable.

hostilefork · January 9, 2024, 12:05pm

Some of what you're suggesting is similar to a prototype I had around the time I wrote "Rebol and Scopes: Why Not".

But I've never had a system that didn't use "overbinding" to override the meanings of already-bound material when dealing with things like function arguments in the body, or loop variables inside a FOR-EACH body, etc.

You run into trouble with things like this library function:

loop-five: lambda [:var body] [
    for-each :var [1 2 3 4 5] body
]

Let's say the usage looks like this:

x: 10
y: 20

repeat 2 [
    loop-five x [
       if x = 3 [continue]
       print [x + y]
    ]
]

Here we have a case where BODY considers itself bound... and it knows what IF and PRINT and = and Y mean. But as far as it's concerned, it also knows what X and CONTINUE mean.

That's a problem if the intent here is that LOOP-FIVE is supposed to be supplying new meanings for X and CONTINUE.

We might imagine passing the body as unbound:

    loop-five x '[  ; let's say this gave the body as unbound
       if x = 3 [continue]
       print [x + y]
    ]

This gives up the visibility of Y... and it also means possibly being subject to new meanings for IF or = or PRINT as they are understood in the library's context.

(It's also likely not the syntax the person using the construct wanted to use.)

Perhaps you need some kind of explicit intervention whenever dealing with already bound code, a "hole-punching" abstraction (we might call it "UNUSE"):

loop-five: lambda [:var body] [
    for-each :var [1 2 3 4 5] unuse reduce [var 'continue] body
]

But the UNUSE influence would have to follow through such that the blocks inside, e.g. the [continue], would know that their binding isn't necessarily all-the-way-fulfilled anymore.

So there has to be a way to "punch holes" in bound material. Up until now, the hole punching has been implicit via this "overbinding" I refer to.

It may be that if unbound material were a more common currency, explicit hole-punching would be the better approach when bound material shows up. But off the top of my head, you could face problems where if you punch a hole in a block's binding it may wind up attaching to things you didn't intend:

maker: func [body] [
   body: unuse 'return body
   return lambda [] [let somethin: <implementation junk>, (as group! body)]
]

something: 100

foo: maker [if true [return somethin]]

Let's say somethin is a typo or an accident in the parameter to maker, and should be expected as an error... but it's accidentally picked up by virtue of piercing the bound state of the [return somethin] block for the sake of getting at the now-unbound RETURN. It sees the unbound somethin as well. :-/

Perhaps there's a new form of "consider this block bound, except for these words" so that arbitrary unbounds aren't seen, but that doesn't sound fun to implement.

bradrn · January 9, 2024, 1:02pm

hostilefork:

x: 10
y: 20

repeat 2 [
    loop-five x [
       if x = 3 [continue]
       print [x + y]
    ]
]

Here we have a case where BODY considers itself bound... and it knows what IF and PRINT and = and Y mean. But as far as it's concerned, it also knows what X and CONTINUE mean.

That's a problem if the intent here is that LOOP-FIVE is supposed to be supplying new meanings for X and CONTINUE.

This actually isn’t a problem for my proposal, though earlier I probably should have been more explicit about how it’s solved.

The key is that environments can inherit from other environments. So for-each can take the existing environment bound to the block, create a new environment which inherits from it, add X and CONTINUE to that environment, and then re-bind it to the block. Conceptually, it ends up with its new environment ‘slipped under’ the old one (as I’ve referred to it once or twice already).

Thus, when the loop block is executed, it knows what X and CONTINUE are… but it also knows the environment in which it was created. It doesn’t need to know anything about the environments in which LOOP-FIVE or FOR-EACH were created.

Note that this is similar to what functions like USE and FUNC are doing already. They take a block with existing bindings, and traverse it to re-bind whichever words they want. ‘Slipping under’ an environment has the same effect, but without the need for any traversal.
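
The ‘slipping under’ idea can be sketched with Python's `collections.ChainMap`, where `new_child` puts a fresh mapping in front of an existing chain without copying or traversing anything. The names `user_env`, `for_each` and `body` are hypothetical, and a Python function stands in for the loop body block.

```python
from collections import ChainMap

# Environment the loop body was bound to where it was written.
user_env = ChainMap({"x": 10, "y": 20})


def for_each(values, body, body_env):
    """Run `body` once per value with X and CONTINUE slipped under."""
    results = []
    for value in values:
        # new_child() puts a fresh mapping in front of the existing
        # chain; lookups fall through to body_env for anything else.
        loop_env = body_env.new_child({"x": value, "continue": "loop-continue"})
        results.append(body(loop_env))
    return results


def body(env):
    # Stands in for [x + y]: X comes from the loop, Y from the user.
    return env["x"] + env["y"]


print(for_each([1, 2, 3], body, user_env))  # [21, 22, 23]
print(user_env["x"])                        # 10
```

The user's own X is untouched, and the body needs no knowledge of the environment in which `for_each` itself was defined.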

(As for your previous post before this one, it was interesting, but apart from that I don’t have much which I can say about it.)

hostilefork · January 9, 2024, 1:18pm

Okay, if that's the case... what you're talking about is exactly what happens today (what I've called "virtual binding").

FOR-EACH takes the specifier that is on the block it gets, creates new entries for the variables (and CONTINUE and BREAK), and then points to the old specifier.

The new entries are injected as the specifier for the block, and this threads down as each level in the block is evaluated. So no, there's no binding pre-walk in Ren-C for these items...but it still effectively has the same semantics of "overbinding".

This means the problems I point out in "Custom Generator Pitfalls" don't go away.

hostilefork · January 9, 2024, 1:28pm

Maybe there is a difference, in that you see FOR-EACH as being unique in manipulating the environment, and there's no other binding override?

repeat 2 [
    code: [print [x + y]]
    loop-five x compose [
       if x = 3 [continue]
       (as group! code)
    ]
]

So in this case, you'd say the X would not see the new binding, as (print [x + y]) would already have a binding.

Composition scenarios would still get tricky, it's just now I may have to figure out how to punch a hole for that X if I meant for it to pick up the new meaning...while still preserving the PRINT and Y I meant.

bradrn · January 9, 2024, 1:33pm

hostilefork:

Maybe there is a difference, in that you see FOR-EACH as being unique in manipulating the environment, and there's no other binding override?

repeat 2 [
    code: [print [x + y]]
    loop-five x compose [
       if x = 3 [continue]
       (as group! code)
    ]
]

So in this case, you'd say the X would not see the new binding, as (print [x + y]) would already have a binding.

Well, not unique as such: a few other functions would need to do it as well (like FUNC and USE).

But indeed, here code would not see the new binding of X. Instead, when [print [x + y]] is evaluated, it would gain a binding to the current environment with its current meaning of X. Presumably we’d want AS GROUP! to retain that binding, so that when the evaluator reaches the resulting GROUP!, it will switch to that environment with its binding to X.

(Of course, this means it also wouldn’t know about CONTINUE. But that seems very reasonable to me, given the way this code is written.)

I’m getting rather confused by some of the replies you’ve been giving me. So, just so we’re on the same page: could you concisely summarise (perhaps in bullet points) which problems these are?

hostilefork · January 9, 2024, 2:21pm

In my case, I believe I now understand what you're trying to say.

And you're suggesting something similar to what's done today (but more similar to a prototype that didn't bind all material during a scan to a module as a first step...rather allowed the environment to spread along as it went). Constructs like FUNC and FOR-EACH already link a little piece into the "environment" for the blocks they evaluate, adding their contribution.

The big difference is that in the current implementation, the evaluator attempts to coalesce bindings itself... instead of just leaving things that are already bound alone. That's essential today because of the bindings that things get from the module pre-walk...they have to be overridden.

But in a world where material was more often unbound, I can see the appeal of not trying to have any automatic behavior beyond adding the environment to completely unbound things.

However...

...I believe that scenarios roughly like this would arise where the default behavior is not desired, that require "hole punching" to get the X to be rebound. And I don't think the scenarios are uncommon, and I believe it would come up a lot when cobbling together material. Some things would get harder than in the current model, and I'm not immediately certain what new things would start working that don't work today.

But this is probably the only point of contention at the moment. And I do like the general concept that there's no "automatic" specifier coalescing being done by the evaluator, so I'm willing to hammer through what could be done without it.

bradrn · January 9, 2024, 3:10pm

Hmm… I think I’m starting to see your point. I’m not convinced it’s as bad as that, since it works fine for the most common cases: namely, for any non-nested blocks provided directly to the function, and for any literal (i.e. unbound) nested blocks. But interpolated nested blocks don’t get rebound, since they have an existing binding which e.g. FOR-EACH won’t modify — unlike the current system, where FOR-EACH can rebind any instances of CONTINUE and X, however deeply nested they are.

Or, another way of putting it: ‘slipping under’ environments depends on nested blocks having no bindings, so that the new bindings can penetrate into their bodies. If any nested blocks are already bound, then my proposed system can’t easily rebind words within them. (Without requiring a deep traversal, that is.)
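
A minimal Python sketch of that limitation, assuming blocks carry an optional binding and nested unbound blocks inherit whatever environment is active when they are reached. `Env`, `Block` and `run` are hypothetical names, and words are modelled as plain strings.

```python
class Env:
    def __init__(self, variables, parent=None):
        self.variables = variables
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise NameError(f"{name} is unbound")


class Block:
    def __init__(self, items, binding=None):
        self.items = items
        self.binding = binding  # None models an unbound block


def run(block, current):
    """Resolve every word, descending into nested blocks."""
    env = block.binding if block.binding is not None else current
    out = []
    for item in block.items:
        if isinstance(item, Block):
            out.append(run(item, env))
        else:
            out.append(env.lookup(item))
    return out


outer = Env({"x": 10, "y": 20})
literal = Block(["x", "y"])                 # unbound nested block
spliced = Block(["x", "y"], binding=outer)  # bound before it was composed

body = Block([literal, spliced], binding=outer)

# FOR-EACH slips a new environment under the body it was handed...
loop_env = Env({"x": 3}, parent=body.binding)
rebound = Block(body.items, binding=loop_env)

# ...which reaches the unbound nested block but not the bound one.
print(run(rebound, loop_env))  # [[3, 20], [10, 20]]
```

The literal nested block sees the loop's X, while the block that arrived already bound keeps the outer X.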

I could argue that this isn’t such a bad design choice. If you’re passing a block to a function, do you really want to keep track of the fact that it’ll be passed into a block which will eventually get passed into a FOR-EACH? That feels like a lot of ‘spooky action at a distance’ to me. But there are compelling arguments for the opposite (current) behaviour too.

(It’s very late in my timezone and I need to go now, but I’ll post again tomorrow if I get any other ideas.)

hostilefork · January 9, 2024, 10:09pm

It's interesting that here, when the code being composed is under mostly the same understandings (e.g. of what PRINT and Y means), then unbound code could be used.

repeat 2 [
    code: '(print [x + y])  ; assuming quote evaluations don't bind
    loop-five x compose [
       if x = 3 [continue]
       (code)
    ]
]

Where you wind up not being able to do this is when the place doing the composing is in some other library, e.g. where PRINT could mean something totally different. But worth pointing out that some number of places would be able to use unbound code. How many? Don't know.

I agree there are good arguments for why it may be a better way to bias the defaults. But holepunching is definitely going to come up. Perhaps I should have written:

loop-five: lambda [:var body] [
    for-each :var [1 2 3 4 5] compose [
       print "Iterating loop"
       (as group! body)
    ]
]

I've talked a bit about Rebol's "blocks as currency" model, compared with "functions as currency", in terms of what that value proposition is:

Binding Re-Examined from First Principles

And I've wondered about a "parameterized block" abstraction which could leave slots in blocks ready to pick up enclosing bindings--while not making the structure opaque as a function would... which I guess would be like the UNUSE above:

code: unuse [x] [print [x + y]]

One implementation could be to deep copy the material... harden the bindings of all the words, while unbinding the arrays that contain them. The danger I mentioned is that stray unbound things could be bound accidentally when they should have been errors. Perhaps a "dead binding" would need to be given to any unbound things which the unuse didn't mention, to make sure those things were only examined literally... but, then you'd not be able to apply more than one UNUSE to code. So maybe there's a specific "unused binding" that means "intentionally unbound to pick up enclosing binding" that is distinct from the default unbound, and then dead bindings distinct from that.
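
One way to picture the deep-copy variant is the Python sketch below. It is only a guess at the shape of such an operation: `unuse`, `resolve`, `DEAD` and the `Env`/`Word`/`Block` classes are hypothetical, and the choice to give stray words a dead binding follows the "perhaps" in the paragraph above rather than any existing behaviour.

```python
from dataclasses import dataclass
from typing import Optional


class Env:
    def __init__(self, variables, parent=None):
        self.variables = variables
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise NameError(f"{name} is unbound")

    def defines(self, name):
        try:
            self.lookup(name)
        except NameError:
            return False
        return True


DEAD = Env({})  # hypothetical "dead binding": never resolves anything


@dataclass
class Word:
    name: str
    binding: Optional[Env] = None


@dataclass
class Block:
    items: list
    binding: Optional[Env] = None


def unuse(holes, block, inherited=None):
    """Deep-copy `block`, hardening words and unbinding arrays."""
    env = block.binding if block.binding is not None else inherited
    copied = []
    for item in block.items:
        if isinstance(item, Block):
            copied.append(unuse(holes, item, env))
        elif not isinstance(item, Word):
            copied.append(item)
        elif item.name in holes:
            copied.append(Word(item.name))        # left open on purpose
        elif item.binding is not None:
            copied.append(item)                   # already hardened
        elif env is not None and env.defines(item.name):
            copied.append(Word(item.name, env))   # harden to old meaning
        else:
            copied.append(Word(item.name, DEAD))  # stray word stays an error
    return Block(copied)  # the copy itself carries no binding


def resolve(word, current):
    env = current if word.binding is None else word.binding
    return env.lookup(word.name)


outer = Env({"x": 10, "y": 20})
code = Block([Word("x"), Word("y"), Word("somethin")], binding=outer)
open_code = unuse({"x"}, code)

loop_env = Env({"x": 3, "somethin": 100}, parent=outer)
x, y, stray = open_code.items
print(resolve(x, loop_env))  # 3
print(resolve(y, loop_env))  # 20
try:
    resolve(stray, loop_env)
except NameError as error:
    print("error:", error)   # somethin is unbound
```

As the paragraph notes, a dead binding like this would stop a second UNUSE from reopening the same word, which is what motivates a separate "intentionally unbound" state.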

That could wind up being a lot of copying. To avoid that and get something more like today, UNUSE would instead generate "holepunch" instructions that splice into the specifier. The system would be required to do binding coalescing implicitly again, but biased to only do it on these called-out structures for the words they specify. This would be a fairly complex mechanic, plus abstractions like FOR-EACH or FUNC would be encountering a more baffling data structure for the environment...if that structure had to account for persistent holepunch instructions. It may not be as bad as it seems, and perhaps not as bad as the cycles arising from the automatic 'overbinding' coalescing (though I think coalescing holes is likely overall harder).

Whatever happens, people who expect to do surgery on environments they find in blocks will face some hassles. If blocks are losing their environments in order to become candidates for binding to descend into them again, that loss may affect expectations that the environment would be available.

Anyway... aiming in this direction does have more formalism, and may avoid the seemingly random effects that spliced code can pick up. The method I've generally used with new ideas is just to prototype it and see what (of the tons of wacky corpus) breaks. Here that might be done with the deep copy method first for UNUSE, to generate points of discussion.

bradrn · January 10, 2024, 12:27am

One thing I’ll note, on reflection: you can fix this behaviour simply by getting FOR-EACH to do a traversal of the block it’s supplied, and add its new environment to any nested blocks as well. That’s not a very desirable approach, but it does mean that it’s still theoretically possible to get that behaviour back.

I think this is one place where we can be quite precise: it works best when the two relevant environments share a common ancestor. The further away the common ancestor is, the more difficult it will be to reason about the behaviour of unbound code.

Yep, this is a much more difficult case to deal with for my model. The fundamental problem is that it wants the environment of a block to be fully described by binding onto that block value itself. That’s easy if the block is evaluated in its current environment without further modification. That’s also easy if the block is passed directly to the function which wants to modify its environment. But, if the block is merely included somewhere else, it doesn’t know that its calling environment has been modified.

I think there’s a way around it, though. In particular, we could use the same idea that I suggested for custom function generators, namely passing new bindings through functions:

loop-five: lambda [:var body] [
    for-each :var [1 2 3 4 5] compose [
       print "Iterating loop"
       (lambda compose [(:var) continue] body) (:var) :continue
    ]
]

This obviates the need for UNUSE… or, perhaps, reframes it as a tool to generate these pass-throughs in a convenient way:

>> code: unuse [x] [print [x + y]]
== (~#[frame! [x]]~ x)

(Where that outer GROUP! is unbound, of course.)

So with that UNUSE it would look like:

loop-five: lambda [:var body] [
    for-each :var [1 2 3 4 5] compose [
       print "Iterating loop"
       (unuse compose [(:var) continue] body)
    ]
]

I don’t actually mind this style — the way I see it, it’s good to precisely express intent like this. It means the default is for blocks to be executed in the same environment they were defined in, and overriding that requires special work (though not much!) and is done in a limited way. Certainly, it’s not obviously worse than the current situation, which is ‘blocks can have anything arbitrarily rebound depending on where they land up’.
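
For comparison, here is the same "say what you want overridden" style in Python, where a closure plays the role of the pass-through function. The loop variable and CONTINUE arrive as explicit parameters, and everything else (here `y`) comes from where the body was defined. `loop_five`, `Continue` and `skip` are hypothetical names, and raising an exception is just one convenient way to model CONTINUE.

```python
class Continue(Exception):
    """Raised by the body to skip the rest of one iteration."""


def loop_five(body):
    def skip():
        raise Continue

    results = []
    for value in [1, 2, 3, 4, 5]:
        try:
            # Only the two names the loop promises are passed in.
            results.append(body(value, skip))
        except Continue:
            pass
    return results


y = 20


def body(x, continue_):
    if x == 3:
        continue_()
    return x + y  # `y` resolves where `body` was defined


print(loop_five(body))  # [21, 22, 24, 25]
```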

hostilefork · January 10, 2024, 1:42am

That accomplishes effectively the same thing as the "overbinding" instructions of the specifier of today. It does this without an a-priori traversal...just letting the overbind instruction graft on whatever specifiers the embedded blocks already have (if any) as it descends.

But if the function's body itself contains composed blocks that are bound... it again will only see arguments in the topmost array... so it's not a substitute for a true hole-punching UNUSE.

Unless you say that FUNC overbinds its args and locals, which is essentially today's model (and then FOR-EACH would do it too presumably, so the FUNC is not needed).

When we look at simple examples of imperative code, leveraging functions might seem like a good avenue of attack...

But the issue is that more broadly, the purpose of the language is to empower dialects...where the dialect is interpreting the structure. It wants to decide that things like a WORD!-followed-by-a-TAG! means something, and it can GET that word's value as a variable if it wants to.

So you're not binding black boxes, you're binding things like PARSE rules that aren't directly run by the evaluator... or that only evaluate some pieces at some times (e.g. how PARSE runs groups).

(Further discussion extracted to "What Dialects Need From Binding")

It's a puzzle, but even relatively weak implementations have allowed people to build cool things. They're just really easy to break.

Anyway...I think you're getting the general lay of the land, about the contentions in play when binding gets composed. I do think it is interesting to think about hole-punching as an alternative to overbinding... pushing responsibility onto those doing composition to say what they mean, while the natural flow operates on unbound code. The idea of never overriding a binding that already exists automatically has an appeal, just need to articulate the mechanism of explicit override.

bradrn · January 10, 2024, 3:06am

Indeed, FUNC would have to create a new environment for its arguments and locals. I don’t see any other way to implement it.

But FUNC is still needed. The key realisation here is: if we avoid doing any deep traversal, functions can only overbind values in the single block they’re passed. (And any unbound blocks nested within that.) They can’t overbind values in any nested bound blocks. So, if we want to ‘punch holes’ in those deeper blocks, we need to say so explicitly, using FUNC or an equivalent.

The result is a style of code where you have to explicitly specify whenever you want a block to receive overbinding. You can pass the outer block to FOR-EACH, but that does nothing to the inner block. So you need to explicitly overbind that block too.

bradrn · January 10, 2024, 4:24am

Just noticed I never responded to this:

This feels like a good articulation of the trade-offs resulting from this approach.

There are a few mechanisms of explicit override. At the most basic level, you can construct the new environment yourself with the desired values, and rebind it to the block. But FUNC will have to do this in any case, so it’s easier to build on that by using FUNC to do the rebinding for you. At an even higher level, you can use that to build combinators like the UNUSE I suggested. But either way, it’s something you have to ask for explicitly… this method won’t rebind blocks it hasn’t been asked to rebind.

Brett · January 11, 2024, 6:12am

The conversation has moved on, but I thought I'd throw in a response to bradrn's earlier musing. BTW, hi bradrn!

When REBOL 2 was released many years back, part of the conception was that code (a dialect) to be evaluated might arrive via email or a network port. The dialect would be safe through being a controlled language. How to evaluate it? PARSE was meant to be the solution, but you have to define your grammar and your interpreter - both non-trivial tasks. DO and LOAD were often seen in examples, but:

>> do load "print now" ; Not safe.
11-Jan-2024/15:54:06+11:00

Alternatively:

>> do to block! "print now" ; Much safer.
** Script Error: print word has no context
** Near: print now
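
The contrast between the two calls can be modelled in a few lines of Python, assuming that loading with an environment corresponds to LOAD and loading without one corresponds to TO BLOCK!. `load`, `do` and `sandbox` are hypothetical names, and the "first word is a function, the rest are its arguments" evaluation is a deliberate simplification.

```python
def load(text, env=None):
    """Split text into words, each paired with `env` (None = unbound)."""
    return [(word, env) for word in text.split()]


def do(block):
    """Fetch every word, then call the first value on the rest."""
    values = []
    for name, env in block:
        if env is None or name not in env:
            raise NameError(f"{name} word has no context")
        values.append(env[name])
    if not values:
        return None
    func, *args = values
    return func(*args)


# Only what the host chooses to grant is visible to the dialect.
sandbox = {"double": lambda n: n * 2, "six": 6}

print(do(load("double six", sandbox)))  # 12

for untrusted in (load("double six"), load("launch six", sandbox)):
    try:
        do(untrusted)
    except NameError as error:
        print("refused:", error)
```

Code that arrives unbound can do nothing at all, and code bound to a restricted environment can only reach what that environment contains.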

I've had an intuition for a long time that Rebol blocks should start life unbound, and that the Rebol interpreter should provide better support than PARSE for block evaluation, even if the block is a dialect, through the manner of its binding. Very recently I came across two languages that echo this sentiment of code starting out dead and selectively being brought to life.

Conventional programming languages give too much power to programs. Typically, any program is potentially capable of doing anything that the whole system is capable of doing. ...such inappropriately powerful systems are unable to adequately defend themselves against malicious or imprudent programs....

Misty takes a different approach. Instead of attempting to add security, it makes it possible to remove sources of insecurity. It does this by giving each actor or function in an application just the resources it needs in order to do its work and no more.

Misty Programming Language: Security

Anything you try to do besides invoking [functions you have defined] will throw an exception, because it doesn't contain any other functions. This gives you complete control over what a piece of Lizzie code is legally allowed to do, and allows you to for instance evaluate "insecure" code in a highly restricted context, which does not have access to negatively modify the state of your server/client in any ways.

lizzie/docs/introduction.md at master · polterguy/lizzie · GitHub

Anyway, welcome again @bradrn.

bradrn · January 11, 2024, 6:29am

Hi!

Now this is a fascinating idea! I was thinking about binding purely in terms of scoping of names within one program, but you’re right that it has entirely different advantages too.

To this list of examples you can add WebAssembly — a bytecode which by default cannot interact with the outside world, but requires callers to explicitly import functions into it. (I saw this great article about it just the other day: missing the point of webassembly — wingolog .)