Caching Binding Lookup, and "Attachment Binding"

I've written about Rebol's historical idea of walking the source deeply at the beginning, where the mere mention of an ANY-WORD! would lead to a variable being created for it in the user context. This chewed through memory making unnecessary variables, and gave you a situation ripe for typos:

 rebol2>> add-ten: func [argument] [
              argment: argument + 10
              return argument
           ]

 rebol2>> add-ten 20
 == 20

 rebol2>> argment
 == 30

This behavior resembles pre-strict-mode JavaScript, and makes it easy to introduce bugs.

But it's a semantic that some code uses intentionally. And if Rebol2 emulation is to be possible, there has to be some way to do this.

It's also a useful way to work in the console. Browsers still run JavaScript in non-strict-mode by default:

>> function foo() { jkl = 10 }
<- undefined

>> foo()
<- undefined

>> jkl
<- 10

Simulating Non-Strictness With "Attachment" Binding

To avoid creating variables for every ANY-WORD! mentioned, the "Sea of Words" concept was first implemented by making code loaded for a module bind non-specifically to that module. Words held a pointer to the module with no further information about the address of a variable in that module. This was called the "attached" state.

If you tried to read from an attached word, it would fail. But if you wrote to an attached word, it would create a variable. This gave the experience of non-strict mode, without a-priori creating tons of variables.

But this creates weird words that are neither fully bound nor fully unbound. And you can't take being in the "attached" state to mean the variable does not exist in the module... because more than one attached reference to the word could have been created, and one of them might have been used to create it... not knowing about the other word's attached state to fix it up. Or perhaps the variable was created explicitly in the module--not via any particular assignment.
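To make the attached state concrete, here is a toy model in Python (not Ren-C, and not how the interpreter actually implements it). The names `Module`, `AttachedWord`, `get_word`, and `set_word` are hypothetical, made up for this sketch. It shows a read through an attached word failing, a write making the variable emerge, and a second attached word that nobody "fixed up" suddenly resolving.

```python
class Module:
    """Toy module: just a name and a dict of variables."""
    def __init__(self, name):
        self.name = name
        self.vars = {}


class AttachedWord:
    """A word that knows its module, but holds no variable address."""
    def __init__(self, spelling, module):
        self.spelling = spelling
        self.module = module


def get_word(word):
    # Reading never creates anything: fail if the variable is not there (yet).
    if word.spelling not in word.module.vars:
        raise LookupError(f"{word.spelling} has no variable in {word.module.name}")
    return word.module.vars[word.spelling]


def set_word(word, value):
    # Writing through an attached word is what makes the variable emerge.
    word.module.vars[word.spelling] = value
    return value


user = Module("user")
first = AttachedWord("argment", user)    # two separate references, same spelling
second = AttachedWord("argment", user)

try:
    get_word(first)
    read_failed = False
except LookupError:
    read_failed = True

set_word(first, 30)                      # creates the variable via `first`
seen_by_second = get_word(second)        # `second` was never updated, yet resolves now
```

The last line is the awkward part: nothing about `second` changed, so "attached" on its own can't be read as "the variable does not exist".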

Should Words Be Storing "Environments"?

Historically, bound words would store the specific "address" of variables (most of the time object plus index), while unbound words would store nothing.

Attachment introduced a new situation where an "unbound" word could hold a pointer to a module, where the variable would be created if a SET or SET-WORD! operation were performed.

But if objects and environments can expand, is there a good reason why the "attachment" should have been to a particular module... or should it be attached to an environment? If the specifier for a piece of code has an OBJECT! to look in first, and then a MODULE!... and a lookup doesn't find it in either, why should it become "attached" only to the module? What if it shows up in the object before you write it?

Taking this to the extreme: Why should only "unbound"/"attached" things be able to see overrides that come along later on? Why don't words store environments always, and look up every time...to be able to find new things?

Ok, Back Up.

It seems that once a word has been bound, it needs to stay bound to where it is. e.g. the following seems bad:

>> word: in [] 'foo
== foo

>> set word 10
== 10

>> protect word  ; make sure no one changes FOO's value

>> some-arbitrary-routine
== <whatever>  ; didn't error, so didn't try to write foo

>> get word
== 20  ; !!! it wasn't written, so how?

We'd lose some grounding if bound words weren't stable. It also would hurt performance, because words would have to be looked up in the environment chain every time.

BUT we're saying that the IN operation does this lookup, and may get different results if the environment changes. That's pretty much a given: the evaluator runs an equivalent to IN, and this is why when you run a function several times you get different bindings to different frames from the same unbound words as input.
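As a rough illustration of that split, here is a toy Python model (not Ren-C). `Context`, `BoundWord`, and `lookup_in` are hypothetical names for this sketch only. The lookup walks the chain at the moment it is asked and hands back a stable bound word; asking again after the environment changes can give a different answer, but the word you already have stays put.

```python
class Context:
    """Toy context: variables plus a link to the next context in the chain."""
    def __init__(self, parent=None, **variables):
        self.vars = dict(variables)
        self.parent = parent


class BoundWord:
    """A word bound to one specific context; it never re-runs the chain lookup."""
    def __init__(self, spelling, context):
        self.spelling = spelling
        self.context = context

    def get(self):
        return self.context.vars[self.spelling]


def lookup_in(env, spelling):
    """Rough analogue of IN: walk the chain now, return a stable bound word."""
    ctx = env
    while ctx is not None:
        if spelling in ctx.vars:
            return BoundWord(spelling, ctx)
        ctx = ctx.parent
    return None  # nothing found: the word stays unbound


module = Context(foo=10)
frame = Context(parent=module)      # e.g. an object or frame ahead of the module

early = lookup_in(frame, "foo")     # finds FOO in the module
frame.vars["foo"] = 20              # the environment changes afterwards
late = lookup_in(frame, "foo")      # the same lookup now finds the nearer FOO
```

Here `early.get()` is still 10 while `late.get()` is 20: the lookup is what varies, not the word that was already bound.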

Contain The Weirdness To "Attachment"

This points the finger at "attachment" binding being a narrow, weird thing... whose job is solely to simulate the idea that a variable exists that doesn't.

Sea of Words made it a shallow illusion, e.g. it didn't let you GET/ANY the variable and get it back as being trash--you'd get an error instead. There are risks to deepening the illusion, because you'd have to make module enumeration give back every possible word as a variable. :-/

So I think attachment should be to a module (not an environment) and it should be considered as bound for most practical purposes (it would need to be, for Rebol2 compatibility).

Maybe I’m missing something, but: if we’re moving to this new model of block-level bindings, then surely that means there’s no need for ‘attachment’ at all?

The problem is that a variable may not exist at all to bind to, and there's never a moment of explicit creation. Yet it's expected to work.

Historical Rebol expected all source words to be bound to something--even if it's an unset something. This is because variables were created in advance for all words mentioned in source.

 rebol2>> foo: func [] [set 'whatever 10]  ; whatever bound to user context

 rebol2>> foo
 == 10

 rebol2>> whatever
 == 10

To preserve this behavior without requiring an a-priori walk that creates a ton of spurious variables, I came up with the idea of "attachment". Words could be attached to a module without a variable existing in that module... yet. If a SET came along to that attached word, the variable would be created at that moment.

Now that we have to be more explicit about binding, there's still the problem of when that variable would be created:

>> foo: func [] [set in [] 'whatever 10]

>> foo
; ... if we wanted whatever in user context to be 10, how to do that?

SET only receives a WORD!...not the argument to IN. For this to work, either IN had to create the variable (not knowing in advance whether it was going to be SET or not) or it had to bind the word in a state to say "if you get a SET request, here's where you should put the variable."

I've already said that having modules work this "non-strict" way is questionable (it's also a bit of a puzzle to guess what kinds of non-strict module presence in the "environment" should permit emergence, and how that implies a single location--let's imagine there's a methodology to it). But it gives rise to console behaviors that people have come to expect, historical code uses the style, and rightly or wrongly some current code depends on it.

Urgh. Considering attachment binding to be "bound" is kind of unworkable, because it makes basically the entire body of a non-strict module bound. And we've seen for instance that really basic things will choke on the binding (e.g. MAKE OBJECT! when a SET-WORD! is thought to be already bound).

Blindly treating it as unbound isn't good either, because as I said, the variable may have come into existence. So an attached word has to be checked for whether its variable has come into existence before you treat it as unbound.

So if you can't treat it as unconditionally bound or unbound, this means its state can change out from under you.

It's definitely frustrating... I'd like to be able to say "this isn't important, don't support it". But that would be very consequential. If we required explicit creation of all variables, then in the console we could say creation happens for all top-level definitions (the way MAKE OBJECT! works), and wind up with something like this:

>> x: 10 print [x]
10 ; works

>> (y: 10 print [y])
** Error: y is unbound  ; wouldn't work

I honestly am really confused why ‘attachment’ is necessary here. Surely set can just create whatever in the current environment when it runs? That’s what basically every other programming language does, and it seems to work fine for them.

SET's parameter is a WORD! that has no binding, or some binding, but... that word is all it has in its hand.

If you are speaking of an implicit environment parameter which would be captured from the callsite of the SET--it would be unlikely that was the right place to be doing the emergence of the new variable.

Think of the implementation of something like PARSE--which lives in its own module. It's traversing some code that you gave it:

Rebol [
    Title: "My Module"
    Type: module
]

example: func [<local> i] [
    parse [a 10] [w: word! i: integer!]
]

example

The block of rules that PARSE is receiving is thus connected with the frame of EXAMPLE which points next in the chain to the module environment for "My Module".

But the SET-WORD! combinator--which is implemented in the parse module--is what is ultimately running the code that sets the variable. It has logic which says to run and process the next parse rule, and if it matches then SET the word to the product of that rule.

PARSE has in its hand the input rule block, and the w: set-word! that it plucked out of that block. It doesn't know offhand that the w: word wasn't bound to something (e.g. if it asked about i it would get a binding).

The attached state is what allows IN RULES 'W: to give back a product that permits the SET to emerge the variable into My Module, not PARSE's module.
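A toy Python sketch of why the word has to carry the destination (again, not Ren-C; `Module`, `Word`, `attach`, and `set_word_combinator` are hypothetical names for illustration). The code doing the assignment lives with the parse module and only has the word in its hand, so the only way it can put the variable in the right place is if the word itself says where.

```python
class Module:
    def __init__(self, name):
        self.name = name
        self.vars = {}


class Word:
    """Toy word: a spelling plus an optional attachment to a module."""
    def __init__(self, spelling, attached_to=None):
        self.spelling = spelling
        self.attached_to = attached_to


def attach(word, module):
    """Rough stand-in for what IN gives back when no variable exists yet."""
    return Word(word.spelling, attached_to=module)


my_module = Module("My Module")
parse_module = Module("parse")


def set_word_combinator(word, value):
    """Conceptually part of the parse module; it only has the word in hand."""
    if word.attached_to is None:
        raise LookupError(f"{word.spelling} is unbound")
    # Emerge the variable where the word says to, not where this code lives.
    word.attached_to.vars[word.spelling] = value


raw = Word("w")                                     # as plucked from the rules block
set_word_combinator(attach(raw, my_module), "a")    # W emerges in My Module
```

After this runs, `my_module.vars` holds `w` and `parse_module.vars` is still empty. Passing `raw` without attaching it first raises, which is the "strict" outcome.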

Okay, that emulation is out the window, so let's drop it from consideration. (Building an emulator in Ren-C would likely be relatively easy given the infrastructure, but that emulation would be a whole different evaluator.)

I wonder if people would be happy enough if it just did top-level declarations.

>> x: 10
== 10

>> (y: 20)
** Error: Y is not bound

>> (let y: 20)
== 20

>> y
** Error: Y is not bound

>> y: ~

>> (y: 20)
== 20

>> y: ~<used by some-func>~  ; <-- tripwires are so cool
== ~<used by some-func>~  ; anti

>> some-func: func [x] [y: x] 

>> some-func 20

>> y
== 20

If you need to make a bunch of variables in a context, the wrap command could help you:

>> wrap [x: 10 y: 20 z: 30, x + y + z]
== 60

>> z
** Error: z is not bound

So the console and whatever else could perhaps use a variant of that, where instead of using a fresh context you inject the variables into another one:

>> wrap* system.contexts.user [x: 10 y: 20 z: 30 x + y + z]
== 60

>> z
== 30

Could do it as a refinement to wrap, though it puts the argument at the tail which I find annoying.

wrap:inside [x: 10 y: 20 z: 30 x + y + z] system.contexts.user 
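To pin down the difference between the two variants, here is a toy Python model of just the gathering step (it does not evaluate anything, and it is not the Ren-C implementation). The token representation and the names `top_level_set_words`, `wrap`, and `wrap_into` are assumptions for this sketch: set-words are strings ending in `:`, and a nested list plays the role of a GROUP!.

```python
def top_level_set_words(block):
    """Collect set-word spellings at the top level only; nested lists are skipped."""
    names = []
    for item in block:
        if isinstance(item, str) and item.endswith(":") and item[:-1] not in names:
            names.append(item[:-1])
    return names


def wrap(block):
    """Fresh context with a (not yet assigned) slot for each gathered set-word."""
    return {name: None for name in top_level_set_words(block)}


def wrap_into(context, block):
    """Variant that injects the slots into an existing context instead."""
    for name in top_level_set_words(block):
        context.setdefault(name, None)   # leave variables that already exist alone
    return context


# The nested list stands in for a GROUP!, so its Z: is not gathered.
block = ["x:", 10, "y:", 20, ["z:", 30], "x", "+", "y"]

fresh = wrap(block)               # new context: x and y only
user_context = {"x": 1}
wrap_into(user_context, block)    # existing context gains y, keeps x as it was
```

The only difference between the two is where the slots land: a throwaway context, or one you pass in (such as the user context).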

I think that covers the bases for me, but I'll have to try propagating this stuff in the system to find out.

If attachment binding disappeared, I'd shed no tears.


(Okay, I posted that 30 minutes ago, and I just booted a system with no attachment binding, so... yeah. I think it's time to let it go. People will write better code this way, strict mode is good.)

I hinted at the need to break MAKE OBJECT! into component operations, and I think WRAP is one of those components... where it shouldn't run the code.

Because you might want to write something like:

all wrap [
    x: ...
    y: ...
    z: ...
]

So the evaluation should be separate, as eval wrap [...]

Does EVAL WRAP deserve a special name?

Maybe, but I can't think of a good one offhand. I sort of feel like those two words together are "right-sized" for the intent.

Everything in actual code that this is tripping on was a bug.

The casualties are things that are intentionally scrappy, like the tests.

I don't think the answer is to do something like WRAP/DEEP on the tests for their convenience...we don't want tests to be different from what runs in the console... and we don't want what runs in the console to be that different from what you could put in a script.

Basically just have to bite the bullet, and turn tests like all [x: ...] into all wrap [x: ...] or all [let x: ...] or whatever.

It is a little disappointing that things break when you say all wrap [(x: ...) ...] but I think that's just life. Binding is becoming more of a conscious effort and that has ramifications, that's one of them.

One problem: SET-BLOCK!s have not traditionally been considered in top-level gathers. (Not because they were never intended to be, but because they are "new" and have had their semantics hammered out over time.)

But if you can write:

wrap [
    x: ...
]

It seems that you should be able to say:

wrap [
    [x y]: ...
]

But SET-BLOCK! is dialected, so it gets a little complex. You don't want to make a variable for Z when you see [x (z)]:, and you do want to make a variable for N when you see [^n (z)]:

Which reminds me, with CHAIN we're about to get META-CHAIN-WORDs, like ^foo: ... so you won't have to make a block to get a meta assignment. And we'd expect that to wrap too.

wrap [
    ^foo: ...
]

Brave new world. But @foo: (evals to @foo:, bound) shouldn't be collected, nor should $foo: (evals to foo: bound), and obviously not 'foo: (evals to foo:, unbound)

I'm also reminded that SET-BLOCK! needs to recurse for unpacking.

wrap [
     [[a b] [c d]]: pack [pack [1 2] pack [3 4]]
]

I would expect a, b, c, and d to be collected by that wrap.
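The collection rules described so far can be sketched as a toy Python model (not Ren-C, and only covering the cases mentioned in this thread). The representation is an assumption: words are strings, a tuple stands in for a GROUP!, a nested list for a nested block, and `names_in_set_block` and `collectible` are hypothetical names.

```python
def names_in_set_block(items):
    """Names a SET-BLOCK!-like list would declare, recursing into nested lists."""
    names = []
    for item in items:
        if isinstance(item, list):            # nested block: recurse for unpacking
            names.extend(names_in_set_block(item))
        elif isinstance(item, tuple):         # group such as (z): evaluated, not declared
            continue
        elif isinstance(item, str):
            names.append(item.lstrip("^"))    # ^n still declares N
    return names


def collectible(token):
    """Name to gather for a top-level set-word-like token, or None to skip it."""
    if not token.endswith(":") or token[0] in "@$'":
        return None                           # @foo:, $foo: and 'foo: are left alone
    return token[:-1].lstrip("^")             # foo: and ^foo: both declare FOO


plain = names_in_set_block(["x", ("z",)])                   # like [x (z)]:
meta = names_in_set_block(["^n", ("z",)])                   # like [^n (z)]:
nested = names_in_set_block([["a", "b"], ["c", "d"]])       # like [[a b] [c d]]:
```

Under these assumptions `plain` contains only `x`, `meta` only `n`, and `nested` all of `a`, `b`, `c`, and `d`.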

Well, there we see it's a blessing as well as a curse... if you didn't want the wrap semantics, you can escape them.

Given that I seem to be meandering about here, it may not be obvious...but things are really starting to tighten up. Things are coming into alignment, and the code is written in such a way that I can morph it around pretty comfortably despite the drastic upheavals. I will reiterate that without being able to build as C++, a lot of the reorganizations I do would be nigh impossible... so frontloading the R3-Alpha redesign effort with bringing C++ into it was not just worthwhile... it's the only reason the project can exist.

Just had a thought: why do we need a special function WRAP? I feel that it should be sufficient to put LET before SET-WORD!s as desired.

In the case of the console interactivity, a LET would not be seen past one evaluation (and isn't supposed to), so that explains at least why what I've called WRAP* is needed, to gather the set-things and inject them into the user context:

>> let x: 10
== 10

>> x
** Error: X is not bound

And I believe it would be painful for scripts and modules to have to put LETs on all their top-level declarations, in addition to the fact that LET "does something different" mechanically. It would have to be changed to where a module-level LET "injected", and I don't think we want LET to change what it does in that way. But again, that's in contrast to WRAP* being used by the module machinery.

As for the cases of higher-level WRAP on a BLOCK! creating one object vs. lots of little LETs, there's not a huge difference. But there may be some advantage to gathering all the declarations in one function call vs. having several--it may create fewer GC'able entities and it may be able to run faster.

But mostly it's just less visual noise than lots of LETs, and lets people work in a way that seems more familiar. It does mean that you don't have to worry about touching up code when you move or delete the "declaring instance":

... [
    let [pos value]: transcode/next ...  ; say I delete this line
    let x: ...
    let y: ...
    [pos value]: transcode/next ...
    if x = value [...]
    [pos value]: transcode/next ...
    if y = value [...]
]

I think if you are writing code with one definition in it you are likely to favor the LET, but if you have a lot then the WRAP may win.

There's also a bit about how people might feel about LETs in mid-expression, e.g. on the right hand sides of comparisons/etc.

 all [
     1 = let x: something ...
     ...
 ]

LET does work in these cases, but might seem awkward--even to people who have fully embraced putting assignments in such slots. I can at least vouch for the fact that putting LET there stresses me out (maybe because I'm just sensitive to the idea of "that might not work!", but it does work...the definition outlives the subexpression...although I predict trouble with variadics).

I suppose we can debate whether that should actually work or not. It seems to me there's more value from it working. It's strictly more powerful--if you don't want it to outlive the subexpression you can put it in a group.
