Caching Binding Lookup, and "Attachment Binding"

hostilefork · January 22, 2024, 2:47pm

I've written about Rebol's historical idea of walking the source deeply at load time, where the mere mention of an ANY-WORD! would create a variable for it in the user context. This chewed through memory making unnecessary variables, and it set up a situation ripe for typos:

 rebol2>> add-ten: func [argument] [
              argment: argument + 10
              return argument
           ]

 rebol2>> add-ten 20
 == 20

 rebol2>> argment
 == 30

This behavior resembles pre-strict-mode JavaScript, and makes bugs easy to introduce.

But it's a semantic that some code uses intentionally. And if Rebol2 emulation is to be possible, there has to be some way to do this.

It's also a useful way to work in the console. Browsers still run JavaScript in non-strict mode by default:

>> function foo() { jkl = 10 }
<- undefined

>> foo()
<- undefined

>> jkl
<- 10
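A rough Python model of the Rebol2 load-time walk described at the top of this post may help make the cost concrete. It assumes words are Python strings and blocks are lists; `deep_collect`, `spelling_of` and `UNSET` are hypothetical names, and this is not Rebol2's actual loader. Every spelling gets a variable in the user context, which is how the typo `argment` ends up existing, along with a global `argument` that only ever needed to live in the function's frame.

```python
UNSET = object()  # placeholder for a variable that exists but is unset


def spelling_of(token):
    # Strip SET-WORD! (foo:), GET-WORD! (:foo) and LIT-WORD! ('foo) decoration.
    return token.strip(":").lstrip("'")


def deep_collect(block, context):
    """Walk a block deeply, creating a variable for every word spelling.

    Modeling assumptions: words are Python strs, nested blocks are lists,
    and anything else (e.g. an int) is a non-word value that is skipped.
    """
    for item in block:
        if isinstance(item, list):
            deep_collect(item, context)
        elif isinstance(item, str):
            context.setdefault(spelling_of(item), UNSET)
    return context


# add-ten: func [argument] [argment: argument + 10 return argument]
source = ["add-ten:", "func", ["argument"],
          ["argment:", "argument", "+", 10, "return", "argument"]]

natives = {"func": "<native>", "+": "<native>", "return": "<native>"}
user = deep_collect(source, dict(natives))
print(sorted(set(user) - set(natives)))  # ['add-ten', 'argment', 'argument']
```

Note that the walk happens before anything runs, so the spurious variables exist whether or not the code that mentions them is ever executed.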

Simulating Non-Strictness With "Attachment" Binding

To avoid creating a variable for every ANY-WORD! mentioned, the "Sea of Words" concept was first implemented by having code loaded for a module bind non-specifically to that module. Words held a pointer to the module, with no further information about the address of a variable in that module. This was called the "attached" state.

If you tried to read from an attached word, it would fail. But if you wrote to an attached word, it would create a variable. This gave the experience of non-strict mode, without creating tons of variables a priori.

But this creates weird words that are neither fully bound nor fully unbound. And you can't take being in the "attached" state to mean the variable does not exist in the module. More than one attached reference to the word could have been created, and one of them might have been used to create the variable, with no way to know about the other attached words to fix them up. Or perhaps the variable was created explicitly in the module, not via any particular assignment.
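A minimal Python model of the attached state, assuming a module is just a growable dictionary and an attached word holds only its spelling plus a module reference. The names `Module` and `AttachedWord` are hypothetical, not Ren-C internals. It shows both halves of the problem: reading fails until something writes, and a second attached reference to the same spelling cannot tell from its own state whether the variable has since emerged.

```python
class Module:
    """Hypothetical model of a module: a growable table of variables."""

    def __init__(self, name):
        self.name = name
        self.vars = {}


class AttachedWord:
    """A word that remembers a module but no specific variable slot."""

    def __init__(self, spelling, module):
        self.spelling = spelling
        self.module = module

    def get(self):
        # Reading fails unless some write has already made the variable emerge.
        if self.spelling not in self.module.vars:
            raise NameError(f"{self.spelling} has no variable in {self.module.name}")
        return self.module.vars[self.spelling]

    def set(self, value):
        # Writing creates the variable on demand (the "non-strict" illusion).
        self.module.vars[self.spelling] = value
        return value


user = Module("user")
first = AttachedWord("argment", user)   # one reference in the loaded source
second = AttachedWord("argment", user)  # another reference to the same spelling

first.set(30)
# second was never touched, and is still "attached", yet its variable exists:
print(second.get())  # 30
```

Because nothing about `second` changed when `first` wrote, "attached" can only ever mean "go check the module", never "the variable is absent".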

Should Words Be Storing "Environments"?

Historically, bound words would store the specific "address" of variables (most of the time object plus index), while unbound words would store nothing.

Attachment introduced a new situation where an "unbound" word could hold a pointer to a module, where the variable would be created if a SET or SET-WORD! operation were performed.

But if objects and environments can expand, is there a good reason why the "attachment" should have been to a particular module... or should it be attached to an environment? If the specifier for a piece of code has an OBJECT! to look in first, and then a MODULE!... and a lookup doesn't find it in either, why should it become "attached" only to the module? What if it shows up in the object before you write it?

Taking this to the extreme: Why should only "unbound"/"attached" things be able to see overrides that come along later on? Why don't words store environments always, and look up every time...to be able to find new things?
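To make the object-then-module question concrete, here is a small Python sketch. It assumes a specifier is an ordered list of growable contexts (innermost first) and that a failed lookup attaches to the last one, the module; `Context`, `lookup` and `write` are hypothetical names. The attachment captured at lookup time keeps pointing at the module, so a later write lands there even though a fresh lookup would now find the object's variable.

```python
class Context:
    """Hypothetical growable context standing in for an OBJECT! or MODULE!."""

    def __init__(self, name):
        self.name = name
        self.vars = {}


def lookup(chain, spelling):
    """Search a specifier chain, innermost context first.

    Returns ("bound", ctx) for the first context holding the spelling,
    otherwise ("attached", module) where the module is the last link.
    """
    for ctx in chain:
        if spelling in ctx.vars:
            return ("bound", ctx)
    return ("attached", chain[-1])


def write(binding, spelling, value):
    # Bound or attached, the write goes to the context captured at lookup
    # time; for an attachment this is what makes the variable emerge.
    _state, ctx = binding
    ctx.vars[spelling] = value


obj = Context("obj")
mod = Context("mod")
chain = [obj, mod]          # look in the OBJECT! first, then the MODULE!

x = lookup(chain, "x")      # found in neither: attached to the module
obj.vars["x"] = 1           # the object grows *after* the lookup
write(x, "x", 2)            # ...but the write still lands in the module

print(mod.vars, obj.vars)           # {'x': 2} {'x': 1}
print(lookup(chain, "x")[1].name)   # obj: a fresh lookup sees the new variable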

Ok, Back Up.

It seems that once a word has been bound, it needs to stay bound to where it is. For example, the following would be bad:

>> word: in [] 'foo
== foo

>> set word 10
== 10

>> protect word  ; make sure no one changes FOO's value

>> some-arbitrary-routine
== <whatever>  ; didn't error, so didn't try to write foo

>> get word
== 20  ; !!! it wasn't written, so how?

We'd lose some grounding if bound words weren't stable. It would also hurt performance, because words would have to be looked up in the environment chain every time.

BUT we're saying that the IN operation does this lookup, and may get different results if the environment changes. That's pretty much a given: the evaluator runs an equivalent of IN, and this is why when you run a function several times you get different bindings to different frames from the same unbound words as input.
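A short Python sketch of that split, assuming a bound word simply caches the context it resolved to while an IN-like operation searches the chain at the moment it is called. `Frame`, `BoundWord`, `in_` and `call_function` are hypothetical names for illustration. The same unbound spelling resolves to a different frame on each call, yet an already-bound word never moves when a closer definition appears later.

```python
class Frame:
    """Hypothetical context: a FRAME! or a module, modeled as a dict."""

    def __init__(self, label):
        self.label = label
        self.vars = {}


class BoundWord:
    """A bound word caches the exact context it resolved to."""

    def __init__(self, spelling, ctx):
        self.spelling = spelling
        self.ctx = ctx

    def get(self):
        # No re-lookup: a bound word always reads the same variable.
        return self.ctx.vars[self.spelling]


def in_(chain, spelling):
    """Stand-in for IN: look the spelling up in the chain *now* and bind it."""
    for ctx in chain:
        if spelling in ctx.vars:
            return BoundWord(spelling, ctx)
    raise NameError(f"{spelling} not found")


module = Frame("module")
module.vars["x"] = "module-x"


def call_function(arg):
    # Each call makes a fresh frame in front of the module, so the same
    # unbound word resolves to a different variable each time.
    frame = Frame("call")
    frame.vars["x"] = arg
    return in_([frame, module], "x")


first, second = call_function(10), call_function(20)
print(first.get(), second.get())         # 10 20

stable = in_([module], "x")              # bound to the module's x
shadow = Frame("shadow")
shadow.vars["x"] = "shadow-x"            # a closer definition appears later
print(stable.get())                      # module-x: the bound word did not move
print(in_([shadow, module], "x").get())  # shadow-x: a new IN sees it
```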

Contain The Weirdness To "Attachment"

This points the finger at "attachment" binding being a narrow, weird thing... whose job is solely to simulate the idea that a variable exists that doesn't.

Sea of Words made it a shallow illusion: e.g. it didn't let you GET/ANY the variable and get it back as trash; you'd get an error instead. There are risks to deepening the illusion, because you'd have to make module enumeration give back every possible word as a variable. :-/

So I think attachment should be to a module (not an environment) and it should be considered as bound for most practical purposes (it would need to be, for Rebol2 compatibility).

bradrn · January 23, 2024, 3:30am

Maybe I’m missing something, but: if we’re moving to this new model of block-level bindings, then surely that means there’s no need for ‘attachment’ at all?

hostilefork · January 23, 2024, 2:01pm

The problem is that a variable may not exist at all to bind to, and there's never a moment of explicit creation. Yet it's expected to work.

Historical Rebol expected all source words to be bound to something, even if it's an unset something. This is because variables were created in advance for all words mentioned in source.

 rebol2>> foo: func [] [set 'whatever 10]  ; whatever bound to user context

 rebol2>> foo
 == 10

 rebol2>> whatever
 == 10

To preserve this behavior without requiring an a-priori walk that creates a ton of spurious variables, I came up with the idea of "attachment". Variables could be attached to a module but not exist in that module... yet. If a SET came along to that attached word, the variable would be created at that moment.

Now that we have to be more explicit about binding, there's still the problem of when that variable would be created:

>> foo: func [] [set in [] 'whatever 10]

>> foo
; ... if we wanted whatever in user context to be 10, how to do that?

SET only receives a WORD!...not the argument to IN. For this to work, either IN had to create the variable (not knowing in advance whether it was going to be SET or not) or it had to bind the word in a state to say "if you get a SET request, here's where you should put the variable."
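The two options for IN can be compared side by side in a Python sketch. It assumes a word is a `(state, module, spelling)` tuple and that SET receives only that tuple; `in_eager`, `in_attached` and `set_word` are hypothetical names. Both strategies make `whatever` work, but the eager one also leaves behind variables for words that were only ever inspected.

```python
class Module:
    """Hypothetical module: a growable table of variables."""

    def __init__(self):
        self.vars = {}


def in_eager(module, spelling):
    # Strategy 1: IN creates the variable up front, not knowing
    # whether the caller will SET it or merely look at it.
    module.vars.setdefault(spelling, None)
    return ("bound", module, spelling)


def in_attached(module, spelling):
    # Strategy 2: IN creates nothing; the word just remembers where
    # a variable *should* emerge if a SET request arrives.
    state = "bound" if spelling in module.vars else "attached"
    return (state, module, spelling)


def set_word(word, value):
    # SET only has the word in hand; the module travels inside it.
    _state, module, spelling = word
    module.vars[spelling] = value
    return value


eager, lazy = Module(), Module()
set_word(in_eager(eager, "whatever"), 10)
set_word(in_attached(lazy, "whatever"), 10)

in_eager(eager, "peek")      # only inspected, never SET
in_attached(lazy, "peek")

print(sorted(eager.vars))    # ['peek', 'whatever']
print(sorted(lazy.vars))     # ['whatever']
```

The price of the attached strategy is the one described next: the tuple returned by `in_attached` still says "attached" after the SET, so its state can go stale.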

I've already said that having modules work this "non-strict" way is questionable (it's also a bit of a puzzle to guess what kinds of non-strict module presence in the "environment" should permit emergence, and how that implies a single location--let's imagine there's a methodology to it). But it gives rise to console behaviors that people have come to expect, historical code uses the style, and rightly or wrongly some current code depends on it.

Urgh. Considering attachment binding to be "bound" is kind of unworkable, because it makes basically the entire body of a non-strict module bound. And we've seen that really basic things will choke on that binding (e.g. MAKE OBJECT! when its SET-WORD!s are thought to be already bound).

Blindly treating it as unbound isn't good either because, as I said, its variable may have come into existence. So an attached word has to be checked to see whether its variable has come into existence before you treat it as unbound.

So you can't treat it as unconditionally bound or unbound, because its state can change out from under you.

It's definitely frustrating... I'd like to be able to say "this isn't important, don't support it". But that would be very consequential. If all variables had to be created explicitly, e.g. in the console, we could say that creation happens for all top-level definitions (the way MAKE OBJECT! works), and wind up with something like this:

>> x: 10 print [x]
10 ; works

>> (y: 10 print [y])
** Error: y is unbound  ; wouldn't work
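The top-level-only policy can be sketched in Python, assuming a SET-WORD! is a string ending in `:`, a BLOCK! is a list and a GROUP! is a tuple. `collect_top_level_set_words` is a hypothetical helper, not an existing Ren-C function. Only SET-WORD!s sitting directly in the line's block are pre-created, which is why `x` would work and the `y` inside the group would not.

```python
def collect_top_level_set_words(block):
    """Gather SET-WORD! spellings at the top level of a block only.

    Modeling assumptions: a SET-WORD! is a str ending in ":", a BLOCK!
    is a list and a GROUP! is a tuple. Nested lists/tuples are skipped,
    the way MAKE OBJECT! gathers only top-level SET-WORD!s.
    """
    return [item[:-1] for item in block
            if isinstance(item, str) and len(item) > 1 and item.endswith(":")]


line1 = ["x:", 10, "print", ["x"]]        # x: 10 print [x]
line2 = [("y:", 10, "print", ["y"])]      # (y: 10 print [y])

print(collect_top_level_set_words(line1))  # ['x']
print(collect_top_level_set_words(line2))  # []
```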

bradrn · January 24, 2024, 2:05am

hostilefork:

 rebol2>> foo: func [] [set 'whatever 10]  ; whatever bound to user context

 rebol2>> foo
 == 10

 rebol2>> whatever
 == 10
To preserve this behavior without requiring an a-priori walk that creates a ton of spurious variables, I came up with the idea of "attachment". Variables could be attached to a module but not exist in that module... yet. If a SET came along to that attached word, the variable would be created at that moment.

I honestly am really confused why ‘attachment’ is necessary here. Surely set can just create whatever in the current environment when it runs? That’s what basically every other programming language does, and it seems to work fine for them.

hostilefork · January 24, 2024, 2:18am

SET's parameter is a WORD! that has no binding, or some binding, but... that word is all it has in its hand.

If you mean an implicit environment parameter captured from the callsite of the SET, that callsite is unlikely to be the right place for the new variable to emerge.

Think of the implementation of something like PARSE--which lives in its own module. It's traversing some code that you gave it:

Rebol [
    Title: "My Module"
    Type: module
]

example: func [<local> i] [
    parse [a 10] [w: word! i: integer!]
]

example

The block of rules that PARSE receives is thus connected with the frame of EXAMPLE, which points next in the chain to the module environment for "My Module".

But the SET-WORD! combinator--which is implemented in the parse module--is what is ultimately running the code that sets the variable. It has logic which says to run and process the next parse rule, and if it matches then SET the word to the product of that rule.

PARSE has in its hand the input rule block, and the w: SET-WORD! that it plucked out of that block. It can't tell offhand that the w: word isn't bound to anything (whereas if it asked about i, it would get a binding).

The attached state is what allows the expression in rules 'w: to give back a product that lets the SET emerge the variable in My Module, not in PARSE's module.
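A toy Python model of that scenario, under loudly simplified assumptions: the rule block's specifier is a list of contexts (EXAMPLE's frame, then My Module), a WORD! is a Python string, and `mini_parse` handles only flat `[set-word: type! ...]` rules. `Context`, `resolve`, `set_word_combinator` and `mini_parse` are all hypothetical. The combinator never consults PARSE's own module; the word's binding (bound to the frame for `i`, attached to My Module for `w`) decides where each variable lives or emerges.

```python
class Context:
    """Hypothetical growable context (frame or module) for this model."""

    def __init__(self, name):
        self.name = name
        self.vars = {}


def resolve(chain, name):
    # Models IN against the rule block's specifier: innermost first; if
    # nothing has the name, attach to the last link (the loading module).
    for ctx in chain:
        if name in ctx.vars:
            return ("bound", ctx)
    return ("attached", chain[-1])


# ---- imagine everything below lives in PARSE's own module ----
parse_module = Context("parse")

MATCHERS = {
    "word!": lambda value: isinstance(value, str),     # WORD! modeled as str
    "integer!": lambda value: isinstance(value, int),
}


def set_word_combinator(word, product):
    # Only the word plucked from the rules is in hand; parse_module is
    # never consulted, so the variable cannot emerge in the wrong place.
    _state, ctx, name = word
    ctx.vars[name] = product
    return product


def mini_parse(data, rules, rules_chain):
    """Toy PARSE handling only flat [set-word: type! ...] rules."""
    pos = 0
    for i in range(0, len(rules), 2):
        set_word, type_rule = rules[i], rules[i + 1]
        if pos >= len(data) or not MATCHERS[type_rule](data[pos]):
            return False
        name = set_word[:-1]
        state, ctx = resolve(rules_chain, name)
        set_word_combinator((state, ctx, name), data[pos])
        pos += 1
    return pos == len(data)


my_module = Context("My Module")
example_frame = Context("EXAMPLE frame")
example_frame.vars["i"] = None            # the <local> i

ok = mini_parse(["a", 10], ["w:", "word!", "i:", "integer!"],
                [example_frame, my_module])
print(ok, example_frame.vars, my_module.vars, parse_module.vars)
# True {'i': 10} {'w': 'a'} {}
```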