The Real Story about User and Lib Contexts

hostilefork · July 30, 2018, 3:59pm

People know about binding in the sense of making an object--which has certain named keys--and then binding an ANY-WORD! which has the same spelling (case-insensitive) as those keys into that object. Once bound, the words look up into that object.

Yet the mysterious user context influences a lot of how people experience the system. How does it operate?

For the moment, let's oversimplify matters a bit and imagine that the user context starts out empty. Then say you type into the console something like:

>> foo: func [arg] [print ["arg is" arg]]

This idealized user context would now contain keys for foo, func, print, and arg. FOO would be set to the new action we've just defined. FUNC and PRINT would hold copies of the values of their respective actions from lib. And ARG would be an unset variable--present in the user context, but with no value assigned to it.

Ok...it seems fairly obvious that the user context would contain a FOO. You just defined it, and there's nowhere else for it to live. But why did it create entries for func and print? Aren't they in the lib context? Couldn't it have just bound the words into the lib context directly?

Binding words into lib implies you probably also bind SET-WORD!s (and even if you didn't, you could use SET to set through a WORD!). When you have a direct reference to something like print or append in lib--and casually overwrite what it points to--you would wreak havoc on the implementations of mezzanine routines that live in lib that use those definitions. e.g. if you say print: :fail--then try to run HELP--then all the printing in HELP would break.

So what happens instead is that when a word isn't found in the user context already, the LOAD process checks lib to see if it exists there. When it creates the binding in user, it captures the value of what lib holds for that word at that moment. The two words then go their own way--any changes to lib's definition after that point won't update the user context's version.
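To make the capture-at-bind-time behavior concrete, here is a minimal toy model in Python. It is only a sketch of the idealized description above, not how the interpreter is implemented: the `Context` class, the `bind_into_user` helper, and the placeholder strings standing in for actions are all hypothetical names made up for illustration.

```python
UNSET = object()  # stands in for "key exists, but no value assigned"

class Context:
    """Toy context: maps a word's spelling to a value slot (hypothetical model)."""
    def __init__(self, name):
        self.name = name
        self.slots = {}

def bind_into_user(spellings, user, lib):
    """Give every word a key in user; copy lib's current value if lib has one."""
    for spelling in spellings:
        key = spelling.lower()          # spellings match case-insensitively
        if key in user.slots:
            continue                    # already has a key, nothing to capture
        # Capture whatever lib holds *right now*; later lib changes won't show up.
        user.slots[key] = lib.slots.get(key, UNSET)

lib = Context("lib")
lib.slots["func"] = "<lib func action>"
lib.slots["print"] = "<lib print action>"
user = Context("user")

# Words seen when loading: foo: func [arg] [print ["arg is" arg]]
bind_into_user(["foo", "func", "arg", "print", "arg"], user, lib)
user.slots["foo"] = "<user-defined action>"

# After the capture, the two contexts go their own way.
lib.slots["print"] = "<patched print>"
user.slots["func"] = "<user override>"
```

In this model, overwriting `func` through the user context leaves lib's `func` alone, and patching lib's `print` afterwards does not change what the user context already captured.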

Next we might ask why arg got a binding. It's a function argument, so why should it have a definition in the user context? That variable is just going to be unset...because the function is going to use a binding for the arg based on the invocation, not use any "global" arg.

The thing is that LOAD doesn't know what you are going to do with the code you load. When it makes a WORD!, it has one moment in time to choose whether to bind it or not. If it left the word unbound because it couldn't find anything to point it to, you might be disappointed by how that behaved.

Let's try out a console in an alternate world where you'd have to have a SET-WORD! before getting a binding:

>> d1: does [print foo]

>> foo: 1020

>> d2: does [print foo]

>> d2
1020

>> d1
** Error: foo is not bound to a context

So here you see the problem of not pre-emptively adding a context key for every word you see. If you didn't, you'd have to do some kind of "forward declaration".
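The two loading policies can be compared side by side with a small Python sketch. This is a toy model under stated assumptions: `Word`, `load`, `get`, and `session` are hypothetical names, a plain dict stands in for the user context, and a trailing `:` on a token marks a SET-WORD!. Only the `foo` word from each body is modeled.

```python
UNSET = object()  # placeholder for "key exists, no value assigned yet"

class Word:
    def __init__(self, spelling):
        self.spelling = spelling
        self.context = None  # None models an unbound word

def load(tokens, user, bind_every_word):
    """Turn tokens into words, binding them according to the chosen policy."""
    words = []
    for token in tokens:
        is_set_word = token.endswith(":")
        word = Word(token.rstrip(":").lower())
        if bind_every_word or is_set_word or word.spelling in user:
            user.setdefault(word.spelling, UNSET)  # add a key if missing
            word.context = user
        words.append(word)
    return words

def get(word):
    if word.context is None:
        raise LookupError(f"{word.spelling} is not bound to a context")
    return word.context[word.spelling]

def session(bind_every_word):
    """Replay the console session above under one of the two policies."""
    user = {}
    d1_foo = load(["foo"], user, bind_every_word)[0]  # foo inside d1's body
    load(["foo:"], user, bind_every_word)             # foo: 1020
    user["foo"] = 1020
    d2_foo = load(["foo"], user, bind_every_word)[0]  # foo inside d2's body
    results = []
    for word in (d2_foo, d1_foo):                     # run d2, then d1
        try:
            results.append(get(word))
        except LookupError as error:
            results.append(str(error))
    return results

set_word_only = session(bind_every_word=False)
every_word = session(bind_every_word=True)
```

Under the "SET-WORD! first" policy, `set_word_only` is `[1020, "foo is not bound to a context"]`, matching the alternate-world console. When every word gets a key up front, `every_word` is `[1020, 1020]`.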

Every ANY-WORD! gets a binding in user? That's a lot of words!

Yep. And the context only grows--it never shrinks.

This was worse when ISSUE! was an ANY-WORD! (a decision that has been reversed in Ren-C; it is now an immutable string form, unified with characters). Not only would you get a word for every data-bearing issue, you'd get invalid words:

>> foo: func [] [print [#1020-0304]]   

>> find words of binding of 'foo '1020-0304
== [1020-0304 words]

Your context has FIND, WORDS, OF, BINDING, FOO, 1020-0304, and WORDS... (1020-0304 was just close to the tail). So as crazy as everything else is, it's good to have issues out of this picture.

How much space are we talking about?

Each entry in a context is a key and a value. The key is 4 platform pointers, the value is 4 platform pointers. On a 64-bit system that's (4 + 4) * 8 => 64 bytes.

Every unique word, issue, or otherwise that gets its own binding adds that cost, whether the declaration comes from lib or not. And if modules are isolated into their own user-type contexts, then they start bloating up too...with their own copies of lib declarations, and their own declarations for any word they use.

So if you use 800 unique words in a module, that module has about 50k of overhead just for that list on a 64-bit system.
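The arithmetic is easy to check. The sketch below uses only the figures given above (4 pointers per key, 4 per value); the function name and the `pointer_size_bytes` parameter are made up for illustration.

```python
POINTERS_PER_KEY = 4
POINTERS_PER_VALUE = 4

def context_overhead_bytes(unique_words, pointer_size_bytes=8):
    """Bytes used by the key/value entries alone, ignoring any other bookkeeping."""
    per_entry = (POINTERS_PER_KEY + POINTERS_PER_VALUE) * pointer_size_bytes
    return unique_words * per_entry

bytes_64bit = context_overhead_bytes(800)                        # 800 * 64 = 51200
kib_64bit = bytes_64bit / 1024                                   # 50.0
bytes_32bit = context_overhead_bytes(800, pointer_size_bytes=4)  # 800 * 32 = 25600
```

So the "about 50k" figure is 800 entries at 64 bytes each, and a 32-bit build would pay half that for the same word list.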

Could lib be read-only, so direct bindings could be used?

The main problem with making the variables in lib read-only is what would happen if you decide you want to redefine something. You'd have to do that redefinition up-front. Because at present, there's no way to go back and update bindings that were made historically...you don't know where the blocks of code containing the old binding got handed to.

When you think about the impact on the usage model, it is significant. Again, think about how much it's taken for granted in the console that after d1: does [print foo], you can define foo later and the foo inside d1 will still find it. If direct bindings into lib were used until overridden, you would face a different usage problem: you'd redefine PRINT, but only bindings made after that point would see the new definition.
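Here is a minimal Python sketch of that "direct binding until overridden" policy, to show where the stale binding comes from. It is a hypothetical model, not a proposal: `load_word` and the dict-based contexts are invented for illustration, and lib is treated as read-only by convention.

```python
UNSET = object()  # key exists, no value assigned yet

def load_word(spelling, user, lib):
    """Hypothetical policy: bind to user if it has the key, else directly to lib."""
    if spelling in user:
        return user
    if spelling in lib:
        return lib              # direct binding, no copy made in user
    user[spelling] = UNSET      # unknown word: give it a key in user
    return user

lib = {"print": "<lib print>"}
user = {}

# Code loaded before the redefinition binds PRINT straight into lib.
early_binding = load_word("print", user, lib)

# Later the user redefines PRINT; since lib is read-only, the key lands in user.
user["print"] = "<my print>"

# Code loaded afterwards sees the override...
late_binding = load_word("print", user, lib)

# ...but the earlier binding still points at lib and can't be found to fix up.
early_value = early_binding["print"]
late_value = late_binding["print"]
```

In this model `early_value` stays `"<lib print>"` while `late_value` is `"<my print>"`, which is the split behavior described above.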

How do modules treat this?

As I mentioned, this same concept extends to module isolation. The state of modules in R3-Alpha was that they were bound directly into lib unless you said Options: [Isolate]. Look how quickly that goes south:

>> m: module [] [test: does [print: does [do make error! "surprise!"]]]
>> m/test
>> help append
USAGE:
    ** User error: "surprise!"

(Note: USAGE is output with PRIN and not PRINT, so that's why you see it. Good argument for a common hookpoint like WRITE-STDOUT, isn't it?)

Once you throw on that Options: [Isolate] you don't wreak that havoc. But you'll start noticing your module went from having a handful of declarations in it for your exports, to having a definition for every single ANY-WORD! it so much as mentions.

I've just added some interesting code which lets the libRebol API detect at runtime what native it is being called from, and use that as a guide to what context to do bindings into. So a native registered as part of an extension module would be able to bind into that module's context...thus not being disrupted when other modules (or the user context) change the definition of something basic like rebRun("append"...) or rebRun("print"...)...and also using the specific overridden versions of words that particular module defined. But for this to work, the module must be using Isolate, which means it will get those tens of kilobytes of overhead by having its own copies of every WORD!, SET-WORD!, ISSUE!, etc. it references.

...?

It's unfortunate that people let things get this far down the road in Rebol lore without a body of clear-headed thinking about this. I take every chance I get to complain about it not having been a priority. :-/

What I've brought in is a lot of mechanical tools for solving binding problems--we already have specific binding and derived binding which are solutions to fundamental design holes. The same level of control that permitted the implementation of those features can be brought in to help here. So it's not a matter of execution--it's a matter of figuring out what the plan is.