"Relative Binding and FRAME! Internals"

hostilefork · January 6, 2016, 5:00am

This is an old writeup that was on a GitHub wiki prior to the existence of the forum. We don't have intention to use GitHub wikis for anything, so it's probably best to move it here.

What is Binding?

In Rebol, "Binding" is property of ANY-WORD! values (like foo, foo:, :foo, etc.) that connects them to a storage location where that word's value may be "looked up". These binding locations are essentially dictionaries. They match the "Key" (the symbol value assigned to the word) with a slot of memory suitable for holding any kind of Rebol value, known as the "Var".

Ren-C formalizes the name for the category of values that may serve as binding targets (e.g. OBJECT!, PORT!, MODULE!, ERROR!) as being called "Contexts". It also adds another called FRAME!.

Specific Binding

The Ren-C term for bindings that connect a word persistently to a fixed variable slot is "Specific" binding. If the value is updated through one word with a specific binding, then another word with that specific binding will always see that update.

Bindings into OBJECT! are specific in R3-Alpha:

 r3-alpha>> obj: make object! [foo: 10]
 r3-alpha>> word: bind 'foo obj
 r3-alpha>> print get word
 10

 r3-alpha>> obj/foo: 20
 r3-alpha>> print get word
 20

The ability of blocks of code to retain the meaning behind their binding when passed to functions is a cornerstone of the function of Rebol. It is the reason why users can do things like write control constructs as functions, and use words that are the same as the words in code blocks passed to them to execute.

So above, word winds up holding a copy of the WORD! foo. That foo has been bound into obj, and now the content of obj is such that foo will look up to the value of 20 if it gets evaluated.

Continuing the example:

r3-alpha>> test: func [arg /local foo] [
    foo: 10
    code: compose [print foo + (arg)]
    probe code
    do code 
]

r3-alpha>> test word
[print foo + foo]
30

The sum comes from 10 and 20. Nothing had acted on arg to rebind it from the understanding of where it should look for the "var" it was was bound to when the caller set it. So two distinct values could be found through the binding.

"Dynamic" Binding in Rebol2/R3-Alpha

A FUNCTION! could serve as a binding target in both Rebol2 and R3-Alpha. But these functions were not "specifically" bound, they were "dynamically" bound. This meant no matter how many instances were running at a time, the words bound in that function would only look up their vars in the most recent invocation:

r3-alpha>> foo: function [x code] [
    append code [print x]
    if x > 0 [
        probe code
        do code
        foo (x - 1) code
    ]
]

r3-alpha>> foo 2 []
[print x]
2
[print x print x]
1  ; Notice this is not 2 :-/
1

This behavior may seem acceptable. One might even imagine it useful, if designing recursions of a function and wanting to take advantage of it somehow.

Yet from a language point of view, it undermines some of the most important code the system seeks to enable: generic code. Consider utilities like COLLECT, or writing one's own versions of control constructs, they are designed specifically to be combined in ways that have not been anticipated.

For instance, a WHILE loop is not that likely to use its own code in its implementation. But a user of a WHILE loop may well put other WHILE loops inside of it. One level of the WHILE will be on the stack when another comes later. If the code and bindings of one level have unexpected impact on another, composability is harmed.

R3-Alpha's CLOSURE!

Users of Rebol who were working on generic code and tools ran up against exactly the problem described. To address the lack of specific binding in function arguments and locals, CLOSURE! was created:

r3-alpha>> foo: closure [x code] [
    append code [print x]
    if x > 0 [
        probe code
        do code
        foo (x - 1) code
    ]
]

r3-alpha>> foo 2 []
[print x]
2
[print x print x]
2  ; compare with FUNCTION!, where this was 1
1

An additional feature of CLOSURE! was that the bindings would persist even when not running, something function couldn't do:

r3-alpha>> c: closure [x] [return [print x]]
r3-alpha>> c-code: c 3
[print x]

r3-alpha>> do c-code
3

r3-alpha>> f: function [x] [return [print x]]
r3-alpha>> f-code: f 3
[print x]

r3-alpha>> do f-code
*** ERROR
** Script error: x word is not bound to a context

The dark side of CLOSURE!

While CLOSURE! may seem like a sizable improvement, it wasn't adopted as a full replacement for FUNCTION!. The reason wasn't that FUNCTION's properties were generally desirable, but because of the relative inefficiency. How CLOSURE! worked was a brute force implementation: it was creating an OBJECT! context on every call, and then deep copying the body and rebinding it to that object.

To put in perspective what the efficiency differential is, this is essentially what CLOSURE! behaved like:

c: function [x-arg] [
    obj: make object! [x: x-arg]
    body: [return [print x]]
    bind/copy/deep body obj
    do obj 
]

That's a deep copy and rebind of the body on every call. With longer code that could be a lot of time. But not just time in invocation...the objects and all the series for the body would then become a burden on the garbage collector.

It was a simple and Rebol-like solution to the problem, but a difficult one for people to accept as the new implementation of FUNCTION!. It seemed unavoidable to have a "fast but broken" way and a "good but inefficient" way.

Ren-C sought to examine the problem to find a single specific-binding solution at acceptable cost, that could remove the cumbersome distraction of FUNCTION! vs. CLOSURE! from the user's awareness.

The "Unsolvable" Problem

What the "unsolvable problem" was is essentially the idea of it being impossible to get one bit pattern to mean two things at once, without some other context. Rebol's implementation model of a WORD! value was 4 things:

A header (saying it was a REB_WORD and not a REB_INTEGER, for instance)
A symbol number in a symbol table
A pointer to the context where the word was bound (or NULL if unbound)
An index caching the numbered context slot that symbol's value was in

The pattern of bits representing that word lived in the body of another pattern of bits representing the series (and all the other values around the word). With the implementation working how it did, with a single copy of the series in a function body you had to interpret the bits in one single way. You could pick the most recent function call, the oldest one, or pick one at random. But without some other clue of context, those four pieces of information meant effectively one thing.

For the "statement of the obvious": any plan for specificity in such a situation must involve creating a kind of function-relative binding. The relatively bound words would lay dormant in the body, then become made specific when re-assembled with context information that would come from...somewhere else.

If the "somewhere else" is the current stack state, then that won't give a word a stable identity. The "somewhere else" has to encode a specific function invocation. The question was always how to make the association of a function invocation with that word lighter than CLOSURE!'s "copy the whole body and thus make new words every call". :-/

A New Kind of Context: FRAME!

If a piece of context information needed to be provided and joined with the fixed information representing a relative word in a series, that context needed a name. It took the name FRAME! (which had been reclaimed from an internal type in R3-Alpha that was deleted from Ren-C.)

The idea for a frame would be that it would be a binding target, and it would have the same lifetime as a function invocation. So unlike the words from CLOSURE!, you would not be able to look them up after a function call...because the data for the arguments on the stack would have been cleared away. However, the lookups would neither crash nor accidentally reactivate to random data due to recycled memory or future calls...a word that had wound up bound to a FRAME! that terminated would be an error for as long as references continued to exist to it.

This implied that FRAME!'s implementation shouldn't (and couldn't) just be the call stack pointer for the function invocation from when it was alive. It would have to be a separate entity, and it would have to participate in the garbage collector. But it could likely be a very light entity, that perhaps even collapsed to the size of a pointer or two in a memory pool while it awaited GC.

(Note: Some nuanced techniques changed this seeming restriction as well. A common header format which could distinguish pointers to C struct pointers to "frame invocations" from a pointer that had been "reified" as a GC-managed context assisted in limiting the range of reifications to when stack-formatted cells were escalated to either indefinite lifetime cells or stack-formatted cells at a higher stack level. It sounds more complicated than it is, but even though it's not that profound it is still beyond the scope of intent of this particular wiki page.)

From Relative to Specific

If the bindings of words representing function arguments and locals were relatively bound to a function, it would be bringing that together with a FRAME! that would make them specific. These specifically bound words couldn't be put into the series representing the body of the function because it needed to be used by other invocations. So instead it was threaded through the call stack as a supplementary pointer.

This introduced a new dichotomy. There would now be "Cells" ...relativized array elements that hadn't gone under scrutiny for whether they needed to be reunited with a frame to be meaningful. And once specified they would become specified "Values". The specification process would begin at the root of the execution of a function body, and as the process of digging in with recursive evaluation happened, this would progressively reify the function-relative cells into frame-specific cells in stack variables...one layer at a time.

Enter C++11 for static analysis

One of the tools in Ren-C's arsenal is having been brought up to speed for building under compilers like ANSI-C 89 as well as under C++11 and beyond. Although it involves know-how and contorting C++ in funny ways, it is possible to build constructs that allow the adding of very sophisticated checks to C programs without necessarily requiring them to change the source as its compiled in C89.

Having the tool and the experience available, it seemed the only way to retrofit a codebase like Rebol for a feature this drastic would be with the help of a C++ build pointing out any flaws. But the goal was to make the changes and rules reasonable enough that they could still be reasonably modified under C, just more manually.