Semantics and Optimization of Copying Function Bodies

hostilefork · January 16, 2024, 9:00am

Part of Rebol's design is that every execution frame has an order to its arguments, represented by an integer.

So for instance with IF:

the CONDITION would be in frame cell[1]
the BRANCH would be in frame cell[2]

NATIVE Functions Use The Ordering

Natives don't have to walk the keys of a frame, e.g. to find a slot that matches the symbol "BRANCH". They are hardcoded to look directly at the index it should be in.

(R3-Alpha hardcoded these indices, and you can see that as D_ARG(1) and D_ARG(2) in the implementation of IF. Ren-C puts the native specs in comments, processed during the build to make macros. These macros like ARG(condition) and ARG(branch) resolve to the integers at compile-time.)

Interpreted FUNC Functions Use Indices Too, But Differently

When a FUNC is being generated, a table is made that maps from symbols for the arguments and locals to the integer for that symbol in the frame.

Then the BLOCK! passed as a body is walked through to find ANY-WORD!s that reference the arguments and locals, and the index is poked into the cells for those words. Then, the binding pointer in the cell is set to point at the function definition.

That isn't enough information to look up the word--because the binding is relative to a function that is not running. Hence it's called relative binding. You need to somehow connect this to an instance of the function that is running and has its argument and local cells filled.

Historical Rebol & Red make this connection dynamically by climbing the stack and looking for an instance of the function seen in the binding. Clearly that doesn't permit indefinite extent (closure) over a local or argument, because when the function isn't running its variables can no longer be found. But it also can be wrong if recursions interact with material bound to various instances of the same function.

Ren-C uses a better mechanism of Specific Binding, where information is slipped through instances of BLOCK!s and other arrays, saying which frame that view of the array is executing on behalf of.

Conflict With New Policy of "Leave Binding Alone"

The bias in the new proposal of binding is that any WORD!s already bound will be left-as is, with only some minor surgery on environments of blocks done by things like FUNC or FOR-EACH, to inject some new variables at the head of the lookup.

That may make it sound like these words which are bound relatively inside function bodies are more trouble than they are worth. In the model, they're actually supposed to be thought of as unbound--so if they carry a secret binding, that could only be used as an optimization when they are actually intentionally combined with the right frame.

The Optimization is Actually Not Minor

Ren-C function frames are unusual, in that a single frame can be reused for several "phases" of a function composition. e.g. you can ADAPT and SPECIALIZE and AUGMENT a function--even a native--and all these just reuse the same frame. But during the phase, it only sees the fields of the frame it is supposed to be able to see.

For instance, if you specialize out the VALUE from append (e.g. by making it always append 5) and then try to ADAPT the resulting function, you won't even know that APPEND has a VALUE at all. It will seem like it only has a series.

Being strict about enforcing this information hiding permits you to do unusual things, like once the VALUE is shielded in the inner function... you can AUGMENT the resulting functions with another argument named VALUE. This process can proceed indefinitely. There could be dozens of fields named VALUE in the frame, but only one visible at a time.

So checking whether a value is visible in the frame is more involved than just walking a list of symbols and comparing them (which isn't necessarily fast in and of itself). The parameter definitions must be consulted also, to see if they line up with the running phase of the function in order to determine the visibility.

The time of function creation is a good moment to do this work, instead of going through it on every lookup. And squeaking performance out of pure virtual binding is going to be hard... we need all the tricks we can get.

Copying The Function Body

R3-Alpha actually had a trick during bootstrap, where while loading the library functions it would use a special version of FUNC that did not copy the function body blocks. It assumed none of them were composed out of parts that were exposed or would change, and if there were exceptions the library authors were supposed to be sophisticated enough to COPY or COPY/DEEP where needed.

(IIRC, it would contaminate these blocks by putting the relative binding information in them, so they would be seen as corrupt if anyone managed to get access to them.)

But after bootstrap was over, it would replace FUNC with a version that deeply copied the bodies.

It invites a lot of accidents in the historical world to have a FUNC that doesn't copy its body deeply. But there are some new possibilities that might be able to avoid accidents while still covering a lot of cases. For instance, it could deeply protect the blocks and make them immutable. This way, if a user ever found themselves bit by it they could add in their own COPY or COPY/DEEP at the relevant places. But 95% or more of the time, you'd not need to and the system could speed up.

In any case, it's interesting that the relative binding information wouldn't be corrupting the binding information any longer, because it's just an optimization for unbound values... and any PICKs or FOR-EACHs would see relatively bound words as unbound.

hostilefork · January 18, 2024, 2:56pm

If this optimization is going to work, then it also needs a complement to the "unbounds optimized to find their membership in the frame, if you ask"... which is "unbounds optimized to NOT find their membership in the frame, if you ask".

Consider this:

 y: <something>

 foo: func [x <local> a b c d e f g h i j ...] [  ; imagine costly to lookup
     code: copy [add x]
     append code to word! "x"
     do code
 ]

The [add x] block gets bound to the environment before it is passed to COPY. That environment can find X and Y and ADD and whatever else.

What we're talking about in the optimization is that when the function body was copied, the X of [add x] got a binding pasted on it to say "if you try and look X up in a frame instance for the foo function, here's the index of X...you don't have to look". The index and the pointer to the function definition can fit in the cell.

But that X is conceptually unbound. There's no reason the other appended X that has no binding whatsoever shouldn't be found just as legitimately as it. (I used TO WORD! "X" here instead of just 'X because quoted things would need to be labeled with the lookup optimization too...or at least not labeled with the negative optimization. Again, it's conceptually still unbound. But if you convert a string to a word at runtime, there's no optimization there.)

By extension, all unbound material needs to be considered in the search when the frame is in the specifier. You can't rule out looking to see if ADD is in the frame just because it doesn't have the optimization applied, for the same reason you can't rule out looking for the no-optimized-binding X.

So that's why I mention the negative hint. All unbound material that's not in the frame during the copy would be tagged with the action definition to say "don't bother looking me up in the environment for a frame instance of this action".

Dynamic material run with the frame context applied that's unbound won't have this information. But if it's unbound, there's no harm in having the lookup cache the information about whether the lookup succeeded or not. Only one frame lookup can win, so it would either need to be sticky with the first lookup done or re-cache (based on last lookup, or maybe if a frame comes along that's bigger than the one cached...)

Note that what this is replacing was a model by which bindings would be fully "hardened" inside the function body during the copy. So everything in the body would point directly (or relatively) at what it needed to look up to, giving speed in future runs (modulo the various "overbindings" that would override bindings for things like loop variables during a FOR-EACH, etc.)

So the example I cite would not work (it would work if you used 'x instead of to word! "x"):

 foo: func [x] [
     code: copy [add x]
     append code to word! "x"
     do code
 ]

>> foo 10
** Script Error: x word is not bound to a context

Should be able to make it work in the new model, without needing to explicitly bind the word to the frame instance.

Keeping the body mostly unbound across every run has nice properties, but it'll be slow without some tricks like this optimization in play.

...Another Argument For The `.member` Selection Notation

With binding being purely virtual, there's competition not just from multiple frames (e.g. nested functions) over the same cell's cache, but also lookups for object members.

This seems to make an interesting case for being able to tell from the value itself if it needs to be looked up in an object.

Without that:

 o: make object! [
     field1: <one>
     field2: <two>
     ...  ; imagine a non-trivial number of fields
     foo: method [arg1 arg2] [
         return reduce [arg1 field1 arg2 field2]
     ]
 ]

When that method runs, it gets a specifier for the object instance and for the frame of FOO. So if we label everything in the body that gets copied as either "unbound, but findable in foo frames" or "unbound, and NOT findable in foo frames"... that's used up all the cache space in the cells.

But if we know that .xxx will never look up to a function argument, but only in the members, the caching can be saved relative to the object prototype.

 o: new object [  ; something where FIELD1 and FIELD2 don't bind deeply
     field1: <one>
     field2: <two>
     ...  ; imagine a non-trivial number of fields
     foo: method [arg1 arg2] [
         return reduce [arg1 .field1 arg2 .field2]
     ]
 ]

This is on top of the benefit I've already mentioned of "giving you some chance of figuring out what is going on".

Note: Module Lookup Is Already (Sort Of) Optimized

The way module lookup works is that words hold a symbol, and that symbol points to a linked list of variable stubs--one for each module that defines that variable.

This wouldn't scale well for objects (think 10000 objects which all had a field "X", searching that list would be slow). But for modules it's fairly fast, as few modules define the same symbol. Also, once a word becomes bound it points directly to the stub for the variable.

(Just mentioning in case you were wondering why I'm fretting over lookup in frames and objects but not the cost of lookup in modules.)

hostilefork · January 25, 2024, 7:25am

I did a quick implementation of this and it does have significant effect.

Prior to the new binding model, just starting up the system and shutting it down was clocked at 480,631,721 "events"

After the first implementation with no optimizations, it was 598,362,973... so around 25% slower.

Just adding this optimization brought it to 524,280,372. So more like 10% slower.

The startup-shutdown test is just an easy thing to do, and more profiling would be necessary... but that's an encouraging result.