"Member Functions" (in the era of Pure Virtual Binding)

Let's Start With "Member Variable" Binding

The properties of historical Rebol binding created a bizarre environment in which to implement objects with methods and member variable references.

There's a pretty basic question one can ask:

You: "It appears that every WORD! can bind to at most one object containing that word's symbol. So if I write a FUNC intended to act as a member function... what are the member word references bound to in that function's body, that allows them to apply the code to the field values of a new instance of the object?"

To make a long story short:

Redbol: "For every field of an object that holds a FUNCTION! value, making a new object instance will clone a new FUNCTION! value for the associated field in the copy. The original function's body is copied deeply, with any references to the original object's fields rebound to the field of the new object."

This creates a pathological explosion. Make an object with 20 methods, each of which has a body that is built up of... say... 10 blocks on average. Now instantiate 10,000 of those objects.

Beyond the basic overhead you'd expect from a new object copy, you just allocated 200,000 new function identities... plus two million nearly-identical arrays for their bodies... solely for the purpose of accomplishing what a single this or self pointer does in most OOP languages.
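
For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch in Python. The variable names are mine, and the counts are just the hypothetical ones from the example above (20 methods, roughly 10 blocks per body, 10,000 instances), not measurements from any real codebase:

```python
# Rough cost of cloning every method for every instance.
methods_per_object = 20
blocks_per_body = 10      # assumed average number of arrays per function body
instances = 10_000

# One new function identity per method, per instance.
function_identities = methods_per_object * instances

# Each cloned function deep-copies every block in its body.
body_arrays = function_identities * blocks_per_body

print(function_identities)  # 200000
print(body_arrays)          # 2000000
```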

Those with non-trivial codebases (e.g. Atronix) who at first wrote code in this style...

obj: make object! [
   x: 10
   f: func [y] [return x + y]
]

>> obj/f 20
== 30

...were forced to "de-methodize" it, passing the instance into a "free function":

free-f: func [o y] [return o/x + y]

obj: make object! [
   x: 10
]

>> free-f obj 20
== 30

Worth Noting: Rebol's Central Aesthetic is "free functions"

Rebol's style of data mutation is done as "generics", e.g.:

append block [1 2 3]
;
; ... NOT block/append [1 2 3]

This dynamically chooses the right kind of APPEND procedure based entirely on the type of the first argument (blocks, strings, etc.)

But you can't implement this generic for your own objects. Red doesn't let you APPEND to OBJECT! at all. In R3-Alpha, if you APPEND to an OBJECT! it just adds fields, rather than delegating to any "append implementation" associated with some kind of object "class":

r3-alpha>> obj: make object! [a: 10]
== make object! [
    a: 10
]

r3-alpha>> append obj [b: 20]
== make object! [
    a: 10
    b: 20
]

I will mention that if you had a PORT!, curiously there was a way to supply an "actor" to PORT!s that could implement a small set of "port actions". For an example of this, see the original R3-Alpha ODBC extension:

r3-odbc/src/boot/odbc.r3 at c15c70d61a2f5c39cb01f7c685c4310d4ee987de · gurzgri/r3-odbc · GitHub

For the sake of documenting history: the "generic verbs" you can implement on an R3-Alpha PORT! are called "port actions", and are limited to this arbitrary set:

CREATE, DELETE, OPEN, CLOSE, READ, WRITE, OPEN?, QUERY, MODIFY, UPDATE, RENAME

So even that wouldn't help you with APPEND.


Were "Classes" Ever On The Table in R3-Alpha?

On Carl's blog, circa 2006, he wrote "At a CLASS Crossroads?":

He said:

"And, on top of all this, I've yet to mention the fact that REBOL 3.0 is likely to support object methods to make object function implementation more efficient (in memory usage). The implementing of methods in REBOL has always been considered difficult (mainly because there is no referential anchor for a group, a class, of objects -- they are prototypical and can be cloned from each other, not just a single parent). A class-based approach makes the implementation of methods much easier."

The post wasn't without criticism in the comments, e.g. from "Pierre Johnson":

"Class-based inheritance for REBOL? Why not just kill the product?"

"Class-based Aristotlean socialist thinking abounds in academic-based / near academic-based junk -- Java, C++, C#, Python."

"While not elegant, perhaps the #1, most used programming language of the world is Javascript. Why? Simpler, rapid prototyping using prototypes."

You didn't really hear anything about it after that. R3-Alpha got pretty bogged down just trying to be Rebol2 plus Unicode.


How About Red? Any Ideas There?

When OBJECT! was first added to Red circa 2014, DocKimbel wrote the blog post "0.5.0: Objects support", and said (emphasis mine):

"Red implements the same object concept as Rebol, called prototype-based objects. Creating new objects is done by cloning existing objects or the base OBJECT! value. During the creation process, existing field values can be modified and new fields can be added. It is a very simple and efficient model to encapsulate your Red code. There is also a lot to say about words binding and contexts, but that topic is too long for this blog entry, we will address that in the future documentation."

Uh... I think I've said above pretty much all there has historically been to say... and it only took me... maybe an hour?

The big idea in Red is essentially to implement the "setters" half of "getters and setters". If you have a field in an object with the precise name ON-CHANGE*, it will get the name of a field... the old value... and the new value it takes on. A lot of the code they seem to be interested in uses that instead of methods: just write a value to a field, and code executes. If you can get your work done with just that, you won't have method binding problems.

Although, presumably every copy of your object duplicates the ON-CHANGE* method itself?

red>> obj: make object! [
    a: 10
    on-change*: func [word old new] [print [word old new]]
]

red>> obj/a: 20
a 10 20

red>> obj2: make obj [a: 30]
a 20 30  ; note obj2 sees creation w/new field value as a "change"

red>> body-of :obj/on-change*
== [print [word old new]]

red>> append body-of :obj/on-change* [print "that figures."]
== [print [word old new] print "that figures."]

red>> body-of :obj2/on-change*
== [print [word old new]]

I take it back, this is way too complicated to address in a blog post. It's too deep. :roll_eyes:


"Okay, Ren-C... What Have You Got?"

Building up to the existence of Pure Virtual Binding II, Ren-C just kind of got waves of new tricks:

  • First there were "definitional returns", which gave each function a cell to store a local RETURN function. While the behavior was unique to each function, no actual new identity was allocated. Instead a slot in the 4-platform-pointer cell (known as the "coupling") was used to hold the FRAME! of the instance of the running function...so it was efficient at doing so.

  • Then there was "(function-)relative binding". This started a separation in the type system of the interpreter code distinguishing a "Cell" (which hadn't had its binding fully resolved) from a "Value" (which did have a resolved binding). Each function instantiation would slowly trickle down the FRAME! of the function during evaluation, such that the "relative" Cells would have to be paired with that frame before they could be passed to a routine doing non-structural Lookup to follow the word to a "specific" Value.

  • This laid the foundation for what I called "derived binding", which actually mixed the two approaches together. Just as any particular local variable named RETURN could hold a FRAME! in a cell's "coupling" slot (while reusing the same canon RETURN implementation), you could use that technique to store an OBJECT! in any function's coupling slot. So the function call would not only trickle down the frame, but also this object... so there would be two contexts that were searched during derelativization.

So under derived binding, making a new object doesn't require either a new function identity or a deep copy. It just means that the new object receives its function cells with an effective "this" pointer in one of the 4 platform-pointer-sized slots. That pointer is stowed in the FRAME! and trickles down via the cells instantiated by the evaluator as it descends the relativized arrays.

Originally, Derived Binding Did Not Blindly Override...

Here's something that didn't work in derived binding as you'd expect given what I described:

old-ren-c>> x: 42

old-ren-c>> obj: make object! [x: 10, f: null]

old-ren-c>> obj2: make obj [x: 20, f: null]

old-ren-c>> obj2.f: func [] [print ["x is" x]]

old-ren-c>> obj2.f: couple :obj2.f obj2

old-ren-c>> obj2.f
x is 42

It didn't work because this function was defined outside the scope of the MAKE OBJECT!, meaning X was bound to the outer X holding 42. When applying derived binding, Ren-C wouldn't do any overriding of any WORD! references that weren't bound to an object that was in the same "inheritance chain".

You'd have to do something like this to see the effect:

old-ren-c>> x: 42

old-ren-c>> obj: make object! [x: 10, f: func [] [print ["x is" x]]]

old-ren-c>> obj2: make obj [x: 20, f: null]

old-ren-c>> obj2.f: couple :obj.f obj2

old-ren-c>> obj2.f
x is 20

Here, the X was deep walked during the MAKE OBJECT! and bound to OBJ's X. When derived binding went to run the rebound function in OBJ2, it noticed that the binding to X was to a relative of OBJ2 in the inheritance chain... so it was willing to "forward" that binding to the object in the FRAME! it was relativizing against.

Note that the sort of virtual "this" pointer is not coming from the fact that OBJ2 is on the left hand side of the instantiation of the call to OBJ2.F -- it's solely coming from the coupling slot in the cell stored in that field. You could make OBJ2.F be derived-bound to anything you want.

And now I can tell you that if you had used METHOD instead of FUNC... and hadn't said (f: null) when making OBJ2, then the above behavior is what you would get automatically. The MAKE OBJECT! would simply notice when any of the fields it was copying had a stowed function with a coupling slot of itself, and update the cell in the new object with a coupling pointer to the new object.

(All METHOD is, is an enfix operator that steals the binding from the SET-WORD! on its left to poke into the cell of the generated function.)
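
To make the derivation rule concrete, here is a minimal Python model of it. This is only a sketch of the idea as described above, not Ren-C's actual implementation: `Cell`, `derive`, and the use of plain dicts for objects are all hypothetical stand-ins I made up for illustration.

```python
class Cell:
    """Hypothetical model of a function cell: shared action + coupling slot."""

    def __init__(self, action, coupling=None):
        self.action = action      # one function identity, shared by all cells
        self.coupling = coupling  # the effective "this" for calls via this cell

    def __call__(self, *args):
        return self.action(self.coupling, *args)


def derive(parent, overrides):
    """Model of deriving an object: copy the fields, then retarget methods."""
    child = dict(parent)
    child.update(overrides)
    for name, value in list(child.items()):
        # Only cells coupled to the parent itself get re-coupled to the child.
        if isinstance(value, Cell) and value.coupling is parent:
            child[name] = Cell(value.action, child)
    return child


def f(this, y):
    return this["x"] + y


obj = {"x": 10}
obj["f"] = Cell(f, obj)
obj2 = derive(obj, {"x": 20})

print(obj["f"](20))                         # 30
print(obj2["f"](20))                        # 40
print(obj["f"].action is obj2["f"].action)  # True
```

The point of the model is the last line: the derived object gets a new cell, but no new function identity and no copied body.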

But With Virtual Binding II, The Object Wins (Mostly)

We're not in binding Kansas anymore... there is no "deep walk" when the FUNC is created that gives X an initial binding of 10. As of this moment, bindings aren't ever overridden (hole punching is still on the table, but let's not go there right now...)

>> x: 10

>> obj: make object! [x: 20, f: null]

>> obj.f: func [] [print ["x is" x]]

>> obj.f: couple :obj.f obj

>> obj.f
x is 20

I actually didn't know what the following would do until I tried it:

>> obj.f: func [x] [print ["x is" x]]

>> obj.f: couple :obj.f obj

>> obj.f 30
x is 30

I guess locals to the function are looked up first, before member variables. Maybe I meant to do that (C++ does it that way).
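
The search order those two experiments suggest can be modeled in Python with a `ChainMap`. This is just a sketch of the observed behavior, assuming the order is "function frame, then coupled object, then the outer environment"; the `lookup` helper and the dict-based environments are my own hypothetical names, not anything in the interpreter.

```python
from collections import ChainMap


def lookup(word, frame_locals, coupled_object, outer_vars):
    # Assumed order: the function's own frame, then the coupled object,
    # then whatever the word would have seen outside of both.
    return ChainMap(frame_locals, coupled_object, outer_vars)[word]


outer_vars = {"x": 10}
obj = {"x": 20}

# func [] [...]: no local X, so the coupled object's X wins over the outer X.
print(lookup("x", {}, obj, outer_vars))         # 20

# func [x] [...] called with 30: the argument shadows the member variable.
print(lookup("x", {"x": 30}, obj, outer_vars))  # 30
```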


What Have We Learned?

  • It seems Rebol could use a better way of doing multiple dispatch so that you could write things like append my-object [a b c] and get custom behavior for the "class".

    • The aesthetic of the system is supposed to promote that kind of free-function syntax

    • It doesn't make a lot of sense that you can only do this with PORT! and only for a handful of "generic verbs"

  • But to the extent people have wanted to do traditional obj/member calls (in Ren-C obj.member), the historical implementation of the idea was catastrophically bad

    • Without changing the overall usage experience (much), Ren-C wiped out the pathological implementation aspects

    • While Ren-C may be on the whole slower than its peers for many things at the moment, derived binding is one area where it blows them away.

      • The "O-Big" example I came up with is simply not runnable in R3-Alpha or Red.

Something that bothers me is...

...this derived binding may use a very optimized implementation compared to R3-Alpha or Red. But it's still effectively doing something that feels "dirty" when you make a derived object and it feels the need to make decisions to go around adjusting cells so they are effectively different functions.

What if you wanted to grab a function that was relativized against an object and put it into another object, and have its object coupling "stick"? You were just moving a piece of hardened functionality around to act like a function, but it re-relativizes it.

Almost worse to me is that you only get the re-relativization on derivation. Not because that's conceptually the best time to do it, but because it would be even more dirty code if you had to hook every assignment.

In light of the above I have frequently asked myself: "why NOT get the 'this' pointer from the left hand side???"

Then the system won't have to go around tweaking function cells without consulting you. And if you ever explicitly put a coupling to a specific object in a function cell, that never gets overridden by the system. It's assumed you meant to create a persistent coupling of that method with that object--at which point it can be called with no left-hand side (if you like).

But...Not Every Function In An Object Is A Method Of That Object

Let's say I write some kind of emitter:

 msg: "I am emitting:"

 my-emitter: func [item] [
     print [msg (mold item)]
 ]

Now let's say you have some overall state object which lets you provide a handler:

 p: make processor [factor: 10]
 p.emit: :my-emitter
 process p [1 2 3]

You're hoping to get:

 I am emitting: 10
 I am emitting: 20
 I am emitting: 30

But instead you are surprised to get:

** Script Error: PROCESSOR => 10

Looking at the code, you can see why:

 processor: make object! [
     factor: null
     subsystem: "PROCESSOR"
     msg: func [text] [fail [subsystem "=>" text]] 
     emit: null
 ]                         

 process: func [p [object!] data [block!]] [
      if not integer? p.factor [p.msg "FACTOR must be INTEGER!"]
      if not action? :p.emit [p.msg "EMIT must be ACTION!"]
      for-each item data [
          p.emit item * p.factor
      ]
 ]

If p.emit automatically injects P into EMIT assuming it's a method of P, then MY-EMITTER will see MSG as it is defined in P.

Well, What If Member Variables Were Annotated?

So if we are always "relativizing against" the left hand side, it's clearly a problem that we're just assuming any old plain WORD! is supposed to be a member.

What if for METHOD there was a 'this' (or 'self', as in R3-Alpha), and it was initialized to the invocation object? This could come either from the function cell, or if not in the cell then the left hand side.

 msg: method [text] [fail [self.subsystem "=>" text]] 

Some drawbacks:

  • It does mean we've either created a second class of function (or we give in and just give all FUNCs a SELF regardless of whether you're going to use it)

  • On a tiny function like this it may not look bad, but I absolutely HATE seeing this and self noise all over a method's code (Rust drives me up the wall where it can be every other word in the function).

  • We're kind-of creating a keyword, and I don't like that

So I've wondered: what if we gave a BLANK!-headed tuple the meaning "pick out of the relativized invocation object"?

>> to tuple! [_ subsystem]
== .subsystem

It's light, it doesn't use any keywords, and would be a vast help in reading code to tell what's a member access and what is not.

msg: func [text] [fail [.subsystem "=>" text]] 

It might help to still have to flip some flag with a FUNC vs METHOD distinction to get it. I don't know.
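
Here is a small Python sketch of the rule being proposed, applied to the emitter example. It is only a model under my own assumptions: `resolve` is a hypothetical helper, tokens are plain strings, and the two environments are dicts standing in for "where the function was defined" and "the object it was invoked through".

```python
def resolve(token, definition_env, invocation_object):
    """Hypothetical rule: '.name' reads the invocation object, plain words don't."""
    if token.startswith("."):
        return invocation_object[token[1:]]
    return definition_env[token]


module_env = {"msg": "I am emitting:"}
p = {"subsystem": "PROCESSOR", "msg": "<p's own msg function>"}

# A plain MSG inside MY-EMITTER keeps meaning what it meant where it was written.
print(resolve("msg", module_env, p))         # I am emitting:

# Only an explicitly dotted reference picks out of the invocation object.
print(resolve(".subsystem", module_env, p))  # PROCESSOR
```

Under this rule, putting MY-EMITTER into P and calling it through p.emit can't silently hijack its MSG, because nothing undotted is ever looked up in P.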

But... It's Not a WORD! Anymore

Historically there have been many places where you could put a WORD! but could not put a member selection (historically, via PATH!)

Just one small example: consider this simple Red code today:

red>> a: none
red>> parse [1] [set a integer!]
red>> a
== 1

red>> obj: make object! [b: none]
red>> parse [1] [set obj/b integer!]
*** Script Error: PARSE - unexpected end of rule after: set

Generally speaking I don't think this is a good argument against it. A good Ren-C dialect should be willing to take a TUPLE! pretty much anywhere it takes a WORD! (in the event it is intended to be a variable reference).

So if anything it's an argument for it. And it makes it rather nice that .foo is indeed a reduced case of TUPLE! which you should already be handling...instead of some other novel datatype.

And with pure virtual binding, you could always import it, using some construct or another.

msg: method [text] [
   use .  ; imports all fields to top of context stack
   fail [subsystem "=>" text]  ; dot no longer required
]

Maybe the construct would let you import just what you wanted as a WORD!:

msg: method [text] [
   use .subsystem
   fail [subsystem "=>" text]
]

But BLANK!-Headed TUPLE! And PATH! Were Previously Inert...

In order to match historical behavior of "refinement!"s, BLANK!-headed paths e.g. /foo are inert. So BLANK!-headed tuples got the same treatment.

The actual logic was "any path or tuple with an inert first element is inert". Hence why 192.0.0.1 is inert instead of a "cannot pick 0 out of 192" error.

Ah well. Symmetry is boring, anyway....

Odd is our god
So keep it uneven
Our talent's imbalance
It's what we believe in

GET-TUPLE! Can't Automatically Poke The Cell's Field, Right?

When we think about doing something like:

extracted: :p.emit

That should leave the function alone, right...vs. harden a reference to the P object inside the cell of the action for EXTRACTED?

It doesn't seem like a bad thing to have to be explicit about hardening an extracted method.

extracted: couple :p.msg p  ; or whatever

extracted ...  ; now you can call it with no left hand side

I'm Curious Enough To Try It

As is frequently the case, I write a long post and then start messing with the implementation and find problems immediately.

But this is a line of thinking that I've had for a long time... driven both by the near-unreadability of things (like Rebmake) which don't call out member field accesses in giant functions... as well as a strong distaste for the "auto-twiddling" that happens to functions in derivation.

Pure Virtual Binding II now brings this idea within reach. So I'm going to give it a shot. Clearly there's going to be some bit of new implementation magic around to allow a tuple like .foo to carry the frame with it when bound, despite not having a word (as OBJ in obj.foo) to carry it on.

A first tentative step showed some success, and allowed the deletion of ugly cruft while still working on complex examples! That's always a win.

But when you're writing freeform code inside of a MAKE OBJECT!, it can be easy to forget to use .field instead of just field. This is because at time of writing, everything in the object spec block sees the words of the object being created.

e.g. if you say:

obj: make object! [
    x: 10
    repeat 3 [
        x: x + 1
        print ["x is" x]
    ]
]

That will give you:

x is 11
x is 12
x is 13
== make object! [
    x: 13
]

The broad and deep availability of the object's fields inside of this can be very cool sometimes. But a lot of the time it is a pain... you often only want the top-level SET-WORD!s to target the object being created, leaving plain WORD!s and anything deeper alone:

make-point: func [x [integer!] y [integer!]] [
    return make object! [x: x, y: y]  ; can't do this, x and y are unset!
]

make-point: func [x [integer!] y [integer!]] [
    return make object! compose [x: (x), y: (y)]  ; works, but ... annoying
]

So I'm gathering we need a stricter construction operation that doesn't deeply fog up the bindings inside the block. Essentially, semantics like this, where all the code that isn't a top-level SET-WORD! acts like it isn't inside the object's "scope" at all:

obj: simple-construct [x: <expr1> y: <expr2> z: <expr3>]
=>
obj: make object! [x: y: z: null]
obj.x: <expr1>
obj.y: <expr2>
obj.z: <expr3>

This way, if you define a function inside a SIMPLE-CONSTRUCT you can't fail to use the .field reference style for the members.
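
The difference between the two construction styles can be sketched in Python. Again this is a model under stated assumptions rather than how either construct is implemented: objects and environments are dicts, each field's expression is a lambda taking the environment it is evaluated in, `None` stands in for "unset", and `make_object_style` / `simple_construct` are hypothetical names.

```python
from collections import ChainMap


def make_object_style(spec, outer):
    """Expressions see the new object's fields first (historical MAKE OBJECT!)."""
    obj = {name: None for name, _ in spec}
    for name, expr in spec:
        obj[name] = expr(ChainMap(obj, outer))
    return obj


def simple_construct(spec, outer):
    """Expressions see only the outer environment, never the new object."""
    obj = {name: None for name, _ in spec}
    for name, expr in spec:
        obj[name] = expr(outer)
    return obj


# Models the body of make-point called with x = 10 and y = 20.
args = {"x": 10, "y": 20}
spec = [("x", lambda env: env["x"]), ("y", lambda env: env["y"])]

print(make_object_style(spec, args))  # {'x': None, 'y': None}
print(simple_construct(spec, args))   # {'x': 10, 'y': 20}
```

In the first case the new (still unset) fields shadow the function's arguments, which is the make-point problem above. In the second the arguments are seen as-is.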

I've pointed out before that what MAKE OBJECT! does is high-level and weird, and needs to be broken up into component operations. And I'm now suspecting that FENCE! should have the SIMPLE-CONSTRUCT semantics, such that the following would work:

make-point: func [x [integer!] y [integer!]] [
    return {x: x, y: y}
]

Not that doing-as-JavaScript-does is much of an argument for anything, but I'll point out that does work in JS:

function make_point(x,y) { return {x: x, y: y} }

>> make_point(10,20)
<- Object { x: 10, y: 20 }

Looking back on history, I actually see the motivation for why CONTEXT used to be a synonym for what MAKE OBJECT! does in Rebol2/R3-Alpha/Red:

context: func [
    "Defines a unique (underived) object."
    blk [block!] "Object variables and values."
][
    make object! blk
]

If you write something like:

obj: context [foo: 10, print ["foo is" (foo: foo + 1)], bar: 20]

That actually makes clearer sense, in that it tells you what you're doing is setting up a context in which code will run. I didn't like it because CONTEXT was a noun. Maybe CONTEXTUALIZE?

obj: contextualize [foo: 10, print ["foo is" (foo: foo + 1)], bar: 20]

But it makes sense that there's a verb in this variation that takes in a BLOCK!, because there are high odds you will need to COMPOSE it if there are any collisions between names in the context and names outside of it that you want to reference.

Perhaps we should start looking at MAKE as being a truly high-level builder, not a low level one. I have had some thoughts about that, even going back to variadicness (via Ren-C mechanisms), perhaps like:

 foo: make function [arg] [print ["arg is" arg]]

 obj: make object [foo: 10, print ["foo is" (foo: foo + 1)], bar: 20]

MAKE could quote the word after it, and this doesn't look too bad. You could still specialize the non-variadic bits today:

 func: specialize :make [name: 'function]

Partially specializing the variadic portion would be less easy, so perhaps there would be make-function and make-object available, which are dispatched to by the internals of MAKE's syntax sugar.

Tangents aside though--this does look like some of the haze surrounding this may be dissipating. If coming up with names for things winds up being the last bit to solve, we'll be quite lucky.