Simple Objects vs. What The People Want

Ren-C has a more streamlined version of how R3-Alpha implemented simple OBJECT!s, but it's really mostly the same (though MODULE! has changed significantly).

An OBJECT! is just two parallel lists, which I have called the "keylist" and the "varlist".

So if you say something like:

obj: make object! [
    x: 1 + 2
    y: 10 + 20
]

You will get:

keylist: {symbol(x) symbol(y)}
varlist: [*V0* 3 30]

The first slot in a varlist is used for some tracking information. So:

  • keylist[0] is the key for varlist[1]
  • keylist[1] is the key for varlist[2]
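To make the layout concrete, here's a tiny model of that scheme in JavaScript (the real implementation is in C; makeObject and lookup are just illustrative names, not actual API):

```javascript
// Hypothetical model of an OBJECT! as two parallel lists.
// Slot 0 of the varlist holds tracking information, so the value
// for keylist[i] lives at varlist[i + 1].
function makeObject(keys, values) {
    return {
        keylist: keys,                       // e.g. ['x', 'y']
        varlist: ['<tracking>', ...values]   // e.g. ['<tracking>', 3, 30]
    };
}

function lookup(obj, name) {
    const i = obj.keylist.indexOf(name);
    if (i === -1) throw new Error(`${name} not found`);
    return obj.varlist[i + 1];  // +1 to skip the tracking slot
}

const obj = makeObject(['x', 'y'], [3, 30]);
console.log(lookup(obj, 'x'));  // => 3
console.log(lookup(obj, 'y'));  // => 30
```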

You Get A New Keylist With Every MAKE OBJECT!

Nothing in the system goes around looking for common patterns in your object creation to notice that you've made several objects with the same keys.

collect [
    count-up i 1000 [
        keep make object! [x: i * 10, y: i * 20]
    ]
]

You just made 1000 objects, and every one of them has its own copy of the keylist {symbol(X) symbol(Y)}. Ren-C reduced this overhead to less than a quarter of what it cost in R3-Alpha, but it's still kind of lame.

The only way you avoid making a new keylist is if you do object inheritance.

point!: make object! [x: y: null]
collect [
    count-up i 1000 [
        keep make point! [x: i * 10, y: i * 20]
    ]
]

This time, there are 1000 objects all sharing a single keylist.
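In the same hypothetical JavaScript model, the sharing amounts to every instance holding a reference to one key array (makePoint is an illustrative name, not real API):

```javascript
// Hypothetical sketch: deriving from a "prototype" object reuses
// its keylist by reference, so 1000 instances share one key array.
const pointKeylist = ['x', 'y'];

function makePoint(x, y) {
    return { keylist: pointKeylist, varlist: ['<tracking>', x, y] };
}

const points = [];
for (let i = 1; i <= 1000; i++) {
    points.push(makePoint(i * 10, i * 20));
}

// Every instance points at the *same* keylist object:
console.log(points[0].keylist === points[999].keylist);  // => true
```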

If you expand the keys at all, that results in a new keylist... so you spoil the optimization if you put anything additional in your derived object:

point!: make object! [x: y: null]
collect [
    count-up i 1000 [
        keep make point! [x: i * 10, y: i * 20, z: i * 30]
    ]
]

There's no inheritance mechanism that makes use of the common sublist. So this puts you at 1001 keylists, because the keylist of the original point! never gets reused by any of the derived objects.

Object Expansion Via APPEND Disconnects Shared Keylists

R3-Alpha allowed you to add fields to an object. If you did so, you would lose any sharing that it had taken advantage of before.

p: make point! [x: 10 y: 20]  ; reuses point!'s keylist
append p [z: 30]  ; oops, not anymore... it gets its own keylist
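A sketch of that disconnection in the same hypothetical model: expansion copies the shared keylist before adding the new key (appendField is an illustrative name):

```javascript
// Hypothetical sketch: expanding an object copies its (possibly
// shared) keylist before adding the new key, so sharing is lost.
const pointKeylist = ['x', 'y'];
const p = { keylist: pointKeylist, varlist: ['<tracking>', 10, 20] };

function appendField(obj, name, value) {
    obj.keylist = [...obj.keylist, name];  // private copy with new key
    obj.varlist.push(value);
}

appendField(p, 'z', 30);
console.log(p.keylist === pointKeylist);  // => false (own keylist now)
console.log(p.keylist);  // => ['x', 'y', 'z']
```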

Comparisons Are Difficult

Because there's no global mechanism for canonizing keylists, you get entirely different-looking objects by creating the fields in different orders.

obj1: make object! [x: 10 y: 20]
obj2: make object! [y: 20 x: 10]

Historically, these objects have been considered not equal, because comparisons are done by walking the fields in order. So obj1 <> obj2 in this case.

However, if you create an object via inheritance so it shares a keylist, that will standardize the order of the fields:

point1: make point! [x: 10 y: 20]
point2: make point! [y: 20 x: 10]

Here we will have point1 = point2, since their shared keylist forces the order of x and y to whatever it was in POINT!.
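The order sensitivity is easy to see in the hypothetical model: positional comparison walks both keylists in lockstep, so only a shared (or identically ordered) keylist gives equality (equalPositional is an illustrative name):

```javascript
// Hypothetical sketch of positional comparison: walk both keylists
// in order, so the same fields in different orders compare unequal.
function equalPositional(a, b) {
    if (a.keylist.length !== b.keylist.length) return false;
    return a.keylist.every((key, i) =>
        key === b.keylist[i] && a.varlist[i + 1] === b.varlist[i + 1]);
}

const obj1 = { keylist: ['x', 'y'], varlist: ['<t>', 10, 20] };
const obj2 = { keylist: ['y', 'x'], varlist: ['<t>', 20, 10] };
console.log(equalPositional(obj1, obj2));  // => false

// With a shared keylist the field order is forced, so they're equal:
const shared = ['x', 'y'];
const point1 = { keylist: shared, varlist: ['<t>', 10, 20] };
const point2 = { keylist: shared, varlist: ['<t>', 10, 20] };
console.log(equalPositional(point1, point2));  // => true
```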

There Are Fancier Ways Of Dealing With This

If you're willing to say that the order of keys in objects shouldn't matter... then you can rethink the data structures to exploit commonalities in the patterns of keys that are created.

The V8 JavaScript engine approaches this with Hidden Classes.
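Very roughly, the hidden-class idea is that each "shape" maps key to slot and records transitions, so objects built by adding the same keys in the same order converge on one shared shape. A simplified sketch (not V8's actual implementation):

```javascript
// Simplified hidden-class sketch: shapes form a transition tree.
const emptyShape = { slots: {}, count: 0, transitions: {} };

function addKey(shape, key) {
    if (!shape.transitions[key]) {
        shape.transitions[key] = {
            slots: { ...shape.slots, [key]: shape.count },
            count: shape.count + 1,
            transitions: {}
        };
    }
    return shape.transitions[key];  // shared by all who take this path
}

function makeObj(pairs) {
    let shape = emptyShape;
    const values = [];
    for (const [key, value] of pairs) {
        shape = addKey(shape, key);
        values[shape.slots[key]] = value;
    }
    return { shape, values };
}

const a = makeObj([['x', 1], ['y', 2]]);
const b = makeObj([['x', 3], ['y', 4]]);
console.log(a.shape === b.shape);  // => true: both share one shape
```

The payoff is that a property access can cache "key y is slot 1 for this shape" instead of searching a per-object keylist.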

But there's really always some other way of approaching the problem. The way modules work in "Sea of Words" is an example of a structure that seems to work reasonably well for modules--but wouldn't work as well for lots of little objects.

Today's FRAME! Depends On This Non-Fancy Way

Right now, when a native runs it does so with a concept of the order of the arguments and refinements that gets baked into the C code directly. IF knows that the condition is argument 1 and that the branch is argument 2, and it looks directly in slots 1 and 2 of the varlist of the frame to find those variables.

This is pretty foundational to the idea of the language, and is part of what gives it an appealing "simple-ness".

Ren-C has come along and permitted higher level mechanisms like specialization and adaptation, but everything is always getting resolved in a way that each step in a function's composition works on putting information into the exact numbered slot that the lower levels expect it to be in.

Binding Has Depended On This Non-Fancy Way

A premise in Rebol has been that you can make a connection between a variable and an object that has a key with the name of that variable, and once that connection is made it will last. This rule is why there's been dodginess about deleting keys in objects or rearranging them... and why R3-Alpha permitted adding new variables but not removing any.

obj: make object! [x: 10 y: 20]
code: [x + y]
bind code obj

If you write something like the above, you are annotating the X inside of CODE with (obj field #1), and the Y inside of CODE with (obj field #2). So nothing can happen to OBJ that would break that.

This isn't strictly necessary. It could have annotated X and Y with just (obj) and then gone searching each time it wanted to find them. This would permit arbitrary rearrangement of OBJ, inserting and removing keys. It could even remove X or Y and then tell you it couldn't find them anymore.

There are compromises as well. The binding could be treated as a potentially fallible cache...it could look in that slot position (if it's less than the total keylist size) and see if the key matched. If not, it could fall back on searching and then update with the slot where it saw the field.

(Of course this means you have to look at the keylist instead of just jumping to where you want to be in the varlist, and locality is such that they may not be close together; so having to look at the keylist at all will bring you a slowdown.)
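The compromise can be sketched in the hypothetical model: the bound word remembers a slot index, verifies the key still matches, and falls back to a search (repairing the cache) if the object was rearranged (resolve is an illustrative name):

```javascript
// Hypothetical sketch of binding-as-fallible-cache.
function resolve(binding, obj) {
    const { name } = binding;
    let i = binding.cachedIndex;
    if (i >= obj.keylist.length || obj.keylist[i] !== name) {
        i = obj.keylist.indexOf(name);         // cache miss: search
        if (i === -1) throw new Error(`${name} no longer in object`);
        binding.cachedIndex = i;               // repair the cache
    }
    return obj.varlist[i + 1];
}

const obj = { keylist: ['x', 'y'], varlist: ['<t>', 10, 20] };
const bound = { name: 'y', cachedIndex: 1 };
console.log(resolve(bound, obj));  // => 20 (cache hit)

// Rearranging the object invalidates the cache, but lookup recovers:
obj.keylist = ['y', 'x'];
obj.varlist = ['<t>', 20, 10];
console.log(resolve(bound, obj));  // => 20 (found via search)
console.log(bound.cachedIndex);    // => 0 (cache updated)
```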

But What Is The Goal, Here?

I've mentioned how the FRAME! design pretty much seems to go along well with the naive ordering of object fields.

I guess this is where your intuition comes in as to what represents "sticking to the rules of the game". And I think that hardcoding into the executable the positions where natives find their argument cells is one of those rules.

This suggests that all functions hardcode the positions of their arguments--even usermode functions. I'm okay with this.

So then we get to considering the question about OBJECT!.

  • A lot of languages force you to predefine the structure of an object before creating instances. And defining that structure is a good place to define its interfaces. If Rebol wants to go in a more formal direction (resembling Rust/Haskell/C++), then you might suggest that you always make a base structure... and you can only have the fields named in it.

  • Other languages (like JavaScript) are more freeform, and as mentioned can look for the relationships after-the-fact. Order of fields does not matter.

It's clear that Rebol's userbase so far is made up of people who would favor a better implementation of the JavaScript model over a move to more strictness. I think there'd be a pretty good reception for a model where you could create objects with {...} and where fields could be added or removed as people saw fit. If, behind the scenes, the system was optimizing access to those objects, that would presumably be preferable to the idea that you had to be responsible for declaring prototypes to get efficiencies (which would instantly disappear if you added another field).

But the mechanics definitely get more complicated. :-/


This seems to omit one feature of JavaScript objects that can be efficient (as much as I'm aware of how such things are implemented), something I sort of alluded to in my class proposal.

That is, JavaScript objects (essentially maps in the Rebolsphere) only contain a keylist for values that differ; keys common to a type of object are contained in the prototype's keylist (or the prototype's prototype, all the way back to the Object object).

The implementation is a bit awkward, requiring constructors and such, but I'll try to relate it to the above (you could use Object.create, but with caveats):

// first point of awkwardness, we generally need a function
// to create a prototype
//
let Point = function () {}

// now that we have this base, we can build the prototype
//
Object.assign(
    Point.prototype, {
        x: 0, y: 0,
        form: function () {return this.x + 'x' + this.y}
        // `this` in methods is not equiv. to Rebol's `self`
    }  // <= the prototype we really want
)

//
// Point.prototype has a keylist of [x, y, form]
//

// Create a new derivative object with a keylist of []
//
let point_1 = new Point()

console.log( point_1.x )  // => 0

// Let's expand the keylist to [x]
//
point_1.x = 1

// It still resolves `y` from the prototype
//
console.log( point_1.y )  // => 0

// And the method will work back through the
// prototype chain to resolve each key
//
console.log( point_1.form() )  // => '1x0'

// Creating another derivative object with a novel key
// does not alter the prototype or other inherited types
// 
let point_2 = new Point()
point_2.z = 10

console.log( Point.prototype.z )  // => undefined
console.log( point_1.z )  // => undefined

// Keys added to the prototype are available to derivatives
//
Point.prototype.z = 100

console.log( point_1.z )  // => 100

Anyway, there are a few more interesting features related to this, but I thought it worth adding the essential model to the mix.

This is good to point out...and I think in general that if you look at a lot of the realities of making objects, you wind up creating functions to do it.

It's another point suggesting that braces which just fabricate objects when they are visited might not be all they're cracked up to be. (Though I've put the idea through consideration, especially as it was proposed for Carl's new ASON, I'm becoming increasingly skeptical.)

One more thing against the idea: I could imagine that something that looks a bit like JSON, works a bit like JSON, and then falls short of being compatible will lead to a lot of confusion.


I'd again look at the intersection of MAP! and OBJECT! and where each apply.

I'd contend that if Rebol 2 had had a MAP! datatype from the outset, objects wouldn't be as prevalent as they are, and we'd have been better off for having that distinction in code, e.g.:

read/custom http://some.site/ #(
    Accept: "text/x-rebol"
)

(continued objections with the #() notation aside, etc.)

Having MAP! values (especially where they have literal notation) takes pressure off that usage of OBJECT!. Indeed, having MAP! and MODULE! around should already have sharpened the focus on the role OBJECT! fills: not an awkward stand-in for a key-value type, nor a discrete context for library/support code.

I could refer to the Ten Steps definition where it kind of pertains to classical object-oriented programming, where that might be desirable (one could argue it has its place); or perhaps JavaScript, where (with but a few exceptions) every value is an object (even Array is just an object with numeric string keys for entries). While I wouldn't necessarily suggest going to the latter extreme, it's not without its own merits (and again, it's an inspiration for my class proposal).

I don't think any particular idea should be off the table, and I don't think it need interfere with any codebases (such as they exist), in that they should for the most part be migrating to MAP! and MODULE! for most of the common usages.


I'd add that it's reasonably clear that JSON 'objects' are analogous to MAP! values, that JSON should map fully onto BLANK!, LOGIC!, INTEGER!, DECIMAL!, TEXT!, BLOCK!, and MAP!, and that OBJECT! should be kept as far away from that discussion as possible (as it should be from Re[*] source notation).


Back to JavaScript: in addition to what I said above about objects being just associative arrays with an inheritance chain and without context, a Map is an Object that acts as a mediator to an additional structure storing keys and values. That's not effectively different than, say, using an OBJECT! in Rebol 2 to create a map-like interface (except that the Map's structure may be optimized in native code). One could argue they've tied their own hands by using {...} as Object notation instead of Map.
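A small illustration of that distinction, using standard JavaScript: a plain object coerces keys to strings and carries the prototype chain along, while a Map stores arbitrary keys in its own internal structure:

```javascript
// Plain object: keys become strings, prototype chain is visible.
const obj = {};
obj[1] = 'one';
console.log(Object.keys(obj));  // => ['1'] (the key became a string)
console.log('toString' in obj); // => true (inherited from prototype)

// Map: keys kept as-is, no inherited entries.
const map = new Map();
map.set(1, 'one');
console.log(map.get(1));          // => 'one' (integer key preserved)
console.log(map.get('1'));        // => undefined
console.log(map.has('toString')); // => false
```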


I've written up my concerns about source-level MAP!s...a big one being "when (if ever) do you evaluate":

{ Rethinking Braces }... as an array type? - #25 by hostilefork

LOAD-time is before binding, so that won't work. And if you do it at any other time, you wind up with something that's just a shorthand for MAKE OBJECT!... and it will still need evaluation.

I don't see anyone being satisfied with the behavior of:

read/custom http://some.site/ #(
    Accept: "text/x-rebol"
    Upgrade-Insecure-Requests: secure-base + 1
)

This is no good if it winds up treating + as a key mapped to 1. So if we know that's bad, what's left that's good?

If the issue is wanting to have a way of saying "I'm giving you a lower, rawer form of input that's not dialected" then maybe that's a good fit for the @[...] array type. "This is just keys and values, I promise".

read/custom http://some.site/ @[
    Accept: "text/x-rebol"
    Upgrade-Insecure-Requests: 2
]

This tactic is taken with UNPACK, for example....which I think is clever.

But maybe the default of when you get a BLOCK! is "this is information in the READ/CUSTOM dialect", whatever that means. And its job is to produce the keys and values somehow. But you can bypass that intelligence with something that requests unintelligence.

I don't think any particular idea should be off the table

Definitely good to clock some time with other languages (which I've been doing recently) and to ask if there are lessons there to be learned.

I went down the road of thinking about {...} for a use besides strings. But I think that what we might see from strings becoming more powerful with binding for interpolation could usher in a new era.

I did see that, and while I get that there are issues with evaluation in such a construct, for the purposes of staying on-topic I won't dig into that beyond saying: a convenient key-value type with source prominence can remove that traditional burden from the object type. I think this is worth stating again, given the traditional perception and usage of objects in Rebol.

So I'll just restate that improving the runtime thing that is "OBJECTMAP!" is clearly on the agenda, but these source representation issues are related and important to get at too.

Clearly I am sympathetic that it would be nice if we could have a "this is an object" notation at source level.

To say "all you need is MAKE OBJECT! on BLOCK!" is like saying you don't need GROUP! (since you can DO a BLOCK!)

However, in "Rebol's JSON" I just suggested assuming all BLOCK!s that start with SET-WORD! are objects, and then having some escape notation for the outliers.

There's a sense in which recognizing BLOCK!-starting-with-SET-WORD! is an unsatisfying answer. But there's also a way in which you can choose to find it satisfying. It does have its upsides: Rebol is good at working with and composing blocks, they're inert, and they use the most pleasing bracket...

Maybe the operator that loads the above-proposed "Rebol JSON" should be something easy to use. I propose it maybe being a symbol, done via its own array type. But maybe it's so foundational that it should take the name MAKE (we never really figured out what the difference between MAKE and TO is...)

I dunno, but let's look at having it take over MAKE as a placeholder for "really easy to ask for" (and potentially giving us MAKE-WORD!, MAKE-BLOCK!, etc. as easy named variants)

>> stuff: make [
     name: "objects"
     list: [[a: 10 b: 10 + 10] <gap> [a: 30 b: 20 + 20]]
]
== #objectmap![
    name: "objects"
    list: [
        #objectmap![a: 10 b: 20]
        <gap>
        #objectmap![a: 30 b: 40]
    ]
]

I do think that it's worth pointing out generic quoting as a way to escape out of evaluation when you don't want it:

>> stuff: make [
     name: "plain block"
     list: '[[a: 10 b: 10 + 10] <gap> [a: 30 b: 20 + 20]]
]
== #objectmap![
    name: "plain block"
    list: [[a: 10 b: 10 + 10] <gap> [a: 30 b: 20 + 20]]
]

>> stuff: make [
     name: "list with objects and blocks"
     list: [[a: 10 b: 10 + 10] <gap> '[a: 30 b: 20 + 20]]
]
== #objectmap![
    name: "list with objects and blocks"
    list: [
        #objectmap![a: 10 b: 20]
        <gap>
        [a: 30 b: 20 + 20]
    ]
]

And you still have full representational coverage...double quoting to give you a single-quoted block if that's what you meant:

>> stuff: make [
     name: "quoted block"
     list: ''[[a: 10 b: 10 + 10] <gap> [a: 30 b: 20 + 20]]
]
== #objectmap![
    name: "quoted block"
    list: '[[a: 10 b: 10 + 10] <gap> [a: 30 b: 20 + 20]]
]

That little twist of "quoted things are always that thing minus one quote level, but unquoted things can be more free in their meaning" is something that has been shown to have a lot of interesting applications. And here we see it letting us say that non-quoted blocks are subject to the "might represent an object" treatment.

:man_shrugging: I think it's cool.