Renaming SPECIFIER => CONTEXT ?

hostilefork · March 6, 2024, 6:56pm

There was a typeset in historical Rebol that was called ANY-OBJECT!, that tried to pull together anything that had WORD! keys that could be used as a binding target:

r3-alpha>> help any-object!
ANY-OBJECT! is a typeset of value: make typeset! [
    object! module! error! task! port!
]

I had sort of a naming philosophy (which I still have) that you shouldn't name the category based on one of its instances like this. Several reasons, but one is that it certainly confuses the implementation when you extract the pointed-to entity out of a cell:

 if (Cell_Type(cell) == TYPE_ERROR) {
     Object* o = Cell_Object(cell);  // wait, but it's an ERROR! not an OBJECT!
     ...
 }

You could call it AnyObject, I guess. But that's not the direction I took... instead calling the category ANY-CONTEXT!, and if you saw Context* in the source you wouldn't get confused to thinking it was an extraction from a CONTEXT! cell, because there was no such thing. You knew it was the implementation of a superclass.

Enter "Specifier"

"Specifier"--the aggregated inheritance of binding candidates--has moved from an implementation detail to something which is rising to the surface of user awareness. For that, it's a lousy name... and am near-certain I want to take "Context" for it.

I like Context better than Environment, as it's shorter and leaves environment for "environment variables" (which being a script-class language Ren-C needs to be better at interacting with than it is today).

We could say the other types are ANY-DICTIONARY!, although the name DICTIONARY! has been suggested as a replacement for MAP!, since we are thinking of MAP more as a function now. But Dictionary may make more sense for things that only permit "words" as keys.

A CONTEXT! itself--as a composition of other dictionaries (and possibly even just programmatic code that answers value-for-WORD!)--may itself be categorized as an ANY-DICTIONARY!

Implementation Variance Needs Work

So something that has happened in the messy evolution of the code is that the one-size-fits-all Context* data structure that backed things like OBJECT!, ERROR!, MODULE! etc. became fractured around the time of Sea of Words and LET.

Modules do not use the same representation, and have to be enumerated completely differently. There's not really a lot of generic code that acts the same way for OBJECT! and MODULE!, to the point that we'd be better off dispelling the illusion in the C sources and making Module* its own distinct type.

There's a lot to consider here about what the limits of "Amish" implementation are:

- I'm pretty much sold on the known-integer-values concept for Frame. And if I say that drifting away from that destroys what the project is, then it almost certainly does--because I'm known for being iconoclastic about a fair number of Rebol sacred cows.
- BUT taking a diverging approach for Module from "parallel arrays of Key and Value cell indexed by integer" has proven essential. Floating variable stubs hanging off the word symbols themselves is an answer that has been critical in giving some legitimacy to modules, and it still passes the "simple" test for me.
  - R3-Alpha was utterly hopeless, and Red will be too if they follow down that path (should they ever get modules). I do not think this is a problem appropriate to approach with two-parallel-arrays.
- Objects are kind of a wild card. Given their attempt to be dirt-simple, we might more accurately call the current version "Struct" or "Structure" instead (major annoyance in C naming the variables though, Struct _struct, to dodge the struct keyword?)
  - People want to dynamically add and remove keys from objects. Rebol2 and Red don't allow it, and R3-Alpha only permitted growth (so the index numbers stored in words that were bound at an index wouldn't be invalidated).
  - Moving to a more amortized implementation that spreads and shares keys gets you to something more like a database, where you can't point to little contiguous packets of memory and say "there is the object."
  - As I said above, I think this was a necessity for Modules. But the simple implementation that works quite well for them wouldn't scale to tens of thousands of objects which have keys with the same name.
  - There's plenty of prior art and writeups of how JavaScript engines and others have approached this, and gotten it to be fast.
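
To make the two layouts in that list concrete, here is a minimal sketch in C. Every name in it (SketchObject, VarStub, Module_Var, and so on) is hypothetical and not taken from the Ren-C sources; it only illustrates "parallel arrays indexed by integer" next to "variable stubs hanging off the symbol", under the assumption that each symbol keeps a linked chain with one stub per module.

```c
#include <assert.h>
#include <stddef.h>

/* Every name below is hypothetical; none is taken from the Ren-C sources. */

typedef struct { int payload; } Value;

/* (1) Parallel arrays: a word bound at index i finds its variable at vars[i]. */
typedef struct {
    const char **keys;  /* keylist */
    Value *vars;        /* varlist, same length as the keylist */
    size_t len;
} SketchObject;

Value *Object_Var_At(SketchObject *obj, size_t index) {
    assert(index < obj->len);
    return &obj->vars[index];
}

/* (2) Stubs hanging off the symbol: one stub per module defining that word. */
typedef struct { const char *name; } SketchModule;

typedef struct VarStub {
    struct VarStub *next;       /* next stub for the same spelling */
    const SketchModule *owner;  /* module this variable lives in */
    Value var;
} VarStub;

typedef struct {
    const char *spelling;
    VarStub *stubs;  /* head of the per-symbol stub chain */
} SketchSymbol;

Value *Module_Var(const SketchSymbol *sym, const SketchModule *mod) {
    for (VarStub *stub = sym->stubs; stub != NULL; stub = stub->next) {
        if (stub->owner == mod)
            return &stub->var;
    }
    return NULL;  /* this module has no variable with that spelling */
}
```

In this sketch the per-symbol chain is as long as the number of modules defining that spelling. That is cheap for a handful of modules, and it is also why the same trick would not scale to tens of thousands of objects sharing a key name.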

Anyway, this all kind of culminates in saying that objects are due for a reckoning at some point. We want to err on the side of simplicity over optimized complexity, but there may be a sufficiently elegant way to attack objects that can grow and shrink effectively, and to have better code overall than we have today.

Anyway, Back To The Naming Issue...

The above sort of reveals why ANY-OBJECT! isn't a good name for ANY-DICTIONARY or ANY-BINDTARGET?.

Hm, maybe we consider ANY-BINDABLE? to be the category for things that can be bound to, and come up with another name for anything that can be bound? ANY-REFERENCE?

Eh, that sounds confusing. Dictionary may be about as good as it gets (?)

So basically, Dictionary would be a superclass offering lookup from Word Symbol => Value. If you wanted anything else from it, you would have to figure out what subclass it was, because how you do things like enumerate keys and values diverges significantly.
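
A rough sketch of what that superclass might look like in C, assuming a shared header with a kind tag. The names (Dictionary, VarlistDict, SeaDict, Dictionary_Lookup) are invented for illustration and keys are plain C strings instead of symbols. Lookup is the one operation offered generically; anything else means checking the tag and casting to the subclass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the Ren-C implementation. */

typedef enum { KIND_VARLIST, KIND_SEA } DictKind;

typedef struct { DictKind kind; } Dictionary;  /* common header */

typedef struct {
    Dictionary header;  /* must be first so the casts below are valid */
    const char **keys;
    int *vars;
    size_t len;
} VarlistDict;

typedef struct SeaEntry {
    struct SeaEntry *next;
    const char *key;
    int var;
} SeaEntry;

typedef struct {
    Dictionary header;  /* must be first so the casts below are valid */
    SeaEntry *entries;
} SeaDict;

/* The only thing every Dictionary promises: key => variable (or NULL). */
int *Dictionary_Lookup(Dictionary *d, const char *key) {
    switch (d->kind) {
      case KIND_VARLIST: {
        VarlistDict *v = (VarlistDict *)d;
        for (size_t i = 0; i < v->len; ++i) {
            if (strcmp(v->keys[i], key) == 0)
                return &v->vars[i];
        }
        return NULL;
      }
      case KIND_SEA: {
        SeaDict *s = (SeaDict *)d;
        for (SeaEntry *e = s->entries; e != NULL; e = e->next) {
            if (strcmp(e->key, key) == 0)
                return &e->var;
        }
        return NULL;
      }
    }
    return NULL;  /* unknown kind */
}
```

Enumeration is deliberately absent from the generic interface here: walking `keys[0..len)` and walking a chain of entries have nothing in common, which is the divergence described above.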

Or Maybe Some Unifying Theory Will Come Along?

It could be that everything--including OBJECT! and MODULE!--has the "inheritance" powers of what Specifier has today.

Which would mean that you wouldn't have Specifiers. You'd just ask for the binding of a block and maybe get a MODULE! that inherits from an OBJECT!, or an OBJECT! that inherits from a MODULE!, or a LET! that inherits from a LET! that inherits from an OBJECT!, etc.
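
As a sketch of what "a LET! that inherits from an OBJECT! that inherits from a MODULE!" could mean mechanically, here is a chain walk in C. The Ctx struct, its parent field, and Ctx_Lookup are assumptions made up for this example. Real lookups would go through symbols and cells, not C strings and ints.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; names and layout are not from the Ren-C sources. */

typedef enum { CTX_LET, CTX_OBJECT, CTX_MODULE } CtxKind;

typedef struct Ctx {
    CtxKind kind;              /* only a tag; the lookup below ignores it */
    const struct Ctx *parent;  /* NULL when this context inherits nothing */
    const char **keys;
    int *vars;
    size_t len;
} Ctx;

/* Walk from the nearest context outward; the first match wins. */
int *Ctx_Lookup(const Ctx *c, const char *word) {
    for (; c != NULL; c = c->parent) {
        for (size_t i = 0; i < c->len; ++i) {
            if (strcmp(c->keys[i], word) == 0)
                return &c->vars[i];
        }
    }
    return NULL;  /* unbound anywhere in the chain */
}
```

Asking "what is the binding of this block" would then just hand back the head of such a chain, whatever kind of context it happens to be.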

It all warrants more thought, and maybe a quick attempt to swap out the current OBJECT! implementation with something like V8's Hidden Classes and see how it meshes with the system. I've been so focused on the "bricks" in the language that these sorts of "boring" implementation details have just sort of been left alone while that's sorted out, but now it's getting to the point where there seem to be some answers to guide the shape.
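
For reference, the general hidden-class idea can be sketched in a few lines of C. This is a heavily simplified illustration of the technique (a tree of "shapes" with key-adding transitions), not V8's actual data structures. All names (Shape, Shape_Add_Key, Shape_Find) are invented here. Objects that add the same keys in the same order end up sharing one Shape and only need their own slot array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified illustration of shape/hidden-class sharing; names are made up. */

typedef struct Shape {
    const struct Shape *parent;  /* shape before `key` was added */
    const char *key;             /* key this shape adds; NULL for the root */
    size_t slot;                 /* index of `key` in an object's slot array */
    struct Shape *transitions;   /* first child shape */
    struct Shape *sibling;       /* next child of the same parent */
} Shape;

/* Return the shape reached by adding `key`, creating it only if needed.
 * The key pointer is stored as-is, so it must outlive the shape. */
Shape *Shape_Add_Key(Shape *from, const char *key) {
    for (Shape *t = from->transitions; t != NULL; t = t->sibling) {
        if (strcmp(t->key, key) == 0)
            return t;  /* an earlier object already made this transition */
    }
    Shape *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->parent = from;
    t->key = key;
    t->slot = (from->key == NULL) ? 0 : from->slot + 1;
    t->transitions = NULL;
    t->sibling = from->transitions;
    from->transitions = t;
    return t;
}

/* Slot index of `key` for objects with this shape, or -1 if absent. */
long Shape_Find(const Shape *shape, const char *key) {
    for (; shape != NULL && shape->key != NULL; shape = shape->parent) {
        if (strcmp(shape->key, key) == 0)
            return (long)shape->slot;
    }
    return -1;
}
```

Slot numbers stay stable as keys are appended, which is the property the integer-indexed binding relies on. Removing keys is the part this sketch does not attempt.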

bradrn · March 6, 2024, 11:44pm

This looks good, although I’m not sure I fully understand all the implementation details about OBJECT! and MODULE! which you mention. ‘Context’ feels like a good name to me.

This feels like it’s getting closer to what R does — where ‘environments’ (its name for contexts) are first-class values, which inherit from each other. It doesn’t give ‘inheritance powers’ to other types, but that’s mostly because it has very few datatypes in the first place.

hostilefork · March 8, 2024, 7:46pm

BINDABLE is wrong, but thinking about this in the source and being fully literal about it, the superclass could be called "Binding":

 Binding* binding = Cell_Binding(cell);

And that would give you back something you knew to be a type that can serve as a binding.

ANY-BINDING?.. perhaps.

hostilefork · September 21, 2024, 12:44pm

hostilefork:

I had sort of a naming philosophy (which I still have) that you shouldn't name the category based on one of its instances like this. Several reasons, but one is that it certainly confuses the implementation when you extract the pointed-to entity out of a cell:
 if (Cell_Type(cell) == TYPE_ERROR) {
     Object* o = Cell_Object(cell);  // wait, but it's an ERROR! not an OBJECT!
     ...
 }
You could call it AnyObject, I guess.

FWIW, I realized there's already a name for this as far as the implementation goes.

I've been calling the "Flex" type that represents a list of variables a "Varlist".

That's slightly contentious with the new meaning of ANY-LIST! in the system (Array + Index + Binding)...and it seems more so if you write it out as the less weird looking VarList.

VarArray? VarFlex? VarTable? Table?

Meh. VarArray is too long. VarFlex is accurate enough and isn't awful. But I don't know if VarList will really break anyone's brain who's reading the implementation for its use of the word "List" in a compound term.

if (Cell_Type(cell) == TYPE_ERROR) {
    Varlist* varlist = Cell_Varlist(cell);  // [1]
    ...
}

if (Cell_Type(cell) == TYPE_ERROR) {
    VarList* varlist = Cell_VarList(cell);  // [2]
    ...
}

if (Cell_Type(cell) == TYPE_ERROR) {
    VarList* varlist = Cell_Varlist(cell);  // [3]
    ...
}

Oddly enough, I kind of like [3], asymmetrical though it is. I guess I don't care for the mixed casing in a component part of a function name in the underscore separation convention, but I do like the mixed casing in the datatypes.

So then there'd be accessors like Varlist_At(varlist, index) (CTX_VAR), Keylist_Of_Varlist(varlist) (CTX_KEYLIST), Varlist_Lookup(varlist, key) (hm, may not exist today...all lookups need to be done by Cell to integrate frame phasing).

(Note to @bradrn if reading C sources: naming of things is obviously one of those "things that's evolving", you still find a lot of R3-Alpha-ism of "everything is caps, starts with REB, and literacy is avoided". Where you find it the most is in places I haven't made firm decisions (e.g. VAL_TYPE(...), I still don't know what a "type" is). But overall Ren-C has pushed readability forward by leaps and bounds.)

Anyway, this doesn't solve the question of what the usermode exposed name is of these things, but it may be that ANY-CONTEXT! could actually subsume Specifiers and Objects and Modules and Errors etc. if done correctly. It might not be necessary to call out any particular context type as "fundamental", just some contexts don't inherit from any other.

Would that mean a LET! is a context, that might inherit from a FRAME!, that inherits from a MODULE!... and every potential step in the chain has a reified user type? And they're all ANY-CONTEXT?, which can act as a "specifier" (then called a Context* in the implementation?) It would be a rather literal exposure of the mechanics, but maybe being blunt and doing it that way is the right answer. Though it could limit optimizations.

But I do think things are pointing to using "Context" to refer to this general concept of what is today called a "Specifier"...which was always a placeholder of a name.

Maybe it's really just the case of cracking Context* further open to no longer being restricted to VarList*. So Context* could be a Stub* that is a LET chain element. And routines that "speak context" need to account for the little stubby weird specifier elements that currently aren't exposed in usermode. So a routine like Context_At(varlist_or_stub, index) would have conditional code in it to tease out the variable from the stub as being at "index 1". And then you could put that stub in a LET! and have it be an ANY-CONTEXT?...
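
A minimal sketch of that conditional in C, with the caveat that everything here is hypothetical: the Flavor tag, the struct layouts, and this Context_At are invented for illustration and do not match the real stubs. The only idea carried over is that a LET stub answers for its single variable at "index 1", while a varlist indexes into its array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts; the real Stub/VarList structures differ. */

typedef enum { FLAVOR_VARLIST, FLAVOR_LET_STUB } Flavor;

typedef struct Context {
    Flavor flavor;
    struct Context *inherit;  /* next context in the chain, or NULL */
} Context;

typedef struct {
    Context base;  /* must be first so Context* can be cast back */
    int *vars;
    size_t len;
} VarList;

typedef struct {
    Context base;  /* must be first so Context* can be cast back */
    const char *name;
    int var;       /* the one variable this LET holds */
} LetStub;

/* 1-based index, so a LET stub's single variable is at index 1. */
int *Context_At(Context *c, size_t index) {
    if (c->flavor == FLAVOR_LET_STUB) {
        assert(index == 1);
        return &((LetStub *)c)->var;
    }
    VarList *v = (VarList *)c;
    assert(index >= 1 && index <= v->len);
    return &v->vars[index - 1];
}
```

Callers that only "speak context" never need to know which of the two they were handed. Callers that need the array itself would take a VarList* instead.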

In such a world, pointers that are currently Context* would bifurcate out into Context* when it was speaking abstractly about "anything you can look up stuff in", and VarList* when it specifically is an array of variables. (Although today modules conflate with varlist, when they are really a "Sea Of Words"... VarSea* ?)

hostilefork · September 27, 2024, 7:04pm

So going in this direction looks good. Specifier is gone, and now there's a sort of fluidity to use of the terms "Binding" vs. "Context". But there's only one datatype--Context--it's just that sometimes it makes sense to name a variable binding.

e.g. fundamental lookups of words pass in (word, context) and not (word, binding). Binding is more for when you are talking about contexts that are poked into certain places, like when they're in a BLOCK!

I'm not sure about the semantics in this world with expanded reified context types. If I do a FOR-EACH on a LET!, does it see one variable or all the variables in the inheritance chain?

Regardless of the default you need some way to do both. I'm not exactly sure how enumeration is going to work. If you have a LET! of an object that inherits from a FRAME! and the frame and object both have an X, how will the implementation suppress seeing the X twice?
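
One naive way to get both behaviors, sketched in C under made-up names (Scope, Scope_Has, Scope_Collect_Keys): walk the chain nearest-first, and skip any key that a nearer context already defines. An `inherited` flag switches between "just this context" and "the whole chain". This is only an illustration of the suppression idea, not a proposal for how the implementation should do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; keys are C strings where real code would use symbols. */

typedef struct Scope {
    const struct Scope *parent;  /* context inherited from, or NULL */
    const char **keys;
    size_t len;
} Scope;

bool Scope_Has(const Scope *s, const char *key) {
    for (size_t i = 0; i < s->len; ++i) {
        if (strcmp(s->keys[i], key) == 0)
            return true;
    }
    return false;
}

/* Write each visible key once into `out`, nearest context first.
 * With inherited == false only `start` itself is enumerated.
 * Returns the number of keys written; `cap` is the capacity of `out`. */
size_t Scope_Collect_Keys(
    const Scope *start, bool inherited, const char **out, size_t cap
) {
    size_t count = 0;
    for (const Scope *s = start; s != NULL; s = s->parent) {
        for (size_t i = 0; i < s->len; ++i) {
            bool shadowed = false;
            const Scope *nearer;
            for (nearer = start; nearer != s; nearer = nearer->parent) {
                if (Scope_Has(nearer, s->keys[i])) {
                    shadowed = true;  /* a nearer context already has it */
                    break;
                }
            }
            if (shadowed)
                continue;
            assert(count < cap);
            out[count++] = s->keys[i];
        }
        if (!inherited)
            break;
    }
    return count;
}
```

With an object holding X and Z that inherits from a frame holding X and Y, the inherited walk yields X, Z, Y: the frame's X is suppressed. The rescan of nearer contexts makes this quadratic in the worst case, so a real version would likely want a seen-set or some marking scheme.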

Will keep working on it. But glad to be moved away from the "specifier" term.