User Defined Datatype discussion

Where can I find the discussion about the new implementation of user-defined datatypes?

The new thing that has come on the scene isn't what I'd really call "user defined datatypes" as much as "extension defined datatypes". It's for C programmers to implement types like IMAGE! or GOB! with a DLL or statically linked module...without those being built in a priori.

The feature's goal was to get past a historical property that limited Rebol to 64 built-in datatypes, which had to be named in the core interpreter and could not be changed or extended. Ren-C wanted to be much more modular...to avoid carrying the weight of things like GOB! to the JavaScript build (or a redundant IMAGE! datatype that was handled by a browser's canvas.) Then the web build could choose its own extension types, perhaps some kind of JAVASCRIPT-OBJECT! proxy or a CANVAS!, etc.

This was needed while breaking the project up into independently selectable extensions--of which there are now 31. See the README.md for a few notes:

https://github.com/metaeducation/ren-c/tree/master/extensions

(At this time, the web build uses only JavaScript, Console, and Debugger.)

Implementation Details

A "value cell" in Rebol and Red is four platform pointers in size. Of these, the first platform pointer slot is used as bits for a "header". How the other three pointers are interpreted depends on a byte in that header...which was called the VAL_TYPE() in R3-Alpha (though Ren-C calls this the "cell kind").
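To make that layout concrete, here's a rough C sketch. It is not the actual R3-Alpha or Ren-C definition: the names `Cell`, `cell_kind`, `make_integer_cell`, the kind numbering, and the choice of the low byte of the header are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cell: four platform pointers wide. */
typedef struct {
    uintptr_t header;      /* bits; one byte of it says how to read the rest */
    uintptr_t payload[3];  /* meaning depends on the kind byte */
} Cell;

/* Made-up kind numbers, only for this sketch. */
enum { KIND_INTEGER = 1, KIND_BLOCK = 2 };

/* Assumption: the kind lives in the low byte of the header. */
static inline uint8_t cell_kind(const Cell *c) {
    return (uint8_t)(c->header & 0xFF);
}

static inline void set_cell_kind(Cell *c, uint8_t kind) {
    c->header = (c->header & ~(uintptr_t)0xFF) | kind;
}

/* An integer only needs one of the three payload slots. */
static inline Cell make_integer_cell(intptr_t n) {
    Cell c = { 0, { 0, 0, 0 } };
    set_cell_kind(&c, KIND_INTEGER);
    c.payload[0] = (uintptr_t)n;
    return c;
}

/* Reading the payload is only valid after checking the kind byte. */
static inline intptr_t cell_integer(const Cell *c) {
    assert(cell_kind(c) == KIND_INTEGER);
    return (intptr_t)c->payload[0];
}
```

The point is just that every reader of a cell has to consult the kind byte first, because the three payload slots have no fixed meaning on their own.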

Of this byte, only 64 of the states are used in R3-Alpha--and, I believe, in Red. This was chosen instead of 256 in order to limit the number of kinds that need to be handled in a TYPESET! to 64 bits...making typesets small enough to fit in the rest of the cell. (That simplicity is nice for implementation, but points to a pretty big weak spot in the type system for more sophisticated purposes...which new designs will be needed for!!)
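As a sketch of why 64 kinds is the convenient number: a typeset can then be a single 64-bit word with one bit per kind. The names `Typeset`, `typeset_add` and `typeset_has` below are hypothetical, not taken from any of the codebases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical typeset: bit N is set when kind N is a member. */
typedef uint64_t Typeset;

static inline Typeset typeset_add(Typeset ts, unsigned kind) {
    assert(kind < 64);  /* a 65th kind would have no bit to live in */
    return ts | (UINT64_C(1) << kind);
}

static inline bool typeset_has(Typeset ts, unsigned kind) {
    assert(kind < 64);
    return ((ts >> kind) & 1) != 0;
}
```

With 256 kinds the set would need 256 bits, which is the full width of a cell on a 64-bit platform and leaves no room for the header.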

Ren-C's "extension types" is a primordial implementation of a strategy to reserve one cell kind to mean "this cell gives up one of its three non-header platform-pointer-units to be a pointer to its type". That allows an arbitrary number of these to be added. They can't pack quite as much data into their cells as the built-in types, since they only have two pointers instead of three to work with. But given that you can always point to some allocated data (and usually need to), it's not a big problem.

What the type pointer points at is an array of implementation functions for molding, generic dispatch ("action!"), comparison, etc. That table is made to be very similar in layout to the table for the built-in types like BLOCK! etc. So the built-in types take no performance hit; the custom cell kind has an ordinary dispatcher entry in the master table, and that entry just does a little translation jump to use the table referenced in the cell instead.
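Here's a minimal sketch of that "translation jump", reduced to a single mold hook. Everything named here (`TypeHooks`, `master_table`, `mold_custom`, the `pair-ext!` example type, and the choice of the first payload slot for the type pointer) is an assumption for illustration, not Ren-C's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Cell {
    uintptr_t header;      /* low byte: cell kind (assumed) */
    uintptr_t payload[3];
} Cell;

typedef int (*MoldHook)(const Cell *cell, char *out, size_t cap);

/* One table of hooks per type; built-in and extension tables share this shape. */
typedef struct {
    const char *name;
    MoldHook mold;
} TypeHooks;

enum { KIND_INTEGER = 1, KIND_CUSTOM = 2, KIND_COUNT = 3 };  /* made-up numbers */

static int mold_integer(const Cell *cell, char *out, size_t cap) {
    return snprintf(out, cap, "%ld", (long)(intptr_t)cell->payload[0]);
}

/* The built-in entry for the custom kind: forward to the table in the cell. */
static int mold_custom(const Cell *cell, char *out, size_t cap) {
    const TypeHooks *hooks = (const TypeHooks *)cell->payload[0];
    return hooks->mold(cell, out, cap);
}

static const TypeHooks master_table[KIND_COUNT] = {
    [KIND_INTEGER] = { "integer!", mold_integer },
    [KIND_CUSTOM]  = { "custom!",  mold_custom  },
};

/* Built-in types dispatch exactly as before; only custom cells pay the extra hop. */
static int mold_cell(const Cell *cell, char *out, size_t cap) {
    unsigned kind = (unsigned)(cell->header & 0xFF);
    assert(kind > 0 && kind < KIND_COUNT);
    return master_table[kind].mold(cell, out, cap);
}

/* --- what an extension might supply --- */
static int mold_pair_ext(const Cell *cell, char *out, size_t cap) {
    /* only two payload slots are left for the extension's own data */
    return snprintf(out, cap, "#[pair-ext %ld %ld]",
                    (long)(intptr_t)cell->payload[1],
                    (long)(intptr_t)cell->payload[2]);
}

static const TypeHooks pair_ext_hooks = { "pair-ext!", mold_pair_ext };

static Cell make_custom_cell(const TypeHooks *hooks, intptr_t a, intptr_t b) {
    Cell c;
    c.header = KIND_CUSTOM;
    c.payload[0] = (uintptr_t)hooks;  /* the slot given up for the type pointer */
    c.payload[1] = (uintptr_t)a;
    c.payload[2] = (uintptr_t)b;
    return c;
}
```

Calling `mold_cell` on a cell built with `make_custom_cell(&pair_ext_hooks, 3, 4)` goes through `master_table[KIND_CUSTOM]` and lands in `mold_pair_ext`, without the core needing to know that type exists.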

Open Questions

Mechanically there are still several things to be addressed. One is that since there's only one cell kind for these custom datatypes--and all typesets are still 64 bits--if you say in your function's type spec that you take an IMAGE! it will also take a VECTOR!, and a STRUCT!... all extension types look like the same type to a TYPESET!. (I'm actually a bit skeptical of TYPESET! as a datatype, and I'm thinking that what it does should maybe come from a generic interface that could be a function...so your type checks are actually generalizable, to where you could have a function that takes an ANY-EVEN! number, for instance...)
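A small sketch of both halves of that: the bitset check that can't tell extension types apart, and the "type check as a function" alternative. The `Value` struct is a simplified stand-in for a cell, and all names (`ExtType`, `typeset_accepts`, `is_image`, `is_any_even`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an extension's hook table; identity is the pointer itself. */
typedef struct { const char *name; } ExtType;

/* Simplified stand-in for a cell. */
typedef struct {
    uint8_t kind;
    const ExtType *ext;  /* only meaningful when kind == KIND_CUSTOM */
    int64_t number;      /* only meaningful when kind == KIND_INTEGER */
} Value;

enum { KIND_INTEGER = 1, KIND_CUSTOM = 63 };  /* made-up numbers */

static const ExtType image_type  = { "image!" };
static const ExtType vector_type = { "vector!" };

typedef uint64_t Typeset;

/* A bitset check only ever sees the kind byte. */
static bool typeset_accepts(Typeset ts, const Value *v) {
    assert(v->kind < 64);
    return ((ts >> v->kind) & 1) != 0;
}

/* Alternative: a type check is any predicate over the value. */
typedef bool (*TypeCheck)(const Value *v);

static bool is_image(const Value *v) {
    return v->kind == KIND_CUSTOM && v->ext == &image_type;
}

static bool is_any_even(const Value *v) {
    return v->kind == KIND_INTEGER && v->number % 2 == 0;
}

static bool check_arg(TypeCheck check, const Value *v) {
    return check(v);
}
```

A typeset built for IMAGE! can only set the one custom bit, so it lets a VECTOR! through as well; the predicate form can look at the type pointer, or at the value itself as in the ANY-EVEN! case.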

Also, exactly how datatypes will participate in a naming ecology is not known. Right now the theory is that they register via a URL!. That is to say that type of foo could come back as something including http://example.com/types/matrix. While that's a bit drawn out, one idea that came up in error IDs was that there might be a form of comparison function that lets you get as specific as you want about that... e.g.

>> /matrix ~ http://example.com/types/matrix
== #[true]

>> /types/matrix ~ http://example.com/types/matrix
== #[true]
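One possible reading of that comparison, sketched in C: treat the left side as a path suffix and ask whether the registered URL ends with it. This is only a guess at the semantics; the function name `type_url_matches` is made up, and nothing here is an existing Ren-C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `url` ends with `pattern`, where `pattern` must start with '/'.
 * The leading slash forces the match to begin on a path segment boundary,
 * so "/atrix" does not match ".../matrix". */
static bool type_url_matches(const char *pattern, const char *url) {
    assert(pattern != NULL && url != NULL);
    size_t plen = strlen(pattern);
    size_t ulen = strlen(url);
    if (plen == 0 || pattern[0] != '/' || plen > ulen)
        return false;
    return strcmp(url + (ulen - plen), pattern) == 0;
}
```

Under that reading, the more path segments you supply, the more specific the match, which is the "as specific as you want" property described above.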

I'll also mention the idea that type of would give back the new @[block] syntax, which would give datatypes a distinct literal form (the lack of one has always been a problem). It may also be that kind of and type of are distinct, with the former giving back Rebol-structured word-based data, and the latter giving back some OBJECT!-like thing called a TYPE! which can answer meta questions about the type.

>> kind of 10
== @[integer]

>> type of 10
== make type! [
     name: "Integer"
     description: "A whole number, either positive or negative"
     minimum: ...
     maximum: ...
     (etc.)
]

So plenty still to worry about. But the first-tier goal of being able to build variants of Rebol without GOB! or IMAGE! or VECTOR! or STRUCT! (or mentioning them in the built-in type table), while still keeping all those features working, has been achieved.
