Experiences While Paring Down DATATYPE!

You likely know that DATATYPE! in R3-Alpha (Rebol 2, Red...) has a distinct "type byte" in the cell. So you can tell it apart from a WORD!, even if not all representations show the difference:

r3-alpha>> block: reduce ['integer! integer!]
== [integer! integer!]

r3-alpha>> type? first block
== word!

r3-alpha>> type? second block
== datatype!

One way to see the difference in R3-Alpha is with MOLD/ALL:

r3-alpha>> mold/all block
== "[integer! #[datatype! integer!]]"

But What's Actually in a DATATYPE! Cell?

This was the definition struct from R3-Alpha:

typedef struct Reb_Type {
    REBINT type;  // base type
    REBSER *spec;
    // REBINT min_type;
    // REBINT max_type;
} REBTYP;

So there's an integer saying which type it is (e.g. REB_INTEGER = 1, REB_LOGIC = 2, REB_BLOCK = 3 or whatever). Note that this is in the payload of the cell, not the header...because the type in the header is REB_DATATYPE, to say the cell carries a "datatype payload".
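To make the header/payload split concrete, here is a minimal C sketch. Everything in it (`Cell`, `TypeKind`, `make_datatype`, `names_type`, the enum values) is hypothetical and far simpler than R3-Alpha's real cell layout; it only shows that the header byte says "this cell is a datatype" while the payload says which type it names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type numbers; R3-Alpha's real REB_XXX values differ. */
typedef enum {
    K_INTEGER = 1,
    K_LOGIC,
    K_BLOCK,
    K_WORD,
    K_DATATYPE
} TypeKind;

/* Hypothetical, simplified cell: not R3-Alpha's actual layout. */
typedef struct {
    uint8_t type;              /* header: what this cell itself is */
    union {
        int64_t integer;       /* meaningful when type == K_INTEGER */
        int32_t datatype;      /* meaningful when type == K_DATATYPE */
    } payload;
} Cell;

/* Build a cell for e.g. INTEGER! (the datatype, not an integer value). */
static Cell make_datatype(TypeKind which) {
    Cell c;
    c.type = K_DATATYPE;                  /* header: "I carry a datatype payload" */
    c.payload.datatype = (int32_t)which;  /* payload: the type being named */
    return c;
}

/* True if the cell is a datatype naming the given type. */
static int names_type(const Cell *c, TypeKind which) {
    return c->type == K_DATATYPE && c->payload.datatype == (int32_t)which;
}
```

A WORD! spelled `integer!` would have a different header byte entirely, which is why TYPE? can tell the two apart even when they print the same.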

Who knows what the commented-out min_type and max_type were for? A remark says this payload is for a "Datatype or pseudo-datatype", so we can guess they let a pseudo-datatype specify a range of REB_XXX numbers, implementing categories like ANY-SERIES! as an alternative to typesets (?)
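If that guess is right, a pseudo-datatype would be a contiguous run of type numbers, and checking membership would take two comparisons. This is a hypothetical sketch (the `PseudoType` struct, the enum ordering, and `in_category` are invented for illustration, not R3-Alpha code), and it only works if each category's members are numbered adjacently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ordering: the series types are numbered contiguously. */
typedef enum {
    T_INTEGER,
    T_DECIMAL,
    T_STRING,   /* first series type */
    T_BINARY,
    T_BLOCK,    /* last series type */
    T_OBJECT
} TypeNum;

/* A pseudo-datatype described by an inclusive range of type numbers. */
typedef struct {
    const char *name;
    TypeNum min_type;
    TypeNum max_type;
} PseudoType;

static const PseudoType any_series = { "any-series!", T_STRING, T_BLOCK };

/* Membership is just a bounds check on the type number. */
static bool in_category(const PseudoType *p, TypeNum t) {
    return t >= p->min_type && t <= p->max_type;
}
```

That adjacency constraint is the trade-off versus a typeset, which uses one bit per type and can describe any combination.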

The spec is actually an object, which is what SPEC-OF returns:

r3-alpha>> spec-of integer!
== make object! [
    title: "64 bit integer"
    type: 'scalar
]

This limited amount of information was built into the executable from the Rebol-format table in %typespec.r.

You had to use SPEC-OF to access these properties, but they could have been made accessible with paths, e.g. integer!/title. And there might have been more interesting properties:

>> integer!/max-value
== 9223372036854775807

Ren-C Had Actually Added More To DATATYPE!...

Builtin types use a byte in each cell to say which type the value is. This byte indexes a static table of handlers that implement the datatype. That limits you to 256 types...although for technical reasons the limit was actually 64.
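A minimal sketch of byte-indexed dispatch, assuming a hypothetical `Handlers` struct with just a name and a `mold` hook (real handler tables carry many more entry points). Since the index is a `uint8_t`, the table can never need more than 256 entries; the sketch doesn't model whatever made the practical limit 64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Cell {
    uint8_t type;     /* type byte: indexes builtin_handlers[] */
    int64_t payload;  /* simplified payload */
} Cell;

/* Hypothetical handler entry; real tables have many more hooks. */
typedef struct {
    const char *name;
    int (*mold)(const Cell *c, char *out, size_t size);
} Handlers;

enum { TYPE_BLANK, TYPE_INTEGER, TYPE_LOGIC, TYPE_MAX };

static int mold_blank(const Cell *c, char *out, size_t size) {
    (void)c;
    return snprintf(out, size, "_");
}

static int mold_integer(const Cell *c, char *out, size_t size) {
    return snprintf(out, size, "%lld", (long long)c->payload);
}

static int mold_logic(const Cell *c, char *out, size_t size) {
    return snprintf(out, size, "%s", c->payload ? "true" : "false");
}

/* Static table indexed by the type byte; a uint8_t index caps it at 256. */
static const Handlers builtin_handlers[TYPE_MAX] = {
    [TYPE_BLANK]   = { "blank!",   mold_blank },
    [TYPE_INTEGER] = { "integer!", mold_integer },
    [TYPE_LOGIC]   = { "logic!",   mold_logic },
};

/* Dispatch: no per-type switch, just an array lookup on the type byte. */
static int mold_cell(const Cell *c, char *out, size_t size) {
    assert(c->type < TYPE_MAX);  /* this sketch only knows builtin types */
    return builtin_handlers[c->type].mold(c, out, size);
}
```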

Ren-C pushed toward an implementation of extension types. If that byte held a special "CUSTOM" signal, then cells of that type had to surrender one of their 4 platform-pointer-sized slots to hold a pointer to the type's method table.
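A sketch of that escape hatch, assuming a cell made of four pointer-sized slots (a header, an "extra" slot, and a two-slot payload) and a reserved type byte `TYPE_CUSTOM`. Which slot gets surrendered, and all of the names, are assumptions for illustration. Builtin types dispatch through the static table; a custom type reads its method table pointer out of its own cell.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const char *name;   /* function-pointer hooks omitted for brevity */
} MethodTable;

/* Four pointer-sized slots: header, extra, and a two-slot payload. */
typedef struct {
    uintptr_t header;   /* low byte holds the type */
    union { const MethodTable *methods; uintptr_t bits; } extra;
    union { void *ptr; intptr_t num; } payload[2];
} Cell;

enum { TYPE_INTEGER = 1, TYPE_CUSTOM = 255 };

static const MethodTable integer_methods = { "integer!" };

/* Static table for builtin types, indexed by the type byte. */
static const MethodTable *const builtin_methods[256] = {
    [TYPE_INTEGER] = &integer_methods,
};

/* A hypothetical extension type whose table lives outside builtin_methods. */
static const MethodTable vector_methods = { "vector!" };

static Cell make_custom(const MethodTable *methods, void *data) {
    Cell c = {0};
    c.header = TYPE_CUSTOM;
    c.extra.methods = methods;   /* one slot surrendered to the method table */
    c.payload[0].ptr = data;     /* remaining slots still hold instance data */
    return c;
}

static const MethodTable *methods_of(const Cell *c) {
    uint8_t type = (uint8_t)(c->header & 0xFF);
    if (type == TYPE_CUSTOM)
        return c->extra.methods;  /* custom: pointer stored in the cell */
    return builtin_methods[type]; /* builtin: index the static table */
}
```

The cost is one slot per custom instance, in exchange for no global limit on how many extension types can exist.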

A cool aspect of this generalization was that two bits in the cell started being used to flag whether each of two of its four slots needed to be marked by the GC. This generalized garbage collection so that a cell could be marked without customizing the garbage collector for its type byte. It just had to speak in the currency of "nodes".

(Doing this in a way that honors C's rules about strict aliasing was a little tricky, but possible.)
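The flag-driven marking described above might look like the following minimal sketch. The flag names `FLAG_FIRST_IS_NODE` and `FLAG_SECOND_IS_NODE`, the assumption that they govern the two payload slots, and the toy `Node` are all hypothetical. The marker consults only the flags, never the type byte, so a new type that sets them correctly needs no GC changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy GC-managed allocation ("node"). */
typedef struct Node {
    int marked;
} Node;

/* Hypothetical flag bits living in the header, above the type byte. */
#define FLAG_FIRST_IS_NODE  ((uintptr_t)1 << 8)
#define FLAG_SECOND_IS_NODE ((uintptr_t)1 << 9)

typedef struct {
    uintptr_t header;                      /* type byte + flags */
    uintptr_t extra;
    union { Node *node; intptr_t num; } payload[2];
} Cell;

static void mark_node(Node *n) {
    if (n != NULL)
        n->marked = 1;  /* a real GC would also queue n to scan its contents */
}

/* Marks a cell without knowing anything about its type. */
static void mark_cell(const Cell *c) {
    if (c->header & FLAG_FIRST_IS_NODE)
        mark_node(c->payload[0].node);
    if (c->header & FLAG_SECOND_IS_NODE)
        mark_node(c->payload[1].node);
}
```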

All Redbols Conflated The Looks of DATATYPE! and WORD!

In lockstep, they all did it:

rebol2>> integer!
== integer!

r3-alpha>> integer!
== integer!

red>> integer!
== integer!

Since I have Boron built, I can see it renames integer! to int!, but otherwise behaves the same:

)> int!
== int!

)> type? int!
== datatype!

)> type? first [int!]
== word!

It seemed to me that this conflation couldn't possibly be the best answer. So I made Ren-C buck the trend by using the R3-Alpha construction syntax, because it was something that could LOAD back:

>> integer!
== #[datatype! integer!]

>> load "#[datatype! integer!]"
== [#[datatype! integer!]]

Rendering differently was good, but this particular rendering wasn't all that palatable. Nor did it show the datatype as any kind of complex object.

There seemed to be two directions to go with this:

  • Accept DATATYPE! as some kind of alien complex type which has ugly rendering

  • Fit it into the lexical space somewhere.

If it was going to live in the lexical space, the datatype would most likely be derived from a symbol-bearing type.

Paring It Down: Making DATATYPE! Hold Only A Symbol

Unlike R3-Alpha and Red, Ren-C doesn't use integers to refer to symbols. Instead, symbol table entries are series...like strings, which now hold UTF-8 data even when mutable. (Frequently they are compact strings, whose UTF-8 spelling fits in the space where tracking information for an allocation would live if an allocation were needed.)
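The compact-string trick in that parenthetical is a form of small-buffer optimization. Here is a hypothetical sketch (`Stub`, `stub_init`, and the field layout are invented, not Ren-C's): a stub either points at a heap allocation and tracks its size, or, when the spelling is short enough, stores the UTF-8 bytes directly in the space those tracking fields would occupy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stub: heap-backed, or bytes stored inline in the same space. */
typedef struct {
    int is_dynamic;
    union {
        struct {
            char *data;        /* heap allocation */
            size_t used;       /* bytes in use (excluding '\0') */
            size_t capacity;   /* bytes allocated */
        } dynamic;
        /* Reuses the exact space of the three tracking fields above. */
        char inline_utf8[sizeof(char *) + 2 * sizeof(size_t)];
    } content;
} Stub;

static void stub_init(Stub *s, const char *utf8) {
    size_t len = strlen(utf8);
    if (len < sizeof s->content.inline_utf8) {  /* room for '\0' too */
        s->is_dynamic = 0;
        memcpy(s->content.inline_utf8, utf8, len + 1);
    } else {
        s->is_dynamic = 1;
        s->content.dynamic.data = malloc(len + 1);
        if (s->content.dynamic.data == NULL)
            abort();
        memcpy(s->content.dynamic.data, utf8, len + 1);
        s->content.dynamic.used = len;
        s->content.dynamic.capacity = len + 1;
    }
}

static const char *stub_utf8(const Stub *s) {
    return s->is_dynamic ? s->content.dynamic.data : s->content.inline_utf8;
}

static void stub_free(Stub *s) {
    if (s->is_dynamic)
        free(s->content.dynamic.data);
}
```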

So WORD! cells use pointers to refer to their symbols. Despite that, some built-in symbols are still numbered. These symbols are statically allocated rather than allocated as series from the heap, and they can be indexed by number quickly. The symbols also store their (16-bit) number in the series stub, so you can go the other direction...from symbol to number.
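A minimal sketch of the two directions, number to symbol and symbol to number. The `Symbol` layout, the `SYM_XXX` names, and the convention that 0 means "no number" are assumptions for illustration; what it mirrors from the text is that builtin symbols live in a static array, words hold pointers, and each symbol carries its own 16-bit number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *utf8;   /* spelling */
    uint16_t id;        /* builtin number, or 0 if the symbol has none */
} Symbol;

/* Hypothetical builtin symbol numbers. */
enum { SYM_INTEGER_X = 1, SYM_LOGIC_X, SYM_BLOCK_X, SYM_MAX };

/* Builtin symbols: statically allocated, indexable by number. Slot 0 unused. */
static const Symbol builtin_symbols[SYM_MAX] = {
    [SYM_INTEGER_X] = { "integer!", SYM_INTEGER_X },
    [SYM_LOGIC_X]   = { "logic!",   SYM_LOGIC_X },
    [SYM_BLOCK_X]   = { "block!",   SYM_BLOCK_X },
};

/* A WORD! cell refers to its symbol by pointer, not by number. */
typedef struct {
    const Symbol *symbol;
} WordCell;

/* Number -> symbol: a plain array index. */
static const Symbol *symbol_from_id(uint16_t id) {
    assert(id > 0 && id < SYM_MAX);
    return &builtin_symbols[id];
}

/* Symbol -> number: read the id stored in the symbol itself. */
static uint16_t id_of_symbol(const Symbol *s) {
    return s->id;
}
```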

The first 64 or so symbols are specifically chosen to be things like INTEGER!. This means that if a datatype stored only a symbol, that symbol's number could be used directly to index into the builtin-type-hooks table.

If you refer to extension types by some arbitrary symbol that isn't among those first 64 symbols, it would likely have no number at all. So with a DATATYPE! and that symbol in hand, you'd have to look up the hooks for that type in some extension-type mapping table.
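Putting those paragraphs together, a DATATYPE! that holds only a symbol pointer could find its hooks as sketched below. The fast path indexes a builtin table by the symbol's number; the slow path searches a registry of extension types. All names (`Datatype`, `Hooks`, `register_ext_type`, `hooks_for`) are hypothetical, and the linear scan stands in for whatever map a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { const char *utf8; uint16_t id; } Symbol;  /* id 0: not builtin */
typedef struct { const char *name; } Hooks;  /* real hooks are function pointers */

enum { SYM_INTEGER_X = 1, SYM_LOGIC_X = 2, NUM_BUILTIN_TYPES = 3 };

static const Hooks integer_hooks = { "integer" };
static const Hooks logic_hooks = { "logic" };

/* Builtin hooks, indexed by the symbol's number. */
static const Hooks *const builtin_type_hooks[NUM_BUILTIN_TYPES] = {
    [SYM_INTEGER_X] = &integer_hooks,
    [SYM_LOGIC_X]   = &logic_hooks,
};

/* Hypothetical registry, filled in as extensions load. */
typedef struct { const Symbol *symbol; const Hooks *hooks; } ExtEntry;
static ExtEntry ext_types[16];
static size_t num_ext_types = 0;

static void register_ext_type(const Symbol *sym, const Hooks *hooks) {
    assert(num_ext_types < sizeof ext_types / sizeof ext_types[0]);
    ext_types[num_ext_types].symbol = sym;
    ext_types[num_ext_types].hooks = hooks;
    ++num_ext_types;
}

/* A DATATYPE! that holds nothing but a symbol pointer. */
typedef struct { const Symbol *symbol; } Datatype;

static const Hooks *hooks_for(Datatype t) {
    uint16_t id = t.symbol->id;
    if (id != 0 && id < NUM_BUILTIN_TYPES)
        return builtin_type_hooks[id];          /* fast path: index by number */
    for (size_t i = 0; i < num_ext_types; ++i)  /* slow path: mapping table */
        if (ext_types[i].symbol == t.symbol)    /* symbols are interned */
            return ext_types[i].hooks;
    return NULL;                                /* unknown type */
}
```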

But as I mentioned, instances of extension types give up one of their 4 cell slots to point to this table. And usually you need the dispatch table when you have an instance (not the datatype), so this isn't much of a problem.

For Now, DATATYPE! Renders With An &

One step removed would be to say that, as today, a WORD! ending in ! is a layer of indirection over concrete types that use a sigil like &. This has been pitched as perhaps looking like:

integer!: &integer

any-word!: &[word set-word get-word the-word meta-word]

is-even!: &(#[action! {even?} [value]])

But the details are still being worked out.

What I will say is that moving DATATYPE! toward "just being a symbol", with everything else looked up, feels correct.

I don't think making DATATYPE! itself a complex object was a direction we wanted to go further in.
