Mirrored Type Bytes, Explained



If you’ve looked at the Rebol sources, you know that a Rebol cell is the size of four platform pointers.

Ren-C keeps the general gist, with some important adjustments. It breaks down as follows (a rough sketch in C appears after the list):

  • header: one pointer-sized integer (uintptr_t). This tells you the cell’s type (e.g. REB_BLOCK, REB_INTEGER), among other things. Only 32 bits of it are used, allowing the system to function identically on 32-bit and 64-bit systems…though the extra 32 bits could be applied to some kind of 64-bit optimization purpose. (This post is a proposal about using one of the bytes in this header differently from how it has been used so far.)

  • "extra": one pointer or pointer-sized integer. It’s a union, and which of the union fields is active depends on the type of the cell. For “bindable” types, this holds a binding…and that’s a fairly deep topic. But if a type isn’t bindable (say a MONEY! or a DATE!), it can just use this as spare bits to help put more data in the cell without needing a dynamic allocation.

  • "payload": Also a union that depends on the type of cell, but this one is the size of two platform pointers. That makes it sufficient to hold something like a 64-bit floating point number on 32-bit platforms. It comes after the “extra” on purpose, so that as long as the cell starts on a 64-bit boundary, the payload will be on a 64-bit boundary even on 32-bit platforms. (That’s important.)
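To make that concrete, here’s a rough sketch of a cell with those three parts. The names and union members are purely illustrative (the real Ren-C declarations differ), but the sizes work out the same:

 #include <stdint.h>

 union Example_Extra {              // one pointer in size
     void *binding;                 // "bindable" types keep their binding here
     uintptr_t spare_bits;          // non-bindable types (DATE!, MONEY!...) get spare bits
 };

 union Example_Payload {            // two pointers in size
     struct { void *node; uintptr_t index; } series;   // e.g. BLOCK!
     int64_t i64;                   // e.g. a 64-bit INTEGER!, even on 32-bit builds
     double dec;                    // e.g. DECIMAL!
 };

 struct Example_Cell {
     uintptr_t header;              // type byte and flags; only 32 bits used
     union Example_Extra extra;
     union Example_Payload payload; // lands on a 64-bit boundary if the cell does
 };

On a 32-bit build that’s 4 + 4 + 8 = 16 bytes, and on a 64-bit build it’s 8 + 8 + 16 = 32 bytes…four platform pointers either way.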

Beyond just alignment, there’s a lot of nuance to cells and to staying on the right side of the C standard: being able to assign different datatype payloads but still use generic routines on the binding, being able to assign the header in one instruction without breaking strict aliasing. There are comments in the code if people want to go down that rabbit hole.

But the main thing to take away is that you’re not paying a catastrophic cost for something like a 64-bit integer in a Rebol array. It’s 2x the size it would be otherwise. Fatter than a low-level C type, sure…but all the bits are right there with good locality…you don’t have to go through a dereference to some malloc()'d entity. Not much of a big deal.

Despite Optimizations, Arrays Cost Notably More

When we try to understand the difference between [1 2 3] and [[1] [2] [3]], just how much of a cost is that in bytes or processing overhead? If you’re designing a dialect, should you fear BLOCK!s, GROUP!s, and PATH!s?

Well, when you’ve got that cell and the header says it’s a REB_BLOCK, the “extra” field is used for stuff I’m not going to explain here. But the Reb_Block_Payload union contains two things: a pointer to a series node, and the index that the block value has into that series.

Series nodes are fixed-size tracking entities. Though they’re pretty small, they’re still eight platform pointers. To get to the node from the cell you have to hop through a pointer to another memory location, so that’s going to affect caching.

If you have a block of length 1 or 0, then Ren-C has a very neat optimization called “singular arrays”. The payload for the array lives directly in the series node. Careful mechanics allow this to work without breaking any alignment or aliasing rules, even managing to wedge an array terminator into control bits used for other purposes.
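To picture how that can work, here’s an illustrative-only sketch of an eight-pointer node whose content is either a dynamic allocation or a single embedded cell (reusing the Example_Cell sketch from above; the real Ren-C node differs in names and details):

 struct Example_Series_Node {           // eight platform pointers total
     uintptr_t header;                  // flag bits, including "is this singular?"
     void *link;                        // bookkeeping pointer
     union {
         struct {                       // the dynamically allocated case
             void *data;                // points at the cells
             uintptr_t length;
             uintptr_t rest;            // allocated capacity
             uintptr_t padding;
         } dynamic;
         struct Example_Cell cell;      // singular case: the one cell lives right here
     } content;                         // four pointers either way
     uintptr_t info;                    // more flag bits, sitting just after the cell
     void *misc;                        // bookkeeping pointer
 };

In this sketch, the info bits sit immediately after the embedded cell, which is the kind of adjacency that lets terminator-looking bits get wedged into fields mostly used for other purposes.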

So in this case–if you’re lucky–you’ve gone from taking 4 platform pointers for just the REB_INTEGER cell, to a REB_BLOCK cell of 4 platform pointers…and a series node of 8 pointers. 3x the size for [1] vs. just 1.

But let’s say instead of a block, you were doing a PATH!, like the proposed 2-element path for /refinement (a BLANK! that’s not rendered, and then the WORD! “refinement”). What would a 2-element array cost?

You’ve still got the 4-pointer cell and the 8-pointer series node. But now you need a dynamic allocation to hold the 2 cells, which is 8 more platform pointers. You also need a terminator cell, since there’s no trick like the singular series to save on it…so you actually need 3 cells. And the pools only allocate in multiples of 2…so that’s 4 cells.

Grand Total: 4 + 8 + 16 => 28 platform pointers…for something that took only 4 before! So [1 1] is 7x as big as 1, and on a 64-bit platform we’re talking 224 bytes. That’s not even counting the storage for the UTF-8 name of the refinement and overhead for that…this is how much it costs just to point to it! (If you add another 1, e.g. [1 1 1], it’s still the same size because you can fit the 3 elements and the terminator in that 4-element memory block you allocated)
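Restating that arithmetic in one place, as a back-of-envelope for a 64-bit build (the numbers just echo the breakdown above):

 enum {
     PTR_SIZE   = 8,                // bytes per platform pointer (64-bit build)
     CELL_SIZE  = 4 * PTR_SIZE,     // one cell: 4 pointers = 32 bytes
     NODE_SIZE  = 8 * PTR_SIZE,     // series node: 8 pointers = 64 bytes
     DATA_SIZE  = 4 * CELL_SIZE,    // 2 cells + terminator, rounded up to 4 by the pool
     PATH_TOTAL = CELL_SIZE + NODE_SIZE + DATA_SIZE   // 32 + 64 + 128 = 224 bytes
 };                                 // ...versus 32 bytes for a lone INTEGER! cell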

I’m neglecting all the overhead that dynamic allocation systems have to keep for their internal bookkeeping. Plus, remember that locality…spreading this data out is about more than just how many bytes you use, it’s where the bytes are.

My proposed solution: “Mirrored Type Bytes”

There’s clearly no generic way to contain an arbitrary cell-sized thing inside a cell without adding cost. It would be like zipping a zip file and guaranteeing it would always be smaller. :-/

But if cells were willing to sacrifice some of their header bits, those bits might hold the key to allowing a and [a] to both fit into a single cell…so long as the single element isn’t itself an array (e.g. [[a]]). The extra and payload bits would line up with those of the WORD!, but the same cell bits could be seen in one light as a BLOCK!, and in another light as a WORD!. I suggest doing this with two separate bytes in the header: the “real” type and the “mirror” type. If the real and the mirror don’t match, you know it’s a container, and the container’s single element of content corresponds to the payload of the cell as read through the mirror type.
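Here’s a minimal sketch of the idea, assuming purely for illustration that the low byte of the header is the “real” type and the next byte up is the “mirror”; the actual byte positions, constants, and names would be whatever the real header layout dictates:

 #include <stdint.h>

 enum { EXAMPLE_REB_WORD = 4, EXAMPLE_REB_BLOCK = 9 };   // made-up values

 typedef struct { uintptr_t header; /* ...extra and payload, as sketched above */ } Cell;

 #define TYPE_BYTE(cell)    ((cell)->header & 0xFF)         // the "real" type
 #define MIRROR_BYTE(cell)  (((cell)->header >> 8) & 0xFF)  // how the payload is laid out

 // A WORD! that is just a WORD!: both bytes agree.
 Cell plain_word = { (EXAMPLE_REB_WORD << 8) | EXAMPLE_REB_WORD };

 // [a] folded into a single cell: the "real" type says BLOCK!, but the extra
 // and payload bits are laid out exactly as the contained WORD!'s would be.
 Cell folded_block = { (EXAMPLE_REB_WORD << 8) | EXAMPLE_REB_BLOCK };

 // Detecting a container is then just a byte comparison.
 int Is_Mirrored_Container(const Cell *cell) {
     return TYPE_BYTE(cell) != MIRROR_BYTE(cell);
 }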

This sounds like a paradox. How can it have an answer to VAL_TYPE(…) which is both REB_BLOCK and REB_WORD?! It would seem to have to be one thing or the other.

Answer: Use the type system. If you’re holding the pointer via a plain REBVAL* to the cell, have it say this is a REB_BLOCK. But if you’re holding another pointer type (call it a REBMIR for now), then a REBMIR* would refuse to answer VAL_TYPE()…you’d get a compiler error if you tried. Instead you’d have to ask for the MIRRORED_TYPE(). Then the game becomes figuring out how and when to flip your pointers back and forth between REBVAL and REBMIR.

Now one can see why I’m suggesting that all types hold the mirror byte: you want to be able to do the flip to mirroring for any cell, even one that’s not contained. So the code is compiled such that every segment of code knows which type byte to look at, based on the REBVAL/REBMIR distinction of the pointer it has in its hand. It can all be checked and fixed at compile time.
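And here’s a sketch of how the REBVAL/REBMIR distinction could lean on the compiler, again with assumed names and the same illustrative byte positions, not the actual Ren-C definitions:

 #include <assert.h>
 #include <stdint.h>

 enum Example_Kind { EX_REB_WORD = 4, EX_REB_BLOCK = 9 };   // made-up values

 typedef struct { uintptr_t header; /* ...extra, payload */ } REBVAL;

 // A distinct wrapper type: same bits underneath, but a different static type,
 // so passing a REBMIR* where a REBVAL* is expected (or vice versa) won't compile.
 typedef struct { REBVAL cell; } REBMIR;

 enum Example_Kind VAL_TYPE(const REBVAL *v) {
     return (enum Example_Kind)(v->header & 0xFF);              // "real" type byte
 }

 enum Example_Kind MIRRORED_TYPE(const REBMIR *m) {
     return (enum Example_Kind)((m->cell.header >> 8) & 0xFF);  // mirror byte
 }

 // The deliberate "flip" from one view to the other, only legal when the bytes
 // actually differ (i.e. the cell really is acting as a container).
 const REBMIR *AS_MIRROR(const REBVAL *v) {
     assert((v->header & 0xFF) != ((v->header >> 8) & 0xFF));
     return (const REBMIR*)v;    // same bits, different compile-time view
 }

The AS_MIRROR() helper here is just a stand-in for whatever the checked “flip” would actually be.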

Stating The Obvious (?): Only Works for Immutable Arrays

You can’t dodge the allocation of a separate entity outside the cell if you want multiple cell references to be able to make shared changes, or see changes made through one reference via another one.

If you try to apply this optimization to plain code, it wouldn’t do what you’d expect:

 >> block: [a] ;-- one element, right on, optimize it right into that cell!
 >> ref: block ;-- now both referring to the "same block"
 >> append block 'b
 ** Script Error: series is const ;-- oops, oh yeah, I'll just use MUTABLE

 >> append mutable block 'b
 ??? ;-- how could ref ever see a change, if only cells are exchanged?

The LOCK process could go through and do some compression, and there could be primitives for making these immutable blocks if you wanted them. But they shouldn’t be the default.

Unless… they are loaded into a PATH!. If we got people on board with the expectation that not only are paths immutable, but embedded groups and arrays are immutable at least at the top level, then this could work.

I discuss how this could really tie down some of the more egregious ambiguities in PATH!, making it a solid and reliable part for dialecting–something it has certainly not been in the past. If we can kill off things like a/:b: in favor of a/(b): and pay no more for it, we very well might want to.

I don’t actually quite know how to do this yet…

If you’re holding the pointer via a plain REBVAL* to the cell, have it say this is a REB_BLOCK. But if you’re holding another pointer type (call it a REBMIR for now), then a REBMIR* would refuse to answer VAL_TYPE()…you’d get a compiler error if you tried. Instead you’d have to ask for the MIRRORED_TYPE(). Then the game becomes figuring out how and when to flip your pointers back and forth between REBVAL and REBMIR.

Yeah, that. Don’t quite have it all worked out.

The refinement case is working, but taking PATH! out of ANY-SERIES! meant enumeration of paths is much more limited in the system. So it wasn’t hard to just throw in an if() statement for that case. But there’s quite a lot more of:

 RELVAL *item = VAL_ARRAY_AT(block);
 for (; NOT_END(item); ++item) {
      blah_blah(item);
      blah_blah_blah(item);
 }

This can’t work with “unmirrored” containers, and the most basic way of thinking of it seems pathological:

 if (TYPE_BYTE(block) != MIRROR_BYTE(block)) { // mirror byte describes payload
     REBMIR *mirror = cast(REBMIR*, block);

     // these routines need to know to ask MIRRORED_TYPE()
     blah_blah_mirror(mirror);
     blah_blah_blah_mirror(mirror);
 }
 else { // real array payload
     RELVAL *item = VAL_ARRAY_AT(block);
     for (; NOT_END(item); ++item) {

         // these routines need to know to ask VAL_TYPE()
         blah_blah(item);
         blah_blah_blah(item);
     }
 }

Obviously you don’t want to do that, with two versions of every routine that takes a value! But you need to leverage the type system, or this is bound to get screwed up.

Complexity aside, it’s an idea worth pursuing. I know it’s possible, so the thing is to move it along into striking range of feasible.