"HEART BYTES", Explained

For those who've looked at Rebol sources, you know that a Rebol cell is the size of four platform pointers.

Ren-C keeps the general gist, with some important adjustments. It breaks down as:

  • header: one pointer-sized integer (uintptr_t). This tells you the cell's type (e.g. REB_BLOCK, REB_INTEGER), among other things. Only 32 bits are used of this (operated on through uint_fast32_t alias field, in case 32-bit masking operations are faster on 64-bit platforms than a 64-bit uintptr_t). This allows the system to function identically on 32 and 64 bit systems...though the extra 32 bits could be applied to some kind of 64-bit optimization purpose. {This post is regarding a proposal about how to use one of the 4 critical bytes in this header, differently from how it has been used so far.}

  • "extra": one pointer or pointer-sized integer. It's a union, and which of the union fields is active depends on the type of the cell. For "bindable" types, this holds a binding...and that's a fairly deep topic. But if a type isn't bindable--let's say a MONEY! or a DATE!, then it can just use this for some spare bits to help put more data in the cell without needing to do a dynamic allocation.

  • "payload": Also a union that depends on the type of cell, but this one is the size of two platform pointers. That makes it sufficient to hold something like a 64-bit floating point number on 32-bit platforms. It comes after the "extra" on purpose--so that as long as the cell is on a 64-bit boundary, then on 32-bit platforms this payload will be on a 64-bit boundary as well. (That's important.)

Beyond just alignment, there's a lot of nuance to cells, and staying on the right side of the standard. Being able to assign different datatype payloads but still use generic routines on the binding, being able to assign the header in one instruction without breaking strict aliasing. There's comments in the code if people want to go down a rabbit hole.

But the main thing to take away is that you're not paying a catastrophic cost for something like a 64-bit integer in a Rebol array. It's 2x the size it would be otherwise. Fatter than a low-level C type, sure...but all the bits are right there with good locality...you don't have to go through a dereference to some malloc()'d entity. Not much of a big deal.

Despite Optimizations, Arrays Cost Notably More

When we try to understand the difference between [1 2 3] and [[1] [2] [3]], just how much of cost is that in bytes or processing overhead? If you're designing a dialect, should you fear BLOCK!s, GROUP!s, and PATH!s?

Well, when you've got that cell and the header says it's a REB_BLOCK, the "extra" field is used for stuff I'm not going to explain here. But the Reb_Block_Payload union contains to two things: a series node pointer and the index that series value has in the block.

Series nodes are fixed-size tracking entities. Though they're pretty small, they're still eight platform pointers. To get to the node from the cell you have to hop through the pointer to another memory location, and so that's going to affect caching.

If you have a block of length 1 or 0, then Ren-C has a very neat optimization called "singular arrays". The payload for the array lives directly in the series node. Careful mechanics allow this to work, not breaking any alignment or aliasing rules, and even managing to wedge an array terminator into control bits used for other purposes.

So in this case--if you're lucky--you've gone from taking 4 platform pointers for just the REB_INTEGER cell, to a REB_BLOCK cell of 4 platform pointers...and a series node of 8 pointers. 3x the size for [1] vs. just 1.

But let's say instead of a block, you were doing a PATH!, like the proposed 2-element path for /refinement (a BLANK! that's not rendered, and then the WORD! "refinement"). What would a 2-element array cost?

You've still got the 4 pointer cell and the 8 pointer series node. But now you need a dynamic allocation to hold the 2 cells, so that would be 8 more platform pointers.

Note: It would have needed to be 4 cells when a terminator was needed...but the terminator no longer applies! So arrays of length 2 can really just use 8 cells in their allocation now. :partying_face:

Grand Total: 4 + 8 + 8 => 20 platform pointers...for something that took only 4 before! So [1 1] is 5x as big as 1, and on a 64-bit platform we're talking 160 bytes. For a refinement like [_ refine] that's not even counting the storage for the UTF-8 name of the refinement and overhead for that...this is how much it costs just to point to it!

I'm neglecting all the overhead that dynamic allocation systems have to keep for their internal bookkeeping. Plus, remember that locality...spreading this data out is about more than just how many bytes you use, it's where the bytes are.

My Solution: "Heart Bytes"

There's clearly no generic way to contain an arbitrary cell-sized thing inside a cell without adding cost. It would be like zipping a zip file and guaranteeing it would always be smaller. :-/

But if cells were willing to sacrifice some of their header bits, those bits might hold a key to allowing a and [a] to both fit into a single cell...as long as that cell wasn't also trying to be a 1-element array (e.g. [[a]]). The extra and payload bits would line up with the the WORD!, but the same cell bits could be seen in one light as a BLOCK!, and in another light as a WORD!.

This sounds like a paradox. How can it have an answer to VAL_TYPE(...) which is both REB_BLOCK and REB_WORD (!) It would seem to have to be one thing or the other.

Answer: Use the type system. If you're holding the pointer via a plain REBVAL* to the cell, have it say this is a REB_BLOCK. But if you're holding another pointer type (call it a REBCEL for now), then a REBCEL* would refuse to answer VAL_TYPE()...you'd get a compiler error if you tried. Instead you'd have to ask for the HEART_BYTE(). Then the game becomes figuring out how and when to flip your pointers back and forth between REBVAL and REBCEL.

Stating The Obvious (?): Only Works for Immutable Arrays

You can't dodge the allocation of a separate entity outside the cell if you want multiple cell references to be able to make shared changes, or see changes made through one reference via another one.

If you try to apply this optimization on plain code, it couldn't do what you'd expect:

 >> block: [a]  ; one element, right on, optimize it right into that cell!
 >> ref: block  ; now both referring to the "same block"
 >> append block 'b
 ** Script Error: series is const ;-- oops, oh yeah, I'll just use MUTABLE

 >> append mutable block 'b
 ??? ;-- how could ref ever see a change, if only cells are exchanged?

The LOCK process could go through and do some compression, and there could be primitives for making these immutable blocks if you wanted them. But they shouldn't be the default.

Unless... they are loaded into a PATH!. If we got people on board with the expectation that not only are paths immutable, but embedded groups and arrays are immutable at least at the top level, then this could work.

I discuss how this could really tie down some of the more egregious ambiguities in PATH!, making it a solid and reliable part for dialecting--something it has certainly not been in the past. If we can kill off things like a/:b: in favor of a/(b): and pay no more for it, we very well may should.

Update: Not only are we on board, but immutable paths and tuples and such are working fine!

I just wanted to follow up here and mention that indeed, the system is allowing /foo and foo/ to be stored in single cells today... likewise with .foo and foo., and /[foo] etc.

This all hinges on immutability of paths and tuples. It also hinges on the type system in the C++ build being good enough to check all the trickery, because otherwise this trick would crash and burn pretty easily! But it doesn't.

We could get fancy and maybe find ways to store .foo/ and .foo. or //foo in single cells too.

Really the main priority was just not to let NewPath wind up coming in and creating a lot of extra overhead for something that already existed. Recapping:

  • When REFINEMENT! was an ANY-WORD! class it cost 4 pointers resident in a block

    • That's 16 bytes on 32-bit platforms, 32 bytes on 64-bit platforms
  • The initial implementation in NewPath cost those pointers plus 8 pointers for a series node plus a pool element that was 4 units long of 4 pointers each...to be big enough for 2 elements and a terminator.

    • That's 112 bytes on 32-bit platforms, 224 bytes on 64-bit platforms

Not only do you get a factor of 7 cost increase, you now have 3 separate locations in memory to jump around at to process what you have (in addition to the locality issue of having to go find the symbol itself, which WORD!s had to do as well).

So in the "keeping it Amish" spirit, it felt like too much when the difference between /FOO and FOO jumped like that. It only takes a few factors of 10 to get to the kinds of software situations we see today. :-/

Anyway, as I say it was a few things coming together...and heart bytes are part of it.

:heart: :heart_eyes: :heart_decoration:

2 Likes