PATH! and TUPLE! compression, explained

hostilefork · January 11, 2019, 5:13am

Since I ever saw them, I wanted to unify PATH! and REFINEMENT!

It did not make sense to me that there was foo/bar and /bar but not /foo/bar or bar/
I pretty quickly landed on the idea that if you saw something like /bar it would be a 2-element PATH!, where the first element was simply BLANK!.

As simple as that sounds, changing to this design the way R3-Alpha was written would make the PATH! form of /bar cost SEVEN TIMES as much!!!

When REFINEMENT! was an ANY-WORD! class it cost 4 pointers resident in a block
- That's 16 bytes on 32-bit platforms, 32 bytes on 64-bit platforms
The initial implementation in NewPath cost those pointers plus 8 pointers for a series stub plus a pool element that was 4 units long of 4 pointers each...to be big enough for 2 elements and a terminator.
- That's 112 bytes on 32-bit platforms, 224 bytes on 64-bit platforms

Not only do you get a factor of 7 cost increase, you now have 3 separate locations in memory to jump around at to process what you have (in addition to the locality issue of having to go find the symbol itself, which WORD!s had to do as well).

In the "keeping it Amish" spirit, it felt like too much when the difference between /FOO and FOO jumped like that. It only takes a few factors of 10 to get to the kinds of software situations we see today. :-/

Before Going Further: Let's Recap CELLs and STUBs

For those who've looked at Rebol sources, you know that a Rebol cell is the size of four platform pointers.

Ren-C keeps the general gist, with some important adjustments. It breaks down as:

header: one pointer-sized integer (uintptr_t). This tells you the cell's type (e.g. REB_BLOCK, REB_INTEGER), among other things. Only 32 bits are used of this (operated on through uint_fast32_t alias field, in case 32-bit masking operations are faster on 64-bit platforms than a 64-bit uintptr_t). This allows the system to function identically on 32 and 64 bit systems...though the extra 32 bits could be applied to some kind of 64-bit optimization purpose.
"extra": one pointer or pointer-sized integer. It's a union, and which of the union fields is active depends on the type of the cell. For "bindable" types, this holds a binding...and that's a fairly deep topic. But if a type isn't bindable--let's say a MONEY! or a DATE!, then it can just use this for some spare bits to help put more data in the cell without needing to do a dynamic allocation.
"payload": Also a union that depends on the type of cell, but this one is the size of two platform pointers. That makes it sufficient to hold something like a 64-bit floating point number on 32-bit platforms. It comes after the "extra" on purpose--so that as long as the cell is on a 64-bit boundary, then on 32-bit platforms this payload will be on a 64-bit boundary as well. (That's important.)

Beyond just alignment, there's a lot of nuance to cells, and staying on the right side of the standard. Being able to assign different datatype payloads but still use generic routines on the binding, being able to assign the header in one instruction without breaking strict aliasing. There's comments in the code if people want to go down a rabbit hole.

But the main thing to take away is that you're not paying a catastrophic cost for something like a 64-bit integer in a Rebol array. It's 2x the size it would be otherwise. Fatter than a low-level C type, sure...but all the bits are right there with good locality...you don't have to go through a dereference to some malloc()'d entity. Not much of a big deal.

Despite Optimizations, Arrays Cost Notably More

When we try to understand the difference between [1 2 3] and [[1] [2] [3]], just how much of cost is that in bytes or processing overhead? If you're designing a dialect, should you fear BLOCK!s, GROUP!s, and PATH!s?

Well, when you've got that cell and the header says it's a REB_BLOCK, the "extra" field is used for stuff I'm not going to explain here. But the Reb_Block_Payload union contains to two things: a series stub pointer and the index that series value has in the block.

Series stubs are fixed-size tracking entities. Though they're pretty small, they're still eight platform pointers. To get to the stub from the cell you have to hop through the pointer to another memory location, and so that's going to affect caching.

If you have a block of length 1 or 0, then Ren-C has a very neat optimization called "singular arrays". The payload for the array lives directly in the series stub. Careful mechanics allow this to work, not breaking any alignment or aliasing rules, and even building some safety in via bit patterns to help catch attempts to wander off the edge of the data into neighboring stubs.

So in this case--if you're lucky--you've gone from taking 4 platform pointers for just the REB_INTEGER cell, to a REB_BLOCK cell of 4 platform pointers...and a series stub of 8 pointers. 3x the size for [1] vs. just 1.

But let's say instead of a block, you were doing a PATH!, like the proposed 2-element path for /refinement (a BLANK! that's not rendered, and then the WORD! "refinement"). What would a 2-element array cost?

You've still got the 4 pointer cell and the 8 pointer series stub. But now you need a dynamic allocation to hold the 2 cells, so that would be 8 more platform pointers.

Note: It would have needed to be 4 cells when a terminator was needed...but the terminator no longer applies! So arrays of length 2 can really just use 8 pointers in their allocation now.

Grand Total: 4 + 8 + 8 => 20 platform pointers...for something that took only 4 before! So [1 1] is 5x as big as 1, and on a 64-bit platform we're talking 160 bytes. For a refinement like [_ refine] that's not even counting the storage for the UTF-8 name of the refinement and overhead for that...this is how much it costs just to point to it!

I'm neglecting all the overhead that dynamic allocation systems have to keep for their internal bookkeeping. Plus, remember that locality...spreading this data out is about more than just how many bytes you use, it's where the bytes are.

Major Realization: PATH!s and TUPLE!s Should Be Immutable

Once I realized paths and tuples could be immutable, it began to make sense that they could avoid the problem of being like arrays.

I called them "sequences", and dropped the assumption that every sequence had an array backing it. Instead, some would...and some would not.

So you wouldn't ask for a pointer to the head cell of a sequence directly. Instead, you would ask for the Nth element of a sequence...and maybe you would get a pointer to a cell that was part of an array. Or you could get a pointer back to a temporary cell you passed in where the item would be written.

Then, something like a refinement-type PATH! would not point at an array stub, but to a word spelling stub. It's possible to tell the difference because in Ren-C all Stubs have "flavor bytes" (which are a parallel to the heart byte in a Cell that tells you if it's a BLOCK! or WORD! etc.)

Since 1-element paths are illegal, we can assume that if a path points to a word spelling it must represent either /a or a/. The distinction is encoded with a single bit in the cell.

It's All Been Working Great

Immutable PATH! and TUPLE! are clearly good, and they permit this optimization.

This has tied down some of the more egregious ambiguities in PATH!, making it a solid and reliable part for dialecting--something it has certainly not been in the past. If we can kill off things like a/:b: in favor of a/(b): and pay no more for it, we very well may should.

hostilefork · November 22, 2023, 8:40am

Almost 5 years ago (!) I optimized PATH! and TUPLE! that looked like /foo (or foo/, or .foo, or foo.) to cost no more than the R3-Alpha REFINEMENT!... at 4 platform pointers.

Without the optimization, the 2-element paths cost 20 platform pointers:

The optimization is great, but you can't apply that trick to foo.bar or foo/bar (or foo/[bar] etc.) Without the blank, you need two cells.

So I had an idea: to reuse the mechanic behind PAIR! that manages to store two cells and skips the array node. So it's 4 + 8 = 12 platform pointers, instead of 20 for these cases.

It was very close to being implemented, and while going over the code I realized that all the disclaimers about it not being implemented were taking up more space and concern than just going ahead and writing it. All the hard work was done--and powering the relatively-few PAIR! that I ever use.

How Much Does It Save?

I don't know how typical the boot code is, but if you boot the system there are 1,648 of these 2-element PATH!s/TUPLE!s being created in memory.

So on a plain boot of a system, it's saving 1,648 * 8 = 13184 platform pointers. On a 64-bit system that's 105,472 bytes (half that on 32-bit)

It's not a "ton" (though 105K here, 105K there, eventually you're talking megabytes). But it could provide another benefit in the form of locality. It means that when something like block.1 is being processed, it can use the node in the TUPLE! directly, instead of needing to follow it to a dynamic allocation.

In practice, it seems the code as written right now is more or less breaking even. So running the code for detecting the new form basically equals hopping through to the array. Hence it's really just the memory benefit for now...though I imagine with some tuning that could change.

Anyway, 105K memory savings at runtime seems like the kind of thing that's worth it, so... it's in now.