Mapping from Series => Series By Co-Opting The Key Series

hostilefork · March 6, 2024, 3:24pm

There was an unfinished idea in an old version of the interpreter. It related to how to deal with problems like trying to make a copy of a block, and make sure any series with the same identity are only copied once in the new structure, and point to that one copied identity.

Rebol2 did not have this behavior:

rebol2>> block: [a]
== [a]

rebol2>> original: reduce [block block]
== [[a] [a]]

rebol2>> append block 'b
== [a b]

rebol2>> original
== [[a b] [a b]]  ; both aliases see the append

rebol2>> duplicate: copy/deep original
== [[a b] [a b]]

rebol2>> append first duplicate 'c
== [a b c]

rebol2>> duplicate
== [[a b c] [a b]]  ; considered by many to be wrong: independent copies

This post isn't about whether that is right or wrong (and having such questions may seem to some as an indication of "this language is madness! get me to Haskell", etc. But as I've said this is the game we're playing here so we roll with it.)

But to not get independent copies, you need a way to map series nodes to copies you've already created...so you can consult that mapping before making new copies. And the direction that was being pursued by the old interpreter I am looking at was to actually do surgery on the originating series nodes, to alter them so they shifted out some of their content, such that they could be their own keys in the mapping.

Generally speaking, all the bits in a series stub are spoken for. So it would seem there's nowhere to stow a pointer to the new series you are creating in it. What the implementation was doing was pushing a 4 pointer cell on the data stack, writing one pointer's worth of information from the stub into that cell, then replacing that pointer slot in the stub with the stack index. Then it wrote the new series into the cell...so the cell contained one stowed pointer from the original series and one pointer for the new series.

This meant the original series was now in a "weird" state, that things like the GC had to know about and tolerate. Other operations looking for the missing information in the stub needed to be caught if they tried to get at it without following the stack index through to the stack cell.

Having the cells on the data stack meant it was not necessary to enumerate all the series stubs after a copy to "clean them up". Otherwise, I'd imagine it may be possible to make some kind of guarantee that for any series appearing in source, the union of the bits in the source series and the bits of the copied series can hold all the information necessary to construct two valid series... e.g. one pointer's worth of information is always redundant in those two copies. If you can get two pointers' worth of information redundant, the second could be used to chain a linked list as you go...removing the need for the stack cells to enumerate.

Though having the stack cells and no particular requirement of information redundancy in source series with their copies offers another benefit: being applicable for creating mappings that aren't copying-related.

Anyway, it was a little unfinished idea I ran across that I wanted to document. I'm cleaning up the bootstrap executable to refresh it with something that will help in the FENCE! migration, and mercilessly deleting any code in the bootstrap executable that does not specifically benefit bootstrap... to reduce the instability surface, speed things up, and make it easier to debug the 6-year old executable if worst comes to worst.