Where the Series Ends: Simplifying Out of Bounds Rules


Most languages with arrays have you keep the index separate. Languages that do something fancier typically abstract the process into "iterators"...which may be more complicated than simple integers.

But Rebol made the strange decision to fold an integer index into its arrays and strings (the ANY-SERIES! types). The approach is summarized by someone who didn't like it, on the C2 Wiki page "Why isn't Rebol Popular":

>> s: "I hate this approach"
== "I hate this approach"
>> s: next s
== " hate this approach"

But this 'I' isn't lost.

>> back s
== "I hate this approach"
>> s
== " hate this approach"
>> s: head s
== "I hate this approach"

Whether you love it or hate it, having a hidden index opens a tremendous can of worms.

If you ask around, few people would know the answers to semantic questions involving this. You get an inkling of how complex it is if you just COPY a series that's not at its head...the data before the index is not copied.

>> x: next "abcd"
== "bcd"

>> y: copy x
== "bcd"

>> back x
== "abcd"

>> back y
== "bcd"
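To make the transcript above concrete, here is a minimal sketch in Python that models a series value as a shared buffer plus a 1-based index. The Series class and its method names are hypothetical stand-ins, not Rebol internals, and the clamping in next and back mirrors the historical behavior rather than anything proposed here.

```python
from dataclasses import dataclass

@dataclass
class Series:
    """Hypothetical model of a series value: shared data plus a 1-based index."""
    data: list      # shared, mutable buffer (a list of characters here)
    index: int = 1

    def next(self):
        # Same buffer, index advanced by one (clamped at the tail, as today).
        return Series(self.data, min(self.index + 1, len(self.data) + 1))

    def back(self):
        # Same buffer, index moved back by one (clamped at the head, as today).
        return Series(self.data, max(self.index - 1, 1))

    def copy(self):
        # Only the data from the index onward is copied; the copy is at its own head.
        return Series(self.data[self.index - 1:], 1)

    def text(self):
        # What the console would show: the data from the index to the tail.
        return "".join(self.data[self.index - 1:])

x = Series(list("abcd")).next()
y = x.copy()
print(x.back().text())  # abcd
print(y.back().text())  # bcd
```

The copy has no "a" to go back to because it never received the data before the index, which is the asymmetry the transcript shows.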

This issue crops up everywhere. What about when you REDUCE a block into another? What about if you COMPOSE?

It also means ANY-SERIES! are actually iterators on data that can be mutated through other references. If you are pointing into a string or block at index 1000, and someone clears all the data out of that string, your value cell still holds the index at 1000. What should the semantics be?
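The stale-index situation can be sketched in Python by modeling each reference as a (buffer, index) pair. The names here are illustrative only, and the "treat it as empty" reading at the end is one assumption among several possible answers, not settled semantics.

```python
# Two references into one shared buffer, modeled as (buffer, index) pairs.
buffer = list(range(2000))     # stand-in for a long block
head_ref = (buffer, 1)         # a reference at the head
far_ref = (buffer, 1000)       # a reference "pointing at index 1000"

head_ref[0].clear()            # someone clears the data through the other reference

data, index = far_ref
tail = len(data) + 1           # 1-based index of the tail position
print(index, tail)             # 1000 1
print(index > tail)            # True: the stored index is now past the end

# One possible reading (an assumption): a past-the-end position simply has
# no elements, and the stored index is left untouched.
remaining = data[index - 1:]
print(remaining)               # []
```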

Firstly: why does Rebol have this feature at all?

For such a weird thing, you'd think it has to be good for something. And it does have several concrete advantages:

  • It cuts memory use by more than half for series + index. The index is slipped into an otherwise-unused spot in a Rebol "cell". That spot is the size of a platform pointer. (R3-Alpha actually had another spot available, but Ren-C utilizes this for "binding", which is how blocks representing function bodies can stay connected to the specific instance of a function invocation they represent, to give "specific binding" on the words nested underneath them). Storing an independent index would require another INTEGER! cell. But it's worse than that, because that cell would need to live in a variable--meaning there'd have to be a context key cell for it.

  • It means you don't need multiple return values to return a series and an index. Multiple return values aren't very streamlined today (though there is an idea floating around for making them better).

  • It reduces the amount of code you have to write. The savings aren't just the storage for keeping the series and index in one variable. They also cover the storage and processing in every piece of code that references them. Where today you can just pass the series, you would have to pass both the series and the index. And returning both would classically have required set [series index] some-function-returning-series-and-position.
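The code-size point in the last bullet can be illustrated with a rough Python sketch. Position, find_separate, and find_combined are hypothetical names invented for this comparison; they only contrast the two calling styles and say nothing about how Rebol implements FIND.

```python
from typing import NamedTuple, Optional

class Position(NamedTuple):
    """Hypothetical stand-in for a series cell: data and index travel together."""
    data: str
    index: int  # 1-based

def find_separate(data: str, index: int, needle: str):
    # Separate-index style: two inputs, and two outputs for the caller to unpack.
    hit = data.find(needle, index - 1)
    return (data, hit + 1) if hit >= 0 else None

def find_combined(pos: Position, needle: str) -> Optional[Position]:
    # Combined style: one value in, one value out.
    hit = pos.data.find(needle, pos.index - 1)
    return Position(pos.data, hit + 1) if hit >= 0 else None

# Chaining two searches with a separate index needs unpacking at each step...
data, index = find_separate("a=1;b=2", 1, ";")
data, index = find_separate(data, index, "=")
print(index)      # 6

# ...while the combined value threads straight through.
pos = find_combined(Position("a=1;b=2", 1), ";")
pos = find_combined(pos, "=")
print(pos.index)  # 6
```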

So weird though it may be, it's pragmatic. The language would be pretty different without it.

Beyond saving space, there's no "magic" involved

There's not any kind of "strong theoretical basis" for Rebol's inclusion of an index in an ANY-SERIES! value. It has all the weaknesses of an independent integer index in a C/JavaScript/Python-type language. Sticking it in the value itself solves nothing.

So I think it's a bad idea to have the behavior be any different from what it would be if the index were held independently.

This means series should be able to hold an arbitrary integer index... negative, or past the end. back back back "abc" should take 3 steps back from the head at index 1, to be at index -2. And it should then take next next next to bring it forward to the head again.
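As a sketch of what unclamped stepping would mean, here is a small Python model. The Series class is hypothetical, and it only tracks the index arithmetic; it makes no claim about what reading data at such positions should do.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Series:
    """Hypothetical model of the proposal: the index is an ordinary integer."""
    data: str
    index: int = 1  # 1-based; may be below 1 or past the tail

    def next(self):
        return Series(self.data, self.index + 1)   # no clamping at the tail

    def back(self):
        return Series(self.data, self.index - 1)   # no clamping at the head

s = Series("abc")
before = s.back().back().back()
print(before.index)                          # -2
print(before.next().next().next().index)     # 1
print(s.next().next().next().next().index)   # 5, one past the tail position at 4
```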

Every operation that can be done on a series with its internal index should thus be a synonym for doing that operation on the result of "at series index", with the series at its head and the index held externally. This means instead of defining two sets of behaviors, we only have to define one.
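One way to picture "define it once" is the Python sketch below, where the method on the series value only delegates to a function that takes an external index. The names pick_at and Series are hypothetical, and returning None for an out-of-range pick is an assumption made for the sketch.

```python
from dataclasses import dataclass

def pick_at(data: str, index: int, n: int):
    """The single definition: pick the n-th item relative to an external 1-based index."""
    i = index + n - 2                      # zero-based offset into the data
    return data[i] if 0 <= i < len(data) else None

@dataclass(frozen=True)
class Series:
    data: str
    index: int = 1

    def skip(self, count: int):
        return Series(self.data, self.index + count)

    def pick(self, n: int):
        # No second set of rules: delegate to the external-index definition.
        return pick_at(self.data, self.index, n)

s = Series("abcd").skip(2)
print(s.pick(1), pick_at("abcd", 3, 1))    # c c
print(Series("abcd").skip(-3).pick(4))     # a (index -2, so the fourth slot is index 1)
print(Series("abcd").skip(-3).pick(1))     # None: nothing lives before the head
```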

I propose that molding a series that is not at its head should just include the index. Something like:

>> mold series: "abc"
== {"abc"}

>> mold next series
== {2|"abc"}

FORM-ing can do as it does today. And the console can have some friendly way of printing the molded value that trims it for you, as it trims other things:

>> mold skip "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 40
== 41|..."aaaaaaaa"
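The molding notation and the trimmed console display could be sketched roughly as follows in Python. Both function names are hypothetical, string escaping is left out, and the handling of indexes before the head or past the tail is a guess, since the proposal doesn't pin that down.

```python
def mold(data: str, index: int = 1) -> str:
    """Hypothetical MOLD of a string series: prefix the index unless at the head."""
    quoted = '"' + data + '"'              # escaping of special characters omitted
    return quoted if index == 1 else f"{index}|{quoted}"

def console_display(data: str, index: int = 1) -> str:
    """Hypothetical console rendering: elide the data before the index."""
    if index <= 1 or index > len(data) + 1:
        return mold(data, index)           # nothing sensible to trim; show it all
    return f'{index}|..."{data[index - 1:]}"'

print(mold("abc"))                         # "abc"
print(mold("abc", 2))                      # 2|"abc"
print(console_display("abcdefgh", 6))      # 6|..."fgh"
```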

I don't know what FORM-ing things with indexes before the head should do, but the historical bias is to have that count as the whole series.

This seems important. It's a matter of acknowledging what this index represents, why it is there, and holding it to the same standards as an index that is not stored in the series cell. When it comes to BigNums, that platform-pointer-sized slot should be able to link to a node if it has to, so this isn't creating some second-class-citizen INTEGER! datatype.