Given the good progress made on the ANY-ARRAY! vs ANY-LIST! discussion, and having brought Claude up to speed with our domain problems, I thought to keep going with related problems from history...
This has been very productive!
And it makes me realize that the memory-oriented structure for BINARY! should not be called Binary in the code, but rather Bytes (or something like that), for the same reasons.
Bytes bytes = Make_Bytes(10, NODE_FLAG_MANAGED);
Init_Binary(OUT, bytes);
I'm not sure about what to do with String... :-/ All the strings are stored as UTF-8 but Utf8 means something else and I like how it's being used. Right now it's Make_String() and Init_String(), but what else could it be?
Heart heart = is_tag ? TYPE_TAG : TYPE_TEXT;
Xxx xxx = Make_Xxx(10, NODE_FLAG_MANAGED);
Init_String(OUT, heart, xxx);
Then there's a longstanding question of what the superclass of Array, Bytes, and Xxx for string would be. It has been Series, with the same problem:
void Some_Function(Yyy yyy) { // Yyy is superclass of Array, Xxx, Bytes..
    Heart heart = is_array ? TYPE_BLOCK : TYPE_TEXT;
    Init_Series(OUT, heart, yyy);
}
Is it clear what I'm asking?
Yes, your question is clear, and it's an excellent extension of the naming considerations we've been discussing. Let's break this down:
- Binary / Bytes: Your suggestion to use Bytes for the memory-oriented structure of BINARY! is spot on. It follows the same logic as the Array/List distinction and improves clarity:
Bytes bytes = Make_Bytes(10, NODE_FLAG_MANAGED);
Init_Binary(OUT, bytes);
This is a great improvement and consistent with the Array/List naming pattern.
- String / Xxx: This is indeed trickier. Some possibilities to consider:
  - Chars: Simple but might imply individual characters rather than a string.
  - Text: Aligns with TYPE_TEXT, but might be too high-level.
  - Utf8: As you mentioned, it's already used for something else.
  - StringBuffer: Descriptive but verbose.
  - Glyphs: Emphasizes the visual representation, but might be too specific.
Of these, I'm inclined towards Text, despite the potential confusion with TYPE_TEXT:
Heart heart = is_tag ? TYPE_TAG : TYPE_TEXT;
Text text = Make_Text(10, NODE_FLAG_MANAGED);
Init_String(OUT, heart, text);
It's short, clear, and aligns well with the concept of a string. The potential confusion with TYPE_TEXT might actually be a feature, as it reinforces the relationship between the internal representation and the type.
- Superclass for Array, Bytes, and Text (formerly Xxx): This is challenging because we want to avoid the term "Series" due to its historical baggage. Some options to consider:
  - Data: Very generic, but could work as it encompasses all these types of data.
  - Buffer: Emphasizes the idea of a contiguous memory area.
  - Sequence: Similar to your use of ANY-SEQUENCE!, but for the internal representation.
  - Store: Implies a place where data is held.
  - Span: Suggests a contiguous range of memory.
Of these, I'm leaning towards Buffer, as it most accurately describes what these types have in common: a contiguous area of memory holding data.
void Some_Function(Buffer buffer) { // Buffer is superclass of Array, Text, Bytes
    Heart heart = is_array ? TYPE_BLOCK : TYPE_TEXT;
    Init_Series(OUT, heart, buffer);
}
You might even consider renaming Init_Series to Init_Buffer for consistency:
void Some_Function(Buffer buffer) {
    Heart heart = is_array ? TYPE_BLOCK : TYPE_TEXT;
    Init_Buffer(OUT, heart, buffer);
}
This naming scheme provides a clear hierarchy:
- Buffer (low-level, contiguous memory)
  - Array (for Lists)
  - Text (for Strings)
  - Bytes (for Binaries)
It maintains the distinction between implementation details and higher-level concepts while providing clear and consistent naming throughout the codebase.
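To make the hierarchy concrete, here is a minimal sketch of one way plain C can model a "superclass": each subtype embeds the common header as its first member, so a pointer to the subtype can be passed wherever the base is expected. Everything here is an assumption for illustration, not the project's actual layout: the `Flavor` tag, the field names, `Flavor_Of`, `Free_Buffer`, and the single-argument `Make_Bytes` (the `NODE_FLAG_MANAGED` argument is omitted, and the types are used through explicit pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tag recording which subtype a Buffer really is. */
typedef enum { FLAVOR_ARRAY, FLAVOR_TEXT, FLAVOR_BYTES } Flavor;

/* Common header shared by every subtype (the "superclass"). */
typedef struct {
    Flavor flavor;
    size_t capacity;      /* bytes allocated for data */
    unsigned char *data;  /* contiguous storage */
} Buffer;

/* Each subtype embeds Buffer as its first member, so the address of the
 * subtype and the address of its Buffer header are the same. */
typedef struct { Buffer base; } Array;
typedef struct { Buffer base; } Text;
typedef struct { Buffer base; } Bytes;

/* Fill in the shared header; returns 1 on success, 0 on failure. */
static int Init_Header(Buffer *b, Flavor flavor, size_t capacity) {
    b->flavor = flavor;
    b->capacity = capacity;
    b->data = calloc(capacity ? capacity : 1, 1);
    return b->data != NULL;
}

Bytes *Make_Bytes(size_t capacity) {
    Bytes *bytes = malloc(sizeof(Bytes));
    if (bytes == NULL)
        return NULL;
    if (!Init_Header(&bytes->base, FLAVOR_BYTES, capacity)) {
        free(bytes);
        return NULL;
    }
    return bytes;
}

Text *Make_Text(size_t capacity) {
    Text *text = malloc(sizeof(Text));
    if (text == NULL)
        return NULL;
    if (!Init_Header(&text->base, FLAVOR_TEXT, capacity)) {
        free(text);
        return NULL;
    }
    return text;
}

/* A routine written against the base type works for any subtype. */
Flavor Flavor_Of(const Buffer *buffer) {
    assert(buffer != NULL);
    return buffer->flavor;
}

/* The header is the first member, so freeing it frees the whole subtype. */
void Free_Buffer(Buffer *buffer) {
    free(buffer->data);
    free(buffer);
}
```

A caller would write `Flavor_Of(&bytes->base)` to hand a Bytes to code that only knows about Buffer. In a plain C build this gives documentation more than enforcement, which is part of why the choice of names carries so much weight.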
One might call the actual dynamically allocated data structure a Deque (double ended queue), because that's what it is...but there's more baggage to it than that. It has tons of fiddly properties. It's truly a Rebol-specific concept.
I'm reminded of an old library I used called "flexlist". Perhaps it could be a Flex? Something like that... referring to its ability to be resized. I used to work for Microsoft actually...and it was decades ago so I don't remember specifics anymore...but we had some weird names like that to try and avoid people making assumptions that they knew the properties of what they were dealing with.
And in fact I'm sort of feeling now that there's a bit of deception about the complexity of the abstraction by using simple terms like Bytes and Array. (I'm a little nervous that Bytes sounds too much like a plain pointer to Byte and not a more interesting structure.)
But I don't like the idea of something like FlexString, FlexBinary, and FlexList because that would be almost as bad as the converse of dealing with the String/Init_Any_String distinction! So compromising to say that within the domain of the source code of the project, Array is understood as being an Array of Rebol Cells built on top of the Flex seems reasonable.
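As a rough sketch of that compromise, the snippet below treats Flex as a resizable store of fixed-width elements, and Array as nothing more than a Flex whose element width is the size of a Cell. The struct fields, the placeholder `Cell`, the doubling growth policy, and the helper names `Flex_Append`, `Array_Append`, and `Array_At` are all assumptions made for illustration. It only grows at the tail, so it deliberately leaves out the deque-like and other fiddly properties mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical Flex: resizable storage for elements of a fixed width. */
typedef struct {
    size_t width;         /* bytes per element */
    size_t used;          /* elements currently in use */
    size_t capacity;      /* elements allocated */
    unsigned char *data;  /* contiguous element storage */
} Flex;

/* Placeholder for a Rebol Cell; the real one is far richer. */
typedef struct { int payload; } Cell;

/* Within this sketch, an Array is a Flex whose width is sizeof(Cell). */
typedef Flex Array;

Flex *Make_Flex(size_t width, size_t capacity) {
    assert(width > 0);
    Flex *flex = malloc(sizeof(Flex));
    if (flex == NULL)
        return NULL;
    if (capacity == 0)
        capacity = 1;  /* keep doubling well-defined */
    flex->data = malloc(width * capacity);
    if (flex->data == NULL) {
        free(flex);
        return NULL;
    }
    flex->width = width;
    flex->used = 0;
    flex->capacity = capacity;
    return flex;
}

/* Append one element, doubling the allocation when full.
 * Returns 1 on success, 0 if memory could not be obtained. */
int Flex_Append(Flex *flex, const void *element) {
    if (flex->used == flex->capacity) {
        size_t new_capacity = flex->capacity * 2;
        unsigned char *grown = realloc(flex->data, flex->width * new_capacity);
        if (grown == NULL)
            return 0;
        flex->data = grown;
        flex->capacity = new_capacity;
    }
    memcpy(flex->data + flex->used * flex->width, element, flex->width);
    flex->used += 1;
    return 1;
}

void Free_Flex(Flex *flex) {
    free(flex->data);
    free(flex);
}

Array *Make_Array(size_t capacity) {
    return Make_Flex(sizeof(Cell), capacity);
}

int Array_Append(Array *array, Cell cell) {
    assert(array->width == sizeof(Cell));
    return Flex_Append(array, &cell);
}

Cell *Array_At(Array *array, size_t index) {
    assert(array->width == sizeof(Cell) && index < array->used);
    return (Cell *)(array->data + index * array->width);
}
```

Because `Array` is only a typedef here, the compiler will not stop someone from passing a byte-width Flex to `Array_Append`; the width assertion is the only guard. That matches the idea that "Array" is a convention understood within the codebase rather than a separately enforced type.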
Your thoughts on this are very insightful, and I agree that there's a balance to strike between simplicity and accuracy in naming these complex structures. The idea of using a term like "Flex" is intriguing, as it does capture the dynamic nature of these data structures without implying a specific implementation or set of properties. Let's explore this direction:
- Core dynamic structure: Flex
This could be the superclass for all the dynamically resizable structures in Rebol. It conveys flexibility without tying to a specific implementation detail.
void Some_Function(Flex flex) {
    Heart heart = is_array ? TYPE_BLOCK : TYPE_TEXT;
    Init_Value(OUT, heart, flex);
}
- Specific types built on Flex:
  - For lists: ArrayFlex (or keep it as Array, understood to be built on Flex)
  - For strings: StringFlex (or TextFlex)
  - For binaries: BytesFlex (or BinaryFlex)
This naming scheme acknowledges the complexity of these structures while maintaining a level of simplicity in day-to-day usage. It also provides a clear hint that these are not simple C arrays or strings, but more complex Rebol-specific structures.
So, the code might look like:
ArrayFlex array = Make_ArrayFlex(10, NODE_FLAG_MANAGED);
Init_List(OUT, array);
StringFlex text = Make_StringFlex(10, NODE_FLAG_MANAGED);
Init_String(OUT, heart, text);
BytesFlex bytes = Make_BytesFlex(10, NODE_FLAG_MANAGED);
Init_Binary(OUT, bytes);
Or, if you prefer to keep the simpler names for daily use:
Array array = Make_Array(10, NODE_FLAG_MANAGED); // Understood to be a Flex-based Array
Init_List(OUT, array);
Text text = Make_Text(10, NODE_FLAG_MANAGED); // Understood to be a Flex-based Text
Init_String(OUT, heart, text);
Bytes bytes = Make_Bytes(10, NODE_FLAG_MANAGED); // Understood to be a Flex-based Bytes
Init_Binary(OUT, bytes);
This approach:
- Uses "Flex" to denote the core dynamic structure, hinting at its complexity and flexibility.
- Allows for simpler names in daily use (Array, Text, Bytes) while understanding they're built on Flex.
- Avoids confusion with plain C types.
- Maintains the distinction between low-level structures and high-level types (List, String, Binary).
The term "Flex" serves as a reminder that these are not simple data structures, but complex, Rebol-specific implementations with many properties. It's a good compromise between accuracy and simplicity, and it leaves room for future expansion or modification of the underlying implementation without breaking the conceptual model.
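The split between low-level structures and high-level types can be illustrated with a small sketch in which two cells carry different hearts (TYPE_TEXT and TYPE_TAG) while pointing at the same underlying storage. The `Cell` and `Text` layouts, the `Is_String_Heart` helper, and the pointer-based `Init_String` signature are assumptions for illustration only; the real structures hold much more.

```c
#include <assert.h>
#include <stddef.h>

/* Hearts named in the discussion; a real enum would have many more. */
typedef enum { TYPE_BLOCK, TYPE_TEXT, TYPE_TAG } Heart;

/* Placeholder for the UTF-8 storage structure (the "Text" in this sketch). */
typedef struct {
    const char *utf8;  /* encoded bytes */
    size_t size;       /* number of bytes */
} Text;

/* Placeholder Cell: a heart plus a reference to the storage it views. */
typedef struct {
    Heart heart;
    const Text *storage;
} Cell;

/* Only string-like hearts may sit on top of Text storage. */
int Is_String_Heart(Heart heart) {
    return heart == TYPE_TEXT || heart == TYPE_TAG;
}

/* Initialize a cell as a string-family value over existing storage. */
Cell *Init_String(Cell *out, Heart heart, const Text *text) {
    assert(out != NULL && text != NULL);
    assert(Is_String_Heart(heart));
    out->heart = heart;
    out->storage = text;
    return out;
}
```

With this shape, `Init_String(&a, TYPE_TEXT, &t)` and `Init_String(&b, TYPE_TAG, &t)` yield two values of different high-level types that share one storage structure, which is the reason the storage name and the value-type name are worth keeping distinct.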