With the libRebol API, there is the question of what to call the function that makes a string, and its shortcuts. So far:
- Since UTF-8 is the default, I thought that rebString() would take UTF-8 data as a char*, and look for a '\0' byte to automatically decide how long it was. rebSizedString() would take a char* and a size parameter, counting in bytes.
- The UCS-2 variations I feel are probably best named with W because they are really likely to be used by Windows API programmers, where W is the settled-upon convention for the wide-character APIs. rebStringW() is more succinct than rebStringUCS2(). Since wchar_t is different sizes on other platforms, if people want to read it as "the W is for Windows, vs. Wide" that is fine...because it really is UCS-2, not wchar_t.
- Shorthands for making auto-releasing values would be like rebS(), to easily create a temporary STRING! for things like rebRun("print", rebS(cstring_utf8), END); and then not have to rebRelease it later, because the rebRun does so as it passes by.
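To make the calling conventions in the list above concrete, here is a rough C sketch. Nothing in it is the real libRebol: DemoText, demo_text(), demo_sized_text(), demo_temp_text(), demo_run() and DEMO_END are made-up stand-ins that only mimic the shape of the proposed rebString(), rebSizedString(), rebS(), rebRun() and END. The runner just totals byte counts instead of evaluating anything.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an API value; NOT the real libRebol type. */
typedef struct {
    char *utf8;         /* owned copy of the bytes, NUL-terminated */
    size_t num_bytes;   /* size in bytes, not in codepoints */
    bool auto_release;  /* true if the consumer should free it */
} DemoText;

/* Shaped like the proposed rebSizedString(): caller passes a byte count. */
DemoText *demo_sized_text(const char *utf8, size_t num_bytes) {
    DemoText *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->utf8 = malloc(num_bytes + 1);
    if (!t->utf8) { free(t); return NULL; }
    memcpy(t->utf8, utf8, num_bytes);
    t->utf8[num_bytes] = '\0';
    t->num_bytes = num_bytes;
    t->auto_release = false;
    return t;
}

/* Shaped like the proposed rebString(): length found by scanning for '\0'. */
DemoText *demo_text(const char *utf8) {
    return demo_sized_text(utf8, strlen(utf8));
}

/* Shaped like the rebS() shorthand: same value, but flagged as temporary. */
DemoText *demo_temp_text(const char *utf8) {
    DemoText *t = demo_text(utf8);
    if (t) t->auto_release = true;
    return t;
}

void demo_release(DemoText *t) {
    free(t->utf8);
    free(t);
}

/* Sentinel that ends the variadic list (an assumption of this sketch). */
#define DEMO_END ((DemoText *)NULL)

/* Stand-in for a variadic runner: walks values until DEMO_END, totals the
 * bytes it saw, and frees temporaries as it passes them. */
size_t demo_run(DemoText *first, ...) {
    va_list va;
    size_t total = 0;
    va_start(va, first);
    for (DemoText *t = first; t != DEMO_END; t = va_arg(va, DemoText *)) {
        total += t->num_bytes;
        if (t->auto_release) demo_release(t);
    }
    va_end(va);
    return total;
}
```

With these, a call like demo_run(demo_temp_text("abc"), DEMO_END) needs no release afterward, while a value made by demo_text() stays alive until demo_release() is called on it. Note that the sizes are byte counts: "caf\xC3\xA9" is four codepoints but five bytes.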
All very nice, except one thing. I believe it would be better to call it TEXT! Why rock the boat? A few reasons:
Category as member of Category is bad
I've never liked the idea of naming categories and members of the category the same thing. Saying "STRING! is an ANY-STRING!" has a loop in it, just as "BLOCK! is an ANY-BLOCK!". Categories are supposed to be ways of talking about common properties, and when you name the member of the set the same as the set...you're sort of prohibiting anything being in the set.
Sentences that sound natural to me are things like "Unlike other languages, Rebol has several string types... TEXT!, TAG!, URL!..." Saying it has several string types, but then saying STRING! is one of them is awkward. Kind of like saying "FUNCTION is a function! that generates a function!"...you can only do this so many times before people think you don't know how to taxonomize anything.
Need a name that spans WORD!, TAG!, SET-WORD!, ISSUE!...
There has been great consternation over questions like "should ISSUE! be an ANY-STRING! or an ANY-WORD!". It changed from Rebol2 to R3-Alpha and chaos ensued. Fights over it raised a question in my mind: why aren't they all unified into one category?
Certainly they can have different behaviors in the evaluator. But what is it--really--that a REFINEMENT! has in common with a WORD! that it doesn't have with a TAG!? It can optionally carry a binding, like a WORD! can. But it just evaluates to itself, like a TAG! does. You can't append to a WORD!, but if it were unbound, might you want to be able to? You can't append to a LOCK'd or PROTECT'd TAG!, either...so it's not like every string is mutable in the first place.
So I have a hope to figure out a grand unified theory that will pull all their implementations together. It would speed comparisons (all using UTF-8). It would cut down on unnecessary conversions (why can't I copy/part skip 'some-word 5 4?). We've seen already that some people wanted issues to carry bindings even though they weren't evaluator-active...why not let a tag do it too?
But when they're all in the same category, what would they be called? I think STRING is a great name for the superclass. You can say "WORD!, TAG!, ISSUE!, TEXT!...these are some of Rebol's many string types...in the category ANY-STRING!"
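As a thought experiment only, here is what "one implementation, many kinds" might look like in C. The DemoKind enum, the DemoAnyString struct and both helper functions are invented for this sketch and say nothing about how the interpreter is actually laid out. It also assumes ASCII-only spellings so that byte offsets and character offsets coincide; a real version would have to step by codepoint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds; the names exist only for this sketch. */
typedef enum { KIND_TEXT, KIND_TAG, KIND_ISSUE, KIND_WORD } DemoKind;

/* One layout shared by every member of the string category. */
typedef struct {
    DemoKind kind;        /* which concrete type this is */
    const char *utf8;     /* shared UTF-8 representation (not owned) */
    size_t num_bytes;     /* length in bytes */
    const void *binding;  /* optional binding, NULL when unbound */
} DemoAnyString;

/* With one representation, comparing a WORD! to a TAG! needs no
 * conversion: it is a length check plus a memcmp. */
bool demo_same_spelling(const DemoAnyString *a, const DemoAnyString *b) {
    return a->num_bytes == b->num_bytes
        && memcmp(a->utf8, b->utf8, a->num_bytes) == 0;
}

/* Rough analogue of copy/part skip: a non-owning view of `count` bytes
 * starting `skip` bytes in, clipped to the available data. The kind is
 * carried over; the binding is dropped. ASCII-only assumption applies. */
DemoAnyString demo_copy_part(const DemoAnyString *s, size_t skip, size_t count) {
    DemoAnyString out;
    if (skip > s->num_bytes) skip = s->num_bytes;
    if (count > s->num_bytes - skip) count = s->num_bytes - skip;
    out.kind = s->kind;
    out.utf8 = s->utf8 + skip;
    out.num_bytes = count;
    out.binding = NULL;
    return out;
}
```

Under those assumptions, taking 4 bytes after skipping 5 in a KIND_WORD spelled some-word gives a view spelled word, with no detour through a different string type.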
TEXT! is more familiar for non-CS people
Certainly there's just sort of the question of going against the status quo. If all the other programming languages in the world say "foo bar" is a "string", does one want to be the only language that seems to not do that (besides VARCHAR() in SQL, or other esoterics)?
But Rebol is different in thinking of string as a category of different parts. And that is a central aspect which should be reinforced wherever possible. If early on someone is exposed to the idea that string is a category and not a concrete type, they might quickly become more adventurous in thinking about how to apply the different concrete types. I think it's a good teachable moment.
Plus: to non-programmers, string is a ball of yarn. Float is something you do in an inner-tube. Carl's initial bias in Rebol led him to err on the side of common language, and he chose DECIMAL! for floating point numbers...even if this would rub computer science people the wrong way. That's because in other languages, types named "decimal" tend to be for when you want a fixed-precision model of some kind. It's specifically used to mean not floating-point.
Red pushed back on that and called the numeric type FLOAT!. So they're willing to rock the boat, but in the other direction...biasing to the CS term. I haven't really come to a conclusion myself about decimal/float, I see both sides. When pressured over it in the past, Carl was firm that he liked DECIMAL! and wasn't changing it.
But in this case, there's not as much CS culture controversy. TEXT! isn't wanted for other things, it's generally just a synonym for strings. Perhaps it would compete in a GUI as a shorthand for TEXTEDIT! or something, if it used the same type namespace.
It wouldn't break that much
Just like with the ACTION! renaming, you could say string!: text! | string?: :text? and be nearly unaffected. The name STRING! wouldn't be taken for anything else, just as FUNCTION! isn't. It would only be taken as the category name ANY-STRING!.
Pushing on the switch evaluative mechanics would make it fine to switch type of x [string! [...] just the same as switch type of x [text! [...]
So the biggest impact is probably on the C code for the interpreter itself. And as mentioned--the API.
In any case, there isn't really a particular rush, but I am having to think these things out. Because I was going to use rebT as a shorthand for making temporary variables, when I remembered this change...which means that I'd want it for splicing text in.
So something to be aware of, and if anyone wants to protest it then they may.