Case Insensitivity vs. Case-Preservation (can't have both?)

If however the word however is not at the start of your sentence, it is not written with a capital letter 'H' at the beginning, however it is the same word however as the other however that is however spelled with all lower case letters.

I consider the example A: 10 a: 20 to be a "case" of bad practice programming, using the same word (by coincidence this time a word only of length one).

One of the charms of Rebol is its case insensitivity where even PeopleThatWantToKeepCamels are welcome and treated the same as peoplethatwanttokeepcamels (who don't like hitting shift keys the whole time).

Sure, this is the base behaviour. Case sensitivity is needed in some places, and the language must be able to handle such cases.

So on case basis Rebolers can be very reasonable, but if Rebol starts to look like Java or ... then the cause of the rebellion is lost forever :-/

Let's hear some more opinions.

One of the first examples that I hit in trying out the change (besides residuals from my FAIL experiment) is that modules have been named with capital letters, e.g. system/modules/Event

I should point out that while we might tend to dislike mixed casing, if we allowed such things to be distinct it does open up pretty important space...and module names for top level scope are actually a pretty good example.

Event: import %some-event-module.reb

; you can refer to things like `Event.xxx` and still have local variables
; named `event`.

This may be an important direction, along with my suggestion that we might be able to couple ACTION! and OBJECT! in a way such that something like math [...] or math/ref1/ref2 [...] could invoke a function with refinements...while math.some-constant could be a field and math.some-other-function/ref1/ref2 arg1 arg2 could invoke another function.

Capitalizing datatypes is another concept that is popular in some languages, which might read better to some people's tastes:

make Object [x: 10, y: 20]  ; you have to hit SHIFT to get the "O"

make object! [x: 10, y: 20]  ; ...but you have to hit SHIFT to get the !

Anyway...having an open mind about casing may be necessary to be competitive in the limited space of words. There are only so many. :-/


Note: I'll also point out that if I really just wanted all-caps FAIL to act like fail, I could just say that specific thing...

FAIL: :fail

Important to remember that's available.


UPDATE: In a little less than 4 hours I was able to make the change and get a booting system, implementing the rule that PATH! access and SELECT+FIND still default to caseless when looking for keys. This means that if there are multiple cases of the same word, they just return the first.

The biggest source of trouble in getting this to boot was in the headers, because it's typical to write Rebol [Title: {My Module}] and not Rebol [title: {My Module}]. The problem was that the default header object was defined with lowercase keys. So when you make default-header block it had things like the default title: {Untitled} and then the Title: {My Module} came later...meaning it was the untitled key that was found.

I changed the default object to use capitalized terms, but it shows the kind of issue that would come up. Sometimes case-sensitivity sucks, sometimes case-insensitivity sucks...but for binding (and hence object keys), case-sensitivity provides more flexibility. And there's a value to getting everyone on the same page for what case to use in their headers. :man_shrugging:
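A rough model of that header problem, sketched in Python rather than Rebol so it can be run anywhere. The names `merge_case_sensitive` and `select_caseless` are made up for illustration and are not Ren-C functions. The assumption is that keys are stored case-sensitively while lookup is caseless and returns the first match:

```python
def merge_case_sensitive(base, extra):
    """Merge key/value pairs; keys differing only in case stay separate."""
    merged = list(base)
    for key, value in extra:
        for i, (existing, _) in enumerate(merged):
            if existing == key:          # exact spelling: overwrite in place
                merged[i] = (key, value)
                break
        else:                            # new spelling: append at the end
            merged.append((key, value))
    return merged


def select_caseless(pairs, name):
    """Return the value of the FIRST key matching `name`, ignoring case."""
    wanted = name.lower()
    for key, value in pairs:
        if key.lower() == wanted:
            return value
    return None


script_header = [("Title", "My Module")]

# Defaults with lowercase keys: the script's `Title` lands after `title`.
lower_defaults = [("title", "Untitled"), ("date", None)]
header = merge_case_sensitive(lower_defaults, script_header)
print(select_caseless(header, "title"))   # Untitled

# Defaults with capitalized keys: the script's `Title` overwrites the default.
upper_defaults = [("Title", "Untitled"), ("Date", None)]
fixed = merge_case_sensitive(upper_defaults, script_header)
print(select_caseless(fixed, "title"))    # My Module
```

The second half mirrors the fix of capitalizing the defaults: nothing about the lookup changed, only which spelling got there first.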

I'm not sure how deeply ingrained case insensitivity really is, but I myself always use the same case for all words meaning the same thing, and would only use a different case if I meant something different.
And, though Rebol should never look like Javascript, interoperability with JSON seems really important.


To reiterate an earlier point: There's something inconsistent about saying case-preservation is important, but then systemically not heeding case. When an optimization caused a historical case-preservation to lose it, I called this bad:

>> obj: make object! [Some-Name: 10]

>> block: [some-name, some-other-name]

>> bind block obj
>> block
== [Some-Name, some-other-name]  ; ack, where'd my case go?!

I think it would be also bad if everything got lowercased automatically by the system:

>> block: [Some-Name, Some-Other-Name]
== [some-name, some-other-name]

SO...if we can agree both of those situations are bad...then why wouldn't we agree that this r3-alpha behavior is bad?

r3-alpha>> load/header "Rebol [Title: {My Title}]"
== [make object! [
        title: "My Title"  ; Hey, I said `Title:` !
        name: none
        type: none
        ; ...
    ]]

As is Rebol2's habit of going the other way:

rebol2>> print mold load/header {Rebol [title: {my title}]}
    make object! [
        Title: "my title"  ; This time I said `title:` !
        Date: none
        Name: none
        ; ...
    ]

It just goes to show you can't have it both ways. Case-Preservation and Case-Insensitivity are fundamentally at odds.

But this is unfortunate:

>> [code header]: load "Rebol [title: {my title}]"
>> header
== make object! [
     Title: "Untitled"  ; ... huh?
     Date: _
     Name: _
     ; ...
     title: "my title"  ; ... grrr.
 ]

If we tuned OBJECT! to use the same trick that MAP! does at the moment, it could error when you do a case-insensitive access:

>> select header 'title
** Object has different key cases for `title`, use SELECT/CASE

>> select/case header 'title
== "my title"

>> select/case header 'Title
== "Untitled"

Doing that test efficiently would require keeping track of if object keys have synonyms; so each object expansion would need to re-check that and update some bits.
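One way the bookkeeping could look, as a Python sketch under stated assumptions: `CaseAwareObject`, `has_synonyms` and `AmbiguousCaseError` are hypothetical names, and `str.lower()` stands in for whatever case folding the interpreter would really use. The synonym flag is updated each time a key is added, so a caseless access only has to consult an index:

```python
class AmbiguousCaseError(Exception):
    """Raised when a caseless lookup matches more than one spelling."""


class CaseAwareObject:
    def __init__(self):
        self._values = {}         # exact spelling -> value
        self._spellings = {}      # lowercased spelling -> exact spellings
        self.has_synonyms = False

    def add(self, key, value):
        if key not in self._values:
            group = self._spellings.setdefault(key.lower(), [])
            group.append(key)
            if len(group) > 1:    # re-check on every expansion
                self.has_synonyms = True
        self._values[key] = value

    def select(self, key, case=False):
        if case:                  # like SELECT/CASE: exact spelling only
            return self._values.get(key)
        group = self._spellings.get(key.lower(), [])
        if not group:
            return None
        if len(group) > 1:
            raise AmbiguousCaseError(
                f"Object has different key cases for `{key}`, use case=True"
            )
        return self._values[group[0]]


header = CaseAwareObject()
header.add("Title", "Untitled")
header.add("Name", "demo")
header.add("title", "my title")

print(header.select("title", case=True))   # my title
print(header.select("Title", case=True))   # Untitled
print(header.select("name"))               # demo
try:
    header.select("title")
except AmbiguousCaseError as error:
    print(error)
```

If `has_synonyms` is false, an implementation could skip the ambiguity check entirely, which is the cheap common case.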

FAIL was the only example I had of deliberately using case to "stand out". I found no others, and apparently I've always stuck to the capitalization in file headers.

The best way to maintain sanity might be to couple my "error if multiple cases exist by default" above with "force case to match by default." It could give intelligible errors:

>> header/date
** Object does not have `date` field, but has `Date`
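A small Python sketch of that "force case to match, but say what you probably meant" idea. `pick` and `FieldError` are hypothetical names, and the field value is made up for the demo:

```python
class FieldError(Exception):
    """Raised when a field is missing under case-sensitive lookup."""


def pick(fields, name):
    """Case-sensitive field access that hints at other spellings."""
    if name in fields:
        return fields[name]
    # Look for keys that differ from `name` only by case.
    near = [key for key in fields if key.lower() == name.lower()]
    if near:
        hints = ", ".join(f"`{key}`" for key in near)
        raise FieldError(
            f"Object does not have `{name}` field, but has {hints}"
        )
    raise FieldError(f"Object does not have `{name}` field")


header = {"Title": "Untitled", "Date": "1-Jan-2020"}

print(pick(header, "Date"))
try:
    pick(header, "date")
except FieldError as error:
    print(error)   # Object does not have `date` field, but has `Date`
```

The hint only costs anything on the failure path, so exact-case access stays a plain lookup.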

But we can't let multiple cases break binding in a case-sensitive world, because then if you have multiple cases of the same word anywhere in the user context it would conflict.

>> o: make object! [Title: {Thing}]

; `Title` is now in the user context, because all words are bound into the
; user context *before* the code runs (and makes it a field in the object).
; If you are unclear on this point, re-read:
;
; https://forum.rebol.info/t/the-real-story-about-user-and-lib-contexts/764

>> title: "hello!"
>> title
** If this errors because there's already `Title`, then that's bad

So binding would have to be one of the /CASE tolerant operations, which makes sense in this concept.

We have some interesting options for saying "I mean it", like doubling up slashes in the path:

>> header/title
** Object has different key cases for `title`, use SELECT/CASE
    ; ^-- more than likely, this generates an "uh oh" and people would then
    ; look and say "why is there more than one case"

>> header//title
== "my title"
   ; ^-- could be a nice syntax for getting things case-sensitively out of
   ; MAP! as well

And I've already mentioned there might be nuances between . and /, though I don't want any nuances that make me less likely to use . because I think it is going to be my preferred field selector. So above I'd probably want header.title to error on the ambiguity, because that's the safer behavior, and say header..title if I meant there's a Title: too and I'm aware of that fact.

Altogether, I'm just about convinced about case-sensitive binding. There are some epicycles to deal with, but it's rather telling that I got a working system so quickly. We know somewhere that case-insensitive comparisons have to be offered, but I think you don't want it anywhere that comes into conflict with the ability to do case-preservation, which means object keys have to preserve case...hence binding itself has to be case-sensitive.

The use of UTF-8 seems a good reason alone for introducing more case sensitive behaviour. The rules around case insensitive comparison appear quite complicated and could cause more problems in the long run.

...for some definition of "behavior".

As with the "CR LF" => "LF" policy...my pitch is to take "strong bets" on trends that are going to be guaranteed to still be relevant, and put the costs of edge cases on those few who demand them.

Be sure to read over this thread regarding unicode normalization:

Thanks. The Unicode normalisation thread is a good read. What a mess! The é example with different behaviours in apps and filesystems is quite an eye opener.
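For anyone who wants to see the mess first-hand, here is a short Python illustration. It uses Python's own string methods and `unicodedata` module, not anything Rebol or Ren-C does:

```python
import unicodedata

# Simple lowercasing is not enough for caseless comparison.
print("ß".lower() == "SS".lower())          # False
print("ß".casefold() == "SS".casefold())    # True

# Changing case can change the length of a string.
print("ß".upper())                          # SS
print(len("\u0130".lower()))                # 2 (dotted capital I)

# The same visible "é" can be one code point or two.
composed = "caf\u00e9"        # e-acute as a single code point
decomposed = "cafe\u0301"     # plain e plus a combining acute accent
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

So a "case-insensitive" identifier comparison has to pick a folding rule and a normalization form before it even starts, and each choice is something the language then has to keep stable.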

Two more exceptions. :-/

Rebmu

I thought Rebmu would be unaffected, since the decoding of the mixed-case input just produces entirely lowercase tokens. There's no binding of anything uppercase involved...but once case-sensitive identifiers exist, you'd have a hard time referring to them, because they'd be broken up (MixedCase => m: ixed c ase).

Just prohibiting mixed-case identifier usage isn't any particular problem for this domain, though there could be some exception syntax (e.g. leading backslash, like \MixedCase meaning honor the case of the next word). It's not really a big deal either way.
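To make the splitting concrete, here is a toy Python sketch that reproduces only the MixedCase example above. It is not Rebmu's actual decoder: the name `unmush` and the rule "a token starting with an uppercase run makes its first word a set-word" are assumptions inferred from that one example, and digits and symbols are ignored entirely.

```python
import re


def unmush(token):
    """Split a token at every case change and lowercase each run."""
    runs = re.findall(r"[A-Z]+|[a-z]+", token)
    words = [run.lower() for run in runs]
    if runs and runs[0].isupper():
        words[0] += ":"      # assumed: leading uppercase run => set-word
    return words


print(" ".join(unmush("MixedCase")))   # m: ixed c ase
print(" ".join(unmush("mixedcase")))   # mixedcase
```

Since every run comes out lowercase, there is no input spelling that yields a word like `MixedCase`, which is why an escape such as the backslash idea would be needed.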

CSCAPE

The templating language in CSCAPE uses a weird rule to decide if the result of a code insertion should be uppercased, lowercased, or left alone.

>> items: ["lowercase" "MixedCase" "UPPERCASE"]

>> cscape "Lowercasing $<second items>"
== "Lowercasing mixedcase"

>> cscape "Uppercasing $<SECOND ITEMS>"
== "Uppercasing MIXEDCASE"

>> cscape "Leaving $<Second Items> alone"
== "Leaving MixedCase alone"

While it's pretty weird, I think it's kind of clever, and it works with the domain. There could be of course alternate shorthands like $L<...> for lowercase and $U<...> for uppercase with the default leaving it alone. But it doesn't visually cue you quite as well when you're looking at what's being put together (frequently these are fragments of #define declarations in C or things like that, and it reads much better when the case of the splice cues you to what the result will look like).
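The casing rule itself is tiny. A Python sketch, assuming the rule is exactly "all-lowercase code lowercases the result, all-uppercase code uppercases it, anything else leaves it alone". `cased_splice` and `evaluate` are hypothetical names, and `evaluate` is only a lookup table standing in for running the escaped code:

```python
def cased_splice(code, evaluate):
    """Evaluate `code` caselessly, then case the result like the code."""
    result = evaluate(code.lower())
    if code.islower():
        return result.lower()
    if code.isupper():
        return result.upper()
    return result            # mixed case: leave the result alone


items = ["lowercase", "MixedCase", "UPPERCASE"]


def evaluate(code):
    # Stand-in for loading and running the escaped code.
    table = {"second items": items[1]}
    return table[code]


print("Lowercasing " + cased_splice("second items", evaluate))
print("Uppercasing " + cased_splice("SECOND ITEMS", evaluate))
print("Leaving " + cased_splice("Second Items", evaluate) + " alone")
```

Python's `str.islower()` and `str.isupper()` only look at cased characters, so spaces and punctuation in the escape don't affect the decision.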

Since it's starting from a string, you might say "well, then just do the detection of which it is...then convert the string to lowercase...and load it." However, that means you would also lowercase any embedded strings in the code.

The only thing this broke looked like:

cast(CFUNC*, ${"T_" Hookname T 'Class}),  /* generic */

Strangely enough, Hookname is an enfix function which pulls in the left hand side to build an unspaced full identifier name. You might think I could have just written:

cast(CFUNC*, T_${Hookname T 'Class}),  /* generic */

But as it turns out, if the class is NULL it wants the whole hookname to be nullptr (not T_Nullptr). Which is why I did it in this weird way.

That's the only case, and I fixed it by re-uppercasing the prefix. :-/ You're not really supposed to write an essay inside the CSCAPE escapes in the first place. It's being nice by letting you put a bit of code vs. just variables in the first place.

It's a strange application, and just converting the input to lowercase works. But I'm trying to inventory every place that I hit where case insensitivity was being leveraged somehow.

CR and LF

The character constants CR and LF were defined as uppercase. This is typical of their notation in ASCII tables.

With case-sensitive binding, they either need to be referred to as CR and LF ... redefined to be lowercase cr and lf ... or have synonyms.

Not sure how I feel about this one. I'm so used to seeing it capitalized that I feel you lose communication ability if you force it to lowercase. Having synonyms feels a bit wrong. I kind of would go with wanting these to just be uppercase. Anyone else have opinions?

I notice that Red is only case insensitive for the 26:26 unaccented characters.

>> make object! [café: "Coffee" cafÉ: "Scones" Café: "Tea"]
== make object! [
    café: "Tea"
    cafÉ: "Scones"
]

As is Rebol 2 but not Rebol 3 (or Ren-C)
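The Red result above is what you'd expect if only A-Z are folded. A Python sketch of that difference, with the assumptions that keys keep the first spelling seen and later values overwrite earlier ones. `ascii_fold` and `make_object` are made-up names, not Red or Rebol internals:

```python
def ascii_fold(word):
    """Lowercase only the 26 ASCII capitals; leave everything else alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in word
    )


def make_object(pairs, fold):
    """Build a mapping where keys equal under `fold` share one slot."""
    slots = {}                    # folded key -> [first spelling, value]
    for key, value in pairs:
        canon = fold(key)
        if canon in slots:
            slots[canon][1] = value
        else:
            slots[canon] = [key, value]
    return {spelling: value for spelling, value in slots.values()}


pairs = [("café", "Coffee"), ("cafÉ", "Scones"), ("Café", "Tea")]

print(make_object(pairs, ascii_fold))     # two keys survive
print(make_object(pairs, str.casefold))   # one key survives
```

With `ascii_fold`, `café` and `Café` collapse but `cafÉ` stays separate, matching the Red output. With full Unicode folding all three collapse into one key.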


In terms of my own case insensitivity, I use initial caps for headers (including Rebol []) as it has a formality to it but would balk at having to use said caps in accessing that information anywhere in the script system/script/header/title. The other place I use it in a mixed way is representing HTTP headers: header-proto: make object! [Content-Type: "text/html"]—there are benefits when it comes to forming headers in such a way, but again, would feel icky to access them that way in paths: header-proto/content-type

This may be a parochial opinion, but I'd be fine with the 26:26 compromise.


As a side note, was just futzing with some JS code that capitalized its camel-cased class names and did not for its derivatives.

class GreenThing {...}
greenThing = new GreenThing(...)

Whatever funkiness currently exists with associating binding with cased representation, it is not as bad as this.


It's best if you phrase your preferences in terms of a list of tests with desired output (or definitely not-desired output).

I pointed out the problem of case preservation...where if you make an object which already had an opinion of case on its fields, then if your derived object uses different cases you seem to have these options:

  1. Consider the cases equivalent, and collapse the definition to use one of the cases

  2. Consider the cases not equivalent, and end up with keys for both.

  3. Raise an error that you're trying to mix cases of the same word...forcing the deriver to canonize their names to whatever the base used

But the way things are set up, #1 can really only easily collapse the definition to what the parent used. So if you go with this option, you lose what the derived case said.

Also, the idea of making mixed cases in an object illegal won't work with case-insensitive binding, because (for instance) the user context needs to allow you to have Foo and foo word instances bound into it. Which means #3 would have to be limited to only some class forming tools...as opposed to a rule for contexts in general.

This is why right now, we have #2... and hence multiple cases of keys.
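Phrased as something testable, the three options might look like this in Python. It is a sketch only: `derive` and the policy names are hypothetical, and `str.lower()` stands in for the real case folding.

```python
def derive(parent, overrides, policy):
    """Derive a child mapping under one of three key-casing policies."""
    child = dict(parent)
    for key, value in overrides.items():
        if key in child:                     # same spelling: plain override
            child[key] = value
            continue
        match = next(
            (k for k in child if k.lower() == key.lower()), None
        )
        if match is None or policy == "distinct":
            child[key] = value               # option 2: keys for both
        elif policy == "collapse":
            child[match] = value             # option 1: parent's case wins
        elif policy == "error":
            raise ValueError(                # option 3: force one spelling
                f"`{key}` conflicts with existing key `{match}`"
            )
        else:
            raise ValueError(f"unknown policy: {policy}")
    return child


base = {"Title": "Untitled"}
spec = {"title": "my title"}

print(derive(base, spec, "collapse"))   # {'Title': 'my title'}
print(derive(base, spec, "distinct"))   # both keys kept
try:
    derive(base, spec, "error")
except ValueError as error:
    print(error)
```

Note how "collapse" has nowhere to record that the derived object said `title`, which is the loss described for #1.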

I don't buy any of these arguments.
Also, case-sensitivity ruins HELP.

One line is not a rebuttal worthy of heeding.

The most comprehensive analysis of why case insensitivity might make sense for a language (despite not being a practice in pretty much ANY language that people use today) is written by me. In that analysis I did not address the tenuous relationship between case-preservation and case-insensitivity. In this thread I do.

Getting enough bits available in a word cell to do virtual binding at any level of efficiency--without increasing the cell size--is important. There are complex mechanics which might make it possible other ways than not storing a spelling variation pointer...they'll all have some cost, but the biggest cost is just complexity.

If case-insensitivity...something no other language gets itself involved in at identifier-level, especially in the unicode era--is so mind-bendingly important, it needs a strong and completely thought-out defense. Extraordinary claims require extraordinary evidence. Not "I have some idea stuck in my head from 20 years ago that seems it might be good in the abstract, but about 2 minutes to devote to defining it now".

Not any more or less than anything else. I'd argue the impact can be much less, as there is also at hand a list of alternate spellings of the same WORD! (formerly called "synonyms")--which could be acted on to say "did you mean..."

Anything is possible, but the points need to be committed to and analyzed. That means explaining and defending a position on the case preservation of keys which I've explicitly called out twice here.

Of course, one-line rebuttals are not worthy. I always intended to expand upon it.

Here are two significant points to start with:
(1) There are powerful, well-used, and significant computer languages that in fact are case-insensitive. Firstly, Pascal. Then, in no particular order, Fortran, Ada, Basic (most of them), and SQL. Some SQLs even go so far as to treat the data itself in a case-insensitive manner!!
(2) Case-insensitivity is NOT a language design issue. It is a human utility issue. Languages (and file systems!) that are case sensitive are plagued with hard-to-debug errors caused not by the language, but by how hard humans actually find it to work within case-sensitivity constraints. In fact, I would venture to say that anybody who is comfortable working in a case-sensitive computing environment has spent YEARS bending their brain into that shape, so much so that they no longer even see it as a problem, and can construct (non-human) arguments as to how it is in fact better. In case you are wondering, I am such a person, though I am now trying to at least partially undo that error from my past.

Finally, here are two links that go into some detail (some of it not so relevant, sorry) as to why case-preserving case-insensitivity is important, including replies and rebuttals and demolishing strawman arguments. Please at least peruse them:
(1) OddThinking » The Case for Case-Preserving, Case-Insensitivity
(2) The USS Quad Damage (which is in response to the above link)
If you look carefully, you will even see a position on the case preservation of keys explained and defended, specifically, that if an object has key/value 'FooBar:7' then searching for key 'foobar' should match it and show the matching key/value pair as 'foobar:7'. I understand that this may be difficult to implement.


I've always liked case sensitivity in file systems (disclosure: I'm a long time Linux user).

Maybe I can attribute this to German being my native language, where all nouns have to be written with a capital letter, and morgen and Morgen are actually 2 different words.

Morgen = morning
morgen = tomorrow
(Though this is the only example I have).

I guess this makes case sensitivity normal for me.

And why should I write the same thing in differing casing? I wouldn't exchange letters as well.


@IngoHohmann,

The reason you could find only one example is because it's not really an example, and in fact what you are trying to say is going on does not happen in any language, for obvious reasons when you think about it.

"Morgen" and "morgen" are the exact same word with the exact same (set of) meaning(s). When used as a noun it means "morning", and when used as an adverb it means "in the morning". Just like the Spanish "mañana", or the Afrikaans "môre", or in fact the English word "morrow", though it is an archaic usage now.

Ref: dictionary - Why are "tomorrow" and "morning" the same in German? - German Language Stack Exchange

As for why you would want to change case but still mean the same thing, why would anyone ever NOT want to do that? :slight_smile:


I feel that if a programming language is case-insensitive, it becomes rather important to have a decent code editor to warn you of potential clashes.

... aNd when useD as a noUn it has TO BE writTEn with a capItAl m is all i'M saYIng.

BeCause IT doesn't maKe sEnsE.

It's important to distinguish the question of if people are given tools to make case-insensitive dialects easier to make, vs. is the whole underlying language itself case-insensitive.

When you make the language itself case-insensitive, you are saying that those who wish to use identifiers case-sensitively--to expand the space of names--cannot do so in the main language. I'd dismissed this as "not important" but the more I've thought about the harsh limit of words, the more it seems someone might need that space.

Anyway, I don't know that the cited Interweb posts make any arguments that really move the needle, and if anything seem more convincing that the language should be case-sensitive. I still think my writeup is far more compelling, if there is an argument for case-insensitivity as a best-choice to be made.

It seems that if anything, when you are using similarly-cased identifiers...you should be given an error, to standardize you on a canon form.

As a sidenote @gchiu had used uppercase to say REPLPAD-WRITE/HTML based on the argument that the HTML acronym is uppercased. (Why this spread to the REPLPAD-WRITE instead of replpad-write/HTML I don't know). But that is an instance where the desire for case-insensitivity seems plausible. But maybe it should be /HTML and force everyone to use the same canon all-uppercase spelling, instead of having everyone writing it differently. :shrug:

(But of course, as those who edit things like Apache configs would know, there's plenty of places where html and HTML are different in the computer world. So I don't think a knee-jerk "A ha! See, it's plausible! In a case" makes for a slam-dunk winning argument where all other angles must be dismissed.)

There are implementation tricks which would cost some performance and complexity to bring the case insensitivity back. But I want us to continue taking a good hard look at it, and so living in a case-sensitive Rebol-flavored world is a good way to find things, such as Graham's example, to include in a big picture explanation of why we pay that cost and what exact rules we need for it.

@rgchris's suggestion about only heeding the 26 ASCII characters in case is also something that we need to understand, e.g. if we don't use such an optimization, why we don't.

Canonizing the keys is the only strategy that seems sane to me, if multiple cases are not considered an error. Lowercase seems the only option people would accept.

 >> obj1: make object! [FooBar: 10]
 == make object! [foobar: 10]

 >> obj2: make obj1 [FOOBAR: 20]
 == make object! [foobar: 20]
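The same canonizing behavior as a minimal Python sketch. `make_canon_object` is a made-up name, and lowercasing via `str.lower()` is an assumption about what "canon form" would mean:

```python
def make_canon_object(spec, parent=None):
    """Build a mapping whose keys are always stored in lowercase."""
    obj = dict(parent or {})
    for key, value in spec:
        obj[key.lower()] = value      # the written case is thrown away
    return obj


obj1 = make_canon_object([("FooBar", 10)])
print(obj1)    # {'foobar': 10}

obj2 = make_canon_object([("FOOBAR", 20)], parent=obj1)
print(obj2)    # {'foobar': 20}
```

There is never more than one spelling to disagree about, at the price of losing case-preservation completely.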

But is this a property of WORD! and OBJECT! only, for binding? If you make a MAP! do the keys act differently? Differently for words, or for strings?

If we give strings bindings, we might say you could round-trip WORD! => TEXT! => WORD! without losing the binding. Then if MAP! threw away case for words, you might get around it with string case sensitivity.

The question of string case sensitivity is a different one from words, and needs separate consideration.

I think this is a bigger fight to pick than most people realize, and the energy it takes to do it right is non-trivial. Ren-C is a framework for implementing anything that can be articulated coherently...there's just a lot of questions about that coherence.

I can do a trick parallel to the quoting to make it possible to store up to 3 spelling variants at word reference sites before needing to do an allocation to make the word reference sites bigger. We can presume this would be rare. But I still think getting the experience here is important.

After looking at a lot of codebases that named their types in UpperCamelCase, I've decided that I rather like it. It's done in Rust:

Naming - Rust API Guidelines

R3-Alpha's sources used all-caps and incorporated "REB" into the names, perhaps as part of an attempt to avoid collisions if working with libraries:

REBBLK *blk;  // a "block" (actually an array, as the series could be used by groups)
REBVAL *val;  // values

Ren-C has been reorganized and doesn't mix these internal names with external codebases due to how well the libRebol API works for writing extension code. I also moved the asterisk to indicate the pointer as part of the type (more a C++ convention). So the internals look along the lines of:

Array* new_array;
Value* v;
KeyList* keylist;
StackValue(*) stack_value;

Whether Rebol code would be better if it went this way, I dunno.

foo: function [arg [Block Integer]] [...]
bar: make Object [x: 10 y: 20]

foo: function [arg [block! integer!]] [...]
bar: make object! [x: 10 y: 20]

Just wanted to mention that the internals have moved to UpperCamelCase for types in the C code, and it's a pretty radical improvement for readability.