Should Hexadecimal Be Lowercase?

hostilefork · October 7, 2019, 12:29pm

I had to reverse-engineer the changes in Visual Studio's XML project files from VS2017 to VS2019. I did this by having it run the upgrade process on importing the old project, and then looking at the diffs.

One unusual change is that Rebol used lowercase letters for the hex digits in UUIDs...and for some reason VS2019 decided that when it wrote them out it would make them uppercase. This made me wonder if there was any kind of standard. I found a StackOverflow answer saying that there is at least one international standard suggesting the canonized output form should be lowercase. And Microsoft is one of the riff-raff who tend to break it.

But it further suggests that for human readability, lowercase scans better for hexadecimal pretty much anywhere:

"6.5.4 Software generating the hexadecimal representation of a UUID shall not use upper case letters. NOTE – It is recommended that the hexadecimal representation used in all human-readable formats be restricted to lower-case letters. Software processing this representation is, however, required to accept both upper and lower case letters as specified in 6.5.2."
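
As a rough illustration of that rule (accept either case on input, emit only lowercase on output), here is a minimal Python sketch. The helper name `canonical_uuid` is made up for this example, and Python's standard `uuid` module is only a convenient stand-in; it says nothing about how Rebol or Visual Studio actually handle UUIDs.

```python
import uuid

def canonical_uuid(text: str) -> str:
    """Parse a UUID written in either case and return its lowercase form."""
    # uuid.UUID accepts upper- and lowercase hex digits (and optional braces);
    # str() on the resulting object always produces lowercase hex.
    return str(uuid.UUID(text))

# Hypothetical sample values: the same UUID in brace-wrapped uppercase
# and in plain lowercase.
upper = "{6F9619FF-8B86-D011-B42D-00C04FC964FF}"
lower = "6f9619ff-8b86-d011-b42d-00c04fc964ff"

print(canonical_uuid(upper))
print(canonical_uuid(upper) == canonical_uuid(lower))
```

Reading is permissive and writing is strict, so a tool built this way would not produce case-only diffs like the ones described above.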

History clearly has a bearing on the preference for uppercase hexadecimal, considering that early programming languages only supported uppercase. Yet modern hex values like git commit IDs are pretty much always written out in lowercase. There really seems to be a notion out there in the idea-o-sphere that lowercase hexadecimal is easier to scan and read. People argue this for colors in CSS, too.

One of the big sticking points in trying to scan hex optically with uppercase is that B and 8 look kind of similar, as do D and 0 (unless your font puts a slash through the zeros, which was a conscious choice I made in the font used by the Web console...as well as distinguishing 1 and l and I.) As a sample comparison, consider C68B80D with c68b80d.

Beyond just the readability aspect, there's also the typing aspect. Rebol code generally favors lowercase...and when you're doing data entry you don't have to hit the shift key.

Despite these perceived advantages of lowercase hex in the modern world, Rebol currently canonizes binaries as uppercase:

>> #{1a2b3c4d5e6f}
== #{1A2B3C4D5E6F}

The argument against biasing to lowercase would probably be "it's a black box, you're not supposed to look at it"...so uppercase has the advantage of making BINARY! look different, as if to say "that's for the computer to read, not me". Nice though that might sound in theory, I pretty often do have to look at the actual bytes in a BINARY!. People who favor the school of thought that binary data should be de-emphasized might get that from editor features like syntax highlighting, while the format itself aims for the most concrete legibility it can get.

Thoughts?

iArnold · October 7, 2019, 1:35pm

I always use lowercase binary.
and I replace all 0 with 2 because of the huge gap between 0 and 1 on my keyboard.

(Or I don't really care about upper and lower in this case)

Mark-hi · October 7, 2019, 1:57pm

rgchris · October 8, 2019, 6:48am

I think it should err on the side of human readability. Rebol as a format is intended to be human readable, and hex is largely a human readable/perceptible representation of binary (not saying there aren't cognitive limits on that, but it can be useful to learn how to interpret certain sequences/values in that notation).

As an aside, I use lowercase on my CSS hex colour values.

iArnold · October 10, 2019, 7:40pm

Yes, both forms occur and are acceptable. Lowercase is better in CSS colors; caps suit a HeX editor, when looking at the HeXed version of the real binary form. MAC addresses are mostly printed as lowercase hex codes.
Which looks better?
a2eb01cafedecafbad
A2EB01CAFEDECAFBAD
Or should it be grouped by 2?
a2 eb 01 ca fe de ca fb ad
A2 EB 01 CA FE DE CA FB AD
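
For anyone who wants to compare these forms on their own data, here is a small Python sketch that prints all four variants from one input. The function name `show_forms` is invented for this example, and it assumes Python 3.8 or later, where `bytes.hex()` accepts a separator argument.

```python
def show_forms(hex_text: str) -> dict:
    """Return lowercase/uppercase and plain/grouped renderings of a hex string."""
    # bytes.fromhex accepts both letter cases and skips ASCII whitespace.
    data = bytes.fromhex(hex_text)
    return {
        "lower": data.hex(),
        "upper": data.hex().upper(),
        "lower_grouped": data.hex(" "),        # one space between bytes
        "upper_grouped": data.hex(" ").upper(),
    }

for name, text in show_forms("A2EB01CAFEDECAFBAD").items():
    print(f"{name:>14}: {text}")
```

Since the parser takes either case, the choice only matters on output, which is where the readability question lives.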