Hex-Valued Integer Literals: Likely Not In Ren-C

hostilefork · May 14, 2022, 4:35am

On an old Trello there was a card about standardizing the differences between R3-Alpha and Red...and a checklist with only one item:

Hex-valued literal notation (Rebol has none, Red used to use FFh, FFFFh, FFFFFFFFh), now using 0#FF

The motivation was for purposes of Red/System, mostly.

In Ren-C this doesn't seem like a priority. It has ISSUE! (TOKEN!) as a read-only data type that fits in a cell. Hence a systems-oriented dialect already has an efficient way to represent these values.

For instance: it's not a big deal if your assembler says [mov ax, #FE] in its source... if it's generating machine code.

Of course, a slot with an ISSUE! in it isn't the same, in a metaprogramming sense, as a slot with an INTEGER! in it. So a dialect that supports INTEGER! in a given slot doesn't automatically accept hex notation there. But isn't that what COMPOSE is for...?

my-dialect [something-or-another 255]

my-dialect compose [whatever (debin [BE +] #FF)]
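The conversion being spliced in there is just a big-endian, unsigned decode of the hex digits. As an illustration outside Rebol, here is a minimal Python sketch of that step. The helper name `hex_token_to_int` is hypothetical, and it assumes the token arrives as text with a leading `#`:

```python
def hex_token_to_int(token: str) -> int:
    """Decode a hex token such as "#FF" as a big-endian unsigned integer.

    Hypothetical helper for illustration; not part of any Rebol or Red API.
    """
    digits = token.lstrip("#")
    if len(digits) % 2:
        # bytes.fromhex needs whole bytes, so pad an odd digit count on the left
        digits = "0" + digits
    return int.from_bytes(bytes.fromhex(digits), byteorder="big", signed=False)


if __name__ == "__main__":
    print(hex_token_to_int("#FF"))
    print(hex_token_to_int("#F0") + hex_token_to_int("#0F"))
```

The dialect only ever sees the resulting integer, which is the same information loss as in the Red examples, except that here the conversion is asked for explicitly.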

Having more than one representation for the same type is generally bad, anyway. Let's look at what Red does here:

red>> FFh
== 255

red>> F0h + 0Fh
== 255

If it was so important that it had to be encoded in source, why is it thrown away immediately?

It's something about Red worth knowing exists, but off the radar for implementing, methinks.

hostilefork · July 3, 2022, 12:42am

On this point, there's an option in Rebol to natively format binaries as base64 or base16 or base2:

>> system/options/binary-base: 64
== 64

>> #{DECAFBAD}
== 64#{3sr7rQ==}

>> system/options/binary-base: 2  
== 2

>> #{DECAFBAD}                   
== 2#{11011110110010101111101110101101}

This also requires making the scanner be able to LOAD such binaries. Of course, those scan rules have to be able to recognize invalid ones:

>> 2#{1101}
** Syntax error: invalid "binary" -- "2#{1101}"
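To make the scanner's extra burden concrete, here is a rough Python sketch of the check a base-2 loader needs. It assumes the literal above is rejected because four bits do not fill a byte; the function name `load_base2` and the decision to ignore whitespace are my own assumptions, not taken from any Rebol source:

```python
def load_base2(body: str) -> bytes:
    """Decode the text between the braces of a 2#{...} literal.

    Hypothetical sketch: assumes whitespace is ignored and that the
    bit count must be a multiple of 8.
    """
    bits = "".join(body.split())
    if any(ch not in "01" for ch in bits):
        raise ValueError("invalid binary: non-bit character in " + repr(body))
    if len(bits) % 8 != 0:
        raise ValueError("invalid binary: partial byte in " + repr(body))
    # Consume the bit string eight characters (one byte) at a time
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    print(load_base2("11010110").hex().upper())
```

Every base the scanner accepts needs its own version of these validity rules.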

Having the system.options.binary-base setting seems random and bad.

Who unilaterally would want all binaries they see or form in the system to be output as Base64 or Base2? If you want those notations you are likely in a special context to ask for them.
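Asking for the notation at the point of use could look something like the following Python sketch, where the base is an explicit argument instead of a global option. The function name `mold_binary` is hypothetical, and the output strings simply mimic the console transcripts above:

```python
import base64


def mold_binary(data: bytes, base: int = 16) -> str:
    """Render bytes in Rebol-like binary notation for an explicitly chosen base.

    Hypothetical helper for illustration only.
    """
    if base == 16:
        return "#{" + data.hex().upper() + "}"
    if base == 64:
        return "64#{" + base64.b64encode(data).decode("ascii") + "}"
    if base == 2:
        return "2#{" + "".join(format(byte, "08b") for byte in data) + "}"
    raise ValueError("unsupported base: " + str(base))


if __name__ == "__main__":
    payload = bytes.fromhex("DECAFBAD")
    for base in (16, 64, 2):
        print(mold_binary(payload, base))
```

Nothing outside the call site is affected, so unrelated binaries keep printing the way they always did.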

Making the LOADer concern itself with it then seems weird, too.

The same basic arguments I make elsewhere apply here. By default:

>> 2#{11010110}
== #{D6}

>> append #{DECAFBAD} 2#{11010110}
== #{DECAFBADD6}

"If it was so important that it had to be encoded in source, why is it thrown away immediately?"

I think this is best done by letting people who want this convert it themselves.

If you want to design a format to say how to decode it, that's what dialecting is for. Imagine that we are in a world where ${...} is used for BINARY! and #{...} is a TOKEN!, i.e. a read-only TEXT! (with small series optimizations making character and short string storage very efficient):

my-data: [
    <hex> #{DECAFBAD}
    <binary> #{1100}  ; don't even have to enforce byte boundaries
    <octal> #{01234567}
    <base64> #{3sr7rQ==}
]

You're a step away from having those as BINARY!, but you need routines to turn text encodings into BINARY! anyway.
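As a sketch of what such routines might look like, here is a small Python version that walks tag/text pairs shaped like the block above. All names (`bits_to_bytes`, `DECODERS`, `decode_tagged`) are hypothetical. One assumption worth calling out: binary and octal text that does not fill whole bytes is treated as a big-endian number and padded with zero bits on the left, which is only one of several reasonable choices:

```python
import base64


def bits_to_bytes(digits: str, radix: int) -> bytes:
    """Decode base-2 or base-8 digit text into big-endian bytes.

    Assumption: partial bytes are padded with zero bits on the left.
    """
    bits_per_digit = {2: 1, 8: 3}[radix]
    value = int(digits, radix)
    nbytes = max(1, (len(digits) * bits_per_digit + 7) // 8)
    return value.to_bytes(nbytes, "big")


# Hypothetical tag names mirroring the <hex>, <binary>, <octal>, <base64> tags
DECODERS = {
    "hex": bytes.fromhex,
    "binary": lambda text: bits_to_bytes(text, 2),
    "octal": lambda text: bits_to_bytes(text, 8),
    "base64": base64.b64decode,
}


def decode_tagged(pairs):
    """Turn (tag, text) pairs into a list of bytes objects."""
    return [DECODERS[tag](text) for tag, text in pairs]


if __name__ == "__main__":
    my_data = [
        ("hex", "DECAFBAD"),
        ("binary", "1100"),
        ("octal", "01234567"),
        ("base64", "3sr7rQ=="),
    ]
    for chunk in decode_tagged(my_data):
        print(chunk.hex().upper())
```

The scanner never has to know about any of these encodings; the text stays text until the dialect decides what it means.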

One Feature Exception

The one place this makes a noticeable difference is when someone has a big base64 payload blob they want to embed in their script. They end up paying for a large in-memory TOKEN!/TEXT! they don't want...when what they really want is only its converted form.

I'm not terribly concerned about this cost. I'm more concerned that you can do it at all. If you can, then I'd have to hear the masses clamoring that faster-on-load is their top priority before I'd worry about it.

But... you could imagine this being abstracted more powerfully with some kind of "single-file-encap" tool that lets you throw attachments onto the end of a script, so you could say system.script.attachments.name and it would have already done all the unzipping and decompression for you. That would be cleaner and could be made faster.

To sum up: Base64 is certainly important functionality to encode and decode, but I'd rather see us focus on making it easy and pleasant to use that decoder on strings than complicate the scanner and console with these settings.