Soliciting Ideas for a BITWISE Dialect

With the creation of ENBIN and DEBIN, I think I showed off how a Rebol approach of dialecting could truly make something easy that's painful in other languages (and a historical pain point in Rebol too, with how people had to wrestle with TO INTEGER! and TO BINARY! conversions).

Under the buzzword BITWISE I've had the notion that a little dialect specifically tailored to bit shifting and masking could be created. While I don't have a clue what it would look like, one thought I had was that instead of trying to come up with a name for AND and OR that's bitwise like AND+ or OR+ (or BITAND and BITOR), instead you'd be able to say something like bitwise [x and y].

But the idea is that bitwise would be so capable, that instead of grumbling about how it sucked that you had to type it in, that you would fall into a universe of fast binary functionality that would more likely than not be where you wanted to be if you were in the mode of writing bitwise math.

I know more about what I'd want this dialect to do (and do efficiently) than I know about what it would look like. Here's a list of "Magic" C snippets for doing binary operations. What bag of tricks might bitwise bring?

Should be able to do:

  • Shifting and Rotation (shift with carry of high bits around to low or vice-versa)
  • AND/OR/XOR/NAND/NOR/XNOR
  • Clipping (constrain to 31 bits came up today)
  • Most/least significant 1 bit or 0 bit (Nth most/least?)

I don't know that I'd rule out being able to do PARSE-like things at a bit level, maybe being able to scan bits from low to high and flip or operate on them as you went...injecting plain code to run as you go (also as PARSE allows?)

But it needs to be easy to escape out of. I don't know if that means that BLOCK! is used for grouping and then parentheses become plain code, or what.

I personally think bitwise [x and y] looks favorable compared with x and+ y, so I'll just start the conversation there. Anyone want to offer up some scenarios of how it might throw in power tools for the bit fiddler?

Might it always evaluate to a result which you can say how many bits you want, e.g.

bitwise/bits [x and y] 31

Make it a skippable parameter as well, to give you the alternative:

bitwise 31 [x and y]

Stuff like that. It likely would need to incorporate some of ENBIN and DEBIN's smarts to be useful.

What kind of controls might it have for overflow? Could it be done in a way that lets you do custom overflow handling if it returns NULL, but then has another value it sets which is the overflow out of the range you specified?

I wrote about my recent realizations that there just isn't the generic compatibility between INTERSECT/AND, or COMPLEMENT/NOT...that Ladislav had intuited. The bitwise operations and set operations come into contention.

Framed this way, I've concluded that you shouldn't be using INTERSECT on INTEGER!...or COMPLEMENT either. When you say BITWISE-AND and BITWISE-NOT you're talking about something different.

Simple Attempt At BITWISE Shows Dialect Quandaries

Needing somewhere to start with BITWISE, I thought about the simplest implementation I could imagine: just bind code into a context of bitwise operations and run it:

bitwise-ctx: make object! [
    and: :bitwise-and
    or: :bitwise-or
    xor: :bitwise-xor
    not: :bitwise-not
    <<: :bitwise-shift-left
    >>: bitwise-shift-right
]

bitwise: func [block [block!]] [
    do in bitwise-ctx block
]

With this, you could just say bitwise [(a xor b) and not c] and get the answer.

But something bothered me about this as a "dialect strategy". What if I had some code I wanted to escape out of the bitwise mode...the way you can just have a GROUP! of ordinary DO code in parse, where NOT meant what it usually did? It seemed any dialect based on binding like this would lose out on such mixtures. You could make an unbitwise [...] operator of a similar ilk, but how would it know what you had the bindings at to put them back? :-/ It would have to assume, and could guess wrong.

This made me think about the way TAG! is used in COMPOSE. What if you tagged parts of your code that weren't bitwise?

 bitwise <$> [(a xor b) and not (<$> if not setting [c] else [d])]

That's an idea, albeit an ugly one. Though if we were to take a cue from PARSE, then BITWISE wouldn't go through BIND at all...and not use DO...but do its own processing based on BLOCK!s for grouping, leaving the GROUP!s bound as they were:

 bitwise [[a xor b] and not (if not setting [c] else [d])]

This takes us away from the "easy" re-use of the evaluator. But it's hard to say how this example is any different from the desires of anyone who wants to attack a dialect. So it feels like there has to be a roadmap.

Bit Manipulation Is A Tricky Domain

Doing weird things like "switching from GROUP! to BLOCK! for precedence" in BITWISE is probably small potatoes compared to the big thoughts needed to make a "good" dialect for the purpose.

For instance: some recent experiences with C integer promotions reminds me that it's really risky to have a dialect that just guesses at things regarding the size. Consider this simple case:

uint16_t some_16bit = ...;
uint16_t another_16bit = ...;
uint32_t result_32bit = (some_16bit << 16) + another_16bit;

Here you are left shifting a 16-bit integer by 16 bits to take up the high bits of a 32-bit number, and adding it to another 16-bit integer for the low-bits. Seems about as straightforward as bitwise math gets, right?

Wrong. In C, when you left shift a type smaller than an integer that can fit in a signed integer, it becomes a signed integer. So it acts like if you had written:

uint32_t result_32bit = ((int)some_16bit << 16) + another_16bit;

If your some_16bit number had a 1 in the high bit, that 16 bit left shift could be shifting the 1 into the highest 32nd bit...used for the signed-int's sign. That is undefined behavior in C. So you have to explicitly cast it to an unsigned integer:

uint32_t result_32bit = ((uint32_t)some_16bit << 16) + another_16bit;

Not only is it a tricky domain, but when your implementation is written in C...you have to be aware of the pitfalls when exposing that functionality through an interpreter.

Examples Needed

I'm a bit unsettled that I can't even think of what a basic BITWISE dialect notation should be. If my example of bitwise [(a xor b) and not c] can't even be trusted as the notation to be implemented, what can? Can we trust:

 axb: bitwise [a xor b]
 nc: bitwise [not c]
 axbanc: bitwise [axb and nc]

...or is even that up for debate?

Stepping out of the domain of DO, there are possibilities of using TUPLE! and PATH! in creative ways.
To throw out one example...what if number-led decorators instruct the dialect to do casting?

 bitwise 32.[8.a xor 16.b]  ; 32 bit result, cast a to 8-bit, b to 16-bit?

The most intimidating thing about tackling such a dialect right now is the idea of trying to design something for a use-case we're not really ready for. It's a completely new thing with no customers. If you're trying to put something in the box, it would be better to have a set of prior attempts to look at what was good and bad about each...and pick the best ideas. Right now all we have is DO plus some infix operators.

So I'll kick this particular can down the street a bit, and move the INTEGER!/BINARY! behavior for bitwise operations to join AND+ OR+ and XOR+.

  • COMPLEMENT on INTEGER! and BINARY! will become NOT+
  • EXCLUDE on INTEGER! and BINARY! will be AND-NOT+

This will allow the set intersect/union/etc. operations on BINARY! to become series-based on ordered byte-collections again...removing that ambiguity.

In all of this, I've realized also that we need a form of SHOVE that works left, for turning enfix invocations into prefix ones:

 and -< arg1 arg2

This helps mitigate the need to come up with a prefix name for everything. It's a little bit ugly, but at least it's general. R3-Alpha used the name and~ for prefix AND...which had to be declared somewhere and wasn't a general method for making enfix prefix.

I was looking for arbitrary length integers and came across this page where the bitwise operators are called andb orb and notb.

As bitwise operators are less commonly used in general programming as far as I can tell, this would be acceptable for me too.
And when a lot of bitwise operations are needed, then it would be an option to present the bitwise dialect as an alternative.

I think is the easiest way to go and it adds only one letter b instead of the bitand or bitwise-and alternatives.