Ugly Types: Less Ugly Than History, Can We Do Better?

bradrn · February 21, 2024, 12:59pm

(Apologies, this post has ended up somewhat long and rambly. TL;DR: we should think much more carefully about how useful TYPE OF really is in practice.)

In trying to sort out my thoughts on this topic, I’ve come to think that the key question we should be asking is: precisely what do we want to use types for?

Starting with the most basic things, one very important use is within the interpreter itself. This is the HEART_BYTE, which (as I understand it) defines how to interpret the bytes making up a Rebol value. (Previously @hostilefork has called this the ‘kind’.) Obviously, this is vital to making the interpreter work. It’s also fairly limited, albeit less now than it used to be.

A second usecase is if you have some arbitrary value and want to find out what you can do with it — what Rebol calls TYPE OF. In most dynamically-typed programming languages, including historical Rebol, this gives you back the internal interpreter type… but there’s no reason it couldn’t yield something more generic, as we’ve been discussing.

A third usecase is if you want to match a value against some criterion. Rebol highlights this quite prominently: in average Rebol code, types are used most frequently to establish preconditions for function arguments. They’re used similarly in PARSE, amongst other places.

(In other languages, the most prominent use of types is to enable static analysis during compilation. This is an extremely useful capability, and in many modern languages, the type system is explicitly designed to make it tractable to check as many properties as possible before the program is run. But Ren-C isn’t compiled, and Rebol more broadly isn’t hugely amenable to static analysis anyway, so this isn’t a concern for us at all.)

Most dynamically-typed languages cover all three of these usecases with a single notion. Each value is stored alongside some type descriptor, which is returned when the programmer asks for typeof(value) (or whatever it might be). Then, you can check that against another type using an ordinary if expression, the same as checking any other condition.
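To make that 'single notion' concrete, here is a minimal sketch in Python (chosen only as a familiar dynamically-typed language, not as a model of Rebol). The function name `describe` and the labels it returns are made up for illustration.

```python
def describe(value):
    # type(value) plays the role of typeof(value): it hands back the
    # runtime's own type descriptor for the value.
    kind = type(value)

    # The descriptor is then compared like any other value, inside an
    # ordinary `if`.
    if kind is int:
        return "integer"
    if kind is str:
        return "text"
    return "something else"
```

The same descriptor serves the runtime, the 'what is this?' query and the matching, which is exactly the overlap described above.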

Historical Rebol took much the same approach. It has a fairly unorthodox implementation of supertyping (using typesets), but otherwise, there’s one notion of ‘type’ which covers all usecases. The main wrinkle is that dialects can use special syntax for matching against types, most notably in function parameters.

Ren-C has already diverged from this approach, by recognising that ‘things you can match against’ is a broader category than ‘things the interpreter needs to know about’. Thus, it’s gradually extended the language to accept functions (a.k.a. ‘type constraints’) in places where it previously only accepted types. We’re now at a point where all type-like things, aside from the primitive ‘kinds’, are consistently represented as functions. And I think we’ve agreed that this is a good idea. By separating ‘types the interpreter knows about’ from ‘types we match against’, we free up the interpreter to support a lot more basic types, while giving function definitions a greater ability to express arbitrary preconditions.
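As a rough analogy for 'type constraints as functions', the sketch below (again Python, not Ren-C) treats every constraint as a plain predicate. All names here (`is_even`, `is_text`, `accepts`, `halve`) are hypothetical and exist only for this example.

```python
def is_even(value):
    # A constraint with no corresponding primitive type: the runtime
    # knows about integers, but "even integer" is only a predicate.
    return isinstance(value, int) and not isinstance(value, bool) and value % 2 == 0


def is_text(value):
    # A constraint that happens to coincide with a primitive type.
    return isinstance(value, str)


def accepts(constraints, value):
    """Return True if the value satisfies at least one constraint."""
    return any(check(value) for check in constraints)


def halve(n):
    # The precondition is expressed as a list of predicates rather than
    # as a list of primitive types.
    if not accepts([is_even], n):
        raise TypeError("halve expects an even integer")
    return n // 2
```

The point of the sketch is that `accepts` never needs to know which predicates correspond to something the runtime treats as primitive.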

But of course, that doesn’t cover all the places types pop up in Rebol. They also appear as the return value of TYPE OF… which, I think, is where our disagreement lies. I’ve been leaning towards unifying it with that idea of ‘types we match against’, meaning that users only have to deal with a single notion of ‘type constraint’. On the other hand, you want to make it a more structured system, focussed around those primitive types known by the interpreter.

However, thinking along these lines has led me to pose a slightly different question: how does TYPE OF get used in practice? I think the answer to this question should significantly influence what we choose it to return.

As a first step towards answering it, I did a quick search of the Ren-C source code. As far as I can see, it’s not used very often. Indeed, I can only find three occurrences:

Two in UPARSE, where it’s used in the type-block! combinator to test a value against a type in an if expression. In my opinion, this should really be replaced with MATCH, allowing it to deal with filter actions as well.
One in test/datatypes/varargs.test.reb, where it’s again used to match a value against a type, albeit in a significantly more convoluted way which I don’t understand.
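The difference between the two styles of check can be sketched as follows. This is Python rather than Ren-C, and `matches` is a hypothetical helper that only loosely imitates the idea of accepting either a type or a filter function; it makes no claim about how MATCH is actually implemented.

```python
def is_positive(value):
    # Hypothetical filter function standing in for a 'filter action'.
    return isinstance(value, (int, float)) and value > 0


def matches(spec, value):
    """Check a value against either a primitive type or a predicate."""
    if isinstance(spec, type):
        # Primitive type: equivalent to comparing the result of a
        # typeof-style query in an `if`.
        return type(value) is spec
    # Anything else is assumed to be a predicate function.
    return bool(spec(value))
```

Usage: `matches(int, 4)` and `matches(is_positive, 4)` go through the same call, whereas a comparison such as `type(value) is int` has no way to express the second case.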

(For comparison, when I search for MATCH, I count >40 occurrences in the mezzanine alone.)

At least to me, this suggests that TYPE OF is of significantly limited use in actual code. I take this as a sign that we shouldn’t waste our time thinking up elaborate schemes to encode information in its return value… rather, we should just make it as simple as possible.

Along those lines, maybe TYPE-BLOCK! isn’t such a good choice for its return type after all, and it should be returning a single TYPE-WORD!. On the other hand, that doesn’t work so well with my conviction that we should only have one variety of TYPE-*. I feel sure that there’s some better design waiting to be discovered for this.