The TYPESET! Representation Problem

hostilefork · September 27, 2022, 2:13am

To recap: It's a real sticking point that Rebol's historical typeset was tied to the idea of a maximum of 64 types, with membership checked by testing one bit per type in a bitset. That is a very limited concept. Every Rebol clone has copied this model in lockstep, despite sometimes making overtures toward supporting extension types. (I've pointed out that R3-Alpha's UTYPE! was complete vaporware in every aspect.)
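For anyone who hasn't looked at how the bitset model works, here is a minimal sketch in Python (not Ren-C, and not the actual interpreter code). The type names, their bit numbering, and the helper names are all made up for illustration; the only thing taken from the above is "64 types, one bit each".

```python
# Hypothetical model of a 64-bit typeset; names and numbering are illustrative.
MAX_TYPES = 64

TYPE_IDS = {"integer!": 0, "block!": 1, "tag!": 2}  # each type owns one bit


def make_typeset(*names):
    """Pack the named types into a single integer used as a bitmask."""
    bits = 0
    for name in names:
        bits |= 1 << TYPE_IDS[name]
    return bits


def typeset_has(bits, name):
    """Membership is a single bit test, which is why the model is cheap."""
    return bool(bits & (1 << TYPE_IDS[name]))


def register_type(name):
    """A new type consumes a bit; once all 64 are taken there is no room."""
    if len(TYPE_IDS) >= MAX_TYPES:
        raise OverflowError("no bits left for " + name)
    TYPE_IDS[name] = len(TYPE_IDS)


number_or_block = make_typeset("integer!", "block!")
```

The check is about as fast as a check can be, but the ceiling is baked into the representation: the 65th `register_type` call has nowhere to put its bit.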

There are some fairly different characteristics in play. For example, when you have ANY-VALUE! and you load another extension datatype, the set of types it matches should grow.

This is where I've made the suggestion that ANY-VALUE! might need to just be a "typecheck-flavored function". So some form of function where you know the implicit meaning is to use it to match types.

Hand-waving what that sort of thing might look like:

any-even!: &(even?)

parse [2 4 6] [some any-even!]

Implicit here is that we presume that just referencing a function without that twist...e.g. parse [2 4 6] [some even?] would get you something different (currently, an error).
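To make the hand-waving a little more concrete, here is a rough Python model of the idea (again, not Ren-C). `TypeCheck` stands in for the hypothetical `&(...)` form, `match_some` is a heavily simplified stand-in for `some` in PARSE that requires the whole input to match, and `KNOWN_TYPES` is an invented registry used to show ANY-VALUE! growing when an extension loads.

```python
# Illustrative model only: TypeCheck plays the role of the hypothetical &(...).
class TypeCheck:
    """Marks a predicate as being meant for type matching."""

    def __init__(self, predicate):
        self.predicate = predicate

    def matches(self, value):
        return bool(self.predicate(value))


def is_even(value):
    return isinstance(value, int) and value % 2 == 0


def match_some(items, rule):
    """Simplified `some rule`: one or more items, every one matching."""
    if not isinstance(rule, TypeCheck):
        # A bare function is not a type rule, mirroring the error case
        raise TypeError("rule must be a TypeCheck, not a plain function")
    return len(items) > 0 and all(rule.matches(item) for item in items)


any_even = TypeCheck(is_even)  # roughly: any-even!: &(even?)

# Assumed registry that grows when an extension datatype is loaded
KNOWN_TYPES = [int, str]
any_value = TypeCheck(lambda v: isinstance(v, tuple(KNOWN_TYPES)))
```

Because `any_value` consults the registry at check time instead of holding a fixed mask, appending a new type to `KNOWN_TYPES` widens what it matches without redefining it.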

Seems Cool, But What's The Catch?

A problem with this direction comes down to what happens if you're allowed to change what the words you capture look up to. I cover this in the Survey of Redefining Datatype Words.

To sum up: Native code fiddles cell bits under the expectation that the type checking has been done correctly. If the typeset is stored as [integer!] and you can change what INTEGER! is defined as, the guarantees the native expected are broken.

We can harden things by reducing them. So instead of the native storing [integer!] it stores [&[integer]], which retains some amount of readability. But it's not what HELP wants to show, so it either has to reverse-engineer INTEGER! back or keep the source block separately.

If "typesets" are actually "typechecker-flavored functions" it's worse, because they become ugly action literals. Though there's a modern feature that can help with this being not quite as bad as it was: ACTION! symbol caching:

>> any-even!: reduce &(:even?)
== &(#[action! {even?} [value]])

So the EVEN? symbol can still be in there. But again this isn't what HELP wants...it presumably wants to show ANY-EVEN!.

(I'll point out the obvious: when using a typecheck function, it has to remove from consideration anything that isn't a candidate in the component function's own type checking. So if EVEN? only takes INTEGER!, it wouldn't try to pass non-integers to EVEN?)
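A small Python sketch of that filtering step, under the assumption that the checker can see the component function's parameter types. The `accepts` attribute is invented metadata standing in for EVEN?'s own type spec.

```python
# Hypothetical sketch: `accepts` plays the role of EVEN?'s parameter type spec.
def even(value):
    return value % 2 == 0  # would raise TypeError if handed a str like "abc"


even.accepts = (int,)  # assumed metadata: EVEN? only takes integers


def typecheck_with(func, value):
    """Only call func on values its own parameter spec would allow."""
    if not isinstance(value, func.accepts):
        return False  # not a candidate, so it is never passed to func
    return bool(func(value))
```

With this, checking a string against the even-based typecheck is simply a non-match instead of an error raised from inside EVEN?.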

What About Just Locking The WORD!s Used In Natives?

I've written this up in the survey post as a possibility:

>> test!: integer!

>> foo: func [x [test!]] []

>> test!: tag!
** Error: Cannot modify TEST! word, locked for use in a type spec

But I pointed out that it's trickier than it sounds:

Quoting from Survey of Redefining Datatype WORD!s:

Though it would have to be a "semantically deep lock". For instance, if it were legal to say:
 my-types!: [integer! tag!]
 foo: func [x [my-types!]] []
...if the way types were interpreted was such that it would pull in types that were grouped in a block like that, then it would have to reach through and lock INTEGER! and TAG! too. This same "deep lock" notion would need to apply to any functions that might be used.
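Here is one way the "semantically deep lock" could be modeled, sketched in Python with a dict standing in for a context and strings standing in for WORD!s. The `Context` class and its method names are hypothetical; only the error message is borrowed from the example above.

```python
# Illustrative only: a dict stands in for a context, strings for WORD!s.
class Context:
    def __init__(self):
        self.values = {}
        self.locked = set()

    def set(self, word, value):
        if word in self.locked:
            raise PermissionError(
                f"Cannot modify {word.upper()} word, "
                "locked for use in a type spec"
            )
        self.values[word] = value

    def deep_lock(self, word):
        """Lock word, then follow any grouping of words it refers to."""
        if word in self.locked:
            return  # already handled; this also stops on cycles
        self.locked.add(word)
        value = self.values.get(word)
        if isinstance(value, list):  # a grouping like [integer! tag!]
            for inner in value:
                self.deep_lock(inner)


ctx = Context()
ctx.set("integer!", "<integer datatype>")
ctx.set("tag!", "<tag datatype>")
ctx.set("my-types!", ["integer!", "tag!"])
ctx.deep_lock("my-types!")  # what defining foo with [my-types!] would trigger
```

After the lock, reassigning MY-TYPES!, INTEGER!, or TAG! in this context fails, while unrelated words remain assignable.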

What About Recursive Typechecks?

People would be able to write something like:

 types-one!: &[integer! types-two!]
 types-two!: &[block! types-one!]

This is just a case you'd have to catch. The typechecker would have to color the array nodes as it went.
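A minimal Python sketch of the coloring idea, using the two mutually referential definitions above. The `in_progress` set plays the role of the color on the array nodes. Treating a re-entered typeset as "no match along this path" is my assumption here; raising an error at that point would be an equally valid policy.

```python
# Illustrative model: DATATYPES are leaves, TYPESETS may refer to each other.
DATATYPES = {"integer!": int, "block!": list}
TYPESETS = {
    "types-one!": ["integer!", "types-two!"],
    "types-two!": ["block!", "types-one!"],
}


def matches(name, value, in_progress=None):
    """Check value against a datatype or typeset, tolerating cycles."""
    in_progress = set() if in_progress is None else in_progress
    if name in DATATYPES:
        return isinstance(value, DATATYPES[name])
    if name in in_progress:
        return False  # already being visited: re-entering adds nothing
    in_progress.add(name)  # "color" the node
    try:
        return any(matches(m, value, in_progress) for m in TYPESETS[name])
    finally:
        in_progress.discard(name)  # "uncolor" on the way out
```

A value that matches nothing now terminates with a plain failure instead of recursing forever between the two definitions.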

(Saying this disproves the idea would be like saying you shouldn't make a spreadsheet with formulas because there can be cycles.)

Time Has Passed And I Haven't Had Any Better Ideas

Saving a typeset's definition as a list of words offers the desirable property that what's in a typeset is actually useful for HELP, and can be viewed meaningfully as "source".

If we argue that any word used this way has its meaning become locked, that doesn't mean the word can't have its meaning redefined in another context. LIB can define INTEGER! one way, but you can have your module define it another. You can call local variables INTEGER!. etc.

Locking permits optimizations. If a typeset says [integer! block!] and it can trust that the meanings of those words have been locked in the context, it could make some cache where it didn't have to look up the words.
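As a sketch of what that cache might look like, here is a Python model where the typeset keeps its words as "source" and memoizes the lookups only when every word is locked. The `Typeset` class, the `lookups` counter, and the way locking is represented as a set are all assumptions for illustration.

```python
# Hypothetical sketch: the words stay around for HELP, the cache is an extra.
class Typeset:
    def __init__(self, words, context, locked):
        self.words = words      # e.g. ["integer!", "block!"], kept as source
        self.context = context  # maps words to what they currently mean
        self.locked = locked    # set of words whose meaning cannot change
        self.cache = None
        self.lookups = 0        # counts word lookups, to show the saving

    def resolve(self):
        if self.cache is not None:
            return self.cache
        self.lookups += len(self.words)
        resolved = tuple(self.context[w] for w in self.words)
        if all(w in self.locked for w in self.words):
            self.cache = resolved  # safe only because meanings are frozen
        return resolved

    def check(self, value):
        return isinstance(value, self.resolve())
```

Without the lock, every check has to look the words up again, since any of them might have been redefined since the last call.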

I've been circumspect about making this jump. But short of committing to 64 types and a bitset forever, I just don't see another solution. Guess I'll give it a try.