The TYPESET! Representation Problem

TYPESET! took advantage of 64-bit integers--and a self-imposed limit of 64 fundamental types--to fit one bit flag per type into a single value cell representing a typeset. Besides the obvious lack of extensibility, this has a number of problems. One is that typesets render in a pretty ugly way... a seemingly simple concept like ANY-TYPE! expands out fairly monstrously:

r3-alpha>> print mold any-type!
make typeset! [unset! none! logic! integer! decimal! percent! money! char!
pair! tuple! time! date! binary! string! file! email! url! tag! bitset! image!
vector! block! paren! path! set-path! get-path! lit-path! map! datatype!
typeset! word! set-word! get-word! lit-word! refinement! issue! native!
action! rebcode! command! op! closure! function! frame! object! module!
error! task! port! gob! event! handle! struct! library! utype!]

red>> print mold any-type!
make typeset! [datatype! unset! none! logic! block! paren! string! file!
url! char! integer! float! word! set-word! lit-word! get-word! refinement!
issue! native! action! op! function! path! lit-path! set-path! get-path!
routine! bitset! object! typeset! error! vector! hash! pair! percent! tuple!
map! binary! time! tag! email! handle! date! port! image! event!]

People are used to typing in variables--like arrays or objects--and seeing that what they look up to is a large amount of data. But a typeclass like ANY-TYPE! doesn't typically have this "explosive" character... and the expansion impedes readability.

If you look closely you'll see another problem sneaking in, hinted at by R3-Alpha's never-implemented UTYPE! (for user-defined type). Had it been implemented, this would suggest that all user-defined types were considered equivalent in a TYPESET!. If you took one user-defined type as a parameter, you would have to take them all, and do the filtering after the fact.

(In fact, extension types in Ren-C--which do exist--have this very problem. If you pass a GOB! to a native routine that expects a VECTOR! or a STRUCT!, it will currently crash. This hasn't really come up yet because few are working with those types. But it's something that a real answer to typesets would have to address, part of why I'm mentioning all this.)

This doesn't even touch upon the idea of "type-classes"...

e.g. if you decide to make a base object with something like book!: make object! [...] and later make book! [...], this "book!" is the kind of thing you might consider in some languages to be a class. You might want to write a routine like library-checkout: function [b [book!]] [...]. But there is no facility for this.
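To make the wish concrete, here's a hedged sketch of the kind of code one would *want* to write (BOOK! here is hypothetical--it's just an OBJECT! instance, not a datatype, which is exactly why the last part doesn't work today):

    ; An object serving as a "class", derived from via MAKE
    book!: make object! [
        title: ""
        checked-out: false
    ]

    moby-dick: make book! [title: "Moby-Dick"]

    ; The desired routine--but BOOK! is not a datatype, so there is
    ; no facility for using it in a type spec like this:
    library-checkout: function [b [book!]] [
        b/checked-out: true
    ]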

But... "Derived binding" in Ren-C set up some groundwork for understanding derivation. This was done for efficiency: knowing the relationships makes it possible to forward references to base class members downward to the instance. That avoids needing to deep-copy every member function of an object each time a new instance is made... just so those functions can refer to the variables in the derivations. Yet the relationship it has to encode to accomplish this could also be used as a type test, to see if something came from a given type hierarchy.

So the mechanics are there...and it seems it would be cool to implement. But again, that depends on a notion of what a "typeset" actually is, which is the limiting factor.

And what about "type-tests..."?

Still another question comes along for tests that are basically functions. How about even-integer!, or block-2! where that's a block containing two elements? These seem very useful, though potentially dangerous if the function has side effects... leading one to wonder if there should be a PURE annotation for functions: promising no side effects, taking all parameters as CONST, using no non-PURE functions in their implementations, and not looking at any variables that aren't parameters unless those variables have been permanently LOCK'd.
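As a sketch of what such tests might look like as ordinary functions (merely promised-pure for now--the names and exact idioms here are hypothetical):

    ; Hypothetical predicates that could back EVEN-INTEGER! and BLOCK-2!
    even-integer?: func [v] [
        did all [integer? v even? v]  ; only applies EVEN? to integers
    ]

    block-2?: func [v] [
        did all [block? v 2 = length of v]
    ]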

I actually think maybe "type test" is the fundamental thing to be looking at, instead of some nebulous TYPESET! construct. If type tests can be held onto by name, and that name looks up a PURE function (approximated as a regular function with a pinky promise to make it pure in the near term, maybe), then maybe that is better?

This might point in a direction more like:

integer!: &[integer]  ; fundamental

any-value!: &(any-value?)  ; type function

You could then write a native whose implementation was something along the lines of:

    set-typeset: func [name [word!] types [block!]] [
        types: reduce types
        m: make map! length of types
        for-each t types [
            if not sym-block? :t [fail [:t "is not a datatype"]]
            m/(t): true
        ]
        set name func [t] [m/(:t)]
        return reduce &(name)
    ]
Then imagine you said something like:

>> any-scalar!: set-typeset 'any-scalar? [integer! decimal! ...]
== &(any-scalar?)

You'd end up with an ANY-SCALAR! definition that was a mostly-inert value looking up to a type-checking function. Internally to the system there could be optimizations of this... imagine a generic MAPCHECKER function dispatcher which the evaluator could recognize and not even need to call, going straight to the MAP!.


On the parallel topic of DATATYPE! representation, there's now an answer...they're represented via symbols and the &[...] block.

It's important to remember that whatever this is, it just kind of gets the ball rolling. It's about seeing a comfortable answer for simple questions, that's not ambiguous:

>> type of 1020
== &[integer]

You can't realistically generalize this. A TYPESET! can't just be a & BLOCK! with more things in it. You're going to fall off the rendering cliff at some point, and just showing the address is probably the best thing the console can show... giving you something back that you can ask further questions of.

>> ts: make typeset! [any-series! integer!]
== #[typeset! 0xCDEF1BC4]

>> pick #[typeset! 0xCDEF1BC4] text!
== ~true~  ; isotope

I think the sooner one accepts that direction, the sooner we can reason about the usefulness of leaning on an existing value class that has basic unambiguous rendering for the "type atoms".


Sorry, I don't buy that. By the same token, you could stop showing binaries, blocks, and strings in the console.
Showing the address may appeal to a C programmer, but I'd rather see partial information and a mark saying "Attention: this is only partial".
Maybe the types in a typeset are always shown in the same order; then it is less ambiguous which types are dropped from the display.
Furthermore, the number of types won't grow indefinitely, so just showing the full list may not be that bad after all.



To recap: It's a real sticking point that Rebol's historical typeset was tied to the idea of a maximum of 64 types, simply checked by a bitset over those types. That is a very limited concept. Every Rebol clone has copied this model in lockstep--despite sometimes making overtures toward the existence of extension types. (And I'll point out that R3-Alpha's UTYPE! was complete vaporware in every aspect.)

There are some fairly different characteristics in play. e.g. when you have ANY-VALUE!, if you load another extension datatype, the set of available types should grow.

This is where I've made the suggestion that ANY-VALUE! might need to just be a "typecheck-flavored function". So some form of function where you know the implicit meaning is to use it to match types.

Hand-waving what that sort of thing might look like:

any-even!: &(even?)

parse [2 4 6] [some any-even!]

Implicit here is that we presume that just referencing a function without that twist...e.g. parse [2 4 6] [some even?] would get you something different (currently, an error).

Seems Cool, But What's The Catch?

A problem with this direction comes down to what happens if you're allowed to change what the words you capture look up to. I cover this in the Survey of Redefining Datatype Words.

To sum up: Native code fiddles cell bits under the expectation that the type checking has been done correctly. If you can change what INTEGER! is defined as--and what the typeset consists of is [integer!]--the guarantees the native expected are broken.

We can harden things by reducing them. So instead of the native storing [integer!] it stores [&[integer]], which retains some amount of readability. But it's not what HELP wants to show, so it either has to reverse-engineer INTEGER! back or keep the source block separately.

If "typesets" are actually "typechecker-flavored functions" it's worse, because they become ugly action literals. Though there's a modern feature that can make this not quite as bad as it was: ACTION! symbol caching:

>> any-even!: reduce &(:even?)
== &(#[action! {even?} [value]])

So the EVEN? symbol can still be in there. But again, this isn't what HELP presumably wants to show for ANY-EVEN!.

(I'll point out the obvious: when using a typecheck function, it has to remove from consideration anything that's not a candidate in the component function's own type checking. So if EVEN? only takes INTEGER!, it wouldn't try to pass non-integers to EVEN?.)

What About Just Locking The WORD!s Used In Natives?

I've written this up in the survey post as a possibility:

>> test!: integer!

>> foo: func [x [test!]] []

>> test!: tag!
** Error: Cannot modify TEST! word, locked for use in a type spec

But I pointed out that it's trickier than it sounds:

What About Recursive Typechecks?

People would be able to write things something like:

 types-one!: &[integer! types-two!]
 types-two!: &[block! types-one!]

This is just a case you'd have to catch. The typechecker would have to color the array nodes as it went.
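One way to catch it, sketched here with an explicit visited list instead of node coloring (CHECK-TYPESET is a hypothetical helper, assuming typeset sources are blocks of words looking up to other blocks):

    ; Hypothetical: walk a typeset definition, failing on cycles.
    ; SEEN holds the blocks currently on the walk stack; FIND/SAME
    ; compares by identity rather than by content.
    check-typeset: func [types [block!] seen [block!]] [
        if find/same seen types [
            fail "Cyclic typeset definition"
        ]
        append/only seen types
        for-each t types [
            val: get t
            if block? :val [check-typeset val seen]  ; recurse into sub-typesets
        ]
        take/last seen  ; done with this node; un-mark it
    ]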

(Saying this disproves the idea would be like saying you shouldn't make a spreadsheet with formulas because there can be cycles.)

Time Has Passed And I Haven't Had Any Better Ideas

Saving a typeset's definition as a list of words offers the desirable property that what's in a typeset is actually useful for HELP, and can be viewed meaningfully as "source".

If we argue that any word used this way has its meaning become locked, that doesn't mean the word can't have its meaning redefined in another context. LIB can define INTEGER! one way, but you can have your module define it another. You can call local variables INTEGER!. etc.

Locking permits optimizations. If a typeset says [integer! block!] and it can trust that the meanings of those words have been locked in the context, it could make some cache where it didn't have to look up the words.

I've been circumspect about making this jump. But besides committing to 64 types and a bitset forever, I just don't see another solution. Guess I'll give it a try.


What about some kind of solution like UTF-8, where the ASCII characters take up one unit and more exotic characters take up more units?