The TYPESET! representation problem

So DATATYPE! and TYPESET! are historically strange. DATATYPE! has a very obvious rendering ambiguity:

r3-alpha>> type? integer!
== datatype!

r3-alpha>> block: reduce [integer!]
== [integer!]

r3-alpha>> type? first block
== datatype!

r3-alpha>> type? first [integer!]
== word!

There is a proposal on the table to apply the @[...] structural type to the representation of datatypes:

>> type of 10
== @[integer]  ; alternately @[integer!] ...?

(Note: I'm a bit torn on whether the brevity of @[integer] outweighs the cost of having a duplicate symbol in the symbol table. But that said: for most of the types you'd probably have the non-exclamation-point symbol somewhere anyway... all it takes is one variable called block to wind up with a symbol for it, so the brevity of @[block] might be worth it in the long run, especially considering there aren't all that many types.)

This would not be the only use of the @[...] type in the system. But the general understanding would be that in any dialect where types are of interest, this is what they would be used for.

You would still be able to use arbitrary quoting to get around the other meanings--for instance, with modal parameters:

>> data: [a b c] 

>> append data @[integer]
== [a b c [integer]]  ; "modal" parameter, e.g. @ implies /ONLY

>> append data '@[integer]
== [a b c [integer] @[integer]]  ; quote subverts modal interpretation

>> append data integer!
== [a b c [integer] @[integer] @[integer]]  ; word evaluation also not modal

PARSE could also work around it with quoting, even if the plain interpretation were a datatype match (after all, INTEGER! rules are repeat counts unless escaped, BLOCK!s are rules... now even LOGIC! is interpreted as whether to continue the parse or not):

>> integer!
== @[integer]

>> did parse [1 2 @[integer]] [integer! @[integer] '@[integer] end]
== #[true]

...but what about TYPESET! ?

TYPESET! took advantage of 64-bit integers--and a limit of 64 fundamental types--to fit a typeset into a single value cell as 64 bit flags. Besides the obvious lack of extensibility, this has a number of problems. One is that typesets render in a pretty ugly way...a seemingly simple concept like ANY-TYPE! expands out fairly monstrously:

r3-alpha>> print mold any-type!
make typeset! [unset! none! logic! integer! decimal! percent! money! char!
pair! tuple! time! date! binary! string! file! email! url! tag! bitset! image!
vector! block! paren! path! set-path! get-path! lit-path! map! datatype!
typeset! word! set-word! get-word! lit-word! refinement! issue! native!
action! rebcode! command! op! closure! function! frame! object! module!
error! task! port! gob! event! handle! struct! library! utype!]

red>> print mold any-type!
make typeset! [datatype! unset! none! logic! block! paren! string! file!
url! char! integer! float! word! set-word! lit-word! get-word! refinement!
issue! native! action! op! function! path! lit-path! set-path! get-path!
routine! bitset! object! typeset! error! vector! hash! pair! percent! tuple!
map! binary! time! tag! email! handle! date! port! image! event!]

People are used to typing in variables--like arrays or objects--and seeing that what they look up to is a large amount of data. But a typeclass like ANY-TYPE! isn't usually expected to have this "explosive" character...and when it does, it impedes readability.

If you look closely you'll see another sneaking problem, hinted at by R3-Alpha's never-implemented UTYPE! (for user-defined type). Imagining that it had an implementation, this suggests that all user-defined types would be considered equivalent in a TYPESET!. If you took one user-defined type as a parameter, you would have to take them all, and do some filtering after the fact.

(In fact, extension types in Ren-C--which do exist--have this very problem. If you pass a GOB! to a native routine that expects a VECTOR! or a STRUCT!, it will currently crash. This hasn't really come up yet because few people are working with those types. But it's something that a real answer to typesets would have to address, which is part of why I'm mentioning all this.)
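To make the "filter after the fact" scenario concrete, here's a rough sketch in the hypothetical world where UTYPE! had been implemented (CIRCLE? and SQUARE? are invented predicates, just for illustration):

area: func [shape [utype!]] [  ; *any* user-defined type passes the spec check
    if not any [circle? shape  square? shape] [
        fail "AREA only handles CIRCLE! and SQUARE! user-defined types"
    ]
    either circle? shape [
        3.14159 * shape/radius * shape/radius
    ][
        shape/side * shape/side
    ]
]

Every user-defined type gets through the [utype!] filter, so the narrowing has to happen in the body.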

This doesn't even touch upon the idea of "type-classes"...

e.g. if you decide to make a base object with something like book!: make object! [...] and later make book! [...], this "book!" is the kind of thing you might consider in some languages to be a class. You might want to write a routine like library-checkout: function [b [book!]] [...]. But there is no facility for this.

But..."Derived binding" in Ren-C set up some groundwork for understanding derivation. That was done for efficiency: the system has to know the derivation relationships in order to forward references to base-class members down to the instance. This avoids needing to deep-copy every member function of an object each time a new instance is made...just so those functions can refer to the variables of the derivation. Yet the relationship it has to encode to accomplish this could also be used as a type test, to see if something came from a given type hierarchy.
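As a rough sketch of how that might surface (DERIVED-FROM? is invented here, standing in for a test built on the relationships derived binding already records):

book!: make object! [title: _ author: _]

library-checkout: func [b [object!]] [
    if not derived-from? b book! [  ; hypothetical check of the derivation chain
        fail "LIBRARY-CHECKOUT expects something derived from BOOK!"
    ]
    print ["Checking out:" b/title]
]

moby: make book! [title: "Moby-Dick" author: "Herman Melville"]
library-checkout moby  ; would pass, since MOBY was made from BOOK!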

So the mechanics are there...and it seems it would be cool to implement. But again, that depends on a notion of what a "typeset" actually is, which is the limiting factor.

And what about "type-tests..."?

Still another question comes along for tests that are basically functions. How about even-integer!, or block-2! meaning a block containing exactly two elements? These seem very useful, though potentially dangerous if the function has side effects... which leads one to wonder if there should be a PURE annotation for functions: a promise not to have side effects, to take all parameters as CONST, to call no non-PURE functions in the implementation, and to read no variables other than parameters unless those variables have been permanently LOCK'd.
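A sketch of what such tests could look like as ordinary functions today--with purity being nothing more than a convention--might be:

even-integer!: func [v] [did all [integer? :v  even? v]]
block-2!: func [v] [did all [block? :v  2 = length of v]]

So even-integer! 10 would give #[true], while block-2! [a b c] would give #[false] (the DID is just there to force a clean LOGIC! result out of the ALL).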

I actually think maybe "type test" is the fundamental thing to be looking at, instead of some nebulous TYPESET! construct. If type tests can be held onto by name, and that name looks up a PURE function (approximated as a regular function with a pinky promise to make it pure in the near term, maybe), then maybe that is better?

This might point in a direction more like:

integer!: @[integer]  ; fundamental

any-value!: @(any-value?)  ; type function

You could then write a native whose implementation was something along the lines of:

set-typeset: func [name [word!] types [block!]] [
    types: reduce types  ; evaluate words like INTEGER! into their @[...] forms
    m: make map! length of types  ; note: M is left global, to keep the sketch simple
    for-each t types [
        if not sym-block? :t [fail [:t "is not a datatype"]]
        m/(t): true
    ]
    set name func [t] [m/(:t)]  ; checker: is the given type a key of the map?

    return reduce @(name)  ; intended to give back an @(...) naming the checker
]

Then imagine you said something like:

>> any-scalar!: set-typeset 'any-scalar? [integer! decimal! ...]
== @(any-scalar?)

You'd end up with an ANY-SCALAR! definition that was not entirely illegible...an inert value that looks up to a type-checking function. Internally to the system there could be optimizations of this... imagine a generic MAPCHECKER function dispatcher which the evaluator could recognize and not even need to call, going straight to the MAP!.
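For example (continuing the sketch above, and remembering that the generated ANY-SCALAR? takes a datatype--not a value--as its argument):

scale: func [n] [
    if not any-scalar? type of n [  ; manual check against the generated typeset
        fail [n "is not an ANY-SCALAR! value"]
    ]
    n * 2
]

Presumably a spec like func [n [any-scalar!]] could be made to route through that same checker, which is where something like the MAPCHECKER recognition would pay off.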

Anyway... moral of the story is, I think it's time to kill off both DATATYPE! and TYPESET! as non-concrete, inextensible, render-unfriendly types...and lean on mechanisms we have a shot at exploiting in a meaningful way.

I believe common cases could be made to be not significantly slower than today, using clever bit-packing. Less common cases would have to cost more, but that's not as bad as being impossible or intractable...as they are today.