The TYPESET! Representation Problem

TYPESET! took advantage of 64-bit integers--and a self-imposed limit of 64 fundamental types--to fit one bit flag per type into a single value cell representing a typeset. Besides the obvious lack of extensibility, this has a number of problems. One is that typesets render in a pretty ugly way... a seemingly simple concept like ANY-TYPE! expands out fairly monstrously:

r3-alpha>> print mold any-type!
make typeset! [unset! none! logic! integer! decimal! percent! money! char!
pair! tuple! time! date! binary! string! file! email! url! tag! bitset! image!
vector! block! paren! path! set-path! get-path! lit-path! map! datatype!
typeset! word! set-word! get-word! lit-word! refinement! issue! native!
action! rebcode! command! op! closure! function! frame! object! module!
error! task! port! gob! event! handle! struct! library! utype!]

red>> print mold any-type!
make typeset! [datatype! unset! none! logic! block! paren! string! file!
url! char! integer! float! word! set-word! lit-word! get-word! refinement!
issue! native! action! op! function! path! lit-path! set-path! get-path!
routine! bitset! object! typeset! error! vector! hash! pair! percent! tuple!
map! binary! time! tag! email! handle! date! port! image! event!]

People are used to typing in variables--like arrays or objects--and seeing that what they look up to is a large amount of data. But a typeclass like ANY-TYPE! doesn't typically have this "explosive" character... and the expansion impedes readability.

If you look closely you'll see another problem sneaking in, hinted at by R3-Alpha's never-implemented UTYPE! (for user-defined type). Had it been implemented, this would suggest that all user-defined types were considered equivalent in a TYPESET!. If you took one user-defined type as a parameter, you would have to take them all, and do the filtering after the fact.

(In fact, extension types in Ren-C--which do exist--have this very problem. If you pass a GOB! to a native routine that expects a VECTOR! or a STRUCT!, it will currently crash. This hasn't really come up yet because few are working with those types. But it's something that a real answer to typesets would have to address, part of why I'm mentioning all this.)

This doesn't even touch upon the idea of "type-classes"...

e.g. if you decide to make a base object with something like book!: make object! [...] and later make book! [...], this "book!" is the kind of thing you might consider in some languages to be a class. You might want to write a routine like library-checkout: function [b [book!]] [...]. But there is no facility for this.
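To make the wish concrete, here's a hedged sketch of the kind of code one would *want* to write (BOOK! here is hypothetical--it's just an OBJECT! instance, not a datatype, which is exactly why the last part doesn't work today):

    ; An object serving as a "class", derived from via MAKE
    book!: make object! [
        title: ""
        checked-out: false
    ]

    moby-dick: make book! [title: "Moby-Dick"]

    ; The desired routine--but BOOK! is not a datatype, so there is
    ; no facility for using it in a type spec like this:
    library-checkout: function [b [book!]] [
        b/checked-out: true
    ]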

But... "Derived binding" in Ren-C set up some groundwork for understanding derivation. This was done for efficiency: knowing the relationships makes it possible to forward references to base class members downward to the instance. That avoids needing to deep-copy every member function of an object each time a new instance is made... just so those functions can refer to the variables in the derivations. Yet the relationship it has to encode to accomplish this could also be used as a type test, to see if something came from a given type hierarchy.

So the mechanics are there...and it seems it would be cool to implement. But again, that depends on a notion of what a "typeset" actually is, which is the limiting factor.

And what about "type-tests..."?

Still another question comes along for tests that are basically functions. How about even-integer!, or block-2! where that's a block containing two elements? These seem very useful, though potentially dangerous if the function has side effects... leading one to wonder if there should be a PURE annotation for functions: promising no side effects, taking all parameters as CONST, using no non-PURE functions in their implementations, and not looking at any variables that aren't parameters unless those variables have been permanently LOCK'd.
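As a sketch of what such tests might look like as ordinary functions (merely promised-pure for now--the names and exact idioms here are hypothetical):

    ; Hypothetical predicates that could back EVEN-INTEGER! and BLOCK-2!
    even-integer?: func [v] [
        did all [integer? v even? v]  ; only applies EVEN? to integers
    ]

    block-2?: func [v] [
        did all [block? v 2 = length of v]
    ]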

I actually think maybe "type test" is the fundamental thing to be looking at, instead of some nebulous TYPESET! construct. If type tests can be held onto by name, and that name looks up a PURE function (approximated as a regular function with a pinky promise to make it pure in the near term, maybe), then maybe that is better?

This might point in a direction more like:

integer!: &[integer]  ; fundamental

any-value!: &(any-value?)  ; type function

You could then write a native whose implementation was something along the lines of:

    set-typeset: func [name [word!] types [block!]] [
        types: reduce types
        m: make map! length of types
        for-each t types [
            if not sym-block? :t [fail [:t "is not a datatype"]]
            m/(t): true
        ]
        set name func [t] [m/(:t)]
        return reduce &(name)
    ]
Then imagine you said something like:

>> any-scalar!: set-typeset 'any-scalar? [integer! decimal! ...]
== &(any-scalar?)

You'd end up with an ANY-SCALAR! definition that was a mostly-inert value looking up to a type-checking function. Internally to the system there could be optimizations of this... imagine a generic MAPCHECKER function dispatcher which the evaluator could recognize and not even need to call, going straight to the MAP!.


On the parallel topic of DATATYPE! representation, there's now an answer...they're represented via symbols and the &[...] block.

It's important to remember that whatever this is, it just kind of gets the ball rolling. It's about seeing a comfortable answer for simple questions, that's not ambiguous:

>> type of 1020
== &[integer]

You can't realistically generalize this. A TYPESET! can't just be a & BLOCK! with more things in it. You're going to fall off the rendering cliff at some point, and just showing the address is probably the best thing the console can show... giving you something back that you can ask further questions of.

>> ts: make typeset! [any-series! integer!]
== #[typeset! 0xCDEF1BC4]

>> pick #[typeset! 0xCDEF1BC4] text!
== ~true~  ; isotope

I think the sooner one accepts that direction, the sooner we can reason about the usefulness of leaning on an existing value class that has basic unambiguous rendering for the "type atoms".


Sorry, I don't buy that. By the same token, you could stop showing binaries, blocks, and strings in the console.
Showing the address may appeal to a C programmer, but I'd rather see partial information and a mark saying "Attention: this is only partial".
Maybe the types in a typeset are always shown in the same order; then it is less ambiguous which types are dropped from the display.
Furthermore, the number of types won't grow indefinitely, so just showing the full list may not be that bad after all.



To recap: It's a real sticking point that Rebol's historical typeset was tied to the idea of a maximum of 64 types, simply checked by a bitset over those types. That is a very limited concept. Every Rebol clone has copied this model in lockstep--despite sometimes making overtures toward the existence of extension types. (And I'll point out that R3-Alpha's UTYPE! was complete vaporware in every aspect.)

There are some fairly different characteristics in play. e.g. when you have ANY-VALUE!, if you load another extension datatype, the set of available types should grow.

This is where I've made the suggestion that ANY-VALUE! might need to just be a "typecheck-flavored function". So some form of function where you know the implicit meaning is to use it to match types.

Hand-waving what that sort of thing might look like:

any-even!: &(even?)

parse [2 4 6] [some any-even!]

Implicit here is that we presume that just referencing a function without that twist...e.g. parse [2 4 6] [some even?] would get you something different (currently, an error).

Seems Cool, But What's The Catch?

A problem with this direction comes down to what happens if you're allowed to change what the words you capture look up to. I cover this in the Survey of Redefining Datatype Words.

To sum up: Native code fiddles cell bits under the expectation that the type checking has been done correctly. If you can change what INTEGER! is defined as--and what the typeset consists of is [integer!]--the guarantees the native expected are broken.

We can harden things by reducing them. So instead of the native storing [integer!] it stores [&[integer]], which retains some amount of readability. But it's not what HELP wants to show, so it either has to reverse-engineer INTEGER! back or keep the source block separately.

If "typesets" are actually "typechecker-flavored functions" it's worse, because they become ugly action literals. Though there's a modern feature that can make this not quite as bad as it was: ACTION! symbol caching:

>> any-even!: reduce &(:even?)
== &(#[action! {even?} [value]])

So the EVEN? symbol can still be in there. But again, this isn't what HELP presumably wants to show for ANY-EVEN!.

(I'll point out the obvious: when using a typecheck function, it has to remove from consideration anything that's not a candidate in the component function's own type checking. So if EVEN? only takes INTEGER!, it wouldn't try to pass non-integers to EVEN?.)

What About Just Locking The WORD!s Used In Natives?

I've written this up in the survey post as a possibility:

>> test!: integer!

>> foo: func [x [test!]] []

>> test!: tag!
** Error: Cannot modify TEST! word, locked for use in a type spec

But I pointed out that it's trickier than it sounds:

What About Recursive Typechecks?

People would be able to write things something like:

 types-one!: &[integer! types-two!]
 types-two!: &[block! types-one!]

This is just a case you'd have to catch. The typechecker would have to color the array nodes as it went.
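One way to catch it, sketched here with an explicit visited list instead of node coloring (CHECK-TYPESET is a hypothetical helper, assuming typeset sources are blocks of words looking up to other blocks):

    ; Hypothetical: walk a typeset definition, failing on cycles.
    ; SEEN holds the blocks currently on the walk stack; FIND/SAME
    ; compares by identity rather than by content.
    check-typeset: func [types [block!] seen [block!]] [
        if find/same seen types [
            fail "Cyclic typeset definition"
        ]
        append/only seen types
        for-each t types [
            val: get t
            if block? :val [check-typeset val seen]  ; recurse into sub-typesets
        ]
        take/last seen  ; done with this node; un-mark it
    ]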

(Saying this disproves the idea would be like saying you shouldn't make a spreadsheet with formulas because there can be cycles.)

Time Has Passed And I Haven't Had Any Better Ideas

Saving a typeset's definition as a list of words offers the desirable property that what's in a typeset is actually useful for HELP, and can be viewed meaningfully as "source".

If we argue that any word used this way has its meaning become locked, that doesn't mean the word can't have its meaning redefined in another context. LIB can define INTEGER! one way, but you can have your module define it another. You can call local variables INTEGER!. etc.

Locking permits optimizations. If a typeset says [integer! block!] and it can trust that the meanings of those words have been locked in the context, it could make some cache where it didn't have to look up the words.

I've been circumspect about making this jump. But besides committing to 64 types and a bitset forever, I just don't see another solution. Guess I'll give it a try.


What about some kind of solution like UTF-8, where the ASCII characters take up one unit and more exotic characters take up more units?