What is the use of typesets?

bradrn · February 20, 2024, 4:29am

Recent discussions have brought the notion of ‘typesets’ to my attention. On reading the source code, this confused me a bit… they don’t seem to be used all that much. Moreover, they seem to be subsumed by the idea of optimised constraint functions.

So I set out to try them myself, only to end up thoroughly confused, because typesets don’t actually seem to be accessible from Ren-C itself. Or, at least, none seem to be defined; normal type names evaluate to TYPE-BLOCK!s as expected, but typeset names are unassigned:

>> word!
== &[word]

>> text!
== &[text]

>> any-utf8!
** Script Error: any-utf8! word is attached to a context, but unassigned
** Where: console
** Near: [any-utf8! **]
** Line: 1

>> any-type-value!
** Script Error: any-type-value! word is attached to a context, but unassigned
** Where: console
** Near: [any-type-value! **]
** Line: 1

[On which note, incidentally, let me yet again mention my conviction that the other TYPE-* datatypes are useless and should be removed.]

I expected to at least be able to use them in function signatures, but I can’t even do that, since it crashes the interpreter:

>> test: func [x [word!]] [return x]
== ~#[frame! {test} [x]]~  ; anti

>> test: func [x [any-utf8?]] [return x]
== ~#[frame! {test} [x]]~  ; anti

>> test: func [x [any-utf8!]] [return x]
Assertion failure: QUOTE_BYTE(v) == ANTIFORM_0
Line 165, File: /home/bradrn/Documents/red/ren-c/src/include/cells/cell-quoted.h
Trace/breakpoint trap (core dumped)

So… if they can’t be accessed from Ren-C itself, then why does the interpreter have typesets at all?

hostilefork · February 20, 2024, 6:08am

The current Ren-C does not have them.

(The %types.r table still mentions things like ANY-UTF8!, but that's just because I haven't gotten around to changing it... it's used to generate the ANY-UTF8? function.)

Let me patiently again mention my conviction that not only do I use them (and intend to continue to use them) but I am not budging on being able to do this:

>> parse [x: $y z "a" <b> %c] [
     words: collect [some keep &any-word?]
     numbers: collect [some keep &any-string?]
 ]

>> words
== [x: $y z]

>> numbers
== ["a" <b> %c]

If the functions I wanted to be able to use as constraints were members of objects, I would need &obj.my-constraint?

If the function had a refinement I was interested in, I would need &my-constraint?/refinement

We do not assume that just any arbitrary function looked up by WORD! reference in a dialect (at least, in PARSE) is meant to be used as a type constraint. So this undecorated form is not treated as a pair of type checks:

>> parse [x: $y z "a" <b> %c] [
     words: collect [some keep any-word?]
     numbers: collect [some keep any-string?]
 ]

Even if we could do that (by some categorization system that said a function returned a boolean, took a single value, and was a plausible type constraint), I don't know that I'd want to. Applying a test to the current value of the input is different from what's suggested by writing any-value? xxx inline in the parse, which makes it look like you're testing the product of the next rule. I prefer having this called out at the source level, with the inert decoration letting you know that it's not taking the ensuing thing as an argument.

The only undecorated functions that I'd want to dispatch would be those that fit the format of a combinator.

bradrn:

>> test: func [x [any-utf8!]] [return x]
Assertion failure: QUOTE_BYTE(v) == ANTIFORM_0
Line 165, File: /home/bradrn/Documents/red/ren-c/src/include/cells/cell-quoted.h
Trace/breakpoint trap (core dumped)

Due to the large amount of flux right now, I'm focusing on lining up parts more than on making sure the error handling is good, so a lot of things assert instead of raising errors.

Of course, this clearly should just be an error, since putting undefined things in type spec blocks is an ordinary usermode problem.

bradrn · February 20, 2024, 6:23am

Ah, OK. What confused me was %types.r, plus a recent commit message mentioning typesets.

Fair enough, I’d forgotten about this. That’s one more thing for me to consider, I guess!

hostilefork · February 20, 2024, 6:31am

Typesets are now an implementation detail of how some type constraints are checked quickly.

Some constraints, like ANY-ARRAY?, can be implemented just by checking that the type byte in the cell falls between a lower and an upper value.
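To make the range idea concrete, here is a minimal Python model of it. The type names and byte values are hypothetical (not Ren-C's actual numbering); the one assumption that matters is that the array types are assigned contiguous byte values, so membership takes two comparisons and no table.

```python
# Hypothetical type bytes (illustrative only, not Ren-C's real numbering).
# The one property that matters: the array types are contiguous.
TYPE_INTEGER = 1
TYPE_DECIMAL = 2
TYPE_TEXT = 3
TYPE_BLOCK = 4  # first array type in this sketch
TYPE_GROUP = 5  # last array type in this sketch
TYPE_WORD = 6

ANY_ARRAY_MIN = TYPE_BLOCK
ANY_ARRAY_MAX = TYPE_GROUP


def is_any_array(type_byte):
    """Range-style constraint: two comparisons on the type byte, no table."""
    return ANY_ARRAY_MIN <= type_byte <= ANY_ARRAY_MAX


print(is_any_array(TYPE_GROUP))  # True
print(is_any_array(TYPE_WORD))   # False
```

The catch is that this only works for categories whose members can be laid out next to each other in the numbering.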

Other constraints are sparse, and to check those quickly there's just a table with one entry per type. Each entry holds bitflags OR'd together, one flag for each typeset that the entry's type is a member of.

Historically, all typesets were implemented one way: as 64 flags in a 64-bit integer, one flag per type. As the commit message said, this change just shifts the internals: instead of the "fast" technique limiting you to 64 types, it limits you to 64 non-range typesets for this optimization... and the limit of 256 types comes from using a byte in the cell.
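A minimal Python model of the sparse-table technique, under stated assumptions: there are 256 possible type bytes, each table entry is a 64-bit word of typeset flags, and each non-range typeset claims one bit. The class name, type numbers, and typeset memberships are hypothetical and only illustrate the shape of the idea; the real interpreter does this in C.

```python
NUM_TYPES = 256           # the type is stored in one byte of the cell
MAX_SPARSE_TYPESETS = 64  # one bit each in a 64-bit flags word

# Hypothetical type bytes (illustrative numbering only).
TYPE_INTEGER = 1
TYPE_DECIMAL = 2
TYPE_TEXT = 3
TYPE_TAG = 4
TYPE_FILE = 5
TYPE_WORD = 6


class TypesetTable:
    """One flags word per type byte; bit N set means 'member of typeset N'."""

    def __init__(self):
        self.flags = [0] * NUM_TYPES
        self.next_bit = 0

    def new_typeset(self, member_types):
        """Claim the next free bit and OR it into each member's entry."""
        members = tuple(member_types)
        for type_byte in members:
            if not 0 <= type_byte < NUM_TYPES:
                raise ValueError(f"type byte {type_byte} does not fit in a byte")
        if self.next_bit >= MAX_SPARSE_TYPESETS:
            raise OverflowError("no free bit: only 64 sparse typesets fit")
        flag = 1 << self.next_bit
        self.next_bit += 1
        for type_byte in members:
            self.flags[type_byte] |= flag
        return flag

    def check(self, type_byte, flag):
        """Membership test: one table lookup and one bitwise AND."""
        return (self.flags[type_byte] & flag) != 0


TABLE = TypesetTable()
# Illustrative memberships, not copied from %types.r.
ANY_NUMBER = TABLE.new_typeset([TYPE_INTEGER, TYPE_DECIMAL])
ANY_UTF8 = TABLE.new_typeset([TYPE_TEXT, TYPE_TAG, TYPE_FILE, TYPE_WORD])

print(TABLE.check(TYPE_DECIMAL, ANY_NUMBER))  # True
print(TABLE.check(TYPE_TEXT, ANY_NUMBER))     # False
print(TABLE.check(TYPE_WORD, ANY_UTF8))       # True
```

The membership check costs the same no matter how many types a typeset contains; the price is that a 65th sparse typeset has no bit left, which is the 64-typeset limit the commit traded for the old 64-type limit.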

(I say the "fast" technique, but there is no "slow" technique to compare with... it just has to have an implementation for things to run, and this is what it is.)

bradrn · February 20, 2024, 6:37am

I’d actually been thinking of approaches like this myself, as a way to implement constraints… so it’s good to know we’re on the same page here!

hostilefork · February 20, 2024, 9:02pm

Update %types.r with e.g. ANY-STRING? vs ANY-STRING! · metaeducation/ren-c@7523828 · GitHub

bradrn:

>> test: func [x [any-utf8!]] [return x]
Assertion failure: QUOTE_BYTE(v) == ANTIFORM_0
Line 165, File: /home/bradrn/Documents/red/ren-c/src/include/cells/cell-quoted.h
Trace/breakpoint trap (core dumped)

This was a simple incorrect assert in the error delivery.

If you hit asserts (or bugs), please feel free to report them on GitHub Issues. I don't really use it much right now, as I just keep a little local task list... and most things that don't fit there are open-ended design discussions, where the forum's reorganizability makes it a better medium.

But I do write up the occasional issue if it's something I'm not going to fix imminently and it doesn't feel like it belongs as a long comment inline in the source, or as a forum post. I did this one recently, for example:

Quasi-Void Rendering Ambiguity in Paths · Issue #1157 · metaeducation/ren-c · GitHub

If you're reading source, hopefully you've gathered that there are a lot of "extenuating factors" which mean that what you're reading may not be good or current... so take everything with a grain of salt.

- A lot of the code in the build process is more convoluted than it needs to be, because it's trying to run in a 6-year-old Ren-C as well as a current one.
  - As bad as the convolutions are, it's still a bit impressive to see how much emulation and twisting can be done. The design has been influenced by what was hard or easy to bend, so recent interpreters are even more bendy.
- Rebmake is a catastrophe beyond that; I've explained why.
  - Again... as much of a nightmare as it is, maintaining that nightmare provides insights. It's like a large organic test case that demands attention to itself... I can't skip over its needs, because things need to build.
- Large swaths of the system are legacy C if/then/else soup that I wouldn't trust further than I can throw it. But this is the life support for the moment that makes the interpreter able to actually do some things... so it's possible to test the ramifications of designs on sort-of-real-world situations.
- Entire fundamental pieces (like binding and types, as recent examples) have undergone experimental prototyping while I try to juggle all the balls in the air... the carnage is pretty intense.
- The concerns of isotopes are still very much propagating (in fact, the concerns of generic quoting never quite got a good formalism). The very existence of "values that decay" raises the question of which routines should see decayed versus undecayed values. This is the consequence of innovating in the midst of existing code.
  - Fortunately, Ren-C can be built as C++ for extra checks, and I've started introducing some type safety with Element/Value/Atom distinctions. You'll see a pattern of me trying to make things that compile as brutish C but get a leg up if a C++ compiler is used.

But as I've said before, as despairing as some of it may seem, the defensive programming with asserts and such puts Ren-C in a much stronger position than something like R3-Alpha or Red. If they have a bug, they just kind of have to go "oh well, it does that sometimes." Ren-C is so noisy when something is out of alignment that it usually pinpoints the moment something goes wrong.

And I do think it actually does quite a lot of impressive things, considering the "rules of the game" the implementation is playing.

bradrn · February 21, 2024, 1:04pm

hostilefork:

Let me patiently again mention my conviction that not only do I use them (and intend to continue to use them) but I am not budging on being able to do this:
>> parse [x: $y z "a" <b> %c] [
     words: collect [some keep &any-word?]
     numbers: collect [some keep &any-string?]
 ]

While thinking about types more generally… I came to the conclusion that this isn’t as good an argument as I thought it was, because you can do it just as easily with TYPE-BLOCK! alone:

>> parse [x: $y z "a" <b> %c] [
     words: collect [some keep &[any-word?]]
     numbers: collect [some keep &[any-string?]]
 ]

More than that, TYPE-BLOCK!s suggest a nice way to match one of several types, just like you can do in function definitions:

>> parse [x: $y z "a" <b> %c] [
     words-or-numbers: collect [some keep &[any-word? any-string?]]
 ]

hostilefork · February 22, 2024, 1:01am

bradrn:

More than that, TYPE-BLOCK!s suggest a nice way to match one of several types, just like you can do in function definitions:
>> parse [x: $y z "a" <b> %c] [
     words-or-numbers: collect [some keep &[any-word? any-string?]]
 ]

So you'll recall that was something the design passed through at one point. In fact, GROUP!s would intersect, while BLOCK!s would union.

 >> match &[(negative? integer!) (positive? decimal!)] -10.20
 == ~null~  ; anti

 >> match &[(negative? integer!) (positive? decimal!)] -10
 == -10

 >> match &[(negative? integer!) (positive? decimal!)] 3.04
 == 3.04

 >> match &[(negative? integer!) (positive? decimal!)] 3
 == ~null~  ; anti

You don't technically need two types (TYPE-BLOCK! and TYPE-GROUP!) to do this if you're willing to pay for parentheses inside a block even for one clause. But all things being equal, having the option to say &(negative? integer!) instead of &[(negative? integer!)] could be nice.

If the design winds up being able to accommodate this, I'm for it, but it all comes down to what the type system, TYPE OF, etc. need, as per other threads.
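The GROUP!-intersects, BLOCK!-unions rule in the MATCH examples above can be modeled in a few lines of Python. This is a hypothetical sketch of the semantics only, not how Ren-C implements TYPE-BLOCK! matching: a list stands in for the BLOCK! (any alternative may succeed), a tuple stands in for each GROUP! (every predicate must succeed), and None plays the role of the null result. In this sketch the type test comes first in each tuple so that all() short-circuits before a non-number is ever compared with zero.

```python
def is_integer(value):
    # bool is a subclass of int in Python; exclude it so True is not an integer
    return isinstance(value, int) and not isinstance(value, bool)


def is_decimal(value):
    return isinstance(value, float)


def is_negative(value):
    return value < 0


def is_positive(value):
    return value > 0


def match(spec, value):
    """Return value if it satisfies spec, else None.

    spec is a list (union) of tuples (intersections) of predicates.
    """
    for group in spec:
        # all() stops at the first failing predicate
        if all(predicate(value) for predicate in group):
            return value
    return None


# Models &[(negative? integer!) (positive? decimal!)]
SPEC = [
    (is_integer, is_negative),
    (is_decimal, is_positive),
]

print(match(SPEC, -10.20))  # None
print(match(SPEC, -10))     # -10
print(match(SPEC, 3.04))    # 3.04
print(match(SPEC, 3))       # None
```

The four results line up with the console session above: -10.20 is a decimal but not positive, and 3 is an integer but not negative, so neither alternative accepts them.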

However...

The constraints already pushed it away from [some keep any-word!] to [some keep &any-word?]. So it carries a wart that many would say has compromised the aesthetic goals of a common case (that already had a "wart" of !, now a ?). But the & wart at least buys you something rather significant: the generality that you can bring in any function.

Taking that another two steps away with three characters of noise moves this to an even more unpalatable place, without offering a benefit that the single character did not provide (for the common case).

Unlike some of the people who play this game who don't appear that self-aware... I'm quite conscious that this obsession with the economy of every little character and drift away from English can seem ridiculous. Maybe it's a dumb game. But if one is going to play it, one has to try and follow the rules.

(If you didn't see my "timeless bear" story... as stupid as it may sound... the reason I told it is because actually: yeah, that's how obsessive we actually intend to be.)