User Defined Type Scenarios Solicited

hostilefork · December 20, 2019, 7:56am

@giuliolunati has a desire to apply Rebol to "mathy" studies and implementations. He was an early tinkerer with the R3-Alpha VECTOR! datatype, and has done several usermode projects...a few are:

complex numbers
fractions
matrices
studying permutations of Rubik's Cubes (via user natives)

His (reasonable) desire is for mathematical objects that can integrate the operations people are familiar with. To pick a simple example, having + or add work on these completely custom datatypes, vs. having to call matrix-add or matrix+ or matrix/add or -> matrix/+.

Yet if you type help add in R3-Alpha, what do you get?

r3-alpha>> help add
USAGE:
    ADD value1 value2

DESCRIPTION:
    Returns the addition of two values.

ADD is an action value.

ARGUMENTS:
    value1 (scalar! date!)
    value2

I'll point out there's no a priori type checking on the second argument. Furthermore, if you asked what a SCALAR! was:

r3-alpha>> help scalar!
SCALAR! is a typeset of value: make typeset! [integer! decimal! percent!
money! char! pair! tuple! time!]

So the People Clamored and Rallied behind the UTYPE!

UTYPE! was billed as a "user-defined type" which would allow Rebol's "Generic Functions" (like ADD or INSERT or APPEND) to be defined for it, extending the abilities of those operations to new types.

Unlike a typical function defined with something like FUNC that runs a single body, Rebol Generics would delegate the responsibility for handling the function call to whatever the datatype of the first argument was. This creates some puzzling implementation combinatorics...

>> add 12-Dec-2012 1
== 13-Dec-2012

>> add 1 12-Dec-2012
== 13-Dec-2012

In the first case, it is DATE!'s handler which receives a date value and an integer to process as arguments. In the second case, it is INTEGER!'s handler which receives an integer value and a date to process as arguments. So what happens in R3-Alpha is that the handler for INTEGER! has to recognize when it has a date argument, and reverse the call.
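To make the shape of that concrete, here is a minimal sketch in Python (not R3-Alpha's actual C code) of dispatching on the first argument only. The names `ADD_HANDLERS`, `add_date` and `add_integer` are made up for illustration; the point is just that the integer handler needs hard-coded knowledge of dates in order to swap the arguments.

```python
from datetime import date, timedelta

def add_date(d, n):
    # DATE!-style handler: it only knows how to add a number of days.
    if isinstance(n, int):
        return d + timedelta(days=n)
    raise TypeError(f"cannot add {type(n).__name__} to a date")

def add_integer(i, other):
    # INTEGER!-style handler: it must special-case every other type it is
    # willing to meet. For a date it swaps the arguments and re-dispatches.
    if isinstance(other, int):
        return i + other
    if isinstance(other, date):
        return add(other, i)  # reverse the call
    raise TypeError(f"cannot add {type(other).__name__} to an integer")

# Hypothetical table: one handler per datatype of the FIRST argument.
ADD_HANDLERS = {date: add_date, int: add_integer}

def add(value1, value2):
    # Only value1's type is consulted; value2 is the handler's problem.
    handler = ADD_HANDLERS.get(type(value1))
    if handler is None:
        raise TypeError(f"no ADD handler for {type(value1).__name__}")
    return handler(value1, value2)

print(add(date(2012, 12, 12), 1))  # 2012-12-13
print(add(1, date(2012, 12, 12)))  # 2012-12-13
```

A type that `add_integer` has never heard of gets no such courtesy: with an integer in the first slot, the call simply fails.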

The comment in the R3-Alpha source makes it sound like that's a universal rule, so this is okay. You might argue that 1 + (anything) is necessarily the same as (anything) + 1. That rules out 1 + "abc" being "1abc" with "abc" + 1 being "abc1"...which seems a bit at odds with Rebol's freedom-of-choice ethos. But you could still do it...you'd just have to do it outside the generics system somehow.

But you only need look a few lines down to see a comment that says:

"Only type valid to subtract from, divide into, is decimal/money".

So this means INTEGER! has made a rule for all time that 1 - (anything) works only for DECIMAL! and MONEY!.

Not Only Is This a Very Unexpandable Framing of Generic Functions, There Was a Fully Negligible Non-Implementation for UTYPE! in R3-Alpha

People seem to easily get their hopes up when they see a new datatype show up in the interpreter. ("Look, there's TASK! Look there's UTYPE! They must be coming soon!") And I've really tried to get folks to not let their imaginations run away with them about what something might be until they've seen proof it works. "Use the Source, Luke!"

The implementation of UTYPE! that was there likely took less than 20 minutes to add. To explain the code: a UTYPE! is essentially an alias for an object with a table of R3-Alpha's generic functions, each entry being a FUNC with the signature of that generic. Those functions would then be the recipients of the method calls whenever your UTYPE! was the first argument.

If your UTYPE! was anything but the first argument, then nearly the only thing that could work on it would be ADD when an integer was the first argument. And only if your type's ADD handler was prepared for an INTEGER! as the second argument, since that is what it would see after the integer handler reversed the call.

That's a "big picture" critique, but the mechanical implementation was very problematic too, if you read the code: the UTYPE! which provides the table of dispatch functions orders those functions in numeric order (e.g. say that APPEND is 32 or ADD is 7). This requires whoever creates the UTYPE! to have internal knowledge of a numbering system from generated C code, where the numbers could change. Worse, it was not a map or sparse table; your table had to be as long as the number of the largest generic you wanted to support.
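A rough model of the difference, again in Python and not the real code: `ACTION_IDS` stands in for the generated numbering, using the made-up values 7 and 32 from the paragraph above, and `my_add` / `my_append` are hypothetical user functions.

```python
# Hypothetical action numbers, standing in for IDs from generated C code.
ACTION_IDS = {"add": 7, "append": 32}

def my_add(a, b):
    return ("my-add", a, b)

def my_append(series, value):
    return ("my-append", series, value)

# Positional style: the table has to reach the highest action number used,
# and the author has to know those numbers when filling it in.
positional = [None] * (max(ACTION_IDS.values()) + 1)
positional[ACTION_IDS["add"]] = my_add
positional[ACTION_IDS["append"]] = my_append

# Name-keyed style: only the generics actually overridden appear.
by_name = {"add": my_add, "append": my_append}

print(len(positional))  # 33 slots, 31 of them empty
print(len(by_name))     # 2 entries
```

If the generated numbering shifts, the positional table silently points at the wrong generics, while the name-keyed one is unaffected.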

Since none of the generics had signatures that were aware of UTYPE!, any UTYPE! would have to be accepted as the first argument. Nothing was worked out there. And Do_Function() didn't run any type checking, so the UTYPE! could not narrow the type signature in its own function definition. And of course nothing in the system was set up to take UTYPE!, so even the modest 1 + (utype) wouldn't run unless you either accept a UTYPE! for every argument or say you'll take it.

To help focus on what was implemented...the UTYPE! was quickly wiped out of Ren-C, and nothing of value was lost. The question of how new types would work was left as an area of consideration for the future, when a design could be articulated.

Grim Though This Sounds, There Is A Silver Lining

In the intervening time, Ren-C has remedied many technical bits which plagued the existing generics (while not redesigning the "dispatch on first argument" rule itself). It establishes dispatch by a speedy means of associating symbol IDs with actual words, so a user defined type could really speak of which generics it wants to override by name. Function dispatch is under flexible control--as the feature set in specializations and composition should show.

I won't go on to enumerate all the mechanical ways in which Ren-C is more prepared to attack a "real" solution to generic functions. I will just say it is. If we know what we're looking for.

Clearly there has to be some kind of pattern engine involved that lets you solve the idea that a new type could claim responsibility for 1 - mytype as well as mytype - 1. And I think this raises the question of whether there is truly a difference between a "generic function" and a "non-generic function". It seems all function calls need to be "overloadable" if someone knows they can speak for a certain set of types involved.
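As a very small illustration of what "claiming responsibility" could look like, here is a Python sketch where rules are keyed on the operation plus the types of both arguments. This is only an assumption about one possible shape, not a proposed Ren-C design; `RULES`, `claim`, `call` and `MyType` are all invented for the example.

```python
class MyType:
    """Stand-in for a user-defined type wrapping a number."""
    def __init__(self, n):
        self.n = n

# Hypothetical rule table: (operation name, left type, right type) -> function
RULES = {}

def claim(op, left, right):
    """Register a function as responsible for one pattern of argument types."""
    def register(fn):
        RULES[(op, left, right)] = fn
        return fn
    return register

@claim("subtract", int, int)
def int_minus_int(a, b):
    return a - b

# The new type speaks for both argument orders, without touching int's rule.
@claim("subtract", MyType, int)
def mytype_minus_int(a, b):
    return MyType(a.n - b)

@claim("subtract", int, MyType)
def int_minus_mytype(a, b):
    return MyType(a - b.n)

def call(op, a, b):
    # Look at the types of ALL arguments, not just the first one.
    rule = RULES.get((op, type(a), type(b)))
    if rule is None:
        raise TypeError(f"nobody claims {op} for these types")
    return rule(a, b)

print(call("subtract", MyType(10), 1).n)  # 9
print(call("subtract", 1, MyType(10)).n)  # -9
```

Exact type matching is the crudest possible "pattern engine"; typesets, priorities and conflicts between overlapping claims are exactly the open design questions.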

Remember: Rebol doesn't have the underpinnings to compete with a Haskell or a C++ on their turf directly. It's always been an illusion of what you can get away with via tricks, using something lighter and..."brickier/visceral -> The Minecraft of Programming". And every time a new trick comes along, where people think they know how the code works but then pull it apart, go "wow!", and see they can do their own magic of language design when they never thought it was within their reach, that's what makes it fun.

giuliolunati · December 24, 2019, 9:27pm

Some remarks, strictly in random order:

A: obviously user types are not only for math, nor am I interested only in math: there are also iterators, containers and ports (are port schemes a sort of user type implementation? ...)

B: the vector extension could be a starting point for experimenting

C: I implemented user types at Rebol level in a module named custom.reb, and used it to implement complex numbers and fractions. In that implementation, binary functions try to delegate to the 1st arg, then to the 2nd. So 1 + 3/2 and 3/2 + 1 are both managed by the fraction type.

D: I think there isn't a fundamental difference between user datatypes and "ordinary" ones.

E: The scenario could be:
(1) Every generic function has a canonical name.
(2) Every datatype (not only the user ones) is/has a dictionary, containing the canonical names of all the functions that datatype can manage, and associating each of them with a concrete function ("method"). E.g. the INTEGER dictionary must contain ADD, MULTIPLY, ... NEXT...; the DECIMAL dictionary must also contain EXP, etc.
(3) The generic function checks the 1st arg's datatype to see if it has a "method" associated with the canonical name of the function.
(4) If so, that method is tried, passing in all args.
(4a) The method can return a result (or an ordinary error), and then the game is done.
(4b) Otherwise the method must return a special error, saying "I can't manage these args".
(5) In that case, or if the 1st arg has no method with the right name, the 2nd arg is checked, and so on.
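Steps (1) to (5) can be sketched roughly like this in Python (this is not the code from custom.reb; `METHODS`, `CannotManage` and `generic` are hypothetical names, and Python's built-in `Fraction` plays the role of the user type):

```python
from fractions import Fraction

class CannotManage(Exception):
    """The 'special error' of step (4b): the method declines these args."""

def integer_add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    raise CannotManage

def fraction_add(a, b):
    # The fraction type accepts integers on either side.
    if all(isinstance(x, (int, Fraction)) for x in (a, b)):
        return Fraction(a) + Fraction(b)
    raise CannotManage

# Step (2): every datatype has a dictionary of canonical name -> method.
METHODS = {
    int: {"add": integer_add},
    Fraction: {"add": fraction_add},
}

def generic(name, *args):
    # Steps (3)-(5): ask each argument's datatype in turn, 1st arg first.
    for arg in args:
        method = METHODS.get(type(arg), {}).get(name)
        if method is None:
            continue  # this datatype has no method with that name
        try:
            return method(*args)  # (4a) a result ends the game
        except CannotManage:
            continue  # (4b) try the next argument's datatype
    raise TypeError(f"no datatype can manage {name} for these args")

print(generic("add", 1, Fraction(3, 2)))  # 5/2
print(generic("add", Fraction(3, 2), 1))  # 5/2
```

In the first call the integer method is asked first and declines, so the fraction method handles both orders without the integer type knowing fractions exist. Ordinary errors raised by a method are not caught, matching (4a).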