A Justification of Generalized Isotopes

hostilefork · August 16, 2022, 10:23am

Here is a train of thought to help people realize why isotopes are needed, and why unifying their behaviors and mechanisms under a common umbrella makes sense. It starts from the issue of solving /ONLY and then explains the generalization.

As time permits, I'll come back and try to improve this...

NOTE: Terminology has changed over time: what was once called "the isotopic form" of a value is now called "the antiform". This is more consistent with how the word isotope describes a group of forms in other fields. You may see lingering references to the old usage.

Years of fretting over the /ONLY debacle converged on a somewhat inescapable conclusion:

It's better to carry the intent of whether a value needs to be spliced on that value...as opposed to having subtle variants of core operations that modulate the splicing.

I'd worked up to a point where I was implementing the "mark of intent" by adding a quoting level to suppress splicing. Yet this faced likely accidents when someone had a quoted value in a variable...and really meant to use it somewhere as-is, with the quote--vs. thinking of the quote as a splice-suppression signal which the operation should remove.

Then @rgchris made this remark:

Putting Splicing Intent On APPEND'ed Value

The issue I have with doing the opposite of ONLY—let's call it SPREAD—is what is the interim value?
>> block: [a b c [a b c]]

>> find block pick block 4
[[a b c]]

>> find block spread pick block 4
[a b c [a b c]]

>> spread pick block 4
???
It would seem to have virtue over ONLY and is a better word.

If Trying This In Historical Redbol, What Might One Do?

As a rough first cut, let's represent splices with a specially recognizable 2-element wrapper block. We'll signal it's a splice with a series in the first slot--checking for the unique identity of that series. Then put the block itself as the second element:

splice-cue: "!!!splice!!!"

spread: func [block [block!]] [
    return reduce [splice-cue block]
]

splice?: func [value] [
    if not block? :value [return false]
    return same? splice-cue first value
]

Then we can write our new versions of things like APPEND that are specifically aware of this construct.

append*: func [series [series!] value] [
    return either splice? :value [
        append series second value
    ][
        append/only series :value
    ]
]

It works more or less in your average Redbol, e.g. in Red:

red>> append* [a b c] spread [d e]
== [a b c d e]

red>> append* [a b c] [d e]
== [a b c [d e]]

red>> append* [a b c] 'd
== [a b c d]

red>> append* [a b c] first ['d]
== [a b c 'd]

In fact, this is essentially how the bootstrap executable for Ren-C simulates the SPREAD behavior.

But the weaknesses are immediately apparent!!!

Not A Distinct Type: Too Easy To Overlook Handling

There's no special type for the spliced block...it's just a BLOCK!. This means any routine that hasn't been written to handle it, will just let it leak through.

red>> reduce [spread [a b c] [a b c]]
== [["!!!splice!!!" [a b c]] [a b c]]  ; not [a b c [a b c]]

Changing to some other generic type that can contain a block...such as an OBJECT!...doesn't help matters. You are kind of in trouble any time an operation willfully lets you put these into an array.

The first instinct might be to introduce a new SPLICE! datatype, with a system-wide rule that splices can't be put into arrays. (Enforcing such a rule across all array-manipulating code is challenging...so let's sort of make a note of that fact, but continue.)

Because of the peculiar nature of not being able to be put in a block, there'd have to be a decision made about function arguments as to whether or not they took this type. Many functions designed to handle generic values would not be able to handle them, so there'd presumably need to be some typeset like ANY-NOTSPLICE! or ANY-NORMAL!.

How To Represent A Type That Can't Be Put In A Block?

Now we've got several things to ponder about our new type. For instance: what should you see here?

>> obj: make object! [foo: spread [d e]]
== make object! [
    foo: ???
]

We just said that a defining feature of SPLICE! is that you can't accidentally put them in blocks. But the argument to MAKE OBJECT!, namely [foo: ???], is a block. If ??? can't itself be a splice!, then what is it?

This brings up a possibly-related question: what if you want a way to put the intent of whether to splice or not into "suspended animation?"... in a way that you could collect it?

Here's a sort of contrived example of the puzzle:

generate: func [n [integer!]] [
   if even? n [return reduce [n n + 1]]
   return spread reduce [n n + 1]
]

wrap: func [
    return: [...]
    in [splice! block!]
][
    ...
]

unwrap: func [
    return: [splice! block!]
    wrapped [...]
][
    ...
]

n: 0
pending: collect [
    while [n < 4] [
        keep wrap generate n
        n: n + 1  ; advance the counter, otherwise the loop never ends
    ]
]

data: copy []
for-each item pending [append data unwrap item]

How would you write WRAP and UNWRAP such that at the end of the code above, you'd get:

>> data
== [[0 1] 1 2 [2 3] 3 4]

If the system didn't provide some answer to this, you'd end up needing to re-invent something kind of equivalent to the primitive ["!!!splice!!!" [...]] mechanic as a means of persistence:

>> pending
== [[0 1] ["!!!splice!!!" [1 2]] [2 3] ["!!!splice!!!" [3 4]]]
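
To make the puzzle concrete, here is a minimal Python sketch of that hand-rolled persistence. It is only a model of the idea, not Redbol code: `Splice`, `SPLICE_CUE`, and `append_star` are hypothetical names standing in for the imagined SPLICE! type, the "!!!splice!!!" cue, and APPEND*.

```python
SPLICE_CUE = object()  # hypothetical sentinel; plays the role of "!!!splice!!!"


class Splice:
    """Hypothetical SPLICE! value: only meaningful while it is not inside a list."""

    def __init__(self, items):
        self.items = list(items)


def generate(n):
    """Even n gives a plain block, odd n gives a splice (as in the post)."""
    pair = [n, n + 1]
    return pair if n % 2 == 0 else Splice(pair)


def wrap(value):
    """Make a block-or-splice storable: a splice becomes a tagged 2-element list."""
    if isinstance(value, Splice):
        return [SPLICE_CUE, value.items]
    return value


def unwrap(stored):
    """Undo wrap by checking the identity of the tag in the first slot."""
    if len(stored) == 2 and stored[0] is SPLICE_CUE:
        return Splice(stored[1])
    return stored


def append_star(series, value):
    """Splice-aware append: splices are spread out, anything else goes in as-is."""
    if isinstance(value, Splice):
        series.extend(value.items)
    else:
        series.append(value)
    return series


pending = [wrap(generate(n)) for n in range(4)]

data = []
for item in pending:
    append_star(data, unwrap(item))

print(data)  # [[0, 1], 1, 2, [2, 3], 3, 4]
```

The sketch inherits the weakness already noted: a stored splice is still just a 2-element list, so any code that doesn't know to look for the cue will treat it as ordinary data.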

Isotopes Were Designed For This!

Isotopes are a set of curated answers for these problems. Originally they were introduced to address issues like what an UNSET! was...which has some of the same class of problems as SPLICE! (such as not wanting to be put in BLOCK!s, and not accepted by default or by most routines).

Isotopes introduce two new variants of datatypes called antiforms and quasiforms. Antiforms cannot be put in blocks.

Isotopes are:

general - all base value types (e.g. unquoted things) can have antiforms and quasiforms
efficient - antiforms and quasiforms do not require allocations, and are merely a different state of a byte in the value cell (the same byte that encodes quoting levels)
"meta-representable" - all antiforms can be produced by evaluating their quasiforms, and quasiforms can be produced by evaluating quoted quasiforms.

I mentioned earlier that it would be somewhat costly to bulletproof all of native code against the ability to do something like append a specific data type like "SPLICE!" to a block. But with isotopes this problem has been solved once for all the forms...so the same code that prevents a so-called "UNSET!" from winding up in arrays works for splices. That's because a splice is actually a GROUP! antiform, and unset variables actually hold BLANK! antiforms!

Above I asked:

What should you see here?

>> obj: make object! [foo: spread [d e]]
== make object! [
    foo: ???
]

Isotopes give us the answer, that it's foo: ~(d e)~. This is the previously mentioned "QUASIFORM!" of GROUP!, which when evaluated produces an antiform of GROUP!...which by convention represents a splice.

But antiforms themselves have no canon representation. The console can print out a comment or show them in a different color, but to talk about them having a representation doesn't make much sense as you'll never see them in source.

>> ~(d e)~
== ~(d e)~  ; anti

I also asked:

"How would you write WRAP and UNWRAP such that at the end of the code above, you'd get:"
>> data
== [[0 1] 1 2 [2 3] 3 4]

With group antiforms representing splices, you don't need to write WRAP and UNWRAP... because these are built-in operations called META and UNMETA. And the pending array would look like:

>> pending
== ['[0 1] ~(1 2)~ '[2 3] ~(3 4)~]

When the QUOTED! blocks are UNMETA'd, they become regular blocks and then are appended as-is. When the QUASIFORM! groups are UNMETA'd they become antiforms and give the splice intent. This produces the desired "suspended animation" to preserve the intent.

That suspended animation is also used in the ^META parameter convention, which indicates a function argument can accept arbitrary antiforms... and the add-quoting-or-quasi behavior brings those antiform variables into a reified state so they can be safely handled.
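
The META/UNMETA round trip described above can be modeled roughly in Python. This is a sketch under stated assumptions, not Ren-C's implementation: `Cell`, `form`, `quotes`, and `mold` are hypothetical names, and the real system packs the state into a single byte of the cell rather than two fields.

```python
from dataclasses import dataclass

ANTI, PLAIN, QUASI = "anti", "plain", "quasi"  # hypothetical state names


@dataclass(frozen=True)
class Cell:
    kind: str          # "block" or "group" in this sketch
    items: tuple       # payload
    form: str = PLAIN  # ANTI, PLAIN, or QUASI
    quotes: int = 0    # quote levels layered on top of the form


def block(*items):
    return Cell("block", tuple(items))


def spread(cell):
    """Block -> group antiform, which by convention means 'splice me'."""
    return Cell("group", cell.items, ANTI)


def meta(cell):
    """Antiform -> quasiform; anything else gains one quote level."""
    if cell.form == ANTI:
        return Cell(cell.kind, cell.items, QUASI)
    return Cell(cell.kind, cell.items, cell.form, cell.quotes + 1)


def unmeta(cell):
    """Inverse of meta: drop one quote level, or turn a quasiform into an antiform."""
    if cell.quotes > 0:
        return Cell(cell.kind, cell.items, cell.form, cell.quotes - 1)
    if cell.form == QUASI:
        return Cell(cell.kind, cell.items, ANTI)
    raise ValueError("unmeta needs a quoted or quasiform value")


def append(array, value):
    """Splice group antiforms, refuse other antiforms, add everything else as-is."""
    if isinstance(value, Cell) and value.form == ANTI:
        if value.kind != "group":
            raise ValueError("this antiform cannot be stored in an array")
        array.extend(value.items)
    else:
        array.append(value)
    return array


def mold(value):
    """Render roughly the way the console examples in this thread do."""
    if isinstance(value, list):
        return "[" + " ".join(mold(v) for v in value) + "]"
    if not isinstance(value, Cell):
        return str(value)
    left, right = ("[", "]") if value.kind == "block" else ("(", ")")
    body = left + " ".join(mold(v) for v in value.items) + right
    if value.form != PLAIN:
        body = "~" + body + "~"
    return "'" * value.quotes + body


def generate(n):
    pair = block(n, n + 1)
    return pair if n % 2 == 0 else spread(pair)


pending = []
for n in range(4):
    append(pending, meta(generate(n)))  # meta'd values are never antiforms

data = []
for item in pending:
    append(data, unmeta(item))

print(mold(pending))  # ['[0 1] ~(1 2)~ '[2 3] ~(3 4)~]
print(mold(data))     # [[0 1] 1 2 [2 3] 3 4]
```

The point of the model is that the "can't be stored" check lives in one place (`append` here) and covers every antiform, instead of each type needing its own guard.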

The Proof Is In The Capabilities

I've explained about splices, and mentioned how it crosses needs with unset variable states.

But it's also how NULL and VOID are implemented, as antiforms of WORD! states that can't be put in blocks.

The ERROR! antiform is used to have a sneaky out-of-band way to return definitional errors.

The FRAME! antiform is what we'd traditionally think of as an "ACTION!" or "FUNCTION!". It triggers execution if accessed via WORD! references. This makes it safe to handle items picked out of blocks without worrying about defusing actions...because only quasiform or plain frames can be put in blocks in the first place!

It's natural for there to be some confusion with the new idea--especially given all its churn through the course of design. But the design is becoming clearer, and I think people are going to find this gives solidity to writing complicated but coherent code...vastly outpacing historical Redbol.

BlackATTR · August 16, 2022, 12:37pm

Great writeup and congrats on this foundational feature. It looks like a winner!

hostilefork · October 15, 2022, 7:59pm

2 posts were merged into an existing topic: Should REDUCE Heed SPREAD?

hostilefork · October 15, 2022, 8:13pm

Looking back at a quote from Nenad I have historically taken issue with, we may not actually disagree as much as it first seems. He said:

"Redbol languages are based on denotational semantics, where the meaning of every expression needs to have a representation in the language itself. Every expression needs to return a value. Without unset! there would be a hole in the language, several fundamental semantic rules would be collapsing, e.g. reduce [1 print ""] => [1] (reducing 2 expressions would return 1 expression)."

We actually agree that the meaning of every expression needs to have a representation in the language itself. The twist is that he goes on from what I think is the true part ("needs to have a representation"), and conflates it with the idea that the direct use of an expression's result must behave as something you can put in a block.

I'm saying you should always be able to get to a value to put in a block... but you might need an additional step to get it. That could be an operation like META or REIFY, which gives a value you can put in a block, but can then be reversed to provide an antiform back.

In Ren-C, PRINT returns "trash"...a BLANK! antiform. Trying to REDUCE that into a block causes an error, but not if you META it:

>> reduce [1 print ""] 
** Script Error: Invalid use of ~ antiform

>> reduce [1 meta print ""] 
== [1 ~]

>> reduce [1 ^ print ""] 
== [1 ~]

These are the problems that isotopes are designed to solve! Without formalizing an isotope mechanism in the language, your choices are:

Write your code manipulating Rebol structures in another language (like C or Red/System)...which is inherently "meta" and can handle the oddness of these states.
- (People should be suspicious when problems with the language are addressed by not using the language!)
Make usermode code struggle with refinements like /ONLY that push the oddness off of the values and force generalized code to shift into a different handling mode.

It's a significant enough problem area to be worth attacking with a generalized solution that keeps the oddness on the value states where it belongs. People should have an "a ha" moment about that when seeing things like REPLACE:

>> replace/all [[a b] a b a b] [a b] [c d e]
== [[c d e] a b a b] 

>> replace/all [[a b] a b a b] spread [a b] [c d e]
== [[a b] [c d e] [c d e]]

>> replace/all [[a b] a b a b] [a b] spread [c d e]
== [c d e a b a b]

>> replace/all [[a b] a b a b] spread [a b] spread [c d e]
== [[a b] c d e c d e]
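
To see why keeping the intent on the values composes so well, here is a small Python sketch that reproduces the four results above with one code path. It is an illustration only: `Splice`, `spread`, `as_items`, and `replace_all` are hypothetical names, words are modeled as strings, and unlike REPLACE this version returns a new list instead of modifying the series in place.

```python
class Splice:
    """Hypothetical marker meaning 'treat my items as a sequence, not one value'."""

    def __init__(self, items):
        self.items = list(items)


def spread(block):
    return Splice(block)


def as_items(value):
    """A splice contributes its items; any other value counts as a single item."""
    return value.items if isinstance(value, Splice) else [value]


def replace_all(series, pattern, replacement):
    pat = as_items(pattern)
    rep = as_items(replacement)
    out = []
    i = 0
    while i < len(series):
        # An empty pattern never matches, which also keeps the loop advancing.
        if pat and series[i:i + len(pat)] == pat:
            out.extend(rep)
            i += len(pat)
        else:
            out.append(series[i])
            i += 1
    return out


series = [["a", "b"], "a", "b", "a", "b"]

print(replace_all(series, ["a", "b"], ["c", "d", "e"]))
# [['c', 'd', 'e'], 'a', 'b', 'a', 'b']
print(replace_all(series, spread(["a", "b"]), ["c", "d", "e"]))
# [['a', 'b'], ['c', 'd', 'e'], ['c', 'd', 'e']]
print(replace_all(series, ["a", "b"], spread(["c", "d", "e"])))
# ['c', 'd', 'e', 'a', 'b', 'a', 'b']
print(replace_all(series, spread(["a", "b"]), spread(["c", "d", "e"])))
# [['a', 'b'], 'c', 'd', 'e', 'c', 'd', 'e']
```

Both arguments go through the same `as_items` step, so no refinement is needed to choose between the four behaviors.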

As I say, when Red tries to solve these kinds of problems without isotopes--e.g. claiming UNSET! is "just another type"--it's like they're doing complex math without complex numbers. You will hit limits when trying to do nontrivial things.

hostilefork · October 16, 2022, 3:52am

If Isotopes Are So Great, Why Don't Other Languages Have Them?

In a way, Rebol2 had some inkling of isotopic ideas with errors. Because an ERROR! couldn't be fetched from a WORD! without triggering an exception...you'd have to DISARM it, and that would convert it into an OBJECT!.

Early on when I was looking at the language, I did wonder if functions should have an "armed" vs. "disarmed" state--like errors. This arose while trying to generate C code which assigned function variables...kind of along these lines:

r3-alpha>> do compose/only [append-alias: (:append)]
; How to stop APPEND from running, by changing *only* inside the (...) ?
; (and still make APPEND-ALIAS act as a synonym for APPEND)

I talked myself out of it at the time...because I worried about "hidden bits" like the armed state. But after years of building on incremental ideas like generic quoting, there are answers to such historical problems. You can put a quasiform in that slot and it can evaluate to the antiform (which itself has no representation and can't be put in blocks, stopping the spread of this "invisibility").

But why didn't a parallel concept evolve in Lisp or other languages? Here are some thoughts on reasons...

Lisp's quoting is a rendering trick on its list structures; there's not an actual place to store a negative quoting level.

Antiforms are tied in closely to the concept of generic quoting, and I've kind of said that they're akin to "having a quoting level of negative one".

One thing that would hold Lisp back from seeing this as a natural possibility is that there isn't actually a "quoted value type". When you see quotedness, it's just a rendering trick. Something along these lines:

lisp>> (print "Hi")
"Hi"

lisp>> '(print "Hi")
== (print "Hi")

lisp>> (quote abc)
== abc

lisp>> '(quote abc)
== 'abc

lisp>> '(quote (quote abc))
== ''abc

lisp>> (type-of '(print "hello"))
== cons  ; something like "group!"

lisp>> (type-of '''abc)
== cons

On the plus side of not building quote levels into the cells themselves, they can use the existing flexibility of lists to get arbitrarily high levels of quoting. (Right now Ren-C mechanics only allow 127 quote levels.)

But you can see how this would create a pretty big barrier to coming up with an idea like "negative quoting"; it would seem to make no sense.
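
The contrast can be sketched in Python. This is a loose model under stated assumptions, not a description of any real Lisp or of Ren-C internals: `lisp_quote`, `lisp_render`, `Cell`, and `cell_quote` are hypothetical names, and the 127 limit is simply the figure mentioned above.

```python
from dataclasses import dataclass


def lisp_quote(x):
    """Lisp-style: quoting wraps the value in a two-element list (quote x)."""
    return ["quote", x]


def lisp_render(x):
    """Printer shorthand: show (quote x) as 'x, other lists in parentheses."""
    if isinstance(x, list):
        if len(x) == 2 and x[0] == "quote":
            return "'" + lisp_render(x[1])
        return "(" + " ".join(lisp_render(e) for e in x) + ")"
    return str(x)


MAX_QUOTES = 127  # limit taken from the post


@dataclass(frozen=True)
class Cell:
    """Cell-style: the quote level is a counter stored alongside the payload."""
    payload: str
    quotes: int = 0


def cell_quote(cell):
    if cell.quotes >= MAX_QUOTES:
        raise OverflowError("too many quote levels for this cell model")
    return Cell(cell.payload, cell.quotes + 1)


def cell_render(cell):
    return "'" * cell.quotes + cell.payload


triple = lisp_quote(lisp_quote(lisp_quote("abc")))
print(type(triple).__name__, lisp_render(triple))  # list '''abc

cell = cell_quote(cell_quote(cell_quote(Cell("abc"))))
print(type(cell).__name__, cell.quotes, cell_render(cell))  # Cell 3 '''abc
```

In the list encoding, "one quote fewer than none" has nothing to refer to, because an unquoted value has no wrapper to remove. With a per-cell counter there is at least a field where an extra state could live, which is the sense in which antiforms are "akin to" a negative quoting level.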

Also: Like in Ren-C, if you evaluate a quoted structure in Lisp you drop one level of quoting. But they didn't think it worth it to put an UNQUOTE in the box as a narrowed form of EVAL that only took quoted structures. Without that, one wouldn't be likely to think of wilder things like UNMETA.

Lisp's focus on compilation means they wouldn't like the idea of things like runtime conversions of normal values into states that would make a variable act undefined.

This kind of fits in with the fact that a lot of things Rebol does would be off the table for many Lisp implementers.

For instance: Lisp dropped the idea of being able to mark a function's arguments as being quoted at the callsite, because of how much that interferes with compilation:

"The idea of first-class operative combiners, i.e., first-class combiners whose operands are never evaluated, has been around a long time. Such creatures were supported by mainstream Lisps through the 1970s, under the traditional name fexprs, but they made a mess out of the language semantics because they were non-orthogonal to the ordinary variety of procedures constructed via lambda — and, more insidiously, because at that time the mainstream Lisps were dynamically scoped (a language feature that causes more problems for fexprs than it does for the less powerful macros)."

Most Languages Use Containers

I've done a writeup of Haskell's Either and Rust's Result, showing some of what's similar about them to isotopes:

Haskell and Rust Error Handling

There are actually a fair number of nuances, but antiforms are kind of like a container that's available system-wide on every variable... but not array slots.

And because it's systemic and built in, you don't have to think about this containership in advance. Look at what it takes to return an Either from some sample Haskell code:

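-- Assumed definition of the error type used below, so the sample compiles
data ParseDigitError = NotADigit Char

parseDigit :: Char -> Either ParseDigitError Int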
parseDigit c =
  case c of
    '0' -> Right 0
    '1' -> Right 1
    '2' -> Right 2
    '3' -> Right 3
    '4' -> Right 4
    '5' -> Right 5
    '6' -> Right 6
    '7' -> Right 7
    '8' -> Right 8
    '9' -> Right 9
    _ -> Left (NotADigit c)

The isotope model is more like letting you say ('0' -> 0) and (_ -> raise NotADigit c), so you only have to call out the "weird" cases.

Though there's no true silver bullet: if you're exchanging reified data via arrays, you can't use antiforms there. So the convention of "containership" has to be decided on in advance for fully generic code. (If the code isn't fully generic and you aren't using quasiforms for something else, then the QUASIFORM! can serve as a means of tunneling antiform intent.)

It turns out to be actually really hard to tie these concepts together coherently, and people who undertake such challenges usually wouldn't bother with a runtime model as informal as a Redbol's.

Newcomers to non-rigorous languages like JavaScript will often ask questions along the lines of "hey, why does JavaScript need both null and undefined?" This triggers a lot of conversation about the various practical problems that would happen if you only had one or the other, and usually people throw up their hands and say "what's done is done" and get on with their lives.

A much rarer question would be: "might null and undefined be related in some transformative way, where certain basic operations naturally coerce and promote/demote between them in a meaningful pattern". Because that's a sophisticated academic way to think, and people who care about that use "better" languages.

While someone might suggest this means the isotope design is thus a case of polishing a turd, my recent forays into Rust are reminding me of the unusual and distinct strengths that Ren-C has. I'm withholding my verdict on whether its future is more than a kind of educational video game, but I think it's at least that... so making the design "click" where it can feels worth it.