Shades of Distinction In Non-Valued Intents

UPDATE 2022: Completely rewritten to explain the modern answers to the topic that inbound links to this thread were asking about.


Rebol2/R3-Alpha/Red Have Two Kinds of Nothing (both reified)

Historical Redbol gives you two main choices for "nothingness"...#[none] and #[unset]... both of which can be found either in variables, or as values in blocks:

rebol2>> block: reduce [none print "print returns unset"]
print returns unset
== [none unset]  ; misleadingly renders as WORD!s

rebol2>> type? first block
== none!

rebol2>> type? second block
== unset!

Using #[none] has the advantage of being "friendly" on access via word, allowing you to write things like:

rebol2>> var: none

rebol2>> either var [print "do something with var"] [print "do something else"]
do something else

But when var contained an #[unset], you'd get an error instead:

rebol2>> unset 'var

rebol2>> either var [print "do something with var"] [print "do something else"]
** Script Error: var has no value

So instead of using var directly, you had to do something more circuitous: pass the word "var" into a special test routine (morally equivalent to today's set? 'var).
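In Rebol2 that test routine was VALUE?, which took a word and reported whether it held a value (a sketch from memory; exact console rendering may differ slightly):

rebol2>> unset 'var

rebol2>> value? 'var
== false

rebol2>> either value? 'var [print "do something with var"] [print "do something else"]
do something else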

Hence #[none] was reached for frequently out of convenience. Yet this convenience came with a cost: it was very easy to accidentally append one to a block, even when its non-valued intent suggested you might not have wanted to add anything at all.

But it's hard to say: sometimes you did want to add #[none] to a block, to serve as a placeholder.

Also, being able to enumerate a block which contained #[unset] values was problematic, because if you did something like a FOR-EACH it would appear that the variable you were enumerating with was itself not set.

Early Ren-C Made Reified BLANK! and non-Valued NULL

One thing that bugged me was that there was no "pretty" representation for a non-valued state in a block... and that #[none] often thus displayed itself as the word none (seen in the example at the top of the post).

So the BLANK! datatype took the single underscore _.

>> second [a _]
== _

>> if blank? _ [print "yep, it's a blank"]
yep, it's a blank

>> if not _ [print "blank is also falsey"]
blank is also falsey

And critically, one of the first things I tried to do was rethink the #[unset] state into something that you'd never find in a block, and called it NULL (as well as made it correspond to C/JavaScript null in the API):

>> second [a _]
== _

>> third [a _]
; null

Since NULL couldn't be found in a block, getting NULL back from a block operation was unambiguous: it could never mean there was a "null in that position".

But it's still just two things:

  • blank! - A nothing you can put in a block

    • it was logically false
    • it was friendly via word access (no need for GET-WORD!)
  • null - A nothing you couldn't put in a block

    • it was also logically false
    • it was unfriendly via word access (need GET-WORD! for :VAR, or SET? 'VAR)

This put you in a difficult situation when choosing a state of emptiness for something like:

append block value  ; what nothing state should you use for value?

If you wanted to avoid accidentally appending blanks to arrays, you kind of wanted NULL so you'd get an error. But once you used NULL, you could not write the convenient if value [...] control structure.
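To make the dilemma concrete (a sketch of early Ren-C behavior; the error text is approximate, not the literal output of any particular build):

>> block: [a b]

>> value: _

>> if value [print "friendly access, blank is just falsey"]
; no error... convenient to test

>> append block value
== [a b _]  ; oops, a stray blank if you didn't want a placeholder

>> value: null

>> append block value
** Error: cannot put NULL in a block  ; good, mistake caught...

>> if value [print "...but now even testing it errors"]
** Error: VALUE has no value  ; need :value or set? 'value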

Later Ren-C Added a Separate "Ornery" non-Valued State

A third state was added to be neither logically true nor false, and that would trigger an error on accessing a variable with it. (I'll whitewash history a bit and say this state was always called "TRASH", and also always could not be put in blocks.)

This was the new state of unset variables:

>> unset 'x

>> x
** Error: X is an unset variable

>> get/any 'x
== ~  ; trash

>> if get/any 'x [print "Ornery!"]
** Error: trash is neither logically true nor false

So NULL now represented a middle ground. It was something that was easy to test for being nothing (using IF) but that was impossible to accidentally put into a block.

This gave you three behaviors:

[1]  >> trash-value
     ** Error: TRASH-VALUE variable is unset

[2]  >> null-value
     ; null

     >> append [a b] null-value
     ** Error: APPEND does not allow adding NULL to blocks

[3]  >> blank-value
     == _

     >> append [a b] blank-value
     == [a b _]

WORD! Isotopes Brought Infinite Non-Valued Choices

Eventually the NULL state became the isotopic status of the WORD! null, so a ~null~ isotope.

It joined ~true~ and ~false~ as being isotopes you could test for truthiness and falseyness. But if you were okay with getting an error on conditional testing, any other word could be used:

  config: ~initialize-system-not-called~

  initialize-system: func [
      {Let's say this function reads the config file}
  ][
      ...
      config: [...]
  ]

This usually causes a nice labeled message anytime someone tries to use CONFIG before initialization.
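Something along these lines (the exact error text here is hypothetical):

>> config
** Error: CONFIG is unset (~initialize-system-not-called~)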

Going this route would create a pain point for anyone who thought they were going to test whether a config was initialized with if config [...]. So you have to consider whether that trade-off is what you want.

On the other hand, the null isotope is the right initialization for something that code may run and wind up not assigning, when you want to be able to test for that afterward:

 directory: ~null~

 for-each [key val] config [
     if key = 'directory [
         if directory [
             fail ["Directory was already set by config:" directory]
         ]
         directory: val
     ]
 ]

VOID Provided a Clean "Opt-Out" Option

An unfortunate sacrifice that had been made in the design was that the "non-valued" status of NULL was chosen to raise attention to an error condition, rather than to be an opportunity to opt out of an APPEND:

>> append [a b] null-value
** Error: This error is the choice that we went with

>> append [a b] null-value
== [a b]  ; would have been another possibility, but too accident prone

Some "strange" things were tried...such as making it so that appending a BLANK! was a no-op, and if you wanted to append a literal blank you had to append a quoted blank:

 >> append [a b] _
 == [a b]  ; hmmm.

 >> append [a b] quote _
 == [a b _]  ; hmmm.

(It wasn't that strange considering appending a BLOCK! would append its contents, and a quoted block was being tried as the way of specifying /ONLY. This line of thinking ultimately led to the designs for the isotopes that solve things like splicing intent, so it wasn't all for naught!)

After invisibles were rethought as NIHIL, the VOID state that could be stored in variables came along as a new piece of the puzzle.

>> void

>> quote void
== '

I realized that void was the perfect choice for opting out of operations:

>> append [a b] void
== [a b]

>> append void [a b c]
== ~null~  ; isotope

As you see above, an operation can return null when it has no other good answer to give back in the case of a no-op. This gives good error locality, since the null won't trigger another opt-out unless you explicitly convert it to a void with MAYBE.

>> append (append void [a b c]) [d e f]
** Error: APPEND doesn't accept ~NULL~ isotope for the series argument

>> maybe null

>> append (maybe append void [a b c]) [d e f]
== ~null~  ; isotope

Void variables were deemed something that would be undesirable to make "too friendly". So they were slated to be mean on WORD! access, with MAYBE used on null variables instead.

This Gives a (Seemingly) Complete Picture

[1]  >> trash-value
     ** Error: TRASH-VALUE variable is unset

     >> append [a b] get/any 'trash-value
     ** Error: APPEND does not allow adding ~ isotopes to blocks

[2]  >> void-value
     ** Error: VOID-VALUE is void (use GET-WORD! or GET/ANY)

     >> append [a b] :void-value
     == [a b]

[3]  >> null-value
     == ~null~  ; isotope

     >> append [a b] null-value
     ** Error: APPEND does not allow adding NULL to blocks

[3a] >> append [a b] maybe null-value
     == [a b]

[4]  >> blank-value
     == _

     >> append [a b] blank-value
     == [a b _]

I've brought the above discussion up to date, and I think it paints a pretty clear picture of why things are the way they are.

But while thinking about what the right default for ARRAY would be, I realized that there are effectively three single-character reified non-values:

[1]  >> array 3
     == [_ _ _]  ; BLANK!s

[2]  >> array 3
     == [~ ~ ~]  ; Quasi-VOID!s

[3]  >> array 3
     == [' ' ']  ; Quoted-VOID!s

This Offers Us Some Nuance Even If A State Must Be Reified!

When it comes to the direct behavior of APPEND to a block, all these states have to work the same. In the era of isotopes, all reified values are appended as-is... it cannot (and should not) be any more complex:

>> append [a b] second [c _]
== [a b _]

>> append [a b] second [c ~]
== [a b ~]

>> append [a b] second [c ']
== [a b ']

>> append [a b] second [c [d e]]
== [a b [d e]]

But when we throw in an extra operation, we can imagine a difference. For instance, we could make BLANK! semantically equivalent to an empty array for the purposes of things like SPREAD or EMPTY?

>> spread second [c []]
== ~()~  ; isotope

>> spread second [c _]
== ~()~  ; isotope

>> append [a b] spread second [c _]
== [a b]

>> empty? second [c _]
== ~true~  ; isotope

...and then we'd say that if you tried to do such things with a quasi-void or a quoted-void, it would be an error:

>> spread second [c ~]
** Error: SPREAD does not accept QUASI-void arguments

>> empty? second [c ']
** Error: EMPTY? does not accept QUOTED-void arguments

I think this suggests that ~ makes a better choice for the default value of ARRAY elements! We can't default to an isotope like the one representing unset variables, but it's the closest thing.

Ultimately it came to seem that having only the isotopes ~null~ and ~false~ be falsey was more valuable than having BLANK! be falsey. Simply being able to assume that anything you can find in an array is truthy offered more leverage. So blanks are now truthy, BUT they're empty.
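Sketching what that means in practice (assuming EMPTY? treats BLANK! like an empty series, as above):

>> if second [c _] [print "anything found in a block is truthy"]
anything found in a block is truthy

>> empty? second [c _]
== ~true~  ; isotope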

What About Opting Out Of As-Is Appends, etc?

I mentioned that all items that can be found in a block have to act mechanically identically when it comes to TAKE-ing and APPEND-ing them. But what would XXX be if you wanted the following?

>> append [a b] xxx second [c [d e]]
== [a b [d e]]

>> append [a b] xxx second [c _]
== [a b _]

>> append [a b] xxx second [c ~]
** Error: Cannot append TRASH (~ isotope) to a block

>> append [a b] xxx second [c ']
== [a b]

I'm trying out calling this operator DEGRADE (or UNREIFY?). It would turn all quasiforms into their corresponding isotopes. It would also turn a QUOTED! void into a void... a unique behavior, as that is the only quoted form it would treat this way.

>> degrade first [~asdf~]
== ~asdf~  ; isotope

>> degrade first ['foo]
== 'foo

>> degrade first [123]
== 123

>> degrade first [']
; void

The reverse of this operator would be REIFY.

What About FOR-EACH Variations?

An additional neat spin on how these can be treated differently is how FOR-EACH responds.

>> for-each x (second [c []]) [
       print "Loop never runs"
   ]
; void

>> for-each x (second [c _]) [
       print "Loop never runs"
   ]
; void

>> for-each x (second [c ']) [
       print "Loop never runs"
   ]
== ~null~  ; isotope (like a void in, null out... or if a BREAK was hit)

>> for-each x (second [c ~]) [
       print "Loop never runs"
   ]
** Error: FOR-EACH does not accept QUASI-VOID as its data argument

This is a bit more speculative, but I like the general idea: a quoted void could give you a kind of nothing with the "opt out" ability in places where opting out makes sense... quasi-void could give you an error... and blank acts like an empty series. This seems to offer some nice invariants that reduce the overall code you have to write to handle edge cases.


I hope that this all plugs together, @rgchris. Can you review this thread and tell me if I've finally gotten it all to gel for you?


So... I'm questioning the necessity of making accesses of void variables require a GET-WORD!.

There's a certain amount of historical bias against "liberal voids" that dates back to the old-timey havoc that voids could wreak on code structure. They'd just skip the evaluator ahead:

old-ren-c>> append [a b c] void 'd
== [a b c d]

Those were the days when void had no way to be represented in an object whatsoever. A bit later, a voided variable meant an unset variable... so it couldn't be too easy to read them. That was supplanted by the idea that unset variables are actually void isotopes, completely distinct.

Today voids are meta-representible with a lone apostrophe:

>> make object! [
       a: if false [10]
       b: void
       c: '
   ]
== make object! [
       a: '
       b: '
       c: '
   ]

The main point of lingering concern is that void variables might be "too easy to make", and if you assign void to a variable you might wind up opting out of things you didn't mean to opt out of.

var: case [
    conditionA [...]
    conditionB [...]
]  ; imagine none of the cases run

append [a b c] var  ; is this too easy to be a no-op?

But is that fear really justified, when you could have written:

append [a b c] case [
    conditionA [...]
    conditionB [...]
]  ; again, imagine none of the cases run, no-op by design

The whole idea of void-in-null-out is designed to limit the error-locality issues that historical "none propagation" had. The odds of protecting someone are probably lower than the odds of interfering with legitimate void intent.

Some of the most "fearsome" motivating cases are no longer applicable. For instance, REDUCE was historically used to assign variables itemwise...and a vanishing expression could wreak havoc:

 set [a b c] reduce [expr1 expr2 expr3]

Modern ideas like PACK don't vaporize voids... they preserve them in a meta state:

 >> meta pack [1 void 3]
 == ~['1 ' '3]~

And the SET-BLOCK! and SET-WORD! unpacking logic is much more clever!

By and large, people lobbied for the convenience of REDUCE being able to splice and evaporate expressions. Cases where an exact number of expressions with no splices or evaporations is needed are less common, and could be handled with REDUCE/EXACT or another such routine.

And if UPARSE became void-friendly it would help with the issue of dealing with NULL (and void) in UPARSE. Null rules would error, void rules would bypass without advancing the input.

You can't skip voids as arguments, and interstitial voids no longer vanish...they cause evaluations to be void:

>> x: if 1 > 2 [print "not run"]

>> 10 + 20 x

But this oddness is really just kind of an epicycle of the rest of the oddness of what voids do inside COMPOSE or REDUCE or anything else. Are these examples any scarier than one another?

 >> do [10 + 20 x]

 >> reduce [1 + 2 x 10 + 20]
 == [3 30]

Considered in total, I think it's time to be more accepting of the fluidity and power of void variables. If you don't like them, don't make them.

(This might mean making constructs like CASE and SWITCH yield null when no branches are taken? Or that could be a choice people make in their own base library... tweaking voiding control structures to be nulling when no branches are taken... so you'd have to say maybe case [...] to get voiding behavior.)