Shades of Distinction In Non-Valued Intents

UPDATE 2022: Completely rewritten to explain the modern answers to the topic that inbound links to this thread were asking about.


Rebol2/R3-Alpha/Red Have Two Kinds of Nothing (both reified)

Historical Redbol gives you two main choices for "nothingness"...#[none] and #[unset]... both of which can be found either in variables, or as values in blocks:

rebol2>> block: reduce [none print "print returns unset"]
print returns unset
== [none unset]  ; misleadingly renders as WORD!s

rebol2>> type? first block
== none!

rebol2>> type? second block
== unset!

Using #[none] has the advantage of being "friendly" on access via word, allowing you to write things like:

rebol2>> var: none

rebol2>> either var [print "do something with var"] [print "do something else"]
do something else

But when var contained an #[unset], you'd get an error instead:

rebol2>> unset 'var

rebol2>> either var [print "do something with var"] [print "do something else"]
** Script Error: var has no value

So instead of using var directly, you had to do something more circuitous: pass the word "var" into a special test routine (morally equivalent to today's set? 'var).
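In Rebol2 that test routine was VALUE?, which took a word and reported whether it held a value (a sketch from memory; exact console rendering may differ slightly):

rebol2>> unset 'var

rebol2>> value? 'var
== false

rebol2>> either value? 'var [print "do something with var"] [print "do something else"]
do something else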

Hence #[none] was reached for frequently out of convenience. Yet this convenience came with a cost: it was very easy to accidentally append one to a block, even when its non-valued intent suggested you might not have wanted to add anything at all.

But it's hard to say: sometimes you did want to add #[none] to a block, to serve as a placeholder.

Also, being able to enumerate a block which contained #[unset] values was problematic, because if you did something like a FOR-EACH it would appear that the variable you were enumerating with was itself not set.

Early Ren-C Made Reified BLANK! and non-Valued NULL

One thing that bugged me was that there was no "pretty" representation for a non-valued state in a block... and that #[none] often thus displayed itself as the word none (seen in the example at the top of the post).

So the BLANK! datatype took the single underscore _.

>> second [a _]
== _

>> if blank? _ [print "yep, it's a blank"]
yep, it's a blank

>> if not _ [print "blank is also falsey"]
blank is also falsey

And critically, one of the first things I tried to do was rethink the #[unset] state into something that you'd never find in a block, and called it NULL (as well as made it correspond to C/JavaScript null in the API):

>> second [a _]
== _

>> third [a _]
; null

Since NULL couldn't be found in a block, getting NULL back from a block operation was unambiguous: it could never mean there was a "null in that position".

But it's still just two things:

  • blank! - A nothing you can put in a block

    • it was logically false
    • it was friendly via word access (no need for GET-WORD!)
  • null - A nothing you couldn't put in a block

    • it was also logically false
    • it was unfriendly via word access (need GET-WORD! for :VAR, or SET? 'VAR)

This put you in a difficult situation when choosing a state of emptiness for something like:

append block value  ; what nothing state should you use for value?

If you wanted to avoid accidentally appending blanks to arrays, you kind of wanted NULL so you'd get an error. But once you used NULL, you could not write the convenient if value [...] control structure.
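To make the dilemma concrete (a sketch of early Ren-C behavior; the error text is approximate, not the literal output of any particular build):

>> block: [a b]

>> value: _

>> if value [print "friendly access, blank is just falsey"]
; no error... convenient to test

>> append block value
== [a b _]  ; oops, a stray blank if you didn't want a placeholder

>> value: null

>> append block value
** Error: cannot put NULL in a block  ; good, mistake caught...

>> if value [print "...but now even testing it errors"]
** Error: VALUE has no value  ; need :value or set? 'value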

Later Ren-C Added a Separate "Ornery" non-Valued State

A third state was added to be neither logically true nor false, and that would trigger an error on accessing a variable with it. (I'll whitewash history a bit and say this state was always called "TRASH", and also always could not be put in blocks.)

This was the new state of unset variables:

>> unset 'x

>> x
** Error: X is an unset variable

>> get/any 'x
== ~  ; trash

>> if get/any 'x [print "Ornery!"]
** Error: trash is neither logically true nor false

So NULL now represented a middle ground. It was something that was easy to test for being nothing (using IF) but that was impossible to accidentally put into a block.

This gave you three behaviors:

[1]  >> trash-value
     ** Error: TRASH-VALUE variable is unset

[2]  >> null-value
     ; null

     >> append [a b] null-value
     ** Error: APPEND does not allow adding NULL to blocks

[3]  >> blank-value
     == _

     >> append [a b] blank-value
     == [a b _]

WORD! Isotopes Brought Infinite Non-Valued Choices

Eventually the NULL state became the isotopic status of the WORD! null, so a ~null~ isotope.

It joined ~true~ and ~false~ as being isotopes you could test for truthiness and falseyness. But if you were okay with getting an error on conditional testing, any other word could be used:

  config: ~initialize-system-not-called~

  initialize-system: func [
      {Let's say this function reads the config file}
  ][
      ...
      config: [...]
  ]

This usually causes a nice labeled message anytime someone tries to use CONFIG before initialization.
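Something along these lines (the exact error text here is hypothetical):

>> config
** Error: CONFIG is unset (~initialize-system-not-called~)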

Going this route would create a pain point for anyone who thought they were going to test whether a config was initialized with if config [...]. So you have to consider whether that trade-off is what you want.

On the other hand, the null isotope is the right initialization for something that code may run and wind up not assigning, when you want to be able to test for that afterward:

 directory: ~null~

 for-each [key val] config [
     if key = 'directory [
         if directory [
             fail ["Directory was already set by config:" directory]
         ]
         directory: val
     ]
 ]

VOID Provided a Clean "Opt-Out" Option

An unfortunate sacrifice that had been made in the design was that the "non-valued" status of NULL was chosen to raise attention to an error condition, rather than to be an opportunity to opt out of an APPEND:

>> append [a b] null-value
** Error: This error is the choice that we went with

>> append [a b] null-value
== [a b]  ; would have been another possibility, but too accident prone

Some "strange" things were tried...such as making it so that appending a BLANK! was a no-op, and if you wanted to append a literal blank you had to append a quoted blank:

 >> append [a b] _
 == [a b]  ; hmmm.

 >> append [a b] quote _
 == [a b _]  ; hmmm.

(It wasn't that strange considering appending a BLOCK! would append its contents, and a quoted block was being tried as the way of specifying /ONLY. This line of thinking ultimately led to the designs for the isotopes that solve things like splicing intent, so it wasn't all for naught!)

After invisibles were rethought as NIHIL, the VOID state that could be stored in variables came along as a new piece of the puzzle.

>> void

>> quote void
== '

I realized that void was the perfect choice for opting out of operations:

>> append [a b] void
== [a b]

>> append void [a b c]
== ~null~  ; isotope

As you see above, an operation can return null when it has no other good answer to give back in the case of a no-op. This gives good error locality, since the null won't trigger another opt-out unless you explicitly convert it to a void with MAYBE.

>> append (append void [a b c]) [d e f]
** Error: APPEND doesn't accept ~NULL~ isotope for the series argument

>> maybe null

>> append (maybe append void [a b c]) [d e f]
== ~null~  ; isotope

Void variables were deemed something that would be undesirable to make "too friendly". So they were slated to be mean on WORD! access, with MAYBE used on null variables instead.

This Gives a (Seemingly) Complete Picture

[1]  >> trash-value
     ** Error: TRASH-VALUE variable is unset

     >> append [a b] get/any 'trash-value
     ** Error: APPEND does not allow adding ~ isotopes to blocks

[2]  >> void-value
     ** Error: VOID-VALUE is void (use GET-WORD! or GET/ANY)

     >> append [a b] :void-value
     == [a b]

[3]  >> null-value
     == ~null~  ; isotope

     >> append [a b] null-value
     ** Error: APPEND does not allow adding NULL to blocks

[3a] >> append [a b] maybe null-value
     == [a b]

[4]  >> blank-value
     == _

     >> append [a b] blank-value
     == [a b _]

I've brought the above discussion up to date, and I think it paints a pretty clear picture of why things are the way they are.

But while thinking about what the right default for ARRAY would be, I realized that there are effectively three single-character reified non-values:

[1]  >> array 3
     == [_ _ _]  ; BLANK!s

[2]  >> array 3
     == [~ ~ ~]  ; Quasi-VOID!s

[3]  >> array 3
     == [' ' ']  ; Quoted-VOID!s

This Offers Us Some Nuance Even If A State Must Be Reified!

When it comes to the direct behavior of APPEND to a block, all these states have to work the same. In the era of isotopes, all reified values are appended as-is... it cannot (and should not) be any more complex:

>> append [a b] second [c _]
== [a b _]

>> append [a b] second [c ~]
== [a b ~]

>> append [a b] second [c ']
== [a b ']

>> append [a b] second [c [d e]]
== [a b [d e]]

But when we throw in an extra operation, we can imagine a difference. For instance, we could make BLANK! semantically equivalent to an empty array for the purposes of things like SPREAD or EMPTY?

>> spread second [c []]
== ~()~  ; isotope

>> spread second [c _]
== ~()~  ; isotope

>> append [a b] spread second [c _]
== [a b]

>> empty? second [c _]
== ~true~  ; isotope

...and then we'd say that if you tried to do such things with a quasi-void or a quoted-void, it would be an error:

>> spread second [c ~]
** Error: SPREAD does not accept QUASI-void arguments

>> empty? second [c ']
** Error: EMPTY? does not accept QUOTED-void arguments

I think this suggests that ~ makes a better choice for the default value of ARRAY elements! We can't default to an isotope like the one representing unset variables, but it's the closest thing.

Ultimately it came to seem that having only the isotopes ~null~ and ~false~ be falsey was more valuable than having BLANK! be falsey. Simply being able to assume that anything you can find in an array is truthy offered more leverage. So blanks are now truthy, BUT they're empty.
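Sketching what that means in practice (assuming EMPTY? treats BLANK! like an empty series, as above):

>> if second [c _] [print "anything found in a block is truthy"]
anything found in a block is truthy

>> empty? second [c _]
== ~true~  ; isotope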

What About Opting Out Of As-Is Appends, etc?

I mentioned that all items that can be found in a block have to act mechanically identically when it comes to TAKE-ing and APPEND-ing them. But what would XXX be if you wanted the following?

>> append [a b] xxx second [c [d e]]
== [a b [d e]]

>> append [a b] xxx second [c _]
== [a b _]

>> append [a b] xxx second [c ~]
** Error: Cannot append TRASH (~ isotope) to a block

>> append [a b] xxx second [c ']
== [a b]

I'm trying out calling this operator DEGRADE (or UNREIFY?). It would turn all quasiforms into their corresponding isotopes. It would also turn a QUOTED! void into a void... a unique behavior, as that is the only quoted form it would treat this way.

>> degrade first [~asdf~]
== ~asdf~  ; isotope

>> degrade first ['foo]
== 'foo

>> degrade first [123]
== 123

>> degrade first [']
; void

The reverse of this operator would be REIFY.

What About FOR-EACH Variations?

An additional neat spin on how these can be treated differently is how FOR-EACH responds.

>> for-each x (second [c []]) [
       print "Loop never runs"
   ]
; void

>> for-each x (second [c _]) [
       print "Loop never runs"
   ]
; void

>> for-each x (second [c ']) [
       print "Loop never runs"
   ]
== ~null~  ; isotope (like a void in, null out... or if a BREAK was hit)

>> for-each x (second [c ~]) [
       print "Loop never runs"
   ]
** Error: FOR-EACH does not accept QUASI-VOID as its data argument

This is a bit more speculative, but I like the general idea: a quoted void could give you a kind of nothing with the "opt out" ability in places where opting out makes sense... quasi-void could give you an error... and blank acts like an empty series. This seems to offer some nice invariants that reduce the overall code you have to write to handle edge cases.


I hope that this all plugs together, @rgchris. Can you review this thread and tell me if I've finally gotten it all to gel for you?


So... I'm questioning the necessity of making accesses of void variables require a GET-WORD!.

There's a certain amount of historical bias against "liberal voids" that dates back to the old-timey havoc that voids could wreak on code structure. They'd just skip the evaluator ahead:

old-ren-c>> append [a b c] void 'd
== [a b c d]

Those were the days when void had no way to be represented in an object whatsoever. A bit later, a voided variable meant an unset variable... so it couldn't be too easy to read them. That was supplanted by the idea that unset variables are actually void isotopes, completely distinct.

Today voids are meta-representible with a lone apostrophe:

>> make object! [
       a: if false [10]
       b: void
       c: '
   ]
== make object! [
       a: '
       b: '
       c: '
   ]

The main point of lingering concern is that void variables might be "too easy to make", and if you assign void to a variable you might wind up opting out of things you didn't mean to opt out of.

var: case [
    conditionA [...]
    conditionB [...]
]  ; imagine none of the cases run

append [a b c] var  ; is this too easy to be a no-op?

But is that fear really justified, when you could have written:

append [a b c] case [
    conditionA [...]
    conditionB [...]
]  ; again, imagine none of the cases run, no-op by design

The whole idea of void-in-null-out is designed to limit the error-locality issues that historical "none propagation" had. The odds of protecting someone are probably lower than the odds of interfering with legitimate void intent.

Some of the most "fearsome" motivating cases are no longer applicable. For instance, REDUCE was historically used to assign variables itemwise...and a vanishing expression could wreak havoc:

 set [a b c] reduce [expr1 expr2 expr3]

Modern ideas like PACK don't vaporize voids... they preserve them in a meta state:

 >> meta pack [1 void 3]
 == ~['1 ' '3]~

And the SET-BLOCK! and SET-WORD! unpacking logic is much more clever!

By and large, people lobbied for the convenience of REDUCE being able to splice and evaporate expressions. Cases where an exact number of expressions with no splices or evaporations is needed are less common, and could be handled with REDUCE/EXACT or another such routine.

And if UPARSE became void-friendly it would help with the issue of dealing with NULL (and void) in UPARSE. Null rules would error, void rules would bypass without advancing the input.

You can't skip voids as arguments, and interstitial voids no longer vanish...they cause evaluations to be void:

>> x: if 1 > 2 [print "not run"]

>> 10 + 20 x

But this oddness is really just kind of an epicycle of the rest of the oddness of what voids do inside COMPOSE or REDUCE or anything else. Are these examples any scarier than one another?

 >> do [10 + 20 x]

 >> reduce [1 + 2 x 10 + 20]
 == [3 30]

Considered in total, I think it's time to be more accepting of the fluidity and power of void variables. If you don't like them, don't make them.

(This might mean making constructs like CASE and SWITCH yield null when no branches are taken? Or that could be a choice people make in their own base library... tweaking voiding control structures to be nulling when no branches are taken... so you'd have to say maybe case [...] to get voiding behavior.)