More On the Mechanics of "Void" (Historical)

hostilefork · June 28, 2017, 6:04pm

UPDATE: This is a very dated post that uses the term "void" for the thing we now call NULL. The best place to read about what evolved from this is probably:
NULL, BLANK!, VOID!: History Under Scrutiny. Experience with the NULL described here as "void" contributed to that outcome.

I've previously argued why whatever it is that you consider to be a variable being "unset" should not be something you can put into a block as a value:

To me it seems it should be obvious...in the way that it's a bad idea to make an acid that can eat through anything. ("What would you put it in?") If there is such a value that you can find while enumerating the data in a block, that is indistinguishable from a variable which has no value... that's too tough to distinguish from an error state.

Note that I'm not prohibiting the idea of something which--when evaluated and assigned to a variable--unsets it. Ren-C has actually become more liberal about this, allowing x: () or x: void, where void is effectively void: does []. Here I'll talk a little bit more about the issue, updated from some earlier writing.

There is no "UNSET! datatype". type-of do [] isn't a DATATYPE! at all...it's a blank value, e.g. blank? type-of () is TRUE. This is not to be confused with blank? ()...which is false.
An evaluation which produces an entity with no type is said to be "void"--considered the absence of a value (as opposed to a value whose type is void! or unset! or what-have-you).
Because voids are not Rebol values, they may not be stored in a BLOCK! or other ANY-ARRAY!. Operations like APPEND, REDUCE, INSERT etc. are thus free to define meanings for what to do when they see them. This is on a case-by-case basis, e.g. join [a b c] () is [a b c], while reduce [a () b] is an error by default. (A construct like reduce-each could give more fine-grained control for handling this and other cases, e.g. collect [reduce-each x [1 + 2 () 3 + 4] [keep/only either set? 'x [:x] ['it-was-void]]] => [3 it-was-void 7].)
When a context has a value for a word, it is said that "the variable is set". If it does not have a value for a word, it is said that "the variable is not set" or "the variable is unset". This is a terminology shift--and because of it the unset? routine is deprecated, because proper usage would be unset? 'some-var and not unset? :some-var. It will be revived with the correct meaning in the future, but for now not set? 'some-var is to be used...as UNSET? is gradually replaced with VOID? in legacy modules.
I'll restate it for emphasis, that void? is not a test for a void! type. There is no void! or unset!. There is no construction syntax for a void! or unset!, because it is not a value...you can't write do [x: #[unset!]] anymore because the source block could not hold a non-value in that second spot of the block. void? is a standalone routine you only call on a transient evaluative result.
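
The rules in the list above can be modeled outside Rebol. What follows is a minimal Python sketch, not Ren-C's implementation: VOID, Block, join and reduce_block are hypothetical names chosen for illustration, and zero-argument callables stand in for the expressions inside a block. It shows a block that refuses to hold a void, while each operation decides for itself what a void means.

```python
class _Void:
    """Marker for 'no value'. It models the absence of a value, not a datatype."""
    def __repr__(self):
        return "VOID"


VOID = _Void()  # hypothetical stand-in for the result of evaluating ()


class Block:
    """A list-like container that can never hold VOID."""
    def __init__(self, items=()):
        self.items = []
        for item in items:
            self.store(item)

    def store(self, item):
        if item is VOID:
            raise ValueError("a void is not a value and cannot be stored in a block")
        self.items.append(item)


def join(block, value):
    """Modeled on join [a b c] (): joining a void changes nothing."""
    result = Block(block.items)
    if value is not VOID:
        result.store(value)
    return result


def reduce_block(thunks):
    """Modeled on reduce [a () b]: a void result is an error by default."""
    result = Block()
    for thunk in thunks:
        result.store(thunk())  # store() raises if the thunk produced VOID
    return result
```

The point of the design is that the container enforces the invariant once, so join and reduce_block are free to pick different policies without ever producing a block that contains a non-value.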

Note: This worldview makes, I think, much more sense:
set 'x value
assert [set? 'x]

unset 'x
assert [unset? 'x]
"Setness" and "unsetness" do much better as properties of a variable than of a value. Being liberated from having variables that are set to UNSET! (e.g. the interpreter source had code like SET_UNSET(value))...and all the confusions that go with that, is very nice.

By pushing void into this new role, users work with it in many more places where BLANK! ("none!") might have been used previously. As the default return result of conditional expressions and loops, with a pervasive distinct meaning for "opting out"... it makes many things much more elegant to write. It takes over many cases of what BLANK! used to be for--(which helps clarify the role of BLANK! specifically as a purposeful positional placeholder that is used in blocks and should -not- be omitted).

>> compose [a (if false ['b]) c]
== [a #[none] c] ;-- R3-Alpha/Red 0.6.0
== [a c] ;-- Ren-C

>> compose [a (switch 1 [2 ['b] 3 ['c]]) d]
== [a #[none] d] ;-- R3-Alpha/Red 0.6.0
== [a d] ;-- Ren-C

>> rejoin ["a" (while [1 > 2] ["b"]) "c"]
== "anonec" ;-- R3-Alpha/Red 0.6.0
== "ac" ;-- Ren-C
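
The Ren-C results above follow from two rules: a conditional that does not run its branch produces a void, and COMPOSE splices nothing for a void. Here is a small Python model of just those two rules. The names VOID, if_ and compose are hypothetical, and callables stand in for the parenthesized groups.

```python
VOID = object()  # hypothetical marker for "no value"


def if_(condition, branch):
    """Run branch when the condition holds; otherwise opt out with VOID."""
    return branch() if condition else VOID


def compose(template):
    """Evaluate the callables in template and keep every result except VOID."""
    out = []
    for item in template:
        if callable(item):
            result = item()
            if result is not VOID:
                out.append(result)
        else:
            out.append(item)  # literal items pass through untouched
    return out


print(compose(["a", lambda: if_(False, lambda: "b"), "c"]))  # ['a', 'c']
print(compose(["a", lambda: if_(True, lambda: "b"), "c"]))   # ['a', 'b', 'c']
```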

The more places where void is not allowed as a meaningful value, the more features open up. For instance: you cannot pass a void as a refinement argument, so passing one can be used to revoke the refinement:

 >> condition: true
 >> append/dup copy [a b c] [d e] if condition [2]
 [a b c d e d e]

 >> condition: false
 >> append/dup copy [a b c] [d e] if condition [2]
 [a b c d e]

Note: There is one portability pitfall to be aware of, in the sense of "creates problems that can be hard to find". It's not terribly common, but is sort of along these lines:
>> all [true (if 1 > 2 [true]) true]
== #[none] ;-- R3-Alpha/Red 0.6.0
== true ;-- Ren-C
So here we see that returning void made Ren-C treat the condition as opting out, i.e. "since 1 > 2 is false, I do not have a vote to contribute to this ALL", while R3-Alpha and Red saw the whole expression as being logically false. I strongly believe the Ren-C interpretation is the more powerful and useful one, and it fits perfectly with the other findings.
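
To make the two readings concrete, here is a hedged Python model of an ALL that can treat a void either way. The function all_ and its void_opts_out flag are hypothetical, None stands in for none/BLANK!, and the result for an ALL in which every expression opts out is an arbitrary choice of this sketch, not something the post specifies.

```python
VOID = object()  # hypothetical marker for "no value"


def all_(results, void_opts_out=True):
    """Return the last result if every vote passes, else None.

    With void_opts_out=True a VOID contributes no vote (the Ren-C reading).
    With void_opts_out=False a VOID counts as a failing vote, which models
    the R3-Alpha/Red outcome where the missed IF yields none.
    """
    last = True  # assumption: an ALL with no votes at all passes
    for result in results:
        if result is VOID:
            if void_opts_out:
                continue
            return None
        if result is None or result is False:  # only none and false are falsey
            return None
        last = result
    return last


votes = [True, VOID, True]  # models all [true (if 1 > 2 [true]) true]
print(all_(votes))                       # True
print(all_(votes, void_opts_out=False))  # None
```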

Also, there is room here for tools like if? ... which would return TRUE if the condition was taken and FALSE if it was not, ignoring the result of the block. This is potentially more generically useful in some places, allowing one to not worry about what falls out the bottom of the block.

Familiarity Breeds Factoring

These new features are all fine and well to use with inline IF or WHILE or SWITCH or CASE. But if you can do something inline, it isn't long before people want to factor out the expression. If you can write all [... case [...] ...] it's quite natural to want to change that to expr: case [...] | all [... :expr ...]

But when it comes to assigning "voids", R3-Alpha, Rebol2, and Red 0.6.0 make it a bit difficult. This produces an error in all three:

 >> code: []
 >> result: do code
 ** Script error: result: needs a value

To work around this, one must use a function call to SET with a refinement, /ANY. (In Ren-C the refinement is called /OPT, which ties into saying it's an "optional set"...as well as tying it to the OPT primitive that converts BLANK! to void and passes through all other values.)

>> code: []
>> set/any 'result do code
>> if unset? :result [print "It was unset"]
It was unset

There's already a mismatch here, between set 'result (...) and result: (...) when compared with get 'result and :result, demonstrated if we use a plain GET.

>> code: []
>> set/any 'result do code
>> if unset? get 'result [print "It was unset"]
** Script error: result has no value

In other words, :result is effectively GET/ANY while result: is just SET, as opposed to SET/ANY.

One rationale could be that "getting" has two entry points via words, the plain undecorated use of a word and the decorated one (e.g. result vs :result), while "setting" has only one hook (result:). But note that it's not appropriate to think of the leading : as /ANY, and without as just GET...because GET doesn't call functions the way an undecorated word does! So :result is the only real entry point to GET, and it's a GET/ANY.

After considerable reflection, and trying things multiple ways...in Ren-C result: acts as SET/OPT, and :result acts as GET/OPT. The unrefined GET and SET operations give errors if passed a void. This has a number of benefits. Here are a few:

As explained, it provides parity in the definitions for setting and getting.
Instead of those who can handle "opted-out" variables (or are willing to just have a "downstream" error) being forced to use the ugly set/opt 'foo (...), the people for whom assurances have extra value can optionally use the prettier set 'foo (...). Ren-C offers even more choices, like foo: ensure (...) and that would work as ensure foo: (...) as well.

Note: One consideration on wanting to shift the burden for "it would be nice to check" onto those who think it's valuable is the idea that really, checking the return result of a function for just void is awfully specific. If you call a function and expect it to be an integer, but it comes back a BLANK!, then are you any "safer"? Quite arguably you are less safe, because when you go to use the assigned result and it's a BLANK! then you won't get any errors, such as with x: some-function | append data x. Where the void will complain, the BLANK! wouldn't. This shows how the previous checking was very much an illusion, and it gets in the way of much more interesting code when everything is piped together. ensure integer! x: (...) is an example of expanding the concepts.

Convenient means of unsetting: foo: ()
Cleaner expressivity of conditional sets. For instance, if one wishes to conditionally set or unset a key in a map one can write map/key: if condition [code] and it will be removed should the condition be false. This is more convenient than either condition [map/key: (code)] [unset 'map/key].
Avoids separate test for presence and lookup in a map...e.g. unless void? value: :map/key [do stuff with value]

Note: This builds on the idea that allowing any legal value to be in the mapped-to range of a map is an important feature. Regardless of the legal kinds of keys that a map permits, having the full range of values ... including BLANK! in that range... is useful. There's simply a lot that can be done. See #253 for more on how this has cleaned things up.
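
The parity between setting and getting described above can be sketched as a small variable store. This is an illustrative Python model under stated assumptions, not Ren-C's implementation: Context and its method names are hypothetical, set_opt models both result: and SET/OPT, get_opt models both :result and GET/OPT, and None plays the part of BLANK!.

```python
VOID = object()  # hypothetical marker for "no value"


class Context:
    """Maps words to values. A word with no entry is 'unset'."""

    def __init__(self):
        self.vars = {}

    def set(self, word, value):
        """Plain SET: refuses a void."""
        if value is VOID:
            raise ValueError(f"{word}: needs a value")
        self.vars[word] = value

    def set_opt(self, word, value):
        """SET/OPT (the model for `word: ...`): a void unsets the word."""
        if value is VOID:
            self.vars.pop(word, None)
        else:
            self.vars[word] = value

    def get(self, word):
        """Plain GET: an unset word is an error."""
        if word not in self.vars:
            raise NameError(f"{word} has no value")
        return self.vars[word]

    def get_opt(self, word):
        """GET/OPT (the model for `:word`): an unset word yields VOID."""
        return self.vars.get(word, VOID)

    def is_set(self, word):
        return word in self.vars


ctx = Context()
ctx.set_opt("key", 10)
ctx.set_opt("key", VOID)         # like key: if false [...], removes the entry
print(ctx.is_set("key"))         # False
ctx.set_opt("key", None)         # a BLANK! is still a legal stored value
print(ctx.get_opt("key"))        # None
```

Because an absent entry and a stored None are different states, one get_opt call answers both "is it there?" and "what is it?", which is the single-lookup pattern from the map example.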

Overall, this has held up much better (in my opinion) than the previous design.

hostilefork · June 28, 2017, 6:04pm

Thinking Point 1: Get-Function vs Optional Get

With ERROR! "disarmed", the only "live" types which remained that needed GET-WORD! to inspect their values in R3-Alpha were UNSET! and ANY-FUNCTION!. Ren-C has pared ANY-FUNCTION! down to just one universal type, FUNCTION!...with the best qualities of functions and closures all together in one.

But I've come to think it's not a good idea to use :some-func if we expect some-func to be defined. Instead, use get 'some-func. So the schematic is:

Want a value vs a function call...variable shouldn't be unset, think it might be a function: get 'foo
Want to invoke value if it's a function or get something back otherwise, shouldn't be unset: foo
Variable may be unset, and would like to evaluate to a void if so...may or may not be a function: :foo

This will help avoid the accidental tolerance of unset variables when all you wanted was to suppress the function call.
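
The three access forms in the schematic, plus the leftover case discussed next, can be compared side by side in a short Python model. Everything here is a hypothetical illustration: a dict stands in for the context, callables of arity 0 stand in for functions, and the function names are invented for the sketch.

```python
VOID = object()  # hypothetical marker for "no value"


def get_strict(ctx, word):
    """Models get 'foo: never calls a function, errors if unset."""
    if word not in ctx:
        raise NameError(f"{word} has no value")
    return ctx[word]


def plain(ctx, word):
    """Models foo: calls the value if it is a function, errors if unset."""
    value = get_strict(ctx, word)
    return value() if callable(value) else value


def get_word(ctx, word):
    """Models :foo: never calls a function, yields VOID if unset."""
    return ctx.get(word, VOID)


def run_if_set(ctx, word):
    """Models if set? 'foo [foo]: VOID if unset, otherwise like plain."""
    return plain(ctx, word) if word in ctx else VOID


ctx = {"n": 10, "f": lambda: 42}
print(plain(ctx, "f"))              # 42
print(get_strict(ctx, "f") is ctx["f"])  # True
print(get_word(ctx, "missing") is VOID)  # True
```

Only get_word tolerates a missing entry silently, which is why reaching for it just to suppress a call also suppresses the error you would have wanted.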

Note that in the matrix of possibilities this leaves something out: "The variable may be unset, in which case I want a void back and no error...but if it's a function I want to run it, and if it's not I want the value." So basically if set? 'foo [foo]. This isn't necessarily an uncommon desire, but seeing whether it needed its own operation would have to come from looking at practice. If an operation did take a single 'foo argument, the operation would either have to be variadic (likely bad) or enforce that foo was arity 0, so perhaps the if set? form is the best idea.

hostilefork · June 28, 2017, 6:05pm

Thinking Point 2: Typos

One thing that happens as a result of this shift is that people use :foo a lot more often than they did before. Using a GET-WORD! instead of a plain WORD! indicates "I know it may not be set...but that's okay, because this is a spot where I want to use a void to signal an opt-out".

extra: if need-extra-stuff ["...something extra"]
print ["main message" :extra]

And that's pretty elegant...but there's now a danger of making typos that go uncaught. The reason is that "references" to words get bound before they ever "see" a "declaration":

>> foo: does [print variable] ;-- variable word is already bound here
>> variable: 10 ;-- is re-using the "definition" that the does already made
>> foo
10

Since everything is bound...no matter how it's spelled, then you can get in trouble once you consider a variable with no content as being "interesting" and not just an error.

The good-ish news here is that it's not a new problem...and it is something the module system has to handle, and can already handle in its current condition. Though usually the problem it was worried about was not reading, but writing: using stray variable names and creating new globals with each stray name. So more like this:

>> foo: does [varable: variable + 10] ;-- typo...
>> variable: 10
>> foo
>> print variable
10 ;-- "wait, it's supposed to be 20..."

So it's a mirror of that, just from reading things that shouldn't have a binding but get one anyway since the user context is so permissive by default.

I guess the moral of the story is that if you're not using modules, you might stick with using NONE! to carry conditions when you put them into variables. That way you can use a normal word fetch (that would fail if undefined) and then use OPT to convert the NONE! into a void:

extra: either need-extra-stuff ["...something extra"] [none]
print ["main message" (opt extra)]

There are other ways to write this in Ren-C:

extra: to-value if need-extra-stuff ["...something extra"]

extra: either need-extra-stuff ["...something extra"] _

extra: all [need-extra-stuff | "...something extra"]

The last one is the best, and it shows how thinking about what NONE!/"BLANK!" is actually for lets the problem be seen in a new light. (ALL and ANY cannot return a void or a false.)

hostilefork · June 28, 2017, 6:05pm

Thinking Point 2.1: Unused Refinement Args are Not Set

I said if you're not using modules then typos could screw you up. And you might be the kind of person who says fine, you'll keep not-using-modules and go with the all [...] instead of if ... [...] pattern, and then use opt on references. Basically you'll enjoy the new inline forms when putting IF/SWITCH/CASE/WHILE* into something that can handle optionality...but when you factor, you'll stick to the convention of making sure your variables are assigned NONE! and then un-none-i-fy them. Because you didn't really want to put NONE!s in blocks before, and you're not going to start now.

Note: You might want to start thinking about it, because "NONE! is a lot more FUN!" when its literal form is just _. Before it was a terror, looking like a word when it shouldn't... you really don't like writing #[none] very much, and # never seemed right. But with _, you start having more ideas for it in dialects, and you do want to compose it into blocks and such.

That's fine when it's your code and you're making the rules, but refinement arguments are now voids whenever the refinement isn't used. That's a property of MAKE FUNCTION! itself... not FUNC or FUNCTION.

So again, without module protection on bindings, a typo goes uncaught if you write something like:

foo: func [bar /ref1 baz /ref2 mumble] [
     zapf: any [:baz | :mumblle | ...] ;-- typo, uncaught! might miss mumble
     ...
 ]

By default there's a nice guard in here that keeps the average coder from accidentally using a refinement argument that wasn't supplied, because they're not set if their corresponding refinement is none. Ordinary word access will fail on them. But if you were accustomed to Rebol2/R3-Alpha's choice to make refinements NONE! by default, then you don't want to have to test ref1 or ref2 before accessing their arguments because that's "redundant"...and you use GET-WORD!, and there could be an issue.

I could just repeat "well, reading something you shouldn't isn't that different from writing something you shouldn't, you're just used to the old problem and not the new one, you need modules anyway so look at all the other benefits". And I will say that. But I gave an alternative before so it seems there should be some alternatives here.

One is defaults, which I have said is a likely addition. This covers a lot of cases, even if you just want a default of BLANK!:

x: 100
foo: func [bar /ref1 baz (x * 100) /ref2 mumble (_)] [
     zapf: any [baz | mumblle | ...] ;-- typo gets caught, mumblle undefined
     ...
 ]

The leading idea on defaults is they would come from the generator, e.g. this would actually be doing:

make function! [[bar /ref1 baz /ref2 mumble] [
    baz: any [:baz 10000]
    mumble: any [:mumble _] 
    zapf: any [baz | mumblle | ...]
]]

So the expressions are evaluated once at function generation time and then get grafted into the body as code (probably use a tweaked default which only acts on unset variables).
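
As a rough illustration of "evaluated once at generation time, then grafted in", here is a Python sketch of a generator that snapshots default values and fills them in for arguments that arrive unset. It is only a model of the idea: func_with_defaults is a hypothetical name, keyword arguments stand in for refinement arguments, VOID marks an unused refinement argument, and None plays the part of BLANK!.

```python
VOID = object()  # hypothetical marker for an unset refinement argument


def func_with_defaults(defaults, body):
    """Build a function whose missing arguments are replaced by defaults.

    defaults maps an argument name to a zero-argument expression. Each
    expression runs exactly once, here, when the function is generated.
    """
    computed = {name: expr() for name, expr in defaults.items()}

    def generated(**args):
        for name, value in computed.items():
            if args.get(name, VOID) is VOID:  # acts only on unset arguments
                args[name] = value
        return body(**args)

    return generated


x = 100
foo = func_with_defaults(
    {"baz": lambda: x * 100, "mumble": lambda: None},
    lambda bar, baz, mumble: (bar, baz, mumble),
)
x = 1  # changing x later has no effect: the default was already computed

print(foo(bar="b"))          # ('b', 10000, None)
print(foo(bar="b", baz=7))   # ('b', 7, None)
```

Since every argument is guaranteed a value by the time the body runs, the body can use plain names for them, and a misspelled name fails loudly instead of quietly reading as unset.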