NULL, first-class values, and safety


#1

I found a post from Carl titled "UNSET! is not first class".

It’s important to understand the unset! datatype; otherwise, we run the risk of assuming that it is first class (assignable, passable, returnable) when it’s really not intended for that kind of usage!

He gets to the idea that it’s not a “normal” value. While not taking the step to make it illegal to put in blocks, I think that was just a matter of not having thought through how to prevent it. You see the notion of a wish to quarantine something that is a “necessary evil”.

Tuning the model in Ren-C, the roles of the two cases of NONE!/UNSET! were reshaped into NULL, VOID!, and BLANK!. This has been very successful, putting the “hot potato” nature of null to good use by keeping it something you cannot assign…while allowing it to be conditionally false. The “neither true-nor-false” role then falls to VOID!, as a prickly value that is nonetheless a value and can be put in blocks if you insist.

Should NULL assignments via SET-WORD! unset variables?

When NULL was first being introduced, it wasn’t the failure result from ANY or FIND. Those still used blank, which was more convenient since that era’s null was neither true nor false (like UNSET! had been).

Instead, null was sneaking in as the outcome of failed conditionals…as well as trying to be a “more correct” answer to things like select [a 10 b 20] 'c. There, a null result distinguished from a literal blank in a block, such as select [a 10 b 20 c _] 'c.

But with null being such a hot potato, there were difficulties. So it was tried that foo: null would unset the foo variable, vs. be an error. This made it a bit less awkward to work with if you wanted to write something like:

if not null? x: select block value [
   ...do stuff with x...
] else [
   ...x is unset, maybe do error handling here...
]

If SET-WORD!s caused errors, that would be more tricky–you’d have to test the result of SELECT for null, but the test would return LOGIC! and lose your result for assigning.

But, enfix to the rescue…once operations like ELSE and THEN came on the scene, they offered a new possibility…instead of needing IF and a test for a branch, the branch could react to the nullness before the assignment. Whether you needed error handling or default values, this pattern addressed most needs.

 x: select block value else [<default-value>]

Then null changed to be conditionally false, with VOID! picking up its neither-true-nor-false duties. Routines like ANY and ALL and FIND began returning NULL on failure, which could be used in conditionals without any extra work. Then conversion of nulls to blanks was made as easy as the short, repurposed word TRY.

With this change, it was moved back to where null assignments to SET-WORD! were errors, though I’ve sometimes wondered if there would be an advantage to letting NULL implicitly unset variables. Should you be able to say pos: find block value and then if set? 'pos […] or do you really need to say pos: try find block value?

I think experience has spoken: the errors are good

Putting together Carl’s sentiment with my own experience, the signal of unsetness that is not a first-class value (which is NULL, now) should not silently assign to variables. TRY is the great equalizer here, which very literately lets you mark up operations which you are aware can fail. It has been a great success.

But we do have some edge cases here, for instance APPEND.

>> data: copy [a b c]

>> append data case [1 = 2 ['d] 3 = 4 ['e]]
[a b c]

Should APPEND choke on the NULL? Carl had written:

…if you find I’m not enthusiastic about extending mezz functions to accept unset! values, you now know why. If you really think such a change is needed, you’ll need to write a short explanation for why the exception is required. I’m pretty open minded, but just because we can do something does not mean we should do it.

There are rather good reasons to avoid NULL arguments; one of which is that NULL is used in frames to denote unspecialized arguments. Hence you can’t really tell apart an APPEND which has had its appended value specialized to NULL from one that hasn’t had the value specialized out at all.

But what if you used TRY, and APPEND without /ONLY followed the “BLANK!-in, NULL out” protocol?

>> data: copy [a b c]

>> append data try case [1 = 2 ['d] 3 = 4 ['e]]
// null

>> data
[a b c]

>> append/only data _
[a b c _]

One problem is that mutating routines generally aren’t supposed to follow “BLANK!-in, NULL out”.

Another possibility would be to make a refinement to APPEND that suggested a known tolerance for nulls, distinct from /ONLY… e.g. /OPT

 >> data: copy [a b c]

 >> append data case [1 = 2 ['d] 3 = 4 ['e]]
 ** Error: NULL input to APPEND illegal unless /OPT

 >> append/opt data case [1 = 2 ['d] 3 = 4 ['e]]
 [a b c]

The point of all this is that I’ve become pretty convinced that assignments via SET-WORD! should error on NULL. But I’m still struggling with whether there is a real hard philosophical reason why APPEND should or should not error on NULL.


#2

This is rather important, so I don’t want to take it lightly. We want to get it right, and after Beta/One will be too late. So here’s some more thought:

The core issue is two competing applications we might want to use NULL’s special status for:

  1. Representing the absence of a value in order to “opt out” of a parameter where a BLANK! has a legitimate purpose. e.g. append block _ has a reasonable interpretation as adding a blank to a block just as append block 1020 has a reasonable interpretation of adding an integer. But since append data null cannot add a value, it can distinctly be used to signal not adding one.

  2. Representing the absence of a value in order to indicate a failure to find a necessary value, allowing errors to trigger at appropriate places. For instance, if select data key returns null, then it’s convenient to have first second third select data key trigger an error at third–since that helps isolate where the problem occurred in the chain.

Case 2 led to a convention called “BLANK! in, NULL out”. This is the idea that while most routines will error on NULL input, if they get a BLANK! input (and don’t have a significant meaning for blank) then they will provide a null output. The error on null input can be sidestepped with TRY, which converts nulls to blanks. So you could say third try select data key and not get an error, but rather a null.
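To make the convention concrete, here is a rough model in Python rather than Ren-C. It is only a sketch under assumptions: Python's None stands in for NULL, a hypothetical Blank class stands in for BLANK!, and try_, select, and third are illustrative stand-ins, not the real natives.

```python
class Blank:
    """Hypothetical stand-in for BLANK! (a real value, written _ in source)."""
    def __repr__(self):
        return "_"

BLANK = Blank()

def try_(value):
    """Model of TRY: null (None) becomes a blank, anything else passes through."""
    return BLANK if value is None else value

def select(pairs, key):
    """Model of SELECT on a flat key/value list: null (None) if the key is absent."""
    for i in range(0, len(pairs) - 1, 2):
        if pairs[i] == key:
            return pairs[i + 1]
    return None

def third(series):
    """Model of a routine following 'BLANK! in, NULL out'."""
    if series is None:
        raise TypeError("third: null input is an error")
    if series is BLANK:
        return None  # blank in, null out: the caller opted out
    return series[2] if len(series) > 2 else None

data = ["a", [10, 20, 30], "b", [40, 50, 60]]
print(third(select(data, "a")))        # key found, ordinary result
print(third(try_(select(data, "c"))))  # key missing, TRY'd: null, no error
```

Calling third(select(data, "c")) without try_ raises in this model, which is the error locality that Case 2 is after.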

Yet the needs of Case 1 are everywhere. Consider MAP-EACH:

>> map-each x [1 2 3 4] [if odd? x [x]]
== [1 3]

Or even just:

>> map-each x [1 2 3 4] [match odd? x]
== [1 3]

Doesn’t it seem nice and natural to assume that if the MAP-EACH branch returns null that nothing gets added? Isn’t that better than having BLANK! mean add nothing, and null be an error? Plus if blanks were signaling adding nothing, how would you map to a blank?
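A rough Python model of that reading of MAP-EACH (not Ren-C; None plays the role of NULL, and the Blank class and map_each function are hypothetical names for illustration): a null body result adds nothing, while a blank is a value like any other and gets added.

```python
class Blank:
    """Hypothetical stand-in for BLANK!, an ordinary value that can live in a block."""
    def __repr__(self):
        return "_"

BLANK = Blank()

def map_each(items, body):
    """Collect body results; a null (None) result opts out of adding anything."""
    out = []
    for x in items:
        result = body(x)
        if result is None:
            continue  # null: nothing gets added for this item
        out.append(result)
    return out

odds = map_each([1, 2, 3, 4], lambda x: x if x % 2 == 1 else None)
blanks = map_each([1, 2], lambda x: BLANK)
print(odds)
print(blanks)
```

Because the opt-out signal is distinct from blank, mapping every item to a blank still works.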

All Parameters are not Created Equal

It’s important to point out that when we’re looking at routines, different parameters are different…and may not have the same null philosophy. So append block null is distinct from append null block. A policy about parameter handling can discern and say one of these is an error and one not, without being a blanket statement about all arguments.

What has been the going idea so far with operations like APPEND is that null is legal to opt out of the material to append, but not the thing to append to. If APPEND didn’t mutate, you’d be able to say append _ block if you wanted to opt out of the append operation by way of its series parameter, and that–in turn–would return a null. But since it mutates, the concern is that blank is used so casually as nothing-ness that a silent no-op would go unnoticed if no one checks the result. You don’t want append obj/series data to silently have no effect when you’d said obj: make object! [series: _] to hold an initial value.

For the first argument I think this is solid reasoning…no nulls, no blanks. But for the second–as with the MAP-EACH–I can’t help but feel that something is lost here if the thing-to-append can’t be NULL. Not only that, but you are getting the normal return result…the series you passed into APPEND, vs. the NULL you would get if you passed a blank for the series.

I feel like this is right, and the error-raising desires of bullet point (2) just have to take a back seat when it comes to that second parameter.

Is the rule “if BLANK! has meaning here, NULL means opt out?”

This rule feels rather weird. But can we say that all routines where passing a BLANK! to a parameter is meaningful (i.e. does not mean opt-out) should accept NULL in that parameter to mean opting out of that parameter?

MAP-EACH fits this rule in the sense of its body result’s blank significance (but wanting null to opt out), and APPEND does in its second parameter. Is it truly general? You can REPLACE blanks…

 >> block: copy [1 _ 2 _]
 >> replace/all block _ 0
 [1 0 2 0]

And you can REPLACE things with blanks:

 >> block: copy [3 0 4]
 >> replace block 0 _
 [3 _ 4]

So does this suggest that you should be able to opt out of those parameters with nulls? Note that opting out of a parameter doesn’t necessarily mean the operation is a no-op:

 >> block: copy [3 0 4]
 >> replace block 0 null
 [3 4]

But it could wind up having no effect:

 >> block: copy [3 0 4]
 >> replace block (if 10 > 20 [0]) blank
 [3 0 4]

And do notice that opting out of what-to-replace like that, and being sure it won’t replace any content, couldn’t be done with blank (or a Rebol2/R3-Alpha none!).
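The four REPLACE cases above can be modeled in Python as a sketch (this is not how Ren-C implements REPLACE; None stands in for NULL, and the Blank class, the replace function, and its all_ flag are hypothetical names standing in for BLANK!, REPLACE, and /ALL):

```python
class Blank:
    """Hypothetical stand-in for BLANK!, a real value that can be searched for."""
    def __repr__(self):
        return "_"

BLANK = Blank()

def replace(block, target, replacement, all_=False):
    """Mutating model of REPLACE where null (None) opts out of a parameter."""
    if target is None:
        return block  # opted out of what-to-replace: nothing can match
    i = 0
    while i < len(block):
        if block[i] == target:
            if replacement is None:
                del block[i]  # opted out of the replacement: match is removed
            else:
                block[i] = replacement
                i += 1
            if not all_:
                break
        else:
            i += 1
    return block

print(replace([1, BLANK, 2, BLANK], BLANK, 0, all_=True))  # blanks are replaceable
print(replace([3, 0, 4], 0, BLANK))                        # and usable as replacements
print(replace([3, 0, 4], 0, None))                         # null replacement removes
print(replace([3, 0, 4], None, BLANK))                     # null target: no effect
```

The last call is the one that could not be expressed with blank alone, since blank is itself a legitimate thing to search for.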

This kind of thing has been applied unevenly in experiments. Making it a generality and committing to it would offer a lot of power…and it’s hard to see how in the scheme of things, erring on the side of “safety” instead would really help Rebol’s big picture.

But how does that rule apply with SET?

We know that set 'foo _ should be legal, as setting a variable to blank is legal. But passing a null needs to either be an error or unset the variable…“opting out” of the value cannot “opt out” of the assignment and leave it with its existing value!

>> foo: 10

>> set 'foo select [a 20 b 30] 'c

>> print foo
10 // this would pretty clearly be bad mojo

What about map removals?

If you have a map, how would you take an element out of it? It lacks positioning. Red’s lack of a non-value like NULL which is distinct from the valued BLANK! (“none!”) means they struggle with this issue. It seems a waste to not take advantage of Ren-C’s hard-earned upper hand, here.

One option would be poke m key null, which might be good enough. But what about m/(key): null? If SET-PATH! (and likely SET-WORD!) allowed that, then it could work too.

But as I mentioned in the initial post on this thread, I’ve been pretty personally sold on not allowing null assignments via SET-WORD!s. This means you’re covered with an error when you write stuff like:

 x: <foo>
 ...
 num: switch x [
     <bar> [1 + 2]
     <baz> [3 + 4]
 ]

You don’t have to be paranoid and throw in some default case or failure branch. You know null assignments are going to be illegal so you can just code in the cases that match if those are all you’re expecting. If you want to indicate you’re okay with failing, you can say try switch. I don’t think it’s just CASE and SWITCH that benefit…it’s across the board. And it allows more comfort in making things like GET be willing to return NULL without a special “GET/ANY”-like refinement.

But how valuable is that error locality vs. being able to unset map keys or variables more easily? Tough decisions, here!

I’m not sure exactly what all this points to. But it feels that null parameters likely do have to be tolerated for the second argument of routines like APPEND…and by extension, probably for the second and third arguments of REPLACE, etc.


#3

My gut feeling is, it should unset variables and be a noop in append/replace cases, though not for the first value.
I can think of places, where having it error may be desirable.
I could imagine 'set’s being errors, and otherwise it being noops.


#4

I think the preponderance of evidence is on the side of unsetting.

Substitution principle

We know that we want you to be able to say things like print [… if false […] …] or otherwise have complex expressions evaluate to null and errorlessly signal an opt-out. But when this can be an arbitrarily complex expression, shouldn’t you be able–without thinking about how to rewrite it–to factor it out?

print [
    ...
    some complex expression returning null
    ...
]

=>

 sub: some complex expression returning null
 print [... :sub ...]

It seems it shouldn’t be harder than that. If you try to accomplish the same thing with TRY and OPT, then by definition you are losing information…since you conflated nulls with blanks just for the sake of getting things into a variable. That’s an opportunity to screw this up–and it’s more typing/code.

Safety injection requirement may make code LESS safe

For instance, imagine a world where null assignments unset:

all [
    foo: select some-data item
    bar: any [whatever whatever-else]
] then [
    do stuff with foo and bar
]
// if foo or bar caused a failure, they'll be unset and trigger errors

This kind of pattern gets you into the THEN with the knowledge that FOO and BAR are not null (in this case, you also know they’re not false or blank). But a null that caused the THEN not to run will leave whatever variable was involved in a state where accessing it gives an error.

Now think about the rote addition of TRY to dodge mandatory errors from set-word assignment:

all [
    foo: try select some-data item
    bar: try any [whatever whatever-else]
] and [
    do stuff with foo and bar
]
// but if foo or bar caused a failure, they contain a "safe" blank now

Firstly, you can’t use a THEN anymore…because you’re not testing for value-ness. You need to use AND to test for truthy-ness so blank doesn’t count to run the clause. To use THEN you’d have to get even hairier, with opt foo: try select …

Plus, the TRY made the situation worse after it. Pursuant to some of the arguments about why “blankification” is dangerous for branches (and hence voidification is better), NULL causing “unsetification” in assignment is better than having people manually blankify with TRY. Being unset is a more ornery state for a variable, and ornery is good here.
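Here is a small Python model of why the unset state is the “ornery” one. It is a sketch under assumptions, not the evaluator’s actual mechanics: None stands in for NULL, and the Blank class, try_ function, and Env class (with its set and get methods) are hypothetical names for illustration.

```python
class Blank:
    """Hypothetical stand-in for BLANK!: falsey, but a perfectly readable value."""
    def __bool__(self):
        return False
    def __repr__(self):
        return "_"

BLANK = Blank()

def try_(value):
    """Model of TRY: null (None) becomes a blank."""
    return BLANK if value is None else value

class Env:
    """Model of variable storage where assigning null unsets the variable."""
    def __init__(self):
        self._vars = {}

    def set(self, name, value):
        if value is None:
            self._vars.pop(name, None)  # "unsetification"
        else:
            self._vars[name] = value
        return value  # the assigned value still flows to the enclosing test

    def get(self, name):
        if name not in self._vars:
            raise NameError(name + " is unset")
        return self._vars[name]

env = Env()

env.set("foo", None)        # like foo: select ... when the select fails
try:
    env.get("foo")
except NameError as e:
    print("error:", e)      # a stray later use is caught right away

env.set("bar", try_(None))  # like bar: try select ...
print(env.get("bar"))       # reads back a "safe" blank, silently
```

In this model the blank in bar can travel arbitrarily far before anyone notices, which is the same complaint made about blankification of branches.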

(Note: Phrasing is important here, variables cannot “hold null”. So “unsetification” is not such a ludicrous term.)

Will seem more natural to Rebol2 users, vs. needing to “junk things up”

You don’t need a “fancy” example like the ALL above with multiple assignments to see how it looks more polluted. From Rebol2, people are used to writing if pos: find data item [...]. Telling them they need a TRY to do so would likely seem like a step back, and you don’t want to force them to use a THEN if that’s not how they want to write it. Having lots of choices is the goal.

This way, they’ll only need to throw on the TRY if they have some reason for reading the pos later. But knowing the POS isn’t a valid position has merit.

It’s easier than ever to trigger your own failures in conditionals

I brought up the idea that a switch statement that doesn’t match could error by default:

num: switch x [
    <bar> [1 + 2]
    <baz> [3 + 4]
]

But switch is evaluative now, with an evaluative DEFAULT mechanism:

num: switch x [
    <bar> [1 + 2]
    <baz> [3 + 4]
    default [fail "switch didn't match"]
]

But it goes further than that, because FAIL’s argument is optional…it will just report an error where it is if you say default [fail]. And even further than that, you don’t need the default at all, just fail if you get there:

num: switch x [
    <bar> [1 + 2]
    <baz> [3 + 4]
    fail
]

That actually pinpoints the error better, because if there’s a problem outside the switch at the point of assignment you don’t know what happened (e.g. did one of the branches return void?). This applies to CASE too, and anywhere else (at the end of a condition branch, if you like…just FAIL with no further args is fine).

SET-WORD! behavior can’t be customized to be more lax

Programming constructs like SWITCH can be modified arbitrarily to error on more situations. You could change your SWITCH to require a special refinement or flag to allow fallthrough–for instance. Or you could make a switch that didn’t match any conditions return a VOID! value, whose sole purpose in life is to be a pain and cause errors on assignments or tests for conditional truth and falsehood.

I think in the grand scheme of things, if you really notice you’re having a problem, the language has tools to shape around that. But the behavior of SET-WORD! is part of the evaluator. It’s a strictness you wouldn’t be able to remove.

VOID! assignments are disallowed and cover several classic cases

While VOID! has nothing in particular to do with variables being unassigned, it comes up in other “no value” situations:

 >> x: do []
 ** Script Error: x: can't be VOID! (use TRY, OPT, or SET*)

 >> x: print "hi"
 hi
 ** Script Error: x: can't be VOID! (use TRY, OPT, or SET*)
 
 >> x: if true [print "hi"]
 hi
 ** Script Error: x: can't be VOID! (use TRY, OPT, or SET*)

And GROUP!s don’t synthesize values, but also prohibit x: () (albeit with a bad error message; finding a clever way to improve it without slowing down the evaluator is a low priority).

Allowing NULL to do an assignment is more akin to how Rebol2 casually allowed NONE! assignments. It still provides a bit more rigor, because if you try to use the variable on some code path where it wasn’t set, you’ll find out about that when you try to use it…vs. silently having it accessible.

Gives a syntax to unset map keys

I already covered this and how Red wrote up trouble with it. Maps also may not be the only types where conveying an interest in “unsetification” is worthwhile.

And it’s a nicer syntax for just unsetting variables. foo: null seems a pretty clear way to do it, as opposed to unset 'foo. It may confuse people that they can’t follow up with if foo = null […], and need to say if :foo = null […] or if unset? 'foo. But if someone can’t get past that they probably aren’t going to be very successful in using the language.

Unsetting is better for Code Golf

While I don’t want to ruin the usability of the language for the sake of code golf, it’s clearly better for making shorter programs to remove the need for TRY.


Seems “unsetification” of null assignment is the winner

I’ve mentioned that unsetting variables on nulls was the original behavior in the design of null. It was useful and didn’t really cause any problems. The main thing I wasn’t happy about was that things like x: print "Hello" didn’t error, because print wasn’t supposed to return a result…and null was the only way to do that at the time.

VOID! values came along and picked up the responsibility for triggering those kinds of errors, while null took its special non-value status on to greater and greater duties. The existence of constructs like ELSE made it seem like maybe it was good to increase the safety by erroring on SET-WORD!s that weren’t actually sets, so null assignments became errors.

But I think the arguments above–especially the first two–show it didn’t necessarily get safer overall. You’re not improving safety if you’re forcing people to generate values they have to turn around and transform to get the values they actually wanted…especially when that transformation loses information (conflating blanks and nulls). This is why blankification was changed to voidification, and manual blankification done by the user at callsites has all the downsides plus it junks up the code.

(As it happens, the R3-MAKE we are currently bootstrapped to still has the null-unset convention. So it’s a good thing to have this decided before committing a new r3-make)


#5

I’m convinced. So as far as SET-WORD! and SET-PATH! are concerned, the deed is now done, and I feel enough scrutiny was given to justify it.

For what it is worth, I don’t think the amount of concern given over such issues is excessive. It’s a pretty important decision.

Seems this is how it’s shaping up, but it might suggest that APPEND and friends not take VOID! unless you use an /ONLY.


#6

…and I’m not having any regrets so far about the SET-WORD! assignment.

Consider INPUT, which once returned a BLANK! when it aborted (e.g. you hit Escape/Ctrl-D instead of enter, not Ctrl-C which would halt the script). As of a shortly pending commit, it will return a NULL:

while [i: input] [
    ...
]

When you’re done with that loop (assuming you didn’t BREAK), i is not set. This makes sense: There was no more input. You shouldn’t be using i anymore. If you accidentally try to use it, you should get an error at the moment of use.

But it’s still very clean and convenient to write the loop just like that. It’s the best of both worlds.

If you want to deliberately set it up so that i is testable after the loop…for instance to know if the loop broke or not, you have that option:

while [i: try input] [
    ... // code that may BREAK
]

... // more code

if i [
    ... // there was a BREAK, hence `i` is still a TEXT!
]

The TRY is needed here if you want to avoid using more complicated things like if set? 'i … which I think you generally should try to structure your program not to do, unless you have a really good reason. Unsetting a variable should not be treated like a way to track state you intend to use…it’s only applicable for that when you really are using every possible value type (generic code that operates on ANY-VALUE! in a BLOCK! for instance) and so you have no other option without making a LOGIC! variable to track “usedness”.


#7

This is a great summary of considerations and justifications for these design changes. Essential reading.