Should GET-WORD! of unset variables raise an error?

hostilefork · July 15, 2020, 3:35am

The following is the behavior of Rebol2 when a GET-WORD! is used on an "UNSET!" value:

rebol2>> unset 'x
rebol2>> type? :x
** Script Error: x has no value

So you would have to fall back on GET/ANY to really get an unset:

rebol2>> type? get/any 'x
== unset!

But R3-Alpha decided to be more lenient in this respect, and Red followed the lead:

r3-alpha>> unset 'x
r3-alpha>> type? :x
== unset!

red>> unset 'x
red>> type? :x
== unset!

I've always been skeptical of this, because quite often the reason you are using a GET-WORD! is that you think the thing in your hand might be a function and you don't want to call it. But when you let it mean "get a thing that's not set" as well, you open the door to letting typos through:

return reduce ["My Action" :my-actionn]  ; whoops, I meant MY-ACTION!

Is it worth the tradeoff? If anyone would say yes, it would probably be me... because I argue for the importance of being able to write truly generic code... so that usermode can be as rigorous as the internals. But I'm not really sure. Maybe most generic code should choke on unset unless you really wanted to process them... and maybe that added step of having to use GET/ANY is what it should take.
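
As a rough sketch of that "added step", the Rebol2-style strictness can be emulated in Red with a small wrapper. FETCH is a hypothetical helper name, not a built-in of any of the languages above; VALUE?, GET and GET/ANY are the standard natives.

```red
Red []

my-action: func [] [print "called"]

; FETCH is a hypothetical helper (not a built-in).  It returns whatever
; WORD refers to without calling it, but raises an error if WORD is unset,
; which is roughly what :WORD does in Rebol2.
fetch: func [word [word!]] [
    unless value? word [
        do make error! rejoin [mold word " has no value"]
    ]
    get word
]

; The correctly spelled word yields the function itself, uncalled.
probe type? second reduce ["My Action" fetch 'my-action]

; The misspelled word stops with an error instead of leaking unset!.
; TRY is only here so the script keeps running after the error.
probe error? try [reduce ["My Action" fetch 'my-actionn]]

; Opting in to unset values stays possible, but it has to be spelled out.
probe type? get/any 'my-actionn
```

The point of the sketch is only that the lenient behavior remains reachable through an explicit call, while the short notation catches the typo.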

@rgchris suggests some agreement:

I feel :var should be the route to disarming functions/errors as the imperative usage—makes passing them on more intuitive.

I will point out that if we go back to the Rebol2 style, there's nothing stopping future versions from opening it back up to the R3-Alpha and Red style. But if people write code expecting GET-WORD!s to return void, it will be harder to backpedal later once any significant codebases exist.

This leads me to think we might want to try going back to erroring on unset with GET-WORD! and see where the pain points are. There may be tools for addressing that pain that are shaped other ways.

Any objections?

rgchris · July 15, 2020, 1:30pm

I do wonder if there'd be a way to override this. For example, OF could be allowed to reflect on VOID! values, since more likely than not you're going to catch a typo if you get a result you don't expect. type of :thing is common enough, and it is more awkward when expressed as type of get/any 'thing. Not sure if that becomes an extra burden on the function spec.

hostilefork · July 15, 2020, 6:34pm

The typos are a problem, because if (type of firrst [a b c]) is going to be sneaky and void-tolerant, that will silently evaluate to [a b c].

Nevertheless, Ren-C is capable of the experiment:

of: enfixed func [
    'property [word!]  ; soft quote for `(second [length type]) of value`
    :look [<...> <opt> any-value!]  ; hard quote variadic can peek one ahead
    value [<...> <opt> any-value!]  ; normal variadic TAKE if not voided word
][
    reflect either void? get/any try match [word! path!] first look [
        take look
        void
    ][
        take value
    ] property
]

>> type of asdfasdf
== #[datatype! void!]

>> type of second [<a> #b {c}]
== #[datatype! issue!]

The rules for what variadics can and cannot do mirror the rules the evaluator has for itself. So it is possible to have a quoted variadic "peek" at the next item in the feed, see whether it is a WORD! or PATH! that looks up to an unset state, and if so skip evaluating that WORD! and return the void datatype instead.

I doubt it's a good idea to make it do this kind of thing out of the box. But neat that you can do it.

hostilefork · July 16, 2020, 7:32pm

All right, I've gone with it. The commits are now in the master branch (still pre-stackless) and the R3C branch.

Because path processing is a bit costly, I went ahead and added GET* 'X, which acts the same as GET/ANY 'X. It's not quite as cheap as the :X single-token GET-WORD! for accessing a voided word, since it has to fetch a function and run it. But at least it doesn't do path processing, and since it's a native and not a specialization it doesn't have to pay for the (small) specialization overhead. It might be worth it for performance-critical code to have such a thing...I don't know.

This brings back the question of whether we want an easier test for whether a variable is voided, e.g. VOIDED? VAR or UNDEFINED? VAR. I had suggested that I thought VOIDED? was clearer by not introducing a new wacky term which might get confused with UNSET? (e.g. actual lack of value, NULL). But then I screwed up a usage of it, which gave me second thoughts.

I've often mentioned my torment over whether to stick with "UNSET?" as meaning "set to void", but I really feel that a NULL variable is one that is not set... has no value. Contains nothing you can put in a block. We could also test this with nulled? var. But as with voided-the-word, I feel like you could easily get confused between VOID? and VOIDED? as to what is being talked about, while a word that removes you from mentioning the type might be better for comprehension.
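
The distinction at stake is between testing a value and testing the variable that holds it. A minimal sketch in Red terms, where unset! stands in for Ren-C's VOID!; UNDEFINED? is the hypothetical name from the discussion, defined here by hand, and this says nothing about how Ren-C itself would implement it:

```red
Red []

; Tests the *variable* named by WORD, so the call site never has to
; fetch (and possibly trip over) the value itself.
; UNDEFINED? is a hypothetical name, not a Red built-in.
undefined?: func [word [word!]] [unset? get/any word]

unset 'x
y: 10

probe undefined? 'x   ; the variable X holds nothing usable
probe undefined? 'y   ; Y is set

; The value-level test needs the value in hand first, which is exactly
; what a strict GET-WORD! would refuse to provide for X.
probe unset? get/any 'x
```

In Red this is simply the inverse of the existing VALUE?, which is part of why the naming question is the hard part rather than the implementation.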

Ideas welcome...report any perceived problems or benefits. So far I like it and it doesn't seem to really affect all that much code; many of the problem areas are more on the SET side, which we've decided permits VOID! unless you pre-filter it out with constructs before the SET.

Note: Something additional I did on master (though not R3C) was some reconciliation of terminology in the internal C sources themselves. So for instance, if an address of a bound word is found in a context and void is legal, I don't call that "Get_Word_XXX"... instead it is "Lookup_Word_XXX". This way any time you see Get_Word_XXX you know that VOID! will be pre-filtered for you and raise an error. I like to keep things in sync like this, to help people reading the C be informed by parity with language behaviors.

hostilefork · September 22, 2020, 9:51am

I think I may be building to The Answer. The watershed concept of tuple and path discernment could give what we need here.

Let's say we put it back, and then...

- PATH! looks up ANY-VALUE!, but errors on VOID!. If the value that it looks up to is an ACTION!, it runs it.
  - ...but new twist... if you end the PATH! in a slash, it will ensure the looked up to value is an action. If it's not, it will error.
  - If an action is executed, then all results are considered legal...so it can return ANY-VALUE!, including ACTION! and VOID!
  - We can observe thus that there is a policy that "nothing to the left of a slash can be void".
- GET-PATH! is a variation on PATH! that does not run the action, but just fetches the value...and does not error on VOID!
  - As with plain path, ending with a slash will enforce that the looked up result is an ACTION!. But since this is also the result, it further guarantees the overall evaluation is an action.
- TUPLE! acts like PATH! with the distinction that nothing to the left of a dot can be an ACTION!.
  - If you end in a dot, you are thus saying that the result is neither an action nor a void. No action is run, so you just get the value that was looked up.
  - If you don't end in a dot, the result can be ANY-VALUE!, because you may run an action...and many actions return void.
- GET-TUPLE! modifies tuple access rules to parallel GET-PATH!, but without the allowance of VOID! return results.
  - A GET-TUPLE! that ends in a dot is thus pointless if you're writing it from scratch, but if it was fabricated by transformation it does the right thing.

So for day-to-day safety, tuples such as a.b.c are the best choice. If you're paranoid about running code when you don't want to run code, end those tuples with a dot (e.g. a.b.c.).

But this doesn't give you any particular super powers on the results of plain tuple. It would be neat to have a shorthand for checking a function invocation to say something about the results.

We could imagine it being an application of the leading position:

 /some/path/  ; ensure an action is run, then ensure result is ACTION!
 .some.path  ; action may or may not run, result is not ACTION! or VOID!

That's fairly rational. But we need tuples with leading dots to be inert too badly (for predicates), and might as well stay compatible with history having "refinements" be inert too. Two good reasons to avoid such an interpretation.

There's also the option of adding more to the trailing position, but it has a problem differentiating "check result" from "check lookup". Only these two combinations have meaning:

 some/path//  ; ensure an action is run, then ensure result is ACTION!
 some/path/.  ; ensure an action is run, result is not ACTION! or VOID!

You get a contradiction out of some.path./ ("guarantee lookup isn't a function but that the result is"). And some.path.. is useless ("guarantee lookup isn't a function and that the result isn't a function"). There's no point in introducing something broken and ugly, for only two cases that no one would use.

However, if your target is an assignment vs. passing the value on somewhere...then as a consolation prize, we can use the terminal state on assignments in SET-PATH! and SET-TUPLE! by giving meaning to it:

action/: expression-that-must-return-action
not-action-not-void.: expression-that-must-return-non-action-non-void

Though that's only two options, they're pretty good ones to have. Almost all the time you need to know whether you're assigning an action or not. And if you're assigning an action you know it's not void, so both these assignment forms prohibit voids.

It's weird to see it all bottoming out at baseline where x: :y is absolutely 100% permissive...any state on either side. But there's a kind of purity to that, and it's certainly good for code golf (which remains an important target market in my mind!)

I think as people adjust to using TUPLE! by default, the GET-TUPLE! protection against void will seem natural. And tools like ensure integer! x and non action! y are all shaping up pretty well to fill in the rest. Terminal dot and terminal slash are just convenient and effective shorthands for common worries, for people who have systemic needs for good error locality (e.g. me debugging bootstrap!)
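
For readers coming from Red, the spirit of ENSURE and NON can be approximated with a couple of small functions. This is only a sketch under stated assumptions: ENSURE-TYPE and NON-TYPE are hypothetical names, they accept a single datatype only, and Red's function! and unset! stand in for Ren-C's ACTION! and VOID!.

```red
Red []

; Hypothetical helper: pass VALUE through unchanged if it has the given
; datatype, otherwise raise an error at the point of use.
ensure-type: func [type [datatype!] value [any-type!]] [
    unless type = type? :value [
        do make error! rejoin ["expected " mold type " but got " mold type? :value]
    ]
    :value
]

; Hypothetical helper: pass VALUE through unchanged unless it has the
; given datatype.
non-type: func [type [datatype!] value [any-type!]] [
    if type = type? :value [
        do make error! rejoin ["did not expect " mold type]
    ]
    :value
]

x: 1020
f: func [] [print "called"]

probe ensure-type integer! x                  ; X is passed through
probe error? try [non-type function! :f]      ; a function is rejected
probe error? try [ensure-type integer! get/any 'no-such-word]  ; so is unset!
```

Checks like these cover the general case; the terminal dot and slash would just be shorter spellings for the two most common ones.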

hostilefork · April 17, 2021, 10:36am

I'm feeling a bit of the frustration on this. And some things have changed. So it's probably time to look at these things.

I haven't got everything figured out for "sea of words" (and have been doing other things, while I think about it).

But it's bringing about a shift that's a bit like JavaScript's strict mode...where typos are not going to be introducing new definitions. This will reduce the risk of GET-WORD! being able to return "voids/bad-words" by quite a lot.

That said...I feel like "don't run functions" and "this might be undefined" are different source-level intentions. I want to know when I read code which was intended.

So I'm on the fence about the meaning of terminal dot in TUPLE!. Could obj.some-func. mean "if obj.some-func looks up to an ACTION!, don't run it"? The period meaning something terminal?

I'd been going with the idea of "be assured this isn't a function". However, the idea that functions can act like objects and have their "meta" fields accessed with dots is becoming attractive, e.g. obj.some-func.property

Something to think about. In any case, there will hopefully be good news on sea of words and modules coming along at some point... and I think it has a bearing on this question. But having different notations for the disable-function-call and this-may-be-undefined intent still feels pertinent.