Lingering Idea: Labeled "Ornery" WORD!s ("BAD-WORD!")

Something @rgchris did that I liked was using a TAG! in module headers as a sort of TBD placeholder, when the script header contained configuration for that script.

It cued you about what you needed to edit, and the script could give you an error if you hadn't configured it one way or another:

 Rebol [
     Title: "My Cool Script"
     Description: {
         Edit the variable below to set the directory.
     }
     Directory: <your-path-here>
 ]

The script's code checks whether the directory is a FILE! or a TAG!, and if it's a TAG!, it errors. But that erroring has to be done manually.

This made me wonder what it would be like to have a mean type that was WORD!-like, with an interned, immutable spelling, but which would raise an error if you tried to access a variable holding it.
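To make the idea concrete, here is a rough Python model (not Ren-C code, and not how Ren-C implements it): a `BadWord` value carries an immutable, interned label, and a hypothetical `Context` raises an error naming that label when a variable holding one is read. The class names and the error message format are assumptions for illustration.

```python
import sys
from dataclasses import dataclass


class BadWordAccess(Exception):
    """Raised when a variable holding a BadWord is read."""


@dataclass(frozen=True)
class BadWord:
    """WORD!-like: an immutable label that errors when read from a variable."""

    label: str

    def __post_init__(self):
        # Intern the spelling so equal labels share one string, like WORD! symbols
        object.__setattr__(self, "label", sys.intern(self.label))

    def __str__(self):
        return f"~{self.label}~"


class Context:
    """A table of variables; reading one that holds a BadWord raises."""

    def __init__(self):
        self._vars = {}

    def set(self, name: str, value) -> None:
        self._vars[name] = value

    def get(self, name: str):
        value = self._vars[name]
        if isinstance(value, BadWord):
            raise BadWordAccess(f"{name} is {value}")
        return value


# A header field left as a "TBD" placeholder, in the spirit of Directory: ~file!~
header = Context()
header.set("directory", BadWord("file!"))
try:
    header.get("directory")
except BadWordAccess as err:
    print(err)  # directory is ~file!~
```

The error is deferred: storing the placeholder is fine, and only reading it fails, with the label telling you what was expected there.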

Notationally, I thought surrounding the word with ~ would be good:

Directory: ~file!~

I chose this because:

  • The "TBD" use in headers makes a solid argument for a light, usable notation... vs. some monstrosity like #[bad-word! "file!"].

  • We're running out of symbols, so there is not much to pick from.

  • ~ is undesirable for general use due to its hard-to-hit location on most keyboards.

  • ~ is wavy and off-putting and weird--but that's an asset for this purpose.

It seems clear that this would be a win over what tilde is used for in today's codebases.

We could call the type TILDE!, although that doesn't really convey its toxic nature. TOXIC!? I've made many arguments over time about why it's not UNSET!, and I'm more entrenched in that thinking than ever.


UPDATE: This feature was originally called "labeled voids" in 2020, under a completely different set of terminology than the one used a year later. Ultimately it was called BAD-WORD!, so I've retroactively edited this thread to use that, to try to make the points raised easier to follow.


Since these can be used as a kind of deferred error--a bad result that's only there to be bad if you actually use it--the text in these words could help guide you to what happened.

Example: with branching structures, it's confusing that a null branch result gets turned into something non-null if you don't use the @[...] branch forms. Wouldn't it be clearer if that thing had a name on it?

 >> if true [null]
 == ~branched~

Or if you reduce some code that tries to put a null in a block but can't:

 >> reduce [<a> if false [<b>]]
 == [<a> ~null~]

And variables that didn't have any definition could have a nice name too:

 >> get/any 'asdfadsf
 == ~unset~

Functions that had no value to return might go with ~no-return~ or ~nothing~.


UPDATE: Functions like PRINT which have no return result ultimately went with ~none~ for this purpose, as it is short and not used for anything else anymore.


This Feels Like A Strong Play For Ren-C's Hand

It makes the "brick" a lot more useful for building with, and you can already see above that it would guide people to better awareness with things like ~branched~.


I like this a lot. It increases transparency in a good way and should make the language easier to learn and understand. I don't see many downsides listed; are we missing anything?


Consensus on a notation is probably all that's needed.

There may even be enough cell bits that these could trace back to the file and line where they were created, if that were wired up. So if you're downstream of one of these, you could get information on where it came from.

(I was just thinking about initializing frame-local variables to ~local~, but then wondered what would happen once such a value got far from home and you were left puzzling over its origins... local to what?)
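One hedged way to picture "tracing back to the file and line" is to capture the creation site when the value is made. This Python sketch uses `inspect.stack()` to record the caller's file and line in a hypothetical `origin` field; Ren-C would presumably pack something into cell bits instead, which this does not attempt to model.

```python
import inspect
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BadWord:
    label: str
    # Where the value was created; ignored when comparing two BadWords
    origin: str = field(default="?", compare=False)

    def __str__(self):
        return f"~{self.label}~"


def make_bad_word(label: str) -> BadWord:
    """Hypothetical constructor that records the caller's file:line."""
    caller = inspect.stack()[1]
    return BadWord(label, f"{caller.filename}:{caller.lineno}")


def describe(name: str, value: BadWord) -> str:
    """Error text a downstream reader might see, pointing back to the origin."""
    return f"{name} is {value} (created at {value.origin})"


local = make_bad_word("local")
print(describe("x", local))  # prints x is ~local~ (created at <file>:<line>)
```

Because `origin` is excluded from comparison, two ~local~ values still compare equal; the location is only diagnostic information.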

Similar ideas might be possible for BLANK! and NULL. I should tinker with it.


YESSS!
I like it. Much.


I've had this thought for a couple of years now (again, mostly triggered by Chris's tag trick), but I hadn't made a specific post proposing actual names (e.g. ~branched~). Doing so reinforces just how many of these might be useful.

The console's rule for what it doesn't show could be narrower. One possibility would be that the unique "empty BAD-WORD!" (a lone ~) is interpreted as "don't show in console". Then a function whose result you didn't want the console to print would just say [return ~], while the baseline would be that functions producing ~none~ do get it shown.
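A minimal Python model of that console rule, assuming the empty BAD-WORD! is the only result the console suppresses; `console_line` is a hypothetical name, and representing the lone ~ as an empty label is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BadWord:
    label: str  # "" stands for the lone, empty ~

    def __str__(self):
        return "~" if self.label == "" else f"~{self.label}~"


def console_line(result) -> Optional[str]:
    """What the console prints after evaluating, or None to print nothing."""
    if isinstance(result, BadWord) and result.label == "":
        return None  # [return ~] asks the console to stay quiet
    shown = str(result) if isinstance(result, BadWord) else repr(result)
    return f"== {shown}"


for result in (30, BadWord("none"), BadWord("")):
    line = console_line(result)
    if line is not None:
        print(line)
# == 30
# == ~none~
```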

They Error If Evaluated, So You Have To Quote Them

Since BAD-WORD!s are "ornery", you must quote them to avoid an error when they are encountered in evaluative contexts:

>> '~foo~
== ~foo~

>> ~foo~
** Error: BAD-WORD! cannot be evaluated

UPDATE: A new strategy known as "BAD-WORD! isotopes" emerged after this initial concept. Explaining isotopes is beyond the scope of this historical thread, but suffice it to say that something like ~foo~ no longer errors in evaluation... it produces a "~foo~ isotope".


Here are some more ideas of how things might work:

>> do []
== ~empty~  ; probably better if this shows, vs being invisible with ~

>> suppressed: func [x] [if x [print "not printed"] return '~]
>> suppressed false  ; console knows not to print plain ~

>> fallout: func [x] [if x [print "not printed"]]
>> fallout false
== ~branched~

>> data: []
>> proclike: func [return: <none> x] [append data x]
>> proclike 10
== ~none~  ; consider non-`~` voids shown by console a feature vs. bug

The potential for this to become a frequently reached-for lightweight error mechanism seems high. Conversion to an error could use the symbol as the error ID (these are constrained to WORD! spelling rules and are interned, so multiple instances don't take up multiple copies of the string). Converting errors back to voids could be another interesting direction. This gives you something like the "armed" and "disarmed" error states of Rebol2, but a lot more interesting.
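A minimal Python sketch of the two conversions floated here: "arming" a BAD-WORD! into an error whose ID is the word's symbol, and "disarming" an error back into a BAD-WORD!. `RebError`, `arm`, and `disarm` are hypothetical names; the armed/disarmed terms only borrow the Rebol2 analogy from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BadWord:
    label: str

    def __str__(self):
        return f"~{self.label}~"


class RebError(Exception):
    """An error identified by a symbol, the way a BAD-WORD! is."""

    def __init__(self, error_id: str):
        super().__init__(f"{error_id} error")
        self.error_id = error_id


def arm(word: BadWord) -> RebError:
    """BAD-WORD! -> error: the word's symbol becomes the error ID."""
    return RebError(word.label)


def disarm(err: RebError) -> BadWord:
    """Error -> BAD-WORD!: keep just the ID, as a deferred error."""
    return BadWord(err.error_id)


try:
    raise arm(BadWord("bad-mojo"))
except RebError as err:
    print(err.error_id, disarm(err))  # bad-mojo ~bad-mojo~
```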

(Note: This argues for avoiding internal tildes, so that ~bad-mojo~ can be converted to the plain word bad-mojo without the result itself being a void. I think that means disallowing ~ inside the word and only permitting it at the beginning and end.)
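The lexical rule in this note can be sketched as a Python regular expression: ~ only at the two ends, at least one character between, and no ~ or whitespace inside. This is a simplification (real WORD! spelling rules exclude more than ~ and whitespace), and `parse_bad_word` is a hypothetical helper.

```python
import re

# ~ at both ends only; the interior may not contain ~ or whitespace.
BAD_WORD_SYNTAX = re.compile(r"~([^~\s]+)~")


def parse_bad_word(text: str) -> str:
    """Return the plain word inside a ~word~, or raise ValueError."""
    match = BAD_WORD_SYNTAX.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid BAD-WORD!: {text!r}")
    return match.group(1)


print(parse_bad_word("~bad-mojo~"))  # bad-mojo

# If internal tildes were allowed, stripping the ends of ~~x~~ would give ~x~,
# which is itself a BAD-WORD!; the rule rejects such spellings up front.
for text in ("~~x~~", "~bad~mojo~"):
    try:
        parse_bad_word(text)
    except ValueError as err:
        print(err)
```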

While Rebmu will mourn the loss of ~ in word names, this gives dialects another tool, so it's not as if ~ becomes unusable.


I should mention opportunistic invisibility. What's wrong with showing all BAD-WORD!s, and making things like HELP invisible?

Well... it might not be a good fit. This might work:

>> help 10
10 is an INTEGER!

Since HELP would be invisible, there'd be no == line after its output.

But complications are:

  • If you typed something like 10 + 20 help foo, you'd get the help for FOO, followed by == 30.

  • Invisibility detection would need to be handled by DO. For example, if your code to evaluate is [help foo], you'd first have to turn that into a GROUP! (help foo) to signal that disappearing is okay. Then DO would need a distinct return result for that case, to distinguish it from a plain VOID!. (This will be needed anyway, just not necessarily for this.)

I just kind of feel like HELP isn't meant to vanish. It's not a comment. Something about it seems a better fit as an operation that returns a value the console understands you'd rather not see. (I think that value should be special... e.g. not the same BAD-WORD! that is used to denote unset variables or null-transitioned branches.)


One place where this could be used in a non-trivial way is "NewPath", which wants to support the UNIX home directory:

path: '~/foo/bar

If we allow BAD-WORD! as one of the things in paths, there could be a special exception saying that this particular pattern is permitted by TO FILE! conversions. But you could then get into a situation like:

 path: compose '(help "whatever")/foo/bar

Where you could "accidentally wind up with a nameless void in a slot where it was allowed to have stringlike meaning".

But if most voids aren't nameless, then that would be kind of a rare concern.

Anyway, you could also just say "~"/foo/bar like you would for any other component that didn't LOAD as a WORD!, but it's nice to imagine such a very common case having some kind of exemption.
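A rough Python model of the exemption being floated: path components are strings, a lone ~ is accepted only as the first component (the home-directory case), and any other BAD-WORD! in a path refuses to become file text. `path_to_file` and `is_bad_word` are hypothetical and are not how TO FILE! is implemented; the "first component only" restriction is an assumption of this sketch.

```python
def is_bad_word(component: str) -> bool:
    """True for ~ and for ~label~ spellings (simplified)."""
    return component.startswith("~") and component.endswith("~")


def path_to_file(components) -> str:
    """Join path components into file text, exempting a leading lone ~."""
    parts = []
    for index, component in enumerate(components):
        if is_bad_word(component):
            if component == "~" and index == 0:
                parts.append("~")  # the UNIX home-directory pattern
                continue
            raise ValueError(f"{component} can't become part of a FILE!")
        parts.append(component)
    return "/".join(parts)


print(path_to_file(["~", "foo", "bar"]))  # ~/foo/bar
```

A labeled value such as ~none~ landing in a path slot still errors, which matches the hope that only nameless voids would ever slip through.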

I've gotten an implementation working and it's looking great. As @BlackATTR has pointed out, there are a lot of parts in the box and a lot of nuance, so anything that helps people get their bearings and reinforces the mechanics is important. This is exceeding the expectations I had for it in that regard.

(I'm actually kicking myself for not just trying this out sooner. It wasn't at all hard to write, and I really have thought about this a long time!)

But I hit a couple of things that might benefit from some design thought:

Right now the evaluator considers BAD-WORD! to be an error if it is seen literally. So this would be an error:

 >> do [10 + 20 ~void~ 3 + 4]
 ** Error: Can't evaluate BAD-WORD!s

I'll point out that this is a Ren-Cism that historical Rebol/Red don't care about:

rebol2>> do [10 + 20 #[unset!] 3 + 4]
== 7

r3-alpha>> do [10 + 20 #[unset!] 3 + 4]
== 7

red>> do [10 + 20 #[unset!] 3 + 4]
== 7

Yet it feels important, so that stray unsets that get COMPOSEd into code aren't silently tolerated.

If BAD-WORD! can't be literally evaluated, and you're writing code in a literal context, then you'd have to write:

foo: '~whatever~
foo: the ~whatever~

That's less satisfying than foo: ~whatever~. Although one of the allures of the tildes is looking "bad", this is not the kind of bad I'd have in mind.

One way of looking at this could be to say it's a feature of SET-WORD! / SET-PATH! assignment to look at the thing on the right and, if it's a literal BAD-WORD!, accept it. Just make that narrow exception. That seems all right... but then you get to RETURN.

With RETURN you have an evaluative argument and no SET-WORD!. So you either have to say [return '~] or [return the ~] (or [return x: ~]).

I anticipate returning and assigning BAD-WORD!s to be common enough that this would be a loss to the prettiness of the idea.

So, continuing the idea of finessing this: maybe functions that are willing to take an evaluated BAD-WORD! argument could also do the literal-BAD-WORD!-acceptance trick?

But I'll point out that tricks like this are always nastier than they sound. What happens when something tries to left-quote the BAD-WORD! that comes after an assignment, as in foo: ~xxx~ left-quoter? One of Ren-C's strengths is not sweeping such what-ifs under the rug, so it's good to know that a trick like this wouldn't be without its ramifications.

Should This Be Attacked More Generally?

The main thing I'm trying to guard against here is BAD-WORD!s getting composed into places where they have no effect and vanish: cases like do compose [1 (print "Hi") 2]. However, I've suggested before that discarded literals may hint at problems and thus should be errors. In terms of things that have wasted my time, blocks I meant to be code being silently discarded have probably caused way more insidious problems than accidentally discarded BAD-WORD!s have...

Plus, if you want to make a BAD-WORD! in the API, it would be nice if you could say:

 REBVAL *undef = rebValue("~undefined~");

Of course you can't do that for WORD!s; you have to quote them. But it seems like a loss to make you put a quote there.

The best thing is probably to pin hopes on a general solution for catching stray voids along with stray blocks or anything else, and to make BAD-WORD! an inert type. That means counting on three places for error leverage: dereferencing variables that hold voids, passing them as arguments to functions that don't accept them, and using them in truthy/falsey tests. In exchange, their usage at the source level can stay clean.

But it means I have to put on my thinking cap and review that discard-literals-raises-error proposal to see if there's anything there.
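To illustrate the "inert type" option, here is a small Python model (an assumption-laden sketch, not Ren-C's evaluator): evaluating a literal BAD-WORD! just yields it, and errors come only from the three places named above: reading a variable, passing it to a parameter that hasn't opted in, and testing it as a condition. The function names are hypothetical, and treating only `False` and `None` as falsey is a simplification of Rebol's rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BadWord:
    label: str

    def __str__(self):
        return f"~{self.label}~"


class BadWordError(Exception):
    pass


def evaluate_literal(value):
    """Inert: a literal BAD-WORD! in source evaluates to itself."""
    return value


def fetch(variables: dict, name: str):
    """Dereferencing a variable that holds a BAD-WORD! errors."""
    value = variables[name]
    if isinstance(value, BadWord):
        raise BadWordError(f"{name} is {value}")
    return value


def pass_argument(param: str, value, accepts_bad_words: bool = False):
    """Parameters must opt in to receiving BAD-WORD!s."""
    if isinstance(value, BadWord) and not accepts_bad_words:
        raise BadWordError(f"{param} argument can't be {value}")
    return value


def is_truthy(value) -> bool:
    """BAD-WORD!s are neither truthy nor falsey."""
    if isinstance(value, BadWord):
        raise BadWordError(f"{value} can't be used as a condition")
    return value is not False and value is not None


undef = evaluate_literal(BadWord("undefined"))  # no error at the source level
variables = {"undef": undef}  # storing it is fine; fetch() is what fails
```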


It would be nice if that worked.

path: '~/foo/bar

And I don't think I've ever used a tilde in an identifier anywhere.

I think in practice, it's probably a bad idea to let the evaluator gloss over BAD-WORD!s. An apostrophe is pretty slight and you have to use it in plenty of other places, so I guess foo: '~xxx~ is a small price to pay.

Though with ~ being given back to WORD!, it offers another option:

>> foo: ~ whatever
== ~whatever~

>> foo: ~ (second ["nulled" "unset"])
== ~unset~

Or maybe it could have another BAD-WORD!-related function: "get, understanding that the thing may be a BAD-WORD!"?

>> x: '~unset~
== ~unset~
>> y: 10
== 10

>> x
** Error: x is ~unset~

>> ~ x
== ~unset~
>> ~ y
== 10

Either way, I think giving lone ~ back to WORD! is a good idea, and you can override it to whatever usage you like as an operator. We should think on what it might do, and what might make a good default.
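The two ways of reading a variable in that example can be modeled in Python: plain access errors on a BAD-WORD!, while a hypothetical `tilde_get` (standing in for the proposed `~ x`) hands back whatever is stored. This only models the idea in the post; it doesn't claim such an operator exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BadWord:
    label: str

    def __str__(self):
        return f"~{self.label}~"


class BadWordError(Exception):
    pass


def word_get(variables: dict, name: str):
    """Plain word access, like typing `x` at the console."""
    value = variables[name]
    if isinstance(value, BadWord):
        raise BadWordError(f"{name} is {value}")
    return value


def tilde_get(variables: dict, name: str):
    """Like the proposed `~ x`: fetch, understanding it may be a BAD-WORD!."""
    return variables[name]


variables = {"x": BadWord("unset"), "y": 10}
print(tilde_get(variables, "x"), tilde_get(variables, "y"))  # ~unset~ 10
```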

BAD-WORD!s turned out to be more successful than I imagined. With their evaluation to isotope form and the introduction of ^META values, they've redefined the landscape of how to take control of edge cases in the representation.

I wanted to come back to this thread and add a remark about how systemically useful this is. (Which led me down a path of trying to edit the historical thread to actually make some sense in modern terms....)

But anyway--today I hit a good example. I was going over some stuff in the TLS code, and found this:

; Each encrypted message in TLS 1.1 and above carries a plaintext
; initialization vector, so the ctx does not use one for the whole
; session.  Unset it to make sure.
;
unset in ctx 'client-iv
unset in ctx 'server-iv

So in TLS 1.0 mode, the context keeps a single "iv" (initialization vector) for each direction, client and server, carried over the whole session. But you don't want to use those fields in TLS 1.1 and above, because there you're supposed to use the IV that comes with each message.

The historical trick to catch unwanted reads of these fields was to unset them. That was awkward, since UNSET!s couldn't be assigned via SET-WORD!. It also didn't communicate any information besides "unset".

Today, you can painlessly get the effect...with a more meaningful error on access:

ctx.client-iv: ctx.server-iv: ~per-message~

So if you get an error on trying to access these variables, it will tell you the BAD-WORD! isotope is ~per-message~. You can look that up in the source or get the gist of what it means. (I'm still thinking about how we might put the file and line number into the value itself, so you could find this assignment and get at the origin of the value!)
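The same "poison the field" pattern can be approximated in Python with an object whose attribute reads refuse to return a BAD-WORD!. `BadWord`, `Poisonable`, and the `client_iv`/`server_iv` names (underscored, since Python identifiers can't contain hyphens) are illustrative assumptions, not part of the TLS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BadWord:
    label: str

    def __str__(self):
        return f"~{self.label}~"


class BadWordAccess(Exception):
    pass


class Poisonable:
    """Attribute reads raise if the field holds a BadWord."""

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if isinstance(value, BadWord):
            raise BadWordAccess(f"{name} is {value}")
        return value


ctx = Poisonable()
ctx.version = "1.2"
# TLS 1.1+: each record carries its own IV, so poison the session-wide ones
ctx.client_iv = ctx.server_iv = BadWord("per-message")

try:
    ctx.client_iv
except BadWordAccess as err:
    print(err)  # client_iv is ~per-message~
```

Assignment stays a one-liner, and the label in the error tells the reader why the field was poisoned.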

Using BAD-WORD!s and their isotopes effectively is a lot of bang for the buck. Neat stuff!
