Why "TRASH" Is The Name For "Unset State"

hostilefork · December 12, 2023, 3:38pm

As the isotope design was refined, I grew very pleased with the choice of the antiform blank as what unset variables hold.

>> x: ~
== ~  ; anti

>> unset? 'x
== ~true~  ; anti

Using the antiform word! ~unset~ was an available alternative. But antiform blanks are a particularly pleasing choice because of their succinct representation... which helps assigned variables stand out in lists of mostly-unset variables. And it's hard to think of what antiform blanks would mean if they were not the unset state!

Also, if ~unset~ were used, people would probably expect unset? ~unset~ to be true (i.e., expect UNSET? to test a value rather than a variable). But I've held pretty strongly to my general terminology:

"There is no such thing as an 'unset value'. But variables can be considered unset...when they hold a value that is an antiform blank."

I'm pleased with this and have almost no complaints.

The problem is that ANTIFORM-BLANK is a mouthful. It needed a short name.

JavaScript Don't Care

JavaScript went with the idea that unset variables hold "undefined". You can test for it with the typeof operator, which sidesteps the ReferenceError raised when accessing a variable that was never declared (what we might think of as unbound):

>> typeof asdf == 'undefined'
<- true

But if you declare a variable without assigning it, reading it yields the undefined state without error:

>> let jkl
<- undefined

>> jkl
<- undefined

Whatever. The point is, they have no crisis of conscience over whether "variables are undefined, not values". They just go with it, as historical Redbol did with UNSET!.

But I don't like it.
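For reference, the JavaScript behavior described above can be checked with a small plain-JavaScript script (Node or a browser console). It contrasts the three cases: reading a name that was never declared, applying typeof to that same name, and reading a declared-but-unassigned let. The names asdf and jkl are simply carried over from the console examples.

```javascript
// Case 1: reading an identifier that was never declared throws a ReferenceError.
let accessError = null;
try {
  asdf; // no declaration of asdf exists in any scope
} catch (e) {
  accessError = e;
}

// Case 2: typeof is special-cased; on an undeclared name it reports "undefined"
// instead of throwing.
const typeofUndeclared = typeof asdf;

// Case 3: a declared-but-unassigned variable just holds the undefined value.
let jkl;
const declaredValue = jkl;

console.log(accessError instanceof ReferenceError); // true
console.log(typeofUndeclared === "undefined");      // true
console.log(declaredValue === undefined);           // true

// typeof cannot tell "never declared" apart from "declared, holds undefined".
console.log(typeof jkl === typeof asdf);            // true
```

Note that undefined here is an ordinary value that flows around freely; only the never-declared case errors, which is the "just go with it" attitude described above.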

TRASH Came To Seem The Best Option

"Trash" isn't a new idea: I suggested it way back when the unset state was being renamed away from VOID.

Inside the codebase I'd been using the term "trash" to mean deliberately corrupting memory, e.g. setting pointers to 0xDECAFBAD so that accessing them will trigger errors in the debug build.

The concept of putting "trash" in a variable to disrupt accesses to it is kind of similar.

>> var: ~
== ~  ; anti

>> var
** Error: var is ~ antiform

"Unset variables hold trash" comes across cohesively.

The more I thought about it, the more I liked it, because it takes things a "notch up" from null. What's more ornery than a null? Something that's actually "garbage". So the pejorative sound of "trash" fits the message: "hey, you can't reference that variable through a word, because it's trash".
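As a rough analogy only (this is not how Ren-C implements variables), the terminology can be modeled in JavaScript: a unique TRASH sentinel is an ordinary value that can be stored, fetching through a word errors when it finds that value, and "unset" is a question asked of the variable rather than of a value. Env, TRASH, get, set, peek, and isUnset are hypothetical names invented for this sketch, and the error text just echoes the console message shown earlier.

```javascript
// Hypothetical sketch: TRASH is a plain value (a unique sentinel), not an "unset value".
const TRASH = Symbol("trash");

// Hypothetical variable store standing in for a context/binding.
class Env {
  constructor() {
    this.vars = new Map();
  }

  // Storing TRASH is allowed, analogous to  x: ~
  set(name, value) {
    this.vars.set(name, value);
    return value;
  }

  // Raw contents of a variable; only an unbound name is an error here.
  peek(name) {
    if (!this.vars.has(name)) {
      throw new Error(`${name} is not bound`);
    }
    return this.vars.get(name);
  }

  // Word-style access: errors if the variable holds TRASH.
  get(name) {
    const value = this.peek(name);
    if (value === TRASH) {
      throw new Error(`${name} is ~ antiform`);
    }
    return value;
  }

  // "Unset" describes the variable: it takes a name, like  unset? 'x
  isUnset(name) {
    return this.peek(name) === TRASH;
  }
}

const env = new Env();
env.set("x", TRASH);           // like  x: ~
console.log(env.isUnset("x")); // true
try {
  env.get("x");                // like evaluating the word  x
} catch (e) {
  console.log(e.message);      // x is ~ antiform
}
env.set("x", 10);
console.log(env.get("x"));     // 10
console.log(env.isUnset("x")); // false
```

The design point the sketch tries to capture is that isUnset never receives the value itself, only the variable's name, so there is nothing for an "unset value" to be.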

My initial hesitation was how jarring it might seem to people reading source for a function that returns trash... which is the prescribed default return value when you have nothing specific to return:

foo: func [
    return: [trash?]  ; weird!
    bar [block!]
][
    append bar [a b c]
    return trash  ; also... weird!
]

But modern type checking offers a reasonable answer for these cases that is semiotically consistent: use a tilde. The type spec accepts it (and assumes you mean an isotope), and RETURN accepts it too (the evaluator turns the quasiform ~ into an antiform before RETURN receives it):

foo: func [
    return: [~]
    bar [block!]
][
    append bar [a b c]
    return ~
]

As it happens, trash is the default return result from functions. I've also proposed assuming that a function without a RETURN: spec is a procedure with no return result (as opposed to one whose result is unconstrained).

foo: func [
    bar [block!]
][
    append bar [a b c]
]

In such functions you can still say return ~ or return trash if you want to return early.

In Short: It's A Weird Name, But We Don't Have To Use It Often

hostilefork · December 12, 2023, 3:48pm

Bard's Follow-Up Opinion On "Trash"

If you hadn't ruled out "trash" as a name for the unset state, I likely wouldn't have suggested it in my initial response. While it's a technically accurate term in the context of programming, it does have negative connotations that might not be desirable for everyday use in your language.

Here are some reasons why I wouldn't have recommended "trash" initially:

Negative connotation: The word "trash" implies something unwanted, useless, or even harmful. This might not be the best way to describe a state in your language that is perfectly valid and intentional.

Inconsistent with other terms: The term "trash" doesn't seem to align with the other descriptive, concise, or metaphorical options presented. It might create a confusing inconsistency in the naming scheme of your language.

Potential confusion with runtime errors: Since you mentioned using "trashing" for memory corruption in your implementation, there could be confusion between the two terms for users unfamiliar with the internal workings of the interpreter.

However, after understanding your explanation for preferring "trash," I do see its potential appeal. It accurately represents the behavior of the unset state in throwing an error when referenced. Additionally, its connection to your implementation details adds a layer of technical elegance.

Ultimately, the decision of naming the unset state is yours. Both "trash" and the other suggestions have their pros and cons. Weighing the different factors, such as clarity, uniqueness, and aesthetics, will help you choose the name that best fits your language and its users.

When I told it that tildes could appear in more contexts (like return ~ instead of return trash), so the potential jarringness of the name need not affect new users:

Interesting approach to handling the naming of the "trash" state in your language. Making it implicit through the tilde symbol ~ and testing for its presence through terms like "Unset" or "Undefined" simplifies the interface for new users. This allows them to utilize the concept without getting bogged down by the specific terminology behind it.

However, having a verbal name like "trash" still serves a purpose for advanced users and for discussions about the language itself. It provides a concise and clear way to refer to the concept, especially when contrasting it with other states like "null" or "void."

Overall, your approach seems to strike a good balance between simplicity and clarity. By using the tilde symbol for everyday usage and reserving the "trash" name for more specific contexts, you cater to both novice and experienced users of your language.

ChatGPT's Follow-Up Opinion On "Trash"

I understand the reasoning behind considering "trash," especially given its contextual link to the way your interpreter handles memory corruption. Without the context you provided earlier, "trash" could indeed be a viable option among other suggestions. It succinctly communicates an unset state without explicitly implying corruption, especially if it aligns well with the technical aspects of your language implementation.

If you were to ask without ruling out "trash," I might have suggested it as a possibility considering its relevance to memory management and technical contexts, but I would have also offered other options to ensure a range of choices to fit your specific needs.

Names for concepts in programming languages often carry different connotations and interpretations, and it's crucial to find a term that best fits both the technical aspect and the conceptual clarity within the context of your language. If "trash" aligns well with your language's technical mechanisms and communicates the intended meaning clearly, it might indeed be a fitting choice.

When I told it that tildes could appear in more contexts (like return ~ instead of return trash), so the potential jarringness of the name need not affect new users:

That's a clever approach! Using a symbol like "~" to represent the concept of trash and allowing it to be used without explicitly mentioning its name streamlines its usage. It allows newcomers to interact with the language without needing to worry excessively about the specific term for that state. Then, when verbalization or explicit identification becomes necessary, having a name like "trash" readily available provides a clear way to discuss and refer to that state.

This strategy offers a balance between simplicity in usage and clarity in communication. It ensures that users can work with the language comfortably while also having a recognizable term available for discussions or specific contexts where naming becomes essential. It's a smart way to handle such nuances within the language's design.

I also thought to ask ChatGPT about NIHIL, and I liked how it seemed to "get" me:

The distinction you're making between various degrees of absence or nothingness within your language is intriguing. Naming concepts that are distinct yet related can indeed be quite challenging. "Nihil" is an interesting term, but as you mentioned, it might not immediately convey the precise nature of this extreme absence of value.

Given your language's nuanced hierarchy of null, void, trash, and now this even-more-extreme absence, perhaps a term that denotes an absolute lack of existence or vanishing might work...