BAD-WORD! Choices: The Role of English in the Evaluator

I've chiseled and shaped things to the point where there's hopefully only one frame mutation during frame execution. It happens for only one datatype, and only when the field isn't hidden from view. I explain all that here:

"Default Values And MAKE FRAME!"

I mention something particular, which is that in the world of labeled BAD-WORD!s and isotopes, some of those labels are leaking into the mechanics of implementation.

Some of these are "easy to change" (like the convention of returning ~none~). You can wrap a function that returns ~none~ to rename its output. You could load in a new mezzanine and skinned natives and reuse the evaluator as-is.
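As a sketch of what that kind of renaming could look like (hypothetical code: it assumes ENCLOSE and the ^META conventions, and SOME-FUNC is just a stand-in name):

>> renamed: enclose :some-func func [f [frame!]] [
       let result: meta do f  ; run the frame, capture a ^META form of its output
       either result = '~none~ [~nothing~] [unmeta result]  ; relabel ~none~
   ]

Evaluating the bad word ~nothing~ in the body produces its isotope, so callers of RENAMED would see ~nothing~ wherever SOME-FUNC returned ~none~.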

But ~unset~ is getting baked in at a deeper level. It's something used by binding, and MAKE FRAME!:

>> f: make frame! :append
== make frame! [
    series: ~unset~
    value: ~unset~
    part: ~unset~
    only: ~unset~
    dup: ~unset~
    line: ~unset~
]

Should We Change The Unset Convention to Plain ~?

I like the idea of a core engine that doesn't have any references to specific English words. It would be neat if you could load up a whole different DLL of natives and Mezzanines. But this is showing a specific behavior tied to a particular spelling of a word.

But what if we moved to plain ~ for unset? That would mean seeing this instead:

>> f: make frame! :append
== make frame! [
    series: ~
    value: ~
    part: ~
    only: ~
    dup: ~
    line: ~
]

You still get the idea (bunch of not defined things), it's just cleaner. Then imagine this:

>> f.value: '~baddie~
== make frame! [
    series: ~
    value: '~baddie~
    part: ~
    only: #
    dup: ~
    line: ~
]

>> apv: make action! f

>> apv [1 2 3]
== [1 2 3 ~baddie~]

It kind of pops, doesn't it, that only the unlabeled ~ were treated as unspecialized? All those ~unset~s feel like noise.

The Distinguished State As The Unusual State

In terms of looking at BAD-WORD! and asking what the "distinguished" state is, it seems pretty clear that having no name is that distinguished state.

If you're wondering what the difference between a "built-in behavior of DO" and a "built-in behavior of the evaluator" is, it comes down to what it's technically possible for you to replace without recompiling the C code.

Right now ~none~ is the signal to the console not to print any output (used by HELP, for example). But you could tweak the console code to change that. The unset convention, however, is recognized by the internal frame mechanics.

So should ~ be the "system void!", known to mean "unset"?

It may seem more opaque. But people typing help ~ would be able to get an explanation of it.

One thing that's particular about this state is that it can't be converted to WORD! (null = label of ~). So it is distinct on a sort of special level. It kind of seems like if any one single BAD-WORD! state should have this mean property, it should be the undefined and weird one.

2 Likes

Note: This comment was made when ~unset~ was the longer ~undefined~.

I think that ~undefined~ seems too long and ~ feels too minimal. What about whittling it down a bit to ~undef~, ~def?~, or just ~?~?

There's also ~~, which if you think of the ~ as a delimiter around things could make unlabeled voids a little fatter and easier to see.

But one motivator behind using ~ is so you can still get ~/foo/bar as a PATH!.

We could exempt ~ as a WORD!, which might make sense if all VOID!s have labels (and it would make the PATH! exemption less random). Basically, if we're not going to use it for unlabeled voids because it's too slight, maybe it should be allowed to be used for other things.

1 Like

I'm ok with all of these shorter proposals. If there's a darned good reason to dial it all the way down to ~, I can be easily persuaded.

English is a fact of life in programming, so I have no problem with some English in the evaluator.

~ is ok for me, too.

Just don't start with localised values in the evaluator; it's more trouble than it's worth. (German Excel functions, anyone?)

3 Likes

I've been leaning toward giving ~ back to WORD!. This would normalize its behavior in PATH!s like ~/foo/bar, and open it up as a potential operator in its own right... which turns out to be quite powerful for single-character tokens. You can then give meanings to:

  • ~ abc
  • ~ (foo bar)
  • ~ [baz mumble] etc. etc.

This takes a character that is uglier than you'd want in the average word, and opens up applications for it. (I feel that handling $ and $$ and $$$ in this way gives a slew of advantages which are nearly as good as having a full lexical form... if $ can quote a word, then $ abc is for the most part better than $abc; you get a binding and everything.)
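Purely as a hypothetical, to show the shape such an operator could take (the name and behavior here are illustrative, not a proposal):

>> ~: func ['target] [
       print ["Tilde was applied to:" mold target]  ; hard-quoted argument
   ]

>> ~ abc
Tilde was applied to: abc

>> ~ (foo bar)
Tilde was applied to: (foo bar)

Because the parameter is quoted, the operator sees the literal WORD!, GROUP!, or BLOCK! that follows it, the same way apostrophe-style quoting works elsewhere.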

But since BAD-WORD!s follow WORD! rules, opening up ~ as a WORD! means that ~~~ is a legal BAD-WORD!. That isn't really the end of the world, but it means you probably shouldn't let more than one tilde be a WORD! to avoid ambiguity.

That then distinguishes ~~~ somewhat, for whatever that is worth. Which I don't think is worth much; I like looking at and typing ~void~ or even ~undefined~ better.

So the summary is that I think the "odd man out" of ~ should go back to being a WORD!. It gets 99% of its usefulness there as an operator, and has the common application as the home directory in PATH!s. This then lets us disallow BAD-WORD! in paths otherwise--which feels more correct. It means all BAD-WORD!s have non-empty labels, making them align fully with WORD!.

That excludes ~ as a possibility for the purpose I suggest in the original post. And I want to rule out ~~~ because while it will be a legal BAD-WORD!, I don't particularly like typing it or looking at it. (Maybe because it's so unappealing, it could be the "don't even show this one in the console" unique signal...)

I'm afraid this has become the way it is. Certain words... like ~null~ and ~void~ and ~unset~ are getting baked in with specific meanings and behaviors.

While making non-English-centric dialects might be fair game, and creating a customized console experience that translates some of these ideas at a higher level... the mechanics are going to be stuck here until the language itself gets abstracted into something more graph-like where WORD!s are linked and mentioned by GUID in binary source files...and names are put on as renderings.

For the medium that this is, I think I've made peace with the BAD-WORD! isotopes getting special behaviors. (But the non-isotope forms are all handled equivalently...except in as much that they are connected through evaluation to the behaviors when they take on their isotope form.)

Should We Change The Undefined Convention to Plain ~?

I mentioned the change from ~undefined~ to ~unset~.

Then I made plain ~ generate an ~unset~ isotope.

I explain why in Three Single-Character Intents... x: _, x: ', and x: ~

You can override this if you think of more interesting applications in your own code.

1 Like

I had a passing thought that maybe there is a compromise here, where ~ is a synonym for ~unset~.

The idea would be that non-isotopes would render as the whole word, but when unset isotopes are rendered in objects they would look like ~.

>> obj: make object! [
    iso: ~unset~
    plain: '~unset~
]
== make object! [
    iso: ~
    plain: '~unset~
]

Remember that in "MAKE OBJECT! notation" (e.g. escaped/quoted notation) everything that's not inert is escaped...and isotopes are not escaped to show their isotope status.

But I think this idea is kind of a bust, for two reasons. The first is that what you wrote doesn't survive a round trip:

>> code: [item: ~, if not set? 'item [print "My code changed :-("]]
== [item: ~unset~, if not set? 'item [print "My code changed :-("]]

There are of course other cases where you lose the exact thing you wrote. But that kind of feels like it sucks more than the average "quotes became braces on a string" or "my date format got reordered". ~ seems genuinely different.

The second reason: we assume this ~ and ~unset~ synonym would be forced to render as ~ when it appeared in a PATH!.

But having written some path-merging code, I've needed to do things like test whether the head of a path is a blank or not. If ~ were a BAD-WORD!, its neither-true-nor-false status would cause glitches when trying to analyze and merge paths containing it.
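(For context on the blank-head test: in generalized paths, a leading slash is represented by a BLANK! in the path's first slot, which is what such merging code checks. A small sketch of the kind of test involved:)

>> first '/a/b
== _

>> blank? first '/a/b
== #[true]

A BAD-WORD! in that slot would not be BLANK!, and could not even be safely used in a plain conditional test.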

That actually might not be so bad... Because you generally don't want to be able to do things like:

>> join '/a/b/c/ ~/d/e/f
== /a/b/c/~/d/e/f   ; this is probably not what you want

Having the ~ be a special disruptive case would cue you to special handling.

But, that's weird. And if we make ~ a BAD-WORD! that you can put in a path, it makes prohibition of other bad words in paths seem weird.

Long story short: I think we just keep ~ as a special WORD! and live with the ~unset~ pollution.

On the plus side, we don't see as many unsets now that Sea of Words is in effect... because we're not creating a zillion variables for words that don't have a definition. Most places you see unsets will be optional arguments in function frames--like refinements that you're allowed to leave blank--and those will just have to be a casualty of this.

So I think this decision is made...

~ is a WORD!... BAD-WORD!s are English known to the Interpreter

2 Likes

This was a good test to have done, because it turns out I use (var: ~) all the time. Being able to wipe out a variable's contents with a single character is super handy.

So I don't find being able to redefine ~ as some generic operator all that compelling anymore--any more than I think being able to define apostrophe as a generic operator is a good idea.

But another reason to promote ~ to being the content state for undefined variables is something else that's been nagging me...regarding the tests UNSET? and SET?.

Consider by Analogy the NULL? Function...

Remember that ~null~ isotopes exist to say "yes, this is a NULL intent, but it came back from a function you might have been testing for something where pure NULL means something else." It helps catch mistakes, like:

>> thing: null

>> if match [<opt> integer!] thing [print "Want this to run, it matched!"]
** Error: It's good we error here on the ~null~ isotope, and mention DID

Without the curve ball of the error, you'd have gotten NULL and the message wouldn't have printed. The error is great! You have to say did match instead.

But notice it means you also can't write:

>> if null = x: match [<opt> integer!] thing [print "Want this to run, it matched!"]
** Error: The = function won't take isotopes for its argument

So this is where NULL? comes in. It's not just about saving one character over null =; more importantly, it helps you get past isotope issues, since it takes a ^META parameter and is willing to decay null isotopes.

>> if null? x: match [<opt> integer!] thing [print "Want this to run, it matched!"]
Want this to run, it matched!

>> x
; null  (the isotope decayed on assignment) 

But Right Now, UNSET? Breaks The Pattern...

It's a lot less useful to test an expression's result for an ~unset~ isotope than it is to test whether a variable contains one. In the current paradigm, it's nearly never the case that expressions generate ~unset~ isotopes. You basically are always testing variables for unsetness.

So if UNSET? followed the pattern of testing for the isotope directly, you'd have to write:

unset? get/any 'var

It's much preferable to be able to just say:

unset? var

So that's what UNSET? does; it takes a WORD! or TUPLE! and gets it... then tests it. But I think it's quite reasonable, if you see ~unset~ isotopes floating about, to assume the name implies it would be applicable to directly testing that isotope.
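A sketch of that variable-fetching behavior (hypothetical code assuming GET/ANY and the ^META conventions; the real native may differ):

>> unset?: func [var [word! tuple!]] [
       '~unset~ = meta get/any var  ; fetch without erroring, compare ^META form
   ]

GET/ANY tolerates unset variables, and META turns the fetched ~unset~ isotope into the plain BAD-WORD! so it can be compared safely.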

>> null? null
== #[true]  ; pure null

>> null? ~null~
== #[true]  ; null isotope

>> unset? ~unset~
** Error: "Weird," thinks the user, "why didn't that work?"

If we allowed ~ isotopes to represent "unsetness", the mistake would not be as easy to make.

The user would see that a variable contained an ~ isotope, which would disconnect it from the pattern of "XXX? looks for a thing named XXX".

Beyond That, I've Shown That It Reduces Clutter...

Hard to deny that it's easier to see the signal in the noise:

>> get 'frame
== make frame! [
    series: [a b c]
    value: '~something~
    part: ~
    only: #
    dup: ~
    line: ~
]

Compare that to the jumble you get with the more verbose form:

>> get 'frame
== make frame! [
    series: [a b c]
    value: '~something~
    part: ~unset~
    only: #
    dup: ~unset~
    line: ~unset~
]

And there's something particularly pleasing about seeing it all clean and fresh on a make...

>> make frame! :append
== make frame! [
    series: ~
    value: ~
    part: ~
    only: ~
    dup: ~
    line: ~
]

Plus, BAD-WORD!s are Now Truthy And Friendly

There's no reason anymore to disallow them in PATH!s and TUPLE!s.

>> item: first [~/home/Projects/ren-c/README.md]
== ~

>> type of item
== #[datatype! bad-word!]

>> if item [print "It's truthy now..."]
It's truthy now...

Nor is there any real reason to disallow labeled bad words in paths and tuples:

>> type of first [~why~/not/~this?~]
== #[datatype! path!]

I can actually already think of applications for that.

It Closes the ~~~ Gap

I've pointed out that if ~ is allowed to be a WORD!, and you are of the belief that all WORD!s should be convertible to BAD-WORD!, then you would be able to produce ~~~. I'm not a fan of that.

It feels a bit better to say "All WORD! can become BAD-WORD!, but you can't convert the ~ BAD-WORD! back into other words"... especially if the one you can't convert back represents an unset intent.

Assuming No Objections, I Think The Decision is Made

This isn't about eliminating English use entirely, as there are several isotopes that have unique behavior in the evaluator:

  • ~null~ decays to NULL on variable assignment
  • ~void~ also decays to NULL on variable assignment
  • ~false~ decays to #[false] on variable assignment
  • ~blank~ decays to _ on variable assignment
  • ~blackhole~ decays to # on variable assignment

But the reasons have added up to a critical mass for this particular isotope to be nameless.
