BLANK! 2022: Revisiting The Datatype

hostilefork · August 25, 2022, 1:50pm

In historical Redbol, the NONE! datatype had the bad habit of looking like a WORD!:

rebol2>> 'none
== none

rebol2>> none
== none  ; same in R3-Alpha and Red

But it wasn't a word:

rebol2>> type? 'none
== word!

rebol2>> type? none
== none!

It was a distinct type, which also happened to be falsey (while WORD!s are truthy):

rebol2>> if 'none [print "Truthy word!"]
Truthy word!

rebol2>> if none [print "Falsey none!"]
== none

And as we can see, NONE!s served purposes of signaling "soft failures": branches that didn't run, or FINDs that didn't find, or SELECTs that didn't select... etc.

rebol2>> find "abcd" "z"
== none

rebol2>> select [a 10 b 20] 'c
== none

Ren-C Divided NONE!'s roles across NULL, VOID, and BLANK!

NULL - an "antiform" state of WORD! that couldn't be put in BLOCK!s. Anywhere that NONE! would be used to signal a soft failure operation--like FIND or SELECT--would use ~null~.

>> null
== ~null~  ; anti

>> find "abcd" "z"
== ~null~  ; anti

>> select [a 10 b 20] 'c
== ~null~  ; anti

>> append [a b c] null
** Error: APPEND doesn't allow ~null~ isotope

BLANK! was represented by a lone underscore ( _ ) and could be put into blocks:
```
>> append [a b c] _
== [a b c _]
```
At the outset, it retained the choice to be falsey:
```
>> if _ [print "Won't print because blanks are falsey"]
```

VOID - another "antiform" state of WORD!, that's the result of things that are effectively "no ops". Some contexts choose to make them vanish, and when functions like APPEND get them as an argument they are treated as no-ops:

>> void
== ~void~  ; anti

>> if null [print "Doesn't print as NULL is falsey"]
== ~void~  ; anti

>> compose [abc (if false ['def]) ghi]
== [abc ghi]

>> append [a b c] void
== [a b c]

>> for-each void [1 2 3] [print "no variable"]
no variable
no variable
no variable
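
As a rough analogy only (this is Python, not Ren-C, and every name in it is made up for illustration), the three-way split above can be modeled as two out-of-band sentinels plus one ordinary placeholder element. The sketch assumes nothing beyond the behaviors shown in the transcripts: APPEND treats VOID as a no-op and rejects NULL, BLANK is appended like any other element, and a failed FIND gives NULL.

```python
class Antiform:
    """Toy stand-in for an antiform: a state that cannot be stored in a block."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"~{self.name}~  ; anti"


class Blank:
    """Toy stand-in for BLANK!: an ordinary element, rendered as an underscore."""
    def __repr__(self):
        return "_"


NULL = Antiform("null")   # signals "soft failure"
VOID = Antiform("void")   # signals "no-op, please vanish"
BLANK = Blank()           # a placeholder that really is in the block


def append(block, value):
    """Model of APPEND as described above: VOID is a no-op, NULL is an error."""
    if value is VOID:
        return block
    if value is NULL:
        raise TypeError("APPEND doesn't allow ~null~")
    block.append(value)
    return block


def find(series, item):
    """Model of FIND's soft failure: NULL when nothing is found, else the rest."""
    index = series.find(item)
    return NULL if index < 0 else series[index:]
```

With these definitions, `append(["a", "b", "c"], BLANK)` grows the list by one element, `append(["a", "b", "c"], VOID)` leaves it alone, and `append(["a", "b", "c"], find("abcd", "z"))` raises, which is the point: a failed FIND can't silently end up inside a block.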

Question One: Should BLANK! Just Be A WORD! ?

Ren-C allows you to use underscores inside words, so it feels a little bad to take the lone underscore away from the pool of words.

Outside of historically being hardcoded as falsey, what makes BLANK! fairly "built in" is that in the path mechanics, it fills in the empty slots:

>> to path! [_ a]
== /a

>> as block! 'a/b/c/
== [a b c _]

There are other places the blank is used, such as to opt out of multi-returns.

>> [_ value]: transcode/next "abc def"
== " def"

>> value
== abc

Question Two: Does BLANK! Still Need To Be Falsey?

My feeling is that having blank be falsey doesn't have all that much benefit. NULL does a better job of it, and really what falseyness does is mess with BLANK!'s usefulness as a placeholder:

>> append [a b c] maybe all [1 > 2, 3 > 4, _]
== [a b c]

>> append [a b c] maybe all [1 < 2, 3 < 4, _]
== [a b c _]  ; this makes sense to me

Thinking of BLANK! as being "null-like" in terms of non-valuedness is generally a hassle. It makes you wonder about whether something like DEFAULT should think of it as being assigned or not:

>> item: _

>> item: default [1 + 2]
== ???

In practice, I prefer that non-array-element things (NULL, NOTHING, etc.) be the only cases that DEFAULT overwrites. This is because NULL is far more useful than BLANK! when it comes to representing something that you think of as "not being assigned"... as you'll get errors when you try to use it in places (e.g. in APPEND). Trying to use BLANK! to represent nothingness invariably leads to stray appearances in blocks (Shixin wrote a lot of code to try to filter them out in Rebmake, prior to it being switched to NULLs).

This makes more sense, and I think it bolsters the argument that BLANK! is less of a falsey-NULL relative...but more of a placeholder value. I've said "blanks are to blocks what space is to strings". And space is truthy:

>> if second "a b" [print "Space is truthy"]
Space is truthy

>> if second [a _ b] [print "So why shouldn't blank be truthy?"]
???

So Either Way, I Suggest The Removal of BLANK! From Being Falsey

This creates some incompatibility in Redbol (which has been using NONE! as a blank substitute). But it's something that can be worked around.

rgchris · August 26, 2022, 1:55am

There's a lot to ponder here. I think on the one hand it's important to explore all of the possibilities, on the other it seems to be getting awfully convoluted and lacking a comprehensive narrative.

I'm not up to speed with much of what has changed in this realm for some time, so I apologise if this glosses over some since settled items, though judging by this post, there's much still unsettled.

For me (using the family name) Rebol's first obligation is to represent data, both in language and in the way the language is interpreted in memory. Specifically, BLANK! and its underscore literal are a huge win (this is from me, the ultra-conservative sceptic) in representing positive nothingness: that a thing exists but lacks assignation, as in [name: "Thing" link: _]. Despite that positivity, I do think that as it represents the known absence of a value in data, it should be falsey in the general flow, given that data primarily should determine that flow.

What it becomes in a dialect or within the general flow as distinct from NULL is of lesser importance as I see it. If NULL is the evaluator's ultimate representation of nothingness, then there should be a way to access that in internal dialects, such as SET-BLOCK! or PATH! and the like or it is not really fulfilling its role.

I have this sense that the BLANK-NULL-VOID-ERROR story has too many actors with overlapping roles. I don't have anything tangible to back that up with at this time.

hostilefork · August 26, 2022, 6:36am

Well, that's something.

Hence you are on the side of "Taking underscore away from the word pool does more good than harm."

I'm trying to make a general engine... so it will be possible to do Redbol compatibility, and if you want different rules you should be able to have them. But the core as I see it is the "default" distribution which should be based on what is reasonably determined the "best" and most coherent.

It is a work in progress...but...I believe there's plenty of evidence that things are pointing toward a solid outcome.

The proof comes from the code: the contrast between what the approaches without these distinctions can't do (and how catastrophically they regularly fall down) vs. what the approaches with them can do cleanly and correctly.

UPARSE is a giant piece of evidence, but I think there's quite a lot more.

hostilefork · March 1, 2024, 12:38pm

I can argue pretty strongly for all the behaviors as being shades of distinction that are important.

I'll mention that I just addressed a weakness, which was that VOID didn't have a "good" representation in a block that identified it in the class of "weird states". e.g. it didn't have a quasiform. Now it does.

This means these are your stock "reified" options for shades-of-nothingness:

 [name: "Thing" link: _]
 [name: "Thing" link: ~]  ; ~ is "quasi-blank" a.k.a. "trash"
 [name: "Thing" link: ~null~]
 [name: "Thing" link: ~void~]

It's the clear design rule now that all of these forms are inert (when accessed via word) and truthy--everything you can put in a block is.

But each of these forms has different behaviors once you evaluate it, or DEGRADE it. (For example, degrade fourth [name: "Thing" link: ~void~] narrowly turns a quasiform into its antiform, without transforming any other types.)

 >> x: _
 >> print either x ["truthy"] ["falsey"]
 truthy
 >> append [a b c] x
 == [a b c _]

 >> x: ~
 >> print either x ["truthy"] ["falsey"]
 ** Error: x is not set (~ antiform), see GET/ANY
 >> print either get/any 'x ["truthy"] ["falsey"]
 truthy
 >> append [a b c] x
 ** Error: x is not set (~ antiform), see GET/ANY
 >> append [a b c] get/any 'x
 ** Error: APPEND expects [~void~ element? splice?] for its value argument

 >> x: ~null~
 >> print either x ["truthy"] ["falsey"]
 falsey
 >> append [a b c] x
 ** Error: APPEND expects [~void~ element? splice?] for its value argument

 >> x: ~void~
 >> print either x ["truthy"] ["falsey"]
 ** Error: ~void~ antiform is neither truthy nor falsey
 >> append [a b c] x
 == [a b c]

So...I'm afraid that BLANK!'s relationship to nullness and falseness has basically gone away. Instead, it's the "space unit" of BLOCK!s--the moral equivalent of a space character in a TEXT!.

BLANK! has important uses that make it good to be a non-reassignable unit type, taken away from WORD!. Crucially it now provides the heart of the nothing antiform to represent unset variables--using a similarly light-looking antiform/quasiform of ~ ("trash").

Also, I've mentioned its application in PATH! and TUPLE!:

>> to path! [_ x]
== /x

>> to path! [x y _]
== x/y/

...as well as in multi-returns:

>> [_ @end]: find "abcdef" "cd"  ; opt out of main find result, just get tail
== "ef"

>> end
== "ef"

Having it be out-of-band and not a WORD! is a strength in these areas.

I do think that having SPREAD of a BLANK! return a VOID or an empty splice is probably a good thing... though not completely sure on the merits of choosing one over the other. An empty splice may be "more coherent" in the sense that one probably wouldn't want foo spread _ to give back null if foo spread [] would not. Considering it "EMPTY?" in certain contexts may be appropriate as well...but not falsey.

A year down the road here, things are clicking in place. Though it would have been massively helpful to have a time machine and send a few of these posts a few years back. I'm patching a bootstrap executable to be compatible with many of the conventions, and it's pretty quick work when you know what the decisions are.

(I was just watching a video laying out the difficulties in creating the blue LED (recommended)... and it's so comprehensive on semiconductor technology that sending that one video back in time would have radically changed the course of history.)

I've got a pretty solid sense that the best substrate comes when everything you can PICK out of a block is inert and truthy, where this kind of thing holds:

backup: copy block1
block2: copy []
while [value: try take block1] [  ; you can also shim TAKE as synonym for TRY TAKE
    append block2 value
]
assert [block2 = backup]  ; always true, for any and every BLOCK! (GROUP!, etc.)
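
For readers who think in other languages, here is a loose Python analogy of why that invariant matters (not Ren-C; `_EMPTY`, `try_take`, and both `drain_*` names are hypothetical). Python lists can hold falsey elements such as `0`, `""` and `None`, so a loop that tests the taken value for truthiness stops early, while a loop that tests only for an out-of-band sentinel copies everything:

```python
_EMPTY = object()  # hypothetical sentinel, playing the role of NULL from TRY TAKE


def try_take(block):
    """Remove and return the first element, or the sentinel if the block is empty."""
    return block.pop(0) if block else _EMPTY


def drain_truthy(block):
    """Loop in the style of `while [value: try take block1]`, but where
    in-band elements are allowed to be falsey (Python's own rules)."""
    out = []
    while (value := try_take(block)) is not _EMPTY and value:
        out.append(value)
    return out


def drain_sentinel(block):
    """Same loop, where only the out-of-band sentinel can end the iteration."""
    out = []
    while (value := try_take(block)) is not _EMPTY:
        out.append(value)
    return out
```

Given `[1, 0, "", None, 2]`, `drain_truthy` gives back just `[1]`, while `drain_sentinel` gives back the whole list. Making everything you can put in a block truthy is what lets the short form of the loop behave like the second function.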

So it's upon you--the interpreter of the block--to give it meaning. If you want to know if something is blank, you say BLANK?. If you want to DEGRADE things, you do that.

I've surveyed the most code written by the most people...and maintained giant systems after those who wrote them wandered off. And I've tried a lot of things. What it's converging on is what I believe to be the best direction for this medium.

But the proof should be in your code, as well. I've held off on advocating people spin their wheels porting scripts while things are in flux, but as the flux diminishes I think it's worth it to do some porting of key Rebol2 scripts and document the experience.

A little more work on binding first... but... the time is coming.

hostilefork · March 6, 2024, 1:19pm

Okay, I think this kind of sums up the difference here:

BLANK!s aren't null-equivalents or falsey, they are EMPTY?

You (@rgchris) hopefully don't expect an empty block to be falsey.

EMPTY? is a test which can work across both blanks and empty blocks (and empty strings, binaries...), to say they are values that are intentionally empty. And then EMPTY? on null can be an error.

I'm a bit reluctant to say that a blank can be passed anywhere you'd pass a void. Rather, blanks can be passed anywhere you can pass an empty block (or empty string?), and give you back the same meaning. That's actually an interesting point: if the meaning for an empty block and empty string would be different when passed to a routine, then I don't think blank should play favorites in acting like either, because it doesn't connote any particular kind of emptiness.

I have some other thoughts here about how BLANK! seems to be useful as a way of fitting into places that want to say they are conceptually holding series, but want to avoid the creation of a series identity. The issue being that you wouldn't so much mind writing [] in these slots except for the fact that what you really need is copy [] which gets ugly...and with just _ you push the responsibility of making the series to whoever starts expanding it. But then, if you're making a prototype of an object that's going to get copied that isn't enough to get new copies in the instances... which points to a deeper problem that BLANK! is only papering over. That needs a bigger discussion, but other things need to be sorted out related to objects first.

hostilefork · September 9, 2024, 9:12pm

A further step in my thinking about BLANK!s being EMPTY? is that we can ask: "What is the LENGTH OF a BLANK!?"

Trying to shape up the semantics for consistency, I think the LENGTH OF a BLANK! is 0.

BrianH didn't like the idea of R3-Alpha LENGTH? of a NONE being 0. But having split out the roles of nothingness to a finer granularity, we can say:

LENGTH OF BLANK is 0
LENGTH OF TRASH is ~error~
LENGTH OF VOID (antiform) is NULL
LENGTH OF NULL (antiform) is ~error~
LENGTH OF NOTHING (antiform) is ~error~

This heeds my policy that if what a routine does would differ between, e.g., an empty string and an empty block, then blank shouldn't give an answer. But here, both an empty string and an empty block say 0, so I think the length should be 0.
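
That policy can be stated mechanically. The following is only a Python sketch of the rule, not anything in Ren-C; `blank_answer` is a hypothetical helper that asks an operation what it would say for an empty string and for an empty block, and answers on blank's behalf only when the two agree:

```python
def blank_answer(operation):
    """Apply the policy above: blank answers only if the operation gives the
    same result for an empty string and an empty block; otherwise it refuses."""
    as_text = operation("")
    as_block = operation([])
    if as_text != as_block:
        raise ValueError("empty string and empty block disagree, so blank gives no answer")
    return as_text
```

Here `blank_answer(len)` gives 0, because both kinds of emptiness have length 0. Something like `blank_answer(repr)` raises, because an empty string and an empty block render differently and blank has no business picking a side.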

What About REMOVE-EACH and BLANK!

So this is an interesting one, because here you're asking to modify the input in a way that only removes elements from the input...and then returns it and the count.

>> s: [1 2 3 4 5]

>> [series count]: remove-each num s [even? num]
== [1 3 5]

>> series
== [1 3 5]

>> count
== 2

While you can't APPEND to a BLANK! meaningfully, it would be reasonable to argue that you can REMOVE-EACH from a BLANK!...because there are no elements you can remove, and so you can give back the blank and 0.

>> [result count]: remove-each x _ [fail "this part never runs"]
== _

>> result
== _

>> count
== 0

We can do that...but should we?

I'm not sure, but I do feel like this is helping shape the policy on what blanks do. You don't pass blanks into routines and get nulls out when an empty series would not do that. (This is what REMOVE-EACH was doing previously for blank, and I think that was wrong.)
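
To make the proposal concrete, here is a toy Python model of it (an assumption-laden sketch, not the Ren-C implementation; `Blank`, `BLANK` and this `remove_each` are illustrative names). The blank case returns the blank and a count of 0 without ever calling the body, which is the same thing an empty series would do:

```python
class Blank:
    """Toy stand-in for BLANK!, rendered as an underscore."""
    def __repr__(self):
        return "_"


BLANK = Blank()


def remove_each(series, predicate):
    """Remove, in place, the elements for which predicate is true.
    Returns (series, count_removed), mirroring the multi-return above."""
    if series is BLANK:
        return BLANK, 0  # nothing to visit, so the predicate never runs
    kept = [item for item in series if not predicate(item)]
    count = len(series) - len(kept)
    series[:] = kept  # mutate the caller's list rather than rebinding
    return series, count
```

Calling it on `[1, 2, 3, 4, 5]` with an even-number test leaves `[1, 3, 5]` and a count of 2. Calling it on `BLANK` with a predicate that always raises gives back `BLANK` and 0, and the predicate's error never fires.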