BLANK! (_) as SPACE in String-Oriented Dialects

hostilefork · February 6, 2020, 11:32pm

(Note: I'm kicking off this discussion with a conclusion from an otherwise outdated thread, circa 2020.)

Having had a fair amount of time to reflect on the evolution of things, I think we need to make undecorated word fetches not produce an error with NULL. I've outlined the history of why they were erroring at the outset, and walking through it I think there's a coherent plan for when there is tolerance and when there is not (e.g. function arguments by default).

But today I realized something particularly pleasing about this. We had a concept that BLANK!...due to its non-erroring status, would be the way of "disarming" a null assignment to a variable. It's still going to be a way of disarming parameters for "blank in, null out", but not needed for a plain assignment.

That frees up blank for dialect uses distinct from NULL.

In particular, it recovers it for something that was tried for a time: being synonymous with space (#" "). The concept emerged when it was called into question whether PRINT should default to adding implicit spaces, and how you should avoid it doing so. If the common case didn't have implicit spaces, you'd be stuck writing things like print ["It" _ was _ "ugly"]. So ultimately we went with SPACED by default, and you could write print unspaced [...] if that's what you wanted.

Yet _ was deemed to need to behave just as NULL did, as you couldn't put a NULL in a variable. So when you needed to, you would use blank, and it would be the non-erroring synonym. It was a sad loss for when you wanted a nice way to note spaces, but seemed to be the way it had to be.

Well, not any more!

With non-erroring NULLs, dialects are free to distinguish the behavior of blanks and nulls. NULL being the most obvious "nothing". So why not bring back BLANK! as being a synonym for space?

It still has the "no delimiters applied" status of a CHAR!. So spaced ["a" _ _ "b"] would still be just two spaces between the a and the b... not five.
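
To make the "two spaces, not five" point concrete, here is a toy model in Python. It is only a sketch of the rule as described above, not how Ren-C implements DELIMIT; the names BLANK, delimit, spaced and unspaced are stand-ins made up for the illustration.

```python
# Toy model only: BLANK and these function names are hypothetical stand-ins,
# not Ren-C's actual implementation.
BLANK = object()  # plays the role of the `_` literal


def delimit(delimiter, items):
    """Join items, treating BLANK as a space that suppresses delimiters around it."""
    out = []
    prev_takes_delimiter = False  # True when the last emitted item was ordinary text
    for item in items:
        if item is BLANK:
            out.append(" ")               # the blank itself contributes one space
            prev_takes_delimiter = False  # and no delimiter is added next to it
        else:
            if prev_takes_delimiter:
                out.append(delimiter)
            out.append(str(item))
            prev_takes_delimiter = True
    return "".join(out)


def spaced(items):
    return delimit(" ", items)


def unspaced(items):
    return delimit("", items)


two_blanks = spaced(["a", BLANK, BLANK, "b"])    # two spaces between a and b
naive_join = " ".join(["a", " ", " ", "b"])      # what plain joining would give
```

In this model two_blanks has exactly two spaces between the letters, while naive_join ends up with five, which is the outcome the "no delimiters applied" status avoids.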

Looking good!

hostilefork · July 16, 2022, 12:18pm

Even better news here in 2022...

...it's not even used for that anymore!

BLANK!-in-NULL-out has been replaced by VOID-in-NULL-out. This is good, because it had been uncomfortable dealing with the difference in meaning for a literal blank vs. a fetched blank:

>> maybe-unused-var: _

>> unspaced ["a" maybe-unused-var "b"]
== "ab"

>> unspaced ["a" _ "b"]
== "a b"

I tried various ways of rationalizing this (for instance, "a WORD! doesn't act like a fetched version of itself, why should BLANK! have to?") But I'll freely admit this was coming from a place of "I want blank for space in string dialects--so I'll keep justifying it until I find a good-enough sounding excuse, even if it's kind of broken."

Of course, I'm only hurting myself when I do that. This bit me when I was trying to factor the internals so that DELIMIT was based on the same code as REDUCE. Because if I thought of DELIMIT as being a kind of post-processing pass on what a reduce was doing, it wouldn't have enough information to accomplish the distinction above:

unspaced ["a" _ "b"]

Step 1: reduce ["a" _ "b"]
== ["a" _ "b"]
Step 2: ...
Step 3:
== "a b"

vs.

unspaced ["a" maybe-unused-var "b"]

Step 1: reduce ["a" maybe-unused-var "b"]
== ["a" _ "b"]
Step 2: ...
Step 3:
== "ab"

"What's step 2?"

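The missing step 2 can be shown with a small Python model. This is a sketch under assumptions: Word, reduce_block and the variables dictionary are hypothetical helpers for the illustration, not Ren-C internals. The point is only that once both blocks have been reduced they are identical, so no later pass can tell them apart.

```python
# Toy model only: every name here is a hypothetical stand-in.
BLANK = object()  # plays the role of the `_` literal


class Word:
    """A variable reference that reduce_block will look up."""

    def __init__(self, name):
        self.name = name


def reduce_block(block, variables):
    """Replace each Word with its value; everything else passes through as-is."""
    return [variables[item.name] if isinstance(item, Word) else item
            for item in block]


variables = {"maybe-unused-var": BLANK}

literal = reduce_block(["a", BLANK, "b"], variables)
fetched = reduce_block(["a", Word("maybe-unused-var"), "b"], variables)

# Both reductions produce the same list, so a post-processing pass receives
# identical input and cannot yield "a b" for one and "ab" for the other.
same_after_reduce = (literal == fetched)
```

Any function that takes only the reduced block as input has to return the same string for both cases.
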
Everything Changed...

Now any transitions from blanks to NULL-ness or VOID-ness or SPACE-ness (or anything else) will be conscious acts of a dialect. It can do that without being beholden to some idea that BLANK! has to be reserved for an implementation mechanic of something like MAYBE.

You would really make a variable NULL, and to have it disappear in something like a REDUCE or UNSPACED you'd literally use MAYBE to do that.

>> var: null

>> reduce ["a" var "b"]
** Error: var is NULL and REDUCE doesn't like that  ; paraphrased :-)

>> reduce ["a" maybe var "b"]  ; MAYBE turns NULL into VOID
== ["a" "b"]
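
As a rough Python model of that division of labor (a sketch, with NULL, VOID, maybe and reduce_block all being stand-ins invented for the illustration, not Ren-C's implementation): NULL is an error when it reaches the reduce, VOID silently vanishes, and MAYBE is the conscious step that converts one into the other.

```python
# Toy model only: these names are hypothetical stand-ins.
NULL = None      # plays the role of NULL
VOID = object()  # plays the role of VOID


def maybe(value):
    """Turn NULL into VOID; pass anything else through unchanged."""
    return VOID if value is NULL else value


def reduce_block(values):
    """Collect values, erroring on NULL and dropping VOID."""
    result = []
    for value in values:
        if value is NULL:
            raise ValueError("value is NULL and reduce doesn't like that")
        if value is VOID:
            continue  # VOID opts out of the result
        result.append(value)
    return result


var = NULL
opted_out = reduce_block(["a", maybe(var), "b"])

try:
    reduce_block(["a", var, "b"])
    null_was_error = False
except ValueError:
    null_was_error = True
```

In the model, opted_out contains only the two strings, and passing the variable without maybe raises.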

With BLANK! now freed up, I think if the string conversions interpret it as space, that's great!

>> to text! [Hello _ New _ Blank _ World!]
== "Hello New Blank World!"

Armed with GET-BLOCK! as REDUCE, you have some great shorthand:

>> var1: "Hello"
>> var2: 'World!

>> to text! reduce [var1 _ 'New _ "Blank" _ var2]
== "Hello New Blank World!"

>> to text! :[var1 _ 'New _ "Blank" _ var2]
== "Hello New Blank World!"

Pleasing and solid. And the internals benefit--as I mentioned--by being able to do things like build DELIMIT on top of REDUCE.

(And it should go without saying at this point... but... Redbol and your own ideas could come in and do all this differently.)

hostilefork · November 30, 2022, 12:28pm

With some of my observations on how we can use "single character intents of nothing" ([_ ' ~]), I think it's worth coming back to look at this BLANK!-as-space question one more time... (It's almost never just one more time, is it?)

Shades of Distinction In Non-Valued Intents - #3 by hostilefork

I suggest that anywhere that would take a series, having BLANK! act like an empty series would be a cool behavior.

So... how necessary is it that we give DELIMIT behavior for BLOCK!s or GROUP!s--empty or otherwise--that we would want to have parity with empty block for blanks?

Looking at just what's there right now, today, we have this behavior:

>> c: 1020
>> d: 304

>> unspaced ["a" ["b" c] d]
== "abc304"

>> spaced ["a" ["b" c] d]
== "a bc 304"

So it uses the contents of the block as raw material, but doesn't reduce it or obey the delimiter. With GET-BLOCK! you have a shorthand to reduce it, if you like:

>> unspaced ["a" :["b" c] d]
== "ab1020304"

>> spaced ["a" :["b" c] d]
== "a b1020 304"

I've always tended to think that having automatic behavior for BLOCK! here does more harm than good. More often than not, you just get nonsense when you wanted something else:

>> block: [1 2 <x> hello]

>> print ["Your block is:" block]
Your block is: 12<x>hello  ; ugh.

It's bad enough when the results get printed and you see the garbage. But worse when it just affects some string you're writing somewhere accidentally.

There's some efficiency gain if the enumeration of the block gets folded recursively into the process because you're not generating large intermediate strings and merging them... you're just building one big long string as you go. So that led me to think it might have value, and implemented it. But I've remained a skeptic... and...

...I Don't Think DELIMIT Should Have a BLOCK! Behavior

The case above shows the kind of nonsense I'm tired of.

Now that DELIMIT could heed SPREAD, that gives other ways to optimize the situation of wanting to fold a block's contents into a string formation.

There's a weird but kind of cool behavior that quoting an item will mold it. And you can use ^META to take most types up a quoting level:

>> str: "abc"
>> blk: [a b c]

>> print ["String is" ^str "and Block is" ^blk]
String is "abc" and Block is [a b c]

If you're wondering why not to use @str and @blk, it's because ECHO semantics used those for non-mold-oriented splicing, e.g. if you just had a plain string it would echo it:

>> echo [String is @str]
String is abc

Not that PRINT and ECHO need to line up, but it's something to think about.

I guess the long story short here being that I am not all that worked up over the loss of synonymousness between an empty block and blank in the DELIMIT dialect.

If BLANK! did anything besides act as a space, it would probably need to be an error. I guess I'll have to keep my eyes open for what the opportunities are for blank being synonymous with an empty block in these stringification scenarios... but my instinct is to say those aren't particularly interesting.

I think the apostrophe case could be used for when you really mean nothing--as a reified proxy for void--instead of blank.

The big question may actually be at the topmost level of FORM, e.g. FORM of an empty block vs. FORM of a blank.

>> form [a b]
== "a b"

>> form [_]
== " "  ; we are presuming this direction

>> form []
== ""  ; (or possibly null ?)

>> form _
== " "  ; or should it be forced to match what empty block is?

But there's no rule that form [_] has to match form _ any more than there is that form [[]] has to do the same thing as form [], so this needs to be weighed.

IngoHohmann · December 2, 2022, 10:03am

Makes sense for me like this.

hostilefork · June 26, 2023, 5:39pm

Experiments have run the gamut to where a plain BLANK! once would opt out of appending to a series, and you had to quote it to append it:

experiment>> append [a b] _
== [a b]

experiment>> append [a b] quote _
== [a b _]

But the satisfying world we have now is that anything you can pick out of a block will append as-is to another block. So adding a blank is additive:

>> append [a b] _
== [a b _]

It feels like a decent fit to say that appending a blank to a string is additive...since VOID and such are available if you want to opt out:

>> append "ab" _
== "ab "

Though it raises the question of what BINARY! should do:

>> append #{0102} _
== ???

It seems that adding a UTF-8 representation is the story for ASCII:

>> as binary! "AB"
== #{4142}

>> append #{0102} "AB"
== #{01024142}

But when it comes to integers, strings append the molded form...while binaries just add one byte, not the bytes of the formed string of the integer:

>> append "ab" 10
== "ab10"

>> append #{0102} 10
== #{01020A}

A bit of a mixed bag, which could support arguments that BLANK! could be "the space of binaries" (e.g. #{00}):

>> append #{0102} _
== #{010200}

But I think that's not so useful. It's more likely that the character representation of space is useful:

>> to binary! #" "
== #{20}

>> append #{0102} _
== #{010220}

Ren-C FIND and PARSE mechanics already allow you to search for strings in BINARY!, implicitly looking for the UTF-8 representation.
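
For anyone who wants to check the byte values, the same arithmetic can be done in Python. This is just an analogy using Python's bytes type, not a claim about how Ren-C's APPEND or FIND are implemented.

```python
binary = bytes([0x01, 0x02])

# Appending text means appending its UTF-8 encoding
with_text = binary + "AB".encode("utf-8")

# Appending the integer 10 adds a single byte, not the bytes of "10"
with_integer = binary + bytes([10])

# The proposed behavior for blank: append the UTF-8 encoding of a space
with_space = binary + " ".encode("utf-8")

# Searching a binary for a string means searching for its UTF-8 bytes
found_at = with_text.find("AB".encode("utf-8"))  # zero-based offset

print(with_text.hex(), with_integer.hex(), with_space.hex(), found_at)
```

The space comes out as the single byte 20 in hex, which is where the #{010220} answer comes from.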

What If _ Was Really The Canon Representation of Space Chars?

>> pick "ab " 3
== _

>> #" "
== _

>> char? _
== ~true~  ; isotope

>> space? _
== ~true~  ; isotope

I've mentioned that single character intents are on the rise... we could call quoted void (apostrophe) "blank" and it could be used in contexts where you want to say there's no value:

>> blank? first [']
== ~true~  ; isotope

>> blank
== '

You'd still have _ as an evaluator-inert dialecting part that can't be redefined. You just would need to use something like # or ' or ~ in cases where you had a slot that could be either any character -or- some out of band thing.

Off the top of my head, I can see a few problems. If _ became a character literal, it shouldn't be used for vacant spots in paths. So let's say paths start using this "new blank":

>> as block! first [/a]
== [' a]

Doesn't look too bad, but if you want to parse it you need double apostrophes to match those spots:

parse [/a] [into path! ['' 'a]]

This is because a single apostrophe is matched as void, e.g. matches without advancing the parse position. And it wouldn't work if you used a word to reference the blank:

>> parse [/a] [into path! [blank 'a]]
** Error: To match a QUOTED! you must use @blank

>> parse [/a] [into path! [@blank 'a]]  ; would need to do this...
== a

But it's not awful, and at least it doesn't silently treat the quote as a vanishing rule.

It's still kind of an interesting thought to make _ the literal char! of space. Still inert, still usable in dialects. It would leave ' and ~ as "friendly nothing" and "unfriendly nothing".

>> spread first [']  ; would return void

>> spread first [~]
** Error: Cannot spread meta-NONE

Either way, I think the conclusion here is that append #{0102} _ should be #{010220}