How Does BLANK! Interact With Strings?

hostilefork · June 26, 2023, 5:39pm

Experiments have run the gamut. At one point a plain BLANK! would opt out of appending to a series, and you had to quote it to actually append it:

experiment>> append [a b] _
== [a b]

experiment>> append [a b] quote _
== [a b _]

But the satisfying world we have now is that anything you can pick out of a block will append as-is to another block. So appending a blank is additive:

>> append [a b] _
== [a b _]

It feels like a decent fit to say that appending a blank to a string is additive...since VOID and such are available if you want to opt out:

>> append "ab" _
== "ab "

Though it raises the question of what BINARY! should do:

>> append #{0102} _
== ???

For strings, the story seems to be appending their UTF-8 representation (which for ASCII is one byte per character):

>> as binary! "AB"
== #{4142}

>> append #{0102} "AB"
== #{01024142}

But when it comes to integers, strings append the formed text of the integer...while binaries add just one byte, not the bytes of that formed text:

>> append "ab" 10
== "ab10"

>> append #{0102} 10
== #{01020A}

It's a bit of a mixed bag, which could support an argument that BLANK! should be "the space of binaries" (i.e. #{00}):

>> append #{0102} _
== #{010200}

But I don't think that's very useful. The character representation of space seems more likely to be useful:

>> to binary! #" "
== #{20}

>> append #{0102} _
== #{010220}
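To make the mixed bag concrete, here is a minimal Python model (not Ren-C code) of APPEND onto a BINARY! target, assuming the rules described above: an integer adds a single byte, a string adds its UTF-8 bytes, and BLANK! adds the space byte #{20} as proposed. The `Blank` class, the `BLANK` sentinel and `append_binary` are hypothetical names for illustration only, and rejecting integers outside 0-255 is this sketch's own choice.

```python
# Hypothetical Python model of APPEND onto a BINARY! target; not Ren-C code.

class Blank:
    """Hypothetical sentinel standing in for Ren-C's BLANK! (_)."""
    def __repr__(self):
        return "_"

BLANK = Blank()

def append_binary(target: bytes, value) -> bytes:
    """Return target with value appended, per the rules discussed above."""
    if isinstance(value, bool):
        raise TypeError("booleans are not modeled in this sketch")
    if isinstance(value, int):
        # An integer adds one byte, not the UTF-8 of its formed text.
        # This sketch rejects values that don't fit in a byte.
        if not 0 <= value <= 255:
            raise ValueError("integer out of byte range")
        return target + bytes([value])
    if isinstance(value, str):
        # A string adds its UTF-8 encoding.
        return target + value.encode("utf-8")
    if value is BLANK:
        # Proposed: blank adds the UTF-8 of the space character (0x20).
        return target + " ".encode("utf-8")
    raise TypeError(f"unsupported value: {value!r}")

print(append_binary(bytes([1, 2]), "AB").hex())   # 01024142
print(append_binary(bytes([1, 2]), 10).hex())     # 01020a
print(append_binary(bytes([1, 2]), BLANK).hex())  # 010220
```

The design point the model exposes is that only integers bypass the "append the UTF-8 of the text" rule, which is why treating blank as a space character keeps it aligned with strings rather than with integers.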

Ren-C's FIND and PARSE mechanics already allow you to search for strings in a BINARY!, implicitly looking for their UTF-8 representation.
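The idea of "searching a binary for a string means searching for its UTF-8 bytes" can be sketched in Python as below. This is only an analogy: `find_text_in_binary` is a hypothetical helper, and it returns a 0-based byte offset (or -1) rather than a series position as Ren-C's FIND does.

```python
def find_text_in_binary(haystack: bytes, needle: str) -> int:
    """Return the 0-based offset of needle's UTF-8 bytes in haystack, or -1."""
    return haystack.find(needle.encode("utf-8"))

# #{0102} followed by the UTF-8 of "AB" and of "é" (two bytes: C3 A9)
data = bytes([0x01, 0x02]) + "AB".encode("utf-8") + "é".encode("utf-8")

print(find_text_in_binary(data, "AB"))  # 2
print(find_text_in_binary(data, "é"))   # 4
print(find_text_in_binary(data, "x"))   # -1
```

Note that a non-ASCII character like "é" is matched as its multi-byte UTF-8 sequence, which is the same encoding APPEND would add.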

What If _ Was Really The Canon Representation of Space Chars?

>> pick "ab " 3
== _

>> #" "
== _

>> char? _
== ~true~  ; isotope

>> space? _
== ~true~  ; isotope

I've mentioned that standalone single characters are on the rise... we could call quoted void (apostrophe) "blank," and it could be used in contexts where you want to say there's no value:

>> blank? first [']
== ~true~  ; isotope

>> blank
== '

You'd still have _ as an evaluator-inert dialecting part that can't be redefined. You would just need to use something like # or ' or ~ in cases where you had a slot that could hold either any character -or- some out-of-band thing.

Off the top of my head, I can see a few problems. If _ became a character literal, it shouldn't be used for vacant spots in paths. So let's say paths start using this "new blank":

>> as block! first [/a]
== [' a]

Doesn't look too bad, but if you want to parse it you need double apostrophes to match those spots:

parse [/a] [into path! ['' 'a]]

This is because a single apostrophe is matched as void, i.e. it matches without advancing the parse position. And it wouldn't work if you used a word to reference the blank:

>> parse [/a] [into path! [blank 'a]]
** Error: To match a QUOTED! you must use @blank

>> parse [/a] [into path! [@blank 'a]]  ; would need to do this...
== a

But it's not awful, and at least it doesn't silently treat the quote as a vanishing rule.

It's still kind of an interesting thought to make _ the literal char! of space. Still inert, still usable in dialects. It would leave ' and ~ as "friendly nothing" and "unfriendly nothing".

>> spread first [']  ; would return void

>> spread first [~]
** Error: Cannot spread meta-NONE

Either way, I think the conclusion here is that append #{0102} _ should be #{010220}

hostilefork · September 20, 2024, 6:34am

I've since outlined the reasons why making _ the space character is a bad idea.

But there's another possibility: What if # was the representation of the space character instead of the NUL (0) character?

It would be a lot more useful. Space has a particularly ugly representation (#" ").

Because the 0 codepoint can't appear in Ren-C strings (only binaries), it hasn't gotten a lot of use. Any routine that deals in codepoints can use regular NULL when dealing with situations where there's no codepoint.

# is a bit heftier than _, but I've perhaps been too harsh about it:

>> print unspaced ["Have I been" # # # "too harsh?"]
Have I been   too harsh?

Admittedly, it doesn't look particularly "space-like". But is #" " an improvement?

`#` As Space Feels Pretty Obvious Right Now

I'll mention that one of the motivations for making # the NUL character was that it was used for a time as the "canon truthy value" (e.g. the value of an argless refinement when used, in contrast to the ~null~ antiform). I wanted as few accidents as possible from that value being accepted somewhere that didn't intend it, and the lame reason I made it NUL was just to block a few such cases...like appending to strings.

So it was intentionally useless.

The ~okay~ antiform does a much better job at that useless-except-branch-triggering role.

So Now, What About BLANK! ?

If we let go of the dream that _ represents space--and I make peace with the idea that people can learn that # means space--then it frees up underscore's meaning.

I'll mention that I've noticed a lot of single characters can stand on their own now:

symbols: [
    ? ~ * + - = < > | : / .   ; words
    $ & ' ^ @                 ; sigils (maybe % will be, also?)
    #                         ; now a space...issue (token?) 
    ~                         ; quasi-blank (trash)
    _                         ; blank
]

Of course there are some you can't do this with (brackets, parentheses, braces). And comma behaves oddly enough (sticking to what's on its left) that you probably don't want to use it standalone unless you're quoting it (',), where it won't have that behavior.

As for appending BLANK! to strings, I'm really on the fence, enough so that I'm starting to think it just has to be an error.

If it did work, I think it would have to be additive...because APPEND of a blank to a BLOCK! is additive. I don't think it can act like an empty string when appending to a string but add an element when appending to a block...that makes no sense.

So if you can do this:

>> append "ab" first [+]
== "ab+"

Then I kind of think that blank's underscore-ness is what we defer to here:

>> append "ab" first [_]
== "ab_"

But I'm not convinced this is helping anyone.

HOWEVER, I do think "_" should be what a blank FORMs to:

>> form _
== "_"

(I have some Big Thought on the meaning of FORM, which I'll write up at some point here.)
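If appending a blank to a string did work, the rule above amounts to "append the FORM of the value." A minimal Python model of that reading, assuming a hypothetical `Blank` sentinel for `_`, a hypothetical `Word` class for words like `+`, and hypothetical `form`/`append_string` helpers (none of these are Ren-C API):

```python
# Hypothetical model of "APPEND to a string appends the FORM"; not Ren-C code.

class Blank:
    """Hypothetical sentinel standing in for BLANK! (_)."""
    def __repr__(self):
        return "_"

BLANK = Blank()

class Word:
    """Hypothetical stand-in for a WORD! such as the + in first [+]."""
    def __init__(self, spelling: str):
        self.spelling = spelling

def form(value) -> str:
    """Return the formed text of a value (only a few types are modeled)."""
    if value is BLANK:
        return "_"  # proposed: a blank forms as its underscore
    if isinstance(value, Word):
        return value.spelling
    if isinstance(value, str):
        return value
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    raise TypeError(f"unsupported value: {value!r}")

def append_string(target: str, value) -> str:
    """Appending to a string adds the formed text of the value."""
    return target + form(value)

print(append_string("ab", Word("+")))  # ab+
print(append_string("ab", 10))         # ab10
print(append_string("ab", BLANK))      # ab_
```

Under this reading, blank gets no special case in APPEND at all; its behavior falls out entirely from what FORM returns.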