Weird WORD!s - Allow, But Escape Them?

My feeling is you should be able to build paths and tuples out of anything that's a valid WORD!. But is it time we had an escaping mode for "weird words"?

Let's say you didn't want <.> to be a TAG!, but rather a TUPLE! where the first element was < and the second was >.

We could do something like backquotes:

`<`.`>`

Having an escaping mode for words would open up more lexical space. For instance, I like the idea of allowing $FOO, $(FOO), $FOO/BAR, $[FOO BAR] etc. as another type...

But this would seem to kill off the idea of being able to have $ and $$ etc. as WORD!s, because you get into ambiguous situations... is $/foo a PATH! with the $ word in the first slot, or an ENV-PATH! with an empty first slot?

These ambiguities create problems for other things that might stand alone all right, because we don't want to have "second-class-citizen" WORD!s that can't appear in paths.

But what if we used backticks if they wind up in paths?

`$`/foo   ; PATH! with $ in the first slot
$/foo  ; ENV-PATH! with blank in the first slot

This could give us the likes of : and :: as operators...

>> `:`: does [print "I am colon!"]

>> :
I am colon!

>> type of :`:`
== #[datatype! action!]

It could work for other standalone characters, like @ and perhaps &. % could be the same (with %"" or %{} used for an empty file).

I feel like # and / may not be good candidates for this treatment; that would need more thought.

The point wouldn't be that you'd likely be going crazy with paths involving these characters, but rather that you might want to do interesting things with them standalone. It's just to put them on the map as legitimate words.

I definitely think there should be a way to escape weird words.
I'm not yet a big fan of this specific proposal, but can't think of anything better.

With a new rule pending that "word-active functions" have to be assigned with SET-PATH!s (e.g. /foo:), "weird words" like < and || and |> have to be able to appear in paths.

So unfortunately, this is now a blocking item. Time to revisit it.


Long ago, @Mark-hi suggested following Lisp's example and using vertical bars for escaping symbols:

>> |word with spaces|: 10

>> print ["The value is" |word with spaces|]
The value is 10

Maybe this seems more palatable. But Lisp uses backslash to escape its vertical bars. And we don't want to have to mangle things like some ["a" | "b"] into:

parse "ab" [some ["a" |\|| "b"]]

So we'd have to make some different tradeoffs in the design than Lisp.

All-Vertical-Bar Tokens Could Be Escaping-Exempt

For starters: if a token consists of only vertical bars, we might say we don't think of that as being escaped:

>> as text! '|
== "|"

>> as text! '|||
== "|||"

You might say "Hey, if it's that easy, why wouldn't Lisp have thought of that?"

It's because this sacrifices symbols that start and end with spaces. So if you think this way, you can't have things like:

>> as text! '| this wouldn't be possible |
== " this wouldn't be possible "

You'd have to escape the spaces one way or another:

>> as text! '|\ maybe this would work\ |
== " maybe this would work "

Similar issues arise with commas and other delimiters. We have to be able to decide if the sequence |, is starting some arbitrary WORD! with a comma as the first character, or if that's a vertical bar WORD! followed by a COMMA!. Same for |) where we have to decide if that is a vertical bar WORD! followed by a parenthesis that might close an existing group... or an arbitrary word with a right parenthesis as the first character.

It seems pretty fair to me to say that delimiter characters can't be in your "arbitrary word" at the beginning or end, at least without escaping. Though having dots in the words is a requested feature:

>> as text! |graham.likes.these|
== "graham.likes.these"

I don't know the exact boundaries here:

>> as text! |should this (work?)|
== "should this (work?)"

But seeing as we've gotten by for a pretty good while without such weirdness in WORD! at all, I don't think these edge cases need to be the focus of the present moment.

Note that even though foo and |foo| could be interchangeable, we can't say | and ||| are interchangeable. Instead, | would be interchangeable with |\||.
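
To make the interchangeability concrete, here is a rough Python sketch of how a token's spelling might be recovered under the rules described so far. The function name spelling_of is made up for illustration, and the assumption that a backslash makes any following character literal is mine; it is not an existing scanner.

```python
def spelling_of(token: str) -> str:
    """Recover the spelling of a WORD! token (hypothetical rules from this thread)."""
    # Tokens made only of vertical bars are exempt from escaping.
    if token and set(token) == {"|"}:
        return token
    # Anything not wrapped in a pair of bars is an ordinary word.
    if len(token) < 2 or not (token.startswith("|") and token.endswith("|")):
        return token
    body = token[1:-1]  # drop the wrapping bars
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # assumed: backslash makes the next char literal
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# foo and |foo| spell the same word; | pairs with |\|| and not with |||
print(spelling_of("foo") == spelling_of("|foo|"))
print(spelling_of("|") == spelling_of(r"|\||"))
print(spelling_of("|") == spelling_of("|||"))
```

Malformed input (such as a wrapped token whose closing bar is itself escaped) is not handled here.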

What About Things Like "Flags" <|

As with the "all bars" cases, we want to be able to use these unescaped as operators. For instance, "left flag" has been used to point to the left evaluation while eliding everything to the right:

>> 1 + 2 <| print "Hello" print "World"
Hello
World
== 3

But just because they would be unescaped when standing alone, doesn't mean we can get away with that everywhere. Let's imagine we want to make the PATH! whose first element is <| and whose second element is |>

Under new design proposals, if we just were to write <|/|>, that's actually a TAG!...the kind of tag that would permit internal < and >

>> as text! <|<|>
== "<"

So we'd get:

>> type of <|/|>
== #[datatype! tag!]

>> as text! <|/|>
== "/"

To try for the PATH! we want, let's think about hypothetically just wrapping the flags in vertical bars:

>> as block! '|<||/||>|
== [<| |>]

It's not the worst looking thing. :worried:

But if we're not escaping the vertical bar that's part of the flag, then how would it know that the first element should be <| instead of seeing the |<| pattern and assuming that meant it was < ?

One reasoning could be that as long as it hasn't hit a delimiter (] or ) or , or / or . or space or newline) then all vertical bars are considered content.

This policy would allow:

>> /|<||: does [print "Hello"]

>> <|
Hello
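
A rough Python sketch of that "bars are content until a delimiter" policy, only to see what it accepts and what it breaks. The function name is invented, and it only splits on / and . (the other delimiters are left out to keep it short).

```python
import re


def elements_under_content_policy(source: str) -> list[str]:
    """Split a path or tuple at every / or . and unwrap bar-wrapped elements.

    Assumption for illustration: no escapes exist, so every bar between
    the wrapping pair counts as content.
    """
    spellings = []
    for token in re.split(r"[/.]", source):
        wrapped = len(token) >= 2 and token[0] == "|" and token[-1] == "|"
        if wrapped and set(token) != {"|"}:
            token = token[1:-1]  # strip only the outermost pair of bars
        spellings.append(token)
    return spellings


print(elements_under_content_policy("|<||/||>|"))
print(elements_under_content_policy("|graham.likes.these|"))
```

The first call gives the two flags back, but the second one falls apart into three pieces, because the internal dots are always treated as delimiters.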

But again we have to ask what such assumptions rule out. And what it rules out are any internal delimiters--so no spaces, parentheses, brackets, dots, slashes.

That seems a bit much to throw out, for the sake of a few weird operators...if-and-when they happen to wind up in paths. So we'd probably have to escape this:

>> /|<\||: does [print "Hello"]

>> <|
Hello

Otherwise, Start-And-End Vertical Bar Must Be Escaped

So the idea is that <| won't require escaping when standing alone as a WORD!, but when put in a PATH! it would be written as |<\||.

But what about |<| ? That notation pretty clearly needs to be reserved for how < appears when put in a PATH!, otherwise things like </> would be a PATH! and not a TAG!. (It needs to be a TAG!)

Quick Look At Those Backticks Again

Backticks have the advantage that we seem to be 99% uninterested in them as symbols otherwise. So it's a bit less messy:

/`<|`: does [print "Hello"]

But they still might have other applications that are less esoteric. It seems wasteful to apply them here.

Especially because in practice, we have an evaluator to draw from, which could make things look better:

/('<|): does [print "Hello"]

Actually...that is probably what I would suggest doing. But if such paths can be formed, then we have to have a representation for them, like if they get COMPOSE'd:

>> compose the /('<|):
== /|<\||:  ; or whatever

It's not great...but when working in the limited medium of text, you wind up with these kinds of things. Hopefully it would be pretty rare.

Tentative Strategy

I think that it's rare enough that people will be putting vertical-bar-words in paths and tuples that we can just go ahead and say you always escape them.

Sure, we could say that |.| is a TUPLE! instead of a representation of a single-character period WORD! But I think that it will be much more often desirable to have |.word| without having to write |\.word|

So if you want two vertical bars in a TUPLE! you'd say |\||.|\|| - and that should be reasonable discouragement against saying it too often. :slight_smile:
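
For what it's worth, the output side of this could be sketched in Python roughly as follows. The name escape_word and the EDGE_DELIMITERS set are hypothetical; the sketch assumes bars and backslashes are always escaped, and that a handful of delimiter characters are escaped only when they are the first or last character.

```python
# Assumed set of characters that need a backslash at the start or end of a word.
EDGE_DELIMITERS = " ,()[]"


def escape_word(spelling: str) -> str:
    """Render a spelling in bar-wrapped form for use inside a PATH! or TUPLE!."""
    out = []
    last = len(spelling) - 1
    for i, ch in enumerate(spelling):
        at_edge = i == 0 or i == last
        if ch in "|\\" or (at_edge and ch in EDGE_DELIMITERS):
            out.append("\\")  # escape bars anywhere, delimiters only at the edges
        out.append(ch)
    return "|" + "".join(out) + "|"


# A TUPLE! of two vertical-bar words, and a flag as it would appear in a PATH!
print(".".join([escape_word("|"), escape_word("|")]))
print(escape_word("<|"))
print(escape_word(".word"))
```

Dots are left unescaped here, which matches wanting |.word| rather than |\.word|.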

But spaces and commas and parentheses and such will need to be escaped inside your "weird escaped words", at least at the start and end.

>> block: transcode "(|) (|)"
== [(|) (|)]

>> type of first block
== #[datatype! group!]

>> length of first block
== 2

>> first first block
== |

That is, this is not a GROUP! holding a word with a spelling of ") ("

If it starts AND ends with a vertical bar, it's an escape notation unless it's all vertical bars.

Hopefully this is enough of a sketch to enable pushing through to the next dilemma...

Another idea might be to have backticked (or something else) braced strings be the notation for weird words.

`{word.with.dots}
`{|||}
`{ and with {} inside and spaces around }

This way there's no need to invent new escaping rules, because it's already there in strings.

I do agree the same escaping rules should be used, but I think the caret-escaping is an increasingly poor idea.

Carl wanted to move away from it and use C's escaping:

String character escapes use C notation. They use backslash notation, for example "\n" for newline and "\t" for tab.

I definitely feel that with caret taking on more of a purpose in the language now for ^META, sticking with the status quo for escaping in strings may be unwise.

Needs thought...