Alternate String Forms if {...} Becomes An Array Type

hostilefork · October 18, 2021, 1:06am

I've brought up in earnest something I call "The FENCE! Proposal". It involves retaking {...} for an object-like purpose.

The twist is that it's really another ANY-ARRAY! type, peer to [...] and (...) - but geared toward being a shorthand for dialected MAKE (defaulting toward representing key/value stores). Then as with [...] and (...), dialects would be free to override {...} in their own way.

A sad consequence of this would be the loss of the alternative string form {...}. Discussions of what to use instead were lengthier than the discussion of the proposal itself, so I've moved them onto their own thread here...starting with my initial musing...

What Other Asymmetric String Technique Might Be Used?

Some languages have weird techniques, like even letting you make up your own delimiters by whatever you use in front of the quote:

str: ?"This says "quote followed by question mark" terminates"?
str: |"This says "quote followed by bar" terminates"|
str: xyz"This says "quote followed by zyx" terminates"zyx

(Not making that up.)
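As a rough illustration of how little machinery the made-up-delimiter style needs, here is a minimal Python sketch. It assumes the closing delimiter is a double quote followed by the prefix reversed (as in the xyz...zyx line above). The name scan_prefixed_string is hypothetical, and this does not reproduce any particular language's scanner.

```python
def scan_prefixed_string(src: str):
    """Scan one prefix-delimited string from the start of src.

    Returns (content, rest_of_source). Assumption: the string opens with
    PREFIX + '"' and closes with '"' + PREFIX reversed.
    """
    quote_at = src.find('"')
    if quote_at == -1:
        raise ValueError("no opening quote")

    prefix = src[:quote_at]           # e.g. '?', '|' or 'xyz'
    closer = '"' + prefix[::-1]       # e.g. '"?', '"|' or '"zyx'

    body_start = quote_at + 1
    end = src.find(closer, body_start)
    if end == -1:
        raise ValueError(f"no closing {closer!r}")

    # Everything between the delimiters is taken literally, so plain
    # double quotes inside the body need no escaping.
    return src[body_start:end], src[end + len(closer):]
```

Because the closer is derived from the prefix, ordinary quotes inside the body pass through untouched. The only sequence the content cannot contain is the closer itself.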

The risk of using a symbol like | is that even if it looks good in isolation, you might not like it in a parse rule, like rule1 | |"some string"| | rule2. A less-used character might be better.

Or perhaps those who really feel the need for another asymmetric string delimiter should assign a couple keys in their editor to unicode:

str: «Maybe People «who really care» could use "Chevrons"?»

Who knows. One place to look is the topic of "HEREDOC"

Another place to look is the List of open/close paired braces/brackets/quotes in Unicode

giuliolunati · September 30, 2021, 9:32am

About 90% of my use of {...} is for multiline strings, but I think "..." could be multiline:

"This is
a multiline
string"

BTW, I don't like ^-escape, I'd prefer \-escape as in ASON/AltScript

giuliolunati · September 30, 2021, 9:43am

And if we require spaces around strings (and maybe blocks?), then we open the road to various string flavours:

^"I'm an ^"^^-escaped^" string, ^^ and ^" must be escaped, \ is literal!"

\"I'm an \"\\-escaped\" string, \\ and \" must be escaped, ^ is literal!"

"I'm a ""raw"" string, ^ and \ are literal, but "" must be doubled!"

hostilefork · September 30, 2021, 2:44pm

A place that gets hit particularly hard by losing an alternative delimiter is the API.

rebElide("print {We've relied on this :-( and sucks to lose it}");

If quotes are all we have for strings, mixing inside other language's quotes looks bad fast:

rebElide("print \"We've relied on this :-( and sucks to lose it\"");
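To make the escaping cost concrete, here is a small Python sketch that wraps Rebol source in a C string literal and counts the characters that had to be added. The helper names are hypothetical, and the sketch only escapes backslashes and double quotes (an assumption; real C literals have more rules, such as for newlines).

```python
def as_c_string_literal(code: str) -> str:
    """Wrap code in double quotes, escaping backslash and double quote.

    Assumption: only these two characters are handled.
    """
    escaped = code.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def escape_overhead(code: str) -> int:
    """Number of escape characters added, not counting the outer quotes."""
    return len(as_c_string_literal(code)) - len(code) - 2
```

With this model, a braced string embeds with zero overhead, while each double quote in the Rebol source costs one extra backslash on the C side.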

One axis for attacking this problem is Sea of Words and Echo, where if what you're doing is not too antagonistic (like the :-( above) you can imagine operators that turn blocks into strings.

In PRINT's case, it might be that @ does exactly this...suppressing the REDUCE:

rebElide("print @[This might just form the contents as-is.]");

Chevrons might actually not be a terrible answer to the issue for people doing a lot of editing in the API:

rebElide("print «We've relied on this :-( and sucks to lose it»");

The idea of sacrificing << and >> themselves to strings is a weird one:

rebElide("print <<We've relied on this :-( and sucks to lose it>>");

It looks too much like tags.

Of course, there's always... backquote... which may just be the best of the bad options:

rebElide("print `We've relied on this :-( and sucks to lose it`");

In any case, this mixture issue is a real pain point on losing braces for strings. We need to keep apostrophe for what it is, so the options are pretty limited.

Carving <{...}> out of legal tags is another alternative to <<...>> which might come off as a little bit "less taglike"?

rebElide("print <{We've relied on this :-( and sucks to lose it}>");

If you squint, <{ almost looks like its own compound symbol. A relative of ﴾ornate parentheses﴿ or perhaps ⦓Arc Brackets⦔

Does It Seem Worth The Sacrifice?

For all that freeing up FENCE! does, can we suffer through <{...}> strings and/or backticks?

It feels like a tradeoff worth pursuing.

I think a nice thing about <{ is that it isn't something that would be sought after as an operator in its own right, the way << would.

giuliolunati · September 30, 2021, 9:01pm

I prefer ` over <{ }>

iArnold · October 1, 2021, 6:35am

Oh please no backticks, that key should be removed from all keyboards worldwide as far as I am concerned.

I am also "attached" to our use of braces.

Braces are bad otherwise because of their similarity to parentheses; sometimes it is hard to see that they are used.
The use of curly brackets in this language is one of the things I particularly like about Rebol.

The small list of Rebol features that kept me:

1-based
case insensitive
no curly braces, only for multiline string
no parentheses needed for function call / parameter passing
many data types, no need to recreate all those trivial ones over and over again
Almost all functionality out of the box, no import needed even for much used "library" things
pleasing syntax, flexibility

hostilefork · October 1, 2021, 6:58am

I've never had a problem distinguishing them.

Do note that in dialects, the "sea of words" means that this new FENCE! could be used as a kind of string.

>> thing: '{"Almost anything" you write, it's possible to LOAD now.}

>> type of thing
== #[datatype! fence!]

>> first thing
== "Almost anything"

>> last thing
== now.

>> type of last thing
== #[datatype! tuple!]

>> print thing  ; let's say it interprets FENCE! as needing to be FORM'd
"Almost anything" you write, it's possible to LOAD now.

Very interestingly... this can be made compatible... because '{quoted braces} drops the quote under evaluation in historical Ren-C and gives you a TEXT!. So a construct could be rigged to take what quoted braces produce either way, so long as it was LOAD-able.

Of course some things don't work, like semicolons. And unless something changes in the plan, parentheses directly adjoining words wouldn't work either, like {foo()}

It does mean that in things like module headers or function specs, there could be a rule like saying that fences are handled like strings.

foo: func [
    {Maybe this still works, with "strings" in it, if you want it to?}
    return: [integer!]
    args...
][...]

Certainly tolerance of this would be needed as a compatibility measure. But maybe it's more than compatibility. Maybe it's an enduring duality of what "fences" are for.

giuliolunati · October 1, 2021, 11:40am

Another use of {...} is in command line:
bash> r3 script.reb string: {the-string}
<{...}> would need quoting:
bash> r3 script.reb string: "<{the-string}>"

Same issue for backticks.

hostilefork · October 3, 2021, 12:29pm

Angle + Bracket

Quoting giuliolunati, from "{ Rethinking Braces }... as an array type?":
I prefer ` over <{ }>

Quoting giuliolunati:
Another use of {...} is in command line:
bash> r3 script.reb string: {the-string}
<{...}> would need quoting:
bash> r3 script.reb string: "<{the-string}>"

As you point out, backticks have the same problem. But as I've said, one of the potentially nice things about calling the new array type FENCE! could be that we stay fluid on its dialecting purpose...and sometimes it's used for things that could be text but contain quoted items. Like I say--that won't help you with semicolons or anything non-LOADable, but it may cover a lot of cases.

If you look at things like module headers, I think the <{ }> is not the worst thing in the world:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: <{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }>
    License: <{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }>
    Description: <{
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
   }>
]

Compare with a backtick, which I do find to be worse, personally. It looks like... there's dirt on the screen. (So actually agreeing with @iArnold on that, it's a bad character.)

One other advantage of <{...}> is that it is LOAD-compatible with R3-Alpha (more relevantly, older Ren-Cs), which would consider it to be a TAG!.

Red scans <{}> as < {} > at this time, but perhaps they'd decide that having curly braces in tags is more valuable than that compression. Probably not.

Backticks

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: `
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    `
    License: `
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    `
    Description: `
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
   `
]

Double quotes just don't fit the bill because quotes are used too freely inside both text and code samples, and we've gotten too used to that.

Backslash with Indentation?

I've really been wishing multiline text literals could have something like what YAML has, where indentation drives it. YAML uses | which is clean, but we'd hate to lose the symbol for other purposes. Maybe backslash?

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module

    Rights: \
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors

    License: \
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0

    Description: \
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
]

Indentation-driven answers have the benefit of not needing to worry about escape sequences.
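A minimal Python sketch of the indentation-driven idea, to show why no escapes are needed. It assumes a line ending in a lone backslash opens the text, and that the text runs until the first non-blank line indented no deeper than that opening line. The function name read_indented_text is hypothetical, and none of this is settled syntax.

```python
import textwrap


def read_indented_text(lines, start):
    """Read an indentation-delimited text block.

    lines[start] must end in a lone backslash, e.g. 'Rights: \\'.
    Returns (text, index_of_next_unconsumed_line).
    """
    header = lines[start]
    if not header.rstrip().endswith("\\"):
        raise ValueError("line does not open an indented text block")
    base = len(header) - len(header.lstrip())

    body = []
    i = start + 1
    while i < len(lines):
        line = lines[i]
        indent = len(line) - len(line.lstrip())
        # A non-blank line at or left of the header's indentation ends the text
        if line.strip() and indent <= base:
            break
        body.append(line)
        i += 1

    # Trailing blank lines are treated as layout, not content
    while body and not body[-1].strip():
        body.pop()

    # Strip the common margin; relative indentation inside the text survives
    return textwrap.dedent("\n".join(body)), i
```

Since the end of the text is decided purely by indentation, characters like }, " or \ inside the body are never inspected, which is the "no escape sequences" benefit in code form.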

Pound Sign Plus Braces

I'll also point out that I'd been talking about a new notation for BINARY!, e.g. &{...} in order to free up #{...} as an ISSUE!+TOKEN!+CHAR! notation, so that would be an immutable string literal. (Note that ASON suggests Carl wanted to make all string literals immutable.) But I think #{} looks pretty bad for the above:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: #{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: #{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: #{
       Note that this would not have the benefit of braces not escaping:
       
            printf("The char is ^} and must be escaped\n");

       So that's a drawback.
   }
]

Something about that I find more jarring than <{...}>, and it also doesn't have the advantage of not needing to escape single braces.

giuliolunati · October 3, 2021, 1:46pm

No syntactic indentation please, keep the freeform style of Rebol !!

giuliolunati · October 3, 2021, 6:25pm

I'd like "named quotes":
WORD"..."WORD
e.g. with --"..."--

--"2^3
"string""--
=
"2^^3^/^"string^""

and with ++"..."++

++" --"string"-- "++
=
" --^"string^"-- "

giuliolunati · October 3, 2021, 6:30pm

This could be

Rights: --"
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
"--

hostilefork · October 4, 2021, 7:50am

Another possibility is to have a character that means "make a string to the end of line", and then putting these in BLOCK!s:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: [
      \ Copyright 2012 REBOL Technologies
      \ Copyright 2017-2021 Ren-C Open Source Contributors
    ]
    License: [
      \ Licensed under the Apache License, Version 2.0
      \ See: http://www.apache.org/licenses/LICENSE-2.0
    ]
    Description: [
      \ We've gotten used to writing anything we want inside of braced
      \ strings...this gives us more freedom with single braces:
      \
      \      printf("The char is } and that's okay\n");
      \
      \ So there's a benefit to it.
   ]
]

Several languages (like Haskell) basically don't have a better answer for multi-line strings than "apply an operator that inserts newlines onto an array of strings". There's a specialization of DELIMIT/TAIL that does that, and it's called NEWLINED.

>> newlined ["one" "two"]
== "one^/two^/"
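For readers unfamiliar with the operation, here is a tiny Python model of the behavior shown above (in Rebol notation ^/ is a newline). The names delimit_tail and newlined are stand-ins chosen to mirror the Ren-C words; this is only a sketch of the observable result, not how Ren-C implements it.

```python
def delimit_tail(delimiter, items):
    """Join items, placing the delimiter after every item, including the last."""
    return "".join(item + delimiter for item in items)


def newlined(items):
    """Specialization of delimit_tail that uses a newline as the delimiter."""
    return delimit_tail("\n", items)
```

The trailing newline is the point of the /TAIL behavior: every element becomes a complete line, so a block of per-line strings turns into ordinary multi-line text.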

It looks kind of jarring to use a character that is not straight up-and-down.

; We're used to seeing multiple lines like this,
; and I think it's more comfortable because of the verticality.

Perhaps a lone exclamation mark? We might even argue that if a FENCE! sees strings inside of it, then the evaluator will make a string out of what it gets...

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: {
      ! Copyright 2012 REBOL Technologies
      ! Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: {
      ! Licensed under the Apache License, Version 2.0
      ! See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: {
      ! We've gotten used to writing anything we want inside of braced
      ! strings...this gives us more freedom with single braces:
      !
      !      printf("The char is } and that's okay\n");
      !
      ! So there's a benefit to it.
   }
]

So imagine FENCE! having this reaction:

>> {x: 10 y: 10 + 10}
== object!##{x: 10 y: 20}

>> {"one" "two"}
== "one^/two^/"

This would mean that FENCE! alone wouldn't represent a MAP!, but maybe you make mappings with "double-fences" ?

>> {{"one" "two"}}
== map!##{{"one" "two"}}

We've talked about how some serializations would turn in-memory representations into source code, and maybe that could be true of this as well.

>> var: {
    ! Line one
    ! Line two
}
== text!{
    ! Line one
    ! Line two
}

>> serialize var
== {
    ! Line one
    ! Line two
}

It knows that the data produced by evaluation is no longer a FENCE! but a TEXT!, however you can use operations that go back to source...just like with OBJECT!s.

This has the slight advantage of being Redbol compatible, although it doesn't address single-line braced strings.

Weird, yes, but it's good to just make sure all the options are examined.

hostilefork · October 4, 2021, 8:52am

All this is showing just how valuable and rare the ASCII-range asymmetric delimiters really are.

I'll mention that we're using -- for dumping abstractions at the moment, and it's been pretty good for that:

>> block: [a b c]

>> -- block
== [a b c]

We'd have to use something else, but I guess there's nothing wrong with going back to ??

If we introduce multiple delimiter kinds, we start getting problems on the representation...

What happens when someone appends a --"..."-- string to a ++"..."++ string? How should the appended result be molded?

hostilefork · October 4, 2021, 8:55am

The raw string idea, along with no escaping by default, seems good. I like the doubled quotes being an option.

giuliolunati · October 4, 2021, 9:22am

Well, having
"..." as multiline quasi-raw (with "" representing ")
\"..." as multiline \-escaped
(and maybe ^"..." as multiline ^-escaped ?)
would be good enough for me
(And I think I'll never need to use <{ ... }>)
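A minimal Python sketch of how the three flavours could be decoded, mostly to show that one loop covers all of them. It assumes each flavour has exactly one escape character (^, \ or a doubled "), and that the only escapable things are that character and the double quote. The name decode_flavoured is hypothetical, and multi-character escapes such as ^/ are deliberately left out.

```python
def decode_flavoured(literal: str) -> str:
    """Decode one string literal in any of the three sketched flavours."""
    if literal.startswith('^"'):
        opener, escape = '^"', "^"       # caret-escaped
    elif literal.startswith('\\"'):
        opener, escape = '\\"', "\\"     # backslash-escaped
    elif literal.startswith('"'):
        opener, escape = '"', '"'        # quasi-raw: only "" is special
    else:
        raise ValueError("not a string literal")
    if len(literal) <= len(opener) or not literal.endswith('"'):
        raise ValueError("missing closing quote")

    body = literal[len(opener):-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            if ch == '"':
                raise ValueError("unescaped quote inside string")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling escape at end of string")
        nxt = body[i + 1]
        if nxt != escape and nxt != '"':
            raise ValueError(f"unsupported escape {escape}{nxt}")
        out.append(nxt)   # the escaped character stands for itself
        i += 2
    return "".join(out)
```

In this model the prefix alone tells the scanner which character is special, so the other flavours' escape characters stay literal.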

iArnold · October 6, 2021, 8:50pm

Why again did Rebol choose to separate single-line strings from multi-line strings?
A string is a string, even if it contains newlines.

hostilefork · October 7, 2021, 3:39am

Double-quoted strings are fairly universal so supporting them was to look familiar, and because having two forms of string delimiter means you can write "}" and {"}...picking the best one for your content.

Having double-quoted strings not be multi-line means you'll get easier-to-interpret errors when you forget the closing quote. If they could span lines, odds are you'd have another set of quotes somewhere later in your program for the stray quote to pair with, so you'd get a closed string no matter what...and your error would be something weirder than "no close quote for string". It would be the result of trying to scan what was supposed to be the insides of another string.
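The error-reporting argument can be sketched with a toy scanner in Python. It assumes strings are delimited only by double quotes and have no escapes. The function name scan_strings and the error wording are made up for illustration; this is not the actual Rebol scanner.

```python
def scan_strings(source: str, multiline: bool):
    """Collect the contents of double-quoted strings in source.

    With multiline=False, a newline inside a string is an error reported
    at the line where the string started.
    """
    strings = []
    i = 0
    line = 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
            i += 1
        elif ch == '"':
            start_line = line
            j = i + 1
            while j < len(source) and source[j] != '"':
                if source[j] == "\n":
                    if not multiline:
                        raise ValueError(
                            f"no close quote for string starting on line {start_line}"
                        )
                    line += 1
                j += 1
            if j >= len(source):
                raise ValueError(
                    f"no close quote for string starting on line {start_line}"
                )
            strings.append(source[i + 1:j])
            i = j + 1
        else:
            i += 1
    return strings
```

Given a source whose first line is missing its closing quote and whose third line has a correct string, the single-line mode reports the problem on line 1. The multi-line mode instead pairs the stray quote with the opening quote on line 3, treats the real string's contents as code, and only complains about the final quote on line 3.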

johnk · October 19, 2021, 12:39am

Given all the options my vote would be for the triple quote - """
The majority of languages use " for strings and the triple makes the fact that it is a "special" string stand out. It also makes it easier to spot the closing quotes in long multi-line strings. Python uses """ and is very popular which will aid readability for new users.
Thanks, John

hostilefork · October 19, 2021, 12:51am

Note that triple quotes don't help as a new answer for the API example (they'd make it exponentially worse if you tried to use them!)

rebElide("print {If braces are given up, how to replace this?}");

Remembering that I'm trying to avoid a lot of ugly API calls like:

rebElide("print \"If braces are given up, how to replace this?\"");

C doesn't have any alternatives there besides quotes. And JavaScript has apostrophe-delimited strings as an alternative, but we use apostrophe a lot inside code so that's not good either.

Rebol is so saturated in its use of symbols that creative choices are unavailable.

obj: (|x: 10, y:20|)  ; for instance, (| "banana clips" |)

We believe | and || are legal symbols in arrays, and (||) is a 1-element GROUP! with || in it, while (| |) is a 2-element GROUP! of two |'s.

It's tough here, and <{ and }> so far still seem like a practical answer which allows braces to take on a new role. But we don't really know if that role will pay off yet, so it's worth continuing to think.