Alternate String Forms if {...} Becomes An Array Type

I've brought up in earnest something I call "The FENCE! Proposal". It involves retaking {...} for an object-like purpose.

The twist is that it's really another ANY-ARRAY! type, peer to [...] and (...) - but with a bias toward representing key/value stores. Then as with [...] and (...), dialects would be free to override {...} in their own way.

A sad consequence of this would be the loss of the alternative string form {...}. Discussions of what to use instead were lengthier than the discussion of the proposal itself, so I've moved them onto their own thread here...starting with my initial musing...

What Other Asymmetric String Technique Might Be Used?

Some languages have weird techniques, like even letting you make up your own delimiters by whatever you use in front of the quote:

str: ?"This says "quote followed by question mark" terminates"?
str: |"This says "quote followed by bar" terminates"|
str: xyz"This says "quote followed by zyx" terminates"zyx 

(Not making that up.)
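As a rough illustration of how such made-up delimiters could be scanned, here is a minimal Python sketch. The rule it implements is only an assumption read off the three examples above (whatever precedes the opening quote is the prefix, and the string ends at the first quote followed by that prefix reversed); `scan_prefixed_string` is a hypothetical name, not anything from an existing scanner.

```python
def scan_prefixed_string(source: str, start: int = 0) -> tuple[str, int]:
    """Scan one prefix-delimited string beginning at `start`.

    Assumed rule: the characters before the opening quote form the prefix,
    and the string ends at the first quote that is immediately followed by
    the prefix reversed. Returns (content, index just past the terminator).
    """
    quote = source.find('"', start)
    if quote == -1:
        raise ValueError("no opening quote found")
    prefix = source[start:quote]
    if not prefix or any(ch.isspace() for ch in prefix):
        raise ValueError("expected a non-empty prefix directly before the quote")
    terminator = '"' + prefix[::-1]          # e.g. xyz" ... "zyx
    end = source.find(terminator, quote + 1)
    if end == -1:
        raise ValueError(f"missing terminator {terminator!r}")
    return source[quote + 1:end], end + len(terminator)


content, _ = scan_prefixed_string('|"This says "quote followed by bar" terminates"|')
print(content)
```

Interior quotes need no escaping because only the quote-plus-prefix pair ends the string.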

The risk of using a symbol like | is that even if it looks good in isolation, you might not like it in a parse rule, like rule1 | |"some string"| | rule2. A less-used character might be better.

Or perhaps those who really feel the need for another asymmetric string delimiter should assign a couple keys in their editor to unicode:

str: «Maybe People «who really care» could use "Chevrons"?»

Who knows. One place to look is the topic of "HEREDOC"

Another place to look is the List of open/close paired braces/brackets/quotes in Unicode

I 90% use {...} for multiline strings, but I think "..." could be multiline:

"This is
a multiline
string"

BTW, I don't like ^-escape, I'd prefer \-escape as in ASON/AltScript

And, if we'll impose space around strings (and blocks maybe?) then we'll open the road to various string flavours:

^"I'm an ^"^^-escaped^" string, ^^ and ^" must be escaped, \ is literal!"

\"I'm an \"\\-escaped\" string, \\ and \" must be escaped, ^ is literal!"

"I'm a ""raw"" string, ^ and \ are literal, but "" must be doubled!"
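To make the three flavours concrete, here is a small Python sketch of a decoder. It covers only the escapes named in the examples above (the escape character itself and the quote), and `decode_string_literal` is a hypothetical helper, not part of any existing Rebol or Ren-C scanner.

```python
def decode_string_literal(literal: str) -> str:
    # Assumed flavours, taken from the examples in the post:
    #   ^"..."  caret-escaped: ^^ is a caret, ^" is a quote, backslash is literal
    #   \"..."  backslash-escaped: \\ is a backslash, \" is a quote, caret is literal
    #   "..."   raw: only a doubled quote is special
    if literal[:1] in ("^", "\\"):
        escape, body = literal[0], literal[1:]
    else:
        escape, body = None, literal
    if len(body) < 2 or body[0] != '"' or body[-1] != '"':
        raise ValueError("literal must be wrapped in double quotes")
    inner = body[1:-1]

    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        nxt = inner[i + 1] if i + 1 < len(inner) else ""
        if escape is None:
            if ch == '"':
                if nxt != '"':
                    raise ValueError("a lone quote must be doubled in a raw string")
                out.append('"')
                i += 2
                continue
        elif ch == escape:
            if nxt not in (escape, '"'):
                raise ValueError(f"unsupported escape {ch + nxt!r}")
            out.append(nxt)
            i += 2
            continue
        elif ch == '"':
            raise ValueError("a quote must be escaped in this flavour")
        out.append(ch)
        i += 1
    return "".join(out)


print(decode_string_literal('"a ""raw"" string"'))
```

The prefix character is what selects the flavour, which is why the idea depends on strings being set off by whitespace.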

A place that gets hit particularly hard by losing an alternative delimiter is the API.

rebElide("print {We've relied on this :-( and sucks to lose it}");

If quotes are all we have for strings, mixing inside other language's quotes looks bad fast:

rebElide("print \"We've relied on this :-( and sucks to lose it\"");

One axis for attacking this problem is Sea of Words and Echo, where if what you're doing is not too antagonistic (like the :-( above) you can imagine operators that turn blocks into strings.

In PRINT's case, it might be that @ does exactly this...suppressing the REDUCE:

rebElide("print @[This might just form the contents as-is.]");

Chevrons might actually not be a terrible answer to the issue for people doing a lot of editing in the API:

rebElide("print «We've relied on this :-( and sucks to lose it»");

The idea of sacrificing << and >> themselves to strings is a weird one:

rebElide("print <<We've relied on this :-( and sucks to lose it>>");

It looks too much like tags.

Of course, there's always... backquote... which may just be the best of the bad options. :-(

rebElide("print `We've relied on this :-( and sucks to lose it`");

In any case, this mixture issue is a real pain point on losing braces for strings. We need to keep apostrophe for what it is, so the options are pretty limited.

Carving <{...}> out of legal tags is another alternative to <<...>> which might come off as a little bit "less taglike"?

rebElide("print <{We've relied on this :-( and sucks to lose it}>");

If you squint, <{ almost looks like its own compound symbol. A relative of ﴾ornate parentheses﴿ or perhaps ⦓Arc Brackets⦔

Does It Seem Worth The Sacrifice?

For all that freeing up FENCE! does, can we suffer through <{...}> strings and/or backticks?

It feels like a tradeoff worth pursuing.

I think a nice thing about <{ is that it isn't something that would be sought after as an operator in its own right, the way << would.

I prefer ` over <{ }>

Oh please no backticks, that key should be removed from all keyboards worldwide as far as I am concerned.

I am also "attached" to our use of braces.

Braces are otherwise a bad choice because they look so similar to parentheses; sometimes it is hard to see that they are being used.
The use of curly brackets in this language is one of the things I particularly like about Rebol.

The small list of Rebol features that kept me:

  • 1-based
  • case insensitive
  • no curly braces, except for multiline strings
  • no parentheses needed for function call / parameter passing
  • many data types, no need to recreate all those trivial ones over and over again
  • Almost all functionality out of the box, no import needed even for much used "library" things
  • pleasing syntax, flexibility

I've never had a problem distinguishing them.

Do note that in dialects, the "sea of words" means that this new FENCE! could be used as a kind of string.

>> thing: '{"Almost anything" you write, it's possible to LOAD now.}

>> type of thing
== #[datatype! fence!]

>> first thing
== "Almost anything"

>> last thing
== now.

>> type of last thing
== #[datatype! tuple!]

>> print form thing
"Almost anything" you write, it's possible to LOAD now.

Maybe even a GET-FENCE! will form it?

>> :{Like "this", maybe?}
== "Like \"this\" maybe?"  ; or "Like ""this"" maybe?"

Of course some things don't work, like semicolons. And unless something changes in the plan, parentheses directly adjoining words wouldn't work either, like {foo()}

It does mean that in things like module headers or function specs, there could be a rule like saying that fences are handled like strings.

foo: func [
    {Maybe this still works, with "strings" in it, if you want it to?}
    return: [integer!]
    args...
][...]

Certainly tolerance of this would be needed as a compatibility measure. But maybe it's more than compatibility. Maybe it's an enduring duality of what "fences" are for.


Another use of {...} is in command line:
bash> r3 script.reb string: {the-string}
<{...}> would need quoting:
bash> r3 script.reb string: "<{the-string}>"

Same issue for backticks.

If you look at things like module headers, I think the <{ }> is not the worst thing in the world:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: <{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }>
    License: <{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }>
    Description: <{
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
   }>
]

Compare with a backtick, which I do find to be worse, personally. It looks like... there's dirt on the screen. (So actually agreeing with @iArnold on that, it's a bad character.)

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: `
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    `
    License: `
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    `
    Description: `
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
   `
]

Double quotes just don't fit the bill, because quotes are used too freely inside both text and code samples, and we've gotten too used to that.

I've really been wishing multiline text literals could work the way YAML's do, where indentation drives them. YAML uses | which is clean, but we'd hate to lose that symbol for other purposes. Maybe backslash?

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module

    Rights: \
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors

    License: \
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0

    Description: \
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
]

Indentation-driven answers have the benefit of not needing to worry about escape sequences.
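Here is a minimal Python sketch of how an indentation-driven literal might be collected. The rule is an assumption modelled loosely on YAML block scalars: every following line indented deeper than the header line belongs to the text, and the common indentation is stripped. `read_indented_text` is a hypothetical name used only for illustration.

```python
import textwrap


def read_indented_text(lines: list[str], header_index: int) -> tuple[str, int]:
    """Collect the text block that follows the header line (e.g. `Rights: \\`).

    Returns (text, index of the first line that is not part of the block).
    """
    header = lines[header_index]
    base = len(header) - len(header.lstrip(" "))
    collected = []
    i = header_index + 1
    while i < len(lines):
        line = lines[i]
        indent = len(line) - len(line.lstrip(" "))
        # A non-blank line at or left of the header's indent ends the block.
        if line.strip() and indent <= base:
            break
        collected.append(line)
        i += 1
    # Blank lines at the end are separators, not content.
    while collected and not collected[-1].strip():
        collected.pop()
    return textwrap.dedent("\n".join(collected)), i


header_lines = [
    "    Rights: \\",
    "        Copyright 2012 REBOL Technologies",
    "        Copyright 2017-2021 Ren-C Open Source Contributors",
    "",
    "    License: \\",
    "        Licensed under the Apache License, Version 2.0",
]
text, next_index = read_indented_text(header_lines, 0)
print(text)
```

Nothing inside the block is inspected, so quotes, braces and backslashes all pass through untouched; the cost is that whitespace becomes significant.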

I'll also point out that I'd been talking about a new notation for BINARY!, e.g. &{...} in order to free up #{...} as an ISSUE!+TOKEN!+CHAR! notation, so that would be an immutable string literal. (Note that ASON suggests Carl wanted to make all string literals immutable.) But I think #{} looks pretty bad for the above:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: #{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: #{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: #{
       Note that this would not have the benefit of braces not escaping:
       
            printf("The char is ^} and must be escaped\n");

       So that's a drawback.
   }
]

Something about that I find more jarring than <{...}>, and it also lacks the advantage of not needing to escape single braces.

As you point out, backticks have the same problem. But as I've said, one of the potentially nice things about calling the new array type FENCE! could be that we stay fluid on its dialecting purpose...and sometimes it's used for things that could be text but contain quoted items. Like I say--that won't help you with semicolons or anything non-LOADable, but it may cover a lot of cases.

No syntactic indentation please, keep the freeform style of Rebol !!

I'd like "named quotes":
WORD"..."WORD
e.g. with --"..."--

--"2^3
"string""--
=
"2^^3^/^"string^""

and with ++"..."++

++" --"string"-- "++
=
" --^"string^"-- "

This could be

Rights: --"
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
"--
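The equivalences claimed above can be checked with a short Python sketch. It assumes the rule that the characters before the opening quote are the name and that the string ends at the first quote followed by the same name; `scan_named_quote` and `mold` are hypothetical helpers, with `mold` producing the conventional caret-escaped form.

```python
def scan_named_quote(source: str) -> tuple[str, str]:
    """Split NAME"..."NAME into (name, raw content)."""
    quote = source.find('"')
    if quote <= 0:
        raise ValueError("expected a name directly before the opening quote")
    name = source[:quote]
    terminator = '"' + name
    end = source.find(terminator, quote + 1)
    if end == -1:
        raise ValueError(f"missing terminator {terminator!r}")
    return name, source[quote + 1:end]


def mold(text: str) -> str:
    """Render text as an ordinary caret-escaped double-quoted literal."""
    # Carets are escaped first so the carets added for ^/ and ^" stay single.
    escaped = text.replace("^", "^^").replace("\n", "^/").replace('"', '^"')
    return f'"{escaped}"'


_, first = scan_named_quote('--"2^3\n"string""--')
print(mold(first))    # "2^^3^/^"string^""

_, second = scan_named_quote('++" --"string"-- "++')
print(mold(second))   # " --^"string^"-- "
```

Because the content is taken verbatim, a named-quote string that contains its own terminator has to switch to a different name, as the ++ example does.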

Another possibility is to have a character that means "make a string to the end of line", and then putting these in BLOCK!s:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: [
      \ Copyright 2012 REBOL Technologies
      \ Copyright 2017-2021 Ren-C Open Source Contributors
    ]
    License: [
      \ Licensed under the Apache License, Version 2.0
      \ See: http://www.apache.org/licenses/LICENSE-2.0
    ]
    Description: [
      \ We've gotten used to writing anything we want inside of braced
      \ strings...this gives us more freedom with single braces:
      \
      \      printf("The char is } and that's okay\n");
      \
      \ So there's a benefit to it.
   ]
]

Several languages (like Haskell) basically don't have better answers for multi-line strings than "apply an operator that inserts newlines onto an array of strings". There's a specialization of DELIMIT/TAIL that does that, and it's called NEWLINED.

>> newlined ["one" "two"]
== "one^/two^/"
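For readers who don't have the interpreter handy, here is a Python sketch of the two pieces combined: turning marker-prefixed lines into a list of strings, then joining them the way NEWLINED does. `line_strings` is a hypothetical helper, and treating one space after the marker as a separator is an assumption.

```python
def newlined(items: list[str]) -> str:
    """Join items, putting a newline after each one (including the last)."""
    return "".join(f"{item}\n" for item in items)


def line_strings(block_source: str, marker: str = "\\") -> list[str]:
    """Turn each `marker`-prefixed line into one string running to end of line."""
    strings = []
    for line in block_source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith(marker):
            raise ValueError(f"line does not start with {marker!r}: {line!r}")
        rest = stripped[len(marker):]
        # Drop the single separating space; any further spaces are content.
        strings.append(rest[1:] if rest.startswith(" ") else rest)
    return strings


block = """
      \\ Licensed under the Apache License, Version 2.0
      \\ See: http://www.apache.org/licenses/LICENSE-2.0
"""
print(newlined(line_strings(block)), end="")
```

A bare marker on a line yields an empty string, which is how the blank lines in the Description example would survive.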

It looks kind of jarring to use a character that is not straight up-and-down.

; We're used to seeing multiple lines like this,
; and I think it's more comfortable because of the verticality.

Perhaps lone exclamation mark? We might even argue that if a FENCE! sees strings inside of it, then the evaluator will make a string out of what it gets...

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: {
      ! Copyright 2012 REBOL Technologies
      ! Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: {
      ! Licensed under the Apache License, Version 2.0
      ! See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: {
      ! We've gotten used to writing anything we want inside of braced
      ! strings...this gives us more freedom with single braces:
      !
      !      printf("The char is } and that's okay\n");
      !
      ! So there's a benefit to it.
   }
]

So imagine FENCE! having this reaction:

>> {x: 10 y: 10 + 10}
== object!##{x: 10 y: 20}

>> {"one" "two"}
== "one^/two^/"

This would mean that FENCE! alone wouldn't represent a MAP!, but maybe you make mappings with "double-fences" ?

>> {{"one" "two"}}
== map!##{{"one" "two"}}

We've talked about how some serializations would turn in-memory representations into source code, and maybe that could be true of this as well.

>> var: {
    ! Line one
    ! Line two
}
== text!{
    ! Line one
    ! Line two
}

>> serialize var
== {
    ! Line one
    ! Line two
}

It knows that the data produced by evaluation is no longer a FENCE! but a TEXT!, however you can use operations that go back to source...just like with OBJECT!s.

This has the slight advantage of being Redbol compatible, although it doesn't address single-line braced strings.

Weird, yes, but it's good to just make sure all the options are examined.

All this is showing just how valuable and rare the ASCII-range asymmetric delimiters really are. :ring:

That's not technically contentious when using -- unattached to a quote, but I feel like I'd rather reserve -- for a dumping abstraction. It kind of jumps off the page to me, and it's nice to be able to glance at code and know when you've left debugging in it.

If we introduce multiple delimiter kinds, we start getting problems on the representation...

What happens when someone appends a --"..."-- string to a ++"..."++ string? How should the appended result be molded?

Beyond that, looks-wise I really do prefer <{...}> to --"..."--. It has the advantage that we're not invoking any WORD!-characters, so it really stays in the domain of "delimiters".

Rights: <{
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 Ren-C Open Source Contributors
}>

And as I've said, it might have a unicode parallel for those who want to embrace the higher codepoints:

Title: ⦓Your "Title" Here⦔
Rights: ⦓
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 Ren-C Open Source Contributors
⦔

We see some pretty big impacts to losing braced strings in prominent places... module headers and the API jump off the page immediately.

So far I feel like <{...}> is my leading choice:

  • I think people can see <{ and }> as compound asymmetric symbols in their own right, carving out a space that lets it not be seen as a nesting of <...> and {...}.

    • If we introduce them as symbols and suggest UTF-8 alternatives, then that narrative can be solidified.
  • As mentioned, it's not borrowing from any legal WORD!-characters. It's only involving delimiters.

  • It gets a boost over {...} as a string representation by allowing the use of unpaired { and } characters inside of it.

  • It doesn't introduce any mechanics we aren't dealing with already.
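The point about unpaired braces can be sketched in Python. This assumes that only the two-character pairs <{ and }> are significant and that matched pairs may nest; whether the real scanner would nest them is not settled in this thread, and `scan_fenced_text` is a hypothetical name.

```python
def scan_fenced_text(source: str, start: int = 0) -> tuple[str, int]:
    """Scan a <{...}> string; return (content, index just past the closing }>)."""
    if not source.startswith("<{", start):
        raise ValueError("expected '<{'")
    depth = 1
    i = start + 2
    while i < len(source):
        if source.startswith("<{", i):
            depth += 1
            i += 2
        elif source.startswith("}>", i):
            depth -= 1
            i += 2
            if depth == 0:
                return source[start + 2:i - 2], i
        else:
            # A lone {, }, < or > is ordinary content.
            i += 1
    raise ValueError("missing closing '}>'")


content, _ = scan_fenced_text('<{printf("The char is } and that\'s okay\\n");}>')
print(content)
```

A single-character {...} scanner has to count every brace, which is why a lone } needed escaping there and does not here.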

The raw string idea, along with no escaping by default, seems good. I like the doubled quotes being an option.

<{I don't escape \n, that's a literal backslash}>

\<{I escape \n with backslash}>\  ; idea 1

<\{I escape \n with backslash}\>  ; idea 2

The second representation looks a little cleaner, but maybe too much like a TAG!. Also doesn't line up with what you can do with "..." so maybe it's better to have the backslashes outside.

Well, having
"..." as multiline quasi-raw (with "" representing ")
\"..." as multiline \-escaped
(and maybe ^"..." as multiline ^-escaped ?)
would be good enough for me
(And I think I'll never need to use <{ ... }>)


Why again did Rebol choose to separate single-line strings from multi-line strings?
A string is a string, even if it contains newlines.

Double-quoted strings are fairly universal so supporting them was to look familiar, and because having two forms of string delimiter means you can write "}" and {"}...picking the best one for your content.

Having double-quoted strings not be multi-line means you'll get easier to interpret errors when you forget the closing quotes. Odds are you have another set of quotes somewhere in your program to pair with, so you'll get a closed string no matter what...and so your error will be something weirder than "no close quote for string". It will be the result of trying to scan what was supposed to be the insides of another string.
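A toy Python scanner shows the difference in where the error lands. This is only a sketch with no escape handling, and `scan_quoted` is a hypothetical function, not the actual scanner.

```python
def scan_quoted(source: str, multiline: bool) -> list[str]:
    """Collect the contents of "..." strings, tracking line numbers for errors."""
    strings = []
    i = 0
    line = 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
        if ch != '"':
            i += 1
            continue
        opened_on = line
        j = i + 1
        while j < len(source) and source[j] != '"':
            if source[j] == "\n":
                if not multiline:
                    raise ValueError(f"no close quote for string opened on line {opened_on}")
                line += 1
            j += 1
        if j == len(source):
            raise ValueError(f"no close quote for string opened on line {opened_on}")
        strings.append(source[i + 1:j])
        i = j + 1
    return strings


# The closing quote is missing on line 1.
source = 'print "hello\nx: 10\nprint "world"\n'
for multiline in (False, True):
    try:
        scan_quoted(source, multiline)
    except ValueError as error:
        print(multiline, error)
```

With single-line strings the error names line 1, where the mistake is. With multi-line strings the first literal swallows everything up to the quote on line 3, and the complaint is about the last quote instead.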

So...I brought that up as an example, but it's ugly.

Let's momentarily focus on the problem of string escaping inside of C or JavaScript strings. As a reminder of what we're trying to avoid, it's having simple common cases require backslash escaping, like:

"fail \"ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1\""

Here languages like JavaScript have an edge because they have single-quoted strings:

"fail 'ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1'"

Carl's ASON supposedly supports single-quoted strings too, but the only example given is double-single-quotes: ''abc''. In any case, quoting is too foundational in Ren-C and can't budge.

I think it would be a bad direction to try and get more places to accept TAG! as a synonym for TEXT!:

"fail <ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1>"

I've mentioned the idea that &{...} would become BINARY!, freeing up #{...} as an alternate form of ISSUE!+TOKEN!+CHAR! (TOKEN!).

"fail #{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}"

But this has some of the same issues as TAG!, in that it's really supposed to be a separate type that gets dialected differently.

How About Those Unicode Characters?

The only unicode choice that looks different enough to not be something like "slightly different way to draw a parenthesis" yet still look remotely good to me is a double angle bracket:

"fail ⟪ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1⟫"

Weird asymmetric quotes also...exist:

"fail “ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1”"

But while those might be distinct codepoints, it looks similar enough to plain quotes to feel like a bad choice. Though the way they run up against regular quotes is a bit easier to see in the fonts I'm looking at than an apostrophe running up against regular quotes, so it scores a point over JavaScript's choice here.

What About FORM-ing WORD!s Instead of Using A String?

You can imagine calling FORM on a BLOCK!, and that will handle some things:

"fail form [ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1]"

Then the idea of being able to ask a GET-FENCE! to be a synonym for that behavior is fairly out-there:

"fail :{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}"

We could argue that perhaps FAIL itself can accept FENCE!, so if you pass it quoted it won't turn into an object:

"fail '{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}"

But that puts a lot of pressure on FAIL. We could say it's all part of the behavior of DELIMIT, which assumes you don't want to make objects from fences but you want to turn them into text:

"fail [{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}]"

But then you run into trouble if you are trying to do something like:

print ["The point is" format-point {x: 10 y: 20}]

That {x: 10 y: 20} is an argument to format-point, and the free mixing of expressions in DELIMIT/UNSPACED/SPACED makes that contentious as to how to read it.

Anyway--important to remember here is that FENCE! doesn't have the representational abilities of the old string...it can only hold LOADable code (e.g. forget semicolons). So trying to bend it to string purposes is a strange idea that isn't a substitute in general for strings.

Hate to Say It, <{This Still Looks Like The Best Answer}>

I feel like we're set up to catch more blame for being bad at expressing objects than we are to catch praise for being good at expressing strings.

e.g. I think we'd not be doing ourselves any favors by making objects be the "weird" thing.

obj: <{
    x: 10
    y: 20
}>

obj: <<
    x: 10
    y: 20
>>  ; (I think >> and << have likely other more interesting uses)

obj: {
    x: 10
    y: 20
}  ; This is what the people want and expect nowadays...

I notice Julia uses three quotes in a row for strings that have individual quote characters in them:

Title: """Your "Title" Here"""
Rights: """
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 "Ren-C" Open Source Contributors
 """

Point being: people are used to doing weird things when they need to put arbitrary text into their code. And they try not to do it (e.g. using separate files). Rebol has just tried pretty hard to pack everything-into-one-file where possible. Which is a good goal, and it's part of what has made it unique. {...} strings helped make that possible. But... :-(

When you look at these options, is <{...}> really so bad? And maybe we say the UTF-8 version of it is ⟪...⟫

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: ⟪
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    ⟫
    License: ⟪
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    ⟫
    Description: ⟪
       Note that this covers all ascii:
       
            printf("The char is } and need not be escaped\n");

       And you can put in ⟪matched UTF-8 pairs⟫
    ⟫
]

Potential Compatibility Advantage of <{...}>

One other advantage of <{...}> is that it is LOAD-compatible with R3-Alpha (and, more relevantly, older Ren-Cs), which would consider it to be a TAG!.

Red scans <{}> as < {} > at this time, but perhaps they'd decide that having curly braces in tags is more valuable than that compression. Probably not.


Given all the options my vote would be for the triple quote - """
The majority of languages use " for strings and the triple makes the fact that it is a "special" string stand out. It also makes it easier to spot the closing quotes in long multi-line strings. Python uses """ and is very popular which will aid readability for new users.
Thanks, John
