Alternate String Forms if {...} Becomes An Array Type

No syntactic indentation please; keep the freeform style of Rebol!!

I'd like "named quotes":
WORD"..."WORD
e.g. with --"..."--

--"2^3
"string""--
=
"2^^3^/^"string^""

and with ++"..."++

++" --"string"-- "++
=
" --^"string^"-- "

So a header field could be written as:

Rights: --"
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
"--

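To make the named-quote idea concrete, here is a minimal Python sketch of a scanner for WORD"..."WORD and of molding the result back into the caret-escaped "..." form. The names scan_named_quote and mold_caret are hypothetical. It assumes the "word" is simply everything before the first quote, and that nothing inside is escaped.

```python
def scan_named_quote(src):
    """Scan a hypothetical WORD"..."WORD string; nothing inside is escaped."""
    open_at = src.index('"')           # everything before the first quote is the name
    closer = '"' + src[:open_at]       # e.g. '"--' for --"..."--
    close_at = src.find(closer, open_at + 1)
    if close_at < 0:
        raise ValueError('no closing ' + closer + ' for string')
    return src[open_at + 1:close_at]


def mold_caret(text):
    """Mold text as a plain "..." string using caret escapes."""
    escapes = {'^': '^^', '\n': '^/', '"': '^"'}
    return '"' + ''.join(escapes.get(ch, ch) for ch in text) + '"'


print(mold_caret(scan_named_quote('--"2^3\n"string""--')))   # "2^^3^/^"string^""
print(mold_caret(scan_named_quote('++" --"string"-- "++')))  # " --^"string^"-- "
```

Note that ""-- in the first example only ends the string at the second quote, because the closer is the quote followed by the name.
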
Another possibility is to have a character that means "make a string to the end of line", and then putting these in BLOCK!s:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: [
      \ Copyright 2012 REBOL Technologies
      \ Copyright 2017-2021 Ren-C Open Source Contributors
    ]
    License: [
      \ Licensed under the Apache License, Version 2.0
      \ See: http://www.apache.org/licenses/LICENSE-2.0
    ]
    Description: [
      \ We've gotten used to writing anything we want inside of braced
      \ strings...this gives us more freedom with single braces:
      \
      \      printf("The char is } and that's okay\n");
      \
      \ So there's a benefit to it.
    ]
]

Several languages (Haskell, for instance) basically don't have a better answer for multi-line strings than "apply an operator that joins an array of strings with newlines" (Haskell's unlines ["one", "two"] gives "one\ntwo\n"). There's a specialization of DELIMIT/TAIL that does that, and it's called NEWLINED.

>> newlined ["one" "two"]
== "one^/two^/"
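A rough Python stand-in for both pieces: newlined mimics the NEWLINED behavior shown above (a newline after every item, including the last), and backslash_lines is a hypothetical reading of the \-marked lines, assuming the marker is a leading \ followed by at most one space that gets dropped.

```python
def newlined(items):
    """Put a newline after every string, including the last (like NEWLINED above)."""
    return ''.join(item + '\n' for item in items)


def backslash_lines(block_text):
    """Hypothetical: turn lines marked with a leading backslash into one string."""
    lines = []
    for raw in block_text.splitlines():
        stripped = raw.lstrip()
        if not stripped.startswith('\\'):
            continue                    # unmarked lines are not part of the string
        body = stripped[1:]
        if body.startswith(' '):
            body = body[1:]             # drop the single space after the marker
        lines.append(body)
    return newlined(lines)


rights = """
      \\ Copyright 2012 REBOL Technologies
      \\ Copyright 2017-2021 Ren-C Open Source Contributors
"""
print(repr(backslash_lines(rights)))
```

A bare \ line becomes an empty line, and any extra indentation after the first space is kept, which is what the printf example relies on.
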

That said, it looks kind of jarring to use a character like \ that isn't straight up-and-down.

; We're used to seeing multiple lines like this,
; and I think it's more comfortable because of the verticality.

Perhaps a lone exclamation mark? We might even argue that if a FENCE! sees strings inside of it, then the evaluator will make a string out of what it gets...

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: {
      ! Copyright 2012 REBOL Technologies
      ! Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: {
      ! Licensed under the Apache License, Version 2.0
      ! See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: {
      ! We've gotten used to writing anything we want inside of braced
      ! strings...this gives us more freedom with single braces:
      !
      !      printf("The char is } and that's okay\n");
      !
      ! So there's a benefit to it.
    }
]

So imagine FENCE! having this reaction:

>> {x: 10 y: 10 + 10}
== object!##{x: 10 y: 20}

>> {"one" "two"}
== "one^/two^/"

This would mean that FENCE! alone wouldn't represent a MAP!, but maybe you'd make mappings with "double-fences"?

>> {{"one" "two"}}
== map!##{{"one" "two"}}

We've talked about how some serializations would turn in-memory representations into source code, and maybe that could be true of this as well.

>> var: {
    ! Line one
    ! Line two
}
== text!{
    ! Line one
    ! Line two
}

>> serialize var
== {
    ! Line one
    ! Line two
}

The system knows that the data produced by evaluation is no longer a FENCE! but a TEXT!; however, you can use operations that go back to source...just like with OBJECT!s.

This has the slight advantage of being Redbol compatible, although it doesn't address single-line braced strings.

Weird, yes, but it's good to just make sure all the options are examined.

All this is showing just how valuable and rare the ASCII-range asymmetric delimiters really are.

That's not technically contentious, since -- unattached to a quote would still be available, but I feel like I'd rather reserve -- for a debug-dumping abstraction. It kind of jumps off the page to me, and it's nice to be able to glance at code and know when you've left debugging in it.

If we introduce multiple delimiter kinds, we start getting problems with representation...

What happens when someone appends a --"..."-- string to a ++"..."++ string? How should the appended result be molded?

Beyond that, looks-wise I really do prefer <{...}> to --"..."--. It has the advantage that we're not invoking any WORD!-characters, so it really stays in the domain of "delimiters".

Rights: <{
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 Ren-C Open Source Contributors
}>

And as I've said, it might have a Unicode parallel for those who want to embrace the higher codepoints:

Title: ⦓Your "Title" Here⦔
Rights: ⦓
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 Ren-C Open Source Contributors
⦔

We see some pretty big impacts to losing braced strings in prominent places... module headers and the API jump off the page immediately.

So far I feel like <{...}> is my leading choice:

  • I think people can see <{ and }> as compound asymmetric symbols in their own right, carving out a space that lets it not be seen as a nesting of <...> and {...}.

    • If we introduce them as symbols and suggest UTF-8 alternatives, then that narrative can be solidified.
  • As mentioned, it's not borrowing from any legal WORD!-characters. It's only involving delimiters.

  • It gets a boost over {...} as a string representation by allowing the use of unpaired { and } characters inside of it.

  • It doesn't introduce any mechanics we aren't dealing with already.
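A minimal sketch of what scanning <{...}> as a raw string could look like, assuming the content simply runs to the first }> and nothing is escaped. Unpaired { and } are fine; the one thing such a string can't contain is }> itself. The function name is hypothetical.

```python
def scan_raw_angle_brace(src, start=0):
    """Hypothetical <{...}> scan: raw content up to the first }>."""
    if not src.startswith('<{', start):
        raise ValueError('expected <{')
    end = src.find('}>', start + 2)
    if end < 0:
        raise ValueError('no closing }> for string')
    return src[start + 2:end], end + 2   # (content, position after the closer)


text, _ = scan_raw_angle_brace('<{printf("The char is } and that\'s okay\\n");}>')
print(text)   # printf("The char is } and that's okay\n");
```
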

The raw string idea, along with no escaping by default, seems good. I like the doubled quotes being an option.

<{I don't escape \n, that's a literal backslash}>

\<{I escape \n with backslash}>\  ; idea 1

<\{I escape \n with backslash}\>  ; idea 2

The second representation looks a little cleaner, but maybe too much like a TAG!. Also doesn't line up with what you can do with "..." so maybe it's better to have the backslashes outside.

Well, having:

  • "..." as multiline quasi-raw (with "" representing ")
  • \"..." as multiline \-escaped
  • (and maybe ^"..." as multiline ^-escaped?)

would be good enough for me. (And I think I'll never need to use <{ ... }>.)
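A sketch of the "..." rule described here, assuming newlines are allowed and that "" inside the string stands for one literal quote. The function name is hypothetical.

```python
def scan_doubled_quotes(src, start=0):
    """Hypothetical "..." scan where "" inside stands for one quote; newlines allowed."""
    if src[start] != '"':
        raise ValueError('expected opening quote')
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            if i + 1 < len(src) and src[i + 1] == '"':
                out.append('"')          # doubled quote -> one literal quote
                i += 2
                continue
            return ''.join(out), i + 1   # lone quote closes the string
        out.append(ch)
        i += 1
    raise ValueError('no closing quote for string')


print(scan_doubled_quotes('"say ""hi"""')[0])   # say "hi"
```

Under this rule """" is a string holding a single quote character.
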


Why again did Rebol choose to separate single-line strings from multi-line strings?
A string is a string, even if it contains newlines.

Double-quoted strings are fairly universal, so supporting them made Rebol look familiar. And having two forms of string delimiter means you can write "}" and {"}, picking the best one for your content.
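One hypothetical way a molder could pick "the best one for your content": plain quotes when possible, braces when the braces balance, and caret escapes as a last resort. This is an illustrative rule, not Rebol's actual MOLD algorithm. It also suggests an answer to the earlier question of how an appended result gets molded: by its content, not by how the pieces were originally written.

```python
def braces_balanced(text):
    """True if every } closes an earlier { and nothing is left open."""
    depth = 0
    for ch in text:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def mold_text(text):
    """Hypothetical rule: use the plainest delimiter the content allows."""
    caret = text.replace('^', '^^')          # ^ is the escape character either way
    if '"' not in text and '\n' not in text:
        return '"' + caret + '"'
    if braces_balanced(text):
        return '{' + caret + '}'
    escapes = {'^': '^^', '\n': '^/', '"': '^"'}
    return '"' + ''.join(escapes.get(ch, ch) for ch in text) + '"'


for sample in ['}', '"', '"}']:
    print(mold_text(sample))   # "}"  then  {"}  then  "^"}"
```
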

Having double-quoted strings not be multi-line means you'll get easier to interpret errors when you forget the closing quotes. Odds are you have another set of quotes somewhere in your program to pair with, so you'll get a closed string no matter what...and so your error will be something weirder than "no close quote for string". It will be the result of trying to scan what was supposed to be the insides of another string.
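A sketch of why the single-line rule gives a clearer error: the scanner can stop at the first newline and report the line the string started on, instead of pairing with some later quote. The function name and error text are hypothetical.

```python
def scan_single_line(src, start=0):
    """Hypothetical "..." scan that refuses to cross a newline."""
    line = src.count('\n', 0, start) + 1   # line the string starts on
    for i in range(start + 1, len(src)):
        if src[i] == '"':
            return src[start + 1:i], i + 1
        if src[i] == '\n':
            break
    raise ValueError(f'no close quote for string starting on line {line}')


source = 'print "oops\nprint "fine"'
try:
    scan_single_line(source, source.index('"'))
except ValueError as err:
    print(err)   # no close quote for string starting on line 1
```

With a multi-line rule, the same input would quietly produce the string "oops^/print " and the error would surface later, around fine".
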

So...I brought that up as an example, but it's ugly.

Let's momentarily focus on the problem of string escaping inside of C or JavaScript strings. As a reminder of what we're trying to avoid, it's having simple common cases require backslash escaping, like:

"fail \"ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1\""

Here languages like JavaScript have an edge because they have single-quoted strings:

"fail 'ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1'"

Carl's ASON supposedly supports single-quoted strings too, but the only example given is double-single-quotes: ''abc''. In any case, the apostrophe is too foundational to quoting in Ren-C for that to budge.

I think it would be a bad direction to try and get more places to accept TAG! as a synonym for TEXT!:

"fail <ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1>"

I've mentioned the idea that &{...} would become BINARY!, freeing up #{...} as an alternate form of ISSUETOKENCHAR! (TOKEN!).

"fail #{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}"

But this has some of the same issues as TAG!, in that it's really supposed to be a separate type that gets dialected differently.

How About Those Unicode Characters?

The only Unicode choice that looks different enough to not be something like "slightly different way to draw a parenthesis" yet still look remotely good to me is a double angle bracket:

"fail ⟪ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1⟫"

Weird asymmetric quotes also...exist:

"fail “ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1”"

But while those are distinct codepoints, they look similar enough to plain quotes to feel like a bad choice. Though the way they run up against regular quotes is a bit easier to see in the fonts I'm looking at than an apostrophe running up against regular quotes, so it scores a point over JavaScript's choice here.

What About FORM-ing WORD!s Instead of Using A String?

You can imagine calling FORM on a BLOCK!, and that will handle some things:

"fail form [ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1]"

Going further, having a GET-FENCE! act as a synonym for that behavior is a fairly out-there idea:

"fail :{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}"

We could argue that perhaps FAIL itself can accept FENCE!, so if you pass it quoted it won't turn into an object:

"fail '{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}"

But that puts a lot of pressure on FAIL. We could say it's all part of the behavior of DELIMIT, which assumes you don't want to make objects from fences but you want to turn them into text:

"fail [{ENCODING must be UTF-8, UCS-2, UTF-16, or LATIN-1}]"

But then you run into trouble if you are trying to do something like:

print ["The point is" format-point {x: 10 y: 20}]

That {x: 10 y: 20} is an argument to format-point, and the free mixing of expressions in DELIMIT/UNSPACED/SPACED makes it ambiguous how it should be read.

Anyway, the important thing to remember here is that FENCE! doesn't have the representational abilities of the old string...it can only hold LOADable code (e.g. a semicolon would start a comment). So trying to bend it to string purposes is a strange idea, and not a general substitute for strings.

Hate to Say It, <{This Still Looks Like The Best Answer}>

I feel like we're set up to catch more blame for being bad at expressing objects than we are to catch praise for being good at expressing strings.

i.e. I think we wouldn't be doing ourselves any favors by making objects the "weird" thing.

obj: <{
    x: 10
    y: 20
}>

obj: <<
    x: 10
    y: 20
>>  ; (I think >> and << have likely other more interesting uses)

obj: {
    x: 10
    y: 20
}  ; This is what the people want and expect nowadays...

I notice Julia uses three quotes in a row for strings that have double quotes in them:

Title: """Your "Title" Here"""
Rights: """
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 "Ren-C" Open Source Contributors
 """

Point being: people are used to doing weird things when they need to put arbitrary text into their code. And they try not to do it (e.g. by using separate files). Rebol has just tried pretty hard to pack everything into one file where possible, which is a good goal, and it's part of what has made it unique. {...} strings helped make that possible. But...

When you look at these options, is <{...}> really so bad? And maybe we say the UTF-8 version of it is ⟪...⟫

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: ⟪
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    ⟫
    License: ⟪
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    ⟫
    Description: ⟪
        Note that this covers all ASCII:

            printf("The char is } and need not be escaped\n");

        And you can put in ⟪matched UTF-8 pairs⟫
    ⟫
]

Potential Compatibility Advantage of <{...}>

One other advantage of <{...}> is that it is LOAD-compatible with R3-Alpha (and, more relevantly, older Ren-Cs), which would consider it to be a TAG!.

Red scans <{}> as < {} > at this time, but perhaps they'd decide that having curly braces in tags is more valuable than that compression. Probably not.


Given all the options, my vote would be for the triple quote: """
The majority of languages use " for strings and the triple makes the fact that it is a "special" string stand out. It also makes it easier to spot the closing quotes in long multi-line strings. Python uses """ and is very popular which will aid readability for new users.
Thanks, John


Note that triple quotes don't help as a new answer for the API example (they'd make it much worse, since every one of those quotes needs a backslash inside a C string literal!)

rebElide("print {If braces are given up, how to replace this?}");

Remembering that I'm trying to avoid a lot of ugly API calls like:

rebElide("print \"If braces are given up, how to replace this?\"");

C doesn't have any alternatives there besides quotes. And JavaScript has apostrophe-delimited strings as an alternative, but we use apostrophe a lot inside code so that's not good either.
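To put numbers on the API concern, here is a small Python helper (hypothetical, not part of the Ren-C API) that renders a line of Rebol source as the C string literal you'd pass to rebElide(), and counts the backslashes each notation costs. It only escapes backslashes and double quotes; newlines and other control characters are not handled in this sketch.

```python
def c_literal(rebol_source):
    """Render Rebol source as a C string literal (escapes only backslashes and quotes)."""
    body = rebol_source.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + body + '"'


candidates = [
    'print {If braces are given up, how to replace this?}',
    'print "If braces are given up, how to replace this?"',
    'print """If braces are given up, how to replace this?"""',
    'print <{If braces are given up, how to replace this?}>',
]
for source in candidates:
    literal = c_literal(source)
    print(f'rebElide({literal});  // {literal.count(chr(92))} backslashes')
```

Braces and <{...}> need no backslashes at all, plain quotes need two, and triple quotes need six.
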

Rebol is so saturated in its use of symbols that creative choices are unavailable.

obj: (|x: 10, y: 20|)  ; for instance, (| "banana clips" |)

We believe | and || are legal symbols in arrays, and (||) is a 1-element GROUP! with || in it, while (| |) is a 2-element GROUP! of two |'s.

It's tough here, and <{ and }> so far still seem like a practical answer which allows braces to take on a new role. But we don't really know if that role will pay off yet, so it's worth continuing to think.


A tricky decision indeed.
I can see the value of the API use case and it is a very good cause. Would it be an option to support both? That way """ is easier for new users and the <{ }> is great for APIs? I agree that this is still not ideal, but might be worth considering if the implementation effort is low. Thanks, John

A small suggestion: could the usual "" strings be made multiline? I don’t see any reason they couldn’t contain newlines. And it seems to be a pretty popular choice amongst programming languages. (Curiously, it seems particularly popular amongst the ‘minimalistic’ bunch: Lisps, Smalltalk, Io, etc.)

Giulio mentions it at the top of this thread, and I don't know that I have super strong opinions on it... though I'm slightly biased against it.

What I really want is a clean way to embed strings-in-strings for API usage, and it's woeful that none of the solutions feel good.

Ah, I didn’t realise this thread existed already, sorry! I do see the issues with "…" now.

Out of those proposed, I don’t hate #{…} as a possible solution. <{…}> isn’t bad either.

So I'm actually probably more concerned about the need to escape in the <{ }> notation than I am about needing to escape in the plain string notation. Because the <{ }> is designed for just spanning a bunch of freeform text... quite possibly from another programming language.

Regarding quote doubling... given the space-significance of Rebol, I wonder if a rule about nested quotes could be that if a quote isn't followed by a terminating delimiter character (space, ), ], }, comma, newline), then it starts a paired quote?

 "I'm an "example" string and my quotes work fine"

So the rule being that "e wasn't a string termination sequence because the quote wasn't followed by a delimiter, and so instead it cued a "nested quote mode" where it will consume the next quote.

When in nested quote mode, the next quote would have to end the nest, regardless of what was after it.

It wouldn't work for:

 "My favorite symbol is the " " symbol"
 "My favorite symbol is the "," symbol"
 "My favorite symbol is the "]" symbol"
 "My favorite symbol is the "}" symbol"
 "My favorite symbol is the ")" symbol"
 "My favorite symbol is the "
 " symbol"

But I think it would cover most of the nested quote situations that occur in practice. (And actually, we could in theory make the middle cases work if we said that a nested quote followed by delimiters but no spaces or newlines until the next quote are still part of the same string, which might be a good rule)

After having seen an opening quote, then seeing a quote followed by another quote could be considered to be one of the non-terminating conditions (treat "" as if you'd seen "e or whatever) such that:

>> print """"
""

That seems natural to me. I think the idea of doubling quotes inside strings to get one quote feels less natural, and if we have to use an escape sequence to get an unpaired or weird quote it doesn't bother me so much if the common desire is covered.
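A Python sketch of the nested-quote rule as described: a quote followed by space, newline, ), ], }, comma, or end of input ends the string; any other quote opens a nested pair that the very next quote closes. The optional lookahead flag models the refinement where a quote followed only by ), ], }, or commas and then another quote stays inside the string. All names are hypothetical.

```python
TERMINATORS = set(' \n)]},')   # after a quote, these mean "the string ends here"
STICKY = set(')]},')           # delimiters the optional lookahead may keep inside


def scan_nesting_quotes(src, start=0, lookahead=False):
    """Hypothetical "..." scan where a quote not followed by a delimiter opens a nested pair."""
    if src[start] != '"':
        raise ValueError('expected opening quote')
    out, nested, i, n = [], False, start + 1, len(src)
    while i < n:
        ch = src[i]
        if ch != '"':
            out.append(ch)
            i += 1
        elif nested:                      # the next quote always closes a nested pair
            out.append(ch)
            nested = False
            i += 1
        elif i + 1 == n or src[i + 1] in TERMINATORS:
            j = i + 1
            while lookahead and j < n and src[j] in STICKY:
                j += 1
            if j > i + 1 and j < n and src[j] == '"':
                out.append(src[i:j + 1])  # e.g. "," or ")" stays in the string
                i = j + 1
            else:
                return ''.join(out), i + 1
        else:                             # quote followed by a non-delimiter: nested pair
            out.append(ch)
            nested = True
            i += 1
    raise ValueError('no close quote for string')


print(scan_nesting_quotes('"I think "(this seems cool)" and would be useful."')[0])
```

The lookahead refinement has a cost: a tightly packed ["a"]"b" would scan as the single string a"]"b, which is part of why it may not be worth it.
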

I greatly dislike this approach — I think it’s a bad idea to make this work with some embedded quotes, but not others. That feels like it could get a bit confusing in practice.

Also, that doesn’t solve the problem of passing strings in the C API.

With FENCE! existing, I think it's a given that anything that reuses {} needs to have the delimiter on both ends. Not only is it an important visual cue after a long read to know you're not dealing with a fence, but it gives you the benefit of not having to escape lone }.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: #{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }#
    License: #{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }#
    Description: #{
        Note this reclaims the benefit of braces not escaping:

            printf("The char is } and it need not be escaped\n");

        So that's good.
    }#
]

To my eyes, that's rather abrasive to be using for a very common string type. Definitely more abrasive than <{...}>. And it leaves us without a representation for BINARY!, e.g. #{DECAFBAD}

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: <{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }>
    License: <{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }>
    Description: <{
        Note this reclaims the benefit of braces not escaping:

            printf("The char is } and it need not be escaped\n");

        So that's good.
    }>
]

But if the goal is "distinguishability but slightness", you can't get much more slight than -.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: -{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }-
    License: -{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }-
    Description: -{
        Note this reclaims the benefit of braces not escaping:

            printf("The char is } and it need not be escaped\n");

        So that's good.
    }-
]

If that's too slight to the point of being "too fency", then = is another option whose sleek horizontalness doesn't break the visual flow as much...though it does certainly break the flow.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: ={
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }=
    License: ={
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }=
    Description: ={
        Note this reclaims the benefit of braces not escaping:

            printf("The char is } and it need not be escaped\n");

        So that's good.
    }=
]

My use of === as a section divider makes me not want to see something this heavy in such positions.

Red has adopted %{...}% for "raw strings", i.e. strings you don't need to use escaping inside. That compromises the FILE! string type (%file.txt), which I'd imagine would want to use %{...}% for filenames containing spaces.

I believe I agree with @giuliolunati that not using escaping makes the best default. If you wanted escaping, then maybe backslash is the way to go. Ugly, but it would call attention to the fact that you're using escaping...and you'd be able to notice if you took the escaping out and could drop the backslashes at that point.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: -{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }-
    License: -{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }-
    Description: -\{
        Note this reclaims the benefit of braces not escaping:

            printf("The char is } and it need not be escaped\n");

        So that's good.  And here we can do \n\n\n for escaped lines.
    }\-
]

I'm not using just \{...}\ because I imagine this as a family of escaped strings %\{...}\%, so you'd need to know which it is. It also counterintuitively looks better with the -.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: -{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }-
    License: -{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }-
    Description: \{
        Weirdly this does not look as good, despite dropping a character.

            printf("The char is } and it need not be escaped\n");

        So that's good.  And here we can do \n\n\n for escaped lines.
    }\
]

The slightness argument makes the -{...}- and -\{...}\- idea actually kind of compelling to me. It wouldn't cause confusion with TAG! (and would leave <{...}> and <\{...}\> as an option for a tag that wouldn't need to escape > inside of it.)

It's a pattern which could be used with quoted strings too.

>> print "I'm a \n not escaped string"
I'm a \n not escaped string

>> print \"I'm an \nescaped string"\
I'm an
escaped string

>> print -"I'm a quote " safe string"-
I'm a quote " safe string

>> print -\"I'm a quote " safe\nescaped string"\-
I'm a quote " safe
escaped string

Gives you options. -{"}- isn't a fantastic way to say "single quote" but it's better than -"""- and definitely better than \"\""\

The BINARY! representation contention with TOKENISSUECHAR! (whatever that thing is) bothers me. I'd imagine something like #{"}# as a single character for quote, and #"{"# as a single character for brace, with things like #a and #b just standing on their own since there are no spaces or irregular characters.

But we've kind of run out of symbols at this point... what would #{DECAFBAD} become, so that it can still be a binary? There'd still be &"..." and $"..." I guess (not fence forms like &{...} and ${...}). It's up to you whether to put a closing "& or "$ on that, since $ and & and other escapable things aren't legal inside binaries.

Or angle brackets with a decorator. $<DECAFBAD> is actually a pretty respectable looking BINARY!. And it leaves #< and #> for the characters of less than and greater than (assuming we don't get so saturated that #<> needs to be something, but maybe that's looking like a bad assumption).
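A tiny sketch of reading $<...> as a BINARY!, assuming the content is hex digits with optional whitespace between them. Since only hex digits are legal inside, a > can never be mistaken for content, so no escaping question arises. The function name is hypothetical.

```python
import string


def scan_dollar_binary(token):
    """Hypothetical: read $<...> as BINARY! hex digits, ignoring whitespace."""
    if not (token.startswith('$<') and token.endswith('>')):
        raise ValueError('expected $<...>')
    digits = ''.join(token[2:-1].split())
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError('only hex digits are legal in a binary')
    return bytes.fromhex(digits)   # also rejects an odd number of digits


print(scan_dollar_binary('$<DECAFBAD>').hex())   # decafbad
```
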

I'm definitely liking the raw-by-default concept. It makes the most sense.

I'm still leery of "..." being multiline (or even -"..."-). With the -{...}- option available I think being able to rein in the frustration of unclosed quotes would be a benefit to having the choice. Making the jump to the braced form would go hand in hand with having to be more vigilant/tolerant of the potentially mysterious errors that happen when multiline strings aren't correctly terminated and wind up coupling with a later delimiter than expected, making a larger-than-expected string.

Whether "" should give you a single quote in a raw string, rather than being an error, is something to ponder...since it would keep you from being forced into a different string form (or the ugly \""\) just to get a quote inside a quoted string. But I still feel it's more logical to say:

>> print """"
""

Instead of:

>> print """"
"

The idea of starting the quotes-look-for-pairs process when a quote is seen followed by a non-closing-delimiter seems novel and useful.

>> print "I think "(this seems cool)" and would be useful."
I think "(this seems cool)" and would be useful.

So you wouldn't have to jump to the 2-character delimiter format of -{...}- for common quoted material cases, making the fence transition introduce somewhat less "noise".

This strategy doesn't cut you any breaks if you want a lone quote inside a plain "..." string. Since it's raw, there's no representation for that. You have to go to more than one character for your delimiter...either to use a different delimiter that doesn't conflict with quote, or to add the ability to escape. So -{"}- or -"""- would be your only reasonable choices for a string holding one quote... given that \"\""\ is unreadable.

If doing something in C, the choice would clearly be the braced form, giving you the "about as good as it's going to get" option of:

rebElide("print -{\"}-");  // backslash is for the C literal, not seen by Rebol

I was going to say it's not as "nice" as it was in the pre-FENCE! era:

rebElide("print {\"}");

But actually it feels like there's a bit more guidance with that compound delimiter to know what you're looking at, fences or not.

If single quoted strings are confined to one line, it helps mitigate how out of hand this could get. It's something that I think should be given a chance before dismissing it out of hand.

Everything about the practice of escaping strings comes down to tradeoffs. This would just be a case where, if you're going to put quotes inside plain quoted strings, you'd have to ask yourself "am I doing a simple quote pair that doesn't clearly conflict with what looks like string termination?" If the answer is no, choose another method.

I don't know if the deeper nuance to make closing delimiters legal under some situations but not others is worth it:

"this would ") be illegal"  ; thinks the ) with space after is a closing delimiter 
"this could ")" be legal"  ; looks ahead and sees ), but then looks and sees "

But again, I wouldn't dismiss that out of hand either. It may follow a simple-enough-to-articulate implementation rule, that is learnable in practice.