Alternate String Forms if {...} Becomes An Array Type

Note that triple quotes don't help as a new answer for the API example (they make it exponentially worse if you try to use them!)

rebElide("print {If braces are given up, how to replace this?}");

Remembering that I'm trying to avoid a lot of ugly API calls like:

rebElide("print \"If braces are given up, how to replace this?\"");

C doesn't have any alternatives there besides quotes. And JavaScript has apostrophe-delimited strings as an alternative, but we use apostrophe a lot inside code so that's not good either.
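To make the cost concrete, here's a minimal Python sketch that renders a piece of Rebol source as a C string literal, so you can count the backslashes the C author has to type. The helper name c_string_literal is made up for illustration and is not part of any API; it only handles backslash, double quote, and newline.

```python
def c_string_literal(text):
    # Hypothetical helper: render text the way it must be typed in C source.
    # Only backslash, double quote, and newline are escaped in this sketch.
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n"}
    return '"' + "".join(escapes.get(ch, ch) for ch in text) + '"'


braced = c_string_literal("print {If braces are given up, how to replace this?}")
quoted = c_string_literal('print "If braces are given up, how to replace this?"')

print(braced)  # the braced form passes through with no backslashes
print(quoted)  # each inner quote costs one backslash
```

The braced source needs no escaping at all, while the quoted source picks up a backslash for every inner quote. That is the whole reason for wanting some non-quote string delimiter in API calls.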

Rebol is so saturated in its use of symbols that creative choices are unavailable.

obj: (|x: 10, y:20|)  ; for instance, (| "banana clips" |)

We believe | and || are legal symbols in arrays, and (||) is a 1-element GROUP! with || in it, while (| |) is a 2-element GROUP! of two |'s.

It's tough here, and <{ and }> so far still seem like a practical answer which allows braces to take on a new role. But we don't really know if that role will pay off yet, so it's worth continuing to think.

A tricky decision indeed.
I can see the value of the API use case and it is a very good cause. Would it be an option to support both? That way """ is easier for new users and the <{ }> is great for APIs? I agree that this is still not ideal, but might be worth considering if the implementation effort is low. Thanks, John

A small suggestion: could the usual "" strings be made multiline? I don’t see any reason they couldn’t contain newlines. And it seems to be a pretty popular choice amongst programming languages. (Curiously, it seems particularly popular amongst the ‘minimalistic’ bunch: Lisps, Smalltalk, Io, etc.)

Giulio mentions it at the top of this thread, and I don't know that I have super strong opinions on it... though I'm slightly biased against it.

What I really want is a clean way to embed strings-in-strings for API usage, and it's woeful that none of the solutions feel good.

Ah, I didn’t realise this thread existed already, sorry! I do see the issues with "…" now.

Out of those proposed, I don’t hate #{…} as a possible solution. <{…}> isn’t bad either.

So I'm actually probably more concerned about the need to escape in the <{ }> notation than I am about needing to escape in the plain string notation. Because the <{ }> is designed for just spanning a bunch of freeform text... quite possibly from another programming language.

Regarding quote doubling... given the space-significance of Rebol, I wonder if a rule about nested quotes could be that if a quote isn't followed by a terminating delimiter character (space, ), ], }, comma, newline) then it starts a paired quote?

 "I'm an "example" string and my quotes work fine"

So the rule being that "e wasn't a string termination sequence because the quote wasn't followed by a delimiter, and so instead it cued a "nested quote mode" where it will consume the next quote.

When in nested quote mode, the next quote would have to end the nest, regardless of what was after it.

It wouldn't work for:

 "My favorite symbol is the " " symbol"
 "My favorite symbol is the "," symbol"
 "My favorite symbol is the "]" symbol"
 "My favorite symbol is the "}" symbol"
 "My favorite symbol is the ")" symbol"
 "My favorite symbol is the "
 " symbol"

But I think it would cover most of the nested quote situations that occur in practice. (And actually, we could in theory make the middle cases work if we said that a nested quote followed by delimiters but no spaces or newlines until the next quote are still part of the same string, which might be a good rule)
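Here's a rough Python sketch of the basic rule as I understand it, before the "middle cases" refinement. It is not how any real scanner is written. The function name scan_quoted is made up, and treating end-of-input as a terminating delimiter is my own assumption.

```python
# Delimiters named in the rule: space, ), ], }, comma, newline.
DELIMITERS = set(" )]},\n")


def scan_quoted(src, start=0):
    # Hypothetical scanner: returns (string contents, index just past the close).
    if src[start] != '"':
        raise ValueError("expected an opening quote")
    nested = False
    i = start + 1
    while i < len(src):
        if src[i] == '"':
            if nested:
                nested = False  # ends the nest, whatever comes after it
            else:
                following = src[i + 1] if i + 1 < len(src) else None
                # Assumption: end of input terminates the string too.
                if following is None or following in DELIMITERS:
                    return src[start + 1:i], i + 1
                nested = True  # quote not followed by a delimiter: start a pair
        i += 1
    raise ValueError("unterminated string")


text, end = scan_quoted('"I\'m an "example" string and my quotes work fine"')
print(text)
```

With this rule the "example" case scans as one string, while the comma case above terminates right after `the ` and leaves the rest of the line dangling.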

After having seen an opening quote, then seeing a quote followed by another quote could be considered to be one of the non-terminating conditions (treat "" as if you'd seen "e or whatever) such that:

>> print """"
== ""

That seems natural to me. I think the idea of doubling quotes inside strings to get one quote feels less natural, and if we have to use an escape sequence to get an unpaired or weird quote it doesn't bother me so much if the common desire is covered.

I greatly dislike this approach. I think it’s a bad idea to make this work with some embedded quotes, but not others. That feels like it could get a bit confusing in practice.

Also, that doesn’t solve the problem of passing strings in the C API.

With FENCE! existing, I think it's a given that anything that reuses {} needs to have the delimiter on both ends. Not only is it an important visual cue after a long read to know you're not dealing with a fence, but it gives you the benefit of not having to escape lone }.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: #{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }#
    License: #{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }#
    Description: #{
       Note this reclaims the benefit of braces not escaping:
       
            printf("The char is } and it need not be escaped\n");

       So that's good.
   }#
]

To my eyes, that's rather abrasive to be using for a very common string type. Definitely more abrasive than <{...}>. And it leaves us without a representation for BINARY!, e.g. #{DECAFBAD}

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: <{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }>
    License: <{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }>
    Description: <{
       Note this reclaims the benefit of braces not escaping:
       
            printf("The char is } and it need not be escaped\n");

       So that's good.
   }>
]

But if the goal is "distinguishability but slightness", you can't get too much more slight than -.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: -{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }-
    License: -{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }-
    Description: -{
       Note this reclaims the benefit of braces not escaping:
       
            printf("The char is } and it need not be escaped\n");

       So that's good.
   }-
]

If that's too slight to the point of being "too fency", then = is another option whose sleek horizontalness doesn't break the visual flow as much...though it does certainly break the flow.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: ={
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }=
    License: ={
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }=
    Description: ={
       Note this reclaims the benefit of braces not escaping:
       
            printf("The char is } and it need not be escaped\n");

       So that's good.
   }=
]

My use of === as a section divider leads me to not wanting to see something this heavy in such positions.

Red has adopted %{...}% for "raw strings", e.g. those that you don't need to use escaping inside. That compromises the FILE! string type %file.txt, which I'd imagine would want to use %{...}% for filenames containing spaces.

I believe I agree with @giuliolunati that not using escaping makes the best default. If you wanted escaping, then maybe backslash is the way to go. Ugly, but it would call attention to the fact that you're using escaping...and you'd be able to notice if you took the escaping out and could drop the backslashes at that point.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: -{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }-
    License: -{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }-
    Description: -\{
       Note this reclaims the benefit of braces not escaping:
       
            printf("The char is } and it need not be escaped\n");

       So that's good.  And here we can do \n\n\n for escaped lines.
   }\-
]

I'm not using just \{...}\ because I imagine this as a family of escaped strings %\{...}\%, so you'd need to know which it is. It also counterintuitively looks better with the -.

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: -{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }-
    License: -{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }-
    Description: \{
       Weirdly this does not look as good, despite dropping a character.
       
            printf("The char is } and it need not be escaped\n");

       So that's good.  And here we can do \n\n\n for escaped lines.
   }\
]

The slightness argument makes the -{...}- and -\{...}\- idea actually kind of compelling to me. It wouldn't cause confusion with TAG! (and would leave <{...}> and <\{...}\> as an option for a tag that wouldn't need to escape > inside of it.)

It's a pattern which could be used with quoted strings too.

>> print "I'm a \n not escaped string"
I'm a \n not escaped string

>> print \"I'm an \nescaped string"\
I'm an
escaped string

>> print -"I'm a quote " safe string"-
I'm a quote " safe string

>> print -\"I'm a quote " safe\nescaped string"\-
I'm a quote " safe
escaped string

Gives you options. -{"}- isn't a fantastic way to say "single quote" but it's better than -"""- and definitely better than \"\""\
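To check that the family hangs together, here's a small Python sketch of one scanner covering the plain, dashed, and backslashed forms for both quotes and braces. Everything in it is an assumption made for illustration: the name scan_decorated, the choice that the first matching closer ends the string (no nesting), and that the escaped forms only understand \n.

```python
PAIRS = {"{": "}", '"': '"'}


def scan_decorated(src, start=0):
    # Hypothetical scanner for "..."  -"..."-  -{...}-  and the backslashed
    # escaped variants. Returns (string contents, index just past the closer).
    i = start
    dashed = src.startswith("-", i)
    if dashed:
        i += 1
    escaped = src.startswith("\\", i)
    if escaped:
        i += 1
    if i >= len(src) or src[i] not in PAIRS:
        raise ValueError("not a string opener")
    # The closer mirrors the opener: close char, then backslash, then dash.
    closer = PAIRS[src[i]] + ("\\" if escaped else "") + ("-" if dashed else "")
    body_start = i + 1
    end = src.find(closer, body_start)  # assumption: first closer wins
    if end == -1:
        raise ValueError("unterminated string")
    body = src[body_start:end]
    if escaped:
        body = body.replace("\\n", "\n")  # only \n is handled in this sketch
    return body, end + len(closer)


raw_text, _ = scan_decorated(r'''-"I'm a quote " safe string"-''')
esc_text, _ = scan_decorated(r'''-\"I'm a quote " safe\nescaped string"\-''')
print(raw_text)
print(esc_text)
```

Because the closer is always the mirror image of the opener, a lone } or " in the body never ends a dashed string early. It only closes when followed by the rest of the closer.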

The BINARY! representation contention with TOKENISSUECHAR! (whatever that thing is) bothers me. I'd imagine like #{"}# as a single character for quote, and #"{"# as a single character for brace, with things like #a and #b just standing on their own since there are no spaces or irregular letters.

But we've kind of run out of symbols at this point... what does #{DECAFBAD} become to be a binary? There'd still be &"..." and $"..." I guess (not fence forms like &{..} and ${ }). It's up to you whether to put a closing "& or "$ on that, since $ and & and other escapable things aren't legal inside binaries.

Or angle brackets with a decorator. $<DECAFBAD> is actually a pretty respectable looking BINARY!. And it leaves #< and #> for the characters of less than and greater than (assuming we don't get so saturated that #<> needs to be something, but maybe that's looking like a bad assumption).
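For what it's worth, the $<...> form would be trivial to load. A tiny Python sketch, where parse_binary is a made-up name and the notation itself is only the proposal above, not anything implemented:

```python
def parse_binary(token):
    # Hypothetical loader for the proposed $<HEX> notation.
    if not (token.startswith("$<") and token.endswith(">")):
        raise ValueError("expected $<...>")
    return bytes.fromhex(token[2:-1])  # raises ValueError on non-hex content


data = parse_binary("$<DECAFBAD>")
print(data.hex().upper())
```

Since only hex digits are legal inside, there is nothing to escape and the closing > can't be confused with content.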

I'm definitely liking the raw-by-default concept. It makes the most sense.

I'm still leery of "..." being multiline (or even -"..."-). With the -{...}- option available I think being able to rein in the frustration of unclosed quotes would be a benefit to having the choice. Making the jump to the braced form would go hand in hand with having to be more vigilant/tolerant of the potentially mysterious errors that happen when multiline strings aren't correctly terminated and wind up coupling with a later delimiter than expected, making a larger-than-expected string.

Whether "" should escape a single quote in a raw string, rather than being an error, is something to ponder... it would keep you from being forced to a different string choice or the ugly \""\ just to get a quote inside a quoted string. But I still feel it's more logical to say:

>> print """"
""

Instead of:

>> print """"
== "

The idea of starting the quotes-look-for-pairs process when a quote is seen followed by a non-closing-delimiter seems novel and useful.

>> print "I think "(this seems cool)" and would be useful."
I think "(this seems cool)" and would be useful.

So you wouldn't have to jump to the 2-character delimiter format of -{...}- for common quoted material cases, making the fence transition introduce somewhat less "noise".

This strategy doesn't cut you any breaks if you want lone quotes inside a lone quoted string. Since it's raw, there's no representation for that. You have to go to more than one character for your delimiter...to use a different delimiter that doesn't conflict with quote, or to add the ability to escape. So -{"}- or -"""- would be your only reasonable choices for a lone quote string... given that \"\""\ is unreadable.

If doing something in C, the choice would clearly be the braced form, giving you the "about as good as it's going to get" option of:

rebElide("print -{\"}-");  // backslash is for the C literal, not seen by Rebol

I was going to say it's not as "nice" as it was in the pre-FENCE! era:

rebElide("print {\"}");

But actually it feels like there's a bit more guidance with that compound delimiter to know what you're looking at, fences or not.

If single quoted strings are confined to one line, it helps mitigate how out of hand this could get. It's something that I think should be given a chance before dismissing it out of hand.

Everything about the practice of escaping strings comes down to tradeoff situations. This just would be the case that if you are going to deal with putting quotes inside of single quoted strings, you would have to ask yourself "am I doing a simple quote pair that doesn't clearly conflict with what looks like single-quoted string termination". If the answer is no, choose another method.

I don't know if the deeper nuance to make closing delimiters legal under some situations but not others is worth it:

"this would ") be illegal"  ; thinks the ) with space after is a closing delimiter 
"this could ")" be legal"  ; looks ahead and sees ), but then looks and sees "

But again, I wouldn't dismiss that out of hand either. It may follow a simple-enough-to-articulate implementation rule, that is learnable in practice.
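As a feasibility check, here's one way the lookahead variant could be articulated, sketched in Python. This is just my reading of the two examples above, not a settled rule: the name scan_quoted_lookahead is made up, and end-of-input counting as a terminator is an assumption.

```python
CLOSERS = set(")]},")
SPACING = set(" \n")


def scan_quoted_lookahead(src, start=0):
    # Hypothetical scanner: returns (string contents, index just past the close).
    if src[start] != '"':
        raise ValueError("expected an opening quote")
    nested = False
    i = start + 1
    while i < len(src):
        if src[i] == '"':
            if nested:
                nested = False  # the next quote always ends the nest
            else:
                # Look past a run of closing delimiters for a partner quote.
                j = i + 1
                while j < len(src) and src[j] in CLOSERS:
                    j += 1
                if j > i + 1 and j < len(src) and src[j] == '"':
                    i = j  # e.g. ")" : delimiters with no spaces, then a quote
                elif i + 1 == len(src) or src[i + 1] in CLOSERS or src[i + 1] in SPACING:
                    return src[start + 1:i], i + 1  # ordinary termination
                else:
                    nested = True
        i += 1
    raise ValueError("unterminated string")


legal, _ = scan_quoted_lookahead('"this could ")" be legal"')
cut_short, _ = scan_quoted_lookahead('"this would ") be illegal"')
print(legal)
print(cut_short)
```

The second case stops at `this would ` and leaves `) be illegal"` behind for the caller to choke on, which is the "illegal" outcome described above. The rule fits in a handful of lines, so it at least seems simple enough to articulate.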