{ Rethinking Braces }... as an array type?

I've historically been pretty attached to braces for strings. They sure can be nice.

But I am increasingly thinking braces might be better applied as a new array type:

>> bracey: first [{This [would] be @legal}]
== {This [would] be @legal}

>> length of bracey
== 4

>> second bracey
== [would]

So it would act like a BLOCK! or a GROUP! when it was inert. But the real benefit would be the idea that if this braced form got evaluated, it would effectively do a MAKE MAP! or MAKE OBJECT! (or "something along those lines")

>> obj: {x: 10 y: 20, z: 30}
== make object! [  ; whatever this representation is, it's not {x: 10...}
    x: 10
    y: 20
    z: 30
]

This kills two birds with one stone: A neat new dialecting part that would also give a better source notation for objects!

Having its evaluator behavior be "make an object/map" pushes this from "frivolous third form of block" to being clearly useful on day 1. But I think the block form would soon turn out to not be frivolous.
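To make the inert-versus-evaluated split concrete, here is a minimal Python sketch. It is only a model of the idea, not Ren-C code: the `Fence` and `SetWord` classes and the `evaluate` function are hypothetical names invented for illustration, and only literal values are handled (no expression evaluation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetWord:
    """Stands in for a SET-WORD! such as x: (hypothetical model type)."""
    name: str


@dataclass(frozen=True)
class Fence:
    """Inert {...} array: just an ordered sequence of items."""
    items: tuple

    def __len__(self):
        return len(self.items)

    def pick(self, n):
        # 1-based, like FIRST and SECOND in the examples above
        return self.items[n - 1]


def evaluate(fence):
    """Assumed evaluator behavior: SET-WORD!/value pairs become key/value entries."""
    items = fence.items
    if len(items) % 2 != 0:
        raise ValueError("expected SET-WORD!/value pairs")
    result = {}
    for key, value in zip(items[0::2], items[1::2]):
        if not isinstance(key, SetWord):
            raise ValueError("expected a SET-WORD! in key position")
        result[key.name] = value
    return result


bracey = Fence(("This", ["would"], "be", "@legal"))
obj = evaluate(Fence((SetWord("x"), 10, SetWord("y"), 20, SetWord("z"), 30)))
print(len(bracey), bracey.pick(2), obj)
```

The same `Fence` value can be walked like any other array or handed to `evaluate`, which is the duality being proposed.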

Carl Himself Wants To Move Away From Braced Strings

In Carl's "ASON" pitch, he moves away from Rebol's choice to make braces an asymmetric string delimiter:

  • "Braces {} are used to denote objects. They are lexical and may be used directly without evaluation (the make constructor is not necessary)."

  • "Braces {} are not used for multi-line strings. A single+double quote format is used for multi-line strings."

I must admit braced strings can make a lot of situations in the text programming world look better than they typically would.

But that comes at the cost of spending the asymmetric delimiter on strings, which is a real weakness when compared against JavaScript and JSON. When rethought as this fun new dialecting part, it actually offers a new edge and plays to Rebol's strengths.

What might the new {...} type do in PARSE? As a branch type? In your own dialects?

What Other Asymmetric String Technique Might Be Used?

Some languages have weird techniques, like even letting you make up your own delimiters by whatever you use in front of the quote:

str: ?"This says "quote followed by question mark" terminates"?
str: |"This says "quote followed by bar" terminates"|
str: xyz"This says "quote followed by zyx" terminates"zyx 

(Not making that up.)
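A small Python sketch of how a scanner for that style could work, assuming the rule is "whatever precedes the opening quote, reversed, must follow the closing quote". The function name `read_delimited` is made up for illustration and is not taken from any particular language's implementation.

```python
def read_delimited(source):
    """Read one string opened by PREFIX + '"' and closed by '"' + reversed PREFIX.

    Returns (body, rest_of_source).
    """
    quote_at = source.find('"')
    if quote_at < 0:
        raise ValueError("no opening quote")
    prefix = source[:quote_at]
    closer = '"' + prefix[::-1]
    body_start = quote_at + 1
    end = source.find(closer, body_start)
    if end < 0:
        raise ValueError("unterminated string")
    return source[body_start:end], source[end + len(closer):]


text, rest = read_delimited('xyz"This says "quote followed by zyx" terminates"zyx')
print(text)
```

Inner quotes need no escaping because only the exact closing sequence ends the string.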

The risk of using a symbol like | is that even if it looks good in isolation, you might not like it in a parse rule, like rule1 | |"some string"| | rule2. A less-used character might be better:

parse data [rule1 | ~"some string"~ | rule2]

Or perhaps those who really feel the need for another asymmetric string delimiter should assign a couple keys in their editor to unicode:

str: «Maybe People «who really care» could use "Chevrons"?»

Who knows. One place to look is the topic of "HEREDOC"

Another place to look is the List of open/close paired braces/brackets/quotes in Unicode

But the point is that braces have been sought after for the missing element of key-value-thing representation (I'll avoid calling it object vs. map).

My {...} Proposal Is Arrays, Not Object Literals

It might seem like having a source representation of objects that maps directly to the loaded/in-memory representation would be better. But in practice, you can't really get the loaded form to ever look completely like the source...there are so many issues with nested cyclical structures or things that just don't mold out.

It doesn't work in JavaScript either. Note that you're not supposed to be loading JSON directly in any case into JavaScript...you're always supposed to go through parsers and serializers. So that should be weighed here when looking at the suggestion of a structural type that happens to evaluate to give you an in-memory representation.

Map Representation Via : ?

There was another remark in the AltScript notes on the role of colon:

For JSON compatibility:

  • Keys (word definitions) can be written with quotes ("field":)
  • A lone colon (:) will automatically associate to the word/string immediately before it.
  • Commas as element separators are allowed as long as they are not directly followed by a non-digit character (to avoid confusion with comma-based decimal values.)

The note about the colon seems like it might be good for maps.

mapping: {
    1 : "One"
    "Two" : 2
}

This could help avoid the need for SET-INTEGER! or similar.
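As a rough Python sketch of the lone-colon rule: given an already-scanned token list, each `KEY : VALUE` triple becomes one map entry. The `COLON` sentinel and `build_map` are hypothetical names for illustration, and the sketch assumes every entry uses the lone-colon form.

```python
COLON = object()  # sentinel standing in for a lone ':' token (hypothetical)


def build_map(tokens):
    """Associate each lone colon with the token before it (key) and after it (value)."""
    result = {}
    i = 0
    while i < len(tokens):
        if i + 2 >= len(tokens) or tokens[i + 1] is not COLON:
            raise ValueError(f"expected KEY : VALUE at token {i}")
        key, value = tokens[i], tokens[i + 2]
        if key is COLON or value is COLON:
            raise ValueError("a lone colon needs a key before it and a value after it")
        result[key] = value
        i += 3
    return result


mapping = build_map([1, COLON, "One", "Two", COLON, 2])
print(mapping)
```

Because the key is an ordinary value that happens to precede a colon, integers and strings work as keys without a dedicated set-style type for each.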


Yes, "make object!" must die. I have wasted much time in the past trying to work out how to use blocks instead of objects just to avoid the mess that objects make on mold of a structure.

Yes please!

Damn, that's true, but surely there's a solution somewhere. Nesting should be straightforward, but cycling I guess needs to represent a context reference and we don't have identifiers for contexts.

123 : {bizarre-thought : "is something like this the new type of" "context" cycling: @123}

But if that was feasible and allowed, would there be any significant difference between { and [ ?


Well my point is about ambiguity:

>> first [{a: 10 b: 10 + 20}]
== {a: 10 b: 10 + 20}  ; a BRACED! of length 6 (or whatever)

>> {a: 10 b: 10 + 20}
== ???a 10 b 30???  ; an object or map or dictionary or something.

There would be some kind of semi-serialization operator:

>> obj: {a: 10 b: 10 + 10}
== ???a 10 b 20???

>> bracify obj
== {
    a: 10
    b: 20
}

Which could navigate you from the object to "what would create me". But this has limits.

And maybe that's what MOLD is. (Not that I ever liked the name "mold"... SERIALIZE?)

I'm just saying that the console needs to keep you grounded when you're talking about something that's in memory and structured vs. the source array notation. They're different and it's a losing battle to not keep people aware of that difference.


I 90% use {...} for multiline strings, but I think "..." could be multiline:

"This is
a multiline
string"

BTW, I don't like ^-escape, I'd prefer \-escape as in ASON/AltScript

And, if we'll impose space around strings (and blocks maybe?) then we'll open the road to various string flavours:

^"I'm an ^"^^-escaped^" string, ^^ and ^" must be escaped, \ is literal!"

\"I'm an \"\\-escaped\" string, \\ and \" must be escaped, ^ is literal!"

"I'm a ""raw"" string, ^ and \ are literal, but "" must be doubled!"
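A rough Python sketch of how the three flavours might decode, working on the text between the delimiters. It assumes the only escapes are the escape character doubled and the escape character followed by a quote; `decode_escaped` and `decode_raw` are hypothetical helper names, not anything from ASON or Ren-C.

```python
def decode_escaped(body, esc):
    """Decode a body where ESC+ESC means ESC and ESC+'"' means a quote."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == esc:
            if i + 1 >= len(body) or body[i + 1] not in (esc, '"'):
                raise ValueError(f"unsupported escape at index {i}")
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def decode_raw(body):
    """Decode a raw body: nothing is an escape, but "" stands for one quote."""
    return body.replace('""', '"')


caret = decode_escaped(r'an ^"^^-escaped^" string, \ is literal', "^")
backslash = decode_escaped(r'an \"\\-escaped\" string, ^ is literal', "\\")
raw = decode_raw(r'a ""raw"" string, ^ and \ are literal')
print(caret, backslash, raw, sep="\n")
```

The character before the opening quote selects the decoder, which is why whitespace around strings would matter: the scanner has to know where the flavour marker begins.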

A place that gets hit particularly hard by losing an alternative delimiter is the API.

rebElide("print {We've relied on this :-( and sucks to lose it}");

If quotes are all we have for strings, mixing them inside another language's quotes looks bad fast:

rebElide("print \"We've relied on this :-( and sucks to lose it\"");

One axis for attacking this problem is Sea of Words and Echo, where if what you're doing is not too antagonistic (like the :-( above) you can imagine operators that turn blocks into strings.

In PRINT's case, it might be that @ does exactly this...suppressing the REDUCE:

rebElide("print @[This might just form the contents as-is.]");

Chevrons might actually not be a terrible answer to the issue for people doing a lot of editing in the API:

rebElide("print «We've relied on this :-( and sucks to lose it»");

The idea of sacrificing << and >> themselves to strings is a weird one:

rebElide("print <<We've relied on this :-( and sucks to lose it>>");

It looks too much like tags.

Of course, there's always... backquote... which may just be the best of the bad options.

rebElide("print `We've relied on this :-( and sucks to lose it`");

In any case, this mixture issue is a real pain point on losing braces for strings. We need to keep apostrophe for what it is, so the options are pretty limited.

Carving <{...}> out of legal tags is another alternative to <<...>> which might come off as a little bit "less taglike"?

rebElide("print <{We've relied on this :-( and sucks to lose it}>");

If you squint, <{ almost looks like its own compound symbol. A relative of ﴾ornate parentheses﴿ or perhaps ⦓Arc Brackets⦔

Does It Seem Worth The Sacrifice?

For all that freeing up FENCE! does, can we suffer through <{...}> strings and/or backticks?

It feels like a tradeoff worth pursuing.

I think a nice thing about <{ is that it isn't something that would be sought after as an operator in its own right, the way << would.

I prefer ` over <{ }>

Oh please no backticks, that key should be removed from all keyboards worldwide as far as I am concerned.

I am also "attached" to our use of braces.

Braces are bad otherwise because of their similarity to parentheses; sometimes it is hard to see that they are used.
The use of the curly brackets in this language is one of the things I particularly like about Rebol.

The small list of Rebol features that kept me:

  • 1-based
  • case insensitive
  • no curly braces, only for multiline string
  • no parentheses needed for function call / parameter passing
  • many data types, no need to recreate all those trivial ones over and over again
  • Almost all functionality out of the box, no import needed even for much used "library" things
  • pleasing syntax, flexibility

I've never had a problem distinguishing them.

Do note that in dialects, the "sea of words" means that this new FENCE! could be used as a kind of string.

>> thing: '{"Almost anything" you write, it's possible to LOAD now.}

>> type of thing
== #[datatype! fence!]

>> first thing
== "Almost anything"

>> last thing
== now.

>> type of last thing
== #[datatype! tuple!]

>> print form thing
"Almost anything" you write, it's possible to LOAD now.

Maybe even a GET-FENCE! will form it?

>> :{Like "this", maybe?}
== "Like \"this\" maybe?"  ; or "Like ""this"" maybe?"

Of course some things don't work, like semicolons. And unless something changes in the plan, parentheses directly adjoining words wouldn't work either, like {foo()}

It does mean that in things like module headers or function specs, there could be a rule saying that fences are handled like strings.

foo: func [
    {Maybe this still works, with "strings" in it, if you want it to?}
    return: [integer!]
    args...
][...]

Certainly tolerance of this would be needed as a compatibility measure. But maybe it's more than compatibility. Maybe it's an enduring duality of what "fences" are for.


Another use of {...} is on the command line:
bash> r3 script.reb string: {the-string}
<{...}> would need quoting:
bash> r3 script.reb string: "<{the-string}>"

Same issue for backticks.

Never had autocorrect change a straight quote into a backtick?

No more... Now words are case sensitive.

There may be a mode to make modules case insensitive or not...

But I'm pretty convinced that there must at least be the option of case-sensitivity.

If you look at things like module headers, I think the <{ }> is not the worst thing in the world:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: <{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }>
    License: <{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }>
    Description: <{
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
   }>
]

Compare with a backtick, which I do find to be worse, personally. It looks like... there's dirt on the screen. (So actually agreeing with @iArnold on that, it's a bad character.)

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: `
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    `
    License: `
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    `
    Description: `
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
   `
]

Double quotes just don't fit the bill because quotes are used too freely inside both text and code samples, and we've gotten too used to that.

I've really been wishing multiline text literals could have something like YAML does, where indentation drives it. YAML uses | which is clean, but we'd hate to lose the symbol for other purposes. Maybe backslash?

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module

    Rights: \
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors

    License: \
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0

    Description: \
       We've gotten used to writing anything we want inside of braced
       strings...this gives us more freedom with single braces:
       
            printf("The char is } and that's okay\n");

       So there's a benefit to it.
]

Indentation-driven answers have the benefit of not needing to worry about escape sequences.
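Here is a rough Python sketch of what an indentation-driven reader could do, assuming the rule is "a line ending in `: \` starts a text field, and every following line indented deeper than that line belongs to it". The function name `read_indented_fields` is invented for illustration; this models the idea and is not a proposal for the actual scanner.

```python
import textwrap


def read_indented_fields(source):
    """Collect 'KEY: \\' fields whose text is the more-indented lines that follow."""
    fields = {}
    lines = source.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        i += 1
        if not stripped.endswith(": \\"):
            continue
        key = stripped[:-3]
        key_indent = len(line) - len(line.lstrip())
        block = []
        while i < len(lines):
            nxt = lines[i]
            # A non-blank line at or left of the key's indentation ends the text
            if nxt.strip() and len(nxt) - len(nxt.lstrip()) <= key_indent:
                break
            block.append(nxt)
            i += 1
        while block and not block[-1].strip():
            block.pop()  # drop trailing blank lines
        fields[key] = textwrap.dedent("\n".join(block))
    return fields


header = "\n".join([
    "Rights: \\",
    "    Copyright 2012 REBOL Technologies",
    "    Copyright 2017-2021 Ren-C Open Source Contributors",
    "",
    "Description: \\",
    "    Braces need no escaping here:",
    "",
    '        printf("The char is }");',
])
fields = read_indented_fields(header)
print(fields["Description"])
```

Nothing inside the text is ever inspected for delimiters, which is where the "no escape sequences" benefit comes from; the price is that whitespace becomes significant.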

I'll also point out that I'd been talking about a new notation for BINARY!, e.g. &{...} in order to free up #{...} as an ISSUE!+TOKEN!+CHAR! notation, so that would be an immutable string literal. (Note that ASON suggests Carl wanted to make all string literals immutable.) But I think #{} looks pretty bad for the above:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: #{
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: #{
        Licensed under the Apache License, Version 2.0
        See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: #{
       Note that this would not have the benefit of braces not escaping:
       
            printf("The char is ^} and must be escaped\n");

       So that's a drawback.
   }
]

Something about that I find more jarring than <{...}>, but it also doesn't have the advantage of not needing to escape single braces.

As you point out, backticks have the same problem. But as I've said, one of the potentially nice things about calling the new array type FENCE! could be that we stay fluid on its dialecting purpose...and sometimes it's used for things that could be text but contain quoted items. Like I say--that won't help you with semicolons or anything non-LOADable, but it may cover a lot of cases.

No syntactic indentation please, keep the freeform style of Rebol!!

I'd like "named quotes":
WORD"..."WORD
e.g. with --"..."--

--"2^3
"string""--
=
"2^^3^/^"string^""

and with ++"..."++

++" --"string"-- "++
=
" --^"string^"-- "

This could be

Rights: --"
        Copyright 2012 REBOL Technologies
        Copyright 2017-2021 Ren-C Open Source Contributors
"--

Another possibility is to have a character that means "make a string to the end of line", and then putting these in BLOCK!s:

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: [
      \ Copyright 2012 REBOL Technologies
      \ Copyright 2017-2021 Ren-C Open Source Contributors
    ]
    License: [
      \ Licensed under the Apache License, Version 2.0
      \ See: http://www.apache.org/licenses/LICENSE-2.0
    ]
    Description: [
      \ We've gotten used to writing anything we want inside of braced
      \ strings...this gives us more freedom with single braces:
      \
      \      printf("The char is } and that's okay\n");
      \
      \ So there's a benefit to it.
   ]
]

Several languages (like Haskell) basically don't have a better answer for multi-line strings than "apply an operator that inserts newlines onto an array of strings". There's a specialization of DELIMIT/TAIL that does that, and it's called NEWLINED.

>> newlined ["one" "two"]
== "one^/two^/"
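For comparison, the same operation as a tiny Python sketch. The names `delimit_tail` and `newlined` just mirror the words used above and are hypothetical Python helpers; returning an empty string for an empty list is an assumption of this sketch, not a statement about what DELIMIT does. Haskell's `unlines` gives the same result for this input.

```python
def delimit_tail(delimiter, items):
    """Join items, adding the delimiter after every item, including the last."""
    return "".join(item + delimiter for item in items)


def newlined(items):
    """Specialization that uses a newline as the delimiter."""
    return delimit_tail("\n", items)


text = newlined(["one", "two"])
print(repr(text))  # 'one\ntwo\n'
```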

It looks kind of jarring to use a character that is not straight up-and-down.

; We're used to seeing multiple lines like this,
; and I think it's more comfortable because of the verticality.

Perhaps lone exclamation mark? We might even argue that if a FENCE! sees strings inside of it, then the evaluator will make a string out of what it gets...

Rebol [
    Title: "Your module title here"
    Type: module
    Name: your-module
    Rights: {
      ! Copyright 2012 REBOL Technologies
      ! Copyright 2017-2021 Ren-C Open Source Contributors
    }
    License: {
      ! Licensed under the Apache License, Version 2.0
      ! See: http://www.apache.org/licenses/LICENSE-2.0
    }
    Description: {
      ! We've gotten used to writing anything we want inside of braced
      ! strings...this gives us more freedom with single braces:
      !
      !      printf("The char is } and that's okay\n");
      !
      ! So there's a benefit to it.
   }
]

So imagine FENCE! having this reaction:

>> {x: 10 y: 10 + 10}
== object!##{x: 10 y: 20}

>> {"one" "two"}
== "one^/two^/"

This would mean that FENCE! alone wouldn't represent a MAP!, but maybe you make mappings with "double-fences" ?

>> {{"one" "two"}}
== map!##{{"one" "two"}}
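The three reactions can be sketched in Python as a dispatch on what the fence contains. This is only a model under stated assumptions: `Fence`, `SetWord`, `pairs` and `evaluate_fence` are hypothetical names, results are tagged tuples rather than real datatypes, and values are taken literally (so `10 + 10` is not evaluated here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetWord:
    """Stands in for a SET-WORD! such as x: (hypothetical model type)."""
    name: str


class Fence(tuple):
    """Inert {...} array, modelled as a tuple subclass."""


def pairs(items):
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    return list(zip(items[0::2], items[1::2]))


def evaluate_fence(fence):
    # {{...}}: a fence holding exactly one fence makes a map
    if len(fence) == 1 and isinstance(fence[0], Fence):
        return ("map!", dict(pairs(fence[0])))
    # {"one" "two"}: all strings, so join them NEWLINED-style
    if fence and all(isinstance(item, str) for item in fence):
        return ("text!", "".join(item + "\n" for item in fence))
    # {x: 10 y: 20}: SET-WORD!/value pairs make an object
    kv = pairs(fence)
    if all(isinstance(key, SetWord) for key, _ in kv):
        return ("object!", {key.name: value for key, value in kv})
    raise ValueError("don't know how to evaluate this fence")


print(evaluate_fence(Fence(("one", "two"))))
print(evaluate_fence(Fence((Fence(("one", "two")),))))
print(evaluate_fence(Fence((SetWord("x"), 10, SetWord("y"), 20))))
```

Writing it out makes the cost visible: the result type depends on the contents, so a fence that mixes strings and SET-WORD!s needs a defined answer (this sketch raises an error).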

We've talked about how some serializations would turn in-memory representations into source code, and maybe that could be true of this as well.

>> var: {
    ! Line one
    ! Line two
}
== text!{
    ! Line one
    ! Line two
}

>> serialize var
== {
    ! Line one
    ! Line two
}

It knows that the data produced by evaluation is no longer a FENCE! but a TEXT!, however you can use operations that go back to source...just like with OBJECT!s.

This has the slight advantage of being Redbol compatible, although it doesn't address single-line braced strings.

Weird, yes, but it's good to just make sure all the options are examined.

All this is showing just how valuable and rare the ASCII-range asymmetric delimiters really are.

That's not technically contentious when using -- unattached to a quote, but I feel like I'd rather reserve -- for a dumping abstraction. It kind of jumps off the page to me, and it's nice to be able to glance at code and know when you've left debugging in it.

If we introduce multiple delimiter kinds, we start getting problems on the representation...

What happens when someone appends a --"..."-- string to a ++"..."++ string? How should the appended result be molded?
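One possible answer, sketched in Python: treat the name as source-only notation, so every named-quote literal loads as plain text and the appended result molds in a single canonical form. The sketch assumes the canonical form is the caret-escaped "..." notation used for the equivalents given earlier in the thread; `read_named` and `mold_text` are hypothetical names.

```python
def read_named(source):
    """Read NAME"..."NAME and return the raw text between the delimiters."""
    quote_at = source.find('"')
    if quote_at < 1:
        raise ValueError("expected a NAME before the opening quote")
    name = source[:quote_at]
    closer = '"' + name
    if len(source) < 2 * len(closer) or not source.endswith(closer):
        raise ValueError("expected the text to end with quote + NAME")
    return source[quote_at + 1 : len(source) - len(closer)]


def mold_text(text):
    """Render text in caret-escaped "..." form, regardless of how it was written."""
    # Escape the caret first so carets added by later steps are not doubled
    escaped = text.replace("^", "^^").replace('"', '^"').replace("\n", "^/")
    return '"' + escaped + '"'


first = read_named('--"2^3\n"string""--')
second = read_named('++" --"string"-- "++')
print(mold_text(first))
print(mold_text(first + second))
```

Under that assumption the question goes away, since no delimiter information survives loading, but it also means a round trip through MOLD never gives the original notation back.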

Beyond that, looks-wise I really do prefer <{...}> to --"..."--. It has the advantage that we're not invoking any WORD!-characters, so it really stays in the domain of "delimiters".

Rights: <{
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 Ren-C Open Source Contributors
}>

And as I've said, it might have a unicode parallel for those who want to embrace the higher codepoints:

Title: ⦓Your "Title" Here⦔
Rights: ⦓
    Copyright 2012 REBOL Technologies
    Copyright 2017-2021 Ren-C Open Source Contributors
⦔

We see some pretty big impacts to losing braced strings in prominent places... module headers and the API jump off the page immediately.

So far I feel like <{...}> is my leading choice:

  • I think people can see <{ and }> as compound asymmetric symbols in their own right, carving out a space that lets it not be seen as a nesting of <...> and {...}.

    • If we introduce them as symbols and suggest UTF-8 alternatives, then that narrative can be solidified.
  • As mentioned, it's not borrowing from any legal WORD!-characters. It's only involving delimiters.

  • It gets a boost over {...} as a string representation by allowing the use of unpaired { and } characters inside of it.

  • It doesn't introduce any mechanics we aren't dealing with already.

The raw string idea, along with no escaping by default, seems good. I like the doubled quotes being an option.

<{I don't escape \n, that's a literal backslash}>

\<{I escape \n with backslash}>\  ; idea 1

<\{I escape \n with backslash}\>  ; idea 2

The second representation looks a little cleaner, but maybe too much like a TAG!. Also doesn't line up with what you can do with "..." so maybe it's better to have the backslashes outside.