JSON Envy: Serialization Dialect in Rebol?

hostilefork · October 23, 2021, 6:06am

Carl's thinking about a successor to Rebol's notation called ASON included the idea that braces would be used for object construction--not strings.

I've panned the notion of "lexical objects", and explained why in a thread outlining an alternative of treating braces as a new array type called FENCE!, which I have come to think is likely well worth it.

But assuming we don't have FENCE!, what might Rebol's version of JSON look like? This was a tangent explored in October 2021 on the fence thread that I've broken into its own topic.

Consider this example from json.org:

{"glossary": {
    "title": "example glossary",
    "GlossDiv": {
        "title": "S",
        "GlossList": {
            "GlossEntry": {
                "ID": "SGML",
                "SortAs": "SGML",
                "GlossTerm": "Standard Generalized Markup Language",
                "Acronym": "SGML",
                "Abbrev": "ISO 8879:1986",
                "GlossDef": {
                    "para": "A meta-markup language...",
                    "GlossSeeAlso": ["GML", "XML"]
                },
                "GlossSee": "markup"
            }
        }
    }
}}

I think allowing spaces in the keys is a weakness and not a strength. And Rebol doesn't have the historical problem of disallowing language "keywords" in the keys, so the quotes wouldn't be necessary (they aren't required in modern JavaScript object literals either for that reason, though they still are if your key has dashes or spaces in it).

Let's drop the quotes and turn all the braces into brackets. Commas can be optional now, but let's say we don't care to use them when things are on different lines.

[glossary: [
    title: "example glossary"
    GlossDiv: [
        title: "S"
        GlossList: [
            GlossEntry: [
                ID: "SGML"
                SortAs: "SGML"
                GlossTerm: "Standard Generalized Markup Language"
                Acronym: "SGML"
                Abbrev: "ISO 8879:1986"
                GlossDef: [
                    para: "A meta-markup language..."
                    GlossSeeAlso: ["GML", "XML"]  ; can use commas if we want?
                ]
                GlossSee: "markup"
            ]
        ]
    ]
]]
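That rewrite is mechanical enough to sketch in a few lines. This is Python rather than Rebol, purely as an illustration: `to_block` and the `WORD` pattern are names I'm making up here, and the pattern is a deliberately conservative stand-in for what counts as a legal word, not the real lexical rule.

```python
import json
import re

# Assumption: a conservative stand-in for "valid Rebol word"; real word
# syntax is more permissive than this pattern.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

def to_block(value, indent=0):
    """Render parsed JSON in the all-brackets notation shown above."""
    pad = " " * indent
    if isinstance(value, dict):
        if not value:
            return "[]"  # indistinguishable from an empty array in this scheme
        lines = []
        for key, item in value.items():
            if not WORD.fullmatch(key):
                raise ValueError(f"key {key!r} cannot be written as a SET-WORD!")
            lines.append(f"{pad}    {key}: {to_block(item, indent + 4)}")
        return "[\n" + "\n".join(lines) + "\n" + pad + "]"
    if isinstance(value, list):
        # Arrays stay on one line, space-separated, commas dropped
        return "[" + " ".join(to_block(item, indent) for item in value) + "]"
    # Scalars: reuse JSON's own spelling (double-quoted strings, plain numbers)
    return json.dumps(value)

source = '{"GlossDef": {"para": "A meta-markup language...", "GlossSeeAlso": ["GML", "XML"]}}'
print(to_block(json.loads(source)))
```

Keys with spaces are rejected rather than quoted, which matches the position that spaces in keys are a weakness. Note that an empty object and an empty array both come out as `[]`, which is one of the ambiguities of using a single bracket form.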

In practice, the serializer/deserializer could say that any block starting with a SET-WORD! is presumed to be an object...and if you have an array that shouldn't get this treatment, you quote it as '[...]

>> deserialize "[[a: 10 b: 20] '[c: 10 <random stuff> d: 20]]"
== [
    #object![a: 10 b: 20]  ; whatever notation
    [c: 10 <random stuff> d: 20]
]

A simple rule about SET-WORD!s could give us the same object vs. array distinction in what's being transferred. It would keep us centered on one nice bracket form...allow an escape route, via quoting, for arbitrary BLOCK!s that happen to contain SET-WORD!s...and let us keep our nice braced strings without the need for nasty escapes.
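To make the rule concrete, here is a rough model of it in Python (not Rebol, and not an actual implementation of anything). The `SetWord` and `Quoted` classes are hypothetical stand-ins for SET-WORD! and a quoted BLOCK!, and I'm assuming an object block is strictly alternating key/value pairs, which the rule above doesn't actually promise.

```python
from dataclasses import dataclass

@dataclass
class SetWord:
    """Hypothetical stand-in for a SET-WORD! such as a:"""
    name: str

@dataclass
class Quoted:
    """Hypothetical stand-in for a quoted block such as '[...]"""
    block: list

def deserialize(value):
    if isinstance(value, Quoted):
        # Escape route: drop the quote and leave the contents untouched
        return value.block
    if isinstance(value, list):
        if value and isinstance(value[0], SetWord):
            # Block starts with a SET-WORD!, so presume it is an object.
            # Assumption: strictly alternating key/value pairs.
            if len(value) % 2 != 0:
                raise ValueError("object block needs key/value pairs")
            obj = {}
            for key, item in zip(value[0::2], value[1::2]):
                if not isinstance(key, SetWord):
                    raise ValueError(f"expected a SET-WORD!, got {key!r}")
                obj[key.name] = deserialize(item)
            return obj
        # Plain array: recurse so nested object blocks are still found
        return [deserialize(item) for item in value]
    return value

data = [
    [SetWord("a"), 10, SetWord("b"), 20],
    Quoted([SetWord("c"), 10, "<random stuff>", SetWord("d"), 20]),
]
result = deserialize(data)
```

Here `result[0]` becomes a key/value mapping while `result[1]` is handed back as the raw list, SET-WORD!s and all, because the quote suppressed the object interpretation.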

What About Evaluative Things In Lists?

The lightness of unadorned WORD! has a big draw in Rebol, and has caused a lot of headaches...for instance in deciding if we should say Type: 'module or Type: module.

While there's no rule that says module headers have to obey the same rules as whatever this operation is, it feels unwise to have them deviate. So this implies Type: 'module or moving to something inert like Type: #module or Type: "module"

The problem expands to lists, where you have category: [fun script] needing to mitigate or avoid evaluation one way or another:

category: '[fun script]

category: ['fun 'script]

category: ["fun" "script"]

category: [#fun #script]

While the first option of putting a quote on the list seems like the cleanest, something generating a serialization wouldn't have that choice if the list mixed words and objects:

mix: ['word-one [a: 10 b: 20] 'word-two]

If the outer block were quoted, then the deserializer wouldn't dig in to make the inner block into an object.

These kinds of odd mixtures of evaluation with objects point out a not-uncommon Rebol problem... if you're doing a deserialization and descending into a block that isn't quoted, you might be seeing BLOCK!s that are arguments to functions along with blocks that are meant to act as objects. Which wins?

Code like PRINT goes step by step, as opposed to gathering all the strings ahead of time and assuming they are meant for PRINT itself:

>> print ["hello" reverse "dlrow"]
hello world

The deserialization operator could work the same way, though it could effectively COMPOSE the BLOCK!s representing objects in so that any functions would be passed the object. :-/

This certainly raises some questions, but they are fairly common Rebol questions.

...or...REDUCE requests could be explicit?

We don't have to make "recurses to look for blocks to instantiate objects" also imply that the arrays themselves are evaluative, but I think it would be confusing if they weren't. JavaScript programmers expect the arrays inside objects to be evaluated:

let x = {label: "object", data: {label: "array", data: [1 + 2, 10 + 20]}}

This gives you a nested structure with data: [3, 30]. So I feel like this operation should follow suit, reducing blocks unless you suppress that.

So if your input is:

[label: "object", data: [label: "array", data: [1 + 2, 10 + 20]]]

I think data should be [3 30], and if your input is:

[label: "object", data: [label: "array", data: '[1 + 2, 10 + 20]]]

Then data should be [1 + 2, 10 + 20]. Where you really run into trouble is the mixing and matching: an array that contains some plain blocks and some objects. Everything can be represented thanks to generic quoting, but it could get messy.
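The "reduce unless quoted" behavior for the data arrays can be modeled roughly like this. It is a Python toy, not Rebol: operators are plain strings looked up in a hypothetical `OPS` table, `Quoted` is a made-up stand-in for a quoted block, and it only covers the array part of the examples above (not the label: keys). Evaluation runs left to right with no operator precedence, which is how Rebol treats infix.

```python
import operator
from dataclasses import dataclass

# Assumption: these strings are always operators, never data
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
COMMA = ","

@dataclass
class Quoted:
    """Hypothetical stand-in for a quoted block such as '[1 + 2]"""
    block: list

def is_op(item):
    return isinstance(item, str) and item in OPS

def evaluate(item):
    if isinstance(item, Quoted):
        return item.block          # quote suppresses reduction
    if isinstance(item, list):
        return reduce_block(item)  # unquoted blocks are reduced
    return item

def reduce_block(items):
    out = []
    i = 0
    while i < len(items):
        if items[i] == COMMA:      # commas only separate expressions
            i += 1
            continue
        value = evaluate(items[i])
        i += 1
        # Keep folding while an infix operator follows (left to right)
        while i < len(items) and is_op(items[i]):
            if i + 1 >= len(items):
                raise ValueError("operator is missing its right-hand side")
            value = OPS[items[i]](value, evaluate(items[i + 1]))
            i += 2
        out.append(value)
    return out

plain = reduce_block([1, "+", 2, COMMA, 10, "+", 20])
kept = evaluate(Quoted([1, "+", 2, COMMA, 10, "+", 20]))
```

With these inputs `plain` is the reduced pair of sums, while `kept` is the original sequence of values and operators, untouched.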

What If We Used GROUP!s in Serialization?

>> stuff: reduce [make object! [a: 10 b: 10 + 10]]
== [#object![a: 10 b: 20]]  ; or whatever internal representation

>> serialize stuff
== [(a: 10 b: 20)]

>> deserialize serialize stuff
== [#object![a: 10 b: 20]]

JSON doesn't use parentheses in this way because it can't: in JavaScript, parentheses only group expressions and are not reified as values.

But if a Rebol system wants to exchange information with another Rebol system "in the style of JSON", GROUP! could be used to represent the key/value objects and BLOCK! could represent plain array/lists...with braces being a handy mode of string representation that can get away with less escaping.
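The serializing half of that convention is tiny. A Python sketch, with Python dicts standing in for objects and lists for blocks; `serialize` here is just an illustrative function, and it borrows JSON's double-quoted string spelling rather than choosing between quotes and braces.

```python
import json

def serialize(value):
    if isinstance(value, dict):
        # Key/value object: written as a GROUP! of SET-WORD!/value pairs
        pairs = (f"{key}: {serialize(item)}" for key, item in value.items())
        return "(" + " ".join(pairs) + ")"
    if isinstance(value, list):
        # Plain array: written as a BLOCK!
        return "[" + " ".join(serialize(item) for item in value) + "]"
    # Assumption: scalars reuse JSON's spelling (double-quoted strings)
    return json.dumps(value)

print(serialize([{"a": 10, "b": 20}]))
```

Because the object/array distinction is carried by the delimiter itself, nothing needs quoting to keep a block that contains SET-WORD!s from being taken for an object.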

And with modern COMPOSE you could deserialize and compose, by labeling the compose sites:

>> deserialize compose <*> [
    (a: 10 b: 20)  ; deserialize treats as object
    (<*> reduce [1 + 2 3 + 4])  ; array before deserialize
]

Whatever you call it, having something like COMPOSE which treats nested levels of GROUP!s as object creation requests isn't that crazy an idea. (Perhaps OBJECTIFY?)

hostilefork · January 31, 2024, 12:29am

When I wrote this "how would you do JSON in Rebol", it was a kind of devil's advocacy...pointing out how to go about it with just blocks and quoted blocks... and, uh, maybe groups?

Looking back at it a couple of years later, it reads like an advertisement for FENCE!. Those ideas suck, and Rebol shouldn't fail this hard in comparison with the notation it inspired.

We need to introduce the {...} array type. Under evaluation it would create an object, but in dialects (such as a JSON analogue like the above) it would often be used for key/value representations. However, as a fully general array type, it's up to you what to use {...} for in your own dialects.

This gives JSON's inspiration a chance to actually be better than JSON.

{glossary: {
    title: "example glossary"
    GlossDiv: {
        title: "S"
        GlossList: {
            GlossEntry: {
                ID: "SGML"
                SortAs: "SGML"
                GlossTerm: "Standard Generalized Markup Language"
                Acronym: "SGML"
                Abbrev: "ISO 8879:1986"
                GlossDef: {
                    para: "A meta-markup language..."
                    GlossSeeAlso: ["GML" "XML"]
                }
                GlossSee: "markup"
            }
        }
    }
}}