No-Interstitial-Delimiter Sequence Type (The "SCRUNCH!" Proposal)

hostilefork · February 10, 2021, 11:33pm

@BlackATTR feels that being able to write code as operation(arg) with no space is critical to the SQL dialect. While we haven't expressly prohibited that yet, it is lossy:

>> [operation(arg)]
== [operation (arg)]

I just noticed another case of tight syntax with a %rebol-dom.r script where @Danny is trying to write expressions like:

app[.width] = "curly quew"

Also still legal at the moment, but also still broken apart...and being two separate elements means it couldn't be quoted as a unit on the left (e.g. by an assignment-oriented equal):

>> app[.width] = "curly quew"
== [app [.width] = "curly quew"]

One of the reasons we'd discussed making these kinds of constructs illegal would be to free them for new meanings. Given that people want more density, might we think of this as a new datatype... like a PATH! or a TUPLE!, that simply has no delimiter?

Let's Imagine we call it a SCRUNCH!

(Once this was proposed as PACK!, but now packs are something different... so let's just call this the "scrunch proposal")

>> p: 'operation(arg)
== operation(arg)

>> type of p
== #[datatype! scrunch!]  ; or whatever, this syntax has tons of thinking needed

>> first p
== operation  ; a WORD!

>> second p
== (arg)  ; a GROUP! with one element

Not every type would be able to participate in a scrunch. As with PATH! and TUPLE!, only the scrunch as a whole could have a decoration (SET-SCRUNCH!, GET-SCRUNCH!, etc.).

Also, while # is a legal "empty issue", it's also a modifier. So #[a] probably shouldn't be a SCRUNCH!. Unless there's some way that would fit into a broader master plan, by virtue of what that scrunch would mean... e.g. if # = first scrunch would mean something was typically a datatype...

Let's think about other things first.

Plan -4 Issues

There's a bit of a problem in that we've been saying you could write (a)(b) and [a][b] and have it mean the same thing as (a) (b) and [a] [b]. I use this frequently, because I do not like the spacing gap you get when braces follow each other.

f: func [
    some-arg [integer!]
] [
    the gap there bothers me
]

f: func [
    some-arg [integer!]
][
    without the gap looks better
]

That would suggest the rules for scrunching would disallow adjacent GROUP!s and BLOCK!s, which would rule out interesting scrunches that might represent multidimensional array access:

>> [foo[x][y]]
== [foo[x] [y]]  ; would have to be a scrunch and a block?

Ladislav was always rather adamant that spaces should be everywhere because spaces had significance. So he wasn't in favor of the aesthetic gap-closing. If SCRUNCH! turned out to be truly useful, then it might vindicate his belief. I dunno.

What Would The Evaluator Behavior Be?

So this would give a whole range of things that today have no meaning in the evaluator. If we were to accept choosing out of arrays by index with array[n], how bad would that be...when BLOCK!s have other meanings? It would still be a BLOCK... just one that happens to be in a SCRUNCH!, like array/[n] is inside a PATH! or array.[n] is inside a TUPLE!.

Having foo(bar) be a function call might appeal to certain users, if that was a stylistic choice they could make. Maybe it could even use COMMA! to delimit arguments.

How Would It Reconcile Priority With TUPLE! and PATH! ?

The rule for TUPLE! is that the dots bind more tightly than the slashes. So a.b/c.d is a 2-element PATH! with 2-element TUPLE!s on the left and right... not a 3-element tuple with a path in the middle.

What would you get with a.b/c.d[e f] ? How about a.b[c d]e/f ?

My general instinct would be that the SCRUNCH! would be the outermost.

a.b/c.d[e f] => to scrunch! [a.b/c.d [e f]]  ; 2-element scrunch
a.b[c d]e/f => to scrunch! [a.b [c d] e/f]  ; 3-element scrunch

But that's not based on any particular evidence of utility. I'm just saying that there has to be a rule...and that's the one that seemed the most natural to me on first glance.
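To make the "scrunch is outermost" rule concrete, here is a rough sketch in Python (not Ren-C, and not how any real scanner works). The function name split_scrunch is made up for illustration. It assumes a boundary falls wherever a GROUP!/BLOCK! starts or ends at nesting depth zero, unless a dot or slash glues the pieces together, in which case the array stays inside the TUPLE!/PATH!.

```python
# Hypothetical model of the "scrunch is outermost" rule; not a real Ren-C API.
OPENERS = {"(": ")", "[": "]"}


def split_scrunch(token):
    """Split one tightly written token into its top-level scrunch elements."""
    elements = []
    current = ""
    closers = []  # stack of closing delimiters we still expect
    for ch in token:
        if not closers:
            # Only consider element boundaries at nesting depth zero.
            starts_array = ch in OPENERS
            after_array = current.endswith((")", "]"))
            # A dot or slash on either side keeps the pieces in one TUPLE!/PATH!.
            glued = ch in "./" or current.endswith((".", "/"))
            if current and (starts_array or after_array) and not glued:
                elements.append(current)
                current = ""
        current += ch
        if ch in OPENERS:
            closers.append(OPENERS[ch])
        elif ch in ")]":
            if not closers or closers.pop() != ch:
                raise ValueError("mismatched delimiter in " + repr(token))
    if closers:
        raise ValueError("unclosed delimiter in " + repr(token))
    if current:
        elements.append(current)
    return elements


print(split_scrunch("a.b/c.d[e f]"))  # ['a.b/c.d', '[e f]']
print(split_scrunch("a.b[c d]e/f"))   # ['a.b', '[c d]', 'e/f']
print(split_scrunch("a/[b]"))         # ['a/[b]'], one element, so still a PATH!
```

Under these assumptions a result with one element is not a scrunch at all, which is how a/[b] stays a plain PATH!. The same rule splits foo[x][y] into three elements, and that is exactly the case that collides with Plan -4.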

There'd Be Some Issues With Numbers

If numbers are allowed, you are going to run into trouble with things like how 1e... leads into exponential notation, and the fact that you get problematic reversals with 1a and a1. The first seems like a candidate for being a SCRUNCH!, while the second is what we'd see as a normal WORD!
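As a rough illustration of the number problem, here is a small Python sketch (again not Ren-C). The name split_unit and the exact pattern are assumptions made for the example: digits followed by letters split into a number and a word, anything starting with a letter is left alone as an ordinary word, and anything that does not fit the digits-then-letters shape is refused.

```python
import re
from decimal import Decimal

# Hypothetical rule: digits (optionally with a fraction) followed only by letters.
UNIT_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([A-Za-z]+)")


def split_unit(token):
    """Return (number, unit) for tokens like 100px, or None if it doesn't fit."""
    match = UNIT_TOKEN.fullmatch(token)
    if match is None:
        return None
    number, unit = match.groups()
    # Decimal keeps 5.05 exact; plain integers stay integers.
    value = Decimal(number) if "." in number else int(number)
    return value, unit


print(split_unit("100px"))  # (100, 'px')
print(split_unit("a1"))     # None, since a1 would remain a normal WORD!
print(split_unit("1e5"))    # None, since this shape is left to exponential notation
```

Note that this pattern would still accept 1e as the number 1 with unit e, so a real scanner would need its own rule for where exponential notation takes over.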

BLANK!s terminating Paths or Tuples Would Be Impossible

You couldn't merge a/ and [b] to get a SCRUNCH!, because a/[b] is a PATH!. This may be a feature and not a flaw...no clue.

Anyway, a lot to think about. But if people are dug in and insistent that they have to have these notations, then we should give consideration to the idea.

rgchris · February 10, 2021, 11:56pm

I'd be more inclined to recognize thing( ... ) as a distinct container type, acknowledging the pattern found across other data formats/languages. It could be inert in and of itself, and thus really only useful in dialects. Forgoing Plan -4 by packing certain combinations of types may tie hands with new forms in the future.

BlackATTR · February 11, 2021, 12:13am

I appreciate the consideration here. I also have no idea if a scrunch! might be useful outside of a specific kind of dialect. Which I suppose isn't a very strong vote in favor.

It would be a slight boost to supporting an SQL syntax (or spreadsheet macro). My main challenge is not parsing the pattern (be it SCRUNCH! or just word! + group!), but recursively unraveling the nested construct to translate into ren-c text! manipulating expressions. If a SCRUNCH! cannot be nested, then it might not be worth the complexity of adding as a datatype.

hostilefork · February 11, 2021, 12:21am

Not sure what you mean. You mean would it be legal to say x(y(z)) ? The presumption here would be that it would be... just as you can say x/(y/(z)) today.

The key issue is that Ren-C has made PATH! match TUPLE! in being immutable. This means if you want to do surgery on such a structure, you'd have to turn it into blocks or groups...and then transform it back into the sequence structure. There's lots to still figure out about that.

BlackATTR · February 11, 2021, 12:24am

Ah, sorry, I misinterpreted then.
I was thinking the limitations might preclude one from doing that.

I have a low-key need to be able to express scrunch!-like structures as though they're blocks:
foo(bar(baz))
foo(bar, baz("bip"))
foo(bar, baz("bim", "bap"))

rgchris · February 11, 2021, 12:33am

My CSS brain does appreciate the possibilities here—it could be a way to express small dialected values:

red: rgb(255 0 0)
move: translate(-1x1)

Or indeed SQL, or Javascript, or other things.

My Rebol brain says, this is un-Rebol-like! I certainly don't think it should become an operative Rebol language construct.

BlackATTR · February 11, 2021, 1:14am

I think the HyperTalk (from HyperCard) family of languages, e.g., AppleScript, also uses this function form.

hostilefork · February 11, 2021, 3:02pm

I do not see any particular reason why thing(...) would be more or less legitimate to want than thing[...]

And I think that if people had syntax open to thing(...)(...) and thing[...][...] they would find uses for those too.

I can't think of a good API for dealing with this besides essentially the same thing as PATH! and TUPLE!: where since you have constraints on what you can form this way, and they "change types" when things are removed, they are immutable while in the scrunched form. Because if you could mutate them you'd have:

>> p: 'a(b)
== a(b)

>> take p
== (b)  ; looks like a GROUP!

>> take p
==  ; back to those same flaky vanishing issues
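A toy Python model of that immutability constraint might look like the following. The Scrunch class and its to_block method are invented for this sketch, which assumes a scrunch must always hold at least two elements, so any edit goes through a plain mutable list and gets re-validated on the way back.

```python
class Scrunch:
    """Toy immutable sequence needing at least two elements (hypothetical)."""

    def __init__(self, elements):
        items = tuple(elements)
        if len(items) < 2:
            raise ValueError("a scrunch needs at least two elements")
        self._items = items  # a tuple, so there is no in-place TAKE or APPEND

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def to_block(self):
        """Copy the elements out into an ordinary mutable list (a 'block')."""
        return list(self._items)


p = Scrunch(["a", "(b)"])

block = p.to_block()  # do the surgery on a block...
block.append("[c]")
q = Scrunch(block)    # ...then turn it back into a scrunch
print(len(q))         # 3

block = p.to_block()
block.pop(0)          # taking an element leaves just (b)
try:
    Scrunch(block)
except ValueError as error:
    print(error)      # a scrunch needs at least two elements
```

The failure on the way back is the point: the leftover (b) is just a GROUP!, so there is no sensible scrunch for a mutated value to become.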

It seems to me in line with Rebolism to say these are all different:

(a)(b c)(d)
(a) (b c) (d)

a+bc+d
a + bc + d

a(b c)d
a (b c) d

a[b c](d)
a [b c] (d)

I know there's a certain comfort zone about "the way things are", but I don't like a/(n) very much for picking the Nth thing out of a when compared with a[n]. When you think "hm, but it actually is a WORD! and a BLOCK!, still, just inside an invisible aggregator" then you can see it through the lens of not being "un-Rebol-like" but more of the same pattern that has been building all along.

Being able to write 100px and then say parse scrunch [value: integer!, 'px] seems like a gateway to some fairly powerful things expression-wise, and I think this leads pretty quickly to wanting to do things like compose '(num)px (you wouldn't need the quote if it were evaluatively inert, which maybe you're right and these things should be). But the point is that if we can lean on a general mechanic--especially one that's mostly already written--this might be a breakthrough worth a bit of shakeup... like having to put spaces between your blocks ordinarily.

rgchris · February 11, 2021, 6:34pm

I'd say primarily based on its history in functional notation.

IngoHohmann · February 11, 2021, 10:53pm

I'm not much in favour of this proposal, but you got me with 10px or 11USD or 5.05EUR.
This could be really nice in a lot of dialects.

Brett · February 13, 2021, 7:52am

Excuse my random comment here. But I've often wondered whether support (beyond parse) for user defined foreign syntaxes would be feasible, useful and not horribly painful. The idea coming from the number of syntaxes I've attempted to deconstruct using parse, manipulate, evaluate and sometimes re-form over my time with rebol. It's using rebol as the rosetta stone of playing with data/code stored in foreign syntax - not necessarily always validly (as in a parse tree), but perhaps usefully.

Perhaps I misunderstood and that's what you're saying with scrunch!.

The follow-on from that being whether custom evaluators are a possibility for the standard rebol types in a dialect, say as an attribute of a function (frame or block?), as a way of making dialects a bit more first class rather than a fringe interest, and perhaps leveraging more of the lego box parts of evaluation in custom dialect evaluation.

Not thought through, just random musings from someone who is perhaps out-of-date, and certainly not requests for functionality since these days I'm playing with photos rather than code.

BlackATTR · February 13, 2021, 4:35pm

I agree -- this is an interesting area of discussion. My view of Ren-C is that on one (basic) level, it's a convenient personal language for writing shell-scripts. And while I value Ren-C for that purpose, the shell-scripting field is picked fairly clean of computing ideas -- it's not exactly fertile ground for interesting/grassroots language upstarts.

For similar reasons languages like Ren-C don't elicit strong prospects for general-purpose programming. Anywhere you could plug-in a Ren-C or Red, you could insert Python, Ruby, Javascript, Go, Java, etc. and inherit the benefits of those ecosystems. You'd need pretty clear-headed justification to choose a Rebol, probably one where "I find coding in this paradigm to be satisfying" is an outsized factor in the decision.

"A fox knows many things, but a hedgehog knows one big thing."

Ren-C or Red are a bit like the family of soft-body invertebrates in the language kingdom. Their special traits don't necessarily shine in common computing domains, but... On a second level, I think Domain Specific Languages remain an open frontier largely uncracked by the hidebound languages that originally mapped the territory.

(Sidenote: Let's acknowledge that DSLs, dialects, mini-languages, etc. appear to come with detestable drawbacks, e.g., as described in this great XKCD panel, or like the self-limiting "angle of repose" of sand piles.)

XKCD-standards

If you've been following along, @hostilefork has been pursuing a design philosophy to preserve (reform and fix) Ren-C's friendly rebol traits while boosting its flex as a fun language toolkit-- "the minecraft of programming". If the retooled Ren-C parse + language toolkit gel, then custom evaluators might add yet another attractive feature. But first: Can you explain a bit more or provide an example of what you mean by "custom evaluators"? Are these written in Ren-C usermode, in C libs, external calls?

Anyhoo, I think Ren-C, due to its strengths, could stake a sizeable claim in the open range of DSL territory, i.e., where more depends on lone developer creativity (also: luck/serendipity) and language flexibility than, say, speed or efficiency. I think the history for Ren-C (e.g., in a WASM environment) may be written by developers who make a connection between human-friendly expression and the fun, experimental, accessible, yet sophisticated box of parts/language-tools that is Ren-C.

BlackATTR · February 13, 2021, 4:53pm

I have a related datatype! question.

If it were possible to transcode {whatever} and "whatever" such that they could be identified as different symbols (not asking for different properties, just the ability to distinguish between them as different symbols), that would open up another lane of lexical space for some dialects.

In other words, you could transcode {whatever} and be able to identify it as a (making up this name) LONG-TEXT!, which is a form in the family of TEXT! with the same properties as any other series.

Brett · February 14, 2021, 1:11am

We have custom evaluators today in the form of functions we write. To process a dialect we must use Parse or specially craft our "dialect" to accommodate Ren-c's evaluation process. Parse is good at recognition, terrible at building structures, it's aimed at step by step evaluation, which is fine but leaves the heavy lifting to the dialect writer to build context as required from scratch for every new dialect.

It would be good if user-written evaluators felt more first class. Operations like DO and REDUCE on a block are so nice. I'd like that niceness to extend to dialects; my intuition is that, by raising a usermode evaluator to plug into the system, operations such as DO and REDUCE could apply to a dialect and produce useful outcomes and sane errors. Maybe there would be an opportunity to "compile" them upon load for performance.

Writing an evaluator is hard as Hostilefork has so nicely documented, and he is striving to create orthogonal parts that can be stuck together by users for powerful expression. I'm wondering how that work can be leveraged even further for dialects.

Coming back to foreign syntax and your datatype question, if one has broken text into tokens with parse or whatever, I see a need to be able to distinguish/annotate them. Objects might be ok if they were performant and were not saddled with the terrible baggage of "make object" they have when saved out, but I suspect a more lightweight system would be better - more equivalent to rebol words - something with spellings and bindings. I guess I'm suggesting supporting multiple syntaxes that map onto ren-c datatypes and user datatypes where the default syntax is ren-c.

I think back to the C-lexicals of the build process. I feel these other elements could make that process so much better. We don't need to evaluate C as C, but being able to edit it or interpret it as symbols in linear or tree structures would be very useful.

All very intuitive and light on answers I know...But being able to talk pidgin with other languages might be a really useful trick.

BlackATTR · February 14, 2021, 1:35am

Thanks. I think concrete examples are helpful. Maybe a custom evaluators conversation can continue with @hostilefork in his Whitespace Interpreter forum topic.

hostilefork · February 14, 2021, 4:44am

Hey Brett, good to hear you're out there having fun somewhere. You're probably making a good choice avoiding code!

The fact that people are commenting on this proposal--and even beaming messages from retirement--is indicative of...something. Probably that Rebol notation simply leaves a lot to be desired.

Rebol's PARSE isn't the only game in town for people looking for alternatives to RegEx, and I've talked about how parser combinators in languages like Haskell offer quite a lot...with tons of infrastructure to assist in real world tasks.

So before trying to extend the syntax of Rebol's medium, I'd side with the idea of shoring up PARSE when you're not trying to use it to mix full-on representations from different languages in the same source file. Until that really demonstrates "bringing the magic" reliably, then getting entangled in syntax extensions for LOAD just adds other layers of complication.

@BlackATTR brings up the Whitespace Dialect and I think that really hitting home-runs on problems like that is a prerequisite to getting too much further out in ambition.

I think SCRUNCH! might be able to be a relatively low-cost way to get a little more satisfaction for people. Little things like that and COMMA! may help get closer to a sweet spot where it feels "flexible enough".

The question of how to make the evaluator "hookable" is definitely on my mind. One big step of this has been trying to make FRAME! a very reusable part...and to unify services across PARSE and plain DO for things like single stepping and debug stacks.

And yes, all of this is hard...as my notes show. It's especially hard when it's all on the backdrop of a weird dependency game where you're writing it all yourself in C89.

Danny · February 17, 2021, 11:13pm

I've been giving more thought to the DOM format in Rebol, and I think it needs to present a clear, concise demonstration of how it would be used as an object model for DSLs. While improving on or extending the choices for formatting DSL data can be easy, feeding it a subset of a complete DSL may not be. It takes a lot of thought to create the DSLs, and some would rightfully say that a subset of, say, HTML isn't good enough. The HTML standard is huge. Even so, I believe it really can all be done in Rebol. Instead of HTML only, you would have a DSL choice: say make doc, markdown, or default DOM code. So Rebol-DOM.r is very simple. But to make it better I would need others' coding experience. You mentioned tests. Which types of testing would be best for now?

Danny · October 6, 2021, 6:12pm

Old post, but new-old thoughts.

For a name other than "Scrunch" I've decided in relationship to a Dialect Object Model it should simply be an element-node of type Sequence.

Used as a variable set-word, the array-obj!() uses the sequence as a set-builder notation.

Its base form is serialized data that's represented in Rebol as molded data.

Its grouping order is {[()]}, but that is not necessarily followed in dialects. And in Rebol there is no difference in using "{}" or "[]", unless you're sending, receiving, or loading data. It must be serialized.

I chose the "{}" because it allows you to manipulate dialect data in block form while not allowed to be of type! block.

These, I believe, are some of the differences between a Rebol Series and a Sequence.

Rebol sequence type is its molded data form. But it should have been a native type from the beginning.

The %Rebol-Dom.r set-builder notation, the Var., like Uparse, allows me to set, search, parse, save, serialize, and share data, maybe slower than BLOCKS, but as it was intended to be used: any way I like. I wish I had your know-how, Hostilefork, to do this right, but I just don't have the time. Oh well, Rebol-Dom. Sorry 'bout that, I meant Rebol On.

hostilefork · October 24, 2021, 2:13am

This SCRUNCH! question is a big uncertainty, with many cross-cutting implications. It's paralyzing to have such huge uncertainties. They add up and make progress impossible.

So I Think We Should Probably Axe The Idea

One might say another option would be to try and leave the door open, by forcing people to space out their blocks, instead of letting them be tight as in Plan -4:

foo: func [
    return: [integer!]
    arg [integer!]
][  ; <-- this would be illegal
    return arg + 1
]

That would mean someday there could be things like a SET-SCRUNCH! to denote matrix access and look like other languages:

matrix[row + 1][column + 1]: 10

However...

Differentiating `[...][...]` and `[...] [...]` Sucks

These gaps make things ugly to me that were previously pretty. Barring what new things may be enabled, if I don't like the way the old code has to change... that's a loss.

Generic TUPLE! and Comma Offer Decent Expressivity

Maybe you can't write:

matrix[row + 1][column + 1]: 10

But you can write:

matrix.(row + 1).(column + 1): 10

matrix.[row + 1, column + 1]: 10

With units, 14.px is available. Doesn't cover fractional units but it's there, and if you want fractional I guess you could use paths. 14.5/px

So there are a non-zero number of options for these things, and really the focus should be on spaced out readable English-like dialects.

If anything we should be trying to find ways to make it easier to interpret size: 14 px despite the fact that the thing on the right is an INTEGER! and a WORD! instead of being forced to squish it together. It looks better that way than as size: 14px

String Interpolation May Help Some Cases

It's still under development, but we may be able to have something like {matrix[row][col]} be able to know what matrix, row, and col should be applied to... at least as well as it would had you passed the block [matrix row col] instead.

So if you were up against a wall representationally for a mini-dialect, you might be able to use strings to achieve your goal. We'll see.

And As I Say, Too Many Unknowns

There are just too many unknowns in this. There'll be no project if things aren't cut back. The focus needs to be on clear wins like UPARSE.

hostilefork · April 16, 2022, 11:04pm

A post was merged into an existing topic: Danny, dRebol, Inetw3 (Daniel Murrill)