Expression Barriers Redux: PARSE 2020 Edition

Because of Rebol's run-on-sentence aesthetic, I've always thought it needed a way to punctuate that is visually lighter than grouping and that, by not using groups, doesn't interfere with COMPOSE.

Some people would be content to just use spaces:

any [even? x  match [block! group!] y  foo baz bar]

The downside is that it doesn't have any "teeth", e.g. y might take foo as an argument. It's also lost in any MOLD operation, as the specifics of spacing are not preserved.

The vertical bar has been in place for a while, which I liked for its barrier-ness and even-looking weight:

any [even? x | match [block! group!] y | foo baz bar]

But PARSE was immovably attached to its usage for alternates. Looking at PARSE lately and trying to make sense of it, I really feel like it needs this just as much as ordinary code.

Period was another option that was looked at:

any [even? x . match [block! group!] y . foo baz bar]

And by now everything has been considered...backslashes:

any [even? x \ match [block! group!] y \ foo baz bar]

Two periods:

any [even? x .. match [block! group!] y .. foo baz bar]

Having BLANK! itself act as a barrier as opposed to something that can be taken directly as an argument to a function:

any [even? x _ match [block! group!] y _ foo baz bar]

Strange ideas like a COMMA! type that renders glued to the thing to its left, even though it's an independent element:

any [even? x, match [block! group!] y, foo baz bar]

In a parallel universe where semicolons weren't taken for comments, perhaps they'd be considered:

any [even? x; match [block! group!] y; foo baz bar]

...although that looks a bit too much like SET-WORD!s, there. Which is yet another argument for why abc; is now rightfully illegal as a WORD!.

Should We Lower the BAR?

There's a lot of mechanism to support barriers and invisibility, and it's something you can use generically...so the facilities that started out as a distinct "BAR!" datatype have grown into a whole lot of power that you can apply to any function.

But I have to admit I don't feel as attached to the vertical bar for separating expressions as I did some years ago.

  • Can't use it in PARSE, because | is pretty much set as the rule alternate.
  • It seems too heavy for what it's doing.
  • Might be useful for a pipe operator or other ideas in plain dialecting (noticed "Arturo" used it for that)

While I might have rejected a simple dot as "too slight" in the past, I may now feel that the subtlety is favorable. And the COMMA! idea is actually looking pretty good--if you just imagine that being a datatype that behaves exactly as expression barriers do today:

 >> length of [1,]
 == 2

 >> type of second [1,]
 == @[comma]

 >> 1 + 2,
 == 3

 >> 1 +, 2
 ** Script Error: comma encountered while fulfilling argument
 ** Near: [1 +, ~~ 2]
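To make that barrier behavior concrete, here is a minimal Python sketch of a toy evaluator. It is only a model and not how the real evaluator is implemented: the names COMMA, BarrierError, FUNCS, eval_step and eval_all are all hypothetical, and it uses a prefix `add` rather than infix `+` to stay short. What it tries to show is that a comma between complete expressions is skipped, while a comma hit during argument gathering is an error.

```python
COMMA = object()  # hypothetical stand-in for a COMMA! value


class BarrierError(Exception):
    """Raised when a function cannot gather all of its arguments."""


# name -> (arity, implementation); a toy function table (assumption)
FUNCS = {
    "add": (2, lambda a, b: a + b),
    "negate": (1, lambda a: -a),
}


def eval_step(tokens, pos):
    """Evaluate one expression starting at pos; return (value, next_pos)."""
    if pos >= len(tokens):
        raise BarrierError("end of input while fulfilling argument")
    tok = tokens[pos]
    if tok is COMMA:
        # The barrier has "teeth": nothing may reach across it for an argument.
        raise BarrierError("comma encountered while fulfilling argument")
    if isinstance(tok, str) and tok in FUNCS:
        arity, fn = FUNCS[tok]
        pos += 1
        args = []
        for _ in range(arity):
            value, pos = eval_step(tokens, pos)
            args.append(value)
        return fn(*args), pos
    return tok, pos + 1  # plain values evaluate to themselves


def eval_all(tokens):
    """Evaluate every expression; commas between expressions are skipped."""
    result = None
    pos = 0
    while pos < len(tokens):
        if tokens[pos] is COMMA:
            pos += 1
            continue
        result, pos = eval_step(tokens, pos)
    return result


print(eval_all(["add", 1, 2, COMMA]))  # analogous to `1 + 2,`
```

In this model `eval_all(["add", 1, COMMA, 2])` raises BarrierError, which is the analogue of the `1 +, 2` error.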

You Could Still Make BAR! or Whatever You Want

The current mechanisms would work, and I think I'm going to make sure they work for . too.

(I'll need to figure out how to let you make any UTF-8 symbol you want act like an expression barrier in PARSE. But let's say I figure out how.)

Is COMMA! worth a shot? Does period look better or worse to your tastes?

It's important to bear in mind that the idea now on the table is that a terminal period would mark an access you want to say isn't a function call. This could risk creating things like:

any [even? x., match [block! group!] y., foo baz bar]

Compared to letting lone periods act as barriers:

any [even? x. . match [block! group!] y. . foo baz bar]

Compared to the status quo of today's barriers:

any [even? x. | match [block! group!] y. | foo baz bar]

Compared to selective grouping:

any [(even? x.) (match [block! group!] y.) foo baz bar]

(Note that in PARSE, the "grouping" is actually "blocking"...which looks even heavier)

Oddly enough, I don't think the commas do all that badly even in this pathological case. And as I show with the groups, it's not like you have to use the comma if you don't feel the code "looks right".

Anyway--remember that the goal is that you can twist up the language even for the span of a function. So if you decide you want | to be a barrier while that function runs, but not for the rest of your program, you can do that!


I don't like the dots/periods; they remind me of regular expressions. Of the options, I prefer the comma and then our standard pipe character.


How about comma with space on either side?

any [even? x. , match [block! group!] y. , foo baz bar]

One wacky idea that could be possible would be to say that commas never start newlines, but instead use their newline bit to say whether there was a space before them. Then they become up-to-one-space preserving.

So if you wrote [x, y] then that's what you'd get back, and if you wrote [x     , y] then you'd get [x , y]

Which would break pretty much every "do this thing until you hit a newline" tool. So it's probably not a great idea.

But commas have a full cell worth of bits. They could have some formatting data.
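As a rough illustration of the "up-to-one-space preserving" idea, here is a small Python sketch. Everything in it is an assumption made for the example: the Item class, the space_before flag (standing in for the reused newline bit), and the scan and mold helpers are hypothetical and only handle whitespace-separated words and commas inside a block.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    # For a comma, this models the reused bit: was there whitespace before it?
    space_before: bool = False


def scan(source):
    """Split source into words and commas, noting whitespace before commas."""
    items = []
    word = ""
    saw_space = False
    for ch in source:
        if ch == ",":
            if word:
                items.append(Item(word))
                word = ""
            items.append(Item(",", space_before=saw_space))
            saw_space = False
        elif ch.isspace():
            if word:
                items.append(Item(word))
                word = ""
            saw_space = True
        else:
            word += ch
            saw_space = False
    if word:
        items.append(Item(word))
    return items


def mold(items):
    """Render items; a comma is glued to its left unless its flag is set."""
    out = ""
    for item in items:
        is_comma = item.text == ","
        if out and (not is_comma or item.space_before):
            out += " "
        out += item.text
    return out


print(mold(scan("x, y")))       # x, y
print(mold(scan("x     , y")))  # x , y
```

Any run of whitespace before the comma collapses to a single space, which is the "up to one" part.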

I'll point out this is slightly redundant as a protection against functions, because comma alone protects against argument-taking functions.

any [even? x, match [block! group!] y, foo baz bar]

If x is x: does [print "formatting hard drive"], that would still run. But as I tried to express in "From Liability to Asset: WORD! and PATH! always running code", that may be the rightful common case for the language model.

Sounds like we could use some experience with COMMA! to know the realities of how it feels.

Unfortunately, until how typesets work is reformed...we are out of datatypes and I'm a bit stuck on it. We could recover one if PAIR! and TUPLE! unified...the blocker there is that I'm not so sure how I feel about path picking of X and Y working generically to mean "first item" or "second item" of tuples and paths. It makes me feel like there's something missing, like a light alias over fundamental types that can skin them (like an enum class in C++ giving typechecking-unique integers without costing more than an integer per instance). Still some thinking to do.


We can have only a very limited number of datatypes?

The number "64" has some relevance to how cells are designed, in terms of fundamental types. However, one of these fundamental types is "custom/extension type", which indicates that part of the cell is sacrificed for a pointer to information about the extension type.

More explanation here, or you can review what I talk about in terms of the cell format in the video

The issue is not so much not having the types, but not having a notion of typeset which covers extension types.

What we want to avoid is answers like "represent typesets as a BLOCK! of DATATYPE! values"...because once you have an infinite number of types, how do you represent ANY-TYPE! ?

This was just something completely missing from R3-Alpha, so there's no design for it. The best idea I've had is to shift from thinking of type sets as data and instead see them as type constraint functions.
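Here is a loose Python sketch of the contrast between typesets as data and typesets as constraint functions. It is an illustration under stated assumptions, not a design: the list FUNDAMENTAL_TYPES is a tiny hypothetical stand-in for the limited table of fundamental types, the bitset encoding is assumed for the "data" form, "vector" is a made-up extension type name, and all function names are invented.

```python
# Hypothetical, tiny stand-in for the limited table of fundamental types.
FUNDAMENTAL_TYPES = ["integer", "block", "group", "comma"]


def make_bitset(*names):
    """Typeset as data: one bit per fundamental type (assumed encoding)."""
    bits = 0
    for name in names:
        bits |= 1 << FUNDAMENTAL_TYPES.index(name)
    return bits


def bitset_accepts(bits, type_name):
    """Extension types have no bit, so a bitset can never include them."""
    if type_name not in FUNDAMENTAL_TYPES:
        return False
    return bool((bits >> FUNDAMENTAL_TYPES.index(type_name)) & 1)


# Typeset as a constraint function: any callable from type name to bool.
def any_type(type_name):
    return True  # covers types that did not exist when this was written


def any_array(type_name):
    return type_name in ("block", "group")


def check_arg(constraint, type_name):
    """Toy type check of the kind done when passing a value to a function."""
    if not constraint(type_name):
        raise TypeError(f"argument of type {type_name} not accepted")
    return type_name


every_bit = make_bitset(*FUNDAMENTAL_TYPES)
print(bitset_accepts(every_bit, "vector"))  # False
print(any_type("vector"))                   # True
```

Even a bitset with every bit set cannot stand for ANY-TYPE! once extension types exist, while the predicate form has no such ceiling.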

Anyway, I talk about all that as well:

The Typeset Representation Problem

So the upshot is:

  • It will always be the case that some types are cheaper than others, and the cheap types are limited. This isn't to say the non-cheap types are stupidly expensive, just that they have various aspects that make them cost a bit more than the fundamental types. With things like COMMA! I have the feeling it should probably be one of the cheaper types, so positioning it is complex in terms of bumping the other limited things around.

  • Regardless of that, there's no implementation of generalized typesets or type constraints yet...because no grand design for them has been handed down from the heavens yet. So making the non-cheap type form means having to hack together some kind of solution for type checking when you pass that type to a function, and I'd rather not write those hacks before figuring out what the real answer is.


So it's turning out I'm really digging COMMA!, and it looks and feels more natural than I thought it would.

I've really come to think the non-spacing is what makes comma turn out so appealingly. So most of the time I definitely don't want the space. If it comes right after a period, hmmm

Right now I have it so that spacing is used when they are sequential (e.g. you can build a BLOCK! of three commas, so you need to be able to render them in a way that would LOAD, and I'm saying ,,, is illegal in the scanner):

[, , ,]
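The rendering rule described here can be modeled in a few lines of Python. This is only a sketch of the rule as stated in this post, not the actual MOLD code: mold_block is a hypothetical helper and items are plain strings, with "," standing in for a COMMA! value.

```python
COMMA = ","  # stand-in for a COMMA! value in this toy model


def mold_block(items):
    """Render a block; glue a comma to its left neighbor unless that is a comma."""
    out = "["
    prev = None
    for item in items:
        if prev is not None:
            glued = item == COMMA and prev != COMMA
            if not glued:
                out += " "
        out += item
        prev = item
    return out + "]"


print(mold_block([COMMA, COMMA, COMMA]))  # [, , ,]
print(mold_block(["x", COMMA, "y"]))      # [x, y]
```

Because a comma is never glued to another comma, the output never contains the ,,, sequence that the scanner would reject.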

Should we go into automatic space mode after a period, and make it illegal in the scanner to have a period and comma sequentially? Hrrrm. Is it really that bad if it doesn't?

any [even? x., match [block! group!] y., foo baz bar]

I feel like preferences would definitely be split on whether that warrants the space. I'm leaning toward saying it doesn't, and if it's hard to see, use a font that makes them more clearly different. :-/

Just how I'm looking at it right now.
