The Language World's Weirdest COMMA! Mechanic

All right, I went and wrote it:

Though I sound a bit like a broken record at this point: please add tests as you think of them.

%comma.test.reb

Due to having to juggle about a million design decisions at once--each with deep and difficult ramifications--I kind of have to draw the line once I feel I've gotten past any obvious disproofs of an idea...so the tests only go so far in the initial commit. Unfortunately, that's about as far as they tend to stay until bugs are found.

>> all [1 + 2, 3 + 4]
== 7

>> all [1 +, 2 3 + 4]
** Script Error: + is missing its value2 argument

>> all [(1 +) 2 3 + 4]  ; error parity
** Script Error: + is missing its value2 argument

I also added the feature to UPARSE, and made a patch of the feature into PARSE3 (the way PARSE3 is written makes it difficult to do a "good" job of this):

>> parse "aaabbb" [some "a", some "b"]
== "aaabbb"

>> parse "aaabbb" [some, "a" some "b"]
** Script Error: expression barrier hit while fulfilling argument

Beyond the expression barrier feature--which I have always believed to be important--having comma available in dialects is powerful.

Note that I left in support for 1,1 being a synonym for 1.1, based on the idea of space significance. This means you'll have less ability to copy/paste data from other contexts directly, like (1,2,3) ... but it's not clear how much gain there would be from allowing that when so many other things won't work (especially in plan -4, where foo(a, b c) isn't loadable either).
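
To make the space-significance rule concrete, here's a quick sketch of what the scanner described above should produce:

>> length of [1,1]
== 1  ; a single DECIMAL!, since no space follows the comma

>> length of [1, 1]
== 3  ; INTEGER!, then a COMMA! element, then INTEGER!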

At least for the moment, this breaks the idea of supporting commas in URL!s directly. While it might be possible to say that http://a,b is legal but http://a, is not, how such things are scanned needs to be redesigned so there's not so much sporadic code all over the place. I'm open to the idea...just not assuming it as a foregone conclusion.

I Actually Like It

Many years ago, when expression barriers were first being cooked up, I had been mostly swayed by the "commas and periods are too hard to tell apart" idea...swayed enough to say that the only purpose they should have in the language would be as synonyms for each other.

If this idea is taken to extremes, we would also say that the number 1 and lowercase L and the uppercase i and the vertical bar all have to be the same, due to I1l| being too hard to differentiate. (Though font and syntax highlighting choices can go a long way with that.)

Today my thinking is that you don't necessarily control this kind of thing from the language level. You give people choices, and they write their code as it feels good to them. If they've got a certain mix of data and don't like comma with it, they should use another arrangement. Put things in groups, or on newlines, or whatever.

I'll point out you have the option to put an entire expression in a group, or just the last thing and keep the comma. It's up to you. So taking the maybe-ugly combo of blank-tailed TUPLE! and comma, you could go with your taste:

1 + a., 2 + 3
1 + (a.), 2 + 3
(1 + a.) 2 + 3
(1 + a.) (2 + 3)
[
    1 + a.
    2 + 3
]

It's something people can ignore entirely if they want. Don't use it if you don't like it!

2 Likes

Wow, I'm in love with COMMA! in parse expressions. :heartpulse:

1 Like

I had similar thoughts way back:

In most areas, the comma has been a reserved character in REBOL - set aside for some mysterious potential future use. My suggestion for how to utilise the comma is based on usage in other languages - some similar in appearance to REBOL (like CSS), some not (Ruby, JavaScript): Introduce a comma! datatype. It'd be an any-word! value and could be most useful in dialect creation:

1, word, here => [integer! comma! word! comma! word!]

In Ruby, commas are used to separate values in Arrays and pairs in Hashes, but are also used to bind both types where the bounding braces are omitted, enabling Ruby 'dialects':

read :url => 'some url', :method => 'get'

In REBOL dialects, the implied block could be a similarly employed technique to aid readability.

pattern red, 0.0.255, green 1x2, 5, spaced

In CSS, commas are used to bind and separate:

font: bold italic 1.2em/1.4 'Georgia','Times New Roman',serif;
background: rgba(255,0,0,0.4) url('top image') repeat-x center top, url('bottom image') repeat-x center bottom;

Such freewheeling, inconsistent use of commas can still be intuitive: not through hard rules about their syntactic role, but by treating them as values in their own right, with their own semantic presence.

pair 1.5 3, 7 pi, 1.7 infinity

I had originally wanted to copy the 'implied block' model of CSS and Ruby, but am now more or less on the same page as your proposal here.

1 Like

This may be a stretch, but I wonder if there's an argument here for copying a Markdown convention as an alternative way to express URLs: allowing <> as delimiters. It would only apply to URLs that contain :// so as to distinguish them from tags with namespaces.

One thought I had was that the COMMA! might represent a pause instead of a barrier when it came to enfix. This could make it a precedence manipulator:

>> add 1 2 * 3
== 7

>> add 1 2, * 3
== 9  ; as if you'd written `(add 1 2) * 3`

The current thought is it acts more like you've written (add 1 2) (* 3).

We probably want such an operator, but I don't think comma is it. The barrier behavior seems more reasonable and uniform, and lets you throw commas into lists of generated expressions and know that you're keeping them separate. (well, modulo hard quoting)
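
For instance, since a COMMA! is just another element of the array, code you generate keeps its separations. A small sketch using plain COMPOSE (which fills in the groups and leaves the commas alone):

>> compose [add 1 (1 + 1), negate (2 + 3)]
== [add 1 2, negate 5]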

2 Likes

Above I mentioned the idea of "error parity" between a comma and reaching the end of a group.

Today we get this equivalence due to a number of complex flags and conditions, in particular BARRIER_HIT:

//=//// FEED_FLAG_BARRIER_HIT /////////////////////////////////////////////=//
//
// Evaluation of arguments can wind up seeing a comma and "consuming" it.
// But the evaluation will advance the frame.  So if a function has more than
// one argument it has to remember that one of its arguments saw a "barrier",
// otherwise it would receive an end signal on an earlier argument yet then
// get a later argument fulfilled.
//
#define FEED_FLAG_BARRIER_HIT \
    FLAG_LEFT_BIT(3)

So what's going on is:

  • Explicit commas (as well as specially marked functions) put the evaluator into a BARRIER_HIT state.

  • While in the barrier state, subsequent evaluations trying to gather arguments for the same function are rigged to act the same as end-of-block.

    • If a function's argument doesn't tolerate <end>, then this will give an error.

    • If the argument is tolerant of <end>, then the function can still be called (see the sketch below).

  • Starting the next fresh evaluation step clears the BARRIER_HIT state.
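
As a rough sketch of the <end>-tolerant case (TOLERANT is a made-up function, and the assumption here is that an <end>-able argument simply shows up as null in the body):

tolerant: func [arg [<end> integer!]] [
    either null? :arg ["no arg"] [arg]  ; runs fine even when cut off by a comma
]

>> all [tolerant, tolerant 5]
== 5  ; the first call saw the barrier as <end>, unlike + in the earlier error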

An Early Weakness Noted: Literal Arguments

I quickly spotted the case of taking the next argument literally (remember that the x evaluates to just the word x):

the, 1 + 2
; vs
(the) 1 + 2

I wasn't sure whether the value of protecting people from themselves here outweighs the cost of making COMMA! the one datatype you could not receive as a quoted argument. If it were to be prohibited, we could ask where the right place for the prohibition is:

  1. In the evaluator guts, by simulating an <end> signal for this case

  2. Inside of THE, testing for if comma? arg [...] in the implementation

  3. In the type checking, via the: native [arg [(non-comma? element?)]]

Today we could choose (1), which is what happens for evaluative parameters. However we don't do this for quoted parameters...so today it is allowed:

; today's behavior
>> the,
== ,

Or it could be an error unless you say THE/COMMA, which would force use of (2).
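
As a rough usermode sketch of what option (2) amounts to (THE* is a hypothetical stand-in, since the real THE is a native, and the quoting-parameter notation is glossed over here):

the*: func ['arg [element?]] [
    if comma? arg [  ; option (2): the prohibition lives in the implementation
        fail "COMMA! can't be taken literally (hypothetical THE*/COMMA needed)"
    ]
    arg
]

A /COMMA refinement like the one suggested would simply bypass that check.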

But... Would Simpler Be Better?

The existing designs predate NIHIL (an isotopic empty pack). One major aspect of nihil is that almost no functions will accept it as a parameter.

So we might ask how different COMMA! needs to be from reaching a nihil state some other way, e.g. via a COMMENT. Would it be sensible for these to have equivalent mechanics:

>> all [1 +, "hi" 2 3 + 4]
** Script Error: + is missing its value2 argument

>> all [1 + comment "hi" 2 3 + 4]
** Script Error: + is missing its value2 argument

e.g. How bad would it be if the BARRIER_HIT mechanics went away, and we simply leveraged the idea that commas evaluated to nihil...and most functions refuse nihil arguments?

Downside of Simplicity: Behavior in PACK!

I decided to test this idea of making COMMA! evaluate to NIHIL just like a comment would. But I found trouble, because I had code that was doing something like this:

[a b]: pack [1 + 2, 10 + 20]

It didn't work, because PACK was built on top of REDUCE-EACH with a ^META variable. REDUCE-EACH performed three evaluation steps on the right hand side... the second evaluated the comma and got back a nihil (empty pack):

>> meta pack [1 + 2, 10 + 20]
== ~['3 ~[]~ '30]~

The concept here is that if you use REDUCE-EACH with a ^META variable, you have to handle everything--that includes packs and errors. This needs to be legal in order to do things like multi-returns with unstable isotopes (this is integral to UPARSE, for instance).

So this means we definitely want this behavior:

>> meta pack [1 + 2 comment "hi" 10 + 20]
== ~['3 ~[]~ '30]~

This suggests either that you can't use commas in PACK, -or- that PACK needs to be proactive about skipping over the commas at source level. So long as PACK is based on REDUCE-EACH, then that suggests REDUCE-EACH needs to be able to skip commas...because you wouldn't be able to distinguish the cases based on the NIHIL evaluation product alone.

Something to notice about that idea is that if it's literally looking for commas, that means you can't make your own comma-like construct that acts like a barrier.

Another Wrinkle: SET-BLOCK! with Comma on Right

If you were to write something like this, it wouldn't give an error:

>> [/a /b]: ,
== ~null~  ; isotope

>> a
== ~null~  ; isotope

>> b
== ~null~  ; isotope

This is because the slashes indicate the results are optional, e.g. a shorter pack is accepted. If COMMA!'s stopping power in the main evaluator comes only from the idea that it evaluates to an empty pack, it won't complain at the lack of a meaningful expression to its right.

Things like META which can accept empty packs would also not trigger an error:

>> meta,
== ~[]~

These don't offhand seem that bad, and maybe could even be seen as good if you look at it from a certain point of view. But it does show that the "stopping power" of commas isn't bulletproof.

What About THEN/ELSE/etc.?

THEN and ELSE are enfix and treat their left hand side evaluatively:

(1 + 2, then [print "what happens here?"])

This would wind up acting the same as:

(1 + 2 comment "hi" then [print "what happens here?"])

It needs to be an error...and it currently is. The error arises because THEN and ELSE refuse to operate on nihil. But at the moment this is a distinct case from not having a left hand argument at all.

(then [print "what happens here?"])

Today, there are mechanics that make the left hand side look like an <end> condition...which falls under the complexity of BARRIER_HIT.

Alternative: Evaluator Skips Over COMMA! When Possible

This would mean if you wrote something like:

>> do/next [, , , 1 + 2, , 10 + 20] 'pos
== 3

>> pos
== [, , 10 + 20]  ; or possibly just [10 + 20] if it skipped trailing commas

I think this is how things worked long ago before the BARRIER_HIT flag was introduced. The concept was that a literal barrier (was |, now ,) would be greedily consumed in interstitial evaluations, but raise errors otherwise.

This way, a COMMA! could just stay un-consumed by the evaluator. Function calls gathering their arguments would look ahead and ask "hey, is there either an end of block or a COMMA! here?"...and if so, not run an evaluation, reporting an <end> condition instead. This could be reported for arbitrarily many arguments...and so long as they were endable, you would receive that condition. In other words: what the BARRIER_HIT flag conveys would instead be conveyed merely by a lingering comma that sticks around.

This feels very regressive, because every stepwise evaluating function inherits this complexity. The nice consequence of saying that COMMA! just evaluates to NIHIL is that it triggers the same handling you would use for COMMENT-like functions.

On Balance, I Think the BARRIER_HIT Flag Has To Die

I'm bullish on COMMA! as a great addition to the language. But the various hoops that are jumped through to try and make it mimic the end of a block seem like a bridge too far.

To me, having commas vaporize is neat tech... and the idea is that if you handle things like COMMENT and ELIDE you get the comma handling for free. This seems quite elegant to me.
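
For example, any construct that already tolerates invisible results (the way ALL skips over an ELIDE) would get comma handling with no extra code:

>> all [1 + 2, elide print "Hi!", 3 + 4]
Hi!
== 7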

Maybe functions like REDUCE-EACH need a refinement that lets you detect commas differently:

>> reduce-each x [1 + 2, (elide print "Hi!") 10 + 20] [probe x]
3
Hi!
30
== 30

>> reduce-each ^x [1 + 2, (elide print "Hi!") 10 + 20] [probe x]
'3
Hi!  ; skipped over comma, by default
~[]~
'30
== '30

>> reduce-each/comma ^x [1 + 2, (elide print "Hi!") 10 + 20] [probe x]
'3
~,~  ; isotope
Hi!
~[]~
'30
== '30

Here I fancifully suggest giving back an isotopic comma to draw attention to it. Since all other values come back meta'd, this doesn't conflate with any "real" evaluative result. e.g. [~,~, ~,~] could distinguish "real isotopic commas" from source-level commas that REDUCE-EACH is offering to tell you about.

That requires commas to be stable isotopes. But one could also not worry about the conflation, and reduce to an unstable isotope:

>> ,
== ~,~  ; isotope

Then instead of directly testing for NIHIL?, most people could test for NOTHING? or VAPOR?, which would mean either nihil or isotopic comma. :-/

Some crazy ideas here, but I think on balance that trying to make commas "bulletproof" leads to more confusion and problems than just making them do whatever comments do, with meta-REDUCE-EACH skipping them by default.

This Means Killing Off User-Defined Expression Barriers

If we pare things down such that NIHIL results are "mean" enough to stop evaluations for all practical purposes, then the special flag that lets functions trigger BARRIER_HIT can be dropped too. It was made available via the TWEAK function:

//=//// DETAILS_FLAG_IS_BARRIER ///////////////////////////////////////////=//
//
// Special action property set with TWEAK.  Used by |
//
// The "expression barrier" was once a built-in type (BAR!) in order to get
// a property not possible to achieve with functions...that it would error
// if it was used during FULFILL_ARG and would be transparent in evaluation.
//
// Transparency was eventually generalized as "invisibility".  But attempts
// to intuit the barrier-ness from another property (e.g. "enfix but no args")
// were confusing.  It seems an orthogonal feature in its own right, so it
// was added to the TWEAK list pending a notation in function specs.
//
#define DETAILS_FLAG_IS_BARRIER \
    SERIES_FLAG_25

I don't feel any particular qualm about losing this feature, because I've never really used it.

And under the new concept, you get "barriery-enough" just by evaluating to NIHIL. You're no better or worse than COMMA!

(Actually that's not quite true, because currently COMMA! doesn't do lookahead, so it can't serve as the left hand side of an enfix function that doesn't quote the left hand side. But WORD! does do lookahead. If we wanted to go for full parity, we'd allow the lookahead for evaluative commas...but making comma worse just to give it parity with a WORD! doesn't seem too smart. If we decided that COMMA! was actually a WORD! that was a synonym for NIHIL and just rendered funny, then it might make sense... but that would wreck other things--like the PACK exception for commas.)

1 Like

So the Bad News is, I Don't Think This Holds Water...

If you have functions that can take a varying number of arguments, and (foo 10) means something different and legitimately distinct from (foo)...

... then you probably want (foo, foo) to act equivalently to ((foo) (foo)), and not give an error.

And if you try to accomplish that by saying COMMA! vaporizes as NIHIL, but then establish that anything evaluating to NIHIL acts as an expression barrier, it's bad. Worse now in a world where PRINT returns NIHIL...

Why worse? Because it means foo print "Hi" will act like (foo) print "Hi". That's not intuitive if you think of foo as a function that generally takes arguments. PRINT doesn't seem like something that should have a magic "stopping power" that keeps things on its left (or right) from consuming arguments.

So do I have to bring back those clunky and non-configurable internal BARRIER_HIT mechanics? NO...because....

The Good News is, A New Isotope Can Solve It! :atom_symbol:

We essentially just need a way for constructs to generate an isotope which disappears in interstitials like an empty pack (nihil)... but represents the additional nuance of expression-barrier-ness by simulating <end> and curtailing lookahead.

COMMA! Isotopes Are Up For The Job!

If we let commas evaluate to comma isotopes and give them this behavior, then anyone who wants to mimic it can simply have their function return an isotopic comma...and they'll act like a barrier too. So let's call a comma isotope a BARRIER.
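
For instance, a do-nothing function could opt into barrier behavior just by evaluating to the comma isotope. A sketch, assuming a quasiform ~,~ in a body evaluates to the isotopic comma (and using LAMBDA so the result passes through without extra return-type constraints):

my-barrier: lambda [] [~,~]

>> all [1 + 2 my-barrier 3 + 4]
== 7  ; MY-BARRIER vaporized between expressions, just like a comma

In an argument-fulfillment position it would be expected to stop the gathering, the same way the literal comma does at the top of the thread.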

It may sound like a headache to check for another type. But a test which checks for either NIHIL? or BARRIER? is easy enough to make if you don't care about the difference. I chose to call that ELISION?, because trying to find another word that means nothing (e.g. well, NOTHING?) would throw in a fair bit of confusion over whether it subsumed VOIDs or NONEs.

It Gets Rid Of This Weird Workaround

I mentioned how PACK was built on REDUCE-EACH and was having trouble putting nihils in packs with commas. That problem goes away now, so this is no longer necessary:

; REDUCE-EACH is the basis of functions like PACK, and hence by default it
; wants to skip over commas.  However, there's an option which will give you
; the isotopic word ~comma~ when it gets a barrier.  This will be conflated
; with evaluations that produce ~comma~ unless ^META variable used.
[
    (['3 '30 ~comma~] = collect [
        reduce-each x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])
    (['3 ~comma~ '30 ~comma~] = collect [
        reduce-each/commas x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])

    ([''3 ''30 '~comma~] = collect [
        reduce-each ^x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])
    ([''3 ~comma~ ''30 '~comma~] = collect [
        reduce-each/commas ^x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])
]

But Is This Too Many Moving Parts? :face_with_spiral_eyes:

I'm pretty sure it isn't.

The reason I say this is simply because flags like FEED_FLAG_BARRIER_HIT were instituted a long, long time ago. They were products of necessity, because I've always felt that Rebol without some kind of expression barrier concept was downright unreadable.

But my attempts to implement this have always been a thorn, in terms of hidden state. I had to imagine strange places to represent the barrier-ness, like "what if that's what being enfix but taking no arguments meant".

This brings it all out in the open. Some routines will hide it, and others won't. It's already showing much more promise than what it was replacing.

1 Like