COMMA! and GROUP! Error Parity, e.g. (1 +) and 1,

hostilefork · July 24, 2023, 4:47am

I've mentioned the idea of "error parity" between comma and reaching the end of a group:

Today we get this equivalence due to a number of complex flags and conditions, in particular BARRIER_HIT:

//=//// FEED_FLAG_BARRIER_HIT /////////////////////////////////////////////=//
//
// Evaluation of arguments can wind up seeing a comma and "consuming" it.
// But the evaluation will advance the frame.  So if a function has more than
// one argument it has to remember that one of its arguments saw a "barrier",
// otherwise it would receive an end signal on an earlier argument yet then
// get a later argument fulfilled.
//
#define FEED_FLAG_BARRIER_HIT \
    FLAG_LEFT_BIT(3)

So what's going on is:

Explicit commas (as well as specially marked functions) put the evaluator into a BARRIER_HIT state.
While in the barrier state, subsequent evaluations trying to gather arguments for the same function are rigged to act the same as end-of-block.
- If a function's arguments don't tolerate <end> then this will give an error
- If it is tolerant of <end> then it will be able to call the function.
Starting the next fresh evaluation step clears the BARRIER_HIT state.

An Early Weakness Noted: Literal Arguments

I quickly spotted the case of taking the next argument literally (remember that the x is just x):

the, 1 + 2
; vs
(the) 1 + 2

I wasn't sure of the value of protecting people from themselves here, vs. making COMMA! the one datatype you could not pass as quoted. If it were to be prohibited, we could ask where the right place for the prohibition is:

In the evaluator guts, by simulating an <end> signal for this case
Inside of THE, testing for if comma? arg [...] in the implementation
In the type checking, via the: native [arg [(non-comma? element?)]]

Today we could choose (1), which is what happens for evaluative parameters. However we don't do this for quoted parameters...so today it is allowed:

; today's behavior
>> the,
== ,

Or it could be an error unless you say THE/COMMA, which would force use of (2).

But... Would Simpler Be Better?

The existing designs predate NIHIL (an isotopic empty pack). One major aspect of nihil is that almost no functions will accept it as a parameter.

So we might ask how different COMMA! needs to be from reaching a nihil state, e.g. a COMMENT? Would this be sensible as being equivalent mechanics:

>> all [1 +, "hi" 2 3 + 4]
** Script Error: + is missing its value2 argument

>> all [1 + comment "hi" 2 3 + 4]
** Script Error: + is missing its value2 argument

e.g. How bad would it be if the BARRIER_HIT mechanics went away, and we simply leveraged the idea that commas evaluated to nihil...and most functions refuse nihil arguments?

Downside of Simplicity: Behavior in PACK!

I decided to test this idea of making COMMA! evaluate NIHIL just like a comment would. But found trouble because I had code that was doing something like this:

[a b]: pack [1 + 2, 10 + 20]

It didn't work, because PACK was built on top of REDUCE-EACH with a ^META variable. REDUCE-EACH performed three evaluation steps on the right hand side... the second evaluated the comma and got back a nihil (empty pack):

>> meta pack [1 + 2, 10 + 20]
== ~['3 ~[]~ '30]~

The concept here is that if you use REDUCE-EACH with a ^META variable, you have to handle everything--that includes packs and errors. This needs to be legal in order to do things like multi-returns with unstable isotopes (this is integral to UPARSE, for instance).

So this means we definitely want this behavior:

>> meta pack [1 + 2 comment "hi" 10 + 20]
== ~['3 ~[]~ '30]~

This suggests either that you can't use commas in PACK, -or- that PACK needs to be proactive about skipping over the commas at source level. So long as PACK is based on REDUCE-EACH, then that suggests REDUCE-EACH needs to be able to skip commas...because you wouldn't be able to distinguish the cases based on the NIHIL evaluation product alone.

Something to notice about that idea is that if it's literally looking for commas, that means you can't make your own comma-like construct that acts like a barrier.

Another Wrinkle: SET-BLOCK! with Comma on Right

If you were to write something like this, it wouldn't give an error:

>> [/a /b]: ,
== ~null~  ; isotope

>> a
== ~null~  ; isotope

>> b
== ~null~  ; isotope

This is because the slashes indicate the results are optional, e.g. a shorter pack is accepted. If COMMA!'s stopping power in the main evaluator comes only from the idea that it evaluates to an empty pack, it won't complain at the lack of a meaningful expression to its right.

Things like META which can accept empty packs would also not trigger an error:

>> meta,
== ~[]~

These don't offhand seem that bad, and maybe could even be seen as good if you look at it from a certain point of view. But it does show that the "stopping power" of commas isn't bullteproof.

What About THEN/ELSE/etc. ?

THEN and ELSE are enfix and treat their left hand side evaluatively:

(1 + 2, then [print "what happens here?"])

This would wind up acting the same as:

(1 + 2 comment "hi" then [print "what happens here?"])

It needs to be an error...and it currently is. The error arises because THEN and ELSE refuse to operate on nihil. But at the moment this is a distinct case from not having a left hand argument at all.

(then [print "what happens here?"])

Today, there are mechanics that make the left hand side look like an <end> condition...which falls under the complexity of BARRIER_HIT.

Alternative: Evaluator Skips Over COMMA! When Possible

This would mean if you wrote something like:

>> do/next [, , , 1 + 2, , 10 + 20] 'pos
== 3

>> pos
== [, , 10 + 20]  ; or possibly just [10 + 20] if it skipped trailing commas

I think this is how things worked long ago before the BARRIER_HIT flag was introduced. The concept was that a literal barrier (was |, now ,) would be greedily consumed in interstitial evaluations, but raise errors otherwise.

This way, a COMMA! could just stay un-consumed by the evaluator. Function calls gathering their argument would look ahead and say "hey is there either an end of block or a COMMA!" and if so, not run an evaluation and report an <end> condition. This could be reported for arbitrarily many arguments...and so long as they were endable you would receive that condition. In other words: the BARRIER_HIT flag was conveyed merely by a lagging comma that would stick around.

This feels very regressive, because every stepwise evaluating function inherits this complexity. The nice consequence of saying that COMMA! just evaluates to NIHIL is that it triggers the same handling you would use for COMMENT-like functions.

On Balance, I Think the BARRIER_HIT Flag Has To Die

I'm bullish on COMMA! as a great addition to the language. But the various hoops that are jumped through to try and make it mimic the end of a block seem like a bridge too far.

To me, having commas vaporize is neat tech... and the idea is that if you handle things like COMMENT and ELIDE you get the comma handling for free. This seems quite elegant to me.

Maybe functions like REDUCE-EACH need a refinement that lets you detect commas differently:

>> reduce-each x [1 + 2, (elide print "Hi!") 10 + 20] [probe x]
3
Hi!
30
== 30

>> reduce-each ^x [1 + 2, (elide print "Hi!") 10 + 20] [probe x]
'3
Hi!  ; skipped over comma, by default
~[]~
'30
== '30

>> reduce-each/comma ^x [1 + 2, (elide print "Hi!") 10 + 20] [probe x]
'3
~,~  ; isotope
Hi!
~[]~
'30
== '30

Here I fancifully suggest giving back an isotopic comma to draw attention to it. Since all other values come back meta'd, this doesn't conflate with any "real" evaluative result. e.g. [~,~, ~,~] could distinguish "real isotopic commas" from source-level commas that REDUCE-EACH is offering to tell you about.

That requires commas to be stable isotopes. But one could also not worry about the conflation, and reduce to an unstable isotope:

>> ,
== ~,~  ; isotope

Then instead of people directly testing for NIHIL? most people could test for NOTHING? or VAPOR? which meant either nihil or isotopic comma. :-/

Some crazy ideas here, but I think on balance that trying to make commas "bulletproof" leads to more confusion and problems than just making them do whatever comments do, with meta-REDUCE-EACH skipping them by default.

This Means Killing Off User-Defined Expression Barriers

Paring it down such that NIHIL results are "mean" enough to stop evaluations for all practical purposes, the special flag for letting functions cause the BARRIER_HIT flag can be dropped too. This was made available via the TWEAK function:

//=//// DETAILS_FLAG_IS_BARRIER ///////////////////////////////////////////=//
//
// Special action property set with TWEAK.  Used by |
//
// The "expression barrier" was once a built-in type (BAR!) in order to get
// a property not possible to achieve with functions...that it would error
// if it was used during FULFILL_ARG and would be transparent in evaluation.
//
// Transparency was eventually generalized as "invisibility".  But attempts
// to intuit the barrier-ness from another property (e.g. "enfix but no args")
// were confusing.  It seems an orthogonal feature in its own right, so it
// was added to the TWEAK list pending a notation in function specs.
//
#define DETAILS_FLAG_IS_BARRIER \
    SERIES_FLAG_25

I don't feel any particular qualm about losing this feature, because I've never really used it.

And under the new concept, you get "barriery-enough" just by evaluating to NIHIL. You're no better or worse than COMMA!

(Actually that's not quite true, because currently COMMA! doesn't do lookahead, so it can't serve as the left hand side of an enfix function that doesn't quote the left hand side. But WORD! does do lookahead. If we wanted to go for full parity, we'd allow the lookahead for evaluative commas...but making comma worse just to give it parity with a WORD! doesn't seem too smart. If we decided that COMMA! was actually a WORD! that was a synonym for NIHIL and just rendered funny, then it might make sense... but that would wreck other things--like the PACK exception for commas.)

hostilefork · November 2, 2023, 5:47pm

So the Bad news is, I don't think this holds water...

If you have functions that can take a varying number of arguments, and (foo 10) means something different and legitimately distinct from (foo)...

... then you probably want (foo, foo) to act equivalently to ((foo) (foo)), and not give an error.

And if you try to accomplish that by saying COMMA! vaporizes as NIHIL, but then establish that anything evaluating to NIHIL acts as an expression barrier, it's bad. Worse now in a world where PRINT returns NIHIL...

Why worse? Because it means foo print "Hi" will act like (foo) print "Hi". That's not intuitive, if you think of foo as a function that generally takes arguments. PRINT doesn't seem like something that has this magic "stopping power" of things on its left (or right) from consuming arguments.

So do I have to bring back those clunky and non-configurable internal BARRIER_HIT mechanics? NO...because....

The Good News is, A New Isotope Can Solve It!

We essentially just need a way for constructs to generate an isotope which disappears in interstitals like an empty pack (nihil)... but represents the additional nuance of expression-barrier-ness with simulating <end> and curtailing lookahead.

COMMA! Isotopes Are Up For The Job!

If we let commas evaluate to comma isotopes and give them this behavior, then anyone who wants to mimic it can simply have their function return an isotopic comma...and they'll act like a barrier too. So let's call a comma isotope a BARRIER.

It may sound like a headache to check for another type. But having a test which checks for either NIHIL? or BARRIER? is easy enough to make for if you don't care about the difference. I chose to call that ELISION? because trying to find another word that means nothing (e.g. well, NOTHING?) would throw in a fair bit of confusion with whether that subsumed VOIDs or NONEs.

It Gets Rid Of This Weird Workaround

I mentioned how PACK was built on REDUCE-EACH and was having trouble putting nihils in packs with commas. That problem goes away now, so this is no longer necessary:

; REDUCE-EACH is the basis of functions like PACK, and hence by default it
; wants to skip over commas.  However, there's an option which will give you
; the isotopic word ~comma~ when it gets a barrier.  This will be conflated
; with evaluations that produce ~comma~ unless ^META variable used.
[
    (['3 '30 ~comma~] = collect [
        reduce-each x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])
    (['3 ~comma~ '30 ~comma~] = collect [
        reduce-each/commas x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])

    ([''3 ''30 '~comma~] = collect [
        reduce-each ^x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])
    ([''3 ~comma~ ''30 '~comma~] = collect [
        reduce-each/commas ^x [1 + 2, 10 + 20 ~comma~] [keep ^x]
    ])
]

But Is This Too Many Moving Parts?

I'm pretty sure it isn't.

The reason I say this is simply because flags like FEED_FLAG_BARRIER_HIT were instituted a long, long time ago. They were products of necessity, because I've always felt that Rebol without some kind of expression barrier concept was downright unreadable.

But my tries to implement this have always represented a thorn in terms of hidden state. I had to imagine strange places to represent the barrier-ness, like "what if that's what being enfix but taking no arguments meant".

This brings it all out in the open. Some routines will hide it, and others won't. It's already showing much more promise than what it was replacing.