Quoting Ergonomics in the API -- Solved with Terminology?

Old questions need to be pinned down. And a huge one in the API--practically the first question there was about the libRebol model--is what this should do:

rebSpell("mold", print_word);   // print_word is the WORD! for PRINT

Should it:

  1. Give back the UTF-8 C string "print"
  2. Error, and say "PRINT is missing its LINE parameter"

If you go with option #1, you have to make evaluation explicit when you want it... e.g. with some kind of EVAL operator:

rebRun(rebEVAL(print_word), "{This only prints if you use rebEVAL}");

rebRun(rebE(print_word), "{Instructions like this have shorthands in the API.}");

If you go with option #2, you'll be putting a lot of QUOTE operators in:

rebSpell("mold", rebQUOTE(print_word));

rebSpell("mold", rebQ(print_word));

(Note: In all proposals considered thus far, UTF-8 C text like "mold" is scanned and left in place as-is. It's only the spliced values to which this question applies.)

Beta/One Decision Update March-2019: Both options #1 and #2 will be available. #2 will be done with undecorated operator names like rebRun() and rebSpell(). Option #1 will be done with just one letter difference, as rebRunQ() and rebSpellQ(). The rebU() operator proposed in this post is kept with the meaning of "unquoting" splices, with rebQ() taking the place of rebE and simply meaning "quoting" splices...you can add and remove levels as you go. See the complete rationale here

The rest of this post is still historically relevant and points are valid, but the puzzle of "how to name operators that undo implicit things" is gone...since quoting only happens when you ask for it, it really is quoting and unquoting. Which is much easier to understand.

The API has been defaulting to option #1 (no error)

Which interpretation is most convenient will depend on what you're looking at. But just going by the numbers, there are a lot more calls just like rebSpell(print_word); than there are invocations of functions stored by WORD! or ACTION! in a C variable. It seems a shame if they have to be written as rebSpell(rebQ(print_word));

I could give a ton of examples where option #1 wins...if you're running code and loops and branches, it's perfect. Yet it's still very frequent that you get bitten on quoting by default--whenever you've got a construct that's just putting together raw material. Consider this simple example:

REBVAL *word = ...;  // code generating some WORD!
REBVAL *i_ten = rebInteger(10);
REBVAL *i_twenty = rebInteger(20);

REBVAL *result = rebRun(
    "select [ten", i_ten, "twenty", i_twenty, "]", word
);

Now think about it quoting every splice. Sure, that's good news for word. But i_ten and i_twenty aren't getting evaluated to take the quote off. So your selection product that comes back is going to be either '10 or '20... QUOTED! integers. :-/ Yuck.
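To make the asymmetry concrete, here is a minimal sketch in plain C. It is a toy model and not libRebol: the ToyValue type and every toy_ function are hypothetical names invented for illustration. The only thing it models is the quote depth of a spliced value, under the assumption that one evaluation step strips exactly one quote level while a value sitting inside a literal block gets no evaluation step at all.

```c
#include <assert.h>

/* Hypothetical toy model (not libRebol): a value is a kind plus a quote depth. */
typedef enum { TOY_INTEGER, TOY_WORD } ToyKind;

typedef struct {
    ToyKind kind;
    int payload;      /* integer value, or a symbol id for words */
    unsigned quotes;  /* number of quote levels on the value */
} ToyValue;

static ToyValue toy_integer(int n) {
    ToyValue v = { TOY_INTEGER, n, 0 };
    return v;
}

static ToyValue toy_word(int symbol_id) {
    ToyValue v = { TOY_WORD, symbol_id, 0 };
    return v;
}

/* Option #1: every spliced value gets one quote level added. */
static ToyValue toy_splice_quoted(ToyValue v) {
    v.quotes += 1;
    return v;
}

/* Evaluating a quoted value removes exactly one quote level. */
static ToyValue toy_eval_step(ToyValue v) {
    assert(v.quotes > 0);  /* the toy only models evaluating quoted values */
    v.quotes -= 1;
    return v;
}

/* A value inside a literal block is just data: no evaluation step happens. */
static ToyValue toy_inside_block(ToyValue v) {
    return v;
}
```

In this model the word spliced at the evaluative position comes out with zero quotes after toy_eval_step(), while an integer that went through toy_splice_quoted() and then toy_inside_block() still carries one quote level. That leftover level is the '10 and '20 problem.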

Naming Quote Manipulators Has Been A Point of Confusion

If you go with the solution described above of rebEVAL() to tell the splice not to quote, you get something that works, but is kind of inelegant:

REBVAL *result = rebRun(
    "select [ten", rebE(i_ten), "twenty", rebE(i_twenty), "]", word
);

You managed to get the message to rebRun() not to put quotes on. But you said to do it with...EVAL? It's not really like EVAL-the-evaluator-ACTION!, and can't substitute for it generically. Its function is narrower...more like rebU for "unquote":

REBVAL *result = rebRun(
    "select [ten", rebU(i_ten), "twenty", rebU(i_twenty), "]", word
);

Except...it's not really that either. If you try to UNQUOTE things that don't have quotes on them you get an error in normal evaluation. The REBVALs you're talking about--the integers--are unquoted here.

So what you're actually doing is un-asking for a special treatment. There's not a great word for asking for the absence of something that is implicit. (It reminds me a bit of the struggle we had trying to find a way to name something to use in PRINT to mean "don't put anything between these components...no space, nothing".)

Trying to push the operator out to be something like rebBLOCK() with rebB() and subsuming the brackets loses a nice aspect the brackets offered:

REBVAL *result = rebRun(
    "select", rebB("ten", i_ten, "twenty", i_twenty), word
);

That's fine if you're trying to optimize--especially if you have no runs of text that would fire up the scanner otherwise. (It doesn't cost much to process the brackets if you already need the scanner.) But I think it loses the clarity the brackets gave.

Better Living Through Terminology

I propose the rebU() and rebE() operators doing the opposite of what they do today, and having them represent rebUNEVALUATIVE() and rebEVALUATIVE()

They are not saying what to do with a splice. They are describing the environment in which splicing is done. Hence you'd do the example above as:

REBVAL *result = rebRun(
    "select", rebU("[ten", i_ten, "twenty", i_twenty, "]"), word
);

You are saying "the contents underneath this are intended to be used in an unevaluative sense, so don't bother trying to quote-escape them". This is different from rebUNEVAL(), which was previously a quoting operation...the thought was "if you UNEVAL something, then if you EVAL what you get you'll get the original thing back". It was hacked together in an ugly way that is handled far more elegantly with UNQUOTE and (new) QUOTE.
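One way to picture "describing the environment" rather than "acting on a splice" is the sketch below. It is plain C and purely illustrative: ToyContext, ToySplice and the toy_ functions are made-up names, and tagging each splice with its context is an assumption about how such a thing could be modeled, not how the feed actually works.

```c
#include <stddef.h>

/* Hypothetical: the environment a splice appears in. */
typedef enum { TOY_EVALUATIVE, TOY_UNEVALUATIVE } ToyContext;

typedef struct {
    ToyContext context;  /* environment the splice was written in */
    unsigned quotes;     /* quote levels the value already carries */
} ToySplice;

/* A splice gains one quote level only when its environment is evaluative. */
static unsigned toy_spliced_quotes(ToySplice s) {
    return s.context == TOY_EVALUATIVE ? s.quotes + 1 : s.quotes;
}

/* Count how many splices in a feed would come out with quotes on them. */
static size_t toy_count_quoted(const ToySplice *feed, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; ++i) {
        if (toy_spliced_quotes(feed[i]) > 0)
            ++count;
    }
    return count;
}
```

Modeling the select example as three splices, with i_ten and i_twenty tagged TOY_UNEVALUATIVE and word tagged TOY_EVALUATIVE, only the word ends up quoted.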

You can apply rebU() to a single item or to multiple items. You may even be able to apply it across partial spans of arrays:

REBVAL *result = rebRun(
    "select", rebU("[ten", i_ten), "twenty", i_twenty, "]", word
);

If the "unevaluative" doesn't gel, you can mentally file it under "unquote_splices". But I've explained why this is misleading, so it's not what the API operation will be called. (If there was a rebUNQUOTE, it would error if you passed it a plain INTEGER! like the ones above.)

But now the question is...what about this confusing-looking thing?

rebRun(rebU(print_word), "{This prints, but you said 'unevaluative'}");

Yep. Well, an "unevaluative" context would still run if you DO it. You can't somehow magically make things never evaluate.

There's nothing saying rebE() has to be a shorthand for rebEVALUATIVE() just because rebU() is rebUNEVALUATIVE(). Since evaluative is the default you probably don't need a shorthand for it.

What's more, rebEVAL could be broken out as a separate instruction, which folds the parentheses into the macro itself:

rebRun(rebEVAL, print_word, "{It's three characters longer this way...}");
rebRun(rebE(print_word), "{...but likely clearer w.r.t. distinction from rebU()}");

The extra three characters don't seem so bad, considering EVAL is something that applies to only the thing directly following it. And as I say, wanting to EVAL is not as common as you think (once you remove the cases that are now covered by rebU(), which were poorly expressed and thought of as being actual EVALs). This rebEVAL could be nothing but a clever trick that just folds into not quoting if at a non-BLOCK! level, but injects the EVAL native as an ACTION! if it winds up being inside a scanned BLOCK!. (Yes, these are details, and I know few people understand the ramifications of what I'm talking about.)

Anyway...when seen through this lens, I think it all makes sense. You start out in an EVALUATIVE context, which quotes splices by default. Then you can rest easy knowing that if you're in a situation where it's not what you want, you're just one rebU() away from having what you need...

Afterthought...should ACTION!s be "special" and auto-evaluate?

If the concept of the API is to treat the C variable lookup as if it were a WORD! being fetched, and thus give the inert value, there are three notable exceptions in the evaluator:

  • a WORD! looking up to an ACTION! evaluates
  • a WORD! looking up to a null errors without a special GET-WORD! exception
  • a WORD! looking up to a VOID! errors without a special GET-WORD! exception

Indeed, most of the rebEVAL examples are on ACTION!s, and the cases where you are passing them by value are few and far between.

But...I'd be very wary of making the behavior at the API level take this on. There are levels at which the API simply will not work the way the evaluator does...because it can't. You can't say rebRun("lit", my_word) and get back the WORD! my_word...as an obvious example. Do we really want to have to say:

if (rebDid("null?", rebG(maybe_null_or_void_REBVAL))) {
      // rebG() as in rebGET(), but as in EVAL TO GET-WORD!...?
}

That seems pretty insane, when compared to the comfortable "it gets a quote level in evaluative contexts" or "it gets no quote level in unevaluative contexts".

At 1st look I feel comfortable with that proposal.
I really should tinker with that, to clarify this feeling.

Another question to ask: Should libRebol functions in JavaScript be done as e.g. reb.Run(), so it can work with JS modules better? Node.js would then say const reb = require('rebol')

This Functionality Has Not Been Used:

It's hard to know in advance of designing an API what features will be needed. But when you notice something not getting used that's complex, it's probably just hindering the development of more interesting features. And I've felt no shortage of expressiveness in the API without this.

So I'm Paring It Back to arity-1 rebQ(v) and rebU(v)

Basically you can inject an automatically-self-disposing entry into the API stream that adds 1 level of quoting to the argument, or removes a level. That's it.
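As a rough picture of what "adds 1 level, or removes a level" means, here is a toy sketch in plain C. Nothing in it is the real API: ToyCell, ToyResult, toy_q() and toy_u() are hypothetical names, and reporting failure when there is no quote level to remove is my assumption for the sketch (it mirrors how UNQUOTE errors on an unquoted value in normal evaluation).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cell; only the quote depth matters here. */
typedef struct {
    int payload;
    unsigned quotes;
} ToyCell;

/* Result of an unquote attempt: ok is false if nothing could be removed. */
typedef struct {
    bool ok;
    ToyCell cell;
} ToyResult;

static ToyCell toy_cell(int payload, unsigned quotes) {
    ToyCell c = { payload, quotes };
    return c;
}

/* Arity-1 "quote": add exactly one level to this one value. */
static ToyCell toy_q(ToyCell c) {
    c.quotes += 1;
    return c;
}

/* Arity-1 "unquote": remove exactly one level, or report failure
   (assumed behavior) when the value has no quote level to remove. */
static ToyResult toy_u(ToyCell c) {
    ToyResult r = { false, c };
    if (c.quotes == 0)
        return r;
    r.cell.quotes -= 1;
    r.ok = true;
    return r;
}
```

Applying toy_u() to the result of toy_q() gives back the original depth, which is the "you can add and remove levels as you go" idea with no ranges or contexts involved.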

Here are some headache-inducing comments from the code being deleted:

// The rebQ instruction was designed such that it doesn't mean "quote", it means
// "quote any value splices in this section".  And if you turned around and
// said `rebU(rebQ(...))` that should undo your effect.  The two operations
// share a mostly common implementation.
//
// Note that `rebValue("print {One}", rebQ("print {Two}", ...), ...)` should not
// execute rebQ()'s code right when C runs it.  If it did, then `Two` would
// print before `One`.  It has to give back something that provides more than
// one value when the feed visits it.
//
// So what these operations produce is an array.  If it quotes a single value
// then it will just be a singular array (sizeof(REBSER)).  This array is not
// managed by the GC directly--which means it's cheap to allocate and then
// free as the feed passes it by.  (Which is one of the reasons that a GC has to
// force reification of outstanding variadic feeds.)
//
// We lie and say the array is NODE_FLAG_MANAGED when we create it so it
// won't get manuals tracked.  Then clear the managed flag.  If the GC kicks
// in it will spool the va_list() to the end first and take care of it.  If
// it does not kick in, then the array will just be freed as it's passed.

Even scarier:

// !!! It may be possible to create variations of this which are done in a
// way that would allow arbitrary spans, `rebU("[", value1), value2, "]"`.
// But those variants would have to be more sophisticated than this.

🤕 ...or we could not do that. This also gets rid of the "Quoting Byte", which has the following writeup:


There was significant deliberation over what the following code should do:

REBVAL *word = rebValue("'print");
REBVAL *type = rebValue("type of", word);

If the WORD! is simply spliced into the code and run, then that will be an error. It would be as if you had written:

do compose [type of (word)]

It may seem to be more desirable to pretend you had fetched word from a variable, as if the code had been Rebol. The illusion could be given by automatically splicing quotes, but doing this without being asked creates other negative side effects:

REBVAL *x = rebInteger(10);
REBVAL *y = rebInteger(20);
REBVAL *coordinate = rebValue("[", x, y, "]");

You don't want to wind up with ['10 '20] in that block. So automatic splicing with quotes is fraught with problems. Hence you should use the rebQ() operator to add quoting levels when needed, or use the @ operator as part of a string:

REBVAL *type = rebValue("type of", rebQ(word));
REBVAL *type = rebValue("type of @", word);

rebQ() and rebU() are generalized so that one may add and drop quoting from splices on a feed via ranges, countering any additions via rebQ() with a corresponding rebU(). This is kept within reason at up to 255 levels in a byte, and that byte is in the feed flags in the second byte (where it is least likely to be needed to line up with cell bits etc.) Being in the flags means it can be initialized with them in one assignment if it does not change.
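For anyone curious what a quoting byte packed into flags looks like, here is a small sketch in plain C. It is not the deleted code: the TOY_ macros and toy_ functions are hypothetical, and treating "the second byte" as bits 8 through 15 of a 32-bit word is an assumption made for the illustration. Refusing to go past 255 or below 0 is likewise just one plausible way to keep it "within reason".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 8..15 of a 32-bit flags word hold the quote depth. */
#define TOY_QUOTING_SHIFT 8
#define TOY_QUOTING_MASK  ((uint32_t)0xFF << TOY_QUOTING_SHIFT)

/* Build the flags in one assignment, quoting byte included. */
static uint32_t toy_feed_flags(uint32_t other_flags, uint8_t quoting) {
    return (other_flags & ~TOY_QUOTING_MASK)
        | ((uint32_t)quoting << TOY_QUOTING_SHIFT);
}

/* Read the quote depth back out of the flags. */
static uint8_t toy_quoting_byte(uint32_t flags) {
    return (uint8_t)((flags & TOY_QUOTING_MASK) >> TOY_QUOTING_SHIFT);
}

/* Entering a rebQ()-style range: fail rather than wrap past 255. */
static bool toy_enter_quote(uint32_t *flags) {
    uint8_t q = toy_quoting_byte(*flags);
    if (q == UINT8_MAX)
        return false;
    *flags = toy_feed_flags(*flags, (uint8_t)(q + 1));
    return true;
}

/* Entering a rebU()-style range: fail if there is no level to counter. */
static bool toy_enter_unquote(uint32_t *flags) {
    uint8_t q = toy_quoting_byte(*flags);
    if (q == 0)
        return false;
    *flags = toy_feed_flags(*flags, (uint8_t)(q - 1));
    return true;
}
```

The other flag bits pass through untouched, so a feed that never changes its quoting can set everything in a single toy_feed_flags() call.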
