Should WORD!, PATH!, FUNCTION! be "live" by default in rebDo()?

hostilefork · November 9, 2017, 7:13pm

To make a medium-size story short: No, I don't believe so.

Beta/One Update March 2019: The final decision is to give people the choice; all operators come in "live" and "quoted" forms. The live forms have names like rebRun() or rebSpell(). The quoting forms have names like rebRunQ() or rebSpellQ(). Additional rebQ()-quoting and rebU()-unquoting operators let you raise or lower the quoting level of splices in mid-expression.

Hence one could say "I've changed my mind", because the "default" without the Q is to splice the value as-is. But picking the other behavior costs only a single letter, so keeping the choice easy is still a priority. See the complete rationale here.

To make a medium-size story long: ...

Let's imagine you have a simple situation that generates a function and stores it in an API handle value:

REBVAL *foo = rebDo("function [x] [print x]", END);

What should happen if you tried to use a SET-WORD! in code to assign it to another variable?

rebDo("also-foo:", foo, END);

Perhaps one thinks that looks most like accessing a function through a WORD! foo. That might imply referring to it by its C variable name would execute it by default. Hence the burden would be upon you to do some kind of trick to "disarm" it, such as:

rebDo("also-foo: quote", foo, END);

However, the idea that referring to the C variable name executes it is not generally what happens. Had you written code as a simple C assignment:

REBVAL *also_foo = foo;

foo would not try to execute, and it can't. In plain C there's no opportunity for such a statement to execute arbitrary code. At least in that context, a simple reference to a C variable has to be inert...and its mere appearance does not imply execution.

This means you would bias things the other way. If you wanted an execution, you would use an EVAL.

rebDo("eval", foo, "10", END); // to print 10

This concept feels like it makes sense, because mostly what I see in C code is that the calculation to produce the values has already occurred. You don't want them to undergo a double evaluation. Consider this:

REBVAL *value = rebPath("a/b/c");
// now value holds an item of type PATH!
rebDo("target:", value, END);

Here you're abstracting through a variable that doesn't look like a path at all. It seems sketchy to be pulling out "live" behavior when it's not requested, because think about how the following would act:

value: 'a/b/c
;-- now value holds an item of type PATH!
target: value

This provides convincing evidence that access through a C variable should be seen as more akin to picking something out of a Rebol variable via a GET-WORD!. It should be seen as inert to the evaluator on the first pass, and it takes something more than just running the evaluator across it once to get it to execute. You either COMPOSE it into a block and DO it, or use EVAL.

I'm curious about what might be possible notationally for a fast/API-level EVAL:

rebDo(rebEval(), foo, "10", END); // 0-arity a bit confusing
rebDo(rebEvalNative(), foo, "10", END); // a bit wordy
rebDo(rebEval(foo), "10", END); // unusual but nicer

The last one seems coolest to me, even though it's "strange". But EVAL is strange...it takes one argument and then potentially bounces it and keeps on going. This suggests the return result of rebEval(foo) would not be a finalized REBVAL, rather something that has to be spliced into a rebDo() chain...a new datatype. Such things are tricky, but possible...
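To make the "inert unless asked" idea concrete, here is a minimal toy model in C. It is only a sketch: it does not use the real API, and every name in it (ToyValue, ToyFeedItem, toy_splice, toy_eval, toy_eval_step) is made up for illustration. A fed item is either a plain splice or is wrapped by toy_eval(), which stands in for what a rebEval(foo) instruction might mean. The toy "action" just adds a fixed amount to its one argument.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not part of libRebol. */
typedef enum { TOY_INTEGER, TOY_ACTION } ToyKind;

typedef struct {
    ToyKind kind;
    int payload; /* an integer's value, or the amount a toy action adds */
} ToyValue;

typedef enum { FEED_SPLICE, FEED_EVAL } ToyFeedMode;

typedef struct {
    ToyFeedMode mode; /* FEED_EVAL plays the role of a rebEval() wrapper */
    ToyValue value;
} ToyFeedItem;

ToyValue toy_integer(int n)
{
    ToyValue v;
    v.kind = TOY_INTEGER;
    v.payload = n;
    return v;
}

ToyValue toy_action(int addend)
{
    ToyValue v;
    v.kind = TOY_ACTION;
    v.payload = addend;
    return v;
}

/* A plain splice: the value is inert when the evaluator reaches it. */
ToyFeedItem toy_splice(ToyValue v)
{
    ToyFeedItem item;
    item.mode = FEED_SPLICE;
    item.value = v;
    return item;
}

/* An explicit request to make the value "live". */
ToyFeedItem toy_eval(ToyValue v)
{
    ToyFeedItem item;
    item.mode = FEED_EVAL;
    item.value = v;
    return item;
}

/* Evaluate one expression starting at items[*index], advancing *index. */
ToyValue toy_eval_step(const ToyFeedItem *items, size_t count, size_t *index)
{
    ToyFeedItem item;
    assert(*index < count);
    item = items[*index];
    *index += 1;

    if (item.mode == FEED_EVAL && item.value.kind == TOY_ACTION) {
        /* Live action: consume the next expression as its argument. */
        ToyValue arg = toy_eval_step(items, count, index);
        assert(arg.kind == TOY_INTEGER);
        return toy_integer(arg.payload + item.value.payload);
    }
    return item.value; /* inert: a spliced value is its own result */
}
```

In this model, feeding toy_splice(action) followed by an integer yields the action itself and leaves the integer unconsumed, while feeding toy_eval(action) followed by the integer runs the action on it.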

hostilefork · February 13, 2018, 11:40am

Having experimented with this for a bit, I will say that requiring rebEval() to make a value provided from C "live" vs. merely "spliced" is a little bit annoying at times. But I'm still pretty sure it makes sense as the right answer.

One good reason for it would be that there are other "splicing" scenarios to which evaluation shouldn't apply. For instance, a variadic constructor for BLOCK!s:

REBVAL *block = rebBlock("10", value, "20", END);

The person who wanted to make a 3-element block presumably meant to splice value in there, and doesn't want to worry about any execution. I'd suggest they'd want the same behavior with:

REBVAL *block = rebDo("[", "10", value, "20]", END);

This actually brings up the point of what to do with a rebEval() in such a situation...since it's inside a block, the evaluator never "sees" it. So what should this do?

REBVAL *block = rebDo("[", "10", rebEval(value), "20]", END);

I think that should be an error. But whatever it does, hopefully this helps drive home my point about why splicing without evaluation should be the default.

In the shorthands department, I'm thinking rebE() for rebEval(). We'll have to be a little careful with these, and see what ones get used most in practice before picking them, but rebEval() will be used very frequently.

hostilefork · December 29, 2018, 8:28am

Coming back to this topic, there's some exciting news related to the advent of arbitrary levels of escaping.

The API frequently would get in situations where you'd be building blocks of code that were intended to run, but you didn't want values to "double evaluate". You'd be unable to pass nulls, for instance:

 ... = rebRun("if condition [some-function", value_may_be_null, "]");

That has to create a BLOCK! before it can run. But NULLs can't be spliced into blocks. You end up having to make up something along the lines of:

  uneval: func [x [<opt> any-value]] [
       if null? :x [return quote (null)] ;-- when run, produces NULL
       return as group! compose/only [quote (:x)]
  ]

  ... = rebRun("if condition [some-function uneval", value_may_be_null, "]");

So you'd fabricate an expression that would produce null when evaluated, for instance. You could optimize that to notice if you had something with no evaluator behavior, like an INTEGER!, and just return it. But it's all very dicey.

You no longer need this; you only need to escape your value! @giuliolunati proposes that this escaping keep the name "quoting", so it becomes nice:

 ... = rebRun("if condition [some-function", rebQ(value_may_be_null), "]");

So when the API streams across that and builds a block, the splice really just puts a single value cell into the block. If value was 1, the evaluator will see [some-function '1]. If it was null, the evaluator will see [some-function ']. If it was ''(1 + 2), then it will see [some-function '''(1 + 2)]. Etc.
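The quote-level bookkeeping can be sketched with a toy cell type in C. This is an illustration under assumptions, not the actual cell layout or API: ToyCell and the toy_* functions are hypothetical names, and the payload is reduced to an integer or null. The point it shows is that quoting adds one level, evaluation removes one level, and a null with at least one quote level is something a block can hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the real cell structure. */
typedef struct {
    bool is_null;    /* true models NULL */
    int integer;     /* payload, meaningful only when is_null is false */
    unsigned quotes; /* how many quote levels the cell carries */
} ToyCell;

ToyCell toy_null(void)
{
    ToyCell c;
    c.is_null = true;
    c.integer = 0;
    c.quotes = 0;
    return c;
}

ToyCell toy_int(int n, unsigned quotes)
{
    ToyCell c;
    c.is_null = false;
    c.integer = n;
    c.quotes = quotes;
    return c;
}

/* Stand-in for what a rebQ()-style splice does: add one quote level. */
ToyCell toy_quote(ToyCell c)
{
    c.quotes += 1;
    return c;
}

/* A bare null cannot be stored in a block, but a quoted null can. */
bool toy_can_store_in_block(ToyCell c)
{
    return !c.is_null || c.quotes > 0;
}

/* Evaluating a quoted cell strips exactly one quote level. */
ToyCell toy_eval_quoted(ToyCell c)
{
    assert(c.quotes > 0);
    c.quotes -= 1;
    return c;
}
```

Under this model, toy_eval_quoted(toy_quote(x)) gives back x for any x, which is the "no double evaluation" property the splice is after.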

The previous trick for getting a similar effect complicated the code. This is going to be a big win. Once again, yay for thinking!

hostilefork · January 4, 2019, 2:10pm

I think the answer gets even better: we can quote by default on all splices for rebRun() and friends. But then offer another entry point--something with a name like rebBuild()--which doesn't execute anything, and splices as-is by default.

Quoting by default is nice for entry points like rebSpell(word), where you want to get the spelling without having to say you don't want the word to evaluate. (Who wants to write rebSpell(rebQ(word))? Plus you'd forget, and it's an extra API call with instruction allocation overhead.) But you still want the executable bias if you are going to say something like rebSpell("first", block), so any scanned runs of text are live.

And you want to be able to override the quotedness, though I don't know if I like rebU() for unquote...

 block = rebRun(rebU(action), "arg1 arg2");

While it's mechanically consistent to call it unquote (with rebRun always adding a quote level unless you ask it to) rebE() seems more meaningful for "make evaluative".

If you can escape into-and-out-of the building vs. running forms, I think it would cover pretty much everything. Let's say rebB for the build instruction:

 block = rebRun(rebE(action), "arg1", v1, v2, rebB("[", v4, v5, "]"));

So that would splice v1 and v2 quoted, but v4 and v5 as is. If you had omitted the rebB, you'd have gotten a block with quoted values in it as the last item.
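The rule being proposed can be written down as a tiny function. The sketch below is a toy model, not an implementation of the API: ToyMode and toy_spliced_quotes are hypothetical names, and it only tracks how many quote levels a spliced value ends up with. It assumes that a "run" context adds one quote level unless the splice is marked evaluative (the rebE idea), and that a "build" context (the rebB idea) leaves the value as-is.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not part of the real API. */
typedef enum { TOY_MODE_RUN, TOY_MODE_BUILD } ToyMode;

/*
 * Returns the quote level a spliced value carries once it lands in the
 * code being assembled. `quotes` is the level it had in the C variable.
 */
unsigned toy_spliced_quotes(ToyMode mode, bool make_evaluative, unsigned quotes)
{
    assert(mode == TOY_MODE_RUN || mode == TOY_MODE_BUILD);

    if (mode == TOY_MODE_BUILD)
        return quotes; /* building: splice as-is, nothing is added */

    if (make_evaluative)
        return quotes; /* running, but explicitly made "live" */

    return quotes + 1; /* running: quote by default */
}
```

Applied to the example above, action maps to (TOY_MODE_RUN, true) and stays unquoted, v1 and v2 map to (TOY_MODE_RUN, false) and gain one quote level, and v4 and v5 map to (TOY_MODE_BUILD, false) and are left alone.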

(If you don't quite follow the difference between an "instruction" and an ordinary API entry point: instructions generate transient entities that are freed automatically and can only be used in variadic calls. They're easier to use than worrying about an API handle, if you are really going to have something you are using just once.)