Issues With Writing Output Cells Directly

How R3-Alpha Natives Returned Their Result

In R3-Alpha, a native signaled where its result could be found by returning a value from an enumerated type:

enum {
	R_RET = 0,
	R_TOS,
	R_TOS1,
	R_NONE,
	R_UNSET,
	R_TRUE,
	R_FALSE,
	R_ARG1,
	R_ARG2,
	R_ARG3
};

Each invocation of a native pushed some space for a cell where you could write a return result, and DS_RETURN was that arbitrary cell. If that's where the result was, the native would return R_RET.

The other return values were shorthands to save you from having to copy or initialize a result into DS_RETURN from somewhere else. e.g. return R_TOS meant to look for the result at the "Top Of Stack", so your native wouldn't have to copy the cell from that location and then drop an element off the stack. So it was a shorthand for:

*DS_RETURN = *DS_TOP;
DS_DROP;
return R_RET;

return R_TRUE kept you from having to initialize the DS_RETURN slot with a logic, hence a shorthand for:

SET_TRUE(DS_RETURN);
return R_RET;
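
To make the shorthand idea concrete, here is a minimal sketch of how the code calling a native might act on such return codes. This is not R3-Alpha's implementation: the `Toy` types, the three-code subset, and `toy_call_native` are all invented for illustration, and a real cell carries far more than a kind and an integer.

```c
#include <assert.h>

// Hypothetical, heavily simplified cell and data stack.
typedef enum { KIND_UNSET, KIND_LOGIC, KIND_INTEGER } ToyKind;
typedef struct { ToyKind kind; int value; } ToyCell;

// Subset of the return codes, renamed to mark them as toy versions.
typedef enum { TOY_R_RET = 0, TOY_R_TOS, TOY_R_TRUE } ToyResult;

typedef struct {
    ToyCell cells[16];
    int top;  // index of the top of stack, -1 when empty
    int ret;  // index of the return slot of the running native
} ToyStack;

typedef ToyResult (*ToyNative)(ToyStack *ds);

static ToyCell toy_call_native(ToyStack *ds, ToyNative native) {
    ds->top++;  // push the return slot (the DS_RETURN analogue)
    ds->ret = ds->top;
    ds->cells[ds->ret].kind = KIND_UNSET;
    ds->cells[ds->ret].value = 0;

    switch (native(ds)) {
      case TOY_R_RET:  // native already wrote the return slot
        break;
      case TOY_R_TOS:  // copy from top of stack, then drop it
        ds->cells[ds->ret] = ds->cells[ds->top];
        ds->top--;
        break;
      case TOY_R_TRUE:  // initialize the return slot as a logic
        ds->cells[ds->ret].kind = KIND_LOGIC;
        ds->cells[ds->ret].value = 1;
        break;
    }

    ToyCell result = ds->cells[ds->ret];
    ds->top = ds->ret - 1;  // drop the return slot and anything above it
    return result;
}

// Three toy natives, one per return code.
static ToyResult native_write_7(ToyStack *ds) {
    ds->cells[ds->ret].kind = KIND_INTEGER;
    ds->cells[ds->ret].value = 7;
    return TOY_R_RET;
}

static ToyResult native_push_42(ToyStack *ds) {
    ds->top++;
    ds->cells[ds->top].kind = KIND_INTEGER;
    ds->cells[ds->top].value = 42;
    return TOY_R_TOS;
}

static ToyResult native_always_true(ToyStack *ds) {
    (void)ds;
    return TOY_R_TRUE;
}
```

The point is only that the native gets shorter: `native_push_42` and `native_always_true` never touch the return slot themselves.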

Ren-C Writes Directly To A Target Cell

Early changes for Ren-C brought more rigor to the data stack and checks on how it was used. (I have explained many of these changes.)

It also expanded the return value of natives to be a fairly arbitrary pointer...a role that I call a Bounce. A Bounce can be detected as being a Cell, or a UTF-8 string, or some other indicator (some indicators ask the trampoline to cycle back and run a pushed stack level, without creating a nested C stack). You can even return a C nullptr to indicate a ~null~ antiform.

But rather than having a slot on the data stack where results are expected to be written, each interpreter stack level has an OUT pointer. When you instantiate a stack level, this pointer is specified in the instantiation...and it's supposed to be somewhere that already exists.

Notably, this pointer cannot be in the data stack...because the data stack can be resized at arbitrary moments (e.g. on a stack expansion). However, it would be possible to do something similar to R3-Alpha and have a Bounce signal that said the result lived there...which would just mean the code executing natives would copy whatever was on the top of the stack into the OUT location at the moment of return as a convenience.
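
As a rough sketch of both halves of that paragraph: a growable stack whose storage may move on expansion (so raw pointers into it are unsafe to hold), plus a hypothetical bounce signal that asks the caller to copy the top of stack into the output cell. None of these names exist in Ren-C; `ToyDataStack`, `TOY_BOUNCE_TOP_OF_STACK` and the rest are assumptions made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int value; } ToyCell;

typedef struct {
    ToyCell *cells;
    size_t len;
    size_t capacity;
} ToyDataStack;

static void toy_push(ToyDataStack *ds, int value) {
    if (ds->len == ds->capacity) {
        size_t new_capacity = ds->capacity ? ds->capacity * 2 : 4;
        ToyCell *moved = realloc(ds->cells, new_capacity * sizeof(ToyCell));
        if (!moved)
            abort();
        // realloc may relocate the storage, so any ToyCell* taken before
        // this point may now dangle.  An OUT pointer must not live here.
        ds->cells = moved;
        ds->capacity = new_capacity;
    }
    ds->cells[ds->len++].value = value;
}

// Hypothetical bounce signals: "result is in OUT" vs. "result is on the
// top of the data stack, please copy it for me".
typedef enum { TOY_BOUNCE_OUT, TOY_BOUNCE_TOP_OF_STACK } ToyBounce;

// A toy native that pushes enough to force several expansions, and then
// leaves its answer on top of the stack instead of writing to `out`.
static ToyBounce toy_native(ToyDataStack *ds, ToyCell *out) {
    (void)out;
    for (int i = 0; i < 100; i++)
        toy_push(ds, i);
    return TOY_BOUNCE_TOP_OF_STACK;
}

static void toy_run(ToyDataStack *ds, ToyCell *out) {
    size_t base = ds->len;  // an index stays meaningful across expansions
    if (toy_native(ds, out) == TOY_BOUNCE_TOP_OF_STACK)
        *out = ds->cells[ds->len - 1];  // copy at the moment of return
    ds->len = base;  // drop whatever the native pushed
}
```

Here `out` lives outside the data stack, so it stays valid no matter how many times the stack storage moves during the call.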

Direct Write Was Conceived As An Optimization

It would have been possible for Ren-C to have a Cell's worth of space in the "Level" representing an interpreter stack level, instead of being given a pre-existing pointer. But the concept was that specifying where to write the output up front would save having to move the result after evaluation was finished.

But there are a few catches that have come up...

  1. Indirect Writes Are Slower - The OUT cell is used for intermediate calculations. Locality-wise, writing through the pointer L->out has proven noticeably more expensive than it would be if out were a plain cell in the Level and you were writing to &L->out. If you do a lot of these intermediate calculations, the extra dereferences wind up outweighing a single move of the output cell at the end.

  2. Stack Suspension Gets Complex - In things like generators, you want to suspend a stack of Levels. When you do so, the place that was requested as "where to write to" will change. So every spot in the suspended stack that mentions the output pointer has to be turned into a placeholder value. Then when you restore the stack with a new idea of where the top Level writes its output cell, you have to go through and fix up those placeholders to point to the new location.

  3. Handling Failures May Have Invalid Stack Locations - In the model that has been established regarding things like abrupt failures, it's possible for a stack Level to run some cleanup code if it needs to. So the throw or longjmp happens and the Trampoline catches it with the last pushed Level still intact. But if the Level's L->out pointer was to a cell on the stack, then it may be invalid during this handling code.
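
The placeholder bookkeeping from [2] can be sketched in a few lines. This is only a toy model under stated assumptions: `ToyLevel`, `toy_placeholder`, `toy_suspend` and `toy_resume` are hypothetical names, and the real mechanics of unplugging a generator's levels are more involved.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } ToyCell;

typedef struct ToyLevel {
    ToyCell *out;            // where this level writes its result
    struct ToyLevel *prior;  // next level down the stack
} ToyLevel;

// Hypothetical sentinel.  Only its address matters, nothing writes to it.
static ToyCell toy_placeholder;

// Walk the levels from `top` down to `base` (inclusive).  Any output
// pointer aimed at the old target becomes the placeholder.
static void toy_suspend(ToyLevel *top, ToyLevel *base, ToyCell *old_target) {
    for (ToyLevel *L = top; ; L = L->prior) {
        if (L->out == old_target)
            L->out = &toy_placeholder;
        if (L == base)
            break;
    }
}

// On resumption, patch each placeholder to the new output location.
static void toy_resume(ToyLevel *top, ToyLevel *base, ToyCell *new_target) {
    for (ToyLevel *L = top; ; L = L->prior) {
        if (L->out == &toy_placeholder)
            L->out = new_target;
        if (L == base)
            break;
    }
}
```

Output pointers that target cells inside the suspended stack itself are left alone, since those cells travel with the stack. Only references to the outside world need the fixup.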

[2] was annoying to work through, but it's really [3] that I am struggling with. Things would be simpler if there was a cell as part of the Level itself, whose lifetime was equal to the Level's, where results were written.

Changing It Feels Like A Step Backward :pouting_cat: But...

It would still be possible to avoid copying if all you were interested in was the result. (Think of something like rebUnboxInteger() which could push a Level, do an evaluation keeping the Level on the stack, extract the integer, drop the Level, return the integer.)
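
A minimal sketch of that pattern, assuming the output cell were embedded in the Level. Everything here is hypothetical (`ToyLevel`, `toy_unbox_integer`, and an "evaluation" that just sums integers); it is not how rebUnboxInteger() is implemented.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int value; } ToyCell;

typedef struct ToyLevel {
    ToyCell out;             // embedded: lifetime equals the level's
    struct ToyLevel *prior;
} ToyLevel;

static ToyLevel *toy_top = NULL;

static ToyLevel *toy_push_level(void) {
    ToyLevel *L = malloc(sizeof(ToyLevel));
    if (!L)
        abort();
    L->out.value = 0;
    L->prior = toy_top;
    toy_top = L;
    return L;
}

static void toy_drop_level(ToyLevel *L) {
    assert(L == toy_top);
    toy_top = L->prior;
    free(L);
}

// Stand-in for an evaluation: accumulate a sum into the level's own cell.
static void toy_evaluate(ToyLevel *L, const int *terms, size_t count) {
    for (size_t i = 0; i < count; i++)
        L->out.value += terms[i];
}

static int toy_unbox_integer(const int *terms, size_t count) {
    ToyLevel *L = toy_push_level();
    toy_evaluate(L, terms, count);  // level stays on the stack
    int result = L->out.value;      // extract in place, no cell copy
    toy_drop_level(L);
    return result;
}
```

The only thing that leaves the level is the plain `int`, so no cell ever has to be moved to a caller-provided location.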

I'm not thrilled, but I'll try the change and see how much damage/help it does.

ChatGPT thoughts:

Upon reflection, this would be an incredibly difficult change to make, breaking pretty much everything.

The system generally targets Level->out at cells in other levels above the level, which are guaranteed to outlive it (with special attention paid in generators). In the rare cases where C stack cells are used, it's generally because a trampoline is spawned beneath them which would intercept any errors.

The problem cases are lingering stackful code that doesn't invoke a trampoline when pushing levels, and is being called from a native without passing an output.

I found a trick, which is to take advantage of a cell's worth of storage in "Stepper Executor" levels that is usually used by an "Evaluator Executor" that calls successive stepper executors... this cell is how it stores the previous result to implement invisibility (e.g. how 1 + 2 comment "hi" can be 3). By allowing a stepper level to target that cell inside itself as output when it knows it's not being called by an evaluator executor, it works.

So basically... OUT is still a Cell*, but in these edge cases that Cell* is allowed to point inside the Level at what would otherwise be unused space.

Details for whom it may concern:

// !!! This is for historical non-stackless code, which needs a place to write
// output for a stepper that has a lifetime at least as long as the Level.
// e.g. this is illegal:
//
//      DECLARE_ATOM (result);
//      Level* L = Make_Level_At(
//          &Stepper_Executor, spec, LEVEL_FLAG_TRAMPOLINE_KEEPALIVE
//      );
//      Push_Level_Erase_Out_If_State_0(result, L);
//      fail ("This throws a level to the trampoline where result is dead");
//
// Simply put, when the Trampoline gets control after a longjmp() or throw, that
// Level's L->out pointer will be corrupt...the stack-declared result is gone.
//
// Instead of DECLARE_ATOM, use Level_Lifetime_Atom(L).  This takes advantage
// of the fact that there's a cell's worth of spare space which a stepper
// that is not called by Evaluator_Executor() does not use.
//
INLINE Sink(Atom) Level_Lifetime_Atom(Level* L) {
    assert(L->executor == &Stepper_Executor);
    Force_Erase_Cell_Untracked(&L->u.eval.primed);
    return cast(Atom*, &L->u.eval.primed);
}
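
For anyone who wants to poke at the idea outside the codebase, here is a small standalone model of it using setjmp/longjmp. It is a sketch under assumptions, not the real thing: `ToyLevel`, `toy_level_lifetime_cell` and the `TOY_STEPPER` tag are invented stand-ins for the Level, Level_Lifetime_Atom() and the executor check above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct { int value; } ToyCell;

typedef enum { TOY_STEPPER, TOY_EVALUATOR } ToyExecutor;

typedef struct {
    ToyExecutor executor;
    ToyCell *out;    // still a pointer, as in the real Level
    ToyCell primed;  // spare when the stepper isn't run by an evaluator
} ToyLevel;

static jmp_buf toy_trampoline;

// Toy analogue of Level_Lifetime_Atom(): hand out the spare cell that
// lives exactly as long as the level does.
static ToyCell *toy_level_lifetime_cell(ToyLevel *L) {
    assert(L->executor == TOY_STEPPER);
    L->primed.value = 0;
    return &L->primed;
}

// Stands in for stackful code that pushes a level and then fails.  Had it
// aimed L->out at a local ToyCell, that cell would be dead by the time
// the handler below runs.
static void toy_step_then_fail(ToyLevel *L) {
    L->out = toy_level_lifetime_cell(L);
    L->out->value = 304;
    longjmp(toy_trampoline, 1);
}

static int toy_run(ToyLevel *L) {
    if (setjmp(toy_trampoline) == 0) {
        toy_step_then_fail(L);
        return -1;  // not reached
    }
    // The "trampoline" regains control with the level intact.  L->out
    // points inside *L, so reading through it here is safe.
    return L->out->value;
}
```

Swapping the spare cell for a local variable in `toy_step_then_fail` turns the read in `toy_run` into a use of a dead stack cell, which is the bug described in the comment above.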

There needs to be some auditing of the system to find all the cases that need to use this. It's a sneaky problem, but address sanitizer can catch it when it does happen.