Lifetime of handles given back to C code

hostilefork · October 19, 2017, 2:17am

The simplest interface for calling Rebol from C is to basically treat it as "string in, string out". Hence you might say:

char *output = rebDo("1 + 1");
int result = atoi(output); // "ascii to integer"
printf("The result of the addition was %d\n", result);
free(output);

Even with an interface this simple, it raises a lot of questions. What happens in case of a syntax error? What if there's a THROW with no CATCH? Who frees the result string, and when? And there's also clearly an annoyance with having to do a lot of string conversions...anything you want to pass in you have to merge into a string, and any result you want to process you have to extract out.

But beyond those issues and annoyances, there's a fundamental problem of lost identity. If an object is returned from a call, then how do you ask to perform subsequent operations on that object?

Hence it is desirable that your C program have a way of getting a return result which models the actual types and values in Rebol more closely.

Long ago in the Rebol world there was R3-Alpha's "RL_Api" (Rebol Lib). It had a datatype called RXIARG, which could represent a subset of Rebol values. There was an exported routine RL_Do_String which would return an instance of this value:

/***********************************************************************
**
*/  RL_API int RL_Do_String(REBYTE *text, REBCNT flags, RXIARG *result)
/*
**  Load a string and evaluate the resulting block.
**
**  Returns:
**	     The datatype of the result.
**  Arguments:
**	     text - A null terminated UTF-8 (or ASCII) string to transcode
**          into a block and evaluate.
**       flags - set to zero for now
**       result - value returned from evaluation.
**
***********************************************************************/

To briefly summarize this, the calling C code provides the memory for the result to be written into. In order to avoid exposing Rebol's internal format for value cells, this RXIARG was a skeletal clone of that structure. It did not need to pull in all the include files for implementing Rebol, because it used void* where a pointer to the internal series type would otherwise appear.

But regardless of it being obscured by the fact that it's a void*, you have a problem now...because the caller owns the memory where a pointer to a series node lives. And there's no way of knowing how long they will be holding onto that memory and wanting to read from it. If the garbage collector runs, it won't take into account that pointer as a living reference to the series.
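To make the hazard concrete, here is a toy model of it. This is not R3-Alpha code: ToySeries, ToyArg, toy_alloc, toy_add_root and toy_gc are all invented names, and the "collector" is a minimal mark-and-sweep over a fixed pool. The only thing it is meant to show is that a pointer copied into caller-owned memory is invisible to a collector that scans its own root table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names come from R3-Alpha or Ren-C. */
#define POOL_SIZE 4

typedef struct {
    bool in_use;   /* slot currently holds a live series */
    bool marked;   /* set during the mark phase */
    int  payload;  /* stand-in for series data */
} ToySeries;

/* Caller-owned result cell, in the spirit of RXIARG: it hides the
   series type behind void*, and the collector never sees it. */
typedef struct {
    void *series;
} ToyArg;

static ToySeries pool[POOL_SIZE];
static ToySeries *roots[POOL_SIZE];  /* the only references the GC scans */
static size_t num_roots = 0;

/* Hand out the first free slot in the pool, or NULL if it is full. */
ToySeries *toy_alloc(int payload) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].marked = false;
            pool[i].payload = payload;
            return &pool[i];
        }
    }
    return NULL;
}

/* Register a series as reachable from the interpreter's side. */
void toy_add_root(ToySeries *s) {
    assert(num_roots < POOL_SIZE);
    roots[num_roots++] = s;
}

/* Mark everything reachable from roots, then sweep the rest.
   Returns the number of series freed. */
int toy_gc(void) {
    int freed = 0;
    for (size_t i = 0; i < num_roots; i++)
        roots[i]->marked = true;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = false;  /* the caller's void* now dangles */
            freed++;
        }
        pool[i].marked = false;      /* reset for the next collection */
    }
    return freed;
}
```

If a caller stores the result of toy_alloc() in a ToyArg and never registers it as a root, the next toy_gc() frees it while the caller still holds the pointer. In the toy the memory stays inside a static pool, so inspecting it afterward is harmless; with a real allocator it would not be.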

Ren-C got rid of RXIARG, and instead said that all the client gets are opaque pointers to REBVAL cells that live in memory which Ren-C controls. There can be various contracts for how long these cells live...by default they will expire when the frame of the extension function they live in goes off the stack. But they may be expired sooner, re-parented to a different frame, reference counted, or just explicitly freed.

(Note: They accomplish this magic by actually being the size of 2 x REBVAL, which is also the size of a Rebol series node. These "pairings" have a meta value for controlling lifetime that can be any other Rebol value. For details of the implementation trickery, read here)

Red's libRed uses a strategy that is "similar"--well, similar in the sense that clients of the library never get their mitts on an internal series node pointer directly, and it also insists on owning the memory which a value cell lives in. But that owned memory is quite limited...return results from APIs point into a "ring" of values (currently 50 elements long). If you examine the implementation of redPick, the macro TRAP_ERRORS takes the result on the top of the stack and puts it in the ring.

The short-lived nature of these results is explained in the section on Value References, and a sample "unsafe usage" is given:

long a, blk;

a = redSymbol("a");
redSet(a, redBlock(0)); // returned reference is used immediately here

blk = redGet(a);
redPrint(blk); // safe reference usage

for (int i = 0; i < 100; i++) {
    // redAppend(blk, redNone()); // unsafe reference usage!
    redAppend(redGet(a), redNone()); // safe version
}

From my perspective, this kind of thinking is dangerous. For a lot of reasons. Even if I feel comfortable with the restriction--and consider myself within the "safe" number of return results--what if I unwittingly have a call which as part of its implementation switches from not using any Red functions to calling some Red functions? While I've changed nothing at my callsite, a function I'm dependent on has broken my previously-okay assumptions.

Even worse, the failure mode is guaranteed reuse. Using an expired value doesn't crash; you just silently get some other value.
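A toy version of the ring shows why the failure is silent. This is only a sketch of the general technique, not libRed's implementation: ToyValue and toy_result are invented names, and the only detail taken from the discussion above is the ring length of 50.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are invented, not libRed's API. */
#define RING_SIZE 50

typedef struct {
    int payload;  /* stand-in for a value cell */
} ToyValue;

static ToyValue ring[RING_SIZE];
static size_t ring_next = 0;  /* slot the next result will occupy */

/* Every API result lands in the next slot, overwriting whatever
   was handed out RING_SIZE results ago. */
ToyValue *toy_result(int payload) {
    assert(ring_next < RING_SIZE);
    ToyValue *slot = &ring[ring_next];
    ring_next = (ring_next + 1) % RING_SIZE;
    slot->payload = payload;
    return slot;
}
```

A pointer obtained from toy_result() stays meaningful for the next 49 calls. On the 50th call after it, the same slot is handed out again, so the old pointer is still perfectly readable but now describes a different value.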

But if you're not using C++ which can safely construct and destruct things, the alternative seems daunting. You'd have to consider every API call that returns a value to be something like a malloc() which needs to be freed. :-/

for (int i = 0; i < 100; i++) {
    REBVAL *gotten = rebGet("a");
    rebAppend(gotten, rebBlank());
    rebFree(gotten);
}

That seems laborious.

But on the plus side with Ren-C, since it has a way of tying values to the current FRAME! by default, they will be GC'd when that function call ends. Someone interested in whether one of their functions is "leaking" could register interest in knowing if the GC had to clean up after them...and then insert frees for any significant-seeming loads. Lazier people could just assume it's fine.

And for C++ users, because RenCpp wraps the values with constructors and destructors, they can be reference counted and not tax the garbage collector. ("freeing" and "allocating" the API pairings is a cheap operation)

I'm not too stressed about it, in the sense that it's very easy to do what Red is doing. So if someone really dislikes the performance profile of Ren-C and is happy with an ad-hoc rule, a flag on their extension could say "use a ring". Or perhaps you could have a function that converts results into ring-lifetime, e.g.

for (int i = 0; i < 100; i++) {
    rebAppend(rebRing(rebGet("a")), rebBlank());
}

So in this case the "meta" value for indicating lifetime could actually wind up being a special handle, which links the current value into the tail of the reuse ring. Hence you wouldn't need to store the value in an intermediate variable. It's a thought.

In any case, Ren Garden wouldn't work when only the last 50 API values stay valid. Think of features like the watchlist, which has to remember the N values you're watching and the N results of evaluating them in order to show them. You'd have to write everything into a Rebol variable, inventing a parallel variable structure for each value you wish to hold in a C/C++ variable. Then you'd have to re-get the Rebol variable into a C variable each time you wanted to use it...which sounds to me like just pushing the complexity to a different and worse place.

middayc · December 27, 2017, 11:42am

This is above my head... if it's stupid, just delete my comment. But could the GC be modified to have an "ignore list", with a C API call that lets you add a pointer (one you just received from reb) to the ignore list, and remove it from the ignore list when you don't care about it any more?

hostilefork · December 27, 2017, 11:59am

The default behavior if you write a function that uses the API will be to hook the lifetime to the function/frame lifetime. You can rebRelease() things if you know you want to free them earlier. (committed code calls this rebFree(), but release is more accurate)

There is rebUnmanage(), which means that you want to extend a value's lifetime beyond the function/frame that is running. Once you do that, you must explicitly use rebRelease() to free it.

A long-running top-level process, like the console, must rebRelease() or else it would just accrue values indefinitely, because there is no function running to which the lifetime can be attached.
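The three rules can be sketched with a toy handle table. To be clear, this is not the Ren-C implementation: every toy_* name is invented, frames are reduced to a depth counter, and treating top-level handles as unmanaged from the start is my assumption about what "no function running" amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: toy_* names are invented and are not the Ren-C API. */
#define MAX_HANDLES 8

typedef struct {
    bool live;     /* handle is still valid */
    int  frame;    /* owning frame depth, or -1 if unmanaged */
    int  payload;  /* stand-in for the value itself */
} ToyHandle;

static ToyHandle handles[MAX_HANDLES];
static int frame_depth = 0;  /* 0 means no function is running */

void toy_push_frame(void) { frame_depth++; }

/* Leaving a frame frees every handle still attached to it.
   Returns how many handles had to be cleaned up. */
int toy_pop_frame(void) {
    int freed = 0;
    assert(frame_depth > 0);
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (handles[i].live && handles[i].frame == frame_depth) {
            handles[i].live = false;
            freed++;
        }
    }
    frame_depth--;
    return freed;
}

/* New handles attach to the running frame. At top level (depth 0)
   there is no frame, so they start out unmanaged (an assumption). */
ToyHandle *toy_value(int payload) {
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].live) {
            handles[i].live = true;
            handles[i].frame = (frame_depth > 0) ? frame_depth : -1;
            handles[i].payload = payload;
            return &handles[i];
        }
    }
    return NULL;  /* table full: where an unreleased top-level loop ends up */
}

/* Detach from the frame: only an explicit release frees it now. */
void toy_unmanage(ToyHandle *h) { h->frame = -1; }

/* Free a handle early, or free an unmanaged one at all. */
void toy_release(ToyHandle *h) {
    assert(h->live);
    h->live = false;
}
```

The count returned by toy_pop_frame() is also roughly the hook a "did the GC have to clean up after me?" report would need. And calling toy_value() in a loop at depth 0 without ever calling toy_release() fills the table, which is the toy analogue of a console accruing values indefinitely.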