Taming Handle Tracking with a Uniform Variadic API

In the post about the lifetime of handles given back to C code, I brought up the tough problem of "who frees a floating REBVAL* that the system has entrusted an API user with?"

So every time you run rebDo(), you get back a pointer that the system has to count as a "live" reference. If it's a BLOCK!, there's no way to know a priori how long the caller is going to keep picking and poking at values in that block.

For now, all values get cells, and their lifetimes are managed. That includes INTEGER!. So let's take a simple example:

int AddOneToSomething(int something) {
    REBVAL *somethingVal = rebInteger(something);
    REBVAL *sum = rebDo("1 +", somethingVal, END);
    int result = rebUnboxInteger(sum);
    rebRelease(somethingVal);
    rebRelease(sum);
    return result;
}

It's a pain to have to write so many rebRelease()s. One answer is to have a default moment at which such handles are GC'd automatically. That will be possible sometimes, but not always, and it means leaving things alive longer than they otherwise would be. Another answer is to use C++, where a ren::Value knows when it goes out of scope and can release itself.
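As a rough illustration of the "default moment" idea, the sketch below keeps every handle it hands out on a global list, so that whatever the caller hasn't released can be swept up in one go. Handle, makeInteger(), releaseHandle() and releaseAllLive() are hypothetical names standing in for rebInteger(), rebRelease() and an automatic cleanup point; this is not how the real system's bookkeeping is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an API handle: a cell plus links in the
 * list of handles that are still "live". Only INTEGER! is modeled. */
typedef struct Handle {
    int integer;
    struct Handle *prev, *next;
} Handle;

static Handle *live_head = NULL;  /* every outstanding handle */
static int live_count = 0;

/* Like rebInteger(): allocate a handle and start tracking it. */
Handle *makeInteger(int i) {
    Handle *h = malloc(sizeof *h);
    assert(h != NULL);
    h->integer = i;
    h->prev = NULL;
    h->next = live_head;
    if (live_head != NULL)
        live_head->prev = h;
    live_head = h;
    ++live_count;
    return h;
}

/* Like rebRelease(): unlink one handle and free it. */
void releaseHandle(Handle *h) {
    if (h->prev != NULL)
        h->prev->next = h->next;
    else
        live_head = h->next;
    if (h->next != NULL)
        h->next->prev = h->prev;
    free(h);
    --live_count;
}

/* The hypothetical "default moment": sweep whatever the caller forgot. */
void releaseAllLive(void) {
    while (live_head != NULL)
        releaseHandle(live_head);
}

int liveHandleCount(void) { return live_count; }
```

The trade-off mentioned above shows up directly: a handle the caller forgets about stays allocated until releaseAllLive() runs, not when it was last used.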

But a wilder cross-language answer came to mind, one that applies to C, JavaScript, and anything else: what if every API that could take a single REBVAL* to process were variadic instead? Then rebUnboxInteger() could be seen as a variant of rebDo(), rather than simply taking one argument:

int AddOneToSomething(int something) {
    REBVAL *somethingVal = rebInteger(something);
    int result = rebUnboxInteger("1 +", somethingVal, END);
    rebRelease(somethingVal);
    return result;
}

Now the rebDo() is folded into the rebUnboxInteger() call, and we've gotten rid of one userspace handle: one handle that doesn't need to be allocated, tracked, or freed. I've also proposed letting the rebDo() mechanics release certain handles once they have been consumed in the course of processing, marked by something like a rebT() instruction (for "temporary"):

int AddOneToSomething(int something) {
    return rebUnboxInteger("1 +", rebT(rebInteger(something)), END);
}

We might even go so far as to provide shorthands for common cases like this one, e.g. rebI(...) as shorthand for rebT(rebInteger(...)):

int AddOneToSomething(int something) {
    return rebUnboxInteger("1 +", rebI(something), END);
}
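Here is a minimal sketch of how a rebT()-style marker could work inside a variadic walker, assuming value cells can be told apart from C strings by their first byte. Everything in it is hypothetical: Cell, makeInteger(), markTemp() (playing rebT()), tempInteger() (playing rebI()), and toyUnboxInteger(), which doesn't really evaluate anything but just sums the integers it finds.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical END marker: any unique address works. */
static const char end_marker = 0;
#define END ((const void *)&end_marker)

/* Hypothetical value cell. Its first byte is 0xFF, which never starts a
 * valid UTF-8 string, so the walker can tell cells from C strings. */
#define CELL_SIGNATURE 0xFF
typedef struct Cell {
    unsigned char signature;
    bool temporary;          /* set by markTemp(), like rebT() */
    int integer;             /* only INTEGER! is modeled */
} Cell;

static int live_cells = 0;
int liveCells(void) { return live_cells; }

Cell *makeInteger(int i) {   /* like rebInteger() */
    Cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->signature = CELL_SIGNATURE;
    c->temporary = false;
    c->integer = i;
    ++live_cells;
    return c;
}

void releaseCell(Cell *c) {  /* like rebRelease() */
    free(c);
    --live_cells;
}

Cell *markTemp(Cell *c) {    /* like the proposed rebT() */
    c->temporary = true;
    return c;
}

Cell *tempInteger(int i) {   /* like the proposed rebI() */
    return markTemp(makeInteger(i));
}

/* Toy stand-in for rebUnboxInteger(..., END): instead of evaluating, it
 * sums the integers found in strings and cells. Cells marked temporary
 * are released as soon as they have been consumed. */
int toyUnboxInteger(const void *first, ...) {
    va_list args;
    long sum = 0;
    va_start(args, first);
    for (const void *p = first; p != END; p = va_arg(args, const void *)) {
        const unsigned char *bytes = p;
        if (bytes[0] == CELL_SIGNATURE) {
            Cell *c = (Cell *)p;
            sum += c->integer;
            if (c->temporary)
                releaseCell(c);
        } else {
            for (const char *s = p; *s != '\0'; ) {
                if (isdigit((unsigned char)*s)) {
                    char *after;
                    sum += strtol(s, &after, 10);
                    s = after;
                } else {
                    ++s;
                }
            }
        }
    }
    va_end(args);
    return (int)sum;
}
```

A call like toyUnboxInteger("1 +", tempInteger(41), END) yields 42 and leaves no live cells behind, while a cell passed without the temporary mark is left for the caller to release.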

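An interesting point is that the variadic approach is made more palatable by the fact that values like WORD! and FUNCTION! are not "live" by default: when spliced in, they aren't evaluated. Otherwise, where you would previously have written rebSpellingOf(someWord), you'd have to write rebSpellingOf(rebUneval(someWord), END), and you wouldn't want that!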

(I should also point out that needing END is something @giuliolunati has already avoided in JavaScript, and it could be avoided in C++...or even in C, for C99 builds.)
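For the C99 case, one plausible way to avoid writing END is a variadic macro that appends the terminator itself. In this sketch, countPartsEnd() and the countParts() macro are hypothetical names; the underlying function still needs END, but callers never have to write it.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical END marker: any unique address works. */
static const char end_marker = 0;
#define END ((const void *)&end_marker)

/* The real function still relies on the END terminator... */
int countPartsEnd(const void *first, ...) {
    va_list args;
    int n = 0;
    va_start(args, first);
    for (const void *p = first; p != END; p = va_arg(args, const void *))
        ++n;
    va_end(args);
    return n;
}

/* ...but a C99 variadic macro appends it, so callers never write END. */
#define countParts(...) countPartsEnd(__VA_ARGS__, END)
```

countParts("1 +", "2") counts 2 components. One limitation: the macro needs at least one argument, because countParts() would expand to countPartsEnd(, END), which doesn't compile.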

But on average, it seems to me you're looking at enough savings in the total amount of code that even if you have to put END on, it's a win. If you don't need an END, it's pretty much a slam dunk.

So basically any API that might otherwise have been seen as taking a plain REBVAL* would instead take a variadic stream of string, value, and instruction components.
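One way such a stream could be walked is to classify each component by its first byte. The sketch below assumes (hypothetically) that value cells begin with 0xFF and instructions with 0xFE; neither byte can appear in valid UTF-8, so anything else can be treated as a string. Cell, Instruction, classifyPart() and tallyParts() are illustrative names, not the real API.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical END marker: any unique address works. */
static const char end_marker = 0;
#define END ((const void *)&end_marker)

/* Hypothetical signatures: 0xFE and 0xFF never occur in valid UTF-8,
 * so a component starting with either byte cannot be a C string. */
#define CELL_SIGNATURE 0xFF
#define INSTRUCTION_SIGNATURE 0xFE

typedef struct { unsigned char signature; int integer; } Cell;
typedef struct { unsigned char signature; int opcode; } Instruction;

typedef enum { PART_STRING, PART_CELL, PART_INSTRUCTION } PartKind;

PartKind classifyPart(const void *p) {
    unsigned char first = *(const unsigned char *)p;
    if (first == CELL_SIGNATURE)
        return PART_CELL;
    if (first == INSTRUCTION_SIGNATURE)
        return PART_INSTRUCTION;
    return PART_STRING;  /* UTF-8 text, possibly empty */
}

/* Walks a component stream up to END, tallying each kind in counts[]. */
void tallyParts(int counts[3], const void *first, ...) {
    va_list args;
    counts[PART_STRING] = counts[PART_CELL] = counts[PART_INSTRUCTION] = 0;
    va_start(args, first);
    for (const void *p = first; p != END; p = va_arg(args, const void *))
        ++counts[classifyPart(p)];
    va_end(args);
}
```

Because the decision is made per component, strings, values and instructions can be interleaved freely, as in rebDid("all [", condition1, condition2, "]", END).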


Examples:

  • instead of rebUnboxLogic(REBVAL *logic_value)...what about rebDid(...) and rebNot(...)? e.g. rebDid("all [", condition1, condition2, "]", END); or rebNot("error?", value, END);

  • instead of rebRelease(rebDo(...)) what about rebElide(...)?
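A practical detail of building rebDid(), rebNot() and rebElide() is that a C variadic function can't forward its "..." to another variadic function. The usual idiom is a shared core that takes a va_list, the way vprintf() backs printf(). The sketch below shows that layering with hypothetical toy functions (toyDo(), toyDid(), toyNot(), toyElide()); the "evaluator" just sums the integers in its string arguments, and nonzero stands in for truthiness.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical END marker: any unique address works. */
static const char end_marker = 0;
#define END ((const char *)&end_marker)

static int live_results = 0;
int liveResults(void) { return live_results; }

/* Shared core taking a va_list by pointer. Toy "evaluation": sum the
 * integers in the strings and return the result in a new allocation. */
static int *toyDoV(const char *first, va_list *args) {
    long sum = 0;
    for (const char *part = first; part != END;
         part = va_arg(*args, const char *)) {
        for (const char *s = part; *s != '\0'; ) {
            if (isdigit((unsigned char)*s)) {
                char *after;
                sum += strtol(s, &after, 10);
                s = after;
            } else {
                ++s;
            }
        }
    }
    int *result = malloc(sizeof *result);
    assert(result != NULL);
    *result = (int)sum;
    ++live_results;
    return result;
}

void toyRelease(int *result) { free(result); --live_results; }

int *toyDo(const char *first, ...) {    /* like rebDo(): caller releases */
    va_list args;
    va_start(args, first);
    int *result = toyDoV(first, &args);
    va_end(args);
    return result;
}

static bool consumeTruth(int *result) { /* toy truthiness: nonzero */
    bool truth = (*result != 0);
    toyRelease(result);
    return truth;
}

bool toyDid(const char *first, ...) {   /* like the proposed rebDid() */
    va_list args;
    va_start(args, first);
    bool truth = consumeTruth(toyDoV(first, &args));
    va_end(args);
    return truth;
}

bool toyNot(const char *first, ...) {   /* like the proposed rebNot() */
    va_list args;
    va_start(args, first);
    bool truth = consumeTruth(toyDoV(first, &args));
    va_end(args);
    return !truth;
}

void toyElide(const char *first, ...) { /* rebRelease(rebDo(...)) */
    va_list args;
    va_start(args, first);
    toyRelease(toyDoV(first, &args));
    va_end(args);
}
```

Only toyDo() hands a result back to the caller; the other three release it internally, which is the handle saving the proposal is after.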

One downside of this proposal: what to do when an error happens.

In the initial model, rebDo() was the only entry point for running code, which gave you the opportunity to check for NULL as an error condition and react to it. So the really "correct" code for the above, prepared for an error condition, would be:

int AddOneToSomething(int something) {
    REBVAL *somethingVal = rebInteger(something);
    REBVAL *sum = rebDo("1 +", somethingVal, END);
    rebRelease(somethingVal); /* not needed anymore, on success or failure */
    if (sum == NULL) {
        /* Do whatever you wanted to do on failure (?) */
        /* e.g. if + was redefined somehow */
        return 0; /* ...but don't fall through to unboxing a NULL */
    }
    int result = rebUnboxInteger(sum);
    rebRelease(sum);
    return result;
}

If something like rebUnboxInteger() has no way to return NULL, there's no return value to check for an error.

But checking for errors on each and every API call is pretty arduous. A lot of the time you want to just assume things will work, handling the so-called "exceptional" cases with specialized code. But while C++ and JavaScript have a "throw" and "catch" mechanism, C itself is notoriously lacking in exception handling.

So at least in C, this suggests there needs to be some sort of rebTrapWith(), where the code being trapped and the code handling the trap are two separate C functions.

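Unsurprisingly, Ruby (which also uses setjmp/longjmp for exceptions; see rb_exc_raise()) does something along these lines with rb_rescue() and rb_rescue2(), each of which takes the aforementioned two C functions. (rb_rescue2() additionally takes a 0-terminated list of the exception classes to rescue.)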

You can also tunnel a single parameter through to each function, but it has to be of type VALUE, Ruby's equivalent of REBVAL. (This seems a bit limiting, like qsort(), whose comparator callback can only use global state. I'd think you'd want something more like qsort_r(), which tunnels a void* so you can pass an arbitrary struct of data.)

To draft out the usage, let's say you've done some mallocs and want to be sure they get freed in the case of failure. Code without error handling would look like:

if (rebDid("some", prelude, "code", END)) {...}
char *some_data = malloc(100);
rebElide("more code that might fail", END);
free(some_data);

You'd have to transform this into:

/* --- STATE STRUCTURE --- */
struct TrapState { char *some_data; };

/* --- WORK FUNCTION --- */
void tryMe(struct TrapState *ts) {
    if (rebDid("some", prelude, "code", END)) {...}
    ts->some_data = malloc(100);
    rebElide("more code that might fail", END);
    free(ts->some_data);
    ts->some_data = NULL;
}

/* --- RECOVERY FUNCTION --- */
void recoverMe(struct TrapState *ts) {
    if (ts->some_data != NULL)
        free(ts->some_data); /* you can actually free NULL, though */
}

/* --- CALLING CODE --- */
struct TrapState ts;
ts.some_data = NULL;

rebTrapWith(&tryMe, &ts, &recoverMe, &ts);
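To make the calling pattern above concrete, here is a minimal, runnable sketch of a rebTrapWith()-style function built on setjmp()/longjmp(). The names toyTrapWith() and toyFail() are hypothetical stand-ins for rebTrapWith() and an API call that raises an error, and the callbacks take void* rather than struct TrapState* so arbitrary state can be tunneled, in the spirit of qsort_r().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

/* Innermost active trap, or NULL if none is installed. */
static jmp_buf *current_trap = NULL;

/* Stand-in for an API call raising an error: jump to the innermost trap.
 * With no trap installed there is nowhere to go, which corresponds to
 * the "uncaught exception kills the program" case. */
void toyFail(void) {
    assert(current_trap != NULL);
    longjmp(*current_trap, 1);
}

/* Sketch of rebTrapWith(): run work(work_data); if it fails, restore the
 * previous trap and run recover(recover_data). Returns true on success. */
bool toyTrapWith(void (*work)(void *), void *work_data,
                 void (*recover)(void *), void *recover_data) {
    jmp_buf here;
    jmp_buf *saved = current_trap;  /* not modified after setjmp() */
    current_trap = &here;
    if (setjmp(here) == 0) {
        work(work_data);
        current_trap = saved;
        return true;
    }
    current_trap = saved;           /* arrived here via longjmp() */
    recover(recover_data);
    return false;
}

/* The TrapState example, with a flag standing in for whether
 * rebElide("more code that might fail", END) fails. */
struct TrapState { char *some_data; bool should_fail; };

void tryMe(void *p) {
    struct TrapState *ts = p;
    ts->some_data = malloc(100);
    if (ts->should_fail)
        toyFail();
    free(ts->some_data);
    ts->some_data = NULL;
}

void recoverMe(void *p) {
    struct TrapState *ts = p;
    free(ts->some_data);            /* free(NULL) is a harmless no-op */
    ts->some_data = NULL;
}
```

After a longjmp(), only non-volatile automatic variables of the function that called setjmp() which were modified in between are indeterminate. saved is never modified, so it can safely restore the previous trap, and the TrapState lives in the caller's frame rather than in toyTrapWith().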

For the specific case of allocating and freeing memory, we can probably make this easier with a rebMalloc() that automatically frees its buffer in the case of an error. That could eliminate the need for TrapState in the above example, for instance. We could also expose the HANDLE! cleanup functionality, but that is beyond the scope of this post.
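As for rebMalloc(), one plausible design is to stamp each allocation with a serial number and have the trap free everything allocated since it started. In this sketch, toyMalloc(), toyFree(), toyTrap(), toyFail() and failingWork() are hypothetical; a plain malloc() inside the trapped code would still leak. It uses C11's max_align_t to keep returned pointers aligned.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed before each block; max_align_t keeps the returned
 * pointer suitably aligned for any type (C11). */
typedef union Alloc {
    struct { union Alloc *next; unsigned long serial; } h;
    max_align_t align;
} Alloc;

static Alloc *allocs = NULL;          /* newest first */
static unsigned long next_serial = 0;
static int live_allocs = 0;
static jmp_buf *current_trap = NULL;

int liveAllocs(void) { return live_allocs; }

void *toyMalloc(size_t size) {        /* like the proposed rebMalloc() */
    Alloc *a = malloc(sizeof(Alloc) + size);
    assert(a != NULL);
    a->h.next = allocs;
    a->h.serial = next_serial++;
    allocs = a;
    ++live_allocs;
    return a + 1;
}

void toyFree(void *p) {
    Alloc *a = (Alloc *)p - 1;
    Alloc **link = &allocs;
    while (*link != a) {              /* find the node, then unlink it */
        assert(*link != NULL);
        link = &(*link)->h.next;
    }
    *link = a->h.next;
    free(a);
    --live_allocs;
}

void toyFail(void) {                  /* stand-in for a failing API call */
    assert(current_trap != NULL);
    longjmp(*current_trap, 1);
}

/* Runs work(data). On failure, frees every toyMalloc() block handed out
 * since the trap began and returns false. */
bool toyTrap(void (*work)(void *), void *data) {
    jmp_buf here;
    jmp_buf *saved = current_trap;
    unsigned long mark = next_serial; /* not modified after setjmp() */
    current_trap = &here;
    if (setjmp(here) == 0) {
        work(data);
        current_trap = saved;
        return true;
    }
    current_trap = saved;
    while (allocs != NULL && allocs->h.serial >= mark)
        toyFree(allocs + 1);
    return false;
}

/* Demo: allocates, then fails without freeing anything itself. */
void failingWork(void *data) {
    (void)data;
    char *buf = toyMalloc(100);
    buf[0] = 'x';
    toyFail();
}
```

With this, failingWork() needs no TrapState at all, and blocks allocated before the trap began are left alone.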

Regardless, the note from the embedding Ruby docs still applies:

If you’re embedding the Ruby interpreter in C, you need to be extremely careful when calling API functions that could raise exceptions: an uncaught exception will segfault the VM and kill your program.

...and the slightly troubling part of this proposal is that it extends the set of API functions that could raise exceptions to "anything you call variadically". But I think there are some avenues of attack, including that it might wind up being much cleaner in JavaScript and C++ if setjmp/longjmp are not exposed in the API (or perhaps not even used internally, when compiled as JavaScript or C++).