The Nuts and Bolts of where API strings are bound

hostilefork · September 17, 2018, 9:18pm

In Limiting API Entry Points in Favor of Exchanging Strings, I explain the rationale behind Ren-C's choice to gear the "user-friendly" external API around strings.

The big question this raises is where those strings are bound (or at least where they start out bound, since BIND operations can modify that afterward).

Let's start with a simple yet "pathological" program that mucks up its environment by redefining APPEND:

#include "rebol.h"

int main(int argc, char* argv[]) {
   rebStartup();

   const int ten = 5 + 5;
   rebElide(
       "append:", rebI(ten),
       "print [{APPEND is} append]"
   );

   rebShutdown();
}

Today, that would print APPEND is 10. You would probably expect this to affect only subsequent code using the API, because it only overwrites APPEND in something like the "user context". Mezzanine routines that use APPEND are bound to lib's variable, which still holds the function they expect.

(Be sure to read "The Real Story About User and Lib Contexts" for background on what's going on there, and the challenges this presents in Rebol's paradigm.)
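To make the user/lib split concrete, here's a toy model in plain C. It is not libRebol code: Context, lookup(), assign() and api_lookup() are names made up for illustration, and the "user first, then lib" fallback is a simplifying assumption rather than a description of the real mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names are part of libRebol. */
typedef struct {
    const char *word;   /* variable name */
    const char *value;  /* description of what it currently holds */
} Binding;

typedef struct {
    Binding slots[4];
    size_t count;
} Context;

/* Return the value bound to `word` in `ctx`, or NULL if absent. */
const char *lookup(const Context *ctx, const char *word) {
    for (size_t i = 0; i < ctx->count; ++i)
        if (strcmp(ctx->slots[i].word, word) == 0)
            return ctx->slots[i].value;
    return NULL;
}

/* Add `word` to `ctx`, or overwrite it if it is already there. */
void assign(Context *ctx, const char *word, const char *value) {
    for (size_t i = 0; i < ctx->count; ++i)
        if (strcmp(ctx->slots[i].word, word) == 0) {
            ctx->slots[i].value = value;
            return;
        }
    assert(ctx->count < 4);  /* the toy context has a fixed capacity */
    ctx->slots[ctx->count].word = word;
    ctx->slots[ctx->count].value = value;
    ctx->count++;
}

/* Assumed rule for API strings: try the user context, then fall back to lib. */
const char *api_lookup(const Context *user, const Context *lib,
                       const char *word) {
    const char *found = lookup(user, word);
    return found != NULL ? found : lookup(lib, word);
}
```

In this model, assigning "append" in the user context changes what api_lookup() sees, while a direct lookup() in the lib context (standing in for a mezzanine's binding) still finds the original.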

While that sounds like a sensible starting point, let's look further.

Function Args and Locals

Let's say that your code isn't just in main(), but is part of the body of a native function. When discussing how JavaScript or TCC natives access their arguments, I suggested that the current ACTION! on the stack might influence the binding.

Continuing the study of pathological cases, let's imagine one of those has a refinement called /APPEND:

frobulate: native [x [integer!] /append] {
    int x = rebUnboxInteger("x");
    bool append = rebDid("append"); // i.e. binds to the refinement argument
    int result = some_c_based_frobulator(x, append);
    return rebInteger(result);
}

This idea seems pretty nice on the surface. You wouldn't expect APPEND to call the series action in a Rebol function either...since it would be overridden. You'd have to use lib/append.

But what if the source for the C function some_c_based_frobulator() is something like this:

 int some_c_based_frobulator(int value, bool extend) {
     REBVAL *block = rebRun("copy []");
     int n;
     for (n = 0; n < value; ++n)
         rebRun("append", block, rebI(n)); // Uh oh...still the refinement!
     return rebUnboxInteger(extend ? "last" : "first", rebR(block));
 }

Putting aside that this example is nonsensical (for a positive value, it could have just returned value - 1 if extend, or 0 if not), the important part is that you called an arbitrary C subroutine, which is now trying to use APPEND. As far as the Rebol ACTION! stack goes, there's nothing to distinguish this from the body of the frobulate native...so if that stack is used to determine the binding of APPEND, it will still be the refinement.

Hence: C subroutines which utilize libRebol API code are the Achilles heel of using the ACTION! stack to inform binding.
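Here's a toy model of that failure in plain C, with no libRebol involved. Frame, top_frame and resolve() are hypothetical stand-ins for the ACTION! stack and a stack-informed lookup; the point is only that a C helper pushes no frame of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not libRebol. */
typedef struct {
    const char *action;   /* name of the running ACTION! */
    const char *args[4];  /* argument and refinement names */
    size_t num_args;
} Frame;

Frame *top_frame = NULL;  /* stands in for the top of the ACTION! stack */

/* Stack-informed lookup: reports where `word` would be bound. */
const char *resolve(const char *word) {
    assert(word != NULL);
    if (top_frame != NULL)
        for (size_t i = 0; i < top_frame->num_args; ++i)
            if (strcmp(top_frame->args[i], word) == 0)
                return "frame argument";
    return "lib or user context";
}

/* An ordinary C helper. It pushes no frame, so the lookup cannot tell
   it apart from the body of whichever native called it. */
const char *helper_resolves_append(void) {
    return resolve("append");
}

/* Models a native with spec [x [integer!] /append] calling the helper. */
const char *frobulate_native(void) {
    Frame frame = { "frobulate", { "x", "append" }, 2 };
    Frame *saved = top_frame;
    top_frame = &frame;   /* the evaluator pushes the native's frame */
    const char *seen = helper_resolves_append();
    top_frame = saved;    /* and pops it when the native returns */
    return seen;
}
```

Called on its own, the helper resolves "append" outside any frame. Called from inside frobulate_native(), the very same helper resolves it to the refinement.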

It affects extension MODULE!s too...

The concept of guiding binding by the currently running ACTION! wasn't just an idea for looking up arguments to the function. When extensions register natives, those natives remember that extension's MODULE!, which (in theory) isolates any redefinitions it needs. So finding a native also meant finding which context to look up words in, instead of just assuming the user context.

But the same problem comes up again. Let's say you're inside a native defined in the ODBC module, which has perhaps redefined APPEND to be something database related (and uses LIB/APPEND if it needs the series operation). Meanwhile, the Crypto module wants APPEND to be something else.

 // in a shared library
 int common_c_routine(...)
    { ... rebElide("append", ...); ... }

 // in the ODBC extension
 some-odbc-thing: native [...]
     { ... common_c_routine(...) ... }

 // in the Crypto extension
 some-crypto-thing: native [...]
     { ... common_c_routine(...) ... }

Are these different problems?

Certainly they're problems in the same spirit. However, the module granularity presents a sort of "worldview", and maybe the burden could be on a shared routine that knows it's going to be shared to establish some kind of context switch. The scenario of using a shared routine within a module between different ACTION!s seems much more common, and much more likely to cause confusion.
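As a rough sketch of what such a context switch could look like, here's a toy model in plain C. Nothing here is a libRebol API: Module, current_module and enter_module() are invented for illustration, and the idea that the shared routine has a module of its own is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the names are hypothetical, not libRebol. */
typedef struct {
    const char *name;  /* e.g. "ODBC", "Crypto", "Shared" */
} Module;

const Module *current_module = NULL;  /* module used to look up words */

Module shared_module = { "Shared" };

/* Switch lookups to `m` and return the previous module for restoring. */
const Module *enter_module(const Module *m) {
    assert(m != NULL);
    const Module *saved = current_module;
    current_module = m;
    return saved;
}

/* Name of the module a string like "append" would be looked up in now. */
const char *lookup_module_name(void) {
    return current_module != NULL ? current_module->name : "user";
}

/* A routine that knows it is shared switches to its own module for the
   duration of the call, then restores whatever its caller had. */
const char *common_c_routine(void) {
    const Module *saved = enter_module(&shared_module);
    const char *seen = lookup_module_name();  /* where "append" resolves */
    current_module = saved;
    return seen;
}
```

Whether the caller is in the ODBC or the Crypto module, the shared routine sees the same lookup context, and the caller's module is back in place afterward.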

It might be better to separate out the binding to function arguments via special operators like rebArg("name"), rather than have them affect all API calls in any routine while that action happens to be on the stack.
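A toy model of that separation, again in plain C rather than libRebol. Here arg_of() plays the role a rebArg("name") operator might play (its real signature isn't specified in this post), and resolve() stands in for the lookup applied to ordinary words in API strings. Both names, and the Frame layout, are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not libRebol. */
typedef struct {
    const char *names[4];  /* argument and refinement names */
    int values[4];         /* their values, as plain ints for simplicity */
    size_t count;
} Frame;

Frame *top_frame = NULL;   /* stands in for the running ACTION!'s frame */

/* Explicit argument access: consults only the running frame.
   Returns 1 and writes *out on success, 0 if there is no such argument. */
int arg_of(const char *name, int *out) {
    assert(out != NULL);
    if (top_frame == NULL)
        return 0;
    for (size_t i = 0; i < top_frame->count; ++i)
        if (strcmp(top_frame->names[i], name) == 0) {
            *out = top_frame->values[i];
            return 1;
        }
    return 0;
}

/* General lookup for words in API strings: never consults the frame,
   so the answer is the same whichever native happens to be running. */
const char *resolve(const char *word) {
    (void)word;  /* the toy model gives every word the same answer */
    return "lib or user context";
}
```

With this split, a native asks for its /APPEND refinement explicitly, while a C helper that mentions "append" in an API string gets the same binding no matter which action is on the stack.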