No Preprocessing, No FFI, Just Awesome: rebFunction()

hostilefork · August 27, 2024, 2:37pm

You can now create your own amazingly powerful Rebol natives in plain C, powered by the new binding, in a way that is OUT OF THIS WORLD.

Here's a full C program using the Ren-C libRebol API:

The mechanics heavily rely on Pure Virtual Binding II, and having it look so clean is due to macro tricks involving shadowed variables as a proxy for knowing the C function stack:

#define LIBREBOL_SPECIFIER (&specifier)

#include "rebol.h"
typedef RebolValue Value;
typedef RebolSpecifier Specifier;
typedef RebolBounce Bounce;

static Specifier* specifier = nullptr;  // default: inherit from LIB

void Subroutine(void) {
    rebElide(
        "assert [action? :print]",
        "print {Subroutine() has original ASSERT and PRINT!}"
    );
}

const char* Sum_Plus_1000_Spec = "[ \
    {Demonstration native that shadows ASSERT and PRINT} \
    assert [integer!] \
    print [integer!] \
]";
Bounce Sum_Plus_1000_Impl(Specifier* specifier)
{
    Value* thousand = rebValue("fourth [1 10 100 1000]");  // 1000
    Subroutine();  // still sees LIB's ASSERT and PRINT
    return rebValue("print + assert +", rebR(thousand));  // rebR() releases the handle
}

int main() {
    rebStartup();

    Value* action = rebFunction(Sum_Plus_1000_Spec, &Sum_Plus_1000_Impl);

    rebElide(
        "let sum-plus-1000: @", action,
        "print [{Sum Plus 1000 is:} sum-plus-1000 5 15]"
    );

    rebRelease(action);
    rebShutdown();
    return 0;
}

This outputs:

Subroutine() has original ASSERT and PRINT!
Sum Plus 1000 is: 1020

If you use C++, It Gets Niftier, But Same Internals!

Raw strings R"(...)" mean you don't need backslashes
Lambdas mean you don't need to name your implementation function

Variadic Template Packing allows custom conversions to Value* from int (no rebI() needed!) or any other datatype! Add your own converters for any C++ class!

Value* action = rebFunction(R"([
    {Demonstration native that shadows ASSERT and ADD}
    assert [integer!]
    add [integer!]
])",
[](Specifier* specifier) -> Bounce {
    Subroutine();  // still sees LIB's ASSERT and PRINT
    int thousand = 1000;  // plain int, no rebI() needed
    return rebValue("add + assert +", thousand);
});

But it's even better than that, because we can make Value a smart pointer that is automatically released when the last reference goes away. RenCpp did that, but we can do it in a much more lightweight way in libRebol...coming soon!

It's a very elegant bridge, working without resorting to FFI or similar.

The smart part of API macros like rebElide() and rebValue() is that they pick up the specifier through the name you give in LIBREBOL_SPECIFIER, so you don't have to pass it every time. When you're inside your native's implementation, the argument named specifier shadows the global variable of the same name, so the very same macros automatically use the native's specifier instead.

And of course, being able to do this at all hinges on throwing out the playbook of Rebol's historical binding and doing something coherent and useful.

The Function Gets a Definitional Return. But...Why?

So you might think there's no good reason for the function to get a definitional RETURN. Because how would you ever run it?

const char* Illegal_Return_Spec = "[ \
    {Showing that you can't use RETURN in an API Call} \
    arg [integer!] \
]";
Bounce Illegal_Return_Impl(Specifier* specifier)
{
    rebElide("return arg + 1000");
    DEAD_END;
}

When you call rebElide(), it crosses the API boundary and the C code is still on the stack. You can't unwind across it... unless you use longjmp or exceptions, and that's very thorny and brittle.

But Ren-C has Continuations

Note that the function you supply to do the native's work doesn't return a Value*, it returns something called a Bounce.

Bounce is a superset of Value* that includes the ability to encode other instructions. One of those instructions is to ask the evaluator to do more work on the C function's behalf--even though it's no longer on the stack--before returning a value. You can ask to be called back again after that work is done (rebContinue())...or you can just transfer control to some additional code and let what it does be the answer (rebDelegate()).

And within that code, it can use the definitional RETURN to deliver the value to the caller of your native!

const char* Working_Return_Spec = "[ \
    {Showing that you *can* use RETURN in an API Continuation} \
    return: [tag!] \
    arg [integer!] \
]";
Bounce Working_Return_Impl(Specifier* specifier)
{
    int bigger = rebUnboxInteger("arg") + 1000;  // whatever C processing

    return rebDelegate(
        "if", rebI(bigger), "> 10000 [return <big>]",
        "print {It wasn't big!}",
        "return <small>"
    );
}

I believe this is one of the cleverest language-bridging ideas ever, bringing still more uniqueness to Rebol's already very unique offering. And of course, C++ can throw in many improvements (not needing rebI(...) but just using integers directly and getting values back, lifetime management for API handles with smart pointers so you don't need to rebRelease() them, etc. etc.)

So much is enabled by this new binding, it's light years ahead of what we're used to.

hostilefork · August 27, 2024, 4:54pm

I should mention that RenCpp could register C callbacks as function implementations nine years ago:

auto watchFunction = Function::construct(
    " {WATCH dialect for monitoring and un-monitoring in the workbench}"
    " :arg [word! get-word! path! get-path! block! group! integer! tag!]"
    "     {word to watch or other legal parameter, see documentation)}"
    " /dialect {Interpret as instruction to WATCH vs. raw value}",

    [this](
        AnyValue const & argOriginal, AnyValue const & dialect
    )
        -> optional<AnyValue>
    {
        WatchList & watchList = *getTabInfo(repl()).watchList;

        AnyValue arg = argOriginal;

        optional<Tag> label;

        if (hasType<Block>(arg) || hasType<Group>(arg)) {
    .....

But the mechanics to try and get it to pass the values as parameters to a function like that (a C++ lambda in that case) were horrific.

That function.hpp file is a blight! But of course, today's techniques weren't available back then, so it seemed like the only way to do it. Also, I was spending a lot of time just whipping Rebol into shape so it could do anything like this.

Overall RenCpp was pretty well on track for what the API should look like and how it should function. But it took some clever innovations in Cell and Series design--plus a revolution in binding--plus me realizing what not to do--to make it happen in a truly good way.

hostilefork · August 27, 2024, 9:31pm

Comparison to redRoutine()

So libRed's whole model is pretty much a dead end. But speaking just about redRoutine() specifically, it's more or less the same principle for getting called as RenCpp's was.

libRed - Registering a callback function

#include "red.h"
#include <stdio.h>

red_integer add(red_integer a, red_integer b) {
    return redInteger(redCInt32(a) + redCInt32(b));
}

int main(void) {
    redOpen();  // start a libRed session
    redRoutine(redWord("c-add"), "[a [integer!] b [integer!]]", (void*) &add);
    printf("%ld\n", (long) redCInt32(redDo("c-add 2 3")));  // printf needs a format string
    redClose();
    return 0;
}

You write a C function with a certain arity, and then line it up with a spec that has the same arity. The function receives the arguments as multiple C function arguments...so individual Cell pointers.

Here's the Red/System implementation that creates the routine:

red/libRed/libRed.red at dbc93da47047667023a66c5edf1aa1d63ff6f0d0 · red/red · GitHub

But let's get to what actually runs the routine, EXEC-ROUTINE

red/runtime/interpreter.reds at dbc93da47047667023a66c5edf1aa1d63ff6f0d0 · red/red · GitHub

They don't actually know how many arguments the function takes (RenCpp could know by recursive decomposition, but that required C++). So if the number of parameters your C function takes doesn't match the number of arguments in the spec, it will likely crash.

A very weak point that's lost on the casual observer is that all the API red_values (which are pointers) are kept in a "ring". You can see it pushing onto the ring with push red/ext-ring/store arg. This is a fixed number of API handles that are given out with no lifetime management. After you've allocated 50 more handles, one of your previously held handles silently goes bad. Yup, 50:

red/libRed/libRed.red at dbc93da47047667023a66c5edf1aa1d63ff6f0d0 · red/red · GitHub

So if you write a redRoutine and receive your arguments, those arguments could go bad if you do anything that also uses that ring. For instance, if you call another redRoutine that takes 5 arguments 10 times, the arguments you received are now corrupt. And it's not only routine arguments: other libRed functions put things on this ring too. Definitely broken.

Anyhow... being able to access the local variables and arguments by name in the C function as part of textual code is many orders of magnitude better... but it builds on a LOT of design and implementation work. libRebol is there for Red to steal from if they wish--which they should wish--but it would probably take them years (another decade?) to get parity in functionality.