New datatype idea: SINK!

hostilefork · December 6, 2018, 6:37pm

Right now, the only way to get handles back from the API is as return values. This means that if you want to extract N values, you have to make N calls:

REBVAL *a = rebRun("select", obj, "'a");
REBVAL *b = rebRun("select", obj, "'b");

But what if you could target API handle pointers directly? You'd get something like a SET-WORD!, except that the evaluator would write into an API handle pointer instead of an ordinary variable:

REBVAL *a;
REBVAL *b;
rebRun(
    rebSink(&a), "select", obj, "'a",
    rebSink(&b), "select", obj, "'b"
);
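To make the pointer-to-pointer idea concrete, here is a toy model in plain C that does not use the real API. Everything in it (Value, Item, sink(), num(), end_items(), run_items()) is hypothetical: "expressions" are reduced to integer literals, and the argument list is an array ending in an END item rather than a variadic call. A pending sink captures the result of the next expression and writes it through the caller's Value **.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an API handle: a heap-allocated boxed integer. */
typedef struct {
    long number;
} Value;

/* Toy argument list: sinks, "expressions" (integer literals), an end marker. */
typedef enum { ITEM_SINK, ITEM_NUMBER, ITEM_END } ItemKind;

typedef struct {
    ItemKind kind;
    Value **target; /* ITEM_SINK: where the next result gets written */
    long number;    /* ITEM_NUMBER: what the "expression" evaluates to */
} Item;

static Item sink(Value **target) { Item it = {ITEM_SINK, target, 0}; return it; }
static Item num(long n) { Item it = {ITEM_NUMBER, NULL, n}; return it; }
static Item end_items(void) { Item it = {ITEM_END, NULL, 0}; return it; }

static Value *value_new(long n) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->number = n;
    return v;
}

/* Evaluate items left to right. A pending sink captures the next result and
   writes it through the caller's pointer; otherwise the result becomes the
   return value (earlier unsunk results are freed). The caller owns the
   returned value and every value written into a sink. */
static Value *run_items(const Item *items) {
    Value **pending = NULL;
    Value *last = NULL;
    for (; items->kind != ITEM_END; ++items) {
        if (items->kind == ITEM_SINK) {
            pending = items->target;
            continue;
        }
        Value *result = value_new(items->number);
        if (pending != NULL) {
            *pending = result;
            pending = NULL;
        } else {
            free(last);
            last = result;
        }
    }
    return last;
}
```

A run such as { sink(&a), num(10), sink(&b), num(20), end_items() } fills both variables in one call, mirroring the two rebSink() targets above.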

Why would this be useful?

It's often tricky to tunnel the value your code wants out of a complex expression. Take the upcoming change to TRAP, for example: TRAP only gives back an ERROR!, but what if you want to write your error handling in Rebol, while still tunneling your value out of the trap?

REBVAL *data = ...;
rebElide("trap [",
    rebSink(&data), "inflate/max", data, "uncompressed-size",
"] then [",
    "info {^- -> failed [deflate]^/}",
    "throw blank",
"]");

Note how this parallels data: inflate/max data uncompressed-size. It's a lot easier than what you'd otherwise have to do in C.

This is difficult to do safely...

One risk with something like this is what happens if a SINK that's holding onto a pointer gets copied out somewhere and then used after that pointer is no longer valid. Consider:

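REBVAL *Make_Evil_Block() {
    REBVAL *local;  // lives in this function's stack frame
    return rebRun("[", rebSink(&local), "10]");  // block holds onto &local
}

rebRun("do", Make_Evil_Block());  // the sink now points at a dead frame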

What happens there is that you get something parallel to local: 10, except that it writes to a local variable that is no longer on the stack once Make_Evil_Block() has returned.

A possible way to address this would be to require the API handle to already exist, so that it could become managed. So you'd pass a pointer to a REBVAL, not a pointer-to-a-pointer to a REBVAL:

REBVAL *Make_Evil_Block() {
    REBVAL *local = rebBlank();  // the handle exists before the sink is made
    return rebRun("[", rebSink(local), "10]");  // pass the handle, not &local
}

This would mean the GC would be responsible for keeping the API handle's memory slot valid until every place that referenced it was gone. Hence a rebRelease() call wouldn't actually free it. The interpreter then wouldn't crash if it hit a stale sink; it would just report that C was done with that variable, so it can't be assigned.
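A rough model of this managed approach is sketched below. It swaps a simple reference count in for the real GC, and every name in it (Handle, Sink, handle_new(), sink_new(), handle_release(), sink_write(), sink_free()) is hypothetical. What it shows is that the sink takes a Handle * rather than a Handle **, so a release from C only drops C's claim; a later write through the sink finds the handle marked as released and fails with an error instead of touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handle: one reference for C, plus one per sink holding it. */
typedef struct {
    long number;
    int refcount;
    bool released_by_c; /* set once C calls handle_release() */
} Handle;

typedef struct {
    Handle *handle;
} Sink;

static Handle *handle_new(long n) {
    Handle *h = malloc(sizeof *h);
    if (h == NULL) abort();
    h->number = n;
    h->refcount = 1; /* the C caller's reference */
    h->released_by_c = false;
    return h;
}

static void handle_unref(Handle *h) {
    if (--h->refcount == 0) free(h);
}

/* A sink is made from an existing handle and keeps it alive. */
static Sink sink_new(Handle *h) {
    h->refcount++;
    Sink s = { h };
    return s;
}

/* Plays the role of rebRelease() in this scheme: C gives up its claim,
   but the memory survives while any sink still references it. */
static void handle_release(Handle *h) {
    h->released_by_c = true;
    handle_unref(h);
}

/* Writing through a sink fails cleanly if C is done with the handle. */
static bool sink_write(Sink s, long n) {
    if (s.handle->released_by_c) {
        fprintf(stderr, "sink target was released by C; cannot assign\n");
        return false;
    }
    s.handle->number = n;
    return true;
}

static void sink_free(Sink s) { handle_unref(s.handle); }
```

A tracing GC would find the sink's reference by marking rather than counting, but the observable behavior modeled here is the same: no dangling write, just a refusal.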

Unfortunately, this rules out sinking nulls. A sink that holds an existing cell has no way to make the C variable point at nothing, and from the API's point of view the only representation of a nulled cell is a null pointer. So you'd have to at least TRY anything you sink.

I think doing it unsafely would be too risky. It's one thing for a random C native someone wrote to crash, and another to hand the evaluator something it crashes on in the middle of an evaluation. So if this were implemented, it would be one of the motivating cases for mutable API handles, which remain an open question.

hostilefork · December 12, 2018, 10:45am

I thought of a pretty silly syntax trick for sinks, which would allow using a colon in C. It twists the ternary operator up a bit:

#define rebS(slot) \
    rebSink(slot), false ? nullptr

So then you can write:

REBVAL *foo = rebVoid();
REBVAL *bar = rebVoid();
rebElide(
    rebS(foo): "1 + 2",
    rebS(bar): "3 + 4"
);

Which preprocesses into:

REBVAL *foo = rebVoid();
REBVAL *bar = rebVoid();
rebElide(
    rebSink(foo), false ? nullptr: "1 + 2",
    rebSink(bar), false ? nullptr: "3 + 4"
);

It's just a sneaky way of throwing in a no-op ternary operator so that the syntax gets a colon. As unusual as it looks, it's legal C and C++ (in C before C23, false needs <stdbool.h> and nullptr needs a definition such as NULL), and I think it conveys what's going on more clearly. With this syntax you'd have trouble putting a sink at the end of a run, since some piece of evaluator material has to come after the colon. But rebEND is there if you really need to opt out of that.

As tricks go, I think that's pretty awesome! People who don't want to use it can just use plain rebSink.
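The preprocessor trick can be checked in isolation. In this hypothetical sketch, LABEL() plays the role of rebS(): it expands to a tagged string followed by ", false ? NULL", so the colon written after it completes a ternary whose condition is always false. NULL stands in for nullptr so the code compiles as C99 or later, and a NULL-terminated join_args() stands in for a variadic API call; neither is part of the real API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of rebS(): emit a marker argument, then open a
   ternary that the caller's ": expr" closes. The condition is always
   false, so the ternary simply yields expr. */
#define LABEL(name) "sink:" name, false ? NULL

/* Join NULL-terminated char * arguments into buf, separated by " | ". */
static void join_args(char *buf, size_t size, const char *first, ...) {
    va_list ap;
    buf[0] = '\0';
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *)) {
        if (buf[0] != '\0')
            strncat(buf, " | ", size - strlen(buf) - 1);
        strncat(buf, s, size - strlen(buf) - 1);
    }
    va_end(ap);
}
```

After preprocessing, LABEL("foo"): "1 + 2" becomes "sink:" "foo", false ? NULL: "1 + 2", which is two arguments: the marker and the string that followed the colon.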

If you try to write a null and "unset" a sink (impossible, since it's an allocated cell that can't be turned into a null), then it can just write a void instead. If doing this becomes common, it probably justifies a rebDevoid() API that turns voids into nulls and frees their API cells, or something of that sort.
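Here is a sketch of that fallback, assuming an API cell can hold a void. Cell, cell_new(), sink_write() and devoid() are hypothetical names, and devoid() only approximates the rebDevoid() idea: free a void cell and hand back a null pointer in its place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical API cell: holds either a void or an integer. */
typedef struct {
    bool is_void;
    long number;
} Cell;

static Cell *cell_new(long n) {
    Cell *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->is_void = false;
    c->number = n;
    return c;
}

/* The sink targets an already-allocated cell, which cannot become a null
   pointer, so a null result (result == NULL) is stored as a void. */
static void sink_write(Cell *target, const long *result) {
    if (result == NULL) {
        target->is_void = true;
    } else {
        target->is_void = false;
        target->number = *result;
    }
}

/* Approximation of the suggested rebDevoid(): a void cell is freed and
   replaced by NULL; any other cell is returned unchanged. */
static Cell *devoid(Cell *c) {
    if (c != NULL && c->is_void) {
        free(c);
        return NULL;
    }
    return c;
}
```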

hostilefork · December 13, 2018, 8:53pm

Or maybe it doesn't rule out sinking nulls after all! Imagine if the parameter to rebSink were an "in/out" pointer: the variable is passed by address, and its value may or may not change.

The key is that you'd have to make sure the variable was initialized to some valid bit pattern:

REBVAL *a = nullptr; // nullptr is okay, initialized bits
REBVAL *b = rebInteger(1020); // any other initialized value fine too
rebElide(
    rebS(&a): "select", obj, "'a",
    rebS(&b): "select", obj, "'b"
);

Then, if you sink a null into a variable that wasn't previously null, it would rebRelease() the old cell. If you sink a non-null value into a variable that is null, it would allocate a new handle.
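Those rules fit in a small function over an in/out Value ** slot. This is a hypothetical sketch: free() and malloc() stand in for rebRelease() and handle allocation, a NULL result stands for a null, and when both the old and new values are non-null this sketch simply overwrites the existing handle in place, which is an assumption rather than something settled in the post.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle type: a heap-allocated boxed integer. */
typedef struct {
    long number;
} Value;

static Value *value_new(long n) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->number = n;
    return v;
}

/* Sink `result` into the caller's in/out slot. *slot must be initialized:
   either NULL or a live handle. A NULL `result` means "null". */
static void sink_store(Value **slot, const Value *result) {
    if (result == NULL) {
        free(*slot); /* release the old handle; free(NULL) is a no-op */
        *slot = NULL;
        return;
    }
    if (*slot == NULL)
        *slot = value_new(0); /* was null: allocate a fresh handle */
    (*slot)->number = result->number; /* store the value in the handle */
}
```

This is why the variables must start out as nullptr or a real handle: sink_store() reads *slot before deciding what to do, so uninitialized bits would be freed or written through.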

There are still a lot of devils in the details here, but I believe it's both technically possible and quite desirable.

But what about JavaScript?

JavaScript doesn't have a notion of passing simple variables by reference. This is another good argument for not making REBVAL* plain integer pointers in the JS version of the API. If handles are objects instead, you can pass one and change its fields just by having a reference to it.