Limiting API Entry Points in Favor of Exchanging Strings

hostilefork · February 21, 2018, 7:40pm

When I first saw the libRed documentation, it struck me as being...a grab-bag.

Why was there a redAppend() with no /PART or /ONLY? Why no redInsert()? Were these things missing on purpose? Or was it just in an incomplete state--with the ultimate goal to clone the entire Rebol2 manual as a catalog of C entry points?

Moreover: if one is embedding a "Redbol" module or system into another language, the theory is that the win comes from dialects. You presumably aren't just loading a raw interpreter so you can program in Rebol as awkwardly as possible--you've got some prep work already and loaded a module of code you want to use. So who's to say I plan to be doing any APPENDing or INSERTing at all? Or perhaps I've defined those words to mean something entirely different from the stock series operations...

Hence though it's certainly necessary for a proper API to be able to reference Rebol values (BLOCK!s, WORD!s, etc.) via some kind of language handle, it crossed my mind that ordinary requests to take action on these items should primarily be made through text strings. Rather than a rebAppend(block, value) and rebAppendPart(block, value, limit)...the entry points would be collapsed down to the likes of rebElide("append/part", block, value, limit).

(Note: Whether that seems like a good idea to you on first reading or not, such a thing wouldn't typically be on the table at all for a pure C-based API. Clever bit-twiddling makes it possible--and assuming your C compiler passes through character literal bytes as-is and you saved your file in UTF-8, it even works with Unicode strings.)
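To make the "bit-twiddling" remark a little more concrete, here is a minimal sketch of one way a variadic C call could tell source text apart from value handles. It is not a description of how any real API does it. It assumes a hypothetical ToyCell whose first byte is always 0x80; bytes 0x80 through 0xBF are UTF-8 continuation bytes, so no valid UTF-8 string can begin with one. Every name here (ToyCell, toy_is_cell, toy_tally_args, TOY_END) is made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* ASSUMPTION: a made-up cell layout whose first byte is always 0x80.
 * 0x80..0xBF are UTF-8 continuation bytes, so valid UTF-8 text can
 * never start with this byte. */
#define TOY_CELL_MARK 0x80

/* Typed null terminator, so the variadic list ends portably. */
#define TOY_END ((const void *)0)

typedef struct {
    unsigned char mark;   /* always TOY_CELL_MARK; must be the first member */
    int payload;          /* stand-in for real cell contents */
} ToyCell;

typedef struct {
    size_t num_text;      /* arguments that were UTF-8 source text */
    size_t num_cells;     /* arguments that were value handles */
} ToyTally;

/* Returns 1 if p points at a ToyCell, 0 if it points at UTF-8 text. */
static int toy_is_cell(const void *p) {
    assert(p != NULL);
    return *(const unsigned char *)p == TOY_CELL_MARK;
}

/* Walks a TOY_END-terminated list of mixed text and cell pointers and
 * classifies each one by its first byte. A real API would scan the text
 * and splice the cells; this sketch only counts them. */
static ToyTally toy_tally_args(const void *first, ...) {
    ToyTally tally = {0, 0};
    va_list va;
    va_start(va, first);
    for (const void *p = first; p != NULL; p = va_arg(va, const void *)) {
        if (toy_is_cell(p))
            tally.num_cells++;
        else
            tally.num_text++;
    }
    va_end(va);
    return tally;
}
```

A call would look like toy_tally_args("append/part", (const void *)&block, (const void *)&value, "3", TOY_END). The casts on the struct pointers are there for strict portability through the variadic list.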

This possibly-"radical" idea raises questions about semantics and performance. Here are a few talking points.

What if basic operations don't resolve to what you meant?

Taking libRed as an example, if you say append: does [print "potato"], then what does redAppend() do after that? In their case, it doesn't heed any redefinitions, because the API entry points are fixed at time of compilation. There's exactly as much stack pushing and popping as necessary for the parameters of the append "action!".

This may seem like a good thing. If you were trying to write some code to do series surgery, having the C API keep its semantics more stable means your code will do-what-you-meant, even in the face of change.

But...if you're programming in just plain Rebol, you don't get this guarantee. If you want that, you have to bind directly to lib, or if you've overridden things, you have to use lib.append.
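The difference between the two behaviors can be sketched in plain C. This is a toy model, not libRed or any real interpreter: the "environment" is a one-entry table, and all names (toy_env, toy_redefine, toy_append_fixed, toy_run_word) are hypothetical. The fixed entry point calls the stock action chosen at compile time, while the text-driven one looks the word up on every call and so heeds a redefinition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy "action": takes a series length, returns the new length. */
typedef int (*ToyAction)(int series_len);

/* Stock behavior: appending grows the series by one. */
static int stock_append(int series_len) { return series_len + 1; }

/* Stand-in for a user redefinition such as
 * `append: does [print "potato"]`: the series is left alone. */
static int potato_append(int series_len) { return series_len; }

typedef struct {
    const char *name;
    ToyAction action;
} ToyBinding;

/* The mutable environment that plain interpreted code would see. */
static ToyBinding toy_env[] = {
    { "append", stock_append },
};

static ToyBinding *toy_find(const char *name) {
    for (size_t i = 0; i < sizeof toy_env / sizeof toy_env[0]; i++)
        if (strcmp(toy_env[i].name, name) == 0)
            return &toy_env[i];
    return NULL;
}

/* Rebinds an existing word; returns 1 on success, 0 if not found. */
static int toy_redefine(const char *name, ToyAction action) {
    ToyBinding *b = toy_find(name);
    if (b == NULL)
        return 0;
    b->action = action;
    return 1;
}

/* Fixed entry point: the target was chosen at compile time, so later
 * redefinitions in toy_env are never consulted. */
static int toy_append_fixed(int series_len) {
    return stock_append(series_len);
}

/* Text-driven entry point: resolves the word at call time, so it
 * sees whatever the environment currently says. */
static int toy_run_word(const char *word, int series_len) {
    ToyBinding *b = toy_find(word);
    assert(b != NULL);
    return b->action(series_len);
}
```

After toy_redefine("append", potato_append), toy_append_fixed(3) still gives 4 while toy_run_word("append", 3) gives 3, which is the same situation plain Rebol code is in.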

What makes coding in C so sacred that it needs special rules or rights? And as mentioned earlier, what if this is part of the whole point of embedding a Rebol...you want to run in the potentially mutated environment.

I'm reminded somewhat of the story of why airplanes are made out of aluminum instead of a stronger metal, when aluminum will crumple in a crash. When you consider all the other factors of how bad a plane crash is, slightly stronger metal won't help relative to the big picture of the benefit of the lighter weight.

My opinion is that it's a good thing--not a bad thing--to be beholden to the same mutable universe by hinging on text. This puts pressure on improving the mechanisms by which Rebol code can be isolated into modules/etc. It just means the API will need to do things like speak about "which module it wants to run in".

What about the overhead of scanning/binding strings?

This is a bit of a nuisance...and doubly so for languages whose string literal representation is not UTF-8. (Which right now is looking like basically every language except C/C++/Rust.)

Shortcuts are certainly possible; it may not be necessary to kick in the whole scanner to know that "{foo}" should be a string. But no matter how many shortcuts like that you throw in, there's no getting around that loading and binding repeatedly will cost more than if you did it just once.

Yet I think one has to remember the context of our times--and the problem space for which Rebol is suited. Compared to a network request to get a task done, how bad is running a few pieces of boilerplate through a scanner in raw C on the local machine?

And following the 80/20 rule, if you really find some hybrid string-and-splicing instruction is taking a lot of time, you can cache that. Beyond manually constructing reusable blocks or functions "the hard way", the API might be able to help with something like a prepared statement in databases.

So there are plans of attack. And if you're trying to extend Rebol with new native behavior, the "internal API" can be used instead when performance is at issue.

If the "official" API scales back the number of entry points, what justifies a new one?

One aspect I've spoken about is that making a new entry point under this scheme should offer an explicit convenience to the language user. So if you're looking at something like:

/* result = */ rebValue("spelling of", value);
/* result = */ rebSpellingOf(value);

There needs to be something more to it--for instance, the return result of the latter should be an ordinary string class in the language, not a Rebol value that needs further processing and lifetime management before a usable string can be extracted from it.
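As a rough illustration of that "something more", here is a sketch contrasting a generic call that hands back another handle with a convenience call that hands back an ordinary C string. Everything in it is hypothetical (ToyValue, toy_make_text, toy_release, toy_value_spelling, toy_spelling_of), and the asserts stand in for real error handling. The point is only that the convenience entry point takes the unpacking and handle lifetime off the caller's plate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: a made-up handle to an interpreter value holding text. */
typedef struct {
    char *utf8;
} ToyValue;

/* Counts handles that have been created but not yet released. */
static int toy_live_handles = 0;

static ToyValue *toy_make_text(const char *s) {
    ToyValue *v = malloc(sizeof *v);
    assert(v != NULL);
    v->utf8 = malloc(strlen(s) + 1);
    assert(v->utf8 != NULL);
    strcpy(v->utf8, s);
    toy_live_handles++;
    return v;
}

static void toy_release(ToyValue *v) {
    free(v->utf8);
    free(v);
    toy_live_handles--;
}

/* Generic path: the result is another handle, which the caller must
 * unpack and eventually release. */
static ToyValue *toy_value_spelling(const ToyValue *word) {
    return toy_make_text(word->utf8);
}

/* Convenience path: returns a plain malloc'd C string. The temporary
 * handle is released internally; the caller only needs free(). */
static char *toy_spelling_of(const ToyValue *word) {
    ToyValue *tmp = toy_value_spelling(word);
    char *out = malloc(strlen(tmp->utf8) + 1);
    assert(out != NULL);
    strcpy(out, tmp->utf8);
    toy_release(tmp);
    return out;
}
```

With the convenience version, the only cleanup the caller owes is a free() on the returned string, and no extra handle is left alive afterward.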

This is a fuzzy point, and it's a bit easier right now to suggest what shouldn't be an entry point (e.g. rebAppendPartOnly()) than what should. So we'll have to see.

IngoHohmann · March 1, 2018, 4:38pm

I am not a C programmer but (or maybe that's why) I am all for a string-based API.

From other discussions, it seems a more direct API is seen as more performant. If performance is important, could there be something like prepared statements?

rebDo("f: function [a] [a * 2]");
handle = rebPrepare("print [f #1 #2]");
rebCall(handle, 5, "apples", END);

And this prints

10 apples

I have no idea if this is possible, or if it would work, just putting the idea out there.

hostilefork · March 1, 2018, 5:30pm

Yup...I mention prepared statements above as a way to tune the performance, if and when it matters.

I'm not sure of the exact notation. But it wouldn't need to use string-based escaping since we can do rebPrepare("print [f", rebSlot(2), rebSlot(1), "]"); or similar. The mechanics would be a little bit weird because it would have to point into a loaded block structure and patch cells virtually into these "meta" slots.

rebPrepare() and rebExecute() would be the usual pairing of terms. But really, how much better would this be than making a function and calling it? Probably not enough to be worth the complexity.
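For what it's worth, the slot-patching idea can be mocked up in a few lines of C. This is only a sketch under heavy assumptions: the "loaded block" is a small fixed array of toy items, slots are 1-based argument indices, and "executing" just produces the patched copy rather than evaluating anything. None of these names (ToyItem, toy_prepare, toy_execute, toy_slot) correspond to a real API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_WORD, TOY_INT, TOY_SLOT } ToyKind;

typedef struct {
    ToyKind kind;
    const char *word;  /* TOY_WORD: spelling, otherwise NULL */
    int num;           /* TOY_INT: value; TOY_SLOT: 1-based argument index */
} ToyItem;

#define TOY_MAX_ITEMS 8

/* The already-"loaded" block, kept around for reuse. */
typedef struct {
    ToyItem items[TOY_MAX_ITEMS];
    size_t len;
} ToyPrepared;

static ToyItem toy_word(const char *w) { ToyItem c = { TOY_WORD, w, 0 }; return c; }
static ToyItem toy_int(int n) { ToyItem c = { TOY_INT, NULL, n }; return c; }
static ToyItem toy_slot(int i) { ToyItem c = { TOY_SLOT, NULL, i }; return c; }

/* "Prepare": store the items once. In a real system this is where the
 * scanning and binding cost would be paid. */
static ToyPrepared toy_prepare(const ToyItem *items, size_t len) {
    assert(len <= TOY_MAX_ITEMS);
    ToyPrepared p = { .len = len };
    for (size_t i = 0; i < len; i++)
        p.items[i] = items[i];
    return p;
}

/* "Execute": copy the template into out, replacing each slot with the
 * argument it names. The template itself is left untouched, so it can
 * be reused with different arguments. Returns the item count written;
 * out must have room for at least p->len items. */
static size_t toy_execute(const ToyPrepared *p, const ToyItem *args,
                          size_t num_args, ToyItem *out) {
    for (size_t i = 0; i < p->len; i++) {
        ToyItem c = p->items[i];
        if (c.kind == TOY_SLOT) {
            assert(c.num >= 1 && (size_t)c.num <= num_args);
            c = args[c.num - 1];
        }
        out[i] = c;
    }
    return p->len;
}
```

Written out like this, it is fairly clear that the prepared object is doing the job a function with two parameters would do, which is part of why the extra machinery may not pay for itself.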

Anyway... so far I really feel the direction with this API is a lot better. And as I'm thinking about who the clients of this kind of API really are, it just makes more sense.