Switching to Stackless: Why this, Why now?

I wanted to write a short explanation about why switching Rebol to a stackless implementation--similar to Stackless Python--has jumped itself up in the priority queue, to something I'm working on in the near term.

It's not a frivolous feature. It's because:

  • the Wasm build broke. The patch I made to "fix" it means that if you say foo: func [] [foo] and run it, you will irrevocably crash the web REPL.

  • Wasm was allowed to break it. There is nothing in the C standard for being able to predict a stack overflow, and the method R3-Alpha had been using was knowingly non-standard. It was a bit of a thorn in my side--given how much care was taken in general to be on the right side of the standard...so I was always hoping for a better answer to come along.

  • size optimization was eating stack. We like to use emscripten's best options for making a small library. But when we do, clang and emscripten avoid inlining...and instead choose to leave the functions as-is and make a lot of function calls. With inlining disabled, Ren-C uses a fair number of function calls per evaluation step, and that competes with the JavaScript call stack...we were getting stack overflows not just on infinite recursions, but also seemingly not-too-deep routines (that didn't even have to be recursive).

So this has been a bit of a setback. It took down the REPL site for a while and has involved a fair amount of mulling about what to do.

Can We Just Crash On StackOverflows for Now and Press On?

We could put in a recursion counter and call that good enough...like old CPython does.

But...the problem with pressing on when you know something is wrong, is that there's some fairly deep impact on how the system and APIs work. If enough groundwork isn't laid to know where the stackless influence is, we might write bad extensions or interoperability that fundamentally won't work later.

Not only that, we don't want to look bad against the competition. As remarked in the motivations section for Stackless Python's "Why the C Stack Should Vanish", they give a number of points, one of which is:

"PERL has already no longer a C stack. They will have some good reasons for this. Python has no reason to be limited and stay behind."

We can take some shortcuts. Full stackless operation won't ever be possible if you call out of Rebol. Each transition from Rebol into code that calls back into Rebol without deferring its stack back will not be stackless. So if some Rebol natives fall into this category as well for a while, that's okay. I will likely find a balance point that is "good enough" and leave a lot of natives to be converted on an as-needed basis.

Why Wasn't Stackless Ren-C on the Agenda Before?

I was always contemplating this direction. So while it will be by no means trivial to switch, the design of frames and basic fabric of the interpreter should afford bending that way more than if I hadn't. Preliminary tests are promising that conversion should progress at a decent pace.

One reason I'd avoided making a commitment to a stackless approach was because I felt it could be obfuscating when debugging the interpreter--and that such obfuscation would run counter to the "keep it simple" Amish principles. It's nice when you can fire into the C debugger and scroll through the C stack frames and see things like code for a native "CASE" statement right there...with the execution point in the middle on the branch it's running right now. A stackless interpreter has its state in data structures that aren't the C stack, and inspecting those data structures is a bit less intuitive.

As an alternative to stacklessness: I imagined that the platforms C programs were built on would be able to build a costing model to where a recursion could reliably predict if it would overflow (so long as it wasn't calling 3rd party code). Then your program wouldn't be stuck in the situation of simulating a stack to protect itself from the C one it should be able to take for granted. It would ask the C runtime if there was space for a recursion, and it would have a measurement for the worst-case cost of that function and tell you yes or no--as it might tell you that it couldn't malloc() a certain amount of memory from the heap, so you can raise an error.

But it seems indeed this was just in my imagination. Guarantees about the stack aren't on anyone's agenda in the C world; so it's the job of the higher levels like Rebol or Python or whoever to worry about it. So long as Ren-C is written in... C (or anything like it in the systems world), stack overflows are unpredictable and you must just find a way to bound your stack use if it's important.

What Happened To Make Wasm Stack Detection Fail?

While I haven't dug into all the details, the basic understanding is Emscripten jumped to a new version of Clang, which had a new extension model. Emscripten could then integrate more deeply in the compilation process. So instead of just getting a blob of output from the compiler and transforming it after-the-fact, they could add more intelligence to the code generation process.

The biggest obvious feature we got is that if your browser doesn't have shared memory or threads, the workaround build got a lot better and faster. The new method was "Bsyncify" (an update of an older deprecated technique called "Asyncify") that leapfrogged the Emterpreter to the point they deprecated it entirely. It actually was not hard to get this workaround build tweaked and going, but missing features in the threaded build kept us using the old emscripten SDK for threading until those were sorted.

Fortunately: the threading features were added. Unfortunately: after those feature kicked in, the resulting build didn't then run at all. (And also unfortunately, debugging this stuff isn't easy.)

That's when I found out our problem turned out to be related to how Rebol detected stack overflow conditions. And stack overflow errors turn out to be a tricky topic that is misunderstood by many programmers.

Once A Stack Overflow Happens, You Are Hosed

C programmers know that a stack overflow error is not recoverable. (Or at least, they should know.) But people using other languages are under the impression that they can catch them and do something meaningful, when they cannot do so in any general sense:

Sensibly, Rebol didn't try to install strange hooks to let C code badly "recover" from an actual stack overflow error. It tried to conservatively guess about how close you probably were to the stack's limit...and pre-emptively generated an error. On most people's machines, this was good enough to not crash the interpreter, but give you an error:

>> foo: func [] [foo]
>> foo
** Internal Error: Stack overflow

Though "recovery" has all the same problems you can see in the JavaScript question I ask. A lot of what people experience as "Rebol the language" is written in Rebol. You don't know what mezzanine might have crashed, and if crashing that has left the system in an unstable or unusable state. Bulletproofing all of the system against this...is hard.

Well, There's Some of the Motivation

Early tinkering has been promising...though this really does muck with the primordial debugger. But it's good to have that debugger in there so that it doesn't get lost in the shuffle while considering this.

One thing about Stackless Python is seeing how they use pluggable evaluator functions which can become part of the stackless approach. That's good, because this ties into how we can have a stackless parse. Previously it might have been seen as bad that Ren-C was roping in the frame system used by the evaluator and debugger for each PARSE recursion due to inefficiency and eating more stack--but this should address any concerns from that.

1 Like