Switching to Stackless: Why this, Why now?

hostilefork · July 7, 2020, 8:51pm

So we have witnessed the first successfully trapped stack overflow error in the Wasm build (both Asyncify and Threaded). This required switching it to the ABORTING_MALLOC=0 build setting...because by default, Emscripten assumes a failing malloc() should kill your program instead of return NULL.

As this shows, the "stack overflow" error manifested as running out of memory:

>> recursive: func [] [array: copy [a b c] recursive]

>> recursive
** Internal Error: not enough memory

I included a COPY to make the point that when you overflow in a "stackless" system, you won't necessarily run out of memory on the attempt to allocate a stack frame. When stack frames are consuming heap/pooled memory just like everything else, it's just as likely that the COPY to make a new array will be the trigger as trying to allocate a new frame.

You may notice that there's no stack frame list to hint you at the deep recursion being the cause of the problem. That's because when you run out of memory and are trying to present an error, you can't be guaranteed to have enough memory to allocate an array for this purpose. Similarly, you can't expect to be able to allocate the object to represent the error itself. These must be preallocated at some size. That's all additional work that needs to be done, that had never been done (as Rebol never really handled out of memory errors rigorously).

I have removed the C_STACK_OVERFLOWING() check which happened when you pushed function calls. To revisit a writeup on why it was bad, consult the comments. The tests have been adapted so that all the "infinite" tests currently just throw after some large number of recursions, and they complete.

BUT... the evaluator and function calls are not the only place that C_STACK_OVERFLOWING() was used to detect deep recursions. Other operations like scanning, molding, binding, etc. used C stack recursions, and for deeply nested blocks this could overflow. Also, not all such operations were protected. Here's a list of the other clients of C_STACK_OVERFLOWING:

Binding - Binding is currently implemented as a native that calls a C function, which is "C-recursive"... that C function calls itself for as deep as the block structure it is binding is. With a sufficiently deep block, this could cause an "untrappable" stack overflow. One way to attack this is to make each bind level create a frame...as is done with stackless COMPOSE, that uses a "Binding Executor". While this may sound like it would be slow, running through common code paths would incentivize optimizations that would--in turn--speed up the whole evaluator, PARSE, etc.
Copying - There are many different forms of copying and options while doing so. Things like "relativizing" blocks are weaved in with the copying process...and one can imagine tying it in with binding as well. In any case, the core of copying is currently something called "clonification" which uses C_STACK_OVERFLOWING. It's not as obvious to me just by looking at it how this code could use the trampoline to power it... but it's likely possible.
Comparing - The general topic of comparison semantics in Rebol is rather neglected, but it's yet another routine that is done through plain recursions. It seems to be something amenable to frame-based recursion.
Scanning - If you imagine writing something like load "[[[[[[[[[[[[[..." there is going to be a limit somewhere. Red seems to use a specialized stack for this vs. doing recursions (the error you get for deep scans is PARSE - stack limit reached, and not a typical stack overflow). But Rebol's scanner was based on plain C recursion. I again wonder just how generic and fast the "trampoline" could be engineered to be, to where it would be close enough to as fast to be worth it vs. a specialized stack.
Molding - Another C recursive routine, which might be reframable to use the trampoline. As with most of these routines that use C recursions, there has to be some concern over dealing with cycles. Another wrench thrown in is that within the system, molding is designed to work on so-called "Relative" values, which users never see but are part of the implementation. This means that if all the C parts in the system tried to call a native MOLD, they would not have the proper bit patterns to give as a fulfilled value. :-/

So that's a little map of some of the parts of the code that could trip up with stack overflows when given a sufficiently pathological data structure. This isn't about operating on big data structures, just deep ones.

I'm not really as concerned about these deep-datastructure recursions as I am about having a story for the debugger in this stackless world, and a bunch of bigger questions.

Also, while stackless is coming along, it's likely the biggest single change ever done (previous competitors: specific binding, UTF-8 Everywhere). So it's probably healthier to take breaks and mix in other things.