Stackless Is Here, Today, Now! 🥞

I've explained why we need stacklessness:

Switching to Stackless: Why this, why now?

It's taken a lot of thinking. But as proof that it's real, you can try this:

>> counter: func [value] [
      if 0 = modulo value 1000 [print [value]]
      counter value + 1
  ]

How many stack levels you get will depend on the available memory of your platform--not your CPU stack.

In the WebAssembly build that runs in the browser, I got this on Chromium on Linux:

>> counter 0
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000

It just terminates, but the error that shows up in the browser console is:

Uncaught RuntimeError: Aborted(Cannot enlarge memory arrays to size 16867328 bytes (OOM). Either (1) compile with -sINITIAL_MEMORY=X with X higher than the current value 16777216, (2) compile with -sALLOW_MEMORY_GROWTH which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with -sABORTING_MALLOC=0)

I got the same number on Firefox, probably because Emscripten requests the same amount of Wasm memory space in each browser.

Point Is, It Ran Out of Memory... not Stack!

We can see there's work to do on responding to out-of-memory errors (we've never really had a great story for that). But at least we can respond (!) -- I've explained already that stack overflows can't be caught the way a failed memory allocation can. That's a huge motivator for the change.

But still, over 17000 stack levels is kind of neat in and of itself!

So how do desktop executables "stack up" on my 16-core Lenovo ThinkPad i7 with 64GB of memory...?

As a quick comparison: for Red on Windows, it doesn't make it to 2000:

red>> counter 0
0
1000
*** Internal Error: stack overflow
*** Where: %
*** Stack: counter counter counter counter counter counter counter counter coun
ter counter counter counter counter counter counter counter counter counter cou
nter counter counter counter counter counter counter counter counter counter co
unter counter counter coun...

Running an actual count got me to 1943 and then it crashed.

(I take it that hitting a stack overflow -during- a PRINT puts them in a harsher situation than overflowing during plain recursion. But this is all the more reason you can't play fast-and-loose with the CPU stack: it can't be trapped the way an out-of-memory error can! Right now we may be crashing too, but we're substantially closer to having it managed.)

For Rebol2 on Windows it makes it to 14262, and does not crash. (So Red's an order of magnitude worse, and crashes.)

I hadn't built a 32-bit optimized Windows executable for a while (and actually had to fix a couple of compiler warnings to get one). But I figured I'd do so for an apples-to-apples comparison with the Rebol2 and Red EXEs.

32-bit Ren-C makes it to...well, jeez. Printing every value one by one could go on forever, so I changed it to only print on multiples of 10,000. It starts to slow down a bit around 2 million, but ticks past 3 million pretty easily...and makes it to 4 million. It eventually terminates without a message at 4780000 (it's an optimized build, and our out-of-memory handling needs work...this would likely be an assert in a debug build).
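
(For concreteness, the variant is just the original COUNTER with a bigger modulus--a sketch rather than the exact session I ran:)

counter: func [value] [
    if 0 = modulo value 10000 [print [value]]
    counter value + 1
]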

So generously rounding Red up to 2,000... Ren-C can handle over 2000x the stack levels, at least. :slight_smile:

Now, To Reap The Benefits...

  • I need to rework the JavaScript extension to use this intrinsic stackless property, instead of "Asyncify". DONE!

    • Asyncify gave us the pseudo-stackless ability of suspending arbitrary non-stackless code in order to run browser events. This let us do things like ASK for input while in the middle of a loop. With real stackless, we (shouldn't) need that.

    • Asyncify added bloat to our code generation--as I recall, it made the .wasm about double the size. We'll see how accurate my memory is on that, but hopefully the size of libr3.wasm will go way down.

  • One by one, the natives have to be redone as stackless. Only then can we truly ditch Asyncify, because if something like FOR-EACH isn't stackless, we wouldn't be able to do something like PRINT or ASK inside of that kind of loop...it would raise an error (see the sketch after this list). MOSTLY DONE!

    • We can probably punt on some things and drop Asyncify even if not everything is stackless yet.

    • You might not be able to do something like foo.(ask "Field Name?"): 10 until TUPLE! lookup has stackless processing bubble up through its GROUP! evaluations, but I think we can live with that being an error for a while--if it halves the size of the wasm.

  • Generators and Yielders need to be resurrected. They were pretty well designed before, but locking issues created headaches...and it's time to attack those again.

  • Out-of-memory errors have to be reined in--not just so stack overflows can be handled gracefully, but so that any kind of out-of-memory condition is handled gracefully!

    • We also might want users to be able to set a policy (if they wish) to limit the stack before memory runs out, just so that infinite recursions get caught earlier in casual programs that aren't intentionally using big stacks.
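
To make the FOR-EACH point above concrete, here's a rough sketch (with made-up prompts, not code from the system) of the kind of thing that needs the loop native itself to be stackless--each ASK has to suspend out to the browser's event loop from inside the loop body:

for-each prompt ["Name?" "Email?"] [
    print [ask prompt]    ; ASK suspends to the browser to wait for input
]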

In any case, it feels good to get the core of the work done two years ago hammered into shape...and see it running in the browser!

I thought it would be informative to do a quick comparison of code, to see how something stackless is implemented...and how "messy" (or not) it makes things that were "formerly simple".


R3-ALPHA while loop

/***********************************************************************
**
*/	REBNATIVE(while)
/*
***********************************************************************/
{
	// note: `ds` below is the native's frame pointer into the data stack,
	// supplied by the REBNATIVE() macro expansion (not shown in this snippet)
	REBSER *b1 = VAL_SERIES(D_ARG(1));
	REBCNT i1  = VAL_INDEX(D_ARG(1));
	REBSER *b2 = VAL_SERIES(D_ARG(2));
	REBCNT i2  = VAL_INDEX(D_ARG(2));

	SET_NONE(D_RET);

	do {
		ds = Do_Blk(b1, i1);
		if (IS_UNSET(ds) || IS_ERROR(ds)) {	// Unset, break, throw, error.
			if (Check_Error(ds) >= 0) return R_TOS1;
		}
		if (!IS_TRUE(ds)) return R_RET;
		ds = Do_Blk(b2, i2);
		*DS_RETURN = *ds;	// save here (to avoid GC during error handling)
		if (THROWN(ds)) {	// Break, throw, continue, error.
			if (Check_Error(ds) >= 0) return R_TOS1;
			*DS_RETURN = *ds; // Check_Error modified it
		}
	} while (TRUE);
}

RED while loop

Red is different because it has an interpreted form and a compiled form of WHILE. This is just the interpreted form, which makes more sense to compare.

UPDATE July 2022: Red is dropping the compiled form due to the predictable difficulty of maintaining semantic parity between two different implementations. So this is the only WHILE to consider now.

while*:    func [
    check? [logic!]
    /local
        cond  [red-block!]
        body  [red-block!]
][
    #typecheck while
    cond: as red-block! stack/arguments
    body: as red-block! stack/arguments + 1
    
    stack/mark-loop words/_body
    while [
        assert system/thrown = 0
        catch RED_THROWN_BREAK [interpreter/eval cond yes]
        switch system/thrown [
            RED_THROWN_BREAK
            RED_THROWN_CONTINUE    [
                system/thrown: 0
                fire [TO_ERROR(throw while-cond)]
            ]
            0                     [0]
            default                [re-throw]
        ]
        logic/true?
    ][
        stack/reset
        assert system/thrown = 0
        catch RED_THROWN_BREAK [interpreter/eval body yes]
        switch system/thrown [
            RED_THROWN_BREAK    [system/thrown: 0 break]
            RED_THROWN_CONTINUE    [system/thrown: 0 continue]
            0                     [0]
            default                [re-throw]
        ]
    ]
    stack/unwind
    stack/reset
    unset/push-last
]

while: make native! [[
		"Evaluates body as long as condition block evaluates to truthy value"
		cond [block!]	"Condition block to evaluate on each iteration"
		body [block!]	"Block to evaluate on each iteration"
	]
	#get-definition NAT_WHILE
]

REN-C while loop

Terminology note: every frame has an extra scratch cell which is GC protected. This is called the "spare" cell. It's a good place to put temporary evaluations. For instance, the WHILE evaluates its condition there to avoid overwriting the output cell--since the result in OUT is supposed to be the previous body evaluation (or VOID if never written).

Implementation note: the comments are scraped by the build process and are the actual spec used for the function, so it can be maintained next to the source that uses it.

//
//  while: native [
//
//  "So long as a condition is truthy, evaluate the body"
//
//      return: "VOID if body never run, NULL if BREAK, else last body result"
//          [any-value?]
//      condition [<const> block!]
//      body [<unrun> <const> block! frame!]
//  ]
//
DECLARE_NATIVE(while)
{
    INCLUDE_PARAMS_OF_WHILE;

    Value* condition = ARG(condition);
    Value* body = ARG(body);

    enum {
        ST_WHILE_INITIAL_ENTRY = STATE_0,
        ST_WHILE_EVALUATING_CONDITION,
        ST_WHILE_EVALUATING_BODY
    };

    switch (STATE) {
      case ST_WHILE_INITIAL_ENTRY : goto initial_entry;
      case ST_WHILE_EVALUATING_CONDITION : goto condition_was_evaluated;
      case ST_WHILE_EVALUATING_BODY : goto body_was_evaluated;
      default: assert(false);
    }

  initial_entry: {  //////////////////////////////////////////////////////////

    Add_Definitional_Break_Continue(body, LEVEL);  // don't bind condition [2]

} evaluate_condition: {  /////////////////////////////////////////////////////

    STATE = ST_WHILE_EVALUATING_CONDITION;
    return CONTINUE(SPARE, condition);

} condition_was_evaluated: {  ////////////////////////////////////////////////

    if (Is_Inhibitor(SPARE)) {  // null condition => return last body result
        if (Is_Fresh(OUT))
            return VOID;  // body never ran, so no result to return!

        return BRANCHED(OUT);  // put void and null in packs [3]
    }

    STATE = ST_WHILE_EVALUATING_BODY;  // body result => OUT
    return CATCH_CONTINUE_BRANCH(OUT, body, SPARE);  // catch break/continue

} body_was_evaluated: {  /////////////////////////////////////////////////////

    if (THROWING) {
        bool breaking;
        if (not Try_Catch_Break_Or_Continue(OUT, LEVEL, &breaking))
            return THROWN;

        if (breaking)
            return nullptr;
    }

    goto evaluate_condition;
}}

Here are the footnotes:

  1. It was considered whether while true [...] should loop forever, and while false [...] should never run. However, that could lead to accidents like writing while x > 10 [...] instead of while [x > 10] [...]. It is probably safer to require a BLOCK! than to fall back on such behaviors. (It's now easy for people to make their own weird polymorphic loops--see the sketch after these footnotes.)

  2. We could make it so CONTINUE in the condition of a WHILE meant you skip the execution of the body of that loop, and run the condition again. That might be interesting for some strange stylized usage that puts complex branching code in a condition. But it adds some cost, and would override the default meaning of CONTINUE (of continuing some enclosing loop)...which is free, and enables other strange stylized usages.

  3. If someone writes:

    flag: okay
    while [flag] [flag: null, null]
    

    We don't want the overall WHILE result to evaluate to NULL--because NULL is reserved for signaling BREAK. Similarly, VOID results are reserved for when the body never runs. BRANCHED() encloses these states in single-element packs.
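
As a sketch of what footnote [1] means by rolling your own loop (the name and design here are made up for illustration, not anything in the box): a variant whose condition is a WORD! naming a variable, rather than a BLOCK!:

while-var: func [
    condition [word!]  ; variable to test each time through (hypothetical)
    body [block!]
][
    while [get condition] body
]

Usage would be something like flag: okay, while-var 'flag [..., flag: null]. The condition still bottoms out in an ordinary BLOCK! handed to the real WHILE, so nothing about the native itself has to change.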

I Think The Work Speaks for Itself :speak_no_evil:

...but still, I'll say that not only is the Ren-C version expressed and documented more clearly, it is much more powerful.

And it all still compiles with TinyC. :star2:

Really impressive work you have done there.

Bravo! What an absolute legend you are, Brian. :clap: :clap: :clap:

:hear_no_evil::see_no_evil::speak_no_evil:
Really amazing. I'll see if I can build it again soon.