Stackless Is Here, Today, Now! 🥞

As proof that it's real, you can try this:

>> counter: func [value] [
      if 0 = modulo value 1000 [print [value]]
      counter value + 1
  ]

How many stack levels you get will depend on the available memory of your platform--not your CPU stack.

I got this on Chromium on Linux:

>> counter 0
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000

It just terminates, but the error that shows up in the browser console is:

Uncaught RuntimeError: Aborted(Cannot enlarge memory arrays to size 16867328 bytes (OOM). Either (1) compile with -sINITIAL_MEMORY=X with X higher than the current value 16777216, (2) compile with -sALLOW_MEMORY_GROWTH which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with -sABORTING_MALLOC=0)

Firefox ran out at the same point, probably because emscripten requests the same amount of Wasm memory space in both browsers.

Point Is, It Ran Out of Memory... not Stack!

We can see there's work to do on responding to out-of-memory errors (we've never really had a great story for that). But at least responding is possible now (!) -- as I've explained before, stack overflows can't be caught the way a failed memory allocation can. That's a huge motivator for the change.
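
That handling isn't in place yet, but the aspiration is that running out of memory becomes an ordinary error you can field, rather than a hard abort. A rough sketch of the idea--assuming a TRAP-style construct that hands back the error, which is not how the wasm build behaves today:

>> e: trap [counter 0]   ; someday: get an ERROR! back instead of an abort
>> print "Still alive!"  ; ...and keep going--which a CPU stack overflow can't offer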

But still, over 17000 stack levels is kind of neat in and of itself!

So how do desktop executables "stack up" on my 16-core Lenovo ThinkPad i7 with 64GB of memory...?

As a quick comparison: for Red on Windows, it doesn't make it to 2000:

red>> counter 0
0
1000
*** Internal Error: stack overflow
*** Where: %
*** Stack: counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter counter coun...

Running an actual count got me to 1943 and then it crashed.

(I take it that hitting a stack overflow -during- a PRINT puts them in a harsher situation than hitting one during plain recursion. But that's all the more reason you can't play fast-and-loose with the CPU stack--it cannot be trapped the way an out-of-memory error can! Right now we may be crashing too, but we're substantially closer to managing it.)

For Rebol2 on Windows it makes it to 14262, and does not crash. (So Red's an order of magnitude worse, and crashes.)

I hadn't built a 32-bit optimized Windows executable for a while (and actually had to fix a couple of compiler warnings to get one). But I figured I'd do so for an apples-to-apples comparison with the Rebol2 and Red EXEs.

32-bit Ren-C makes it to...well, jeez. Printing every level one by one could go on forever, so I only did a print on modulo 10,000. It starts to slow down a bit around 2 million, but ticks past 3 million pretty easily...and makes it to 4 million. It eventually terminates without a message at 4,780,000 (it's an optimized build, and our out-of-memory handling needs work...this would likely be an assert in a debug build).
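
For reference, that test was essentially the same COUNTER with a coarser print interval:

>> counter: func [value] [
      if 0 = modulo value 10000 [print [value]]
      counter value + 1
  ]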

So generously rounding Red up to 2,000... Ren-C can handle over 2000x the stack levels, at least. :slight_smile:

Now, To Reap The Benefits...

  • I need to rework the JavaScript extension to use this intrinsic stackless property, instead of "Asyncify".

    • Asyncify gave us the pseudo-stackless ability of suspending arbitrary non-stackless code in order to run browser events. This let us do things like ASK for input while in the middle of a loop (see the sketch after this list). With real stackless, we (shouldn't) need that.

    • Asyncify added bloat to our code generation--as I recall, it made the .wasm about double the size. We'll see how accurate my memory is on that, but hopefully the size of libr3.wasm will go way down.

  • One by one, the natives have to be reworked to be stackless. Only then can we truly ditch Asyncify: if something like FOR-EACH isn't stackless, we wouldn't be able to do something like PRINT or ASK inside that kind of loop... it would raise an error.

    • We can probably punt on some things and drop Asyncify even if not everything is stackless yet.

    • You might not be able to do something like foo.(ask "Field Name?"): 10 until TUPLE! lookup can bubble stackless processing up through its GROUP! evaluations, but I think we can live with that being an error for a while--especially if it halves the size of the wasm.

  • Generators and Yielders need to be resurrected. They were pretty well designed before, but locking issues created headaches...and it's time to attack those again.

  • Out-of-memory errors have to be reined in--not just so stack overflows can be handled gracefully, but so that any kind of out-of-memory is handled gracefully!

    • We also might want users to be able to set a policy (if they wish) to limit the stack before memory runs out, just so that infinite recursions get caught earlier in casual programs that aren't intentionally using big stacks.
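
The sketch promised above: this is the kind of thing Asyncify made workable in the browser, and which intrinsic stacklessness should keep working once loop natives like FOR-EACH are converted. The prompt strings are just made up for illustration, but ASK in the middle of a loop is the real pattern:

>> for-each field ["Name" "Email"] [
      print ["You entered:" ask field]
  ]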

In any case, it feels good to get the core of the work done two years ago hammered into shape...and see it running in the browser!

4 Likes

I thought it would be informative to do a quick comparison of code, to see how something stackless is implemented...and how "messy" (or not) it makes things that were "formerly simple".


R3-ALPHA while loop

/***********************************************************************
**
*/	REBNATIVE(while)
/*
***********************************************************************/
{
	REBSER *b1 = VAL_SERIES(D_ARG(1));
	REBCNT i1  = VAL_INDEX(D_ARG(1));
	REBSER *b2 = VAL_SERIES(D_ARG(2));
	REBCNT i2  = VAL_INDEX(D_ARG(2));

	SET_NONE(D_RET);

	do {
		ds = Do_Blk(b1, i1);
		if (IS_UNSET(ds) || IS_ERROR(ds)) {	// Unset, break, throw, error.
			if (Check_Error(ds) >= 0) return R_TOS1;
		}
		if (!IS_TRUE(ds)) return R_RET;
		ds = Do_Blk(b2, i2);
		*DS_RETURN = *ds;	// save here (to avoid GC during error handling)
		if (THROWN(ds)) {	// Break, throw, continue, error.
			if (Check_Error(ds) >= 0) return R_TOS1;
			*DS_RETURN = *ds; // Check_Error modified it
		}
	} while (TRUE);
}

RED while loop

Red is different because it has an interpreted form and a compiled form of WHILE. This is just the interpreted form, which makes more sense to compare.

while*:    func [
    check? [logic!]
    /local
        cond  [red-block!]
        body  [red-block!]
][
    #typecheck while
    cond: as red-block! stack/arguments
    body: as red-block! stack/arguments + 1
    
    stack/mark-loop words/_body
    while [
        assert system/thrown = 0
        catch RED_THROWN_BREAK [interpreter/eval cond yes]
        switch system/thrown [
            RED_THROWN_BREAK
            RED_THROWN_CONTINUE    [
                system/thrown: 0
                fire [TO_ERROR(throw while-cond)]
            ]
            0                     [0]
            default                [re-throw]
        ]
        logic/true?
    ][
        stack/reset
        assert system/thrown = 0
        catch RED_THROWN_BREAK [interpreter/eval body yes]
        switch system/thrown [
            RED_THROWN_BREAK    [system/thrown: 0 break]
            RED_THROWN_CONTINUE    [system/thrown: 0 continue]
            0                     [0]
            default                [re-throw]
        ]
    ]
    stack/unwind
    stack/reset
    unset/push-last
]

while: make native! [[
		"Evaluates body as long as condition block evaluates to truthy value"
		cond [block!]	"Condition block to evaluate on each iteration"
		body [block!]	"Block to evaluate on each iteration"
	]
	#get-definition NAT_WHILE
]

REN-C while loop

Terminology note: every frame has an extra scratch cell which is GC protected. This is called the "spare" cell. It's a good place to put temporary evaluations. For instance, the WHILE evaluates its condition there to avoid overwriting the output cell--since if the body never runs, the function wants to be "invisible" and leave whatever result was in OUT from the previous evaluation.

//
//  while: native [
//
//  {So long as a condition is truthy, evaluate the body}
//
//      return: "Last body result, or null if BREAK"
//          [<opt> <void> any-value!]
//      condition [<const> block!]
//      body [<const> block! action!]
//  ]
//
REBNATIVE(while) {
    INCLUDE_PARAMS_OF_WHILE;

    Value *condition = ARG(condition);  // condition is BLOCK! only, see [1]
    Value *body = ARG(body);

    enum {
        ST_WHILE_INITIAL_ENTRY = 0,
        ST_WHILE_EVALUATING_CONDITION,
        ST_WHILE_EVALUATING_BODY
    };

    switch (STATE) {
      case ST_WHILE_INITIAL_ENTRY : goto evaluate_condition;
      case ST_WHILE_EVALUATING_CONDITION : goto condition_was_evaluated;
      case ST_WHILE_EVALUATING_BODY : goto body_was_evaluated;
      default: assert(false);
    }

  evaluate_condition: {  /////////////////////////////////////////////////////

    STATE = ST_WHILE_EVALUATING_CONDITION;
    continue_uncatchable (SPARE, condition, END);  // ignore BREAKs, see [2]

} condition_was_evaluated: {  ////////////////////////////////////////////////

    if (Is_Void(SPARE))
        goto evaluate_condition;  // skip body, see [3]

    if (Is_Isotope(SPARE))
        fail (Error_Bad_Isotope(SPARE));

    if (Is_Falsey(SPARE)) {  // falsey condition => return last body result
        if (Is_Stale(OUT))
            return VOID;  // body never ran, so no result to return!

        return_branched (OUT);  // see [4]
    }

    STATE = ST_WHILE_EVALUATING_BODY;  // body result => OUT
    continue_catchable_branch (OUT, body, SPARE);  // catch break & continue

} body_was_evaluated: {  /////////////////////////////////////////////////////

    if (THROWING) {
        if (not Try_Catch_Break_Or_Continue(OUT, FRAME))
            return THROWN;

        if (Is_Breaking_Null(OUT))
            return nullptr;

        if (Is_Void(OUT))  // CONTINUE with no argument
            Init_None(OUT);
    }

    goto evaluate_condition;
}}

Here are the footnotes:

  1. It was considered whether while true [...] should loop infinitely, and while false [...] should never run the body. However, that could lead to accidents like while x > 10 [...] instead of while [x > 10] [...]. It is probably safer to require a BLOCK! than to fall back on such behaviors. (It's now easy for people to make their own weird polymorphic loops.)

  2. We have to pick a meaning when someone writes:

    while [print "condition" continue] [print "body"]
    

    R3-Alpha would alternate printing out "condition"/"body", which isn't useful. Here the decision is to assume a BREAK or CONTINUE in the condition targets an enclosing loop--not this WHILE loop.

  3. A weird idea being tried here is that if your condition vanishes, it doesn't run the body and it doesn't end the loop--it acts as a CONTINUE and re-runs the condition. This offers a feature resembling what someone might want for the semantics of [2], so it's getting a trial run.

  4. If someone writes:

    flag: true, while [flag] [flag: false, continue null]
    

    We don't want that to evaluate to NULL--because NULL is reserved for signaling BREAK. So a null body result coming from a CONTINUE is turned into a ~null~ isotope. Similarly, void results are reserved for when the body never runs--so void body results are turned into none (~).

I Think The Work Speaks for Itself :speak_no_evil:

...but still, I'll say that the Ren-C version is not only expressed and documented more clearly--it is also much more powerful.

  • Stacklessness is at work here.
  • Being able to vanish if the body never runs is at work here.
  • The loop composition properties that enable things like FOR-BOTH are at work here.
  • Catching the mistake of writing while [...] [data: [] ... append data ...] while incorrectly thinking that data gets reset each time through the loop is at work here (sketched just below).
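
To make that last point concrete, here's roughly the mistake the <const> body catches--the variable names are hypothetical, but the shape is the one described above:

    while [not done?] [      ; DONE? stands in for whatever the loop condition is
        data: []             ; the same literal block every iteration--not a fresh one
        append data item     ; a <const> body makes this raise an error...
    ]                        ; ...instead of silently accumulating items across runs

    while [not done?] [
        data: copy []        ; an explicit COPY is what gives you a new block each time
        append data item
    ]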

And it all still compiles with TinyC. :star2:

3 Likes

Really impressive work you have done there.

1 Like

Bravo! What an absolute legend you are, Brian. :clap: :clap: :clap:

:hear_no_evil::see_no_evil::speak_no_evil:
Really amazing. I'll see if I can build it again soon.