Circa January 2021, I did a capture of some statistics about the web build, taken right after booting the console. I thought I might run that capture again here, 3 years later. And I was a bit shocked at first, because at a glance it seemed like things had gotten completely out of hand:
>> stats/profile
== make object! [
evals: 3277865
series-made: 928871
series-freed: 646233
series-expanded: 6312
series-bytes: 6159217
series-recycled: 196735
blocks-made: 86893
objects-made: 220
recycles: 13
]
That's a factor of 50 more "evals"... with a factor of 32 more series... just to boot the web console. What the heck happened?
Series Count Is Misleading
The first thing to notice is that the console doesn't take 50x as long to load as it used to. That should be a hint that something about the statistics may have gotten thrown off.
One thing that's grossly overinflating the "series" count is that Sea of Words invented a new mechanism for binding in modules. This method makes tiny series "stubs" that get linked onto canon symbols to hold indices during a binding operation. They're very cheap to make and to destroy and aren't involved in GC. They account for at least 1/3 of those "series" being made.
The statistics gathering needs to be adjusted to count those stubs separately from ordinary series.
Eval Count Is Misleading
Due to stackless processing, what counts as an "eval tick" has multiplied by a lot. It used to be that something like a REDUCE operation would count as one tick, and then each eval step it did would be a tick. Now each time the REDUCE yields to the evaluator to ask for another evaluation, that's a tick...then the evaluator ticks.
(While these added steps may sound like a burden, they actually speed up the web build. That's because they mean we're building an unwindable stack that can yield to the browser's event loop, without relying on a crutch like binaryen's "stackless emulation"--which would bloat up the entire runtime and fail anyway on deep stacks. For a reminder of why this is necessary: "Switching to Stackless: Why This, Why Now?")
Also, parsing got subsumed into the tick count. Even PARSE3 breaks its operations down into ticks now. So if you say parse3 "aaaabbbb" [some ["a" | "b"]] that's not one tick, it's at least 8 * 3 = 24 ticks.
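To make that concrete, here's the rough arithmetic behind that number (treat the per-character cost as an illustrative lower bound; the exact count depends on the rule):

parse3 "aaaabbbb" [some ["a" | "b"]]

; old accounting:  the whole PARSE3 call counted as a single tick
; new accounting:  each of the 8 characters takes at least 3 ticks of
;                  rule processing, so 8 * 3 = 24 ticks at a minimum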
The Real "Villain": UPARSE
Once you've cut out some basic accounting anomalies, there's still certainly a lot of work to do. But reading between the lines, there's a clear central source of resource usage going through the roof...which is, quite simply, UPARSE.
Not that the web console uses it all that terribly much. But using it pretty much at all will explode the amount of interpreter code that runs, quickly eclipsing everything else.
I don't think the best way to frame it is negative. The way I see it, this is a big, real challenge for the system. UPARSE is the most real, most powerful, and most tested dialect implementation that the Redbol world has ever seen.
So... What Can Be Done?
Counting up all the ways in which UPARSE pays the price for its current all-usermode implementation is too big a job for this post. But it's important to realize that the costly mechanisms it uses all have motivations. Meeting those needs with more efficient tools means other dialects can use those tools too.
To name one example off the top of my head: every combinator function is pre-hooked, with code enclosing each call in a test of "are we hooked? if so, call the hook with the frame; otherwise just run the frame." That's screaming for some kind of generalized answer, where you don't have to hook in advance but can inject hooks later on some category of functions. Still, using the massively inefficient approach is how we can test the viability of something like the Stepwise PARSE Debugger.
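To give a flavor of what that pre-hooking looks like, here's a simplified sketch--not the actual UPARSE source. HOOK and HOOKIFY are made-up names for illustration, and the syntax is approximated:

hook: null  ; something like the Stepwise PARSE Debugger would set this

hookify: func [combinator] [
    return enclose :combinator func [f [frame!]] [
        either :hook [
            hook f  ; hand the frame to the hook, let it inspect/step it
        ][
            do f  ; no hook installed: just run the combinator normally
        ]
    ]
]

Every combinator gets wrapped this way up front, so every single combinator call pays for that test--whether or not a hook is ever installed. A facility for injecting hooks after the fact, on some category of functions, would make that standing cost go away.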
But at the end of the day, it does come down to the fact that parsing is just too general-purpose and useful to be done with usermode code. It would be nice if we had a language like Red/System to write the combinators in...which could be compiled to WebAssembly in the browser and x86 or ARM on the desktop. Yet the option on the table right now is hand-coding C for the combinators and the combinating processes.
Prioritization Is Difficult
While UPARSE clearly needs to be reviewed and nativized, I'm still not sure if this is the next item of business in the priority queue. Binding casts a dark shadow over the entire system--and has its own performance quandaries. Not to mention that it hasn't been figured out how variables can be instantiated in mid-parse with something like a "LET combinator".
Any investment in making UPARSE faster that also makes it harder to modify and test under new designs has to be considered carefully. Also, being written in usermode surfaces pitfalls you wouldn't otherwise see--like the need for a definitional CONTINUE.
Anyway... data is data, and I wanted to look at it. This is where things are at the moment.