Web Build Performance Stats

I resurrected the "stats" function to get some metrics. It's actually a good example of how nicely Ren-C can improve things:

In any case, comparing the statistics for R3-Alpha and Ren-C is going to show a lot more series and memory use in Ren-C. The main reasons are:

  • There's a Windows encapping issue where it reads the whole executable into memory to probe it for resource sections. This is especially crazy for debug builds. I'd raised this as an issue for Shixin to look at but forgot about it.

  • Function frames do not use the data stack, and instead the arguments of functions are stored in individual arrays. While there are some optimizations so that this doesn't require an allocation on quite every function call, a good portion of function calls do allocate series. This stresses the GC, but I've mentioned how it was important for many reasons (including that the data stack memory isn't stable, which meant the previous approach had bugs passing pointers to arguments around). It's a given that this is how things are done now--especially with stackless--so it just needs to be designed around and tuned.

  • WORD!s are special cases of string series. Things like the word table and binding didn't count in series memory before, and weren't tabulated in R3-Alpha's series count. There are some other examples of this.

  • ACTION!s create more series and contexts. The HELP information for most actions that have help information has two objects linked to it...one mapping parameter names to datatypes, and one mapping parameter names to descriptions. I'm hoping that the one mapping parameter names to datatypes can be covered by the parameter information that the interpreter also sees...but for today, there's a difference because one contains TYPESET!s and the other contains human-readable BLOCK!s.

  • So Much More Is Done In Usermode. Ranging from console code to command-line argument processing, there's more source code (which counts as series itself) and more code running.

I see it as good--not bad--that a ton of things run in the boot process. That said, I think you should be able to build and run a minimal system...even one that doesn't waste memory on HELP strings (it's now easier to make such things, since the spec isn't preserved).

But for today, the closest we have to a "minimal build" is the web build. It's a bit more comparable to R3-Alpha in terms of how much startup code it runs.

The Current State

Starting up R3-Alpha on Linux, I get the following for stats/profile:

r3-alpha>> stats/profile
== make object! [
    timer: 0:00:02.639939
    evals: 20375
    eval-natives: 3340
    eval-functions: 369
    series-made: 8393
    series-freed: 2597
    series-expanded: 70
    series-bytes: 2211900
    series-recycled: 2526
    made-blocks: 5761
    made-objects: 64
    recycles: 1
]

Ren-C on the web is considerably heavier, at least when it comes to evals + series made + GC churn (a little less overall series bytes...probably mostly owed to optimizations that fit small series into the place where tracking information would be stored if it were a larger one):

ren-c/web>> stats/profile
== make object! [
    evals: 65422
    series-made: 28569
    series-freed: 11160
    series-expanded: 419
    series-bytes: 1731611
    series-recycled: 8669
    made-blocks: 16447
    made-objects: 109
    recycles: 229  ; !!! see update, this is now 1
]

The increased number of evals just goes with the "a lot more is done in usermode" bit. There are lots of ways to attack that if it's bothersome.

The series-made number is much bigger: 8393 vs. 28569. I mentioned how a lot of this is going to come from the fact that many evals need to make series, but we don't really have a breakdown of that number here to be sure that accounts for the difference. Anyway, this number isn't all that bothersome to me given that knowledge...but it should be sanity-checked.

What does bother me is the 229 recycles. That's a lot. Even with 3-4x as many series being made, I don't see how exactly that translates into 200x the recycling.

UPDATE: This was the result of accidentally committed debug code. It's back to 1.
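To sanity-check the "3-4x" and "200x" figures, here is a small Python sketch that divides each Ren-C counter by its R3-Alpha counterpart. The numbers are copied from the two dumps above (with recycles at its original value of 229, before the fix); the `ratios` helper and the dictionary names are made up for illustration and are not part of either interpreter.

```python
# Counters copied from the two stats/profile dumps above.
# (R3-Alpha's timer, eval-natives and eval-functions have no Ren-C
# counterpart in the dump, so they are left out.)
r3_alpha = {
    "evals": 20375,
    "series-made": 8393,
    "series-freed": 2597,
    "series-expanded": 70,
    "series-bytes": 2211900,
    "series-recycled": 2526,
    "made-blocks": 5761,
    "made-objects": 64,
    "recycles": 1,
}
ren_c_web = {
    "evals": 65422,
    "series-made": 28569,
    "series-freed": 11160,
    "series-expanded": 419,
    "series-bytes": 1731611,
    "series-recycled": 8669,
    "made-blocks": 16447,
    "made-objects": 109,
    "recycles": 229,  # value before the stray debug code was removed
}


def ratios(baseline, current):
    """Return current/baseline for every counter present in both dumps."""
    return {k: current[k] / baseline[k] for k in baseline if k in current}


if __name__ == "__main__":
    for name, ratio in ratios(r3_alpha, ren_c_web).items():
        print(f"{name:16} {ratio:8.2f}x")
```

With these inputs, series-made comes out at roughly 3.4x and series-bytes at roughly 0.78x, which matches the "more series, slightly fewer bytes" observation. The recycles ratio of 229x is the one that stands out.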

Writing Down The Current State is Better Than Nothing

Ideally we'd have some kind of performance regression chart that plotted some of these numbers after each build. Though really it's not worth doing that unless the numbers carried more actionable information.

But...lacking an automated method, writing it down now and having a forum thread to keep track of findings and improvements is better than nothing.
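If this ever did get automated, even a very small script would have caught the recycle count. The following is only a sketch of the idea in Python, not anything that exists in the build today: `find_regressions` and its 10% `tolerance` are hypothetical, and the "previous build" numbers simply reuse the evals and series-made values from the web dump above, with recycles at its fixed value of 1.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {counter: (old, new)} for counters that grew by more than tolerance.

    tolerance is a hypothetical threshold (10% by default), not a project setting.
    """
    flagged = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is None:
            continue  # counter missing from the newer dump; nothing to compare
        if new > old * (1 + tolerance):
            flagged[name] = (old, new)
    return flagged


# Illustrative snapshots: only "recycles" differs between them.
previous_build = {"evals": 65422, "series-made": 28569, "recycles": 1}
suspect_build = {"evals": 65422, "series-made": 28569, "recycles": 229}

if __name__ == "__main__":
    for name, (old, new) in find_regressions(previous_build, suspect_build).items():
        print(f"{name}: {old} -> {new}")
```

Comparing against the last recorded snapshot rather than a fixed budget keeps the check cheap to maintain, though it only says that a number moved, not why.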

There's likely a lot that could be done to help the desktop build (such as obviously tending to that encap-reading issue). But I'd like to focus principally on improvements to the internals that offer benefit to the web build, where I think the main relevance is. And:

  • Having a system built from rigorously understood invariants is the best plan for optimization over the long-term. If you don't have a lot of assertions and confidence about what is and isn't true around your codebase, you can't know if a rearrangement will break it or not. So I spend a lot of time focusing on defining these invariants and making sure they are true.

  • Avoid optimizing things before you're sure they're right. I'm as guilty as anyone of fiddling with things for optimization reasons just because it's cool or I get curious about whether something can work or not. Programmers are tinkerers and that's just how it is. But it's definitely not time to go over things with a fine-toothed comb when so many design issues are not worked out.


Because I thought "oh this might be complex" I didn't immediately look at it. But I should have just set a breakpoint, because this was the result of some debugging Recycle() calls accidentally getting committed. It was recycling on every native creation!

Removing it gets us to the expected recycle of 1.

Yay.
