Executable Size circa 2023...and tweaking INLINE

hostilefork · November 21, 2023, 5:02am

A modern non-debug Ren-C executable on Linux--with https and the libuv filesystem and networking code (which supports asynchronous file I/O, etc.)--is about 1.7 MB when built at the O2 optimization level (optimize for speed).

When built at Os (optimize for size) it's about 1.2 MB, but that sacrifices about 40% of the speed to get the smaller size. (In the modern era, most people would say the extra size is a small price to pay for that much of a speed improvement.)

By comparison, an R3-Alpha Linux executable is about 0.56 MB at O2. And a Red CLI-only binary on Linux is about 1.0 MB.

Why Has Size Gone Up?

I've looked under the hood at the differences with R3-Alpha to see what accounts for the disparity with modern Ren-C. libuv accounts for a couple hundred KB, and is worth it--especially if you take advantage of things like the async file I/O.

But the rest just generally comes down to the fact that it's about twice as much code. If you enjoy using ADAPT or ENCLOSE or SPECIALIZE, well, there's code that implements it. And it's a deeper, safer, far more advanced codebase that just does more.

I Actually Pared Out About 600K By Tweaking Inlining

When I started looking at size, the O2 binary was around 2.4 MB. That was more than I expected, so I decided to dig into why.

I used Google's tool Bloaty McBloatface to get some insight, and to my surprise...some rather small functions had a disproportionate amount of code attributed to them.

It turned out that this was due to putting functions in header files and inlining them with static inline. When I moved 5 of these functions into the .c files instead of the .h files, that saved 400k in one blow... and the executable only got 0.4% slower (four tenths of a percent) as a result.
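To make that kind of move concrete, here's a minimal sketch. The names (sum_items, sum.h, sum.c) are made up for illustration and are not the actual Ren-C functions that were moved; everything is shown in one file, with comments marking what would live in the header and what would live in the source file.

```c
#include <assert.h>
#include <stddef.h>

/* ---- BEFORE: what a header (a hypothetical "sum.h") might contain ---- */

/* The full body sits in the header, so each .c file that includes it can
   end up carrying its own copy, plus inlined copies at call sites. */
static inline long sum_items_inline(const int *items, size_t len) {
    assert(items != NULL || len == 0);
    long total = 0;
    for (size_t i = 0; i < len; ++i)
        total += items[i];
    return total;
}

/* ---- AFTER: the header keeps only the prototype ---- */
long sum_items(const int *items, size_t len);

/* ---- AFTER: one .c file (a hypothetical "sum.c") owns the definition ---- */
long sum_items(const int *items, size_t len) {
    assert(items != NULL || len == 0);
    long total = 0;
    for (size_t i = 0; i < len; ++i)
        total += items[i];
    return total;
}
```

Callers don't change at all; the tradeoff is that the compiler can no longer inline the body across translation units unless something like link-time optimization is in play.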

Then I managed to make the C++ build about 140K lighter by changing the static inline on the remaining functions to an INLINE macro that's either inline in the C++ build, or static inline in the C build.

I guess the takeaway here is that even if something is getting bigger for the good reason of having more code, it always pays to look under the hood a bit when you can. A few hours of work can pick off some low-hanging fruit.

(Another takeaway is that being able to build a C codebase as C++--if you want to--continuously pays dividends...)

Here are some notes on the INLINE macro:


//=//// INLINE MACRO FOR LEVERAGING C++ OPTIMIZATIONS /////////////////////=//
//
// "inline" has a long history in C/C++ of being different on different
// compilers, and took a long time to get into the standard.  Once it was in
// the standard it essentially didn't mean anything in particular about
// inlining--just "this function is legal to appear in a header file and be
// included in multiple source files without generating conflicts."  The
// compiler makes no particular promises about actually inlining the code.
//
// R3-Alpha had few inline functions, but mostly used macros--in unsafe ways
// (repeating arguments, risking double evaluations, lacking typechecking.)
// Ren-C reworked the code to use inline functions fairly liberally, even
// putting fairly large functions in header files to give the compiler the
// opportunity to not need to push or pop registers to make a call.
//
// However, GCC in C99 mode requires you to say `static inline` or else you'll
// get errors at link time.  This means that every translation unit has its
// own copy of the code.  A study of the pathology of putting larger functions
// in headers as inline with `static inline` on them found that about five
// functions were getting inlined often enough to add 400K to the executable.
// Moving them out of .h files and into .c files dropped that size, and was
// only about *0.4%* slower (!) making it an obvious win to un-inline them.
//
// This led to experimentation with C++ builds just using `inline`, which
// saved a not-insignificant 8% of space in an -O2 build, as well as being ever
// so slightly faster.  Even if link-time-optimization was used, it still
// saved 3% on space.
//
// The long story short here is that plain `inline` is better if you can use
// it, but you can't use it in gcc in C99 mode (and probably not other places
// like TinyC compiler or variants). So this clunky INLINE macro actually
// isn't some pre-standards anachronism...it has concrete benefits.
//
#if CPLUSPLUS_11
    #define INLINE inline
#else
    #define INLINE static inline
#endif
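
For a sense of how a header would use the macro, here's a minimal sketch. The helper clamp_int is hypothetical (it isn't from Ren-C), and since CPLUSPLUS_11 is defined elsewhere in the codebase, the sketch substitutes a check on the standard __cplusplus macro as an assumption.

```c
#include <assert.h>

/* Assumption: stand-in for Ren-C's CPLUSPLUS_11 test, which is defined
   elsewhere in the codebase.  Here it is derived from __cplusplus. */
#if defined(__cplusplus) && __cplusplus >= 201103L
    #define INLINE inline         /* C++: one shared definition is allowed */
#else
    #define INLINE static inline  /* C: each translation unit gets its own */
#endif

/* Hypothetical header-style helper, written once and used by both builds. */
INLINE int clamp_int(int value, int lo, int hi) {
    assert(lo <= hi);  /* precondition: the range must not be inverted */
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

The function is written the same way for both builds; only the linkage the macro expands to differs.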