Doubling Down on TCC Bootstrap: Conference Demo Expanded!

hostilefork · March 6, 2021, 4:34am

cc: interested irregular forum readers, or at least irregular posters... (e.g. @MarkI, @giluiolunati)

While prioritizing, I posted about setting aside FFI, ZeroMQ, and the serial port...since none of those is particularly mission-critical. They did exercise some code, but nothing that isn't already covered elsewhere. Limited effort is better spent on things that will matter to more people (e.g. the web build).

But one effort that is a big deal, and that I wanted to preserve, is the TCC extension, along with the ability to build the interpreter itself with TCC. As I demonstrated in July 2019, those two things together let you download the source code and bootstrap a new interpreter with no build tools on your system besides a single "r3-with-tcc" executable¹.

¹ Well, you also need the libc headers and libraries for your OS: things we require, like <string.h>. If we wanted to be masochistic, we could get into the business of zipping those up and either pulling them off the web via TLS/HTTPS/Unzip.reb or encapping them into the executable. If we did that, we should look into doing it with musl rather than the likes of GNU's glibc.

So I slogged through to make a turnkey version of the conference demo. It's new and improved, and you can run it right from your Linux desktop, if you feel like it.

Single-EXE Bootstrap...Now Reproduced As A GitHub Action!

It's now more streamlined...automated, reproducible, and broken into sections. You can see these sections in the GitHub continuous integration run:

https://github.com/metaeducation/ren-c/runs/2044725810

The script behind this is really just doing a few basic steps, which I will paraphrase here:

 # Install TCC and the libraries for embedding TCC services into C programs
 sudo apt install tcc libtcc-dev

 # Use TCC (and prebuilt R3 as make tool) to build a TCC-capable interpreter
 $R3_MAKE make.r config=configs/tcc.r extensions="TCC +"

 # Keep just the executable we made (call it r3-with-tcc for clarity)
 cp build/r3 ./r3-with-tcc

 # Now, you can throw away the prebuilt R3_MAKE tool and TCC executable
 rm $R3_MAKE
 sudo apt remove tcc  # keep the `libtcc-dev` for embedding TCC in C programs

 # Delete the source code that was cloned via git
 rm <all the sources>

 # Now use the sole executable to bootstrap itself, pulling the source code
 # down from GitHub over HTTPS as a .ZIP, unzipping it with the embedded
 # unzip.r, and rigging up the r3-with-tcc executable to act as an impromptu
 # interpreter of C99 command lines.
 ./r3-with-tcc "bootstrap"

After doing basically just that, your build/r3 is now a newly minted r3-with-tcc...which could continue doing this bootstrap process indefinitely!

You can read the actual workflow script here: %tcc-build.yml

It does a few more things, including using the "build matrix" to do one run with debug=none and another with debug=normal. It runs a few tests, but the real test here is the build itself! That exercises an insane amount of code.

More Rigorous Than The Conference Demo

This took a couple of days of pretty intense work. I'd more or less hacked the demo together the day before the conference, so it needed to be made less ad hoc. There were also changes to work through...mbedTLS wasn't being used in 2019, so it had never gone through this process. New things always bring up little issues.

Plus, I started the presentation with a prebuilt r3-with-tcc...but I had built that with GCC (not TCC). And I didn't build the TCC extension into the bootstrapped executable, so it couldn't continue the process. Nothing demonstrated that this was a "sustainable" bootstrap.

But here it is, done legit.

Lots Still To Do...

We're well aware that Rebmake is a beast, and something needs to be done about that. But having even one make tool written entirely in Ren-C shows it can be done, and gives us something to improve on. It sets a baseline, and we can keep aiming for it.

Performance is getting pretty bad, due to virtual binding, LET, and other things that are still being hammered out. That needs to be attacked, but the user experience has to be reasoned through completely first. Once we decide something is simply not the way things should work (like FUNCTION automatically gathering SET-WORD!s deeply through the body), trying other approaches is the only sane response. (If there were advantages to plunging forward with known-broken things just to be first to market, Rebol2 would have found them...or Red will be finding them... #goodluckwiththat.) Performance aside, LET is looking mostly promising.

But I think the more we hone this experience, the better the system coheres around its message. So I'm comfortable pushing this through and taking the time to do it.

hostilefork · September 14, 2022, 10:00pm

I'd let the TCC bootstrap lapse...because holding up progress to keep every intermediate state working for a demo like this doesn't always make sense.

But waiting too long to get back to it doesn't make sense either, because then resurrecting it becomes a massive effort.

Anyway, it's resurrected now.

Doing so exposed a bug in REMOVE: if the /PART position was before the input series position, it didn't behave correctly. (R3-Alpha and Rebol2 just swap the input with the /PART position, so the order doesn't matter.) I brought it in line with that behavior.
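To make the fix concrete: the idea is to normalize the range before deleting, so a /PART position that lands before the current position just gets swapped with it. Here's a minimal sketch in C over a plain NUL-terminated buffer. The name remove_part_sketch() is hypothetical (it's not Ren-C's internal API), positions are plain byte indices clamped to the string length, and /PART given as a count isn't covered:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch, not Ren-C's implementation: remove the span between
   `index` and `part` from a NUL-terminated buffer. If `part` comes before
   `index`, the two are swapped (the R3-Alpha / Rebol2 behavior), so the
   result is the same either way. Returns the number of bytes removed. */
static size_t remove_part_sketch(char *buf, size_t index, size_t part) {
    size_t len = strlen(buf);
    size_t start, end;

    if (index > len) index = len;  /* clamp positions to the buffer bounds */
    if (part > len) part = len;

    if (part < index) {            /* /PART before the position: flip them */
        start = part;
        end = index;
    } else {
        start = index;
        end = part;
    }

    /* shift the tail (including the terminator) down over the removed span */
    memmove(buf + start, buf + end, len - end + 1);
    assert(strlen(buf) == len - (end - start));
    return end - start;
}
```

With "abcdef", removing from index 1 to 4 or from index 4 to 1 both remove three bytes and leave "aef", which is the order-independence the older implementations had.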

TCC Doesn't Stand Still (but their VERSION stamp does...)

The most problematic TCC change to absorb was a new warning for when a non-void C function appears to end without returning a value.

Sometimes that's intentional: if a function ends by jumping the stack (say, a failure that interrupts control flow with an exception or a longjmp(), or a call to exit() to end the process), it never returns, so there is no value to return.

So there are ways of saying "I know this function doesn't return"...although they've been frustratingly nonstandard. C11 eventually standardized a _Noreturn function specifier you can put on such functions...and if it doesn't interfere with existing uses of the symbol "noreturn", you can #define noreturn _Noreturn to make it look nicer (C11's <stdnoreturn.h> does exactly that).
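For illustration, here's a minimal C sketch of the pattern, under assumptions: the NORETURN macro name, fail_sketch(), checked_divide(), and try_divide() are hypothetical, not Ren-C's actual code. It picks _Noreturn when __STDC_VERSION__ says C11, falls back to GCC's __attribute__((noreturn)) when __GNUC__ is defined, and otherwise expands to nothing (in which case a compiler may warn about checked_divide()):

```c
#include <assert.h>
#include <setjmp.h>

/* Choose a "does not return" marker the compiler understands. This is an
   assumption about what's available, not Ren-C's actual macro. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    #define NORETURN _Noreturn
#elif defined(__GNUC__)
    #define NORETURN __attribute__((noreturn))
#else
    #define NORETURN
#endif

static jmp_buf fail_target;   /* hypothetical: where failures unwind to */
static int last_failure = 0;  /* code recorded by the most recent failure */

/* Never returns normally: records the code, then longjmps to fail_target. */
static NORETURN void fail_sketch(int code) {
    assert(code != 0);
    last_failure = code;
    longjmp(fail_target, 1);
}

/* Non-void, yet the failure path has no return statement. With fail_sketch()
   marked NORETURN, a compiler that honors the marker can see control never
   falls off the end, so it has no reason to warn about a missing return. */
static int checked_divide(int a, int b) {
    if (b != 0)
        return a / b;
    fail_sketch(1);
}

/* Runs checked_divide() under a setjmp() guard: returns 1 and stores the
   quotient on success, or returns 0 if a failure unwound the stack. */
static int try_divide(int a, int b, int *out) {
    if (setjmp(fail_target) == 0) {
        *out = checked_divide(a, b);
        return 1;
    }
    return 0;
}
```

So try_divide(10, 2, &q) stores 5 and returns 1, while try_divide(1, 0, &q) unwinds through fail_sketch() and returns 0, leaving q untouched.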

But...they added both an ornery warning and the _Noreturn feature...without giving code any way to detect that these had shown up. In fact, they haven't updated the VERSION stamp you can test at compile time since 2017:

Public Git Hosting - tinycc.git/log - VERSION

I worked around it just by suppressing that particular warning when we embed the compiler.

Did Bringing It Up To Date Raise Any Thoughts?

Well, one good thought is that Ren-C still does all its magic with nothing more than C99. That's really what building with TCC is about: it puts a bound on the complexity footprint required, rather than there being anything particularly important about a TCC build in and of itself.

Another good thought is that the language is really improving. The parse rules are better (passing an integer to SKIP instead of it being 0-arity, etc.). When I have to change something, I rarely find myself regretting the change.

Things are definitely getting too slow. This just happens naturally over time as you work on a design and don't do any performance work. It's around the right time to start looking at it.

Error reporting is definitely bad. If you're not me, there are a lot of errors you don't have much of a shot at understanding. And the only reason I can understand them is that I can set a breakpoint in the debugger, look around at the environment where the error is raised, and get some intuition about it. With TCC that's more painful, because it doesn't generate usable debug and symbol information (or at least not historically; I should check).

But I think what I do is probably a good model for the error introspection that should be available in usermode. When an error is about to be raised, we should plop you into the console right there, so you can inspect variables and figure it out yourself.
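One way such a hook could be wired up at the C level, as a purely hypothetical sketch (raise_hook, raise_error_sketch(), recording_hook(), and the rest are made-up names, not anything in Ren-C): the code that raises calls a registered hook before it longjmps away, while the raising frame's variables are still alive. Here the example hook just records what it saw; a real console would stop and let you poke at the state interactively:

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical hook type: receives the error message and a pointer into the
   raising frame's state, and is called while that frame is still live. */
typedef void (*raise_hook_fn)(const char *message, void *frame_state);

static raise_hook_fn raise_hook = NULL;  /* NULL means: just unwind */
static jmp_buf raise_target;             /* where raised errors unwind to */

/* Give the hook a look before longjmp() discards the raising frame. */
static void raise_error_sketch(const char *message, void *frame_state) {
    if (raise_hook != NULL)
        raise_hook(message, frame_state);
    longjmp(raise_target, 1);
}

/* Example hook that records what it saw, standing in for a console. */
static const char *seen_message = NULL;
static int seen_value = 0;

static void recording_hook(const char *message, void *frame_state) {
    assert(frame_state != NULL);
    seen_message = message;
    seen_value = *(int *)frame_state;  /* read the raising frame's local */
}

/* Raises with a local variable still in scope for the hook to inspect. */
static int parse_digit(char c) {
    int value = c - '0';
    if (value < 0 || value > 9)
        raise_error_sketch("not a digit", &value);
    return value;
}

/* Runs parse_digit() under a setjmp() guard: returns 1 and stores the digit
   on success, or returns 0 if an error was raised. */
static int try_parse_digit(char c, int *out) {
    if (setjmp(raise_target) == 0) {
        *out = parse_digit(c);
        return 1;
    }
    return 0;
}
```

The design point is just the ordering: inspection happens before the unwind, which is what a breakpoint in a C debugger gives you today.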

There's a long way to go here, but it's good to keep everything working.