libuv Integration Has Started...an I/O Renaissance

So I put together a build with libuv, and...

...It's Gone Extremely Well!

:flight_departure:

As a first step, I decided I'd try using libuv for all of our filesystem calls. They're based on the POSIX filesystem APIs, so with a not-impossible amount of work I was able to change our POSIX code over to libuv calls.

Once I did so, I could throw out the Windows-specific code...so all of %posix-windows.c could be deleted!

Of course, now we have to link in libuv. But there's a lot of good news on that front:

  • I was able to integrate libuv into our build without using any special build system. It was easy enough.

  • libuv has care and concern for weird platforms...more than we do! So FreeBSD/OpenBSD/NetBSD/BSDi, Haiku, etc. are all covered.

  • libuv is pure C and it builds with TCC...so the Rebol-built-with-TCC-and-libuv can still build itself and bootstrap!

  • I haven't done a formal study of the exact size impact, but it's in the hundreds-of-kilobytes range, not the tens-of-megabytes range. For what it offers it's light, and it seems they care about size.

While I Was Unifying The Code, I Made... TESTS!

There were basically no tests of the filesystem. Features like /SEEK came late in the game, and there were lots of bugs and design holes. No one knew how buggy it was because I think no one really used anything besides READ and WRITE of entire files!

Getting the semantics for ports hammered out is a tall order, and beyond the scope of these first steps with libuv integration. But to get things on the right foot, I've started with some tests!

https://github.com/metaeducation/ren-c/blob/master/tests/file/file-port.test.reb

I'll put in my usual tearful plea for others to try kicking the tires here... but, well. People are busy I guess. :cry:

Technique-wise, there are just some really cool things even right there in those tests. There's a fuzz tester which creates an adaptation of WRITE called FUZZWRITE. Whatever a write is asked to do to the file, FUZZWRITE also does to an in-memory buffer, so the file's contents can be checked against that buffer afterward.
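For anyone who doesn't read Rebol, here is a minimal sketch of that mirror-the-write idea in C. It is not the code from the linked tests: `Shadow`, `fuzz_write`, `run_fuzz`, and the 256-byte size cap are made up for illustration, and it uses plain stdio rather than libuv or Rebol ports.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 256  /* hypothetical cap on file size for this sketch */

/* In-memory model of what the file is expected to contain. */
typedef struct {
    unsigned char bytes[MAX_SIZE];
    size_t len;
} Shadow;

/* Apply the same write to the real file and to the shadow buffer. */
static void fuzz_write(FILE *f, Shadow *s, long offset,
                       const unsigned char *data, size_t n)
{
    assert((size_t)offset <= s->len);        /* never leave a hole */
    assert((size_t)offset + n <= MAX_SIZE);
    fseek(f, offset, SEEK_SET);
    fwrite(data, 1, n, f);
    memcpy(s->bytes + offset, data, n);
    if ((size_t)offset + n > s->len)
        s->len = (size_t)offset + n;
}

/* Read the whole file back and compare it with the shadow buffer. */
static int file_matches_shadow(FILE *f, const Shadow *s)
{
    unsigned char actual[MAX_SIZE + 1];
    fflush(f);
    fseek(f, 0, SEEK_SET);
    size_t got = fread(actual, 1, sizeof actual, f);
    return got == s->len && memcmp(actual, s->bytes, got) == 0;
}

/* Do `rounds` random writes at random offsets, checking after each one.
   Returns 1 if the file always matched the shadow buffer, else 0. */
int run_fuzz(unsigned seed, int rounds)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    Shadow s = { {0}, 0 };
    int ok = 1;
    srand(seed);
    for (int i = 0; i < rounds && ok; i++) {
        unsigned char data[16];
        size_t n = 1 + (size_t)rand() % sizeof data;
        for (size_t k = 0; k < n; k++)
            data[k] = (unsigned char)(rand() & 0xFF);
        long offset = (long)((size_t)rand() % (s.len + 1));
        if ((size_t)offset + n > MAX_SIZE)
            continue;  /* skip writes that would exceed the cap */
        fuzz_write(f, &s, offset, data, n);
        ok = file_matches_shadow(f, &s);
    }
    fclose(f);
    return ok;
}
```

Calling `run_fuzz(1u, 200)` performs 200 random seek-and-write operations and reports whether the file ever diverged from the buffer. The appeal of the approach is that the test needs no hand-written expected results: the buffer is the expectation.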

So One File Was Deleted...What Else Do We Get?

Deleting %posix-windows.c was actually a bigger deal than it sounds. Even though I'd whittled it way down from its Device-Model hairiness, what was left was still bad code. And I'm sort of glossing over the kind-of-Herculean level of pain tolerance it takes to dot the i's, cross the t's, and write the tests for the new code to replace the old buggy stuff.

But it's just the beginning. We're about to get a lot.

Many libuv functions are able to take a pointer to a function to call when an operation is done. For the filesystem functions (and a few others), if you pass in NULL for that callback, then the operation runs synchronously.

For my first task with the filesystem, I just passed in NULL to all the file-reading and file-writing routines for the callback. So we are doing the same blocking I/O as always.

Being able to do asynchronous file I/O isn't a priority right now. But if we need it, it's there.

The real benefit will be having vetted asynchronous network I/O. It will take the place of buggy garbage we had, that was nigh undebuggable. We should be able to write working timers and other interesting things.

Modify With Confidence: I/O Edition

I had said that a goal of Ren-C was that if there was something we thought up, the limit to doing it would be the limit of being able to articulate the design... not having murky code.

I'd claimed that point had been reached, and the new goal was just to "elevate the art" of the language.

But that conveniently overlooked the fact that the device layer and I/O were all still horrible murk. I'd kind of blocked it out of my mind, since I'd thrown it over the wall...and been able to make a WebAssembly build without it. It "didn't count".

Now the Windows/Mac/Linux/Haiku/etc. builds are back in the game, and The Design Handcuffs Are Off.

I'd actually said at one point that if we wanted Network I/O to be any good we should just take Node.js's platform I/O code. I didn't realize they'd actually made it a goal for that to be reusable and factored it out as a C library. I thought we'd have to rip out some C++ code and do the work ourselves. But libuv is that work already done, and we seem in good company as far as the Amish-oriented goals go!

Anyway, change is coming. So for starters...if you have ideas or complaints about files...now would be the time to speak up! The real turning point will be on improved networking, so stay tuned.

7 Likes

Sounds fantastic. Well done!

1 Like

Bravo! @hostilefork is a legend!
Is Ren-C still a rebol if virtually every part of it has been replaced?

2 Likes

I've now taken the next big step in terms of implementation, in changing over the networking TCP code to use libuv. That covers listening sockets for servers, as well as streaming data for reading and writing.

There's no new asynchronousness, but the custom/buggy/bad ways based on "devices" are gone...as well as my custom/buggy/bad temporary replacement to help migrate to libuv.

So it's one of those things where if this step goes well, you shouldn't be able to tell a difference. (It's fortunate that I'd written some stress tests for the httpd server, because that catches a lot of issues.)

We're Still Using Usermode HTTP/TLS Code

An interesting thing about libuv is that it doesn't have any built in TLS integration:

https://github.com/libuv/libuv/issues/1128

This means our usermode TLS code--built on the cryptographic primitives of mbedTLS--is actually a kind of interesting option.

The TLS code itself is sort of coherent in terms of being an "executable spec". If the dialects used continue to converge with the RFC, it might have the bones to become one of the most understandable codebases for TLS there is. Maybe. :man_shrugging:

But I say "interesting" option (as opposed to "practical") because we should remember that the usermode TLS is slow, incomplete, on the fringe of being maintained at all, and known to be insecure. As I've mentioned previously, it extracts certificates and decrypts the data without actually verifying against a certificate authority. A so-called man-in-the-middle attack could be done by someone who impersonated the other side of the connection. It's not exactly zero security...your average person watching packets go by in Wireshark wouldn't be able to read plaintext passwords.

Still...it's a bit of a novelty people might find thought-provoking, that could grow into something better in the future. And my point is that there's no free lunch with libuv and TLS. If we weren't using the usermode Rebol TLS, then we'd have to use the higher-level facilities in mbedTLS that take care of that (right now we only build in the lower-level cryptography that the usermode TLS uses).

Big Questions About The Event Loop Loom

The networking code is the first case of using the libuv callbacks and delegating to the libuv event loop. So it's a much bigger deal than the synchronous filesystem API substitutions.

Right now the only time we call the event loop is in WAIT, and it's still using some ad hoc timer logic instead of libuv timers. As I say--it's a first step. It's a long way to the point where you could have an interactive chat program feeding lines at the unix prompt coming from the internet while you're idle at the keyboard.

Regardless of the specifics...I'm proposing a grand vision of scrapping the existing attempts at asynchronousness to replace with green threads and "channels", as in Go. Event dispatch would be very different in that world...and libuv might not even be the best choice. Maybe everything should just be rewritten in Go. :man_facepalming:

Ignoring all that for now... at least we're building on a platform abstraction layer that other people develop, maintain, and document. That fights much more than half the battle in the event that you decide you want to pick another platform abstraction layer.

Anyway I just wanted to give an update that libuv networking is in. Report anything broken, at least that wasn't already broken before...

2 Likes

I've gotten rid of the "EVENT Extension"...instead moving WAIT into the libuv-based Network extension. This is because WAIT on PORT! is now only used for waiting for incoming network connections on a server (and waiting for a time, but you can use SLEEP from the TIME extension if you don't want your build to include libuv networking.)

While doing so, I converted the event loop inside that WAIT to use libuv timers...instead of the decades-old tangled multi-platform timing logic.

This means it's finally the case that the language core itself doesn't need the Reb_Device datatype, or functions like Register_Device() or OS_Poll_Devices().

And So, The Last Bit Of "Host-Kit" is Thus Gone!

...yet Ren-C is demonstrating itself doing more, on more platforms. What's the difference in approach?

R3-Alpha aimed to be closed-source, hardcoding the implementations of things like WAIT and READ and WRITE, as a fixed body of natives. These attempted to be extensible via the means of hookpoints that would be supplied as C code, with a grab-bag of structures and parameters to each function. This was supposed to avoid use of Rebol datatypes, with the concept that the functionality could be used as its own independent OS.

Ren-C basically throws that out the window. When you are packaging up a distribution of the language, you write your own natives...and any "extensibility" architectures are done through Rebol calls to those natives.

As an example: there is no fixed implementation of NOW built on top of a Rebol-defined "GetTime()" C API. There are various implementations of NOW, and NOW itself is the Ren-C "get-time API". If another extension wants the current time, it's supposed to use NOW to get it.

The implementations of READ of a URL vary so drastically between the WebAssembly build and the Desktop builds that, in the WebAssembly build, URL reads are intercepted much earlier and done via JavaScript fetch(). You really wouldn't want to convolute it so that network reads had to contort themselves in some way to fulfill an abstract C byte-level API. Instead, the Wasm build calls "Rebol functions" that are actually the JavaScript equivalent of a native.

Anyway, the desktop builds are still a lower priority than Wasm, but it's nice to see the last bit of hostkit cruft finally gone from the core.

1 Like

Bravo !! Great to see sawdust piling up on the shop floor again !

1 Like