Transitioning from mbedTLS 2 to mbedTLS 3: Promise and Peril

hostilefork · May 10, 2022, 2:47pm

Historically R3-Alpha had its few pieces of crypto math cobbled together from generally unknown sources on the Internet...

- My understanding is that Rebol2's support for talking to a limited set of HTTPS sites was written entirely in C. It's never been open-sourced, so we don't know much about it--such as whether it was original code or done with some early TLS library of the time.
- Saphirion chose to split out the code for the Transport-Layer-Security protocol and make it usermode Rebol. Only the foundational cryptographic primitives like Secure Hashes or Key Exchange were written in C.
  - Many of R3-Alpha's C cryptography bits seemed to come from the (one-man?) effort known as Axolotl TLS (AxTLS)
  - The Rebol parts were written by Cyphre (Richard Smolak)

I Was Initially Very Skeptical of Continuing Saphirion's Strategy...

If a change ever rippled into affecting the TLS file, it was a voodoo nightmare to figure out how to fix it. I didn't understand why limited efforts should be stretched into involvement with "something the language wasn't really good for".

My impression was also that the %prot-tls.r implementation was bad. But when I got to looking at the details, the most insidious problems weren't so much the fault of the protocol code. It more-or-less followed the spec, in a pretty literate way (that I improved with some dialecting).

The main frustrations regarding prot-tls came from the fact that R3-Alpha's asynchronous port model made no sense. When I rewrote it to use the "seemingly-synchronous" model (which aims to parallel the Go language), it became more clear.

Through the process of implementing TLS 1.2, I began to get the impression that such protocols may actually be a fitting domain for a language like Rebol.

Today's %prot-tls.r is an aggressive and practical test of dialecting. If it continues to be enhanced, it may become a case of exposing the workings of an important protocol to the layperson.

We Needed More Cryptography, and mbedTLS Fit the Bill

Adding TLS 1.2 wasn't going to do any good without also providing some of the newer exotic cipher suites that are demanded these days. That meant getting things like elliptic curve key exchange, or SHA512, or anything else the future may demand.

When I found mbedTLS it was much "cleaner" than OpenSSL, and seemed perfect:

- It was targeting embedded processors, with incredibly granular controls for doing things like using smaller/slower algorithms vs. bigger/faster ones.
  - Pure C code, that could be compiled even by TCC.
  - This meant the conference demo of bootstrap could still work, with a TCC-built R3-WITH-TCC having enough cryptography in it to download its own source from an HTTPS GitHub link.
- The cryptography primitives could be lifted out "a la carte" from the C-based TLS protocol code; the files seemed completely separate:
  - If we wanted to, we could have a C-based mbedTLS extension option instead of using %prot-tls.r, and it could reuse the same cryptography.
  - (We may at some point have to resort to this, if keeping %prot-tls.r up to date with the times proves impractical.)
- The interfaces for every cipher and hash supported streaming, so we'd have the ability to incrementally do cryptography on large files or network connections (assuming we figured out how to expose that).
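
To make the streaming point concrete, here is a minimal sketch of the init/update/finish shape that such interfaces tend to have. Everything named `toy_hash_*` is hypothetical, and FNV-1a is only a stand-in for a real digest (it is not cryptographically secure). The point is just that feeding data in pieces gives the same answer as hashing it all at once:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical streaming-hash context. FNV-1a is a stand-in here,
   NOT a secure hash; a real digest would carry more state. */
typedef struct {
    uint32_t state;
} toy_hash_ctx;

void toy_hash_init(toy_hash_ctx *ctx) {
    assert(ctx != NULL);
    ctx->state = 2166136261u;  /* FNV-1a 32-bit offset basis */
}

/* Absorb another chunk; can be called any number of times. */
void toy_hash_update(toy_hash_ctx *ctx, const unsigned char *data, size_t len) {
    assert(ctx != NULL && (data != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= data[i];
        ctx->state *= 16777619u;  /* FNV 32-bit prime */
    }
}

/* Produce the result for everything absorbed so far. */
uint32_t toy_hash_finish(const toy_hash_ctx *ctx) {
    assert(ctx != NULL);
    return ctx->state;
}

/* One-shot convenience built on top of the streaming calls. */
uint32_t toy_hash(const unsigned char *data, size_t len) {
    toy_hash_ctx ctx;
    toy_hash_init(&ctx);
    toy_hash_update(&ctx, data, len);
    return toy_hash_finish(&ctx);
}
```

A large file or a network connection would simply call `toy_hash_update` once per buffer read, never holding the whole input in memory.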

And critically, all of it was under the umbrella of a working group at ARM which would hopefully ensure that it was kept up to date, and being vetted for problems.

All of it made this seem like a no-brainer to build on, which I did in April of 2020:

The mbedTLS library is an embedded-focused set of cryptography hashes,
key exchanges, block ciphers, and other tools. Its components range
from lower-level facilities like BigNum arithmetic, to higher-level
services like TLS negotiation and certificate validation. Its
facilities are well-factored such that each piece can be used with only
its dependencies:

https://tls.mbed.org/

Because of its fine-grained control, it's possible to use its basic
tools while still keeping higher-level negotiations as spec-driven
usermode Rebol (e.g. the TLS protocol itself) to facilitate more
hooking and understanding. And because it offers a consistent set of
vetted and active code, it can replace the "hodgepodge" of cut-and-paste
snippets for cryptography (originating from axTLS, internet sources,
custom code, edited OpenSSL, etc.) where there are problems like not
being written to a common BigNum implementation.

Additionally--due to the factoring, it is hoped that this code could be
used as the basis for implementing BigNum arithmetic in the interpreter
core itself...which would be naturally reused in the implementation
of these C algorithms when cryptographic extensions are loaded.

Possibly inspired by this--or just coming to the same conclusions on his own--Oldes changed his hashes to use mbedTLS in January 2021.

...and Then, Version 3.0 Came...

I probably should have been paying more attention to what mbedTLS was planning in their future branches.

What I've gathered is that ARM (or someone) was pointing out that mbedTLS not only needed to implement TLS 1.3, but that it wasn't sufficiently fast vs. the competition.

A somewhat-sensible approach to optimization is to first tighten the control over your data structures, making them more opaque to clients. By limiting the APIs you can use to access those structures, you can know more about the states they are in...and take more for granted. Your functions can then make optimizations which leverage these rules--adding or rearranging fields in more clever ways.

But I didn't want cleverness, I just wanted the math. I liked that our objects for things like Diffie-Hellman showed you the true cryptographic parameters, and weren't some kind of "black box". If we were closed off from that, everything would be a HANDLE! and you would have limited ways of extracting parameters from it.

Not only did they close off access to the structure members, many APIs they offered were TLS-specific!

- If a cryptographic primitive depended on parameters X and Y and produced Z, they'd offer a function that takes in a blob of data representing X and Y in the specific format that TLS messages encode them.
- If you were building some protocol that wasn't TLS using that basic crypto primitive, the only way offered to load the parameters was to make a TLS-format message buffer and pass it.
- The TLS-specific functions were creeping into what were supposed to be the "a la carte" cryptography files, adding bloat at compile time (if not also runtime) if you weren't using them.
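
To illustrate what "make a TLS-format message buffer and pass it" involves, here is a small sketch using Diffie-Hellman as the example. It assumes the TLS 1.2 `ServerDHParams` layout, in which the prime, the generator, and the server's public value each travel as a two-byte big-endian length followed by the raw bytes. The helper names are hypothetical, and the parser that would consume the buffer is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one value as a TLS-style vector: 2-byte big-endian length,
   then the bytes. Returns the new write offset, or 0 if it won't fit. */
static size_t put_vec16(unsigned char *out, size_t cap, size_t off,
                        const unsigned char *val, size_t len) {
    if (len == 0 || len > 0xFFFF || off > cap || cap - off < len + 2)
        return 0;
    out[off] = (unsigned char)(len >> 8);
    out[off + 1] = (unsigned char)(len & 0xFF);
    memcpy(out + off + 2, val, len);
    return off + 2 + len;
}

/* Hypothetical helper: wrap raw P, G and the peer's public value in the
   TLS wire layout, purely so a TLS-oriented parser will accept them.
   Returns the number of bytes written, or 0 on failure. */
size_t toy_pack_dh_params(unsigned char *out, size_t cap,
                          const unsigned char *p, size_t p_len,
                          const unsigned char *g, size_t g_len,
                          const unsigned char *ys, size_t ys_len) {
    assert(out != NULL && p != NULL && g != NULL && ys != NULL);
    size_t off = put_vec16(out, cap, 0, p, p_len);
    if (off == 0) return 0;
    off = put_vec16(out, cap, off, g, g_len);
    if (off == 0) return 0;
    return put_vec16(out, cap, off, ys, ys_len);
}
```

A non-TLS protocol that already has P, G and the public value as plain byte strings ends up serializing them like this only to have the library immediately parse them back apart.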

One can imagine that this seems good from the point of view of speeding up their C TLS protocol, but bad for anyone trying to use the underlying cryptography.

There was an announcement I missed that it would be split into two libraries: "mbed crypto" and "mbedTLS", to serve the two different audiences for the code. But that seemed to be short-lived, and "mbed crypto" was reabsorbed into the mbedTLS codebase...though not before the damage had been done to the layering and generality.

So now mbedTLS seems to be playing catch-up on serving the audience that wanted the crypto primitives. But things that had an endorsed way of being done in mbedTLS 2 now require hacking beneath the approved API to accomplish.

On the bright side...

Since the re-absorption of "mbed crypto", mbedTLS seems amenable to having answers for the a-la-carte crypto crowd. A request I made is at least marked medium importance.

Also, the API becoming more formalized is pointing out some weird mistakes that were made before in the code...filling structure parameters that were unused, for instance. Having to call each of these things into question is a good vetting of the code.

And although I had to use hacks to do it, we now can run HTTPS on top of mbedTLS 3. I've been a little on the fence about whether to stick with mbedTLS 2 (support ending in 2024) or find some other library. But writing about cryptography has made me realize it's a bit of a red herring on the importance scale, and I think we're better off rolling with the punches of mbedTLS 3 than going some other route.