Dissecting the TLS EMIT Dialect

hostilefork · February 10, 2021, 10:56am

As we question what the rules are for evaluator state between steps, it's instructive to look at dialects that make use of stepwise evaluation.

One such dialect that comes to mind is the EMIT dialect I designed to try and make the TLS protocol code leverage the language better. It's used for constructing BINARY! packets by appending material together, where SET-WORD!s are made into variables holding positions in that binary packet based on where they occur:

emit ctx [
  TLSPlaintext: ; https://tools.ietf.org/html/rfc5246#section-6.2.1
    #{16}                       ; protocol type (22=Handshake)
    min-ver-bytes               ; TLS version for ClientHello (min? max?)
  fragment-length:
    #{00 00}                    ; length of handshake data (updated after)
]

One thing the labels provide is documentation of what's at each position. So even if they're not used, they are helpful.

But the positions also can be used for reading and writing the data. Here we see some information being assembled at the start of the packet, which has a two-byte "fragment length". We don't yet know what this length is, so it's filled in with zeros and patched later:

change fragment-length enbin [be + 2] (length of Handshake)
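
For readers who don't know the Ren-C idioms, here is a rough Python model of the same "remember a position, patch it later" idea. It is only a sketch: the version bytes and the handshake payload are made-up stand-ins, and it assumes `enbin [be + 2]` means "encode as a two-byte unsigned big-endian integer".

```python
# Python model (not Ren-C): record a position while appending, patch it later.
msg = bytearray()

msg += bytes([0x16])             # protocol type (22 = Handshake)
msg += bytes([0x03, 0x03])       # hypothetical version bytes, for illustration only
fragment_length = len(msg)       # plays the role of the `fragment-length:` label
msg += bytes([0x00, 0x00])       # two-byte length, not known yet

handshake = bytes(range(5))      # stand-in for the real handshake data
msg += handshake

# Overwrite the placeholder with the payload length (2 bytes, big-endian, unsigned)
msg[fragment_length:fragment_length + 2] = len(handshake).to_bytes(2, "big")
```

The label is nothing more than an offset into the buffer, which is why it can be used both to read and to overwrite what was emitted there.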

Note That This Is A Poster Child For Making EMIT "LET-like"

The position variables are not aggregated in some kind of context or a map. If they were, you would have to write it as:

labels: emit ctx [
  TLSPlaintext: ; https://tools.ietf.org/html/rfc5246#section-6.2.1
    #{16}                       ; protocol type (22=Handshake)
    min-ver-bytes               ; TLS version for ClientHello (min? max?)
  fragment-length:
    #{00 00}                    ; length of handshake data (updated after)
]
...
change labels.fragment-length enbin [be + 2] (length of Handshake)

I think that not having to be explicit is somewhat critical to the point. So the EMIT abstraction benefits from being "like LET" in that it generates variable bindings you can use in the ensuing code.

Dissecting the Implementation

The current EMIT is very comprehensible. It does stepwise evaluation of the block:

At the beginning of each step it looks to see if the value right in front of it is a SET-WORD!. If it is, then it sets the value of that word to the marked position in the binary so far...and skips the SET-WORD! so it's not part of the evaluation.
For non-SET-WORD!s a single evaluation step is done, and the BINARY! result of it is added to the buffer.

This is rigged in such a way that it even supports comments and invisibles, such as ELIDE, ASSERT, and debug dump routines.

/emit: func [
    "Emits binary data, optionally marking positions with SET-WORD!"

    return: [~]
    ctx [object!]
    code [block! binary!]
][
    if binary? code [
        append ctx.msg code
        return ~
    ]

    while [code] [
        if set-word? code.1 [  ; set the word to the binary at current position
            add-let-binding (binding of $return) to word! code.1 (tail ctx.msg)
            code: my next
        ]
        else [
            let result
            if [code :result]: evaluate:step code [
                if void? :result [continue]  ; invisible
                append ctx.msg ensure binary! :result
            ]
        ]
    ]
]
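
The control flow of that loop can be modeled in Python, with heavy caveats. This is a sketch and not a translation: `Label` is a hypothetical stand-in for a SET-WORD!, a zero-argument callable stands in for "one evaluation step", and `None` stands in for an invisible result. Python also can't add bindings to the caller's scope, so the positions come back in a dictionary, which is exactly the explicitness the real EMIT avoids.

```python
from typing import Callable, Dict, List, Optional, Union


class Label:
    """Hypothetical stand-in for a SET-WORD! such as `fragment-length:`."""

    def __init__(self, name: str) -> None:
        self.name = name


Step = Union[Label, bytes, Callable[[], Optional[bytes]]]


def emit(msg: bytearray, steps: List[Step]) -> Dict[str, int]:
    positions: Dict[str, int] = {}
    for step in steps:
        if isinstance(step, Label):
            positions[step.name] = len(msg)  # mark the current tail, emit nothing
            continue
        result = step() if callable(step) else step
        if result is None:  # "invisible" step: contributes no bytes
            continue
        if not isinstance(result, (bytes, bytearray)):
            raise TypeError("every visible step must produce binary data")
        msg += result
    return positions


msg = bytearray()
pos = emit(msg, [
    Label("TLSPlaintext"),
    b"\x16",
    lambda: None,          # plays the role of ELIDE or ASSERT
    b"\x03\x03",           # made-up version bytes
    Label("fragment-length"),
    b"\x00\x00",
])
```

One thing the model can't show is the stepwise part: the Python list already separates the steps, whereas in the real dialect only the evaluator knows where one step ends and the next begins.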

Observation: Only Sees SET-WORD!s At Start Of Step

It only recognizes SET-WORD!s at the start of evaluation steps. Any SET-WORD! that appears in the middle of a step, such as in a function's argument, won't count.

So imagine if min-ver-bytes took a parameter, e.g. min-ver-bytes: func [version]

emit ctx [
  TLSPlaintext:
    #{16}
    min-ver-bytes version: 1.2
  fragment-length:
    #{00 00}
]

Here you wouldn't wind up with VERSION being set to a BINARY! position. It would just be a normal assignment in the evaluator, and VERSION would get 1.2

Curiously, LET wouldn't have this problem...because LET is invisible when followed by a SET-WORD!. The set-word just gets a new binding. But using LET for this wouldn't give you a definition outside of the block, only inside it:

emit ctx [
  let TLSPlaintext:  ; only defined to end of block
    #{16}
    min-ver-bytes
  let fragment-length:  ; only defined to end of block
    #{00 00}
]
...
change fragment-length enbin [be + 2] (length of Handshake)  ; error

This is why EMIT has to take over the duty of LET, adding the new bindings to the frame itself. (It could choose to do so only if you say LET...thus allowing plain SET-WORD! to be just a comment...but it would not be running LET, just recognizing the word to help document that a new declaration was being created.)

Every Step Must Produce A BINARY!...or be invisible

The concept behind this is that every step produces a BINARY!, with the nice exception of invisible evaluations, which contribute nothing to the buffer:

emit ctx [
  let TLSPlaintext:
    #{16}
    elide prep-for-min-ver-bytes arg
    min-ver-bytes
  let fragment-length:
    #{00 00}
]

Historical Rebol could have let you get the effect of invisibles by using a GROUP!. You'd just have to put the things you wanted to throw away at the beginning of it:

emit ctx [
  TLSPlaintext:
    #{16}
    (prep-for-min-ver-bytes arg, min-ver-bytes)  ; pretend they had comma
  fragment-length:
    #{00 00}
]

But this forces your dialect to sacrifice potential other uses for GROUP!s.

I think the general case of stepwise evaluation that tries to mix in its own behavior should want to gracefully handle invisibles.

I Think TLS EMIT is a Very Good Example

This is a useful thing. While it would be ridiculous to bet your internet security on this particular TLS codebase, the methodology of this dialect shows promise for how to deal with other real problems.

We do see that this case would work under the feed-based interface protocol of a variadic. That is to say: if what EVALUATE did was spin up a FRAME! and restrict you from going back in time (forcing you to look ahead one unit at a time and either consume or evaluate it), then it would still be fine.
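
To make the "no going back in time" restriction concrete, here is a small Python model of a forward-only feed and an EMIT-like loop written against it. Everything here is hypothetical naming: `Feed`, `peek`, `take`, and the convention that a string ending in a colon plays the role of a SET-WORD!. Taking an item off the feed stands in for "evaluate one step".

```python
class Feed:
    """Forward-only view of a sequence: look one item ahead, or take it. No rewinding."""

    _END = object()

    def __init__(self, items) -> None:
        self._it = iter(items)
        self._ahead = next(self._it, Feed._END)

    def at_end(self) -> bool:
        return self._ahead is Feed._END

    def peek(self):
        if self.at_end():
            raise IndexError("feed is exhausted")
        return self._ahead

    def take(self):
        item = self.peek()
        self._ahead = next(self._it, Feed._END)
        return item


def emit_from_feed(msg: bytearray, feed: Feed) -> dict:
    positions = {}
    while not feed.at_end():
        ahead = feed.peek()
        if isinstance(ahead, str) and ahead.endswith(":"):
            positions[feed.take()[:-1]] = len(msg)  # consume the label ourselves
        else:
            msg += feed.take()                      # stand-in for "evaluate one step"
    return positions


msg = bytearray()
pos = emit_from_feed(msg, Feed(["a:", b"\x01", "b:", b"\x02\x03"]))
```

The loop never needs an index into the original sequence, only the one-item lookahead, which is the property the paragraph above is pointing at.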

I think you will notice that pattern holds in most "evaluator-compatible" dialects, meaning those that do not isolate their evaluation portions into GROUP!s or BLOCK!s. If it holds up, it suggests that allowing the evaluator to accrue state that is not purely representable in terms of a block position is reasonable.

hostilefork · February 10, 2021, 1:47pm

Oldes looked at the TLS 1.2 code I wrote, so he saw the EMIT dialect and followed some of the idea. He seemed to get a pretty compact implementation in his rewrite, and I'd meant to look into it but hadn't until now.

So since I'm writing a post on this dialect concept, here is his spin... "bincode".

It's a native function in C, named BINARY.

The dialect alternates size and signedness indicators with values. He doesn't use any labels unless he's actually going to read or write them. And he does a lot of the math with "magic numbers" vs. explaining the source of those numbers. So most examples don't have labels, but here's one that does:

binary/write out [
    UI8  22          ; protocol type (22=Handshake)
    UI16 :version    ; protocol version
  pos-record-len:
    UI16 0           ; length of the (following) record data
  pos-record:
    UI8  16          ; protocol message type (16=ClientKeyExchange)
  pos-message:
    UI24 0           ; protocol message length
  pos-key: 
]

The dialect doesn't use the evaluator unless you put in a GROUP!. So you couldn't say UI8 20 + 2 above in place of UI8 22. You would have to say UI8 (20 + 2)
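
As a point of comparison, the alternating type/value layout is easy to model without any evaluator at all. The Python sketch below is not how bincode is implemented: the `write` function, the token conventions, and the version value are all hypothetical, and big-endian byte order is an assumption made for the illustration.

```python
WIDTHS = {"UI8": 1, "UI16": 2, "UI24": 3}  # byte widths; big-endian is assumed here


def write(out: bytearray, spec: list) -> dict:
    """Walk the spec: `name:` tokens record positions, type tokens consume one value."""
    positions = {}
    i = 0
    while i < len(spec):
        token = spec[i]
        if token.endswith(":"):
            positions[token[:-1]] = len(out)
            i += 1
        else:
            value = spec[i + 1]  # must already be a plain integer
            out += value.to_bytes(WIDTHS[token], "big")
            i += 2
    return positions


out = bytearray()
pos = write(out, [
    "UI8", 22,            # protocol type (22=Handshake)
    "UI16", 0x0303,       # made-up protocol version
    "pos-record-len:",
    "UI16", 0,            # length of the (following) record data
    "pos-record:",
    "UI8", 16,            # protocol message type
    "pos-message:",
    "UI24", 0,            # protocol message length
    "pos-key:",
])
```

Because each type token takes exactly one following value, the walker always knows where the next item starts. That is the trade: no expression like `20 + 2` can appear inline, but no stepwise evaluation is needed either.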

While there are benefits and drawbacks to both general approaches, his approach hinges on a lot of native code (see %u-bincode.c), and that's just not going to be the solution for the average DSL author. (Note on the C code: I think I truly can say that he is carrying the torch for the style in which it was originally written. He has my blessing as being the rightful maintainer of R3-Alpha.)

But the key point to the discussion is that he didn't try the same thing of mixing evaluations...the code is quarantined in GROUP!s.