ENHEX and DEHEX testing, another "Micro-Dialect"

Here's a little excerpt of testing Percent Encoding (which Rebol gives the poor names ENHEX and DEHEX; they should probably be changed).

It started out just as a table of encoded and decoded forms, e.g.

for-each [encoded decoded] [
    "a%20b" "a b"
    "a%25b" "a%b"
    "a%ce%b2c" "aβc"
    ...
][
   ; test that it decodes
]

But the encoding produces uppercase hex digits (per RFC 3986), while the decoding tolerates lowercase ones. So you don't get the same thing back.
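This asymmetry isn't unique to Rebol. As a quick illustration (in Python, using the standard urllib.parse module rather than ENHEX/DEHEX), decoding accepts lowercase hex but encoding canonizes to uppercase, so a naive two-column table of encoded/decoded pairs can't round-trip:

```python
from urllib.parse import quote, unquote

# Decoding tolerates lowercase hex digits...
assert unquote("a%ce%b2c") == "aβc"

# ...but encoding canonizes to uppercase (per RFC 3986),
# so decode-then-encode is not the identity on the encoded form:
assert quote("aβc") == "a%CE%B2c"  # not "a%ce%b2c"

# Only already-canonical encoded forms round-trip both ways:
assert quote(unquote("a%25b")) == "a%25b"
```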

That led me to the "whimsical" choice to denote reversible <-> and non-reversible -> transforms:

"a%25b" <-> "a%b"
"a%ce%b2c" -> "aβc" -> "a%CE%B2c"

This is strange in that -> is a WORD!, while <-> is a TAG! (and I believe this is the correct design choice when all is said and done).

But in a dialect, having something look like what you want can be enough, as it's just being looked for literally.

I threw in an additional wrinkle by letting BLOCK! with an INTEGER! in it serve as a comment. It looks visually better than having to throw in a semicolon.

It's satisfying when such things can be done in a matter of a few minutes:

; 1. Accept lowercase, but canonize to uppercase, per RFC 3986 2.1
;
; 2. A case can be made for considering the encoding of characters that
;    don't need it to be an error by default.
;
parse compose [
    "a%20b" <-> "a b"
    "a%25b" <-> "a%b"
    "a%ce%b2c" -> "aβc" -> "a%CE%B2c"  [1]
    "%2b%2b" -> "++" -> "++"  [2]
    "a%2Bb" -> "a+b" -> "a+b"  [2]
    "a%62c" -> "abc" -> "abc"  [2]
    "a%CE%B2c" <-> "aβc"
    (as text! #{2F666F726D3F763D254335253939}) -> "/form?v=ř"
][ some [
    let encoded: text!
    let arrow: ['<-> | '->]
    let decoded: text!
    let re-encoded: [when (arrow = '->) ['-> text!] | (encoded)]
    optional block!  ; headnote comment
    (
        let de: dehex encoded
        if de != decoded [
            fail ["Decode of" @encoded "gave" @de "expected" @decoded]
        ]
        let en: enhex decoded
        if en != re-encoded [
            fail ["Encode of" @decoded "gave" @en "expected" @re-encoded]
        ]
    )
]]
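For comparison, the shape of that table-driven round-trip test can be sketched in a general-purpose language. Here's a rough Python analogue (not the dialect itself) where a tuple's third slot plays the role of the second arrow, with None standing in for the reversible <-> case; the "+" rows are omitted because Python's quote makes a different choice about encoding "+":

```python
from urllib.parse import quote, unquote

# (encoded, decoded, re_encoded); re_encoded of None means the
# transform is reversible, i.e. re-encoding gives back `encoded`
CASES = [
    ("a%20b", "a b", None),
    ("a%25b", "a%b", None),
    ("a%ce%b2c", "aβc", "a%CE%B2c"),  # lowercase input canonized
    ("a%62c", "abc", "abc"),          # unnecessary encoding dropped
    ("a%CE%B2c", "aβc", None),
]

for encoded, decoded, re_encoded in CASES:
    expected = re_encoded if re_encoded is not None else encoded
    de = unquote(encoded)
    assert de == decoded, f"decode of {encoded!r} gave {de!r}"
    en = quote(decoded)
    assert en == expected, f"encode of {decoded!r} gave {en!r}"
```

What the analogue makes plain is how much heavier the "host language" version is: the dialect lets the arrows *be* the notation, rather than encoding it in tuple positions.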

Sometimes it's the small examples that I think drive home what the project is about most clearly. And this is pretty darn close to the essential complexity of the problem being addressed.

So I thought maybe it would be good to look at the impact of a decision like arity-2 COMPOSE here, just to see it "in context".

parse compose $() [
    "a%20b" <-> "a b"
    "a%25b" <-> "a%b"
    "a%ce%b2c" -> "aβc" -> "a%CE%B2c"  [1]
    "%2b%2b" -> "++" -> "++"  [2]
    "a%2Bb" -> "a+b" -> "a+b"  [2]
    "a%62c" -> "abc" -> "abc"  [2]
    "a%CE%B2c" <-> "aβc"
    (as text! #{2F666F726D3F763D254335253939}) -> "/form?v=ř"
][ some [
    let encoded: text!
    let arrow: ['<-> | '->]
    let decoded: text!
    let re-encoded: [when (arrow = '->) ['-> text!] | (encoded)]
    optional block!  ; headnote comment
    (
        let de: dehex encoded
        if de != decoded [
            fail ["Decode of" @encoded "gave" @de "expected" @decoded]
        ]
        let en: enhex decoded
        if en != re-encoded [
            fail ["Encode of" @decoded "gave" @en "expected" @re-encoded]
        ]
    )
]]

So the question I might ask is: comparing it with fresh eyes... what does that adjustment feel like?

Does it feel like a "wart" appeared? Does it feel like a "missing parameter" showed up?

I'll stress again that COMPOSE on a string can't work unless it either takes this parameter to draw a binding from, or unless COMPOSE "sneakily" captures the binding environment in which it is executing.

I've suggested the name compose* for the sneaky form, so let's look at that as well:

parse compose* [
    "a%20b" <-> "a b"
    "a%25b" <-> "a%b"
    "a%ce%b2c" -> "aβc" -> "a%CE%B2c"  [1]
    "%2b%2b" -> "++" -> "++"  [2]
    "a%2Bb" -> "a+b" -> "a+b"  [2]
    "a%62c" -> "abc" -> "abc"  [2]
    "a%CE%B2c" <-> "aβc"
    (as text! #{2F666F726D3F763D254335253939}) -> "/form?v=ř"
][ some [
    let encoded: text!
    let arrow: ['<-> | '->]
    let decoded: text!
    let re-encoded: [when (arrow = '->) ['-> text!] | (encoded)]
    optional block!  ; headnote comment
    (
        let de: dehex encoded
        if de != decoded [
            fail ["Decode of" @encoded "gave" @de "expected" @decoded]
        ]
        let en: enhex decoded
        if en != re-encoded [
            fail ["Encode of" @decoded "gave" @en "expected" @re-encoded]
        ]
    )
]]

To me, this is more unsettling than the $(). It seems so much easier to explain that if you'd said ${<?>} instead, then it would have looked for FENCE!s beginning with <?>.

(I'll intentionally paste it all out again here, because I'm trying to make a point.)

parse compose ${<?>} [
    "a%20b" <-> "a b"
    "a%25b" <-> "a%b"
    "a%ce%b2c" -> "aβc" -> "a%CE%B2c"  [1]
    "%2b%2b" -> "++" -> "++"  [2]
    "a%2Bb" -> "a+b" -> "a+b"  [2]
    "a%62c" -> "abc" -> "abc"  [2]
    "a%CE%B2c" <-> "aβc"
    {<?> as text! #{2F666F726D3F763D254335253939}} -> "/form?v=ř"
][ some [
    let encoded: text!
    let arrow: ['<-> | '->]
    let decoded: text!
    let re-encoded: [when (arrow = '->) ['-> text!] | (encoded)]
    optional block!  ; headnote comment
    (
        let de: dehex encoded
        if de != decoded [
            fail ["Decode of" @encoded "gave" @de "expected" @decoded]
        ]
        let en: enhex decoded
        if en != re-encoded [
            fail ["Encode of" @decoded "gave" @en "expected" @re-encoded]
        ]
    )
]]

Anyway, I'm very reluctant to make plain COMPOSE capture its calling environment. And if I'm not willing to do that, the options are that it take the binding from its template argument (which won't work for strings, and is a bit of a weird default behavior for lists), or that it be arity-2.

I feel like arity-2 is winning, for me, so far.