USCII Seen With New Eyes

hostilefork · September 12, 2021, 1:15pm

In 2008 I used R3-Alpha for USCII, to make 5x7 bitmaps out of a stock font I found, plus data to draw symbols for the control characters the font was missing. I defined those missing characters with blocks of data, like:

[
    code: 10
    name: "Line Feed"
    abbr: "LF"
    description: {
        On typewriters, printers, and some terminal emulators, moves the
        cursor down one row without affecting its column position. On Unix,
        used to mark end-of-line.
    }
    image: [
        "XXX  "
        "  X  "
        "  X  "
        "  X  "
        "XXXXX"
        " XXX "
        "  X  "
    ]
    notes: {
        A spin on the carriage return which emphasizes the "downness" of
        a feed, but also with a horizontal suggestion of the current line
    }
    rating: 'good
]

Besides not needing commas between the strings, this doesn't buy a whole lot over JSON. The script had a couple of talking points--that PNG encoding and IMAGE! were built in, and that you could put BINARY! data directly in your script as hex. But overall it was unremarkable.
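To make the JSON comparison concrete, here is roughly the same Line Feed record run through Python's standard `json` module. Python is used only for illustration, and folding the brace-delimited text into plain strings with `\n` line breaks is an assumption about how one would port it.

```python
import json

# The Line Feed record from the block above, as a Python dict.
# Multi-line text is folded into strings with explicit "\n" breaks.
record = {
    "code": 10,
    "name": "Line Feed",
    "abbr": "LF",
    "description": (
        "On typewriters, printers, and some terminal emulators, moves the\n"
        "cursor down one row without affecting its column position. On Unix,\n"
        "used to mark end-of-line."
    ),
    "image": ["XXX  ", "  X  ", "  X  ", "  X  ", "XXXXX", " XXX ", "  X  "],
    "notes": (
        'A spin on the carriage return which emphasizes the "downness" of\n'
        "a feed, but also with a horizontal suggestion of the current line"
    ),
    "rating": "good",
}

# JSON needs quoted keys, commas between array items, and \n escapes
# inside the multi-line text; the rating is just another string.
encoded = json.dumps(record, indent=4)
print(encoded)
```

The data round-trips intact; what JSON changes is mostly punctuation (quoted keys, commas, escaped newlines).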

Not that the point was to demo "great Rebol practices", really. I was just using it as a tool. (And it was 2008, so I had just found the language...one shouldn't expect much regardless.)

I've gotten it working under Redbol emulation--which is kind of a cool trick in and of itself. That meant patching up the bad IMAGE! code well enough to work with the script, but getting the rest of the script to run felt worth it.

But the real fun begins when moving to modern ideas. So I set aside a copy of the historical version to keep as a Redbol test, then started updating the script for the new world...

Rethought With UTF-8 and a Dialect

When you take away the BLOCK! and start using the parts in the box, what you get is a lot more remarkable.

=== LF: Line Feed (10) ===

◼◼◼▢▢
▢▢◼▢▢
▢▢◼▢▢
▢▢◼▢▢
◼◼◼◼◼
▢◼◼◼▢
▢▢◼▢▢

description: {
    On typewriters, printers, and some terminal emulators, moves the
    cursor down one row without affecting its column position. On Unix,
    used to mark end-of-line.
}
notes: {
    A spin on the carriage return which emphasizes the "downness" of
    a feed, but also with a horizontal suggestion of the current line
}
rating: good

It's much more interesting:

UTF-8 means that we can use solid and hollow boxes to represent the bitmap. They're legal characters in WORD!s, so instead of 7 TEXT!s the bitmap can be represented as 7 WORD!s and still be LOAD-able.
A section divider can do double duty for the attributes: a SET-WORD! becomes the shorthand, and the name can be written as regular WORD!s spanning up to a GROUP! that contains the codepoint.
Some characters have multiple shorthands (e.g. 17, Device Control 1, is known as both DC1 and XON). Instead of a SET-WORD! like LF:, that can be a SET-PATH! like DC1/XON:.
Since it's a dialect, the rating (good/fair/poor) doesn't need a tick mark.
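As a rough illustration of those points, the following Python sketch renders an old-style record into the new textual layout: a `===` divider carrying the shorthand(s), name, and codepoint, seven rows of solid/hollow boxes, and a bare-word rating. `to_dialect` is a hypothetical helper, not part of the USCII repository, and it only produces text; it says nothing about Rebol's LOAD rules beyond what's described above.

```python
# Hypothetical helper (not from the USCII repository): render an
# old-style record into the text layout of the new dialect.
PIXEL_TO_GLYPH = {"X": "◼", " ": "▢"}


def to_dialect(code, name, abbr, image, rating=None):
    """Return the dialect text for one 5x7 glyph record."""
    if len(image) != 7 or any(len(row) != 5 for row in image):
        raise ValueError("expected 7 rows of 5 pixels")
    # Several shorthands (e.g. DC1 and XON) are joined like a SET-PATH!.
    shorthand = "/".join(abbr) if isinstance(abbr, list) else abbr
    lines = [f"=== {shorthand}: {name} ({code}) ===", ""]
    # Each row becomes one run of solid/hollow boxes (a WORD! in Rebol).
    lines += ["".join(PIXEL_TO_GLYPH[ch] for ch in row) for row in image]
    if rating is not None:
        # The rating is written as a plain word, with no tick mark.
        lines += ["", f"rating: {rating}"]
    return "\n".join(lines)


lf_image = ["XXX  ", "  X  ", "  X  ", "  X  ", "XXXXX", " XXX ", "  X  "]
print(to_dialect(10, "Line Feed", "LF", lf_image, rating="good"))
```

Passing a list such as `["DC1", "XON"]` for `abbr` yields a divider like `=== DC1/XON: Device Control 1 (17) ===`.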

A Fluid Format that You Transform With UPARSE

The first thing I did was to transform the new representation back to the old representation with UPARSE.

This isn't optimal, and if the code processed the new format directly, no transformation step would be needed at all.

But I'm pasting it here to make the point that in a couple of minutes I had the old code up and running on the new format:

override-list: parse load %uscii-5x7-english-c0.reb [
    collect some keep gather [  ; one GATHER'd object per record
        '===
        emit abbreviation: [
            temp: set-word! (as text! temp)  ; e.g. `LF`
            |
            temp: set-path! (as block! temp)  ; e.g. `DC1/XON`
        ]
        emit name: form/ between <here> [emit code: subparse group! integer!]
        '===

        emit image: collect repeat 7 [w: word!, keep (as text! w)]  ; 7 rows as TEXT!

        try ['description:, emit description: text!]  ; optional fields
        try ['notes:, emit notes: text!]
        try ['rating:, emit rating: word!]
    ]
] except [
    fail ["Could not parse %uscii-5x7-english-c0.reb"]
]
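For readers who don't use Ren-C, here is a rough Python analogue of the rule above. It is a sketch under stated assumptions, not a port: a regular expression stands in for UPARSE, records are read from raw text rather than from LOAD'ed values, and the dict keys simply mirror the EMIT names. `RECORD` and `parse_records` are hypothetical names.

```python
import re
import textwrap

# Regex stand-in for the UPARSE rule (an assumption, not a translation):
# divider, seven glyph rows, then optional fields in a fixed order.
RECORD = re.compile(
    r"===\s+(?P<abbr>\S+):\s+(?P<name>[^\n(]+?)\s+\((?P<code>\d+)\)\s+==="
    r"(?P<rows>(?:\s+[◼▢]{5}){7})"
    r"(?:\s+description:\s*\{(?P<description>[^}]*)\})?"
    r"(?:\s+notes:\s*\{(?P<notes>[^}]*)\})?"
    r"(?:\s+rating:\s*(?P<rating>\w+))?"
)


def parse_records(source):
    """Convert each record in `source` to a dict keyed like the EMITs."""
    records = []
    pos = 0
    while True:
        # Skip whitespace between records; stop at end of input.
        while pos < len(source) and source[pos].isspace():
            pos += 1
        if pos == len(source):
            return records
        m = RECORD.match(source, pos)
        if m is None:
            raise ValueError(f"could not parse record at offset {pos}")
        abbr = m["abbr"]
        record = {
            # A SET-WORD! gives one string, a SET-PATH! gives a list.
            "abbreviation": abbr.split("/") if "/" in abbr else abbr,
            "name": m["name"],
            "code": int(m["code"]),
            # Rows stay as glyph text, like `keep (as text! w)`.
            "image": m["rows"].split(),
        }
        # Optional fields are only added when present (like TRY).
        for key in ("description", "notes"):
            if m[key] is not None:
                record[key] = textwrap.dedent(m[key]).strip()
        if m["rating"] is not None:
            record["rating"] = m["rating"]
        records.append(record)
        pos = m.end()
```

One design caveat: `[^}]*` assumes the description and notes never contain braces, whereas Rebol's `{...}` strings allow balanced nested braces. Working on LOAD'ed values, as the UPARSE version does, sidesteps that problem entirely.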

IMO, This is Rebol's "Deep Lake"

Breaking free of the JSON mindset and using the parts really makes this work:

https://github.com/hostilefork/uscii/blob/master/uscii-5x7-english-c0.reb

This revisited script and the Whitespace interpreter are really good examples of the form.