Updated Parse Machine

rgchris · January 25, 2021, 5:23pm

I've an updated version of my Parse Machine (currently for R3C) that binds the whole operation to the SYSTEM/CODECS object.

It could use more thought about how it works, but I thought I'd share it in its current iteration, as I'm finding it increasingly useful.

Features

It somewhat standardizes a location for grammar rules
It somewhat standardizes the FSM parsing method for any codecs written therein
Together, the two points above would conceptually permit mixing and matching grammars (e.g. CSS within Markup) as well as using fragments, though I'd say there's likely more to figure out before that would all be practical.
It somewhat separates grammar rules from the application logic that processes them, thus the same rules can be used for consuming whole files or on a token-by-token streaming basis
It also leans toward codecs rooted in the spirit of "be liberal in what you accept".

I've used "somewhat" above to underscore that while this idea is opinionated about the conventions used, it does not dictate anything. There are still undiscovered patterns in there.
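As a rough illustration of the separation between grammar rules and application logic described in the feature list, here is a minimal sketch in Python (used only as a neutral modelling language, not Ren-C). The rule table `RULES`, the function `tokenize` and the token names are all hypothetical and are not part of the Parse Machine; the point is only that one rule table can serve both a whole-input consumer and a token-by-token consumer.

```python
# Hypothetical grammar held as plain data: state -> {char: (token, next state)}.
RULES = {
    "text": {"<": ("open", "tag")},
    "tag": {">": ("close", "text")},
}


def tokenize(rules, start, text):
    """Walk the rule table over `text`, yielding one token per character."""
    state = start
    for ch in text:
        # Characters with no explicit rule yield a generic token and
        # leave the state unchanged.
        token, state = rules[state].get(ch, (state + "-char", state))
        yield token


# Application 1: consume the whole input at once.
whole = list(tokenize(RULES, "text", "a<b>"))

# Application 2: pull tokens one at a time and stop early.
seen = []
for token in tokenize(RULES, "text", "a<b>"):
    seen.append(token)
    if token == "open":
        break
```

Neither consumer needs to know how the rules are written, and the rules contain no knowledge of how their tokens are consumed.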

Limitations

For efficiency, the machine only exists as a single module and can only institute one state value at a time, thus decoders built atop it either can't be called recursively or have to manage recursion explicitly
The guts of the machine are a Parse WHILE loop that has to look up the current rule on each iteration
The state value is presumed to be a MAP!, chosen for its less ornery handling of keys, not for its suitability/performance
The two preceding points indicate that this approach could be optimized if it proves as useful as I believe it might
The codec installer (PARSER/NEW) only has params for name, file extensions, options and grammar rules. It doesn't include encode/decode/identify functions. This could change to something like:
```
parser/new 'name [
    suffixes: [...]
    identify?: func [...] [...]
    rules: [rule-1 [...] [...]]
    encode: func [...] [...]
    decode: func [...] [...]
]
```
Although it's not certain that every codec would have its own set of rules (e.g. 'markup can reuse 'html rules)
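The single-state limitation above, and what "managing recursion explicitly" might look like, can be modelled with a short Python sketch. This is an assumption-laden model, not the Parse Machine's code: `_state`, `init`, `current` and `decode_nested` are hypothetical names chosen for illustration.

```python
# Hypothetical model: the machine owns exactly one state slot.
_state = None


def init(name, data):
    """Install a fresh state, overwriting whatever was there before."""
    global _state
    _state = {"name": name, "data": data, "out": []}
    return _state


def current():
    """Return the state the machine is currently holding."""
    return _state


def decode_nested(name, data):
    """A decoder that saves and restores the slot around its own run,
    so a caller that is itself mid-decode does not lose its state."""
    global _state
    saved = _state
    try:
        state = init(name, data)
        state["out"].append(name + ":" + data)
        return state["out"]
    finally:
        _state = saved


outer = init("markup", "<p>")
inner_result = decode_nested("css", "p{}")
```

After the nested call, `current()` is still the outer state. Calling `init` directly from inside an outer decode, without the save and restore, would silently replace it.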

Example

This example is a little contrived, but it demonstrates how to install and deploy a grammar, and how flow can be embedded either in the grammar or in functions built around the Parse Machine API:

```
Rebol [
    Title: "Test Parser Module"
    Needs: [
        %parser.reb
    ]
]

make object! [
    parser/new 'abba [%.abba] [] [  ; the empty block is 'options
        is-a [
            #"a" (emit 'a-in-a)
            |
            #"b" (emit 'b-in-a-to-b use is-b)
            |
            #"c" (emit 'c?! stop)
            |
            skip (report ["Misplaced " to char! mark/1])
            |
            end (emit 'end-in-a use done)
        ]

        is-b [
            mark:
            #"a" (emit 'a-in-b-to-a use is-a) :mark
            |
            #"b" (emit 'b-in-b)
            |
            #"c" (emit 'c-in-b)
            |
            skip (report ["Misplaced " to char! mark/1])
            |
            end (emit 'end-in-b-to-a use is-a)
        ]

        done [
            end (emit 'done stop)
        ]
    ]

    system/codecs/abba/decode: func [
        blob [binary!]
        <local> state
    ][
        state: parser/init 'abba blob

        state/out: make block! 10
        state/errors: make block! 0

        state/emit: func [token] [
            append state/out token
        ]

        state/report: func [token] [
            append state/errors token
        ]

        parser/start

        case [
            not state/is-done [
                fail "Blob did not meet strict ABBA requirements"
            ]

            not empty? state/errors [
                probe state/errors
                fail "Multiple ABBA errors"
            ]

            <else> [
                state/out
            ]
        ]
    ]
]

probe equal? decode 'abba to binary! "aba"
[a-in-a b-in-a-to-b a-in-b-to-a a-in-a end-in-a done]

probe equal? decode 'abba to binary! "bcab"
[b-in-a-to-b c-in-b a-in-b-to-a a-in-a b-in-a-to-b end-in-b-to-a end-in-a done]

probe error? trap [decode 'abba to binary! "abcd"]
```
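To make the control flow of the ABBA grammar easier to trace, here is a hedged Python model of it (Python is used only as a neutral modelling language). It is not the Parse Machine: the table layout and the names `RULES`, `END`, `STOP` and `decode_abba` are hypothetical. It also assumes that `state/is-done` means "the machine stopped from the `done` rule", which the example above does not spell out. The `:mark` reset in `is-b` is modelled as "emit the token but do not consume the character".

```python
END = None     # models PARSE's END (no input left)
STOP = "stop"  # models the machine's STOP action

# state -> {input: (token to emit, next state or None to stay, consume input?)}
RULES = {
    "is-a": {
        "a": ("a-in-a", None, True),
        "b": ("b-in-a-to-b", "is-b", True),
        "c": ("c?!", STOP, True),
        END: ("end-in-a", "done", False),
    },
    "is-b": {
        "a": ("a-in-b-to-a", "is-a", False),  # :mark rewinds, so "a" is re-read
        "b": ("b-in-b", None, True),
        "c": ("c-in-b", None, True),
        END: ("end-in-b-to-a", "is-a", False),
    },
    "done": {END: ("done", STOP, False)},
}


def decode_abba(data):
    out, errors = [], []
    state, pos, finished = "is-a", 0, False
    while True:
        ch = chr(data[pos]) if pos < len(data) else END
        table = RULES[state]  # the current rule is looked up every iteration
        if ch in table:
            token, target, consume = table[ch]
            out.append(token)
            if consume:
                pos += 1
            if target == STOP:
                finished = state == "done"  # assumed meaning of is-done
                break
            if target is not None:
                state = target
        elif ch is not END and state != "done":
            errors.append("Misplaced " + ch)  # the SKIP (report ...) branch
            pos += 1
        else:
            break  # no alternative matched
    if not finished:
        raise ValueError("Blob did not meet strict ABBA requirements")
    if errors:
        raise ValueError("Multiple ABBA errors")
    return out
```

Under these assumptions, `decode_abba(b"aba")` and `decode_abba(b"bcab")` return the same token sequences as the two blocks compared with `equal?` above, and `decode_abba(b"abcd")` raises because the `d` is reported as misplaced.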

hostilefork · August 21, 2022, 11:55am

5 posts were split to a new topic: Binding Issues Raised by Chris's PARSE-MACHINE

hostilefork · January 29, 2021, 5:39pm

> I've an updated version of my Parse Machine (currently for R3C)

If you'd be willing to go ahead and jump this particular codebase to mainline Ren-C, and work up some tests for it, then it could be a lens for how we look at evolving these features. I'd be able to help keep it running...so changes wouldn't be as much of a hassle.

It's definitely not throwing softballs--which I like. This is supposed to be a language you can extend, and that shouldn't entail "write an entirely new evaluator and PARSE command if you want something that deviates a bit".

hostilefork · August 21, 2022, 11:02am

2 posts were split to a new topic: Errors on "Locked" Binding?