I've an updated version of my Parse Machine (currently for R3C) that binds the whole operation to the SYSTEM/CODECS object.
It could use more thought about how it works, but I thought I'd share it in its current iteration as I'm finding it increasingly useful.
Features
- It somewhat standardizes a location for grammar rules
- It somewhat standardizes the FSM parsing method for any codecs written therein
- The above points together conceptually would permit mixing and matching grammars (e.g. CSS within Markup) as well as using fragments, though I'd say there's likely more to figure out before that'd all be practical.
- It somewhat separates grammar rules from the application logic that processes them, thus the same rules can be used for consuming whole files or on a token-by-token streaming basis
- It also leans toward codecs rooted in the spirit of "be liberal in what you accept"
I've used "somewhat" above to underscore that while this idea is opinionated about the conventions used, it does not dictate anything. There are still undiscovered patterns in there.
Limitations
- For efficiency, the machine only exists as a single module and can only hold one state value at a time, thus decoders built atop it either can't be called recursively or have to manage recursion explicitly
- The guts of the machine is a Parse WHILE loop which has to look up the current rule on each iteration
- The state value is presumed to be a MAP!, chosen for its less ornery handling of keys, not for its suitability/performance
- The previous two points would indicate that this approach could be optimized if it proves as useful as I believe it might
- The codec installer (PARSER/NEW) only has params for name, file extensions, options and grammar rules. It doesn't include encode/decode/identify functions. This could change to something like:

      parser/new 'name [
          suffixes: [...]
          identify?: func [...] [...]
          rules: [rule-1 [...] [...]]
          encode: func [...] [...]
          decode: func [...] [...]
      ]

  Although it's not certain that every codec would have its own set of rules (e.g. 'markup can reuse 'html rules)
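
To make the WHILE-loop and MAP! limitations a little more concrete, here is a rough sketch of the general shape being described. It is not the actual internals of parser.reb: GRAMMAR, STATE and RUN-MACHINE are made-up names for illustration, and it assumes a build (such as R3C) where Parse still has WHILE and where FUNC accepts `<local>`.

```rebol
Rebol [Title: "Rule Lookup Sketch"]

; Hypothetical names throughout: GRAMMAR, STATE and RUN-MACHINE are
; illustrations only and are not taken from parser.reb.

grammar: make map! [
    is-a [
        #"a" (append state/out 'a-in-a)
        |
        #"b" (append state/out 'b-in-a  state/rule: 'is-b)
    ]
    is-b [
        #"b" (append state/out 'b-in-b)
        |
        #"a" (append state/out 'a-in-b  state/rule: 'is-a)
    ]
]

state: make map! 4  ; one shared state value, hence no implicit recursion

run-machine: func [
    input
    <local> current
][
    state/rule: 'is-a
    state/out: make block! 10
    parse input [
        while [
            not end
            ; the current rule is fetched from the map on every pass
            (current: select grammar state/rule)
            current
        ]
    ]
    state/out
]

probe run-machine "abba"
```

The point of the sketch is only where the cost sits: one map lookup per pass through the loop, and a single STATE value that a nested call to RUN-MACHINE would overwrite.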
Example
This example is a little contrived, but demonstrates how to install and deploy a grammar, and how flow can be embedded either in the grammar or in functions built around the Parse Machine API:
    Rebol [
        Title: "Test Parser Module"
        Needs: [
            %parser.reb
        ]
    ]

    make object! [
        ; install the grammar: name, file suffixes, options, rules
        parser/new 'abba [%.abba] [] [  ; the empty block is 'options
            is-a [
                #"a" (emit 'a-in-a)
                |
                #"b" (emit 'b-in-a-to-b use is-b)
                |
                #"c" (emit 'c?! stop)
                |
                skip (report ["Misplaced " to char! mark/1])
                |
                end (emit 'end-in-a use done)
            ]

            is-b [
                mark:
                #"a" (emit 'a-in-b-to-a use is-a) :mark  ; rewind so IS-A sees this #"a" too
                |
                #"b" (emit 'b-in-b)
                |
                #"c" (emit 'c-in-b)
                |
                skip (report ["Misplaced " to char! mark/1])
                |
                end (emit 'end-in-b-to-a use is-a)
            ]

            done [
                end (emit 'done stop)
            ]
        ]

        ; the decoder supplies the EMIT and REPORT handlers the rules call
        system/codecs/abba/decode: func [
            blob [binary!]
            <local> state
        ][
            state: parser/init 'abba blob
            state/out: make block! 10
            state/errors: make block! 0

            state/emit: func [token] [
                append state/out token
            ]

            state/report: func [token] [
                append state/errors token
            ]

            parser/start

            case [
                not state/is-done [
                    fail "Blob did not meet strict ABBA requirements"
                ]
                not empty? state/errors [
                    probe state/errors
                    fail "Multiple ABBA errors"
                ]
                <else> [
                    state/out
                ]
            ]
        ]
    ]

    probe equal? decode 'abba to binary! "aba"
        [a-in-a b-in-a-to-b a-in-b-to-a a-in-a end-in-a done]

    probe equal? decode 'abba to binary! "bcab"
        [b-in-a-to-b c-in-b a-in-b-to-a a-in-a b-in-a-to-b end-in-b-to-a end-in-a done]

    probe error? trap [decode 'abba to binary! "abcd"]
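
Since the grammar is installed separately from the decoder, a second consumer can presumably be built over the same 'abba rules just by supplying different EMIT and REPORT handlers. The following is a minimal sketch of that idea, using only the calls that appear in the example above (PARSER/INIT, STATE/EMIT, STATE/REPORT, PARSER/START). COUNT-ABBA and TOTAL are hypothetical names, and it assumes the machine needs nothing more from the state than those two handlers.

```rebol
; Assumes %parser.reb is loaded and the 'abba grammar above is installed.
; COUNT-ABBA and TOTAL are hypothetical names, not part of the module.

count-abba: func [
    blob [binary!]
    <local> state total
][
    total: 0
    state: parser/init 'abba blob

    ; same rules as the decoder, but tokens are counted instead of collected
    state/emit: func [token] [total: total + 1]

    ; misplaced characters are ignored here rather than recorded
    state/report: func [token] []

    parser/start
    total
]

probe count-abba to binary! "aba"
```

The rules themselves are untouched. Only the functions around the Parse Machine API change, which is the separation the Features list is getting at.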