PARSE vs. Haskell's (X)PARSEC

Rebol's PARSE might seem unique to those coming from "mainstream" languages. But when talking to those who survey alternative languages, it's important not to be ignorant of the work of others! For instance, take a skim through of:

Parser Combinators: Parsing for Haskell Beginners

The author of that page makes many of the same arguments Rebol users make about how RegEx is gibberish...and that it's much nicer when you can name and factor out rules. But there are some big differences worth knowing about.

Haskell has formulated their Parser so you can re-use patterns from the rest of the language inside the parse. It's as if in Rebol there didn't need to be syntax in the PARSE dialect for "repeat this rule N times", you just use the same REPEAT native that ran any other code a certain number of times. So making up new "parse keywords" is just writing new functions.

If that sounds super-slick, it is. How did they create a framework so general that it applies to both "function" and "parser plugin"? Well, you can't use just any function this have to use various "monad-aware" operations. e.g. the form of REPEAT you can use is called replicateM. Explanations of this are beyond the scope of this post and/or my understanding...but the upshot is that it's still all quite neat.

And as a consequence of making each subparser "just another function": All the optimizations the compiler can do for functions apply to the parse too, and it's blazing fast. Continuing the REPEAT example...if they bothered to notice that it was actually a REPEAT of 1 then it can optimize that out. While each subrule you give a name to and factor out in Rebol parsing slows it down a little bit at runtime, Haskell can optimize that out at compile-time.

There are a lot of standard parsers people who want to get something like a signed decimal number out of a stream can do so. If you want hexadecimal numbers, you've got it. (The lack of standard definitions like these has always been a weak point in Rebol.)

Also: Haskell's Parsers have a Lot of Nice Features we wish we had. Look at the list from Megaparsec, the current leader of the pack of Haskell parsers. A few:

  • Megaparsec can show the line on which parse error happened as part of parse error.

  • Megaparsec has more powerful combinators and can parse languages where indentation matters out-of-the-box.

  • Megaparsec can recover from parse errors “on the fly” and continue parsing.

  • Megaparsec allows us to conditionally process parse errors inside your parser before parsing is finished. In particular, it's possible to define regions in which parse errors, should they happen, will get a “context tag”, e.g. we could build a context stack like “in function definition foo”, “in expression x”, etc.

How Can Redbols Compete?

Forget the idea of ever competing on performance or rigor. No way.

But since Haskell is compiled, you don't get the opportunity to build parse rules dynamically. So that's something to really stress. I think Ren-C's ability to indicate the GROUP!s that you want to COMPOSE is the sort of thing that would play into a real "magic" demo of how clean it could be to build these things at runtime.

(Note: Clojure has an interesting edge here, because it has monadic parsing libraries, but you can build those parse "rules" and compile them at runtime too. This is the sort of thing Red could, in theory, do.)

The Monadic parsers can't have side-effects, so each little parsing subrule is a function that gives a specific type of result. The top-level rule you pass in builds a structure out of all these subresult.
Being able to stick in GROUP!s to run arbitrary code and use SET and COPY to write arbitrary variables really does differentiate Rebol...and it can seem very nice. But maybe it would be good if the foundations of Rebol PARSE permitted the "building an aggregate value cleanly" and these imperative abilities were thrown in as a tool to use if that didn't fit your use case.

I thought Haskell parsers were network-stream-capable out of the box...e.g. that they would have an intelligent formulation so you could parse an infinite number of data items one at a time. But nearly all of the parsers--including Megaparsec--require all the data to be in memory from the beginning (the way Rebol PARSE does today). Even the parsers that are "incremental" hold onto all the historical input in case they need to backtrack...whether the rules need to or not. I've found some research pointing to architectures of doing this better...see Section 4 of this paper


Taking time to work in other languages is helping to give me perspective, and get rid of some of the "tunnel vision" of exploring the vagaries of C, C++, and Rebol for too long. I'm looking for the best of what's out there for anything I can steal: type systems, streaming for a composable PORT! design, you-name-it.

Even so...while I do enjoy aspects of hacking around in Haskell, doing so certainly convinces me that there are very few people in the planet who would ever be able to put up with programming in it. Some parts of it do feel fairly natural for my line of thinking...but some of it is just really frustrating and dizzying, and it doesn't help that people really seem to go for absolute brevity in their code instead of labeling their subexpressions. (The compiler will optimize it out, what are they afraid of?)

Finding out that Haskell parsers weren't supporting incrementality for network streaming out of the box was a bit of a shock to me. I'm going to keep digging until I can find some neat version of a "snapped together" system that can parse a PORT! which has a decompressor codec snapped onto it, running over a https connection. If I can piece together a state-of-the-art system that does that, I'll try to approximate it.


Yes, this is key. Peculiarities aside, Rebol syntax is quite friendly. I wouldn't be interested in big changes to get more performance or rigor. Syntaxes like REBIN or Red/System make me wonder why one wouldn't just switch to Go, Rust, Java etc. to get those features plus community/docs.

I think Java 8 has Streams which might be worth looking at.