New UPARSE Experiment: GATHER and EMIT

hostilefork · March 2, 2021, 8:14am

One thing's for sure about UPARSE: you can really try new ideas out fast.

Here's one... what if it were easier to make objects via PARSE? Here's GATHER and EMIT...which feature rollback (just like COLLECT and KEEP) but are tailored for making objects:

>> uparse [* * * 1 <foo> * * *] [
      some '*
      g: gather [
          emit i: integer!, emit t: text! | emit i: integer!, emit t: tag!
      ]
      some '*
   ]
== *

>> g
== make object! [
    i: 1
    t: <foo>
]

This is more in line with Haskell-style parser combinators. There, the type strictness says that each parser combinator has to produce a typed value, so you tend to build records in this fashion.
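To make the rollback idea concrete, here is a rough Python model of it. This is a sketch under assumptions, not how UPARSE is implemented: the `Rule` signature, the `Tag` class, and the helper names `emit`, `seq`, `alt` and `gather` are all made up for illustration. The point is just that a failed alternative has its emits discarded before the next alternative runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Tag:
    """Stand-in for a TAG! value such as <foo> (hypothetical helper)."""
    name: str

# A rule takes (items, position, pending) and returns the new position,
# or None on failure. These names are illustrative, not UPARSE's.
Rule = Callable[[list, int, list], Optional[int]]

def emit(name: str, kind: type) -> Rule:
    """Match one item of type `kind` and append (name, value) to pending."""
    def rule(items, pos, pending):
        if pos < len(items) and isinstance(items[pos], kind):
            pending.append((name, items[pos]))
            return pos + 1
        return None
    return rule

def seq(*rules: Rule) -> Rule:
    """Run rules one after another; fail as soon as one fails."""
    def rule(items, pos, pending):
        for r in rules:
            pos = r(items, pos, pending)
            if pos is None:
                return None
        return pos
    return rule

def alt(*rules: Rule) -> Rule:
    """Try alternatives in order, rolling pending back after each failure."""
    def rule(items, pos, pending):
        mark = len(pending)
        for r in rules:
            new_pos = r(items, pos, pending)
            if new_pos is not None:
                return new_pos
            del pending[mark:]  # drop emits left by the failed alternative
        return None
    return rule

def gather(rule: Rule, items: list) -> Optional[dict]:
    """Run `rule` from position 0 and build a dict from whatever it emitted."""
    pending: List[tuple] = []
    if rule(items, 0, pending) is None:
        return None
    return dict(pending)

rule = alt(
    seq(emit("i", int), emit("t", str)),  # emits i, then fails on the tag
    seq(emit("i", int), emit("t", Tag)),  # succeeds
)
pending = []
end = rule([1, Tag("foo")], 0, pending)  # pending holds one i and one t
obj = gather(rule, [1, Tag("foo")])
```

Without the `del pending[mark:]` line, the `i` emitted by the first alternative would still be sitting in `pending` when the second alternative emitted its own.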

It may be useful enough that if you use EMIT at the top level with no gather, then it assumes you want the object to be the result of the parse:

>> uparse [* * * 1 <foo> * * *] [
      some '*
      [emit i: integer!, emit t: text! | emit i: integer!, emit t: tag!]
      some '*
   ]
== make object! [
    i: 1
    t: <foo>
]

So I added that in for now. Really it reduces to the question of whether the EMIT combinator raises an error when there's no GATHER in effect, so you could tweak just that one aspect.
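As a toy illustration of how small that switch is, here's a Python sketch. Everything in it (`EmitContext`, `NoGatherError`, the `auto_gather` flag) is hypothetical and only models the policy choice, not the real combinator:

```python
class NoGatherError(Exception):
    """Raised when an emit runs with no enclosing gather (hypothetical)."""

class EmitContext:
    def __init__(self, auto_gather: bool):
        self.auto_gather = auto_gather  # the one policy knob
        self.gather_depth = 0           # how many gathers are currently open
        self.fields = {}                # fields emitted so far

    def emit(self, name, value):
        # The whole "auto-gather" question lives in this one check.
        if self.gather_depth == 0 and not self.auto_gather:
            raise NoGatherError(f"emit {name}: used with no gather in effect")
        self.fields[name] = value

lenient = EmitContext(auto_gather=True)
lenient.emit("i", 1)  # accepted: fields become the parse result
```

With `auto_gather=False` the same call raises `NoGatherError` instead.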

Anyway, now's the time to experiment...so...

hostilefork · August 12, 2021, 11:19pm

@giuliolunati brought up the point that if "auto-gathering" exists, it is explicitly overriding what might have been an intended result. He suggested this example:

uparse "ab" [collect [emit x: "a" keep "b"]] => ?

There was no GATHER, so does the auto-gather override the COLLECT?

This kind of pattern looks more like a bug, where the author intended the collect result to be the overall expression result but just has a stray emit. The implicit GATHER just creates confusion.

That seems like a pretty good disproof. If you tell users that UPARSE gives the synthesized result of the BLOCK! rule, when can this auto-gather decide that result is less important than some half-articulated EMITs?

So I'm dropping this, but there's actually some pretty flexible room for defining handling of "leaked pendings". If you don't ask for them, you'll get an error:

>> uparse [10 20] [some keep integer!]
** Error: Residual items accumulated in pending array

But if you do ask for them (third multi-return result), it suppresses the error and lets you decide what to do with them:

>> [result furthest pending]: uparse [10 20] [some keep integer!]
== 20

>> pending
== ['10 '20]

This shows you an implementation detail of COLLECT. KEEP puts QUOTED! items in the pending array, and then COLLECT filters those out.

>> [result furthest pending]: uparse* [10 20] [
    emit x: integer! emit y: integer!
]
== 20

>> pending
== [[x: '10] [y: '20]]

This shows another implementation detail: EMIT puts BLOCK!s into the pending stream.

In any case, you can build UPARSE derivatives that take advantage of this knowledge if you can think of a reason to.
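For anyone trying to picture the shared pending array, here's a small Python model of the behavior described above. It's a sketch with made-up names (`Quoted`, `ResidualError`, `finish`), and it simplifies by letting `collect` and `gather` filter the whole list, not just the items added while their own rule ran:

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Quoted:
    """Stand-in for a QUOTED! item, which is what KEEP leaves in pending."""
    value: Any

class ResidualError(Exception):
    """Hypothetical error for items nobody claimed."""

def keep(pending, value):
    pending.append(Quoted(value))

def emit(pending, name, value):
    # A plain list plays the role of the BLOCK! [name: 'value]
    pending.append([name, Quoted(value)])

def collect(pending):
    """Take the quoted items out of pending, leaving everything else."""
    kept = [p.value for p in pending if isinstance(p, Quoted)]
    pending[:] = [p for p in pending if not isinstance(p, Quoted)]
    return kept

def gather(pending):
    """Take the block-like items out of pending and build fields from them."""
    fields = {p[0]: p[1].value for p in pending if isinstance(p, list)}
    pending[:] = [p for p in pending if not isinstance(p, list)]
    return fields

def finish(result, pending, want_pending=False):
    """Error on leftovers unless the caller asked to receive them."""
    if want_pending:
        return result, pending
    if pending:
        raise ResidualError("Residual items accumulated in pending array")
    return result

pending = []
keep(pending, 10)
emit(pending, "x", 30)
keep(pending, 20)
kept = collect(pending)   # takes the quoted items, leaves the block
fields = gather(pending)  # takes the block, leaving pending empty
```

Each construct only claims the item shapes it recognizes, which is why a stray KEEP or EMIT with no matching COLLECT or GATHER ends up as a leftover for `finish` to deal with.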

Giulio's First-Reaction-Disproof Feedback Only Proves My Point...

I need people to be reading this stuff! You can make a difference--even if you just fiddle in the ReplPad and ask "why didn't this thing I tried work".

I'm glad people are grokking what UPARSE is, but what will really make it strong is to bring more tests and more scenarios...and to question and challenge any rough or odd edges that I may be too close to the issues to see.

hostilefork · May 2, 2021, 4:48pm

I've added an interesting experiment with virtual binding, combining a GATHER'd object with the USE function...

Consider this example for parsing out a base and extension from a filename:

let result
let filename: "demo.txt"
result: parse filename [gather [
    emit base: between <here> "."
    emit extension: thru <end>
]] else [
    fail "Not a file with an extension"
]
print ["The base was" result.base]  ; demo
print ["The extension was" result.extension]  ; txt

But what if USE were a LET-like construct that slipstreamed an object's fields into the context:

if true [
    let filename: "demo.txt"
    use parse filename [
        emit base: between <here> "."
        emit extension: thru <end>
    ] else [
        fail "Not a file with an extension"
    ]
    print ["The base was" base]  ; demo
    print ["The extension was" extension]  ; txt
]
; base and extension would not be defined here!

Pretty cool!
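If it helps to see the scoping idea outside of Ren-C, here is a loose Python analogy. It is only an analogy: Python has no virtual binding, so a `ChainMap` stands in for the layered context, and `parse_filename` is a hypothetical helper that splits at the first "." the way the rule above does.

```python
from collections import ChainMap

def parse_filename(filename):
    """Split at the first "." into base and extension; None if no dot."""
    base, dot, extension = filename.partition(".")
    if not dot:
        return None
    return {"base": base, "extension": extension}

outer = {"filename": "demo.txt"}

fields = parse_filename(outer["filename"])
if fields is None:
    raise ValueError("Not a file with an extension")

# Layer the gathered fields over the outer scope without modifying it.
inner = ChainMap(fields, outer)
print("The base was", inner["base"])
print("The extension was", inner["extension"])
```

Lookups through `inner` see `base` and `extension` alongside `filename`, while `outer` itself never gains those keys, which is the same "not defined out here" property as the USE example.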