New UPARSE Experiment: GATHER and EMIT

One thing's for sure about UPARSE: you can really try new ideas out fast.

Here's one... what if it were easier to make objects via PARSE? Here's GATHER and EMIT...which feature rollback (just like COLLECT and KEEP) but are tailored for making objects:

>> uparse [* * * 1 <foo> * * *] [
      some '*
      g: gather [
          emit i: integer!, emit t: text! | emit i: integer!, emit t: tag!
      ]
      some '*
   ]
== *

>> g
== make object! [
    i: 1
    t: <foo>
]
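
For anyone who wants to poke at the rollback idea outside of Ren-C, here is a minimal Python sketch. It is a toy model and not how UPARSE is implemented: the names `match_type`, `emit`, `seq`, `alt`, `gather` and the `Tag` class are all made up for illustration, and Python's `int` and `str` stand in for INTEGER! and TEXT!. Each parser hands back its pending emits along with its result, so when the first alternate fails partway through, the `i` it already emitted is discarded with it.

```python
from dataclasses import dataclass

# Hypothetical Python model of GATHER/EMIT rollback; not the Ren-C implementation.
# A parser is a function (items, pos) -> (new_pos, value, pending), or None on failure.

@dataclass(frozen=True)
class Tag:
    """Stand-in for a TAG! value such as <foo>."""
    name: str

def match_type(kind):
    """Match one item of the given Python type."""
    def parser(items, pos):
        if pos < len(items) and isinstance(items[pos], kind):
            return pos + 1, items[pos], []
        return None
    return parser

def emit(name, parser):
    """On success, add a (name, value) pair to the pending list."""
    def emitting(items, pos):
        result = parser(items, pos)
        if result is None:
            return None
        new_pos, value, pending = result
        return new_pos, value, pending + [(name, value)]
    return emitting

def seq(*parsers):
    """Run parsers in order; a failure drops everything pending so far."""
    def sequence(items, pos):
        value, pending = None, []
        for p in parsers:
            result = p(items, pos)
            if result is None:
                return None  # rollback: the partial pending list is never returned
            pos, value, more = result
            pending = pending + more
        return pos, value, pending
    return sequence

def alt(*parsers):
    """Try each alternate from the same position; the first success wins."""
    def alternative(items, pos):
        for p in parsers:
            result = p(items, pos)
            if result is not None:
                return result
        return None
    return alternative

def gather(parser):
    """Turn the pending (name, value) pairs into a dict, as the 'object'."""
    def gathering(items, pos):
        result = parser(items, pos)
        if result is None:
            return None
        new_pos, _, pending = result
        return new_pos, dict(pending), []
    return gathering

rule = gather(alt(
    seq(emit("i", match_type(int)), emit("t", match_type(str))),
    seq(emit("i", match_type(int)), emit("t", match_type(Tag))),
))

result = rule([1, Tag("foo")], 0)   # (2, {"i": 1, "t": Tag("foo")}, [])
no_match = rule([1, 2], 0)          # None: neither alternate matches
```

The design choice that makes rollback free here is that pending items are returned rather than appended to shared state, so a failed branch has nothing to undo.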

This is more in line with Haskell-style parser combinators. There, strict typing means each parser combinator has to produce a typed value, so you tend to build records in this fashion.

It may be useful enough that if you use EMIT at the top level with no gather, then it assumes you want the object to be the result of the parse:

>> uparse [* * * 1 <foo> * * *] [
      some '*
      [emit i: integer!, emit t: text! | emit i: integer!, emit t: tag!]
      some '*
   ]
== make object! [
    i: 1
    t: <foo>
]

So I added that in for now. It really reduces to the question of whether the EMIT combinator raises an error when there's no GATHER in effect, so you could tweak just that one aspect.

Anyway, now's the time to experiment...so...


@giuliolunati brought up the point that if "auto-gathering" exists, it is explicitly overriding what might have been an intended result. He suggested this example:

uparse "ab" [collect [emit x: "a" keep "b"]] => ?

There was no GATHER, so does the auto-gather override the COLLECT? :frowning:

This kind of pattern looks more like a bug, where the author intended the collect result to be the overall expression result but just has a stray emit. The implicit GATHER just creates confusion.

That seems like a pretty good disproof. If you tell users that UPARSE gives the synthesized result of the BLOCK! rule, when can this auto-gather decide that result is less important than some half-articulated EMITs?

So I'm dropping this, but there's actually some pretty flexible room for defining handling of "leaked pendings". If you don't ask for them, you'll get an error:

>> uparse [10 20] [some keep integer!]
** Error: Residual items accumulated in pending array

But if you do ask for them (third multi-return result), it suppresses the error and lets you decide what to do with them:

>> [result furthest pending]: uparse [10 20] [some keep integer!]
== 20

>> pending
== ['10 '20]

This shows you an implementation detail of COLLECT. KEEP puts QUOTED! items in the pending array, and then COLLECT filters those out.

>> [result furthest pending]: uparse* [10 20] [
    emit x: integer! emit y: integer!
]
== 20

>> pending
== [[x: '10] [y: '20]]

That shows another implementation detail: EMIT puts BLOCK!s into the pending array.
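
To make the pending-array idea concrete, here is a rough Python model. It is only a sketch under assumptions: `Quoted`, `Emitted`, `collect` and `finish` are hypothetical names, and it returns two values where UPARSE has three (there is no `furthest` here). The only behaviors taken from the examples above are that KEEP adds quoted items, EMIT adds name/value blocks, COLLECT takes out the quoted ones, and leftovers are an error unless you ask for them.

```python
from dataclasses import dataclass

# Hypothetical model of the "pending" list; not the Ren-C implementation.

@dataclass(frozen=True)
class Quoted:
    """Stand-in for the QUOTED! item that KEEP adds to pending."""
    value: object

@dataclass(frozen=True)
class Emitted:
    """Stand-in for the [name: value] BLOCK! that EMIT adds to pending."""
    name: str
    value: object

def collect(pending):
    """Take the Quoted items out of pending; everything else is left behind."""
    collected = [p.value for p in pending if isinstance(p, Quoted)]
    remaining = [p for p in pending if not isinstance(p, Quoted)]
    return collected, remaining

def finish(result, pending, want_pending=False):
    """End of parse: leftovers are an error unless the caller asked for them."""
    if want_pending:
        return result, pending
    if pending:
        raise ValueError("Residual items accumulated in pending array")
    return result

# A COLLECT with a stray EMIT inside it, like the collect/emit example above.
pending = [Quoted(10), Emitted("x", 20), Quoted(30)]
collected, leftover = collect(pending)   # collected is [10, 30]
```

Under this model the stray emit survives the COLLECT and ends up in `leftover`, so `finish(20, leftover)` raises, while `finish(20, leftover, want_pending=True)` hands it back for the caller to deal with.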

In any case, you can build UPARSE derivatives that take advantage of this knowledge if you can think of a reason to.

Giulio's First-Reaction-Disproof Feedback Only Proves My Point...

I need people to be reading this stuff! You can make a difference--even if you just fiddle in the ReplPad and ask "why didn't this thing I tried work".

I'm glad people are grokking what :fire: UPARSE is, but what will really make it strong is to bring more tests and more scenarios...and to question and challenge any rough or odd edges that I may be too close to the issues to see.


I introduced GATHER and EMIT to UPARSE, and they're nice.

But the discussions of the new "FENCE! array type" made me wonder if making objects is an obvious application of that structure in PARSE.

I guess we can imagine it operating like a GATHER that has an implicit EMIT on each of its top-level SET-WORD!s, much as MAKE OBJECT! turns top-level SET-WORD!s into fields. But it would have the same rollback features, so you could write:

>> uparse [* * * 1 <foo> * * *] [
      some '*
      g: {
          i: integer!, t: text! | i: integer!, t: tag!
      }
      some '*
   ]
== *

>> g
== objectmap##{  ; "or whatever non-serialized objects/maps look like" (tm) 
    i: 1
    t: <foo>
}

>> serialize/compact reduce [1 + 2 g 10 + 20]
== "[3 {i: 1 t: <foo>} 30]"

Nuances of how this might work in terms of inheritance of base objects are beyond the scope of this particular post.

But I think this makes a pretty strong argument for why having it as an array type is a stronger tool than a constrained literal notation for in-memory map or object representation. (cc: @Brett) Dialect power is supposed to be the pitch of the language, and here it is.

Not that this is even necessarily the best use. Maybe GATHER is better for this, and there's a more pressing need that isn't anything to do with making objects. I just thought I'd mention it as a thought piece.


Sidenote: ELIDE can be used to cleanly get the object without a variable here. :slight_smile:

>> uparse [* * * 1 <foo> * * *] [
      elide some '*
      {
          i: integer!, t: text! | i: integer!, t: tag!
      }
      elide some '*
   ]
== objectmap##{
    i: 1
    t: <foo>
}

Yes, another dialecting piece looks nice, and I think you also have a nice example showing that a declarative pattern style might be nicer and more flexible than embedding more special-purpose actions like "keep" or "emit" where parse rules are concerned.

Also, just to clarify, I'm not looking for "a constrained literal notation for in-memory map or object representation", or any specific solution for representing object- or context-like things. However, I do think "make object!" and similar solutions should stop screwing up textual representations of information.
