How To Simplify Extracting Results From UPARSE?

Imagine that we want to make the following just a little bit nicer:

; Note: this actually works in UPARSE today
let result
uparse "aaabbb" [
    result: gather [
        emit x: collect some ["a", keep @(<a>)]
        emit y: collect some ["b", keep @(<b>)]
    ]
] else [
    fail "Parse failed"
]
assert [result.x = [<a> <a> <a>]]
assert [result.y = [<b> <b> <b>]]

One Idea: Allow EMIT even if no GATHER is in Effect

Right now there's a little experiment that lets you do this. Instead of raising an error that no GATHER is in effect, it assumes you meant to treat the PARSE as a whole as a GATHER:

let result: uparse "aaabbb" [
    emit x: collect some ["a", keep @(<a>)]
    emit y: collect some ["b", keep @(<b>)]
] else [
    fail "Parse failed"
]
assert [result.x = [<a> <a> <a>]] 
assert [result.y = [<b> <b> <b>]]

We could do a similar thing for COLLECT. But then we'd have to decide what to do if you wrote uparse "..." [emit x: ..., keep ...]. Which was intended as the result: the gathered object or the collected block?

While this may seem convenient, it isn't very general. It bakes in a special relationship between UPARSE and GATHER, which undermines modularity.

I'm not thrilled about this, and would rather you made a specialization for it (GATHER-PARSE) that injected the gather above your rule block...so you were explicit about what you were doing.

Another Idea: Bring Back PARSE Return

When RETURN existed in PARSE, we could have said:

let result: uparse "aaabbb" [
    return gather [
        emit x: collect some ["a", keep @(<a>)]
        emit y: collect some ["b", keep @(<b>)]
    ]
] else [
    fail "Parse failed"
]
assert [result.x = [<a> <a> <a>]]
assert [result.y = [<b> <b> <b>]]

Something to think about here would be how RETURN and NULL would mix, in terms of returning null results vs. parse failures. RETURN has kind of an interesting property of rolling on to the next option if its rule fails:

>> uparse "aaa" [
       return collect some "b"
          |
       return collect some "a"
   ]
== "aaa"

But if you wanted to force a RETURN to give NULL back you could OPT it.

>> uparse "aaa" [
       return opt collect some "b"
          |
       return collect some "a"
   ]
; null

But RETURN Was Removed in Ren-C. Why, Again?

I'm a little hazy on the complete argument for dropping RETURN...here was the commit from over 2 years ago that did it.

The argument against RETURN may not be as strong as it was, since the idea now is that the "main result" of PARSE tries to be maximally useful to the average callsite...while the progress: is a separately requested output. See "The PARSE of /PROGRESS".

As a sidenote, Red's PARSE doesn't seem to have RETURN, but it doesn't seem to error either:

>> parse "" [return (10)]
== false

>> parse [10] [return integer!]
== false

Some of the things that made me uncomfortable about RETURN before are solved. e.g. I didn't like return (code in a group) using the result of the group...and now with the value-bearing concept everything seems to plug together much better. uparse [aaa] [return @(1 + 2)] meets my needs.

I know I didn't particularly care for losing grounding on what exactly it was you were RETURN-ing from...e.g. that it wasn't the function's RETURN. But keywords inside PARSE rules don't carry the same meanings as they do outside anyway, so getting bent out of shape about that one doesn't make much sense.

So...I've Added RETURN To UPARSE

It seems the upsides outweigh the downsides.

But also, in our "flexibility is king" mindset... it seems that RETURN is the kind of thing you should be able to add if you wanted it.

The main thing to do is to keep an eye on what the implications are for the combinator protocol as a whole. Here, we're saying RETURN just bypasses everything and aborts with its result...and being able to abort is already necessary, both for rule failure and for when a FAIL raises an ERROR!. So it doesn't complicate the protocol...it just means the return type of PARSE is not necessarily a series type like the input.
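To make the "RETURN just aborts" idea concrete, here is a minimal sketch of a toy combinator parser. It is written in Python rather than Ren-C, and it is not how UPARSE is implemented. Every name in it (ParseReturn, some, opt, ret, alt, parse) is hypothetical and made up for illustration. It assumes a rule is a function that returns a (value, position) pair on success and None on failure, and that a RETURN-like combinator can abort by raising an exception that the top-level parse catches.

```python
class ParseReturn(Exception):
    """Raised by the hypothetical RETURN-like combinator to abort the parse."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value


def some(ch):
    """Match one or more copies of `ch`; fail (None) if there are none."""
    def rule(text, pos):
        end = pos
        while end < len(text) and text[end] == ch:
            end += 1
        return (text[pos:end], end) if end > pos else None
    return rule


def opt(inner):
    """Never fails: if `inner` fails, succeed with value None, consuming nothing."""
    def rule(text, pos):
        result = inner(text, pos)
        return result if result is not None else (None, pos)
    return rule


def ret(inner):
    """If `inner` fails, fail normally so alternatives still run.
    If it succeeds, abort the whole parse with its value."""
    def rule(text, pos):
        result = inner(text, pos)
        if result is None:
            return None
        raise ParseReturn(result[0])
    return rule


def alt(*rules):
    """Try each rule in order and take the first that succeeds."""
    def rule(text, pos):
        for candidate in rules:
            result = candidate(text, pos)
            if result is not None:
                return result
        return None
    return rule


def parse(text, rule):
    """Return the RETURN-ed value if one was raised; otherwise the input
    on a full match (an assumption of this sketch), or None on failure."""
    try:
        result = rule(text, 0)
    except ParseReturn as early:
        return early.value
    if result is None or result[1] != len(text):
        return None
    return text


# A failed RETURN rolls on to the next alternative.
print(parse("aaa", alt(ret(some("b")), ret(some("a")))))

# Wrapping in opt makes the first RETURN succeed with None.
print(parse("aaa", alt(ret(opt(some("b"))), ret(some("a")))))
```

In this sketch the first call gives back "aaa" and the second gives back None, which is the same shape as the two RETURN examples earlier in the post. It also shows the ambiguity raised there: a deliberately returned None cannot be told apart from a failed parse unless the caller gets some separate signal.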

So if we're going to be "all in" on the idea that you can abstract the act of variable declaration itself, that affords another interesting possibility...

What if EMIT turned PARSE into a LET-like construct that declared variables available at the callsite?

if true [
    let filename: "demo.txt"
    uparse filename [
        emit base: between here "."
        emit extension: thru end
    ] else [
        fail "Not a file with an extension"
    ]
    print ["The base was" base]
    print ["The extension was" extension]
]
; base and extension would not be defined here!

You'd start to see things like PARSE and IMPORT and LET as variable creators.

Wow, that sounds cool... can it be done?

It's done already. The feature is now in UPARSE for you to test. Usual caveats about "not known if this can be done efficiently", but we're trying to solve semantics first, performance...later.

Having it be the same feature as EMIT may not be the right thing. But as a general idea this seems very useful, and pushes on the definitions of binding...if we can offer this kind of thing, I feel we should, as it's giving a reality to the "you can design your own language features" claim. Declaring variables people can access by name is something that language features do.

Also, made it so you don't need ACROSS to do assignments for TO and THRU, cc @IngoHohmann

Maybe EMIT should still return an object, but you do another step to bring those variables in. Maybe that step is IMPORT?

import uparse filename [
    emit base: between here "."
    emit extension: thru end
]

So UPARSE would give back an object, and then IMPORT would virtually bind that object into the execution stream. :-/

That is a strange idea, but it does offer a level of purposeful control. You wouldn't accidentally use an EMIT rule and get variables leaking out; they'd only appear if you explicitly asked for them.

You'd have to deal with the failing cases. ELSE helps here:

import uparse filename [
    emit base: between here "."
    emit extension: thru end
] else [
    print "This clause runs before the IMPORT gets the NULL"
]

It's generally not the plan to make things silently accept NULL, but opting out on blanks is standard practice... so you could import try uparse if you wanted to.
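As a rough model of that NULL vs. blank distinction, here is a small Python sketch. It is not Ren-C and not how IMPORT or TRY are implemented. All the names (BLANK, try_, import_into, split_filename) are hypothetical, a dict stands in for the "execution stream" being bound into, and it assumes (my reading of the above) that TRY turns a NULL into a blank, which IMPORT then treats as "nothing to do".

```python
BLANK = object()  # hypothetical stand-in for a Ren-C blank


def try_(value):
    """Model of TRY as assumed here: turn None (NULL) into BLANK."""
    return BLANK if value is None else value


def import_into(scope, fields):
    """Bind emitted fields into `scope` (a dict standing in for the callsite).

    BLANK opts out quietly; None (NULL) is rejected loudly.
    """
    if fields is BLANK:
        return scope
    if fields is None:
        raise ValueError("import_into does not accept NULL; use try_ to opt out")
    scope.update(fields)
    return scope


def split_filename(filename):
    """Stand-in for the UPARSE call: emitted fields as a dict, or None on failure."""
    base, dot, extension = filename.partition(".")
    if not dot:
        return None
    return {"base": base, "extension": extension}


scope = {}
import_into(scope, split_filename("demo.txt"))
print(scope["base"], scope["extension"])

# With try_, a failed parse binds nothing instead of raising.
print(import_into({}, try_(split_filename("noext"))))
```

The design point is the same as above: the failure case has to be handled deliberately, either by something like ELSE before the import sees the NULL, or by explicitly opting out.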

Anyway, a further thought on this grand puzzle of binding.
