How To Simplify Extracting Results From UPARSE?

Imagine that we want to make the following just a little bit nicer:

; Note: this actually works in UPARSE today
let result
uparse "aaabbb" [
    result: gather [
        emit x: collect some ["a", keep @(<a>)]
        emit y: collect some ["b", keep @(<b>)]
    ]
] else [
    fail "Parse failed"
]
assert [result.x = [<a> <a> <a>]] 
assert [result.y = [<b> <b> <b>]]

One Idea: Allow EMIT even if no GATHER is in Effect

Right now there's a little experiment that lets you do this: if no GATHER is in effect, it assumes you meant to treat the PARSE overall as a gather, instead of giving you an error that there's no GATHER:

let result: uparse "aaabbb" [
    emit x: collect some ["a", keep @(<a>)]
    emit y: collect some ["b", keep @(<b>)]
] else [
    fail "Parse failed"
]
assert [result.x = [<a> <a> <a>]] 
assert [result.y = [<b> <b> <b>]]

We could do a similar thing for COLLECT. But then we'd have to worry about what to do if you said uparse "..." [emit x: ..., keep ...]: which one was intended as the result?

While this may seem convenient, it isn't very general. It builds a special relationship between UPARSE and GATHER into the design, which undermines some of the modularity.

I'm not thrilled about this, and would rather you made a specialization for it (GATHER-PARSE) that injected the gather above your rule block...so you were explicit about what you were doing.

Another Idea: Bring Back PARSE Return

When RETURN existed in PARSE, we could have said:

let result: uparse "aaabbb" [
    return gather [
        emit x: collect some ["a", keep @(<a>)]
        emit y: collect some ["b", keep @(<b>)]
    ]
] else [
    fail "Parse failed"
]
assert [result.x = [<a> <a> <a>]]
assert [result.y = [<b> <b> <b>]]

Something to think about here would be how RETURN and NULL would mix, in terms of returning null results vs. parse failures. RETURN has kind of an interesting property of rolling on to the next option if its rule fails:

>> uparse "aaa" [
       return collect some "b"
          |
       return collect some "a"
   ]
== "aaa"

But if you wanted to force a RETURN to give NULL back you could OPT it.

>> uparse "aaa" [
       return opt collect some "b"
          |
       return collect some "a"
   ]
; null
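
To make the fall-through behavior concrete, here is a toy model in Python (not Ren-C, and not how UPARSE is implemented). Every name in it (`some`, `opt`, `ret`, `alt`, `run`) is a hypothetical stand-in, and `some` simply yields the matched text rather than modeling COLLECT. It only shows why a failed rule lets RETURN roll on to the next alternative, while OPT turns the failure into a successful "null" that gets returned immediately.

```python
# Toy model only: a rule is a function (text, pos) -> (new_pos, value),
# or None when the rule fails. All names here are hypothetical.

class ParseReturn(Exception):
    """Raised by ret() to stop the whole parse with a value."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def some(ch):
    """Match one or more repetitions of ch; fail if there are none."""
    def rule(text, pos):
        end = pos
        while text.startswith(ch, end):
            end += len(ch)
        if end == pos:
            return None
        return end, text[pos:end]
    return rule

def opt(inner):
    """Never fails: a failed inner rule becomes a success with value None."""
    def rule(text, pos):
        result = inner(text, pos)
        return result if result is not None else (pos, None)
    return rule

def ret(inner):
    """Only 'returns' if the inner rule succeeds; otherwise it just fails."""
    def rule(text, pos):
        result = inner(text, pos)
        if result is None:
            return None  # lets an enclosing alt() try its next option
        raise ParseReturn(result[1])
    return rule

def alt(*options):
    """Try each option in order, like `|` in a rule block."""
    def rule(text, pos):
        for option in options:
            result = option(text, pos)
            if result is not None:
                return result
        return None
    return rule

def run(rule, text):
    """Run a rule; None covers both 'no RETURN reached' and 'returned None'."""
    try:
        rule(text, 0)
    except ParseReturn as stop:
        return stop.value
    return None

first = run(alt(ret(some("b")), ret(some("a"))), "aaa")
second = run(alt(ret(opt(some("b"))), ret(some("a"))), "aaa")
print(repr(first), repr(second))
```

In this sketch `first` is the matched text from the second alternative and `second` is `None`. Note that `run` cannot tell a returned null apart from a parse that never reached a RETURN, which is the same mixing question raised above.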

But RETURN Was Removed in Ren-C...

I'm a little hazy on the precise complete argument for dropping RETURN...here was the commit from over 2 years ago that did it.

As a sidenote, Red's PARSE doesn't seem to have RETURN, but it doesn't seem to error either:

>> parse "" [return (10)]
== false

>> parse [10] [return integer!]
== false

I know I didn't particularly care for losing grounding on what exactly it was you were RETURN-ing from...e.g. that it wasn't the function's RETURN.

UPDATE: I figured out several more good reasons to avoid putting RETURN in the box, and came up with a much better answer.

So forget RETURN. You can make it as your own combinator if you really want it...against my advice.

So if we're going to be "all in" on the idea that you can abstract the act of variable declaration itself, that affords another interesting possibility...

What if EMIT turned PARSE into a LET-like construct that declared variables available at the callsite?

if true [
    let filename: "demo.txt"
    uparse filename [
        emit base: between here "."
        emit extension: thru end
    ] else [
        fail "Not a file with an extension"
    ]
    print ["The base was" base]
    print ["The extension was" extension]
]
; base and extension would not be defined here!

You'd start to see things like PARSE and IMPORT and LET as variable creators.

Wow, that sounds cool... can it be done?

It's done already. The feature is now in UPARSE for you to test. Usual caveats about "not known if this can be done efficiently", but we're trying to solve semantics first, performance...later.

Having it be the same feature as EMIT may not be the right thing. But as a general idea this seems very useful, and pushes on the definitions of binding...if we can offer this kind of thing, I feel we should, as it's giving a reality to the "you can design your own language features" claim. Declaring variables people can access by name is something that language features do.

Also, made it so you don't need ACROSS to do assignments for TO and THRU, cc @IngoHohmann

Maybe EMIT should still return an object, but you do another step to bring those variables in. Maybe that step is IMPORT?

import uparse filename [
    emit base: between here "."
    emit extension: thru end
]

So UPARSE would give back an object, and then IMPORT would virtually bind that object into the execution stream. :-/

That is a strange idea, but it does have a sort of level of purposeful control to it. You wouldn't accidentally use an EMIT rule and then get variables leaking out unless you seemed aware that was what you were doing.

You'd have to deal with the failing cases. ELSE helps here:

import uparse filename [
    emit base: between here "."
    emit extension: thru end
] else [
    print "This clause runs before the IMPORT gets the NULL"
]

It's generally not the plan to make things silently accept NULL, but opting out on blanks is standard practice... so you could import try uparse if you wanted to.

Anyway, a further thought on this grand puzzle of binding.

The idea of EMIT being able to throw new variables in at the callsite of UPARSE is cool... but it runs into problems with abstraction.

If I make MY-UPARSE that calls UPARSE in its implementation, then you're getting the variables emitted into the guts of MY-UPARSE...not where MY-UPARSE is called.
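
The shape of that problem can be shown with a toy model in Python, where a "scope" is just a dict. This is an analogy only: none of these names (`split_filename`, `parse_into`, `my_parse_implicit`, `my_parse_explicit`) correspond to real Ren-C functions, and `str.partition` is standing in for the parse rules.

```python
# Toy model only: scopes are plain dicts, and all names are hypothetical.

def split_filename(text):
    """Stand-in for the rule block: emitted fields, or None on failure."""
    base, dot, extension = text.partition(".")
    if not dot:
        return None
    return {"base": base, "extension": extension}

def parse_into(scope, text):
    """LET-like variant: defines the emitted variables in the scope it is given."""
    fields = split_filename(text)
    if fields is None:
        return False
    scope.update(fields)
    return True

def my_parse_implicit(text):
    """A wrapper with its own scope: the variables land in here and are lost."""
    wrapper_scope = {}
    return parse_into(wrapper_scope, text)

def my_parse_explicit(text):
    """A wrapper that hands the emitted object back to whoever called it."""
    return split_filename(text)

caller_scope = {}

# The implicit wrapper succeeds, but nothing reaches the caller's scope.
ok = my_parse_implicit("demo.txt")
after_implicit = dict(caller_scope)

# The explicit wrapper returns an object; the caller decides where it goes.
result = my_parse_explicit("demo.txt")
if result is not None:
    caller_scope.update(result)  # the deliberate "bring these into scope" step

print(ok, after_implicit, caller_scope)
```

The implicit version reports success while leaving the caller with no variables, whereas the explicit version composes through any number of wrappers because the object is an ordinary return value.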

So that suggests one needs to be somewhat wary of writing LET-like constructs. Hence you should be getting an object out of the parse operation, and explicitly importing that object where you want it in scope. Maybe this is a better application for the word USE? (IMPORT might need more parameters and wordiness).

if true [
    let filename: "demo.txt"
    use parse filename [
        emit base: between here "."
        emit extension: thru end
    ] else [
        fail "Not a file with an extension"
    ]
    print ["The base was" base]
    print ["The extension was" extension]
]
; base and extension would not be defined here!

Or perhaps implicit emission of the variables is a feature of PARSE only, but not the lower level PARSE*. So if you're building something that's abstracted you would build it on top of PARSE*, which wouldn't do the implicit LET emits...and then you'd decide if your parselike abstraction wanted to do it or not.
