UPARSE RETURN Subtleties

One little annoyance about RETURN usage has come up: seemingly convenient casual usages of it require you to specify END, or you may miss the fact that the input was not fully matched:

>> uparse [1 2 <gotcha!>] [return collect [some keep integer!]]
== [1 2]

With COLLECT, Red switches the overall mode of the parse into "collect mode", so the result of the parse will be the result of the collect. That seems like an opportunity to still enforce the "didn't reach end" check, but they don't take advantage of it:

red> parse [1 2 <gotcha!>] [collect [some keep integer!]]
== [1 2]

In fact, you can't even force failure outside the COLLECT, seemingly:

red> parse [1 2 <gotcha!>] [collect [some keep integer!] not tag!]
== [1 2]

red> parse [1 2 <gotcha!>] [collect [some keep integer!] fail]
== [1 2]

Worse, they don't even seem to fail if you put the END inside the collect:

red> parse [1 2 <gotcha!>] [collect [some keep integer! end]]
== [1 2]

So I don't know what they're smoking. At least UPARSE gets it right if the END is inside the COLLECT rule:

>> uparse [1 2 <gotcha!>] [return collect [some keep integer!, end]]
; null

>> uparse [1 2] [return collect [some keep integer!, end]]
== [1 2]

Is Having the END Implicit a Wrong Turn?

Well, here we are back to "The PARSE of /PROGRESS" Question again. :frowning:

If people are used to the idea that parse returns partial results by default, then this behavior of RETURN would be less surprising.

I've also wondered if UPARSE should return the last value of the rule:

>> uparse [1 2] [some integer!]
== 2

Then, we might say that END is invisible:

>> uparse [1 2] [some integer!, end]
== 2

And if people wanted to get the parse position at the end of the parse as the result, they could just use HERE (cc @rgchris):

>> uparse [1 2 <tag>] [some integer!, here]
== [<tag>]

The invisibility of END would make it easier to use things like COLLECT:

>> uparse [1 2] [collect [some keep integer!], end]
== [1 2]

Which allows for failure with a fairly clean-looking answer to the problem I started this post with:

>> uparse [1 2 <gotcha!>] [collect [some keep integer!], end]
; null

What About Rules That Succeed and Return Null?

There's a bit of a problem here as successful rules could currently return NULL:

>> uparse "aaa" [some "a", opt "b", end]
; null

That's because OPT "B" can succeed without actually matching the "B" rule, so returning "B" seems wrong. Giving back NULL while still succeeding makes the most sense for OPT.

You could override that by tacking on LOGIC! true, which continues parsing and synthesizes #[true]:

>> uparse "aaa" [some "a", opt "b", true, end]
== #[true]

...or really any other value you wanted, via GROUP!:

>> uparse "aaa" [some "a", opt "b", (<whatever>) end]
== <whatever>

Interesting Thought... Worth Keeping In Mind

A lot of new things are in play with UPARSE.

Maybe this calls for different operations. I dunno.


With RETURN seeming to have a load of troublesome points, I'm less keen on putting it in the box, because I think it will be used incorrectly and create problems.

Something that's gnawing at me is that giving back the input seems a lot less flexible than giving back the "synthesized result" of the rule (i.e. whatever fell out of the last combinator that matched in a block).

Just look at how much an operator that only returns the synthesized product can do (let's call it PARSE*):

>> parse* "aaabbb" [some "a" end]
; null

>> parse* "aaabbb" [some "a"]
== "a"

>> parse* "aaabbb" [some "a" here]
== "bbb"

>> parse* "aaabbb" [some "a" some "b" end]
== ""  ; assuming END evaluates to the end position

>> parse* "aaa" [some "a" end (1 + 2)]
== 3

>> parse* "aaabbb" [some "a" end (1 + 2)]
; null

Woah. And if you want to, you can write a "reached-end-enforcing PARSE" on top of this, that can offer the product as a secondary result:

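endy-parse: func [
    return: [<opt> any-value!]
    synthesized: [<opt> any-value!]
    input [any-series!]
    rule [block!]
][
    let product
    ; use wrapping rule: capture the rule's product, then get the position
    ; with HERE (PARSE* gives null on a failed match, so bail out via ELSE)
    let pos: parse* input [product: rule, here] else [
        return null
    ]
    if tail? pos [  ; reached end
        set synthesized product
        return input
    ]
    return null
]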

That is rather remarkable, and I just sold myself on PARSE* as being the core of the implementation. Uniform behavior and maximum generality, all done inside the parse rules without needing controlling refinements.
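
For anyone who wants to poke at the "synthesized product" idea without a Ren-C build, here is a rough toy model in Python. It is only a sketch under assumptions: every name in it (lit, some, end, here, group, parse_star) is made up for illustration, strings are the only input type, and it says nothing about how UPARSE is actually implemented. A rule is a function that either gives back a new position plus a product, or None for failure.

```python
# Toy model of "PARSE* returns the product of the last rule that ran".
# All names are hypothetical; this is not UPARSE's implementation.
# A rule is a function (s, pos) -> (new_pos, product), or None on failure.

def lit(text):
    """Match a literal string; the product is the matched text."""
    def rule(s, pos):
        if s.startswith(text, pos):
            return pos + len(text), text
        return None
    return rule

def some(inner):
    """Match one or more times; the product is that of the last match."""
    def rule(s, pos):
        result = inner(s, pos)
        if result is None:
            return None
        while True:
            nxt = inner(s, result[0])
            if nxt is None or nxt[0] == result[0]:  # failure or no progress
                return result
            result = nxt
    return rule

def end(s, pos):
    """Succeed only at the tail; the product is the (empty) end position."""
    return (pos, s[pos:]) if pos == len(s) else None

def here(s, pos):
    """Always succeed; the product is the remaining input."""
    return pos, s[pos:]

def group(fn):
    """Stand-in for a GROUP!: consume nothing, the product is fn()."""
    return lambda s, pos: (pos, fn())

def parse_star(s, rules):
    """Run rules in sequence; give the last product, or None on failure."""
    pos, product = 0, None
    for rule in rules:
        result = rule(s, pos)
        if result is None:
            return None
        pos, product = result
    return product

a, b = lit("a"), lit("b")
print(parse_star("aaabbb", [some(a), end]))                   # None
print(parse_star("aaabbb", [some(a)]))                        # a
print(parse_star("aaabbb", [some(a), here]))                  # bbb
print(parse_star("aaa", [some(a), end, group(lambda: 1 + 2)]))  # 3
```

Note that this model uses None both for "failed" and for "succeeded with a null product", so it has exactly the ambiguity that a product-returning parse needs to resolve.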

But there's still one missing link out of this puzzle, to help the person who wants to write something along the lines of:

 assert [["a" "a" "a"] = parse-product "aaa" [collect [some keep "a"]]]
 assert [null = parse-product "aaabbb" [collect [some keep "a"]]]

You want the product as the result (which doesn't fit the historical return value of PARSE)...and you want to enforce reaching end (which doesn't fit this new super-general PARSE*).

The crux of the problem is that generic PARSE is tested for success, and failure needs to return something besides the product...in case the product is just incidentally falsey:

 if parse "aaa" [some "a" opt "b"] [
    ; opt "b" evaluates to null
    ; if we return that, we won't be running this branch
 ]

Using the input has the nice property of being guaranteed to be truthy, and makes the routine do double-duty as a nice lightweight validator that can pass through the input.

But with RETURN on the chopping block, do we just deal with some new name like PARSE-PRODUCT, or is there a finesse here I'm missing?

And the Winner is... PARSE? vs. PARSE

A ha, I think I've got it solved... :partying_face:

  • Just let go of the idea of returning the input from PARSE. Instead make it return the last synthesized result.

  • Then have PARSE? as a logic form. It's easy to see that it returns a logic. If people are habituated early on to seeing that they are in control of an arbitrary return result when they use plain PARSE, they won't be expecting to use it in an IF unless they've expressed the synthesized result coherently. This will draw them to PARSE? if they need it.

  • Make PARSE* be the version that doesn't force running to the end. In a pleasing piece of consistency with PARSE, they both return the synthesized result. Getting the position back is as simple as wrapping your rule in [[...] here], and there's tons of other flexibility.

    • This could be a refinement on PARSE... but I've explained why I don't want to call it /PARTIAL due to confusion with /PART meaning you're passing in a limit on how much input to process. It doesn't make it return the progress implicitly, it just says all the input need not be consumed. Any naming suggestions? /RELAX ?

  • Add some wordier construct like MATCH-PARSE to give the input if that's what you want. In practice I didn't use the matched result all that terribly often.

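  • Throw in NULL isotopes so that ELSE can be generally used even with plain PARSE. It's become a regular practice. Why not? A failed match would give a plain ("light") null that triggers ELSE, while a successful match whose product happens to be null would give a "heavy" null isotope that doesn't: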

    >> parse "aaa" [some "b" (null)] else [print "Match failure light null"]
    Match failure light null
    
    >> x: parse "aaa" [some "a" (null)] else [print "Make results heavy nulls"]
    ; null-2
    
    >> x  ; null-2 isotopes decay to null
    ; null
    
    >> parse "aaa" @[some "a" (null)] else [print "@ to avoid isotope"]
    @ to avoid isotope
    

This feels like it's adding up to the best of all worlds. Hopefully @rgchris likes it, as he can get the parse version he asked for in the past with:

chrisparse: adapt :uparse* [  ; adapt form that doesn't force going to end
    rule: reduce [rule 'here]
]

That's all it takes to make a parse that returns the progress in this model. Very cool.
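
As a rough cross-check of how the pieces of this model relate, here is a toy sketch in Python. It rests on stated assumptions: all names (FAIL, run, parse, parse_q, parse_star, chrisparse) are hypothetical, strings are the only input, and since Python has no isotopes, a distinct FAIL sentinel stands in for the "light null vs. heavy null" distinction. It is not meant to reflect the real implementation.

```python
# Toy model of the PARSE / PARSE? / PARSE* family; all names are hypothetical.
FAIL = object()  # "match failure", kept distinct from a None (null) product

def lit(text):
    def rule(s, pos):
        return (pos + len(text), text) if s.startswith(text, pos) else FAIL
    return rule

def some(inner):
    def rule(s, pos):
        result = inner(s, pos)
        if result is FAIL:
            return FAIL
        while True:
            nxt = inner(s, result[0])
            if nxt is FAIL or nxt[0] == result[0]:  # failure or no progress
                return result
            result = nxt
    return rule

def opt(inner):
    """Always succeeds; the product is None when the inner rule doesn't match."""
    def rule(s, pos):
        result = inner(s, pos)
        return (pos, None) if result is FAIL else result
    return rule

def here(s, pos):
    """Always succeeds; the product is the remaining input."""
    return pos, s[pos:]

def run(s, rules):
    """Shared core: gives (final_pos, last_product), or FAIL."""
    pos, product = 0, None
    for rule in rules:
        result = rule(s, pos)
        if result is FAIL:
            return FAIL
        pos, product = result
    return pos, product

def parse_star(s, rules):
    """PARSE*: last product; the input need not be fully consumed."""
    result = run(s, rules)
    return FAIL if result is FAIL else result[1]

def parse(s, rules):
    """PARSE: last product, but only if the whole input was consumed."""
    result = run(s, rules)
    if result is FAIL or result[0] != len(s):
        return FAIL
    return result[1]

def parse_q(s, rules):
    """PARSE?: logic form, True exactly when PARSE succeeds."""
    return parse(s, rules) is not FAIL

def chrisparse(s, rules):
    """Progress-returning form: PARSE* with HERE tacked onto the rule."""
    return parse_star(s, rules + [here])
```

In this sketch, `parse("aaa", [some(lit("a")), opt(lit("b"))])` gives None without being FAIL, so `parse_q` on the same arguments is True, and `chrisparse("aaabbb", [some(lit("a"))])` gives back the remaining "bbb".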
