UPARSE RETURN Subtleties

One little annoyance about RETURN usage has come up: seemingly convenient, casual uses of it require you to specify END, or you may miss the fact that the input wasn't fully matched:

>> uparse [1 2 <gotcha!>] [return collect [some keep integer!]]
== [1 2]

With COLLECT, Red switches the overall mode of the parse into "collect mode" so the result will be the result of the collect. That seems like an opportunity to not miss out on the "didn't reach end" constraint...but...they don't take advantage of that:

red> parse [1 2 <gotcha!>] [collect [some keep integer!]]
== [1 2]

In fact, you can't even force failure outside the COLLECT, seemingly:

red> parse [1 2 <gotcha!>] [collect [some keep integer!] not tag!]
== [1 2]

red> parse [1 2 <gotcha!>] [collect [some keep integer!] fail]
== [1 2]

Worse, they don't even seem to fail if you put the END inside the collect:

red> parse [1 2 <gotcha!>] [collect [some keep integer! end]]
== [1 2]

So I don't know what they're smoking. At least UPARSE gets it right if the END is inside the COLLECT rule:

>> uparse [1 2 <gotcha!>] [return collect [some keep integer!, end]]
; null

>> uparse [1 2] [return collect [some keep integer!, end]]
== [1 2]

Is Having the END Implicit a Wrong Turn?

Well, here we are back to "The PARSE of /PROGRESS" Question again. :frowning:

If people are used to the idea that parse returns partial results by default, then this behavior of RETURN would be less surprising.

I've also wondered if UPARSE should return the last value of the rule:

>> uparse [1 2] [some integer!]
== 2

Then, we might say that END is invisible:

>> uparse [1 2] [some integer!, end]
== 2

And if people wanted to get the parse position at the end of the parse as the result, they could just use HERE (cc @rgchris):

>> uparse [1 2 <tag>] [some integer!, here]
== [<tag>]

The invisibility of END would make it easier to use things like COLLECT:

>> uparse [1 2] [collect [some keep integer!], end]
== [1 2]

Which allows for failure with a fairly clean-looking answer to the problem I started this post with:

>> uparse [1 2 <gotcha!>] [collect [some keep integer!], end]
; null

What About Rules That Succeed and Return Null?

There's a bit of a problem here as successful rules could currently return NULL:

>> uparse "aaa" [some "a", opt "b", end]
; null

Because OPT "B" can succeed without actually matching the "B" rule. So returning "B" seems wrong. Giving back NULL while still succeeding makes the most sense for OPT.

You could override that by tacking on LOGIC! true, which continues parsing and synthesizes #[true]:

>> uparse "aaa" [some "a", opt "b", true, end]
== #[true]

...or really any other value you wanted via GROUP!

>> uparse "aaa" [some "a", opt "b", (<whatever>) end]
== <whatever>

Interesting Thought... Worth Keeping In Mind

A lot of new things are in play with UPARSE.

Maybe this calls for different operations. I dunno.

With RETURN seeming to have a load of troublesome points, I'm less keen on putting it in the box, because I think it will be used incorrectly and create problems.

Something that is gnawing at me is that it's seeming that giving back the input is a lot less flexible than giving back the "synthesized result" of the rule (e.g. whatever fell out from the last combinator that matched in a block.)

Just look at how much an operator that only returns the synthesized product can do (let's call it PARSE*):

>> parse* "aaabbb" [some "a" end]
; null

>> parse* "aaabbb" [some "a"]
== "a" 

>> parse* "aaabbb" [some "a" here]
== "bbb"

>> parse* "aaabbb" [some "a" some "b" end]
== ""  ; assuming END evaluates to the end position

>> parse* "aaa" [some "a" end (1 + 2)]
== 3

>> parse* "aaabbb" [some "a" end (1 + 2)]
; null

Woah. And if you want to, you can write a "reached-end-enforcing PARSE" on top of this, that can offer the product as a secondary result:

endy-parse: func [
    return: [<opt> any-value!]
    synthesized: [<opt> any-value!]
    input [any-series!]
    rule [block!]
][
    let product
    let pos: parse* input [product: rule, here]  ; use wrapping rule
    if tail? pos [  ; reached end
        set synthesized product
        return input
    ]
    return null
]

That is rather remarkable, and I just sold myself on PARSE* as being the core of the implementation. Uniform behavior and maximum generality, all done inside the parse rules without needing controlling refinements.
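For anyone who wants to poke at the idea outside of Ren-C, here is a rough toy model in Python. It is only a sketch: the names `lit`, `some`, `seq`, `here`, `end`, `parse_star` and `endy_parse` are made up for illustration, the combinators only handle strings, and nothing here reflects how UPARSE is actually implemented. It just shows that an operator returning the synthesized product is enough to build an end-enforcing variant on top.

```python
from typing import Any, Callable, Optional, Tuple

# A combinator takes (data, pos) and returns (new_pos, product) on a match,
# or None when it fails to match.
Result = Optional[Tuple[int, Any]]
Combinator = Callable[[str, int], Result]


def lit(text: str) -> Combinator:
    """Match a literal string; the product is the matched text."""
    def run(data: str, pos: int) -> Result:
        if data.startswith(text, pos):
            return pos + len(text), text
        return None
    return run


def some(rule: Combinator) -> Combinator:
    """Match one or more times; the product is that of the last match."""
    def run(data: str, pos: int) -> Result:
        result = rule(data, pos)
        if result is None:
            return None
        while True:
            nxt = rule(data, result[0])
            # Stop on failure, or if the rule matched without advancing.
            if nxt is None or nxt[0] == result[0]:
                return result
            result = nxt
    return run


def here(data: str, pos: int) -> Result:
    """Consume nothing; the product is the remaining input."""
    return pos, data[pos:]


def end(data: str, pos: int) -> Result:
    """Succeed only at the tail; the product is the (empty) end position."""
    return (pos, data[pos:]) if pos == len(data) else None


def seq(*rules: Combinator) -> Combinator:
    """Run rules in order; the product is that of the last rule."""
    def run(data: str, pos: int) -> Result:
        product = None
        for rule in rules:
            result = rule(data, pos)
            if result is None:
                return None
            pos, product = result
        return pos, product
    return run


def parse_star(data: str, rule: Combinator) -> Any:
    """Return the synthesized product, or None on failure (no end check)."""
    result = rule(data, 0)
    return None if result is None else result[1]


def endy_parse(data: str, rule: Combinator) -> Optional[Tuple[str, Any]]:
    """Return (input, product) only if the rule reaches the end, else None."""
    result = rule(data, 0)
    if result is None or result[0] != len(data):
        return None
    return data, result[1]


if __name__ == "__main__":
    print(parse_star("aaabbb", seq(some(lit("a")), here)))  # bbb
    print(endy_parse("aaabbb", some(lit("a"))))             # None
    print(endy_parse("aaa", some(lit("a"))))                # ('aaa', 'a')
```

One limitation of this toy: a rule that succeeds with a `None` product looks the same as a failed match in `parse_star`.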

But there's still one missing link out of this puzzle, to help the person who wants to write something along the lines of:

 assert [["a" "a" "a"] = parse-product "aaa" [collect [some keep "a"]]]
 assert [null = parse-product "aaabbb" [collect [some keep "a"]]]

You want the product as the result (which doesn't fit the historical return value of PARSE)...and you want to enforce reaching end (which doesn't fit this new super-general PARSE*).

The crux of the problem is that generic PARSE is tested for success, and failure needs to return something besides the product...in case the product is just incidentally falsey:

 if parse "aaa" [some "a" opt "b"] [
    ; opt "b" evaluates to null
    ; if we return that, we won't be running this branch
 ]

Using the input has the nice property of being guaranteed to be truthy, and makes the routine do double-duty as a nice lightweight validator that can pass through the input.

But with RETURN on the chopping block, do we just deal with some new name like PARSE-PRODUCT, or is there a finesse here I'm missing?

And the Winner is... PARSE? vs. PARSE

A ha, I think I've got it solved... :partying_face:

  • Just let go of the idea of returning the input from PARSE. Instead make it return the last synthesized result.

  • Then have PARSE? as a logic form. It's easy to see that it returns a logic. If people are habituated early on to seeing that they are in control of an arbitrary return result when they use plain PARSE, they won't be expecting to use it in an IF unless they've expressed the synthesized result coherently. This will draw them to PARSE? if they need it.

  • Make PARSE* be the version that doesn't force running to the end. In a pleasing piece of consistency with PARSE, they both return the synthesized result. Getting the position back is as simple as wrapping your rule in [[...] here], and there's tons of other flexibility.

    • This could be a refinement on PARSE... but I've explained why I don't want to call it /PARTIAL due to confusion with /PART meaning you're passing in a limit on how much input to process. It doesn't make it return the progress implicitly, it just says all the input need not be consumed. Any naming suggestions? /RELAX?
  • Add some wordier construct like MATCH-PARSE to give the input if that's what you want. In practice I didn't use the matched result all that terribly often.

  • Throw in NULL isotopes so that ELSE can be generally used even with plain PARSE. It's become a regular practice. Why not?

    >> parse "aaa" [some "b" (null)] else [print "Match failure pure null"]
    Match failure pure null
    
    >> x: parse "aaa" [some "a" (null)] else [print "Isotope on success"]
    == ~null~  ; isotope
    
    >> x  ; ~null~ isotopes decay to null when assigned
    ; null
    
    >> parse "aaa" @[some "a" (null)] else [print "@ to avoid isotope"]
    @ to avoid isotope
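As a loose analogy for the isotope idea, here is a small Python sketch. Everything in it is hypothetical (`NullIsotope`, `parse_result`, `or_else`, `decay`); Python has no isotopes, so a sentinel object stands in for `~null~`. It only models the three behaviors described above: failure is a pure null that triggers the ELSE branch, success with a null product does not, and the isotope decays to plain null once stored.

```python
class NullIsotope:
    """Stand-in for a ~null~ isotope: a null that still counts as a result."""

    def __repr__(self) -> str:
        return "~null~"


NULL_ISOTOPE = NullIsotope()


def parse_result(matched: bool, product):
    """None means the match failed; NULL_ISOTOPE means success with a null product."""
    if not matched:
        return None
    return NULL_ISOTOPE if product is None else product


def or_else(value, branch):
    """Run the branch only for a pure None, the way ELSE reacts to pure null."""
    return branch() if value is None else value


def decay(value):
    """Model assignment to a variable: the isotope decays to plain None."""
    return None if value is NULL_ISOTOPE else value


if __name__ == "__main__":
    failed = parse_result(False, None)
    succeeded = parse_result(True, None)
    print(or_else(failed, lambda: "match failure"))     # match failure
    print(or_else(succeeded, lambda: "match failure"))  # ~null~
    print(decay(succeeded))                             # None
```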
    

This feels like it's adding up to the best of all worlds. Hopefully @rgchris likes it, as he can get the parse version he asked for in the past with:

chrisparse: adapt :uparse* [  ; adapt form that doesn't force going to end
    rule: reduce [rule 'here]
]

That's all it takes to make a parse that returns the progress in this model. Very cool.

So...one must remember when I was writing and obsessing over this issue, PARSE wasn't returning the synthesized result of its block. So getting values out of a parse would have to be done with assignments. If you wanted to avoid naming a variable and putting a set of it in the parse rule, you'd be tempted to reach for RETURN.

I think I was rightfully concerned that people reaching for that tool would get it wrong, with patterns like return collect [...].

But Times Have Changed...

After having lived in the world where PARSE returns the result of its block, I think it has come to feel very natural. Why would you "reach for RETURN" when it's so obviously cleaner to not use it?

>> result: parse data [collect [...]]
== [...]

People can understand that works, and checks the rules to the end...and that you should be doing that if you want the rules to be checked to the end.

With the clean answer sitting right in front of you for how to avoid a variable... the only time you would use RETURN is when it does what you meant: terminate the parse now, with this result!

So bringing back RETURN made me wonder if we can just drop UPARSE*:

>> uparse "aaabbb" [some "a", return <here>]
== "bbb"

Anyway, what I'm trying to say is that the thing I had been concerned about with RETURN--namely its incompleteness--becomes a feature and not a liability when that's why you are using it.

If you only use it when you want incompleteness, that sounds like fitness for purpose to me! And now that's the only time you need to use it. So, it's back!
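To make the "terminate the parse now, with this result" reading concrete, here is a toy Python model. It is an assumption-laden sketch, not UPARSE: the helper names (`lit`, `some`, `seq`, `here`, `ret`, `ParseReturn`, `parse`) are invented for illustration, and using an exception for the early exit is just one convenient way to model it. The point is that the end-of-input check applies only when the rules finish normally, while a RETURN skips it on purpose.

```python
class ParseReturn(Exception):
    """Raised by ret() to end the whole parse immediately with a value."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def lit(text):
    """Match a literal string; the product is the matched text."""
    def run(data, pos):
        if data.startswith(text, pos):
            return pos + len(text), text
        return None
    return run


def some(rule):
    """Match one or more times; the product is that of the last match."""
    def run(data, pos):
        result = rule(data, pos)
        if result is None:
            return None
        while True:
            nxt = rule(data, result[0])
            if nxt is None or nxt[0] == result[0]:
                return result
            result = nxt
    return run


def here(data, pos):
    """Consume nothing; the product is the remaining input."""
    return pos, data[pos:]


def seq(*rules):
    """Run rules in order; the product is that of the last rule."""
    def run(data, pos):
        product = None
        for rule in rules:
            result = rule(data, pos)
            if result is None:
                return None
            pos, product = result
        return pos, product
    return run


def ret(rule):
    """If the rule matches, stop the whole parse and hand back its product."""
    def run(data, pos):
        result = rule(data, pos)
        if result is None:
            return None
        raise ParseReturn(result[1])
    return run


def parse(data, rule):
    """Product of the rule if it reaches the end; a ret() bypasses the end check."""
    try:
        result = rule(data, 0)
    except ParseReturn as early:
        return early.value
    if result is None or result[0] != len(data):
        return None
    return result[1]


if __name__ == "__main__":
    print(parse("aaabbb", seq(some(lit("a")), ret(here))))  # bbb
    print(parse("aaabbb", seq(some(lit("a")), here)))       # None
```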

But is UPARSE* Still Needed?

One problem is that you cannot meaningfully ELIDE a RETURN. Let's say you want this to return 3:

>> uparse "aaabbb" [tally some "a", elide return]
== ~void~  ; isotope

The ELIDE doesn't do anything. It's the same as RETURN.

Of course you can just move the RETURN:

>> uparse "aaabbb" [return tally some "a"]
== 3

But that might not be the order you wanted. We could make a magical TAG! combinator that gave you whatever the synthesized value in the stream is (e.g. what would be still there if you used elide):

>> uparse "aaabbb" [tally some "a", return <magic>]
== 3

That actually might be useful for other reasons. It's kind of like an enfix operation. It might be called <last> or <accumulator> or <result>?

UPARSE* will always be a bit faster. And right now UPARSE is just UPARSE* that does an additional check for "did you reach the end". But maybe with RETURN it can disappear into being an implementation detail that you don't tell people about...

...so...another thing about UPARSE* saying it "doesn't require the parse to match all the way to the end" is that then it's a global thing, and you can't make alternate rules in which one branch has to go to the end and another doesn't.

Hence maybe elide to <end> just needs its own efficient keyword.

It feels like "STOP", but that's being used for loop constructs to mean "stop this loop but don't break out of the matching of this rule alternate...".

It could be <stop>. We already have the case where ANY the rule and <any> the tag combinator are distinct.

Less confusing might be <done>?

>> uparse "aaabbb" [tally some "a" <done>]
== 3

It looks a little weird if you combine it with another tag:

>> uparse "aaabbb" [tally some "a" <here> <done>]
== "bbb"

But since tags don't take arguments you could have said that more clearly as:

>> uparse "aaabbb" [tally some "a" return <here>]
== "bbb"

In any case, I'm feeling ever more inclined to say that we can solve this in a way that people need not ever know about UPARSE*...if it exists at all.
