Dynamically Generating Parse Rules From Code GROUP!s

hostilefork · October 28, 2020, 6:50am

Historically, a plain GROUP! in PARSE runs the conventional evaluator. But it discards its result and doesn't put it into the stream of parse instructions.

That means you can't do things like this:

parse "aaa" [some (either condition ["a"] ["b"])]

Because the "discarding" behavior of GROUP!s in PARSE is so pervasive, it seemed hard to challenge. So Ren-C had adopted the GET-GROUP!, written :(...), as the non-discarding form, which you could use to splice arbitrary arguments or instructions into the PARSE stream.

But to @rgchris's taste, the colons are ugly. I can see it that way.

We could turn it the other way, and say you have to explicitly throw results away...otherwise they are kept.

parse "aaa" [some ["a" elide (print "found an A")]]

A failure to put the ELIDE there would mean you'd be trying to splice the PRINT result (a ~none~ isotope) into the instruction stream, which we'd imagine is an error.

The choice to elide could come from inside the group as well:

parse "aaa" [some ["a" (elide print "found an A")]]

Or use something that returned a ~void~ isotope...since that's ignored by the parse stream. This is the mechanism by which branching constructs can be used to good effect:

parse "aaa" [some ["a" (if false ["b"])]]
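
For anyone who wants to poke at these semantics outside of Ren-C, here is a minimal sketch in Python of the "results are kept unless you throw them away" idea. It is a toy model and not how PARSE is implemented: the names match and VOID and the ("elide", fn) tuple are made up for illustration, SOME isn't modeled, and Python's None stands in for the PRINT result that would be an error to splice.

```python
# Toy model (not Ren-C). A rule is a list whose items are:
#   str            -> match that literal at the current position
#   callable       -> a "group": run it and splice its result in as a rule
#   ("elide", fn)  -> run fn for its side effects, discard the result

VOID = object()  # hypothetical stand-in for a void result, which is ignored


def match(rule, text, pos=0):
    """Return the position after `rule` matches at `pos`, or None on failure."""
    for item in rule:
        if isinstance(item, str):
            if not text.startswith(item, pos):
                return None
            pos += len(item)
        elif isinstance(item, tuple) and item[0] == "elide":
            item[1]()                       # run it, throw the result away
        elif callable(item):
            product = item()
            if product is VOID:             # nothing to splice
                continue
            if product is None:             # like splicing PRINT's result
                raise ValueError("group produced nothing usable as a rule")
            pos = match(product, text, pos)  # splice the result as a sub-rule
            if pos is None:
                return None
        else:
            raise TypeError(f"not a rule: {item!r}")
    return pos


log = []
condition = True
rule = [
    "a",
    lambda: ["a"] if condition else ["b"],        # result becomes a rule
    ("elide", lambda: log.append("found an A")),  # side effect only
    lambda: VOID,                                 # ignored
    "a",
]
end = match(rule, "aaa")  # → 3
```

The point of the sketch is just that forgetting the "elide" wrapper turns a side-effect-only group into an error, which is the cost being weighed here.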

It Gets Rid of the Colons, but Does It Suck To Use?

I'm not so much concerned about backwards compatibility, as I am about two issues:

1. Is it actually, measurably, knowably better?
2. Will the machinery bend to allow emulation of the Rebol2 semantics?

I'm interested in (2) because I kind of insist on being able to implement Redbol. So I'd like there to be some kind of hook to choose. Ideally it would be the kind of hook that would have allowed a motivated individual to add something like the behavior of COLLECT and KEEP as it acts now to PARSE.

2022 UPDATE: UPARSE offers all this, and more! How far things have come since October 2020!

BlackATTR · October 28, 2020, 10:23pm

I hope we get some good community feedback here. I like the new functionality and I'm fine with the proposed consistent GROUP! behavior. OTOH I'm not tied to a legacy codebase to convert.

iArnold · October 29, 2020, 11:38am

Stick with any other programming language over a long enough period of time, and chances are you will have to make changes in your codebase to keep up with its progression. Many projects in my working history have this as a common denominator.

hostilefork · October 29, 2020, 2:33pm

It may be conceptually more "healthy" to see the discarding case as the behavior requiring a notation. You don't generally conceive of things in parentheses being associated with discarding the result...it "groups" things.

Yet it does get quite wordy. Let's say you start with something like:

        parse3 skip executable string-header-offset [
            (mode: 'read) pos: section-header-rule
            (
                assert [sh_offset = string-section-offset]
                sh_size: sh_size + (1 + length of encap-section-name)
            )
            (mode: 'write) :pos section-header-rule
            to end
        ]

Becoming:

        parse3 skip executable string-header-offset [
            elide (mode: 'read) pos: section-header-rule
            elide (
                assert [sh_offset = string-section-offset]
                sh_size: sh_size + (1 + length of encap-section-name)
            )
            elide (mode: 'write) :pos section-header-rule
            to end
        ]

BlackATTR · October 29, 2020, 4:05pm

I'm all for consistency. But GET-WORD!s are commonly used to set/inject a new position into PARSE, so it doesn't seem much of a cognitive stretch to use GET-GROUP! to splice results in. I don't know what the chances are for getting a notation or shorthand for ELIDE...

IngoHohmann · October 29, 2020, 6:59pm

I had to think this over, because I wasn't sure whether my reluctance was just because of change.

In my experience adjusting rules is an important usage of groups in parse, but not the most frequent usage.

Furthermore the number of elides just doesn't look good.

So I would vote to keep plain groups as vanishing and get-groups as splicing. Double groups could be used as well, but I think that might be harder when you are constructing rules programmatically.

rgchris · November 12, 2020, 6:15am

This example doesn't quite sit right:

If a group's product is to be used as a rule, it should be explicit, say:

parse "aa" [some use (first ["a" "b"]) (x: "c")]

This could be a GET-GROUP! where this type is available, I suppose (I'm not wholly convinced of GET-GROUP!'s necessity), though my leaning would be to solve through words.

hostilefork · April 13, 2021, 3:11am

A post was split to a new topic: Role of @(xxx) in PARSE

hostilefork · May 24, 2022, 12:54am

Certainly from that experiment we have agreement on this. Swapping GROUP! and GET-GROUP! usually, probably, is ugly.

Buuuuuuut... what if you wanted to?

With UPARSE, you can be as ugly as you want to be!

Though This Particular Intent Is a Little Bit Subtle

For starters, we could think of just replacing the GROUP! combinator with the GET-GROUP! combinator:

ugly-combinators: copy default-combinators
ugly-combinators.group!: :default-combinators.(get-group!)
ugly-combinators.get-group!: null

ugly-parse: specialize :uparse [combinators: ugly-combinators]

Nice idea, but in the examples above I used ELIDE for the intent of "let the GROUP! rule run, but don't use its result as a parse rule to match".

But ELIDE only erases rule products...it doesn't stop the rule from doing whatever it does. And if the rule fails, the ELIDE still fails.

COMMENT will suppress the argument rule from running altogether. So that's also no good: the point in the examples was that you need a way to run the code in the group--just not splice a new parse rule.

So this example needs one GROUP! combinator that works two completely different ways--based on the influence of an ELIDE-like modifying rule (I'll call it DISCARD). It's an even "weirder" idea than it seems on the surface. None of the options are great:

1. Make DISCARD something that quotes its argument, and maybe only narrowly works with GROUP!. It would then run the DO on the GROUP! itself.
2. Make the GROUP! combinator enfix left quoting (if UPARSE had such a thing!) and have it look to the left to see if it sees a DISCARD. If it does, then only run the group...don't try and retrigger the resulting combinator.
3. Have DISCARD feed the next combinator an empty series of the same type as its input, and hope that means it won't do much...then ignore whether it was successful or not. But consider discard (if true '[opt "a" (print "side effect")]) - just because the series you feed to a rule is empty doesn't mean it won't do anything!
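
To make the first option concrete without depending on UPARSE internals, here is a rough Python sketch of the shape of the idea: a table of handlers, a copy of it with the group behaviors swapped, and a "discard" entry that grabs the next item literally instead of dispatching it. Everything here (Group, GetGroup, Word, parse, the handler signature) is hypothetical naming for illustration and is not the UPARSE API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Group:            # stands in for (...)
    code: Callable[[], object]


@dataclass
class GetGroup:         # stands in for :(...)
    code: Callable[[], object]


@dataclass
class Word:             # stands in for a keyword such as DISCARD
    name: str


def literal(table, item, rest, text, pos):
    return pos + len(item) if text.startswith(item, pos) else None


def group_discarding(table, item, rest, text, pos):
    item.code()                      # run the code, ignore the result
    return pos


def group_splicing(table, item, rest, text, pos):
    product = item.code()            # run the code...
    if product is None:
        return pos                   # nothing to splice
    return parse(table, product, text, pos)  # ...and use the result as a rule


def discard(table, item, rest, text, pos):
    arg = next(rest)                 # take the next item literally
    if not isinstance(arg, Group):
        raise TypeError("discard expects a Group")
    arg.code()                       # run it ourselves; no handler lookup
    return pos


def parse(table, rule, text, pos=0):
    rest = iter(rule)
    for item in rest:
        key = item.name if isinstance(item, Word) else type(item)
        handler = table.get(key)
        if handler is None:
            raise TypeError(f"no handler for {item!r}")
        pos = handler(table, item, rest, text, pos)
        if pos is None:
            return None
    return pos


default_table = {str: literal, Group: group_discarding, GetGroup: group_splicing}

ugly_table = dict(default_table)
ugly_table[Group] = default_table[GetGroup]   # plain groups now splice
del ugly_table[GetGroup]
ugly_table["discard"] = discard

out = []


def a_rule():
    out.append("A RULE")
    return ["aaa"]


def c_rule():
    out.append("C RULE")
    return ["ccc"]                   # never matched, because it is discarded


end = parse(ugly_table, [Group(a_rule), "bbb", Word("discard"), Group(c_rule)], "aaabbb")
```

Because the discard handler consumes its argument from the remaining items itself, the group never goes through the table, which is why its returned rule is never tried against the input.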

Let's Demo #1 (Because We Can!)

Let's take it from the top...

ugly-combinators: copy default-combinators
ugly-combinators.group!: :default-combinators.(get-group!)
ugly-combinators.get-group!: null

ugly-combinators.discard: combinator [
    return: "Don't return anything" [<invisible>]
    'group "Capture GROUP! so it doesn't get a combinator" [group!]
][
    do group  ; we DO the group since it didn't get "combinated"
    set remainder input
    return
]

ugly-parse: specialize :uparse [combinators: ugly-combinators]

>> ugly-parse "aaabbb" [
       (print "A RULE", if true '[some "a"])
       some "b"
       discard (print "C RULE", if true '[some "c"])
   ]
A RULE
C RULE
== #b

There you have it... you can indeed flip the GROUP! behavior on its head. But this is just the tip of the iceberg about how the UPARSE engine can serve as the backbone for your dialects. Expect to see much more of this kind of acrobatics coming down the pipe!

hostilefork · May 24, 2022, 12:58am

Note: Two years after this discussion, we now have UPARSE ... and its idea of GROUP! combinators yielding evaluative products has revolutionized and solved entire classes of problems.

Prior to UPARSE, this USE would just be bringing in an epicycle of my same uneasy feeling: if GROUP!s are conventionally being discarded, the USE would be breaking the rules by seeing it.

But there's a whole new narrative. You're always aware--and expect--each rule to have a return value, consumable by the rules above it. The last synthesized value drops out of the BLOCK! combinator just as with DO. So the unsettling aspect of "vanishing" results sometimes and not others is no longer a sticking point.

Maybe it seemed I was making a big deal over nothing at the time--but--I believe my feeling that something was off was key to the pursuit of the new design!

Power users might appreciate the brevity of the GET-GROUP! when situations warrant it. But I think I'm in agreement it needs a good word form.

We might call it REPARSE (in the spirit of REEVAL/REEVALUATE).

>> reeval (second [a: b:]) 1 + 2
== 3

>> b
== 3

There's a bit of a twist, because that's a variadic function that does something gnarlier than what UPARSE can do with a combinator. e.g. we'd have trouble with putting down an incomplete rule that "combinates" with things that come after it:

parse "aa" [reparse (second [opt some]) "a"]  ; remember UPARSE will be PARSE

That would require some kind of additional return signal from REPARSE to ask the BLOCK! combinator to run some more material. (Does anyone recall RETURN/REDO?...)

Another option would be INLINE. It's currently broken, but the idea was that it would be able to splice arbitrary code into the stream of execution:

>> stuff: [+ 2 *]

>> 1 inline stuff 3
== 9

Some of these ideas run up against fundamental problems, and won't work. In the above example, the problem is that INLINE has to know a priori whether it's going to operate as enfix and look to its left...and that has to be declared before it executes code.

But more modest scenarios can work, and INLINE may be a good name for it.