Literal Matching with the @ Types In UPARSE

I mentioned that the @ types were slated for use for literal matching. The most frequent example I have given is:

>> block: [some "a"]

>> uparse [[some "a"] [some "a"]] [some @block]
== [some "a"]  ; success gives result of last matching rule

Works with all types:

>> num: 1

>> uparse [1 1 1] [some @num]
== 1

I didn't mention things like @(gr o up) but those work too:

>> uparse [1 1 1] [some @(3 - 2)]
== 1

I realized I actually do not know how to write the above two cases in Red or Rebol2. You can't use the number as a plain variable in Red, since it acts as a repeat rule. (UPARSE prohibits that: since it's a rule that takes an argument, you must use REPEAT for such behavior.)

red>> num: 1

red>> parse [1 1 1] [some num]
*** Script Error: PARSE - invalid rule or usage of rule: 1

Also in Red, I'm not clear on why the following isn't an error, since the GROUP! product is just discarded:

red>> parse [1 1 1] [some (3 - 2)]
== false

This is something that would work in R3-Alpha, but doesn't in Red or Rebol2:

red>> parse [1 1 1] [some quote (3 - 2)]
== false

Your guess is as good as mine. Whatever the answer in their world is, it's not obvious. But I think the @ types give a clean answer in UPARSE.
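To pin down what "literal matching" means in the examples above, here is a minimal sketch in Python. It is only a toy model, not Ren-C and not how UPARSE is implemented; the function name `some_literal` is made up for illustration. It treats the fetched value as plain data, matches it one or more times, requires the whole input to be consumed, and returns the last matched element.

```python
# Toy model only: this is Python, not Ren-C, and `some_literal` is a
# hypothetical name. It mimics the idea of [some @value].
def some_literal(data, value):
    """Match one or more elements equal to `value`.

    Succeeds only if all of `data` is consumed. Returns the last matched
    element on success, or None on failure.
    """
    pos = 0
    while pos < len(data) and data[pos] == value:
        pos += 1
    if pos == 0 or pos != len(data):
        return None
    return data[pos - 1]


block = ["some", "a"]   # stands in for [some "a"], treated as plain data
num = 1

print(some_literal([["some", "a"], ["some", "a"]], block))  # ['some', 'a']
print(some_literal([1, 1, 1], num))                          # 1
print(some_literal([1, 1, 1], 3 - 2))                        # 1
print(some_literal([1, 2, 1], num))                          # None
```

The point of the model is that the value is never interpreted as a rule: an integer is just an integer to compare against, not a repeat count.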

But What About @[bl o ck] ?

In the past I suggested that one reason why @[...] might be taken for datatypes is because in cases like this, there'd be no difference between @[bl o ck] and '[bl o ck]. I was imagining these being synonyms, because I couldn't think of anything else (since plain block was already "run rule"):

>> uparse [[some "a"] [some "a"]] [some '[some "a"]]
== [some "a"]

>> uparse [[some "a"] [some "a"]] [some @[some "a"]]
== [some "a"]  ; "wasteful application of @[...], so why not datatype?"

But UPARSE has changed the game for why @[...] and '[...] can mean different things...because block rules synthesize values. And who's to say you might not want to match a rule and use its product as the literal thing to match against?

>> uparse [1 1 1 2] [@[some '10, (10 + 10) | some '1 (1 + 1)]]
== 2

In other words your rule can match and provide an answer for the thing to match next. We have zero experience with how often that might be useful. But it does have meaning, which I guess is probably the death knell for using the @ types as DATATYPE!.

So this all looks pretty good. But back to the drawing board for types.
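As a rough illustration of "match a rule, then use its product as the literal thing to match next", here is a toy model in Python. It is not Ren-C, every name in it is hypothetical, and it simplifies by not backtracking to a later alternative once one alternative has matched.

```python
# Toy model only (Python, not Ren-C); all names here are hypothetical.
def some_then(value, product):
    """Build a rule: match one or more `value`, then synthesize `product`."""
    def rule(data, pos):
        start = pos
        while pos < len(data) and data[pos] == value:
            pos += 1
        # Return (new position, product) on success, None on failure.
        return (pos, product) if pos > start else None
    return rule


def match_product_literally(data, alternatives):
    """Model of the @[...] idea: run the first alternative that matches,
    then require the next (and final) input element to equal its product."""
    for rule in alternatives:
        result = rule(data, 0)
        if result is None:
            continue  # try the next alternative
        pos, product = result
        if pos + 1 == len(data) and data[pos] == product:
            return product
        return None  # simplification: no backtracking after a match
    return None


rules = [some_then(10, 10 + 10), some_then(1, 1 + 1)]
print(match_product_literally([1, 1, 1, 2], rules))   # 2
print(match_product_literally([10, 10, 20], rules))   # 20
print(match_product_literally([1, 1, 1, 3], rules))   # None
```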


Usage is making me wonder if this is the best use for @, or if it would be more helpful for literal non-matching... e.g. synthesizing values like a GROUP! would.

>> uparse "a" [collect [keep @[keep some], keep <any>]]
== [keep some "a"]

>> uparse "a" [collect [keep @ 'keep, keep <any>]]
== [keep "a"]

Doing this kind of thing right now has several ugly alternatives:

>> uparse "a" [collect [keep ^('keep), keep <any>]]
== [keep "a"]

>> uparse "a" [collect [keep ([keep]), keep <any>]]
== [keep "a"]

"What's so ugly about keep ([keep])", you might ask. The problem isn't with it when it's written literally like that. It's when you're trying to build a rule with COMPOSE and the contention over GROUP! and trying to nest inside of it becomes a bummer.

Let's say you're trying to build a KEEP parse rule inside a parse rule (which I am actually at this moment trying to do in the whitespace interpreter project):

name: "Binky"
uparse ... [... collect [
     keep (compose [keep (to word! name)])
] ...]

Okay so that gives you [keep Binky]. Not what you wanted. You can quote it...

keep (compose [keep '(to word! name)])

Now you've got [keep 'Binky]. But you're not trying to match Binky in the input, you're trying to synthesize it out of thin air. Need it in a GROUP!... let's just go ahead and use the wacky engroup operator to do that:

keep (compose [keep (engroup quote to word! name)])

So are we set, we've got [keep ('Binky)]? Well, no... because KEEP demands evaluative values be quoted, and that quote is vaporized during the group evaluation so KEEP is getting a plain Binky word. D'oh, so we need another quote on there somehow. The ^ operator is one way, there are others that are more verbose:

keep (compose [keep ^(engroup quote to word! name)])

Notably you can't say keep '(engroup quote to word! name) because that becomes keep '('Binky), which would look literally for a match in the input of a GROUP! like ('Binky). 🙄

What A Freaking PITA... 🌧️ ...We Can Do Better!

If @ were used for literalization and synthesis of a non-match, we can make something less head-scratchy:

keep (compose [keep @ '(to word! name)])

Poof. If you wanted to quote inside the group you could do that too (effectively putting the "ONLY" on the value, as you would for keeping a block literally):

keep (compose [keep @(quote to word! name)])

The key to seeing why this breaks us out of the problem is that it lets us get at literal values without a GROUP!, which means we aren't trying to COMPOSE inside of COMPOSE groups.

And there are lots of places we could benefit from this, I think.

repeat ([2 3]) rule  =>  repeat @[2 3] rule

Some situations might be a bit unsettling for people who don't see the @ as being quoting, e.g.:

repeat (2) rule  =>  repeat @ 2 rule

The 2 might look a bit too attached to the rule, if you don't realize the @ takes one unit of quoted parameterization to its right... not a full combinatorized parser like [2 rule].

But, I think it's just something you would have to get used to. Like I say, it's actually this "let the @ stand off as a separate value" form that breaks the Gordian knot, when it comes to deeper forms of composition.

What Would @(...) Mean In This Context?

I guess it would just have to mean literally that GROUP!, e.g. a synonym for @ (...)?

Because if it evaluated the group, what would it do to the result that would make it "more literal" than it already was? You have ^(...) to add a quote level already--which wouldn't make sense for @.

It's an odd thing, but not useless. After all, keep @[a b] is only one less character than keep ([a b])... though it would be more efficient due to knowing it doesn't need to run the evaluator on a GROUP!. And keep @('foo) saves an additional character due to not needing to quote the group, as in keep ('('foo)).

Hm, so actually, I can see that being rather useful.

How To Literal Match If Not With @ ?

We were back to the drawing board on this when @ started acting like META, but then got it back when the ^XXX types came around...

And now we're back again.

You can splice the value in as a quoted rule, so at least there are options... :(quote var) or :(^var)

>> block: [some "a"]
>> uparse [[some "a"] [some "a"]] [some :(quote block)]
== [some "a"]

Not my favorite, but by no means incoherent.

Though I'd already noticed one nagging missing point on literal matches, which was how to literally match a fetched block as a splice.

block: [a b]
uparse [a b a b] [some ?operation? (block)]

When @block was a literal match, it had [[a b] [a b]] covered, but not this. I don't know what that means, but it just pointed out to me there was something else afoot.
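The two cases are easy to conflate, so here is a small Python sketch of the difference. It is a toy model under my own assumptions, not Ren-C, and the function names are hypothetical: one function treats the fetched block as a single element to match, the other matches its contents spliced into the input.

```python
# Toy model only (Python, not Ren-C); function names are hypothetical.
def count_as_element(data, block):
    """Count matches treating `block` as ONE element of `data`.

    Returns the count if every element of `data` equals `block`,
    otherwise None.
    """
    if not data or any(item != block for item in data):
        return None
    return len(data)


def count_as_splice(data, block):
    """Count matches treating the CONTENTS of `block` as a run of elements.

    Returns the number of back-to-back repetitions if `data` is exactly
    `block` repeated one or more times, otherwise None.
    """
    n = len(block)
    if n == 0 or not data or len(data) % n != 0:
        return None
    for i in range(0, len(data), n):
        if data[i:i + n] != block:
            return None
    return len(data) // n


block = ["a", "b"]
print(count_as_element([["a", "b"], ["a", "b"]], block))  # 2
print(count_as_splice(["a", "b", "a", "b"], block))       # 2
print(count_as_element(["a", "b", "a", "b"], block))      # None
```

The first function is what a literal @block match covers; the second is the missing operation.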

Still available at the moment are .foo and .(foo), and /foo and /(foo) -- though I don't like those for this purpose.

I've been eyeing $foo $[foo] $(foo) etc for reasons like getting environment variables in shell dialects, and because I honestly think the potential here goes pretty far beyond MONEY!. It would take some reckoning in the type system to get those available.

If $ becomes synonymous with "substitution", then it might be that $(foo) makes more sense for "substitute this expansion as a rule" than GET-BLOCK! does, which might let the GET-XXX! variations mean "use this as a literal value".

For Now I Need to Change @ So Code Gets Better... so...

So trying this out, it does solve my problem, but it creates a disconnect I'm not totally comfortable with...

The @ operator itself has a parity. In the normal evaluator:

>> var: @ x
== x

>> var
== x

And in UPARSE:

>> uparse "" [var: @ x]
== x

>> var
== x

All's good so far... this is the Gordian-knot slicer I talked about. Synthesizing values out of thin air without a GROUP!, so it plays nicely with COMPOSE.

But I think this difference may be a mistake:

>> var: @[a b]
== @[a b]

>> uparse "" [var: @[a b]]
== [a b]

While we accept that UPARSE and regular code act differently, I'm not clear on why @ would act the same but @[...] wouldn't. The concept was maybe making it easier to do REPEAT ranges, but...well, if you wanted that, why not make REPEAT allow you to give it either BLOCK! or THE-BLOCK! ?

This also lets us maybe experiment with that trick I was suggesting, where even though evaluative types don't get added to blocks without a quote, maybe the @xxx types would.

>> uparse "a" [collect [keep @keep, keep <any>]]  ; no quote or ^ needed
== [keep #a]

This way you have both options for blocks:

>> uparse "a" [collect [keep @[keep some], keep <any>]]
== [[keep some] #a]

>> uparse "a" [collect [keep @ [keep some], keep <any>]]
== [keep some #a]

I think I was trying to make those both do the same thing, but when you look at the broader picture, that doesn't seem all that useful. If that inconsistency bothers you, then use THE instead:

>> uparse "a" [collect [keep @[keep some], keep <any>]]
== [[keep some] #a]

>> uparse "a" [collect [keep the [keep some], keep <any>]]
== [keep some #a]

If we had infinite symbols on the keyboard I might propose the symbolic THE use something else besides @, but, we don't have infinite symbols and I think I can wrap my head around the rule:

@ x => x
@x => @x

I haven't fully absorbed the implications here, but I do think this concept of having the @xxx types being a new "do whatever you want with it" argument type that plays along with combinators is pretty interesting. The ANY rule could accept it, for example:

uparse [10 <twenty> "hello" 20 304] [some any @[integer! tag! text!]]

Which to my eyes feels a little bit more comfortable than:

uparse [10 <twenty> "hello" 20 304] [some any ([integer! tag! text!])]

I'll just have to see how it comes along.

This has become a real sticking point, between two really good potential uses for the @XXX types in PARSE.

  • It's neat to have them be completely inert. That lines up with their behavior in the normal evaluator, and by having them just passed through by the PARSE dialect you can make your combinators react to them in novel ways.

  • But without a way of saying "I mean match this variable literally", you can't do things like num: 1, parse [1 1 1] [some @num]

One of the motivators to support the neutral interpretation was to allow things like PARSE's COLLECT+KEEP to build on this behavior:

>> append [a b c] @d
== [a b c d]

You could do similar things with:

>> uparse [1 2] [collect some [keep @hello, keep integer!]]
== [hello 1 hello 2]

It's messier without this, especially with the rules prohibiting raw evaluative values. You have to quote the WORD! to keep it from evaluating, put it inside a group, and then either META it or ONLY it:

>> uparse [1 2] [collect some [keep ^('hello), keep integer!]]
== [hello 1 hello 2]

>> uparse [1 2] [collect some [keep only ('hello), keep integer!]]
== [hello 1 hello 2]

Arguably you could do this with a combinator like THE, or if lone @ was allowed to do the behavior:

>> uparse [1 2] [collect some [keep only the hello, keep integer!]]
== [hello 1 hello 2]

>> uparse [1 2] [collect some [keep only @ hello, keep integer!]]
== [hello 1 hello 2]

Is The Inert-Only Rule Cure Worse Than The Disease?

I'm not really ready to give up on the rules for block mechanics that have emerged, which make meta usage with variables very sensible:

; if var is a BAD-WORD! isotope (like ~unset~ isotope) then ^var will
; be a plain BAD-WORD!, which APPEND won't accept unless you quote it
; if var is NULL then ^var will still be NULL, which append won't accept
; unless you TRY it to produce a BLANK!, where it will be a no-op
; if var is anything else, it will get quoted and appended as-is
append data ^var

There are nice properties there...something about that feels just right. The problem we wind up with isn't when things are in variables, but how to work with literal content. Blocks are certainly an option:

append data [whatever]

Going back to PARSE, it winds up looking like this:

>> uparse [1 2] [collect some [keep ([hello]), keep integer!]]
== [hello 1 hello 2]

>> uparse [1 2] [collect some [keep the [hello], keep integer!]]
== [hello 1 hello 2]

This all feels close but it's not quite there. If I squint long enough maybe I'll see it.

Maybe this really is just pointing to the fact that we need another part of speech, the $xxx ?

One idea here would be: uparse [a b a b] [some $((block))]

So $block would assume you meant as-is, and $(block) would similarly make the same assumption, but $((block)) would think you wanted to match across the spliced elements.

It's an idea.

Things here need sorting out, but after all this time there's no point in making hasty bad decisions. Keep working on it until it's right.


We need an answer to matching literal content

I've been punting on this bug in SPLIT:

>> split [a <t> b c <t>] <t>  ; expecting [[a] [b c] []]
** Error: TAG! combinator must be <here> or <end> ATM
** Near: [to dlm ** | to <end>]

That error is coming from PARSE3, not PARSE (which is now what UPARSE is called, as it has decidedly superior semantics, and I don't want to teach any new users PARSE3/PARSE2).

Even though PARSE3 doesn't have much in the way of TAG! combinators (just <here> and <end>), it's aware of the PARSE state-of-the-art, and is letting us know that using <t> as a rule will not match a literal <t> tag. (Unless you rewire your parse combinators and say that's what you want.)

With @ for literal, this could be solved in SPLIT by saying [to @dlm | to <end>]

This doesn't solve the case of wanting to match spliced blocks:

>> data: [a b]

>> parse [a b a b] [tally ???data???]  ; solve for ???
== 2

I suggested one possibility, as:

>> parse [a b a b] [tally @((data))]
== 2

That's creative. Anyway, this interpretation of @ is contentious with the other interpretation, which was "be inert"... so that combinators could decide what to do with the item.

However, some of the cases for that aren't really interesting anymore. I had theorized that as being a way to pull blocks out of the "BLOCK! combinator interpretation":

parse data [... any @[integer! text!] ...]

The idea was that usually, [integer! text!] would mean integer followed by text...but the @ would tell us that the block was being passed literally to ANY, as if you'd said:

parse data [... any ([integer! text!]) ...]

But I ruled that ANY is perfectly allowed to decide its parameter will not be interpreted as a combinator, but taken as a literal BLOCK!. This is to say that combinators have parameter conventions just like ordinary functions do... which is why you don't have to write for-each 'var but can say for-each var.

This is just how the cookie bounces in this language, where context is everything. I've had similar verdicts on the way things like GROUP! work, in branching:

>> if true (print "in group" [print "in block"])
in group
in block

>> if false (print "in group" [print "in block"])
; void

So in the vein of "how I learned to stop worrying and love the BLOCK!", this feels fine:

parse data [... any [integer! text!] ...]

What About Keeping Literal Data?

I'd argued that this was important:

parse data [collect [... keep @word ...]]
; vs
parse data [collect [... keep ^('word) ...]]
; or
parse data [collect [... keep ([word]) ...]]
; etc.

I had a particularly irritating use case of synthesizing parse rules with COMPOSE, where the "simple" desire to synthesize a rule that kept literal content got annoyingly complex... and letting @ serve the purpose of a signal to KEEP to literalize seemed like the way out.

But generally, I've soured on the @-word KEEP technique.

>> append [a b c] @d
== [a b c d]   ; I've developed no great love for this handling of @d

I know it's been a while since this has been at the forefront of discussion, but...I think I've decided this sucks.

We've dug the hole that APPEND and KEEP and all their friends splice by default. That's the call that was made, and we should live with it. Use a BLOCK!.

>> append [a b c] [d]
== [a b c d] 

But this does put us in a bind with generative rules. We're saying if you have a simple WORD! in a variable that you want to generate a KEEP for, you have to somehow get that word inside a block (to suppress evaluation and avoid splice warnings), and then get that inside a GROUP! (to avoid parse trying to interpret it as something matched).

There is a keyword that was proposed for this purpose: JUST.

>> append [a b c] just d
== [a b c d]

So JUST takes its argument literally, and then quotes it. It's a shorthand for QUOTE THE:

>> append [a b c] quote the d
== [a b c d]

If PARSE had JUST, it would be assumed that you were outside of the domain of matching by using it.

>> parse [1 2] [collect some [keep just negate, keep integer!]]
== [negate 1 negate 2]

What nags at me is that the "naked" negate...just in there inline like it's a keyword...feels more uncomfortable than the block-in-group.

When you say keep ([negate]) you can conceptually break down why it works:

  • It's in a GROUP!, so there's no matching going on

    • Users have to know that all GROUP!s evaluate to synthesize a product, and don't advance the parse position in the process of doing so.
  • Inside the group, it's in a BLOCK!, so you know it's not acting as a keyword or function.

You can look at that and use your general knowledge to get it: yes, I see why this works, in two clear steps.

But... it's time to JUST do it 👟 ...and add JUST

The way to think about this is probably to say that the casual user goes with the BLOCK!-in-GROUP because it's clear and it's easy to make at source level...then JUST is targeted at the power users in generative code scenarios.

Someone working with generated code will prioritize ease-of-generation vs. source-level-obviousness. After all, the code is being generated, so you're not going to read it unless you're sophisticated enough to not be thrown off by JUST and a naked NEGATE.

And with this change, we can absorb the @XXX for the application of the missing functionality...the literal match. (I'll retrofit PARSE3 so it works there as well.)