Literal Matching with the @ Types In UPARSE

I mentioned that the @ types were slated for use for literal matching. The most frequent example I have given is:

>> block: [some "a"]

>> uparse [[some "a"] [some "a"]] [some @block]
== [some "a"]  ; success gives result of last matching rule

Works with all types:

>> num: 1

>> uparse [1 1 1] [some @num]
== 1

I didn't mention things like @(gr o up) but those work too:

>> uparse [1 1 1] [some @(3 - 2)]
== 1

I realized I don't actually know how to write the above two cases in Red or Rebol2. You can't use the number as a plain variable in Red, since an integer acts as a repeat rule. (UPARSE prohibits that: since it's a rule that takes an argument, you must use REPEAT for such behavior.)

red>> num: 1

red>> parse [1 1 1] [some num]
*** Script Error: PARSE - invalid rule or usage of rule: 1

Also in Red, I'm not clear on why the following isn't an error, since the GROUP! product is just discarded:

red>> parse [1 1 1] [some (3 - 2)]
== false

This is something that would work in R3-Alpha, but doesn't in Red or Rebol2:

red>> parse [1 1 1] [some quote (3 - 2)]
== false

Your guess is as good as mine. Whatever the answer in their world is, it's not obvious. But I think the @ types give a clean answer in UPARSE.
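
To make the "match this value as-is" idea concrete, here is a minimal sketch in Python rather than Ren-C. It is only a toy model, not how UPARSE is implemented: `Lit`, `match_literal`, and `some` are hypothetical names invented for this illustration. The point it shows is that a fetched value (a block, a number, or the product of an expression) is compared against the input instead of being interpreted as a rule.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Lit:
    """Hypothetical stand-in for an @ rule: match this value literally."""
    value: Any


def match_literal(rule, data, pos):
    """Return (product, new_pos) if data[pos] equals the literal, else None."""
    if pos < len(data) and data[pos] == rule.value:
        return data[pos], pos + 1
    return None


def some(rule, data, pos=0):
    """Match the literal one or more times; the product is the last match."""
    matched = match_literal(rule, data, pos)
    if matched is None:
        return None
    while matched is not None:
        product, pos = matched
        matched = match_literal(rule, data, pos)
    return product, pos


block = ["some", "a"]   # models block: [some "a"]
num = 1                 # models num: 1

block_result = some(Lit(block), [["some", "a"], ["some", "a"]])
num_result = some(Lit(num), [1, 1, 1])
group_result = some(Lit(3 - 2), [1, 1, 1])   # models @(3 - 2)
```

In this model the integer never gets a chance to act as a repeat count, because `Lit` only ever compares values.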

But What About @[bl o ck] ?

In the past I suggested that one reason why @[...] might be taken for datatypes is because in cases like this, there'd be no difference between @[bl o ck] and '[bl o ck]. I was imagining these being synonyms, because I couldn't think of anything else (since plain block was already "run rule"):

>> uparse [[some "a"] [some "a"]] [some '[some "a"]]
== [some "a"]

>> uparse [[some "a"] [some "a"]] [some @[some "a"]]
== [some "a"]  ; "wasteful application of @[...], so why not datatype?"

But UPARSE has changed the game for why @[...] and [...] can mean different things...because block rules synthesize values. And who's to say you might not want to match a rule and use its product as the literal thing to match against?

>> uparse [1 1 1 2] [@[some '10, (10 + 10) | some '1 (1 + 1)]]
== 2

In other words your rule can match and provide an answer for the thing to match next. We have zero experience with how often that might be useful. But it does have meaning, which I guess is probably the death knell for using the @ types as DATATYPE!.
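
The "match a rule, then use its product as the next literal" behavior can be sketched with a few parser combinators. This is a Python toy model under stated assumptions, not UPARSE itself: every function name here (`lit`, `some`, `then_value`, `alt`, `match_product`) is hypothetical, and `match_product` stands in for the @[...] meaning being discussed.

```python
def lit(value):
    """Rule that matches one input item equal to `value`."""
    def rule(data, pos):
        if pos < len(data) and data[pos] == value:
            return value, pos + 1
        return None
    return rule


def some(inner):
    """Rule that matches `inner` one or more times."""
    def rule(data, pos):
        matched = inner(data, pos)
        if matched is None:
            return None
        while matched is not None:
            product, pos = matched
            matched = inner(data, pos)
        return product, pos
    return rule


def then_value(inner, compute):
    """Match `inner`, then synthesize compute() without consuming input."""
    def rule(data, pos):
        matched = inner(data, pos)
        if matched is None:
            return None
        return compute(), matched[1]
    return rule


def alt(*rules):
    """Try each alternative in order; the first success wins."""
    def rule(data, pos):
        for candidate in rules:
            matched = candidate(data, pos)
            if matched is not None:
                return matched
        return None
    return rule


def match_product(inner):
    """Hypothetical @[...]: run `inner`, then match its product literally."""
    def rule(data, pos):
        matched = inner(data, pos)
        if matched is None:
            return None
        product, pos = matched
        return lit(product)(data, pos)
    return rule


# Models: @[some '10, (10 + 10) | some '1 (1 + 1)]
rule = match_product(alt(
    then_value(some(lit(10)), lambda: 10 + 10),
    then_value(some(lit(1)), lambda: 1 + 1),
))
result = rule([1, 1, 1, 2], 0)
```

Here the second alternative consumes the three 1s and produces 2, and that 2 is then what must appear next in the input.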

So this all looks pretty good. But back to the drawing board for types.

Usage is making me wonder if this is the best use for @, or if it would be more helpful for literal non-matching... e.g. synthesizing values like a GROUP! would.

>> uparse "a" [collect [keep @[keep some], keep <any>]]
== [keep some "a"]

>> uparse "a" [collect [keep @ 'keep, keep <any>]]
== [keep "a"]

Doing this kind of thing right now has several ugly alternatives:

>> uparse "a" [collect [keep ^('keep), keep <any>]]
== [keep "a"]

>> uparse "a" [collect [keep ([keep]), keep <any>]]
== [keep "a"]

"What's so ugly about keep ([keep])", you might ask. The problem isn't when it's written literally like that. It's when you're trying to build a rule with COMPOSE: the contention over GROUP!, and trying to nest one group inside another, becomes a bummer.

Let's say you're trying to build a KEEP parse rule inside a parse rule (which I am actually at this moment trying to do in the whitespace interpreter project):

name: "Binky"
uparse ... [... collect [
     keep (compose [keep (to word! name)])
] ...]

Okay so that gives you [keep Binky]. Not what you wanted. You can quote it...

keep (compose [keep '(to word! name)])

Now you've got [keep 'Binky]. But you're not trying to match Binky in the input, you're trying to synthesize it out of thin air. Need it in a GROUP!... let's just go ahead and use the wacky engroup operator to do that:

keep (compose [keep (engroup quote to word! name)])

So are we set, now that we've got [keep ('Binky)]? Well, no... because KEEP demands evaluative values be quoted, and that quote is vaporized during the group evaluation, so KEEP is getting a plain Binky word. D'oh, so we need another quote on there somehow. The ^ operator is one way; there are others that are more verbose:

keep (compose [keep ^(engroup quote to word! name)])

Notably you can't say keep '(engroup quote to word! name) because that becomes keep '('Binky), which would look literally for a match in the input of a GROUP! like ('Binky). :roll_eyes:

What A Freaking PITA... :cloud_with_rain: ...We Can Do Better!

If @ were used for literalization and synthesis of a non-match, we can make something less head-scratchy:

keep (compose [keep @ '(to word! name)])

Poof. If you wanted to quote inside the group you could do that too (effectively putting the "ONLY" on the value, as you would for keeping a block literally):

keep (compose [keep @(quote to word! name)])

The key to seeing why this breaks us out of the problem is that it lets us get at literal values without a GROUP!, which means we aren't trying to COMPOSE inside of COMPOSE groups.
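
A rough Python model of why the stand-alone marker helps. Everything here is a hypothetical simplification: `Word`, `Quoted`, `Group`, `THE`, `compose`, and `classify_keep` are invented for this sketch, and the real COMPOSE and KEEP have more behaviors than shown. The model assumes COMPOSE replaces a group with its result (keeping a quote that was on the group), and that a generated KEEP rule reads a plain word as a rule to call, a quoted item as something to match in the input, and the marker as "synthesize the next item as-is".

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Word:
    name: str


@dataclass(frozen=True)
class Quoted:
    item: Any


@dataclass(frozen=True)
class Group:
    thunk: Callable[[], Any]   # code to run when composed


THE = "@"   # stand-alone marker: take the next item literally


def compose(template):
    """Replace each group (or quoted group) with its evaluated result."""
    out = []
    for item in template:
        if isinstance(item, Group):
            out.append(item.thunk())
        elif isinstance(item, Quoted) and isinstance(item.item, Group):
            out.append(Quoted(item.item.thunk()))
        else:
            out.append(item)
    return out


def classify_keep(rule):
    """Describe how this toy model reads a generated [keep ...] rule."""
    assert rule[0] == "keep"
    arg = rule[1]
    if arg == THE:
        return ("synthesize", rule[2])
    if isinstance(arg, Quoted):
        return ("match-literal", arg.item)
    if isinstance(arg, Word):
        return ("call-rule", arg.name)
    raise ValueError("unsupported KEEP argument in this toy model")


name = "Binky"

# Models: compose [keep (to word! name)]
plain = classify_keep(compose(["keep", Group(lambda: Word(name))]))

# Models: compose [keep '(to word! name)]
quoted = classify_keep(compose(["keep", Quoted(Group(lambda: Word(name)))]))

# Models: compose [keep @ '(to word! name)]
marked = classify_keep(
    compose(["keep", THE, Quoted(Group(lambda: Word(name)))])
)
```

Only the third template yields a rule that synthesizes the value, and the composed output contains no group at all, so nothing has to be nested inside the COMPOSE slot.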

And there are lots of places we could benefit from this, I think.

repeat ([2 3]) rule  =>  repeat @[2 3] rule

Some situations might be a bit unsettling for people who don't see the @ as being quoting, e.g.:

repeat (2) rule  =>  repeat @ 2 rule

The 2 might look a bit too attached to the rule, if you don't realize the @ takes one unit of quoted parameterization to its right... not a full combinatorized parser like [2 rule]

But, I think it's just something you would have to get used to. Like I say, it's actually this "let the @ stand off as a separate value" form that breaks the Gordian knot, when it comes to deeper forms of composition.

What Would @(...) Mean In This Context?

I guess it would just have to mean literally that GROUP!, i.e. a synonym for @ (...)?

Because if it evaluated the group, what would it do to the result that would make it "more literal" than it already was? You have ^(...) to add a quote level already--which wouldn't make sense for @.

It's an odd thing, but not useless. After all, keep @[a b] is only one less character than keep ([a b])... though it would be more efficient due to knowing it doesn't need to run the evaluator on a GROUP!. And keep @('foo) saves an additional character due to not needing to quote the group, as in keep ('('foo)).

Hm, so actually, I can see that being rather useful.

How To Literal Match If Not With @ ?

We were back to the drawing board on this when @ started acting like META, but then got it back when the ^XXX types came around...

And now we're back again.

You can splice the value in as a quoted rule, so at least there are options... :(quote var) or :(^var)

>> block: [some "a"]
>> uparse [[some "a"] [some "a"]] [some :(quote block)]
== [some "a"]

Not my favorite, but by no means incoherent.

Though I'd already noticed one nagging missing point on literal matches, which was how to literally match a fetched block as a splice.

block: [a b]
uparse [a b a b] [some ?operation? (block)]

When @block was a literal match, it had [[a b] [a b]] covered, but not this. I don't know what that means, but it just pointed out to me there was something else afoot.

Still available at the moment are .foo and .(foo), and /foo and /(foo) -- though I don't like those for this purpose.

I've been eyeing $foo $[foo] $(foo) etc for reasons like getting environment variables in shell dialects, and because I honestly think the potential here goes pretty far beyond MONEY!. It would take some reckoning in the type system to get those available.

If $ becomes synonymous with "substitution", then it might be that $(foo) makes more sense for "substitute this expansion as a rule" than GET-BLOCK! does, which might let the GET-XXX! variations mean "use this as a literal value".

For Now I Need to Change @ So Code Gets Better... so...

So trying this out, it does solve my problem, but it creates a disconnect I'm not totally comfortable with...

The @ operator itself has a parity. In the normal evaluator:

>> var: @ x
== x

>> var
== x

And in UPARSE:

>> uparse "" [var: @ x]
== x

>> var
== x

All's good so far... this is the Gordian-knot slicer I talked about. Synthesizing values out of thin air without a GROUP!, so it plays nicely with COMPOSE.

But I think this difference may be a mistake:

>> var: @[a b]
== @[a b]

>> uparse "" [var: @[a b]]
== [a b]

While we accept that UPARSE and regular code act differently, I'm not clear on why @ would act the same but @[...] wouldn't. The concept was maybe making it easier to do REPEAT ranges, but...well, if you wanted that, why not make REPEAT allow you to give it either BLOCK! or THE-BLOCK! ?

This also lets us maybe experiment with that trick I was suggesting: even though evaluative types don't get added to blocks without a quote, maybe the @[...] types would.

>> uparse "a" [collect [keep @keep, keep <any>]]  ; no quote or ^ needed
== [keep #a]

This way you have both options for blocks:

>> uparse "a" [collect [keep @[keep some], keep <any>]]
== [[keep some] #a]

>> uparse "a" [collect [keep @ [keep some], keep <any>]]
== [keep some #a]

I think I was trying to make those both do the same thing, but when you look at the broader picture, that doesn't seem all that useful. If that inconsistency bothers you, then use THE instead:

>> uparse "a" [collect [keep @[keep some], keep <any>]]
== [[keep some] #a]

>> uparse "a" [collect [keep the [keep some], keep <any>]]
== [keep some #a]

If we had infinite symbols on the keyboard I might propose that the symbolic THE use something other than @. But we don't have infinite symbols, and I think I can wrap my head around the rule:

@ x => x
@x => @x

I haven't fully absorbed the implications here, but I do think this concept of having the @xxx types be a new "do whatever you want with it" argument type that plays along with combinators is pretty interesting. The ANY rule could accept it, for example:

uparse [10 <twenty> "hello" 20 304] [some any @[integer! tag! text!]]

Which to my eyes feels a little bit more comfortable than:

uparse [10 <twenty> "hello" 20 304] [some any ([integer! tag! text!])]

I'll just have to see how it comes along.
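
For a feel of what an ANY that accepts an @[...] list of types would do, here is a small Python sketch. It is a toy model only: `Tag`, `any_of`, and `some` are hypothetical names, and Python's `int` and `str` merely stand in for INTEGER! and TEXT!.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a TAG! value such as <twenty>."""
    name: str


def any_of(types):
    """Rule that matches one item whose exact type is in `types`."""
    def rule(data, pos):
        if pos < len(data) and type(data[pos]) in types:
            return data[pos], pos + 1
        return None
    return rule


def some(inner):
    """Rule that matches `inner` one or more times."""
    def rule(data, pos):
        matched = inner(data, pos)
        if matched is None:
            return None
        while matched is not None:
            product, pos = matched
            matched = inner(data, pos)
        return product, pos
    return rule


# Models: uparse [10 <twenty> "hello" 20 304] [some any @[integer! tag! text!]]
data = [10, Tag("twenty"), "hello", 20, 304]
result = some(any_of([int, Tag, str]))(data, 0)
```

In this model the type list is just data handed to the combinator, which is the "do whatever you want with it" argument idea: nothing had to be evaluated to produce it.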
