Exact Matching of Variables with the @ Types In UPARSE

I mentioned that the @ types were slated for matching the contents of a variable exactly. The most frequent example I have given is:

>> block: [some "a"]

>> uparse [[some "a"] [some "a"]] [some @block]
== [some "a"]  ; success gives result of last matching rule

So that's different from [some block], which would treat block as a rule.

Works with all types:

>> num: 1

>> uparse [1 1 1] [some @num]
== 1

I didn't mention things like @(gr o up) but those work too:

>> uparse [1 1 1] [some @(3 - 2)]
== 1
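
To make the "match the value, don't run it as a rule" idea concrete, here is a toy model in Python. It is only a sketch under assumptions: Python lists stand in for blocks, and the helper names `some_literal` and `parse_some_at` are made up for illustration. It is not Ren-C code and says nothing about how UPARSE is actually implemented.

```python
# Toy model only: Python lists stand in for BLOCK!s.
# The function names here are hypothetical, not part of UPARSE.

def some_literal(data, pos, value):
    """Match `value` literally one or more times starting at `pos`.

    Returns (new_pos, last_match) on success, or None on failure.
    """
    start = pos
    while pos < len(data) and data[pos] == value:
        pos += 1
    if pos == start:
        return None  # SOME needs at least one match
    return pos, value


def parse_some_at(data, value):
    """Model of `uparse data [some @var]`: the whole input must be consumed."""
    result = some_literal(data, 0, value)
    if result is None or result[0] != len(data):
        return None
    return result[1]  # result of the last matching rule


block = ["some", "a"]
print(parse_some_at([["some", "a"], ["some", "a"]], block))  # ['some', 'a']
print(parse_some_at([1, 1, 1], 1))                           # 1
print(parse_some_at([1, 1, 1], 3 - 2))                       # 1
```

The point of the model is that `value` is only ever compared for equality against the input. It is never interpreted as a rule, which is the difference between [some @block] and [some block].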

I realized I actually do not know how to write the above two cases in Red or Rebol2. You can't use the number as a plain variable in Red, since it acts as a repeat rule. (UPARSE prohibits that: since it's a rule that takes an argument, you must use REPEAT for such behavior.)

red>> num: 1

red>> parse [1 1 1] [some num]
*** Script Error: PARSE - invalid rule or usage of rule: 1

Also in Red, I'm not clear on why the following isn't an error, since the GROUP! product is just discarded:

red>> parse [1 1 1] [some (3 - 2)]
== false

This is something that would work in R3-Alpha, but doesn't in Red or Rebol2:

red>> parse [1 1 1] [some quote (3 - 2)]
== false

Your guess is as good as mine. Whatever the answer in their world is, it's not obvious. But I think the @ types give a clean answer in UPARSE.

But What About @[bl o ck] ?

We might say that it means match a block literally:

>> uparse [[some "a"] [some "a"]] [some @[some "a"]]
== [some "a"]

That would be wasteful, since we already have a way to match blocks literally by quoting them:

>> uparse [[some "a"] [some "a"]] [some '[some "a"]]
== [some "a"]

But UPARSE has changed the game for why @[...] and [...] can mean different things...because block rules synthesize values. And who's to say you might not want to match a rule and use its product as the literal thing to match against?

>> uparse [1 1 1 2] [@[some '10, (10 + 10) | some '1 (1 + 1)]]
== 2

In other words, your rule can match and provide an answer for the thing to match next. We have zero experience with how often that might be useful, but maybe it is?
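
As a rough illustration of "match a rule, then use its product as the next literal", here is a Python sketch of that one example. It assumes the first alternative that matches wins (no backtracking into later alternatives), and the names `some_literal` and `match_product_of_rule` are hypothetical. It is a toy model, not UPARSE itself.

```python
# Toy model of: [@[some '10, (10 + 10) | some '1 (1 + 1)]]
# Names are hypothetical; Python lists stand in for BLOCK!s.

def some_literal(data, pos, value):
    """Match `value` one or more times from `pos`; return new position or None."""
    start = pos
    while pos < len(data) and data[pos] == value:
        pos += 1
    return pos if pos > start else None


def match_product_of_rule(data):
    """Run the inner alternatives, then match their product literally."""
    # Each alternative: (literal to repeat, product synthesized by the GROUP!)
    alternatives = [(10, 10 + 10), (1, 1 + 1)]
    for literal, product in alternatives:
        pos = some_literal(data, 0, literal)
        if pos is None:
            continue  # this alternative failed, try the next one
        # The first alternative that matches wins (assumed); its product
        # is now the literal value that must appear next in the input.
        if pos + 1 == len(data) and data[pos] == product:
            return product
        return None
    return None


print(match_product_of_rule([1, 1, 1, 2]))   # 2
print(match_product_of_rule([10, 10, 20]))   # 20
print(match_product_of_rule([1, 1, 3]))      # None
```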


I think I've finally decided to declare @ to be a shorthand for "match any item at this position", and not take an argument.

This combinator's "long" form is called ONE. It replaces the historical SKIP, because reading [x: skip] and expecting that to store an item in a variable sounds like the opposite of skipping. SKIP being arity-0 also doesn't fit with the rest of the system, so UPARSE instead has a SKIP combinator that takes how much to skip. Here is ONE in use:

>> parse [#foo <bar>] [issue! one]
== <bar>

So now, it simply has a shorthand:

>> parse [#foo <bar>] [issue! @]
== <bar>

To justify why this isn't a fully arbitrary choice: when we see something like @var that's matching at the current position under the constraint of the provided variable:

>> block: [some "a"]

>> parse [[some "a"] [some "a"]] [some @block]
== [some "a"]

So it doesn't seem too crazy that when you take away the variable name that's being looked up for the constraint, you'd get a combinator that matches anything.

This Frees BLANK! Up For Literal Match Blank, Or Space

Since the dawn of the BLANK! datatype, I have wanted to use it for space in string PARSE (among other places), and literal match of blanks in blocks:

>> parse "a b" ["a" _ "b"]
== "b"

>> parse [a _ b] ['a _ 'b]
== b

But the ideas of blank being "match anything at current position" and blank being a "no-op" have been competing intents.

Even the question of matching a literal underscore in strings comes up, but that's now done with quoting (which mold-matches any type):

>> parse "a_<b>" ['a '_ '<b>]
== <b>

The @ symbol is a bit bulkier for "match any item" here, but I think its bulk is in scale with its intent...and, as I pointed out, that puts it in the family of the other @xxx combinators.

We now have a pretty good answer for people who want a way to opt out of rules without using an empty block... use a void:

>> rule: if 1 = 2 [[some "b"]]
== ~void~  ; anti

>> parse "aaa" [rule some "a"]
== "a"

So I don't think it's necessary to dabble in the idea of having a fetched blank mean something different. I'm happy enough saying that the BLANK! combinator only applies in the rule as source, and gives you an error if you try to fetch it via word.

Though I will point out that @var has quoting semantics--as if the fetched var were in the rule block with one quote level added. Hence you would get the underscore behavior:

>> parse "a_<b>" ['a @blank '<b>]
== <b>

...but if fetching BLANK! from a WORD! did anything (though I think it shouldn't), it should be a no-op:

>> parse "ab" ['a blank 'b]
== b  ; not that I think it should do this, but if it DID do something...

If some amazingly compelling case for that shows up, then perhaps it should be enabled.

For Quoting, There's JUST and LITERAL

This means @ doesn't behave like it does in the main evaluator as an arity-1 operator for literalizing the subsequent argument.

But you have other options. JUST will "just" synthesize the value (without matching it), while LITERAL will match it (and synthesize it if matched).

>> parse [] [just x]
== x

>> parse [''x] [literal ''x]
== ''x

LITERAL is nice when the thing you are matching has more than one quote level, because otherwise it can feel a little confusing:

>> parse [''x] ['''x]
== ''x

It's also nice if something has a quote mark in the name:

>> foo': "foo prime"

>> parse [foo'] ['foo']  ; hrrrm
== foo'

>> parse [foo'] [literal foo']
== foo'

As a shorthand, there's LIT.

>> parse [foo'] [lit foo']
== foo'

One downside I discovered of using a SIGIL! for "match anything" is that if you try and apply that more broadly outside of PARSE, you run into trouble if you're going to try using some sequence itself as a matching template.

For example, suppose you wanted a.1.2 to match against @.1.2. The @ isn't in the first position there; it's a decoration on .1.2

Of course, there's going to be some trouble no matter what you pick... if it's legal to occur in that position, then you have to deal with the case that it's literally there.

But if it were * then it would at least afford:

*.1.2  ; matches a.1.2

['*].1.2   ; matches *.1.2

['[*]].1.2  ; matches [*].1.2

etc.

This is sort of a tangentially related thing, because if you try and apply the logic of PARSE to this matching scenario, then pretty much everything has to be in a block to quote it literally.

So whatever this "sequence-globbing" domain is, it would be different.

Also, given that I'm talking about something that doesn't exist, what would * really mean?

a.b.c.1.2  ; would this match *.1.2 but not ?.1.2

If we were to say that PARSE needed to bow to this, then it kind of suggests that ? would be "match any one item".
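
Since none of this exists, the following is only one possible reading, borrowed from shell globbing: `*` matches one or more segments and `?` matches exactly one. It is sketched in Python, with sequences modeled as lists of segments. The function name `glob_sequence` and the choice of "one or more" (rather than "zero or more") are assumptions made purely for illustration.

```python
# Hypothetical "sequence-globbing" sketch; nothing like this exists yet.
# A sequence like a.1.2 is modeled as the list ["a", 1, 2].

def glob_sequence(pattern, seq):
    """Return True if `seq` matches `pattern`.

    "*" matches one or more segments (an assumption), "?" matches exactly one,
    and anything else must equal the segment at that position.
    """
    if not pattern:
        return not seq  # both must be exhausted together
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Let "*" swallow 1..len(seq) segments and try the rest each time
        return any(glob_sequence(rest, seq[i:]) for i in range(1, len(seq) + 1))
    if not seq:
        return False
    if head == "?" or head == seq[0]:
        return glob_sequence(rest, seq[1:])
    return False


print(glob_sequence(["*", 1, 2], ["a", 1, 2]))            # True
print(glob_sequence(["*", 1, 2], ["a", "b", "c", 1, 2]))  # True
print(glob_sequence(["?", 1, 2], ["a", "b", "c", 1, 2]))  # False
print(glob_sequence(["?", 1, 2], ["a", 1, 2]))            # True
```

Note that this toy version has no way to match a literal `*` or `?` segment, which is exactly the quoting problem described above.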

Anyway, just making the point here... that SIGIL!s are slippery. If PARSE is trying to set a precedent for a systemic recognizable idea of "match single item" then maybe it shouldn't be done with a sigil.

(But note that _ is equally problematic for sequences, as it makes the slot disappear--and moreover, is no longer legal except in head and tail sequence positions. So it wouldn't be better than @)