Time to Meet Your MATCH... (...dialect)

hostilefork · January 11, 2019, 8:59am

MATCH is a handy tool for testing a value against some basic rules, and passing it through if they match...or evaluating to null if they don't. The rules can be combined in some interesting ways that make them rather powerful!

It uses the "match dialect". This looks pretty simple on the surface, like what you put in a function spec block for the legal types:

>> match [integer! tag!] 1020
== 1020

>> match [integer! tag!] "this text value won't match"
;-- null

>> match [integer! tag!] <matches!>
== <matches!>
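For readers who don't have a Ren-C console handy, here is a rough Python model of that pass-through behavior. It is only a sketch: the function name `match` and the use of `None` to stand in for null are assumptions made for illustration, not how MATCH is actually implemented.

```python
def match(types, value):
    """Return `value` unchanged if it is an instance of any listed type.

    `None` stands in for Ren-C's null (an assumption of this sketch).
    """
    return value if isinstance(value, tuple(types)) else None


print(match([int, str], 1020))   # passes through: 1020
print(match([int, str], 3.5))    # no type matches: None
```

The point is just that the value itself comes back on success, so a match can be dropped into the middle of an expression.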

But with the new features of generalized quoting, now you can test for quotedness.

>> match ['word!] first [foo]
;-- null

>> match ['word!] first ['foo]
== 'foo

It actually dereferences what you give it and sums up the quote levels. So you can do things like:

 >> quoted-word!: quote word! ;-- Note: during transition, QUOTE is called UNEVAL

 >> match 'quoted-word! first [''foo]
 == ''foo

So since there is a quote level in the QUOTED-WORD! value, that gets added in with the quote on the match, so it looks for a doubly quoted value.

The premise is that each MATCH rule component is one item, and even INTEGER! values can be used...to test a length:

 >> match 2 [a b]
 == [a b]

 >> match 2 [a b c]
 ;-- null

You can use single arity ACTION!s as well, if you use a GET-WORD! or GET-PATH! to indicate them:

 >> match :odd? 304
 ;-- null

 >> match :lib/even? 1020
 == 1020

BLOCK! will OR rules together, PATH! will AND them

So this is a cool little trick:

>> match block!/2 [a b]
== [a b]

>> match text!/2 [a b]
;-- null

>> match text!/2 "ab"
== "ab"

>> match [block! text!]/2 "ab"
== "ab"

>> match '''[block! text!]/2 lit '''[a b]
== '''[a b]

>> match [integer!/[:even?] block!/[:empty?]] []
== []

>> match [integer!/[:even?] block!/[:empty?]] 1020
== 1020

Pretty cool huh? And as I mentioned, you can factor these rules out like in PARSE... note also that instead of :empty? you can just use 0.

 >> even-int!: lit integer!/[:even?]
 >> empty-block!: lit block!/0

 >> match [even-int! empty-block!] []
 == []

 >> match [even-int! empty-block!] [a b]
 ;-- null
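The OR/AND combination can be modeled loosely in Python. This is a hypothetical sketch, not the real implementation: a Python list plays the role of BLOCK! (OR), a tuple plays the role of PATH! (AND), a bare integer is a length test, and `satisfies`, `match`, `even_int`, and `empty_block` are names invented here.

```python
def satisfies(rule, value):
    """Test one rule component against a single value."""
    if isinstance(rule, type):           # a datatype: check the value's type
        return isinstance(value, rule)
    if isinstance(rule, int):            # an integer: check the length
        try:
            return len(value) == rule
        except TypeError:                # value has no length at all
            return False
    if isinstance(rule, list):           # BLOCK! stand-in: OR the alternatives
        return any(satisfies(r, value) for r in rule)
    if isinstance(rule, tuple):          # PATH! stand-in: AND the steps in order
        return all(satisfies(r, value) for r in rule)
    if callable(rule):                   # single-arity predicate function
        return bool(rule(value))
    raise TypeError(f"unsupported rule: {rule!r}")


def match(rule, value):
    """Pass `value` through if the rule holds, else None (null stand-in)."""
    return value if satisfies(rule, value) else None


# Factored-out rules, analogous to even-int! and empty-block!
even_int = (int, lambda n: n % 2 == 0)
empty_block = (list, 0)

print(match(([list, str], 2), "ab"))                 # ab
print(match([even_int, empty_block], 1020))          # 1020
print(match([even_int, empty_block], ["a", "b"]))    # None
```

Because `all` short-circuits, the even test never runs on a value that already failed the integer test, which is what makes the AND ordering useful.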

MATCH has an automatic erroring form, called ENSURE

If you want a quick and dirty way to typecheck something and pass it through, but error otherwise, use ENSURE.

>> ensure [even-int! empty-block!] [a b]
** Script Error: ...

>> ensure [even-int! empty-block!] 1020
== 1020
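The relationship between the two can be pictured with a small Python sketch. The name `ensure` and the choice of `TypeError` are assumptions for illustration only; the actual error Ren-C raises is elided above and is not reproduced here.

```python
def ensure(types, value):
    """Pass `value` through if it matches a listed type, else raise."""
    if not isinstance(value, tuple(types)):
        names = ", ".join(t.__name__ for t in types)
        raise TypeError(f"expected one of ({names}), got {type(value).__name__}")
    return value


print(ensure([int, str], 1020))   # 1020

try:
    ensure([int, str], [1, 2])
except TypeError as err:
    print("error:", err)
```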

MATCH is now built in as a PARSE keyword...

The quoting features of MATCH were important for PARSE to help pick up the slack after LIT-WORD!.

>> did parse ['a b 'c d] [some [match [word! 'word!]] end]
== #[true]

There was a bit of an incongruity previously, which is that MATCH did not want to quote its first argument. So you couldn't say match 'word! lit ['foo] and have it match, because the evaluator would strip off the quote. When all things were considered, it seemed to make more sense to have MATCH soft-quote its first argument, so it doesn't throw away the quote marks...but uses them in the rule.

PARSE then has a compatible expression, without a block! needed:

>> did parse ['a 'b 'c 'd] [some [match 'word!] end]
== #[true]

`<opt>`, falsey values, and /ELSE

You might imagine a lot of code wants to say if match [...] whatever [...]. This could lead to unsatisfactory results if the thing you're matching is a null, blank, or logic false -- even though you matched, the falsey nature of the thing you were testing would foil your intent.

To help catch those errors, any falsey input that matches will be voidified. So at least you'll get a clear error if you used the result. But since voids are values, you'll be okay if you use THEN or ELSE:

>> match [<opt> integer!] null then [print "matched, and void cued then!"]
matched, and void cued then!
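Here is a loose Python analogy for why the voidification helps. Everything in it is an assumption made for the sketch: `Void`, `VOID`, and `match` are hypothetical names, `None` stands in for null, and only `None` and `False` are treated as falsey (mirroring null and logic false, not Python's wider notion of falsiness).

```python
class Void:
    """Stand-in for a void: a real value that refuses to act as a condition."""

    def __bool__(self):
        raise TypeError("void used in a conditional")


VOID = Void()


def match(types, value):
    """Return the value on a match, VOID if the matched value was falsey,
    and None (null stand-in) when nothing matched."""
    if not isinstance(value, tuple(types)):
        return None
    if value is None or value is False:
        return VOID
    return value


result = match([type(None), int], None)

# THEN-like test: "is it anything other than null?" works fine on a void.
if result is not None:
    print("matched, and void cued then!")

# IF-like test: using the result as a plain condition fails loudly.
try:
    if result:
        pass
except TypeError as err:
    print("error:", err)
```

The matched-but-falsey case is distinguishable from the no-match case, and the careless `if` gets an error instead of silently taking the wrong branch.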

There's also an /ELSE refinement, so you can provide a branch of code to run if there's no match...and it won't mutate the result at all:

>> match/else [<opt> integer!] null [print "didn't match" 100]
;-- null

>> match/else [<opt> integer!] #foo [print "didn't match" 100]
didn't match
== 100

Useful Dialect, Good Testbed for PATH!s

You can see a detail above of how I want to use things like :even? as a test, and then use PATH! for AND-ing tests together generically. But then, :even?/integer! is a GET-PATH!, while integer!/:even? would be an ordinary path. The meaning gets confuzzled... how would you specify a function with refinements, or otherwise get something out of a path?

obj: make object! [even-int: lit integer!/[:even?]]
match :obj/even-int 4

To get this distinction, we have to treat :[:obj]/even-int differently from :obj/even-int. And this really does suggest to me that the notion of allowing GET-WORD!s, SET-WORD!s, and LIT-WORD!s in PATH! is a mistake...it doesn't generalize and will fall down at the head and tail. Even when it works, it's ugly.

I think this is going in the direction of making PATH! a stronger dialecting part. And hopefully, with more good examples we can keep pushing on some of the other things (like "does Rebol need a date format with slashes in it, and if so can it be accomplished naturally as a PATH!")...

BlackATTR · January 11, 2019, 5:15pm

Very cool. Should be muy bueno for simple lexing and tokenizing values of a dialect-- as well as the handling of syntax errors.

hostilefork · January 11, 2019, 5:21pm

The real thing to look at here is the way that GET-WORD! and GET-PATH! are used, and why I'm prescribing that we disallow direct usage of GET-WORD!, SET-WORD!, and LIT-WORD! in PATH!, GET-PATH!, or SET-PATH!.

Now that there is a controlled point of path creation, and immutability after that, these rules are possible. I am looking at this idea of making a/(b): c equally efficient to a/:b: c, and if I can do so, then I think the dialect design of things using paths will get good guidance and be more solid to prohibit the latter. Because it's incoherent in the long run... if you allow a/:b: c, what's wrong with ::a/:b::c etc.

I kind of feel like this is one of the first attempts to get ambitious with PATH! in a dialect. It's been hard to do, because for reasons beyond my ability to understand, DocKimbel has defended (a + b)/c being interpreted as (a + b) /c, as a GROUP! and then a REFINEMENT! (further in the ANY-WORD! dept. vs. a PATH!). It's kind of a house of cards, IMO, and so anything done to tighten the whole thing up is good.

Clearly I'm angling for a very different idea, and the goal is specifically to enable dialect design.
PATH! should not be just about what the evaluator does with it, but these kinds of usages and beyond. It may be different from BLOCK! and GROUP!, but it can come into its own, I think.

hostilefork · January 29, 2021, 1:27pm

So MATCH has become an extremely useful tool, used all the time.

But some of the wilder things it did in trying to become a matching dialect turned out to be junk. In surveying how the type block acts I mentioned that the weirder features in MATCH were not things we were likely going to want to carry forward to build on.

Not only does the C code implementing it suck, the syntax is ugly.

I'd already dropped the idea of MATCH quoting its first argument. That means it doesn't see the number of quotes you put on its argument unless you put it inside a block:

>> match ['integer!] just '10
== #[true]

We can still talk about whether that's a great idea or not.

PATH!s For AND Was Ugly

The concept that each clause in a BLOCK! is an OR makes sense with the type dialect. But using PATH!s for AND is pretty hideous.

GROUP!s might be more palatable:

>> match [(integer! :even?) (block! :empty?)] 1020
== 1020

Should Functions Need The GET-WORD! ?

Is the GET-WORD! even necessary? Could we assume that any functions that test values have names that suggest they do so...and understand that we aren't actually calling any functions?

>> match [(integer! even?) (block! empty?)] 1020
== 1020

The reason it was done with a GET-WORD! initially was for consistency with when you didn't give the rule as a BLOCK!

>> match :even? 10
== #[true]

>> match [:even?] 10
== #[true]

But is that interesting? If consistency of that kind is so important, might it be better to say MATCH always takes a BLOCK! or... a GROUP!?

Another option would be to use predicate format for functions and preface them with a dot, which would help call out that they were functions but be a bit less jarring:

>> match [(integer! .even?) (block! .empty?)] 1020
== 1020

GROUP! Can Be Used As The Main Match

Now that MATCH no longer quotes its first argument, a quote mark on the rule is just ordinary quoting, so you could pass a quoted GROUP! as the main match:

>> match '(integer! even?) <not an integer>
; null

>> match '(integer! even?) 304
== 304

>> match '([block! text!] 2) "ab" 
== "ab"  ; acted like `parse try match [block! text!] [2 skip] "ab"`

Bear In Mind PARSE is now MATCH-ish

Before going too far in terms of the powers of MATCH... I should point out that now that PARSE returns its input on success and is back to requiring that the end be reached by default, it can be used for matching purposes...e.g. "tuple"-style matches

 >> parse [1020 "hello"] [integer! text!]
 == [1020 "hello"]

PARSE is looking at sequence by default, while MATCH is looking at alternates. MATCH does not "destructure" its input...all its tests are running on the same single value.
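To make the sequence-versus-alternates distinction concrete, here is a small Python sketch. Both `match` and `parse_sequence` are hypothetical helpers written for this comparison; they only model type tests and are not how PARSE or MATCH work internally.

```python
def match(types, value):
    """Alternates: one value, tested against each type until one fits."""
    return value if isinstance(value, tuple(types)) else None


def parse_sequence(values, types):
    """Sequence: the Nth value must fit the Nth type, and the whole
    input must be consumed. Returns the input on success, else None."""
    if len(values) != len(types):
        return None
    if all(isinstance(v, t) for v, t in zip(values, types)):
        return values
    return None


print(parse_sequence([1020, "hello"], [int, str]))   # [1020, 'hello']
print(parse_sequence(["hello", 1020], [int, str]))   # None (wrong order)
print(match([int, str], "hello"))                    # hello
```

Swapping the order of the types changes the PARSE-like result but not the MATCH-like one, since MATCH never walks through the input.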

Should (Cleaned Up) MATCH Be The Function Arg Dialect?

It was the intention that the parameter to MATCH would be the same format as the blocks used for type checking arguments.

But when you think about reading the HELP, it gets a bit verbose. It's as if anyone who comes up with a sufficiently complex parameter spec should probably name it and make a function for it.

I'm assuming no one used any of the weird MATCH features. But would you be more likely to if it used GROUP!s and didn't need the GET-WORD!s on functions?

hostilefork · September 22, 2021, 11:09pm

I'm getting more merciless about eliminating anything that is "dicey", and basically every fringe idea of match is in that category.

As I do that, let me quickly review some of the justification and how things continue to evolve:

PARSE design no longer returns input on match...but it's extremely flexible and accommodates that, with the <input> tag combinator:

>> uparse "aaa" [some "a" <input>]
== "aaa"

If you want that to work with any arbitrary rules on the left, the inline sequencing combinator makes it clean to do so:

>> uparse "aaa" [some "a" | some "b" || <input>]
== "aaa"

; possibly easier way of writing [[some "a" | some "b"] <input>]
; (at least easier in the sense you can tack it on just at the end
; without having to go back to the start).

Some of the more "reasonable" match concepts were things like:

>> match block!.3 [a b c]
== [a b c]  ; matched a 3 element block

>> match block!.3 [a b c d]
; null

Expressing the same thing in a UPARSE rule is certainly not as brief, and you'd have to do the block match before the parse even started:

uparse (try match block! data) [3 <any> || <input>]

Or you could try putting the data into a BLOCK! and using INTO...

uparse :[data] [into block! 3 <any> || <input>]

But the main point is that although I can point to some examples where the MATCH dialect could be a good way of filtering values, it is the kind of thing that in the C code just represents a liability.

It's nice, but it's not making or breaking anyone's day...besides the feature of type checking that I use constantly (and is the basis for ENSURE as well). There's been really interesting progress from how that was implemented originally on the basis of something called EITHER-MATCH, until null isotopes came and made it possible to solve the whole thing with just one MATCH function.

Anyway MATCH and its as-far-as-I-know-unused parse incarnation are now gone. I'll point out that in UPARSE, you get MATCH with ANY... as in any @[integer! tuple! word!]

Sample Weird Stuff That Is Not Going To Be Worried About Now

The real reason for keeping these around had been testing the robustness of weird path forms, but they've outlived their usefulness...

(1020 = match [integer!/[:even?]] 1020)
(null = match [integer!/[:odd?]] 304)
([a b] = match [block!/2 integer!/[:even?]] [a b])
(null = match [block!/3 integer!/[:even?]] null)
(304 = match [block!/3 integer!/[:even?]] 304)
(null = match [block!/3 integer!/[:even?]] 303)

There was an idea that the quoting level of the test would match the quoting level of the type, so match ''integer! first [''1] would match. That got killed off a while ago; here are some notes on that:

; !!! There was once special accounting for where the quoting level of the
; test would match the quoting level of the rule:
;
;    (the 'foo = match the 'word! the 'foo)
;    (null = match the 'word! the foo)
;
;    quoted-word!: quote word!
;    (''foo = match ['quoted-word!] the ''foo)
;    (null = match ['quoted-word!] the '''foo)
;    ('''foo = match the '['quoted-word!] the '''foo)
;
;    even-int: 'integer!/[:even?]
;    (the '304 = match the '[block!/3 even-int] the '304)
;
; This idea was killed off in steps; one step made it so that MATCH itself did
; not take its argument literally so it would not see quotes.  That made it
; less useful.  But then, also there were problems with quoteds not matching
; ANY-TYPE! because their quote levels were different than the quote level on
; the any type typeset.  It was a half-baked experiment that needs rethinking.