What Should BLANK! in UPARSE Do?

hostilefork · May 15, 2022, 10:35pm

It's always good to look at history first. So let's compare and contrast #[none] vs. empty block in old Redbols.

In Rebol2 and R3-Alpha, both are no-ops. The input type doesn't matter.

r2/r3>> parse {ab} [[] "a" [] "b" []]
== true

r2/r3>> parse {ab} [#[none] "a" #[none] "b" #[none]]
== true

r2/r3>> parse [a b] [[] 'a [] 'b []]
== true

r2/r3>> parse [a b] [#[none] 'a #[none] 'b #[none]]
== true

In Red, #[none] is matched literally against the input. On string input, where it can never match, you don't get an error...just a failure.

red>> parse {ab} [[] "a" [] "b" []]
== true

red>> parse {ab} [#[none] "a" #[none] "b" #[none]]
== false

red>> parse [a b] [[] 'a [] 'b []]
== true

red>> parse [a b] [#[none] 'a #[none] 'b #[none]]
== false

red>> parse [#[none] a #[none] b #[none]] [#[none] 'a #[none] 'b #[none]]
== true
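
The two historical behaviors can be contrasted with a toy model. This is only a sketch in Python, not how any of these interpreters are implemented: `toy_parse` and its `none_is_literal` flag are made-up names, `None` stands in for #[none], an empty list stands in for an empty block, and only flat rule lists are handled.

```python
def toy_parse(data, rules, none_is_literal):
    """Return True if `rules` consume all of `data` (a str or a list)."""
    pos = 0
    for rule in rules:
        if rule == []:
            continue  # empty block: a no-op in every dialect shown above
        if rule is None and not none_is_literal:
            continue  # Rebol2 / R3-Alpha style: #[none] is a no-op
        if isinstance(data, str):
            # String input: only string rules can match, so a literal
            # None simply fails here (no error is raised).
            if not isinstance(rule, str) or not data.startswith(rule, pos):
                return False
            pos += len(rule)
        else:
            # Block input: compare one item, so a literal None matches None.
            if pos >= len(data) or data[pos] != rule:
                return False
            pos += 1
    return pos == len(data)


nones = [None, "a", None, "b", None]
print(toy_parse("ab", nones, none_is_literal=False))   # True  (r2/r3 style)
print(toy_parse("ab", nones, none_is_literal=True))    # False (Red style)
print(toy_parse(nones, nones, none_is_literal=True))   # True  (Red style)
```

The only difference between the two modes is the single `continue` for `None`; everything else about matching stays the same.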

hostilefork · July 11, 2022, 7:38pm

I've really found that I like BLANK! literally at source level as a way to say SPACE in string operations.

So it could be useful in PARSE for this purpose:

>> parse "aaa bbb" [some "a" _ some "b"]
== "b"

We haven't talked about the "blank and space" duality for a while, but I'd even gone so far as to suggest that when you do something like TO BLOCK! of a string, it might transform the spaces into blanks:

>> to block! "the cat"
== [#t #h #e _ #c #a #t]

(People might not recall why I was mentioning this, but around the time of UTF-8 Everywhere it was pointed out that since we had non-fixed-size codepoints, seeking in strings and mutating them could be costly. So if you had a string algorithm you might want to "explode" a string into a BLOCK! representation to work on it. This would give you great flexibility to do things like put in substitutions with full strings, or mark the cells with intermediate states for your algorithm...and then you would collapse it all down at the end by turning it back into a string.)
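
The explode/collapse idea can be sketched in a few lines. This is a Python illustration rather than Ren-C, and `explode`, `collapse`, and `BLANK` are hypothetical names; `None` stands in for BLANK!.

```python
BLANK = None  # hypothetical stand-in for BLANK! (_)


def explode(text):
    """One cell per codepoint; each space becomes a BLANK cell."""
    return [BLANK if ch == " " else ch for ch in text]


def collapse(cells):
    """Join the cells back into a string; BLANK cells become spaces."""
    return "".join(" " if cell is BLANK else cell for cell in cells)


cells = explode("the cat")
cells[4:7] = ["dog"]      # swap three one-character cells for one full string
result = collapse(cells)
print(result)             # the dog
```

While the data is exploded, each cell can hold anything the algorithm needs (a whole string, a marker, an intermediate state), and no variable-width encoding has to be re-scanned until the final collapse.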

The Literal Interpretation Is Also Compelling in Arrays/Sequences

I've thought of BLANK! as being the analogue to space in blocks, so matching them literally there makes sense:

>> parse [a a a _ b b b] [some 'a _ some 'b]
== 'b

But where it really shines is in processing things like paths and tuples, to match the gaps in them:

>> refinement-rule: [subparse path! [_ word!]]

>> parse [/foo] [refinement-rule]
== 'foo

That's a slam dunk. So now we have the behavior tied up.
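
As a rough check of the proposed rule, here is a toy model in Python (not UPARSE itself). `BLANK`, `match_one`, and `toy_parse` are made-up names, `("some", x)` is an ad hoc encoding of SOME, and a refinement like /foo is modeled as the two-item list `[BLANK, "foo"]`, which is an assumption about its structure.

```python
BLANK = None  # hypothetical stand-in for BLANK! (_)


def match_one(data, pos, rule):
    """Return the position after `rule` if it matches at `pos`, else None."""
    if pos >= len(data):
        return None
    if isinstance(data, str):
        # String input: BLANK means "one space"; other rules are literal text.
        literal = " " if rule is BLANK else rule
        return pos + len(literal) if data.startswith(literal, pos) else None
    # Block-like input: BLANK matches a BLANK item literally.
    return pos + 1 if data[pos] == rule else None


def toy_parse(data, rules):
    """Return True if `rules` consume all of `data` (a str or a list)."""
    pos = 0
    for rule in rules:
        if isinstance(rule, tuple) and rule[0] == "some":
            nxt = match_one(data, pos, rule[1])
            if nxt is None:
                return False          # SOME needs at least one match
            while nxt is not None:    # then consumes as many as it can
                pos = nxt
                nxt = match_one(data, pos, rule[1])
        else:
            nxt = match_one(data, pos, rule)
            if nxt is None:
                return False
            pos = nxt
    return pos == len(data)


rules = [("some", "a"), BLANK, ("some", "b")]
print(toy_parse("aaa bbb", rules))                               # True
print(toy_parse(["a", "a", "a", BLANK, "b", "b", "b"], rules))   # True
print(toy_parse([BLANK, "foo"], [BLANK, "foo"]))                 # True
```

The same `BLANK` rule does different work depending on the input type, which is the duality described above: a space in strings, a literal blank in arrays and sequences.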

hostilefork · December 1, 2022, 9:51am

I'm...pretty sure (?) this is still the best plan.

So I've outlined a new philosophy for why BLANK! exists at all. Its purpose as a kind of generic "nothing to see here" is distinct from what might be thought of as more of a disruptor, like null or an unset variable. It is not related at all to soft failure. Blank is simply a wildcard that you can choose to treat equivalently to an empty series or missing value, without it committing to being anything in particular.

I've also mentioned that in some mechanical contexts (like APPEND), we are simply more interested in blank's "thingness" than in its representation of nothingness. So you have to DECAY it or SPREAD it or otherwise interact with it to get it to not act mechanically.

PARSE strikes me as one of the more mechanical contexts.

parse [_ _ _] [repeat 3 _]

And I think the value of having it to represent space in string contexts is probably high.

To prevent accidents where you didn't mean a blank to be interpreted as a space, this may be one of those cases where PARSE shouldn't let you use a blank fetched from a WORD! in the rules. You either use @var to say "I mean literally a blank" or you make the rule contain a quoted value.

But It's Important To Point Out There Are Other Tools

Already, a quasi-void acts like an empty rule in source:

>> parse "abc" ["a" ~void~ "b" "c"]
== "c"

And a void antiform does too:

>> rule: if false [<whatever>]

>> parse "abc" ["a" rule "b" "c"]
== "c"

You can think of there being infinitely many voids at any position in a block.

Also already, ~ triggers a failure (though it should probably give a better message):

>> parse "abc" ["a" ~ "b" "c"]
** Error: ~ encountered in parse rule

Speculatively, you can use other quasiwords for failures...though this might wind up being disallowed:

>> parse "ccc" [some "a" | some "b" | ~mismatch~]
** Error: ~mismatch~ in PARSE

IngoHohmann · December 5, 2022, 8:50am

That's how I see Blank as well: as a generic placeholder. Nothing interesting here (yet), but no need to worry, this is not an error.