Matching Characters in BINARY! PARSE

hostilefork · August 21, 2022, 5:43am

In Rebol2, you can't match a character (or string) against a binary!:

rebol2>> to binary! " "
== #{20}

rebol2>> parse #{20} [" "]
== false

rebol2>> parse #{20} [#" "]
== false

In Red and R3-Alpha, you can do both...

red>> parse #{20} [" "]
== true

red>> parse #{20} [#" "]
== true

But their Unicode model means they really don't know what they're doing in any general sense, and I'm sure whatever's under the hood is incoherent:

red>> to binary! "Æ"
== #{C386}

red>> parse #{C386} ["Æ"]
== false

Ren-C is much more coherent!

>> did parse #{C386} ["Æ"]
== #[true]
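
For reference, Æ is the codepoint U+00C6, and its UTF-8 encoding is the two bytes C3 86. Here's a minimal sketch in Python (not Rebol, just to make the byte arithmetic explicit) of two ways a string rule might be compared against binary input. Both function names are made up for illustration, and neither is meant as a description of what Red or Ren-C actually do internally; the second is only one guess at how a false result could come about.

```python
def match_as_utf8(data: bytes, rule: str, pos: int = 0):
    """Match the rule's UTF-8 encoding at pos; return the new position or None."""
    encoded = rule.encode("utf-8")
    if data.startswith(encoded, pos):
        return pos + len(encoded)
    return None


def match_codepoint_per_byte(data: bytes, rule: str, pos: int = 0):
    """Hypothetical naive match: compare each codepoint to a single byte."""
    for ch in rule:
        if pos >= len(data) or data[pos] != ord(ch):
            return None
        pos += 1
    return pos


data = bytes.fromhex("C386")  # the UTF-8 bytes of U+00C6
utf8_result = match_as_utf8(data, "\u00c6")  # consumes both bytes
naive_result = match_codepoint_per_byte(data, "\u00c6")  # 0xC6 vs 0xC3: no match
```

For pure ASCII such as a space, the two approaches agree, which would be consistent with #{20} matching either way.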

So the PARSE succeeds, but... what should it return?

Right now, a PARSE on a string returns the rule itself (the same series, not a copy) when it matches:

>> rule: "cd"

>> result: parse "abcd" ["ab" rule]
== "cd"

>> append result "ef"
== "cdef"

>> rule
== "cdef"

This is clearly correct, because you don't want PARSE to make a copy when it can't know whether you're going to use it. Basic rule matching should not produce a new series.
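
The no-copy behavior can be sketched outside of Rebol too. Below is a rough Python analogy, where a `bytearray` stands in for a mutable series and `match_rest` is a hypothetical helper, not a model of how PARSE is implemented.

```python
def match_rest(data: bytearray, rule: bytearray, pos: int):
    """If rule matches data from pos to the end, return the rule object itself."""
    if data[pos:] == rule:
        return rule  # no copy: the caller gets the same object as the rule
    return None


rule = bytearray(b"cd")
result = match_rest(bytearray(b"abcd"), rule, 2)
result.extend(b"ef")  # mutating the result also mutates the rule
```

Here `result` and `rule` are the same object, so the append shows up in both. A caller who wants an independent series would have to copy explicitly, which keeps the cost of copying out of the basic match.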

The same logic applies to BINARY!...but should it give you the string as a string, or aliased to its binary form?

>> parse #{C386} ["Æ"]
== "Æ"  ; option 1

>> parse #{C386} ["Æ"]
== #{C386}  ; option 2

I think the answer is that the match should come back in whatever form it had in the rule (option 1).

But then... how about something like BLANK!, which matches a space if the input is a string... or a BLANK! if it's an array?

>> parse [_] [_]
== _

>> parse " " [_]
== ???

>> parse #{20} [_]
== ???

If we're going with the idea of the rule having primacy, then the language of the match should be the same as the language of the rule... i.e. the above would all return blank.

But this is something of a gray area, IMO. I feel like blank is acting as a stand-in for space and should probably be looked at as if you said space.

>> parse " " [_]
== #" "

>> parse #{20} [_]
== #" "  ; instead of 32

So this is what I'm going with, unless someone has a really good argument for something else.

IngoHohmann · August 22, 2022, 10:05am

If you don't break the convention of giving back the rule, then it is in your hands what you get back: a blank, a space character, or a one-character string.