With the divergence of STRING! and BINARY!, you have two similarly diverging needs for BITSET!:
- For STRING!, matching character sequences (including higher Unicode codepoints):
dashes: charset ["-" "–" "—"]
- For BINARY!, matching byte ranges:
upper: charset [128 - 255]
However, there are situations where it'd be desirable to mix usages, for example where UTF-8 sequences are delimited by certain byte sequences:
parse some-stream [2 upper some dashes]
Presumably some-stream would be a BINARY!, as there'd be no non-UTF-8 high-byte sequences in a STRING!.
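Concretely, the mix is awkward because DASHES, interpreted as codepoints, would have to match multi-byte UTF-8 sequences inside the BINARY!, while UPPER tests one byte at a time. A quick sketch (assuming the UTF-8-everywhere string representation under discussion):

    to binary! "-"    ; ASCII hyphen: one byte, #{2D}
    to binary! "—"    ; em dash: one codepoint, but three bytes, #{E28094}

So a byte-oriented bitset can be applied position by position, but a codepoint-oriented one needs the parse to know where each encoded character begins and ends.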
I guess I had thought that whether you were parsing a BINARY! or an ANY-STRING! would provide the interpretation. But you are right that when parsing a BINARY! one might want the codepoint interpretation vs. the byte interpretation.
There are really only two avenues of solution: a PARSE keyword to distinguish the usage, or a datatype distinction (a byteset!?). I have a vague feeling this is an esoteric problem for which PARSE's oddity should be paying the tax, so it should be a PARSE keyword... maybe even something strange like being able to go INTO a BINARY! from a string parse, or vice versa, switching the interpretation mode.
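As a sketch of what the keyword approach might look like: BYTES and TEXT below are invented names for illustration, not existing PARSE features, marking which interpretation a sub-rule uses:

    upper: charset [128 - 255]       ; byte-oriented bitset
    dashes: charset ["-" "–" "—"]    ; codepoint-oriented bitset

    parse some-stream [
        bytes [2 upper]      ; hypothetical: match UPPER against raw bytes
        text [some dashes]   ; hypothetical: match DASHES against decoded codepoints
    ]

The INTO-style variant would instead have a sub-rule treat the same position reinterpreted as the other type, returning to the original mode when the sub-rule finishes.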