ANY-WORD! and ANY-STRING!: The Limits of Unification


There's a longstanding question of how much ANY-WORD!s and ANY-STRING!s should have in common--and how aware users should be of the differences.

1. What about spaces, or starting with digits, etc.?

One question is whether ANY-WORD! should differ by treating some characters, like space, as illegal:

rebol2>> to word! "word with spaces"
== word with spaces  ; Rebol2 allowed it

r3-alpha>> to word! "word with spaces"
** Script error: contains invalid characters  ; R3-Alpha disallowed it

red>> to word! "word with spaces"
== word  ; Red has its own...idea

Spaces in words should be discouraged, just as they should be in filenames. But that doesn't necessarily mean they should be illegal.

The problem with making them illegal is that dialect authors would have to create their own escaping mechanisms. If every dialect author had to come up with a notation for putting spaces in words, each would likely do it a different way. Standardizing it would avoid that problem, and then you'd just have a notation (we've discussed a seemingly worst-case #[set-word! "set word with spaces"]).

But Ren-C has enough parts to live with "no ANY-WORD! escaping":

Fork: "It's nice to use SET-WORD!s for character names in a screenplay."

3-Headed-Clown: <annoyed> "But there's only one dash in my name, not two!"

Fork: "We have a lot more tools for tackling this now..."
[Hostile Fork]: "...SET-BLOCK! with WORD!s covers spaces for instance."

["3-Headed Clown"]: "I see--but I'd need a TEXT!"
["3-Headed" Clown]: "Or, at least I'd need *one* for invalid WORD!s"
[3 Headed Clown]: "I could also change my name to use an INTEGER!.  Hahaha...not!"

(Hostile Fork): "Another option here is SET-GROUP!s"
@(Hostile Fork) "SYM-GROUP!s are a planned feature for the near future."
@[Hostile Fork] "SYM-BLOCK!s should be available too."

"3-Headed Clown"/: "Or PATH!s too!  It's all up to the dialect author."
//"3-Headed Clown"//: "That's pretty awesome."

Fork: "Sure, but know that not all types are legal in PATH! (e.g. URL!)"

If you think about it, this is kind of a "standardized form of escaping". The only pain point would be--for instance--if you have another clever idea for what SET-BLOCK!s should do in your dialect. You might feel it was wasted being the "container for TEXT! that represents what a SET-WORD! would have been if it could have held spaces".

I think this falls into the realm of acceptable cost. Guaranteeing ANY-WORD! doesn't have spaces may be a limitation of a "freedom to", but it gives you a "freedom from" that has some value.
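
To make the "standardized escaping" idea concrete, here is a toy model in Python of how a screenplay dialect might resolve a speaker label. It is only a sketch and is not Ren-C code: the class names (Word, Text, SetWord, SetBlock) and the speaker_name function are hypothetical stand-ins for the datatypes in the dialogue above, and joining block items with a single space is an assumption about what such a dialect would want.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Word:
    spelling: str  # assumed to contain no spaces


@dataclass
class Text:
    content: str  # arbitrary characters, spaces included


@dataclass
class SetWord:
    spelling: str  # models a label like Fork:


@dataclass
class SetBlock:
    items: List[Union[Word, Text]]  # models a label like [Hostile Fork]:


def speaker_name(label: Union[SetWord, SetBlock]) -> str:
    """Resolve a speaker label to a display name (hypothetical dialect rule)."""
    if isinstance(label, SetWord):
        return label.spelling
    parts = []
    for item in label.items:
        if isinstance(item, Word):
            parts.append(item.spelling)
        elif isinstance(item, Text):
            # TEXT! is the escape hatch for anything a WORD! can't spell.
            parts.append(item.content)
        else:
            raise TypeError(f"unsupported item in speaker label: {item!r}")
    # Assumption: items in the block are separated by one space.
    return " ".join(parts)


print(speaker_name(SetWord("Fork")))
print(speaker_name(SetBlock([Word("Hostile"), Word("Fork")])))
print(speaker_name(SetBlock([Text("3-Headed"), Word("Clown")])))
```

The dialect author pays for this by reserving SET-BLOCK! for names, which is exactly the cost weighed above.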

2. What about binding?

R3-Alpha made ISSUE! an ANY-WORD!. This was a controversial idea, and giving ISSUE!s bindings wasn't really thought through:

r3-alpha>> set #123 456
== 456

r3-alpha>> words-of bind? #123
== [system set 123 words-of bind?]

You just created a context key called 123, despite that being an illegal word:

r3-alpha>> to word! "123"
** Syntax error: invalid character in: "123"

So ISSUE! became the poster child for "weird things getting bound". But now, I think this just represents a mistake. It opened a can of worms, and we should re-seal the can.

Getting an inert word type in the form of @SYM-WORD is going to make up for it.

Executive Summary

UTF-8 Everywhere has arrived! The best information we're going to get for Beta/One is probably in front of us now.

I think I see the architectural lines drawn up pretty well for what has to happen.

  • You can't have a binding on an ANY-STRING!, and you can't index into an ANY-WORD! directly. Hence all ANY-WORD! are at their head position--as they have always been.

  • All valid content for an ANY-WORD! is legal for an ANY-STRING!. Hence you can alias an ANY-WORD! as an ANY-STRING! without creating a new series. as text! (first [foo]) would not allocate any additional memory--the string would merely honor the "locked" status bit of the underlying foo node.

  • Not all content for ANY-STRING! is legal in ANY-WORD!. For example, spaces--or starting with a number. This may imply disallowing as word! "foo" as collateral damage for the illegality of as word! "foo bar". Or there may be tricks to allow some conversions sometimes, but we don't really need to worry about that.

  • ISSUE! is going back to being an ANY-STRING!, where #123 is legal without raising any questions about what is legal in an ANY-WORD!.

  • REFINEMENT! is gone, so /foo is a PATH! with a WORD! in it, and /1 is a PATH! with an INTEGER! in it. So it doesn't factor into the WORD! question any more.

  • SYMBOL-WORD! (working title) will bring a category of inert word to take ISSUE!'s place, where @123 will be illegal. It will be part of a family of inerts, along with @(...), @.../.../..., and @[...].
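
The first three bullets describe a one-way relationship, which can be sketched as a toy model in Python. This is not how Ren-C is implemented: Text, Word, is_legal_word, as_text and try_as_word are hypothetical names, and the legality rule (no whitespace, no leading digit) is a deliberately simplified assumption based only on the examples given above. Letting the string-to-word direction succeed after validation is just one of the options mentioned; the other is to disallow it entirely.

```python
import re
from typing import Optional

# Assumed, simplified legality rule: no whitespace and no leading digit.
_LEGAL_WORD = re.compile(r"[^\s\d]\S*")


def is_legal_word(spelling: str) -> bool:
    return _LEGAL_WORD.fullmatch(spelling) is not None


class Text:
    """String-like value: has an index position, never a binding."""

    def __init__(self, spelling: str, index: int = 0, locked: bool = False):
        self.spelling = spelling
        self.index = index
        self.locked = locked


class Word:
    """Word-like value: may carry a binding, has no index (always at head)."""

    def __init__(self, spelling: str, binding: Optional[dict] = None):
        if not is_legal_word(spelling):
            raise ValueError(f"illegal word spelling: {spelling!r}")
        self.spelling = spelling
        self.binding = binding


def as_text(word: Word) -> Text:
    # Every legal word spelling is a legal string, so this cannot fail.
    # The result reuses the word's spelling object and is marked locked.
    return Text(word.spelling, index=0, locked=True)


def try_as_word(text: Text) -> Optional[Word]:
    # The reverse direction must validate; None means "not convertible".
    if not is_legal_word(text.spelling):
        return None
    return Word(text.spelling)


foo = Word("foo")
alias = as_text(foo)
print(alias.spelling is foo.spelling, alias.locked)
print(try_as_word(Text("foo bar")), try_as_word(Text("123")))
```

In this model the binding lives only on Word and the index only on Text, so neither "bound string" nor "word at a non-head position" can be expressed at all.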