Removing "&" from Legal Word Characters - Any Objections?

The & character is not usually legal in identifiers in programming languages; it is typically reserved for operators and the like.

I have not seen much in the way of good uses of & in names... it's rather ugly.

AT&T: "this seems pretty pointless"

Even if we took & out of the word characters, we could make exceptions so that & and && standing alone would still be WORD!, if anyone really cared about that.
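To make the exception idea concrete, here is a rough sketch in Python rather than in the scanner itself. The word pattern is a deliberately simplified, hypothetical stand-in for the real character set, and the names `WORD_PATTERN`, `STANDALONE_EXCEPTIONS` and `is_word` are made up for illustration. It only shows the rule: & is not a word character, but the exact tokens & and && are still accepted.

```python
import re

# Hypothetical, simplified word pattern. The real scanner accepts a larger
# character set; this exists only to illustrate the "&" rule.
WORD_PATTERN = re.compile(r"[A-Za-z_?!*+\-=<>|~][A-Za-z0-9_?!*+\-=<>|~.']*")

# Tokens that would stay legal words even though "&" is not a word character.
STANDALONE_EXCEPTIONS = {"&", "&&"}


def is_word(token):
    """Return True if token would scan as a word under the sketched rule."""
    if token in STANDALONE_EXCEPTIONS:
        return True
    return WORD_PATTERN.fullmatch(token) is not None
```

Under this rule `AT&T` and `&nbsp` are no longer words, which is what frees the leading & for some other datatype.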

One of my pet usage suggestions for & has been to embrace the HTML Entity List, and allow it as a syntax for characters (minus the semicolon of course).

append some-string &nbsp  ; adding a non-breaking space by entity name

Also, having that table built into the executable and offering it out could be pretty useful, even preferring to mold a known character using that instead of the numeric form.
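As a rough illustration of what a built-in table would offer, here is a sketch in Python, whose standard library already ships the HTML 4 entity names in `html.entities`. The functions `char_from_entity` and `mold_char` are hypothetical names, and the `U+XXXX` fallback is only a placeholder for whatever the numeric form would be; none of this is an existing interpreter API.

```python
from html.entities import codepoint2name, name2codepoint


def char_from_entity(name):
    """Look up a character by entity name (no leading '&', no semicolon)."""
    return chr(name2codepoint[name])


def mold_char(ch):
    """Prefer the entity name when the character is in the table.

    Falls back to a placeholder numeric form (an assumption made here
    purely for illustration) when the table has no entry.
    """
    name = codepoint2name.get(ord(ch))
    if name is None:
        return f"U+{ord(ch):04X}"
    return "&" + name
```

With this table, `char_from_entity("nbsp")` gives the non-breaking space, and `mold_char` prefers the name for characters such as Æ while leaving ordinary letters in numeric form.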

Using & for characters would also help reduce the over-saturation of usages of #. As an additional practical matter, you could specify characters as &{...} as well as &"...", which would help avoid escaping when putting characters inside quoted code, as in the API:

 REBVAL *ch = rebValue("second [10 &{b} 20]");

Today we can't use #{b} instead of #"b" because that gets interpreted as a binary. So you have to do this as the less appealing:

 REBVAL *ch = rebValue("second [10 #\"b\" 20]");

The same thing happens when trying to pass characters in double quotes inside --do code on the command line. I think the desire for this duality of forms applies to the other string-like things as well (e.g. FILE! should be able to be either %{...} or %"...").

We don't necessarily need to do it right now, but deprecating & in words ASAP helps clear the path to this or other applications.

Does anyone have particularly great arguments for why & should be allowed in WORD!?

Take it out. Take them all out. I have gotten burned a number of times by special characters in identifiers.

This is really important for a bunch of the things I'm trying to achieve. Working with XML in particular means dealing with a ton of these entities, and I would love to figure out a way to map these so that someone who's searching (in an automated way) for a piece of HTML stored in XML doesn't have to know the myriad character substitutions to do an effective/comprehensive and accurate search.

Can you give me code examples of what you mean as useful?

As I mentioned, I am thinking that if a character exists in the table, we would cache its table entry in the cell, so we could even quickly say:

 >> first "Æae"
 == &AElig

 >> second "Æae"
 == &aelig

This strikes me as appealing, and dovetails well with the web build. But what are you thinking exactly?

That looks great. I don't have code examples yet. But to give a more specific example, let's say that I have tens of thousands of text files. Some of these are XML files containing embedded HTML. This requires that the embedded HTML tags are carefully escaped. E.g.,

<?xml version="1.0" encoding="UTF-8"?><WYSIWYG>
<GenericHTML>
<Content>&lt;div class="tf module pt-20"&gt;
&lt;div class="content"&gt;
&lt;h1 class="tf content-title"&gt;What do I do if I&amp;rsquo;ve found the right event but the date is far off and I&amp;rsquo;m not sure if my plans will change?&lt;/h1&gt;
&lt;p class="normal pb-20"&gt;We understand a lot can change in a year or two. If you cancel your plans you&amp;rsquo;ll have your credit restored. &lt;br /&gt;&lt;br /&gt; &lt;strong&gt;Note:&lt;/strong&gt; Fees do not cover insurance.&lt;/p&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
</Content></GenericHTML></WYSIWYG>

A person wanting to perform an automated search through thousands of files like this to make updates/replacements to the content would have to search for both the unescaped and escaped forms of these characters and delimiters.

Well, we can't achieve DWIM (Do What I Mean). Are they escaped or not?

The proposal at hand would offer an easy way to speak in terms of the unescaped characters. It's very early in the thinking process to say mold first ">" returning &gt, but that is the kind of thing I'm saying is on the table.

(Really the post is about not making this decision, but restricting use of & so we can open these doors post Beta/One...)

So in that world, you could search for unspaced [mold ch ";"] and replace it, and search for ch and replace it. But again this is highly speculative.
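A rough sketch of that search-both-forms idea, again in Python with the standard `html.entities` table standing in for the proposed built-in one. The helper names `escaped_forms` and `replace_char` are hypothetical. The doubly escaped form is included on the assumption that the data looks like the XML sample above, where HTML entities inside the escaped markup show up as `&amp;rsquo;`.

```python
from html.entities import codepoint2name


def escaped_forms(ch):
    """List the spellings of ch worth searching for.

    Returns the raw character, then (when the table has a name) the entity
    form and the entity form with its '&' escaped again for XML embedding.
    """
    forms = [ch]
    name = codepoint2name.get(ord(ch))
    if name is not None:
        entity = "&" + name + ";"
        forms.append(entity)
        forms.append(entity.replace("&", "&amp;"))
    return forms


def replace_char(text, ch, replacement):
    """Replace every known spelling of ch in text with replacement."""
    # Longest spellings first, as a conservative ordering.
    for form in sorted(escaped_forms(ch), key=len, reverse=True):
        text = text.replace(form, replacement)
    return text
```

For the right single quote (U+2019) this covers the raw character, `&rsquo;` and `&amp;rsquo;` in one pass. It still does not guess whether a given file is escaped; it simply tries each spelling the table knows about.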

Yes, I think we agree. There is no DWIM here; somewhere, somehow, the characters need a map/lookup table. That map could certainly get unwieldy to manage manually when you consider all of the foreign character sets. So naturally I support this type of proposal.

Like swhite, I am OK with removing special characters from identifiers. a-zA-Z0-9_ and all kinds of ASCII and Unicode dashes suffice.

I am on the strippers team as well! :)