Idea: Agreed Upon Symbol Number for Extensions

A concept in the R3-Alpha codebase is that there are a certain number of built-in words...which come from a file called %words.r

https://github.com/rebol/rebol/blob/master/src/boot/words.r

This is done so you can switch on a numeric code for these words, and not bother with needing to do a string comparison in C. Some words (like PARSE keywords) are strategically chosen to be in a sequential range, to make testing for them faster.
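As a rough illustration of why a sequential range helps, here is a minimal sketch. The symbol names, their order, and the `Is_Parse_Keyword` helper are made up for the example and are not the actual numbering from %words.r; the point is only that membership in a contiguous block costs two integer comparisons, however many keywords the block holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical symbol numbering; the real values are generated from %words.r.
enum {
    SYM_0 = 0,
    SYM_SET,     // an ordinary word, outside the keyword block
    SYM_TO,      // first PARSE-style keyword in this sketch
    SYM_THRU,
    SYM_SOME,
    SYM_ANY,     // last PARSE-style keyword in this sketch
    SYM_APPEND   // another ordinary word
};

#define SYM_PARSE_FIRST SYM_TO
#define SYM_PARSE_LAST SYM_ANY

static_assert(SYM_PARSE_FIRST < SYM_PARSE_LAST, "keyword block must be a range");

// Because the keywords are numbered consecutively, testing for any of them
// is two comparisons rather than one comparison per keyword.
static bool Is_Parse_Keyword(uint16_t sym) {
    return sym >= SYM_PARSE_FIRST && sym <= SYM_PARSE_LAST;
}
```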

If you write an extension in C that works at the level of the internal API and want the performance of a native, you might want to refer to a word that's not in that list. You can get fairly close to that performance for a single test by caching a pointer to the canonized version of the word and comparing against that canon pointer. But it won't be quite as fast, and since the pointer isn't a compile-time constant...C can't use it as a case label in a switch statement.

To be more concrete, imagine you have some words not in %words.r like OVERLOAD, MULTIPLE, INHERITANCE. You couldn't write:

 switch (VAL_WORD_SYM(some_word)) {  // small 16-bit # can be cached in word
     case SYM_OVERLOAD: ...  // ...but these weren't in %words.r!
     case SYM_MULTIPLE: ...
     case SYM_INHERITANCE: ...
     default: ...
 }

Can't do that for those new terms. You'd have to do case-insensitive string comparisons, or something like this pseudocode:

 REBSTR *canon_overload;
 REBSTR *canon_multiple;
 REBSTR *canon_inheritance;

 void On_Module_Load() {
     canon_overload = Register_Word("overload");
     canon_multiple = Register_Word("multiple");
     canon_inheritance = Register_Word("inheritance");
 }

 void On_Module_Shutdown() {
     Unregister_Word(canon_overload);
     Unregister_Word(canon_multiple);
     Unregister_Word(canon_inheritance);
 }

So imagine this gives you word series pointers that are guarded from GC for as long as your module is loaded. Then you could say:

 REBSTR *canon = VAL_WORD_CANON(some_word);
 if (canon == canon_overload) { ... }
 else if (canon == canon_multiple) { ... }
 else if (canon == canon_inheritance) { ... }
 else { ... }

It's less elegant than the switch(), and since the values are runtime pointers rather than compile-time constants, the compiler can't optimize the dispatch as it can for a switch(), for instance by repeatedly bisecting the range of values...if you have N words, you will do up to N comparisons in the worst case.
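To make the canon-pointer comparison a bit more concrete, here is a toy intern table. It is only a sketch under loud assumptions: `Intern_Canon`, `Same_Spelling`, and the fixed-size array are invented for illustration, plain C strings stand in for REBSTR, and nothing here models GC guarding. It shows the one property the comparison relies on: every casing of a word maps to the same pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_CANONS 64

// Toy stand-in for the interpreter's canon table (not the real REBSTR).
static const char *canons[MAX_CANONS];
static size_t num_canons = 0;

// Case-insensitive equality, since Rebol words ignore case.
static int Same_Spelling(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;  // equal only if both strings ended together
}

// Hypothetical helper: return the single canonical pointer for a spelling,
// adding it on first sight. Returns NULL if the toy table is full.
// The caller must keep the string alive (string literals are fine).
static const char *Intern_Canon(const char *spelling) {
    assert(spelling != NULL);
    for (size_t i = 0; i < num_canons; ++i) {
        if (Same_Spelling(canons[i], spelling))
            return canons[i];
    }
    if (num_canons == MAX_CANONS)
        return NULL;
    canons[num_canons] = spelling;
    return canons[num_canons++];
}
```

The string comparison is paid once, when a spelling is interned. After that, checking a word against a cached canon is a single pointer comparison, though you still need one such comparison per candidate word.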

Weird idea: Agree on a list of words and numbers, commit it on the Internet

It would be pretty heinous to make a much bigger %words.r and ship it in every executable...inflating the size of Rebol to include a dictionary.

But there's a possibility that doesn't go that far yet still gets the benefit. Make the word list and commit it somewhere on the internet where developers can look it up. Give every common word a number. Then each extension ships with just the spellings and numbers it needs. All extensions agree to use the same numbers:

 #define SYM_OVERLOAD 15092
 #define SYM_MULTIPLE 32091
 #define SYM_INHERITANCE 63029

 void On_Module_Load() {
     Register_Word("overload", SYM_OVERLOAD);
     Register_Word("multiple", SYM_MULTIPLE);
     Register_Word("inheritance", SYM_INHERITANCE);
 }

 void On_Module_Shutdown() {
     Unregister_Word(SYM_OVERLOAD);
     Unregister_Word(SYM_MULTIPLE);
     Unregister_Word(SYM_INHERITANCE);
 }

Your switch() statements can work just fine, and you're only out of luck if you use a sequence of characters that wasn't committed to in the database. But the database can grow, so long as it grows centrally and not inconsistently. (In fact, it's probably better to do it that way, where extension authors ask for the words they want and get them approved before shipping the extension.)
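With the numbers fixed at compile time, the dispatch from the first example becomes legal C. The sketch below reuses the made-up numbers from above. `Describe_Sym` is a hypothetical function that takes the 16-bit symbol directly, rather than going through VAL_WORD_SYM(), so that it stands alone.

```c
#include <assert.h>
#include <stdint.h>

// Numbers as they might appear in the agreed-upon list (made up here).
#define SYM_OVERLOAD 15092
#define SYM_MULTIPLE 32091
#define SYM_INHERITANCE 63029

// The symbol trick only has 16 bits to work with, so check at compile time.
static_assert(SYM_INHERITANCE <= UINT16_MAX, "symbol must fit in 16 bits");

// Hypothetical dispatch: the case labels are compile-time constants, so the
// compiler is free to pick whatever dispatch strategy it likes.
static const char *Describe_Sym(uint16_t sym) {
    switch (sym) {
      case SYM_OVERLOAD: return "overload";
      case SYM_MULTIPLE: return "multiple";
      case SYM_INHERITANCE: return "inheritance";
      default: return "unknown";
    }
}
```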

The worst that can happen is that you load two extensions that disagree, and the interpreter refuses to load them. It could print out the disagreeing numbers, and you could consult the list on the internet to decide which extension was the culprit using the wrong number.
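The disagreement check could look roughly like the sketch below. Everything in it is an assumption for illustration: the `Register_Word_Checked` name, the `RegResult` codes, the fixed-size table, and the requirement that callers pass spellings that are already lowercased. A real implementation would hook into the interpreter's own symbol table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REGISTERED 256

typedef enum { REG_OK, REG_CONFLICT, REG_FULL } RegResult;

typedef struct {
    const char *spelling;  // assumed already lowercased by the caller
    uint16_t sym;          // the agreed-upon number
    unsigned refs;         // how many registrations share this entry
} RegEntry;

static RegEntry registry[MAX_REGISTERED];
static size_t registry_len = 0;

// Hypothetical: accept a (spelling, number) pair only if it agrees with
// every pair registered so far.
static RegResult Register_Word_Checked(const char *spelling, uint16_t sym) {
    assert(spelling != NULL);
    for (size_t i = 0; i < registry_len; ++i) {
        int same_word = strcmp(registry[i].spelling, spelling) == 0;
        int same_sym = registry[i].sym == sym;
        if (same_word && same_sym) {
            ++registry[i].refs;  // a second extension agrees; share the entry
            return REG_OK;
        }
        if (same_word || same_sym)
            return REG_CONFLICT;  // same word, other number (or vice versa)
    }
    if (registry_len == MAX_REGISTERED)
        return REG_FULL;
    registry[registry_len++] = (RegEntry){ spelling, sym, 1 };
    return REG_OK;
}
```

On REG_CONFLICT, the loader would have both spellings and both numbers in hand to print before refusing the load.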

It's a weird idea but kind of interesting, not so much because of the performance aspect as because it enables the C switch()es. Since there are only 16 bits of space in the word available for the symbol trick, the numbers are an exhaustible resource. But it's maybe still worth doing. This really isn't difficult, outside of the administrative headache of deciding the policy for giving out numbers.