Mutate strings aliased as BINARY!, and vice-versa, as in Rebol2!
Rebol2's AS-BINARY and AS-STRING provided a convenient aliasing between binary and string as Latin1 single-byte characters:
rebol2>> b: as-binary s: "hello"
== #{68656C6C6F}
rebol2>> append b #{68}
== #{68656C6C6F68}
rebol2>> s
== "helloh" ; binary mutation reflected in original string
rebol2>> append s "ello"
== "hellohello"
rebol2>> b
== #{68656C6C6F68656C6C6F}
That was lost when R3-Alpha's internal string format became too unpredictable (swinging between Latin1 and UCS2) and was only canonized as UTF-8 for I/O. Red suffered a similar fate.
But with UTF-8 Everywhere as the fixed internal format of strings, Ren-C has done some voodoo to bring it back.
It offers a more generic AS operation, along with higher-than-UCS2 codepoint support:
>> b: as binary! s: "hello"
== #{68656C6C6F}
>> to binary! "🐱"
== #{F09F90B1}
>> append b #{F09F90B1} ; add that high-codepoint cat!
== #{68656C6C6FF09F90B1}
>> s
== "hello🐱"
>> append s "hello🐱"
== "hello🐱hello🐱"
>> b
== #{68656C6C6FF09F90B168656C6C6FF09F90B1}
But a binary alias of a string is constrained to remain valid UTF-8:
>> append b #{FEFEFEFE}
** Internal Error: invalid UTF-8 byte sequence found during decoding
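As a rough illustration of that constraint, here is a small Python model of one byte buffer seen both as text and as bytes. This is only a sketch: the class `SharedUtf8Buffer` and its method names are hypothetical and say nothing about how Ren-C implements AS internally. The one assumption it encodes is that a byte-level append is rejected unless the buffer stays valid UTF-8.

```python
class SharedUtf8Buffer:
    """Hypothetical model: one UTF-8 byte buffer with a text view and a byte view."""

    def __init__(self, text: str = "") -> None:
        self._data = bytearray(text.encode("utf-8"))

    def as_text(self) -> str:
        # The text view decodes the shared bytes.
        return self._data.decode("utf-8")

    def as_bytes(self) -> bytes:
        # The byte view exposes the same storage (copied here for safety).
        return bytes(self._data)

    def append_text(self, text: str) -> None:
        # Text appended through the string view is always valid UTF-8.
        self._data += text.encode("utf-8")

    def append_bytes(self, raw: bytes) -> None:
        # The buffer is already valid, so it stays valid exactly when the
        # appended bytes decode on their own. Raises UnicodeDecodeError if not.
        raw.decode("utf-8")
        self._data += raw


buf = SharedUtf8Buffer("hello")
buf.append_bytes(bytes.fromhex("F09F90B1"))   # the cat, added through the byte view
buf.append_text("hello🐱")                     # more text, added through the text view
```

Calling `buf.append_bytes(bytes.fromhex("FEFEFEFE"))` raises `UnicodeDecodeError` in this model and leaves the buffer untouched, which loosely mirrors the error shown above.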
You can actually alias WORD! as BINARY! also, without doing a separate allocation. But it will be a read-only view, so all you're doing is saving on memory and GC load:
>> b: as binary! 'immutable-word
== #{696D6D757461626C652D776F7264}
>> append b #{1020}
** Access Error: series is source or permanently locked, can't modify
Similarly, you can alias words as strings...again without making a new allocation, but with the same read-only constraint:
>> t: as tag! 'append
== <append>
>> append t "nope"
** Access Error: series is source or permanently locked, can't modify
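For readers who think in other languages, a loose analogy to these read-only aliases is Python's `memoryview` over an immutable `bytes` object: it exposes the same memory without copying, and writes through it are refused. This is only an analogy, not a description of Ren-C's mechanism, and the variable names are illustrative.

```python
word_bytes = b"immutable-word"      # immutable storage, standing in for the word's spelling
view = memoryview(word_bytes)       # a view of the same memory, with no new allocation

def try_write(target: memoryview) -> str:
    """Attempt to modify the view and report what happened."""
    try:
        target[0] = 0x49            # try to overwrite the first byte
    except TypeError as error:      # read-only views reject writes
        return f"refused: {error}"
    return "modified"

result = try_write(view)
```

Here `view.readonly` is `True` and `result` starts with `"refused"`, while reading through the view works as usual.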
The /PART refinement has just been implemented for UTF-8
The behavior is controversial, and can be discussed on issue #2096. But what R3-Alpha and Red chose to (buggily) implement is that /PART applies to the target series only...and is thus measured in the units of that series:
>> append/part "abc" [100 "de" "fg"] 2
== "abc10" ; 2 string units, not "abc100de" from 2 block units
The argument is that COPY/PART on the source series gives you that form of /PART if you need it, so this is "strictly more powerful". Rightly or wrongly... Ren-C is now doing it hopefully less buggily (though almost certainly with its own bugs), but with UTF-8 Everywhere support.
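To make the two interpretations concrete, here is a Python sketch of both. The function names are hypothetical, and `str()` stands in for however the values get formed into text; the point is only the difference in which units the limit counts.

```python
def append_part_target_units(target: str, items: list, limit: int) -> str:
    """Limit counted in units of the target (characters of the string)."""
    formed = "".join(str(item) for item in items)
    return target + formed[:limit]

def append_part_source_units(target: str, items: list, limit: int) -> str:
    """Limit counted in units of the source (items of the block)."""
    return target + "".join(str(item) for item in items[:limit])

by_target = append_part_target_units("abc", [100, "de", "fg"], 2)   # "abc10"
by_source = append_part_source_units("abc", [100, "de", "fg"], 2)   # "abc100de"
```

The second form is what you would get by limiting the source first (the COPY/PART route), which is why the target-units reading is argued to add capability rather than remove it.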
If you like, you can limit how much of a binary you extract from UTF-8, counted in bytes:
>> to binary! "🐱"
== #{F09F90B1}
>> append/part #{} "🐱" 2 ; e.g. 2 bytes (half a cat)
== #{F09F}
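The same idea in Python terms, as a sketch with a hypothetical helper name: taking a byte-counted part of encoded text always succeeds, even when the cut lands in the middle of a character.

```python
def append_part_bytes(target: bytes, text: str, limit: int) -> bytes:
    """Append at most `limit` bytes of the UTF-8 encoding of `text`."""
    return target + text.encode("utf-8")[:limit]

half_cat = append_part_bytes(b"", "🐱", 2)   # first 2 of the cat's 4 bytes
```

Here `half_cat` is `b"\xf0\x9f"`, which is fine as raw bytes but is not valid UTF-8 on its own.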
Extracting bytes from UTF-8 will always work. Going the other way, not all binaries are valid UTF-8. But as long as the characters you ask for come from a valid portion of the binary, invalid bytes elsewhere aren't a problem...it's only an error when the part you ask for reaches into the invalid bytes:
>> append/part "" #{F09F90B1F09F90B1FEFEFEFE} 2 ; e.g. 2 characters
== "🐱🐱"
>> append/part "" #{F09F90B1F09F90B1FEFEFEFE} 3
** Internal Error: invalid UTF-8 byte sequence found during decoding
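One way to picture this "validate only what you take" behavior is the Python sketch below. It is an assumption-laden model, not Ren-C's code: the name `take_chars` is hypothetical, and stopping quietly when the binary runs out before the count is reached is a choice made here for simplicity.

```python
def take_chars(data: bytes, count: int) -> str:
    """Decode up to `count` characters from the front of `data`.

    Only the bytes actually consumed are validated; anything after
    them is never looked at.
    """
    chars = []
    pos = 0
    for _ in range(count):
        if pos >= len(data):
            break                       # ran out of bytes (assumed: not an error)
        lead = data[pos]
        if lead < 0x80:
            size = 1                    # ASCII
        elif 0xC2 <= lead <= 0xDF:
            size = 2
        elif 0xE0 <= lead <= 0xEF:
            size = 3
        elif 0xF0 <= lead <= 0xF4:
            size = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {pos}")
        # Strict decode of just this sequence; raises UnicodeDecodeError
        # (a ValueError subclass) on bad or truncated continuation bytes.
        chars.append(data[pos:pos + size].decode("utf-8"))
        pos += size
    return "".join(chars)


mixed = bytes.fromhex("F09F90B1F09F90B1FEFEFEFE")
two_cats = take_chars(mixed, 2)         # succeeds: never reaches the FE bytes
```

Asking for a third character with `take_chars(mixed, 3)` raises `ValueError` in this model, because the scan then reaches the `FE` bytes.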
If a binary is actually an alias of a UTF-8 string, this can be more efficient by skipping the rescan... (though the code is still young, so it has a number of areas for improvement).
Pretty cool, eh?