"Finding the Invariant" - Case Study: TO

hostilefork · October 8, 2020, 3:42pm

When I gave this example, I was contrasting TEXT! and CHAR! conversions TO INTEGER!.

I should have pointed out that ISSUE! is a more obvious dissonance:

rebol2>> to integer! #1
== 1

rebol2>> to integer! #"1"
== 49

That just looks wrong as-is. But when #1 and #"1" are synonyms as immutable "issuechar!"/TOKEN! (as by all appearances they should be), there can't be a distinction.

It may be that some other operator, like AS INTEGER!, is the right fit for getting the codepoint. But I'd suggest CODEPOINT OF as a possibility: it is only one character longer than TO INTEGER! or AS INTEGER! (and doesn't require hitting <shift> to get an !), and it tells you exactly what's going on. It seems like a good choice.

So as part of the issuechar! transition, TO INTEGER! of an issuechar! can be temporarily disabled, with a message advising you to use either CODEPOINT OF TOKEN or TO INTEGER! AS TEXT! TOKEN. I'm proposing that for the empty issuechar!, CODEPOINT OF # will be 0, so that codepoint 0 never actually has to be stored in any string. When the transition period is over, you can change TO INTEGER! AS TEXT! TOKEN to just TO INTEGER! TOKEN.
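To make the two spellings concrete, here is a rough model in Python rather than Rebol, since CODEPOINT OF and the merged token type are proposals and not something I can run as written. The function names `codepoint_of` and `to_integer_as_text` are hypothetical stand-ins, and a token is modeled as a plain Python string. Raising an error for tokens longer than one character is an assumption; the post does not specify that case.

```python
def codepoint_of(token: str) -> int:
    """Model of the proposed CODEPOINT OF for a token of at most one character."""
    if token == "":
        return 0  # proposed in the post: the empty token `#` reports codepoint 0
    if len(token) != 1:
        # Assumption: the post does not say what happens for longer tokens.
        raise ValueError("codepoint_of expects a single-character token")
    return ord(token)


def to_integer_as_text(token: str) -> int:
    """Model of TO INTEGER! AS TEXT! TOKEN: read the characters as a decimal number."""
    return int(token)


one = "1"  # stands in for both #1 and #"1", which are one value under the proposal
print(codepoint_of(one))         # 49
print(to_integer_as_text(one))   # 1
print(to_integer_as_text("10"))  # 10
print(codepoint_of(""))          # 0
```

The point of the model is that once there is only one value, the caller has to say which of the two meanings is wanted, because the value itself can no longer carry that distinction.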

How Does This Inform the TO/MAKE Matrix?

In "Hacking Away on the TO/MAKE Matrix", I'd previously said:

1. A TO conversion won't run arbitrary code that you pass to it, or possibly A TO conversion won't even GET any variables, much less evaluate

2. Every TO conversion targeting a series type performs a new allocation

3. TO TEXT! 10 is "10" and TO INTEGER! "10" is 10

4. A TO conversion of a value to its own datatype will do the same thing as COPY

So here we have another stake in the ground, for TO INTEGER! of #1 and #10.

This emboldened me a bit to try changing the function signatures of TO conversions to not be passed bindings. That would make it mechanically impossible for a TO conversion to do anything resembling a REDUCE or GET to dereference variables on what is passed into it.

Then I hit rule #4, and there's some friction. If you TO BLOCK! a block, you historically have not lost the bindings in that block in the process.

But maybe you should (?)

The historical behavior of TO BLOCK! of TEXT! is to load it, but what you get isn't bound:

rebol2>> blk: to block! "print {Hello}"
== [print "Hello"]

rebol2>> do blk
** Script Error: print word has no context

So maybe TO BLOCK! of another block gives you an unbound block. A problem with that is that if it's giving you a shallow copy, then the blocks underneath it would still be bound. I'm thinking about a feature where there is a way of viewing cells--kind of like CONST--in which you can carry a reference to a block that has bindings but can't use them. So it wouldn't necessarily require a deep copy to get this.
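The shallow-copy problem can be sketched with a toy model. This is Python, not Rebol, and it simplifies heavily: the `Block` class, its single `binding` field, and both copy functions are hypothetical, and real binding is more fine-grained than one label per block. It only shows why dropping the binding at the top level does not reach the nested blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Toy block: a list of items plus one binding label (None means unbound)."""
    items: list
    binding: Optional[str] = None


def to_block_shallow(source: Block) -> Block:
    """Copy only the top level and drop its binding; nested blocks are shared."""
    return Block(items=list(source.items), binding=None)


def to_block_deep(source: Block) -> Block:
    """Copy every level and drop every binding; nothing is shared with the source."""
    copied = [to_block_deep(i) if isinstance(i, Block) else i for i in source.items]
    return Block(items=copied, binding=None)


inner = Block(["print", "x"], binding="user")
outer = Block(["if", "true", inner], binding="user")

shallow = to_block_shallow(outer)
deep = to_block_deep(outer)

print(shallow.binding)            # None
print(shallow.items[2].binding)   # user (the nested block is still bound)
print(deep.items[2].binding)      # None
```

The "view" idea would be a third option in this model: keep sharing `inner`, but have the reference to it carry a flag saying its bindings may not be used, which avoids the cost of the deep copy.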

But there's a hint here of what the tightening rules suggest. There may well be a difference between TO GROUP! BLOCK and COPY AS GROUP! BLOCK, because the former may give you something unbound, while the latter preserves the binding.

It would actually be good if bindings were erased whenever they could be. Stray bindings not only leak references into code that shouldn't have them, but also mean that anything they point to still appears live to the garbage collector. Inert code meant to be used purely symbolically can wind up costing as much as everything its unused bindings point to. So if you can use a form of copying that strips bindings, it would be ideal if you did so.
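The garbage collection cost can be shown with another small Python model. Again this is not Rebol: `Context`, `Word`, and `copy_unbound` are hypothetical names, and the liveness check relies on Python's `weakref` module, so it illustrates the general mechanism and not the actual Ren-C collector.

```python
import gc
import weakref


class Context:
    """Hypothetical stand-in for a context that holds variables."""
    def __init__(self, name):
        self.name = name


class Word:
    """Hypothetical stand-in for a word cell; `binding` is None when unbound."""
    def __init__(self, spelling, binding=None):
        self.spelling = spelling
        self.binding = binding


def copy_unbound(words):
    """Copy the words but strip their bindings."""
    return [Word(w.spelling) for w in words]


ctx = Context("big-module")
probe = weakref.ref(ctx)  # lets us observe whether the context is still alive

bound_code = [Word("print", ctx), Word("x", ctx)]
del ctx
gc.collect()
alive_while_bound = probe() is not None  # the bindings still reference the context

symbolic_code = copy_unbound(bound_code)
del bound_code
gc.collect()
alive_after_strip = probe() is not None  # nothing references the context any more

print(alive_while_bound, alive_after_strip)
```

Here the bound copy keeps the whole context alive even though nothing ever looks a variable up through it, while the stripped copy keeps only the spellings.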