Hacking away on the TO and MAKE Matrix

hostilefork · May 29, 2018, 2:24am

The expanded behavior of AS is unambiguous and reassuring. All you have to remember is that no new series is allocated. What you'll get is what you had:

>> first <abc>
== #"a"

>> t: as text! <abc>
== "abc"

>> first t
== #"a" ;-- same answer, unsurprisingly, since nothing was allocated

AS BINARY! of an ANY-STRING! will be a little weird since you'll get a byte position in UTF-8, so the index may be different than you had...and AS to ANY-STRING! of a BINARY! will not work on non-UTF-8 binaries. Different, but understandable.
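
To make the index shift concrete, here is a rough sketch in C++ (used because this thread already leans on C++ analogies; none of it is Ren-C's actual implementation). The helper name byte_offset_of_codepoint is made up for illustration, indices are zero-based unlike Rebol's one-based positions, and the input is assumed to be valid UTF-8:

```cpp
#include <cstddef>
#include <string>

// True if the byte starts a UTF-8 sequence (is not a 10xxxxxx continuation byte).
inline bool is_utf8_lead(unsigned char b) { return (b & 0xC0) != 0x80; }

// Byte offset of the codepoint at zero-based index `cp_index`.
// Assumes valid UTF-8; returns utf8.size() if the index is at or past the end.
std::size_t byte_offset_of_codepoint(const std::string& utf8, std::size_t cp_index) {
    std::size_t seen = 0;  // number of codepoints started so far
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (is_utf8_lead(static_cast<unsigned char>(utf8[i]))) {
            if (seen == cp_index) return i;
            ++seen;
        }
    }
    return utf8.size();
}
```

For the text "héllo", the codepoint at index 2 (the first l) sits at byte offset 3, because é occupies two bytes in UTF-8. That is the kind of position change an AS BINARY! alias would expose.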

Since AS never allocates new memory, you have to copy if you need to.

as binary! copy x ;-- doesn't matter which way you write it...
copy as binary! x ;-- ...but I like this a little more

(Note: if AS were infix then copy x as binary! would act as ((copy x) as binary!), which seems like a non-obvious evaluation order.)
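
A loose C++ analogy for the AS versus COPY distinction, not a description of how Ren-C implements it: std::string_view aliases existing bytes without allocating, while constructing a std::string gives you separate storage. The function names as_text and copy_text are hypothetical:

```cpp
#include <string>
#include <string_view>

// Like AS: reinterpret existing data without allocating; the result aliases `source`.
std::string_view as_text(const std::string& source) {
    return std::string_view(source);
}

// Like COPY: new storage holding the same characters, independent of the original.
std::string copy_text(std::string_view source) {
    return std::string(source);
}
```

A change made to the original shows through the alias but not through the copy, which is the reason to write copy as binary! x when you need independent data.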

It makes little sense to have MAKE or TO just act like COPY AS. Rather, having AS pinned down opens up possibilities for reshaping those operators, which have been historically "unkempt" in Rebol2/R3-Alpha/Red.

TO TEXT! of ANY-VALUE! and TO ANY-VALUE! of TEXT!

to text! 10 => "10" and to integer! "10" => 10 seem like two obvious points that TO conversion needs to pass through.

If we try to naively generalize that, it looks like using TO to convert from text to an arbitrary ANY-VALUE! is acting as something like LOAD. Then trying to convert TO TEXT! is either doing a MOLD or FORM of what we give it. But...which is it?

1. When you APPEND something that is not a string to a string, an automatic conversion has historically been done. There might seem to be a fairly strong argument that if such an operation is to be legal, then the refinement-less and parameter-less TO should share behavior, e.g. append string non-string should give the same behavior as some variant of append string TO any-string! non-string. That intent has usually been more FORM-like.
2. On the other hand, TO doesn't have refinements. If someone has a wide range of potential formatting options, they'll need something else. One thing about MOLD is that since it is a Rebol format with the obligation to be re-loadable back, it makes a pretty good fit. And MOLD is a terrible name. So if there are no options, and if the reverse is choosing to act very "LOAD-like", there seems to be a good case for acting like MOLD.

Wanting to see the MOLD name die out--along with the lack of formatting options for TO (which MOLD "doesn't need")--makes me more moved by the second argument. Sounds good, but... let's keep delving. :-/

Automatically adding quotes on TEXT! has problems

Going with option 2 creates some friction with option 1. On the surface, the following might seem more or less reasonable:

 >> txt: copy "abc"
 >> append txt [def: <ghi>]
 == {abcdef:<ghi>}

But is the following "reasonable"?

 >> txt: copy "abc"
 >> append txt ["def" "ghi"]
 == {abc["def" "ghi"]}

It's probably not that useful. If there were a special rule for BLOCK!s besides the "moldy" TO TEXT! of it, then maybe it wouldn't try to TO TEXT! the block, but each individual item...even still, that's a bit of an unlikely intent:

 >> txt: copy "abc"
 >> append txt ["def" "ghi"]
 == {abc"def""ghi"}

If it were deemed that no TO conversion would be done if the types matched, this example might clean up but others would vary weirdly:

 >> tag: copy <abc>
 >> append tag ["def" "ghi"]
 == <abc"def""ghi">

 >> tag: copy <abc>
 >> append tag "def"
 == <abc"def">

Basically: though WORD! is the "string type" with no delimiters in its representation, one tends to expect that TEXT! (a.k.a. STRING!) won't be systemically splicing its delimiters around. There are fewer preconceptions about what other types do, which is why R3-Alpha changed some of it...e.g. such that TO STRING! of a SET-WORD! would have a colon in the generated data.

Also, a paradox...

Earlier it was suggested that TO ANY-VALUE! of a text would be LOAD-like. That's because TO INTEGER! "10" is 10. But what about to text! "{abc}"? It can't be both LOAD and MOLD (nor FORM) at the same time! When it reaches this point...should it be "removing the delimiters" in the series it makes, or adding more?

Or what if it would just COPY it?

Should TO SOME-TYPE if already SOME-TYPE just COPY?

While AS never performs series allocations, it seems reasonable if TO always does a new allocation when the target is a series.

Given that seems fair enough, it also seems reasonable to pin down that the system itself enforces that a TO conversion of a type to itself does the same thing as a copy. C++ casting works this way; you can't write an overloaded cast operator for your data type that casts to its own type. (Well, you can write one, but clang will warn you it will never get used--gcc does not currently do so.)
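
The C++ rule referred to here can be seen in a small sketch. Widget, to_widget and conversion_calls are hypothetical names; the point is only that the language ignores a conversion function to the class's own type and uses the copy constructor instead (a compiler may emit a warning on the declaration):

```cpp
// Counts how often the conversion function below actually runs.
int conversion_calls = 0;

struct Widget {
    int value = 0;

    // Legal to declare, but the language never selects it for a
    // Widget-to-Widget conversion; the copy constructor is used instead.
    operator Widget() const {
        ++conversion_calls;
        Widget altered;
        altered.value = value + 1000;
        return altered;
    }
};

// "Casting" a Widget to Widget is just a copy.
Widget to_widget(const Widget& w) {
    return static_cast<Widget>(w);
}
```

After to_widget runs, the result carries the original value and conversion_calls is still zero, which is the same shape as "TO of a type to itself is COPY".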

Making a TO of a type to itself copy itself feels intuitively correct, to the point one can even imagine defining COPY that way.

Should appending blocks to strings fail?

In the argument for explicit over implicit, one has to wonder just how realistic it is to be "automatically" making good decisions about converting things. Do you evaluate or not? Do you space or not? Do you put in delimiters or not? Are operations like APPEND and INSERT doing too much guesswork?

At minimum, what I mention above is that a TO without any parameters should probably be connected somehow to what the system does automatically. If we ask what happens when you do append copy [a b] "c", it is pleasing if we can say something like: the types do not match, a TEXT! is not a BLOCK!, so it is first converted to a BLOCK!, and the result acts the same as append copy [a b] ["c"].

(Maybe that's not a useful invariant, I'm just re-iterating that they can be comforting, and we should be looking for them--at least the invariants that pass through specific examples you want to work.)

I feel like that invariant has to be articulated. And it seems like it prevents TO TEXT! from being a general-purpose replacement for MOLD.

Conclusions?

This is really just a bit of a brainstorm as TO gets remapped. It's always been a problem area, and I note people in Red are bringing it up again and looking for wikis to talk about it.

Pushing against the boundaries I'm starting to feel the TO of a type to itself is a synonym for COPY. But...deep copy? Shallow copy? There's always been this question of if TO BLOCK! of something would wrap an existing block or just "blockify" things that weren't blocks.

I'm curious if anyone wants to put any pins in the map. What TO conversion has to work to make "TO" a reasonable/good operation? What's a dealbreaker if it doesn't?

How will this generalize to user defined types? One thing I thought is that a TO conversion is a method of the source type parameterized with the target type (like "casting" overloads in C++), while a MAKE construction is a business of the target type (like a "constructor" in C++). That is to say if you do TO INTEGER! on a value of type FOO!, it's FOO! making the decision of how to do it...not INTEGER!... and if you say MAKE FOO! with some spec (perhaps an integer), that's code that FOO! provides.
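
The C++ analogy can be sketched as follows. Foo is a hypothetical stand-in for a user-defined FOO! type, and this only illustrates who owns which decision, not how Ren-C would actually dispatch:

```cpp
// Hypothetical user-defined type standing in for FOO!.
struct Foo {
    int n;

    // "MAKE FOO! spec": the target type (Foo) owns construction from a spec.
    explicit Foo(int spec) : n(spec) {}

    // "TO INTEGER! foo": the source type (Foo) decides how it converts.
    // The built-in int type knows nothing about Foo.
    explicit operator int() const { return n; }
};

// Spelled out as functions to mirror the two Rebol operations.
Foo make_foo(int spec) { return Foo(spec); }
int to_integer(const Foo& foo) { return static_cast<int>(foo); }
```

Both pieces of code live in Foo, but they answer different questions: the constructor says what a Foo can be built from, and the conversion operator says what a Foo can turn into.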

hostilefork · May 29, 2018, 9:33pm

I'm reminded of another thing I thought about which might help distinguish a TO from a MAKE. That could be that you are assured that TO won't evaluate anything you pass in. So if you said x: does [print "hi" 10] | to text! [x + 20], whatever did happen, it would not print out "hi" and come back as the text string "30".

That would limit what to text! of a BLOCK! would be allowed to do, e.g. to things like MOLD or FORM (but not "REMOLD" <gack!> or "REFORM" <gaaack!>)

As far as I know, there aren't any TOs today that break this rule. e.g. TO STRING! doesn't evaluate:

>> x: 10
>> to string! [x + 20]
== "x+20"

And even though some TOs and MAKEs fall through to each other willy-nilly, TO OBJECT! of a BLOCK! isn't one of the things that acts like the evaluating MAKE OBJECT! of a BLOCK!:

>> to object! [x: 10 + 20]
** Script Error: Cannot use to on object! value

Not being able to evaluate might help narrow down what a TO conversion still could do that's useful. For instance, to object! [...] might be like R3-Alpha's CONSTRUCT, used in loading script headers without risking doing any evaluation.

It also might suggest that so-called "construction syntax" should be sharing code with TO... not with MAKE. Consider what you get in Rebol2/R3-Alpha/Red today:

>> obj: make object! [x: quote 'foo]
== make object! [
    x: 'foo
]

>> type? obj/x
== lit-word!

If you actually ran the make object! that came back in the "molding" of the first result, you wouldn't get the same data pattern...because make object! [x: 'foo] would evaluate to putting a WORD! in x. to object! [x: 'foo] could work "as-is", and perhaps to object! [x: y:] could put the SET-WORD! of y: into the x key.

Really just a continuation of the brainstorming. Summary so far of potentially known things:

1. A TO conversion won't run arbitrary code that you pass to it (or possibly: a TO conversion won't even GET any variables, much less evaluate)
2. Every TO conversion targeting a series type performs a new allocation
3. TO TEXT! 10 is "10" and TO INTEGER! "10" is 10
4. A TO conversion of a value to its own datatype will do the same thing as COPY

One sort of sad-seeming part of accepting this so far is #4, which means TO TEXT! of something that is already a TEXT! would not be a mold, since COPY would not add delimiters. That sways the purpose of TO toward being the mechanic the system uses for any "automatic" conversion.

I mentioned a potential invariant that APPEND of two non-matching types should be equivalent to converting and then appending. This would mean that TO BLOCK! of any atomic item would wrap it in a single element block, while TO BLOCK! of something that was already a block would just copy it and not add another outer layer. This would be different from history, which is:

 rebol2/r3-alpha/red> to block! "foo 10"
 == [foo 10] ;-- acts as load

 rebol2> to block! #foo
 == [foo] ;-- hm, loading again?

 rebol2> to block! <foo 10>
 == [foo 10] ;-- yup, that was a good guess

 red/r3-alpha> to block! #foo
 == [#foo]

 red/r3-alpha> to block! <foo 10>
 == [<foo 10>] ;-- no LOAD here

 red/r3-alpha> to block! "foo 10"
 == [foo 10] ;-- okay, only loads if STRING!

This makes TO BLOCK! not too useful for one of its imagined purposes...to get a block out of something if it wasn't one already (e.g. for enumeration). So I'd suggest the TO BLOCK! of a TEXT! fall in line with putting the single element in a block. This gets that invariant that append [a b] "c" can act compatibly with append [a b] to block! "c".

Scanning strings to blocks would then be done with another operation.

hostilefork · September 20, 2020, 8:53pm

I came up with one new rule that sounds good on paper:

to-block: func [x] [
    collect [for-each item x [keep item]]  ;-- keep each enumerated item, not the whole input
]

The suggestion is that this behavior should be the same as TO BLOCK!. If it doesn't work, then the conversion should error.

One notable historical behavior would go away:

>> to block! "ab cd"
== [ab cd]

This would now be done with TRANSCODE:

>> transcode "ab cd"
== [ab cd]

>> to block! "ab cd"
== [#"a" #"b" #" " #"c" #"d"]

I'm thus proposing there be no TO BLOCK! conversion on things like DATE!, vs. returning a lame answer like just putting the date in a block.
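
As a rough model of the difference, here is a C++ sketch. to_block and transcode are hypothetical stand-ins for the Rebol operations, and this transcode merely splits on whitespace into strings instead of producing typed values:

```cpp
#include <sstream>
#include <string>
#include <vector>

// TO BLOCK! under the proposed rule: one element per item enumerated from the source.
// It only compiles for types that can be iterated, which loosely mirrors
// "if FOR-EACH doesn't work, the conversion errors".
template <typename Range>
auto to_block(const Range& source) {
    using Item = typename Range::value_type;
    return std::vector<Item>(source.begin(), source.end());
}

// Stand-in for TRANSCODE: scan the text into whitespace-separated tokens.
std::vector<std::string> transcode(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}
```

Given the text "ab cd", to_block yields the five characters including the space, while transcode yields the two tokens ab and cd. A type with nothing to enumerate (a date-like struct, say) simply has no to_block, instead of getting wrapped in a one-element container.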

These rules don't come easy :-/ but the hope is that if enough of them are in place, then decisions should flow easily and people can take some invariants for granted.