Behavior of TO STRING!, AS STRING!, MOLD

hostilefork · May 18, 2018, 2:15am

One big unanswered question has been how things should convert to strings. I think that SPACED has come along as a pretty good name for what was "reform", and as a specialization of DELIMIT, which also has UNSPACED for much of what was REJOIN, that territory is pretty well covered these days. They handle nulls and blanks nicely and are good words for what they do.

But though I haven't said it in a while, I used to mention a few times a week that I never liked the term MOLD. (Supposedly it originated from when Rebol was going to be called "Clay" and so you would "mold" your output, but it's an eyesore and makes people think they're looking at moldy old code.)

And TO STRING! historically has been pretty bananas:

In Rebol2 and Red:

>> to string! quote foo:
== "foo"

>> to string! #foo
== "foo"

>> to string! <foo>
== "foo" ;-- well, at least it's consistent...

>> to string! [foo: #foo <foo>]
== "foofoo<foo>" ;-- oh, no, of course it isn't

In R3-Alpha:

>> to string! quote foo:
== "foo:"

>> to string! #foo
== "#foo"

>> to string! <foo>
== "foo" ;-- seems to break the pattern?

>> to string! [foo: #foo <foo>]
== "foo:#foo<foo>" ;-- errr...

With the advent of ANY-STRING! unification I believe we have an option which may cut through yet another design epic. Here it is:

Proposal: TO TEXT! for mold(-ish), AS TEXT! for (copyable) spelling

Ren-C has AS. It aliases types without making a copy. For instance:

>> block: copy [1 2 3]
== [1 2 3]

>> group: as group! block
== (1 2 3)

>> append group [4 5 6]
== (1 2 3 4 5 6)

>> block
== [1 2 3 4 5 6]

It is related to the function of AS-STRING and AS-BINARY in Rebol2, which could be used to view a byte sequence through the lens of being a STRING! or being a BINARY!, depending on which you needed. When the internal string format got complicated with Unicode, there could no longer be a guarantee about the underlying byte format, so it was removed.

But Ren-C is going to let you AS any string types between each other (including words). It will even be bringing back AS BINARY! on ANY-STRING!. (In the UTF-8 Everywhere branch, the idea is to keep aliased series marked as having to stay valid as UTF-8. You'd get errors if you tried to put a bad byte sequence into a BINARY! which was elsewhere aliased as UTF-8.)

So if all ANY-STRING! types can be aliased between each other, including words, why not use that for getting the spelling? It might be read-only, but you can copy it if you need to.

>> as text! quote foo:
== "foo"

>> as text! #foo
== "foo"

>> as text! <foo>
== "foo"

You can switch any string type over to any other:

>> as tag! #foo
== <foo>

It lets you make the copying decision--often you probably don't need it. And it's really only two more characters than today's copying "spelling of" operation, on the off chance you need to copy:

copy as text! x
spelling-of x
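
To make the alias-versus-copy distinction concrete, here is a minimal sketch in Python rather than Ren-C. It is only a toy model: the `Series` class and the `alias_as` and `copy_series` helpers are hypothetical names invented for illustration, and it does not model the possibility that an alias is read-only.

```python
# Toy model only: a Python stand-in for series aliasing, not real Ren-C code.
class Series:
    """A typed view over a character buffer that may be shared with other views."""

    def __init__(self, kind, buffer):
        self.kind = kind      # e.g. "text", "tag", "issue"
        self.buffer = buffer  # a list of characters, possibly shared

    def spelling(self):
        return "".join(self.buffer)


def alias_as(kind, series):
    """Model of AS: a view of a different type over the SAME buffer."""
    return Series(kind, series.buffer)


def copy_series(series):
    """Model of COPY: same type, but a fresh buffer."""
    return Series(series.kind, list(series.buffer))


tag = Series("tag", list("foo"))
alias = alias_as("text", tag)                     # like: as text! <foo>
independent = copy_series(alias_as("text", tag))  # like: copy as text! <foo>

tag.buffer.extend("bar")  # mutate through the original view

print(alias.spelling())        # foobar (the alias sees the change)
print(independent.spelling())  # foo (the copy does not)
```

The only point of the sketch is that the decision to copy is left to the caller, which is what `copy as text! x` spells out.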

So with TO freed up, the concept is that the TO-TEXT conversions can act MOLD-like, giving you whatever delimiters you "expect" to see:

>> to text! quote #foo
== "#foo"

>> to text! quote <foo>
== "<foo>"

Since TO makes copies, it's a good fit. Those delimiters/sigils need to actually be in there, visited by series traversals.

It may not be exactly MOLD: TO TEXT! of a CHAR!, for instance, might not want to put the delimiters on. Maybe feeding it a BLOCK! would even be invalid, just because there are so many potential ways to make strings out of blocks (evaluative, unevaluative, spaced, unspaced, with brackets, without brackets...)

But it would be moldish-enough that you would get the delimiters on your strings:

>> to text! "foo"
== {"foo"}

I guess the same would be true for the other ANY-STRING!s.

>> to tag! #foo
== <#foo>

>> to issue! <foo>
== #<foo> ;-- or however this construction-syntax-es
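
As a rough way to see the two rules side by side, here is a small Python sketch (a toy model, not Ren-C). The `DELIMITERS` table and the `render`, `as_spelling` and `to_spelling` functions are hypothetical names, and only the three types used in the examples above are covered.

```python
# Toy model only: delimiter choices mirror the examples in this post.
DELIMITERS = {
    "text": ('"', '"'),
    "tag": ("<", ">"),
    "issue": ("#", ""),
}


def render(kind, spelling):
    """How a value of the given type would display, delimiters included."""
    opener, closer = DELIMITERS[kind]
    return opener + spelling + closer


def as_spelling(source_kind, spelling):
    """Model of AS: the spelling carries over unchanged (aliasing not modeled)."""
    return spelling


def to_spelling(source_kind, spelling):
    """Model of TO: the source's delimiters become real characters in the result."""
    return render(source_kind, spelling)


print(render("tag", as_spelling("issue", "foo")))  # <foo>   like: as tag! #foo
print(render("tag", to_spelling("issue", "foo")))  # <#foo>  like: to tag! #foo
print(len(to_spelling("tag", "foo")))              # 5, the brackets count as content
```

Under this model the delimiters produced by TO are ordinary characters that a series traversal would visit, which is why TO has to make a copy while AS does not.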

It raises more questions for what things like blocks should do, though:

>> to tag! [a b c]
== <[a b c]> ;-- hmmm.

How would AS EMAIL! (AS IDENTITY!) work?

I've spoken recently about the importance of being able to "namespace" things by their string type. e.g. if you have an ACTION! defined via FUNCTION in TLS named CLIENT-HELLO, but also want to talk about CLIENT-HELLO being a state you are in, why use a WORD! both times?

 instructions: [... client-hello ...]
 ...
 parse instructions [
     some [... 'client-hello (client-hello arg1 arg2...) ...]
 ]

It's lame to have to come up with new words each time you want to talk about a different aspect of things, or put things in namespaces. Why not use a different datatype?

 instructions: [... <client-hello> ...]
 ...
 parse instructions [
     some [... <client-hello> (client-hello arg1 arg2...) ...]
 ]

It makes it easier to keep track of things. And being able to flip easily back and forth with AS...especially when it doesn't mean you're making series copies each time...is useful.

But here's a puzzle. That means you want this:

>> as tag! @foo
== <foo>

>> as identity! quote foo:
== @foo

But if foo@bar for EMAIL! and @foo are the same data type (as Red has made them), what does the AS conversion mean there? The sigil is not always at the beginning.

Here's an idea: we could say that a leading @ is just the "construction syntax" for an email/identity that doesn't carry an internal sigil:

 >> id: copy f@oo

 >> length of id
 == 4

 >> take next id
 == #"@"

 >> id
 == @foo

 >> length of id
 == 3

 >> first id
 == #"f"

Then you'd wind up with:

 >> as tag! @foo
 == <foo>

 >> as tag! f@oo
 == <f@oo>

Slightly confusing, but at least it's explainable. And I think it points to an answer for what would happen with URL! and such, too. If you say as word! http://example.com you'll get a word with a colon in its spelling, which will require some kind of construction syntax to show you it's not an ordinary loadable word.

Something would have to be done for series with @ in the first position, and I guess it's just one of those things that would send it into construction syntax rendering. The same goes for rendering when the series position is not at the head. I guess the question is if something can be done that's sane enough to work for most purposes. When you allow delimiters/sigils in the material itself it will cause problems (spaces in words, colons in set-words, etc.) but I do think these things have to be legal...and escaped in rendering somehow.
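
For what it's worth, the rendering rule sketched in this post can be written down in a few lines of Python (a toy model, not Ren-C). The function name `render_identity` is hypothetical, and the head-position @ case is deliberately left as an error because nothing has been decided for it.

```python
def render_identity(content):
    """Toy model of the proposed rendering rule for an EMAIL!/identity value."""
    if "@" not in content:
        # No internal sigil: the leading @ is construction syntax only.
        return "@" + content
    if content.startswith("@"):
        # Left open in this post: some escaped rendering would be needed.
        raise ValueError("rendering for a head-position @ is undecided")
    return content


id_chars = list("f@oo")
print(len(id_chars))                        # 4
removed = id_chars.pop(1)                   # like: take next id
print(removed)                              # @
print(render_identity("".join(id_chars)))   # @foo
print(len(id_chars))                        # 3
```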

gchiu · May 18, 2018, 3:42am

Sounds as though you are solving some thorny issues here.

hostilefork · May 18, 2018, 3:59am

Longstanding things, for sure.

One big change is seeing all the "word" classes as peer string types, and believing that to be a good thing. The separation into very different categories made that not so clear.

For instance, if you were an R3-Alpha programmer, was this a "good" thing or a "bad" thing?

>> label: quote foo:
== foo:

>> print [label "stuff"]
foo: stuff

Rebol2 would say foo stuff. The behavior throughout the system wasn't consistent enough for you to know if you were using the language correctly, or on some broken fringe that would be scuttled away at some point. (Red didn't adopt the change, for instance. And note that in R3-Alpha, TO STRING! of a TAG! didn't preserve delimiters, even though TO STRING! of a SET-WORD! did...)

Yet as pieces sort of slide into place, I think it makes sense to say that usages like the above are right, and you can count on it and build upon it. Words will be ANY-STRING! ... and using them as you would use any other string type historically is natural and appropriate.

Another stake driven into the ground, and the design focuses a little more each time...

rgchris · May 18, 2018, 7:27pm

One possible count against a unified EMAIL!/HANDLE! (sorry, running with HANDLE! for now) type is that in an email, the @ is part of the content, where in a handle, it may not be (especially in this usage where you emphasise the wordiness properties of the value).

Note also that in Rebol 2, you could access the two parts of an EMAIL! value with USER/HOST path notation:

>> fb: foo@bar
== foo@bar

>> fb/user
== foo

>> fb/host
== bar

hostilefork · May 18, 2018, 7:32pm

Works for me, I'd been thinking that the HANDLE! type might be changed to C-DATA! and C-FUNCTION! (they're distinct pointer types, size of a function pointer can be different from the size of a C data pointer).

Well, that's what I talk about. The workaround I suggest is that prefixing with @ be a "rendering tic" which "HANDLE!"s have when they do not have an internal @ at a non-head position. An "unnatural" HANDLE! which you make by putting @ at the head position would then have something of the same problem that an "unnatural" WORD! would have if you put a : at the beginning of it, and would need some kind of escaped rendering to let you know what's going on.

Does that sound like an avenue of solution, to help avoid making "too many datatypes"?

hostilefork · May 22, 2018, 3:31am

The AS and TO distinction feels very clarifying, and I think it's a winner. But there are some inconvenient truths, and a question of how to mitigate potential problems.

Under this system, we understand that AS aliasing doesn't make a copy by default:

>> text: copy "foo"

>> file: as file! text
== %foo

>> append file "bar"
== %foobar

>> text
== "foobar"

So you need to be sensitive about whether you need to make a COPY or not. Seems fair enough, and COPY AS TEXT! reads fairly light and literately.

But now remember the angle is that if you're not using AS, then the TO conversion will assume you meant the delimiters were part of the deal. This is why TO TEXT! on a tag will give you the angle brackets in the series. And it's also why TO FILE! on TEXT! will assume you meant you wanted the quotes, however that winds up presented:

>> file: to file! "foo"
== %^"foo^"

>> first file
== #"^""

>> second file
== #"f"

This is not completely outlandish...especially when you consider that quotes are legal in filenames on Linux.

But there's been an idiom in practice something like this:

 foo: func [
     item [file! word! path!]
         "Will be converted to file!"
 ][
     file: to file! item
     ...
 ]

Under the new rules you can't write this with AS FILE!, because a PATH! is an array. It cannot be aliased as a string, so in that case you really do need a conversion. But if you use TO FILE! under these rules, you'll end up with the % character in the result if the item was a FILE! to start with!

>> to file! %foo
== %^%foo

So you'll end up writing:

 if match [word! path!] file [
     file: to file! file
 ]

Though that gives you a situation where it's a unique copy only if it was a WORD! or PATH!. To get a new copy in all situations:

 file: either match [word! path!] file [
     to file! file
 ][
     copy file
 ]

In a way, I think that what this is doing is "weird" enough that it should take responsibility for its weirdness. Think of the following list--which is similar to the motivating example in Rebol's make process:

source-files [
   %foo.reb
   bar.reb
   mumble/foo.reb
   mumble/"bar with spaces.reb"
   %mumble/frotz.reb
   "stuff.reb"
]

There you've got a FILE!, a WORD!, a PATH! with two words, a PATH! with a word and a string, a FILE!, and a STRING!. It's hard to argue that you get much closer to a solution when TO FILE! of something that is already a FILE! is a no-op, especially when it breaks the other advantages.

What if MAKE of a type to itself was always a COPY?

There's a third place we might look to, which is the infamous MAKE. Where TO is similar to casting/conversion in other languages, MAKE's concept is more like construction from a loosely defined "spec". The difference between TO and MAKE has been debated (it seemed rather ad-hoc), but it's known that to string! 10 would give you "10", while make string! 10 could give an empty string with 10 units of data in it.

Given MAKE's nebulous nature, might it be the missing link here, so that MAKE FILE! on something that was already a FILE! would leave it as-is...and could it be creative with the interpretation of PATH!s, STRING!s, and WORD!s? We might even enforce that MAKE SOME-TYPE! of a value of type SOME-TYPE! would just copy it. It may not be a ton of terra firma, but it means you'd only have to worry about how MAKE interprets values of types other than the one you asked for.

(Note: This would be giving MAKE a rule similar to C++ handling of static_cast conversions. You can't write a cast operator of a type to itself, just to any other type. Just interesting to look at parallels in other systems.)

And there's an advantage to going this route. It's compatible with R3-Alpha, Rebol2, and Red:

>> make file! %foo
== %foo

>> make file! %foo/bar
== %foo/bar

>> make file! "foo"
== %foo

Given that this option exists, it seems there's no good reason to undermine the initiative for to text! "abc" to be {"abc"}. All it means is people would have to use MAKE in places they previously expected to use TO, sometimes.
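
To compare the two behaviours on the source-files list from earlier, here is a rough Python sketch (a toy model, not Ren-C). The names `to_file` and `make_file` are hypothetical, values are plain `(kind, content)` pairs, and the handling of WORD! and PATH! under MAKE is an assumption, since the post only raises it as a question.

```python
# Toy model only: values are (kind, content) pairs, where content is a str,
# or a list of segment strings for a path. Not real Ren-C semantics.

def to_file(value):
    """Proposed TO FILE!: the source's delimiters become part of the content."""
    kind, content = value
    if kind == "file":
        return ("file", "%" + content)
    if kind == "text":
        return ("file", '"' + content + '"')
    if kind == "word":
        return ("file", content)             # words carry no delimiters
    if kind == "path":
        return ("file", "/".join(content))   # assumption: segments joined by "/"
    raise TypeError(kind)


def make_file(value):
    """Hypothetical MAKE FILE!: same type is a plain copy, text loses its quotes."""
    kind, content = value
    if kind in ("file", "text", "word"):     # word handling is an assumption
        return ("file", content)
    if kind == "path":
        return ("file", "/".join(content))   # assumption, as above
    raise TypeError(kind)


source_files = [
    ("file", "foo.reb"),
    ("word", "bar.reb"),
    ("path", ["mumble", "foo.reb"]),
    ("path", ["mumble", "bar with spaces.reb"]),
    ("file", "mumble/frotz.reb"),
    ("text", "stuff.reb"),
]

for item in source_files:
    # Left column: MAKE FILE! content. Right column: TO FILE! content.
    print(make_file(item)[1], "|", to_file(item)[1])
```

In this model the two columns differ only for the FILE! and TEXT! inputs, which are exactly the cases where TO would carry the `%` or the quotes into the result.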