Upcoming Datatype $WORD... What Will It Mean?

hostilefork · February 13, 2024, 11:51am

Something that has been nagging at me is that I've known I want to introduce $WORD, $(GR O UP), $TU.P.LE etc. (tentative names VAR-WORD!, VAR-GROUP!, VAR-TUPLE!).

At first glance, it seems like it would be a loss if $XXX didn't look up environment variables in the baseline evaluator.

But this would make lookup behave differently for one kind of ANY-WORD!, when up until now the type of word has not mattered. Would it spread to SET?

>> set-env "SOMETHING" "TRUE"

>> SOMETHING: 10

>> set (in [] '$SOMETHING) "FALSE"
== "FALSE"

>> SOMETHING
== 10

>> get-env "SOMETHING"
== "FALSE"

Hmmm. Lots of issues there...including that Unix environment variables are case-sensitive, Windows ones are not.

The implementation would presumably have to introduce some sort of "pseudo-object" named environment/env. The "specifier" would then have to say that $ words should look up in env, similar to how ".WORD" lookups would say to look in a "current object". (See related discussion about "Binding Indirection")
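One way to picture the "pseudo-object" idea is a toy resolver, sketched here in Python rather than Ren-C. Everything in it is a hypothetical illustration (the `Resolver` class, the plain dict standing in for the process environment, the `$` prefix check), not how the evaluator or specifiers actually work. It mirrors the SOMETHING example above and shows where the case-sensitivity question would have to be answered.

```python
class Resolver:
    """Toy model: plain words live in a user context, $words are routed to `env`."""

    def __init__(self, env, case_insensitive=False):
        # `env` is a plain dict standing in for the process environment (assumption).
        self.case_insensitive = case_insensitive
        self.env = {self._key(k): v for k, v in env.items()}
        self.context = {}

    def _key(self, name):
        # Windows-style environments ignore case; Unix-style ones do not.
        return name.upper() if self.case_insensitive else name

    def set(self, word, value):
        if word.startswith("$"):
            self.env[self._key(word[1:])] = value
        else:
            self.context[word] = value
        return value

    def get(self, word):
        if word.startswith("$"):
            return self.env[self._key(word[1:])]
        return self.context[word]


r = Resolver({"SOMETHING": "TRUE"})
r.set("SOMETHING", 10)          # ordinary variable
r.set("$SOMETHING", "FALSE")    # routed to the environment stand-in
print(r.get("SOMETHING"))       # 10
print(r.get("$SOMETHING"))      # FALSE
```

With `case_insensitive=True`, `$path` and `$PATH` would hit the same entry, which is exactly the kind of platform difference a baseline evaluator would have to pick a policy for.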

Simpler Thought...

A simpler thought (that doesn't rock the boat for one feature) would be that this is a bridge too far for an ANY-WORD!, and they should look up just like any other word, and it's only weird shell dialects that would think that a $WORD meant environment variables.

But then the question might be what the $ buys you.

Maybe I was too hasty in saying that the @ was the right thing to sacrifice for "get variable with binding", and $ should have done that?

>> $word
== word  ; bound

>> @word
== @word  ; bound?

(One argument for @word being bound is that if you want an unbound one, you can get it by quoting: '@word. But then again, if the @ operator does not bind (e.g. @ foo) and the $ operator does (e.g. $ foo), then @word binding would be out of step with its own operator. So maybe not affecting the binding is the better choice.)

This would let us put back the @ for "as-is" variable usage in parse. And it would make more sense for a thing named VAR-WORD! (bound variable in evaluator, environment var in shell dialect...)

Urrrgh. I hate that it seems like that's probably right. :-/ (Thankfully, git lets us audit/reverse such decisions...assuming you're diligent about not changing too many unrelated things in one commit, which I was careful about with the @ change.)

Loss of $ For Weird Idea I Had

If the $ operator were used for binding that would be a bit sad, as I'd kind of hoped that could be a variadic function that could run the shell dialect:

extension: "txt"

$ ls -alF *.(extension)

But, maybe that's a bad way to package it in the box, and specialty scripts that don't care about a $ operator for binding purposes can override it, encouraging the more traditional:

extension: "txt"

shell [ls -alF *.(extension), echo $SOMETHING]

bradrn · February 13, 2024, 12:26pm

I don’t love this proposal. Outside dialects, I see no particular need to add yet another kind of $WORD. I feel particularly worried about the idea of making them look up environment variables by default — it’s clever, but could run into subtle cross-platform difficulties like you mention.

I don’t love this either. Using @WORD for bound variables felt to me like the right thing to do. I never could see the point of its old semantics, anyway.

hostilefork · February 13, 2024, 1:01pm

It is true that the @word => @word evaluative behavior didn't turn out to be terribly useful, as I argued when making the change. But having it not be interesting meant in a sense that it was "free" for dialects.

One thing that is pretty key about the @ semantics is the standalone behavior meaning "take the next thing literally".

This comes in especially handy in the API:

 REBVAL* word = rebValue("in [] 'var");

 REBVAL* value = rebValue("get @", word);

If you didn't have the @ operator to mean "get me this as is" then you'd have to write one of these:

 REBVAL* value = rebValue("get the", word);

 REBVAL* value = rebValue("get", rebQ(word));

I think having a single-character operator that means "exactly the next thing" is pretty important. I glossed over the fact that it is now adding binding to things that don't have them, because it wasn't causing any problems (that I saw yet) in the API code... but it needed resolution, and I think the right resolution is that it should not affect binding... which raises the question of what an operator should do that does.

My concept here would be that the $ operator would be evaluative but add the binding, so an alternative for in []:

>> $ 'foo
== foo  ; bound

This helps resolve my concern about the operator's behavior in PARSE, where it is effectively "evaluative" vs. "quoting". (I mentioned that having one operator that "deals with quoting, and with binding" was overloaded.)

bradrn · February 15, 2024, 1:11pm

With my improved understanding of sigils, I think I can pinpoint more precisely why this behaviour bothers me. Currently — at least in the main evaluator — all the sigils except one serve some purpose relating to getting, setting or binding the word(s) they modify. This turns what would otherwise be a rather ad hoc matrix of datatypes into a powerful toolkit for manipulating variables and bindings.

The proposed (and former) behaviour goes against the grain of this. Making @word evaluate to @word has very little in common with the purpose of the other sigils. It’s simply a value which evaluates to itself. Even worse, it makes no sense when extended to other types: as I’ve complained before, THE-GROUP! and THE-BLOCK! become completely redundant types with this behaviour, having the same meaning as plain BLOCK!.

(Incidentally, this is also a problem with TYPE-* sigils, as you’ve mentioned — which is surely related to my expressed dislike of them. At least I’m consistent, I guess!)

Instead, I’ll make an alternative proposal: if we do go back to that behaviour, scrap @(groups), @[blocks] and all the other useless types, and just make the single datatype @word. That particular datatype fills a real usecase, in that it’s the word-level equivalent of blocks. It can be like #issue — a type which is useful on its own, but where its sigil doesn’t need to be extended to other things.

Incidentally, in that case, I’d also prefer to keep the existing meaning of @, and give this new type the syntax of $word. I’m rather partial to using @word for bound words, and would like to keep that syntax. Besides, if we want shell-like environment variables, I personally feel that fits the concept of ‘evaluates to itself’ much better than it fits ‘evaluates to a bound word’.

hostilefork · February 16, 2024, 6:48am

On this aesthetic choice, I lean pretty strongly the opposite way.

It's good to consider "top of the page" issues, and having a nice-looking way to import libraries seems good to me:

import @library

Perhaps even putting in versions as @ paths:

import @library/1.1.3

Something possible when sigils don't vanish under evaluation is that they can eliminate the need for a block, while still giving a signal to the callee that you passed what you meant intentionally. If the sigil vanished, IMPORT would have to take WORD! and PATH! in its type signature, and it couldn't tell the difference between import 'library and import @library. I don't really want that axis of flexibility in this case.

Swapping out the behaviors to import $library with the $ being sticky, I don't care for as much.

But also, I'm partial to @ meaning "next thing literally", and it's used extensively in the API. $ doesn't feel as good to me for that.

 rebElide("append word-list", rebQ(word));  // verbosity I'd often like to avoid

 rebElide("append word-list @", word);  // I've liked this since it was instituted

 rebElide("append word-list $", word);  // don't really like this for the purpose

The idea of $ having "something to do with variables" is fairly common, and can be suggested with a name (VAR-WORD!). And I think this jibes:

>> var: 1020

>> get 'var  ; receives `var` plain WORD!, unbound
** Error: var is not bound to a context

>> get $var  ; receives `var` plain WORD!, bound
== 1020

The concept of a free $ being a binding operator synonym for in [] (or in PARSE serving the role of what I called *in*) feels good to me. Now that people are being forced to confront binding more directly in their everyday programming (when doing anything interesting) it makes sense to be able to do it succinctly.
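To make the distinction concrete, here is a toy Python model (not Ren-C, and not a claim about the implementation). The `Word` class and the `bind`, `the`, and `get` helpers are hypothetical names: `bind` plays the role of the proposed `$`, and `the` plays the role of `@`, which hands a value through without touching its binding.

```python
class Word:
    """Toy word: a spelling plus an optional binding (a dict acting as a context)."""

    def __init__(self, spelling, binding=None):
        self.spelling = spelling
        self.binding = binding


def bind(word, context):
    # Models the proposed `$`: same spelling, binding added.
    return Word(word.spelling, context)


def the(word):
    # Models `@`: hand the value back untouched, binding (or lack of one) included.
    return word


def get(word):
    if word.binding is None:
        raise LookupError(f"{word.spelling} is not bound to a context")
    return word.binding[word.spelling]


context = {"var": 1020}
unbound = Word("var")                # like 'var

print(get(bind(unbound, context)))   # 1020
print(the(unbound).binding)          # None
try:
    get(unbound)
except LookupError as e:
    print(e)                         # var is not bound to a context
```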

(Also, single-arity functions can be optimized as intrinsics, for what that's worth. I haven't yet decided if I want to make things like $ and @ their own datatypes that can't be overridden in the evaluator... but now that we can have more datatypes, I might go that direction as well...)

bradrn · February 16, 2024, 7:26am

hostilefork:

But also, I'm partial to @ meaning "next thing literally", and it's used extensively in the API. $ doesn't feel as good to me for that.

 rebElide("append word-list", rebQ(word));  // verbosity I'd often like to avoid

 rebElide("append word-list @", word);  // I've liked this since it was instituted

 rebElide("append word-list $", word);  // don't really like this for the purpose

Is this not an argument for @word evaluating to bound word? As far as I understand it, the intention is for @ word to also evaluate to bound word, and we’d like the two to behave the same.

But on the other hand, get $var does make sense, as does import @library. So I suppose there are arguments for both syntactic choices.

hostilefork · February 16, 2024, 7:38am

The intention is to be a substitute for rebQ()-then-evaluate...where rebQ() adds a quoting level to the value when it's spliced into the va_arg instruction stream.

Here the API code is wishing to not influence the binding that's already on word. So if it's unbound, it should stay unbound.

Imagining that WORD is some-unbound, the equivalence is:

append word-list word  ; some-unbound shielded from exec/bind by WORD! lookup
=>
do compose [append word-list '(word)]  ; shielded from exec/bind by quotation
=>
rebElide("append word-list", rebQ(word));  // C var shielded by rebQ() splice w/quote
=>
append word-list 'some-unbound

So when you substitute with the @ operator you want that same semantic, just cleaner:

rebElide("append word-list @", word);
=>
append word-list @ some-unbound

Hence @ definitely should not bind.

I'm a bit puzzled about whether @word, @[block], etc. should bind or not. Making them not bind in the evaluator may fall under the category of consistent-but-useless behavior. Keeping the decoration on the value already makes them behave differently from the @ operator, so binding should probably be done too.

bradrn · February 16, 2024, 7:48am

Actually, I just had a thought…

Could this not take an ISSUE!? At least to me, it feels reasonable to write import #library.

Ah, OK. I must have gotten confused.

If I think of @word as the word-level analogue to blocks — as a value which evaluates to itself — then it makes sense for it to also bind under evaluation. Like I said previously, I quite strongly believe that @[block] et al. should not exist, since they’re completely useless.

hostilefork · February 16, 2024, 8:15am

I use them. But I may not be able to convincingly argue that the project as a whole isn't useless... so you might be in that sense right.

I'd thought you were somewhat convinced by the argument that being able to put sigils on array classes is sort of the "composition API" for those sigils. You can't say @foo: but you can say @[foo:] and [@foo]:, and there are known mechanisms for picking those structures apart, with this extending indefinitely e.g. [@[@[foo:]]]:

But either way... since they're useless... why don't we deal with this conundrum for now by having you not use them. We'll put a pin in it for later discussion, after all the other problems are solved, and maybe that would be a good time to take them out.

Using words+paths+tuples vs. strings has advantages. Word symbols are interned (words that are spelled the same look up to the same UTF-8 bytes in memory) which, all things being equal, makes storage smaller, comparisons faster, etc.
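Python's `sys.intern` gives a rough feel for what interning buys. This is an analogy only and says nothing about how Ren-C stores its symbols:

```python
import sys

# Build the same spelling twice at runtime so the two strings start out as
# separate objects, then intern both.
first = sys.intern("".join(["lib", "rary"]))
second = sys.intern("".join(["libr", "ary"]))

# Interned strings with equal spelling are the same object, so an identity
# check (a pointer comparison) is enough to compare them.
print(first is second)   # True
```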

Limiting library names to what's legal in words has the advantage that you can turn that word into other forms, so if you wanted to make some dialect that mentioned libraries named by symbol you'd have those parts to manipulate. You could use them as keys in objects, etc.

And that manipulation being easy is an advantage of having things like a version tuple be part of a path. It's easier to parse because much of the parsing is already done for you, and easier to do things like compose...with the usual niceties of having the structure and checking and such all working for you:

>> version: 1.1.20

>> first version
== 1

>> version.3
== 20

>> compose @library/(version)
== @library/1.1.20

>> curtail compose @library/(version) 
== @library/1.1.20


>> version: null
== ~null~  ; anti

>> compose @library/(version)
** Script Error: non-NULL value required (see MAYBE, TRY, REIFY)
** Near: [@library ** (version)]

>> curtail compose @library/(version) 

>> compose @library/(maybe version)
== @library

(See explanation of CURTAIL)
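For readers who don't have a Ren-C console handy, the shape of the COMPOSE and MAYBE cases above can be mimicked with a toy Python function. `compose_import` and its `maybe` flag are hypothetical stand-ins, CURTAIL is deliberately left out, and note that Python indexing is 0-based where `version.3` above is 1-based.

```python
def compose_import(library, version, maybe=False):
    """Toy stand-in for `compose @library/(version)`.

    `version` is a tuple of ints or None. `maybe=True` mimics `(maybe version)`:
    a None version vanishes instead of raising. All names here are hypothetical.
    """
    if version is None:
        if maybe:
            return f"@{library}"
        raise ValueError("non-NULL value required")
    return f"@{library}/" + ".".join(str(part) for part in version)


version = (1, 1, 20)
print(version[0])                                   # 1
print(version[2])                                   # 20
print(compose_import("library", version))           # @library/1.1.20
print(compose_import("library", None, maybe=True))  # @library
```

The point of the structured version is the same as in the console session: the parts are already picked apart, and a missing version fails loudly unless you explicitly opt out.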

bradrn · February 16, 2024, 3:33pm

Yes, I find this argument convincing… for those sigils which do allow composition at all. But there’s at least one which doesn’t, namely ISSUE! — we don’t see any ISSUE-BLOCK!s or anything like that. So there’s precedent for saying, ‘these sigils can’t be composed with other types because it makes no sense’.

From a broader perspective, I’ve been unhappy with the way some sigils combine with a specific set of types, while others only combine with one. I eventually rationalised it by saying that the sigils which can modify many types are those related to word-specific operations (getting/setting/binding), and the types they modify are either words or series of words.

Currently, there is only one exception to that trend: TYPE-*, which has no particular relationship to these operations, but can modify other datatypes nonetheless. And it shows, in that the usage of TYPE-GROUP! and TYPE-BLOCK! seems fairly ad-hoc, in that it’s hard to predict from the types themselves. And that situation makes me feel uncomfortable.

This is why I’m so unhappy with re-adding a whole series of ‘evaluate-to-self’ types: not only are they largely useless, they further muddy the waters by making the point of sigils even less clear. I can see only one reason to justify them, namely that they’re useful in dialects… but if ‘useful in dialects’ alone becomes sufficient reason to add new datatypes, then there’s a whole bunch more which we could add. And we’ve already established that adding those isn’t a great idea, largely because such types are only useful in such limited ways.

hostilefork · February 16, 2024, 11:30pm

I will willingly admit/shout that there is a whole lot of nebulous murk regarding the question of datatypes, and I am extremely open to suggestions on what to do better.

I made a new thread where you can solve it:

Ugly Types: Less Ugly Than History, Can We Do Better?