A proliferation of $#@^':~WORD~:s

bradrn · February 13, 2024, 12:58pm

One of the things which surprised me when I first looked into Ren-C was the number of WORD variants it has. By my count, this includes:

plain WORD
:GET-WORD
SET-WORD:
@THE-WORD
^META-WORD
&TYPE-WORD
#ISSUE
'QUOTED
~ANTIFORM~
…and probably more that I’ve forgotten.

Now, in many ways this is perfectly expected for a language like Ren-C. Firstly, dialecting means we value having as many syntactic options as possible. Secondly, Ren-C has a lot of different kinds of values — plain, quoted, anti and quasi, and now bound and unbound versions of each — and most of these words are simply making it easier to deal with that huge variety.

But, on the other hand, I feel we’re starting to encounter some problems with the current way of doing things. Most notably:

None of this is compositional. When we run into a situation where we’d like to, say, have a word which is both META- and THE-, it’s impossible.
Some dialects would like to use words outside this fixed inventory. For instance, it would be nice to have $WORDs to use in a shell dialect.

The root cause of both is the same: the inventory of word-like datatypes is hard-coded into the interpreter. If you want to use something outside that set, you can’t, no matter how similar to the existing types it may seem.

I can imagine a hypothetical design which would avoid this. This would allow some characters to be freely added to the beginning and end of words — let’s call those special characters ‘sigils’, like in Perl. Every combination of sigils would then specify a separate datatype. So you would still have :WORDs and ^WORDs and ~WORD~s, but also $WORDs and ^@WORDs and ~#WORD&s and whatever else you could imagine. This would quite easily solve both of the problems I mentioned.
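To make the proposal concrete, here is a minimal Python sketch (not Ren-C code) of how a scanner might split a token into leading sigils, a word core and trailing sigils, then name its datatype after that sigil pattern. The sigil alphabet and the helper names split_sigils and datatype_of are assumptions for illustration only.

```python
from typing import Tuple

# Hypothetical sigil alphabet for this sketch; not Ren-C's actual scanner rules.
SIGILS = set(":@^&$~#'\\")


def split_sigils(token: str) -> Tuple[str, str, str]:
    """Split a token into (leading sigils, word core, trailing sigils)."""
    start = 0
    while start < len(token) and token[start] in SIGILS:
        start += 1
    end = len(token)
    while end > start and token[end - 1] in SIGILS:
        end -= 1
    core = token[start:end]
    if not core:
        raise ValueError(f"no word core in {token!r}")
    return token[:start], core, token[end:]


def datatype_of(token: str) -> str:
    """Name the datatype after its sigil pattern, e.g. '$WORD' or '~#WORD&'."""
    prefix, _core, suffix = split_sigils(token)
    return f"{prefix}WORD{suffix}"


for tok in ["word", ":word", "word:", "$word", "^@word", "~#word&"]:
    print(tok, "->", datatype_of(tok))
```

Under this scheme every distinct prefix/suffix pattern becomes its own type, which is exactly what makes the set of types open-ended.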

One might even contemplate generalising this ‘sigil’ idea to non-word types. We already have {GET,SET,THE,META}-{BLOCK,GROUP}s, so it would make sense to allow arbitrary sigils on blocks and groups too.

Unfortunately, I’m not sure this would work with the current design of Ren-C. At the moment there is a hard maximum of (as I recall) 256 possible datatypes, whereas this proposal allows an unbounded number of them. However, I do think it’s at least worth thinking about, for the simple reason that it would give us a lot more flexibility than we currently have.
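Here is a back-of-the-envelope count that gives a rough sense of scale. It rests on hypothetical assumptions: sigils are unordered and never repeated, there are six prefix sigils (: @ ^ & $ ~) and five base types (WORD, BLOCK, GROUP, PATH, TUPLE), and a trailing colon is optional. Even this bounded variant overflows a one-byte type tag. Allowing ordered or repeated sigils would make the count unbounded.

```python
def sigil_type_count(base_types: int, prefix_sigils: int, suffix_options: int = 2) -> int:
    """Count sigiled datatypes under the stated (hypothetical) assumptions."""
    # Each prefix sigil is independently present or absent: 2 ** prefix_sigils spellings.
    # suffix_options = 2 covers "no suffix" and "trailing colon".
    return base_types * (2 ** prefix_sigils) * suffix_options


ONE_BYTE_TAGS = 2 ** 8  # distinct values a single byte can hold

count = sigil_type_count(base_types=5, prefix_sigils=6)
print(count, count > ONE_BYTE_TAGS)  # 640 True
```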

hostilefork · February 13, 2024, 9:06pm

To be sure, this is a strange medium to step into...and Ren-C has made it appear stranger on first look.

(But I think many things--like isotopes--are a logical consequence of wanting the medium to "actually work". Some others disagree, mostly because they appear not to care if it works or not.)

Having used the medium for quite a while, I'm far more concerned about the problems that would arise from having to distinguish META-GET-WORD!s from GET-META-WORD!s (or GET-WORD-META!s) than I am by the problems currently faced by some compositional desires.

It's probably the case that problematic compositions signal a design reaching the breaking point where another angle of thinking is needed, rather than being evidence that the one-sigil-limit rule should be lifted.

It's hard to give some kind of first-principles proof of why the chosen rules are "good". Probably not all of them are good. But the choice to limit to one sigil is intentional, as an attempt to focus thinking along the particular strata of parts when solving problems. (So far, there are almost no isotopes of things that carry sigils...so maybe we could even commit to prohibiting quasiforms or antiforms of anything that has one, to rein things in slightly more.)

Really it's all a big balancing act. And I think the time for real judgment on where to draw the lines will be once the sloppy language pile actually has that "first functioning transistor" moment and demonstrates living up to its claimed potential. As evidenced by wanting to push things around with $ and @ and what they imply, I'm really not there yet when it comes to something as fundamental as binding...it's vaporware in my estimation until it isn't. But feeling closer.

"The test of the machine is the satisfaction it gives you. There isn't any other test. If the machine produces tranquility it's right. If it disturbs you it's wrong until either the machine or your mind is changed."

Note that in Ren-C, #ISSUE (probably going to be called #TOKEN) is an immutable stringlike type (no series position) that does not carry a binding. It's a unification of the CHAR! and ISSUE! types (a single-codepoint token is a character).

There have been some historical changes to this, where it was a string type in Rebol2 but a word type in R3-Alpha. Red has its own decisions, whatever they are.

bradrn · February 14, 2024, 1:47pm

Starting with what I feel is most important:

The question I’m asking here is a slightly more fundamental one: why this particular collection of sigils, and no others? It’s not even necessarily about combining sigils: it just seems a bit arbitrary to me that a dialect can use @word or :word or #word or &word, but not $word or ~word or \word. And if we treat all of those uniformly, it makes sense to extend it even further to allow combinations of characters too.

I do see this as having real utility in dialects. For instance, a dialect to interface with C++ code might use this to refer to ::global variables. Or a statistical modelling library might want to follow precedent when talking about regression ~parameters. Or filesystem operations on Windows might want to refer to \\REMOTE directories… anyway, you get my point, hopefully.

I think this is probably the single strongest argument against these ‘generalised sigils’: some datatypes need to have a different underlying representation, even if they look similar on the surface. I’m not sure if there’s a way to work around that.

Fair enough. But I think such things can be worked around: for instance, the parser could ignore the relative positions of the sigils, making :^word and ^:word merely two ways of spelling the same thing. (It already does this with e.g. {string} vs "string".)
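Here is a minimal Python sketch of that normalization. It assumes a hypothetical fixed ordering of prefix sigils: the scanner respells any leading sigil run in canonical order, so :^word and ^:word produce the same token. Rejecting repeated sigils is an extra assumption of this sketch. Trailing sigils, such as the SET-WORD colon, are left out for brevity.

```python
# Hypothetical prefix sigils; their order here defines the canonical spelling.
PREFIX_SIGILS = ":@^&$~"


def canonical(token: str) -> str:
    """Respell the leading sigil run in canonical order so position is irrelevant."""
    i = 0
    while i < len(token) and token[i] in PREFIX_SIGILS:
        i += 1
    prefix, core = token[:i], token[i:]
    if len(set(prefix)) != len(prefix):
        raise ValueError(f"repeated sigil in {token!r}")
    return "".join(s for s in PREFIX_SIGILS if s in prefix) + core


print(canonical(":^word"), canonical("^:word"))  # :^word :^word
```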

I don’t completely understand this position. Is there any particular reason you feel this way, or is it more just a gut feeling?

Indeed… which is why, when this seeming inconsistency made me feel uneasy, I made this post to figure out which of the two must change!

hostilefork · February 15, 2024, 1:50am

To give you a sense of some of the issues of carving up the lexical space, let me draw your attention to just one example: consider the PATH!.

Historical Rebol considers these to be in the same family as blocks and groups, so you can do things like append to them or take things from them.

red>> path: 'a/b/c
== a/b/c

red>> append path 'd
== a/b/c/d

red>> take path
== a

red>> path
== b/c/d

Since they are just an alternate rendering of blocks (with interstitials instead of outer brackets), you can put pretty much whatever you want in them. For instance, a GET-WORD!:

red>> insert path quote :a
== b/c/d

red>> path
== :a/b/c/d

But wait: that looks like a GET-PATH! now. It is not, though; it's a plain path:

red>> type? path
== path!

What if we convert it to a GET-PATH?

red>> path: to get-path! path
== ::a/b/c/d

How about a SET-PATH?

red>> path: to set-path! path
== :a/b/c/d:

You lose track of things pretty easily, not to mention what happens if you clear it and then turn it back into a plain path:

red>> clear path
== :

red>> path: to path! path
==
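A toy Python model (not Red's implementation) reproduces the ambiguity in the transcript above. Once path elements may carry their own sigils, a plain path whose first element is a GET-WORD! prints exactly like a GET-PATH! of plain words, and converting the path kind stacks colons. The Word and Path classes are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    name: str
    sigil: str = ""  # "" for a plain WORD!, ":" for a GET-WORD! (simplified)

    def render(self) -> str:
        return self.sigil + self.name


@dataclass
class Path:
    items: list  # elements may carry their own sigils, as in Red
    kind: str = "path"  # "path", "get-path" or "set-path"

    def render(self) -> str:
        body = "/".join(item.render() for item in self.items)
        if self.kind == "get-path":
            return ":" + body
        if self.kind == "set-path":
            return body + ":"
        return body


# A plain path whose first element is a GET-WORD!...
plain = Path([Word("a", ":"), Word("b"), Word("c"), Word("d")])
# ...and a GET-PATH! of plain words render identically.
get_path = Path([Word("a"), Word("b"), Word("c"), Word("d")], kind="get-path")

print(plain.render())                           # :a/b/c/d
print(get_path.render())                        # :a/b/c/d
print(Path(plain.items, "get-path").render())   # ::a/b/c/d
print(repr(Path([], "set-path").render()))      # ':'
```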

I'm focused on putting the right restrictions in place, vs. saying everything goes anywhere. In this particular case:

Paths (and new generalized tuples) must contain at least two items, though some of those items may be BLANK!
Elements with sigils aren't allowed inside the path/tuple
Paths/tuples are immutable once created.
- The immutability affords various compressions, such as values like /foo or foo/ or foo. or .foo costing no more than a WORD!
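Those three restrictions can be enforced at construction time. Here is a minimal Python sketch using a hypothetical RestrictedPath class, in which None stands in for a BLANK! slot. A frozen dataclass over a tuple gives immutability, and the checks reject one-item paths and elements with leading sigils. The compression described in the last point is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PREFIX_SIGILS = ":@^&$~"  # hypothetical set of sigils that may not appear on elements


@dataclass(frozen=True)
class RestrictedPath:
    """Immutable path: at least two items, None as a BLANK! slot, no sigiled elements."""

    items: Tuple[Optional[str], ...]

    def __post_init__(self) -> None:
        # Freeze the contents too, in case a list was passed in.
        object.__setattr__(self, "items", tuple(self.items))
        if len(self.items) < 2:
            raise ValueError("a path needs at least two items (blanks allowed)")
        for item in self.items:
            if item and item[0] in PREFIX_SIGILS:
                raise ValueError(f"sigiled element {item!r} is not allowed in a path")

    def render(self) -> str:
        return "/".join("" if item is None else item for item in self.items)


print(RestrictedPath((None, "foo")).render())  # /foo
print(RestrictedPath(("foo", None)).render())  # foo/
```

Because blanks are permitted, (None, "foo") renders as /foo and ("foo", None) renders as foo/. This is how those forms can still count as two-item paths.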

Taking away the ability to put anything in a path is a good restriction, and it tames inconsistencies in the lexical space and helps people know what they're looking at.

When you look at the impacts across the board...I favor a small set of sigils following a pattern that can build upon the robustness of the parts they modify. That leaves enough space to work on the bigger picture issues.

"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away".

Someday there might be a way to go after things like you're talking about. Maybe there's a dialected pattern for testing for the sigils, like if {::}.word.? item [...] or if {$}.path.{^}.? item [...], which could help avoid the need to specifically name ::word or $pa/th^. And maybe it would be an improvement, not a detriment to the experience of solving problems with the tool.

But I'm a skeptic, and that's not really what I'm looking for in terms of the look and feel of this particular artifact. I don't actually like it having as many parts as it does, but I'm tempering that by trying to make the parts as uniform and pleasing as I can.

I'm sure there are things that would be useful. I've tried to make it legal to use things like a lone colon, for instance (and you can trust it's not an empty get/set path...just a "weird WORD!")

>> type of second [10 : 20]
== &word

It has to have a special notation for adding sigils to it (else how could you tell if :: was a SET-XXX! or GET-XXX! of the colon?). And it's early days for preparing the code to handle such cases:

>> |:|: 1020
== 1020

>> :
== 1020

But it's sticks and glue even to get that little bit going--the lexer/scanner is an organic mess that's barely holding together. There needs to be a major overhaul before anything ambitious is tried. And sorting out binding is more important at this time than delving into the (already extremely saturated) lexical space.

bradrn · February 15, 2024, 8:45am

Putting it this way, it sounds reasonable. It’s fair to say my proposal was too flexible, in that it allowed a huge range of options, few of which were really useful while nonetheless making the whole language more difficult.

However, that being said…

I feel this sentiment is belied by the actual state of the language. To see why, let’s review the current sigiled types (excluding ones like ISSUE! which can be considered their own thing):

Initial : is used in GET-WORD!, GET-BLOCK!, GET-GROUP!, GET-TUPLE!, GET-PATH!
Final : is used in SET-WORD!, SET-BLOCK!, SET-GROUP!, SET-TUPLE!, SET-PATH!
Initial @ is used in THE-WORD!, THE-BLOCK!, THE-GROUP!, THE-PATH!, THE-TUPLE!
Initial ^ is used in META-WORD!, META-BLOCK!, META-GROUP!, META-PATH!, META-TUPLE!
Initial & is used in TYPE-WORD!, TYPE-BLOCK!, TYPE-GROUP!, TYPE-PATH!, TYPE-TUPLE!
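The regularity of this list can be written as a cross product of five sigil positions and five base types. The small Python sketch below enumerates the 25 combinations and spells a sample of each. The names SIGIL_FORMS, BASE_SAMPLES, sigiled_types and spell are made up for illustration. The sketch models surface syntax only, not what any of the types do.

```python
from typing import Callable, Dict, List

# How each sigil position decorates a base spelling.
SIGIL_FORMS: Dict[str, Callable[[str], str]] = {
    "GET": lambda body: ":" + body,
    "SET": lambda body: body + ":",
    "THE": lambda body: "@" + body,
    "META": lambda body: "^" + body,
    "TYPE": lambda body: "&" + body,
}

# A sample spelling for each base type that can take a sigil.
BASE_SAMPLES: Dict[str, str] = {
    "WORD": "x",
    "BLOCK": "[x]",
    "GROUP": "(x)",
    "PATH": "a/b",
    "TUPLE": "a.b",
}


def sigiled_types() -> List[str]:
    """All SIGIL-BASE! type names in the matrix."""
    return [f"{s}-{b}!" for s in SIGIL_FORMS for b in BASE_SAMPLES]


def spell(sigil: str, base: str) -> str:
    """Sample source spelling of the given sigiled type."""
    return SIGIL_FORMS[sigil](BASE_SAMPLES[base])


print(len(sigiled_types()))                          # 25
print(spell("SET", "BLOCK"), spell("META", "TUPLE"))  # [x]: ^a.b
```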

I would find it difficult to describe this situation as ‘robust’. Most of these types are almost entirely useless: only the *-WORD! variants see really wide use. GET-BLOCK!s are used for multi-returns, and COMPOSE gets some use out of the various *-GROUP!s, but other than that, it isn’t completely clear why anyone would use them.

Of course, the reason they’re present is obvious: they’re following a pattern! But that immediately raises more questions:

Why this specific set of base types, and no others? I see no obvious reason why types like WORD! and PATH! should be able to take sigils, but types like ISSUE! and FILE! can’t. (Or, for that matter, why THE-WORD! can’t either.)
Why is this pattern not formalised within the language? Sure, there are a few predicates (like ANY-META-VALUE? or ANY-PATH?), but by and large, Ren-C barely seems to make use of the fact that it has this consistent pattern.

Essentially, this feels to me like a few components which were found to work well for specific usecases, but were generalised in a way which doesn’t fully cohere when you look at it closely. In that regard it reminds me of binding, another part of the language which worked well in places but was hard to reason about in its full generality. And, even if sigils are less significant than binding, this still gives me a similar feeling of discomfort.

bradrn · February 15, 2024, 10:21am

On reflection… this makes more sense if I think of the allowed base types as things which can take bindings — something which is important for GET, SET, THE to work sensibly. That is to say, you can add sigils to words, and to things that contain words. This feels reasonable enough to me, even if it gives you a bunch of useless types.

It still doesn’t explain why you can’t add sigils to things like THE-WORD!, though. I still maintain that that capability would be useful to have, considering that the various sigils have more or less orthogonal semantics. It also doesn’t address my second bullet point (that Ren-C has this elaborate structure but doesn’t seem to get much use out of it).

hostilefork · February 15, 2024, 10:26am

There's a lot of code. Have you read it all?

If you can see the usage of the *-WORD! forms, then the *-TUPLE! forms should seem fairly obvious as tagging along on that. Because you can just as easily want to suppress an action from execution if it is a member of an object as if it's in a plain bound variable. Or set it, or get the META of it, or whatever.

GET-WORD!/TUPLE! suppresses action evaluation. GET-BLOCK! reduces, and can even be used as a branch type:

>> if true :[1 + 2 10 + 20]
== [3 30]

>> case [
       false [print "skip me"]
       true :[1 + 2 10 + 20]
   ]
== [3 30]

GET-PATH! does partial specialization.

>> append/dup [a b c] [d] 2
== [a b c [d] [d]]

>> apd: :append/dup
== ~#[frame! {apd} [series value dup /part /line]]~  ; anti

>> apd [a b c] [d] 2
== [a b c [d] [d]]

GET-GROUP! has been contemplated as a "reevaluate" as it had been in the parse dialect, so you can do a kind of inline composition instead of DO COMPOSE... though I've hedged on it while thinking about how the pieces fit together.

>> :(second [print b:]) 10
== 10

>> b
== 10

That's pending thinking about whether it's a great idea or not. But it certainly does have uses in dialects.

SET-WORD!/TUPLE! is needed for assignment. SET-BLOCK! is featured in multi-return.

SET-GROUP! sets with the thing that you have in your hand, which is pretty nice:

>> word: in [] 'var

>> (word): 10
== 10

>> var
== 10

It's generic so it retriggers for other SET-XXX! types.

>> block: [b a]

>> (reverse block): pack [1 2]
== 1

>> a
== 1

>> b
== 2
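The dispatch described here can be sketched in Python: the group is evaluated first, and whatever it produces is then used as the assignment target. This is a loose model, not the evaluator. A str plays the role of a WORD!, and a list plays a SET-BLOCK!-style multi-return whose first value is the main result. The name set_group is hypothetical.

```python
from typing import Any, Callable, Dict


def set_group(env: Dict[str, Any], group: Callable[[], Any], value: Any) -> Any:
    """Evaluate the group to find a target, then assign through that target."""
    target = group()  # stand-in for evaluating (...)
    if isinstance(target, str):  # a WORD!: plain assignment
        env[target] = value
        return value
    if isinstance(target, list):  # a BLOCK!: acts like a SET-BLOCK! multi-return
        for name, item in zip(target, value):
            env[name] = item
        return value[0]  # the first value is the main result
    raise TypeError(f"cannot assign through {target!r}")


env: Dict[str, Any] = {}
word = "var"
set_group(env, lambda: word, 10)
block = ["b", "a"]
set_group(env, lambda: list(reversed(block)), (1, 2))
print(env)  # {'var': 10, 'a': 1, 'b': 2}
```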

SET-PATH! has a legacy application in the evaluator for Redbol emulation, since Redbol didn't have tuples as general as paths (it only has numeric tuples like 1.2.3). Outside of dialects I'm not entirely sure what it should do otherwise, but that doesn't mean I won't think of something.

I've explained that these do have aggressive use in dialects, although in the main evaluator the biggest leverage comes from just the @ operator itself.

In PARSE it's difficult to imagine how to write some things without the literal matching operator, e.g. you can use @num or @(3 - 2) here... and I don't know what Red suggests you do (and there are larger guiding principles in UPARSE that make it work):

red>> num: 1

red>> parse [1 1 1] [some num]
*** Script Error: PARSE - invalid rule or usage of rule: 1

red>> parse [1 1 1] [some (3 - 2)]
== false

Anyway, they are definitely used.

Used and useful. Note in particular that a META-WORD! or META-TUPLE! is quite distinct from a separate meta operation.

>> ^ 10
== '10

>> x: ~

>> ^ x
** Error: x is not set

>> ^x
== ~

While I've pondered things like "what would META-BLOCK! do in the main evaluator" it has a very clear purpose in PARSE, since blocks execute match results and produce products.

Datatypes and typesets have always been a sticking point in the language, and this stuff is new, so I'm not 100% sure about where it's going yet. I think we may be headed for a situation where the & is used directly with predicate functions, e.g. &integer? instead of &integer, and this will change the shape of things.

But right now, &[...] does a logical OR of type operations, while &(...) does a logical AND.

>> switch/type -10 [
       &(odd? negative?) [print "It's odd and negative"]
       &[odd? negative?] [print "It's either odd or negative"]
   ]
It's either odd or negative
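Read as predicate combinators, &(...) is a conjunction and &[...] is a disjunction. This Python sketch mirrors the transcript with hypothetical helpers all_of and any_of. It assumes that, as with plain SWITCH, only the first matching branch runs.

```python
from typing import Callable, List, Optional, Tuple

Pred = Callable[[int], bool]


def is_odd(n: int) -> bool:
    return n % 2 != 0


def is_negative(n: int) -> bool:
    return n < 0


def all_of(*preds: Pred) -> Pred:
    """Rough analogue of &(...): every predicate must hold."""
    return lambda v: all(p(v) for p in preds)


def any_of(*preds: Pred) -> Pred:
    """Rough analogue of &[...]: at least one predicate must hold."""
    return lambda v: any(p(v) for p in preds)


CASES: List[Tuple[Pred, str]] = [
    (all_of(is_odd, is_negative), "It's odd and negative"),
    (any_of(is_odd, is_negative), "It's either odd or negative"),
]


def switch_type(value: int, cases: List[Tuple[Pred, str]]) -> Optional[str]:
    """Return the message of the first case whose predicate accepts the value."""
    for pred, message in cases:
        if pred(value):
            return message
    return None


print(switch_type(-10, CASES))  # It's either odd or negative
```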

I can certainly see uses in dialects for things like HTML entities, &AElig and such.

The upcoming $XXX types will have a usage on day 1 for giving bindings to things that can carry bindings, and every last one of them will be in use.

Overall I just disagree with the premise that the existing matrix is wasteful or all that arbitrary. The coverage seems reasonable to me, and there are plenty of problems to sort out with the strategy being how it is.

bradrn · February 15, 2024, 10:45am

Well… thank you for putting me in my place, I guess! I really should know better than to make sweeping statements like that.

Although I’m still glad I mentioned it, because this is a really useful guide. I didn’t know most of these things were possible in current Ren-C.

It also confirms the statement I just made, namely that ‘the allowed base types [are] things which can take bindings’. Most of the sigils have functionality related to getting, setting or binding variables, so this makes perfect sense. (The exception is TYPE-*, but you did note that as a sticking point.)

However, I still think I’m justified in making this assertion:

If anything, I’m now leaning even more in the direction that combining sigils should be possible, because I can see obvious interpretations of those combinations. For instance, ^x: could assign the meta of the following value to x. Or @:x could get the value of :x, then apply @ to it. (Note that these should probably be unordered, as I mentioned earlier.) Of course, they’re useful in dialects too.

hostilefork · February 16, 2024, 4:57am

I try to never say never--many proposals pop up and someday things may get to the point where a weird idea becomes needed. The "SCRUNCH!" proposal is one of the things that nags at me now and then as expanding the expressive space so much that it may turn out to be a good idea after all.

But at the moment, composite sigils address only one small problem in multi-returns that's been on my mind... while they introduce countless usage and implementation questions.

Bear in mind that the things you see here aren't implemented by a magic genie. I have to write some sort of code for them. And even "small" changes have wide-ranging ramifications. Only just today have I been able to lift the limit of 64 datatypes so the full byte in the cell can be used. (I will probably write up why the legacy of the uint64_t TYPESET! implementation ended up being such a difficult design point to overcome.)

Anyway... so if I say I have no clue how to get something to work without wrecking the implementation or having bad impacts on userspace code, that's actually a pretty strong vote against even the biggest hunch one might have (even if it's my hunch).

If you wind up sticking around and hacking on Ren-C instead of developing your own language--and want to pick composite sigils as an area of study--then a fleshed-out implementation would certainly be given due consideration. However, there's some combination of "I don't know how to do it" along with "I think the result would wind up being ugly and complicated"... whereas I mostly know where the current path is going and am largely at peace with it.

Note that this is what [^x]: does, and I'm not particularly bothered by needing a block for the few times it comes up (if anything, it improves visibility).

Using blocks and groups as structuring parts helps you distinguish between e.g. that and ^[x:]. So to me, they are the mechanism for accomplishing composition.

Introducing ANY-FENCE! to bring in another array type will give another tool for structuring. And as I mentioned, it will be a good composable choice for picking the main return result out of a multi-return.

[x {y} ^z]: ...

[x y {^z}]: ...

That's something we have a vocabulary and API and mechanism for, and I think the ergonomics in sum surpass what a system based on multi-sigils would be like.

bradrn · February 16, 2024, 5:18am

Oh, this is quite interesting. I never realised this was possible! It does seem fair to say that this improves visibility, which is a nice bonus.