Different Dialect Behavior For Literal vs. Fetched Items

(Suggested background reading on "what is a literal in a language where every element of source code can be taken literally?")


Let's imagine someone proposed a behavior in Rebol2 where seeing a number "literally" in a parse rule would act as a repeat count, while a number fetched from a variable would try to match the literal INTEGER!.

rebol2>> parse "aa" [2 "a"]  ; since 2 is directly in rule, act as repeat count
== true

rebol2>> count: 2
rebol2>> parse "aa" [count "a"]  ; interpreted as looking for literal 2
== false  

rebol2>> did parse [2] [count]  ; again looking for literal 2
== true
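To make the proposed (and admittedly confusing) split concrete, here is a toy Python model of it. Everything here (`Word`, `parse`, the matching rules) is a hypothetical stand-in for illustration, not real Rebol semantics:

```python
# Toy model: a literal integer in the rules acts as a repeat count,
# while an integer fetched through a Word must match literally as text.

class Word:
    """A variable reference appearing in a rule block."""
    def __init__(self, name):
        self.name = name

def parse(text, rules, env):
    i = 0  # position in text
    r = 0  # position in rules
    while r < len(rules):
        rule = rules[r]
        if isinstance(rule, int):           # literal integer: repeat count
            r += 1
            pattern = rules[r]
            for _ in range(rule):
                if not text.startswith(pattern, i):
                    return False
                i += len(pattern)
        else:
            if isinstance(rule, Word):      # fetched value: match literally
                rule = str(env[rule.name])
            if not text.startswith(rule, i):
                return False
            i += len(rule)
        r += 1
    return i == len(text)
```

So `parse("aa", [2, "a"], {})` succeeds as a repeat, while `parse("aa", [Word("count"), "a"], {"count": 2})` fails because the fetched `2` is expected to match literally.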

There's clearly something a bit confusing about this. But is it always bad?

Clearly Not Always Bad, Consider WORD! Fetch...

Just look at how PRINT handles the word FOO "literally", and then how it acts when it is fetched via VAR:

>> foo: ~

>> print ["a" foo "b"]
** Error: foo is unset

>> var: 'foo
>> print ["a" var "b"]
a foo b

We wouldn't want to say that print ["a" var "b"] is obligated to behave the same as if FOO had appeared literally in the block. That would be nonsense.

So we have at least one example in defense of the idea that there's no hard rule saying that when you access something via a fetch it has to act as if the fetched thing was written in that place.

Generic Quoting Is A Powerful Tool

Historical Rebol only had LIT-WORD!:

rebol2>> print ["a" 'foo "b"]
a foo b

But dialect authors should rejoice, as Ren-C has QUOTED! for any value type (including something already QUOTED!), and QUASI! for isotopes. This is heavy artillery.

As a general rule, I believe dialects that are stylized to act on variable fetches should make a QUOTED! item act just as if a variable fetch had produced the thing under the quote.

So if you have variables or expressions that act like this:

 >> var
 == <X>

 >> (some expression)
 == <X>

Then the quoted literal, the variable, and the expression should all give the same answer (though the dialect may only support a subset of the literal behaviors for the non-literal forms):

 >> dialect [... '<X> ...]
 == <Y>

 >> dialect [... var ...]
 == <Y>  ; or error if non-literal use is suspect

 >> dialect [... (some expression) ...]
 == <Y>  ; or error if non-literal use is suspect

Don't forget that the quoting behavior in COMPOSE makes it easy to get those quote marks on when building dialect blocks:

 >> compose [... '(var) ...]
 == [... '<X> ...]

 >> dialect compose [... '(var) ...]
 == <Y>

So Now... Back To What If It's Not Quoted...

  • I've offered the plausibility that dialects could by default reserve QUOTED! behavior to be a superset of what would happen if the unquoted item were fetched from a variable or a product of expressions.

  • I've shown that plain WORD! is a kind of tautological example of a datatype in dialects where the fetched result does not act like the original WORD! value it was fetched from.

    • We can extend this to however our dialect evaluates expressions, e.g. from GROUP!s or BLOCK!s, or just inline as a DO/NEXT step.

...does this mean that other behavior variances between non-quoted dialect elements and their fetched form are fair game--as with the deviating behavior of INTEGER! in PARSE proposed at the top of this post?

Pertinent Case Study: BLANK! As Nothing vs. Nothinglike-Thing

BLANK! has something of a split personality, stemming from its dual nature as a "reified concept that stands for nothing".

Some cases could really benefit from it meaning something. For instance, I've consistently wanted it to mean space in DELIMIT's "dialect":

write port unspaced [
    "HTTP/1.0" _ code _ code-map/:code CR LF
    "Content-type:" _ type CR LF
    "Content-length:" _ (length of body) CR LF
    CR LF
]

That is awesome. It lets you see what's going on very clearly. I'm always annoyed when people include spacing at the edges of the things being spaced, like "HTTP/1.0 " ... because it makes it harder to factor. It also makes it hard to see what's a string and what's not:

 print [" At a " glance " how " do " you tell " whats " a string "]

But it could potentially suck if a variable holds a BLANK! intended to mean nothing, and that nothing becomes a space character.

To demonstrate how it sucks, let's say you've got a situation like this where the second item is supposed to be conceptually "not there":

values: ["one" _ <three>]

print "And here the values are!"
for-each item values [
    print [item]
]

When a fetched BLANK! is allowed to act as a space, you get something that's actually worse than it looks:

And here the values are!
one

<three>

I say it's worse because that empty line isn't actually empty. It actually wrote a space and a newline.

So what if things like PRINT/UNSPACED would only treat a "literal" blank as a space?

>> unspaced ["a" _ "b"]
"a b"

>> var: _

>> unspaced ["a" var "b"]
** Error: Can't Turn Evaluative BLANK! Into Space

>> unspaced ["a" maybe var "b"]
"ab"

>> unspaced compose ["a" (var) "b"]
"a b"

On the surface, that seems pretty reasonable. Trusting that you know what you're doing in the COMPOSE case makes sense, because you could be putting WORD! or arbitrary code in there...so if you got BLANK! you must have meant it.
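The proposed literal-vs-fetched rule for BLANK! can be sketched as a toy Python model. `BLANK`, `Word`, and this `unspaced` are all hypothetical stand-ins for illustration, assuming the dialect can see whether a blank appeared literally in the block:

```python
# Toy model: a literal BLANK in the block is a space; a BLANK that
# arrives via a variable fetch is rejected as an error.

BLANK = object()   # stands in for BLANK! (the _ value)

class Word:
    """A variable reference appearing in the block."""
    def __init__(self, name):
        self.name = name

def unspaced(block, env):
    parts = []
    for item in block:
        if item is BLANK:                  # literal blank: acts as a space
            parts.append(" ")
        elif isinstance(item, Word):       # word: fetch its value
            value = env[item.name]
            if value is BLANK:             # evaluative blank: an error
                raise ValueError("Can't turn evaluative BLANK! into space")
            parts.append(str(value))
        else:
            parts.append(str(item))
    return "".join(parts)
```

So `unspaced(["a", BLANK, "b"], {})` gives `"a b"`, while `unspaced(["a", Word("var"), "b"], {"var": BLANK})` raises, mirroring the error case above.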


This seems pretty nice on the surface. But there's a bit of a technical problem with it.

Imagine you are the person who is implementing UNSPACED. One likely approach would be to run through a REDUCE step and then join the parts together. But these produce the same result:

>> reduce ["a" _ "b"]
== ["a" _ "b"]

>> reduce ["a" var "b"]
== ["a" _ "b"]

In order to raise an error, you have to know if a value was literal or the product of an evaluation.

Deriving this information yourself is a hassle. You might think you could step through the block and literally examine each item before doing an eval step, skipping any blanks you see. But then you run into trouble with enfix operators that consume their left hand side and are able to take blanks: if you consume the blanks without deferring to lookahead, you get mangled results. :frowning:

Inside the system it's a little easier--there actually is an internal flag on cells that can track whether they were evaluative products or not. But this feature has been kind of closely guarded and not exposed to usermode, because it's such a sneaky invisible property.

Is it time to expose the evaluated bit... like the NEW-LINE flag is exposed??? This would let you snoop the results of something like a REDUCE and know if something was an evaluative product.

>> block: reduce [1 2 + 3]
== [1 5]

>> evaluated? block
== #[false]

>> evaluated? next block
== #[true]

I am presenting it as a property of block positions...not of values themselves, because all values are evaluative if you get them as a function result--because the function evaluated!

>> first block
== 1

>> evaluative-value? first block
== #[true]  ; e.g. it came from running FIRST

(You might try to argue that it's only certain blessed evaluator functions like REDUCE that fiddle the bit, but the problem is you have to consider how this bit is produced in the first place. That would be something that doesn't exist today and may not be coherent. I'm talking about something that already does exist and works.)

I'm somewhat reticent to expose this bit. But if you read the arguments in the post above, it leans toward saying that this is a legitimate piece of information to want to know.

Less dodgy might be asking REDUCE to augment the products of the array in a way that indicates their evaluatedness:

>> reduce/weird [1 2 + 3]
== [[1] (5)]

Everything that's an evaluative product could be put in a group, and everything that's not could be put in a block. :man_shrugging: Not the most efficient concept in the universe, but also not the least (single blocks and groups are about as optimized as they can be). Internal routines could leverage the bit more efficiently and this could just be for usermode to avoid invisible state.
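The group-vs-block marking can be sketched as a toy Python model, with tuples standing in for GROUP! (evaluative products) and lists standing in for BLOCK! (inert literals). `Word`, `weird_reduce`, and the evaluation rule are hypothetical stand-ins:

```python
# Toy model of REDUCE/WEIRD: wrap evaluative products in a tuple
# ("group") and inert literals in a list ("block").

class Word:
    """A variable reference, the only thing that evaluates in this model."""
    def __init__(self, name):
        self.name = name

def weird_reduce(block, env):
    out = []
    for item in block:
        if isinstance(item, Word):         # a word evaluates via fetch...
            out.append((env[item.name],))  # ...so wrap in a "group"
        else:
            out.append([item])             # inert literal: wrap in a "block"
    return out
```

So `weird_reduce([1, Word("x"), "b"], {"x": 5})` gives `[[1], (5,), ["b"]]`, and a consumer can tell the evaluative `5` from the inert `1` by the wrapper type.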

Anyway...more to think about.


Being able to get at that information is good, I think, especially if it's already available.
The weird block looks really weird, though.


Actually...it could cost no more than the block otherwise would, if it made the unevaluated things QUASI!, and the evaluated things added a level of QUOTED!:

>> block: reduce/quasi [1 1 + 2 first [~a~ b c] first [''d ''e]]
== [~1~ '3 '~a~ '''d]

You know the QUASI! things didn't evaluate, so they had to be inert and have no quote level (hence a quasiform for them exists).

Exposing it as a bit is pretty fiddly API-wise, and its hidden-ness makes me nervous.

While the above may look weird, it's actually rather elegant. And as a client of the service, I'd rather be able to do a for-each over the values and examine a property of a value directly, vs. needing some strange position-based API to read invisible bits scribbled on things. :man_shrugging:

It's easy enough to UNQUASI and UNQUOTE things--easier than extracting items from blocks or groups. And as I say, it's cheap.

(I have to say...seeing how these parts all serve their niches is getting to be very pleasing. I'm also increasingly satisfied with QUASI as a name...as we see here it's more than just an isotope producer. That's one role, but it's also a quote-like operator that can only be used exactly once on non-quoted non-NULL things.)


Hmmm...technically you don't need the QUASI!, if you remember that everything quoted is one level quoted higher than it actually is. Hmm.

>> block: reduce/weird [1 1 + 2 first [~a~ b c] first [''d ''e]]
== [1 '3 '~a~ '''d]

That again has the round-trip property that if you reduce it you get the block back.

>> reduce block
== [1 3 ~a~ ''d]

>> reduce [1 1 + 2 first [~a~ b c] first [''d ''e]]
== [1 3 ~a~ ''d]  ; same

So even simpler. The QUOTED! bit is the evaluated bit, and remove it to get the actual value. If it's not quoted it was inert in the input.

As long as you know what you're doing and don't forget what the quoted bit or its absence means, that seems pretty good.
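The round-trip property can be sketched as a toy Python model, representing a value as a `(payload, quotes)` pair where `quotes` counts quote marks. `Word`, `evaluate`, and both reduce variants are hypothetical stand-ins for illustration:

```python
# Toy model: evaluative products get one quote level added, so "reducing"
# the marked block reproduces the plain reduction of the original.

class Word:
    """A variable reference; in this model, words and quoted items evaluate."""
    def __init__(self, name):
        self.name = name

def evaluate(item, env):
    """Return the evaluative product, or None if the item is inert."""
    payload, quotes = item
    if quotes > 0:
        return (payload, quotes - 1)       # quoted: drop one quote level
    if isinstance(payload, Word):
        return env[payload.name]           # word: fetch its value
    return None                            # inert

def plain_reduce(block, env):
    return [evaluate(item, env) or item for item in block]

def weird_reduce(block, env):
    out = []
    for item in block:
        product = evaluate(item, env)
        if product is None:
            out.append(item)                          # inert: unchanged
        else:
            out.append((product[0], product[1] + 1))  # evaluated: +1 quote
    return out
```

With `env = {"x": (3, 0)}`, the block `[(1, 0), (Word("x"), 0)]` weird-reduces to `[(1, 0), (3, 1)]`, and plain-reducing that marked result gives the same answer as plain-reducing the original block.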
