Plugging The Script Header Hole

hostilefork · December 11, 2020, 1:44am

In the "MOLD and LOAD Parity" thread, @rgchris suggested something I had been wanting to do (but hadn't yet): make LOAD always return a BLOCK!.

Now that is done:

GitHub Commit: LOAD of code always BLOCK!, LOAD-VALUE for 1 item

But...What About Script Headers?

If I had just blindly flipped a switch to make /ALL always true, we'd have gotten the historical behavior where LOAD/ALL includes the header as part of the data:

rebol2>> load "Rebol [] 1 2 3"
== [1 2 3]

rebol2>> load/all "Rebol [] 1 2 3"
== [Rebol [] 1 2 3]

So I didn't do that. Even though a block is always returned now, it still does header processing.

But there's a catch:

>> load "rebol [%.r %.r3] ren-c [%.r %.reb] red [%.red %.reds]"
** syntax Error: script header is not valid:

So how are you supposed to LOAD a string that starts with the (perfectly legitimate) WORD! Rebol? More broadly, we want more header signals than just this one, based on the dialect or sublanguage... which means any word (or path?) could start a header. What can be done about this?

The Header Needs A Special Signal

I'm quite certain it's a design flaw that the header signal can be conflated with what could be legal data.

Contrast that with signals that don't have this property:

>> load/all {$Rebol [Title: "stuff"] no contention here}
== [no contention here]

>> load/all {3.0-Rebol [Title: "stuff"] non symbolic option}
== [non symbolic option]

I know a symbol might be a bit off-putting, but everything "pleasant" is already taken in-band for the mainline of the data format.

^Rebol [
    Title: {My Script}
]

$Rebol [
    Title: {My Script}
]

\Rebol [
    Title: {My Script}
]

Rebol=[
    Title: {My Script}
]

rebol>> [
    Title: {My Script}
]

Rebol> [
    Title: {My Script}
]

Okay, I actually like that last "prompt-looking" idea. Rebol> seems nice and light enough... it's even kind of "pointing at" the header block to say it's special and not part of the data. And we've pretty clearly ruled out Xyz> as a WORD!. It can also cover any name, rather than hardcoding a word like "Rebol" specifically.

LOAD/HEADER could slipstream whatever came before the > into the returned header as an in-band field:

>> [data hdr]: load {StyleTalk/3.0> [Title: "Whatever"] your data here}
== [your data here]

>> hdr
== make object! [
    Format: 'StyleTalk/3.0
    Title: "Whatever"
]

But maybe allowing paths opens too many cans of worms (Rebol/<tag-in-path>> [...]), and it should just be a WORD!, with everything else you need broken out into the header.
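To make the proposed signal concrete, here is a minimal Python sketch (not Rebol's actual scanner) of splitting a "Name> [...]" header off the front of some source text. It assumes the WORD!-only option just described, so the name is a single word of letters, digits, and hyphens and path forms like StyleTalk/3.0> are not recognized. It also only counts brackets for nesting and does not handle strings or comments containing brackets. The function name split_prompt_header is hypothetical.

```python
import re

# Assumption: the header name is a single word (letters, digits, hyphens)
# immediately followed by ">", then optional whitespace and an opening "[".
HEADER_START = re.compile(r"\s*([A-Za-z][A-Za-z0-9-]*)>\s*\[")


def split_prompt_header(source):
    """Return (format_name, header_text, rest), or (None, None, source)."""
    match = HEADER_START.match(source)
    if match is None:
        return None, None, source  # no header signal: everything is data
    depth = 1
    start = match.end()  # index just past the opening "["
    for i in range(start, len(source)):
        ch = source[i]
        # Simplification: brackets inside strings are not special-cased
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return match.group(1), source[start:i], source[i + 1:].lstrip()
    raise ValueError("unterminated header block")


print(split_prompt_header('Rebol> [Title: "Whatever"] your data here'))
print(split_prompt_header("rebol [%.r %.r3] ren-c [%.r %.reb]"))
```

Because a bare word followed by a space and a block never matches, input like "rebol [%.r %.r3] ren-c [%.r %.reb]" comes back untouched as data, which is the property the > is meant to buy.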

Looks good to me. Thoughts?

rgchris · December 11, 2020, 3:06pm

My inclination would be to leave as-is, and discern between a string and a file/url as to whether a header is expected.

hostilefork · December 11, 2020, 3:59pm

One possibility in a UTF-8-everywhere world is that the distinction comes from whether the input is BINARY! or TEXT!: binary input to LOAD would always be expected to have a header, and text input wouldn't. Just mentioning it as I thought of it.
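A minimal Python sketch of that policy, using bytes and str as stand-ins for BINARY! and TEXT!. The function name header_policy and the choice to decode binary input as UTF-8 are assumptions for illustration, not anything LOAD actually does.

```python
def header_policy(source):
    """Return (text, header_expected) based only on the input's type.

    bytes/bytearray stand in for BINARY! (e.g. what READ of a file gives),
    str stands in for TEXT! (e.g. a string literal in code).
    """
    if isinstance(source, (bytes, bytearray)):
        # UTF-8 everywhere: decode binary input, and expect a header on it
        return source.decode("utf-8"), True
    if isinstance(source, str):
        # Text input: no header expected
        return source, False
    raise TypeError("expected bytes or str, got " + type(source).__name__)
```

Note that the same characters get different treatment depending only on how they arrived, which is the weakness raised in the reply that follows.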

But you lose a lot if you can't copy/paste code from a file into do "..." (or into code: load "..." followed by do code) and get the same behavior as running the file. The header contains information important to how the code functions... which is going to become even more true. You'd need a "header-aware execution" track running parallel to a "header-unaware execution" one, and that's error prone.

I also think forcing files to have headers is inflexible. And once you open up the first word in the file to being more than just "Rebol" (which will have to happen, considering that's not going to be the language name), you're suddenly saying that any file read without a header will be misidentified if it starts with a WORD!.

So let's keep all this in mind while studying the problem; it's more in our face now than before. If you're taking away the ability to DO copy/pasted code along with its header information, then a complete story is needed for all the places that rely on header information to execute code. The fact is that everything done so far as a detection mechanism is way too flaky!

We also have to think about what rule permits leading lines like the #!/usr/bin/r3 instruction to UNIX shells.

rgchris · December 11, 2020, 6:48pm

Seems a better distinction.

Could we codify that every Rebol derivative begin with 'Re'?

parse file [
    opt ["#!" thru newline any newline]  ; optional shebang line, plus any blank lines after it
    ... script rule ...
]

May possibly be worth storing this information somewhere as script metadata.
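The shebang part of that rule can be mirrored in Python as a rough sketch. The function name split_shebang is hypothetical. It treats only LF as a newline, and, with the metadata idea in mind, it returns the shebang line instead of discarding it. Like the opt [...] in the rule, it skips nothing when there's no newline after the "#!" line, since thru newline would fail there.

```python
def split_shebang(source):
    """Return (shebang_line_or_None, remaining_source).

    Mirrors: opt ["#!" thru newline any newline]
    """
    if not source.startswith("#!"):
        return None, source
    newline = source.find("\n")
    if newline == -1:
        # `thru newline` fails without a newline, so `opt` consumes nothing
        return None, source
    # Keep the shebang line (e.g. for script metadata), then skip blank lines
    return source[:newline], source[newline + 1:].lstrip("\n")
```

For example, split_shebang("#!/usr/bin/r3\n\nRebol [] 1 2 3") yields the pair ("#!/usr/bin/r3", "Rebol [] 1 2 3"), leaving the script rule to start at the header.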

hostilefork · February 1, 2021, 9:29am

For whatever it's worth as an observation: Red's "DiaGrammar" app has changed from saying Red [...] at the top of the files to DiaGrammar [...].

rgchris · February 1, 2021, 5:59pm

I think the right answer is to use the TYPE header.

hostilefork · January 19, 2024, 12:46pm

Okay, not so certain.

The better idea here is to say that DO requires/expects a header, and that the header pattern is a WORD! from a known or expected list followed by a BLOCK!. It requires this even if you DO a text string... which you really shouldn't be doing often unless you've obtained a script from an unconventional source (e.g. if you are the implementation of DO of a URL! and read the script from the network).
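A minimal Python sketch of that header test. The known-word list is hypothetical (the thread doesn't say which words belong on it), words are compared case-insensitively as Rebol does, and only the shape "known word, then an opening bracket" is checked; the header block's contents are not validated.

```python
import re

# Hypothetical list of accepted header words (not specified in the thread)
KNOWN_HEADER_WORDS = {"rebol", "ren-c", "red"}

# First token (no whitespace or brackets), optional whitespace, then "["
FIRST_TOKEN_THEN_BLOCK = re.compile(r"\s*([^\s\[\]]+)\s*\[")


def has_known_header(source, known=KNOWN_HEADER_WORDS):
    """True if source starts with a known WORD followed by a BLOCK."""
    match = FIRST_TOKEN_THEN_BLOCK.match(source)
    return match is not None and match.group(1).casefold() in known
```

Under this rule, "rebol [%.r %.r3] ren-c [%.r %.reb]" would be read as having a header, which is acceptable in this model because such data is meant to go through TRANSCODE rather than DO.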

TRANSCODE is easy enough to use if you just want to turn a string into unbound data, and the definition of binding is shaping up to where that unbound data is easier to reason about in terms of what to do with it. A lot of the time, you can just poke that unbound data into an environment which is already bound and it will "just work".

I'm really not sure what the scope of LOAD and SAVE are supposed to be in this model. IMPORT is what you use to load a module... does it make sense to LOAD a module but not import it? DO is what you use to run a script, does it really make sense to LOAD a script with the intent of DO-ing it later? Can you reasonably DO a script that's been LOAD-ed more than once?

It seems to me that more work is needed on what LOAD and SAVE actually mean. In the meantime, the pieces are there to TRANSCODE READ FILE and bind the result however you intend, which is probably better for most purposes.