Modularizing The Whitespace Interpreter: Experience Report

hostilefork · August 24, 2021, 8:29am

The Whitespace Interpreter Project Has Now Been Modulariz'd...

...and is running in the "Sea of Words".

It is passing the tests on Windows, Mac, and Linux, in both debug and release builds.

While the project may seem "modest", I'd argue it actually is a pretty deep dive into the core premise of why the methodology of the language is interesting.

And I'm using it as a testbed for new ideas. I brought up the idea of using === as a section divider that acts like a comment, but could be shown at higher verbosity levels. That's exactly the way the whitespace interpreter is using it. It's just using the lexical parts of the language freeform...not creating variables, thanks to Sea of Words!

So when I write a section divider like:

=== LOAD THE SOURCE INTO PROGRAM VARIABLE ===

There's a bit in the command line processing that ticks over the variadic === into a mode where it stops being purely commentary/invisible, and prints the line in the program output:

if vm.verbose > 0 [
    ===/visibility true  ; show the `=== xxxx ===` lines
]
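To make the mechanism concrete, here is a rough Python analogy (not Ren-C, and not how the interpreter implements it). The `Divider` class and its `visible` flag are hypothetical names standing in for === and its /visibility setting: the divider is a no-op by default and only prints once the flag is switched on.

```python
import io


class Divider:
    """Hypothetical stand-in for ===: silent unless visibility is on."""

    def __init__(self, out):
        self.visible = False  # dividers act like comments by default
        self.out = out

    def __call__(self, text):
        # Only emit the divider line when visibility has been enabled
        if self.visible:
            self.out.write(f"=== {text} ===\n")


buf = io.StringIO()
divider = Divider(buf)

divider("PARSE COMMAND LINE")        # invisible: nothing is written
divider.visible = True               # like the `vm.verbose > 0` branch
divider("LOAD THE SOURCE INTO PROGRAM VARIABLE")
```

Because the flag lives on one object, anything else sharing that object (like the console) sees the same setting, which is the mix-up described next.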

As with all things...there are some glitches to sort out. For one, the console uses these dividers too, so if you are running from the console the two sets of dividers get mixed up. Fixing that means you'd have to derive your own instance of === from some generic one, and share that instance across your program.

Either way, it let me avoid redundancy and focus on the program. And it might seem a small detail to not have a string:

=== {LOAD THE SOURCE INTO PROGRAM VARIABLE} ===

But every character counts when it comes to being happy with how your program looks. Also, that doesn't provide any way for us to escape things...whereas we can put things like @(...) in there...

=== LOAD SCRIPT FILE: @(filename) ===

(Note: We may be getting these facilities via string interpolation, but the point remains that it looks better when we can just write without delimiters.)

Problems Encountered

The single biggest problem encountered in the modularization is:

If you IMPORT a variable from a module, you get a copy of that variable in your own module, and you thus do not see changes to the imported library's original version of the variable.

This is an artifact of using the IMPORT statement in the body of the module, after it has already been scanned and "interned". In other words: by the time the IMPORT has run, the word you are importing has already been connected to the module you are importing it into. So the only way to wire it up is to add a copy of that variable to the module, so the already-scanned-and-bound instances will see it.

If we knew about the importation you wanted to do before we scanned and "interned" the module, then we'd be able to bind the variable directly to what you were importing. But is that really what you want? Certainly it hasn't been with LIB. You don't want to bind variables to LIB because then overwriting anything the mezzanine uses--like PRINT--would trash the system. This is part of the justification for why importing gets a copy.

One way to get the latest version of a variable is not to import the module's words into a local scope, but to capture the module in a variable and access the fields through that. Compare:

import %some-module.r  ; imports `something` and `change-something`
change-something
print [something]  ; won't see change

m: import %some-module.r
m.change-something
print [m.something]  ; will see the field as it changes
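The same split shows up in other languages, so here is a small Python analogy (a sketch only, not Ren-C semantics). The in-memory module and the names `copied_view` and `live_view` are made up for illustration. Copying a name out of a module snapshots its value, while reading through the module object sees later changes.

```python
import types

# Hypothetical stand-in for %some-module.r, built in memory
some_module = types.ModuleType("some_module")
exec(
    "something = 1\n"
    "def change_something():\n"
    "    global something\n"
    "    something = 2\n",
    some_module.__dict__,
)

# Style 1: copy the exported names into our own namespace
something = some_module.something
change_something = some_module.change_something
change_something()
copied_view = something      # our copy was not updated by the module

# Style 2: keep a handle on the module and read through it
m = some_module
live_view = m.something      # reads the module's current value
```

Here `copied_view` stays at the value that was copied, while `live_view` reflects what `change_something` did inside the module.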

Stylistically, some languages enforce this anyway. With CommonJS require() in Node.js, for instance, you can only do it the second way, and if you want a local name for things you have to make it yourself as a local variable in your script. Then of course you don't expect that to change when the module variable changes--they're disconnected.

I don't want to get into the business of being prescriptive about how much you import or export. That's a policy of whatever module framework you use and your personal decisions. So I think there needs to be a workable way to bind that sees changes to variables, without also letting you accidentally overwrite them and break the module, both for its own use and for its other clients.

Why Doesn't "Attachment"/"Inheritance" Solve This?

Right now we don't have this problem with Lib. But that's because it's being treated specially. When you access a variable by name--and it's not in your evaluation context (your module or the implicit module where "do" is running)--then it falls through to look in Lib.
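As a rough model of that fall-through (Python, with a hypothetical `FallthroughModule` class; the real lookup lives in the interpreter's binding code and will differ), reads check your own module first and then Lib, while writes only ever land in your own module:

```python
class FallthroughModule:
    """Toy module: reads fall through to a parent, writes stay local."""

    def __init__(self, name, parent=None):
        self.name = name
        self.vars = {}
        self.parent = parent  # e.g. the Lib module, or None

    def lookup(self, word):
        if word in self.vars:
            return self.vars[word]
        if self.parent is not None:
            return self.parent.lookup(word)  # fall through to Lib
        raise KeyError(word)

    def assign(self, word, value):
        # Never writes into the parent, so Lib cannot be trashed from here
        self.vars[word] = value


lib = FallthroughModule("lib")
lib.assign("print", "lib-print-v1")

user = FallthroughModule("user", parent=lib)
seen_before = user.lookup("print")   # found in lib via fall-through

lib.assign("print", "lib-print-v2")
seen_after = user.lookup("print")    # the change in lib is visible

user.assign("print", "my-print")     # overriding only affects `user`
```

After the override, `user` sees its own definition while `lib` still holds "lib-print-v2", which is the protection described earlier for things like PRINT.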

We'd need some way for modules to have this fall-through on a per-variable basis, to different libs. The imported variables have to remember that they are attached to your module (in case you overwrite them), but they also have to know there's no instance in your module, and point to where they can be found so long as they're not overwritten.

Fortunately there's enough space for that. We can make something the size of a variable, with no variable in it, that uses the variable spot to store where the module was that it was imported from. An "import stub", basically.
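Here's one way the stub idea could be modeled, as a Python sketch under my own assumptions. `ImportStub`, `StubModule`, and `import_words` are hypothetical names, and the real thing would be a cell-sized structure in the C code rather than an object. The slot either holds a value or holds a pointer back to the module it was imported from.

```python
class ImportStub:
    """Slot with no value of its own, only where the value can be found."""

    def __init__(self, source):
        self.source = source


class StubModule:
    def __init__(self, name):
        self.name = name
        self.slots = {}

    def set(self, word, value):
        # Assigning replaces a stub with a real local variable
        self.slots[word] = value

    def get(self, word):
        slot = self.slots[word]
        if isinstance(slot, ImportStub):
            return slot.source.get(word)  # follow the stub to the exporter
        return slot

    def import_words(self, source, words):
        # One stub per imported word, instead of a copy of the value
        for word in words:
            self.slots[word] = ImportStub(source)


some_module = StubModule("some-module")
some_module.set("something", 1)

user = StubModule("user")
user.import_words(some_module, ["something"])

first = user.get("something")     # read through the stub
some_module.set("something", 2)
second = user.get("something")    # sees the exporter's change

user.set("something", 99)         # local overwrite; exporter untouched
```

The overwrite at the end replaces the stub in `user` only, so `some_module` and any other clients keep seeing the module's own value.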

The goal of module inheritance from modules like Lib is to avoid a situation where you have to create these stubs for everything. I'm not sure exactly how different the ideas are though, and if it should just be an implementation detail where the system decides "okay, that's a big library and you're importing it all... we use search mechanism 1" vs. "you only imported 3 things, better to make stubs for those 3".

That Aside, Everything Seemed To Work Smoothly

And even the Redbol emulation of the old R3-Alpha version of the interpreter works, using modularized Redbol!

Redbol is definitely a case I want to keep central in the design, because it is executing on the pitch of what we're saying: this is a kit with which you can rewrite the laws of the language on a whim. And Redbol is a great case to look at, one that has tests.

But I also want to see more of how I can fuse together what's happening with LET and virtual binding, along with the sea of words, and what hope there is for string interning. Having things like Whitespace around is good, because if some change to the binding model makes it impossible to do what Whitespace does, that indicates it's not the right model. So it's a stake in the ground.