Code Ordering in Files

hostilefork · February 1, 2021, 9:43am

While thinking about how to update the decade-old Whitespace interpreter project, I ended up with a reorganization that fused the spec for the operation definitions with the code dispatched by the "virtual machine".

This reorganization had the nice property of keeping the code together with the rule. So it became easier to maintain, and a better showcase of the claimed premise of Rebol-like languages.

Dialect Improvement...at the cost of "First Impressions"

An unfortunate side effect of switching away from inert blocks to using dialecting functions is that those functions must be defined before using them.

Look at how the original implementation started off, going straight into the command specifications.
Then look at the first declarations in the reboot, for CATEGORY and INSTRUCTION. Intimidating!

Even though the specification as a whole got much tighter, the voodoo to get them working is the first thing people see. Those definitions are more bewildering right now than they'll need to be after whipping things into shape (we hope). But no matter what, they're going to be harder to look at than if we could start with the nice command definitions.

Breaking Things Into Their Own Files Is Possible... but...

The blunt tool we have at the moment for dealing with this "first impressions problem" is to move stuff into multiple files. That would mean %ws-dialect.reb for defining things like CATEGORY and OPERATION, which then gets included into %ws-commands.reb for the command list, and probably a separate %ws-interpreter.reb for the actual engine.

But something about this feels disheartening. The reason it would get broken up isn't really because of any problem with the length per se...the end goal is an amount of code that would be reasonable as one file. It sucks if you have to break it into more files due to a language limitation that doesn't let you organize things within the file the way that you want.

The Nature Of Rebol Is Rigid Ordering

Languages have been moving toward a model where, within a file, it doesn't matter what order you put things in. Compiled languages like Rust don't make you "forward declare" functions: you can use them at the top of a file and declare them later.

But a language need not be compiled to let you use things out of order. Python is interpreted, and a function near the top of a file can refer to one declared at the end, so long as nothing actually calls it until both definitions have run.
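To make the Python case concrete, here is a minimal sketch. The function names are hypothetical, made up for illustration. The point it tries to show is that Python looks names up when a function body runs, not when the `def` is executed, so the out-of-order freedom only goes as far as "don't call it before the other definition has happened".

```python
# Names inside a function body are resolved when the body runs,
# not when the `def` statement itself is executed.

def main():
    # `helper` does not exist yet at the time this `def` runs; that's fine.
    return helper(20)

def helper(n):  # hypothetical helper, defined "later in the file"
    return n + 1000

result = main()  # works: both defs have executed by this point
print(result)    # prints 1020


def too_early():
    return defined_afterwards()

try:
    too_early()  # called before `defined_afterwards` has been defined
    early_error = None
except NameError as err:
    early_error = type(err).__name__

def defined_afterwards():
    return "ok"

late_result = too_early()  # the very same call now succeeds
```

So even in Python the ordering of *execution* still matters; it is only the textual order of definitions that is relaxed.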

So what's Rebol's excuse?

Rebol is in a fundamentally different boat: it can't "scan ahead" to catalog things, because it can't know what the words defining a thing will mean by the time evaluation naturally reaches them.

The problem is that fundamentals like "what declares a function" can change on a whim. The interpreter can't go scouting for the FUNC word and create a note like "oh, a function will be declared in the future!" so that information is available ahead of time. The meaning of FUNC itself could be slated for change, so it could get a false positive. Or it could get a false negative: skipping something that's an abstraction for declaring functions it hasn't heard about.
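As a loose analogy only (this is Python, not Rebol, and every name in it is invented for the illustration), imagine a script where definitions are made by calling an ordinary, rebindable name called `func`. A scanner that pattern-matches on that name gets both kinds of wrong answer described above:

```python
import re

# A toy "script" where definitions are made through an ordinary name
# (`func`) that the script itself is free to rebind. All hypothetical.
SCRIPT = '''
func = lambda body: "not a function: " + body
alpha = func("a")
declare = lambda body: (lambda: body)
beta = declare("b")
'''

def naive_scan(source):
    """Guess which names become functions by matching `name = func(`."""
    return set(re.findall(r"^(\w+) = func\(", source, flags=re.MULTILINE))

def actually_run(source):
    """Run the script and report which names really ended up callable."""
    env = {}
    exec(source, env)
    return {name for name, value in env.items()
            if callable(value) and not name.startswith("__")}

guessed = naive_scan(SCRIPT)
actual = actually_run(SCRIPT)

false_positives = guessed - actual   # scanner said "function", it wasn't
false_negatives = actual - guessed   # real functions the scanner missed
print(sorted(false_positives), sorted(false_negatives))
```

Here `alpha` looks like a function definition to the scanner but is really a string, because `func` was redefined first. And `beta` is a real function that the scanner never noticed, because it was made through an abstraction (`declare`) the scanner hadn't heard about.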

What If You Could Specify An Out-Of-Order Ordering?

Something that crossed my mind for dealing with the particular situation in the Whitespace file was whether there could be some kind of syntax for "sections", along with a way to say what order those sections run in.

A first thought might be of something like this:

Rebol [
   Type: 'Module
   Section-Run-Order: [Dialect Interpreter Commands]
]

Commands: Section [
    Stack-Manipulation: category [...]
    Arithmetic: category [...]
    ...
]

Dialect: Section [
    category: func [...] [...]
    operation: func [...] [...]
]

Interpreter: Section [...]

But that's awkward, and it also is invasive to the module's operation (pushing top-level declarations into blocks throws a wrench into everything).

More reasonable would be if you could put multiple module definitions in a file. Then they could automatically sort out their dependencies and find a non-conflicting order:

Rebol [
   Type: 'Module
   Name: 'Whitespace-Commands
   Needs: [Whitespace-Dialect Whitespace-Interpreter]
]

Stack-Manipulation: category [...]
Arithmetic: category [...]

Rebol [  ; imagine still in same file...
    Type: 'Module
    Name: 'Whitespace-Dialect
]

category: func [...] [...]
operation: func [...] [...]

Rebol [
    Type: 'Module
    Name: 'Whitespace-Interpreter
]

...

Having something like this where the order is figured out by the system seems appealing, and this lets you make the "new file" decision on its own merits rather than be forced into it.
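The "figure out the order" step is just a topological sort over the Needs: fields. A minimal sketch in Python, assuming the three headers above had already been parsed into a plain dictionary (that dictionary shape is my assumption here, not anything the module system actually provides):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory form of the three headers above:
# module name -> the names listed in its Needs: field.
needs = {
    "Whitespace-Commands": ["Whitespace-Dialect", "Whitespace-Interpreter"],
    "Whitespace-Dialect": [],
    "Whitespace-Interpreter": [],
}

# TopologicalSorter accepts {node: predecessors}. static_order() yields
# each module only after everything it needs has been yielded.
# If the Needs: fields formed a cycle, it would raise graphlib.CycleError.
run_order = list(TopologicalSorter(needs).static_order())
print(run_order)
```

Whatever order the two independent modules come out in, Whitespace-Commands lands last, which is the opposite of the order they were written in the file.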

I'll point out to @rgchris that being able to put multiple "units" into a single file is another good argument for my "out of band signal" of module-ness. You wouldn't accidentally wind up taking these modules as an argument to a function--they'd clearly escape out of legal syntax and set up a boundary for where units began and ended.

Right now the whitespace interpreter even has a demo file embedded in it... that's kind of cool, and it would be nice if whatever header-isms you'd put in a standalone file could be put on a unit that tagged along inside of another file.

You'd still need some way of saying the file was an aggregate, and of explaining why everything is glued together in the same file. Not having that would make it confusing if you were looking for those parts and couldn't find them. Still, it could be optional...as the whole file has to be scanned before running anyway, so it would see all the modules in it.

Aggregator> [
    Contents: [
        Whitespace-Commands
        Whitespace-Dialect
        Whitespace-Interpreter
        Demo
    ]
    Description: {
        This file packages together Whitespace components as one file, for easier
        transmission and maintenance.
    }
]

Module> [
    Name: 'Whitespace-Commands
    ...
]

It might be interesting if some properties could be inherited by the contained modules from the aggregator if they weren't overridden (date? license?)
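The lookup rule for that kind of inheritance is simple: check the module's own header first, then fall back to the aggregator. A rough sketch in Python, where the field values are placeholders I made up and not anything from the real Whitespace project:

```python
from collections import ChainMap

# Hypothetical header fields; the values are placeholders only.
aggregator = {"License": "placeholder-license", "Date": "placeholder-date"}
commands = {"Name": "Whitespace-Commands"}            # overrides nothing
demo = {"Name": "Demo", "License": "other-license"}   # overrides License

def effective_header(module_header, aggregator_header):
    """Look fields up in the module first, then fall back to the aggregator."""
    return ChainMap(module_header, aggregator_header)

commands_header = effective_header(commands, aggregator)
demo_header = effective_header(demo, aggregator)

print(commands_header["License"])  # inherited from the aggregator
print(demo_header["License"])      # the module's own override wins
```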

This is all pretty far out. But unless there's something like this, the only way you're going to shuffle the order is by breaking things into files.

Tech Note: Execution Does Scan Ahead, But Only for Binding

If you follow through "The Real Story about User and Lib Contexts" you can see how Rebol winds up in the situation where you can have a variable declared after a function and it can still be bound in functions prior to its SET-WORD!.

>> foo: bar: x: '~unset~
>> do [
    bar: func [] [foo]
    foo: func [] [print ["X is" x]]
    x: 1020
    bar
]

X is 1020

So some "looking ahead" is definitely going on.

But this only means the binding is available "a priori". The value doesn't get put into the variable until the SET-WORD! actually reaches the evaluation.

>> foo: bar: x: '~unset~
>> do [
    bar: func [] [foo]
    bar  ; too early
    foo: func [] [print ["X is" x]]
    x: 1020
]
** Script Error: foo is ~unset~

You can try and think up imaginative (crazy) ways to deal with this. For instance: going through a prepass and turning everything into a function stub that couples up SET-WORD!s with code after them, and then caches them on-demand when they're called. That might be fun for special cases, but the generic stub functions would be variadic...and not everything is a function so you'd get false answers for get 'word saying things were ACTION! when they weren't... etc. etc.
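To show both the trick and why it leaks, here is a toy version of the stub idea in Python. It is an analogy under my own assumptions (the `Stub` class and the `env` dictionary are invented for this sketch), not a proposal for how an interpreter would really do it:

```python
class Stub:
    """Stands in for `name: <code>` until first use, then caches the value."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._cached = False
        self._value = None

    def force(self):
        # Evaluate the deferred code on demand, only once.
        if not self._cached:
            self._value = self._thunk()
            self._cached = True
        return self._value

    def __call__(self, *args):
        # Variadic, because the real arity isn't known until forced.
        return self.force()(*args)


# "Prepass": every assignment in the block becomes a stub up front,
# mirroring the bar / foo / x example above.
env = {}
env["bar"] = Stub(lambda: (lambda: env["foo"]()))
env["foo"] = Stub(lambda: (lambda: "X is %d" % env["x"].force()))
env["x"] = Stub(lambda: 1020)

message = env["bar"]()            # order of the assignments no longer matters
x_looks_callable = callable(env["x"])  # but X now claims to be a function
print(message, x_looks_callable)
```

The call to `bar` works regardless of where it sits relative to the definitions, but `x` is really an integer and still answers as though it were callable. That is the "false answers" problem in miniature.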

So I think it's unwise to fight this. It's better to think about more interesting ways to express the ordering than to defy the nature of the language and lie by making it seem order-agnostic. It isn't.