Rebol And Scopes: Well, Why Not?

hostilefork · January 25, 2024, 1:36am

Here we are a mere two weeks later, and I have managed to put together "Pure Virtual Binding II". It actually didn't take but a couple of days to get it booting. But this time, I've pushed through getting it to run a non-trivial set of things:

UPARSE (now with a LET combinator!)
the Whitespace Interpreter (including the Redbol emulated old version)
the ODBC dialect
@rgchris's HTTPD Server
@BlackATTR's Query dialect
bootstrap via Rebmake

Changes were needed to the usermode code. But most of the time went into figuring out what to change. The total number of edits needed was actually smaller than I would have thought.

What's Different This Time

No attempt to propagate binding through structural operations.

Now you can say do [print "Hello"] and have it work fine, yet if you say get first [print "Hello"] it will say that the PRINT is unbound. Propagating the binding from the block to its elements is done implicitly by the evaluator, but everywhere else you need to do explicit lookup.

It was very useful here to have @bradrn's fresh perspective, to advocate for this behavior. My bias toward running existing code meant I hadn't quite absorbed that working with unbound material isn't that bad, as long as environments to do lookup in are at hand. So I had tried to bake the binding propagation into structural operations like PICK and FOR-EACH.

QUOTED! values lose a quote level under evaluation, but their binding is not affected.

This is something that I'd suspected would be needed, but again @bradrn's no-baggage perspective helped push the behavior...regardless of what it broke!

While I conceived this more for shielding a value's binding state, it can also be used to easily generate unbound material. Unbound material is more useful than it has been in the past, as you can compose it into places that provide the needed context and it will run as if it were written there to start with. As best practices emerge, we may see people making data-oriented lists as '[apple banana pear] to avoid generating spurious bindings.

But because unbound code doesn't have a lot of historical usage, the main consequence of this change was to break code where you try to pass a function a variable as 'var. That function won't be able to GET or SET the unbound word. The simplest workaround for this at the moment is to pass (in [] 'var) which uses a dummy block to capture the evaluator's current concept of environment.

(Note you can't do this as first [var], because as mentioned, even though the block is bound the things you pick out of it structurally are not.)

Strings don't carry binding (but you can--for instance--wrap them in a block).

I wound up being kind of uncomfortable with strings carrying a hidden capture of the entire environment they came from. It might not seem like such a big deal--given that we're saying any random block can do that--and blocks are everywhere. But I'm much more relaxed about this change now that the environment capture is limited to arrays.

One benefit is you can use the case of a string not being wrapped in a block as an optimization... to say "this string doesn't have any escapes in it". (I actually did this with the CSCAPE interpolation: if you pass the emitter a raw string, it just outputs it as-is and doesn't parse it.)
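As a rough illustration of that raw-string shortcut, here is a small Python sketch. It is not CSCAPE: the `emit` function, the `${name}` placeholder syntax, and the explicit `env` dictionary are all made up for this example (in the post's model the wrapping block would carry the environment itself). A one-element list stands in for a string wrapped in a block.

```python
import re


def emit(item, env):
    """Emit a raw string verbatim; interpolate a block-wrapped string.

    `${name}` is a hypothetical placeholder syntax used only in this sketch.
    """
    if isinstance(item, str):
        # Raw string: treated as "no escapes in here", so it is not parsed.
        return item
    (template,) = item  # a one-element list models a string wrapped in a block
    return re.sub(r"\$\{(\w+)\}", lambda m: str(env[m.group(1)]), template)


env = {"n": 5}
print(emit("cost: ${n}", env))    # cost: ${n}  (raw string passes through)
print(emit(["cost: ${n}"], env))  # cost: 5     (wrapped string is interpolated)
```

The point is only that the unwrapped case can skip scanning entirely, since there is no environment available to resolve anything against.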

Put Another Way...

The system only does "automatic" binding inheritance in evaluated code (PICKs and such do not count)

>> x: 10

>> code: [x + 1]
== [x + 1]  ; block visit during assignment bound its *tip*

>> eval code
== 11  ; the EVAL propagated block's tip binding to X inside

>> get first code
** Error: x is not bound  ; FIRST isn't evaluative, it's structural
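
For readers who think better in another language, the transcript above can be mimicked with a toy model in Python. This is only a sketch of the idea and not how the interpreter is implemented: `Env`, `Word`, `Block`, `eval_block`, `first` and `get` are names invented here, and the "evaluator" only understands `+`.

```python
class Env:
    """One frame of variables, chained to an outer frame."""

    def __init__(self, variables, parent=None):
        self.variables = variables
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise LookupError(f"{name} is not bound")


class Word:
    def __init__(self, name, binding=None):
        self.name = name
        self.binding = binding  # None means unbound


class Block:
    def __init__(self, items, binding=None):
        self.items = items      # items are stored unbound
        self.binding = binding  # only the block's "tip" carries an environment


def get(word):
    """Explicit lookup: works only on a word that already has a binding."""
    if word.binding is None:
        raise LookupError(f"{word.name} is not bound")
    return word.binding.lookup(word.name)


def first(block):
    """Structural access: returns the item as stored, with no binding added."""
    return block.items[0]


def eval_block(block):
    """Evaluate `a + b + ...`; unbound words borrow the block's tip binding."""
    def value_of(item):
        if not isinstance(item, Word):
            return item
        env = item.binding if item.binding is not None else block.binding
        if env is None:
            raise LookupError(f"{item.name} is not bound")
        return env.lookup(item.name)

    total = value_of(block.items[0])
    for operator, operand in zip(block.items[1::2], block.items[2::2]):
        if operator != "+":
            raise ValueError(f"unsupported operator: {operator}")
        total += value_of(operand)
    return total


global_env = Env({"x": 10})
code = Block([Word("x"), "+", 1], binding=global_env)

print(eval_block(code))  # 11: evaluation supplies the tip binding to X
try:
    get(first(code))
except LookupError as error:
    print(error)         # x is not bound: FIRST is structural
```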

Once a binding is on something, the system won't override it except at clearly specified points (something has to BIND or UNBIND things, typically immutably)

>> x: 10

>> code: [x + 1]
== [x + 1]  ; block bound at tip

>> inner: lambda [x] compose* [eval (code)]

>> inner 100
== 11  ; nested block did not have binding overridden

>> outer: lambda [x] code

>> outer 100
== 101  ; LAMBDA and FUNC inject their binding at body tip only
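
The INNER versus OUTER difference can be sketched the same way. Again this is a toy Python model under loud assumptions: `Env`, `Block`, `evaluate` and `make_lambda` are invented names, words are plain strings, and a block simply sums its items so the model needs no operators. What it tries to capture is that the lambda puts its frame on the body's tip without touching anything nested that already has a binding.

```python
class Env:
    def __init__(self, variables, parent=None):
        self.variables = variables
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.variables:
                return env.variables[name]
            env = env.parent
        raise LookupError(f"{name} is not bound")


class Block:
    def __init__(self, items, binding=None):
        self.items = items
        self.binding = binding  # tip binding, or None if unbound


def evaluate(block, inherited=None):
    """Sum a block's items. A block's own binding always beats the inherited one."""
    env = block.binding if block.binding is not None else inherited
    total = 0
    for item in block.items:
        if isinstance(item, Block):
            total += evaluate(item, env)   # nested block keeps its binding if it has one
        elif isinstance(item, str):        # strings model words
            if env is None:
                raise LookupError(f"{item} is not bound")
            total += env.lookup(item)
        else:
            total += item
    return total


def make_lambda(param, body):
    """Model of LAMBDA: a new frame is placed on the body's tip, and only there."""
    def call(arg):
        frame = Env({param: arg}, parent=body.binding)
        return evaluate(Block(body.items, binding=frame))  # items are shared, not rebound
    return call


global_env = Env({"x": 10})
code = Block(["x", 1], binding=global_env)  # stands in for [x + 1], bound at tip

inner = make_lambda("x", Block([code], binding=global_env))  # body contains CODE
outer = make_lambda("x", code)                               # body *is* CODE

print(inner(100))  # 11: the nested block kept its own binding
print(outer(100))  # 101: the lambda's frame went on the tip of CODE
```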

Even when evaluated, quoted material retains whatever binding it had (which the majority of the time means it will be unbound)

>> x: 10

>> word: 'x
== x  ; unbound

>> get word
** Error: x is not bound

>> word: $x
== x  ; bound ($... can be used to get bound material)

>> get word
== 10

>> code: '[x + 1]
== [x + 1]  ; unbound (quoting suppresses tip binding under evaluation)

>> eval code 
** Error: x is not bound
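
The quoting rule is small enough to model directly. In this Python sketch (a toy with invented names, where `bound_here` is a stand-in for `$x` and a plain dictionary plays the environment), evaluating a quoted word removes one quote level and does nothing else, so an unbound word stays unbound and a bound one keeps the binding it had.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Word:
    name: str
    quotes: int = 0                 # number of leading ' marks
    binding: Optional[dict] = None  # None means unbound


def evaluate(word, env):
    """One evaluation step in environment `env`."""
    if word.quotes > 0:
        # Quoted: drop one quote level and leave the binding exactly as it was.
        return replace(word, quotes=word.quotes - 1)
    binding = word.binding if word.binding is not None else env
    return binding[word.name]  # plain word: fetch the variable


def bound_here(name, env):
    """Stand-in for $name: a word that captures the current environment."""
    return Word(name, binding=env)


def get(word):
    if word.binding is None:
        raise LookupError(f"{word.name} is not bound")
    return word.binding[word.name]


env = {"x": 10}

word = evaluate(Word("x", quotes=1), env)  # like  word: 'x
print(word.binding)                        # None: still unbound after evaluation

print(get(bound_here("x", env)))           # 10, like  get $x
```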

Let me be clear: coding within this new model is not trivial. It requires writing dialects that perform composition with a conscious awareness of binding at each step.

But it's much better than spraying meaningless bindings on code...in the hopes that a meaningful one will ultimately "win" as deep walks of mutable binding are applied in waves.

Not finding yourself in a situation where it's too late for you to bind code is empowering across the board. Again: I don't have all the answers, but it makes about as much sense as I think this can make.

Overall Impression: It's The Way Forward

I've said before that this is important, because it offers the means to do things people have always wanted to do. Like looking up variables to do string interpolation. Or more generally: "treat this unbound code as if it had been written here".

But a crucial aspect is that this has much more of a feeling of things being under control.

There is a sanity and a logic to diagnosing binding problems. You have the compounded binding environment in your hands, in the moment. Previously you could only sift through the scattershot results of however many binding waves had happened before.

It does require a new way of thinking about what you're doing, but at least you can think.

Performance Is An Issue

Historical binding did have one thing going for it: gluing the binding onto things meant you knew where to look up words without having to search through a linked list of objects every time.

But we're talking about bound values only coming into existence under evaluation. This means that the entire body of a function is effectively unbound at the start of each run.

I haven't done any optimization yet, and empirically it seems to be about 25% slower. That's actually not bad considering that I haven't put any attention on performance...but I do feel like there are going to be limits to how good it can be.

UPDATE: Adding the optimization of caching in function bodies whether a word is in that function or not brings the slowdown to approximately 10%. Lots more to look at, but that's encouraging.
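
To make the cost concrete, here is a Python sketch of chained lookup with and without a per-body "is this word local?" cache. It is a guess at the flavor of the optimization, not the actual implementation: `Frame`, `lookup_uncached`, `build_cache` and `lookup_cached` are invented names, and frames are searched linearly purely so the comparisons can be counted.

```python
class Frame:
    def __init__(self, names, values, parent=None):
        self.names = list(names)    # searched linearly to make the cost visible
        self.values = list(values)
        self.parent = parent


def lookup_uncached(name, frame):
    """Search each frame in the chain in turn; return (value, comparisons)."""
    comparisons = 0
    while frame is not None:
        for index, candidate in enumerate(frame.names):
            comparisons += 1
            if candidate == name:
                return frame.values[index], comparisons
        frame = frame.parent
    raise LookupError(f"{name} is not bound")


def build_cache(body_words, local_names):
    """Computed once per function body: is each word one of the function's own?"""
    return {word: (word in local_names) for word in body_words}


def lookup_cached(name, frame, is_local):
    if is_local[name]:
        return lookup_uncached(name, frame)     # found in the function's own frame
    return lookup_uncached(name, frame.parent)  # known non-local: skip this frame


global_frame = Frame(["print", "y"], [None, 7])
call_frame = Frame(["a", "b", "x"], [1, 2, 3], parent=global_frame)
cache = build_cache(["x", "y"], ["a", "b", "x"])

print(lookup_uncached("y", call_frame))       # (7, 5)
print(lookup_cached("y", call_frame, cache))  # (7, 2)
```

In this model the cache only helps words that are not the function's own, since those no longer pay for a fruitless search of the call frame on every run.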

Working Through A Few Things Before Committing This

I'm aiming to commit this to GitHub soon-ish. But I have a few more things to look at:

The tests. I got the test runner to work, and it ran to the end without crashing. But there were still 300 failures, after I'd just gotten that number down to like a dozen after facing a similar number of issues in the year-of-isotopes. I felt like looking at real code first...and I'll probably be pushing this out before all the minutiae of thought experiments get rethought.
Get some more clarity on attachment binding. It's a sketchy concept, but is required for the system to run at the moment. It's bothersome because it creates a "neither bound nor unbound" state that complicates code that wants a more definitive answer on whether words are bound or unbound--one that doesn't change out from under it.
Clean up old unnecessary code, e.g. the code for propagating bindings through structural operations (which is still there, just deactivated).
Do at least a little bit of looking into what can be done about performance.

But overall, it feels about as right as I think it's going to feel. So that's a win!