Performance (and Security?) Implications of Binding Leakage

hostilefork · January 13, 2024, 11:27am

In the binding model currently being brainstormed, blocks and groups capture an exhaustive amount of context.

@bradrn's suggestion is that this only happens if the blocks are not quoted. So the following would produce a fully unbound block:

 >> x: 10 a: "a" b: "b" c: "c"

 >> foo: func [y] [
        let z: 30
        return '[x y z]  ; quoted
    ]

 >> foo 20
 == [x y z]  ; no binding

By contrast, the following would capture... everything:

 >> x: 10 a: "a" b: "b" c: "c"

 >> foo: func [y <local> alpha beta gamma] [
        let z: 30
        return [x y z]  ; not quoted
    ]

 >> foo 20
 == [x y z]  ; knows X, Y, Z, A, B, C, ALPHA, BETA, GAMMA...

In fact, so would an empty block: with just `return []` you'd still be able to reach everything.

If you were to accidentally return one of these blocks across a module boundary, you would be handing the receiver a context exposing every definition in the module, plus every local in the function except those that are shadowed. Under some variants of the proposal, even the shadowed definitions would be reachable through a programmatic API that lets you climb the environment's parent layers.

Seemingly worse, we're suggesting that strings would capture this information as well. (I don't know if it's that much worse, although quoting strings systematically seems more belabored and is less pleasing, especially when using the "s" notation.)
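The proposed capture rule isn't implemented in any released interpreter, so here is a toy model in Python rather than Ren-C, just to show why the reach is so broad. It assumes (hypothetically) that environments form a chain of layers (module, function frame, LET), that an unquoted block captures the whole current chain while a quoted one captures nothing, and that some API can climb the parent layers. `Env`, `Block`, `evaluate_block` and `reachable_names` are illustrative names, not real APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Env:
    """One layer of the environment chain (module, function frame, or LET)."""
    names: dict
    parent: Optional[Env] = None


@dataclass
class Block:
    """A block value: its items plus the environment it captured, if any."""
    items: list
    binding: Optional[Env] = None


def evaluate_block(items: list, env: Env, quoted: bool) -> Block:
    # Hypothetical rule: a quoted block yields an unbound block, while an
    # unquoted one captures the *entire* current chain, not just its words.
    return Block(items, None if quoted else env)


def reachable_names(block: Block) -> set:
    """Collect every name visible by climbing the captured chain's parents."""
    seen = set()
    env = block.binding
    while env is not None:
        seen.update(env.names)
        env = env.parent
    return seen


module = Env({"x": 10, "a": "a", "b": "b", "c": "c"})
frame = Env({"y": 20, "alpha": None, "beta": None, "gamma": None}, module)
let_z = Env({"z": 30}, frame)  # LET Z: 30 adds its own one-variable layer

quoted = evaluate_block(["x", "y", "z"], let_z, quoted=True)
unquoted = evaluate_block(["x", "y", "z"], let_z, quoted=False)
empty = evaluate_block([], let_z, quoted=False)

print(sorted(reachable_names(quoted)))    # []
print(sorted(reachable_names(unquoted)))  # all nine names from the three layers
print(reachable_names(empty) == reachable_names(unquoted))  # True
```

Because `reachable_names` walks every layer, a name shadowed by an inner layer would still be reported from the outer one, which corresponds to the variant of the proposal where overridden definitions are reachable by climbing the parents.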

Not a New Problem, Just an Exacerbated One

"Stray bindings" in Rebol blocks have always been a thing. If you make a block [a b c], you clearly leak the words in it and their values, intentionally or not. But you also leak any contexts reachable from those words, and by extension every field in those contexts.

For instance:

rebol2>> o: make object! [private: <secret> public: 10 expr: [public * 20]]

rebol2>> do o/expr
== 200

rebol2>> first o/expr
== public

rebol2>> probe bind? first o/expr
make object! [
    private: <secret>
    public: 10
    expr: [public * 20]
]

Unlike in the proposed new model, an empty block here won't give you access to everything that was visible at the evaluation site. But you can still reach enough things that it's on the same order of magnitude, if we're just considering what's reachable.

Security in Rebol Is a Lost Cause, But...

If you're looking for a language for its relevance to secure programming, look somewhere else.

However, it seems something could still be done to stop the most egregious cases. Perhaps a function's RETURN: spec would need an annotation like <bound> in order to return a bound value, and otherwise the binding would be masked out somehow.

(This is easy for words and strings, but hard for something like a mutable block with nested bindings. A specifier at the "tip" could say "consider this block unbound as you descend it, and only return unbound values." Such a specifier exists today, but it only works with immutable blocks. Once you start putting bound material into such a masked structure, and expect to see it as bound material in the midst of unbound material, you can't tell which bindings were added before the mask and which were added after.)
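A toy Python model of that masking problem (not Ren-C's actual specifier machinery). The assumptions are hypothetical: a block is a shared, mutable list of words, each carrying an optional binding, and a `MaskedView` at the tip makes every word read back as unbound on descent. That's sound for an immutable block, but bound material put in through the view afterwards gets masked along with everything else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    spelling: str
    binding: Optional[str] = None  # name of the binding context; None = unbound


class MaskedView:
    """A hypothetical 'specifier at the tip' over a shared, mutable block."""

    def __init__(self, items: list):
        self.items = items  # not copied: the underlying block stays mutable

    def append(self, word: Word) -> None:
        self.items.append(word)  # bound material put in through the view

    def __iter__(self):
        # The mask: every item reads back as unbound on descent.
        for word in self.items:
            yield Word(word.spelling, None)


block = [Word("private", "module"), Word("public", "module")]
view = MaskedView(block)
print([w.binding for w in view])  # [None, None]: the module's bindings are hidden

view.append(Word("mine", "receiver"))  # caller wants this one to stay bound
print([w.binding for w in view])  # [None, None, None]: its binding is hidden too
print(block[-1].binding)          # receiver: it was stored, but the mask can't tell
```

Recording the block's length at masking time would only help for appends; an insertion in the middle shifts positions, and the view again can't distinguish old items from new ones.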

I've often thought that a binding should only let you GET and SET, not follow the pointer to reach other variables in the containing context. We've already been in this situation with things like LETs: each one is a little island of one variable, so you can't even ask what object it lives in, because it doesn't live in one.

But if functions are allowed to access the evaluator's notion of the "current context", you could subvert this: put a capturing function into a block you received, evaluate it, and get at the information anyway.

Bigger Issue: Extreme Stress for the GC

It's extremely easy to force an indefinite lifetime on function frames in Ren-C. This is already a problem:

 >> some-function: func [x <local> a b c d e f g h i j ...] [
        ... return first [(x)]
    ]

 >> blocks: collect [count-up i 100000 [keep some-function i]]
 == [(x) (x) (x) (x) ...]

 >> results: reduce blocks
 == [1 2 3 4 ...]

It looks like just 100,000 small blocks, but it's 100,000 frames of arbitrary size, each kept alive to preserve a single binding.

But... now imagine that happening for every string.
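A rough Python analogy for that cost (it says nothing about how Ren-C's collector actually works): a hypothetical `Frame` carries many unused locals, and each returned "block" holds either that whole frame or a hypothetical `Cell` with just the one value it needs. `weakref.ref` lets us observe which frames survive while the blocks are still held.

```python
import gc
import weakref


class Frame:
    """Stand-in for a function frame: one used variable plus many unused locals."""

    def __init__(self, x):
        self.x = x
        self.locals = [None] * 1_000  # the "a b c d e f g h i j ..." locals


class Cell:
    """Holds just the one variable a block refers to, detached from its frame."""

    def __init__(self, x):
        self.x = x


def some_function(i, capture_whole_frame):
    frame = Frame(i)
    binding = frame if capture_whole_frame else Cell(frame.x)
    block = ("(x)", binding)  # the returned block plus what it is bound to
    return block, weakref.ref(frame)


def frames_alive(capture_whole_frame, n=1_000):
    """Build n blocks, 'reduce' them, and count frames still alive afterwards."""
    blocks, refs = [], []
    for i in range(1, n + 1):
        block, ref = some_function(i, capture_whole_frame)
        blocks.append(block)
        refs.append(ref)
    gc.collect()
    results = [binding.x for _, binding in blocks]  # analogous to REDUCE BLOCKS
    assert results == list(range(1, n + 1))
    return sum(ref() is not None for ref in refs)


print(frames_alive(capture_whole_frame=True))   # 1000: every frame is pinned
print(frames_alive(capture_whole_frame=False))  # 0: frames were reclaimable
```

Both variants produce the same reduced results; only the whole-frame variant pins memory proportional to the number of locals times the number of blocks.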

The choice to keep frames reachable after execution by default was made some time ago, partly in an attempt to feature-match JavaScript:

What happens to FUNCTION! arguments and locals when the call ends?

Maybe it's a decision that needs revisiting. Anyway, I'm just starting a thread to discuss binding leakage.

hostilefork · September 7, 2024, 12:38am

Using our parameter conventions, we could say that the $arg convention means "evaluate at the callsite and keep the binding," while an ordinary arg would not bring the binding along.

So if you used the $...

machine-control: func [$instruction [word!]] [
    print ["lookup of (" instruction ") is:" get instruction]
]

on: 1  off: 0

instructions: reduce [$on $off]

>> machine-control instructions.1
lookup of ( on ) is: 1

But without the $...

machine-control: func [instruction [word!]] [
    print ["lookup of (" instruction ") is:" get instruction]
]

on: 1  off: 0

instructions: reduce [$on $off]

>> machine-control instructions.1
** Error: `on` word is not bound
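A toy Python model of the proposed convention, to show exactly where the binding gets dropped. The assumptions are hypothetical, not Ren-C's real mechanics: a word is a spelling plus an optional binding; a parameter spelled with a leading `$` receives the argument unchanged, while an ordinary parameter receives a copy with the binding stripped; and GET fails on an unbound word. `Word`, `get`, `call` and `machine_control_body` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import Optional


class UnboundError(Exception):
    pass


@dataclass(frozen=True)
class Word:
    spelling: str
    binding: Optional[dict] = None  # the context the word looks up in, if any


def get(word: Word):
    if word.binding is None:
        raise UnboundError(f"`{word.spelling}` word is not bound")
    return word.binding[word.spelling]


def call(param: str, body, arg: Word):
    # "$name" keeps the callsite binding; a plain "name" gets an unbound copy.
    if not param.startswith("$"):
        arg = replace(arg, binding=None)
    return body(arg)


def machine_control_body(instruction: Word) -> str:
    return f"lookup of ( {instruction.spelling} ) is: {get(instruction)}"


module = {"on": 1, "off": 0}
instructions = [Word("on", module), Word("off", module)]  # like REDUCE [$on $off]

print(call("$instruction", machine_control_body, instructions[0]))
# lookup of ( on ) is: 1

try:
    call("instruction", machine_control_body, instructions[0])
except UnboundError as e:
    print("** Error:", e)
# ** Error: `on` word is not bound
```

The caller passes the same bound word in both cases; only the callee's spec decides whether the binding survives the call.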

Would Unbound By Default Be Too Annoying?

This decision is a lot like whether arguments should be const by default. In that case, I feel pretty strongly that it helps to know whether a function is going to mutate the thing you're passing it, and that a modern language should require you to say when things are mutable, rather than relying on people to mark things const if they happen to remember.

Both choices would definitely make code noisier. (It's already developing a lot of symbolic noise out of necessity, and this isn't a necessity.)

But since everything is configurable, perhaps the way to go is to have the core do the right thing, and then let you write your own version of FUNC that rewrites the spec so that a normal arg becomes $arg.