Value (vs. Series) Modification Bit: CONST and MUTABLE

hostilefork · December 25, 2018, 8:07pm

Every newbie to Rebol (and every experienced user too!) gets bitten by the intrinsic mutability of source series. A common misunderstanding/mistake might look like:

 blockify: func [x] [
    block: []
    append block x
    return block
 ]

 >> blockify 10
 == [10]

 >> blockify 20
 == [10 20]  ; !!! why didn't `block: []` reset the block?

Some have deemed it easy enough to learn to say block: copy []. But consider the following:

symbol-name: func [symbol [word!]] [
     switch symbol [
         '+ ["plus"]
         '? ["question-mark"]
         ...
     ]
]
...
filename: append (symbol-name '+) ".dat"

This innocent-looking piece of code has a terrible bug. The string you return lives in the SWITCH, so the APPEND actually mutates the string inside the SWITCH. Every subsequent call to SYMBOL-NAME '+ will be affected: it will get back "plus.dat" instead of "plus", and the next APPEND will turn that into "plus.dat.dat".

I feel like it shouldn't be controversial to say it should not be this easy to write self-modifying code by accident. Something equivalent to this (but trickier) caused a problem in the build system that took me hours to find.

If Rebol is supposed to be more than a toy, it needs answers for usage problems like this, where it is notably more brittle than other languages.

The APPENDs Above Must Fail, But By What Means?

I want those examples to cause errors, rather than silently modifying the blocks or strings resident in the bodies of functions.

Yet a lot of off-the-cuff scripting (and test code) relies on the mutability of source, e.g.:

>> append [a b] 'c
== [a b c]

R3-Alpha had the concept of being able to PROTECT a series so that all references to it would be immutable. But if we made a rule that all source series were permanently locked, that would be a heavy-handed policy that ruled out alternate styles of coding entirely.

I concluded that we needed another, lighter form of lock: something that doesn't force every view of a series to be unchanging for all time, but lets different views of the same series be either read-only or read-write. Constructs could then flip this bit as they saw fit.

Meet CONST and MUTABLE

Ren-C's pioneering new feature is that individual values (as opposed to whole series) can be marked read-only. You can flip the bit yourself with the CONST and MUTABLE functions:

>> data: [a b c]
== [a b c]

>> data-readonly: const data
== [a b c]

>> append data-readonly 'd
** Access Error: value is CONST (see MUTABLE): [a b c]

>> append data 'd
== [a b c d]

>> data-readonly
== [a b c d]

>> append mutable data-readonly 'e
== [a b c d e]

It's quite different from locking a series. For instance, you can keep write access for yourself while giving out const access to subroutines that you don't want making casual modifications.
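To make the "per-value, not per-series" distinction concrete, here is a minimal C++ sketch of a toy model. It assumes the series data is shared while each reference to it carries its own const bit; the names (Series, Value, make_const, make_mutable, append) are hypothetical and are not Ren-C's actual internals.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical names): the series data is shared...
struct Series {
    std::vector<std::string> items;
};

// ...while each value (reference) carries its own const bit.
struct Value {
    std::shared_ptr<Series> series;  // many values can share one series
    bool is_const = false;           // read-only *through this value* only
};

Value make_block(std::vector<std::string> items) {
    Value v;
    v.series = std::make_shared<Series>();
    v.series->items = std::move(items);
    return v;
}

// Like CONST: another view of the same series, marked read-only.
Value make_const(Value v) { v.is_const = true; return v; }

// Like MUTABLE: another view of the same series, const bit cleared.
Value make_mutable(Value v) { v.is_const = false; return v; }

// A mutating operation checks the bit on the value it was handed,
// not on the series itself.
void append(const Value& v, const std::string& item) {
    if (v.is_const)
        throw std::runtime_error("value is CONST (see MUTABLE)");
    v.series->items.push_back(item);
}
```

This mirrors the console session: appending through `make_const(data)` throws, appending through `data` succeeds, and the read-only view sees the change because both views share one series.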

But the real win here is that code execution, by default, puts a wave of constness on any slots the evaluator fills from "literals"...be those blocks or strings. You can see it catching the bug from the beginning of the post, where the string inside the SWITCH was being changed:

>> filename: append (symbol-name '+) ".dat"
** Access Error: value is CONST (see MUTABLE): "plus"

The Constructs Are In Control

In this model, constness is applied by any construct that expects to run its argument iteratively (i.e. more than once).

So for example, the WHILE loop takes its body (and condition) as a <const>-marked parameter.

input: [a b c]

output: []  ; want to get [[a] [b] [c]]

while [item: try take input] [
    block: []
    append block item
    append output block
]

The APPEND to BLOCK will fail with a "CONST or iterative value" error, because the [] that BLOCK was set to came from the loop body.

By comparison, EVAL does not take its block argument as a const parameter, so this works without complaining about the appends to data:

>> eval [data: [], append data <1>, append data <2>]
== [<1> <2>]

But constness is inherited, so an EVAL inside a WHILE would receive its block as const, due to the WHILE's influence.

Predicting that functions are likely to be called more than once, FUNC takes its body as CONST...and that constness propagates as the wave of evaluation proceeds through the body.
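The "wave" can be sketched in the same toy style. This is a minimal C++ model, assuming that when the evaluator fetches a literal from a body, the body's const bit is copied onto the result; fetch_literal, eval_body and loop_body are hypothetical stand-ins for the evaluator, EVAL, and WHILE/FUNC respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy model (hypothetical names): a block is a series of literal values,
// and each reference to a series carries its own const bit.
struct Series;

struct Value {
    std::shared_ptr<Series> series;
    bool is_const = false;
};

struct Series {
    std::vector<Value> items;  // literals as they appear in the source
};

Value make_block(std::vector<Value> items = {}) {
    Value v;
    v.series = std::make_shared<Series>();
    v.series->items = std::move(items);
    return v;
}

// When the evaluator pulls a literal out of the body it is running, the
// body's constness washes over the result (a const literal stays const).
Value fetch_literal(const Value& body, std::size_t index) {
    Value v = body.series->items.at(index);
    v.is_const = v.is_const || body.is_const;
    return v;
}

using Runner = std::function<void(const Value& body)>;

// EVAL-like: runs the body with whatever constness it already carries.
void eval_body(const Value& body, const Runner& run) { run(body); }

// WHILE/FUNC-like: the body parameter is taken as <const>.
void loop_body(Value body, int times, const Runner& run) {
    body.is_const = true;  // dropping this line models the historical behavior
    for (int i = 0; i < times; ++i) run(body);
}
```

Because the bit lives on the reference that loop_body received, the caller's own view of the body stays mutable, and an EVAL nested inside the loop sees const literals only because it was handed the loop's const view.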

But notice that as long as the underlying series isn't immutable (due to things like PROTECT), you can subvert the const bit with MUTABLE:

 accumulate: func [x] [
     accumulator: mutable []
     return append accumulator x
 ]

 >> accumulate 10
 == [10]

 >> accumulate 20
 == [10 20]
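The caveat that MUTABLE only works when the underlying series isn't itself immutable can be modeled by checking the series-level state before the per-value bit. This C++ sketch assumes such an ordering; the names (check_mutable, Access, protected_) are hypothetical, and the real checks in Ren-C may be structured differently.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical model: PROTECT/LOCK-style state lives on the series and
// affects every reference; the CONST bit lives on a single reference.
struct Series {
    std::vector<int> items;
    bool protected_ = false;
};

struct Value {
    std::shared_ptr<Series> series;
    bool is_const = false;
};

enum class Access { ok, const_value, protected_series };

// The series-level check comes first, so clearing the value's const bit
// (as MUTABLE does) can never get past a protected series.
Access check_mutable(const Value& v) {
    if (v.series->protected_) return Access::protected_series;
    if (v.is_const) return Access::const_value;
    return Access::ok;
}

// Like MUTABLE: clears only the per-value bit.
Value make_mutable(Value v) { v.is_const = false; return v; }
```

So a const literal from a FUNC body becomes writable after `make_mutable`, but once the series is protected, no per-value flag can reopen it.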

Emulating historical Rebol2/R3-Alpha/Red conventions just means tweaking the specs for things like FUNC and WHILE. Instead of taking their body parameters as <const>, take them normally.

Should Modules Be Stricter By Default?

The SWITCH case I opened with shows why I absolutely think that constness-on-func-bodies is the right choice. That's in addition to removing the speedbump every new user hits when they write repeat 10 [block: [] ...] and expect block to be reinitialized each time through the loop.

But what should the default be for code that's not in a function or a loop?

Certainly in the console mutability has been the status quo. If modules enforced constness for their top-level code (despite being run only once) but the console didn't, would that be a good tradeoff...or just confusing?

I don't think saying MUTABLE [...] is much of a burden to get deep mutable access to a series when you mean that. I feel it's better to teach good habits early on. But who knows.

IngoHohmann · December 25, 2018, 11:42pm

Yes, I think it would be.

One question: if a function hands out const access to a value, is the receiver able to change it to a mutable value? Should this be possible?

hostilefork · December 25, 2018, 11:59pm

There may be more options than just binary ones here, so it's likely best to get some experience.

I really believe that any inconsistency between the console and running scripts should be weighed heavily. The console is the place where you try things out, and what you use as a sanity check when debugging.

Perhaps there could be a difference between an explicit const (irrevocable on that value once applied) and an implicit one that the evaluator put on by itself because of a frame. That mechanic may not be too difficult.

But in their current incarnation, const and mutable are "suggestions" and there's no level of privilege escalation. If you want to lock something so no one can get write access on it, you have to LOCK it.

Locking is still necessary for things like using blocks for keys in MAP!, and something more lock-like is probably the only way to imagine safe multithreading.

BlackATTR · December 26, 2018, 5:16pm

I think this gets to the question of robustness of the language. If this helps Rebol get beyond the perception of being unserious for real development work, then I'm in favor as it seems like a worthy tradeoff. It would need to be documented/taught but I think the additional rigor would lead to better programming practices.

hostilefork · September 15, 2020, 3:28am

CONST and MUTABLE have been around for a while now, and I think the chosen balance has worked out rather well.

One historical problem with these mutability features was that there were no compile-time checks to make sure code wasn't violating them. There were tons of cases of PROTECT bits not being honored, simply because some routine lacked a mutability check. The person hacking on the C code to REVERSE or SORT a series had to explicitly remember that this was a mutating operation and check the bit.

The obvious-sounding way to stop these problems from creeping in would be to leverage the const annotation in C and C++. All the routines that modified series would require the caller to have a non-const pointer in their hand...while routines that could be done on read-only series could take either a const or non-const pointer.

So consider the simple example of getting an element at a position in an array:

 Cell* Array_At(Array* array, Index n)
     { ...lots of code... }

Historically this would take in a mutable Array (the only kind there was) and give back a mutable Cell. But what we want is for mutable arrays to give back mutable cells, and const arrays to give const cells. So we could simply create a wrapper that calls into the mutable implementation but reskins the result as const for const input:

 Cell* Array_At(Array* array, Index n)
     { ...lots of code... }

 inline const Cell* Array_At(const Array* array, Index n)
     { return Array_At(m_cast(Array*, array), n); }

There's just one problem... C doesn't support overloading. You can't have two functions with the same name and different signatures and have the compiler pick between them. There'd have to be two different names:

 Cell* mutable_Array_At(Array* array, Index n)
   { ...lots of code... }

 inline const Cell* Array_At(const Array* array, Index n)
   { return mutable_Array_At(m_cast(Array*, array), n); }

This might not seem like that big a deal, but the combinatorics add up. Now you can't write a generic macro that speaks about array positions...you need macros with different names that call the differently named accessors. And there are lots of these routines (Array_Head, Array_Tail, Array_Last... Binary_Head, Binary_Tail... Series_Data, etc. etc. etc.). It gets pretty horrific when each of them explodes into a mutable_XXX variation, and then everything that calls them needs mutable_XXX variations too.

I came up with a trick to get around it. Basically, the trick is to sacrifice some amount of const checking in C. First, define a macro for something that resolves to const in C but vaporizes in C++:

#ifdef __cplusplus
    #define const_if_c
#else
    #define const_if_c const
#endif

Then, define the functions like this:

Cell* Array_At(const_if_c Array* array, Index n)
  { ...lots of code... }

#ifdef __cplusplus
    inline const Cell* Array_At(const Array* array, Index n)
         { return Array_At(m_cast(Array*, array), n); }
#endif

So the C build will give you back a mutable Cell no matter whether your input array was const or not. But the C++ build only gives back const cells for const input.
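Here is a self-contained version of the trick that compiles as C++, with stand-in Cell, Array and Index types that are assumptions made just to get it building. m_cast is approximated with const_cast (and a plain cast in the C branch). The static_asserts check at compile time which overload each kind of input selects.

```cpp
#include <assert.h>

// Hypothetical stand-ins for the core types, just enough to compile.
typedef struct { int payload; } Cell;
typedef struct { Cell cells[4]; } Array;
typedef int Index;

#ifdef __cplusplus
    #include <type_traits>
    #define const_if_c
    #define m_cast(T, p) const_cast<T>(p)  // approximation of m_cast
#else
    #define const_if_c const
    #define m_cast(T, p) ((T)(p))
#endif

// The one real implementation. In C the parameter is const, so it accepts
// any array and always hands back a mutable Cell*. In C++ it only accepts
// mutable arrays.
Cell* Array_At(const_if_c Array* array, Index n)
  { return m_cast(Cell*, &array->cells[n]); }

#ifdef __cplusplus
    // C++-only overload: const array in, const cell out.
    inline const Cell* Array_At(const Array* array, Index n)
        { return Array_At(m_cast(Array*, array), n); }

    // Compile-time check of which overload each kind of input picks.
    static_assert(std::is_same<
        decltype(Array_At(static_cast<Array*>(nullptr), 0)), Cell*
    >::value, "mutable array gives mutable cell");
    static_assert(std::is_same<
        decltype(Array_At(static_cast<const Array*>(nullptr), 0)), const Cell*
    >::value, "const array gives const cell");
#endif
```

In the C++ build, writing `Array_At(carr, 1)->payload = 0` through a `const Array* carr` fails to compile, which is exactly the diagnostic the C build cannot give.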

This makes systemic enforcement of mutability checking practical. If you're inside the implementation with a const array, string, or binary... the C++ build won't let you call a C routine that would mutate it. The only way to get a mutable array is through specific entry points that extract it with a runtime check to make sure it's mutable.
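A checked entry point of that kind might look roughly like the following C++ sketch. The names Value_Array, Value_Array_Ensure_Mutable and Append_Int are hypothetical, and the Value layout is a stand-in rather than Ren-C's real cell.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins (not Ren-C's real definitions).
struct Array { std::vector<int> cells; };

struct Value {
    Array* array;
    bool is_const;  // stands in for the cell's CONST flag
};

// Read-only access needs no check: anyone may read.
const Array* Value_Array(const Value& v) { return v.array; }

// The only way to get a mutable Array* out of a value in this sketch:
// a runtime check of the const bit.
Array* Value_Array_Ensure_Mutable(const Value& v) {
    if (v.is_const)
        throw std::runtime_error("value is CONST (see MUTABLE)");
    return v.array;
}

// A mutating routine demands a non-const pointer, so a const Array*
// obtained from Value_Array() cannot be passed here without a cast.
void Append_Int(Array* a, int x) { a->cells.push_back(x); }
```

The compile-time half (Append_Int refusing a `const Array*`) and the runtime half (the bit check in the extractor) together mean every mutation path passes through one check.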

It's all in the implementation guts...so it only affects those using the core API, not libRebol. The only thing you need to do is make sure you at some point build the code with a C++ compiler, and it will tell you where any problems are.

hostilefork · November 20, 2024, 4:12am

The const feature was implemented via CELL_FLAG_CONST, a bit that a cell could have set (or not) in its header.

But there was a second bit... called CELL_FLAG_EXPLICITLY_MUTABLE.

This bit protected a block from the "wave of constness".

It was necessitated by some behaviors that I ultimately deemed to be bugs. Now that those bugs are gone, it's no longer mandatory to have it.

So I took it out, but found it broke a test:

block: mutable [a b c]
eval compose:deep $() [repeat 2 [append (block) <legal>]]
assert [block = [a b c <legal> <legal>]]

So previously, that mutable would put the CELL_FLAG_EXPLICITLY_MUTABLE bit on. Then, when the COMPOSE ran, you'd get:

 eval [repeat 2 [append [a b c] <legal>]]

Because REPEAT is iterative, it applies CONST to its body. That CONST would get applied to the [a b c]...but CELL_FLAG_EXPLICITLY_MUTABLE overruled it.
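The interaction of the two header bits can be sketched as below. This C++ model assumes hypothetical bit positions and treats the const wave as something applied to a cell as the evaluator fetches it from a body; Mutable_Old and Mutable_New are illustrative names for the before and after behaviors of MUTABLE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions; the real header layout differs.
typedef std::uint32_t Flags;
const Flags CELL_FLAG_CONST              = Flags(1) << 0;
const Flags CELL_FLAG_EXPLICITLY_MUTABLE = Flags(1) << 1;

struct Cell { Flags header; };

// Old MUTABLE: clear CONST *and* shield the cell from later const waves.
void Mutable_Old(Cell* c) {
    c->header &= ~CELL_FLAG_CONST;
    c->header |= CELL_FLAG_EXPLICITLY_MUTABLE;
}

// Simplified MUTABLE: just clear CONST.
void Mutable_New(Cell* c) { c->header &= ~CELL_FLAG_CONST; }

// The const wave an iterative construct like REPEAT applies to cells
// fetched from its body. The extra bit, if present, vetoes it.
void Apply_Const_Wave(Cell* c) {
    if (c->header & CELL_FLAG_EXPLICITLY_MUTABLE)
        return;
    c->header |= CELL_FLAG_CONST;
}

bool Is_Const(const Cell* c) { return (c->header & CELL_FLAG_CONST) != 0; }
```

With the old behavior the composed [a b c] survives REPEAT's wave as mutable; with only the single CONST bit, the wave wins and the APPEND would fail.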

But I Don't Like CELL_FLAG_EXPLICITLY_MUTABLE

There are other ways to get past the const wave, such as quoting the block:

block: [a b c]
eval compose:deep $() [repeat 2 [append '(block) <legal>]]

Or you could use APPEND MUTABLE:

block: [a b c]
eval compose:deep $() [repeat 2 [append mutable (block) <legal>]]

Admittedly, these require you to have some control over the code the block is being fed into, rather than simply putting a "magic bit" on the block to counteract the const wave.

But the semantics of this magic bit are nebulous. I would not have added such a thing if it hadn't been needed to work around the things I decided were bugs. It's not clear when (if ever) it should stop protecting the value, and that just opens a can of worms.

Cell flags are a pretty scarce resource, and giving one up for something that makes the code harder to understand than a simple CONST bit that's on or off is not a good investment.