Source Mutability--CONST and MUTABLE

One of the immediate incompatible changes people notice in Ren-C is the locking of source. They type append [a b] 'c and get back an error saying the series is "source or permanently locked".

It was motivated by a VERY good reason. To refresh your memory, here is an example of "objectively bad default behavior" the change was attempting to address:

symbol-name: function [symbol [word!]] [
     switch symbol [
         '+ ["plus"]
         '? ["question-mark"]
         ...
     ]
]
...
filename: append (symbol-name '+) ".dat"

This innocent-looking piece of code has a terrible bug. The string that SYMBOL-NAME returns lives in the body of the SWITCH, so the APPEND is actually mutating the string inside the SWITCH. Every subsequent call to SYMBOL-NAME will be affected.

I feel like it shouldn't be controversial to say it should not be this easy to write self-modifying code by accident. Something exactly like this caused a problem in the build system that took me hours to find.

If Rebol is supposed to be more than a toy, it needs answers for usage problems like this--where it is very notably more brittle than other languages.

I knew locking source permanently was heavy-handed...

...but I want the example above to cause an error, vs. silently modify the string in the body of the function. Locking source was the only option we had at the time, and I figured we'd be on a better path to figuring out an answer if we biased it to be an error. It also meant we got more testing of the places in the code which make modifications to check the existing flags. (PROTECT in R3-Alpha was quite half-baked...but no one noticed how buggy it was because it wasn't used much.)

I've suggested that we need a lighter form of lock...something that doesn't force all views of a series to be unchanging for all time, but lets different views of the same series be read-only or writable.

Meet CONST and MUTABLE

Just in time for Christmas, we have a pioneering new feature of values being able to be read only or not. You can flip the bit yourself with the CONST and MUTABLE functions:

>> data: mutable [a b c]
== [a b c]

>> data-readonly: const data
== [a b c]

>> append data-readonly 'd
** Access Error: value is CONST (see MUTABLE): [a b c]

>> append data 'd
== [a b c d]

>> data-readonly
== [a b c d]

So it really is different from locking a series. For instance: you can keep write access for yourself, while giving out const access to subroutines you don't want to be doing casual modifications.
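
For readers who think in C++ terms, here is a rough model of that transcript. It is only a sketch under assumptions: `Value`, `const_view`, `mutable_view`, `append` and `make_block` are names invented for illustration, and nothing here claims to mirror how Ren-C stores the bit. The point it tries to show is that the read-only flag sits on the individual reference, while the data is shared.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a "value" is a handle to shared series data plus a
// read-only bit that belongs to this handle only.
struct Value {
    std::shared_ptr<std::vector<std::string>> series;
    bool is_const;
};

Value make_block(std::vector<std::string> items) {
    Value v;
    v.series = std::make_shared<std::vector<std::string>>(std::move(items));
    v.is_const = false;
    return v;
}

// Stand-ins for CONST and MUTABLE: same series, different bit on the copy.
Value const_view(Value v)   { v.is_const = true;  return v; }
Value mutable_view(Value v) { v.is_const = false; return v; }

// A mutating operation checks the bit on the handle it was given.
void append(const Value& v, const std::string& item) {
    assert(v.series);  // the handle must refer to some series
    if (v.is_const)
        throw std::runtime_error("value is CONST (see MUTABLE)");
    v.series->push_back(item);
}
```

With `data = make_block({"a", "b", "c"})` and `ro = const_view(data)`, calling `append(ro, "d")` throws, while `append(data, "d")` succeeds and the new element is visible through `ro` as well.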

But the real win here is that the execution of code defaults to putting a wave of constness on any slots the evaluator fills from "literals"...be those blocks or strings. You see it catching the bug I introduced at the beginning of the post, of the string being changed inside the switch:

>> filename: append (symbol-name '+) ".dat"
** Access Error: value is CONST (see MUTABLE): "plus"

What's the difference between this and before?

Previously source was locked by LOADing and phases prior to a DO. Now, nothing gets const-ed unless it runs. It is the "wave of evaluation" that brings along the constness, and any literals seen are affected.

This means some things won't work that did before. For instance, while this will work:

 block: copy []
 append block <works>

You no longer get a pass on a block that's been put elsewhere (e.g. by a compose) if it looks literal by the time the evaluator sees it, e.g.

 block: copy []
 do compose [append (block) <fails>]

After the compose, all the evaluator sees is append [] <fails>. It doesn't care where that block came from--copied or not, it looks like you're trying to append to a literal. If you produce situations like this deliberately, you'll need to change them to:

 block: copy []
 do compose [append (mutable block) <fails>]

Or...

 block: copy []
 do compose [append mutable (block) <fails>]

Or...

 block: mutable copy []
 do compose [append (block) <fails>]

You could omit the copy from any of these, if you didn't want a copy.

With DO MUTABLE...emulate Rebol2/R3-Alpha/Red!

There's a secret weapon for compatibility, which is that the way the constness propagates is based on a combination of bits on the values, and on inheritance through the call stack.

A simple use of DO MUTABLE shows you can get away with old-style behavior:

>> do [append [1 2 3] 4]
** Access Error: value is CONST (see MUTABLE): [1 2 3]

>> do mutable [append [1 2 3] 4]
== [1 2 3 4]

But when execution happens under a mutable "evaluation wave", interpreted functions remember that fact.

>> do [newstyle: function [] [b: [1 2 3] append b 4]]

>> do mutable [oldstyle: function [] [b: [1 2 3] append b 4]]

>> newstyle
** Access Error: value is CONST (see MUTABLE): [1 2 3]

>> oldstyle
== [1 2 3 4]

So you can call a module written to Rebol2 conventions from Ren-C conventions. Moreover, even though the Rebol2-style module will have its series mutable by default, series you pass it from Ren-C code still get the protections:

>> do mutable [oldstyle: function [b] [clear b]]

>> oldstyle [1 2 3]  ; oldstyle called from newstyle context
** Access Error: value is CONST (see MUTABLE): [1 2 3]
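
One way to picture why the argument stays protected is a toy model in C++. This is a simplification built on assumptions, not a description of the real evaluator: `fetch_literal`, `try_clear` and `oldstyle_clear` are hypothetical names, and the "wave" is reduced to a single boolean. The idea sketched is that the bit is stamped onto a literal at the moment it is fetched, so a value arriving as an argument already carries whatever its caller's wave decided.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model, not Ren-C's actual data structures.
struct Value {
    std::shared_ptr<std::vector<int>> series;
    bool is_const;
};

Value make_literal(std::vector<int> items) {
    Value v;
    v.series = std::make_shared<std::vector<int>>(std::move(items));
    v.is_const = false;
    return v;
}

// What the evaluator does when it picks a literal out of running code:
// under a const wave the fetched value is marked const, under a mutable
// wave it is left alone.
Value fetch_literal(Value literal, bool wave_is_mutable) {
    if (!wave_is_mutable)
        literal.is_const = true;
    return literal;
}

// Mutators only look at the bit on the value they were handed.
bool try_clear(const Value& v) {
    assert(v.series);
    if (v.is_const)
        return false;
    v.series->clear();
    return true;
}

// Models `oldstyle: function [b] [clear b]` defined under DO MUTABLE. The
// body runs under a mutable wave, but `b` keeps the bit its caller set.
bool oldstyle_clear(const Value& b) {
    return try_clear(b);
}
```

Passing `fetch_literal(make_literal({1, 2, 3}), false)` to `oldstyle_clear` returns `false`: the literal was fetched by const-wave code, so the mutable-wave callee cannot clear it.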

This concept of explicit mutability can be applied anywhere, at the branch-level if you like.

>> condition: true

>> either condition (mutable [append [] <success>]) [append [] <fail>]

>> condition: false

>> either condition (mutable [append [] <success>]) [append [] <fail>]
** Access Error: value is CONST (see MUTABLE): []

What should be the default?

The SWITCH case I opened with shows why I absolutely think that constness-on-evaluation is the right choice. That's in addition to addressing the speedbump every new user has when they write loop 10 [block: [] ...] and expect block to be reinitialized each time through the loop.

I don't think saying MUTABLE [...] is much of a burden to get deep mutable access to a series when you mean that.

I feel it's better to teach good habits early on. But who knows. Certainly in the console mutability has been the status quo. If modules enforced it by default but the console didn't, would that be a good tradeoff...or just confusing?

Yes, I think it would be.

One question: if a function hands out const access to a value, is the receiver able to change it to a mutable value? Should this be possible?

There may be more options than just binary ones here, so it's likely best to get some experience. It could be that what's really wanted is a kind of "first wave" of evaluative mutability (top level of module that only runs once, top level of console when you're just entering data) and all people want is x: [a b c] append x 'd, but when it gets to loop 5 [data: [] ... append data ...] situations...or a function definition...they would be happy to have constness.

I really believe that inconsistency between the console and running scripts is a cost that should be weighed heavily. The console is kind of the place where you try things out and use as a sanity check when debugging. I feel the variation doesn't buy that much--when casual mistakes are so easy to make.

Perhaps there could be a difference between an explicit const (irrevocable on that value once applied) and an implicit one that the evaluator just put on by itself from a frame. That mechanic may not be too difficult.

But in their current incarnation, const and mutable are "suggestions" and there's no level of privilege escalation. If you want to lock something so no one can get write access on it, you have to LOCK it.

Locking is still necessary for things like using blocks for keys in MAP!, and something more lock-like is probably the only way to imagine safe multithreading.
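
The distinction between the two mechanisms can be sketched in C++ as two flags living in different places. This is a model under assumptions, with invented names (`Series`, `Value`, `lock_series`, `try_append`): the const bit belongs to one reference and can be flipped back by anyone holding it, while the lock belongs to the data and applies to every reference.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical model, not Ren-C's actual data structures.
struct Series {
    std::vector<int> items;
    bool locked;            // property of the data itself
};

struct Value {
    std::shared_ptr<Series> series;
    bool is_const;          // property of this one reference
};

Value make_block() {
    Value v;
    v.series = std::make_shared<Series>();
    v.series->locked = false;
    v.is_const = false;
    return v;
}

// CONST and MUTABLE stand-ins: nothing stops a receiver from flipping the bit.
Value const_view(Value v)   { v.is_const = true;  return v; }
Value mutable_view(Value v) { v.is_const = false; return v; }

// LOCK stand-in: marks the shared data, so every reference is affected.
void lock_series(const Value& v) { v.series->locked = true; }

bool try_append(const Value& v, int item) {
    assert(v.series);
    if (v.is_const || v.series->locked)
        return false;
    v.series->items.push_back(item);
    return true;
}
```

In this model a receiver of `const_view(data)` can get write access back with `mutable_view`, which matches the "suggestion" character described above. After `lock_series(data)`, no view can append, regardless of its const bit.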

I think this gets to the question of robustness of the language. If this helps Rebol get beyond the perception of being unserious for real development work, then I'm in favor as it seems like a worthy tradeoff. It would need to be documented/taught but I think the additional rigor would lead to better programming practices.

CONST and MUTABLE have been around for a while now, and I think the chosen balance has worked out rather well.

One historical problem point with these mutability features is that there were no compile-time checks to make sure code wasn't violating them. There were tons of cases of PROTECT bits not being honored, simply because there wasn't a check for mutability in some routine. The person hacking on the C code to REVERSE or SORT a series would have to explicitly remember to think that was a mutating operation and check the bit.

The obvious-sounding way to stop these problems from creeping in would be to leverage the const annotation in C and C++. All the routines that modified series would require the caller to have a non-const pointer in their hand...while routines that could be done on read-only series could take either a const or non-const pointer.
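
As a minimal illustration of that rule, assuming a stand-in `Series` type and two hypothetical routines (`series_len` and `reverse_series`, neither taken from the actual codebase), the language itself refuses to turn a const pointer into a non-const one without a cast:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a series; not the real internal structure.
struct Series {
    std::vector<int> items;
};

// Read-only routine: callable with either a const or a non-const pointer.
std::size_t series_len(const Series* s) {
    assert(s != nullptr);
    return s->items.size();
}

// Mutating routine: the caller must be holding a non-const pointer.
void reverse_series(Series* s) {
    assert(s != nullptr);
    std::reverse(s->items.begin(), s->items.end());
}

// The rule the compiler enforces: adding const is implicit, removing it is not.
static_assert(std::is_convertible<Series*, const Series*>::value,
              "a mutable pointer can be passed to a read-only routine");
static_assert(!std::is_convertible<const Series*, Series*>::value,
              "a const pointer cannot be passed to a mutating routine");
```

Given a `const Series* ro`, the call `series_len(ro)` compiles but `reverse_series(ro)` does not, so a forgotten mutability check shows up as a build error rather than a silent write.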

So consider the simple example of getting an element at a position in an array:

 REBVAL *ARR_AT(REBARR *array, REBLEN n)
     { ...lots of code... }

Historically this would take in a mutable REBARR array (the only kind there was) and give back a mutable REBVAL value. But what we want is for mutable arrays to give back mutable values, and const arrays to give const values. So we could simply create a wrapper that calls into the mutable implementation but reskins the result as const for const input:

 REBVAL *ARR_AT(REBARR *array, REBLEN n)
     { ...lots of code... }

 inline const REBVAL *ARR_AT(const REBARR *array, REBLEN n)
     { return ARR_AT(m_cast(REBARR*, array), n); }

There's just one problem... C doesn't support overloading. You can't have two functions with the same name and different signatures and have the compiler pick between them. There'd have to be two different names:

 REBVAL *MUT_ARR_AT(REBARR *array, REBLEN n)
   { ...lots of code... }

 inline const REBVAL *ARR_AT(const REBARR *array, REBLEN n)
   { return MUT_ARR_AT(m_cast(REBARR*, array), n); }

This might not seem like that big a deal, but the combinatorics add up. Because now you can't write a generic macro that speaks about array positions...you have to have macros with different names that call the differently named accessors. And consider there are lots of these routines (ARR_HEAD, ARR_TAIL, ARR_LAST... BIN_HEAD, BIN_TAIL... SER_DATA, etc. etc. etc.) It's pretty horrific when you start having this explode with MUT variations and MUT variations of everything that calls them.

I came up with a trick to get around it. Basically, the trick is to sacrifice some amount of const checking in C. First, define a macro for something that resolves to const in C but vaporizes in C++:

#ifdef __cplusplus
    #define const_if_c
#else
    #define const_if_c const
#endif

Then, define the functions like this:

REBVAL *ARR_AT(const_if_c REBARR *array, REBLEN n)
  { ...lots of code... }

#ifdef __cplusplus
    inline const REBVAL *ARR_AT(const REBARR *array, REBLEN n)
         { return ARR_AT(m_cast(REBARR*, array), n); }
#endif

So the C build will give you back a mutable value no matter whether your input array was const or not. But the C++ build only gives back const values for const input.
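
Here is a self-contained version of the trick that is meant to build as either C or C++. It is a sketch under assumptions: `Cell` and `Array` are toy stand-ins for REBVAL and REBARR, the function is spelled `arr_at`, and `const_cast` is used where the original has `m_cast`. The `static_assert` lines only exist in the C++ build and spell out which result type each kind of input produces.

```cpp
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
    #include <type_traits>
    #include <utility>
    #define const_if_c
#else
    #define const_if_c const
#endif

/* Hypothetical stand-ins for REBVAL and REBARR. */
typedef struct { int payload; } Cell;
typedef struct { Cell cells[4]; } Array;

/* One implementation for both languages. In C the parameter is const, so the
   explicit cast drops that from the result; in C++ the cast changes nothing. */
Cell *arr_at(const_if_c Array *array, size_t n) {
    assert(n < 4);
    return (Cell*)&array->cells[n];
}

#ifdef __cplusplus
    /* C++ only: const input selects this overload and yields a const result. */
    inline const Cell *arr_at(const Array *array, size_t n) {
        return arr_at(const_cast<Array*>(array), n);
    }

    static_assert(
        std::is_same<decltype(arr_at(std::declval<Array*>(), size_t(0))),
                     Cell*>::value,
        "mutable in, mutable out");
    static_assert(
        std::is_same<decltype(arr_at(std::declval<const Array*>(), size_t(0))),
                     const Cell*>::value,
        "const in, const out");
#endif
```

With `ro` declared as `const Array*`, a line like `arr_at(ro, 0)->payload = 1;` is accepted by a C compiler and rejected by a C++ compiler, which is the const checking being traded away in C.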

This makes systemic enforcement of mutability checking practical. If you're inside the implementation with a const array, string, or binary... you won't be able to make a call to a C routine that will mutate it. The only way you can get mutable arrays is through specific entry points that extract the array with a runtime check to make sure it's mutable.
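
The shape of such an entry point might look like the following. All names here (`Value`, `array_readonly`, `array_ensure_mutable`, `append_int`) are hypothetical and the types are toy stand-ins, so treat it as a sketch of the pattern rather than the core API: read access hands out a const pointer unconditionally, and the single accessor that returns a non-const pointer checks the bit first.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; not the real internal structures.
struct Array {
    std::vector<int> items;
};

struct Value {
    Array* array;
    bool is_const;   // the runtime bit set by CONST or by the evaluator
};

// Read access is always available, but only as a const pointer.
const Array* array_readonly(const Value& v) {
    assert(v.array != nullptr);
    return v.array;
}

// The one gate to a mutable pointer: it fails at runtime on const values.
Array* array_ensure_mutable(const Value& v) {
    assert(v.array != nullptr);
    if (v.is_const)
        throw std::runtime_error("value is CONST (see MUTABLE)");
    return v.array;
}

// A mutating routine takes a non-const pointer, so in a C++ build it can
// only be reached with something that came through the gate above.
void append_int(Array* a, int x) {
    a->items.push_back(x);
}
```

Because `append_int(array_readonly(v), 1)` does not compile, the only route to a mutation goes through `array_ensure_mutable`, where the runtime check lives.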

It's all in the implementation guts...so it only affects those using the core API, not libRebol. The only thing you need to do is make sure you at some point build the code with a C++ compiler, and it will tell you where any problems are.
