C++ Magic for LVALUE Checking...or Use Functions?

If you write traditional C like:

BINDING(cellA) = BINDING(cellB);

Regardless of what the BINDING() macro expands to, there's no way to do any sort of validation in any build (debug or otherwise). Whatever this becomes can only be a simple assignment.

But if you instead wrote:

Tweak_Binding(cellA, Binding_Of(cellB));

Then even if the intent is just a simple assignment in a release build, you could define that in a checked build to something that runs arbitrary code :

#if NO_RUNTIME_CHECKS
    #define Tweak_Binding(dest,src) \ 
        (dest)->extra.binding = (src)->extra.binding
#else
    #define Tweak_Binding(dest,src) \
        Tweak_Binding_With_Added_Checks((dest), (src))
#endif

But if you're willing to confine your extra checks to C++ builds, you can leave the source looking like BINDING(cellA) = BINDING(cellB)

This is because C++ has operator overloading, and you can create a wrapper class which "does the right thing", e.g. notices whether a BINDING(...) is used as a left hand side operator or right side operator.

Just Because I Can, Does That Mean I Should?

Most of my original rationale was that I didn't want the source to make it look like you were making a function call when you were not.

When you see something like Tweak_Binding(...) you don't know how many instructions that's going to be, or what side effects its going to have. But when you see a plain assignment (that you know compiles in C to something quite simple), then that tells you not much is going on.

However...I specifically came up with the term "Tweak" as a convention for functions like this that do very little. It's a pretty good name, and it's learnable to say "oh, if it says Tweak, that means it's as cheap as an assignment".

Here are some of the points to consider:

  1. Because things like BINDING() are doing weird magic allowing them to be on the left side of assignments, by convention I believe they must be in all caps. This is kind of noisy.

    • On the plus side, it can be brief, with the same term used for extracting and for tweaking. Since datatypes use the leading-caps with no underscore convention, plain Binding() looks like a parameterized datatype, so it would have to be Binding_Of() for the extractor
  2. Using C++ magic to get the RUNTIME_CHECKS means the C build won't have assertion-parity with the C build. Bugs that only happen in the C build (perhaps on platforms that only offer C) would thus be harder to find.

  3. The C++ operator overloading is going to be over the heads of those reading the code who only know C.

  4. In debug builds, the C++ compilers do not inline the operator overloading, constructors, etc. that are involved in making the weird objects that are behind the scenes making the trick work. That means using the C++ debugger to step into an expression like BINDING(cellA) = BINDING(cellB) takes an annoying number of step-in and step-out operations.

Here's The "Scary" Implementation

It really isn't rocket science, but it is something. And it's something that can't be compiled by plain C compilers, meaning you can't do DEBUG_CHECK_BINDING in C builds.

#if (! DEBUG_CHECK_BINDING)
    #define BINDING(cell) \
        *x_cast(Context**, m_cast(Node**, &(cell)->extra.node))
#else
    struct BindingHolder {
        Cell* & ref;

        BindingHolder(const Cell* const& ref)
            : ref (const_cast<Cell* &>(ref))
        {
            assert(Is_Bindable_Heart(Cell_Heart(ref)));
        }

        void operator=(Stub* right) {
            Assert_Cell_Writable(ref);
            ref->extra.node = right;
            Assert_Cell_Binding_Valid(ref);
        }
        void operator=(BindingHolder const& right) {
            Assert_Cell_Writable(ref);
            ref->extra.node = right.ref->extra.node;
            Assert_Cell_Binding_Valid(ref);
        }
        void operator=(nullptr_t) {
            Assert_Cell_Writable(ref);
            ref->extra.node = nullptr;
        }
        template<typename T>
        void operator=(Option(T) right) {
            Assert_Cell_Writable(ref);
            ref->extra.node = maybe right;
            Assert_Cell_Binding_Valid(ref);
        }

        Context* operator-> () const
          { return x_cast(Context*, ref->extra.node); }

        operator Context* () const
          { return x_cast(Context*, ref->extra.node); }
    };

    #define BINDING(cell) \
        BindingHolder{cell}
#endif

Looking at them Side-By-Side

BINDING(cellA) = BINDING(cellB);

Tweak_Binding(cellA, Binding_Of(cellB));

I do feel a pretty strong bias for the briefer notation...

But I can see the argument for not doing the "weird" thing when there's no increase in functionality... in fact, only losing functionality in C builds. (Most C++-isms are there to add something that would be fundamentally not possible in C, vs. just syntax sugar like this.)

The invention of the "Tweak" term does change my calculation a little bit here. Because if it's used only in these "single-assignment-equivalent" circumstances, you can comprehend it as a C assignment statement.

After a bit of contemplation, I decided this is the better path. When you look at the code in context, the all-caps BINDING is actually jarring.

I ended up choosing even longer names:

Tweak_Cell_Binding(x, Cell_Binding(y));

You actually get something for your typing there... you reinforce that what you're passing is a cell.

But the term "Tweak" is doing the heavy lifting here, as a term that you know when you see it is doing the equivalent of an assignment...with extra checking only in debug builds.

Plain inline functions are easily adapted to, and it's much easier to debug and instrument. Using "general trickery" may provide some brevity, but at the cost of not having your helper functions tailored to the specific fields you are working with.

I'll invoke one of my favorite phrases lately: "it's a false economy". There are actually several other changes that have made this kind of thing less necessary.

Here's Some More Code In This Vein Getting Dropped

The need for lengthy explanations is another reason for dropping it. Lots of explanation for something that doesn't do anything an inline function can't do better.

NodeHolder

//=//// NODE HOLDER C++ TEMPLATE //////////////////////////////////////////=//
//
// The NodeHolder is a neat trick which is used by accessors like LINK() and
// MISC() to be able to put type checking onto the extraction of a node
// subclass, while not causing errors if used as the left-hand side of an
// assignment (on a possibly uninitialized piece of data).  This means you
// don't need to have separate macros:
//
//    LINK(Property, s) = foo;
//    bar = LINK(Property, s);
//
// It simply puts the reference in a state of suspended animation until it
// knows if it's going to be used on the left hand side of an assignment or
// not.  If it's on the left, it accepts the assignment--type checked to the
// template parameter.  If it's on the right, it runs a validating cast of
// the template parameter type.

#if CPLUSPLUS_11
    template<typename T>
    struct NodeHolder {
        const Node* & ref;

        NodeHolder(const Node* const& ref)
            : ref (const_cast<const Node* &>(ref))
          {}

        void operator=(T &right)
          { ref = right; }
        void operator=(NodeHolder<T> const& right)
          { ref = right.ref; }
        void operator=(nullptr_t)
          { ref = nullptr; }

        T operator-> () const
          { return cast(T, m_cast(Node*, ref)); }

        operator T () const
          { return cast(T, m_cast(Node*, ref)); }
    };

  #if RUNTIME_CHECKS
    template<class T>
    INLINE void Corrupt_Pointer_If_Debug(NodeHolder<T> const& nh)
      { nh.ref = p_cast(Node*, cast(uintptr_t, 0xDECAFBAD)); }
  #endif
#endif

LINK, MISC, INODE Comments

//=//// LINK, MISC, and INODE HELPERS /////////////////////////////////////=//
//
// Every Stub Node has two generic platform-pointer-sized slots, called LINK
// and MISC, that can store arbitrary information.  How that is interpreted
// depends on the Flex subtype (its FLAVOR_XXX byte).  If a Stub isn't a
// Flex that uses its INFO bits, then it can use that space for another
// generic platform-pointer slot.
//
// Some of these slots hold other Node pointers that need to be GC marked.  But
// rather than a switch() statement based on subtype to decide what to mark
// or not, the GC is guided by generic flags in the Stub header called
// LINK_NODE_NEEDS_MARK, MISC_NODE_NEEDS_MARK, and INFO_NODE_NEEDS_MARK.
//
// Yet the link and misc actually mean different things for different subtypes.
// A FLAVOR_NONSYMBOL node's LINK points to a list that maps byte positions to
// UTF-8 codepoint boundaries.  But a FLAVOR_SYMBOL Flex uses the LINK for a
// pointer to another symbol's synonym.
//
// A C program could typically deal with this using a union, to name the same
// memory offset in different ways.  Here `link` would be a union {}:
//
//      BookmarkList* books = string.link.bookmarks;
//      string.link.bookmarks = books;
//
//      const Symbol* synonym = symbol.link.synonym;
//      symbol.link.synonym = synonym;
//
// The GC could then read a generic field like `stub->link.node` when doing
// its marking.  This would be fine in C so long as the types were compatible,
// it's called "type punning".
//
// But that's not legal in C++!
//
//  "It's undefined behavior to read from the member of the union that
//   wasn't most recently written."
//
//  https://en.cppreference.com/w/cpp/language/union
//
// We use a workaround that brings in some heavy checked build benefits.  The
// LINK() and MISC() macros force all assignments and reads through a common
// field.  e.g. the following assigns and reads the same field ("node"), but
// the instances document it is for "bookmarks" or "synonym":
//
//      BookmarkList* books = LINK(Bookmarks, string);  // reads `node`
//      LINK(Bookmarks, string) = books;
//
//      const Symbol* synonym = LINK(Synonym, symbol);  // also reads `node`
//      LINK(Synonym, symbol) = synonym;
//
// The syntax is *almost* as readable, but throws in benefits of offering some
// helpful debug runtime checks that you're accessing what the Flex holds.
// It has yet another advantage because it allows new "members" to be "added"
// by extension code that wouldn't be able to edit a union in a core header.
//
// To use the LINK(), MISC() or INODE(), you define two macros, like this:
//
//      #define LINK_Bookmarks_TYPE     BookmarkList*
//      #define HAS_LINK_Bookmarks      FLAVOR_NONSYMBOL
//
// You get the desired properties of being easy to find cases of a particular
// interpretation of the field, along with type checking on the assignment,
// and a cast operation that does potentially heavy debug checks on the
// extraction.
//

LINK, MISC, INODE Implementation

#if NO_RUNTIME_CHECKS || NO_CPLUSPLUS_11
    #define LINK(Field,stub) \
        *x_cast(LINK_##Field##_TYPE*, m_cast(Node**, &(stub)->link.node))

    #define MISC(Field,stub) \
        *x_cast(MISC_##Field##_TYPE*, m_cast(Node**, &(stub)->misc.node))

    #define INODE(Field,stub) \
        *x_cast(INODE_##Field##_TYPE*, m_cast(Node**, &(stub)->info.node))
#else
    #define LINK(Field,stub) \
        NodeHolder<LINK_##Field##_TYPE>( \
            ensure_flavor(HAS_LINK_##Field, (stub))->link.node)

    #define MISC(Field,stub) \
        NodeHolder<MISC_##Field##_TYPE>( \
            ensure_flavor(HAS_MISC_##Field, (stub))->misc.node)

    #define INODE(Field,stub) \
        NodeHolder<INODE_##Field##_TYPE>( \
            ensure_flavor(HAS_INODE_##Field, (stub))->info.node)
#endif