Antiform Safety: Covariance and Contravariance

hostilefork · November 11, 2024, 9:49am

Grafting antiforms onto a C codebase that had no such concept is fraught with disasters.

The semantics of unstable antiforms is particularly risky. If you have a test like Is_Block(cell) that tells you whether a cell holds a BLOCK!, then what if that cell holds an antiform block? Usermode code has the benefit of decay-by-default (unless you take a meta-parameter). So if you ask BLOCK? on a parameter pack, it will decay to its first item and answer based on that. The C implementation has no such "automatic" behavior.

Even "worse", what if the cell contains an antiform error, and you quietly say "no it's not a block" and proceed, ignoring situations where that should have raised an abrupt failure?

Creating A Type Hierarchy: Atom -> Value -> Element

I've given names to the three broad categories of cells:

ELEMENT - anything that you can put in a List. So this is "element" as in "array element". Hence, no antiforms. (The name doesn't map well onto "chemical element", which would suggest the abstract form that comes in isotopes, so think of it as array element.)
VALUE - anything that you can put in a Variable. So it extends ELEMENT with stable antiforms.
ATOM - anything, including unstable antiforms.

Systemically, we want to stop antiforms from being put into the array elements of blocks, groups, paths, and tuples. We also want to prevent unstable antiforms from being the values of variables.

To make it easier to do this, the C++ build defines distinct types: Element that can't hold any antiforms, Value that can hold stable antiforms, and Atom that can hold anything--including unstable antiforms.

Class Hierarchy: Atom as base, Value derived, Element derived (upside-down for compile-time error preferences--we want passing an Atom to a routine that expects only Element to fail)
Primary Goal: Prevent passing Atoms/Values to Element-only routines, or Atoms to Value-only routines.
Secondary Goal: Prevent things like passing Element cells to writing routines that may potentially produce antiforms in that cell.
Tertiary Goal: Detect things like superfluous Is_Antiform() calls being made on Elements.

The primary goal is achieved by choosing Element as a most-derived type instead of a base type.
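As a rough illustration of why the upside-down hierarchy gives the primary goal essentially for free, here is a minimal sketch. The struct bodies, the `payload` field, and the `Takes_Element()` / `Takes_Atom()` routines are placeholders invented for this example, not the real cell definitions; only the Atom -> Value -> Element derivation order comes from the description above.

```cpp
#include <cassert>
#include <type_traits>

// Placeholder cell types (assumption: the real cells carry far more state).
struct Atom { int payload; };   // may hold anything, incl. unstable antiforms
struct Value : Atom {};         // stable antiforms allowed
struct Element : Value {};      // no antiforms at all

// Hypothetical routines, named only for this illustration.
inline bool Takes_Element(const Element* e) { assert(e != nullptr); return true; }
inline bool Takes_Atom(const Atom* a) { assert(a != nullptr); return true; }

// Derived-to-base pointer conversions are implicit, so the more constrained
// type can be passed wherever the looser one is expected...
static_assert(std::is_convertible<Element*, Value*>::value, "Element -> Value");
static_assert(std::is_convertible<Element*, Atom*>::value, "Element -> Atom");
static_assert(std::is_convertible<Value*, Atom*>::value, "Value -> Atom");

// ...while base-to-derived conversions are rejected at compile time.
static_assert(!std::is_convertible<Atom*, Element*>::value, "Atom -/-> Element");
static_assert(!std::is_convertible<Atom*, Value*>::value, "Atom -/-> Value");
static_assert(!std::is_convertible<Value*, Element*>::value, "Value -/-> Element");

inline bool Demo_Hierarchy() {
    Element e{};
    // Atom a{};
    // Takes_Element(&a);  // would not compile: Atom* is not an Element*
    return Takes_Atom(&e) && Takes_Element(&e);
}
```

Uncommenting the two lines in `Demo_Hierarchy()` should produce a compile error, which is exactly the failure the primary goal asks for.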

The next two goals are somewhat maddeningly trickier...

`Sink(...)` and `Need(...)`

The idea behind a Sink() is to be able to mark on a function's interface when a function argument passed by pointer is intended as an output.

This has benefits of documentation, and can also be given some teeth by scrambling the memory that the pointer points at (so long as it isn't an "in-out" parameter). But it is also applied under CHECK_CELL_SUBCLASSES, where it enforces "covariance" for input parameters and "contravariance" for output parameters.

If USE_CELL_SUBCLASSES is enabled, then the inheritance hierarchy has Atom at the base, with Element as the most-derived type. Since what Elements can contain is more constrained than what Atoms can contain, this means you can pass an Element* where an Atom* is expected, but not vice versa.

However, when you have a Sink(Element) parameter instead of an Element*, the checking needs to be reversed. You are -writing- an Element, so the caller can pass an Atom* as the destination and it will be okay. But if you were writing an Atom, then passing an Element* would not be okay, as after the initialization the Element could hold invalid states.

We use "SFINAE" to selectively enable the upside-down hierarchy, based on the std::is_base_of<> type trait.
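Before the full wrapper, a stripped-down sketch of just the reversed check may help. Everything here is an assumption made for illustration: `SinkSketch`, `Init_Integer_Sketch()`, and the `payload` field are invented names, and the pointer is stored as `Atom*` so the sketch never has to downcast (the real wrapper stores a `T*`). It has no corruption and no null handling.

```cpp
#include <cassert>
#include <type_traits>

// Placeholder cell types (the `payload` field is invented for this sketch).
struct Atom { int payload; };
struct Value : Atom {};
struct Element : Value {};

// Hypothetical cut-down sink: only the contravariant constructor check.
template<typename T>
struct SinkSketch {
    Atom* p;  // kept as the base type here; the real wrapper stores a T*

    // Contravariance: accept U* only when U is T or a *base* of T.
    template<
        typename U,
        typename std::enable_if<
            std::is_same<U, T>::value || std::is_base_of<U, T>::value
        >::type* = nullptr
    >
    SinkSketch(U* u) : p(u) {}
};

// A routine that writes an Element may target any looser kind of cell.
static_assert(std::is_convertible<Element*, SinkSketch<Element>>::value, "");
static_assert(std::is_convertible<Value*, SinkSketch<Element>>::value, "");
static_assert(std::is_convertible<Atom*, SinkSketch<Element>>::value, "");

// A routine that writes an Atom must not target a more constrained cell.
static_assert(!std::is_convertible<Element*, SinkSketch<Atom>>::value, "");
static_assert(!std::is_convertible<Value*, SinkSketch<Atom>>::value, "");

// Hypothetical initializer that only ever produces Element-legal content.
inline void Init_Integer_Sketch(SinkSketch<Element> out, int i) {
    assert(out.p != nullptr);
    out.p->payload = i;
}

inline int Demo_Sink() {
    Atom a{0};
    Init_Integer_Sketch(&a, 42);  // Atom* accepted as an Element sink
    return a.payload;
}
```

The `static_assert` lines are the whole point: when the `enable_if` condition is false the converting constructor drops out of overload resolution, so the bad call is rejected at compile time rather than at runtime.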

The Code (in the C++ Debug Build)

template<typename T, bool sink>
struct NeedWrapper {
    T* p;
    mutable bool corruption_pending;  // can't corrupt on construct

  //=//// TYPE ALIASES ////////////////////////////////////////////////=//

    using MT = typename std::remove_const<T>::type;

    template<typename U>  // contravariance
    using IsReverseInheritable = typename std::enable_if<
        std::is_same<U,T>::value or std::is_base_of<U,T>::value
    >::type;

  //=//// CONSTRUCTORS ////////////////////////////////////////////////=//

    NeedWrapper() = default;  // or MSVC warns making Option(Sink(Value))

    NeedWrapper(nullptr_t) {
        p = nullptr;
        corruption_pending = false;
    }

    NeedWrapper (const NeedWrapper<T,sink>& other) {
        p = other.p;
        corruption_pending = p and (other.corruption_pending or sink);
        other.corruption_pending = false;
    }

    template<typename U, IsReverseInheritable<U>* = nullptr>
    NeedWrapper(U* u) {
        p = u_cast(T*, u);
        corruption_pending = p and sink;
    }

    template<typename U, bool B, IsReverseInheritable<U>* = nullptr>
    NeedWrapper(const NeedWrapper<U, B>& other) {
        p = u_cast(T*, other.p);
        corruption_pending = p and (other.corruption_pending or sink);
        other.corruption_pending = false;
    }

  //=//// ASSIGNMENT //////////////////////////////////////////////////=//

    NeedWrapper& operator=(nullptr_t) {
        p = nullptr;
        corruption_pending = false;
        return *this;
    }

    NeedWrapper& operator=(const NeedWrapper<T,sink>& other) {
        if (this != &other) {  // self-assignment possible
            p = other.p;
            corruption_pending = p and (other.corruption_pending or sink);
            other.corruption_pending = false;
        }
        return *this;
    }

    template<typename U, bool B, IsReverseInheritable<U>* = nullptr>
    NeedWrapper& operator=(const NeedWrapper<U, B>& other) {
        // U and B must be deducible from the argument (as in the converting
        // constructor); the same-type case is handled by the overload above
        p = u_cast(T*, other.p);
        corruption_pending = p and (other.corruption_pending or sink);
        other.corruption_pending = false;
        return *this;
    }

    template<typename U, IsReverseInheritable<U>* = nullptr>
    NeedWrapper& operator=(U* other) {
        p = u_cast(T*, other);
        corruption_pending = p and sink;
        return *this;
    }

  //=//// OPERATORS ///////////////////////////////////////////////////=//

    operator bool () const { return p != nullptr; }

    operator T* () const {
        if (corruption_pending) {
            Corrupt_If_Debug(*const_cast<MT*>(p));
            corruption_pending = false;
        }
        return p;
    }

    T* operator->() const {
        if (corruption_pending) {
            Corrupt_If_Debug(*const_cast<MT*>(p));
            corruption_pending = false;
        }
        return p;
    }

  //=//// DESTRUCTOR //////////////////////////////////////////////////=//

    ~NeedWrapper() {
        if (corruption_pending)
            Corrupt_If_Debug(*const_cast<MT*>(p));
    }
};

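So the corrupting Sink(...), and the non-corrupting Need(...) for in/out parameters, both with contravariance checking, are defined as follows (note that Sink() takes the pointed-to type, while Need() takes the pointer type):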

#define Sink(T) \
    NeedWrapper<T, true>

#define Need(TP) \
    NeedWrapper<typename std::remove_pointer<TP>::type, false>

Notes on Corrupting

The original implementation was simpler, by just doing the corruption at the moment of construction.

But this faced a problem:

    bool some_function(Sink(char*) out, char* in) { ... }

    if (some_function(&ptr, ptr)) { ...}

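If you corrupt the data at the address the sink points to at the moment the wrapper is constructed, you can corrupt a variable that is also being passed by value as another argument, before that other argument has been evaluated (C++ doesn't specify the order in which arguments are evaluated). So the corruption has to be deferred until after construction: a corruption_pending flag is set, handed off whenever the wrapper is copied or converted, and acted on when the pointer is first extracted or the wrapper is destroyed.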

(While this could be factored, function calls aren't inlined in the debug build, so given the simplicity of the code, it's repeated.)
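To make the hazard concrete, here is a small sketch contrasting an eager sink with a deferred one. It is specialized to `int` for brevity, and `EagerSink`, `DeferredSink`, `Add_One()`, and `kCorruptPattern` are all names invented for this example; the real code corrupts cells via `Corrupt_If_Debug()`.

```cpp
#include <cassert>

const int kCorruptPattern = 0x0BADF00D;  // arbitrary garbage for this sketch

// Eager version: scrambles the target as soon as the wrapper is built.
struct EagerSink {
    int* p;
    EagerSink(int* ptr) : p(ptr) { if (p) *p = kCorruptPattern; }
    operator int*() const { return p; }
};

// Deferred version: remembers that corruption is owed, pays it on first use.
struct DeferredSink {
    int* p;
    mutable bool corruption_pending;

    DeferredSink(int* ptr) : p(ptr), corruption_pending(ptr != nullptr) {}

    DeferredSink(const DeferredSink& other)
      : p(other.p), corruption_pending(other.corruption_pending)
    {
        other.corruption_pending = false;  // hand off the obligation
    }

    operator int*() const {
        if (corruption_pending) {
            *p = kCorruptPattern;
            corruption_pending = false;
        }
        return p;
    }

    ~DeferredSink() {
        if (corruption_pending)
            *p = kCorruptPattern;
    }
};

// Writes in + 1 to the output and returns the input it actually received.
inline int Add_One(DeferredSink out, int in) {
    int* raw = out;  // corruption happens here, after `in` was evaluated
    assert(raw != nullptr);
    *raw = in + 1;
    return in;
}

// If the sink argument happens to be built first, an eager sink has already
// destroyed the value before it could be read as the second argument.
inline int Demo_Eager() {
    int x = 7;
    EagerSink s(&x);
    (void)s;
    return x;  // kCorruptPattern, not 7
}

inline bool Demo_Deferred() {
    int x = 7;
    int seen = Add_One(&x, x);  // safe whichever argument is evaluated first
    return seen == 7 && x == 8;
}
```

`Demo_Eager()` builds the sink explicitly before reading `x` to stand in for the unlucky argument order, since which order a given compiler picks is not something the sketch can rely on.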

hostilefork · November 11, 2024, 10:00am

template<typename U>
using IsReverseInheritable = typename std::enable_if<
    std::is_same<U,T>::value or std::is_base_of<U,T>::value
>::type;

Because this was such a PITA, I looked around for contravariance precedent in C++ and sort of... couldn't find any.

Claude.AI's Thoughts on Why

I think the reason there isn't more prior art for this pattern in C++ specifically is that C++'s type system historically encouraged a different way of thinking about inheritance. The standard pattern was that derived classes would extend base classes with new capabilities, rather than constrain them with narrower invariants.

Consider a typical C++ example:
class Animal {
public:
    virtual ~Animal() = default;
    virtual void makeSound() = 0;
};

class Dog : public Animal {
public:
    void makeSound() override { /* bark */ }
    void fetch();  // adds new capability
};
In this common pattern, Dog is "bigger" than Animal - it can do everything an Animal can do plus more. So the traditional covariant rules make sense: you can pass a Dog* anywhere that takes an Animal* because a Dog can fulfill all Animal contracts.

Your pattern is more in line with refinement types or subtyping based on invariants, where:
class Number { /* can be any number */ };
class PositiveNumber : public Number { /* invariant: must be > 0 */ };
Here PositiveNumber is actually "smaller" in terms of its valid state space. When initializing/writing, you want the reverse inheritance rule because you can't safely write an arbitrary Number into a PositiveNumber slot.

This kind of type refinement through inheritance is less common in C++, perhaps because:

C++ lacks built-in support for invariant checking

The language encourages thinking about inheritance in terms of interface/capability extension

Many C++ codebases use inheritance primarily for runtime polymorphism rather than compile-time type constraints

Your pattern seems more aligned with how some functional languages handle subtypes and refinement types. The fact that you're using it in a debug-time checking tool is interesting - it's bringing some of those functional programming type safety concepts into C++ through the back door.

So C++ Can Do It, But It's Just Painful

But... now it's written, and it works. Using it is easy enough, and catches usage errors and misunderstandings on a daily basis.

I'd be interested to see how this would be implemented in a functional language, but I'm not going to be the one to do it. Maybe some bored AI can do it someday.

In any case, it's tech and intelligence like this that puts Ren-C far beyond the competition.

And it's the fact that it still builds as C (with the TinyC compiler, no less) that keeps it true to Rebol's roots.

In the C build, the definitions are simply:

#if NO_CPLUSPLUS_11
    typedef RebolValue Atom;
    typedef RebolValue Value;
    typedef RebolValue Element;

    #define Sink(T) T *
    #define Need(TP) TP
#endif

I am starting to suspect this is the only codebase of its kind.