Representing Everything About A Parameter (...except its name)

hostilefork · January 6, 2021, 6:29pm

I'm aiming to pull the string labels in parameter lists out into their own, more compact form: just a pointer to the string name.

One way to think about splitting out the symbol is as if the decorations we currently put on the parameter were moved onto its block:

 func ['foo [<end> word!] /bar [integer!] /no-arg] [...]
 =>
 func [foo '[<end> word!] bar /[integer!] no-arg /[]] [...]

That's just a way to think about how it's stored. We don't have to write our specs like that...

There are a number of good reasons to do this, e.g. implementing "hidden classes" as in V8. Each time you do for-each [x y] ... or make object! [a: 10, b: 20], the system should detect that it has seen the same collection of keys before and reuse it, instead of allocating unique copies of the [x y] and [a b] lists every time.
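
As a rough illustration of the "hidden classes" idea, here is a minimal C sketch (Ren-C's core is written in C) of sharing key lists. It assumes symbols are interned, so equal spellings have equal pointers. The names KeyList and intern_keylist, and the fixed-size cache standing in for a hash table, are hypothetical simplifications, not Ren-C's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: symbols are interned, so equal spellings share one pointer. */
typedef const char *Symbol;

typedef struct {
    size_t len;
    Symbol keys[];  /* flexible array member holding the key sequence */
} KeyList;

#define MAX_KEYLISTS 64

static KeyList *keylist_cache[MAX_KEYLISTS];
static size_t num_keylists = 0;

static bool same_keys(const KeyList *kl, const Symbol *keys, size_t len) {
    if (kl->len != len)
        return false;
    for (size_t i = 0; i < len; ++i)
        if (kl->keys[i] != keys[i])  /* pointer compare: symbols are interned */
            return false;
    return true;
}

/* Return the shared KeyList for this key sequence, allocating only the
 * first time the sequence is seen. Returns NULL if out of room. */
static KeyList *intern_keylist(const Symbol *keys, size_t len) {
    for (size_t i = 0; i < num_keylists; ++i)
        if (same_keys(keylist_cache[i], keys, len))
            return keylist_cache[i];  /* same "hidden class": reuse it */

    if (num_keylists == MAX_KEYLISTS)
        return NULL;
    KeyList *kl = malloc(sizeof *kl + len * sizeof(Symbol));
    if (kl == NULL)
        return NULL;
    kl->len = len;
    for (size_t i = 0; i < len; ++i)
        kl->keys[i] = keys[i];
    keylist_cache[num_keylists++] = kl;
    return kl;
}
```

Two loops over [x y] would then point at one shared key list, and only the per-instance values would need fresh storage.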

(I'll point out that consistent chipping away at inefficiencies means that decorated forms like /[a b], [a b]/, [a b]. or .[a b] do not take up more space than a plain [a b] ... though these forms are immutable... keep that in mind...)

This reduces how much information a "PARAM!" stores

Currently all the information for a parameter--including the symbol, types, and other modes--is stuffed into an internal Frankenstein-like type called a PARAM!. It's compressed into a single cell as a mishmash of packed bits and a pointer to a spelling.

(Historical Note: R3-Alpha acted like these freakish cells were WORD!s...but with an off-to-the-side flag that marked them as "UNWORD"s. This meant they stored a bunch of type bits where most words would store a binding. These fake words could easily leak and crash the system, so Ren-C gave parameters a dedicated internal type, with assertions that catch any attempt to use one as if it were a WORD!.)

The symbol takes one of the four slots in the PARAM!, and the cell header takes another. So what's left in the remaining two slots is just a bunch of bits... On a 32-bit platform those two slots add up to 64 bits, so 64 bits is what's available on both 32-bit and 64-bit platforms.
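
To make the slot arithmetic concrete, here is a C sketch of a four-slot cell. The struct name, field names, and field order are hypothetical and only mirror the description above (header, spelling, two spare slots); Ren-C's real cell definition differs in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical four-slot cell; field names and order are illustrative. */
typedef struct {
    uintptr_t header;      /* slot 1: cell header (type and flags) */
    const char *spelling;  /* slot 2: pointer to the parameter's name */
    uintptr_t extra;       /* slot 3: free for bits */
    uintptr_t payload;     /* slot 4: free for bits */
} ParamCell;

/* Spare bits in slots 3 and 4: 64 on 32-bit builds, 128 on 64-bit builds. */
#define SPARE_BITS (2 * sizeof(uintptr_t) * CHAR_BIT)

_Static_assert(SPARE_BITS >= 64, "two spare slots must hold 64 bits");

/* Use only 64 of the spare bits so the usable amount is the same on every
 * platform: low half in `extra`, high half in `payload`. */
static void param_set_bits(ParamCell *c, uint64_t bits) {
    c->extra = (uintptr_t)(bits & 0xFFFFFFFFu);
    c->payload = (uintptr_t)(bits >> 32);
}

static uint64_t param_get_bits(const ParamCell *c) {
    return ((uint64_t)c->payload << 32) | (uint64_t)c->extra;
}
```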

There's a bit for whether or not each fundamental type--like BLOCK! or TEXT!--is accepted by the parameter. Then there are bits for "is this parameter <skip>-able" or "can this parameter be the <end> of input". This means the number of fundamental types has had to stay below 64...since the other parameter options have to fit in the same bits.
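
A C sketch of how type bits and option flags compete for the same 64-bit word. The type numbering and flag positions are hypothetical; the point is that each option flag taken from the word lowers the ceiling on how many fundamental types can exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fundamental type numbers (Ren-C's real list differs). */
enum Kind { KIND_INTEGER, KIND_TEXT, KIND_WORD, KIND_BLOCK, KIND_MAX };

/* Option flags are taken from the top of the same 64-bit word. */
#define FLAG_ENDABLE   ((uint64_t)1 << 63)  /* <end> */
#define FLAG_SKIPPABLE ((uint64_t)1 << 62)  /* <skip> */
#define FIRST_FLAG_BIT 62

/* Every flag added above shrinks the room left for type bits. */
_Static_assert(KIND_MAX <= FIRST_FLAG_BIT, "type bits collide with flags");

#define KIND_BIT(k) ((uint64_t)1 << (k))

/* Checking a fundamental type is a single AND against the word. */
static bool param_accepts(uint64_t bits, enum Kind k) {
    return (bits & KIND_BIT(k)) != 0;
}
```

For example, 'foo [<end> word!] would be encoded as KIND_BIT(KIND_WORD) | FLAG_ENDABLE.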

Can PARAM! be replaced with "normal" values?

What if params were not a mysterious compressed form, but values that could be inspected more directly as a "parameter spec"?

I showed this "representational concept" above (again, it's just a way to think about how the system stores it, not how you write it at source level):

 func [foo '[<end> word!] bar /[integer!] no-arg /[]] [...]

Once the burden of representing the parameter name is removed, '[<end> word!] could itself be the value that tells the evaluator how to deal with the foo parameter. Today HELP gets a copy of these typeset blocks from the spec purely for reference. Under this scheme, what HELP used would concretely match what was in the spec.

Challenge: Mutability of Referenced Type Words

Let's imagine you did this:

>> foo: func [name [text!]] [print [name]]
>> foo "before"
before

>> text!: integer!
>> foo "after"
after  ; did not reflect the change

Today it would not reflect the change. That's because when the TEXT! word is looked up, it finds the built-in DATATYPE! for representing text, and the corresponding bit gets set in the PARAM!. No matter how you change TEXT!, it will only affect functions created after that point...not any that already set their bits.
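
The snapshot behavior can be modeled in a few lines of C. This is a hypothetical sketch: text_bang stands in for the variable TEXT! is bound to, and make_param_from_word plays the role of function creation resolving the spec once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum Kind { KIND_INTEGER, KIND_TEXT };
#define KIND_BIT(k) ((uint64_t)1 << (k))

/* Hypothetical stand-in for the variable the word TEXT! is bound to. */
static enum Kind text_bang = KIND_TEXT;

typedef struct { uint64_t accepted; } Param;

/* The word is looked up once, when the function is made; only the
 * resulting bit is kept, not the word itself. */
static Param make_param_from_word(const enum Kind *word_binding) {
    Param p = { KIND_BIT(*word_binding) };
    return p;
}

static bool param_check(Param p, enum Kind arg_kind) {
    return (p.accepted & KIND_BIT(arg_kind)) != 0;
}
```

After text_bang is reassigned to KIND_INTEGER, a Param made earlier still accepts text, while one made afterward accepts integers, which matches the transcript above.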

We could address this by saying that if you use any WORD!s in the type spec of a function, then the binding of that word gets forcibly protected (if it isn't already).

>> x!: text!

>> foo: func [name [x!]] [print [name]]

>> x!: integer!
** Access Error: variable x! locked
; ^-- it would be helpful to mention a "lock reason", though we are a bit short
; on bits for putting in these reasons...maybe only done in 64-bit builds?

This would give persistence, so the parameter description could store x! without worrying about its meaning changing. That means you can do things like type-check a parameter for a specialization at specialization time...and trust it doesn't need rechecking when used. It also allows performance tricks that cache bits to make the check faster without looking up the word every time (since you know it won't change).
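
A C sketch of the proposed rule: making a parameter locks the variable its type word refers to, which is what makes caching the resolved bits safe. Variable, make_param, and set_variable are hypothetical names, and set_variable returns false where the real system would raise the "variable x! locked" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum Kind { KIND_INTEGER, KIND_TEXT };
#define KIND_BIT(k) ((uint64_t)1 << (k))

/* Hypothetical variable with a lock flag, standing in for a binding. */
typedef struct {
    enum Kind value;
    bool locked;
} Variable;

typedef struct {
    const Variable *type_word;  /* what the spec says, e.g. x! */
    uint64_t cached_bits;       /* safe to cache: type_word can't change */
} Param;

static Param make_param(Variable *type_word) {
    type_word->locked = true;  /* forcibly protect the binding */
    Param p = { type_word, KIND_BIT(type_word->value) };
    return p;
}

/* Returns false instead of raising "variable x! locked". */
static bool set_variable(Variable *v, enum Kind new_value) {
    if (v->locked)
        return false;
    v->value = new_value;
    return true;
}

/* Uses the cached bits; no word lookup is needed on each call. */
static bool param_check(const Param *p, enum Kind arg_kind) {
    return (p->cached_bits & KIND_BIT(arg_kind)) != 0;
}
```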

It's a little harsh-seeming, but the type dialect has to be hardened somehow. If you needed to use X! locally for something else, you've always got use [x!] [...] to create a new context for it.

Challenge: Performance

Checking a bit for a fundamental type in a typeset is pretty fast. Matching a value against a rich type specification dialect isn't necessarily fast, and this is something every function (including natives) does. It's particularly important for natives, because they interpret the bits of the cell on the assumption that it has been checked...getting the wrong type means a crash.

This is where internal compactions could come into play. Users might see the parameter spec as [text! integer!], but the system could recognize common patterns like that and compress them behind the scenes into something like today's PARAM! bits.
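
One way such a compaction could look, sketched in C under hypothetical names: when every item in the spec is a plain fundamental type, the spec compiles down to a bitmask checked with one AND; otherwise it falls back to a general matcher that walks the items, including predicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum Kind { KIND_INTEGER, KIND_TEXT, KIND_BLOCK };
#define KIND_BIT(k) ((uint64_t)1 << (k))

typedef struct Value Value;
typedef bool (*Predicate)(const Value *);

struct Value { enum Kind kind; size_t length; };

/* Hypothetical spec entry: a plain fundamental type, or a predicate. */
typedef struct {
    bool is_kind;  /* true: use `kind`; false: use `pred` */
    enum Kind kind;
    Predicate pred;
} SpecItem;

typedef struct {
    bool fast;      /* every item was a plain kind */
    uint64_t bits;  /* valid only when fast */
    const SpecItem *items;
    size_t count;
} CompiledSpec;

/* Example predicate (hypothetical): a BLOCK! of length 2. */
static bool is_pair_block(const Value *v) {
    return v->kind == KIND_BLOCK && v->length == 2;
}

static CompiledSpec compile_spec(const SpecItem *items, size_t count) {
    CompiledSpec c = { true, 0, items, count };
    for (size_t i = 0; i < count; ++i) {
        if (items[i].is_kind)
            c.bits |= KIND_BIT(items[i].kind);
        else
            c.fast = false;  /* needs the general matcher */
    }
    return c;
}

static bool spec_matches(const CompiledSpec *c, const Value *v) {
    if (c->fast)
        return (c->bits & KIND_BIT(v->kind)) != 0;  /* PARAM!-style check */
    for (size_t i = 0; i < c->count; ++i) {
        const SpecItem *it = &c->items[i];
        if (it->is_kind ? it->kind == v->kind : it->pred(v))
            return true;
    }
    return false;
}
```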

We might want to rethink using TAG! for things like <end>, and use a type that can be interned for speed instead. The problem with tags is that they're series with a position: you could see <end> but actually have next <mend>...so a process of locking down the symbol for speedy recognition wouldn't work. By contrast, ISSUE! (token) has no position and could be canonized to a word, so the checking process could turn #end into something that is matched faster.
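
A C sketch contrasting the two cases. The Tag struct is a hypothetical model of a series-with-position, which has to be compared by text on every check; the interned token reduces "is this #end?" to a pointer comparison. The linear symbol table is a simplification of a real interning hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table: each distinct spelling is stored once. */
#define MAX_SYMBOLS 128

static const char *symbol_table[MAX_SYMBOLS];
static size_t num_symbols = 0;

/* Caller must keep `spelling` alive (string literals are fine here). */
static const char *intern(const char *spelling) {
    for (size_t i = 0; i < num_symbols; ++i)
        if (strcmp(symbol_table[i], spelling) == 0)
            return symbol_table[i];
    if (num_symbols == MAX_SYMBOLS)
        return NULL;
    return symbol_table[num_symbols++] = spelling;
}

/* A TAG!-like value: data plus an index, so what it shows can differ
 * from its underlying data ("next <mend>" shows as <end>). */
typedef struct { const char *data; size_t index; } Tag;

static bool tag_is_end_slow(Tag t) {
    return strcmp(t.data + t.index, "end") == 0;  /* must scan the text */
}

static bool token_is_end(const char *sym, const char *end_sym) {
    return sym == end_sym;  /* one pointer comparison */
}
```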

Challenge: Mutability Part II - Type Predicates

I've suggested essentially ending the TYPESET! datatype as a concept, replacing it with functions. This would mean something like:

any-type!: :any-type?

This would have the same issue with locking, so once you used ANY-TYPE! in a function spec you couldn't change that particular binding's value of ANY-TYPE! to anything else.

But further, there needs to be a rule that ANY-TYPE? is a pure function. It needs to give the same answers for the same input, and that answer cannot depend on anything about that input that can mutate.

Imagine that you specialized a function with a mutable BLOCK!, and the constraint was that it was a BLOCK! of length 2. Then you append to the block, and call the specialization. It no longer matches.
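
The failure mode can be reproduced in a small C model. Block, is_pair, specialize, and block_append are hypothetical names: the check runs once at specialization time and its answer is cached, but the argument is held by reference, so a later mutation leaves the cached answer stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mutable block: only its length matters here. */
typedef struct { size_t length; } Block;

/* The constraint from the example: a BLOCK! of length 2. Its answer
 * depends on mutable state, so it is not pure in the needed sense. */
static bool is_pair(const Block *b) { return b->length == 2; }

typedef struct {
    Block *arg;       /* specialized value, held by reference */
    bool checked_ok;  /* result cached at specialization time */
} Specialization;

static Specialization specialize(Block *arg) {
    Specialization s = { arg, is_pair(arg) };
    return s;
}

static void block_append(Block *b) { b->length += 1; }
```

After block_append, checked_ok still reports success while is_pair(s.arg) now returns false, so trusting the cache would hand a native the wrong shape of data.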

What you could ultimately end up with is paying for type checking of specialized arguments on every call. That would also mean you couldn't reuse the slot where the type information would have been to hold the specialized value...because you'd need both at the same time, which is an optimization loss.

Challenge: Generics Throw A Wrench Into Types

There has never been a good answer to how GENERICs (what Rebol2/Red called actions) work. If you have something like APPEND that's defined to allow you to append to strings and blocks...but then later add an extension that implements GOB!s, how do you say that APPEND now accepts GOB!s...and how do you constrain the parameters to indicate that?
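
One conceivable direction, sketched in C purely for illustration (this is not how Ren-C or Rebol2/Red actually handle it): derive a generic's accepted types from a per-type dispatch table that extensions can add to at runtime. register_append, append_accepts, and the handler signature are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum Kind { KIND_TEXT, KIND_BLOCK, KIND_GOB, KIND_MAX };
#define KIND_BIT(k) ((uint64_t)1 << (k))

typedef int (*AppendHandler)(void *series);

/* Hypothetical per-generic dispatch table, indexed by kind. */
static AppendHandler append_handlers[KIND_MAX];

static void register_append(enum Kind k, AppendHandler h) {
    append_handlers[k] = h;
}

/* Accepted types come from what is registered right now, rather than
 * from a list fixed in the bootstrap files. */
static uint64_t append_accepts(void) {
    uint64_t bits = 0;
    for (int k = 0; k < KIND_MAX; ++k)
        if (append_handlers[k] != NULL)
            bits |= KIND_BIT(k);
    return bits;
}

/* Placeholder handlers; each returns a distinct tag for testing. */
static int append_text(void *s) { (void)s; return 1; }
static int append_block(void *s) { (void)s; return 2; }
static int append_gob(void *s) { (void)s; return 3; }
```

Note the tension with the locking and caching ideas above: any bits cached from append_accepts() when a function is made would go stale the moment an extension registers GOB! support.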

Historically, the grab bag of types these generic functions do or don't allow has just been updated by hand in the bootstrap files. But users and extensions can't really do this.

This is a topic in its own right--but it's worth mentioning.

Inventory Of Parts to be Represented

That's a lot to take in, but I'll close with the list of things that PARAM! bits currently encode:

Some of these parts are on the element that names the parameter itself:

the spelling of the parameter's name (currently case-sensitive, please read and discuss implications)
its quoting status (WORD! -> normal evaluation, QUOTED! WORD! -> hard literal, GET-WORD! -> soft literal)
if it's a refinement that outputs to a variable (which can also be used for multiple return values)...indicated by it being a SET-WORD!
if it's a local, shown by a leading dot (it's possible to indicate that a range of ordinary words are all local by prefixing them with the <local> tag, e.g. <local> x y is the same as .x .y)
whether it is optional or not, denoted by a leading slash

The rest is in a BLOCK! which specifies what types the parameter accepts. This includes other attributes of the parameter that don't fit on the first value:

if it is willing to accept NULL or not--denoted by <opt>. (Note: this has been a gray area in "typesets" as NULL is not a value and "has no type")
if it is willing to treat the end of a series as if it had received NULL--denoted by <end>. (To help with the conflation, a separate function lets you ask whether a parameter's null actually came from reaching the end.)
if the parameter will be skipped over and given as NULL if there is not a precisely matching type in that position, denoted by <skip> (Note: this is only available on hard literal parameters)
if the parameter is variadic, denoted by <variadic> (Note: this was once <...>, but that is now a 4-element TUPLE! corresponding to [< _ _ >]. While it might seem like that "should be a TAG!", that would be a broken interpretation: < is a WORD!, and if it were used for a function or object, </refinement and <.field would need to be a PATH! and a TUPLE! respectively.)
whether a parameter is modal, and controls the optional parameter directly after it in the parameter order. (Note: this is an experimental feature that is weird and is still being studied, but it has some places where it's looking like it is fairly critical.)
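
The inventory above can be summarized as a C struct. This is a hypothetical encoding with one field per attribute for readability (a real one would pack these into bits), and it enforces the one cross-attribute rule the list states: <skip> is only available on hard literal parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* From the name element: exactly one class per parameter. */
enum ParamClass {
    PARAM_NORMAL,        /* foo  (WORD!): normal evaluation */
    PARAM_HARD_LITERAL,  /* 'foo (QUOTED! WORD!) */
    PARAM_SOFT_LITERAL,  /* :foo (GET-WORD!) */
    PARAM_OUTPUT,        /* foo: (SET-WORD!): outputs to a variable */
    PARAM_LOCAL          /* .foo, or listed after <local> */
};

typedef struct {
    enum ParamClass pclass;
    bool optional;      /* leading slash */
    /* from the type block */
    bool accepts_null;  /* <opt> */
    bool endable;       /* <end> */
    bool skippable;     /* <skip> */
    bool variadic;      /* <variadic> */
    bool modal;         /* controls the optional parameter after it */
} ParamAttrs;

/* <skip> is only available on hard literal parameters. */
static bool param_attrs_valid(const ParamAttrs *p) {
    if (p->skippable && p->pclass != PARAM_HARD_LITERAL)
        return false;
    return true;
}
```

With this layout, 'foo [<end> word!] would be a PARAM_HARD_LITERAL with only endable set (its word! acceptance living in type bits, not shown here).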

Addendum: ...that's only information the evaluator uses...!

That lengthy list doesn't include the HELP description string...which is just the tip of the iceberg for what a user might want to annotate arguments on a function with.

But Ren-C pushes all of the help information into a "meta" object. Function makers have a low-level form that doesn't bother making this object (e.g. specialize*) and then a higher-level version that does. There's a certain amount of default information put into the object:

>> meta: meta-of :append
>> words of meta
== [description return-type return-note parameter-types parameter-notes]

>> meta/parameter-notes/dup
== "Duplicates the insert a specified number of times"

You can tweak the object to your liking, and use FRAME!s as maps from parameter to value. For example, you could track a property for each parameter being either <cool> or <uncool>:

 >> append meta compose [coolness: (make frame! :append)]
 >> meta/coolness/line: <cool>
 >> meta/coolness/part: <uncool>

>> meta/coolness
== make frame! [
    series: '~unset~
    value: '~unset~
    part: <uncool>
    only: '~unset~
    dup: '~unset~
    line: <cool>
]

It's far from perfect, but it pushes the information out into the open where things like HELP can process it...and it's critical to writing code that inherits and manipulates the information.
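
A FRAME! used as a map can be modeled in C as a shared key list paired with a per-frame array of values, which is also what makes it fit the key-list sharing discussed at the top. Frame, make_frame, and frame_set are hypothetical names, and NULL stands in for '~unset~.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: symbols are interned, so keys compare by pointer. */
typedef const char *Symbol;

typedef struct {
    size_t len;
    const Symbol *keys;  /* shared, e.g. APPEND's parameter names */
} KeyList;

#define MAX_PARAMS 8

typedef struct {
    const KeyList *keylist;
    const char *values[MAX_PARAMS];  /* NULL stands in for ~unset~ */
} Frame;

static Frame make_frame(const KeyList *kl) {
    Frame f = { kl, { NULL } };
    return f;
}

/* Look a parameter up by its symbol; -1 if absent. */
static int frame_index(const Frame *f, Symbol key) {
    for (size_t i = 0; i < f->keylist->len && i < MAX_PARAMS; ++i)
        if (f->keylist->keys[i] == key)
            return (int)i;
    return -1;
}

/* Store a value for one parameter; returns its index, or -1. */
static int frame_set(Frame *f, Symbol key, const char *value) {
    int i = frame_index(f, key);
    if (i >= 0)
        f->values[i] = value;
    return i;
}
```

Every annotation frame made from APPEND's key list shares those keys, so attaching a new per-parameter property costs only one values array.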