Representing Everything About A Parameter (...except its name)

hostilefork · January 6, 2021, 6:29pm

I'm aiming to pull the string labels out of parameter lists into their own more compact form: a list that holds just the pointers to the string names.

We can think of splitting out the symbol as if decorations we currently put on the parameter would be moved to the block:

 func ['foo [<end> word!] /bar [integer!] /no-arg] [...]
 =>
 func [foo '[<end> word!] bar /[integer!] no-arg-refinement /[]] [...]

That's just how to think of how it's stored. We don't have to write our specs like that...

There are a number of good reasons to do this, e.g. implementing "hidden classes" as in V8...each time you do for-each [x y] ... or make object! [a: 10, b: 20] the system should detect the similarity of the key collection and reuse it, instead of needing unique [x y] and [a b] list copies allocated every time.
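To make the key-sharing idea concrete, here is a rough Python model (not Ren-C internals; `KeyListPool` and `make_object` are made-up names for illustration). It assumes an object is stored as a shared key list plus a private value list, and that equal key sequences are detected by interning:

```python
class KeyListPool:
    """Hypothetical pool handing out one shared key list per distinct key sequence."""

    def __init__(self):
        self._pool = {}

    def intern(self, keys):
        keys = tuple(keys)
        # setdefault returns the previously stored tuple when an equal one exists
        return self._pool.setdefault(keys, keys)


pool = KeyListPool()


def make_object(fields):
    """Model an object as (shared key list, private value list)."""
    return pool.intern(fields.keys()), list(fields.values())


keys1, vals1 = make_object({"a": 10, "b": 20})
keys2, vals2 = make_object({"a": 1, "b": 2})

print(keys1 is keys2)  # both objects point at the same key list
print(vals1, vals2)    # values stay per-object
```

The point is only that the names live in one place; each object then pays just for its values.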

(I'll point out that consistent chipping away at efficiencies means that things like /[a b] or [a b]/ or [a b]. or .[a b] do not take up more space than a plain [a b] ... though these forms are immutable... keep that in mind...)

This reduces how much information a "PARAM!" stores

Currently all the information for a parameter--including the symbol, types, and other modes--is stuffed into an internal Frankenstein-like type called a PARAM!. It's compressed into a single cell as a mishmash of packed bits and a pointer to a spelling.

(Historical Note: R3-Alpha acted like these freakish cells were WORD!s...but with an off-to-the-side flag that marked them as "UNWORD"s. This meant they stored a bunch of type bits where most words would store a binding. These fake words could easily leak and crash the system, so Ren-C gave parameters a dedicated internal type, asserting on cases of use as if they were WORD!.)

The symbol takes one of the four slots in the PARAM!. The cell header takes another. So what's left in the remaining two slots is just a bunch of bits... 64 bits is what's available on both 32-bit and 64-bit platforms.

There's a bit for whether or not each fundamental type--like a BLOCK! or TEXT!--is accepted by the parameter. Then there are bits for "is this parameter <skip>-able" or "can this parameter be the <end> of input". This means the number of fundamental types allowed has been less than 64...as other parameter options have to fit in this set too.
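A small Python sketch of that packing, under the assumption that types and options simply share one 64-bit word. The bit positions and names here are invented for illustration and are not the real PARAM! layout:

```python
# Made-up bit assignments: low bits for fundamental types...
INTEGER = 1 << 0
TEXT = 1 << 1
BLOCK = 1 << 2
WORD = 1 << 3
# ...and high bits for parameter options, drawn from the same 64-bit budget
ENDABLE = 1 << 62
SKIPPABLE = 1 << 63

MASK_64 = (1 << 64) - 1


def accepts(param_bits, flag):
    """One AND answers 'is this type (or option) enabled for the parameter?'"""
    return (param_bits & flag) != 0


# Something like ['foo [<end> word!]] in this model
foo_bits = WORD | ENDABLE

print(accepts(foo_bits, WORD))     # type bit is set
print(accepts(foo_bits, TEXT))     # type bit is not set
print(accepts(foo_bits, ENDABLE))  # option bit lives in the same word
print(foo_bits <= MASK_64)         # still fits in 64 bits
```

Every option bit added this way is one fewer fundamental type that can be represented, which is the squeeze described above.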

Can PARAM! be replaced with "normal" values?

What if params were not a mysterious compressed form, but values that could be inspected more directly as a "parameter spec"?

I showed this "representational concept" above (again, just to think of it as the system stores it, not as how you write it at source level):

 func [foo '[<end> word!] bar /[integer!] no-arg-refinement /[]] [...]

When the burden of representing the parameter name is removed, then '[<end> word!] could be the value that represents how the evaluator deals with the foo parameter. Today the HELP gets a capture of these typeset blocks in the spec just for reference purposes. But this would mean that what help used would concretely match what was in the spec.

Challenge: Mutability of Referenced Type Words

Let's imagine you did this:

>> foo: func [name [text!]] [print [name]]
>> foo "before"
before

>> text!: integer!
>> foo "after"
after  ; did not reflect the change

Today it would not reflect the change. This is because when that TEXT! word is looked up, it finds one of the built in DATATYPE!s for representing text, and sets the corresponding bit in the PARAM!. No matter how you change TEXT!, it will only affect functions created after that point...not any that already set their bits.

We could address this by saying that if you use any WORD!s in the type spec of a function, then the binding of that word gets forcibly protected (if it isn't already).

>> x!: text!

>> foo: func [name [x!]] [print [name]]

>> x!: integer!
** Access Error: variable x! locked
; ^-- it would be helpful to mention a "lock reason", though we are a bit short
; on bits for putting in these reasons...maybe only done in 64-bit builds?

This would give a persistence so the parameter description could store x! without worrying about its meaning changing. That means you can do things like type check a parameter for a specialization at specialization time...and trust it doesn't need rechecking when used. It also allows performance tricks that cache bits to make the check faster without having to look up the word every time (since you know it won't change).
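Here is a Python model of the lock-on-use rule (a sketch only; `Context`, `protect`, and `make_func` are hypothetical names, and Python classes stand in for datatypes). Using a word in a type spec protects its binding, which is what makes it safe to cache the resolved type:

```python
class LockedVariableError(Exception):
    pass


class Context:
    """Hypothetical variable context with per-variable protection."""

    def __init__(self):
        self._vars = {}
        self._locked = set()

    def set(self, name, value):
        if name in self._locked:
            raise LockedVariableError(f"variable {name} locked")
        self._vars[name] = value

    def get(self, name):
        return self._vars[name]

    def protect(self, name):
        self._locked.add(name)


def make_func(ctx, type_words):
    """Build a type checker; using a word in the spec locks its binding."""
    for word in type_words:
        ctx.protect(word)
    # Safe to resolve once and cache, since the bindings can no longer change
    resolved = tuple(ctx.get(word) for word in type_words)

    def check(value):
        return isinstance(value, resolved)

    return check


ctx = Context()
ctx.set("x!", str)               # x!: text!
check = make_func(ctx, ["x!"])   # foo: func [name [x!]] [...]

try:
    ctx.set("x!", int)           # x!: integer!
    relocked = False
except LockedVariableError as e:
    relocked = True
    message = str(e)

print(relocked, check("hello"), check(5))
```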

It's a little harsh-seeming, but the type dialect has to be hardened somehow. If you needed to use X! locally for something else, you've always got use [x!] [...] to create a new context for it.

Challenge: Performance

Checking a bit for a fundamental type in a typeset is pretty fast. Matching a value against a rich type specification dialect isn't necessarily fast, and this is something every function (including natives) does. It's particularly important for natives, because they interpret the bits of the cell assuming it has been checked...getting the wrong thing means it will crash.

This is where internal compactions could come into play. Users might see the parameter spec as [text! integer!] but the system could recognize specific common patterns like that and compress them into something like today's PARAM! bits, behind the scenes.
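One way to picture such a compaction, sketched in Python. Everything here is an assumption for illustration: the bit table, the `compile_spec` name, and the idea that anything not recognized as a fundamental type falls back to a slower predicate call:

```python
FUNDAMENTAL_BITS = {"integer!": 1 << 0, "text!": 1 << 1, "block!": 1 << 2}

# Stand-in mapping from Python values to datatype names
KIND_OF = {int: "integer!", str: "text!", list: "block!"}


def compile_spec(spec, predicates):
    """Split a spec into a fast bitmask part and a slow predicate part."""
    mask = 0
    slow = []
    for item in spec:
        if item in FUNDAMENTAL_BITS:
            mask |= FUNDAMENTAL_BITS[item]
        else:
            slow.append(predicates[item])
    return mask, slow


def typecheck(compiled, value):
    mask, slow = compiled
    bit = FUNDAMENTAL_BITS.get(KIND_OF.get(type(value)), 0)
    if mask & bit:
        return True  # fast path: a single AND
    return any(pred(value) for pred in slow)  # slow path: run predicates


fast = compile_spec(["text!", "integer!"], {})
rich = compile_spec(
    ["integer!", "pair-block?"],
    {"pair-block?": lambda v: isinstance(v, list) and len(v) == 2},
)

print(fast)                     # all-fundamental spec: mask only, no predicates
print(typecheck(fast, "hi"))    # matched by the bitmask
print(typecheck(rich, [1, 2]))  # matched by the predicate fallback
```

The user would still see the original spec block; the mask is just a cache derived from it.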

We might want to rethink the usage of things like <end> to not use TAG!, but to use a type that can be interned for speed. The problem with tags is that you could see <end> but actually have next <mend>...so if you had a process of locking down the symbol for speedy recognition you'd not be able to do it. By contrast, ISSUE! (token) has no position and could be canonized to a word, so the process of checking could turn #end into something that is matched faster.
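The positional problem can be modeled in a few lines of Python (a sketch; `Tag`, `Symbol`, and `intern_issue` are invented names, and the buffer-plus-index layout is an assumption about series-like values in general):

```python
class Tag:
    """Series-like value: a buffer plus an index into it."""

    def __init__(self, buffer, index=0):
        self.buffer = buffer
        self.index = index

    def next(self):
        return Tag(self.buffer, self.index + 1)

    def mold(self):
        return "<" + self.buffer[self.index:] + ">"


class Symbol:
    """Positionless token that can be canonized."""

    def __init__(self, spelling):
        self.spelling = spelling


_symbols = {}


def intern_issue(spelling):
    # Always hand back the one canonical Symbol for a given spelling
    return _symbols.setdefault(spelling, Symbol(spelling))


end_tag = Tag("end")
sneaky = Tag("mend").next()

print(end_tag.mold(), sneaky.mold())    # they look identical when molded
print(end_tag.buffer == sneaky.buffer)  # but the underlying data differs

print(intern_issue("end") is intern_issue("end"))  # identity check suffices
```

Recognizing the tag means comparing visible content every time, while the interned token can be recognized by pointer identity.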

Challenge: Mutability Part II - Type Predicates

I've suggested essentially the end of the TYPESET! datatype as a concept; replacing it with functions. This would mean something like:

any-type!: :any-type?

This would have the same issue with locking, so once you used ANY-TYPE! in a function spec you couldn't change that particular binding's value of ANY-TYPE! to anything else.

But further, there needs to be a rule that ANY-TYPE? is a pure function. It needs to give the same answers for the same input, and that answer cannot depend on anything about that input that can mutate.

Imagine that you specialized a function with a mutable BLOCK!, and the constraint was that it was a BLOCK! of length 2. Then you append to the block, and call the specialization. It no longer matches.

What you could ultimately end up with is a situation where you pay for type checking of specialized arguments every time (which would also mean you couldn't use the slot where the type information would have been for the specialized value...because you'll need both at the same time, an optimization loss).
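The BLOCK!-of-length-2 scenario, modeled in Python (a sketch; `specialize` here is a toy that checks once at specialization time and never again, which is exactly the optimization in question):

```python
def is_pair_block(value):
    """Constraint that depends on mutable state: the block's length."""
    return isinstance(value, list) and len(value) == 2


def specialize(func, constraint, arg):
    """Toy specialization: typecheck once now, trust it on every later call."""
    if not constraint(arg):
        raise TypeError("argument does not match constraint")

    def specialized():
        return func(arg)  # no recheck here

    return specialized


def first_plus_second(block):
    return block[0] + block[1]


data = [1, 2]
sp = specialize(first_plus_second, is_pair_block, data)
result_before = sp()

data.append(3)                      # mutate after specialization
still_valid = is_pair_block(data)   # the constraint no longer holds...
result_after = sp()                 # ...but the specialization runs unchecked

print(result_before, still_valid, result_after)
```

Either the constraint is restricted to properties that can't mutate, or the check has to be repeated on every call.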

Challenge: Generics Throw A Wrench Into Types

There has never been a good answer to how GENERICs (what Rebol2/Red called actions) work. If you have something like APPEND that's defined to allow you to append to strings and blocks...but then later add an extension that implements GOB!s, how do you say that APPEND now accepts GOB!s...and how do you constrain the parameters to indicate that?

Historically, the grab bag of parameters for what these generic functions allow or don't is just updated in the bootstrap files. But users and extensions can't really do this.

This is a topic in its own right--but it's worth mentioning.

Inventory Of Parts to be Represented

That's a lot to take in, but I'll close with the list of things that PARAM! bits currently encode:

Some of these parts are on the element that names the parameter itself:

the spelling of the parameter's name (currently case-sensitive, please read and discuss implications)
its quoting status (WORD! -> normal evaluation, QUOTED! WORD! -> hard literal, GET-WORD! -> soft literal)
if it's a refinement that outputs to a variable, which can also be used by multiple return...indicated by being a SET-WORD! if so
if it's a local, shown by a leading dot (it's possible to indicate that a range of ordinary words are all local by prefixing them with the <local> tag, e.g. <local> x y is the same as .x .y)
whether it is optional or not, denoted by a leading slash

The rest is in a BLOCK! which specifies what types the parameter accepts. This includes other attributes of the parameter that don't fit on the first value:

if it is willing to accept NULL or not--denoted by <opt>. (Note: this has been a gray area in "typesets" as NULL is not a value and "has no type")
if it is willing to treat the end of a series as if it had received NULL--denoted by <end>. (to help with the conflation, a separate function lets you ask whether a parameter's null actually came from reaching the end or not)
if the parameter will be skipped over and given as NULL if there is not a precisely matching type in that position slot, denoted by <skip> (Note: this is only available on hard literal parameters)
if the parameter is variadic, denoted by <variadic> (Note: this was once <...> but that is now a 4-element TUPLE! corresponding to [< _ _ >]. While it might seem like that "should be a tag!" that would be a broken interpretation since < is a WORD! and if used for a function or object it would need </refinement or <.field to be PATH! and TUPLE! respectively)
whether a parameter is modal, and controls the optional parameter directly after it in the parameter order. (Note: this is an experimental feature that is weird and is still being studied, but it has some places where it's looking like it is fairly critical.)

Addendum: ...that's only information the evaluator uses...!

That lengthy list doesn't include the HELP description string...which is just the tip of the iceberg for what a user might want to annotate arguments on a function with.

But Ren-C pushes all of the help information into a "meta" object. Function makers have a low-level form that doesn't bother making this object (e.g. specialize*) and then a higher-level version that does. There's a certain amount of default information put into the object:

>> meta: meta-of :append
>> words of meta
== [description return-type return-note parameter-types parameter-notes]

>> meta/parameter-notes/dup
== "Duplicates the insert a specified number of times"

You can tweak the object to your liking, and use FRAME!s as maps from parameter to value. For example, you could track a property for each parameter being either <cool> or <uncool>:

 >> append meta compose [coolness: (make frame! :append)]
 >> meta/coolness/line: <cool>
 >> meta/coolness/part: <uncool>

>> meta/coolness
== make frame! [
    series: '~unset~
    value: '~unset~
    part: <uncool>
    only: '~unset~
    dup: '~unset~
    line: <cool>
]

It's far from perfect, but it pushes the information out into the open where things like HELP can process it...and it's critical to writing code that inherits and manipulates the information.

hostilefork · January 12, 2021, 12:03am

First Step of Mission Accomplished: we can convert actions to frames and back without any allocations! (Another good side-effect: object representations are now about 5/8 the size they are in R3-Alpha....)

The frame you get has the names as the keys, with typesets as the values:

>> interface: as frame! :append

>> find interface/series integer!
; null   (because APPEND doesn't allow INTEGER! in its series argument)

>> find interface/series block!
== #[true]

>> append-alias: as action! :interface
== #[action! [series value /part /only /dup /line]]

>> append-alias [a b c] 'd
== [a b c d]

Of course if you try to look at this frame, it will be hideous, because typesets are hideous. Things like ANY-VALUE! mold out as every single type.

What you would want (and what HELP wants) is something that looks much more like what you typed at source level. Plus, as this thread brings up, you want to be able to know more than just the types...you want to know if the parameter is quoted or not...if it's endable, if it's const, if it's a refinement, etc. etc.

This is About More Than Just Knowing...

Clearly it's useful to have a representation of an action, where the keys are parameter names and the values are parameter types/modes.

But besides just getting an immutable description of an action as a frame, we'd like to be able to get a mutable copy...tweak it, and make a new action!

So imagine:

>> f: copy as frame! :append

>> f/value: make typeset! [integer!]

>> append-only-integers: as action! f

That is actually pretty close to working. But we want to be able to change more than just the types accepted...we'd like to change the quoting convention, and whether something is a refinement or not.
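As a loose Python model of that round trip (not how Ren-C implements it; `Action`, `as_frame`, and `as_action` are stand-ins, and the body is passed along explicitly here where a real frame would carry it):

```python
from types import MappingProxyType

# Stand-in mapping from Python values to datatype names
KIND_OF = {int: "integer!", str: "text!", list: "block!"}


class Action:
    def __init__(self, params, body):
        self.params = {name: frozenset(types) for name, types in params.items()}
        self.body = body

    def __call__(self, **args):
        for name, allowed in self.params.items():
            kind = KIND_OF.get(type(args[name]))
            if kind not in allowed:
                raise TypeError(f"{name} does not accept {kind}")
        return self.body(**args)


def as_frame(action):
    """Immutable view: parameter names as keys, accepted types as values."""
    return MappingProxyType(action.params)


def as_action(frame, body):
    return Action(frame, body)


def append_body(series, value):
    return series + [value]


append_ = Action(
    {"series": {"block!"}, "value": {"integer!", "text!", "block!"}},
    append_body,
)

f = dict(as_frame(append_))   # f: copy as frame! :append
f["value"] = {"integer!"}     # f/value: make typeset! [integer!]
append_only_integers = as_action(f, append_.body)

print(append_only_integers(series=[1, 2], value=3))
```

The original action is untouched, since only the copy was modified.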

As this thread's goal states: we need to find a way of packing all that information up into a value...the value that you get from this mapping in the context.

Big Picture: Building Actions From Scratch

This direction is all part of the vision laid out in "Seeing All ACTION!s As Variadic FRAME!-makers", which was written almost exactly two years ago.

That was before things like AUGMENT existed. But we can now see operations like AUGMENT as something that comes from being able to append new fields to a copy of a frame, and then using AS ACTION! on the bigger frame.

I wondered in the original post "where does the body go in the frame spec?" But I think the answer is now that it doesn't go anywhere... you just ADAPT code into the frame.

Plus, I think that you should be able to give frames to ADAPT...and when you do, that means the bound code will see the locals in the frame.

 foo: func [x [integer!] <local> y] [...]

 bar: adapt :foo [... this would bind to x, not y ...]

 f1: make frame! :foo
 baz: adapt f1 [... again just see x, not y ...]

 f2: make frame! [x: [integer!] y: _]  ; or whatever notation for "local" is

 mumble: adapt f2 [... this would bind to x *and* y ...]

The concept here is that once you seal something up as an action, the locals are no longer part of the interface...so anyone building on top of it can't see them. But when you're making an action from scratch, you can set up a frame with locals and adapt code into it that sees those locals.

So basically, when you first MAKE FRAME! on a BLOCK!, you get something that has no implementation. If you were to DO it, it would error... because what the fields contain are parameter descriptions and not intended parameter values. But then you'd ADAPT it to give it a body.

This should make it so that writing your own FUNC-like generator, or your own SPECIALIZE or AUGMENT or other concept... will be very easy!

hostilefork · January 25, 2021, 1:21am

I poked a little further along on prototyping this concept, and hit a bit of a printing issue.

The BLOCK!s for types were looking all right. But once I tried to put a leading slash on them to indicate the refinements, quote marks started popping up...even without any quoted parameters:

>> as frame! :append
== make frame! [
    return: '[port! map! object! module! bitset!]:
    series: [any-series! port! map! object! module! bitset!]
    value: [<opt> any-value!]
    part: '/[any-number! any-series! pair!]
    only: '/[]
    dup: '/[any-number! pair!]
    line: '/[]
]

The quotes show up for a reason... it's to suppress evaluation. They're left off as a nicety when the value doesn't evaluate (as a block doesn't).

Imagine if value were a SET-WORD!... the tick mark helps it not be nonsensical, like value: 'part:

But what if we're using a tick mark to mean the parameter is quoted? Then it would get two quotes.

I have a lot of bones to pick with this "MAKE FRAME!" concept of representation (as I do with the general idea that molding things would be able to give you back the same thing if you copied-and-pasted, which binding means it won't...)

Anyway, it's a cosmetic problem in this particular rendering. Ordinary value extraction doesn't have the issue:

>> pick as frame! :append 'part
== /[any-number! any-series! pair!]

And if it were actually quoted and you picked it, then you'd see the quote.

But more thinking is needed here on these outputs. Also, it's very misleading to just show that little of the frame... it should mention something about its dispatcher. Or a count of how many hidden/specialized fields are in the frame. Even something cursory like the address in memory of the code it runs...to relay there's more stored here than just these parameter descriptions.

A Bigger Question Came To Mind...

What I'm trying to do here is to "compress" and dialect the various pieces that the system knows about a parameter into a notational form.

A more typical solution would describe each parameter with an object and attributes:

/* ... */ value: {
   quoted: false,
   refinement: false,
   nullable: true,
   types: ["any-value"]
}
part: {
   quoted: false,
   refinement: true,
   nullable: true,
   types: ["any-number", "any-series", "pair"]
}  /* ... */

The choice to try and encode quotedness with a tick, and refinement-ness with a leading slash, winds up with a kind of cryptic package. It actually does compress pretty well--given how things are designed. And it's quick for people to enter and absorb. But it's harder to query and manipulate than the flattened form.
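To show how the two forms relate, here is a Python sketch that renders the flattened record into the compact notation. The rules are my reading of the examples in this thread and should be treated as assumptions: refinements get a leading slash, quoted parameters get a leading tick, and <opt> is only written for non-refinements (since refinements are already optional):

```python
def compact(record):
    """Render a flattened parameter record as the compact block notation."""
    words = [name + "!" for name in record["types"]]
    if record["nullable"] and not record["refinement"]:
        words.insert(0, "<opt>")
    text = "[" + " ".join(words) + "]"
    if record["refinement"]:
        text = "/" + text
    if record["quoted"]:
        text = "'" + text
    return text


value = {
    "quoted": False,
    "refinement": False,
    "nullable": True,
    "types": ["any-value"],
}
part = {
    "quoted": False,
    "refinement": True,
    "nullable": True,
    "types": ["any-number", "any-series", "pair"],
}

print(compact(value))
print(compact(part))
```

Going this direction is easy. Going the other way (from the decorated block back to properties) is where the querying and manipulation cost shows up.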

Representation Idea

I wonder if there is some kind of "out of band" method of representation which could help relay that it's not an ordinary assignment through the evaluator. Perhaps if there were a colon on its own, which would be deemed illegal otherwise:

>> as frame! :append
== #[frame! {append} [
    return : [port! map! object! module! bitset!]:
    series : [any-series! port! map! object! module! bitset!]
    value : [<opt> any-value!]
    part : /[any-number! any-series! pair!]
    only : /[]
    dup : /[any-number! pair!]
    line : /[]
]]

That's not too bad. It might then be tempting to make : an operator which assigns things literally. But that would undermine the notation somewhat (e.g. what if VALUE were this colon, so the line would read value : :), so it would have some downsides.

Then there's the trouble with NULL representation. If you're willing to grant newline significance in this notation, then emptiness on the line could indicate that.

Bottom Line: The Goal Is To Reflect ALL The Information

If you reread this thread from top to bottom, what I want to do here is really give you all the information the system knows about the parameters...without having to parse it out yourself.

Doing it by decorating a single value in this way may not be perfect. But the goal is to be easy to COPY a frame, tweak accepted parameter types... specialize out values and hide them from the interface, add new fields (e.g. augment)...and then transform back into an ACTION! again, with additional behavior from ADAPT or ENCLOSE.

hostilefork · September 6, 2024, 11:28pm

hostilefork:

We can think of splitting out the symbol as if decorations we currently put on the parameter would be moved to the block:
func ['foo [<end> word!] /bar [integer!] /no-arg] [...]
=>
func [foo '[<end> word!] bar /[integer!] no-arg-refinement /[]] [...]

So this was never the craziest idea.

I think it gets a little less ugly now that we're talking about using :word for refinements

 func ['foo [<end> word!] :bar [integer!] :no-arg] [...]
 =>
 func [foo '[<end> word!] bar :[integer!] no-arg-refinement :[]] [...]

I think getting the typespec blocks in this form makes sense, it will allow code that wants to do its own processing of functions to get what it needs.

But in the function spec dialect, I think it's still probably superior to decorate the words. But perhaps you could choose either in the spec...it might be more convenient in generated code to be able to splice the parameter convention on with the types instead of decorate the word.

hostilefork:

Perhaps if there were a colon on its own, which would be deemed illegal otherwise:

>> as frame! :append
== #[frame! {append} [
    return : [port! map! object! module! bitset!]:
    series : [any-series! port! map! object! module! bitset!]
    value : [<opt> any-value!]
    part : /[any-number! any-series! pair!]
    only : /[]
    dup : /[any-number! pair!]
    line : /[]
]]

So unspecialized slots now hold what are actually PARAMETER! antiform types... not blocks. If you were putting actual BLOCK!s in the frame, then those would be specialized as blocks. If you were putting non-antiform parameters in those slots, they'd be specialized as parameter descriptions.

I don't know what parameters should render like, but for the moment they're the same lame idea as everything else that isn't structurally trivial... e.g. #[parameter! [~null~ any-value?]]

Unfortunately, this means the previously elegant-looking ~ for unspecialized parameters is now rather ugly (albeit informative!). And things like DEFAULT need to treat antiform parameters as being not there (but it has to do that for antiform tags as well, so...)

Another three years downstream...

The elimination of TYPESET! and using intrinsic-powered typechecking with the PARAMETER! type means we preserve the whole type spec block for processing directly.
The elimination of skippable parameters helps remove a difficult-to-simulate behavior for anyone trying to reason about the behavior of an argument.

There are still some curveballs, in terms of whether your abstraction knows every weird behavior-controlling TAG! we might throw in now or in the future. But it's definitely a lot closer to reality now.

hostilefork · September 21, 2024, 3:44pm

hostilefork:

So this was never the craziest idea.

I'm actually pretty close to being able to give back a BLOCK! that tells you about a parameter's properties.

There are a few stumbling blocks. We have:

normal parameters
^meta parameters
@bind-literal parameters
'as-is-literal parameters

(I've also contemplated $bind-normal parameters, where if you don't use the $ then you don't get the binding of whatever's passed in. This could help cut down binding leakage that taxes the GC, and offer some other sanity/"security" benefit. However, that would raise the question of how to specify to get binding or not on a ^meta parameter, given there is no $^bind-meta syntax.)

Then there's the issue of the "soft" or "escapable" parameters:

@(soft-bind-literal) parameters
'(soft-as-is-literal) parameters

These will accept GROUP! at the callsite, and the evaluator will run that group and then typecheck its product. If it's not a group, then it acts just like the non-soft version.

And finally, all of these come in "refinement" variations that are optional: (:normal, ^:meta, @:bind-literal, ':as-is-literal, @:(soft-bind-literal), ':(soft-as-is-literal))

(There is a choice there as to whether we want it to be @:(var) or @(:var). I believe the former is superior, and goes along with the fact that you first have to consider the optionality of the parameter before you can consider whether to soft escape it. Though really the first thing you want to consider is the name... which is just a further argument for why the parameter convention belongs "on" the type spec block.)

It looks relatively good, although when you ask about a parameter "give me the info" you might get back something pretty weird if things like the softness are applied to the type block via group:

>> spec of param
== @:([<end> word! set-word!])

The GROUP! might be a nice way to specify escapability. But it's kludgey if you're writing code that processes arguments.

When you're writing code, what you want are things like param.optional or param.endable or param.escapable. You want to switch param.class [...] and write code that way. It's nice for purposes of getting an overview in HELP if you can ask to have the decoration generated based on the parameter:

>> decorate 'name param
== @:(name)

But if you can just get the properties directly it's a lot better.
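A Python sketch of what "properties first, decoration generated on demand" could look like. The `Param` fields and the sigil table are assumptions based on the conventions listed earlier in this post; restricting escapability to the two literal classes is also my assumption, since those are the only soft forms shown:

```python
from dataclasses import dataclass

# Sigil per parameter class, following the list of conventions above
SIGILS = {"normal": "", "meta": "^", "bind-literal": "@", "as-is-literal": "'"}


@dataclass
class Param:
    param_class: str = "normal"
    optional: bool = False
    escapable: bool = False


def decorate(name, param):
    """Generate the decorated form of a name from the parameter's properties."""
    if param.escapable and param.param_class not in ("bind-literal", "as-is-literal"):
        raise ValueError("only literal parameters are modeled as escapable")
    text = SIGILS[param.param_class]
    if param.optional:
        text += ":"  # optionality comes before the soft-escape group
    text += f"({name})" if param.escapable else name
    return text


param = Param(param_class="bind-literal", optional=True, escapable=True)

print(decorate("name", param))
print(param.optional, param.escapable)  # direct queries, no parsing of decoration
```

Code that processes arguments reads the fields; only HELP-style display needs `decorate`.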

Anyway, this direction is where looking at things like HELP and writing usermode APPLY operations has led me, so this idea of "everything in a block" is giving way to "everything in a parameter". But for the sake of rendering efficiency, the parameter might well show itself as something like @:([<end> word! set-word!])