SET-WORD! To Initialize Locals In Function Specs?

hostilefork · August 24, 2024, 3:00am

(cc: @IngoHohmann as you have had opinions on these kinds of things.)

It seems it would be nice if you had the option of setting your locals when you define them.

```
foo: func [
    arg1 [integer!]
    arg2 [text!]
    <local>
    local1 local2
    local3: 10
    local4: (20 * 30)
][
    ...
]
```

As it so happens, there's potential to exploit this for efficiency. The frame mechanics have a slot for each local in the function archetype that currently just holds trash, and that slot could hold this default value instead. So it wouldn't just save you from typing the local's name a second time along with the expression... it would also avoid performing the evaluation and assignment on each call!
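To make the efficiency argument concrete, here is a minimal Python model. It is not Ren-C's actual C implementation. It assumes a hypothetical `Func` class whose `archetype` dictionary plays the role of the archetypal frame: each local's initializer runs once when the function is made, and every call just copies the archetype. The `twenty_times_thirty` helper counts how often the `local4: (20 * 30)` initializer is actually evaluated.

```python
from typing import Any, Callable, Dict, List, Optional

TRASH = object()  # stand-in for the "trash" an uninitialized local slot holds


class Func:
    """Hypothetical model of a function whose archetype frame stores local defaults."""

    def __init__(self, params: List[str],
                 local_inits: Dict[str, Optional[Callable[[], Any]]],
                 body: Callable[[Dict[str, Any]], Any]):
        self.params = params
        self.body = body
        # Build the archetype once: each initializer is evaluated right here,
        # at definition time, and its result parked in the local's slot.
        self.archetype = {
            name: (init() if init is not None else TRASH)
            for name, init in local_inits.items()
        }

    def __call__(self, *args: Any) -> Any:
        # Copying a slot costs the same whether it holds TRASH or a value,
        # so initialized locals add no per-call evaluation or assignment.
        frame = dict(self.archetype)
        frame.update(zip(self.params, args))
        return self.body(frame)


evaluations = 0


def twenty_times_thirty() -> int:
    global evaluations
    evaluations += 1  # count how often the initializer actually runs
    return 20 * 30


foo = Func(
    ["arg1"],
    {"local1": None, "local3": lambda: 10, "local4": twenty_times_thirty},
    lambda f: f["arg1"] + f["local3"] + f["local4"],
)
first, second = foo(1), foo(2)
```

Calling `foo(1)` and `foo(2)` gives 611 and 612, while `evaluations` stays at 1: the multiplication happened only when the archetype was built.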

There are a lot of questions to answer:

- What binding rules is it using? Could you initialize `local3` and then say `local4: (local2 * arg1)`?
  - Almost certainly not; it would just use the binding of the spec block.
- Does the code run on each invocation, or only once to calculate a fixed value? e.g. if it were `local4: (global-var * 30)`, would each invocation of FOO recalculate what `(global-var * 30)` was at that moment?
  - Almost certainly it would calculate a fixed value once and use that value on each call.
- Do you need parentheses directly after the SET-WORD!?
  - If the expression were run on each invocation (which it probably shouldn't be), then parentheses would be required, because there'd be no way to find the start and end of the right-hand expression without evaluating it.
  - If plain words are being picked up as locals, there's potential for error if you accidentally write an expression that doesn't consume what you think it does, like:
    ```
    func [
        arg [integer!]
        <local>
        local1 local2
        local3: arity-2-but-I-think-it's-3 a b c
        local4: 10
    ][
        ...
    ]
    ```
    That could wind up making a local `c` that you didn't intend. But then again, sometimes it would be a very obvious, simple initialization, like `local4: 10`. Forcing people to use parentheses could do more harm than good, vs. trusting them to use parentheses when they feel it's warranted.
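The danger in that last point can be illustrated with a toy scanner, written in Python and deliberately much simpler than Ren-C's real spec processing. Tokens are plain strings, a trailing colon marks a SET-WORD!, and the hypothetical `ARITY` table says how many arguments a function consumes. Because the extent of an un-parenthesized initializer is decided by arity, the stray `c` falls out as a new local.

```python
# Hypothetical arity table: the call the author thought took 3 args takes 2.
ARITY = {"arity-2-but-I-think-it's-3": 2}


def expression_end(tokens, i):
    """Index just past the (un-parenthesized) expression starting at tokens[i]."""
    if i >= len(tokens):
        raise ValueError("expression ran off the end of the spec")
    tok = tokens[i]
    j = i + 1
    for _ in range(ARITY.get(tok, 0)):  # functions consume their arguments
        j = expression_end(tokens, j)
    return j  # literals and plain words are complete on their own


def scan_locals(tokens):
    """Split a <local> section into names plus the tokens of any initializers."""
    names, inits = [], {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.endswith(":"):  # SET-WORD!: an initialized local
            name = tok[:-1]
            end = expression_end(tokens, i + 1)
            names.append(name)
            inits[name] = tokens[i + 1:end]
            i = end
        else:  # plain WORD!: an uninitialized local
            names.append(tok)
            i += 1
    return names, inits


spec = ["local1", "local2",
        "local3:", "arity-2-but-I-think-it's-3", "a", "b", "c",
        "local4:", "10"]
names, inits = scan_locals(spec)
```

`scan_locals` returns the names `local1`, `local2`, `local3`, `c`, `local4`, and nothing in the spec itself signals the mistake.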

Compare to `<static>`: Not Initialized With SET-WORD! ATM

Right now the `<static>` feature lets you initialize your variables, but it uses a non-Reboly notation to do so:

```
accumulate: func [
    item [any-element?]
    <static>
    block ([])
][
    append block item
]
```

The parenthesized initializer is optional. But it seems much more normal to say:

```
accumulate: func [
    item [any-element?]
    <static>
    block: []
][
    append block item
]
```
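Whichever notation is used, the semantics of a static are that its initializer runs once and the value persists across calls. A rough Python analogue uses a closure; `make_accumulate` is a hypothetical helper standing in for what FUNC does with the `<static>` section, not a real Ren-C API.

```python
def make_accumulate():
    """Hypothetical helper standing in for FUNC processing a <static> section."""
    block = []  # the static's initializer, evaluated a single time

    def accumulate(item):
        block.append(item)  # every call appends to the same shared list
        return block

    return accumulate


accumulate = make_accumulate()
accumulate("a")
result = accumulate("b")
```

After `accumulate("a")` and `accumulate("b")`, the shared block is `["a", "b"]`. Python programmers will recognize the same once-only evaluation in mutable default arguments.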

One reason for the parentheses notation was to be consistent with the idea of defaulting refinements:

```
>> foo: func [/string [text!] ("default")] [print string]

>> foo/string "hello"
hello

>> foo
default
```

But that feature was removed.

There's another reason why just WORD! was used...

RETURN: Has "Owned" SET-WORD! In The Spec Dialect

We have a little bit of friction in that the dialect has been using RETURN: to indicate what a function returns. The SET-WORD! form wasn't chosen because anything is being assigned; it was picked for looks:

```
double-multiply: func [
    return: [integer!]
    value1 [integer!]
    value2 [integer!]
][
    return 2 * value1 * value2
]
```

The issue is that historical Rebol2 (and R3-Alpha, and Red) let you use RETURN as an ordinary parameter name:

```
rebol2>> print-sum: func [return break] [print ["Sum is" return + break]]

rebol2>> print-sum 10 20
Sum is 30
```

Ren-C only lets you do that in LAMBDA. FUNC prohibits it:

```
ren-c>> print-sum: func [return break] [print [return + break]]
** Error: Generator provides RETURN:, use LAMBDA if not desired
```
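The reasoning behind that error can be sketched as a simple spec check. This Python function is hypothetical; it only assumes, as the error message suggests, that a FUNC-style generator injects its own RETURN into the frame while a LAMBDA-style generator does not, so only the former has to reject a parameter with that name.

```python
class SpecError(Exception):
    pass


def check_params(params, generator):
    """Reject a parameter named RETURN when the generator supplies its own."""
    provides_return = generator == "func"  # assumption: only FUNC injects RETURN
    if provides_return and any(p.lower() == "return" for p in params):
        raise SpecError("Generator provides RETURN:, use LAMBDA if not desired")
    return list(params)


lambda_params = check_params(["return", "break"], "lambda")
try:
    check_params(["return", "break"], "func")
    func_rejected = False
except SpecError:
    func_rejected = True
```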

I think Red/System decided on RETURN: first. But they put it at the end of the spec. Red errors if you try to put the return elsewhere:

```
red>> stringy: func [a b return: [string!]] [a + b]
== func [a b return: [string!]][a + b]

red>> stringy: func [return: [string!] a  b] [a + b]
*** Script Error: invalid function definition: [return: [string!] a b]
```

Either way, though, the return type isn't checked. On the 2012 announcement of function support in Red, DocKimbel says: "Note: argument and return value type checking have not been implemented yet, they need typeset! and error! datatypes to be implemented first." Parameter type checking works, but I guess return type checking was never added. It does show up in HELP, though:

```
red>> help stringy
USAGE:
     STRINGY a b

DESCRIPTION:
     STRINGY is a function! value.

ARGUMENTS:
     a
     b

RETURNS:
     [string!]
```

Note that HELP also lists RETURNS: at the end. Most people would expect a function's return type to be the first thing you put down.
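For comparison, what checking a declared return type could look like is sketched here in Python. The `func` decorator and its `returns` parameter are made up for illustration: the declared type is both enforced on every call and stored as metadata that a HELP-like tool could show first, regardless of where the declaration appeared in the spec.

```python
import functools


def func(returns):
    """Hypothetical decorator: record a declared return type and enforce it."""
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args):
            result = fn(*args)
            if not isinstance(result, returns):
                raise TypeError(f"{fn.__name__} returned {type(result).__name__}, "
                                f"but its spec says {returns.__name__}")
            return result
        checked.returns = returns  # metadata a HELP-like tool could list first
        return checked
    return wrap


@func(returns=str)
def stringy(a, b):
    return a + b
```

`stringy("ab", "cd")` returns `"abcd"`, while `stringy(1, 2)` raises a `TypeError` instead of silently returning an integer.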

I've Wondered If A Leading Block Would Suffice...

Off and on, I've been willing to consider the idea that return typing is just implicitly what you get if you have a leading block:

```
double-multiply: func [
    [integer!]
    value1 [integer!]
    value2 [integer!]
][
    return 2 * value1 * value2
]
```

Yet while it looks clean there, it causes some problems when you are filling in documentation strings.

I've become a pretty true believer in the idea that documentation strings for arguments come after the argument name (and that we may do a service to the userbase by standardizing this, rather than letting it be done either way and having people fight about it):

```
my-style: func [
    "Overall function description here"
    argument "Argument description here"
        [integer! text!]
    /refinement "Refinement description here"
][
    ...
]
```

The rationale is that any good function will put description strings on all its arguments, but not all arguments are type-constrained (refinements in particular are not). So if the description goes after the types, you wind up either being inconsistent:

```
variation1: func [
    "Overall function description here"
    argument [integer! text!]
        "Argument description here"
    /refinement "Refinement description here"  ; this feels inconsistent
][
    ...
]
```

Or you're throwing in newlines for no reason:

```
variation2: func [
    "Overall function description here"
    argument [integer! text!]
        "Argument description here"
    /refinement
        "Refinement description here"  ; consistent, but annoying
][
    ...
]
```

This is why I chose "MY-STYLE" above. But if return becomes implicit on a leading block, you wind up back in inconsistent land:

```
my-style-with-leading-block: func [
    "Overall function description here"
    [integer!] "Description here"
    argument "Argument description here"
        [integer! text!]
    /refinement "Refinement description here"
][
    ...
]
```

So one thing RETURN: has historically bought us is making that look better:

```
my-style-with-return: func [
    "Overall function description here"
    return: "Description here"
        [integer!]
    argument "Argument description here"
        [integer! text!]
    /refinement "Refinement description here"
][
    ...
]
```

And I think having the word RETURN in there makes it better. Without the word, it's less obvious what that leading block is.

But SET-WORD!... is that best?

If we're going to be allowing SET-WORD! for locals and statics, does it make sense to have a stray SET-WORD! for RETURN?

And one outside-the-box thought... given that modern FUNC doesn't allow you to name parameters RETURN, why not just go with a plain WORD! ?

```
what-about-plain-word: func [
    "Overall function description here"
    return "Description here"
        [integer!]
    argument "Argument description here"
        [integer! text!]
    /refinement "Refinement description here"
][
    ...
]
```

If you try that with a LAMBDA you won't get an error, but you may suffer some confusion when the lambda takes its first argument as a variable named RETURN. You'll figure it out pretty quickly, though.

Though I have wondered about questions like "what if you want the behavior of a lambda with the bottom expression dropping out, and no RETURN declared, but you still want type checking?"

You might say "just use ENSURE":

```
my-lambda: lambda [
    "Overall function description here"
    argument "Argument description here"
        [integer! text!]
    /refinement "Refinement description here"
][
    ensure [integer!] [
        ...
    ]
]
```

The problem is that the return type and any description don't make it to the HELP. This is one reason that I pretty much always use FUNC.

This makes one want to lean back to the return type being something nameless, like just a leading block.

Not Sure On RETURN, But I Like SET-WORD! Locals

I definitely feel that finding a way to not be using SET-WORD! for RETURN: would be good. It's not like it has anything to do with assignment.

A plain-word RETURN in FUNC is an idea that I feel is not as crazy as it might sound.

I do think I like the idea of SET-WORD! for local initialization: it would run the right-hand side without requiring parentheses, run the evaluation only once in the environment of the spec block, and stow that value in the currently-just-trash slot of the frame archetype for that local.

And I like the idea of bringing `<static>` on board with the same rules.

hostilefork · November 22, 2024, 4:04am

When the higher-level FUNCTION code was removed, that removed the code for assigning default values to locals.

As I mentioned, it was done with GROUP!s:

```
/foo: function [arg1 arg2 <local> x y (1 + 2) z (null)] [...body...]
```

The way it worked was to augment the function's body:

```
/foo: function [arg1 arg2 <local> x y z] [
    y: '3
    z: ~null~
    (...body...)
]
```

It evaluated the expression once, and then assigned a ^META of the evaluation product (hence the quoted `'3` and the quasiform `~null~` above, which evaluate back to 3 and null).
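The old transformation can be modeled roughly in Python. Here a GROUP! is represented by a zero-argument callable that follows a local's name, and the hypothetical `function` helper evaluates each one once, up front, but still re-runs the resulting assignments in a prologue on every call. That per-call assignment is the cost that storing the values in the archetype would avoid.

```python
def function(spec, body):
    """Hypothetical model of the removed FUNCTION's handling of <local> GROUP!s."""
    local_names, prologue = [], []
    for item in spec:
        if callable(item):
            # A GROUP! after a local: evaluate it once, now, and keep the result.
            prologue.append((local_names[-1], item()))
        else:
            local_names.append(item)

    def run():
        frame = dict.fromkeys(local_names)  # every local starts out empty (None)
        for name, value in prologue:        # the injected assignments, run per call
            frame[name] = value
        return body(frame)

    run.prologue = prologue
    return run


foo = function(["x", "y", lambda: 1 + 2, "z", lambda: None],
               lambda f: (f["x"], f["y"], f["z"]))
```

`foo()` returns `(None, 3, None)`, with `None` standing in for both an unset local and null in this simplified model.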

Native `<local>` Handling Brings Long-Desired Advantage

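This theoretical advantage (stowing the initial value in the local's slot of the archetypal frame, so nothing has to be evaluated or assigned per call) was not realized by the FUNCTION abstraction, as it was just injecting assignments into the body.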

But now that the FUNC native is orchestrating the situation, the advantage is there.

Feature Question: Multi-Returns

When I brought up the question of `[<local> word: expr]` instead of `[<local> word (expr)]`, I didn't consider whether you could do multi-returns:

```
/bar: function [arg <local> [begin end]: (find series "a")] [...]
```

First of all, I'm not certain that feels like it "belongs" in the spec.

Secondly, this would require some significant redesign.

The problem is that at the time of spec processing, there's no object in existence to bind into and run such an evaluation.

So this would have to somehow collect the words inside the SET-BLOCK!, save the expression, make the archetypal frame, bind the expression into that frame, and evaluate it into the frame's locals.
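Those steps can be sketched in Python as a two-phase build. Everything here is hypothetical: `find_span` stands in for a multi-return FIND, locals are dictionary keys, and "binding" is modeled by handing each saved expression the frame so it can read earlier locals. Phase 1 only collects names; phase 2 creates the frame and evaluates into it.

```python
EMPTY = object()  # marks a local slot that nothing has initialized


def find_span(series, pattern):
    """Hypothetical stand-in for a multi-return FIND: (begin, end) of a match."""
    begin = series.index(pattern)
    return begin, begin + len(pattern)


def build_archetype(entries):
    """entries: (name-or-names, expression-or-None); expressions take the frame."""
    # Phase 1: collect every local name and save the expressions unevaluated.
    names, deferred = [], []
    for targets, expr in entries:
        targets = [targets] if isinstance(targets, str) else list(targets)
        names.extend(targets)
        if expr is not None:
            deferred.append((targets, expr))

    # Phase 2: the archetype frame exists now, so each saved expression can be
    # "bound" to it: it may read earlier locals and write one or more slots.
    frame = dict.fromkeys(names, EMPTY)
    for targets, expr in deferred:
        result = expr(frame)
        values = [result] if len(targets) == 1 else list(result)
        if len(values) != len(targets):
            raise ValueError(f"expected {len(targets)} values for {targets}")
        frame.update(zip(targets, values))
    return frame


series = "xyzabc"
archetype = build_archetype([
    ("tmp", None),
    (("begin", "end"), lambda f: find_span(series, "a")),
    ("width", lambda f: f["end"] - f["begin"]),
])
```

The resulting archetype has `begin` 3, `end` 4, and `width` 1, while `tmp` is still `EMPTY`. The `width` entry shows the extra power: a later initializer can read locals that earlier ones filled in, which a single pass that evaluates before any frame exists can't offer.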

Continuing on this...

SET-WORD!s Not Being Bound Has Other Consequences

Evaluating the expression and then writing the result into a slot of a partially constructed archetypal FRAME! rules out other things, too:

So that can't work, because what's happening is that the expression is just being evaluated and put in a spot that represents local4. No "SET" is actually happening, because there's no completed context to assign it to.

That rules out other ideas, like assigning an "accessor".

Put this way, using a GROUP! seems clearer, because it makes you aware that no actual SET-WORD! assignment is taking place.

Does This Suggest A Redesign Is Needed?

The system could generate the archetypal frame, with nothing in the local slots, and then bind the expressions into it and run them...just as they would run had it been an instantiated function.

So this would be a two-phase process, which would enable things like multi-returns, accessors, weird infix functions that capture the thing they're assigning to in order to know their names, etc.

It's definitely a lot more than I bargained for when I suggested the feature. So what would likely happen is that only a subset gets implemented now, with the rest done at a future date...

What it does point out is the set of things that the GROUP!-based syntax cannot do.

```
/foo: function [arg1 arg2 <local> x y (1 + 2) z (null)] [...body...]
```

So the big question is whether that's by design ("we don't plan to implement anything more, it's a simple low-hanging fruit you can take advantage of if it fits")... or if it's too limited.

Then... Is Freeform Dialecting Good?

It forces the question: is the mix of WORD! and SET-WORD! good?

```
/foo: function [arg1 arg2 <local> x y: 1 + 2 z: null] [...body...]
```

Instinctually, I feel uneasy about that.

If we went that direction, I'd kind of rather have all the locals be SET-WORD!, and allow you to chain the assignments:

```
/foo: function [arg1 arg2 <local> x: y: z: ~] [...body...]
```
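Chaining could be handled by collecting SET-WORD!s until a value shows up and then giving that value to every name in the chain. This Python sketch simplifies values to single tokens (with `~` standing for trash) and is only meant to show the bookkeeping, not a proposed implementation.

```python
def scan_chained(tokens):
    """Assign each single-token value to every SET-WORD! chained in front of it."""
    frame, pending = {}, []
    for tok in tokens:
        if tok.endswith(":"):
            pending.append(tok[:-1])  # still inside a chain of SET-WORD!s
            continue
        if not pending:
            raise ValueError(f"value {tok!r} has no SET-WORD! to assign to")
        for name in pending:
            frame[name] = tok
        pending = []
    if pending:
        raise ValueError("dangling SET-WORD!s: " + ", ".join(pending))
    return frame


all_trash = scan_chained(["x:", "y:", "z:", "~"])
mixed = scan_chained(["x:", "y:", "1", "z:", "~"])
```

`all_trash` maps all three names to `"~"`, and `mixed` gives `x` and `y` the value `"1"` and `z` the value `"~"`.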

I think that if people got the hint that they could assign the locals there, and found out this gave them an efficiency boost, the feature might be taken advantage of more.

But the locals could not be initialized in terms of the arguments, so that's a limitation.

And if it's going to be limited anyway, might it be best to let people in on the limitations and just keep it as WORD! plus GROUP! ?

Leaning To: Stick With WORD! + optional GROUP!

Really, what we're talking about here is an optimization.

- If a local wants a fixed value upon entry to the function on each call, that fixed value can be stored by the archetypal frame and copied into a new instance's cell, at the same cost as initializing it with nothing.
- If you are initializing a local in such a way, you don't have to repeat its name to initialize it... because you're putting the initial value beside it.

If the function creation process becomes two-pass, doing strange bindings/etc., you're starting to drift from the "optimizing" part of the optimization. And I'm concerned about the complexity cost of that code. Thinking about it now, there are details where I'm not sure how it would work.

Evaluating a GROUP! and dropping its value into the slot of an incomplete archetype you're building is cheap-as-free, and not complicated.

It's a natural extension of just listing out locals as words, and I think it turns out to probably be for the best.