In the interest of shipping something: Rebol2-style `<local>`s only?

hostilefork · July 26, 2020, 7:50pm

I pretty much consider being able to declare variables at the point they are first used to be a non-negotiable feature.

This is why I was a big fan of FUNCT when I first saw it. It looked for SET-WORD!s in your function and "gathered" them up implicitly as being local to that function.

 funct [arg1 arg2] [  ; R3-Alpha "funct"
     local1: 10
     local2: 20
     return (local1 + arg1) * (local2 + arg2)
 ]
 => func [arg1 arg2 /local local1 local2] [  ; equivalent R3-Alpha code
     local1: 10
     local2: 20
     return (local1 + arg1) * (local2 + arg2)
 ]

I was one of the proponents for it taking over the relatively-useless role of FUNCTION, which was previously a fairly lame contraction like this:

function [arg1 arg2] [local1 local2] [...body...]   ; Rebol2 "function"
=> func [arg1 arg2 /local local1 local2] [...body...]  ; equivalent Rebol2 code

As you write more sophisticated code, you start to realize the weakness of automatically assuming SET-WORD!s are local. There are many uses of SET-WORD!s that are not meant as local variables, and if dialect authors are to be free to use them as labels or for other purposes, there will be many more.

I've written about the problems; please review if you haven't already.

I've experimented with LET as a replacement, but I'm not happy yet.

Right now, LET works mechanically about the same as SET-WORD!s did. let x: 10 behaves in a FUNC or FUNCTION body just as x: 10 did in a FUNCT(ION).

It's an improvement in the sense that it sacrifices a single word--instead of the whole class of values of SET-WORD!--for carrying the meaning of declaring a local.

There are questions about how this can interact with PARSE or other dialects, which had expanded their syntax e.g. to allow copy x: [some "a"] instead of just copy x [some "a"]. Should that now enable copy let x: [some "a"]? Who is responsible for these dialect syntax exceptions, where any point of providing a variable to assign might want to also declare that variable?

Beyond that... I have a nagging feeling that the way it works is wrong...injecting variables into the function definition and being scanned for. I feel like it should work more like a USE does, and be dynamically binding the code that comes after it when it is encountered. If you want to add variables to a function's definition you should always be able to do that with <local>. But something tells me that LET should be a different beast...that uses a syntax trick to create a wave of binding that affects the statements after it instead of forcing your hand in making a code block the way USE does.

Can we punt on it?

Rebol2 got a fair way with just /local. I think it would have been safer if it had unbound any SET-WORD!s in the body that were not explicitly declared or imported.

One of the main reasons this would be a pain would be OBJECT!s that would want methodized access to things without explicitly <with>ing them. But Ren-C has METHOD to take care of this (and establishing the relationship to the object addresses other design holes, e.g. not having to deep copy and rebind all the functions in objects on each instance creation).

What if we went to a situation where you used <local> for now, and all SET-WORD!s were unbound that weren't covered by either the args or <local> or imported by <with> or on an object supplied by <in> (or implicitly via METHOD)? We could limit the "junk" which might accrue by having it so that <local>s which do not have corresponding SET-WORD!s in the body raise an error.
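To make the proposed rule concrete, here is a minimal sketch. It is modeled in Python rather than Ren-C so it can be run standalone. The name `check_spec` and the use of plain lists of word names are assumptions made for illustration, and `<in>` / METHOD are left out for brevity. It reports which SET-WORD!s would end up unbound and which `<local>`s would trigger the "unused local" error.

```python
def check_spec(args, locals_, withs, body_set_words):
    """Model the proposed rule for one function (hypothetical helper).

    args, locals_, withs: names declared in the spec
    body_set_words: names that appear as SET-WORD!s in the body

    Returns (unbound, unused_locals):
      unbound       - SET-WORD!s no declaration covers; these would be left unbound
      unused_locals - <local>s with no matching SET-WORD!; these would raise an error
    """
    declared = set(args) | set(locals_) | set(withs)
    unbound = sorted(set(body_set_words) - declared)
    unused_locals = sorted(set(locals_) - set(body_set_words))
    return unbound, unused_locals


# Roughly: func [x <local> total junk <with> counter] [
#     total: x + 1  counter: total  label: 0
# ]
unbound, unused = check_spec(
    args=["x"],
    locals_=["total", "junk"],
    withs=["counter"],
    body_set_words=["total", "counter", "label"],
)
print(unbound)  # ['label']
print(unused)   # ['junk']
```

Here `label:` is covered by nothing, so it would be unbound, and `junk` is declared but never assigned, so it would be flagged.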

I'm not floating the idea because I don't believe in being able to declare locals at their point of first usage. I'm suggesting it because I believe in the idea too much to see it done incorrectly, and what I've seen so far feels wrong.

Thoughts? I do like the idea of FUNC and FUNCTION being synonyms, and would like to stay the course with that direction.

BlackATTR · July 26, 2020, 8:22pm

Sounds fine to me. I trust your intuition. I think you're saying you don't know what the answer is yet, and putting that on the back-burner for now sounds completely reasonable.

rgchris · July 27, 2020, 2:21pm

I tend to use FUNC for the most part anyhow as I feel the naming of locals is an important part of understanding the function as it's being written.

I'm a bit ambivalent on METHOD as I think having functions be local to their object should be the default, though I understand this is not always the case (see the classic example of object usage in the Ten Steps doc). I sense that the desired non-copying behaviour of functions came from the usage of OBJECT! that MODULE! now covers. Could use more data points on that.

My feeling on functions in general is that they are effectively blocks with a spec, and that FUNC should reflect the native spec dialect of ACTION! (i.e. func: [spec body][make action! reduce [spec body]])—that seems to be a little out of whack at the moment. This is with an eye to having some form of representation of function ([spec]:[body] or [spec]|[body] are spitballs with the ']:[' sequence being the key designation) where it's implied that the body behaves like BLOCK! in terms of obtaining context.

I'm not sure I get how LET works: is it a nothing pass-through word that is just used as a marker for locals gathering, or does it have meaning within FRAME!? For me, it seems foreign.

Mark-hi · July 27, 2020, 6:11pm

Writing a bunch of functions that communicate state with each other via global variables without requiring them all to be put inside an object seems like a reasonable desire, and is/was a common pattern in R2.

hostilefork · July 27, 2020, 7:56pm

This is what <with> does, and I think being explicit about it isn't too much of a problem. You only have to use it if you intend to write to the variable:

global: _

writer: func [x <with> global] [
    global: x
]

reader: func [] [
    return global
]

hostilefork · July 27, 2020, 11:16pm

Yes; it is an "invisible" (no return result). It takes a skippable WORD!, so that if the thing that follows it is a plain word it will be consumed silently.

let x  ; will just remain as VOID!, no error raised

But a SET-WORD! or SET-BLOCK! or whatever will not be consumed as an argument, hence it is not subject to the invisibility:

>> print ["The let is invisible, but value is" let x: 10]
The let is invisible, but value is 10

I don't think it's particularly mysterious, but there are issues, e.g. whether to enforce not using a LET'd variable before the LET, and how that might work. (See also "hoisting" in JavaScript for the difference between VAR and LET.)

foo: func [] [
    x: 10
    print ["Should this be legal?" x]
    let x: 20
    print ["Or only after the LET?" x]
]

There's the question of whether dialects are responsible for adding support for LET. I mentioned the early change that allowed SET-WORD! in COPY and SET for PARSE, e.g. parse data [copy x: [some "a"]]. Does it need to support [copy let x: [some "a"]] now? Does every dialect need to be involved?

I'd considered other ideas for LET's behavior, e.g.

>> let x
== x  ; the variable itself

Which would permit the likes of parse data [copy (let x) [some "a"]], modulo our debates over plain GROUP! injections.

But like I say: collecting SET-WORD!s is a fairly unsustainable tactic. And a "scanned for in the body" construct has the same problem of inadvertent locals:

 outer: function [x] [
     let inner: function [y] [
         let z: x + y  ; Both OUTER and INNER pick up this Z in body walking
         return z
     ]
     return inner
 ]

If LET had more of a runtime character... like USE... then only those bodies that actually needed the variable would get it. This is what I mean when I say I'm looking for a pleasing answer that hasn't really emerged. That unused Z in OUTER is symptomatic of a design flaw I want to see a way out of.
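The inadvertent-local effect can be reproduced with a toy model. This is a sketch in Python, not how Ren-C actually scans bodies: blocks are nested lists, words are strings, and `gather_let_words` is a hypothetical stand-in for a naive deep body walk.

```python
def gather_let_words(body):
    """Naively collect every word that follows LET, at any nesting depth."""
    found = []
    for i, item in enumerate(body):
        if isinstance(item, list):
            # Descend into nested blocks, including nested function bodies.
            found.extend(gather_let_words(item))
        elif item == "let" and i + 1 < len(body) and isinstance(body[i + 1], str):
            found.append(body[i + 1].rstrip(":"))
    return found


inner_body = ["let", "z:", "x", "+", "y", "return", "z"]
outer_body = ["let", "inner:", "function", ["y"], inner_body, "return", "inner"]

print(gather_let_words(outer_body))  # ['inner', 'z']
print(gather_let_words(inner_body))  # ['z']
```

The walk over OUTER's body has no way to know that `z` belongs only to INNER, so OUTER ends up with a `z` it never uses.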

hostilefork · August 29, 2020, 6:43am

Let me try to summarize an unfinished long response I had hanging around in a draft.

I'm committed to an implementation that does not require making deep copies of every method in a base object to make a new instance. That's working....based on storing a single pointer in ACTION! values (the "binding"). When that function is called, this pointer is threaded through execution of the body. WORD! lookup uses it to know how to forward references to any of the base classes to the derived object.
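As a rough mental model of that mechanism (a Python sketch, not the real C implementation), the classes `Context` and `Action` and the function `lookup` below are hypothetical names. The point is that two function values can share one body and differ only in a binding pointer, with word lookup forwarding base-object references to the derived object.

```python
class Context:
    """An object: a dict of fields plus a link to the object it derives from."""
    def __init__(self, fields, parent=None):
        self.fields = dict(fields)
        self.parent = parent

    def derives_from(self, other):
        ctx = self
        while ctx is not None:
            if ctx is other:
                return True
            ctx = ctx.parent
        return False


class Action:
    """A function value: a shared body plus one binding pointer."""
    def __init__(self, body, binding=None):
        self.body = body        # list of (word, context the word was bound to)
        self.binding = binding

    def rebound(self, new_binding):
        # Same body object, different binding pointer: no deep copy.
        return Action(self.body, new_binding)


def lookup(word, bound_to, binding):
    """Forward a reference to a base object over to the derived object."""
    if binding is not None and binding.derives_from(bound_to):
        return binding.fields[word]
    return bound_to.fields[word]


def run(action):
    return [lookup(word, ctx, action.binding) for word, ctx in action.body]


base = Context({"name": "X"})
derived = Context({"name": "Y"}, parent=base)

body = [("name", base)]                  # the word NAME, bound to the base object
print_name = Action(body, binding=base)
derived_print_name = print_name.rebound(derived)

print(run(print_name))                             # ['X']
print(run(derived_print_name))                     # ['Y']
print(derived_print_name.body is print_name.body)  # True
```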

You have to start putting these "bindings" on at some point. METHOD is nice because it makes it explicit. We could avoid METHOD and say that MAKE does it (kind of like how it did the body-copying before). What you wind up with is sloppier, and I feel like it's harmful.

Consider what would happen:

make parent-object [
    foo: func [...] [...]
    bar: :some-func
]

Here we are saying that before the MAKE call, foo would be a blank slate like any other function...with no binding of its own. Without the help of METHOD, it also doesn't get implicit visibility of things declared in the parent object. That alone seems unfortunate (wouldn't most languages expect the words in parent-object to be visible by default?)

bar is coming from elsewhere. This some-func could have no binding, it could be bound to parent-object (or one of its ancestors), or it could be bound to an object from another ancestry.

In our METHOD-less world, MAKE has to decide what to do with each case:

Overwriting bindings from the same hierarchy is a requirement for derived binding to work. This is how we avoid making full-body copies of every method on a derivation. It has to update those binding pointers to point to the new object being created.
Overwriting an existing binding that is not in the hierarchy would break that function--it would no longer refer to anything it thought it did. (This damage would be systemic: if you'd said bar: :return for some return, for example, that return would lose its return target.) These bindings can't be touched.

The odd case out is when something does not have a binding. Without METHOD having explicitly connected the member to the object, then the body of that function may-or-may-not contain references that need to be forwarded.
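The three cases can be laid out as a small decision function. This is only a Python sketch of the reasoning above: `Obj` and `binding_policy` are hypothetical names, and the returned strings are labels rather than anything MAKE really produces.

```python
class Obj:
    """Minimal stand-in for an object with a derivation chain."""
    def __init__(self, parent=None):
        self.parent = parent

    def ancestry(self):
        ctx = self
        while ctx is not None:
            yield ctx
            ctx = ctx.parent


def binding_policy(action_binding, new_object):
    """What a METHOD-less MAKE would have to do with one member function."""
    if action_binding is None:
        return "ambiguous"  # the odd case: the body may or may not need forwarding
    if any(action_binding is ctx for ctx in new_object.ancestry()):
        return "rebind"     # same hierarchy: point the binding at the new object
    return "keep"           # foreign binding: overwriting it would break the function


parent = Obj()
child = Obj(parent=parent)   # the object being created by MAKE
unrelated = Obj()

print(binding_policy(parent, child))     # rebind
print(binding_policy(unrelated, child))  # keep
print(binding_policy(None, child))       # ambiguous
```

Only the first two outcomes have an obviously right answer, and the third is the one that METHOD resolves by making the intent explicit.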

It's a little tough to articulate the full harm of slapping a binding on all actions that lack bindings during a MAKE. A conceptually impure aspect is that functions would no longer appear to be the same when nothing is making them different.

handler: func [x] [print [x + 1]]

obj1: make object! [h1: :handler]
obj2: make object! [h2: :handler]

If MAKE put a binding to obj1 on h1...and MAKE put a binding to obj2 on h2... then as far as the system is concerned these are now unique functions. There's no longer a way to guarantee them as acting the same. I don't like how it feels, as "contaminating without asking". (Note that handler remains untouched, as it is a distinct value cell from h1 and h2...so its binding would remain as null in any case.)

We would lose the ability of objects to hold onto functions in their top-level that referred to a specific member of a base object by WORD!, without applying derivation. Consider something like registering a callback:

 obj: make object! [
     x: 10
     foo: method [...] [
         global/register-callback func [y] [x: x + y]
     ]
 ]

It seems reasonable to assume that this callback that is being passed around will always refer to the x in obj. I'd be nervous about any strategy which says that if you happen to store that callback in the wrong place at the wrong time, it suddenly refers to the x in another object. But that is exactly what would happen if the global passed that handler back...you put it in a member variable of some class related by derivation of OBJ, and another derivation happened. Calling out methods explicitly helps limit the reach of this behavior.

There's a small performance hit in lookup on each WORD! in the method body when the method has a binding. We have to walk the binding down through to its base classes (0 steps in this case) and make sure that any references to those base classes are forwarded to the object.

(This points out an optimization, that bindings with no parents can be ignored at the moment of execution. They're only relevant at the point of the next derivation. I should do that adjustment now.)

Those might be the only problems. But I'm not quite sure, and my intuition says it's not a good idea to avoid being explicit.

Bear in mind that derived binding solves an actual issue. Even with just a thousand objects, each having 10 member functions with 10 arrays on average, you get 1000 * 10 * 10 => 100000 arrays of overhead. If each array averages 5 items, that is rounded up to length 8 arrays in the pool. On a 64-bit system you're paying 8 * 4 => 32 bytes per cell, so times 8 that's 256 bytes per array. Your 1000 modest object instances could easily incur 25600000 bytes of overhead for these deep copies.
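Spelled out step by step with the same figures (a quick Python check where the variable names are just labels for the numbers in the paragraph), 100000 arrays at 256 bytes each comes to 25600000 bytes, roughly 24 MiB:

```python
objects = 1000
methods_per_object = 10
arrays_per_method = 10

# Every instance deep-copies every array of every method.
arrays = objects * methods_per_object * arrays_per_method

pointer_size = 8                   # bytes on a 64-bit system
cell_size = 4 * pointer_size       # 32 bytes per cell
cells_per_array = 8                # 5 items rounded up to the length-8 pool
bytes_per_array = cells_per_array * cell_size

total = arrays * bytes_per_array
print(arrays)           # 100000
print(bytes_per_array)  # 256
print(total)            # 25600000
```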

This is a foundational flaw in a general purpose "object language", and is not acceptable. The problem is real: Atronix was forced to move methods out of their objects to be functions that took the objects as a parameter. It wasn't even the memory that bothered them--their use case could add gigabytes if they needed it--it was the delay the GC incurred from having to traverse so many series.

rgchris · September 1, 2020, 8:31pm

I get the argument of efficiency here, but my concern is having different behaviour between blocks and functions in terms of binding. I'm not sure what the point of a function inside of an object is if it's not bound to that object.

The scalability argument does sound compelling, but it seems the solution is to use functions external to the object rather than a two-tier function solution with hidden metadata, and for it to be on the developer's head if they choose the costlier option.

hostilefork · September 5, 2020, 12:40am

Mechanically, there really is no such thing as "inside an object". Objects only refer to functions.

Functions are values in Rebol. The point of functions inside of an object that are not bound to that object is that the function is a value. :-/

If I make an object that wants to cache the old value of some operators I'm going to change from LIB, with intent to put them back later... then should these values get somehow tainted by the fact they were stored in object fields? e.g.

red>> cacher: make object! [
    saved: none
    save-func: func [f] [saved: :f]
]

red>> cacher/save-func :replace

red>> :cacher/saved = :replace
== true

red>> cacher2: make cacher []

red>> :cacher2/saved = :replace
== false

I call this bad--performance ramifications notwithstanding.

rgchris · September 14, 2020, 5:19pm

So are blocks, but they are deep copied. I would posit that vastly more often than not a function value within an object has some relation to other values within that object or clone.

>> x: make object! [name: "X" print-name: func [] name-printer: [print name]]
== make object! [
    name: "X"
    print-name: 'make action! [[] [...]]
    name-printer: [print name]
]

>> y: make x [name: "Y"]      
== make object! [
    name: "Y"
    print-name: 'make action! [[] [...]]
    name-printer: [print name]
]

>> do y/name-printer    
Y

>> y/print-name
X

Performance is obviously a concern and something that would have to be managed, however sanitizing this behaviour in the name of performance is next to useless.

iArnold · November 4, 2020, 6:14pm

y/print-name cannot work in this example, right?

hostilefork · November 4, 2020, 11:13pm

The code works as is in current Ren-C.