Beating REPEND: A New Parameter Convention?

hostilefork · August 19, 2021, 10:29pm

When you do an append a reduce b, the REDUCE generates a new series... let's call it rb. Then rb is spliced into a. And then rb needs to be GC'd.

The idea behind repend a b is that you never make rb. Instead, expressions are evaluated one by one and put onto a as you go. The savings are twofold...reduced memory overhead and reduced tax on the GC by not making extra series nodes.

That might sound like a great savings, but here is a heated debate in Red about the questionable benefit of REPEND (as well as /INTO):

https://github.com/red/red/issues/3340

I guess I'm halfway on DocKimbel's side there...in that if REPEND isn't showing a benefit it's probably more to do with a bug in REPEND vs. that the idea doesn't represent a savings.

But I hate the word REPEND. Things like REMOLD are double monstrous, and REFORM? Give me a break. These make a terrible impression.

More generally, I don't like the idea that every function would have to come in two flavors and create anxiety on the part of the caller as to if they're using the optimized one or not. I'd like any optimization to be more "under the hood" so the caller doesn't have to fret about it.

This got me to thinking...

A New Voodoo for GET-params!

Let's imagine that we have a new rule for params that look like GET-WORD!:

If the argument is a GET-XXX!, it is passed literally
If the argument is anything else, it is evaluated normally and the product is passed in with one quoting level added.

Here's an example definition

appender: func [
    block [block!]
    :value [any-value!]
][
   print ["Block is" mold block]
   print ["Value is" mold value]
   if get-block? value [
       append block reduce as block! value
   ] else [
       append block unquote value
   ]
]

Let's look at some concrete examples:

>> appender [1 2 3] 2 + 2
Block is [1 2 3]
Value is '4
== [1 2 3 4]

>> data: [[a b c] [d e f]]
>> appender [1 2 3] second data
Block is [1 2 3]
Value is '[d e f]
== [1 2 3 d e f]

>> appender [1 2 3] :[10 + 20 100 + 200]
Block is [1 2 3]
Value is :[10 + 20 100 + 200]  ; not quoted!
== [1 2 3 30 300]

At the source level, the user doesn't really have to worry about the parameter convention. They get the same outcome as if the REDUCE had been done by the evaluator, but the APPENDER becomes complicit.

And look what happens if the GET-BLOCK! is in a variable...

>> data: the :[10 + 20 100 + 200]
>> appender [1 2 3] data
Block is [1 2 3]
Value is ':[10 + 20 100 + 20]
** Error: Cannot append evaluative items...

A ha! We could tell that this was an evaluative get-block product, and not meant to participate in our little trick. (Erroring is actually the right answer here, you would need to use only data or ^data or quote data etc. to APPEND an evaluative GET-BLOCK! under the new rules.)

This is neat, because it means users can express intention to reduce at the callsite...and it is something that you can optimize on an as-needed basis.

As One Would Expect, There Are Some Glitches...

There are some seeming semantic glitches when a function takes these and they're not the last parameter, where you might see variations along the lines of:

 >> takes-first-args-normally :[elide print "A", 1 + 2] (print "B", <x>)
 A
 B
 3
 <x> 

>> takes-first-arg-specially: :[elide print "A", 1 + 2] (print "B", <x>)
A
B
<x>
3

Basically: If you somehow relied on side effects happening in left-to-right parameter order at the callsite, then moving the REDUCE of any parameters other than the last one into the body of the operation will change that order.

This is nothing new for this line of thinking in optimization: imagine if APPEND and REPEND took their arguments in the reverse order, so that the block wasn't the last item. You couldn't just blindly substitute APPEND REDUCE for REPEND in that case, if you were dependent on argument-ordering effects...if there was an evaluation in the first parameter's reduction that was needed for the second parameter.

But the difference is that the person editing APPEND REDUCE => REPEND made a change at the callsite. If you change the parameter convention and don't touch the callsites--with the intent that they stay working and you're just adding an optimization--it starts to matter.

We have some control here, though! We can define how GET-BLOCK!s act as arguments to function calls, and we can say that they don't actually perform their REDUCE until the function executes. That leaves breathing room for people who wish to add the optimization later...knowing they won't break the expectations.

Whew, that solves that problem! Good thing it's the only one! Oh, no, wait...

Not All REPEND Operations Take Literal Blocks

You see repend data [...] a lot of the time, but there's also repend block1 block2.

So append data :[...] can be finessed as an optimization for the first case, but doesn't solve the second.

To shore it up, we'd have to say that :(...) means "reduce the result of what's in the expression".

>> :(reverse [1 + 2 10 + 20])  ; -> :[20 + 10 2 + 1]
== [30 3]

This way, we could actually pass the APPEND an expression to reduce the product of. We'd need to do the evaluation at the moment we passed the parameter (I think), and then alias it as a GET-BLOCK!, so:

>> appender [1 2 3] :(reverse [1 + 2 10 + 20])
Block is [1 2 3]
Value is :[20 + 10 2 + 1]
== [1 2 3 3 30]

Where Are GET-WORD!, GET-PATH!, GET-TUPLE! in all of this?

We don't have GET-WORD! mean "reduce the product of fetching the word":

>> block: [1 + 2]

>> :block
== [1 + 2]  ; not [3]

But it seems it would be inconsistent to not put these other GET-XXX! types into the family of parameters that are captured as-is. So the above code would get this behavior:

>> appender [1 2 3] :foo
Block is [1 2 3]
Value is :foo
** Error: Cannot append evaluative items...

Instead of a REDUCE it would need a GET. But this makes a good argument for why REDUCE of a GET-WORD! should work as a word fetch, for generality... it makes routines like this easier to write correctly.

I don't think it's worth shuffling the symbols around so that :foo does a reduce and we pick something else for GET. It seems to me that :(foo) is fine enough.

But even though GET-WORD! won't run arbitrary code, you can be impacted by ordering problems, where someone might pass a :foo argument and then in the next parameter change the value of foo. Hence for consistency, we'd be saying that normal parameters would likely have to delay their get of foo until all the parameters were given...this way you could change the parameter convention without affecting callsites.

But likely the best way to go about that would be to protect the word from modification:

>> some-func :foo (foo: 20, <arg>)
** Error: FOO captured by GET-WORD! in parameter slot, can't modify
      while gathering arguments

I'm Probably Over-Worrying About It

...these protection mechanisms I mention in order to make it painless to change a parameter convention are not likely suited to being the kind of concern that applies.

But it's good to articulate what the limits of a design are...

hostilefork · August 19, 2021, 10:39pm

hostilefork:

Here's an example definition

appender: func [
    block [block!]
    :value [any-value!]
][
   print ["Block is" mold block]
   print ["Value is" mold value]
   if get-block? value [
       append block reduce as block! value
   ] else [
       append block unquote value
   ]
]

Functions like this can give us pretty profound guidance on what the behavior of operations on various datatypes should be.

For instance: here we see that we want to REDUCE a GET-BLOCK! (and any GET-XXX! type), and we want to unquote quoted items. Well doesn't that make it seem like a good idea to make REDUCE of a quoted item unquote it?

>> first ['foo]
== 'foo

>> reduce first ['foo]
== foo

That makes the above shorter:

appender: func [
    block [block!]
    :value [any-value!]
][
   print ["Block is" mold block]
   print ["Value is" mold value]
   append block reduce value
]

Internally there's all kinds of shortcuts you could take with this...empty blocks being no-ops, or blocks with just one inert item not kicking into a reduce.

I think that we might want to avoid making REDUCE accept just any type, like WORD!s etc, but here we can see that if you were wiring up operations and you wanted to line it up so some input to REDUCE was a no-op you'd know how: you'd quote it.

hostilefork · October 23, 2021, 8:26am

I haven't implemented the parameter convention described in this proposal, though I think about it a lot. It's a pretty interesting thought.

One point I keep contemplating is this asymmetry:

But what if :var reduced it? How would you fetch actions as-is?

Well, maybe that's what terminal dot could be for:

>> apply append. [[a b c] 'd]
== [a b c d]

Terminal dot already means "don't run actions", but it errors on actions. This would simply act like a GET-WORD! used to.

So this would wind up meaning that calling :x a GET-WORD! would be a misnomer. It'd be a REDUCE-WORD!. :-/

Is This A Good Idea?

I'm not hating the idea of using periods to suppress function evaluations.

apply :append [[a b c] 'd]

apply append. [[a b c] 'd]

And having a shorthand for reduce foo as just :foo which would then be consistent with :[...] seems rather tempting.

It fills in the missing link for the REPEND / REFORM / REXXX finesse, by allowing you to defer the reduction of a variable until the function gets it. Recalling the rules of the weird :arg parameter convention:

>> block: [1 + 2]

>> append data :block
; append receives literally `:block`
; can use optimized tactic and avoid intermediate series

>> append data block
; append receives evaluative product '[1 + 2]
; never sees the word "block"

Bonus Feature: Alternative To UNMETA

Another place this fills in is that we don't have anything to act as a complement to META-WORD!, e.g. ^X

>> x: ~bad~
== ~bad~  ; isotope

>> temp: ^x
== ~bad~

>> unmeta temp
== ~bad~  ; isotope

We have to invoke the word "unmeta" there. But if :temp meant "reduce temp" in a way that accepted quoted items (with BAD-WORD!s becoming isotopes) then it would be a complement (though it would actually handle more cases than UNMETA does, since that errors on BLOCK!s instead of reducing them.)

Wishful Thinking Addendum: Symbols for UNMETA-XXX!

I wish there were a symbol for the complement to meta.

>> x: first [a]
== a

>> y: ^x
== 'a

>> _∨y
== a

But there's not (that's a subscripted character, that's slightly better than the letter v for the purpose, but still not great).

Up and down arrows would actually be nice for this.

>> x: first [a]
== a

>> y: ↑x
== 'a 

>> ↓y
== a

In fact, if we consider leading caret and leading colon to be a workaround for not having the symbols, we could call them UP-WORD! and DOWN-WORD!. Because calling things GET-WORD! when they don't really relate to the GET operation is confusing.

It's like naming particles in physics...you've only got so many words.

hostilefork · September 2, 2022, 11:14am

As With Many Things... Isotopes Solve It!

We could simply say that isotopic GET-BLOCK!s are reduced by the APPEND/CHANGE/INSERT.

In fact, this can be the result of REDUCE.

>> ^ reduce [1000 + 20, 300 + 4]
== ~:[1000 + 20, 300 + 4]~

I used a ^META there because I'm presuming the console could be one of the things that forces reduction, so that if you don't use a ^META you see the reduced result.

>> reduce [1000 + 20, 300 + 4]
== [1020 304]

So REDUCE would be a very cheap operation with a fixed cost, regardless of how big the block you pass in is. (I've thought these might be "intrinsics", and not even create frames).

Then APPEND, INSERT, CHANGE, etc. can accept the isotope, and blend in the reduction with their new series creation. You can thus avoid the creation of large intermediate series.

And... the parameter fulfillment to any function that doesn't understand the convention, can just reduce it during argument fulfillment... passing the function the normal block!

The only downside I can think of is that if something ^METAs the REDUCE and pokes it off somewhere, it could wind up performing the operation at a later time than you would think.

It also gives more information to a function than you'd think you were, if it has this parameter convention then the receiving function gets the pre-reduced information.

If these kinds of things are problems, we could call it REDUCES and say it is used by the optimization-minded, vs. trying to make it the pervasive default of REDUCE. People who disagree could say reduce: :reduces and see if everything runs the same...

Hmmm... it seems that deferring the REDUCE is generally only useful if you splice, so maybe it should be an isotopic GET-GROUP!?

Well no matter what, it points in a far better direction than REPEND!!