UNION, INTERSECT, DIFFERENCE...and Splices

hostilefork · October 22, 2023, 6:06am

There is a pattern that comes up a lot in Query which looks like this:

if not find qe.clause-list 'update [append qe.clause-list 'update]

You could use an operation like UNION here, but since UNION only works on arrays you'd have to put UPDATE in a block:

qe.clause-list: union qe.clause-list [update]

If the word to add were in a variable (e.g. word: 'update), you couldn't write the block literally... so you'd need some blockifying operation to do it:

qe.clause-list: union qe.clause-list compose [(word)]

qe.clause-list: union qe.clause-list reduce [word]

qe.clause-list: union qe.clause-list :[word]

qe.clause-list: union qe.clause-list enblock word

Semantics of Operating on Arrays

Historical Rebol would only accept BLOCK!s as the second argument to these operations. I didn't see any reason not to generalize it, so I allowed GROUP!s as well:

>> intersect [a b c d] '(b c e)
== [b c]

>> intersect '(a b c d) [b c e]
== (b c)

But there's a bit of a question there about the result type. Both arrays are taken into account for the elements, but only the first determines the type of the result. That's a little odd.
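Here is a small Python model of that asymmetry. It is not Ren-C code: a tuple stands in for GROUP!, a list stands in for BLOCK!, and the name `intersect_model` is hypothetical. Elements are drawn from both arguments, but the container type is copied from the first one only.

```python
def intersect_model(first, second):
    """Model of INTERSECT as described above: keep items of FIRST that also
    occur in SECOND (without duplicates). The result container type
    (list ~ BLOCK!, tuple ~ GROUP!) comes from FIRST only."""
    kept = []
    for item in first:
        if item in second and item not in kept:
            kept.append(item)
    return type(first)(kept)


print(intersect_model(["a", "b", "c", "d"], ("b", "c", "e")))  # ['b', 'c']
print(intersect_model(("a", "b", "c", "d"), ["b", "c", "e"]))  # ('b', 'c')
```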

Furthermore, what if you wanted to intersect an array as an item? The following wouldn't give you what you intended:

item: [b]
collection: [[a] [b] [c] [d]]

collection: intersect collection item

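The result is [], because ITEM is taken as a collection whose only element is the word b, and none of the blocks in COLLECTION is equal to that word. You'd have to put the block into another block. This is another one of those /ONLY style problems...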
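A Python model of the historical behavior makes the problem visible. Strings stand in for words and lists for blocks; `intersect_model` is a hypothetical name, and this is only a sketch of the semantics, not an implementation of INTERSECT.

```python
def intersect_model(collection, other):
    """Model of historical INTERSECT: OTHER is always read as a collection of
    elements, so a block passed as OTHER is never treated as a single item."""
    result = []
    for item in collection:
        if item in other and item not in result:
            result.append(item)
    return result


collection = [["a"], ["b"], ["c"], ["d"]]
item = ["b"]

print(intersect_model(collection, item))    # [] (compared against the word b)
print(intersect_model(collection, [item]))  # [['b']] (the extra-block workaround)
```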

With SPREAD, we can do this better!

We can make it so that UNION and friends assume you mean just one item by default, and you need to SPREAD the second argument to get it considered itemwise:

>> union [a b c d] 'e
== [a b c d e]

>> union [[a] [b] [c] [d]] [e]
== [[a] [b] [c] [d] [e]]

>> union [[a] [b] [c] [d]] spread [e]
== [[a] [b] [c] [d] e]
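The proposed rule can be modeled in Python with a hypothetical `Spread` wrapper playing the role of a splice: a wrapped list contributes its elements one by one, and any other value is a single item. The names `Spread` and `union_model` are illustrative only, and putting new items at the end is an assumption (see the ordering question below).

```python
class Spread:
    """Hypothetical stand-in for a splice: marks a list whose elements should
    be considered individually rather than as one item."""

    def __init__(self, items):
        self.items = list(items)


def operand_items(value):
    # A Spread contributes its elements; anything else is one item, as-is.
    return value.items if isinstance(value, Spread) else [value]


def union_model(series, value):
    """Model of the proposed UNION: unique items of SERIES, followed by any
    items from VALUE not already present. Returns a new list."""
    result = []
    for item in list(series) + operand_items(value):
        if item not in result:
            result.append(item)
    return result


print(union_model(["a", "b", "c", "d"], "e"))
# ['a', 'b', 'c', 'd', 'e']
print(union_model([["a"], ["b"], ["c"], ["d"]], ["e"]))
# [['a'], ['b'], ['c'], ['d'], ['e']]
print(union_model([["a"], ["b"], ["c"], ["d"]], Spread(["e"])))
# [['a'], ['b'], ['c'], ['d'], 'e']
```

Because a block is an item unless it is spread, the word-in-a-variable case needs no blockifying step at all.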

So this means the operation from Query could be a little more succinct:

if not find qe.clause-list 'update [append qe.clause-list 'update]
=>
qe.clause-list: union qe.clause-list 'update

But more importantly, the as-is nature would avoid needing to jump through hoops for single-item operands:

word: 'update
qe.clause-list: union qe.clause-list word

There's also a question of whether there is any guarantee of where UPDATE would be added if it wasn't already there... does it matter if it's added at the beginning or the end?

This is not a new question for these operations... the blocks are being treated as sets, so theoretically multiple answers could be valid:

>> intersect [a b c d] spread [c b e]
== [b c]

>> intersect [a b c d] spread [c b e]
== [c b]  ; what promises this wouldn't be the answer?

I think it's likely beneficial to make some kind of promise about ordering here.
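To show that both answers above really are consistent with treating the blocks as sets, here are two Python sketches that differ only in which argument drives the order. Neither is a decided rule; "follow the first argument" is just one candidate promise, and both function names are hypothetical. The second argument here plays the role of a spread.

```python
def intersect_by_first(series, others):
    """Candidate promise: results appear in the order found in SERIES."""
    result = []
    for item in series:
        if item in others and item not in result:
            result.append(item)
    return result


def intersect_by_second(series, others):
    """Alternative: results appear in the order found in OTHERS."""
    result = []
    for item in others:
        if item in series and item not in result:
            result.append(item)
    return result


print(intersect_by_first(["a", "b", "c", "d"], ["c", "b", "e"]))   # ['b', 'c']
print(intersect_by_second(["a", "b", "c", "d"], ["c", "b", "e"]))  # ['c', 'b']
```

Following the first argument has the advantage of matching the natural reading of "keep what's in this series", and it also fixes where UNION would put a missing item (after everything already present).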

Should UNION/etc. mutate by default?

Note that you'd still have to write:

qe.clause-list: union qe.clause-list 'update

In the Rebol model, "modify by default" is how things like APPEND or REVERSE work. That would suggest you could write:

union qe.clause-list 'update

And if you didn't want to modify qe.clause-list directly, you could copy it:

result: union copy qe.clause-list 'update

If the interface to these functions is being changed to work with splices and treat everything else as-is by default, it seems like a good time to also make them modify by default, for consistency.
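A Python sketch of the "modify by default" convention, handling single items only (no splices). The function name `union_in_place` and the clause names are hypothetical; copying the list before the call corresponds to writing UNION COPY.

```python
def union_in_place(series, item):
    """Hypothetical mutating UNION, in the spirit of APPEND: adds ITEM to
    SERIES only if it is absent, and returns the same (modified) SERIES."""
    if item not in series:
        series.append(item)
    return series


clause_list = ["select", "where"]
union_in_place(clause_list, "update")  # modifies clause_list itself
union_in_place(clause_list, "update")  # no effect: already present
print(clause_list)  # ['select', 'where', 'update']

original = ["select"]
result = union_in_place(list(original), "update")  # copy first, like UNION COPY
print(original, result)  # ['select'] ['select', 'update']
```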

hostilefork · October 22, 2023, 8:31am

There's a mechanical question of what this would mean for series not at their head:

>> block: [a b c d]

>> union skip block 2 spread [b e]
== [c d b e]

>> block
== [a b c d b e]

I ran across this when I tried to implement a little hack to test and see what the general experience of mutable set operations would be like. The need for this behavior breaks the hack. :-/
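The example above can be modeled in Python by representing a series as a shared list plus an index, so that SKIP yields a new position over the same data. Everything here (`Position`, `skip`, `view`, `union_in_place`) is a hypothetical sketch of the behavior shown, not how Ren-C implements it. Note that B gets appended even though the underlying list already contains it, because only the part at or after the position is considered. The sketch also does not collapse duplicates that already exist after the position, which a full UNION would have to do.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical model of a series value: an underlying list plus an index.
    skip() returns a new Position sharing the same list."""
    data: list
    index: int = 0

    def skip(self, n):
        return Position(self.data, self.index + n)

    def view(self):
        return self.data[self.index:]


def union_in_place(pos, items):
    """Mutating UNION with a spread of ITEMS. Only the part of the list at or
    after POS is checked, so items before POS are ignored and may end up
    duplicated in the underlying list."""
    for item in items:
        if item not in pos.view():
            pos.data.append(item)
    return pos


block = Position(["a", "b", "c", "d"])
result = union_in_place(block.skip(2), ["b", "e"])
print(result.view())  # ['c', 'd', 'b', 'e']
print(block.view())   # ['a', 'b', 'c', 'd', 'b', 'e']
```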