The Semantics of JOIN

hostilefork · November 16, 2024, 6:33pm

As @Brett's critique of yesteryear pointed out, REJOIN sucked.

If naming were consistent, you might think from this pattern:

append a reduce b <=> repend a b

...that the following would have been true:

join a reduce b <=> rejoin a b  ; one would have perhaps thought?

But no...REJOIN was single arity (and was a mess).

So REJOIN is in the trash heap, but what about JOIN?

JOIN implicitly reduced, and was basically this:

join a b <=> append copy a reduce b

But b didn't have to be a BLOCK!. If it wasn't one, it was not reduced and was used as-is.

rebol2>> join "abc" [1 + 2 3 + 4]
== "abc37"  ; so the block was reduced

rebol2>> d: 10
rebol2>> join "abc" 'd
== "abcd"  ; not abc10, so the word was *not* reduced
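For anyone who finds it easier to read in another language, the rule above can be modelled roughly in Python. This is only a sketch of the behavior described, not how Rebol2 implements it: `Word`, `reduce_block` and the use of lambdas as stand-ins for unevaluated expressions are all assumptions made for illustration.

```python
class Word:
    """Stand-in for a Rebol WORD!; only its spelling matters here."""

    def __init__(self, spelling):
        self.spelling = spelling

    def __str__(self):
        return self.spelling


def reduce_block(block):
    """Evaluate each item of a block (callables stand in for expressions)."""
    return [item() if callable(item) else item for item in block]


def join(base, rest):
    """Model of the old JOIN: copy base, then append the (maybe reduced) rest."""
    # Only a block (a Python list here) is reduced; anything else is used as-is.
    items = reduce_block(rest) if isinstance(rest, list) else [rest]
    if isinstance(base, str):
        return base + "".join(str(item) for item in items)
    return list(base) + items  # list(base) copies, so base is left unmodified


d = 10  # mirrors `d: 10`; never consulted, because the word is not evaluated
print(join("abc", [lambda: 1 + 2, lambda: 3 + 4]))  # abc37
print(join("abc", Word("d")))                        # abcd
```

The second call shows the asymmetry: the word is formed by its spelling, and the value 10 never enters the picture.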

Red did not carry forward this definition:

>> join
*** Script Error: join has no value

Next-Level JOIN: Allow JOIN DATATYPE

At some point, it occurred to me that if you could use JOIN with a datatype, it could step in to fill in the desires of REJOIN more clearly:

>> join binary! [1 + 2 #{DECAFBAD} 2 + 3]
== #{03DECAFBAD05}

People really did write expressions like this in Rebol2:

rebol2>> rejoin [#{} 1 + 2 #{DECAFBAD} 2 + 3]
== #{03DECAFBAD05}

But see Brett's critique, if the surface-level badness isn't enough to convince you!

I also wanted to support things that Rebol2 probably meant to, but did not. This should give a WORD! back:

rebol2>> join 'a 'b
== "ab"  ; should be word! `ab`

Plus, I thought it might be nice to have a non-reducing variant, done with @[...]

>> join word! @[a 1 + 2]
== a1+2

I've mentioned elsewhere that I think doing join-like actions with JOIN is better than making them some form of MAKE, given how nebulous the meaning of MAKE is.
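The datatype-first form can be modelled the same way in Python, as a rough sketch rather than a description of the real implementation. The names `join_type`, `Literal` (standing in for an @[...] block) and `to_bytes` are hypothetical, `str` is used as a stand-in for WORD!, and the assumption that an integer contributes exactly one byte is taken from the binary! example above.

```python
def reduce_block(block):
    """Evaluate each item of a block (callables stand in for expressions)."""
    return [item() if callable(item) else item for item in block]


class Literal(list):
    """Stand-in for an @[...] block: its items are used without evaluation."""


def to_bytes(item):
    """Convert one joined item to bytes."""
    if isinstance(item, (bytes, bytearray)):
        return bytes(item)
    if isinstance(item, int):
        return bytes([item])  # assumption: one byte per integer (0 to 255 only)
    raise TypeError(f"cannot join {item!r} into a binary")


def join_type(kind, block):
    """Model of JOIN with a datatype: build a fresh value of that type."""
    items = list(block) if isinstance(block, Literal) else reduce_block(block)
    if kind is bytes:
        return b"".join(to_bytes(item) for item in items)
    if kind is str:  # stand-in for WORD!: the formed items are run together
        return "".join(str(item) for item in items)
    raise TypeError(f"unsupported target type: {kind!r}")


result = join_type(bytes, [lambda: 1 + 2, bytes.fromhex("DECAFBAD"), lambda: 2 + 3])
print(result.hex().upper())                         # 03DECAFBAD05
print(join_type(str, Literal(["a", 1, "+", 2])))    # a1+2
```

Starting from the type instead of an empty value like #{} is what removes the need for the REJOIN idiom of putting a seed value first in the block.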

Making Peace (?) With Uneasiness About List Ambiguity

Ergonomically, it's nice for JOIN to be able to take either a BLOCK! or some other type.

One of the biggest uses of JOIN is with files:

join directory %foo.txt

It would be annoying if you had to write:

join directory [%foo.txt]

But then, you have the problem that if you're joining onto a BLOCK!, then a BLOCK! is a legitimate thing to join:

>> join [a b] [1 + 2 10 + 20]
== [a b 3 30]  ; "traditional" behavior

>> join [a b] [1 + 2 10 + 20]
== [a b [1 + 2 10 + 20]]  ; ...but this could be valid

We could "fix" this by defining JOIN as a non-reducing construct, then have people use SPREAD and REDUCE:

>> join [a b] spread reduce [1 + 2 10 + 20]
== [a b 3 30]

>> join [a b] [1 + 2 10 + 20]
== [a b [1 + 2 10 + 20]]

>> join [a b] reduce [1 + 2 10 + 20]
== [a b [3 30]]
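The three cases above can be sketched in Python to make the distinction concrete. This is a hedged model only: `Spread`, `join_as_is` and `reduce_block` are hypothetical names, and lambdas stand in for unevaluated expressions.

```python
class Spread:
    """Marks a list whose items should be spliced in, not added as one value."""

    def __init__(self, items):
        self.items = list(items)


def reduce_block(block):
    """Evaluate each item of a block (callables stand in for expressions)."""
    return [item() if callable(item) else item for item in block]


def join_as_is(base, value):
    """Model of a non-reducing JOIN: never evaluates, splices only a Spread."""
    result = list(base)  # copy, so base is left unmodified
    if isinstance(value, Spread):
        result.extend(value.items)
    else:
        result.append(value)  # a plain list goes in as a single nested item
    return result


exprs = [lambda: 1 + 2, lambda: 10 + 20]  # stands in for [1 + 2 10 + 20]
print(join_as_is(["a", "b"], Spread(reduce_block(exprs))))  # ['a', 'b', 3, 30]
print(join_as_is(["a", "b"], reduce_block(exprs)))          # ['a', 'b', [3, 30]]
nested = join_as_is(["a", "b"], exprs)
print(len(nested), nested[2] is exprs)                      # 3 True
```

In this model the caller always states intent explicitly, which removes the ambiguity at the cost of more typing in the common case.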

But this doesn't exactly square with the JOIN of a DATATYPE! case, and the most common desires.

If anything, I'd rather make JOIN always take a [...] or @[...] in the second argument, and then create some other non-reducing construct that slaps two things together.

>> block: [a b]

>> adjoin block [c d]
== [a b [c d]]

>> adjoin block spread [e f]
== [a b e f]

>> block
== [a b]  ; unmodified (difference from append)

But asking people to write (adjoin directory %foo.txt) is... ugly.

A compromise is to narrow it, so that a single item is only accepted as the second argument when the first argument is not a list:

>> join "abc" "def"
== "abcdef"

>> join [a b c] "def"
** Error: JOIN with a list must use [...] or @[...]

This would help steer you away from writing code like (join list value) and thinking it works, only to find it falling down when value becomes a BLOCK!.
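A rough Python model of that narrowed rule, purely as a sketch: `join_narrowed` is a hypothetical name, only the reducing [...] form is modelled (not @[...]), and Python lists stand in for Rebol lists.

```python
def reduce_block(block):
    """Evaluate each item of a block (callables stand in for expressions)."""
    return [item() if callable(item) else item for item in block]


def join_narrowed(base, rest):
    """Model of the compromise: a list base only accepts a block to join."""
    if isinstance(base, list):
        if not isinstance(rest, list):
            raise TypeError("JOIN with a list must use [...] or @[...]")
        return list(base) + reduce_block(rest)
    # Non-list base: a single item is allowed, and a block is still reduced.
    items = reduce_block(rest) if isinstance(rest, list) else [rest]
    return base + "".join(str(item) for item in items)


print(join_narrowed("abc", "def"))                 # abcdef
print(join_narrowed(["a", "b"], [lambda: 1 + 2]))  # ['a', 'b', 3]
try:
    join_narrowed(["a", "b", "c"], "def")
except TypeError as error:
    print(error)  # JOIN with a list must use [...] or @[...]
```

The error fires on the first call with a non-block value, rather than the code silently changing meaning later when the value happens to be a block.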

Is REDUCE just JOIN BLOCK! ?

The implementation of JOIN that I'm working on seems like a more powerful REDUCE.

But questions start to arise about the binding... what should the binding be?

>> join [add 1] ['multiply 2 3]
== [add 1 multiply 2 3]

In that case, I'm going to assume the produced block would have the binding of the [add 1] block. Since multiply is written as 'multiply and not $multiply, it would be unbound, so any evaluation of the result would be driven by the first argument's binding.

But what if you just said:

>> join block! ['add 1 'multiply 2 3]
== [add 1 multiply 2 3]

Before, the second block's binding was disregarded in the product. But when there's no first block to get the binding from, does it assume the binding of the second block? And how would you get an unbound block if you wanted it?

We could say that you get an unbound block, but if you want a bound block, you'd say:

>> join $[] ['add 1 'multiply 2 3]
== [add 1 multiply 2 3]

Then we have similar questions for things like word!:

>> join word! ["a" "b"]
== ab

I feel that pretty obviously should be unbound. But words follow different rules:

>> join $a ["b"]
== ab  ; can't necessarily be bound, just because a was...

That would suggest all words come back unbound from a joining process.

So there's no shortage of questions. But in general, I think JOIN is on the right track, and moving the JOIN-like behaviors of MAKE over to it seems good.