Do not COLLECT [keep if false [$100]]


#1

As I showed in the solution to FizzBuzz, being able to take advantage of the evaluator’s unique chaining abilities and “opting out” generally means making clever uses of null.

As another tradeoff that I think is worth it, I’ve changed COLLECT to return NULL if there are no non-null KEEPs.

>> collect []
// null

>> collect [keep case [1 > 2 [<nope>] 3 > 4 [<also nope>]]]
// null

When I look at that, it seems pretty natural. In contrast, giving back a block when there have been no KEEPs seems like fabricating something from nothing. It also helps some with performance/overhead, because no empty block gets created unless one is actually needed. (The implementation of collect in historical Rebol and Red does make block! 16, so you pay for a 16-cell block even if you never use it, while this creates the block on demand.)

However, if this seems inconvenient, there’s an easy workaround when you want to ensure you always get an empty block back on no keeps… just do a plain non-/ONLY KEEP of an empty block. Splicing nothing (vs null) counts as enough intent to get some kind of result:

>> maybe-empty: []
...
>> collect [keep [] for-each x maybe-empty [keep x]]
== []
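
To make the on-demand behavior concrete, here is a minimal C model. It is only a sketch: the Collector struct and the collector_keep / collector_keep_opt names are hypothetical, and this is not how Ren-C actually implements COLLECT. It models three things described above: with no KEEPs the buffer stays null, keeping a null is a no-op, and splicing zero items (the keep [] idiom) still creates an empty buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of COLLECT's buffer; not Ren-C's real data structure. */
typedef struct {
    int *items;   /* NULL until something is kept (models a null result) */
    size_t len;   /* number of items kept so far */
    size_t cap;   /* allocated capacity, in items */
} Collector;

/* Splice n items onto the buffer. n == 0 models `keep []`: nothing is
   added, but the buffer is still created. Returns 0 on success, -1 if
   allocation fails (the collector is left unchanged in that case). */
int collector_keep(Collector *c, const int *src, size_t n) {
    if (c->items == NULL || c->len + n > c->cap) {
        size_t cap = c->cap ? c->cap : 4;
        while (cap < c->len + n)
            cap *= 2;
        int *grown = realloc(c->items, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        c->items = grown;
        c->cap = cap;
    }
    if (n > 0)
        memcpy(c->items + c->len, src, n * sizeof *src);
    c->len += n;
    return 0;
}

/* Keep a single optional value. A NULL pointer models keeping a null:
   it is ignored and does not create the buffer. */
int collector_keep_opt(Collector *c, const int *value) {
    if (value == NULL)
        return 0;
    return collector_keep(c, value, 1);
}
```

Starting from a zeroed Collector, a caller can check whether items is still NULL once the loop is done, which is the C-level analogue of testing the result of collect for null.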

We can easily chain it to make an “always-returns-a-block” version, and maybe we should put that in the box vs. making people use that idiom:

 collect-block: chain [
     :collect
        |
     func [x] [
         :x else [copy []]
     ]
 ]

But if you ask me, the idiom of keep [] isn’t too bad. Moreover, if you just use data: try collect […] you’ll get a BLANK!, and blank is good enough to opt out of many operations that a [] would be opting out of:

data: try collect [if false [keep <not kept>]]
for-each x data [print "won't run, data is blank and opts-out of FOR-EACH"]

Plus you might prefer to have something whose absence you can test with a plain conditional, vs. EMPTY? (if empty? data is longer than just if data).

The semi-noisy nature of null has advantages

If you think casual users of COLLECT surely mean they want an empty block when there are no KEEPs, I’m not convinced that’s the case.

Consider something like “print collect […]”, with that collection coming up empty. What’s PRINT supposed to be doing? Is it a request to print a blank line–just a line feed? Or is it a request to opt-out of printing altogether, so no newline at all?

I don’t think there’s a generic answer to that question. So it’s handy to draw attention to the ambiguity. PRINT doesn’t take NULL… it takes only BLANK! (to opt out), TEXT!, or BLOCK! (to be SPACED). So it will error and force you to make an explicit choice:

 print try collect [...] ;-- no output if there are no non-null KEEPs 
 print collect [keep [] ...] ;-- blank newline if no more KEEPs come along
 print collect-block [...] ;-- blank newline if no KEEPs, if we put it in the box

So this keeps you paying attention.

But that’s just a bonus, the real feature is tighter expression

A lot of code already can’t produce an empty collect, because it has literal KEEPs that always run, or it collects from things that are never empty. Among the cases that are left, many want to do something different in the empty case and can do so with ELSE. Among the cases left after that, many just want to enumerate the result…and a BLANK! from try collect serves better than an empty block does.


#2

For example:

if arity = 0 [
    params: ["void"] ;-- In C, f(void) has a distinct meaning from f()
    args: ["rebEND"]
] else [
    params: collect [
        count-up i arity [keep unspaced ["const REBVAL *arg" i]]
    ]
    args: collect [
        count-up i arity [keep unspaced ["arg" i]]
        keep "rebEND"
    ]
]

This can now be tidied up as:

;-- In C, f(void) has a distinct meaning from f()
params: ["void"] unless collect [
    count-up i arity [keep unspaced ["const REBVAL *arg" i]]
]
args: collect [
    count-up i arity [keep unspaced ["arg" i]]
    keep "rebEND"
]

While the params couldn’t do that before, the args could have. But if you want params to come right after args you might make aesthetic decisions that make you repeat yourself when you don’t have to. This is one of the benefits of tidier expression…you exploit things you might not have otherwise.


#3

I like how this feature and similar ones maximize robustness of the code (through handling null, blank, etc) while keeping everything extremely tight. There’s a nice balance of robustness paired with a code-golf mentality.


#4

The libRebol API benefits in particular from NULL being in more places as a way of gleaning information without additional calls. It’s even more of an advantage there than in the interpreter (where everything is handled automatically), because a null result has no handle to release, and releasing is a separate API you have to worry about. And null pointers are conditionally falsey in C:

REBVAL *block = rebRun("collect [...]");
if (!block) {
    // stuff to handle case of nothing collected
    return;
}
// other stuff

Without using null, and having to test for empty, it’s longer and easier to get wrong:

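REBVAL *block = rebRun("collect [...]");
if (rebDid("empty?", block)) {  // pass the handle in; the string can't see C variables
    rebRelease(block);  // an empty block is still a handle that must be released
    // stuff to handle case of nothing collected
    return;
}
// other stuff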

These two pieces of code aren’t exactly the same, as you can still collect an empty block under my current rules if you KEEP only empty spliced blocks. However, the person writing the code generally has control.
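
The difference in bookkeeping can be seen without libRebol at all. The following is a self-contained sketch in which Handle, fake_collect, fake_release, and live_handles are all hypothetical stand-ins (none of them are libRebol names). It just counts how many handles the caller still owes a release for under each convention.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an API handle such as a REBVAL*. */
typedef struct {
    size_t len;   /* number of collected items */
} Handle;

/* Number of handles that have been handed out and not yet released. */
int live_handles = 0;

/* Pretend to run a collect that kept `kept` items. When null_when_empty
   is nonzero, an empty result is reported as NULL and no handle exists. */
Handle *fake_collect(size_t kept, int null_when_empty) {
    if (kept == 0 && null_when_empty)
        return NULL;
    Handle *h = malloc(sizeof *h);
    if (h == NULL)
        abort();   /* out of memory; nothing sensible to do in a sketch */
    h->len = kept;
    live_handles++;
    return h;
}

/* Release a handle obtained from fake_collect. */
void fake_release(Handle *h) {
    free(h);
    live_handles--;
}
```

With the null convention, the early-return path has nothing to clean up. With the empty-block convention, forgetting the release on that path leaves live_handles at 1, which is the leak the longer version invites.
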
Plus we’ll be able to do MISMATCH soon:

REBVAL *block = rebRun("mismatch empty? collect [keep [] ...]");

It’s the inverse of MATCH, so it will be null if the collected thing is empty, and the collected thing if it isn’t. That could also be match nonempty? or whatever the name of the relevant function would be.
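
As a rough illustration of that MATCH / MISMATCH relationship, here is a small C sketch. Block, is_empty, match, and mismatch are hypothetical names for this example only, and treating a NULL input as "no result" in both functions is my assumption, not a statement about how the real functions behave.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a collected block. */
typedef struct {
    size_t len;
} Block;

/* A test function takes a block and answers yes or no. */
typedef bool (*BlockTest)(const Block *);

bool is_empty(const Block *b) {
    return b->len == 0;
}

/* MATCH-like: give back the value if the test passes, else NULL. */
const Block *match(BlockTest test, const Block *value) {
    return (value != NULL && test(value)) ? value : NULL;
}

/* MISMATCH-like: give back the value if the test fails, else NULL. */
const Block *mismatch(BlockTest test, const Block *value) {
    return (value != NULL && !test(value)) ? value : NULL;
}
```

So mismatch(is_empty, block) turns "empty" back into a null pointer, letting the caller use the same if (!block) check as in the null-returning version.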

Anyway, point being that null as a signal getting in more places is a systemic benefit.


#5

I’d lean toward raising an error if NULL is passed to KEEP. I think of COLLECT as somewhat related to REDUCE and COMPOSE which always return blocks unless there’s an error.


#6

The current implementation of KEEP is a specialized enclosure of APPEND. Being an enclosure is how it manages to change the return result to the kept input rather than the appended-to block. The specialization removes the series parameter (filling it with a dummy value, since COLLECT establishes the block to append to).

Since it’s a derived function, it inherits not just /ONLY, but other cool APPEND things like /DUP and /LINE and /PART:

>> collect [keep/dup [a] 2]
== [a a]

>> collect [keep <a> keep/line <b> keep/line [c d]]
== [
    <a> <b>
    c d
]

>> collect [keep/part [a b c d] 2]
== [a b]

Having KEEP piggy-back on APPEND’s capabilities is a good thing. So the next question is… if APPEND allows nulls, why wouldn’t KEEP do so also?

The reasons APPEND acts as it does, and SET-WORD! assignments don’t cause errors, are laid out pretty squarely in my resolution post for this thread:

NULL, first class values, and safety

I think consistent with that, KEEP letting you use NULLs fits…and is quite powerful. Remember, you have easy FAIL in things like evaluative switch…no ELSE or DEFAULT needed.

keep switch setting [
    'foo [...]
    1 + 2 [...]
    fail "Invalid setting" ;-- or if you like, simply FAIL with no arg
]

Because of what NULL represents today, this isn’t really that different from keeping NONE!s was. NULLs no longer represent errory-things, just non-things. They’re just a lot better at that job than NONE! or UNSET! were.

But what we should probably say is that VOID!s are ruled out. At minimum unless you say /ONLY.

But, REALLY…

I should mention that REALLY still exists, which errors on nulls and passes through everything else. So if you say:

 collect [really keep (...)]

That would make sure it wasn’t NULL and error if it was. Since KEEP returns what it’s passed, you could also say that as the less literate:

 collect [keep really (...)]

But I like the first way.

Historically I wasn’t totally thrilled with the name, and wanted ENSURE for it. But I decided ENSURE was better for the more common case of having a type-test or otherwise on the value (an asserting form of MATCH).

But that said, once you get used to REALLY, it’s a pretty literate framing of asserting what you’re saying: same number of letters as ASSERT, and no block needed to use it. It’s not that bad.