I mentioned I wanted to study some basic utilities...even more basic than grep. Small codebases can put a focus on big design points.
Here was my off-the-cuff line-reverser (TAC ... reverse CAT, where CAT is like Windows Command Shell's TYPE). I wrote it a year ago, when I was solely focused on getting READ-LINE to work from piped input cross-platform...so almost no thought went into the design:
; %tac.r, version 0.0.2
;
; * COLLECT the lines
; * REVERSE the collection
; * output the result DELIMIT-ed with newline
;
write-stdout maybe delimit/tail newline reverse collect [
until [not keep maybe read-line]
]
Right off the bat, you can see it's using twice the memory it needs to. It collects a block of strings, and then, while that whole block is still in memory, it merges them into one giant string before output. At minimum, this should loop and write the strings out of the block one at a time. (Though doing it this way does draw attention to an interesting point about DELIMIT, which I'll get to later.)
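A version that cuts out the duplicated memory might look something like this. (Just a sketch, assuming FOR-EACH and the same READ-LINE and WRITE-STDOUT behavior as above):

lines: collect [
    until [not keep maybe read-line]
]
reverse lines
for-each line lines [
    write-stdout line
    write-stdout newline  ; delimit as we go, instead of merging into one big string
]

It still holds all the lines in memory (unavoidably, since nothing can be written until all the input has been read), but it no longer duplicates them into a single merged string.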
Note: This line-reversing task is one of those pathological cases that can't be done in a "streaming" way. You can't start writing anything to the output until you've read the input to the end. (Doing better would need a random-access I/O PORT! that could SEEK to the end of the file and work backwards...but the standard input device can't do that.)
Why Does DELIMIT/TAIL Ever Return NULL?
The MAYBE in [write-stdout maybe delimit/tail ...]
is there because DELIMIT can return NULL. If it does, we want to opt out of the write (since passing the null would cause a failure).
One might ask whether it should be able to return NULL at all when you use /TAIL. At the moment, it can:
>> delimit/tail "," ["a" "b"]
== "a,b,"
>> delimit []
== ~null~ ; anti
>> delimit/tail "," []
== ~null~ ; anti
Maybe that last one should be ","? Perhaps when you use /HEAD or /TAIL, you should never get null back?
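If /TAIL were meant to guarantee a string result, one could imagine a wrapper along these lines. (MY-DELIMIT-TAIL is a made-up name for illustration, not something in the library; it leans on ANY skipping past the null):

my-delimit-tail: func [delimiter block] [
    any [
        delimit/tail delimiter block
        copy delimiter  ; empty block: answer with just the delimiter
    ]
]

But whether glossing over the vaporized-contents case like this is a good idea is exactly what's in question.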
But... let's stick to looking at the use cases.
What Does %tac.r Want From DELIMIT/TAIL Here?
If we look at the edge cases here, there is a difference between these two situations:
- If the first call to READ-LINE returns an empty string, and the second call returns NULL:

  - This happens when you pipe in a 1-byte file containing a single line feed, i.e. a file containing one line that's empty.

  - With the code above, COLLECT produces the block [""] for this case.

- If the first call to READ-LINE returns NULL:

  - This happens when you pipe in a 0-byte file, i.e. a file containing no lines at all.

  - With the code above, COLLECT produces the block [] for this case.
So perhaps you see why DELIMIT chooses to react with some kind of signal when the block contents vaporize. It's precisely because cases like this tend to need some kind of special handling, and it's not good to gloss over that.
In this case, the empty block (which corresponds to the 0-byte file input, i.e. 0 lines) should result in no write to the output at all. So the default behavior of WRITE-STDOUT VOID, as a no-op, is the right answer.
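Putting it together, here is how the two edge cases would flow through (a console sketch based on the behaviors described above):

>> delimit/tail newline [""]  ; the 1-byte file: one empty line
== "^/"

>> delimit/tail newline []  ; the 0-byte file: no lines
== ~null~  ; anti

>> write-stdout maybe null  ; MAYBE makes the null a void, and the write is a no-op

So the one-empty-line input round-trips to a single newline, and the no-lines input writes nothing.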
More to Study, I Just Thought That Bit Was Interesting...
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
(C.A.R. Hoare, "The Emperor's Old Clothes")