`tac`: Implementation of UNIX's line reverser

I mentioned I wanted to study some basic utilities...even more basic than greb. Small codebases can bring big design points into focus.

Here was my off-the-cuff line-reverser (TAC, as in reversed CAT, where CAT is like the Windows Command Shell's TYPE). I wrote it a year ago, when I was focused solely on getting READ-LINE to work cross-platform from piped input...so almost no thought went into it:

; %tac.r, version 0.0.2
;
; * COLLECT the lines
; * REVERSE the collection
; * output the result DELIMIT-ed with newline
;
write-stdout maybe delimit/tail newline reverse collect [
    until [not keep maybe read-line]
]

Right off the bat, you can see it's using twice the memory it needs to. It's collecting a block of strings, and then, while that whole block is still in memory, it merges it into one giant string before output. At minimum, this should loop and write the strings out of the block one at a time, as in the sketch below. (Though doing it this way does draw attention to an interesting point about DELIMIT, which I'll get to later.)
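
A minimal sketch of that lower-memory variant (hedged: it reuses the same COLLECT, and assumes FOR-EACH and WRITE-STDOUT work as used elsewhere in this post). It still holds every line in memory, which is unavoidable per the note below, but it skips building the second giant merged string:

lines: collect [
    until [not keep maybe read-line]
]
reverse lines
for-each line lines [
    write-stdout line
    write-stdout newline  ; delimiter written per line, nothing merged
]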

Note: This line-reversing task is one of those pathological cases that can't be done in a "streaming" way. You can't start writing anything to the output until you've read the input to the end. (Doing better would need a random-access I/O PORT! that can SEEK to the end of the file and work backwards...but the standard input device can't do this.)

Why Does DELIMIT/TAIL Ever Return NULL?

The MAYBE in [write-stdout maybe delimit/tail ...] is there because DELIMIT can return NULL. If it does, we want to opt out of the write (since passing the NULL would cause a failure).
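
To illustrate in REPL terms (a hedged sketch; the exact error message is elided, since it isn't the point): MAYBE converts the NULL into a VOID, and WRITE-STDOUT treats VOID as a no-op:

>> write-stdout null  ; fails -- NULL isn't an accepted argument
** Error: ...

>> write-stdout maybe null  ; MAYBE makes it VOID, so nothing is written
; (no output, no error)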

One might ask whether it should ever be able to return NULL when you use /TAIL. At the moment, it can:

>> delimit/tail "," ["a" "b"]
== "a,b,"

>> delimit []
== ~null~  ; anti

>> delimit/tail "," []
== ~null~  ; anti

Maybe that last one should be ","? Perhaps when you use /HEAD or /TAIL, you should never get NULL back?

But... let's stick to looking at the use cases.

What Does %tac.r Want From DELIMIT/TAIL Here?

If we look at the edge cases here, there is a difference between these two situations (a concrete sketch follows the list):

  1. If the first call to READ-LINE returns an empty string, and the second call returns NULL

    • This happens when you pipe in a 1-byte file containing a single line feed, i.e. a file containing one line that's empty.

    • With the code above, COLLECT produces the block [""] for this case.

  2. If the first call to READ-LINE returns NULL

    • This happens when you pipe in a 0-byte file, i.e. a file containing no lines at all.

    • With the code above, COLLECT produces the block [] for this case.
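
To make the distinction concrete, here's what the delimiting step would presumably yield for each collected block (a hedged sketch of the molded results):

>> delimit/tail newline [""]
== "^/"  ; case 1: one empty line in, a single newline out

>> delimit/tail newline []
== ~null~  ; anti (case 2: the contents vaporize)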

So perhaps you see why DELIMIT chooses to react with some kind of signal when the block contents vaporize. It's precisely because cases like this tend to need some kind of special handling, and it's not good to gloss over that.

In this case, the empty block (which corresponds to the 0-byte file input, i.e. zero lines) should result in there being no write to the output. So the default behavior of WRITE-STDOUT VOID (a no-op) is the right answer.

More to Study, I Just Thought That Bit Was Interesting...


"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."


The interface for READ-LINE and friends was designed before definitional errors existed.

There were three return states:

  • Multi-return pack of ~[string data, end of file flag]~ (see the sketch after this list)

  • NULL

  • An ~escape~ antiform (no longer a legal "keyword")
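
A hypothetical sketch of how that old-style pack would have been consumed (the names LINE and EOF are just for illustration, using a SET-BLOCK! to unpack the multi-return):

[line eof]: read-line  ; unpack the ~[string data, eof flag]~ pack
if eof [
    print "Other side hung up mid-line; LINE holds the partial data"
]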

The ~escape~ antiform case pretty clearly should be a definitional error. We can see how it would screw up programs like TAC, since the antiform is truthy. (The original choice was made back when such antiforms were ornery.)

If you aren't rigged up to handle the user canceling (via EXCEPT), then it should raise an error and the program should halt. You don't want it conflated with NULL as an ordinary "no more input available, normal completion" condition, and you don't want it to be something that is a branch trigger (which is everything but NULL these days).
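
A hedged sketch of being "rigged up" via EXCEPT, assuming the cancellation surfaces as a definitional error (the parameter name E is just for illustration):

line: read-line except e -> [
    print ["Input was canceled:" mold e]
    quit 1
]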

The reason for the end-of-file flag is that you could do a READ and the other side of the pipe could hang up...not necessarily at a newline boundary. Returning NULL in that case might throw away data you were interested in getting. So this was a way of letting you know if it wasn't really a complete string--if you cared.

The EOF Flag Can Be Ignored, And Is A Mistake

The secondary EOF return result isn't a good design: casual usage will conflate reading an incomplete line with reading a complete one. The other side can hang up on you, and you won't know it.

I think a better answer here is to have a :RAW mode which includes the newline at the end of the string you get. Then you can detect whether the newline is there; if it's not, your read was interrupted prematurely.

if not (line: read-line:raw except [print "Escape!", quit 1]) [
    print "No more input left to read."
    quit 0
]
if newline <> try last line [  ; need TRY, since string may be empty
    print ["Incomplete line was read:" mold line]
    quit 2
]
try take:last line  ; again, TRY in case string is empty
print ["Complete line read:" mold line]
quit 0

So that gives you coverage of what this layer of abstraction can provide.
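
Putting it together, here's a hedged sketch of what %tac.r might look like rebuilt on :RAW, distinguishing a mid-line hangup from normal end of input (same assumed behaviors as the snippet above):

lines: collect [
    while [line: read-line:raw] [
        if newline <> try last line [  ; TRY, since string may be empty
            fail "Input ended mid-line (other side hung up)"
        ]
        take:last line  ; strip the trailing newline before keeping
        keep line
    ]
]
reverse lines
for-each line lines [
    write-stdout line
    write-stdout newline
]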

If you don't use the :RAW mode, then the other end of the pipe disconnecting mid-line would cause an abrupt failure.

Seems good to me.
