The problem

The command line is an expressive interface that allows powerful commands to be written concisely. Sometimes, though, you want a longer, less direct way of implementing a task. For example, merely writing wc -l is far too straightforward for counting the lines in a file. Surely we can devise a more convoluted way to accomplish that task.

The solution

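# -v keeps od from collapsing runs of identical output rows into "*",
# which would otherwise hide some of the 0a bytes.
cat "$file" |
    expr $(od -v -t x1 |
    sed 's/ /\n/g' |
    grep '^0a$' |
    sed -z 's/\n//g' |
    wc -c) / 2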

The details

Inspired by the concept of Rube Goldberg machines, sh Rube Goldbergs are a silly game to explore the capabilities of commands that you might otherwise not encounter. I cannot take full credit/blame for the idea: it was suggested by one of my officemates who also came up with some of the examples in this post.

Simpler examples

Before explaining the example above, I'll first cover a couple of shorter Rube Goldbergs for wc -l, which inspired that longer one:

cat "$file" | tr -cd '\n' | wc -c

uses tr to delete (-d) every character that is not (-c) a newline and then counts the remaining characters using wc -c.
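As a quick sanity check, here is a minimal sketch that runs the tr version against wc -l on a throwaway file. The temporary file and the variable names tr_count and wc_count are only for illustration.

```shell
# Hypothetical sample file with three newline-terminated lines.
file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$file"

# Delete (-d) every byte in the complement (-c) of the set containing only
# newline, so that only the newlines remain, then count the remaining bytes.
tr_count=$(cat "$file" | tr -cd '\n' | wc -c)

# Reference answer from the straightforward command.
wc_count=$(wc -l < "$file")

echo "tr pipeline: $tr_count, wc -l: $wc_count"
rm -f "$file"
```

Both counts should agree, since wc -l also counts newline characters rather than lines in any looser sense.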

cat "$file" | nl -ba | tail -1 | cut -f1

uses nl to number all of the lines, including empty ones (-ba). Then tail selects the last line (-1) and cut keeps only the first tab-separated field of that line, which is the line number.
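To see why -ba matters, here is a small sketch on a file that contains an empty line. The sample file and variable names are made up for the example, tail -n 1 is used as the POSIX spelling of tail -1, and the extra tr -d ' ' is only there to strip the padding that nl puts in front of the number.

```shell
# Hypothetical sample file: three lines, the middle one empty.
file=$(mktemp)
printf 'one\n\nthree\n' > "$file"

# With -ba every line is numbered, so the last number is the line count.
nl_count=$(cat "$file" | nl -ba | tail -n 1 | cut -f1 | tr -d ' ')

# Without -ba, nl skips empty lines by default and the result is too small.
default_count=$(cat "$file" | nl | tail -n 1 | cut -f1 | tr -d ' ')

echo "nl -ba: $nl_count, plain nl: $default_count"
rm -f "$file"
```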

The larger example

To understand what's going on, it's helpful to chop off the end of the pipeline and look at the output. Because this command uses $(...), there's not an obvious end to chop off. The outside just feeds the input file in and uses expr to divide the inner result by 2, so the following computes double the number of lines in $file:

# -v makes od print every row instead of collapsing repeated rows into "*".
cat "$file" |
    od -v -t x1 |
    sed 's/ /\n/g' |
    grep '^0a$' |
    sed -z 's/\n//g' |
    wc -c

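Look at the output of each step on a small file. The first step, od, shows a view similar to a hex editor. The -t x1 flag prints each byte as a separate two-digit hexadecimal value, so the bytes end up separated by spaces; the first column is the offset into the file in octal. The output looks like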

0000000 54 69 74 6c 65 3a 20 73 68 20 52 75 62 65 20 47
0000020 6f 6c 64 62 65 72 67 73 0a 44 61 74 65 3a 20 32
0000040 30 31 35 2d 30 32 2d 31 35 20 30 32 3a 30 30 0a
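One od detail is worth knowing when counting bytes from its output: by default, od replaces consecutive identical rows with a single "*" line, and the -v flag turns that off. The sketch below shows the difference on a made-up input of 48 empty lines, which fills exactly three 16-byte rows with 0a. The file and variable names are hypothetical.

```shell
# Hypothetical test input: 48 newline bytes, i.e. three full rows of 0a.
file=$(mktemp)
printf '%48s' '' | tr ' ' '\n' > "$file"

# Default od: the two repeated rows are replaced by one "*" line,
# so only a single row containing 0a is printed.
rows_default=$(od -t x1 "$file" | grep -c '0a')

# With -v every row is printed.
rows_verbose=$(od -v -t x1 "$file" | grep -c '0a')

echo "rows containing 0a: default=$rows_default, with -v=$rows_verbose"
rm -f "$file"
```

Any pipeline that counts 0a entries in od output will therefore undercount on files with long runs of blank lines unless -v is given.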

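Next, sed is used to split this into lines at each space. The grep command selects only those lines that consist of exactly 0a, the hex value of a newline byte; the offsets in the first column never match because they are longer than two characters. The next sed command joins all of the lines together (-z, a GNU sed option, separates records by NUL bytes instead of newlines, so the newlines become ordinary characters that can be deleted), so the result is a line that looks like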

0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a

Then wc -c is used to count the number of characters. Since each newline is represented by two hex digits, this results in counting double the number of newlines.
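The stages can be checked on a tiny input. This is a sketch assuming GNU sed (for -z and for \n in the replacement), with a two-line input fed in by printf instead of a file. The variable names joined and doubled are only for illustration.

```shell
# Run everything up to the joining step on a hypothetical two-line input.
joined=$(printf 'a\nb\n' |
    od -t x1 |
    sed 's/ /\n/g' |
    grep '^0a$' |
    sed -z 's/\n//g')

# Count the characters of the joined string: two per newline in the input.
doubled=$(printf '%s' "$joined" | wc -c)

echo "joined: $joined"
echo "doubled count: $doubled"
```

Two newlines in the input turn into four hex digits, which is why the final step still has to halve the result.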

That number is computed inside $(...), an sh feature called command substitution: the outer command runs as if the output of the inner command had been typed in its place. Here we use this to divide the result by 2 using expr.
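A minimal sketch of the substitution on its own, with a fixed string standing in for the output of the earlier stages. The variable names are hypothetical, and the second form assumes a POSIX shell with arithmetic expansion.

```shell
# The inner command prints 4; the shell substitutes that text before expr
# runs, so this is equivalent to typing: expr 4 / 2
lines=$(expr $(printf '0a0a' | wc -c) / 2)
echo "$lines"

# The same division using the shell's own arithmetic expansion
# instead of the external expr command.
lines_arith=$(( $(printf '0a0a' | wc -c) / 2 ))
echo "$lines_arith"
```

Using expr keeps one more external process in the pipeline, which suits the game better than the built-in arithmetic.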

Further efforts

Got any more convoluted sh Rube Goldbergs? Or other tasks to try to implement in the least efficient way possible? Have fun!

A Weird Imagination

sh Rube Goldbergs