A Weird Imagination

Mysterious Twitter scraping bug

Posted in

The bug#

A couple days ago my Twitter screen scraper stopped working in Liferea. I hadn't changed anything, and the script's output at the command-line still looked okay to my inspection, but Liferea started giving the message

Read more…

Download all items in a podcast

Posted in

Podcasts are a simple extension to RSS: in addition to text and a link, posts can include a file to be downloaded. In the case of podcasts, this is an audio file. Due to this, while many specialized podcast applications exist, any news aggregator will work, although it might not have the best interface for that use case.

My normal workflow for podcasts is to keep track of them in a news aggregator and explicitly download the files to a local folder. While this isn't overly onerous for a weekly podcast, it is a repetitive task that could be automated. More importantly, downloading an entire backlog of dozens of episodes that way would take a while.

Read more…

Reverse sequence for tr

The problem#

If you take the word wizard, reverse the order of the letters and reverse the alphabet:

From: abcdefghijklmnopqrstuvwxyz
To:   ZYXWVUTSRQPONMLKJIHGFEDCBA

then you get the word wizard back, an observation made at least as early as 1972.

Now let's write a shell script to verify this so we can find other words with similar interesting properties. The obvious shell script to verify this

echo wizard | tr a-z z-a | rev

unfortunately fails with the error

tr: range-endpoints of 'z-a' are in reverse collating sequence order

The error is by design: it's not clear what a sequence in reverse order should mean, so POSIX actually requires that it not work.

Read more…

Better hash-based colors

The problem#

Yesterday, I proposed using a hash function to choose colors:

ps1_color="32;38;5;$((0x$(hostname | md5sum | cut -f1 -d' ' | tr -d '\n' | tail -c2)))"

I did not discuss two issues with this approach:

  1. Colors may be too similar to other colors, such that they are not useful.
  2. Some colors may be undesirable altogether, particularly very dark ones may be too similar to a black terminal background color.

Read more…

Hash-based hostname colors

Random color selection#

In my post about hostname-based prompt colors, I suggested a fallback color scheme that was obviously wrong in order to remind you to set a color for that host:

alice@unknown:~$ 

This carried with it an implicit assumption: you care what color each host is assigned. You may instead be happy to assign a random color to each host. We could use shuf to generate a random color:

ps1_color="32;38;5;$(shuf -i 0-255 -n 1)"

The problem with this solution is the goal of the recoloring the prompt was not simply to make it more colorful, but for that color to have meaning. We want the color to always be the same for each login to a given host.

One way to accomplish this would be to use that code to randomly generate colors, but save the results in a table like the one used before for manually-chosen colors. But it turns out we can do better.

Hash-based color selection#

Hash functions have a useful property called determinism, which means that hashing the same value will always get the same result. The consequence is that we can use a hash function like it's a lookup table of random numbers shared among all of our computers:

ps1_color="32;38;5;$(($(hostname | sum | cut -f1 -d' ' | sed s/^0*//) % 256))"

The $((...)) syntax is bash's replacement for expr which is less portable but easier to use. Here we use it to make sure the hash value we compute is a number between 0 and 255. [sum][sum] computes a hash of its input, in this case the result of hostname. Its output is not just a number so cut selects out the number and sed gets rid of any leading zeros so it isn't misinterpreted as octal.

The idea of using sum was suggested by a friend after reading my previous post on the topic.

But this turns out to not work great for hosts with similar names like rob.example.com and orb.example.com:

alice@rob:~$ 
alice@orb:~$ 

Similar colors on hosts with very different names would not be so bad, but because of how sum works, it will tend to give similar results on similar strings (although less often than I expected; it took some effort to find such an example).

Better hash functions#

While this is not a security-critical application, here cryptographic hash functions solve the problem. Cryptographic hash functions guarantee (in theory) that knowing that two inputs are similar tells you nothing about their hash values. In other words, the output of cryptographic hash functions are indistinguishable from random and, in fact, they can be used to build pseudorandom generators like Linux's /dev/urandom.

The cryptographic hash function utilities output hex instead of decimal, so they aren't quite a drop-in replacement for sum:

ps1_color="32;38;5;$((0x$(hostname | md5sum | cut -f1 -d' ' | tr -d '\n' | tail -c2)))"

Here we use cut and tr to select just the hex string of the hash. tail's -c option specifies the number of bytes to read from the end, where 2 bytes corresponds to 2 hex digits, which can have a value of 0 to 255, so the modulo operation is not needed. Instead the 0x prefix inside $((...)) interprets the string as a hex number and outputs it as a decimal number.

This code uses the md5sum utility to compute an MD5 hash of the hostname. This is recommended because md5sum is likely to be available on all hosts. Do be aware that MD5 is insecure and it is only okay to use here because coloring the prompt is not a security-critical application.

sha1sum and sha256sum are also likely available on modern systems and work as drop-in replacements for md5sum in the above command should you wish to use a different hash. Additionally, you could also get different values out of the hash by adding a salt:

salt="Some string."
ps1_color="32;38;5;$((0x$( (echo "$salt"; hostname) | sha256sum | cut -f1 -d' ' | tr -d '\n' | tail -c2)))"

Floats in shell

The problem#

Given a file which contains a list of floating point numbers in IEEE 754 single-precision format stored in big endian byte order, how do you view and manipulate this data using command-line tools? This is an actual problem one of my officemates had.

The solution#

$ od --endian=big -f file
0000000   1.7155696e-07   1.0432226e-08    4.563314e+30    6.162976e-33

Read more…

Type your SSH passphrase less often

Posted in

The problem#

SSH public key authenication can make SSH much more convenient because you do not need to type a password for every SSH connect. Instead, in common usage, the first time you connect, GNOME Keyring, KWallet, or your desktop environment's equivalent will pop up and offer to keep your decrypted private key securely in memory. Those programs will remember your key until the next time you reboot your computer (or possibly until you log out completely and log back in).

But those are tied to your desktop environment. If you are not at a GUI, either using a computer in text-mode using a console or connecting over SSH, then you do not have access to those programs.

Read more…

256 color hostnames

The problem#

When using multiple terminals on different hosts, it can sometimes be confusing to remember which host you are on. The hostname appears in the command prompt, but it's easy to skim past that if you are not paying attention.

One solution that works pretty well for me is recoloring the prompt based on what host I am on. This is in fact why I researched how to get 256 colors terminals working in the first place: in order to have enough colors to be able to make a good choice for each host I use frequently.

Read more…

256 color terminals

The problem#

By default, terminals on Linux only use 8 colors (or 16 if setup to use bright variants instead of bold text). Everything else on a modern computer uses 24-bit color, allowing for millions of colors. More colors in the terminal would allow for better syntax highlighting and color output of various commands to be more readable.

In practice, while a few terminals support full 24-bit RGB color (at least Konsole does), it is not widespread enough to be used much. On the other hand, most terminals support 256 colors, which is significantly better than just 8.

Read more…

Compile on save

Posted in

The problem#

When developing code or creating visual artifacts in non-WYSIWYG systems, it is very useful to constantly be aware of the output of the compiler and the appearance of the artifact you are creating, whether it is a GUI, a chart, a graph, or a paper. The common way of doing this is to have an IDE specialized for the system you are using; for example, LyX provides a WYSIWYG editor for LaTeX. Similarly, there may be plugins for your text editor to support whatever kind of development you are doing. On the other hand, we can use the shell to create a solution independent of the text editor and the availability of plugins for the particular system being developed for.

Read more…