A Weird Imagination

Floats in shell

Posted Thu 26 February 2015 in Linux

binary data endianness floating point od python python struct sh wajig

The problem#

Given a file which contains a list of floating point numbers in IEEE 754 single-precision format stored in big endian byte order, how do you view and manipulate this data using command-line tools? This is an actual problem one of my officemates had.

The solution#

$ od --endian=big -f file
0000000   1.7155696e-07   1.0432226e-08    4.563314e+30    6.162976e-33

Changing Pelican URL scheme

Posted Tue 24 February 2015 in Blogging

eval grep one-off pelican sed sh

The problem#

I changed the URI scheme of this blog recently from /posts/YYYY/MM/slug/ to /YYYY/MM/DD/slug/. The latter looks better and makes the actual day of the post more visible.

But I already had posts using the old scheme and cool URIs don't change. Luckily, someone wrote a Pelican plugin called pelican-alias which allows articles to be tagged with additional URIs to redirect to their canonical location. All I had to do was add an Alias: /posts/2015/02/... line to the top of each of the posts I had already written and the plugin would take care of the rest.

Automating the aliasing#

The non-trivial part of automating this is that the URIs include the article's slug, which may have been generated by Pelican from the title, so Pelican has to be involved in generating the correct redirects.

There are two ways I could have automated this process:

Modify the plugin to add a redirect from the old scheme to the new scheme for every article. Unless somehow controlled, this would result in creating redirects for new articles which do not need them.
Write a one-off script to get the slugs out of Pelican and write the Alias: lines into the blog posts.

I took the latter approach because it was simpler and involved no new code to maintain.

Type your SSH passphrase less often

Posted Mon 23 February 2015 in Linux

bash sh ssh ssh-add ssh-agent

The problem#

SSH public key authenication can make SSH much more convenient because you do not need to type a password for every SSH connect. Instead, in common usage, the first time you connect, GNOME Keyring, KWallet, or your desktop environment's equivalent will pop up and offer to keep your decrypted private key securely in memory. Those programs will remember your key until the next time you reboot your computer (or possibly until you log out completely and log back in).

But those are tied to your desktop environment. If you are not at a GUI, either using a computer in text-mode using a console or connecting over SSH, then you do not have access to those programs.

256 color terminals

Posted Sat 21 February 2015 in Linux

256 color terminal bash nokia n9 screen sh terminal tmux tput vim vundle

The problem#

By default, terminals on Linux only use 8 colors (or 16 if setup to use bright variants instead of bold text). Everything else on a modern computer uses 24-bit color, allowing for millions of colors. More colors in the terminal would allow for better syntax highlighting and color output of various commands to be more readable.

In practice, while a few terminals support full 24-bit RGB color (at least Konsole does), it is not widespread enough to be used much. On the other hand, most terminals support 256 colors, which is significantly better than just 8.

Compile on save

Posted Tue 17 February 2015 in Linux

compile on save inotify inotifywait latex latexmk linux sh

The problem#

When developing code or creating visual artifacts in non-WYSIWYG systems, it is very useful to constantly be aware of the output of the compiler and the appearance of the artifact you are creating, whether it is a GUI, a chart, a graph, or a paper. The common way of doing this is to have an IDE specialized for the system you are using; for example, LyX provides a WYSIWYG editor for LaTeX. Similarly, there may be plugins for your text editor to support whatever kind of development you are doing. On the other hand, we can use the shell to create a solution independent of the text editor and the availability of plugins for the particular system being developed for.

Checking for unsafe shell constructs

Posted Mon 16 February 2015 in Linux

cat file find glob grep sed sh shellcheck syntastic vim vundle xargs

Filenames are troublesome#

While shell programing lets you write very concise programs, it turns out that the primary use case of working with files is unfortunately much harder than it seems. That detailed article by David A. Wheeler does a good job of explaining all of the various problems that a naive shell script can run into due to various characters which are allowed in filenames which the shell treats specially in some way.

One surprising one is that filenames beginning with a dash (-) can be interpreted as options due to the way globbing works in the shell. Suppose we set up a directory as follows:

$ cat > -n
Some secret text.
$ cat > test
This is a test.
It has multiple lines.

Quick, what will cat * do here?

$ cat *
     1  This is a test.
     2  It has multiple lines.

Probably not what you wanted. The reason that happens is that the * is expanded by the shell before being fed to cat, so the command executed is cat -n test and -n gets interpreted not as a filename but as an option telling cat to number the lines of the output.

The workaround is to use ./* instead of *, so the - will not actually be the first character and therefore will not get misinterpreted as an option. But there are many other things that can go wrong with unexpected filenames and remembering to handle all of them everywhere is error-prone.

Warnings for unsafe shell code#

The solution is shellcheck. shellcheck will warn you about mistakes like the cat * problem and many other issues you may not be aware of.

As I have many shellscripts around that I wrote before learning about shellcheck, I wanted to run it on all of the shell scripts (but not binaries or other language scripts) in my ~/bin directory, so naturally I wrote a script to do so:

#!/bin/sh

find -exec file {} \; \
    | grep -F 'shell script' \
    | sed s/:[^:]*$// \
    | xargs shellcheck

This uses the file command to identify shell scripts and then selects out their file names to run shellcheck on all of them using xargs.

Warnings in Vim#

shellcheck is written to support integration into IDEs. I use Vim to edit shell scripts, so I installed the syntastic (using Vundle which makes installing Vim plugins off GitHub very easy). Note to follow the instructions on the Syntastic page, specifically the recommended settings: without any settings it won't do anything at all. Once set up, it automatically runs shellcheck on every save, identifies lines with warnings and shows a list of warnings that can be double-clicked to jump to the location of the warning.

If you use the other text editor, then the shellcheck website recommends the flycheck plugin.

sh Rube Goldbergs

Posted Sun 15 February 2015 in Humor

cut grep od rube goldbergs sed sh tail tr wc

The problem#

The command-line is an expressive interface which allows powerful commands to be written concisely. Sometimes you want a longer, less direct way of implementing a task. For example, merely writing wc -l is far too straightforward for counting lines in a file. Surely we can devise a more convoluted way to accomplish that task.

The solution#

cat "$file" |
    expr $(od -t x1 |
    sed 's/ /\n/g' |
    grep '^0a$' |
    sed -z 's/\n//g' |
    wc -c) / 2

The details#

Twitter via RSS

Posted Fri 13 February 2015 in Linux

liferea linux perl rss sh tail twitter

Twitter no longer offers an RSS feed. That thread offers a few workarounds which involve external or non-free services or require creating a Twitter account. One of those external services, TwitRSS.me is open-source with its code on GitHub. This code can be run locally to view Twitter streams in Liferea (or any other news aggregator) without relying on an external service.

Specifically, the Perl script twitter_user_to_rss.pl is the relevant part. It's intended to be used on a webserver, so the output includes HTTP headers:

Content-type: application/rss+xml
Cache-control: max-age=1800

<?xml version="1.0" encoding="UTF-8"?>
...

which can be cleaned out with tail in the script twitter_user_to_rss_file, which assumes it's in the same directory as twitter_user_to_rss.pl:

#!/bin/sh
"$(dirname "$0")/twitter_user_to_rss.pl" "user=$1&replies=1" \
    | tail -n +4

twitter_user_to_rss_file also handles the argument format of the script, so it just takes a single argument which is the Twitter username. The replies=1 part tells the script to use the Tweets & replies view which includes tweets that begin with @.

When creating a subscription in Liferea, the advanced options include a choice of source type. To use the script, set the source type to Command and the source to

/path/to/twitter_user_to_rss_file username

My version of twitter_user_to_rss.pl includes a few differences from the original that make it a bit more usable. Most importantly, links are made into actual links (based on this code), images are included in the feed content, tweets are marked with their creator to make it easier to follow retweets and combinations of tweets from multiple feeds together in a single stream.

Setting up rTorrent

Posted Wed 11 February 2015 in Linux

bash bittorrent linux magnet: rtorrent sh zenity

rTorrent is a text-based BitTorrent client, which makes it convenient to leave running in a screen or tmux session, so you don't have to leave a terminal window open and you can access it remotely over ssh. It also has an API for web frontends if you don't like text.

Basic setup#

You can set it up to automatically start and stop downloads based on placing .torrent files into a watch/ directory by putting the following in your ~/.rtorrent.rc:

# Default session directory. Make sure you don't run multiple instance
# of rtorrent using the same session directory. Perhaps using a
# relative path?
session = ./session

# Watch a directory for new torrents, and stop those that have been
# deleted.
schedule = watch_directory,5,5,load_start=./watch/*.torrent

Those settings also use a session directory to keep track of torrents across runs of rTorrent, which is useful if you have a lot of torrents and want to be able to restart rTorrent, say, after rebooting your computer. Note rTorrent will complain if the session directory doesn't already exist, so your first run will look like

$ screen
$ mkdir session watch
$ rtorrent

That configuration uses relative paths for watch/ and session/ so you can have multiple instances of rTorrent in different directories.

`magnet:` links#

In additional to .torrent files, BitTorrent also supports magnet: links as a way to join a torrent without needing a file. There is built-in support for magnet: links in rTorrent, but it requires a little extra work to make clicking one in a web browser start the download in rTorrent. Here's a script for doing so along with instructions for having your web browser use it to handle magnet: links. I modified it to handle multiple watch/ directories:

#!/bin/bash

DEFAULT_WATCH='/path/to/your/watch'
if [[ $# -ge 2 ]]
then
    WATCH="$2"
else
    if [[ -z "$DISPLAY" ]]
    then
        WATCH="$DEFAULT_WATCH"
    else
        WATCH=$(zenity --file-selection --directory --title="Select rtorrent watch directory" --filename="$DEFAULT_WATCH")
        [[ "$(basename "$WATCH")" = watch ]] || exit;
    fi
fi
cd "$WATCH"
[[ $1 =~ xt=urn:btih:([^&/]+) ]] || exit;
echo "d10:magnet-uri${#1}:${1}e" > "meta-${BASH_REMATCH[1]}.torrent"

This script uses bash because it uses the bash-only =~ operator for regular expression matching.

This script has a hard-coded default directory to use, but supports either specifying a different directory as the second argument or will use zenity to show a dialog asking the user to select a watch/ directory. zenity is quite useful for easily adding interactivity to shell scripts, especially for something like a directory chooser which doesn't work as well in text.

Child process not in ps?

Posted Fri 06 February 2015 in Linux

bash heisenbug jobs linux /proc ps sh strace subprocess timeout tr uniq yes

A buggy program#

Consider the following (contrived) program¹ which starts a background process to create a file and then waits while the background process is still running before checking to see if the file exists:

#!/bin/sh

# Make sure file doesn't exist.
rm -f file

# Create file in a background process.
touch file &
# While there is a touch process running...
while ps -C "touch" > /dev/null
do
    # ... wait one second for it to complete.
    sleep 1
done
# Check if file was created.
if [ -f file ]
then
    echo "Of course it worked."
else
    echo "Huh? File wasn't created."
    # Wait for background tasks to complete.
    wait
    if [ -f file ]
    then
        echo "Now it's there!"
    else
        echo "File never created."
    fi
fi

# Clean up.
rm -f file

Naturally, it will always output "Of course it worked.", right? Run it in a terminal yourself to confirm this. But I claimed this program is buggy; there's more going on.

Future-dating static blog content

Posted Mon 02 February 2015 in Blogging

at blogging git git hooks pelican sh

The problem#

Static site generators are great. But so are blog posts that automatically appear on schedule. How do we reconcile the two? There are solutions involving checking for updates on a schedule like every hour or every day, but that seems unsatisfying: if the posts have already been written, the blog should only need to be regenerated exactly when there is new content to publish.

The solution#

(These instructions are specifically for Pelican as that is what this blog uses, a similar method should work for other static blogging engines.)

Use Pelican's WITH_FUTURE_DATES setting to make future dated posts not appear as part of the blog, but only as drafts. Add the following to the article template in order to include the future publication dates in an easy to parse format:

{% if article.status == "draft" %}
    <!-- Post at datetime {{ article.date|strftime("%H:%M %Y-%m-%d") }} -->
{% endif %}

Then the following script schedule_publish.sh uses those comments to schedule rerunning itself using at:

#!/bin/sh

# Pelican publish
make publish

# Clear old queue entries if they call this script.
for q in `atq -q g | cut -f1`
do
    if [ `at -c $q | tail -2 | head -1` = "$0" ]
    then
        atrm $q
    fi
done

# Check newly published drafts for when they should be published.
# Not using for because output lines have spaces.
grep -F -- '<!-- Post at datetime ' output/drafts/* | cut -d' ' -f5-6 | while read time
do
    # Schedule running this script for that time.
    echo "$0" | at -q g $time
done

Last, follow the instructions in this blog post and run that script as the deployment task.

The details#

The problem#

The solution#

The problem#

Automating the aliasing#

The problem#

The problem#

The problem#

Filenames are troublesome#

Warnings for unsafe shell code#

Warnings in Vim#

The problem#

The solution#

The details#

Basic setup#

magnet: links#

A buggy program#

The problem#

The solution#

The details#

`magnet:` links#