A Weird Imagination

Twitter via RSS

Twitter no longer offers an RSS feed. That thread offers a few workarounds which involve external or non-free services or require creating a Twitter account. One of those external services, TwitRSS.me is open-source with its code on GitHub. This code can be run locally to view Twitter streams in Liferea (or any other news aggregator) without relying on an external service.

Specifically, the Perl script twitter_user_to_rss.pl is the relevant part. It's intended to be used on a webserver, so the output includes HTTP headers:

Content-type: application/rss+xml
Cache-control: max-age=1800

<?xml version="1.0" encoding="UTF-8"?>
...

which can be cleaned out with tail in the script twitter_user_to_rss_file, which assumes it's in the same directory as twitter_user_to_rss.pl:

#!/bin/sh
"$(dirname "$0")/twitter_user_to_rss.pl" "user=$1&replies=1" \
    | tail -n +4

twitter_user_to_rss_file also handles the argument format of the script, so it just takes a single argument: the Twitter username. The replies=1 part tells the script to use the Tweets & replies view, which includes tweets that begin with @.
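
For example, running the wrapper directly in a shell (with someuser standing in for a real Twitter username) should print the feed starting at the XML declaration, with the HTTP headers stripped off:

$ /path/to/twitter_user_to_rss_file someuser | head -1
<?xml version="1.0" encoding="UTF-8"?>

The same command without the head can be redirected to a file if you want to inspect the full feed outside of a news aggregator.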

When creating a subscription in Liferea, the advanced options include a choice of source type. To use the script, set the source type to Command and the source to

/path/to/twitter_user_to_rss_file username

My version of twitter_user_to_rss.pl includes a few differences from the original that make it a bit more usable. Most importantly, links are made into actual links (based on this code), images are included in the feed content, and tweets are marked with their creator, which makes it easier to follow retweets and to combine tweets from multiple feeds into a single stream.

Title filtering for Liferea

Liferea is a desktop news aggregator (sometimes called an RSS reader). Unlike the late Google Reader or most of its alternatives, like the open-source Tiny Tiny RSS, which are web-based and run on a server to be accessed via a web browser, Liferea is a separate desktop application that uses an embedded browser to view content.

The problem#

Sometimes you don't actually care about all of the items in a feed and the site provides no filtering mechanism. If the uninteresting items are rare enough, you can just ignore them, but a news aggregator is most useful if it only notifies you of news items you actually might want to read.

The solution#

Luckily, Liferea is very flexible. It supports running a command on a feed, which it calls a conversion filter. I wrote some Python scripts to filter feeds by title locally.

For instance, I wanted to follow only the changelog posts in the forum feed http://braceyourselfgames.com/forums/feed.php, but it includes changes to all forum topics, so I checked the Use conversion filter option and set the conversion filter to

/path/to/atom_filter_title.py --whitelist "Re: Change log"
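
To see what the filter lets through before wiring it into Liferea, you can run it by hand. Liferea pipes the downloaded feed to the conversion filter on standard input and reads the converted feed from its standard output, so a test along these lines (using curl just to fetch the feed) should print only the change log entries:

$ curl -s http://braceyourselfgames.com/forums/feed.php \
    | /path/to/atom_filter_title.py --whitelist "Re: Change log"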

Read more…

Setting up rTorrent

rTorrent is a text-based BitTorrent client, which makes it convenient to leave running in a screen or tmux session, so you don't have to leave a terminal window open and you can access it remotely over ssh. It also has an API for web frontends if you don't like text.

Basic setup#

You can set it up to automatically start and stop downloads based on placing .torrent files into a watch/ directory by putting the following in your ~/.rtorrent.rc:

# Default session directory. Make sure you don't run multiple instances
# of rtorrent using the same session directory. Perhaps using a
# relative path?
session = ./session

# Watch a directory for new torrents, and stop those that have been
# deleted.
schedule = watch_directory,5,5,load_start=./watch/*.torrent

Those settings also use a session directory to keep track of torrents across runs of rTorrent, which is useful if you have a lot of torrents and want to be able to restart rTorrent, say, after rebooting your computer. Note rTorrent will complain if the session directory doesn't already exist, so your first run will look like

$ screen
$ mkdir session watch
$ rtorrent

That configuration uses relative paths for watch/ and session/ so you can have multiple instances of rTorrent in different directories.
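
For example, a second, independent instance just needs its own directory with its own watch/ and session/ (the path here is only an illustration):

$ mkdir -p ~/torrents/other/session ~/torrents/other/watch
$ cd ~/torrents/other
$ screen rtorrent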

magnet: links#

In addition to .torrent files, BitTorrent also supports magnet: links as a way to join a torrent without needing a file. There is built-in support for magnet: links in rTorrent, but it requires a little extra work to make clicking one in a web browser start the download in rTorrent. Here's a script for doing so along with instructions for having your web browser use it to handle magnet: links. I modified it to handle multiple watch/ directories:

#!/bin/bash

# Where to put the generated .torrent files if no directory is given.
DEFAULT_WATCH='/path/to/your/watch'
if [[ $# -ge 2 ]]
then
    # A watch/ directory was given explicitly as the second argument.
    WATCH="$2"
else
    if [[ -z "$DISPLAY" ]]
    then
        # No X display, so don't try to show a dialog.
        WATCH="$DEFAULT_WATCH"
    else
        # Ask the user which watch/ directory to use.
        WATCH=$(zenity --file-selection --directory --title="Select rtorrent watch directory" --filename="$DEFAULT_WATCH")
        # Only accept a directory actually named "watch".
        [[ "$(basename "$WATCH")" = watch ]] || exit;
    fi
fi
cd "$WATCH" || exit
# Extract the info hash from the magnet: link and write a stub .torrent
# file that rTorrent knows how to turn into the real download.
[[ $1 =~ xt=urn:btih:([^&/]+) ]] || exit;
echo "d10:magnet-uri${#1}:${1}e" > "meta-${BASH_REMATCH[1]}.torrent"

This script uses bash rather than plain sh because it relies on the =~ operator for regular expression matching, which is a bash extension.

This script has a hard-coded default directory, but a different directory can be specified as the second argument; otherwise, if a display is available, it uses zenity to show a dialog asking the user to select a watch/ directory. zenity is quite useful for easily adding interactivity to shell scripts, especially for something like a directory chooser, which doesn't work as well in text.
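
The exact browser setup is in the linked post, but on a typical Linux desktop the general approach is to register the script as the handler for the magnet: URI scheme, along these lines (the .desktop file name and script path are placeholders for wherever you saved the script):

$ cat > ~/.local/share/applications/magnet-to-rtorrent.desktop <<'EOF'
[Desktop Entry]
Type=Application
Name=Send magnet link to rTorrent
Exec=/path/to/magnet-to-rtorrent.sh %u
NoDisplay=true
EOF
$ xdg-mime default magnet-to-rtorrent.desktop x-scheme-handler/magnet

After that, clicking a magnet: link should (possibly after the browser asks for confirmation) run the script, which drops the generated file into the chosen watch/ directory for rTorrent to pick up.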

SSH multiplexing

The problem#

If you are making a lot of SSH connections, starting each connection can add noticeable overhead. Even worse, a firewall might start blocking the connections, since many SSH connections from the same source look a lot like an attacker trying to guess a password, as one of my officemates discovered recently.

The solution#

SSH has a feature called multiplexing, which is described in this blog post, along with a few other useful SSH tips. Here's the relevant excerpt:

In a shell:

$ mkdir -p ~/.ssh/connections
$ chmod 700 ~/.ssh/connections

Add this to your ~/.ssh/config file:

Host *
ControlMaster auto
ControlPath ~/.ssh/connections/%r_%h_%p

The details#

While ssh is often used as just a secure version of telnet, it's actually closer to being a VPN system, supporting many channels of communication over the same encrypted link, which is how port forwarding over SSH is implemented.

Normally SSH makes a connection and opens a single channel for the terminal. Multiplexing merely means keeping that connection open for additional terminal channels. The settings described tell SSH to keep track of open connections in ~/.ssh/connections/ and automatically reuse an open connection whenever possible.
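
With those settings in place, you can check that later connections are actually sharing a master using ssh's -O control commands (the hostname is just a placeholder):

$ ssh -O check example.com   # reports whether a master connection is running
$ ssh -O exit example.com    # asks the master connection to shut down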

The firewall#

The firewall which caused this post to get written was keeping track of how many new SSH connections were made to a host and only allowing a maximum of 3 new connections each minute. As the firewall was not paying attention to whether the connections were accepted, my officemate's script, which performed multiple copies and remote commands, was getting blocked.

Logging online status

The problem#

I used to have an occasionally unreliable internet connection. I wanted logs of exactly how unreliable it was and an easy way to notice when it was back up.

The solution#

Use cron to check online status once a minute and write the result to a file. An easy way to check is to confirm that google.com will reply to a ping (this does give a false negative in the unlikely event that Google is down).

To run a script every minute, put a file in /etc/cron.d containing the line

* * * * * root /root/bin/online-check

where /root/bin/online-check is the following script:

#!/bin/sh

# Check if computer is online by attempting to ping google.com.
PING_RESULT="`ping -c 2 google.com 2>/dev/null`"
if [ $? -eq 0 ] && ! echo "$PING_RESULT" | grep -F '64 bytes from 192.168.' >/dev/null 2>/dev/null
then
    ONLINE="online"
else
    ONLINE="offline"
fi
echo "`date '+%Y-%m-%d %T%z'` $ONLINE" >> /var/log/online.log
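
The pretty printing is covered in the rest of the post, but even the raw log is easy to skim. Since the status is the last field on each line, uniq can collapse runs of identical statuses and show just the transitions; -f 2 tells it to skip the date and time fields when comparing lines:

$ uniq -f 2 /var/log/online.log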

The details and pretty printing#

Read more…

Child process not in ps?

A buggy program#

Consider the following (contrived) program, which starts a background process to create a file and then waits while the background process is still running before checking to see if the file exists:

#!/bin/sh

# Make sure file doesn't exist.
rm -f file

# Create file in a background process.
touch file &
# While there is a touch process running...
while ps -C "touch" > /dev/null
do
    # ... wait one second for it to complete.
    sleep 1
done
# Check if file was created.
if [ -f file ]
then
    echo "Of course it worked."
else
    echo "Huh? File wasn't created."
    # Wait for background tasks to complete.
    wait
    if [ -f file ]
    then
        echo "Now it's there!"
    else
        echo "File never created."
    fi
fi

# Clean up.
rm -f file

Naturally, it will always output "Of course it worked.", right? Run it in a terminal yourself to confirm this. But I claimed this program is buggy; there's more going on.

Read more…

Tracker troubles

I use a Nokia N9 as my cell phone, largely because its MeeGo operating system is Linux-based and, in fact, its command line can be used much like that of any other Debian system. This also means Linux sysadmin skills can be used to work around bugs in this sadly no-longer-supported platform.

The N9 stores a lot of its state, including contacts, messages, and call logs, in an SQLite database called tracker. It turns out many people have had trouble with it failing, resulting in the contacts app showing the error Can't import contacts and the messaging and phone apps also showing no data. Those threads offer various solutions on how to get your phone back to a working state. In my case, I followed the instructions, and my phone worked fine for several months before failing in the same way again.

I followed the instructions a second time but noticed that it was giving disk full errors. On further inspection, it was clear that the disk wasn't actually full: it was out of inodes. After some work, which led to my previous blog post, I found /home/user/.cache/telepathy/avatars/gabble/jabber/ had hundreds of thousands of files (and I don't have that many friends). Simply deleting them freed up all of the inodes, and I haven't had any trouble since, although I've been making regular backups just in case.
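
If you run into similar symptoms, the checks are straightforward: df -i reports inode usage instead of block usage, and counting files under a suspect directory points at the culprit (these paths are the ones from this post):

$ df -i /home
$ find /home/user/.cache/telepathy/avatars/gabble/jabber/ -type f | wc -l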

Recovering (some) lost data#

While the files have been deleted, they may not have been overwritten yet, so there may be some hope of a partial recovery. The data for tracker is stored in /home/user/.cache/tracker/. df has the useful side effect of revealing which filesystem a directory is on:

$ df /home/user/.cache/tracker/
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mmcblk0p3         2064208    793712   1165640  41% /home

Attempting data recovery on a mounted partition is a bad idea as the unused space might get overwritten by new files; it's best to make a copy of it. Now that we know where the filesystem is, we can copy it using dd:

dd if=/dev/mmcblk0p3 | ssh $hostname dd of=$file

Then we can examine the partition image offline. In particular, strings combined with grep and its -A and -B options can search for known strings like names and phone numbers and show the nearby content. For example, searching for a phone number without spaces should find at least some of the associated text messages:

strings partdump | grep -A3 -B3 -F '+19175551212'

Unfortunately, this method is slow and unreliable. I've used it to recover a few text messages and a few phone numbers, but there's no clear way to automate it, so do not expect to recover all of your contacts and text messages this way.

Transferring many small files

The problem#

Transferring many small files is much slower than you would expect given their total size.

The solution#

tar c directory | pv -abrt | ssh target 'cd destination; tar x'

or

cd destination; ssh source tar c directory | pv -abrt | tar x

The details#

Read more…
