A Weird Imagination

Man Humor


man (and its lesser-known cousin apropos) are invaluable resources for using the command-line on Linux. While the information is useful, sometimes the pages can also be entertaining, particularly in the BUGS sections. Some are entertaining due to being overly honest, like nethack's, which merely states

Probably infinite.

or bash's which admits

It's too big and too slow.

My personal favorite is the passage in the BUGS section for xargs, which, after a two-paragraph-long discussion of the interaction between the -i, -I, and -s flags, finally concludes

Instead, the -i option should not impose a line length limit, which is why this discussion appears in the BUGS section.

Others are merely strange, like the BUGS section for screen, which, among many other entries, includes the bug

A weird imagination is most useful to gain full advantage of all the features.

which is where this blog gets its name.

My first Pelican plugin

The problem#

My previous blog post has a footnote in the first sentence. Due to the way footnotes are handled, the footnote reference is a link to #fn:prg, which works fine if the footnote is actually on the page, but on the blog main page (or any other listing of multiple articles) the footnote is not present because it's after the Read more… link. The result is that on those pages, all footnote references are broken links. These broken links should either be repaired such that they point to the article page or removed.

First attempt#

Unable to find an existing solution, I decided to write my own plugin, summary_footnotes. I started by finding another plugin, clean_summary, that also modifies summaries, and based my code on it. That plugin uses Beautiful Soup to parse the summary and rewrite it. A quick look at the docs was enough to figure out how to select the footnote links and rewrite them, which got me this version of the plugin.

Read more…

Child process not in ps?


A buggy program#

Consider the following (contrived) program1, which starts a background process to create a file and then waits while the background process is still running before checking whether the file exists:

#!/bin/sh

# Make sure file doesn't exist.
rm -f file

# Create file in a background process.
touch file &
# While there is a touch process running...
while ps -C "touch" > /dev/null
do
    # ... wait one second for it to complete.
    sleep 1
done
# Check if file was created.
if [ -f file ]
then
    echo "Of course it worked."
else
    echo "Huh? File wasn't created."
    # Wait for background tasks to complete.
    wait
    if [ -f file ]
    then
        echo "Now it's there!"
    else
        echo "File never created."
    fi
fi

# Clean up.
rm -f file

Naturally, it will always output "Of course it worked.", right? Run it in a terminal yourself to confirm this. But I claimed this program is buggy; there's more going on.
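To watch it in action, save it as, say, buggy.sh (the filename here is just for illustration) and run it with shell tracing enabled so that each command is printed as it executes:

sh -x ./buggy.sh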

Read more…

Tracker troubles


I use a Nokia N9 as my cell phone, largely because its MeeGo operating system is Linux-based; in fact, its command-line can be used much like that of any other Debian system. This also means Linux sysadmin skills can be used to work around bugs in this sadly no-longer-supported platform.

The N9 stores a lot of its state, including contacts, messages, and call logs, in an SQLite database called tracker. It turns out many people have had trouble with it failing, resulting in the contacts app showing the error Can't import contacts and the messaging and phone apps also showing no data. Those threads offer various solutions for getting your phone back to a working state. In my case, I followed the instructions, and my phone worked fine for several months before failing in the same way again.

I followed the instructions a second time but noticed that it was giving disk full errors. On further inspection, it was clear that the disk wasn't actually full: it was out of inodes. After some work, which led to my previous blog post, I found /home/user/.cache/telepathy/avatars/gabble/jabber/ had hundreds of thousands of files (and I don't have that many friends). Simply deleting them freed up all of the inodes, and I haven't had any trouble since, although I've been making regular backups just in case.
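For anyone else doing the same cleanup: rm * on a directory with hundreds of thousands of files fails with an "Argument list too long" error, so a sketch like the following (using the path from above, and assuming the phone's find supports -delete) is easier:

# Delete the stale avatar files without hitting the argument list limit.
find /home/user/.cache/telepathy/avatars/gabble/jabber/ -type f -delete
# Confirm the inodes were freed (/home is the relevant filesystem on the N9).
df -i /home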

Recovering (some) lost data#

While the files have been deleted, they may not have been overwritten yet, so there may be some hope of a partial recovery. The data for tracker is stored in /home/user/.cache/tracker/. df has the useful side effect of revealing which filesystem a directory is on:

$ df /home/user/.cache/tracker/
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mmcblk0p3         2064208    793712   1165640  41% /home

Attempting data recovery on a mounted partition is a bad idea as the unused space might get overwritten by new files; it's best to make a copy of it first. Now that we know which partition the filesystem is on, we can copy it using dd:

dd if=/dev/mmcblk0p3 | ssh $hostname dd of=$file

Then we can examine the copy offline. In particular, strings and grep with its -A and -B options can be used to search for known strings like names and phone numbers along with the content near them. For example, searching for a phone number without spaces should find at least some of the associated text messages:

strings partdump | grep -A3 -B3 -F '+19175551212'

Unfortunately, this method is slow and unreliable. I've used it to recover a few text messages and a few phone numbers, but there's no clear way to automate it, so do not expect to recover all of your contacts and text messages this way.
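One way to at least batch the searching (not a recovery tool, just a sketch; phone_numbers.txt is a hypothetical file with one known number per line) is to extract the strings once and then loop over the numbers:

# Pull printable strings out of the partition image once; this is the slow part.
strings partdump > partdump.txt
# Search for each known number and show the surrounding lines.
while read -r number
do
    echo "=== $number ==="
    grep -A3 -B3 -F "$number" partdump.txt
done < phone_numbers.txt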

Out of inodes, what now?


When you start getting disk full messages on Linux, there are a few different reasons why that might happen:

  1. The expected: too many large files. You can track down large directories using WinDirStat or

    du -hx --max-depth=1 | sort -h
    where the -x option tells du not to cross filesystem boundaries and the -h option tells both du and sort to use human-readable sizes like 11M or 1G.

  2. Deleted files aren't actually removed from the disk if some process still has them open. You can use lsof to find open files; give it the filesystem as an argument like lsof /home (see the lsof example after this list).

  3. By default, 5% of each filesystem is reserved for writes by root. Depending on what the filesystem is being used for, this may be too much or simply unnecessary. See this Server Fault answer for how to deal with it (and the tune2fs example after this list).

  4. The files could be shadowed by a mount. If a filesystem is mounted over a non-empty directory, the files in that directory aren't visible (see the bind mount example after this list).

  5. Last, the disk might not be out of space at all; it might be out of inodes. Some filesystems, notably the ext2/3/4 filesystems used by default on most Linux distributions, have a fixed number of inodes allocated at filesystem creation time. The default is high enough that running out is unlikely unless there are a very large number of small files. df -i will show the number of free inodes on each filesystem to verify whether it is indeed out of inodes.

    But how do you find those files? As described above, du helps find large files, but now we want to find directories containing large numbers of files. The following command acts like du -hx --max-depth=$depth | sort -h for inode usage instead of file sizes:

    find -xdev | sed "s@\(\([^/]*/\)\{$depth\}[^/]*\).*@\1@" | uniq -c | sort -n
    

    find -xdev lists all of the files under the current directory on the same filesystem. The sed command keeps only the first $depth path components (each ending in /) and discards the rest of the filename (the .* at the end), so each directory appears once for every file or directory anywhere under it. Then uniq -c counts the repeated lines and sort -n sorts by those counts, highlighting the directories containing the most files.
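For case 2 above, lsof can do better than listing every open file: its +L1 option selects only open files with a link count of zero, which is exactly the deleted-but-still-open case:

# Show deleted files on /home that some process still holds open,
# along with the process holding them.
lsof +L1 /home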
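For case 3, on ext2/3/4 the reserved percentage can be changed with tune2fs; the linked answer has more detail, and the device name below is just an example (check df for the right one):

# Lower the root-reserved space to 1% of the filesystem.
tune2fs -m 1 /dev/sda1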
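For case 4, bind-mounting the parent filesystem somewhere else makes the shadowed files visible without having to unmount anything (this sketch assumes /mnt is free to use):

# Expose the root filesystem, including files hidden under mount points, at /mnt.
mount --bind / /mnt
# Look for unexpectedly large directories; -x keeps du on the bind mount.
du -hx --max-depth=2 /mnt | sort -h
umount /mnt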

Transferring many small files


The problem#

Transferring many small files is much slower than you would expect given their total size.

The solution#

tar c directory | pv -abrt | ssh target 'cd destination; tar x'

or

cd destination; ssh source tar c directory | pv -abrt | tar x
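A variant not covered in this post: if the network is the bottleneck rather than the disks, adding gzip compression to the same pipeline can help:

tar cz directory | pv -abrt | ssh target 'cd destination; tar xz'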

The details#

Read more…


Future-dating static blog content


The problem#

Static site generators are great. But so are blog posts that automatically appear on schedule. How do we reconcile the two? There are solutions involving checking for updates on a schedule like every hour or every day, but that seems unsatisfying: if the posts have already been written, the blog should only need to be regenerated exactly when there is new content to publish.

The solution#

(These instructions are specific to Pelican, as that is what this blog uses; a similar method should work for other static blogging engines.)

Use Pelican's WITH_FUTURE_DATES setting to make future-dated posts not appear as part of the blog, but only as drafts. Add the following to the article template in order to include the future publication dates in an easy-to-parse format:

{% if article.status == "draft" %}
    <!-- Post at datetime {{ article.date|strftime("%H:%M %Y-%m-%d") }} -->
{% endif %}

Then the following script, schedule_publish.sh, uses those comments to schedule rerunning itself using at:

#!/bin/sh

# Pelican publish
make publish

# Clear old queue entries if they call this script.
for q in `atq -q g | cut -f1`
do
    if [ "`at -c $q | tail -2 | head -1`" = "$0" ]
    then
        atrm $q
    fi
done

# Check newly published drafts for when they should be published.
# Not using for because output lines have spaces.
grep -F -- '<!-- Post at datetime ' output/drafts/* | cut -d' ' -f5-6 | while read time
do
    # Schedule running this script for that time.
    echo "$0" | at -q g $time
done
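To check what the script actually scheduled, the same at commands it uses internally also work interactively (replace 4 with a job number from atq's output):

# List the pending jobs in queue g.
atq -q g
# Show what a particular job will run; the command appears near the end of the output.
at -c 4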

Last, follow the instructions in this blog post and run that script as the deployment task.

The details#

Read more…

Hello World!


Welcome to my blog. I am Daniel Perelman; I am presently a computer science graduate student at the University of Washington.

Much of my time is spent writing programs and using various highly configurable tools like Bash, Vim, and LaTeX. Like many other users of these tools, I often find myself performing web searches for help on how to use them. Thanks to StackExchange and myriad technical blogs, the answers I am looking for are often close at hand. But not always. Sometimes I end up piecing together a solution from many sources. This blog is a place for me to save others time by sharing that knowledge.