A Weird Imagination

Out of inodes, what now?

When you start getting "disk full" messages on Linux, there are a few different reasons why that might happen:

  1. The expected: too many large files. You can track down large directories using a graphical tool like QDirStat (a Linux counterpart to WinDirStat) or

    du -hx --max-depth=1 | sort -h
    where the -x option tells du not to cross filesystem boundaries and the -h option tells both du and sort to use human-readable sizes like 11M or 1G.

  2. Deleted files aren't actually deleted if they are still open; the space is only reclaimed once the last process holding them open closes them. You can use lsof to find open files. Give it the filesystem as an argument, like lsof /home; a sketch for listing deleted-but-open files appears after this list.

  3. By default, 5% of each filesystem is reserved for writes by root. Depending on what the filesystem is being used for, this may be too much or simply unnecessary. See this Server Fault answer for how to deal with this; a tune2fs sketch appears after this list.

  4. The files could be shadowed by a mount. If a filesystem is mounted over a non-empty directory, the files in that directory aren't visible but still take up space; a bind-mount trick for seeing them appears after this list.

  5. Lastly, the disk might not actually be out of space at all. It might instead be out of inodes. Some filesystems, notably the ext2/3/4 filesystems used by default on most Linux distributions, have a fixed number of inodes allocated at filesystem creation time. The default is high enough that it is unlikely to be an issue unless there is a very large number of empty files. df -i will show the number of inodes free on each filesystem to verify whether a filesystem is indeed out of inodes.

    But how do you find those empty files? As described above, du will help find large files, but now we want to find large numbers of files. The following command acts like du -hx --max-depth=$depth | sort -h, but for inode counts instead of file sizes:

    find -xdev | sed "s@\(\([^/]*/\)\{$depth\}[^/]*\).*@\1@" | uniq -c | sort -n
    

    find -xdev lists all of the files under the current directory on the same filesystem. The sed command matches the first $depth directories (each ending in /) plus one more path component and discards the rest of the filename (the .* at the end), so each directory appears once for every file or directory anywhere under it. Then uniq -c counts the repeated lines (no sort is needed first because find lists the contents of each directory consecutively) and sort -n orders the result by those counts, highlighting the directories with the most files. A complete invocation with $depth set appears right after this list.
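
Here is a complete invocation with $depth set (the value 2 is arbitrary); run it from the root of the filesystem being investigated:

    # Count files per directory two path components deep; the busiest
    # directories (and thus the heaviest inode users) sort to the bottom.
    depth=2
    find -xdev | sed "s@\(\([^/]*/\)\{$depth\}[^/]*\).*@\1@" | uniq -c | sort -n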
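
Going back to point 2, here is a minimal sketch for spotting deleted-but-open files (the filesystem path is just an example):

    # List open files on /home with a link count less than 1, i.e. files
    # that have been deleted but are held open by some process and so
    # still consume space.
    lsof +L1 /home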
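
For point 3, on ext2/3/4 filesystems the reserved percentage can be changed with tune2fs. This is a sketch; the device name is a placeholder for the actual block device:

    # Lower the space reserved for root on an ext4 filesystem to 1%.
    tune2fs -m 1 /dev/sda1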
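
For point 4, one way to see files shadowed by a mount is to bind-mount the parent filesystem to a second location and look under the mount point there; the paths are examples:

    # Make the root filesystem visible a second time at /mnt/root; files
    # hidden under mount points on the real root (e.g. under /home)
    # appear at the corresponding paths there.
    mkdir -p /mnt/root
    mount --bind / /mnt/root
    ls /mnt/root/home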

Transferring many small files

The problem

Transferring many small files is much slower than you would expect given their total size.

The solution

tar c directory | pv -abrt | ssh target 'cd destination; tar x'

or

cd destination; ssh source tar c directory | pv -abrt | tar x
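
In both commands, pv is optional; it just reports progress, showing average rate (-a), total bytes (-b), current rate (-r), and elapsed time (-t). Here is an annotated version of the push variant, where target and destination are placeholders as above:

# Pack the whole tree into one tar stream, display throughput with pv,
# and unpack it on the remote end; a single stream over a single ssh
# connection avoids per-file round trips.
tar c directory | pv -abrt | ssh target 'cd destination; tar x'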

The details

Read more…


Future-dating static blog content

The problem

Static site generators are great. But so are blog posts that automatically appear on schedule. How do we reconcile the two? There are solutions involving checking for updates on a schedule like every hour or every day, but that seems unsatisfying: if the posts have already been written, the blog should only need to be regenerated exactly when there is new content to publish.

The solution

(These instructions are specifically for Pelican, as that is what this blog uses; a similar method should work for other static blogging engines.)

Use Pelican's WITH_FUTURE_DATES setting so that future-dated posts do not appear as part of the blog, but only as drafts. Add the following to the article template in order to include the future publication dates in an easy-to-parse format:

{% if article.status == "draft" %}
    <!-- Post at datetime {{ article.date|strftime("%H:%M %Y-%m-%d") }} -->
{% endif %}
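
For example, a post scheduled for noon on 1 June 2015 would produce a comment like this in its draft page (illustrative values):

<!-- Post at datetime 12:00 2015-06-01 -->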

Then the following script, schedule_publish.sh, uses those comments to schedule rerunning itself using at:

#!/bin/sh

# Pelican publish
make publish

# Clear old queue entries if they call this script.
for q in `atq -q g | cut -f1`
do
    if [ "`at -c $q | tail -2 | head -1`" = "$0" ]
    then
        atrm $q
    fi
done

# Check newly published drafts for when they should be published.
# Not using for because output lines have spaces.
grep -F -- '<!-- Post at datetime ' output/drafts/* | cut -d' ' -f5-6 | while read time
do
    # Schedule running this script for that time.
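    # Note: $time is intentionally left unquoted below so it expands
    # into two separate arguments (time and date) for at.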
    echo "$0" | at -q g $time
done

Finally, follow the instructions in this blog post and run that script as the deployment task.
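
To try it out by hand (assuming at is installed with the atd daemon running, and make publish works as in a standard Pelican setup):

./schedule_publish.sh   # publish now and queue a rerun for each future post
atq -q g                # inspect the jobs queued for future posts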

The details

Read more…

Hello World!

Welcome to my blog. I am Daniel Perelman; I am presently a computer science graduate student at the University of Washington.

Much of my time is spent writing programs and using various highly-configurable tools like Bash, Vim, and LaTeX. Like many other users of these tools, I often find myself performing web searches for help on how to use them. Thanks to StackExchange and myriad technical blogs, the answers I am looking for are often close at hand. But not always. Sometimes I end up piecing together a solution from many sources. This blog is a place for me to save others time by sharing that knowledge.