The problem#

Last week, I shared a script which continuously backed up game saves whenever the game saved. The result is a series of directories that contain snapshots of the game saves from every autosave. But to view this data, we really want a list of unique files in order marked with the time they were created.

The solution#

The following will create symbolic links to the unique files named after their modification date:

for i in /tank/factorio/.zfs/snapshot/*/*.zip
do
  ln -sf "$i" "$(stat --printf=%y "$i").zip"
done

or if you want a custom date format, you can use date:

  ln -sf "$i" "$(date -r "$i" +%Y-%m-%d_%H-%M-%S).zip"

Alternatively, the following will just list the unique files with their timestamps:

find /tank/factorio/.zfs/snapshot/ -printf "%T+ %p\n" \
    | sort | uniq --check-chars=30

The details#

The problem#

I've recently been writing more series of blog posts or otherwise linking between posts using {filename} links. And also I've been adjusting the scheduling of my future planned blog posts, which involves changing the filename as my naming scheme includes the publication date in the filename. Which means there's opportunities for not adjusting the links to match and ending up with broken links between posts.

Pelican does generate warnings like

WARNING  Unable to find './invalid.md', skipping url        log.py:89
         replacement.

but currently building my entire blog takes about a minute, so I generally only do it when publishing. So I wanted a more lightweight way to just check the intra-blog {filename} links.

The solution#

I wrote the script check_filename_links.sh:

#!/bin/bash

content="${1:-.}"

find "$content" -iname '*.md' -type f -print0 | 
  while IFS= read -r -d '' filename
  do
    grep '^\[.*]: {filename}' "$filename" |
      sed 's/^[^ ]* {filename}\([^\#]*\)\#\?.*$/\1/' |
      while read -r link
      do
        if [ "${link:0:1}" != "/" ]
        then
          linkedfile="$(dirname "$filename")/$link"
        else
          linkedfile="$content$link"
        fi
        if [ ! -f "$linkedfile" ]
        then
          echo "filename=$filename, link=$link,"\
               "file does not exist: $linkedfile"
        fi
      done
  done

Run it from your content/ directory or provide the path to the content/ directory as an argument and it will print out the broken links:

filename=./foo/bar.md, link=./invalid.md, file does not exist: ./foo/./invalid.md

The details#

The problem#

I will often make copies of important files onto multiple devices, and then later make backups of all of those devices onto the same drive. At which point, I now have multiple redundant copies of those files within my backup. Tools like rdfind, fdupes, and jdupes exist to deal with the general problem of searching a collection of files for duplicates efficiently, but none of them support only checking if files are identical if their filenames and/or paths match, so they end up doing a lot of extra work in this case.

The solution#

Download the script I wrote, hardlink-dups-by-name.sh and run it as follows:

hardlink-dups-by-name.sh a_backup/ another_backup/

Then all files like a_backup/some/path that are identical to the corresponding file another_backup/some/path will get hard-linked together so there will only be one copy of the data taking up space.

The details#

Filenames are troublesome#

While shell programing lets you write very concise programs, it turns out that the primary use case of working with files is unfortunately much harder than it seems. That detailed article by David A. Wheeler does a good job of explaining all of the various problems that a naive shell script can run into due to various characters which are allowed in filenames which the shell treats specially in some way.

One surprising one is that filenames beginning with a dash (-) can be interpreted as options due to the way globbing works in the shell. Suppose we set up a directory as follows:

$ cat > -n
Some secret text.
$ cat > test
This is a test.
It has multiple lines.

Quick, what will cat * do here?

$ cat *
     1  This is a test.
     2  It has multiple lines.

Probably not what you wanted. The reason that happens is that the * is expanded by the shell before being fed to cat, so the command executed is cat -n test and -n gets interpreted not as a filename but as an option telling cat to number the lines of the output.

The workaround is to use ./* instead of *, so the - will not actually be the first character and therefore will not get misinterpreted as an option. But there are many other things that can go wrong with unexpected filenames and remembering to handle all of them everywhere is error-prone.

Warnings for unsafe shell code#

The solution is shellcheck. shellcheck will warn you about mistakes like the cat * problem and many other issues you may not be aware of.

As I have many shellscripts around that I wrote before learning about shellcheck, I wanted to run it on all of the shell scripts (but not binaries or other language scripts) in my ~/bin directory, so naturally I wrote a script to do so:

#!/bin/sh

find -exec file {} \; \
    | grep -F 'shell script' \
    | sed s/:[^:]*$// \
    | xargs shellcheck

This uses the file command to identify shell scripts and then selects out their file names to run shellcheck on all of them using xargs.

Warnings in Vim#

shellcheck is written to support integration into IDEs. I use Vim to edit shell scripts, so I installed the syntastic (using Vundle which makes installing Vim plugins off GitHub very easy). Note to follow the instructions on the Syntastic page, specifically the recommended settings: without any settings it won't do anything at all. Once set up, it automatically runs shellcheck on every save, identifies lines with warnings and shows a list of warnings that can be double-clicked to jump to the location of the warning.

If you use the other text editor, then the shellcheck website recommends the flycheck plugin.

When you start getting disk full messages on Linux, there's a few different reasons why that might happen:

The expected. Too many large files. You can track down large directories using WinDirStat or
```
du -hx --max-depth=1 | sort -h
```
where the -x option tells du to not cross filesystem boundaries and the -h option to both uses human-readable sizes like 11M or 1G.
Deleted files aren't actually deleted if they are still open. You can use lsof to find open files. Give it the filesystem as an argument like lsof /home.
By default 5% of each filesystem is reserved for writes by root. Depending on what the filesystem is being used for, this may be too much or simply unnecessary. See this Server Fault answer for how to deal with this.
The files could be shadowed by a mount. If a filesystem is mounted over a non-empty directory, the files in that directory aren't visible.
Last, the disk might not actually be out of space at all. It might actually be out of inodes. Some filesystems, notably the ext2/3/4 filesystems used by default on most Linux distributions have a fixed number of inodes allocated at filesystem creation time. The default is high enough that it is unlikely to be an issue unless there are a very large number of empty files. df -i will show the number of inodes free on each filesystem to verify if a filesystem is indeed out of inodes.

But how do you find those empty files? As described above, du will help find large files, but now we want to find large numbers of files. The following command acts like du -hx --max-depth=$depth | sort -h for inodes instead of file sizes:
```
find -xdev | sed "s@$\([^/]*/$\{$depth\}[^/]*\).*@\1@" | uniq -c | sort -n
```
find -xdev lists all of the files under the current directory on the same filesystem. The sed command finds the first $depth directories (ending in /) and discards the rest of the filename (the .* at the end), so each directory appears once for every file or directory anywhere under it. Then the end of the command counts the repeated lines and sorts by those counts, highlighting the directories with the most files.

A Weird Imagination

Ordering saves by date

The problem#

The solution#

The details#

Finding broken {filename} links

The problem#

The solution#

The details#

Hardlink identical directory trees

The problem#

The solution#

The details#

Checking for unsafe shell constructs

Filenames are troublesome#

Warnings for unsafe shell code#

Warnings in Vim#

Out of inodes, what now?