A Weird Imagination

Recreate moves from zfs diff

The problem

When doing an incremental backup, any moved file on the source filesystem usually results in recopying the file to the destination filesystem. For a large file this can be slow, and it can also waste space if the destination keeps around deleted files (e.g. ZFS holding on to old snapshots). If both sides are ZFS, then you can get zfs send/recv to handle all of the details efficiently. But if only the source filesystem is ZFS, or the ZFS datasets are not at the same granularity on both sides, that doesn't apply.

zfs diff reports which files have been moved since a snapshot, but its output format is a little awkward for scripting.

The solution

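Download the script I wrote, zfs-diff-move.sh, and run it on the destination like this, where the arguments are the destination path prefix, the source path prefix to replace, and the two snapshots to compare: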

zfs-diff-move.sh /path/ /tank/dataset/ tank/dataset@base @new

The following is an abbreviated version of it:

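#!/bin/bash
# Arguments: $1 = destination prefix, $2 = source prefix,
#            $3 = older snapshot,     $4 = newer snapshot
# Keep only the rename (R) lines of the tab-separated diff.
zfs diff -H "$3" "$4" | grep '^R' | while read -r line
do
  # Print tab-separated field $3 of $line, unescaped, with the leading
  # source prefix ($2) replaced by the destination prefix ($1).
  get_path() {
    path="$(echo -e "$(echo "$line" | cut -d$'\t' "-f$3")")"
    echo "${path/#"$2"/$1}"
  }

  from="$(get_path "$1" "$2" 2)"
  to="$(get_path "$1" "$2" 3)"
  # The new parent directory may not exist on the destination yet.
  mkdir -vp -- "$(dirname "$to")"
  # -n: never overwrite an existing file.
  mv -vn -- "$from" "$to" || echo "Unable to move $from"
done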
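
The path rewriting inside get_path is the least obvious part of the script, so here is a small standalone sketch of just that step. The function name map_prefix is made up for this example and is not part of zfs-diff-move.sh. It also uses POSIX prefix stripping rather than the bash-only ${var/#pattern/replacement} form the script uses:

```shell
# Hypothetical helper, not part of zfs-diff-move.sh.
# $1 = path, $2 = source prefix, $3 = destination prefix.
# If the path starts with the source prefix, swap that prefix for the
# destination prefix; otherwise print the path unchanged.
map_prefix() {
  case "$1" in
    "$2"*) printf '%s\n' "$3${1#"$2"}" ;;
    *)     printf '%s\n' "$1" ;;
  esac
}

# Example using the prefixes from the command line shown above.
map_prefix '/tank/dataset/dir/file' '/tank/dataset/' '/path/'
```

With those arguments, a file the source knows as /tank/dataset/dir/file is looked up under /path/dir/file on the destination.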

The details

zfs diff output format

zfs diff with the -H flag ("Give more parsable tab-separated output") looks like

R   /tank/test/a\0040space  /tank/test/subdir/emoji\0342\0234\0250name
+   /tank/test/subdir
M   /tank/test/

The R means "Rename", so we can ignore the other lines. cut lets us easily select the two filenames, but they're escaped. That's good, because tabs or newlines in a filename can't break the parsing, but we still have to get the real unescaped names into a string. I intentionally selected filenames that would cause problems for this example: unescaped, they're "a space" and "emoji✨name". This SuperUser thread suggests some over-complicated ways to do this, but some of the comments pointed to the -e option on echo, which does what we want:

$ echo -e '/tank/test/subdir/emoji\0342\0234\0250name'
/tank/test/subdir/emoji✨name

Note that this is a bashism:

$ dash
$ echo -e '/tank/test/subdir/emoji\0342\0234\0250name'
-e /tank/test/subdir/emoji✨name

Alternatively, printf "%b\n" "$string" works on both.
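
As a minimal sketch of that portable alternative, the decoding can be wrapped in a small function. The name unescape_path is made up for this example and is not part of zfs-diff-move.sh:

```shell
# Hypothetical helper, not part of zfs-diff-move.sh.
# Decode the backslash-octal escapes (such as \0040 for a space) in one
# path taken from `zfs diff -H` output. printf's %b is specified by
# POSIX, so this works in both bash and dash.
unescape_path() {
  printf '%b\n' "$1"
}

# The two escaped names from the example output above.
unescape_path '/tank/test/a\0040space'
unescape_path '/tank/test/subdir/emoji\0342\0234\0250name'
```

One caveat: capturing the result with $(...) strips trailing newlines, so a filename that ends in a newline would not survive the round trip.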

Why the mkdir?

One detail that tripped me up in the first version of this script is that since renames include moves to different directories, the target directory might not exist yet. Running mkdir -p ensures that it exists so the move will succeed.

Edge cases

If multiple different files have shared a name in the history of your filesystem, the changes may not be exactly what you expect. The mv above has the -n option, so no files will get overwritten; they just might not get moved.

The simplest edge case is swapping two files:

$ touch /tank/test/a /tank/test/b
$ zfs snap tank/test@before-move
$ mv /tank/test/a /tank/test/tmp
$ mv /tank/test/b /tank/test/a
$ mv /tank/test/tmp /tank/test/b
$ zfs diff -H tank/test@before-move
R   /tank/test/a    /tank/test/b
R   /tank/test/b    /tank/test/a
M   /tank/test/

The script will fail to move either file, as there's already a file at each destination name. If you removed the -n (--no-clobber) option, it would instead overwrite b with a and then rename the result back to a, losing the original b, which is the completely wrong thing to do.

It's possible a more complicated script could handle this and other edge cases, but simply ignoring them and acknowledging the script doesn't handle everything was sufficient for my use.
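
A first step toward such a script would be to at least detect the problem cases before moving anything. The following is only a sketch under that assumption, not something zfs-diff-move.sh does: the hypothetical function list_rename_conflicts reads zfs diff -H output on standard input and prints every rename whose destination is also the source of another rename, which covers the swap above. It compares the paths in their escaped form, exactly as zfs diff prints them.

```shell
# Hypothetical helper, not part of zfs-diff-move.sh.
# Reads `zfs diff -H` output on stdin and prints "from<TAB>to" for every
# rename (R line) whose destination is also the source of some rename.
list_rename_conflicts() {
  awk -F '\t' '
    # Remember each rename and record its old path as a source.
    $1 == "R" { from[NR] = $2; to[NR] = $3; is_source[$2] = 1; last = NR }
    END {
      for (i = 1; i <= last; i++)
        if ((i in to) && (to[i] in is_source))
          print from[i] "\t" to[i]
    }
  '
}

# The swap example from above, fed in as literal diff output.
printf 'R\t/tank/test/a\t/tank/test/b\nR\t/tank/test/b\t/tank/test/a\nM\t/tank/test/\n' |
  list_rename_conflicts
```

Both renames from the swap are reported, so a wrapper could skip them or stop and ask for manual handling. It does not catch a destination name that is occupied by an unrelated file, which still relies on mv -n.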
