The problem
When doing an incremental backup, moving a file on the source
filesystem usually results in the file being recopied to the destination
filesystem. For a large file this can be slow, and it can also waste
space if the destination keeps deleted files around (e.g. ZFS holding on
to old snapshots). If both sides are ZFS, then you can get zfs send/recv
to handle all of the details efficiently. But if only the source
filesystem is ZFS, or the ZFS datasets are not at the same granularity
on both sides, that doesn't apply.
zfs diff reports file moves (along with other changes) between a
snapshot and a later snapshot or the live filesystem, but its output
format is a little awkward for scripting.
The solution
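Download the script I wrote, zfs-diff-move.sh, and run it like
zfs-diff-move.sh /path/ /tank/dataset/ tank/dataset@base @new
where /path/ is the root of the backup copy, /tank/dataset/ is the
mountpoint of the source dataset, and tank/dataset@base and @new are
the two snapshots to compare.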
The following is an abbreviated version of it:
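#!/bin/bash
# Print field $3 of the current zfs diff line, unescaped, with the
# source prefix $2 replaced by the destination prefix $1.
get_path() {
    path="$(echo -e "$(echo "$line" | cut -d$'\t' "-f$3")")"
    # Quoted pattern/replacement: taken literally, not as glob or '&'.
    echo "${path/#"$2"/"$1"}"
}

zfs diff -H "$3" "$4" | grep '^R' | while read -r line
do
    from="$(get_path "$1" "$2" 2)"
    to="$(get_path "$1" "$2" 3)"
    mkdir -vp -- "$(dirname "$to")"
    mv -vn -- "$from" "$to" || echo "Unable to move $from"
done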
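The prefix swap in get_path relies on bash's ${var/#pattern/replacement} expansion, which only matches at the start of the string. Here is a minimal sketch of that step in isolation; rebase_path and the sample paths are hypothetical names chosen for illustration. It quotes the pattern and the replacement so that glob characters in the mountpoint, or an & in the destination path (special in bash 5.2 and later), are taken literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the prefix swap done by get_path:
# replace a leading source prefix with the destination prefix.
rebase_path() {
    local path=$1 src_prefix=$2 dst_prefix=$3
    # ${var/#pattern/replacement} only matches at the start of $var.
    # The quotes make the pattern and replacement literal strings.
    printf '%s\n' "${path/#"$src_prefix"/"$dst_prefix"}"
}

rebase_path /tank/dataset/photos/a.jpg /tank/dataset/ /path/
# prints /path/photos/a.jpg
rebase_path /other/file /tank/dataset/ /path/
# prints /other/file (prefix absent, so the path is unchanged)
```

Paths outside the source mountpoint pass through untouched, which is why the script can feed every decoded zfs diff path through the same function.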
The details
zfs diff output format
zfs diff with the -H flag ("Give more parsable tab-separated output")
looks like:
R /tank/test/a\0040space /tank/test/subdir/emoji\0342\0234\0250name
+ /tank/test/subdir
M /tank/test/
The R means "rename", so we can ignore the other lines (+ marks
something created and M something modified). cut lets us easily select
the two filenames… but they're escaped. That's good, since tabs or
newlines in a filename can't break the parsing, but it means we have to
turn them back into the real unescaped names. I intentionally chose
filenames that would cause problems for this example. Unescaped, they're
"a space" and "emoji✨name".
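As a rough illustration of the filtering and field selection described above, the snippet below hard-codes the sample output (with real tab characters) instead of calling zfs diff, keeps only the R lines, and uses cut to pull out the old and new paths. The variable names are illustrative; the paths are still in their escaped form at this point.

```shell
#!/usr/bin/env bash
# Sample zfs diff -H output, hard-coded (with real tabs) so this runs
# without ZFS. printf '%s' copies the backslash escapes through verbatim.
sample=$(
    printf '%s\t%s\t%s\n' R '/tank/test/a\0040space' \
        '/tank/test/subdir/emoji\0342\0234\0250name'
    printf '%s\t%s\n' + /tank/test/subdir
    printf '%s\t%s\n' M /tank/test/
)

# Keep only the rename lines, then take the old path (field 2) and the
# new path (field 3). The paths are still escaped at this point.
printf '%s\n' "$sample" | grep '^R' | while read -r line; do
    from=$(printf '%s\n' "$line" | cut -d$'\t' -f2)
    to=$(printf '%s\n' "$line" | cut -d$'\t' -f3)
    printf 'from: %s\nto:   %s\n' "$from" "$to"
done
```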
This SuperUser thread suggests some over-complicated ways to do this,
but some of the comments pointed to the -e option of echo, which does
what we want:
$ echo -e '/tank/test/subdir/emoji\0342\0234\0250name'
/tank/test/subdir/emoji✨name
Note that this is a bashism:
$ dash
$ echo -e '/tank/test/subdir/emoji\0342\0234\0250name'
-e /tank/test/subdir/emoji✨name
Alternatively, printf "%b\n" "$string" works in both bash and dash.
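A minimal POSIX sh sketch of the printf approach, wrapped in a hypothetical unescape_path function. The %b conversion expands \0NNN octal escapes in its argument, so the bytes \0342 \0234 \0250 come back out as the UTF-8 encoding of ✨.

```shell
#!/bin/sh
# Hypothetical helper: decode zfs diff's \0NNN octal escapes with
# printf '%b', which behaves the same in bash and dash (unlike echo -e).
unescape_path() {
    printf '%b\n' "$1"
}

unescape_path '/tank/test/a\0040space'
# prints /tank/test/a space
unescape_path '/tank/test/subdir/emoji\0342\0234\0250name'
# prints /tank/test/subdir/emoji✨name
```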
Why the mkdir?
One detail that tripped me up in the first version of this script is
that, since renames include moves between directories, the target
directory might not exist on the destination yet. Running mkdir -p
first ensures that it exists so the move will succeed.
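A small sketch of that failure and the fix, run in a throwaway directory created with mktemp; the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a rename into a directory that doesn't exist on the backup yet.
dest=$(mktemp -d)
trap 'rm -rf -- "$dest"' EXIT
touch "$dest/a"

from="$dest/a"
to="$dest/subdir/a"   # hypothetical new location in a brand-new directory

# Without the parent directory, mv fails ("No such file or directory").
mv -- "$from" "$to" 2>/dev/null || echo "plain mv failed"

# Creating the parent first, as the script does, lets the move succeed.
mkdir -p -- "$(dirname "$to")"
mv -- "$from" "$to" && echo "moved to ${to#"$dest"/}"
```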
Edge cases
If multiple different files have shared a name over the history of your
filesystem, the changes may not be exactly what you expect. The mv
above has the -n option, so no files will get overwritten; they just
might not get moved.
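To see what -n buys, here is a small sketch in a scratch directory (file names and contents are made up): a move onto an existing file is skipped and both files keep their contents. Whether mv -n also signals the skip with a nonzero exit status may differ between mv implementations and versions, so the sketch only relies on the file contents.

```shell
#!/usr/bin/env bash
# Sketch: mv -n ("no clobber") refuses to replace an existing destination.
dest=$(mktemp -d)
trap 'rm -rf -- "$dest"' EXIT

echo old-a > "$dest/a"
echo old-b > "$dest/b"

# A rename a -> b would normally replace b; with -n it is skipped.
# '|| true' because the exit status on a skip is not portable.
mv -n -- "$dest/a" "$dest/b" || true

cat "$dest/a" "$dest/b"   # both files still hold their original contents
```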
The simplest edge case is swapping two files:
$ touch /tank/test/a /tank/test/b
$ zfs snap tank/test@before-move
$ mv /tank/test/a /tank/test/tmp
$ mv /tank/test/b /tank/test/a
$ mv /tank/test/tmp /tank/test/b
$ zfs diff -H tank/test@before-move
R /tank/test/a /tank/test/b
R /tank/test/b /tank/test/a
M /tank/test/
The script will fail to move either file, as there's already a file at
each destination name. Or, if you used the --clobber option, it would
move a over b, destroying the old b, and then rename it back to a,
which is completely the wrong thing to do.
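A small simulation of the swap above, assuming the backup copy currently matches the before-move snapshot. It replays the two decoded R lines against a scratch directory using the same mkdir -p and mv -n steps as the script; the scratch layout and the renames array are stand-ins for real zfs diff output.

```shell
#!/usr/bin/env bash
# Sketch: replay the swap's two renames against a scratch "backup".
dest=$(mktemp -d)
trap 'rm -rf -- "$dest"' EXIT
echo old-a > "$dest/a"   # backup still reflects the snapshot
echo old-b > "$dest/b"

# Hypothetical decoded (old path, new path) pairs, already rebased to $dest.
renames=("$dest/a" "$dest/b"   "$dest/b" "$dest/a")

for ((i = 0; i < ${#renames[@]}; i += 2)); do
    from=${renames[i]} to=${renames[i+1]}
    mkdir -p -- "$(dirname "$to")"
    mv -n -- "$from" "$to" || echo "Unable to move $from"
done

# Neither move happened because each destination already existed, so the
# backup still has a=old-a and b=old-b even though the source swapped them.
cat "$dest/a" "$dest/b"
```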
It's possible a more complicated script could handle this and other edge cases, but simply ignoring them and acknowledging the script doesn't handle everything was sufficient for my use.