A Weird Imagination

Experimenting with ZFS


The problem#

For my recent posts on ZFS, I wanted to quickly try out a bunch of variants of my proposed operations without worrying about accidentally modifying my real ZFS filesystems. Specifically, I wanted to know which ways of copying files would result in more efficiently reusing blocks from existing snapshots where possible.

The solution#

WARNING: The instructions below will modify the ZFS pool tank, which is the default name used in many ZFS examples, and therefore may be a real ZFS pool on your computer.

I strongly recommend doing all of this inside a VM to be sure you are not affecting any real filesystems. I used a VirtualBox VM with Debian installed on it, using the guest additions to share a directory between the VM and my actual machine.
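
With the guest additions installed, mounting the shared folder inside the guest looks something like this (the share name and mount point are placeholders for whatever you configured in VirtualBox):

# Mount the VirtualBox shared folder named "shared" at /mnt/shared
sudo mount -t vboxsf shared /mnt/shared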

First, create a 1 GiB virtual ZFS pool (i.e. backed by a file instead of a physical device) to run tests on:

fallocate -l 1G /root/tank
zpool create tank /root/tank
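
With the pool created, it is worth knowing how to check its usage while it is still empty and how to throw it away to start over between experiments:

# Space usage of the pool before any tests have touched it
zfs list -o space tank
# Destroy the pool and remove its backing file to start over from scratch
zpool destroy tank
rm /root/tank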

Then perform various filesystem operations and inspect the result of zfs list -o space to determine whether they use more (or less) space than you expect. To make sure I was being consistent and to make it easier to test out multiple variations, I wrote some scripts:

git clone https://git.aweirdimagination.net/perelman/zfs-test.git
cd zfs-test/bin
# dump logs from create-/copy-all-and-measure into ../logs/
./measure-all
# read ../logs/ and print space used as Markdown table
./logs-to-table --links
| Create script | orig | rsync-ahvx | rsync-ahvx-sparse | rsync-inplace | rsync-inplace-no-whole-file | rsync-no-whole-file | zfs-diff-move-then-rsync |
| --- | --- | --- | --- | --- | --- | --- | --- |
| empty | 24K | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ |
| random-1M-file | 1.03M | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ |
| zeros-1M-file | 24K | 1.03M❌ | 24K✅ | 1.03M❌ | 1.03M❌ | 1.03M❌ | 1.03M❌ |
| move-file | 1.04M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.04M✅ |
| edit-part-of-file | 1.16M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.17M✅ | 2.04M❌ | 1.17M✅ |
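
The measurements come from zfs list -o space, as described above; after any single run you can look at the same breakdown directly (which datasets exist depends on which test scripts you ran):

# USEDDS is space used by the live dataset itself; USEDSNAP is space
# held only because snapshots still reference it.
zfs list -o space -r tank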

The details#

Read more…

Status of long-running copy

The problem#

When running an incremental backup with rsync with the --progress flag, it often spends a lot of time outputting nothing as it scans through many unchanged files. If you think of it before starting the transfer, --info=progress2 or the name2/skip2 --info flags would give more detail, but once the transfer has been going for a while, you probably don't want to cancel and restart it just to add those flags.
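
For reference, starting a transfer with those flags looks something like this (the paths are placeholders):

# Report overall progress for the whole transfer instead of per-file progress
rsync -a --info=progress2 src/ dest/
# Also mention unchanged and skipped files as they are scanned
rsync -a --info=progress2,name2,skip2 src/ dest/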

The solution#

The documentation and this StackExchange answer say you can send a SIGVTALRM signal to rsync version 3.2.0+ and it will output its current progress, but that wasn't working for me.
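
For completeness, sending that signal looks like this (adjust the PID if more than one rsync is running):

# Ask a running rsync (3.2.0+) to print its current progress
kill -s VTALRM "$(pidof rsync)"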

As a workaround, you can use strace to get a running log of which files rsync is looking at, which includes files it skips without actually opening:

strace --attach="$(pidof rsync)" --trace=openat

(If that's not showing anything, try removing the --trace=openat filter and seeing if there are other syscalls with paths to filter on.)

Alternatively, this StackExchange answer suggests a way to see the currently open files along with their sizes (this includes directories, but not unchanged files that are merely being inspected):

watch lsof -p"$(pidof rsync | tr ' ' ',')"

(The same should work for a recursive cp/mv/rm.)

Similarly, for getting the status of a transfer of a single large file, this answer attempts to read the files cp is reading/writing to give a running percentage of how much it has copied; a similar approach might work for rsync.
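
As a rough sketch of that idea (not the exact commands from the linked answer; the file descriptor number is a placeholder you have to look up yourself):

# Estimate progress of a single large copy from the kernel's read offset
# on the source file. Find the right fd with: ls -l /proc/"$(pidof -s cp)"/fd
pid=$(pidof -s cp)
fd=3  # hypothetical: the fd cp has open on the source file
size=$(stat -Lc %s "/proc/$pid/fd/$fd")                  # total size of the source
pos=$(awk '/^pos:/ {print $2}' "/proc/$pid/fdinfo/$fd")  # bytes read so far
echo "$((100 * pos / size))% copied"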

The details#

Read more…