The problem
I was running a Factorio multiplayer server and was being paranoid about making sure I didn't lose any save data. But I also didn't want to put the saves directory on my ZFS file system as it's on a hard drive, not an SSD, and saves taking too long can cause lag for the players (although with non-blocking saving this is much less of an issue).
The solution
The following script watches the saves/ directory for any new files being written and immediately copies them to the ZFS dataset tank/factorio (mounted at /tank/factorio/). It then creates a snapshot named with the current date and time. The result is one snapshot of the save data for every time the game saved.
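#!/bin/sh
while true
do
    # Block until a file under saves/ is closed after being written.
    inotifywait -r saves/ -e close_write
    sleep 0.1s # write is to *.tmp.zip, wait for rename
    # Mirror saves/ into the dataset mounted at /tank/factorio/.
    rsync -avhx saves/ /tank/factorio/
    # Snapshot the dataset, named with the current date and time.
    now="$(date +%Y-%m-%d_%H-%M-%S)"
    zfs snapshot tank/factorio@save-"$now"
done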
The details
Watching for changes
Similar to my script for compiling on save, this uses inotifywait to wait for a file to be written. The one difference is the -r (--recursive) option, which watches the entire saves/ directory.
In testing, I noticed that the filename detected by inotifywait wasn't actually the save filename save.zip, but save.tmp.zip. The game is presumably trying to avoid save corruption by writing the entire save out to a separate file before doing the very fast and safe operation of renaming it over the existing save file.[1] To avoid accidentally starting the copy before that rename happens, I put in that sleep 0.1s, but I doubt it actually matters.
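An alternative to the fixed sleep is to wait for the rename itself. This assumes the game really does write NAME.tmp.zip and then rename it to NAME.zip, which is only presumed above. inotifywait can report moved_to events, which fire when a file is renamed into a watched directory. In monitor mode (-m), inotifywait keeps one watch open and prints a line per event, so a save that finishes while rsync is still running isn't missed between loop iterations. This is an untested sketch, and the helper names is_real_save and watch_saves are hypothetical.

```shell
#!/bin/sh
# Sketch: react to the rename that completes a save instead of sleeping.
# Assumes the game writes NAME.tmp.zip and then renames it to NAME.zip.

# Succeeds only for finished save files, not the temporary ones.
is_real_save() {
    case "$1" in
        *.tmp.zip) return 1 ;;
        *.zip) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical replacement for the while-true loop above.
watch_saves() {
    # -m: keep watching forever; --format '%w%f': print the full path of each file.
    inotifywait -m -r -e moved_to --format '%w%f' saves/ |
    while IFS= read -r path; do
        is_real_save "$path" || continue
        rsync -avhx saves/ /tank/factorio/
        now="$(date +%Y-%m-%d_%H-%M-%S)"
        zfs snapshot tank/factorio@save-"$now"
    done
}
```

Calling watch_saves from the directory containing saves/ would take the place of the original loop. Two saves finishing in the same second would still collide on the snapshot name. In that case zfs snapshot fails for the second one and the loop carries on.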
Watching your documents
While researching rsync settings, I came across an example in the Arch wiki. It suggests a setup that uses inotifywait and rsync to automatically back up a documents directory whenever a file in it is saved.
Naming the snapshot
Names in ZFS may only contain alphanumeric characters and a few symbols: _, -, :, and . (and, apparently, space). Using a timestamp is an easy way to ensure a unique[2] and relatively meaningful name for the snapshots. As a trade-off between avoiding symbols and readability, I chose a date format that mostly uses -, except for _ to separate the date from the time.
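Here is a small sketch for checking that a generated name stays within that character set. The function name valid_snapshot_suffix is hypothetical. It checks only the characters, not other ZFS limits such as maximum name length. The chosen format has a side benefit: every field is zero-padded and ordered from year down to second, so sorting the names as plain strings also sorts them chronologically.

```shell
#!/bin/sh
# Succeeds if $1 is non-empty and uses only characters ZFS allows in a
# name component: letters, digits, _ . : space and -.
valid_snapshot_suffix() {
    # Glob pattern matching any string containing a disallowed character.
    # Left unquoted below on purpose so the shell treats it as a pattern.
    bad_char='*[!A-Za-z0-9_.: -]*'
    case "$1" in
        '') return 1 ;;
        $bad_char) return 1 ;;
        *) return 0 ;;
    esac
}

now="$(date +%Y-%m-%d_%H-%M-%S)"
name="save-$now"
if valid_snapshot_suffix "$name"; then
    echo "tank/factorio@$name"
else
    echo "invalid snapshot name: $name" >&2
fi
```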
Incremental backups
This is effectively a somewhat convoluted incremental backup system, using ZFS's snapshots to make the backups incremental instead of using hard links. That means if you aren't using ZFS, you could use incremental backup software to do essentially the same thing. The same goes if you just don't want to create a separate dataset at your backup granularity for some reason.
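For comparison with snapshots, here is roughly what the hard-link approach looks like. This is a toy sketch using plain cp and ln in a temporary directory, not a real backup tool, and the directory names are made up for illustration. A file that didn't change between backups is stored once and appears in both backup folders. A changed file gets a fresh copy.

```shell
#!/bin/sh
set -e
# Throwaway directory; left in place afterwards so it can be inspected.
root="$(mktemp -d)"
mkdir -p "$root/saves" "$root/backup1" "$root/backup2"
printf 'old map\n' > "$root/saves/map.zip"
printf 'old world\n' > "$root/saves/world.zip"

# First backup: a full copy.
cp "$root/saves/"*.zip "$root/backup1/"

# The game then rewrites only world.zip.
printf 'new world\n' > "$root/saves/world.zip"

# Second backup: hard-link files identical to the previous backup, copy the rest.
for f in "$root/saves/"*.zip; do
    name="$(basename "$f")"
    if cmp -s "$f" "$root/backup1/$name"; then
        ln "$root/backup1/$name" "$root/backup2/$name"
    else
        cp "$f" "$root/backup2/$name"
    fi
done

# map.zip is one file with two names; world.zip now exists as two files.
[ "$root/backup1/map.zip" -ef "$root/backup2/map.zip" ] && echo "map.zip shared"
[ "$root/backup1/world.zip" -ef "$root/backup2/world.zip" ] || echo "world.zip copied"
```

rsync's --link-dest automates this. By default it decides a file is unchanged by comparing size and modification time, rather than the full content comparison cmp does here. Rewriting a hard-linked file in place would change every backup sharing it. rsync avoids this by default: it writes a new file and renames it into place.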
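rsync can do incremental backups using the --link-dest option, given the most recent backup to compare to. The snippet below finds that backup using the recommended Bash idiom for finding the newest file. It relies on Bash arrays and [[, so it needs bash rather than /bin/sh: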
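# Find the most recent backup folder.
files=(/tank/factorio/*) prev=${files[0]}
for f in "${files[@]}"; do
    if [[ $f -nt $prev ]]; then
        prev=$f
    fi
done
# Copy saves/ into a new timestamped folder, hard-linking any file
# unchanged since $prev instead of storing another copy.
now="$(date +%Y-%m-%d_%H-%M-%S)"
rsync -avhx --link-dest="$prev" saves/ "/tank/factorio/$now/"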
Note that the --link-dest path is relative to the destination directory, not to the current working directory. Using absolute paths for --link-dest avoids that confusion.
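The backup folders are named with a zero-padded, year-to-second timestamp, which allows a simpler way to pick the previous backup that also works in plain /bin/sh. The shell sorts glob results, so the last matching folder is the newest by name. This sketch makes two assumptions. First, the backup root contains only these timestamped folders. Second, you pass it an absolute path, so $prev is absolute as recommended above. It also skips --link-dest for the very first backup, when there is nothing to link against. The helper names newest_backup and backup_saves are hypothetical.

```shell
#!/bin/sh
# Print the newest timestamp-named folder inside $1, or nothing if there is none.
# Relies on names like 2024-01-02_03-04-05 sorting in chronological order.
newest_backup() {
    newest=""
    for d in "$1"/*/; do
        # An unmatched glob stays literal, so check it is a real directory.
        [ -d "$d" ] && newest="${d%/}"
    done
    printf '%s' "$newest"
}

# Back up saves/ into a new timestamped folder under $1 (an absolute path).
backup_saves() {
    dest_root="$1"
    prev="$(newest_backup "$dest_root")"
    now="$(date +%Y-%m-%d_%H-%M-%S)"
    if [ -n "$prev" ]; then
        rsync -avhx --link-dest="$prev" saves/ "$dest_root/$now/"
    else
        rsync -avhx saves/ "$dest_root/$now/"
    fi
}
```

Usage would be backup_saves /tank/factorio, run from the directory containing saves/.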
rsnapshot
There are also many backup tools which use rsync internally but provide additional features to help use it as a backup system. One popular one is rsnapshot. One difference is that it creates numbered backups instead of timestamped ones. It also keeps only a finite number of them. One of the things it automates is retention, so you can set policies like "keep the last 30 daily backups and the last 12 monthly backups". For most backups some limit on retention is probably desired, but you might want to keep all of your game saves (or maybe you don't).
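You might want a retention limit on the ZFS snapshots without switching to rsnapshot. A crude "keep the newest N" policy can be sketched with zfs list and zfs destroy. This is not part of the original setup. old_snapshots and prune_saves are hypothetical helpers, and the count of 30 is arbitrary. It is worth running it first with the zfs destroy line commented out to see what it would delete.

```shell
#!/bin/sh
# Read snapshot names on stdin (oldest first); print all but the newest $1.
old_snapshots() {
    awk -v keep="$1" '{ names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Hypothetical pruning step for the dataset used above.
prune_saves() {
    # -H: no header; -s creation: oldest first; -d 1: only this dataset's snapshots.
    zfs list -H -t snapshot -o name -s creation -d 1 tank/factorio |
        grep '@save-' |
        old_snapshots 30 |
        while IFS= read -r snap; do
            echo "destroying $snap"
            zfs destroy "$snap"
        done
}
```

Unlike rsnapshot's tiered daily/monthly policies, this keeps a single flat window of recent saves.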
[1] Note this also means that any save named like save.tmp will be overwritten whenever save is saved. So don't give your Factorio saves names that end in ".tmp".
[2] Timestamps are only unique if you're sure you won't have two events in the same second. In this case, the events are saves that take more than a second to complete and are scheduled every several minutes anyway. It's also unlikely you actually want backups at sub-second granularity. Still, when using timestamps as unique identifiers, do take a second to consider whether that's true for your use case.