The problem

The new year is a traditional time for adopting new organizational schemes, among other oft-broken promises to oneself of improved habits. In that vein, I recently adopted a new system for managing my TODO list.

Managing a household involves a lot of infrequent tasks that are easy to forget, like checking filters on various appliances every few months and similar invisible maintenance tasks. I had been managing such tasks using recurring Google Calendar events with email reminders, but it was getting unwieldy for multiple reasons. It didn't provide a good record of whether and when tasks were completed (which matters for tasks that should be done some number of weeks or months after they were last completed, not after I was last reminded of them). It also didn't provide a good way to share the TODOs with other members of the household who are also responsible for some of those tasks. And it cluttered up my calendar with items that didn't really have a meaningful assignment to a particular day or time.

The solution

Task management systems are very personal: I will describe what I came up with, which I hope I will continue to find useful, but what works for you may be very different.

I set up a TODO list for my household (plus a separate one just for myself) using Sleek, which uses the todo.txt format (see todotxt.org for more info and other software). The directory containing the TODO file is shared with the rest of the household using Syncthing. For backups, the directory is a ZFS dataset, so it is automatically snapshotted regularly and included in my backups. If you wanted, you could also apply my copy-on-save logic to snapshot every change, but that's likely overkill.

Example tasks

While the simplicity of the todo.txt format means it's easy to edit by hand or use any tool including ones you write yourself, the Sleek GUI handles the syntax for you so it is accessible to non-technical users as well.

Sleek supports "threshold" dates, before which tasks are hidden from view by default, and recurring tasks, which can be "strict" (based on due date, prefixed with +) or not (based on completion date). Together these allow specifying tasks like "check furnace filter 2-3 months since the last time it was checked":

rec:3m t:2025-02-15 due:2025-03-15 check furnace @filter

as well as "pay the electric bill between the 20th and the end of each month":

rec:+1m t:2024-12-20 due:2024-12-30 pay electric @bill

The thresholds allow for keeping the noise down on the list by hiding tasks that cannot be done yet (can't pay a bill that hasn't arrived yet) or don't make sense to do so soon after they were last done.
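To make the threshold behavior concrete, here is a minimal sketch of the same filtering done by hand on a todo.txt file. Sleek does this itself; the function name `visible_tasks` and the choice to skip completed tasks are assumptions made for this example, not part of Sleek or the todo.txt tools.

```shell
#!/bin/sh
# visible_tasks FILE [TODAY]
# Print tasks whose threshold (t:YYYY-MM-DD) date has arrived.
# Tasks without a t: tag are always shown; completed tasks ("x ...") are skipped.
# TODAY defaults to the current date and exists mainly to make testing easy.
visible_tasks() {
  file="$1"
  today="${2:-$(date +%Y-%m-%d)}"
  awk -v today="$today" '
    NF == 0 { next }   # ignore blank lines
    /^x /   { next }   # ignore completed tasks
    {
      threshold = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^t:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)
          threshold = substr($i, 3)
      # ISO dates sort correctly as plain strings; appending "" forces
      # a string comparison.
      if (threshold == "" || (threshold "") <= (today "")) print
    }
  ' "$file"
}
```

For example, `visible_tasks todo.txt 2025-01-01` would hide the furnace task above until 2025-02-15, since its threshold is still in the future on that date.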

The details

The problem

I was running a Factorio multiplayer server and was being paranoid about making sure I didn't lose any save data. But I also didn't want to put the saves directory on my ZFS file system as it's on a hard drive, not an SSD, and saves taking too long can cause lag for the players (although with non-blocking saving this is much less of an issue).

The solution

The following script watches the saves/ directory for any new files being written, immediately copies them to the ZFS dataset tank/factorio mounted at /tank/factorio/, and creates a snapshot named with the current date and time. The result is one snapshot, containing the save data, for every time the game saved.

#!/bin/sh
while true
do
  inotifywait -r saves/ -e close_write
  sleep 0.1s  # write is to *.tmp.zip, wait for rename
  rsync -avhx saves/ /tank/factorio/
  now="$(date +%Y-%m-%d_%H-%M-%S)"
  zfs snapshot tank/factorio@save-"$now"
done
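Getting a save back out of one of those snapshots could look roughly like the following. This is a sketch under the same assumptions as the script (dataset tank/factorio mounted at /tank/factorio/, snapshots named save-DATE_TIME); the function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: the dataset and mountpoint match the script above; the
# helper names below are hypothetical.
DATASET=tank/factorio
MOUNTPOINT=/tank/factorio

# List the per-save snapshots of the dataset, oldest first.
list_save_snapshots() {
  zfs list -H -t snapshot -d 1 -o name -s creation "$DATASET"
}

# Print where a file from a given snapshot can be read via the .zfs directory.
# Usage: snapshot_path save-2025-01-01_12-00-00 mysave.zip
snapshot_path() {
  printf '%s/.zfs/snapshot/%s/%s\n' "$MOUNTPOINT" "$1" "$2"
}

# Copy one save out of a snapshot back into saves/ under a new name,
# so the current save with the same name is not overwritten.
restore_save() {
  cp -- "$(snapshot_path "$1" "$2")" "saves/restored-$2"
}
```

The snapshot name passed to `snapshot_path` is the part after the `@` in the output of `list_save_snapshots`.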

The details

The problem

I had recently done an apt upgrade that included upgrading ZFS and noticed zpool status showed a weird "(non-allocating)" message, which seemed concerning:

$ zpool status
  pool: tank
 state: ONLINE
config:

    NAME         STATE     READ WRITE CKSUM
    tank         ONLINE       0     0     0
      mirror-0   ONLINE       0     0     0
        ata-***  ONLINE       0     0     0  (non-allocating)
        ata-***  ONLINE       0     0     0  (non-allocating)

errors: No known data errors

The solution

This forum thread suggested the message may be due to a version mismatch between the ZFS tools and the kernel module. I confirmed there was a mismatch:

$ zpool --version
zfs-2.2.3-2
zfs-kmod-2.1.14-1
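If you want to check for this mismatch in a script, a minimal sketch follows. It assumes `zpool --version` prints the two-line `zfs-...` / `zfs-kmod-...` format shown above and compares only the upstream version (the part before the packaging suffix); the function name `zfs_versions_match` is made up for this example.

```shell
#!/bin/sh
# Succeeds (exit 0) if the userland tools and the kernel module report the
# same upstream ZFS version. Prints both versions either way.
zfs_versions_match() {
  out="$(zpool --version)" || return 2
  # "zfs-2.2.3-2" -> "2.2.3"; the zfs-kmod line does not match this pattern
  # because "k" is not a digit.
  tools="$(printf '%s\n' "$out" | sed -n 's/^zfs-\([0-9][^-]*\).*/\1/p')"
  # "zfs-kmod-2.1.14-1" -> "2.1.14"
  kmod="$(printf '%s\n' "$out" | sed -n 's/^zfs-kmod-\([0-9][^-]*\).*/\1/p')"
  echo "tools=$tools kmod=$kmod"
  [ -n "$tools" ] && [ "$tools" = "$kmod" ]
}
```

With the output above, this prints `tools=2.2.3 kmod=2.1.14` and returns a non-zero status.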

The easy way to load the new version of a kernel module after an update is to reboot the computer. But if you don't want to do that, here's the general outline of the commands I ran to unload and reload ZFS (run as root):

# Stop using ZFS
$ zfs umount -a
$ zpool export tank
$ service zfs-zed stop
# Remove modules
$ rmmod zfs
$ rmmod spl
# will show error: rmmod: ERROR: Module spl is in use by: ...
# repeatedly rmmod dependencies until spl is removed.

# Reload ZFS
$ modprobe zfs
$ service zfs-zed start
$ zpool import tank
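For the "repeatedly rmmod dependencies" step, it can help to see which modules are still holding spl. The sketch below reads the "Used by" column of `lsmod`; the function name `module_users` is made up for this example, and it only lists modules rather than removing anything.

```shell
#!/bin/sh
# Print, one per line, the modules that lsmod reports as still using MODULE
# (default: spl). Prints nothing if the module is unused or not loaded.
module_users() {
  # lsmod columns: name, size, use count, comma-separated list of users
  lsmod | awk -v mod="${1:-spl}" '$1 == mod && NF >= 4 { print $4 }' | tr ',' '\n'
}
```

Each name printed is a candidate for the next `rmmod`. Some of those modules may themselves still be in use, so expect to rerun it until it prints nothing.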

The details

The problem

For my recent posts on ZFS, I wanted to quickly try out a bunch of variants of my proposed operations without worrying about accidentally modifying my real ZFS filesystems. Specifically, I wanted to know which ways of copying files would result in more efficiently reusing blocks from existing snapshots where possible.

The solution

WARNING: The instructions below will modify the ZFS pool tank, which is the default name used in many ZFS examples, and therefore may be a real ZFS pool on your computer.

I strongly recommend doing all of this inside a VM to be sure you are not affecting any real filesystems. I used a VirtualBox VM that I installed Debian on and used the guest additions to share a directory between the VM and my actual machine.

First create a 1 GiB virtual (i.e. in a file instead of a physical device) ZFS pool to run tests on:

fallocate -l 1G /root/tank
zpool create tank /root/tank

Then perform various filesystem operations and inspect the result of zfs list -o space to determine if they were using more (or less) space than you expect. In order to make sure I was being consistent and make it easier to test out multiple variations, I wrote some scripts:

git clone https://git.aweirdimagination.net/perelman/zfs-test.git
cd zfs-test/bin
# dump logs from create-/copy-all- and-measure into ../logs/
./measure-all
# read ../logs/ and print space used as Markdown table
./logs-to-table --links

| Create script | orig | rsync-ahvx | rsync-ahvx-sparse | rsync-inplace | rsync-inplace-no-whole-file | rsync-no-whole-file | zfs-diff-move-then-rsync |
| --- | --- | --- | --- | --- | --- | --- | --- |
| empty | 24K | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ |
| random-1M-file | 1.03M | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ |
| zeros-1M-file | 24K | 1.03M❌ | 24K✅ | 1.03M❌ | 1.03M❌ | 1.03M❌ | 1.03M❌ |
| move-file | 1.04M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.04M✅ |
| edit-part-of-file | 1.16M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.17M✅ | 2.04M❌ | 1.17M✅ |
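To measure a single operation by hand, something like the following could work. This is not taken from the zfs-test scripts; it is a separate sketch that assumes a ZFS version with `zpool sync` and uses the `used` property as a rough stand-in for the more detailed `zfs list -o space` output. The function names are made up for this example.

```shell
#!/bin/sh
# Print the space used by DATASET as an exact byte count (-p), no header (-H).
used_bytes() {
  zfs get -Hp -o value used "$1"
}

# Run a command and print by how many bytes DATASET's usage changed.
# Usage: measure tank cp /tmp/some-file /tank/
measure() {
  dataset="$1"; shift
  pool="${dataset%%/*}"       # pool name is everything before the first "/"
  zpool sync "$pool"          # flush pending writes so the accounting is current
  before="$(used_bytes "$dataset")"
  "$@"
  zpool sync "$pool"
  after="$(used_bytes "$dataset")"
  echo "$((after - before))"
}
```

As with everything else in this section, only run it against the throwaway pool inside the VM.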

The details

The problem

ZFS datasets are a powerful way to organize your filesystems. At first glance, datasets look a lot like filesystems, so you may default to just one or at most a handful per pool. But unlike with traditional filesystems, where you have to decide how much of your disk space each one gets when it's created, ZFS datasets share the space available to the entire pool. Since datasets are the granularity at which ZFS operations like snapshots and zfs send/recv work, having more datasets gives you better control over applying different backup policies to different subsets of your data. ZFS scales just fine to hundreds or thousands of datasets, so you don't really have to worry about creating too many.

But if you're me (well, not just me) and you realize this after you already have months of snapshots of a few terabytes of data, how do you reorganize your ZFS pool into more datasets without either losing the snapshot history or ending up wasting a lot of disk space on redundant copies of data?

The solution

Before doing anything with real data, make backups and confirm you can restore from them.

I do not have a one-size-fits-all solution here; instead I'll outline the general process. I recommend reviewing at each step to make sure things look correct, and being ready to zfs rollback and retry if you make a mistake or notice a way you could have done something in a more space-efficient manner.

1. Create the new dataset hierarchy. I'll refer to the old dataset as tank/old and the new dataset root as tank/new.
2. Do an initial copy of the earliest snapshot you want to keep from the .zfs directory. If it's @first, then the copy command will be rsync -avhxPHS /tank/old/.zfs/snapshot/first/ /tank/new/.
3. Check your work and possibly delete or dedup files.
4. zfs snapshot -r tank/new@first
5. Do an incremental copy of the next snapshot. If it's @second, this may be as simple as rsync -avhxPHS@-1 --delete /tank/old/.zfs/snapshot/second/ /tank/new/, but that will waste space if you have moved files or modified small sections of large files.
6. Check your work, and make any necessary changes.
7. zfs snapshot -r tank/new@second
8. Repeat steps 5-7 for each snapshot you want to keep.
9. zfs rename tank/old tank/legacy && zfs rename tank/new tank/old
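The copy and snapshot steps above repeat once per snapshot, so it can help to generate the command list up front. The sketch below only prints the commands rather than running them, since the "check your work" steps in between are manual. It assumes each dataset is mounted at / plus its name, as in the examples above; the function name `plan_copy` is made up for this example.

```shell
#!/bin/sh
# plan_copy OLD NEW SNAPSHOT...
# Print (do not run) the rsync and zfs snapshot commands for copying each
# listed snapshot of OLD into NEW, oldest first.
plan_copy() {
  old="$1"
  new="$2"
  shift 2
  first=1
  for snap in "$@"; do
    if [ "$first" -eq 1 ]; then
      # Initial full copy of the earliest snapshot.
      echo "rsync -avhxPHS /$old/.zfs/snapshot/$snap/ /$new/"
      first=0
    else
      # Incremental copy; --delete removes files absent from this snapshot.
      echo "rsync -avhxPHS@-1 --delete /$old/.zfs/snapshot/$snap/ /$new/"
    fi
    echo "zfs snapshot -r $new@$snap"
  done
}
```

For example, `plan_copy tank/old tank/new first second` prints the four commands from steps 2, 4, 5, and 7. Review and run them one at a time, checking the result after each rsync.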

A Weird Imagination

Tracking household tasks
Copy on save
Troubleshooting ZFS upgrade
Experimenting with ZFS
Splitting ZFS datasets
Recreate moves from zfs diff