A Weird Imagination

Experimenting with ZFS

The problem

For my recent posts on ZFS, I wanted to quickly try out a bunch of variants of my proposed operations without worrying about accidentally modifying my real ZFS filesystems. Specifically, I wanted to know which ways of copying files would result in more efficiently reusing blocks from existing snapshots where possible.

The solution

WARNING: The instructions below will modify the ZFS pool tank, which is the default name used in many ZFS examples, and therefore may be a real ZFS pool on your computer.

I strongly recommend doing all of this inside a VM to be sure you are not affecting any real filesystems. I used a VirtualBox VM that I installed Debian on and used the guest additions to share a directory between the VM and my actual machine.

First create a 1 GiB virtual (i.e. in a file instead of a physical device) ZFS pool to run tests on:

fallocate -l 1G /root/tank
zpool create tank /root/tank
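
Given the warning above, it may be worth checking that no pool named tank is already imported before creating one. The following is a minimal sketch and not part of the zfs-test scripts: the function names pool_exists and create_test_pool are hypothetical, and it assumes zpool list exits with a non-zero status when the named pool does not exist.

```shell
#!/bin/sh
# Hypothetical helpers (not from zfs-test): refuse to create the test pool
# if a pool with the same name is already imported.

# Succeeds if a pool named $1 is currently imported.
pool_exists() {
    zpool list -H -o name "$1" >/dev/null 2>&1
}

# Create pool $1 backed by the file $2, unless the pool already exists.
create_test_pool() {
    if pool_exists "$1"; then
        echo "refusing to touch existing pool: $1" >&2
        return 1
    fi
    zpool create "$1" "$2"
}

# Usage (as root, inside the VM):
#   create_test_pool tank /root/tank
```

This only guards against the name collision; it does not replace running the experiments inside a VM.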

Then perform various filesystem operations and inspect the output of zfs list -o space to see whether they use more (or less) space than you expect. To stay consistent and make it easier to test multiple variations, I wrote some scripts:

git clone https://git.aweirdimagination.net/perelman/zfs-test.git
cd zfs-test/bin
# dump logs from create-and-measure and copy-all-and-measure into ../logs/
./measure-all
# read ../logs/ and print space used as Markdown table
./logs-to-table --links
| Create script | orig | rsync-ahvx | rsync-ahvx-sparse | rsync-inplace | rsync-inplace-no-whole-file | rsync-no-whole-file | zfs-diff-move-then-rsync |
|---|---|---|---|---|---|---|---|
| empty | 24K | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ |
| random-1M-file | 1.03M | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ |
| zeros-1M-file | 24K | 1.03M❌ | 24K✅ | 1.03M❌ | 1.03M❌ | 1.03M❌ | 1.03M❌ |
| move-file | 1.04M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.04M✅ |
| edit-part-of-file | 1.16M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.17M✅ | 2.04M❌ | 1.17M✅ |

The details

Creating the VM

To completely isolate my system, I did all of my ZFS operations inside a VM. This was partly because I have had the ZFS drivers crash on my main machine, forcing a reboot, and I didn't want to risk that when I knew I would be intentionally doing weird things. I used VirtualBox and made a minimal Linux install using Debian netinst. As everything I wanted to do was local, once I installed the packages I needed[1], I disconnected the network interface and used the guest additions to share a directory between the VM and my actual machine.

Making a virtual disk

fallocate with the -l option creates a file of the specified size whose contents read as zeros, which we can use as a virtual hard drive. On modern filesystems, it will do so without actually writing that many zeros.

Notably, the shared folder provided by the version of the VirtualBox guest additions I used does not count as a "modern filesystem" for this purpose:

$ fallocate -l 1G tank
fallocate: fallocate failed: Operation not supported

As a workaround, I just ran the fallocate command on the host instead. Alternatively, you can use the -x (--posix) option, but, as promised by the documentation, it's slow: it took 13 seconds to create a 1 GiB file on my machine.
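
The fallback can be scripted so that the fast path is tried first and the slow POSIX mode is used only when needed. This is a minimal sketch, not part of zfs-test: the function name make_backing_file is hypothetical, and it assumes a util-linux fallocate that supports -x.

```shell
#!/bin/sh
# Hypothetical helper: create a file of a given size to back a test pool.
# Tries the fast path first, then falls back to the slower POSIX mode
# (-x/--posix) for filesystems that cannot preallocate space.
make_backing_file() {
    path=$1
    size=$2
    fallocate -l "$size" "$path" 2>/dev/null || fallocate -x -l "$size" "$path"
}

# Usage:
#   make_backing_file /root/tank 1G
#   stat -c '%s bytes, %b blocks allocated' /root/tank
```

The stat call in the usage comment assumes GNU stat; it shows the file's apparent size next to the number of blocks actually allocated to it.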

Experiment framework

I wanted to test two things: creating files and then modifying or moving them around with snapshots taken in between, and copying the result onto another filesystem.

For example, to use the move-file creation script (which creates a file, makes a snapshot, and moves the file) with the rsync-ahvx copy script (which just does a simple rsync to copy), you would run the following:

$ cd zfs-test/bin
$ ./create-and-measure create-test-setup/move-file
$ ./copy-all-and-measure copy-snapshot/rsync-ahvx
tank/target   827M  2.04M     1.02M   1.03M             0B         0B
tank/test     827M  1.04M       14K   1.03M             0B         0B

And you would observe at the end of the output that tank/test has only about 1M used while tank/target has about 2M used because it has two copies of the moved file.
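
The two output lines above are shown without their header row. Assuming the usual column order for zfs list -o space (NAME, AVAIL, USED, USEDSNAP, USEDDS, USEDREFRESERV, USEDCHILD), a small helper can label the columns that matter here. The function name label_space_columns is hypothetical and not part of zfs-test.

```shell
#!/bin/sh
# Label the columns of header-less `zfs list -o space` output read from stdin.
# Assumed column order: NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD.
label_space_columns() {
    awk '{ printf "%s used=%s usedsnap=%s usedds=%s\n", $1, $3, $4, $5 }'
}

# Usage with a line copied from the output above:
#   printf '%s\n' 'tank/target   827M  2.04M     1.02M   1.03M   0B   0B' \
#       | label_space_columns
```

For the tank/target line this prints `tank/target used=2.04M usedsnap=1.02M usedds=1.03M`, which is consistent with one copy of the file being held by the snapshot and another by the live dataset.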


One surprise I ran into was rsync not copying files that I knew I had changed. It turns out that, by default, rsync compares modification times only to the whole second, so a same-size file modified within the same second looks unchanged. And, unsurprisingly, my scripts to construct synthetic filesystems using touch, mv, and fallocate ran in milliseconds. After some confusion looking at very similar modification times in the output of stat and trying to add some very short sleep calls, I found the -@-1/--modify-window=-1 option, which makes rsync compare timestamps at full precision.
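
To check whether a file is likely to hit this, you can compare sizes and whole-second modification times directly. This is a minimal sketch assuming GNU stat; the function names are hypothetical, and it only approximates rsync's default check rather than reproducing it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two files have the same size and their
# modification times fall in the same whole second, which is the case where
# a second-granularity comparison cannot tell them apart.
looks_unchanged_to_rsync() {
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

# Print the full modification time, including the fractional second.
full_mtime() {
    stat -c '%y' "$1"
}

# Usage:
#   looks_unchanged_to_rsync src/file dst/file && echo "would be skipped"
#   full_mtime src/file; full_mtime dst/file
#   rsync -a --modify-window=-1 src/ dst/
```

If the helper succeeds while full_mtime shows different fractional seconds, the file is one that needs --modify-window=-1 to be picked up.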

Making the table

The measure-all script runs all of the experiments and dumps the logs to files for later viewing/analysis. To avoid the unnecessary complication of parsing the logs, it also records in separate files the "USED" value for the datasets, both in exact and human-readable form:

zfs list -o used -H tank/target > used_human_readable
zfs list -o used -p -H tank/target > used

zfs list provides the -H option for "scripting mode" that omits headers and uses tabs instead of spaces if multiple columns are requested. The -p option is for "parsable (exact) values". And the -o used tells it to just give the single column we want.
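
When more than one column is requested, the tab-separated exact output is easy to consume in a shell loop. The sketch below is an illustration, not code from zfs-test: the function name used_in_kib is hypothetical, and it expects input shaped like the output of zfs list -H -p -o name,used.

```shell
#!/bin/sh
# Read tab-separated "name<TAB>used-in-bytes" lines from stdin, as produced
# by `zfs list -H -p -o name,used`, and print each dataset's usage in KiB.
used_in_kib() {
    while IFS="$(printf '\t')" read -r name used; do
        echo "$name $((used / 1024))K"
    done
}

# Usage (the byte count here is a made-up sample value):
#   printf 'tank/target\t2139136\n' | used_in_kib
```

Because -p yields plain integers, the arithmetic needs no unit parsing, which is the main reason to prefer it over the human-readable form in scripts.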

Then logs-to-table reads those files to build the Markdown table, using the human readable files for the cell text and the exact files for deciding whether to label the cell as ✅ or ❌.
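
The exact rule logs-to-table uses to choose between ✅ and ❌ is not spelled out here. As a rough illustration only, the sketch below assumes a copy passes if it uses at most 5% more space than the original; both the threshold and the function name label_cell are assumptions, not taken from the script.

```shell
#!/bin/sh
# Hypothetical rule (NOT taken from logs-to-table): a copy passes if it
# uses at most 5% more space than the original dataset.
# $1 = exact bytes used by the original, $2 = exact bytes used by the copy.
label_cell() {
    if [ $(( $2 * 100 )) -le $(( $1 * 105 )) ]; then
        echo "✅"
    else
        echo "❌"
    fi
}

# Usage, reading the exact values recorded for two datasets:
#   label_cell "$(cat orig/used)" "$(cat copy/used)"
```

Comparing the exact values avoids rounding artifacts such as 1.16M versus 1.17M, which differ in the human-readable form even though the copy is not meaningfully larger.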


After all this, what did I actually learn?

One area I didn't cover in these experiments is exactly how these interact with hardlinks and rsync's -H/--hard-links option to preserve them.

  1. Okay, I may have reconnected to the internet to download more packages a few times. 

