A Weird Imagination

Displaying Factorio history

The problem#

Last week, I got all of the Factorio saves I had been keeping around into a single directory, ordered by the time they were created. But what should we do with that data? We could load arbitrary saves to see what our base looked like in the past, but loading saves individually isn't a great way to do that when there are a lot of them.

The solution#

Luckily, I'm not the only one who wants screenshots of their Factorio base, so there are existing mods for doing exactly that.

The FactorioMaps Timelapse mod takes a list of saves and generates a web page like this demo that lets you look around your base across time. As the documentation explains, this is not actually a mod you enable for your save, but a script that you place in your mods/ directory and that runs Factorio repeatedly to generate screenshots.

To set it up, use a non-Steam install of Factorio and put the saves you want under its saves directory (in a subdirectory named to_screenshot/ in this example). If you downloaded the mod as a ZIP file (e.g., by installing the mod from within Factorio), unzip it; alternatively, you can clone the GitHub repo or my fork, which includes a few minor improvements for displaying a lot of saves, especially ones less than an hour apart.

# get to Factorio install /mods/ directory
cd factorio/mods
# clone the git repo with the mod's internal name
git clone https://github.com/dperelman/FactorioMaps.git L0laapk3_FactorioMaps
cd L0laapk3_FactorioMaps
# install dependencies
python -m venv .venv
. .venv/bin/activate
pip install --upgrade -r requirements.txt
# add saves in /saves/to_screenshot/ to timelapse "mybase"
python auto.py --standalone mybase to_screenshot/*

Then you can find the timelapse in the directory script-output/FactorioMaps/mybase/ of your Factorio install; just open index.html in any web browser. Especially if you have a lot of saves, consider adding the --dayonly option so you don't spend twice as much time also generating screenshots of the night view.
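
For example (a sketch assuming the paths used above; xdg-open is a generic way to open the page in your default browser):

# from mods/L0laapk3_FactorioMaps, back up to the install root
cd ../..
xdg-open script-output/FactorioMaps/mybase/index.html
# or regenerate without the night view:
# python mods/L0laapk3_FactorioMaps/auto.py --standalone --dayonly mybase to_screenshot/*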

Generating screenshots as you play#

Note that if you just want to take screenshots automatically as you play, you can use the Screenshot Toolkit mod, which is what I use for single-player games. But due to the way Factorio multiplayer works, taking screenshots impacts every player, so in a multiplayer game it may be better to copy the saves as the game runs and generate the screenshots later.

The details#

Read more…

Ordering saves by date

The problem#

Last week, I shared a script which continuously backed up game saves whenever the game saved. The result is a series of directories containing snapshots of the game saves from every autosave. But to view this data, we really want a list of unique files, in order, marked with the time they were created.

The solution#

The following will create symbolic links to the unique files named after their modification date:

for i in /tank/factorio/.zfs/snapshot/*/*.zip
do
  ln -sf "$i" "$(stat --printf=%y "$i").zip"
done

or, if you want a custom date format (stat's %y output includes nanoseconds and a timezone offset, which makes for unwieldy filenames), you can replace the ln line with one that uses date:

  ln -sf "$i" "$(date -r "$i" +%Y-%m-%d_%H-%M-%S).zip"

Alternatively, the following will just list the unique files with their timestamps; find's %T+ timestamps are 30 characters wide, so uniq --check-chars=30 collapses entries whose timestamps match:

find /tank/factorio/.zfs/snapshot/ -type f -printf "%T+ %p\n" \
    | sort | uniq --check-chars=30

The details#

Read more…

Copy on save

The problem#

I was running a Factorio multiplayer server and was being paranoid about making sure I didn't lose any save data. But I also didn't want to put the saves directory on my ZFS file system as it's on a hard drive, not an SSD, and saves taking too long can cause lag for the players (although with non-blocking saving this is much less of an issue).

The solution#

The following script watches the saves/ directory for any new files being written and immediately copies them to the ZFS dataset tank/factorio mounted at /tank/factorio/, then creates a snapshot named with the current date and time. The result is a snapshot containing the save data from every time the game saved.

#!/bin/sh
while true
do
  inotifywait -r saves/ -e close_write
  sleep 0.1s  # write is to *.tmp.zip, wait for rename
  rsync -avhx saves/ /tank/factorio/
  now="$(date +%Y-%m-%d_%H-%M-%S)"
  zfs snapshot tank/factorio@save-"$now"
done
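
To check that the snapshots are accumulating as expected, you can list them along with their creation times:

zfs list -t snapshot -o name,creation tank/factorio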

The details#

Read more…

Finding broken {filename} links

The problem#

I've recently been writing more series of blog posts and otherwise linking between posts using {filename} links. I've also been adjusting the scheduling of my future planned blog posts, which involves changing their filenames, as my naming scheme includes the publication date in the filename. That means there are opportunities to forget to adjust the links to match and end up with broken links between posts.
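
For reference, these are Markdown reference-style links whose targets start with {filename}; a hypothetical example:

[an earlier post]: {filename}/2021-05-01-earlier-post.md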

Pelican does generate warnings like

WARNING  Unable to find './invalid.md', skipping url replacement.        log.py:89

but currently building my entire blog takes about a minute, so I generally only do that when publishing. So I wanted a more lightweight way to check just the intra-blog {filename} links.

The solution#

I wrote the script check_filename_links.sh:

#!/bin/bash

content="${1:-.}"

find "$content" -iname '*.md' -type f -print0 | 
  while IFS= read -r -d '' filename
  do
    grep '^\[.*]: {filename}' "$filename" |
      sed 's/^[^ ]* {filename}\([^\#]*\)\#\?.*$/\1/' |
      while read -r link
      do
        if [ "${link:0:1}" != "/" ]
        then
          linkedfile="$(dirname "$filename")/$link"
        else
          linkedfile="$content$link"
        fi
        if [ ! -f "$linkedfile" ]
        then
          echo "filename=$filename, link=$link,"\
               "file does not exist: $linkedfile"
        fi
      done
  done

Run it from your content/ directory or provide the path to the content/ directory as an argument and it will print out the broken links:

filename=./foo/bar.md, link=./invalid.md, file does not exist: ./foo/./invalid.md

The details#

Read more…

Shell script over characters

The problem#

I wanted to find a specific Unicode character I had used somewhere in a previous blog post. But since I didn't know exactly which character it was, I wasn't sure how to search for it.

The solution#

Instead, I wrote a script based on this post that simply lists all of the characters appearing in any file in a given directory:

cat * | sed 's/./&\n/g' | sort -u

Although the output is small, to further reduce the noise, this version strips out the English letters, numbers, and common symbols:

cat * \
    | tr -d 'a-zA-Z0-9!@#$%^&*()_+=`~,./?;:"[]{}<>|\\'"'-" \
    | sed 's/./&\n/g' | sort -u

The details#

Read more…

Client/server over named pipes

The problem#

Firefox Marionette only allows a single client to connect at a time, so I'd like to have a program in charge of holding that connection that can communicate with the other parts of the system that know what I actually want Firefox to do. While a common way of handling this is to run an HTTP server, that seems pretty heavyweight, and it would allow access to any user on the machine.

This can be generalized to any case where we want to control one program from another on the same computer. Some reasons we might not be able to simply perform the action from the first program are that the second program has different access permissions or is holding onto some state like, as mentioned, an open socket.

The solution#

The two programs can communicate over a named pipe, also known as a FIFO. The mkfifo command creates one on the filesystem:

$ mkfifo a_pipe
$ chown server_uid:clients_gid a_pipe
$ chmod 620 a_pipe

Then you can use tail -f a_pipe to watch the pipe and echo something > a_pipe to write to it. (The mode 620 lets the server read and write the pipe while clients in the group can only write to it; tail -f also keeps reading as writers come and go, so the server doesn't stop at EOF every time a client disconnects.) To add a bit more structure, here's a very simple server and client in Python where the client sends one command per line as JSON and the server processes the commands one at a time:

# server.py
import fileinput
import json

def process_cmd(cmd, *args):
    print(f"In process_cmd({cmd})...")

for line in fileinput.input():
    try:
        process_cmd(*json.loads(line))
    except Exception as ex:
        print("Command failed:")
        print(ex, flush=True)
# client.py
import json
import sys

print(json.dumps(sys.argv[1:]))
# Run the server.
$ tail -f a_pipe | python server.py
# Send some commands using the client.
$ python client.py cmd foo bar > a_pipe
$ python client.py another_cmd baz > a_pipe

Then the server will print out

In process_cmd(cmd)...
In process_cmd(another_cmd)...

The details#

Read more…

Experimenting with ZFS

The problem#

For my recent posts on ZFS, I wanted to quickly try out a bunch of variants of my proposed operations without worrying about accidentally modifying my real ZFS filesystems. Specifically, I wanted to know which ways of copying files would result in more efficiently reusing blocks from existing snapshots where possible.

The solution#

WARNING: The instructions below will modify the ZFS pool tank, which is the default name used in many ZFS examples, and therefore may be a real ZFS pool on your computer.

I strongly recommend doing all of this inside a VM to be sure you are not affecting any real filesystems. I used a VirtualBox VM that I installed Debian on and used the guest additions to share a directory between the VM and my actual machine.

First, create a 1 GiB virtual ZFS pool (i.e., one backed by a file instead of a physical device) to run tests on:

fallocate -l 1G /root/tank
zpool create tank /root/tank
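
A single manual experiment might then look like this (a minimal sketch; the dataset and file names are arbitrary):

zfs create tank/test
dd if=/dev/urandom of=/tank/test/file bs=1M count=1
zfs snapshot tank/test@base
zfs list -o space tank/test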

Performing various filesystem operations like this and inspecting the output of zfs list -o space shows whether they used more (or less) space than you expected. To make sure I was being consistent, and to make it easier to test out multiple variations, I wrote some scripts:

git clone https://git.aweirdimagination.net/perelman/zfs-test.git
cd zfs-test/bin
# dump logs from create-*/copy-all-and-measure into ../logs/
./measure-all
# read ../logs/ and print space used as Markdown table
./logs-to-table --links
| Create script | orig | rsync-ahvx | rsync-ahvx-sparse | rsync-inplace | rsync-inplace-no-whole-file | rsync-no-whole-file | zfs-diff-move-then-rsync |
|---|---|---|---|---|---|---|---|
| empty | 24K | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ | 24K✅ |
| random-1M-file | 1.03M | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ | 1.03M✅ |
| zeros-1M-file | 24K | 1.03M❌ | 24K✅ | 1.03M❌ | 1.03M❌ | 1.03M❌ | 1.03M❌ |
| move-file | 1.04M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.04M✅ |
| edit-part-of-file | 1.16M | 2.04M❌ | 2.04M❌ | 2.04M❌ | 1.17M✅ | 2.04M❌ | 1.17M✅ |

The details#

Read more…

Recreate moves from zfs diff

The problem#

When doing an incremental backup, a moved file on the source filesystem usually results in recopying the file to the destination filesystem. For a large file, this can be slow and may waste space if the destination keeps around deleted files (e.g., ZFS holding on to old snapshots). If both sides are ZFS, you can get zfs send/recv to handle all of the details efficiently. But if only the source filesystem is ZFS, or the ZFS datasets are not at the same granularity on both sides, that doesn't apply.

zfs diff gives the information about file moves from a snapshot, but its output format is a little awkward for scripting.
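
For example, a moved file appears in zfs diff -H output as a tab-separated line: the change type R followed by the old and new paths (hypothetical paths; special characters would be octal-escaped):

R	/tank/dataset/old/name.bin	/tank/dataset/new/name.bin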

The solution#

Download the script I wrote, zfs-diff-move.sh, and run it like:

zfs-diff-move.sh /path/ /tank/dataset/ tank/dataset@base @new

The following is an abbreviated version of it:

#!/bin/bash
zfs diff -H "$3" "$4" | grep '^R' | while read -r line
do
  # Pull out the requested tab-separated field, undo the octal
  # escaping zfs diff applies to special characters (echo -e),
  # and swap the source prefix $2 for the destination prefix $1.
  get_path() {
    path="$(echo -e "$(echo "$line" | cut -d$'\t' "-f$3")")"
    echo "${path/#$2/$1}"
  }

  from="$(get_path "$1" "$2" 2)"  # old path
  to="$(get_path "$1" "$2" 3)"    # new path
  mkdir -vp -- "$(dirname "$to")"
  mv -vn -- "$from" "$to" || echo "Unable to move $from"
done

The details#

Read more…

Status of long-running copy

The problem#

When running an incremental backup with rsync with the --progress flag, it often spends a lot of time outputting nothing as it scans through many unchanged files. If you think of it before starting the transfer, --info=progress2 or the name2/skip2 --info flags would give more detail, but once the transfer has been going for a while, you probably don't want to cancel it and restart it just so you can add those flags.

The solution#

The documentation and this StackExchange answer say you can send a SIGVTALRM signal to rsync version 3.2.0+ and it will output its current progress, but that wasn't working for me.
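
Sending that signal looks something like this (assuming a single running rsync):

kill -VTALRM "$(pidof rsync)"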

As a workaround, you can use strace to get a running log of which files rsync is looking at, which includes files it skips without actually opening:

strace --attach="$(pidof rsync)" --trace=openat

(If that's not showing anything, try removing the --trace=openat filter and seeing if there's other syscalls with paths to filter on.)

Alternatively, this StackExchange answer suggests a way to see the currently open files, including their sizes (this shows directories, but not unchanged files that are merely being inspected):

watch lsof -p"$(pidof rsync | tr ' ' ',')"

(The same should work for a recursive cp/mv/rm.)

Similarly, for getting the status of a transfer of a single large file, this answer attempts to read the files cp is reading/writing to give a running percentage of how much it has copied; a similar approach might work for rsync.
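
The idea behind that approach (a rough sketch; you would first need to identify the right file descriptor by listing /proc/$pid/fd/) is to compare the kernel's current offset into the file with the file's total size:

pid="$(pidof cp)"
fd=3  # hypothetical fd number of the file being copied
pos="$(awk '/^pos:/ {print $2}' "/proc/$pid/fdinfo/$fd")"
size="$(stat -c %s "$(readlink "/proc/$pid/fd/$fd")")"
echo "$((100 * pos / size))% copied"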

The details#

Read more…

Hardlink identical directory trees

The problem#

I will often make copies of important files onto multiple devices and then later make backups of all of those devices onto the same drive. At that point, I have multiple redundant copies of those files within my backup. Tools like rdfind, fdupes, and jdupes exist to deal with the general problem of efficiently searching a collection of files for duplicates, but none of them support comparing files only when their filenames and/or paths match, so they end up doing a lot of extra work in this case.

The solution#

Download the script I wrote, hardlink-dups-by-name.sh, and run it as follows:

hardlink-dups-by-name.sh a_backup/ another_backup/

Then all files like a_backup/some/path that are identical to the corresponding file another_backup/some/path will get hard-linked together so there will only be one copy of the data taking up space.
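
The core of the approach (a minimal sketch, not the actual script; it assumes GNU tools and the two parallel directory layouts shown above) is to walk one tree and hard-link each file to its identical counterpart in the other:

find a_backup/ -type f -print0 |
  while IFS= read -r -d '' f
  do
    other="another_backup/${f#a_backup/}"
    # hard-link only if the counterpart exists and is identical
    if [ -f "$other" ] && cmp -s -- "$f" "$other"
    then
      ln -f -- "$f" "$other"
    fi
  done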

The details#

Read more…