A Weird Imagination

Recreate moves from zfs diff

Posted in

The problem#

When doing an incremental backup, any moved file on the source filesystem usually results in recopying the file to the destination filesystem. For a large file this can both be slow and possibly waste space if the destination keeps around deleted files (e.g. ZFS holding on to old snapshots). If both sides are ZFS, then you can get zfs send/recv to handle all of the details efficiently. But if only the source filesystem is ZFS or the ZFS datasets are not at the same granularity on both sides, that doesn't apply.

zfs diff gives the information about file moves from a snapshot, but its output format is a little awkward for scripting.

The solution#

Download the script I wrote, zfs-diff-move.sh and run it like

zfs-diff-move.sh /path/ /tank/dataset/ tank/dataset@base @new

The following is an abbreviated version of it:

#!/bin/bash
zfs diff -H "$3" "$4" | grep '^R' | while read -r line
do
  get_path() {
    path="$(echo -e "$(echo "$line" | cut -d$'\t' "-f$3")")"
    echo "${path/#$2/$1}"
  }

  from="$(get_path "$1" "$2" 2)"
  to="$(get_path "$1" "$2" 3)"
  mkdir -vp -- "$(dirname "$to")"
  mv -vn -- "$from" "$to" || echo "Unable to move $from"
done

The details#

Read more…

Generating specialized word lists

Posted in

The problem#

I've been playing Codenames online a lot lately (using my fork of codenames.plus), and a friend suggested it might be fun to have themed word lists. Specifically, they suggested Star Trek as a theme as it's a fandom that's fairly widely known. They left it up to me to figure out what should be in a Star Trek themed word list.

The solution#

If you just want to play Codenames with the list, go to my Codenames web app and select one or both of the Star Trek card packs. If you just want the word lists, you can download the Star Trek: The Next Generation words and the Star Trek: Deep Space 9 words.

To generate a word list yourself (I used this source for the Star Trek scripts), you will need a common words list like en_50k.txt which I mentioned in my previous post on anagram games, and then pipe the corpus through the following script (which you will likely have to modify for the idiosyncrasies of your data):

#!/bin/bash
set -euo pipefail

NUM_COMMON=2000 # Filter out the most common 2000 words
COMMON_WORDS="$(mktemp)"
<en_50k.txt head "-$NUM_COMMON" | cut -d' ' -f1 |\
    sort | tr '[:lower:]' '[:upper:]' >"$COMMON_WORDS"

# Select only dialogue lines (in Star Trek scripts)
grep -aP '^\t\t\t[^\t]' |\
    # Split words
    tr ' .,:()\[\]!?;"/\t[:cntrl:]' '[\n*]' |\
    sed 's/--/\n/' |\
    # Strip whitespace
    sed 's/^\s\+//' | sed 's/\s\+$//' |\
    grep -av '^\s*$' |\
    # Strip quotes
    sed "s/^'//" | sed "s/'$//" |\
    # Filter out numbers
    grep -av '^[[:digit:]]*$' |\
    tr '[:lower:]' '[:upper:]' |\
    # Fix for contractions not being in wordlist
    sed "s/'\(S\|RE\|VE\|LL\|M\|D\)$//" |\
    grep -av "'T$" |\
    # Remove some more non-words
    grep -avF '-' |\
    grep -avF '&' |\
    # Count
    sort | uniq -c |\
    # Only keep words with >25 occurrences
    awk '{ if ($1 > 25) { print } }' |\
    # Remove common words
    join -v2 -22 -o 2.1,2.2 "$COMMON_WORDS" - |\
    # Sort most common words first
    sort -rn

rm "$COMMON_WORDS"

The output of the script will require some manual effort to decide which words really belong in the final list, but it's a good start.

The details#

Read more…

Nvidia GLX not working

The problem#

I recently replaced my old Nvidia graphics card with a newer one. Upon booting up, I ran glxgears to test that 3D graphics were working properly and got an error like

X Error of failed request:  BadWindow (invalid Window parameter)
 Major opcode of failed request:  155 (NV-GLX)
 Minor opcode of failed request:  4 ()
 Resource id in failed request:  0x1200003
 Serial number of failed request:  34
 Current serial number in output stream:  34

The solution#

Either delete /etc/X11/xorg.conf or edit it and remove (or comment out) the "Files" section; that is, the lines

Section "Files"
    ...
EndSection

Read more…

Volume via shell

Posted in

The problem#

Sometimes a GUI is not the best way to control a computer's volume. Usually if you care about the volume of your computer, you're probably nearby but perhaps would rather be using a remote or other shortcut way of changing the volume. The specific use case that prompted this blog post was binding the volume up and volume down keys on my keyboard to the global volume control (as opposed to separately binding them in each application).

Read more…

Shadowrun's text compression

The problem#

Several years ago, I was in a ROM hacking IRC room where another regular Alchemic was reverse engineering the text system of the SNES game Shadowrun. He figured it out and wrote a python script to decompress the text but had some questions about why it was designed the way it was. So we're going to walk through figuring out how the code works, with some help from his notes, and try to understand the design.

If you don't want spoilers and would rather try to reverse engineer it yourself, just read up to the end of the Trace format section and see how much you can figure out on your own.

Read more…

Read-only filesystem errors

Linux has a tendency to give very unhelpful error messages when it is unable to create a file. I previously blogged about a few different reasons Linux might report a disk is full, but all of the reasons included the disk actually not having space for more files. Yet another reason to get similar errors is if the partition is mounted readonly (ro):

$ mount | grep -F /usr
/dev/sdc2 on /usr type ext4 (ro,nodev,noatime,data=ordered)

mount without any options lists all of the mounted partitions along with their mount options.

Many programs will show a helpful error message:

$ touch test
touch: cannot touch ‘test’: Read-only file system

But some others won't:

rtorrent: Could not lock session directory: "./session/", held by "<error>".

That error is normally caused by ./session/rtorrent.lock not being writable due to being held by another process, but in this case it's not writable due to the filesystem being readonly. rtorrent doesn't distinguish the two.

For that reason, when running into weird behavior from a program on Linux, it's a good idea to check that the directories the program might try to write to are actually writable.

Listing files into a file

Posted in

The problem#

$ ls > file

doesn't do what you expect:

$ touch foo
$ touch bar
$ ls > filelist
$ cat filelist
bar
filelist
foo

You probably didn't expect, or want, filelist to be listed in filelist.

The solution#

$ filelist=$(ls); echo "$filelist" >filelist

Read more…

Pi in shell

Posted in

Calculating π the hard way#

In honor of Pi Day, I was going to try to write a script that computed π in shell, but given the lack of floating point support, I decided it would be too messy. If you want to see hard to follow code to generate π, I highly recommend the IOCCC entry westley.c from 1998, the majority of which is an ASCII art circle which calculates its own area and radius in order to estimate π. The hint file suggests looking at the output of

$ cc -E westley.c

The 2012 entry, endoh2 is also a pretty amazing π calculator.

Getting π#

Instead, I will just generate π the shell way: using another program.

$ python -c 'import math; print(math.pi)'
3.14159265359

Read more…

Monitor all the things

CPU and memory#

On Linux, the basic way to monitor load is to use top. The only thing top really has going for it is that it is almost certainly available on any system you will ever use. Luckily, there's a better way: htop. htop supports colors and mouse clicks and lists the available key commands at the bottom of the terminal. It also can be customized to your liking. You can start by putting my htoprc in your ~/.config/htop/ directory:

$ mkdir -p ~/.config/htop/
$ cd ~/.config/htop/
$ wget https://gist.githubusercontent.com/dperelman/1e051f5705685cb41f31/raw/3ab9cf17b166120a805d5f76a71ce82452f553b4/htoprc

Or just explore the options yourself.

Hit F1 (or click Help in the bottom-left) to get an explanation of the colors used in the CPU and memory bars and a guide to keystrokes not listed at the bottom.

In my usage, I find insufficient memory is more often the problem than CPU, so I usually leave htop sorted by the MEM% column.

Other resources#

While CPU and memory are the easiest to monitor resources, they are not the only ones. Linux offers a wide variety of system monitors, depending on what resource you want to monitor and what format you want to view it in. This post focuses on real-time viewing with human-friendly displays but most of these have options or variants that support logging historical data in a more machine-friendly format as well.

Read more…

Changing Pelican URL scheme

Posted in

The problem#

I changed the URI scheme of this blog recently from /posts/YYYY/MM/slug/ to /YYYY/MM/DD/slug/. The latter looks better and makes the actual day of the post more visible.

But I already had posts using the old scheme and cool URIs don't change. Luckily, someone wrote a Pelican plugin called pelican-alias which allows articles to be tagged with additional URIs to redirect to their canonical location. All I had to do was add an Alias: /posts/2015/02/... line to the top of each of the posts I had already written and the plugin would take care of the rest.

Automating the aliasing#

The non-trivial part of automating this is that the URIs include the article's slug, which may have been generated by Pelican from the title, so Pelican has to be involved in generating the correct redirects.

There are two ways I could have automated this process:

  1. Modify the plugin to add a redirect from the old scheme to the new scheme for every article. Unless somehow controlled, this would result in creating redirects for new articles which do not need them.
  2. Write a one-off script to get the slugs out of Pelican and write the Alias: lines into the blog posts.

I took the latter approach because it was simpler and involved no new code to maintain.

Read more…