A Weird Imagination

Status of long-running copy

The problem

When running an incremental backup with rsync with the --progress flag, it often spends a lot of time outputting nothing as it scans through many unchanged files. If you think of it before starting the transfer, --info=progress2 or the name2/skip2 --info flags would give more detail, but once the transfer has been going for a while, you probably don't want to cancel and restart it just to add those flags.

The solution

The documentation and this StackExchange answer say you can send a SIGVTALRM signal to rsync version 3.2.0+ and it will output its current progress, but that wasn't working for me.
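
For reference, sending that signal looks roughly like the sketch below. It assumes rsync 3.2.0+ on Linux; `request_progress` is a hypothetical helper name, not something rsync provides.

```shell
# request_progress PID...: send SIGVTALRM to each given PID.
# For rsync 3.2.0+ this is documented to make it print its current progress.
request_progress() {
    # kill -s takes the signal name without the SIG prefix.
    kill -s VTALRM "$@"
}

# Usage (the PID list is deliberately left unquoted so it splits into
# one argument per rsync process):
#   request_progress $(pidof rsync)
```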

As a workaround, you can use strace to get a running log of which files rsync is looking at, which includes files it skips without actually opening:

strace --attach="$(pidof rsync)" --trace=openat

(If that's not showing anything, try removing the --trace=openat filter and seeing if there are other syscalls with paths to filter on.)

Alternatively, this StackExchange answer suggests a way to see the currently open files including their sizes (including directories but not unchanged files being inspected):

watch lsof -p"$(pidof rsync | tr ' ' ',')"

(The same should work for a recursive cp/mv/rm.)

Similarly, for getting the status of a transfer of a single large file, this answer attempts to read the files cp is reading/writing to give a running percentage of how much it has copied; a similar approach might work for rsync.
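
A minimal sketch of that idea, assuming Linux (it reads /proc) and GNU stat. The function names are hypothetical, and you have to find the right file descriptor number yourself, for example with `ls -l /proc/PID/fd`.

```shell
# percent_done POS SIZE: integer percentage of SIZE covered by POS.
percent_done() {
    if [ "$2" -gt 0 ]
    then
        echo $(( $1 * 100 / $2 ))
    else
        echo 100   # treat an empty file as already complete
    fi
}

# fd_progress PID FD: how far process PID has progressed through the file
# it has open on descriptor FD (use the descriptor of the source file).
fd_progress() {
    # The kernel reports the current file offset on the "pos:" line.
    pos=$(awk '$1 == "pos:" { print $2 }' "/proc/$1/fdinfo/$2")
    # /proc/PID/fd/FD is a symlink to the open file; -L follows it.
    size=$(stat -L -c %s "/proc/$1/fd/$2")
    percent_done "$pos" "$size"
}

# Usage: watch the source file that cp has open on (say) descriptor 3:
#   fd_progress "$(pidof cp)" 3
```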

The details

Read more…

Hardlink identical directory trees

The problem

I often make copies of important files onto multiple devices and later back up all of those devices onto the same drive, at which point I have multiple redundant copies of those files within my backup. Tools like rdfind, fdupes, and jdupes exist to deal with the general problem of efficiently searching a collection of files for duplicates, but none of them can restrict the comparison to files whose filenames and/or paths match, so they end up doing a lot of extra work in this case.

The solution

Download the script I wrote, hardlink-dups-by-name.sh, and run it as follows:

hardlink-dups-by-name.sh a_backup/ another_backup/

Then all files like a_backup/some/path that are identical to the corresponding file another_backup/some/path will get hard-linked together so there will only be one copy of the data taking up space.
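
The core idea can be sketched in a few lines of shell. This is not hardlink-dups-by-name.sh itself, just a simplified illustration: `hardlink_matching_paths` is a hypothetical name, it assumes both trees are on the same filesystem, and it does not handle filenames containing newlines.

```shell
# hardlink_matching_paths SRC DST: for every regular file SRC/some/path,
# if DST/some/path exists with identical contents, replace it with a
# hard link to the SRC copy.
hardlink_matching_paths() {
    src=$1 dst=$2
    (cd "$src" && find . -type f) | while IFS= read -r rel
    do
        a="$src/$rel" b="$dst/$rel"
        # Only consider regular files that are not symlinks.
        [ -f "$b" ] && [ ! -L "$b" ] || continue
        # Skip pairs that are already the same inode.
        [ "$a" -ef "$b" ] && continue
        # cmp -s is silent and succeeds only when the contents are identical.
        if cmp -s "$a" "$b"
        then
            ln -f "$a" "$b"
        fi
    done
}
```

Only one path per file is ever compared, which is what avoids the extra work a general duplicate finder would do.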

The details

Read more…

Generating specialized word lists

The problem

I've been playing Codenames online a lot lately (using my fork of codenames.plus), and a friend suggested it might be fun to have themed word lists. Specifically, they suggested Star Trek as a theme as it's a fandom that's fairly widely known. They left it up to me to figure out what should be in a Star Trek themed word list.

The solution

If you just want to play Codenames with the list, go to my Codenames web app and select one or both of the Star Trek card packs. If you just want the word lists, you can download the Star Trek: The Next Generation words and the Star Trek: Deep Space 9 words.

To generate a word list yourself (I used this source for the Star Trek scripts), you will need a common words list like en_50k.txt which I mentioned in my previous post on anagram games, and then pipe the corpus through the following script (which you will likely have to modify for the idiosyncrasies of your data):

#!/bin/bash
set -euo pipefail

NUM_COMMON=2000 # Filter out the most common 2000 words
COMMON_WORDS="$(mktemp)"
<en_50k.txt head "-$NUM_COMMON" | cut -d' ' -f1 |\
    sort | tr '[:lower:]' '[:upper:]' >"$COMMON_WORDS"

# Select only dialogue lines (in Star Trek scripts)
grep -aP '^\t\t\t[^\t]' |\
    # Split words
    tr ' .,:()\[\]!?;"/\t[:cntrl:]' '[\n*]' |\
    sed 's/--/\n/g' |\
    # Strip whitespace
    sed 's/^\s\+//' | sed 's/\s\+$//' |\
    grep -av '^\s*$' |\
    # Strip quotes
    sed "s/^'//" | sed "s/'$//" |\
    # Filter out numbers
    grep -av '^[[:digit:]]*$' |\
    tr '[:lower:]' '[:upper:]' |\
    # Fix for contractions not being in wordlist
    sed "s/'\(S\|RE\|VE\|LL\|M\|D\)$//" |\
    grep -av "'T$" |\
    # Remove some more non-words
    grep -avF '-' |\
    grep -avF '&' |\
    # Count
    sort | uniq -c |\
    # Only keep words with >25 occurrences
    awk '{ if ($1 > 25) { print } }' |\
    # Remove common words
    join -v2 -22 -o 2.1,2.2 "$COMMON_WORDS" - |\
    # Sort most common words first
    sort -rn

rm "$COMMON_WORDS"

The output of the script will require some manual effort to decide which words really belong in the final list, but it's a good start.
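
The join step is probably the least obvious part of the pipeline, so here it is in isolation on toy data. `remove_common` is a hypothetical wrapper name, and the sketch forces the C locale so that both inputs only need to be sorted bytewise.

```shell
# remove_common COMMON_FILE: read "COUNT WORD" lines (sorted by WORD) on
# stdin and print only those whose WORD is not listed in COMMON_FILE
# (one word per line, also sorted).
remove_common() {
    # -v 2        print only lines from input 2 (stdin) that have no match
    # -2 2        join on field 2 of input 2; input 1 joins on field 1
    # -o 2.1,2.2  output the count and the word from input 2
    LC_ALL=C join -v 2 -2 2 -o 2.1,2.2 "$1" -
}

# Usage:
#   printf 'AND\nTHE\n' >common.txt
#   printf '  30 AND\n  40 ENTERPRISE\n  50 THE\n' | remove_common common.txt
```

The leading spaces that uniq -c adds are harmless because join ignores leading blanks by default.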

The details

Read more…

Reacting to screensaver starting/stopping

The problem

I want my computer to act differently when I'm actively using it as opposed to away from it. I almost always lock the screen when I step away from my computer, so I want that same signal to do more than just start the screensaver.

The solution

Save the following script, which is slightly modified from the example in the man page for xscreensaver-command, as watch-xscreensaver.pl:

#!/usr/bin/perl

my $blanked = 0;
open (IN, "xscreensaver-command -watch |");
while (<IN>) {
    print;
    if (m/^(BLANK|LOCK)/) {
        if (!$blanked) {
            system "on-xscreensaver-lock";
            $blanked = 1;
        }
    } elsif (m/^UNBLANK/) {
        system "on-xscreensaver-unlock";
        $blanked = 0;
    }
}
if ($blanked) {
    system "on-xscreensaver-unlock";
}

Either call it from your ~/.xsessionrc file or run it manually from a terminal in your X session. I run it from a screen session so I can reattach to it and see the output:

screen -d -m -S xscreensaver-watch watch-xscreensaver.pl

My on-xscreensaver-lock and on-xscreensaver-unlock scripts are below and may be a good starting place, but yours will probably be different depending on your needs.

The details

Read more…

Reacting to active window

The problem

Which window I have focused is a signal to the computer for the state I want it to be in. For instance, I normally leave my speaker muted so I don't accidentally play sound from a website with unexpected videos. But this means that when I do want sound, I need to manually unmute it, even though I've already told the computer that I want to watch Netflix, which always involves turning on the sound.

Of course, for the particular problem of unmuting the sound, adding a keyboard shortcut and rereading xkcd 1205: Is It Worth the Time? probably would have been a more appropriate solution. But I wanted a general solution to the problem.

The solution

Download x11_watch_active_window.py. Then the following script will unmute the speakers if Netflix is focused:

#!/bin/sh
x11_watch_active_window.py | while read -r FocusApp
do
    if [ "Netflix - Google Chrome" = "$FocusApp" ]
    then
        echo Netflix is focused, unmuting.
        pactl set-sink-mute 0 0
    fi
done
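
If you want to react to more than one window, the loop can be restructured around a case statement. The sketch below is one possible shape, not part of x11_watch_active_window.py: the function names are hypothetical, and re-muting when focus leaves Netflix is an assumption that goes beyond the script above.

```shell
# action_for_title TITLE: print the name of the action to take for a title.
action_for_title() {
    case "$1" in
        "Netflix - "*) echo unmute ;;   # any title starting with "Netflix - "
        *)             echo mute ;;     # assumption: mute for everything else
    esac
}

# watch_focus: read one window title per line on stdin and act on each.
watch_focus() {
    while IFS= read -r title
    do
        case "$(action_for_title "$title")" in
            unmute) pactl set-sink-mute @DEFAULT_SINK@ 0 ;;
            mute)   pactl set-sink-mute @DEFAULT_SINK@ 1 ;;
        esac
    done
}

# Usage:
#   x11_watch_active_window.py | watch_focus
```

Keeping the title-to-action mapping in its own function makes it easy to add more patterns and to check them without touching the sound settings.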

The details

Read more…

Limit processor usage of multiple processes

The problem

In last week's post, I discussed using cpulimit on multiple processes in the special case of web browsers, but I wanted a more general solution.

The solution

cpulimit-all.sh is a wrapper around cpulimit which will call cpulimit many times to cover multiple processes of the same name and subprocesses.

Using that script, the following is the equivalent of the script from last week to limit all browser processes to 10% CPU:

cpulimit-all.sh --limit=10 --max-depth=1 \
    -e firefox -e firefox-esr -e chromium -e chrome

But also, we can add a couple options to include any grandchild processes and check for new processes to limit every minute:

cpulimit-all.sh --limit=10 --max-depth=2 \
    -e firefox -e firefox-esr -e chromium -e chrome \
    --watch-interval=1m

The details

Read more…

Limit web browser processor usage

The problem

cpulimit is a useful utility for stopping a program from wasting CPU, but it only limits a single process. As all modern web browsers use process isolation, limiting just a single process doesn't do very much; we actually want to limit all of the browser processes.

The solution

The following script will limit the CPU usage of all browser processes to $LIMIT percent CPU. Note that the limit is per process, not a total over all processes, so you may want to set it quite low to actually have an effect.

LIMIT=10 # Hard-code a limit of 10% CPU as an example.

# Kill child processes (stop limiting CPU) on script exit.
for sig in INT QUIT HUP TERM; do
  trap "
    pkill -P $$
    trap - $sig EXIT
    kill -s $sig "'"$$"' "$sig"
done
trap 'pkill -P $$' EXIT

# Find and limit all child processes of all browsers.
for name in firefox firefox-esr chromium chrome
do
    for ppid in $(pgrep "$name")
    do
        # Limit the browser process itself and each of its children.
        for pid in "$ppid" $(pgrep --parent "$ppid")
        do
            cpulimit --pid="$pid" --limit="$LIMIT" &
        done
    done
done

# Keep running so the limits stay in place until the script is killed.
wait
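
Because the limit applies to each process separately, one way to pick $LIMIT is to divide a total budget by the number of processes. A rough sketch, where `per_process_limit` is a hypothetical helper and an even split is an assumption (busy and idle processes are treated the same):

```shell
# per_process_limit TOTAL COUNT: split a TOTAL percent budget evenly over
# COUNT processes, rounding down but never going below 1.
per_process_limit() {
    if [ "$2" -le 0 ]
    then
        echo "$1"   # nothing to divide among; fall back to the total
        return
    fi
    limit=$(( $1 / $2 ))
    [ "$limit" -ge 1 ] || limit=1
    echo "$limit"
}

# Usage: aim for roughly 50% total across all chrome processes.
#   LIMIT=$(per_process_limit 50 "$(pgrep -c chrome)")
```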

The details

Read more…

Kill child jobs on script exit

The problem

When writing a shell script that starts background jobs, sometimes running those jobs past the lifetime of the script doesn't make sense. (Of course, sometimes background jobs really should keep going after the script completes, but that's not the case this post is concerned with.) In the case that either the background jobs are used to do some background computation relevant to the script or the script can conceptually be thought of as a collection of processes, it makes sense for killing the script to also kill any background jobs it started.

The solution

At the start of the script, add

cleanup() {
    # kill all processes whose parent is this process
    pkill -P $$
}

for sig in INT QUIT HUP TERM; do
  trap "
    cleanup
    trap - $sig EXIT
    kill -s $sig "'"$$"' "$sig"
done
trap cleanup EXIT

If you really want to kill only jobs and not all child processes, use the kill_child_jobs() function from all.sh or look at the other versions in the kill-child-jobs repository.
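
A simpler way to limit the cleanup to specific jobs, if you control how they are started, is to record each PID from $! yourself. This is a different technique from both pkill -P and kill_child_jobs(); the names below are hypothetical.

```shell
job_pids=""

# start_job CMD...: run CMD in the background and remember its PID.
start_job() {
    "$@" &
    job_pids="$job_pids $!"
}

# kill_tracked_jobs: signal only the jobs started through start_job.
kill_tracked_jobs() {
    for pid in $job_pids
    do
        # Ignore failures from jobs that have already exited.
        kill "$pid" 2>/dev/null || true
    done
    job_pids=""
}

# Usage:
#   trap kill_tracked_jobs EXIT
#   start_job sleep 1000
```

Child processes started any other way are left alone, which is the point of tracking the PIDs explicitly.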

The details

Read more…

Pelican publish without downtime

The problem

My existing script for publishing my blog has Pelican run on the web server and generate the static site directly into the directory served by nginx. This has the effect that while the blog is being published, it is inaccessible or some of the pages or styles are missing. The publish takes well under a minute, so this isn't a big issue, but there's no reason for any downtime at all.

The solution

Instead of serving the output/ directory directly, generate it and then copy it over by changing the make publish line in schedule_publish.sh to the following:

make publish || exit 1
if [ -L output_dir ]
then
    cp -r output output_dir/
    rm -rf output_dir/html.old
    mv output_dir/html output_dir/html.old
    mv output_dir/output output_dir/html
fi

where output_dir/ is a symbolic link to the parent of the directory actually being served and html/ is the directory actually being served (which output/ previously was a symbolic link to).

The details

Read more…

Timezones and scheduling tasks with at

The problem

My system for automatically posting future-dated blog posts mysteriously stopped working recently. The posts would appear if I manually published the blog, but not with the automatic scheduling mechanism.

The solution

In schedule_publish.sh, I changed the line

echo "$0" | at -q g $time

to

if [ "$(date -d "$time PST" +'%s')" -ge "$now" ]
then
    echo "$0" | at -q g -t "$(date +'%Y%m%d%H%M' -d "$time PST")"
fi

(where "PST" is the timezone of this blog; adjust as appropriate for your blog). $now is initialized with

now="$(date +'%s')"

before the call to make publish to avoid a race condition.
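
To see what the conversion is doing, here is the date call on its own. It assumes GNU date (for -d), and `at_timestamp` is a hypothetical wrapper name.

```shell
# at_timestamp WHEN ZONE: convert a time written in ZONE (e.g. "PST") into
# the YYYYMMDDhhmm form that `at -t` accepts, expressed in the timezone
# the machine itself is configured for.
at_timestamp() {
    date -d "$1 $2" +'%Y%m%d%H%M'
}

# Usage:
#   echo "$0" | at -q g -t "$(at_timestamp "$time" PST)"
```

On a server whose clock is set to UTC, noon PST comes out as 20:00 on the same day, which is the time at needs to be given.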

The details

Read more…