A Weird Imagination

Generating specialized word lists

The problem

I've been playing Codenames online a lot lately (using my fork of codenames.plus), and a friend suggested it might be fun to have themed word lists. Specifically, they suggested Star Trek as a theme because it's a fandom that's fairly widely known. They left it up to me to figure out what should be in a Star Trek-themed word list.

The solution

If you just want to play Codenames with the list, go to my Codenames web app and select one or both of the Star Trek card packs. If you just want the word lists, you can download the Star Trek: The Next Generation words and the Star Trek: Deep Space 9 words.

To generate a word list yourself (I used this source for the Star Trek scripts), you will need a common words list like en_50k.txt, which I mentioned in my previous post on anagram games. Then pipe the corpus through the following script (which you will likely have to modify for the idiosyncrasies of your data):

#!/bin/bash
set -euo pipefail

NUM_COMMON=2000 # Filter out the most common 2000 words
COMMON_WORDS="$(mktemp)"
<en_50k.txt head "-$NUM_COMMON" | cut -d' ' -f1 |\
    sort | tr '[:lower:]' '[:upper:]' >"$COMMON_WORDS"

# Select only dialogue lines (in Star Trek scripts)
grep -aP '^\t\t\t[^\t]' |\
    # Split words
    tr ' .,:()\[\]!?;"/\t[:cntrl:]' '[\n*]' |\
    sed 's/--/\n/g' |\
    # Strip whitespace
    sed 's/^\s\+//' | sed 's/\s\+$//' |\
    grep -av '^\s*$' |\
    # Strip quotes
    sed "s/^'//" | sed "s/'$//" |\
    # Filter out numbers
    grep -av '^[[:digit:]]*$' |\
    tr '[:lower:]' '[:upper:]' |\
    # Fix for contractions not being in wordlist
    sed "s/'\(S\|RE\|VE\|LL\|M\|D\)$//" |\
    grep -av "'T$" |\
    # Remove some more non-words
    grep -avF '-' |\
    grep -avF '&' |\
    # Count
    sort | uniq -c |\
    # Only keep words with >25 occurrences
    awk '{ if ($1 > 25) { print } }' |\
    # Remove common words
    join -v2 -22 -o 2.1,2.2 "$COMMON_WORDS" - |\
    # Sort most common words first
    sort -rn

rm "$COMMON_WORDS"

The output of the script will require some manual effort to decide which words really belong in the final list, but it's a good start.
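
As a rough illustration of the counting and common-word filtering steps at the end of the script, here is a minimal sketch on made-up data. The function name count_uncommon and the sample words are hypothetical; only the sort, uniq -c, join, and sort -rn steps are taken from the script above.

```shell
#!/bin/sh
# Hypothetical miniature of the count-then-filter steps.
# $1:    file of common words, one per line, uppercase and sorted
# stdin: one uppercase word per line
count_uncommon() {
    # Count occurrences, drop any word found in the common list
    # (join -v2 keeps only unpaired lines from stdin), most frequent first.
    sort | uniq -c | join -v2 -22 -o 2.1,2.2 "$1" - | sort -rn
}

# Made-up sample data: SHIP and THE play the role of common words.
common="$(mktemp)"
printf '%s\n' SHIP THE >"$common"
printf '%s\n' THE PHASER SHIP THE TRICORDER PHASER | count_uncommon "$common"
rm -f "$common"
```

On this sample, only PHASER (count 2) and TRICORDER (count 1) should survive, since SHIP and THE are removed by the join.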

The details

Read more…

Shadowrun's text compression

The problem

Several years ago, I was in a ROM hacking IRC room where another regular, Alchemic, was reverse engineering the text system of the SNES game Shadowrun. He figured it out and wrote a Python script to decompress the text but had some questions about why it was designed the way it was. So we're going to walk through figuring out how the code works, with some help from his notes, and try to understand the design.

If you don't want spoilers and would rather try to reverse engineer it yourself, just read up to the end of the Trace format section and see how much you can figure out on your own.

Read more…

Logging online status

The problem

I used to have an occasionally unreliable internet connection. I wanted logs of exactly how unreliable it was and an easy way to be notified when it was back up.

The solution

Use cron to check online status once a minute and write the result to a file. An easy way to check is to confirm that google.com will reply to a ping (this does give a false negative in the unlikely event that Google is down).

To run a script every minute, put a file in /etc/cron.d containing the line

* * * * * root /root/bin/online-check

where /root/bin/online-check is the following script:

#!/bin/sh

# Check if computer is online by attempting to ping google.com.
PING_RESULT="`ping -c 2 google.com 2>/dev/null`"
if [ $? -eq 0 ] && ! echo "$PING_RESULT" | grep -F '64 bytes from 192.168.' >/dev/null 2>/dev/null
then
    ONLINE="online"
else
    ONLINE="offline"
fi
echo "`date '+%Y-%m-%d %T%z'` $ONLINE" >> /var/log/online.log
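
One way to spot when the connection came back is to show only the log lines where the status changes. This is a sketch, assuming the log format written by the script above (a timestamp followed by online or offline as the last field); the function name status_changes and the sample entries are made up for illustration.

```shell
#!/bin/sh
# Print only log lines whose status (the last field) differs from the
# previous line's status, i.e. the moments the connection went up or down.
status_changes() {
    awk '$NF != prev { print; prev = $NF }'
}

# Example with made-up log entries; the second line is suppressed
# because the status did not change.
printf '%s\n' \
    '2020-01-01 12:00:00+0000 online' \
    '2020-01-01 12:01:00+0000 online' \
    '2020-01-01 12:02:00+0000 offline' \
    '2020-01-01 12:03:00+0000 online' |
    status_changes
```

On a real log this could be run as status_changes </var/log/online.log.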

The details and pretty printing

Read more…

Child process not in ps?

A buggy program

Consider the following (contrived) program which starts a background process to create a file and then waits while the background process is still running before checking to see if the file exists:

#!/bin/sh

# Make sure file doesn't exist.
rm -f file

# Create file in a background process.
touch file &
# While there is a touch process running...
while ps -C "touch" > /dev/null
do
    # ... wait one second for it to complete.
    sleep 1
done
# Check if file was created.
if [ -f file ]
then
    echo "Of course it worked."
else
    echo "Huh? File wasn't created."
    # Wait for background tasks to complete.
    wait
    if [ -f file ]
    then
        echo "Now it's there!"
    else
        echo "File never created."
    fi
fi

# Clean up.
rm -f file

Naturally, it will always output "Of course it worked.", right? Run it in a terminal yourself to confirm this. But I claimed this program is buggy; there's more going on.

Read more…

Out of inodes, what now?

When you start getting disk full messages on Linux, there are a few different reasons why that might happen:

  1. The expected. Too many large files. You can track down large directories using WinDirStat or

    du -hx --max-depth=1 | sort -h
    where the -x option tells du to not cross filesystem boundaries and the -h option, given to both du and sort, uses human-readable sizes like 11M or 1G.

  2. Deleted files aren't actually deleted if they are still open. You can use lsof to find open files. Give it the filesystem as an argument like lsof /home.

  3. By default 5% of each filesystem is reserved for writes by root. Depending on what the filesystem is being used for, this may be too much or simply unnecessary. See this Server Fault answer for how to deal with this.

  4. The files could be shadowed by a mount. If a filesystem is mounted over a non-empty directory, the files in that directory aren't visible.

  5. Last, the disk might not actually be out of space at all. It might actually be out of inodes. Some filesystems, notably the ext2/3/4 filesystems used by default on most Linux distributions, have a fixed number of inodes allocated at filesystem creation time. The default is high enough that it is unlikely to be an issue unless there are a very large number of empty files. df -i will show the number of inodes free on each filesystem, which lets you verify whether a filesystem is indeed out of inodes.

    But how do you find those empty files? As described above, du will help find large files, but now we want to find large numbers of files. The following command acts like du -hx --max-depth=$depth | sort -h for inodes instead of file sizes:

    find -xdev | sed "s@\(\([^/]*/\)\{$depth\}[^/]*\).*@\1@" | uniq -c | sort -n
    

    find -xdev lists all of the files under the current directory on the same filesystem. The sed command finds the first $depth directories (ending in /) and discards the rest of the filename (the .* at the end), so each directory appears once for every file or directory anywhere under it. Then the end of the command counts the repeated lines and sorts by those counts, highlighting the directories with the most files.
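
To see what the sed and uniq -c steps do, here is a small sketch that feeds a made-up list of paths through the same pipeline instead of running find. The function name count_by_depth and the sample paths are hypothetical.

```shell
#!/bin/sh
# Truncate each path on stdin to its first $1 directory levels, then
# count how many paths fall under each truncated prefix.
count_by_depth() {
    sed "s@\(\([^/]*/\)\{$1\}[^/]*\).*@\1@" | uniq -c | sort -n
}

# Made-up paths, in the order find would list them.
printf '%s\n' ./a ./a/x ./a/y ./a/y/z ./b ./b/w | count_by_depth 1
```

With a depth of 1 this should report ./b with a count of 2 and ./a with a count of 4, since each directory is counted once for itself and once for every entry beneath it.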