A Weird Imagination

Which command will be run?

Posted in

The problem#

While trying to develop a modification to the Pelican source, I was unexpectedly having my installed version of Pelican get run instead of the local version:

$ which pelican
/usr/local/bin/pelican
$ command -v pelican
/usr/bin/pelican

For some reason, which was pointing to the executable I was expecting bash to run, but the Bash builtin command was telling me that bash was running the installed version instead.

The solution#

Use the hash builtin to clear bash's cache of the location of pelican:

$ hash -d pelican
$ command -v pelican
/usr/local/bin/pelican

Read more…

Some things not allowed are compulsory

Physics#

The totalitarian principle is a concept in physics that states

Everything not forbidden is compulsory.

In order words, any observable outcome not forbidden by a physical law will occur. The many-worlds interpretation of quantum mechanics suggests an even stronger statement that for every such outcome, there is some alternate world in which it does occur.

Programming language design#

While

Everything not forbidden is compulsory.

may sound absurd, its contrapositive

Anything not mandatory is forbidden.

sounds more like a rule you would expect in a computer system, although it may not always be desirable. While it does not fully apply, programming languages and command lines are infamously picky about their inputs:

If you forget a comma in your English essay, your teacher doesn’t hand it back and say, I can’t understand any of this.

In contrast, HTML parsers have historically been quite flexible in what they accept. There's reason to believe this lack of strictness was a strength: less technical users could create their web pages using incorrect HTML that would still work. Somewhat related, browser vendors were also able to add their own extensions and explore what could be added to HTML. On the other hand, the result was large amounts of invalid HTML to the point that HTML5 had to add explicit rules for parsing invalid HTML.

Law and security#

Alternatively, the weaker statement,

Everything which is not forbidden is allowed.

enshrines the principle of law that citizens are free to do whatever they will except when explicitly forbidden by a law.

On the other hand, the same principle applied to computer security is called enumerating badness and is widely held to be a bad idea: put simply, while you as the user of your computer want the freedom to do whatever you want on it, you probably don't want arbitrary code which may have been written by malicious actors to have those same freedoms and it's unfeasible to list all the bad programs as much as you may try.

Instead, many modern systems support the reverse,

Everything which is not allowed is forbidden.

or enumerating goodness, in the form of software repositories and app stores. Although some of these implementations unfortunately go against user freedom.

.NET#

While it seems like we have exhausted the variants of this phrasing, as the title of this post suggests, some software systems follow yet another one:

Some things not allowed are compulsory.

Yes, you read that right.

Using the Windows Azure .NET SDK v2.3, in a web role, the web.config file contained the following automatically generated XML:

  <system.diagnostics>
    <trace>
      <listeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=2.3.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
          <filter type="" />
        </add>
      </listeners>
    </trace>
  </system.diagnostics>

I was cleaning up warnings in this project hoping narrow down an unrelated issue and saw this warning on the <filter type="" /> tag:

The 'type' attribute is not allowed.

Naturally, I removed the type="" attribute which didn't seem to be doing anything since the warning said it wasn't even allowed. To my surprise, when I ran the code, it failed to run due to the initialization code throwing an exception with the following message:

Required attribute 'type' not found.

Hence, the type attribute is apparently both not allowed and compulsory.

As the message telling me it is not allowed was merely a warning and not an exception, I put it back and decided not to worry too much about it. Removing the <filter /> tag entirely also seemed to work and eliminate the warning as well.

Speeding up Pelican's regenerate

Posted in

Pelican's default Makefile includes an option make regenerate which uses Pelican's -r/--autoreload option to regenerate the site whenever a file is modified. Combined with the Firefox extension Auto Reload, this makes it easy to keep an eye on how a blog post will be rendered as you author it and to quickly preview theme changes.

The problem#

With just thirty articles, Pelican already takes several seconds to regenerate the site. For publishing a site, this is plenty fast, but for tweaking formatting in a blog post or theme, this is too slow.

The quick solution#

Pelican has an option, --write-selected, which makes it only write out the files listed. Writing just one file takes about half a second on my computer, even though it still has to do some processing for all of the files in order to determine what to write. To use --write-selected, you have to determine the output filename of the article you are editing:

$ pelican -r content -o output -s pelicanconf.py \
    --relative-urls \
    --write-selected output/draft/in-progress-article.html

The right solution#

Optimally, we wouldn't have to tell Pelican which file to output; instead, it would figure out which files could be affected by a change and regenerate only those files.

Read more…

Monitor all the things

CPU and memory#

On Linux, the basic way to monitor load is to use top. The only thing top really has going for it is that it is almost certainly available on any system you will ever use. Luckily, there's a better way: htop. htop supports colors and mouse clicks and lists the available key commands at the bottom of the terminal. It also can be customized to your liking. You can start by putting my htoprc in your ~/.config/htop/ directory:

$ mkdir -p ~/.config/htop/
$ cd ~/.config/htop/
$ wget https://gist.githubusercontent.com/dperelman/1e051f5705685cb41f31/raw/3ab9cf17b166120a805d5f76a71ce82452f553b4/htoprc

Or just explore the options yourself.

Hit F1 (or click Help in the bottom-left) to get an explanation of the colors used in the CPU and memory bars and a guide to keystrokes not listed at the bottom.

In my usage, I find insufficient memory is more often the problem than CPU, so I usually leave htop sorted by the MEM% column.

Other resources#

While CPU and memory are the easiest to monitor resources, they are not the only ones. Linux offers a wide variety of system monitors, depending on what resource you want to monitor and what format you want to view it in. This post focuses on real-time viewing with human-friendly displays but most of these have options or variants that support logging historical data in a more machine-friendly format as well.

Read more…

Mysterious Twitter scraping bug

Posted in

The bug#

A couple days ago my Twitter screen scraper stopped working in Liferea. I hadn't changed anything, and the script's output at the command-line still looked okay to my inspection, but Liferea started giving the message

Read more…

Download all items in a podcast

Posted in

Podcasts are a simple extension to RSS: in addition to text and a link, posts can include a file to be downloaded. In the case of podcasts, this is an audio file. Due to this, while many specialized podcast applications exist, any news aggregator will work, although it might not have the best interface for that use case.

My normal workflow for podcasts is to keep track of them in a news aggregator and explicitly download the files to a local folder. While this isn't overly onerous for a weekly podcast, it is a repetitive task that could be automated. More importantly, downloading an entire backlog of dozens of episodes that way would take a while.

Read more…

Generate and download file in TypeScript

The goal#

Generate a file and offer it for download using only client-side JavaScript that is valid TypeScript.

The solution#

var fileContents = "Hello world!";
var filename = "hello.txt";
var filetype = "text/plain";

var a = document.createElement("a");
dataURI = "data:" + filetype +
    ";base64," + btoa(fileContents);
a.href = dataURI;
a['download'] = filename;
var e = document.createEvent("MouseEvents");
// Use of deprecated function to satisfy TypeScript.
e.initMouseEvent("click", true, false,
    document.defaultView, 0, 0, 0, 0, 0,
    false, false, false, false, 0, null);
a.dispatchEvent(e);
a.removeNode();

This code offers a download of a file named hello.txt with the Internet media type text/plain containing the string Hello world!.

Warning: This uses the deprecated initMouseEvent() as a workaround to this TypeScript bug. While presently functional in Chrome and Firefox, this code may stop working in future versions of those browsers.

Read more…

Reverse sequence for tr

The problem#

If you take the word wizard, reverse the order of the letters and reverse the alphabet:

From: abcdefghijklmnopqrstuvwxyz
To:   ZYXWVUTSRQPONMLKJIHGFEDCBA

then you get the word wizard back, an observation made at least as early as 1972.

Now let's write a shell script to verify this so we can find other words with similar interesting properties. The obvious shell script to verify this

echo wizard | tr a-z z-a | rev

unfortunately fails with the error

tr: range-endpoints of 'z-a' are in reverse collating sequence order

The error is by design: it's not clear what a sequence in reverse order should mean, so POSIX actually requires that it not work.

Read more…

Better hash-based colors

The problem#

Yesterday, I proposed using a hash function to choose colors:

ps1_color="32;38;5;$((0x$(hostname | md5sum | cut -f1 -d' ' | tr -d '\n' | tail -c2)))"

I did not discuss two issues with this approach:

  1. Colors may be too similar to other colors, such that they are not useful.
  2. Some colors may be undesirable altogether, particularly very dark ones may be too similar to a black terminal background color.

Read more…

Hash-based hostname colors

Random color selection#

In my post about hostname-based prompt colors, I suggested a fallback color scheme that was obviously wrong in order to remind you to set a color for that host:

alice@unknown:~$ 

This carried with it an implicit assumption: you care what color each host is assigned. You may instead be happy to assign a random color to each host. We could use shuf to generate a random color:

ps1_color="32;38;5;$(shuf -i 0-255 -n 1)"

The problem with this solution is the goal of the recoloring the prompt was not simply to make it more colorful, but for that color to have meaning. We want the color to always be the same for each login to a given host.

One way to accomplish this would be to use that code to randomly generate colors, but save the results in a table like the one used before for manually-chosen colors. But it turns out we can do better.

Hash-based color selection#

Hash functions have a useful property called determinism, which means that hashing the same value will always get the same result. The consequence is that we can use a hash function like it's a lookup table of random numbers shared among all of our computers:

ps1_color="32;38;5;$(($(hostname | sum | cut -f1 -d' ' | sed s/^0*//) % 256))"

The $((...)) syntax is bash's replacement for expr which is less portable but easier to use. Here we use it to make sure the hash value we compute is a number between 0 and 255. [sum][sum] computes a hash of its input, in this case the result of hostname. Its output is not just a number so cut selects out the number and sed gets rid of any leading zeros so it isn't misinterpreted as octal.

The idea of using sum was suggested by a friend after reading my previous post on the topic.

But this turns out to not work great for hosts with similar names like rob.example.com and orb.example.com:

alice@rob:~$ 
alice@orb:~$ 

Similar colors on hosts with very different names would not be so bad, but because of how sum works, it will tend to give similar results on similar strings (although less often than I expected; it took some effort to find such an example).

Better hash functions#

While this is not a security-critical application, here cryptographic hash functions solve the problem. Cryptographic hash functions guarantee (in theory) that knowing that two inputs are similar tells you nothing about their hash values. In other words, the output of cryptographic hash functions are indistinguishable from random and, in fact, they can be used to build pseudorandom generators like Linux's /dev/urandom.

The cryptographic hash function utilities output hex instead of decimal, so they aren't quite a drop-in replacement for sum:

ps1_color="32;38;5;$((0x$(hostname | md5sum | cut -f1 -d' ' | tr -d '\n' | tail -c2)))"

Here we use cut and tr to select just the hex string of the hash. tail's -c option specifies the number of bytes to read from the end, where 2 bytes corresponds to 2 hex digits, which can have a value of 0 to 255, so the modulo operation is not needed. Instead the 0x prefix inside $((...)) interprets the string as a hex number and outputs it as a decimal number.

This code uses the md5sum utility to compute an MD5 hash of the hostname. This is recommended because md5sum is likely to be available on all hosts. Do be aware that MD5 is insecure and it is only okay to use here because coloring the prompt is not a security-critical application.

sha1sum and sha256sum are also likely available on modern systems and work as drop-in replacements for md5sum in the above command should you wish to use a different hash. Additionally, you could also get different values out of the hash by adding a salt:

salt="Some string."
ps1_color="32;38;5;$((0x$( (echo "$salt"; hostname) | sha256sum | cut -f1 -d' ' | tr -d '\n' | tail -c2)))"